OpenAI Releases Three Realtime Audio Models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API
OpenAI has released three new audio models through its Realtime API, each targeting a distinct capability in live voice applications: GPT-Realtime-2 for voice agents with reasoning, GPT-Realtime-Translate for live speech translation, and GPT-Realtime-Whisper for streaming transcription.

Alongside the model releases, the Realtime API officially exits beta and is now generally available, a meaningful signal for developers who had held off building production systems on it. All three models are available immediately through the OpenAI API and can be tested in the Playground.

Together, they push voice applications past the basic question-and-answer loop, toward systems that can listen, reason, translate, transcribe, and act within a single conversation.

GPT-Realtime-2: Voice Reasoning with a 128K Context Window

The flagship release is GPT-Realtime-2, which OpenAI describes as its first voice model with GPT-5-class reasoning. GPT-Realtime-2 can process harder requests, manage in...
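As a rough illustration of how a developer might target one of these models through the Realtime API, the sketch below builds the websocket connection URL and a session-configuration event. The `gpt-realtime-2` model identifier comes from the article; the `session.update` event shape and fields (`instructions`, `voice`, `modalities`) are assumptions modeled on OpenAI's published Realtime API conventions and may differ for these new models.

```python
import json

# Base websocket endpoint for the Realtime API (per OpenAI's docs).
REALTIME_URL = "wss://api.openai.com/v1/realtime"


def realtime_connection_url(model: str) -> str:
    """Return the websocket URL that selects a Realtime model."""
    return f"{REALTIME_URL}?model={model}"


def session_update_event(instructions: str, voice: str = "alloy") -> str:
    """Build a JSON-encoded session.update event configuring the session.

    Field names here are assumptions based on the existing Realtime API;
    check the official reference for the released models.
    """
    return json.dumps({
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "voice": voice,
            "modalities": ["audio", "text"],
        },
    })


# Example: point a connection at the flagship voice-reasoning model.
url = realtime_connection_url("gpt-realtime-2")
event = session_update_event("You are a concise voice agent.")
```

In practice the client would open the websocket with an `Authorization: Bearer` header carrying the API key, send the `session.update` event first, and then stream audio frames over the same connection.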
