Cryptopolitan
2026-05-07 23:50:42

GPT-Realtime-2 brings GPT-5 intelligence to voice API

OpenAI released a new generation of voice models in its API on Wednesday, giving developers tools to build apps that can reason through spoken requests, translate across more than 70 languages, and transcribe speech as it happens. The three models are named GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. They move AI voice interfaces beyond simple Q&A exchanges into territory where an AI agent can listen, think, and act mid-conversation.

GPT-Realtime-2 brings sharper reasoning to voice

GPT-Realtime-2 is the flagship. OpenAI says it offers GPT-5-class reasoning, a significant step up from its predecessor, GPT-Realtime-1.5. The model scored 15.2% higher on Big Bench Audio, a benchmark for audio intelligence, and 13.8% higher on Audio MultiChallenge, which tests instruction following in multi-turn spoken dialogue.

The practical upgrades target developers building production voice agents. The model now supports a 128K context window, quadrupled from the previous 32K limit, and offers five tiers of adjustable reasoning effort from “minimal” to “xhigh.” It can call multiple tools simultaneously, recover from errors with spoken acknowledgments, and produce short bridging phrases like “let me check that” while processing a request.

GPT-Realtime-Translate handles live speech translation. It accepts more than 70 input languages and outputs in 13, and is designed to keep pace with a speaker in real time. GPT-Realtime-Whisper provides streaming speech-to-text (STT), transcribing words as they are spoken rather than waiting for a completed utterance.

Zillow, Deutsche Telekom test the models in production

Several companies got early access. Zillow is building a voice assistant that can process complex real estate queries, handle tool calls to search listings, and comply with Fair Housing regulations.
The company reported a 26-point improvement in call success rate on its hardest adversarial benchmark after prompt optimization with GPT-Realtime-2, reaching 95% compared to 69% previously.

Deutsche Telekom is testing real-time translation for customer support, allowing callers to speak in their preferred language while the model handles the conversion on both sides. Priceline is exploring a voice-based travel assistant that could manage flight searches, hotel changes, and on-the-ground translation in a single session.

OpenAI is pitching the models to companies looking to expand customer service capabilities, but it also noted potential applications across education, media, events, and creator platforms.

OpenAI said it built content moderation into the new models, with triggers that can halt conversations flagged as violating its harmful-content guidelines. The company framed the guardrails as protection against spam, fraud, and other forms of abuse.

On pricing, the Translate and Whisper models bill by the minute. GPT-Realtime-2 bills by token consumption. All three are available through OpenAI’s Realtime API, accessible via WebRTC, WebSocket, and SIP connection methods.
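
For readers who want to check the arithmetic behind the figures quoted above, here is a minimal sketch in Python. The function names are illustrative and not part of any OpenAI tooling. It shows that Zillow's move from 69% to 95% is a gain of 26 percentage points, which is a larger figure when expressed as a relative increase, and that 128K is four times 32K. The article does not say whether the 15.2% and 13.8% benchmark gains are percentage points or relative changes, so they are left out.

```python
def point_gain(before_pct: float, after_pct: float) -> float:
    """Absolute change, in percentage points."""
    return after_pct - before_pct


def relative_gain(before_pct: float, after_pct: float) -> float:
    """Change relative to the starting value, in percent."""
    return (after_pct - before_pct) / before_pct * 100


# Zillow's reported call success rates, taken from the article.
zillow_before, zillow_after = 69.0, 95.0

print(point_gain(zillow_before, zillow_after))               # 26.0 points
print(round(relative_gain(zillow_before, zillow_after), 1))  # 37.7 percent

# Context window growth reported for GPT-Realtime-2 (in thousands of tokens).
context_ratio = 128 / 32
print(context_ratio)  # 4.0, i.e. "quadrupled"
```

The distinction matters when comparing vendor claims: a "26-point improvement" and a "roughly 38% improvement" describe the same Zillow result.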
