xAI built its voice stack fully in house, from voice activity detection to the audio models themselves, for Grok Voice, the assistant that ships on Grok mobile apps, Tesla vehicles, and Starlink customer support. The Grok Voice Agent API opened the stack to developers in December 2025, and standalone TTS and STT APIs followed in April 2026.
Grok TTS offers five expressive voices (Ara, Eve, Leo, Rex, and Sal, with Eve as the default) across 20 languages, with inline speech tags like [laugh], [sigh], and [whisper] for delivery control. REST requests accept up to 15,000 characters, and a WebSocket streaming variant accepts unbounded input. Both variants currently sit at the top of the Humanness Index™.
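The 15,000-character REST limit means longer scripts have to be split before submission, or sent over the WebSocket variant instead. The following is a minimal client-side sketch of that split, not part of any xAI SDK: `split_for_rest` and `pick_voice` are hypothetical helper names, and the code assumes the limit is counted in Python string characters (with speech tags such as [laugh] counted like any other text), which the source does not confirm. No network call is made.

```python
# Hypothetical client-side helpers; none of these names come from xAI's API.
MAX_REST_CHARS = 15_000                      # REST limit stated in the text above
VOICES = ("Ara", "Eve", "Leo", "Rex", "Sal")  # voices listed in the text above
DEFAULT_VOICE = "Eve"


def pick_voice(name=None):
    """Return a known voice name, falling back to the default when none is given."""
    if name is None:
        return DEFAULT_VOICE
    candidate = name.strip().capitalize()
    if candidate not in VOICES:
        raise ValueError(f"unknown voice {name!r}; expected one of {VOICES}")
    return candidate


def split_for_rest(text, limit=MAX_REST_CHARS):
    """Split text into chunks of at most `limit` characters.

    Cuts at the last space that fits, so words and bracketed tags such as
    [laugh] (which contain no spaces) stay whole. Falls back to a hard cut
    only when a run of more than `limit` characters has no space at all.
    """
    chunks = []
    rest = text
    while len(rest) > limit:
        cut = rest.rfind(" ", 0, limit + 1)  # last space at index <= limit
        if cut <= 0:
            cut = limit                      # no usable space: hard cut
        chunks.append(rest[:cut])
        rest = rest[cut:].lstrip(" ")        # drop the separator space(s)
    if rest:
        chunks.append(rest)
    return chunks
```

Each chunk could then be sent as its own REST request with the chosen voice. Splitting on spaces is a simplification; cutting at sentence boundaries would likely give more natural prosody across chunk joins.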
Pricing: $15.00 per 1M characters, per the launch post (x.ai/news/grok-stt-and-tts-apis, checked 2026-06-10; see also x.ai/api/voice and docs.x.ai/developers/models/text-to-speech). Secondary coverage reported $4.20 per 1M characters; the launch post figure is used.
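To make the per-character rate concrete, here is a small arithmetic sketch. It assumes cost scales linearly with character count and ignores any rounding, minimums, or special handling of speech tags, none of which the source specifies. `estimate_tts_cost_usd` is a hypothetical helper name.

```python
# Launch-post rate quoted above; secondary coverage reported 4.20 instead.
PRICE_PER_MILLION_CHARS_USD = 15.00


def estimate_tts_cost_usd(char_count, price_per_million=PRICE_PER_MILLION_CHARS_USD):
    """Linear estimate: characters times rate, divided by one million."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return char_count * price_per_million / 1_000_000
```

Under these assumptions, a maximum-size REST request of 15,000 characters comes to about $0.225 at the launch post rate, or about $0.063 at the $4.20 figure from secondary coverage.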