Humanness Index™ · TTS model

xAI Grok TTS

Grok TTS is the text-to-speech model behind Grok Voice, the assistant that ships on Grok mobile apps, Tesla vehicles, and Starlink customer support.

Rank: #2
Humanness: 94
Likely rank: #1–6
Blind votes: 1,045

Standings as of Jul 28, 2026, 02:17 UTC

A real arena clip: a cloned source voice reading a customer support prompt at phone quality.

Grok TTS key stats

Latency (measured): 460 ms¹
Languages: 20²
Price / 1M chars: $15³
Released: April 17, 2026⁴

¹ Vapi streaming benchmark (50 trials per model; checked 2026-06-11). Median of 50 sequential live WebSocket trials with optimize_streaming_latency disabled (the flag is the only difference from the Streaming configuration), June 2026; includes network round-trip time.
² docs.x.ai/developers/model-capabilities/audio/voice (checked 2026-06-10).
³ x.ai/news/grok-stt-and-tts-apis (checked 2026-06-10). $15.00 per 1M characters per the launch post (x.ai/api/voice; docs.x.ai/developers/models/text-to-speech). Secondary coverage reported $4.20 per 1M; the launch post figure is used.
⁴ x.ai/news/grok-stt-and-tts-apis (checked 2026-06-10). Standalone TTS API general availability; the Grok Voice stack has been public since 2025-12-17.

Background

Grok TTS is the text-to-speech model behind Grok Voice, the assistant that ships on Grok mobile apps, Tesla vehicles, and Starlink customer support. xAI built the stack in-house, from voice activity detection to the audio models themselves, and opened it to developers through the Grok Voice Agent API in December 2025 and a standalone TTS API in April 2026. The API offers five expressive voices across 20 languages, with inline speech tags like [laugh] and [whisper] for fine-grained delivery control. On the Humanness Index™ it ranks #2 with a Humanness score of 94, two points behind Eleven v3 at 96; with a likely rank of #1–6, it remains a contender for the top spot.

Sources: x.ai, x.ai

Release history

The Grok Voice stack went public in December 2025 with the Grok Voice Agent API at $0.05 per minute. The standalone TTS API reached general availability on April 17, 2026 at a flat $15.00 per 1M characters, with REST requests accepting up to 15,000 characters.

Sources: docs.x.ai
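As a rough illustration of what the flat rate and the REST limit mean in practice, the Python sketch below estimates cost and splits long text into request-sized chunks. The two constants come from the figures above; the function names are hypothetical, and the sketch assumes every character is billed equally, including any inline speech tags, which this page does not confirm.

```python
PRICE_PER_MILLION_CHARS_USD = 15.00  # flat rate quoted in the launch post
MAX_REST_CHARS = 15_000              # REST request limit stated above


def estimate_cost_usd(text: str) -> float:
    """Estimate synthesis cost, assuming every character is billed equally."""
    return len(text) * PRICE_PER_MILLION_CHARS_USD / 1_000_000


def split_for_rest(text: str, limit: int = MAX_REST_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Breaks at the last space inside each window so words stay intact, and
    falls back to a hard cut when a window contains no usable space.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut + 1  # keep the space with the preceding chunk
        chunks.append(text[start:end])
        start = end
    return chunks
```

Joining the chunks returned by `split_for_rest` reproduces the input exactly, so the summed cost of the chunks matches the cost of the whole text under the same billing assumption.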

Position in the rankings

Standings as of Jul 28, 2026, 02:17 UTC

Rank	Provider	Model	Humanness	Latency
Baseline	Human	Homo Sapien	100	—
#1	ElevenLabs	Eleven v3	96	758 ms
#2	xAI	Grok TTS	94	460 ms
#3	MiniMax	Speech 2.8	91	325 ms
#4	Canopy Labs	Orpheus	89	—


Frequently asked questions

How is Grok TTS tested on the Humanness Index™?

Listeners hear Grok TTS against another model in a blind head-to-head round. Both voices read the same customer support prompt from the same cloned source voice, and listeners pick whichever sounds more human. The Humanness score derives purely from those votes.
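This page does not give the formula that turns votes into a Humanness score, so the Python sketch below shows only the first step one might take: tallying blind head-to-head votes into a per-model win rate. The function name and the `(winner, loser)` vote format are assumptions made for illustration, not the Index's actual method.

```python
from collections import Counter


def win_rates(votes):
    """Tally (winner, loser) pairs from blind rounds into per-model win rates.

    Assumed input format: each vote is a tuple naming the model the listener
    picked as more human, followed by the model it was compared against.
    """
    wins, rounds = Counter(), Counter()
    for winner, loser in votes:
        wins[winner] += 1
        rounds[winner] += 1
        rounds[loser] += 1
    # Fraction of rounds each model won, over the rounds it appeared in.
    return {model: wins[model] / rounds[model] for model in rounds}
```

A raw win rate depends on which opponents a model happened to face, which is one reason a published score may use a different aggregation.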
What latency does Grok TTS have?

We measured a 460 ms median time to first audio over 50 live trials in June 2026. The two Grok entries share one WebSocket API and differ only by the optimize_streaming_latency flag; Grok TTS is measured with the flag disabled, and the Streaming entry, measured with it enabled, returned 285 ms.
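A measurement of this kind can be sketched in Python as follows. This is a minimal sketch, not the benchmark harness itself: `open_stream` is a hypothetical callable standing in for a real streaming TTS client, and `fake_stream` is a stand-in that simulates a short delay so the example runs without network access.

```python
import statistics
import time
from typing import Callable, Iterable


def time_to_first_audio_ms(open_stream: Callable[[str], Iterable[bytes]], text: str) -> float:
    """Return milliseconds from request start to the first non-empty audio chunk."""
    start = time.perf_counter()
    for chunk in open_stream(text):
        if chunk:
            return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream ended without producing audio")


def median_latency_ms(open_stream: Callable[[str], Iterable[bytes]], text: str, trials: int = 50) -> float:
    """Run trials one after another and return the median time to first audio."""
    samples = [time_to_first_audio_ms(open_stream, text) for _ in range(trials)]
    return statistics.median(samples)


def fake_stream(text: str):
    """Hypothetical stand-in for a TTS client: waits about 10 ms, then yields audio."""
    time.sleep(0.01)
    yield b"\x00\x01"


if __name__ == "__main__":
    print(f"{median_latency_ms(fake_stream, 'Hello', trials=5):.1f} ms")
```

Because the timer starts before the stream is opened, a real client measured this way would include network round-trip time, as the published figure does. The median is used rather than the mean so that a few slow trials do not dominate the result.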

Keep exploring

xAI: All xAI models on the Index
Grok TTS (Streaming): Rank #6 · Humanness 86

