The Humanness Index™

The open benchmark for how human voice AI sounds. Built and operated by Vapi.

Code is Apache-2.0. Standings data is CC BY 4.0. Audio clips and source voices are licensed recordings, all rights reserved. Provider logomarks belong to their respective owners and are used nominatively. “The Humanness Index™” name and logo are Vapi trademarks; see TRADEMARKS.md.

Humanness Index™ · TTS model

Grok TTS

by xAI

Grok TTS is the text-to-speech model behind Grok Voice, the assistant that ships on Grok mobile apps, Tesla vehicles, and Starlink customer support.

Rank: #1
Humanness: 100
Likely rank: #1–7
Blind votes: 108

Standings as of Jun 13, 2026, 00:15 UTC

A real arena clip: a cloned source voice reading a customer support prompt at phone quality.

Grok TTS key stats

Latency (measured): 460 ms [1]
Languages: 20 [2]
Price / 1M chars: $15 [3]
Released: April 17, 2026 [4]

  1. Vapi streaming benchmark, 50 trials per model (checked 2026-06-11). Median of 50 sequential live WebSocket trials with optimize_streaming_latency disabled (the flag is the only difference from the Streaming configuration), June 2026; includes network round-trip time.
  2. docs.x.ai/developers/model-capabilities/audio/voice (checked 2026-06-10).
  3. x.ai/news/grok-stt-and-tts-apis (checked 2026-06-10). $15.00 per 1M characters per the launch post (see also x.ai/api/voice; docs.x.ai/developers/models/text-to-speech). Secondary coverage reported $4.20 per 1M characters; the launch post figure is used.
  4. x.ai/news/grok-stt-and-tts-apis (checked 2026-06-10). Date of general availability for the standalone TTS API; the Grok Voice stack has been public since 2025-12-17.

Background

xAI built the Grok Voice stack in-house, from voice activity detection to the audio models themselves, and opened it to developers through the Grok Voice Agent API in December 2025 and a standalone TTS API in April 2026. The API offers five expressive voices across 20 languages, with inline speech tags like [laugh] and [whisper] for fine-grained delivery control. On the Humanness Index™ it is the voice to beat: in blind votes, listeners pick it as the more human option more often than any other model in the field.

Sources: x.ai

Release history

The Grok Voice stack went public in December 2025 with the Grok Voice Agent API at $0.05 per minute. The standalone TTS API reached general availability on April 17, 2026 at a flat $15.00 per 1M characters, with REST requests accepting up to 15,000 characters.
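
The two figures above are enough for a rough cost estimate. The following is a minimal sketch, not xAI's billing logic: it assumes every character counted by Python's `len()` is billed (whether speech tags or whitespace are counted is not stated here), and the helper names `estimate_cost_usd` and `split_for_rest` are hypothetical.

```python
PRICE_PER_MILLION_CHARS_USD = 15.00  # launch-post price quoted above
MAX_REST_CHARS = 15_000              # REST request limit quoted above


def estimate_cost_usd(text: str) -> float:
    """Estimated cost, assuming every character in `text` is billed."""
    return len(text) * PRICE_PER_MILLION_CHARS_USD / 1_000_000


def split_for_rest(text: str, limit: int = MAX_REST_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to cut at the last space inside the limit; falls back to a
    hard cut when a chunk contains no usable space.
    """
    chunks = []
    rest = text
    while len(rest) > limit:
        cut = rest.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space to break on: hard cut
        chunks.append(rest[:cut])
        rest = rest[cut:].lstrip(" ")
    if rest:
        chunks.append(rest)
    return chunks


if __name__ == "__main__":
    script = "Thanks for calling, how can I help you today? " * 1000
    parts = split_for_rest(script)
    print(f"{len(script)} chars in {len(parts)} requests, "
          f"about ${estimate_cost_usd(script):.4f}")
```

Under these assumptions a single maximum-size REST request of 15,000 characters costs about $0.225.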

Sources: docs.x.ai

Position in the rankings

Standings as of Jun 13, 2026, 00:15 UTC

Rank  Provider  Model                 Humanness  Latency
#1    xAI       Grok TTS              100        460 ms
#2    xAI       Grok TTS (Streaming)  98         285 ms
#3    Cartesia  Sonic 3.5             82         128 ms

Frequently asked questions

How is Grok TTS tested on the Humanness Index™?
Listeners hear Grok TTS against another model in a blind head-to-head round, with both voices reading the same customer support prompt from the same cloned source voice, and they pick whichever sounds more human. Its Humanness score derives purely from those votes.
What latency does Grok TTS have?
We measured a 460 ms median time to first audio over 50 live trials in June 2026. The two Grok entries share one WebSocket API and differ only by the optimize_streaming_latency flag; Grok TTS is measured with the flag disabled, and the Streaming entry, measured with it enabled, returned 285 ms.
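
The latency measurement described above (median time to first audio over sequential trials) can be sketched roughly as follows. This is an illustrative harness, not Vapi's benchmark code: `open_stream` is a hypothetical callable that starts one synthesis request and yields audio chunks, and `fake_stream` is a stand-in so the sketch runs without any network or API access.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator


def time_to_first_audio_ms(open_stream: Callable[[], Iterable[bytes]]) -> float:
    """Milliseconds from starting a request to the first non-empty audio chunk."""
    start = time.perf_counter()
    for chunk in open_stream():
        if chunk:
            return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream ended without producing audio")


def median_latency_ms(open_stream: Callable[[], Iterable[bytes]],
                      trials: int = 50) -> float:
    """Run trials one after another and return the median latency."""
    samples = [time_to_first_audio_ms(open_stream) for _ in range(trials)]
    return statistics.median(samples)


def fake_stream(delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS connection."""
    time.sleep(delay_s)   # simulated network and synthesis delay
    yield b"\x00" * 320   # first audio chunk
    yield b"\x00" * 320


if __name__ == "__main__":
    print(f"median: {median_latency_ms(fake_stream, trials=5):.1f} ms")
```

Because the timer starts before the request is opened, a figure measured this way includes network round-trip time, as the published numbers do. The median is used rather than the mean so that a few slow trials do not dominate the result.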
