The Humanness Index™

The open benchmark for how human voice AI sounds. Built and operated by Vapi.


Code is Apache-2.0. Standings data is CC BY 4.0. Audio clips and source voices are licensed recordings, all rights reserved. Provider logomarks belong to their respective owners and are used nominatively. “The Humanness Index™” name and logo are Vapi trademarks; see TRADEMARKS.md.


Humanness Index™ · Provider

xAI

x.ai

Best ranked model: #1 Grok TTS
Humanness: 100

Standings as of Jun 13, 2026, 00:15 UTC

Models on the Index: 2
Languages: 20
Price / 1M chars: $15

xAI models on the Humanness Index™

Rank  Model                 Humanness  Latency  Languages  Price / 1M chars
#1    Grok TTS              100        460 ms   20         $15
#2    Grok TTS (Streaming)  98         285 ms   20         $15

About xAI

xAI built its voice stack fully in house, from voice activity detection to the audio models themselves, for Grok Voice, the assistant that ships on Grok mobile apps, Tesla vehicles, and Starlink customer support. The Grok Voice Agent API opened the stack to developers in December 2025, and standalone TTS and STT APIs followed in April 2026.

Sources: x.ai, x.ai

The Grok voice stack

Grok TTS offers five expressive voices (Ara, Eve, Leo, Rex, and Sal, with Eve as the default) across 20 languages, with inline speech tags like [laugh], [sigh], and [whisper] for delivery control. REST requests accept up to 15,000 characters, and a WebSocket streaming variant accepts unbounded input. Both variants currently sit at the top of the Humanness Index™.

Sources: docs.x.ai
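
The 15,000-character REST limit means longer scripts have to be split before they are sent. The sketch below shows one way to do that in Python, preferring sentence boundaries and falling back to whitespace. The function name `split_for_rest` and its splitting rules are illustrative assumptions, not part of any xAI SDK, and the code deliberately makes no API calls.

```python
def split_for_rest(text: str, max_chars: int = 15_000) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: 15,000 is the REST limit quoted above; the
    splitting strategy is an assumption made for illustration.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks: list[str] = []
    remaining = text.strip()

    while len(remaining) > max_chars:
        window = remaining[:max_chars]
        # Prefer the last sentence end inside the window.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut != -1:
            cut += 1  # keep the punctuation mark with its sentence
        else:
            # Otherwise break at the last space, so that space-free tokens
            # such as [laugh] or [sigh] are not cut in half.
            cut = window.rfind(" ")
            if cut <= 0:
                cut = max_chars  # no whitespace at all: hard cut
        chunks.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()

    if remaining:
        chunks.append(remaining)
    return chunks


if __name__ == "__main__":
    script = "Hello there. [laugh] How are you? Fine."
    for part in split_for_rest(script, max_chars=20):
        print(len(part), part)
```

Each chunk would then go out as its own REST request. For input that cannot be bounded in advance, the WebSocket streaming variant described above avoids the need to split at all.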

xAI stats

Languages: 20 [1]
Price / 1M chars: $15 [2]

  1. docs.x.ai/developers/model-capabilities/audio/voice (checked 2026-06-10)
  2. x.ai/news/grok-stt-and-tts-apis (checked 2026-06-10). $15.00 per 1M characters per the launch post (x.ai/api/voice; docs.x.ai/developers/models/text-to-speech). Secondary coverage reported $4.20 per 1M; the launch post figure is used.
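
As a rough illustration of what the listed price implies, the sketch below converts a character count into an estimated cost. The $15.00 per 1M characters figure is the one cited above; the function name `estimate_cost_usd` is a hypothetical helper, and the calculation assumes simple linear per-character billing with no minimums or tiers, which the source does not confirm.

```python
from decimal import Decimal

# Listed price from the launch post cited above (USD per 1M characters).
PRICE_PER_MILLION_CHARS = Decimal("15.00")


def estimate_cost_usd(
    num_chars: int,
    price_per_million: Decimal = PRICE_PER_MILLION_CHARS,
) -> Decimal:
    """Estimate synthesis cost, assuming linear per-character billing."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return Decimal(num_chars) * price_per_million / Decimal(1_000_000)


if __name__ == "__main__":
    # One maximum-size REST request of 15,000 characters.
    print(estimate_cost_usd(15_000))
```

Under these assumptions, a single maximum-size REST request of 15,000 characters comes to $0.225.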

Other providers on the Index

ElevenLabs: Best ranked model #5 · Eleven v3
Cartesia: Best ranked model #3 · Sonic 3.5
MiniMax: Best ranked model #8 · Speech 2.5
Gradium: Best ranked model #19 · Gradium TTS
Canopy Labs: Best ranked model #4 · Orpheus
Inworld: Best ranked model #9 · TTS-2
Smallest.ai: Best ranked model #11 · Lightning v3.1
Neuphonic: Best ranked model #13 · neu_hq

How human does your model really sound?

The benchmark is open source. Suggest a model, read the methodology, or ask us to put your voice in the arena.
