The Humanness Index™

Which voice model sounds the most human?

Name: The Humanness Index™ leaderboard
Creator: Vapi
License: https://creativecommons.org/licenses/by/4.0/

Sounding human is hard to measure, but it's what decides whether a call works. We clone one voice onto every model and play them blind against a real human, so you can hear which ones pass.

Which voice sounds more human?

Same voice, different models.

Read along

So I can see here that the package was marked as delivered on Tuesday, but if you're saying it never arrived then what we'll do is... let me just. Yeah, I'm going to open a lost package investigation for you. That usually takes about forty-eight hours to resolve.

Step 1
Same voice, every model
We clone one conversational voice onto every model, so you're judging the model, not its demo reel.
Step 2
You listen blind
Two voices, same line, no labels. Pick the one that sounds more human.
Step 3
A real human sets the bar
Blind votes are fit into a rating, with a real human at 100. The higher the score, the more human the model sounds.
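
To make "fit into a rating" concrete, here is a minimal sketch of one common way to turn pairwise blind votes into ratings: a Bradley-Terry fit using the classic iterative (minorization-maximization) update. The page does not say which rating model the Index uses or how ratings are mapped onto the 100-point scale, so the model choice, the toy votes, the contestant names, and the "divide by the human's strength" scaling are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_bradley_terry(votes, iterations=200):
    """Fit Bradley-Terry strengths from (winner, loser) vote pairs.

    Assumes every contestant has at least one win; a contestant with
    zero wins would be driven to a strength of 0.
    """
    wins = defaultdict(int)          # total wins per contestant
    pair_counts = defaultdict(int)   # number of battles per unordered pair
    names = set()
    for winner, loser in votes:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        names.update((winner, loser))

    strength = {name: 1.0 for name in names}
    for _ in range(iterations):
        updated = {}
        for i in names:
            denom = 0.0
            for j in names:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom else strength[i]
        # Rescale so strengths keep a fixed sum (the model is scale-invariant).
        total = sum(updated.values())
        strength = {n: v * len(names) / total for n, v in updated.items()}
    return strength


# Hypothetical toy votes: (winner, loser). Not real Index data.
toy_votes = (
    [("human", "model_a")] * 6 + [("model_a", "human")] * 4
    + [("human", "model_b")] * 8 + [("model_b", "human")] * 2
    + [("model_a", "model_b")] * 7 + [("model_b", "model_a")] * 3
)

strength = fit_bradley_terry(toy_votes)

# Assumed scaling: pin the human at 100 and express models relative to it.
scores = {name: 100 * s / strength["human"] for name, s in strength.items()}

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```

In this sketch a model scores near 100 only if listeners pick it about as often as they pick the human recording; the real Index may use a different model (for example an Elo-style update) and a different scaling.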

Humanness distribution

20 models · 10 providers · 11,600 unique votes

Why latency matters. A voice that lags breaks the conversation, no matter how human it sounds.

Likely rank	Provider	Model	Humanness	Rating	Latency	Price	Votes
Baseline	Human	Homo Sapien	100	1296	n/a	n/a	617
#1–5	ElevenLabs	Eleven v3	96	1282	758 ms	$100	582
#1–6	xAI	Grok TTS	94	1275	460 ms	$15	585
#1–6	MiniMax	Speech 2.8	91	1268	325 ms	$60	557
#2–7	Canopy Labs	Orpheus	89	1260	n/a	Open source	556
#2–7	MiniMax	Speech 2 HD	89	1259	357 ms	$100	539
#3–7	xAI	Grok TTS (Streaming)	86	1252	285 ms	$15	548
#1–16	Speechify	Simba 3.2	83	1241	n/a	$10	45
#7–12	Inworld	TTS-1.5-max	78	1223	337 ms	$35	476
#7–12	ElevenLabs	Flash v2	76	1219	226 ms	$50	481
#7–13	ElevenLabs	Turbo v2	75	1216	302 ms	$50	465

The Index only includes models that support voice cloning: each battle plays the same cloned source voice through both models, so the comparison is head to head and fair. The Index is an open benchmark and is independent of the Vapi product: a model appearing here does not mean it is available in Vapi, and availability in Vapi plays no part in scoring. Don't see your model on this list? Contact us at humannessindex@vapi.ai.
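
Because a voice that lags breaks the conversation, one practical way to read the leaderboard is to fix a latency budget first and then take the most human model that fits it. The sketch below does that with the humanness scores and latencies listed above; the function name and the example budgets are hypothetical, and models with no published latency are simply skipped.

```python
# (provider, model, humanness score, latency in ms or None if not listed)
LEADERBOARD = [
    ("ElevenLabs", "Eleven v3", 96, 758),
    ("xAI", "Grok TTS", 94, 460),
    ("MiniMax", "Speech 2.8", 91, 325),
    ("Canopy Labs", "Orpheus", 89, None),
    ("MiniMax", "Speech 2 HD", 89, 357),
    ("xAI", "Grok TTS (Streaming)", 86, 285),
    ("Speechify", "Simba 3.2", 83, None),
    ("Inworld", "TTS-1.5-max", 78, 337),
    ("ElevenLabs", "Flash v2", 76, 226),
    ("ElevenLabs", "Turbo v2", 75, 302),
]


def most_human_within(budget_ms, rows=LEADERBOARD):
    """Return the highest-scoring row whose latency fits the budget, or None."""
    candidates = [
        row for row in rows
        if row[3] is not None and row[3] <= budget_ms
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[2])


# Example budgets are illustrative, not recommendations from the Index.
for budget in (200, 300, 400):
    print(budget, "ms ->", most_human_within(budget))
```

A stricter budget trades humanness for speed: the pick changes as the budget tightens, and a budget below the fastest listed model returns nothing.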

What we listen for

What makes a voice sound human?

Humanness doesn't break down into features. You either believe there's a person on the other end, or you don't. When that belief breaks, it's usually because of one of these.

Expressiveness

Emotion and emphasis. Stressing the right words, sounding like it means what it says instead of reading text aloud.

Tone & prosody

The intonation, rhythm, and melody of speech. The natural rise and fall of how people actually talk.

Artifacts

The little human sounds: breaths, stutters, natural pauses. A voice with none of them sounds too clean to be real.

Why trust this benchmark?

Any model can sound good on its own demo voice. The real test is how it handles your use case. We clone one voice across every model so the comparison is fair. Models that can't clone a voice can't be tested fairly, so they're not listed.
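
As a rough illustration of what "blind" and "fair" can mean in practice, the sketch below builds one battle: two contestants are drawn at random, assigned to sides in random order, and the listener only ever sees the line and the unlabeled sides. The function names, data shapes, and contestant names are hypothetical; the page does not describe the Index's actual pairing logic.

```python
import random


def make_battle(contestants, line, rng):
    """Pick two distinct contestants and assign them to sides at random."""
    left, right = rng.sample(contestants, 2)  # sample order randomizes sides
    hidden_key = {"left": left, "right": right}            # kept server-side
    listener_view = {"line": line, "sides": ["left", "right"]}  # no names
    return listener_view, hidden_key


def record_vote(hidden_key, chosen_side):
    """Translate a side choice into a (winner, loser) pair."""
    other_side = "right" if chosen_side == "left" else "left"
    return hidden_key[chosen_side], hidden_key[other_side]


contestants = ["human", "model_a", "model_b"]  # hypothetical names
rng = random.Random(0)                         # seeded for repeatability
view, key = make_battle(
    contestants,
    "Yeah, I'm going to open a lost package investigation for you.",
    rng,
)
winner, loser = record_vote(key, "left")
print(view)
print("vote recorded:", winner, "beat", loser)
```

Randomizing which side each contestant lands on keeps a listener's left/right habit from favoring any one model, and the resulting (winner, loser) pairs are the raw input a rating fit would consume.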

Most Human Models

#1 · Humanness leader

ElevenLabs

Eleven v3

Humanness: 96

Latency: 758 ms
Languages: 70+
Votes: 582

ElevenLabs Eleven v3 currently leads the Humanness Index™, with a likely rank of #1–5. In blind listening tests, listeners judge it the most human-sounding model in the field, the kind of delivery that holds up with real callers in production, not just in a demo.

xAI

Grok TTS

Humanness: 94

Latency: 460 ms
Languages: 20
Votes: 585

MiniMax

Speech 2.8

Humanness: 91

Latency: 325 ms
Languages: 40
Votes: 557

Why this exists

Picking a TTS model for a voice agent comes down to one thing: does it sound human enough that people forget they're talking to software? You can't get that from demos or vendor claims. So we made it measurable and took the call out of our own hands: one voice cloned onto every model, played blind with no names attached, scored against a real human by the people who hear it.

Find the most human-sounding voice for your agent.

Compare the models in blind tests, read the methodology, or get in touch.

Build a TTS model? Add yours to the Index.
