The Humanness Index™

The open benchmark for how human voice AI sounds. Built and operated by Vapi.

Code is Apache-2.0. Standings data is CC BY 4.0. Audio clips and source voices are licensed recordings, all rights reserved. Provider logomarks belong to their respective owners and are used nominatively. “The Humanness Index™” name and logo are Vapi trademarks; see TRADEMARKS.md.

Humanness Index™ · TTS model

Multilingual v2

by ElevenLabs

Multilingual v2 is the long-standing ElevenLabs quality flagship, launched in August 2023 as the model that took the platform out of beta and extended its voices from English to 29 languages.

Rank: #16
Humanness: 59
Likely rank: #1–20
Blind votes: 3

Standings as of Jun 13, 2026, 01:14 UTC

A real arena clip: a cloned source voice reading a customer support prompt at phone quality.

Multilingual v2 key stats

Latency (measured): 1006 ms [1]
Languages: 29 [2]
Price / 1M chars: $100 [3]
Released: August 22, 2023 [4]

  1. Vapi streaming benchmark, 50 trials per model (checked 2026-06-11). Measured on the chunked HTTP /stream endpoint (the realtime stream-input WebSocket rejects non-realtime models); median of 50 sequential trials, June 2026, including network RTT.
  2. elevenlabs.io/docs/overview/models (checked 2026-06-11).
  3. elevenlabs.io/pricing/api (checked 2026-06-11). Bills at the Multilingual v2 / v3 ElevenAPI rate: $0.10 per 1k characters = $100 per 1M.
  4. techcrunch.com/2023/08/22/elevenlabs-voice-generating-tools-launch-out-of-beta/ (checked 2026-06-11). Launched alongside the ElevenLabs beta exit; the model docs publish no exact date.
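
The price conversion in note 3 can be checked with a few lines of arithmetic. The sketch below assumes a flat per-character rate with no plan tiers or bundled credits; the function name `cost_usd` is hypothetical and is not part of any ElevenLabs or Vapi API.

```python
from decimal import Decimal

# Rate quoted in note 3: $0.10 per 1,000 characters.
# Assumption: a flat rate, ignoring subscription tiers or included credits.
RATE_PER_1K_CHARS = Decimal("0.10")


def cost_usd(num_chars: int, rate_per_1k: Decimal = RATE_PER_1K_CHARS) -> Decimal:
    """Estimate the cost in USD of synthesizing num_chars characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return rate_per_1k * Decimal(num_chars) / Decimal(1000)


if __name__ == "__main__":
    # One million characters at $0.10 per 1k characters.
    print(cost_usd(1_000_000))
```

Decimal arithmetic is used here so that small per-character rates do not pick up binary floating-point rounding error when scaled to large character counts.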

Background

Multilingual v2 is the long-standing ElevenLabs quality flagship, launched in August 2023 as the model that took the platform out of beta and extended its voices from English to 29 languages. ElevenLabs still describes it as its most lifelike model with rich emotional expression, and it remains the default recommendation for narration, audiobooks, and other pre-rendered work where fidelity matters more than speed.

Sources: elevenlabs.io, techcrunch.com

At a glance

The quality counterpart to the realtime Turbo and Flash families: on the independent Coval benchmark it posts the best word error rate of any ElevenLabs model (3.9 percent), but with batch-class latency. Our own 50-trial benchmark measured a median of 1006 ms to first audio on its chunked HTTP streaming endpoint, the highest on the Index, which is why it competes here on humanness rather than speed.
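
A median time-to-first-audio figure of this kind can be sketched as follows. This is a minimal illustration, not the Vapi benchmark harness: `open_stream` is a hypothetical stand-in for whatever client call opens the chunked HTTP stream, and `fake_stream` simulates a response so the example runs without network access.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator


def time_to_first_chunk_ms(open_stream: Callable[[], Iterable[bytes]]) -> float:
    """Milliseconds from issuing the request until the first non-empty chunk."""
    start = time.perf_counter()
    for chunk in open_stream():
        if chunk:  # skip empty keep-alive chunks
            return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream ended before any audio arrived")


def median_latency_ms(open_stream: Callable[[], Iterable[bytes]], trials: int = 50) -> float:
    """Run the trials sequentially and report the median, as the page describes."""
    samples = [time_to_first_chunk_ms(open_stream) for _ in range(trials)]
    return statistics.median(samples)


def fake_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming response: a short delay, then audio."""
    time.sleep(0.01)
    yield b"\x00" * 320
    yield b"\x00" * 320


if __name__ == "__main__":
    print(f"median time to first audio: {median_latency_ms(fake_stream, trials=5):.1f} ms")
```

Because the timer starts before the stream is opened, the measurement includes connection setup and network round-trip time, which matches the page's note that its figure includes network RTT.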

Sources: benchmarks.coval.ai, elevenlabs.io

Position in the rankings

Standings as of Jun 13, 2026, 01:14 UTC

| Rank | Provider | Model | Humanness | Latency |
|---|---|---|---|---|
| #14 | Smallest.ai | Lightning v3.1 | 62 | 420 ms |
| #15 | MiniMax | Speech 2 HD | 62 | 357 ms |
| #16 | ElevenLabs | Multilingual v2 | 59 | 1006 ms |
| #17 | ElevenLabs | Flash v2 | 59 | 226 ms |
| #18 | Cartesia | Sonic 2 | 44 | 159 ms |

See the full Humanness Index™ rankings

Frequently asked questions

How is Multilingual v2 tested on the Humanness Index™?
Listeners hear Multilingual v2 against another model in a blind head-to-head round, both voices reading the same customer support prompt from the same cloned source voice, and they pick whichever sounds more human. Its Humanness score derives purely from those votes.
Why use Multilingual v2 over Flash or Turbo?
Quality over latency. Multilingual v2 is tuned for lifelike, emotionally rich output and posts the best word error rate of any ElevenLabs model on the independent Coval benchmark, but we measured a 1006 ms median time to first audio. Real-time agents should look at the Flash and Turbo families; narration and pre-rendered audio are where Multilingual v2 fits.
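
To illustrate why a score built from only a handful of blind votes carries a wide uncertainty band, the sketch below tallies head-to-head wins and computes a Wilson score interval for the win rate. This is an assumption for illustration only: this page does not state the Index's scoring formula, and the vote counts used here are hypothetical rather than real standings data.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate (z = 1.96 is roughly 95% confidence)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= wins <= total:
        raise ValueError("wins must be between 0 and total")
    p = wins / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical tallies: the same two-thirds win rate at two sample sizes.
    for wins, total in [(2, 3), (200, 300)]:
        low, high = wilson_interval(wins, total)
        print(f"{wins}/{total} wins: win rate between {low:.2f} and {high:.2f}")
```

With very few votes the interval spans most of the possible range, which is consistent with a broad "likely rank" band; it narrows as more blind votes accumulate.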

Keep exploring

All ElevenLabs models on the Index
ElevenLabs Turbo v2: Rank #12 · Humanness 64
ElevenLabs Turbo v2.5: Rank #6 · Humanness 78
ElevenLabs Flash v2: Rank #17 · Humanness 59
ElevenLabs Flash v2.5: Rank #7 · Humanness 73
ElevenLabs Eleven v3: Rank #5 · Humanness 79

How human does your model really sound?

The benchmark is open source. Suggest a model, read the methodology, or ask us to put your voice in the arena.
