Humanness Index™ · TTS model

ElevenLabs Multilingual v2

Retired from the arena

A real arena clip: a cloned source voice reading a customer support prompt at phone quality.

Multilingual v2 key stats

Latency (measured): 1006 ms¹
Languages: 29²
Price / 1M chars: $100³
Released: August 22, 2023⁴

¹ Vapi streaming benchmark (50 trials per model; checked 2026-06-11). Measured on the chunked HTTP /stream endpoint (the realtime stream-input WebSocket rejects non-realtime models); median of 50 sequential trials, June 2026, including network RTT.
² elevenlabs.io/docs/overview/models (checked 2026-06-11).
³ elevenlabs.io/pricing/api (checked 2026-06-11). Bills at the Multilingual v2 / v3 ElevenAPI rate: $0.10 per 1k characters = $100 per 1M.
⁴ techcrunch.com/2023/08/22/elevenlabs-voice-generating-tools-launch-out-of-beta/ (checked 2026-06-11). Launched alongside the ElevenLabs beta exit; the model docs publish no exact date.
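The pricing note above converts a per-1k-character rate into a per-1M figure. A minimal sketch of that arithmetic follows, assuming flat per-character billing at the quoted $0.10 per 1k rate with no tiers, minimums, or rounding rules; the function name `cost_usd` is hypothetical and not part of any ElevenLabs API.

```python
from decimal import Decimal

# Rate quoted in the pricing footnote: $0.10 per 1,000 characters.
RATE_PER_1K_CHARS = Decimal("0.10")


def cost_usd(characters: int) -> Decimal:
    """Estimated cost in USD, assuming flat per-character billing (an assumption)."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    return RATE_PER_1K_CHARS * characters / 1000


if __name__ == "__main__":
    print(cost_usd(1_000_000))  # the per-1M figure from the key stats
    print(cost_usd(2_500))      # a short support script
```

Decimal is used instead of float so that the per-1k rate scales to the per-1M figure without binary rounding noise.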

Background

Multilingual v2 is the long-standing ElevenLabs quality flagship, launched in August 2023 as the model that took the platform out of beta and extended its voices from English to 29 languages. ElevenLabs still describes it as its most lifelike model with rich emotional expression, and it remains the default recommendation for narration, audiobooks, and other pre-rendered work where fidelity matters more than speed.

Sources: elevenlabs.io, techcrunch.com

At a glance

Multilingual v2 is the quality counterpart to the realtime Turbo and Flash families: on the independent Coval benchmark it posts the best word error rate of any ElevenLabs model (3.9 percent) but batch-class latency. Our own 50-trial benchmark measured a median of 1006 ms to first audio on its chunked HTTP streaming endpoint, the highest on the Index, which is why it competes here on humanness rather than speed.

Sources: benchmarks.coval.ai, elevenlabs.io
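The latency figure above is a median time to first audio over 50 sequential trials. The sketch below shows one way such a measurement could be structured; it is not the benchmark's actual harness. The names `time_to_first_audio_ms`, `median_ttfa_ms`, and `fake_stream` are hypothetical, and the real HTTP request is left to the caller as a callable that returns an iterator of byte chunks, since the request details are not given here.

```python
import statistics
import time
from typing import Callable, Iterable


def time_to_first_audio_ms(open_stream: Callable[[], Iterable[bytes]]) -> float:
    """Milliseconds from starting the request to the first non-empty chunk.

    The clock starts before open_stream() is called, so connection setup
    and network round trip are included in the measurement.
    """
    start = time.perf_counter()
    for chunk in open_stream():
        if chunk:  # ignore empty chunks that carry no audio
            return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream ended before any audio arrived")


def median_ttfa_ms(open_stream: Callable[[], Iterable[bytes]], trials: int = 50) -> float:
    """Run the trials one after another and return the median latency."""
    samples = [time_to_first_audio_ms(open_stream) for _ in range(trials)]
    return statistics.median(samples)


def fake_stream() -> Iterable[bytes]:
    """Stand-in for a chunked HTTP response: a short delay, then audio bytes."""
    time.sleep(0.01)
    yield b""
    yield b"\x00\x01"
    yield b"\x02\x03"


if __name__ == "__main__":
    print(f"median TTFA: {median_ttfa_ms(fake_stream, trials=5):.1f} ms")
```

Reporting the median rather than the mean keeps one slow trial, for example a cold connection, from dominating the result.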

Frequently asked questions

How is Multilingual v2 tested on the Humanness Index™?

Listeners hear Multilingual v2 against another model in a blind head-to-head round, with both voices reading the same customer support prompt from the same cloned source voice, and they pick whichever sounds more human. Its Humanness score derives purely from those votes.

Why use Multilingual v2 over Flash or Turbo?

Quality over latency. Multilingual v2 is tuned for lifelike, emotionally rich output and posts the best word error rate of any ElevenLabs model on the independent Coval benchmark, but we measured a 1006 ms median time to first audio. Real-time agents should look at the Flash and Turbo families; narration and pre-rendered audio are where Multilingual v2 fits.
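To make the head-to-head voting concrete, here is a minimal sketch of tallying blind pairwise votes into a per-model win share. This is only an illustration: the Index's actual scoring formula is not stated on this page, so the win-share aggregation, the function name `win_shares`, and the model labels in the usage example are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# One round: (model_a, model_b, winner), where winner is the voice the listener picked.
Round = Tuple[str, str, str]


def win_shares(rounds: Iterable[Round]) -> Dict[str, float]:
    """Share of rounds each model won, out of the rounds it appeared in.

    A plain win rate, used here for illustration only; it is not
    necessarily how the Humanness score is computed.
    """
    wins: Counter = Counter()
    played: Counter = Counter()
    for model_a, model_b, winner in rounds:
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} did not take part in this round")
        played[model_a] += 1
        played[model_b] += 1
        wins[winner] += 1
    return {model: wins[model] / played[model] for model in played}


if __name__ == "__main__":
    # Hypothetical votes between placeholder models A, B, and C.
    votes = [("A", "B", "A"), ("A", "C", "A"), ("B", "C", "C"), ("A", "B", "B")]
    for model, share in sorted(win_shares(votes).items()):
        print(f"{model}: {share:.2f}")
```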

Keep exploring

ElevenLabs: All ElevenLabs models on the Index
Turbo v2: Rank #10 · Humanness 75
Turbo v2.5: Latency 265 ms
Flash v2: Rank #9 · Humanness 76
Flash v2.5: Rank #14 · Humanness 68
Eleven v3: Rank #1 · Humanness 96
