Background
Multilingual v2 is the long-standing ElevenLabs quality flagship. It launched in August 2023 as the model that took the platform out of beta and extended its voices from English to 29 languages. ElevenLabs still describes it as its most lifelike model with rich emotional expression, and it remains the default recommendation for narration, audiobooks, and other pre-rendered work where fidelity matters more than speed.
Sources: elevenlabs.io, techcrunch.com
At a glance
Multilingual v2 is the quality counterpart to the realtime Turbo and Flash families. On the independent Coval benchmark it posts the best word error rate of any ElevenLabs model (3.9 percent), but its latency is batch-class. Our own 50-trial benchmark measured a median of 1006 ms to first audio on its chunked HTTP streaming endpoint, the highest on the Index, which is why it competes here on humanness rather than speed.
Sources: benchmarks.coval.ai, elevenlabs.io
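As a rough illustration of what "median time to first audio over N trials" means, here is a minimal sketch, assuming only that a streaming request can be wrapped as a callable returning an iterable of byte chunks. The helper names (`time_to_first_audio_ms`, `median_ttfa_ms`, `fake_stream`) are hypothetical; this is not the Index's actual harness and it does not call any ElevenLabs API. The stand-in stream only simulates latency so the example runs on its own.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator


def time_to_first_audio_ms(open_stream: Callable[[], Iterable[bytes]]) -> float:
    """Milliseconds from issuing the request to receiving the first non-empty chunk."""
    start = time.perf_counter()
    for chunk in open_stream():
        if chunk:  # ignore empty keep-alive chunks; only real audio counts
            return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream ended without any audio")


def median_ttfa_ms(open_stream: Callable[[], Iterable[bytes]], trials: int = 50) -> float:
    """Repeat the measurement `trials` times and report the median, which resists outliers."""
    samples = [time_to_first_audio_ms(open_stream) for _ in range(trials)]
    return statistics.median(samples)


# Hypothetical stand-in for a chunked HTTP streaming response; swap in a real client.
def fake_stream() -> Iterator[bytes]:
    time.sleep(0.01)  # simulated server delay before the first byte
    yield b""  # an empty chunk that should not count as audio
    yield b"\x00" * 1024  # first real "audio" chunk


print(f"median TTFA: {median_ttfa_ms(fake_stream, trials=5):.1f} ms")
```

The median is used rather than the mean so that a single slow trial (a cold connection, say) does not dominate the reported figure.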
Position in the rankings
Standings as of Jun 13, 2026, 01:14 UTC
Frequently asked questions
- How is Multilingual v2 tested on the Humanness Index™?
- Listeners hear Multilingual v2 against another model in a blind head-to-head round, with both voices reading the same customer support prompt from the same cloned source voice, and they pick whichever sounds more human. Its Humanness score derives purely from those votes.
- Why use Multilingual v2 over Flash or Turbo?
- Quality over latency. Multilingual v2 is tuned for lifelike, emotionally rich output and posts the best word error rate of any ElevenLabs model on the independent Coval benchmark, but we measured a 1006 ms median time to first audio. Real-time agents should look at the Flash and Turbo families; narration and pre-rendered audio are where Multilingual v2 fits.
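The page states only that the Humanness score comes from blind pairwise votes, not how those votes are aggregated. The sketch below assumes the simplest possible aggregation, a per-model win rate; the real Index may use something else entirely (for example an Elo or Bradley-Terry rating). The `Vote` type, `win_rates` function, and model labels are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Vote:
    """One blind head-to-head round: two models, one listener pick."""
    model_a: str
    model_b: str
    winner: str  # must equal model_a or model_b


def win_rates(votes: Iterable[Vote]) -> Dict[str, float]:
    """Fraction of rounds each model won, out of the rounds it appeared in."""
    wins: Dict[str, int] = defaultdict(int)
    appearances: Dict[str, int] = defaultdict(int)
    for v in votes:
        if v.winner not in (v.model_a, v.model_b):
            raise ValueError(f"winner {v.winner!r} was not in this round")
        appearances[v.model_a] += 1
        appearances[v.model_b] += 1
        wins[v.winner] += 1
    return {model: wins[model] / n for model, n in appearances.items()}


# Illustrative labels only, not official model identifiers.
votes = [
    Vote("multilingual_v2", "flash_v2_5", "multilingual_v2"),
    Vote("multilingual_v2", "turbo_v2_5", "multilingual_v2"),
    Vote("turbo_v2_5", "flash_v2_5", "turbo_v2_5"),
]
print(win_rates(votes))
# → {'multilingual_v2': 1.0, 'flash_v2_5': 0.0, 'turbo_v2_5': 0.5}
```

A raw win rate ignores opponent strength, which is why rating systems such as Elo are often preferred once many models share an arena.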
How human does your model really sound?
The benchmark is open source. Suggest a model, read the methodology, or ask us to put your voice in the arena.