Humanness Index™ · TTS model

ElevenLabs Flash v2

Flash v2 is ElevenLabs' ultra-low-latency English model, announced in December 2024 with a stated speech generation time of around 75 ms plus network overhead.

Rank: #9
Humanness: 76
Likely rank: #7–12
Blind votes: 1,049

Standings as of Jul 28, 2026, 01:54 UTC

A real arena clip: a cloned source voice reading a customer support prompt at phone quality.

Flash v2 key stats

Latency (measured): 226 ms¹
Languages: English²
Price / 1M chars: $50³
Released: December 18, 2024⁴

¹ Vapi streaming benchmark, 50 trials per model (checked 2026-06-10). Median of 50 sequential live streaming trials, June 2026; includes network RTT from the benchmark machine.
² elevenlabs.io/docs/overview/models (checked 2026-06-10)
³ elevenlabs.io/pricing/api (checked 2026-06-10). ElevenAPI pay-as-you-go: Flash/Turbo bill $0.05 per 1k characters ($50 per 1M); Eleven v3 and Multilingual v2 bill $0.10 per 1k ($100 per 1M), so the rate depends on the model.
⁴ elevenlabs.io/blog/meet-flash (checked 2026-06-10)
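
The per-character pricing in the footnote above converts to a project budget with simple arithmetic. The sketch below is illustrative only: the rates are copied from that footnote as checked on 2026-06-10, while the dictionary keys and the function name `estimate_cost_usd` are hypothetical labels, not ElevenLabs API model IDs.

```python
from decimal import Decimal

# USD per 1,000 characters, taken from the pricing footnote (checked 2026-06-10).
# The keys are hypothetical labels for this sketch, not official model IDs.
RATE_PER_1K_USD = {
    "flash": Decimal("0.05"),
    "turbo": Decimal("0.05"),
    "eleven_v3": Decimal("0.10"),
    "multilingual_v2": Decimal("0.10"),
}


def estimate_cost_usd(characters: int, model: str) -> Decimal:
    """Estimate pay-as-you-go cost in USD for a number of input characters."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    # Decimal avoids binary floating-point rounding in currency arithmetic.
    return RATE_PER_1K_USD[model] * Decimal(characters) / Decimal(1000)


if __name__ == "__main__":
    print(estimate_cost_usd(1_000_000, "flash"))      # one million characters
    print(estimate_cost_usd(1_000_000, "eleven_v3"))  # same volume, higher rate
```

At these rates, one million characters cost $50 on Flash or Turbo and $100 on Eleven v3 or Multilingual v2, matching the figures quoted in the footnote.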

Background

Flash v2 is ElevenLabs' ultra-low-latency English model, announced in December 2024 with a stated speech generation time of around 75 ms plus network overhead. It trades a little of the Turbo family's expressiveness for speed, and ElevenLabs recommends it for real-time conversational agents that only need English.

Sources: elevenlabs.io

At a glance

Flash v2 is the fastest English path on the ElevenLabs platform. In our 50-trial streaming benchmark it returned first audio in a median of 226 ms, including network time from the benchmark machine.

Sources: elevenlabs.io
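
A median time-to-first-audio figure like the one above can be reproduced in outline as follows. This is a minimal sketch, not the benchmark's actual harness: `start_stream` stands for any callable that opens a streaming TTS request and yields audio chunks, and `fake_stream` is a hypothetical stand-in that simulates a delay so the example runs without network access.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator


def time_to_first_audio_ms(start_stream: Callable[[], Iterable[bytes]]) -> float:
    """Milliseconds from starting the request until the first non-empty chunk."""
    started = time.perf_counter()
    for chunk in start_stream():
        if chunk:  # ignore empty chunks, if the stream emits any
            return (time.perf_counter() - started) * 1000.0
    raise RuntimeError("stream ended without producing audio")


def median_first_audio_ms(
    start_stream: Callable[[], Iterable[bytes]], trials: int = 50
) -> float:
    """Run the trials one after another and return the median latency in ms."""
    samples = [time_to_first_audio_ms(start_stream) for _ in range(trials)]
    return statistics.median(samples)


def fake_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a real streaming call (assumption, not an API)."""
    time.sleep(0.005)      # simulated generation plus network delay
    yield b"\x00" * 320    # first audio chunk
    yield b"\x00" * 320    # later chunks do not affect the measurement


if __name__ == "__main__":
    print(f"median first audio: {median_first_audio_ms(fake_stream, trials=5):.1f} ms")
```

Because the timer starts before the request is opened, the result includes network round-trip time from the measuring machine, which is why a measured figure can sit well above a vendor's stated generation time.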

Position in the rankings

Standings as of Jul 28, 2026, 01:54 UTC

Rank	Provider	Model	Humanness	Latency
#7	Speechify	Simba 3.2	83	—
#8	Inworld	TTS-1.5-max	78	337 ms
#9	ElevenLabs	Flash v2	76	226 ms
#10	ElevenLabs	Turbo v2	75	302 ms
#11	Inworld	TTS-2	71	288 ms

See the full Humanness Index™ rankings

Frequently asked questions

How is Flash v2 tested on the Humanness Index™?
Listeners hear Flash v2 against another model in a blind head-to-head round. Both voices read the same customer support prompt from the same cloned source voice, and listeners pick whichever sounds more human. Its Humanness score derives purely from those votes.

What latency does Flash v2 have?
ElevenLabs publishes a generation time of roughly 75 ms. In our own 50-trial streaming benchmark, which includes network time from the benchmark machine, Flash v2 returned first audio in a median of 226 ms.
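
The Index's actual scoring formula is not given on this page. Purely as an illustration of how blind head-to-head votes can be summarised, and why a finite vote count leaves a range of plausible values rather than a single point, the sketch below computes a win rate with a Wilson score interval. The vote tallies are hypothetical, and this is an assumed stand-in, not the Humanness Index™ method.

```python
import math
from typing import Tuple


def win_rate(wins: int, total: int) -> float:
    """Fraction of head-to-head rounds in which the model was picked."""
    if total <= 0 or not 0 <= wins <= total:
        raise ValueError("need 0 <= wins <= total and total > 0")
    return wins / total


def wilson_interval(wins: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate (z = 1.96 gives roughly 95%)."""
    p = win_rate(wins, total)
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical tallies with the same win rate but different vote counts.
    for wins, total in [(60, 100), (600, 1000)]:
        low, high = wilson_interval(wins, total)
        print(f"{wins}/{total}: win rate {win_rate(wins, total):.2f}, "
              f"interval {low:.3f} to {high:.3f}")
```

With the same win rate, the interval narrows as votes accumulate, which is one reason a leaderboard might report a likely rank range alongside a single rank.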

Keep exploring

ElevenLabs: all ElevenLabs models on the Index
Turbo v2: Rank #10 · Humanness 75
Turbo v2.5: Latency 265 ms
Flash v2.5: Rank #14 · Humanness 68
Eleven v3: Rank #1 · Humanness 96
Multilingual v2: Latency 1006 ms

