Humanness Index™ · TTS model

MiniMax Speech 2 Turbo

by MiniMax

Rank: #13
Humanness: 70
Likely rank: #9–16
Blind votes: 1,042

Standings as of Jul 28, 2026, 02:12 UTC

A real arena clip: a cloned source voice reading a customer support prompt at phone quality.

Speech 2 Turbo key stats

Latency (measured): 315 ms¹
Languages: 32²
Price / 1M chars: $60³
Released: April 2025⁴

¹ Vapi streaming benchmark (50 trials per model; checked 2026-06-11). Median of 50 sequential live streaming trials, June 2026; includes network RTT from the benchmark machine.
² arxiv.org/abs/2505.07916 (checked 2026-06-11). The MiniMax-Speech technical report (the Speech-02 architecture) lists 32 languages.
³ platform.minimax.io/docs/guides/pricing-paygo (checked 2026-06-11). T2A pay-as-you-go: speech-02-turbo at $60 per 1M characters.
⁴ minimax.io/news/speech-02-series (checked 2026-06-11). Series launch post dated April 2, 2025; rollout coverage ran through May 2025.
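
As a rough illustration of what the $60 per 1M characters rate means for a single utterance, the sketch below converts a character count into an estimated cost. It assumes billing is strictly proportional to the character count with no minimum charge, rounding, or volume discount, which the pricing note above does not confirm; the function name is hypothetical.

```python
# Assumption: cost scales linearly with character count at the listed
# pay-as-you-go rate; real invoices may round or apply minimums.
PRICE_PER_MILLION_CHARS_USD = 60.0  # speech-02-turbo rate from the pricing note


def estimate_cost_usd(text: str) -> float:
    """Estimate synthesis cost in USD for one piece of text (hypothetical helper)."""
    return len(text) * PRICE_PER_MILLION_CHARS_USD / 1_000_000


if __name__ == "__main__":
    prompt = "Thanks for calling. How can I help you today?"
    print(f"{len(prompt)} chars, about ${estimate_cost_usd(prompt):.6f}")
```

At this rate a 1,000-character reply would come to about six cents under the same linear assumption.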

Background

Speech 2 Turbo (speech-02-turbo in the API) is the realtime tier of the MiniMax Speech 2 generation. It launched in April 2025 and is optimized for low-latency interactive applications such as voice agents and live translation. It shares the generation's learnable speaker encoder, which clones a voice from roughly ten seconds of audio, without a transcript, across 32 languages. At release it ranked third on the Artificial Analysis Speech Arena, while its HD sibling held first.

Sources: minimax.io, arxiv.org

At a glance

Speech 2 Turbo is the realtime member of the Speech 2 pair and the direct ancestor of the Speech 2.8 entry already on the Index. In our 50-trial streaming benchmark it returned first audio in a median of 315 ms, including network time, in line with the 325 ms we measured for its successor.

Sources: platform.minimax.io
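
The latency figure is a median of time to first audio over sequential streaming trials. A minimal sketch of that kind of measurement follows; it is not the benchmark harness itself. `open_stream` is a hypothetical callable that starts one synthesis request and returns an iterable of audio chunks, and `clock` is injectable so the logic can be checked without a network.

```python
import statistics
import time
from typing import Callable, Iterable


def time_to_first_audio_ms(
    open_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Time from issuing the request until the first audio chunk arrives."""
    start = clock()
    chunks = iter(open_stream())  # hypothetical: starts the streaming request
    next(chunks)                  # blocks until the first chunk is available
    return (clock() - start) * 1000.0


def median_first_audio_ms(
    open_stream: Callable[[], Iterable[bytes]],
    trials: int = 50,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run the trials one after another and report the median in milliseconds."""
    samples = [time_to_first_audio_ms(open_stream, clock) for _ in range(trials)]
    return statistics.median(samples)
```

Because the timer starts before the request is opened, network round-trip time is included in each sample, matching how the 315 ms figure is described. The median is less sensitive than the mean to an occasional slow trial.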

Position in the rankings

Standings as of Jul 28, 2026, 02:12 UTC

Rank	Provider	Model	Humanness	Latency
#11	Inworld	TTS-2	71	288 ms
#12	Cartesia	Sonic 3.5	70	128 ms
#13	MiniMax	Speech 2 Turbo	70	315 ms
#14	ElevenLabs	Flash v2.5	68	197 ms
#15	Cartesia	Sonic 2	67	159 ms

See the full Humanness Index™ rankings

Frequently asked questions

How is Speech 2 Turbo tested on the Humanness Index™?

Listeners hear Speech 2 Turbo against another model in a blind head-to-head round. Both voices read the same customer support prompt from the same cloned source voice, and listeners pick whichever sounds more human. Its Humanness score derives purely from those votes.

How does Speech 2 Turbo relate to the Speech 2.8 entry?

The Speech 2.8 entry on the Index runs the turbo tier of the newer generation; Speech 2 Turbo is the April 2025 realtime tier that preceded it. Both share the MiniMax cloning stack, so the blind tests compare generations of the same family directly.
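
To make the idea of a score built only from blind votes concrete, here is a small sketch that tallies head-to-head results into a win share for one model. This page does not state the Index's scoring formula, so treat this as an illustrative assumption rather than how the Humanness number is computed; the function name and the `(winner, loser)` vote shape are hypothetical.

```python
from typing import Iterable, Tuple


def win_share(votes: Iterable[Tuple[str, str]], model: str) -> float:
    """Fraction of a model's blind rounds that it won.

    Each vote is a (winner, loser) pair. This is a simple tally for
    illustration only, not the Index's actual scoring method.
    """
    wins = 0
    losses = 0
    for winner, loser in votes:
        if winner == model:
            wins += 1
        elif loser == model:
            losses += 1
    total = wins + losses
    if total == 0:
        raise ValueError(f"no votes involve {model!r}")
    return wins / total
```

A plain win share ignores how strong each opponent was; ranking systems built on pairwise votes often adjust for that, which this sketch deliberately leaves out.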

Keep exploring

MiniMax: All MiniMax models on the Index
Speech 2.8: Rank #3 · Humanness 91
Speech 2 HD: Rank #5 · Humanness 89



