Humanness Index™ · TTS model

MiniMax Speech 2.8

by MiniMax

Rank: #3
Humanness: 91
Likely rank: #1–6
Blind votes: 1,042

Standings as of Jul 28, 2026, 01:54 UTC

A real arena clip: a cloned source voice reading a customer support prompt at phone quality.

Speech 2.8 key stats

Latency (measured): 325 ms¹
Languages: 40+²
Price / 1M chars: $60³
Released: January 2026⁴

¹ Vapi streaming benchmark (50 trials per model; checked 2026-06-10). Measured on the MiniMax realtime turbo tier (June 2026 run, configured with the speech-2.5-turbo-preview request ID); median of 50 sequential live streaming trials, including network RTT.
² platform.minimax.io/docs/api-reference/speech-t2a-http (checked 2026-06-10). Vendor lists 40+ languages for the current speech generation.
³ platform.minimax.io/docs/guides/pricing-paygo (checked 2026-07-23). Turbo tier is $60 per 1M characters pay-as-you-go (speech-2.8-turbo; HD bills $100). The arena clips are the 2.8 generation, turbo tier, which matches the measured realtime latency.
⁴ platform.minimax.io/docs/guides/models-intro (checked 2026-07-23, confidence: medium). Speech 2.8 family (speech-2.8-hd / speech-2.8-turbo). The arena clips are this generation, turbo tier, per MiniMax; the Speech-02 series preceded it in 2025-04 and Speech 2.5 in 2025-08.
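
To make the per-character pricing concrete, here is a minimal sketch that converts a character count into an estimated pay-as-you-go cost. The two rates come from the pricing note above; the function name and the example workload are hypothetical, and real invoices may differ (rounding, minimums, or volume plans are not modeled).

```python
# Rates quoted in the pricing note above (USD per 1M characters, pay-as-you-go).
TURBO_USD_PER_MILLION_CHARS = 60.0
HD_USD_PER_MILLION_CHARS = 100.0


def estimate_cost_usd(num_chars: int, usd_per_million: float) -> float:
    """Estimate synthesis cost for num_chars characters at a flat rate.

    Hypothetical helper: assumes simple linear billing with no rounding.
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars * usd_per_million / 1_000_000


if __name__ == "__main__":
    # Illustrative workload: 250,000 characters of agent speech.
    chars = 250_000
    print(f"turbo: ${estimate_cost_usd(chars, TURBO_USD_PER_MILLION_CHARS):.2f}")
    print(f"hd:    ${estimate_cost_usd(chars, HD_USD_PER_MILLION_CHARS):.2f}")
```

At these rates the same workload costs two thirds more on the HD tier than on turbo.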

Background

MiniMax's speech models advanced quickly through 2025 and into 2026: the Speech-02 series arrived in April 2025, Speech 2.5 followed in August 2025, and the Speech 2.8 generation has been current since early 2026. The current generation supports more than 40 languages and clones a voice from roughly six to ten seconds of reference audio, using a learnable speaker encoder that needs no transcript. MiniMax is widely regarded as the strongest text-to-speech provider for Chinese, and its recent generations brought English accuracy and rhythm up alongside it.

Sources: platform.minimax.io

At a glance

The arena clips on this Index were generated with the turbo tier of the Speech 2.8 generation, the same realtime tier we measured for latency. In our 50-trial streaming benchmark, it returned first audio in a median of 325 ms.
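
As a rough illustration of how a median time-to-first-audio figure like this can be produced, the sketch below times the first chunk of a stream over sequential trials and takes the median. It is not the benchmark's actual harness: `stream_factory`, `fake_stream`, and both function names are hypothetical stand-ins, and a real run would open a live streaming request to the provider instead.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator


def time_to_first_chunk_ms(stream_factory: Callable[[], Iterable[bytes]]) -> float:
    """Start one stream and return milliseconds until its first chunk arrives."""
    start = time.perf_counter()
    for _chunk in stream_factory():
        # Stop at the first chunk; only time-to-first-audio is measured.
        return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream ended without producing any audio")


def median_ttfb_ms(stream_factory: Callable[[], Iterable[bytes]], trials: int = 50) -> float:
    """Run the trials one after another and return the median latency in ms."""
    samples = [time_to_first_chunk_ms(stream_factory) for _ in range(trials)]
    return statistics.median(samples)


def fake_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a live TTS stream (assumption, not a real API)."""
    time.sleep(0.01)  # simulated network and synthesis delay
    yield b"\x00" * 320
    yield b"\x00" * 320


if __name__ == "__main__":
    print(f"median TTFB: {median_ttfb_ms(fake_stream, trials=5):.1f} ms")
```

The median is less sensitive than the mean to a few slow trials, which matters when each sample includes network round-trip time.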

Sources: platform.minimax.io

Position in the rankings

Standings as of Jul 28, 2026, 01:54 UTC

Rank	Provider	Model	Humanness	Latency
#1	ElevenLabs	Eleven v3	96	758 ms
#2	xAI	Grok TTS	94	460 ms
#3	MiniMax	Speech 2.8	91	325 ms
#4	Canopy Labs	Orpheus	89	—
#5	MiniMax	Speech 2 HD	89	357 ms

See the full Humanness Index™ rankings

Frequently asked questions

How is Speech 2.8 tested on the Humanness Index™?
Listeners hear Speech 2.8 against another model in a blind head-to-head round. Both voices read the same customer support prompt from the same cloned source voice, and listeners pick whichever sounds more human. Its Humanness score derives purely from those votes.

Which MiniMax generation do the arena clips use?
The clips were generated with the turbo tier of the Speech 2.8 generation, per MiniMax. This is the same realtime tier we measured for latency (325 ms median time to first byte, TTFB). This entry was labeled Speech 2.5 until July 2026, when MiniMax confirmed the benchmarked generation was 2.8.

Was the HD or Turbo variant benchmarked?
Turbo. Both the clips and the 325 ms latency figure come from the realtime turbo tier. The blind evaluation itself runs on 20 English customer support prompts, while MiniMax advertises support for more than 40 languages for the generation.
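
The blind head-to-head voting described in this FAQ can be tallied in many ways. The Index's actual scoring formula is not given on this page, so the sketch below is only an assumption: it computes a plain per-model win rate from (winner, loser) vote pairs. The function name and the model labels are hypothetical, and the result is not the Humanness score.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def win_rates(votes: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return each model's share of rounds won.

    votes: (winner, loser) pairs, one per blind head-to-head round.
    Assumption: a simple win rate, not the Index's real scoring method.
    """
    wins: Counter = Counter()
    rounds: Counter = Counter()
    for winner, loser in votes:
        wins[winner] += 1
        rounds[winner] += 1
        rounds[loser] += 1
    return {model: wins[model] / rounds[model] for model in rounds}


if __name__ == "__main__":
    # Illustrative votes with placeholder model labels.
    sample_votes = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "B")]
    for model, rate in sorted(win_rates(sample_votes).items()):
        print(f"{model}: {rate:.2f}")
```

A raw win rate ignores opponent strength, so a production ranking would likely use a rating model that accounts for who each model was paired against.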

Keep exploring

MiniMax: All MiniMax models on the Index
Speech 2 HD: Rank #5 · Humanness 89
Speech 2 Turbo: Rank #13 · Humanness 70

