The Humanness Index™

The open benchmark for how human voice AI sounds. Built and operated by Vapi.

Code is Apache-2.0. Standings data is CC BY 4.0. Audio clips and source voices are licensed recordings, all rights reserved. Provider logomarks belong to their respective owners and are used nominatively. “The Humanness Index™” name and logo are Vapi trademarks; see TRADEMARKS.md.

Humanness Index™ · TTS model

MiniMax

Speech 2 Turbo

by MiniMax

Speech 2 Turbo (speech-02-turbo in the API) is the realtime tier of the MiniMax Speech 2 generation, launched in April 2025 and optimized for low-latency interactive applications such as voice agents and live translation.

Rank: #10
Humanness: 67
Likely rank: #1–20
Blind votes: 2

Standings as of Jun 13, 2026, 01:14 UTC

A real arena clip: a cloned source voice reading a customer support prompt at phone quality.

Speech 2 Turbo key stats

Latency (measured): 315 ms [1]
Languages: 32 [2]
Price / 1M chars: $60 [3]
Released: April 2025 [4]
  1. Vapi streaming benchmark (50 trials per model) (checked 2026-06-11) Median of 50 sequential live streaming trials, June 2026; includes network RTT from the benchmark machine.
  2. arxiv.org/abs/2505.07916 (checked 2026-06-11) MiniMax-Speech technical report (the Speech-02 architecture) lists 32 languages.
  3. platform.minimax.io/docs/guides/pricing-paygo (checked 2026-06-11) T2A pay-as-you-go: speech-02-turbo at $60 per 1M characters.
  4. minimax.io/news/speech-02-series (checked 2026-06-11) Series launch post dated April 2, 2025; rollout coverage ran through May 2025.
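As a rough illustration of what the $60 per 1M characters figure in footnote 3 means for a given workload, the arithmetic can be sketched as follows. This is a minimal sketch, not MiniMax's billing logic: it assumes strictly linear pay-as-you-go pricing with no minimums, tiers, or discounts, and the function name `estimate_cost_usd` is hypothetical.

```python
# Price taken from footnote 3 (speech-02-turbo, pay-as-you-go).
PRICE_USD_PER_MILLION_CHARS = 60.0


def estimate_cost_usd(
    num_chars: int,
    price_per_million: float = PRICE_USD_PER_MILLION_CHARS,
) -> float:
    """Estimate synthesis cost in USD for a number of input characters.

    Assumption: cost scales linearly with character count. How the
    provider counts characters (whitespace, markup) is not modeled here.
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars * price_per_million / 1_000_000


if __name__ == "__main__":
    # A 500-character support reply under the linear assumption.
    print(f"${estimate_cost_usd(500):.4f}")
```

Under that assumption, a 500-character reply costs 500 × 60 / 1,000,000 = $0.03.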

Background

Speech 2 Turbo (speech-02-turbo in the API) is the realtime tier of the MiniMax Speech 2 generation, launched in April 2025 and optimized for low-latency interactive applications such as voice agents and live translation. It shares the generation's learnable speaker encoder, which clones a voice from roughly ten seconds of audio, without a transcript, across 32 languages. At release it ranked third on the Artificial Analysis Speech Arena, while its HD sibling held first.

Sources: minimax.io, arxiv.org

At a glance

Speech 2 Turbo is the realtime member of the Speech 2 pair and the direct predecessor of the Speech 2.5 entry already on the Index. In our 50-trial streaming benchmark it returned first audio in a median of 315 ms including network time, in line with the 325 ms we measured for its 2.5 successor.
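The latency figure above is a median time to first audio over sequential streaming trials. One way such a measurement could be structured is sketched below. This is an illustration under stated assumptions, not the Vapi benchmark code: `open_stream` is a hypothetical callable that starts a synthesis request and yields audio chunks, and `fake_stream` is a local stand-in so the example runs without any network or provider SDK.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator


def time_to_first_chunk_ms(open_stream: Callable[[], Iterable[bytes]]) -> float:
    """Return milliseconds from request start to the first non-empty chunk."""
    start = time.perf_counter()
    for chunk in open_stream():
        if chunk:  # ignore empty keep-alive chunks
            return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream ended without producing audio")


def median_first_audio_ms(
    open_stream: Callable[[], Iterable[bytes]], trials: int = 50
) -> float:
    """Run trials one after another and return the median latency."""
    samples = [time_to_first_chunk_ms(open_stream) for _ in range(trials)]
    return statistics.median(samples)


def fake_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a real streaming TTS call."""
    time.sleep(0.005)  # simulated network and synthesis delay
    yield b"\x00" * 320
    yield b"\x00" * 320


if __name__ == "__main__":
    print(f"{median_first_audio_ms(fake_stream, trials=10):.1f} ms")
```

Because the timer starts before the stream is opened, the result includes network round-trip time from the measuring machine, matching how the figure is described in footnote 1. The median is used rather than the mean so that a few slow trials do not dominate.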

Sources: platform.minimax.io

Position in the rankings

Standings as of Jun 13, 2026, 01:14 UTC

| Rank | Provider | Model | Humanness | Latency |
| --- | --- | --- | --- | --- |
| #8 | MiniMax | Speech 2.5 | 72 | 325 ms |
| #9 | Inworld | TTS-2 | 69 | 288 ms |
| #10 | MiniMax | Speech 2 Turbo | 67 | 315 ms |
| #11 | Inworld | TTS-1.5-max | 65 | 337 ms |
| #12 | ElevenLabs | Turbo v2 | 64 | 302 ms |

Frequently asked questions

How is Speech 2 Turbo tested on the Humanness Index™?
Listeners hear Speech 2 Turbo against another model in a blind head-to-head round, with both voices reading the same customer support prompt from the same cloned source voice, and they pick whichever sounds more human. Its Humanness score derives purely from those votes.
How does Speech 2 Turbo relate to the Speech 2.5 entry?
The Speech 2.5 entry on the Index runs the turbo tier of the newer generation; Speech 2 Turbo is the April 2025 realtime tier that preceded it. Both share the MiniMax cloning stack, so the blind tests compare generations of the same family directly.
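Because the score comes only from blind votes, the number of votes determines how much the ranking can be trusted. The sketch below uses a Wilson score interval on a simple win rate to show how wide the uncertainty is at 2 votes compared with 200. This is an assumption chosen for illustration: this page does not describe the Index's actual scoring model or how "Likely rank" is computed, and `wilson_interval` is a hypothetical helper.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, votes: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate (z = 1.96 gives roughly 95%)."""
    if votes <= 0 or not 0 <= wins <= votes:
        raise ValueError("need votes > 0 and 0 <= wins <= votes")
    p = wins / votes
    z2 = z * z
    denom = 1 + z2 / votes
    center = (p + z2 / (2 * votes)) / denom
    half = z * math.sqrt(p * (1 - p) / votes + z2 / (4 * votes * votes)) / denom
    return center - half, center + half


if __name__ == "__main__":
    for wins, votes in [(1, 2), (100, 200)]:
        low, high = wilson_interval(wins, votes)
        print(f"{wins}/{votes} wins: {low:.2f} to {high:.2f}")
```

With one win out of two votes the interval spans most of the range from 0 to 1, while 100 wins out of 200 narrows it to a band around 0.5. That is consistent with a wide likely-rank range for a model that has only a couple of blind votes.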

Keep exploring

All MiniMax models on the Index
MiniMax Speech 2.5: Rank #8 · Humanness 72
MiniMax Speech 2 HD: Rank #15 · Humanness 62

How human does your model really sound?

The benchmark is open source. Suggest a model, read the methodology, or ask us to put your voice in the arena.