The Humanness Index™

The open benchmark for how human voice AI sounds, so you can pick the model that passes. Built by Vapi.

Code is Apache-2.0. Standings data is CC BY 4.0. Audio clips and source voices are licensed recordings, all rights reserved. Provider logomarks belong to their respective owners and are used nominatively. “The Humanness Index™” name and logo are Vapi trademarks; see TRADEMARKS.md.

Humanness Index™ · TTS model

Cartesia Sonic

by Cartesia

Sonic was Cartesia's debut voice model, released in May 2024 as the first text-to-speech engine built on the state space model architecture its founders pioneered in academia.

Rank: #21
Humanness: 0
Likely rank: #22
Blind votes: 111

Standings as of Jun 15, 2026, 05:56 UTC

A real arena clip: a cloned source voice reading a customer support prompt at phone quality.

Sonic key stats

Latency (measured): 116 ms [1]
Languages: 15 [2]
Price / 1M chars: $50 [3]
Released: May 2024 [4]
  1. Vapi streaming benchmark, 50 trials per model (checked 2026-06-10). Median of 50 sequential live streaming trials, June 2026; includes network RTT from the benchmark machine.
  2. docs.cartesia.ai/build-with-cartesia/tts-models/latest (checked 2026-06-10)
  3. cartesia.ai/pricing (checked 2026-06-10). Pricing is 1 credit per character (docs.cartesia.ai/pricing); the entry self-serve Pro plan is $5/mo for 100K credits, an effective rate of $50 per 1M characters; larger plans drop to $37-39 per 1M. The credit rate is the same for every Sonic model.
  4. cartesia.ai/blog/sonic (checked 2026-06-10)
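
The $50 per 1M figure in note 3 is plain arithmetic on the plan price and its credit allowance. A minimal Python sketch of that calculation follows, assuming 1 credit per character as the note states; the function name and parameters are illustrative and not part of any Cartesia or Vapi API.

```python
def effective_rate_per_million(plan_price_usd: float,
                               plan_credits: int,
                               credits_per_char: float = 1.0) -> float:
    """Return the USD cost of synthesizing 1,000,000 characters on a plan.

    Assumes every character costs `credits_per_char` credits and that the
    whole monthly allowance is used (both are simplifying assumptions).
    """
    if plan_credits <= 0 or credits_per_char <= 0:
        raise ValueError("plan_credits and credits_per_char must be positive")
    chars_in_plan = plan_credits / credits_per_char
    return plan_price_usd * 1_000_000 / chars_in_plan


# Entry Pro plan from note 3: $5/mo for 100K credits.
rate = effective_rate_per_million(5.00, 100_000)
print(f"${rate:.2f} per 1M characters")
```

The same function can be pointed at a larger plan's price and allowance to check the lower per-million rates quoted in the note.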

Background

Sonic was Cartesia's debut voice model, released in May 2024 as the first text-to-speech engine built on the state space model architecture its founders pioneered in academia. At launch it generated lifelike speech with 135 ms model latency, the fastest in its class at the time, along with instant voice cloning and controls for speed and emotion. Three newer Sonic generations have since superseded it, but it remains the baseline that started Cartesia's run at real-time voice.

Sources: cartesia.ai

At a glance

The first SSM voice model, with instant cloning and 15 languages. In our 50-trial streaming benchmark it returned first audio in a median of 116 ms, still among the fastest results on the Index.

Sources: docs.cartesia.ai
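
For readers who want to reproduce a figure like the median above, here is a rough Python sketch of measuring time to first audio over sequential trials and taking the median. It is not the benchmark's actual harness: `open_stream` is a hypothetical callable standing in for whatever streaming TTS client is under test, and `fake_stream` only simulates a delay so the example runs on its own.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator


def time_to_first_chunk_ms(open_stream: Callable[[], Iterable[bytes]]) -> float:
    """Milliseconds from opening the stream until the first chunk arrives."""
    start = time.perf_counter()
    for _chunk in open_stream():
        # Stop at the first chunk; later audio does not affect this metric.
        return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream ended without producing audio")


def median_first_audio_ms(open_stream: Callable[[], Iterable[bytes]],
                          trials: int = 50) -> float:
    """Run the trials one after another and return the median latency."""
    samples = [time_to_first_chunk_ms(open_stream) for _ in range(trials)]
    return statistics.median(samples)


def fake_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a real client: waits, then yields audio."""
    time.sleep(0.002)  # simulated network and model delay
    yield b"\x00" * 320


result = median_first_audio_ms(fake_stream, trials=5)
print(f"median time to first audio: {result:.1f} ms")
```

Because the timer starts before the stream is opened, a measurement taken this way includes network round-trip time from the measuring machine, which is also how note 1 describes the published number.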

Position in the rankings

Standings as of Jun 15, 2026, 05:56 UTC

Rank  Provider  Model        Humanness  Latency
#19   Cartesia  Sonic 3      22         166 ms
#20   Gradium   Gradium TTS  20         332 ms
#21   Cartesia  Sonic        0          116 ms

See the full Humanness Index™ rankings

Frequently asked questions

How is Sonic tested on the Humanness Index™?
Listeners hear Sonic against another model in a blind head-to-head round, with both voices reading the same customer support prompt from the same cloned source voice, and they pick whichever sounds more human. Its Humanness score derives purely from those votes.
Is Sonic still Cartesia's current model?
No. Three newer generations have superseded it, with Sonic 3.5 as the current flagship. Sonic remains in the arena as the baseline of the family.

Keep exploring

All Cartesia models on the Index
Cartesia Sonic 2: Rank #18 · Humanness 34
Cartesia Sonic 3: Rank #19 · Humanness 22
Cartesia Sonic 3.5: Rank #3 · Humanness 68
