Sonic 3.5 key stats
- Latency (measured): 128 ms
- Vapi streaming benchmark (50 trials per model) (checked 2026-06-10): median of 50 sequential live streaming trials, June 2026; includes network RTT from the benchmark machine.
- docs.cartesia.ai (checked 2026-06-10) Sonic 3 and 3.5; earlier Sonic and Sonic 2 shipped 15, encoded per model.
- cartesia.ai/pricing (checked 2026-06-10) 1 credit per character (docs.cartesia.ai/pricing); entry self-serve Pro plan is $5/mo for 100K credits, a $50 per 1M effective rate; larger plans drop to $37-39 per 1M. Same credit rate for every Sonic.
- docs.cartesia.ai/build-with-cartesia/tts-models/latest (checked 2026-06-10) Snapshot release.
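The effective rate quoted in the pricing note is simple arithmetic, and a short sketch makes it easy to check. It uses only the figures stated above (1 credit per character, $5/mo for 100K credits on the Pro plan); the function names and the 400-character reply are illustrative assumptions, not part of any Cartesia API.

```python
def effective_rate_per_million(plan_price_usd: float, credits: int) -> float:
    """USD per 1,000,000 characters, assuming 1 credit per character."""
    return plan_price_usd * 1_000_000 / credits


def cost_of_text_usd(num_characters: int, rate_per_million: float) -> float:
    """USD cost of synthesizing `num_characters` at a given per-1M rate."""
    return num_characters * rate_per_million / 1_000_000


# Pro plan figures from the pricing note: $5/mo for 100K credits.
pro_rate = effective_rate_per_million(5.00, 100_000)
print(f"Pro plan effective rate: ${pro_rate:.2f} per 1M characters")

# Hypothetical 400-character agent reply, priced at the Pro rate.
print(f"400-character reply: ${cost_of_text_usd(400, pro_rate):.4f}")
```

At the Pro rate this gives $50 per 1M characters, matching the figure in the note; plugging in a larger plan's price and credit allowance shows how the rate falls toward the $37-39 range.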
Background
Sonic 3.5 is Cartesia's current flagship, released in May 2026. Cartesia positions it as its most natural and fastest model, with sub-90 ms latency and native support for 42 languages. It is tuned for production agent transcripts: it reads order numbers, emails, and confirmation codes correctly without preprocessing, and it resolves heteronyms such as "read" and "bow" from the surrounding words.
Sources: docs.cartesia.ai
At a glance
Alphanumerics and heteronyms without preprocessing, 42 languages, and a published sub-90 ms latency claim. In our 50-trial streaming benchmark it returned first audio in a median of 128 ms, including network round-trip time from the benchmark machine, the fastest measured time among current-generation models on the Index.
Sources: docs.cartesia.ai
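The measured figure is a median time to first audio over sequential streaming trials. The sketch below shows one way such a number could be computed; it is not the benchmark's actual harness. `start_stream` is a hypothetical callable that opens a streaming request and yields audio chunks, and `fake_stream` is a stand-in so the example runs without any network or vendor SDK.

```python
import statistics
import time
from typing import Callable, Iterator


def time_to_first_audio_ms(start_stream: Callable[[], Iterator[bytes]]) -> float:
    """Milliseconds from opening the stream to the first non-empty chunk."""
    started = time.perf_counter()
    for chunk in start_stream():
        if chunk:  # skip empty keep-alive chunks, if any
            return (time.perf_counter() - started) * 1000.0
    raise RuntimeError("stream ended before any audio arrived")


def median_first_audio_ms(
    start_stream: Callable[[], Iterator[bytes]], trials: int = 50
) -> float:
    """Run the trials one after another and return the median latency."""
    samples = [time_to_first_audio_ms(start_stream) for _ in range(trials)]
    return statistics.median(samples)


def fake_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a real TTS stream: ~5 ms until first audio."""
    time.sleep(0.005)
    yield b"\x00\x01"


if __name__ == "__main__":
    print(f"median: {median_first_audio_ms(fake_stream, trials=5):.1f} ms")
```

Because the clock starts on the client before the request is opened, a measurement taken this way includes network round-trip time, which is one reason a measured median can sit above a vendor's published model latency.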
Position in the rankings
Standings as of Jun 13, 2026, 00:15 UTC
Frequently asked questions
- How is Sonic 3.5 tested on the Humanness Index™?
- Listeners hear Sonic 3.5 against another model in a blind head-to-head round, with both voices reading the same customer support prompt from the same cloned source voice, and they pick whichever sounds more human. Its Humanness score derives purely from those votes.
- How fast is Sonic 3.5?
- Cartesia publishes a sub-90 ms latency figure. In our 50-trial streaming benchmark, it returned first audio in a median of 128 ms, including network time from the benchmark machine.
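The first answer above says the Humanness score comes purely from head-to-head votes but does not state the scoring formula. As an illustration only, the sketch below tallies a plain win rate per model from a list of (winner, loser) pairs; this is an assumed stand-in, not the Index's actual method, and the model names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def win_rates(votes: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of rounds each model won, from (winner, loser) pairs."""
    wins: Counter = Counter()
    rounds: Counter = Counter()
    for winner, loser in votes:
        wins[winner] += 1
        rounds[winner] += 1
        rounds[loser] += 1
    return {model: wins[model] / rounds[model] for model in rounds}


# Hypothetical votes: each tuple is one blind round, winner first.
example_votes = [
    ("model-a", "model-b"),
    ("model-a", "model-b"),
    ("model-b", "model-a"),
]
print(win_rates(example_votes))
```

A raw win rate ignores the strength of the opponents a model happened to face; rating schemes that account for this exist, but which one the Index uses is not stated here.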
How human does your model really sound?
The benchmark is open source. Suggest a model, read the methodology, or ask us to put your voice in the arena.