The Humanness Index™

The open benchmark for how human voice AI sounds. Built and operated by Vapi.

Code is Apache-2.0. Standings data is CC BY 4.0. Audio clips and source voices are licensed recordings, all rights reserved. Provider logomarks belong to their respective owners and are used nominatively. “The Humanness Index™” name and logo are Vapi trademarks; see TRADEMARKS.md.

Humanness Index™ · TTS model

Gradium TTS

by Gradium

Rank: #20
Humanness: 24
Likely rank: #14–20
Blind votes: 96

Standings as of Jun 13, 2026, 01:14 UTC

A real arena clip: a cloned source voice reading a customer support prompt at phone quality.

Gradium TTS key stats

Latency (measured): 332 ms [1]
Languages: 5 [2]
Price / 1M chars: $58 [3]
Voice cloning: 10 s sample [4]
Released: December 2, 2025 [5]
  1. Vapi team measurement (checked 2026-06-11, confidence: medium). Measured and reported by the Vapi team; not yet reproduced by the in-repo 50-trial pipeline.
  2. docs.gradium.ai/api-reference/endpoint/tts-post (checked 2026-06-10). English, French, Spanish, Portuguese, and German.
  3. gradium.ai/pricing (checked 2026-06-10). 1 credit per character (docs.gradium.ai/guides/credits); the entry XS plan is $13/mo for 225k credits, an effective rate of about $58 per 1M characters.
  4. docs.gradium.ai/api-reference/endpoint/tts-post (checked 2026-06-10).
  5. gradium.ai/blog/gradium (checked 2026-06-10). Out of stealth with production APIs; company founded 2025-09.
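
The effective price in note 3 follows from the plan figures quoted there. A minimal sketch of that arithmetic, assuming the whole XS plan allowance is spent on text to speech at 1 credit per character (the function name and parameters are illustrative, not part of any Gradium API):

```python
def effective_rate_per_million(plan_price_usd: float, credits: int,
                               credits_per_char: float = 1.0) -> float:
    """Return the effective USD price per 1M characters for a credit plan.

    Assumes every credit in the plan is used for synthesis.
    """
    chars_covered = credits / credits_per_char
    return plan_price_usd / chars_covered * 1_000_000


# XS plan figures from note 3: $13/mo for 225k credits, 1 credit per character.
rate = effective_rate_per_million(13.0, 225_000)
print(f"${rate:.2f} per 1M characters")  # about $57.78, listed as $58
```

Higher tiers would give a different effective rate, so the listed figure is best read as the entry-plan price.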

Background

Gradium TTS comes from the Paris-based team behind Kyutai, the open research lab that shipped the first real-time conversational speech model. Founded in September 2025 by generative audio pioneers from Google DeepMind and Meta, Gradium raised a $70M seed round and launched production speech APIs in December 2025. According to Gradium, its text-to-speech streams with a time to first audio well under 300 ms from servers in Europe and the US. It speaks English, French, Spanish, Portuguese, and German, and clones a voice from a ten-second sample.

Sources: gradium.ai

At a glance

Five languages, ten-second voice cloning, and streaming from EU and US servers. The Index lists 332 ms to first audio, measured by the Vapi team in June 2026; Gradium reports around 155 ms on the independent Coval benchmark, measured from its own region.

Sources: docs.gradium.ai

Position in the rankings

Standings as of Jun 13, 2026, 01:14 UTC

Rank  Provider  Model        Humanness  Latency
#18   Cartesia  Sonic 2      44         159 ms
#19   Cartesia  Sonic 3      27         166 ms
#20   Gradium   Gradium TTS  24         332 ms
#21   Cartesia  Sonic        0          116 ms

Frequently asked questions

How is Gradium TTS tested on the Humanness Index™?
Listeners hear Gradium TTS against another model in a blind head-to-head round. Both voices read the same customer support prompt in the same cloned source voice, and listeners pick whichever sounds more human. The Humanness score derives purely from those votes.
What latency does Gradium TTS have?
The Index lists 332 ms to first audio, measured by the Vapi team in June 2026. Gradium publishes a figure under 300 ms from EU and US servers and posts around 155 ms on the independent Coval benchmark; the distance from the measurement location to Gradium's server regions explains much of the spread.
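
The first answer above says the score comes only from blind head-to-head votes, but this page does not state the scoring formula. As a rough illustration only, here is a generic way to tally pairwise votes into a per-model win share. The model names and votes are made up, and this is not the Index's actual method.

```python
from collections import Counter


def win_shares(votes):
    """Tally blind pairwise votes into a win share per model.

    Each vote is (model_a, model_b, winner). This is a generic tally,
    NOT the Humanness Index scoring formula, which this page does not give.
    """
    wins = Counter()
    rounds = Counter()
    for model_a, model_b, winner in votes:
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} did not take part in the round")
        rounds[model_a] += 1
        rounds[model_b] += 1
        wins[winner] += 1
    return {model: wins[model] / rounds[model] for model in rounds}


# Hypothetical votes between three placeholder models.
votes = [
    ("model-x", "model-y", "model-x"),
    ("model-x", "model-z", "model-z"),
    ("model-y", "model-z", "model-z"),
    ("model-x", "model-y", "model-x"),
]
shares = win_shares(votes)
for model, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {share:.0%} of rounds won")
```

A raw win share ignores how strong each opponent was, which is one reason arena-style benchmarks often use a rating model instead. See the methodology for what the Index actually does.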
