Gradium TTS key stats
- Latency (measured): 332 ms [1]
- Voice cloning: 10 s sample [4]
- Released: December 2, 2025 [5]
- [1] Vapi team measurement (checked 2026-06-11, confidence: medium). Measured and reported by the Vapi team; not yet reproduced by the in-repo 50-trial pipeline.
- [2] docs.gradium.ai/api-reference/endpoint/tts-post (checked 2026-06-10). English, French, Spanish, Portuguese, and German.
- [3] gradium.ai/pricing (checked 2026-06-10). 1 credit per character (docs.gradium.ai/guides/credits); the entry XS plan is $13/mo for 225k credits, an effective rate of about $58 per 1M characters.
- [4] docs.gradium.ai/api-reference/endpoint/tts-post (checked 2026-06-10).
- [5] gradium.ai/blog/gradium (checked 2026-06-10). Out of stealth with production APIs; company founded 2025-09.
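The effective rate quoted in the pricing note can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the figures cited above (1 credit per character, $13/mo for 225k credits); the function name and parameters are illustrative and not part of any Gradium API.

```python
def effective_rate_per_million(plan_price_usd: float,
                               plan_credits: int,
                               credits_per_char: float = 1.0) -> float:
    """Return the plan's cost in USD per 1M characters.

    Assumes the whole monthly credit allowance is spent on synthesis
    and that each character costs `credits_per_char` credits.
    """
    chars_included = plan_credits / credits_per_char
    return plan_price_usd / chars_included * 1_000_000


# Figures taken from the pricing note: XS plan, $13/mo, 225k credits.
rate = effective_rate_per_million(13.0, 225_000)
print(f"${rate:.2f} per 1M characters")
```

Rounded to the nearest dollar this gives the $58 figure in the note; the rate only holds if the full monthly allowance is used.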
Background
Gradium TTS comes from the Paris-based team behind Kyutai, the open research lab that shipped the first real-time conversational speech model. Founded in September 2025 by generative audio pioneers from Google DeepMind and Meta, Gradium raised a $70M seed round and launched production speech APIs in December 2025. Its text-to-speech speaks English, French, Spanish, Portuguese, and German, clones a voice from a ten-second sample, and streams from servers in Europe and the US with a vendor-stated time to first audio well under 300 ms.
Sources: gradium.ai
At a glance
Five languages, ten-second cloning, and streaming from EU and US servers. The Index shows 332 ms, measured by the Vapi team in June 2026; Gradium reports around 155 ms on the independent Coval benchmark from its own region.
Sources: docs.gradium.ai
Frequently asked questions
- How is Gradium TTS tested on the Humanness Index™?
- Listeners hear Gradium TTS against another model in a blind head-to-head round, with both voices reading the same customer support prompt from the same cloned source voice, and they pick whichever sounds more human. Its Humanness score derives purely from those votes.
- What latency does Gradium TTS have?
- The Index shows 332 ms to first audio, measured by the Vapi team in June 2026. Gradium publishes under 300 ms from EU and US servers and posts around 155 ms on the independent Coval benchmark; distance to its regions explains much of the spread.
How human does your model really sound?
The benchmark is open source. Suggest a model, read the methodology, or ask us to put your voice in the arena.