Inworld models on the Humanness Index™
| Rank | Model | Humanness | Latency | Languages | Price / 1M chars |
|---|---|---|---|---|---|
| #11 | TTS-1.5-max | 65 | 337 ms | 15 | $35 |
| #9 | TTS-2 | 69 | 288 ms | 100+ | $25 |
About Inworld
Inworld builds voice models for interactive agents. Its TTS line climbed third-party speech arenas through 2025, and the 1.5 generation reached the top of the Artificial Analysis Speech Arena, ahead of Google and ElevenLabs. Integration partners for its realtime models include Vapi.
Sources: inworld.ai
Model line
The 1.5 generation targets the balance of quality and speed that most production agents need, with a mini sibling for the lowest latency. Realtime TTS-2, released as a research preview in May 2026, conditions on the audio of prior conversation turns and holds a single voice identity across more than 100 languages.
Sources: docs.inworld.ai
Inworld stats
- docs.inworld.ai/release-notes/tts (checked 2026-06-10): TTS 1.5 generation; Realtime TTS-2 covers 100+ languages, encoded per model.
- inworld.ai/pricing (checked 2026-06-10): on-demand rates of $35 per 1M characters for TTS 1.5 Max and $25 per 1M characters for TTS-2, encoded per model.
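Because both rates are quoted per 1M characters, the cost of a workload scales linearly with its character count. The sketch below is a minimal illustration that assumes flat on-demand billing with no volume tiers, minimums, or rounding (the pricing line does not state how these are handled). The `RATES_PER_MILLION_CHARS` table and `synthesis_cost` function are hypothetical names, not part of any Inworld API.

```python
# Hypothetical rate table: USD per 1M characters, copied from the
# inworld.ai/pricing figures recorded on this page (checked 2026-06-10).
RATES_PER_MILLION_CHARS = {
    "TTS-1.5-max": 35.0,
    "TTS-2": 25.0,
}


def synthesis_cost(model: str, characters: int) -> float:
    """Estimate on-demand cost in USD for synthesizing `characters` characters.

    Assumes simple linear per-character billing with no tiers, minimums,
    or rounding; actual invoices may differ.
    """
    if characters < 0:
        raise ValueError("characters must be non-negative")
    rate = RATES_PER_MILLION_CHARS[model]  # KeyError for unknown models
    return characters * rate / 1_000_000


if __name__ == "__main__":
    # Compare both models on the same 200,000-character workload.
    for name in RATES_PER_MILLION_CHARS:
        print(f"{name}: ${synthesis_cost(name, 200_000):.2f}")
```

Under these assumptions, a 200,000-character workload comes to $7.00 on TTS-1.5-max and $5.00 on TTS-2.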
How human does your model really sound?
The benchmark is open source. Suggest a model, read the methodology, or ask us to put your voice in the arena.