Humanness Index™ · TTS model

Inworld TTS-1.5-max

TTS-1.5-max is the flagship of Inworld's 1.5 generation, launched in late 2025 with a published P90 first-chunk latency under 250 ms across 15 languages.

Rank: #8
Humanness: 78
Likely rank: #7–12
Blind votes: 1,043

Standings as of Jul 28, 2026, 04:12 UTC

A real arena clip: a cloned source voice reading a customer support prompt at phone quality.

TTS-1.5-max key stats

Latency (measured): 337 ms¹
Languages: 15²
Price / 1M chars: $35³
Released: 2025⁴

¹ Vapi streaming benchmark (50 trials per model) (checked 2026-06-10) Median of 50 sequential live streaming trials, June 2026; includes network RTT from the benchmark machine.
² docs.inworld.ai/release-notes/tts (checked 2026-06-10) TTS 1.5 generation; Realtime TTS-2 covers 100+, encoded per model.
³ inworld.ai/pricing (checked 2026-06-10) On-demand rates: TTS 1.5 Max $35 per 1M characters; TTS-2 $25 per 1M, encoded per model.
⁴ docs.inworld.ai/release-notes/tts (checked 2026-06-10, confidence: medium) Late 2025; Inworld published no exact date.
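
The per-character rates in the pricing note convert to a workload cost with simple arithmetic. The sketch below uses only the two on-demand rates quoted above; the function name, the dictionary, and the example workload are illustrative assumptions, and it ignores any volume discounts or plan minimums that may apply.

```python
# On-demand rates quoted in the pricing note (USD per 1M characters).
PRICE_PER_MILLION_CHARS = {
    "TTS-1.5-max": 35.0,
    "TTS-2": 25.0,
}


def synthesis_cost_usd(model: str, characters: int) -> float:
    """Estimate the on-demand cost of synthesizing `characters` characters."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    return PRICE_PER_MILLION_CHARS[model] * characters / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2,000 calls a day, about 1,500 characters of speech each.
    daily_characters = 2_000 * 1_500
    for model in PRICE_PER_MILLION_CHARS:
        cost = synthesis_cost_usd(model, daily_characters)
        print(f"{model}: ${cost:.2f} per day")
```

For that hypothetical 3 million characters a day, the quoted rates work out to $105.00 for TTS-1.5-max and $75.00 for TTS-2.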

Background

TTS-1.5-max is the flagship of Inworld's 1.5 generation, launched in late 2025 with a published P90 first-chunk latency under 250 ms across 15 languages. Inworld measured the generation at 30 percent more expressive than its predecessor with a 40 percent lower word error rate, and the 1.5 line reached the top of the Artificial Analysis Speech Arena ahead of Google and ElevenLabs. It targets the balance of quality and speed that most production agents need.

Sources: docs.inworld.ai

At a glance

The quality and speed flagship of the 1.5 generation, with a mini sibling for the lowest latency. In our 50-trial streaming benchmark it returned first audio in a median of 337 ms, including network time.

Sources: docs.inworld.ai
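
The measured figure is a median over sequential trials, while Inworld's published figure is a P90, so the two summarize a latency distribution differently. A minimal sketch of how time to first audio and both statistics could be computed follows. It is not the benchmark's actual harness: `fake_tts_stream` is a hypothetical stand-in for a real streaming client, and the nearest-rank percentile is one common definition among several.

```python
import math
import statistics
import time
from typing import Callable, Dict, Iterable, Iterator, List


def time_to_first_chunk_ms(
    open_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Milliseconds from issuing the request until the first non-empty chunk."""
    start = clock()  # start before the stream is opened so network time counts
    for chunk in open_stream():
        if chunk:  # skip empty keep-alive chunks
            return (clock() - start) * 1000.0
    raise RuntimeError("stream ended before any audio arrived")


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: List[float]) -> Dict[str, float]:
    """Report the two statistics quoted on this page."""
    return {
        "median_ms": statistics.median(samples),
        "p90_ms": percentile(samples, 90),
    }


def fake_tts_stream(delay_s: float) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    time.sleep(delay_s)  # simulated network and synthesis delay
    yield b"\x00" * 320  # first audio chunk
    yield b"\x00" * 320


if __name__ == "__main__":
    delays = [0.020, 0.030, 0.025, 0.050, 0.021]
    samples = [
        time_to_first_chunk_ms(lambda d=d: fake_tts_stream(d)) for d in delays
    ]
    print(summarize(samples))
```

The clock is injectable so the timing logic can be checked without real network calls. In a live run, `open_stream` would wrap whatever client call starts the synthesis request.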

Position in the rankings

Standings as of Jul 28, 2026, 04:12 UTC

Rank	Provider	Model	Humanness	Latency
#6	xAI	Grok TTS (Streaming)	86	285 ms
#7	Speechify	Simba 3.2	83	—
#8	Inworld	TTS-1.5-max	78	337 ms
#9	ElevenLabs	Flash v2	76	226 ms
#10	ElevenLabs	Turbo v2	75	302 ms

See the full Humanness Index™ rankings

Frequently asked questions

How is TTS-1.5-max tested on the Humanness Index™?

Listeners hear TTS-1.5-max against another model in a blind head-to-head round, both voices reading the same customer support prompt from the same cloned source voice, and they pick whichever sounds more human. Its Humanness score derives purely from those votes.

What does the max in TTS-1.5-max mean?

It is the flagship of Inworld's 1.5 generation, tuned for quality at production speed, with a published P90 first-chunk latency under 250 ms. A mini sibling targets the lowest latency.
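
This page does not state the formula that turns blind votes into a Humanness score. As an illustration only, the sketch below fits a Bradley-Terry model, a common way to rank items from pairwise preferences, to a list of (winner, loser) votes. The function name, the vote format, and the sample model names are assumptions, and the resulting strengths are not on the Index's 0 to 100 scale.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bradley_terry(
    votes: Iterable[Tuple[str, str]], iterations: int = 200
) -> Dict[str, float]:
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Returned strengths sum to 1; a higher value means the model is
    preferred more often in head-to-head rounds.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)  # rounds played per unordered pair
    models = set()
    for winner, loser in votes:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))
    if not models:
        return {}

    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            # Standard minorize-maximize update: wins / sum(n_ij / (p_i + p_j)).
            denom = sum(
                count / (strength[m] + strength[other])
                for pair, count in pair_counts.items()
                if m in pair
                for other in pair - {m}
            )
            updated[m] = wins[m] / denom if denom else strength[m]
        total = sum(updated.values())
        strength = {m: s / total for m, s in updated.items()}
    return strength


if __name__ == "__main__":
    # Hypothetical votes; each tuple is (winner, loser) of one blind round.
    sample_votes = (
        [("model-a", "model-b")] * 6
        + [("model-b", "model-a")] * 4
        + [("model-b", "model-c")] * 7
        + [("model-c", "model-b")] * 3
        + [("model-a", "model-c")] * 8
        + [("model-c", "model-a")] * 2
    )
    for name, value in sorted(
        bradley_terry(sample_votes).items(), key=lambda kv: -kv[1]
    ):
        print(f"{name}: {value:.3f}")
```

With only two models, the fitted strengths reduce to each model's share of the votes, which makes the sketch easy to sanity-check by hand.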

Keep exploring

Inworld: All Inworld models on the Index
TTS-2: Rank #11 · Humanness 71

