Meet MiniMax Speech-2.8

MiniMax Voice

Studio-grade text-to-speech, instant voice cloning, and sub-200ms voice agents in 40+ languages.

Cloud or on-premise, at half the cost of ElevenLabs. Available today.

40+ languages · <200ms streaming TTFB · From $60 / 1M chars · On-prem available today
Why MiniMax Voice

Built for production,
priced for shipping.

  • Voice cloning across 40+ languages, including Mandarin, Arabic, and Hindi.
  • Per-sentence emotion & style prompts. No SSML soup.
  • On-premise deployment available today, not a 2026 roadmap promise.
  • Long-form generation up to 30 minutes in a single call.
  • Voice agent ready. Drops into LiveKit, Pipecat, Twilio, any SIP trunk.
  • About a third of ElevenLabs' price on HD ($100 vs ~$300 per 1M chars) and roughly 40% less on Turbo ($60 vs ~$99).
QUALITY

Speech-2.5-HD ties with ElevenLabs Turbo v2.5 in blind MOS evaluations and outperforms it on emotional range. Speaker identity is preserved across long-form output without drift, making it usable for audiobook and podcast production end-to-end. No chunking, no manual stitching.

MOS 4.42 on internal eval set
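MOS (mean opinion score) is the arithmetic mean of listener ratings on a 1-5 scale. The page doesn't describe its internal eval protocol, including the number of listeners or clips and whether an interval was computed. The sketch below only shows how a score and a normal-approximation 95% confidence interval are commonly derived. The ratings are made up, and `mos_with_ci` is a hypothetical helper.

```python
import math
import statistics


def mos_with_ci(ratings: list[int], z: float = 1.96) -> tuple[float, float, float]:
    """Return (mean, lower, upper) for listener ratings on a 1-5 scale.

    The interval is a normal approximation: mean +/- z * stdev / sqrt(n).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for an interval")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("MOS ratings must be between 1 and 5")
    mean = statistics.fmean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, mean - half_width, mean + half_width


# Made-up ratings for illustration; not MiniMax's evaluation data.
toy_ratings = [5, 4, 4, 5, 4, 5, 4, 4]
mos, low, high = mos_with_ci(toy_ratings)
print(f"MOS {mos:.2f} (95% CI {low:.2f} to {high:.2f})")
```

A MOS is only comparable across systems when they share the same listeners, prompts, and rating instructions. This is why blind, side-by-side evaluations matter for claims like the one above.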
Listen

Hear what production voice sounds like.

Two unedited first-take samples from Speech 2.8. No mastering, no EQ, no post-processing. What you hear is what the API returns.

Natural

Golden Voice (human-like)

English · Speech 2.8
Hey, it's me. How are ya? (chuckle) I hope you're having an awesome day! We actually had a bit of a crazy launch day yesterday, but I'm just recovered and ready to roll. You're listening to this and probably thinking I'm just chatting into a microphone, but here's the twist: I'm actually not human. I am the new Speech 2.8 model from MiniMax.
Listen for

Breaths, chuckles, throat-clears. Every disfluency you'd expect from a human.

Bilingual

Japanese × English, mid-sentence

JP × EN · Speech 2.8
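Oh my gosh, you won't believe it, 今日は本当にすごかったの! I was running late for work, それから電車が止まっちゃって, and I'm like, 'Seriously?!' でも大丈夫, because guess what, 道で昔の友達にばったり会ったの!
(English translation: Oh my gosh, you won't believe it, today was really something! I was running late for work, and then the train stopped, and I'm like, 'Seriously?!' But it's fine, because guess what, I bumped into an old friend on the street!)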
Listen for

Native prosody on both sides. Same voice through the switch, no model swap.

Need a specific language or persona? Mention it on the demo form and we'll generate one before the call.

TTS API

Two models. One API. Real pricing.

Pick HD for content that ships to humans, Turbo for high-volume conversational workloads. Same SDK, same voices, same auth.

Most Expressive

Speech-2.5-HD

Cinematic delivery for content that ships to humans.

$100 per 1M characters
  • Studio-grade 32kHz audio
  • Fine-grained emotion & style control
  • 40+ languages, native-quality prosody
  • Compatible with PVC custom voices
  • Long-form generation up to 30 minutes
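Deciding whether a manuscript fits in one long-form call means converting characters to minutes. The page doesn't publish a characters-per-minute figure, so the sketch below assumes about 900 characters per minute. Treat that constant as a placeholder to replace with a rate measured from your own outputs. The helper names are hypothetical.

```python
import math

# ASSUMPTION: about 900 characters of text per minute of narration
# (roughly 150 words per minute at ~6 characters per word, spaces included).
# Real rates vary by language, voice, and speed settings; measure your own output.
CHARS_PER_MINUTE = 900
MAX_MINUTES_PER_CALL = 30  # long-form limit stated for Speech-2.5-HD


def estimated_minutes(text: str, chars_per_minute: int = CHARS_PER_MINUTE) -> float:
    """Rough narration length of `text` at the assumed speaking rate."""
    return len(text) / chars_per_minute


def calls_needed(text: str, chars_per_minute: int = CHARS_PER_MINUTE) -> int:
    """Number of 30-minute calls the text would need under the same assumption."""
    max_chars_per_call = MAX_MINUTES_PER_CALL * chars_per_minute
    return math.ceil(len(text) / max_chars_per_call)


chapter = "word " * 5_000  # 25,000 characters of placeholder text
print(f"~{estimated_minutes(chapter):.1f} min, {calls_needed(chapter)} call(s)")
```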

Speech-2.5-Turbo

Built for high-volume, latency-sensitive workloads.

$60 per 1M characters
  • Sub-200ms time-to-first-byte
  • Low-latency PCM streaming
  • 40+ languages, conversational tone
  • Compatible with IVC custom voices
  • Optimized for voice agent loops
On-Premise

Run Speech-2.5 in your own VPC.

Air-gapped deployments, GPU-aware scaling, full audit controls. Available today, not a 2026 roadmap promise.

Talk to sales
Plan | Monthly volume | Price
Starter | Up to 5M chars / month | $2,500 / mo
Growth | Up to 25M chars / month | $9,500 / mo
Scale | Up to 100M chars / month | $28,000 / mo
Enterprise | Unlimited + SLA + GPU sizing | Custom
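A quick way to sanity-check the tiers is to compare each one against cloud list prices for the same volume. The sketch below uses only the numbers on this page. `smallest_plan` and `cloud_cost` are hypothetical helpers, and the sketch assumes that list price is the whole cloud cost.

```python
from typing import Optional, Tuple

# On-premise tiers from the plan table: (name, max chars per month, USD per month).
ON_PREM_PLANS = [
    ("Starter", 5_000_000, 2_500),
    ("Growth", 25_000_000, 9_500),
    ("Scale", 100_000_000, 28_000),
]

# Cloud list prices in USD per 1M characters, from the TTS API section.
CLOUD_PRICE_PER_M = {"hd": 100, "turbo": 60}


def cloud_cost(chars: int, model: str) -> float:
    """Pay-per-use cloud cost in USD for `chars` characters on one model."""
    return chars / 1_000_000 * CLOUD_PRICE_PER_M[model]


def smallest_plan(chars: int) -> Optional[Tuple[str, int]]:
    """Smallest fixed on-prem tier covering the monthly volume.

    Returns None above 100M chars/month, where the table lists Enterprise
    with custom pricing.
    """
    for name, cap, price in ON_PREM_PLANS:
        if chars <= cap:
            return name, price
    return None


volume = 20_000_000
print(smallest_plan(volume), cloud_cost(volume, "hd"), cloud_cost(volume, "turbo"))
```

At 20M characters a month, the Growth tier ($9,500) costs more than the same volume at cloud list prices ($2,000 on HD, $1,200 on Turbo). The table doesn't say what hardware or support the fixed fee covers, so weigh on-prem on data control and total deployment cost rather than per-character price alone.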
Voice Cloning

Clone any voice in seconds, or studio-grade in days.

Two cloning paths so you can match the quality bar to the use case. Both work across all 40+ supported languages.

IVC

Instant Voice Clone

Upload 10 seconds of audio and start generating in under a minute. Built for prototypes, character voices, and personalized assistants.

  • 10s reference audio
  • Ready in <60 seconds
  • Best on Speech-2.5-Turbo
  • Pay-per-use, no setup fee
PVC

Professional Voice Clone

We fine-tune a dedicated model on a curated studio recording. Indistinguishable from the source speaker, even on long-form narration.

  • 30 min curated recordings
  • Trained in 3 to 5 business days
  • Best on Speech-2.5-HD
  • Available cloud or on-premise
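Before uploading, it helps to check that your reference audio meets each path's stated minimum: 10 seconds for IVC and 30 minutes for PVC. The page doesn't list accepted file formats, so the sketch below assumes uncompressed WAV and uses only Python's standard `wave` module. Duration is a necessary condition, not a sufficient one: noise, room tone, and a single consistent speaker matter as much. The helper names are hypothetical.

```python
import os
import tempfile
import wave

IVC_MIN_SECONDS = 10        # "10s reference audio" (Instant Voice Clone)
PVC_MIN_SECONDS = 30 * 60   # "30 min curated recordings" (Professional Voice Clone)


def wav_seconds(path: str) -> float:
    """Duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference(paths: list[str]) -> dict:
    """Total duration of the clips and whether it meets each cloning minimum."""
    total = sum(wav_seconds(p) for p in paths)
    return {
        "total_seconds": total,
        "ivc": total >= IVC_MIN_SECONDS,
        "pvc": total >= PVC_MIN_SECONDS,
    }


# Demo: write 12 s of 16-bit mono silence to a temp file, then check it.
with tempfile.TemporaryDirectory() as tmp:
    clip = os.path.join(tmp, "clip.wav")
    with wave.open(clip, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(16_000)
        wf.writeframes(b"\x00\x00" * 16_000 * 12)
    demo_result = check_reference([clip])

print(demo_result)
```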
Languages

Cloned voices speak every language we support.

40+ supported
English · Mandarin · Spanish · Portuguese · French · German · Italian · Japanese · Korean · Arabic · Hindi · Turkish · Russian · Polish · Dutch · Indonesian · Vietnamese · Thai · Czech · Swedish · Danish · Finnish · Norwegian · Greek · Hebrew · Romanian · Hungarian · Ukrainian · Bulgarian · Croatian · Slovak · Tagalog · Malay · Bengali · Tamil · Telugu · Marathi · Urdu · Persian · Swahili
Voice Agent

Built for sub-200ms voice loops.

Drop into LiveKit, Pipecat, Twilio, or your own stack. The same TTS that powers our cloud API, tuned for real-time conversation.

Sub-200ms TTFB

Time-to-first-byte under 200ms on Turbo. Fast enough for natural turn-taking inside a voice loop.
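TTFB is easiest to verify from your own client, because it includes your network path. The sketch below times the gap between opening a stream and receiving the first non-empty audio chunk. `open_stream` stands in for whatever call starts a streaming request in your client, and `fake_stream` is a hypothetical stand-in, not a MiniMax SDK function.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttfb(open_stream: Callable[[], Iterable[bytes]]) -> Tuple[float, bytes]:
    """Seconds from opening the stream to its first non-empty audio chunk.

    Timing starts before `open_stream` is called, so connection setup,
    network, and server queueing all count toward the result.
    """
    start = time.perf_counter()
    for chunk in open_stream():
        if chunk:
            return time.perf_counter() - start, chunk
    raise RuntimeError("stream ended without audio")


def fake_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a real TTS stream."""
    time.sleep(0.05)           # simulated server latency
    yield b"\x00\x00" * 240    # 10 ms of 24 kHz 16-bit mono silence
    yield b"\x00\x00" * 240


ttfb, first_chunk = measure_ttfb(fake_stream)
print(f"TTFB {ttfb * 1000:.0f} ms, first chunk {len(first_chunk)} bytes")
print("within 200 ms budget:", ttfb < 0.200)
```

In practice, take many samples and look at the p50 and p95 rather than a single measurement, since tail latency is what breaks turn-taking.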

PCM streaming

Stream raw 24kHz PCM directly into LiveKit, Pipecat, or your custom WebRTC pipeline.
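When wiring raw PCM into a real-time pipeline, you usually need to convert between bytes and milliseconds to size frames and buffers. The page gives the 24 kHz sample rate but not the sample format, so the sketch below assumes 16-bit signed mono samples.

```python
SAMPLE_RATE_HZ = 24_000  # "raw 24kHz PCM" per the page
BYTES_PER_SAMPLE = 2     # ASSUMPTION: 16-bit signed samples
CHANNELS = 1             # ASSUMPTION: mono
FRAME_BYTES = BYTES_PER_SAMPLE * CHANNELS


def bytes_for_ms(ms: int) -> int:
    """Bytes of PCM that cover `ms` milliseconds of audio."""
    return SAMPLE_RATE_HZ * ms // 1000 * FRAME_BYTES


def ms_in_chunk(chunk: bytes) -> float:
    """Playback duration of a PCM chunk in milliseconds."""
    if len(chunk) % FRAME_BYTES:
        raise ValueError("chunk does not hold a whole number of samples")
    return len(chunk) * 1000 / (FRAME_BYTES * SAMPLE_RATE_HZ)


print(bytes_for_ms(20))                 # bytes in one 20 ms frame
print(bytes_for_ms(1000) * 8 / 1000)    # kbit/s of the raw stream
```

Under these assumptions, a 20 ms frame is 960 bytes and the raw stream runs at 384 kbit/s. That budget matters if the audio crosses a constrained network link before it is re-encoded.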

Emotion & style control

Per-sentence prompts for tone, pace, energy, and emphasis. No prompt engineering tricks required.
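The page doesn't document the request schema for per-sentence prompts. As a hedged sketch, the code below splits a reply into sentences and attaches a style override to each one. `Segment`, `emotion`, and `speed` are hypothetical names, not MiniMax API fields. Map them onto the real parameters in the API reference.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One sentence plus style hints. Field names are hypothetical."""
    text: str
    emotion: str = "neutral"
    speed: float = 1.0


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tag_sentences(text: str, styles: dict[int, dict]) -> list[Segment]:
    """Attach per-sentence style overrides, keyed by sentence index."""
    return [Segment(s, **styles.get(i, {})) for i, s in enumerate(split_sentences(text))]


segments = tag_sentences(
    "Your order shipped. It arrives tomorrow! Anything else?",
    {1: {"emotion": "happy", "speed": 1.1}},
)
for seg in segments:
    print(seg)
```

The regex splitter is deliberately simple. It mis-splits abbreviations like "Dr." and ignores CJK full-width punctuation such as "。", so a multilingual agent needs a proper sentence segmenter.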

Mid-sentence interruption

Cut audio cleanly when the user barges in, then resume with state intact.
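The page says audio is cut cleanly on barge-in with state kept intact, but it doesn't say where that state lives. The sketch below shows one client-side approach. It keeps unplayed audio in a queue, drops that audio the moment the user starts talking, and reports how much of the reply the user actually heard, so the agent can resume or rephrase from there. `PlaybackQueue` is hypothetical, and the PCM format (24 kHz, 16-bit mono) is an assumption.

```python
from collections import deque
from typing import Optional

BYTES_PER_MS = 48  # ASSUMPTION: 24 kHz, 16-bit mono PCM -> 48 bytes per ms


class PlaybackQueue:
    """Hypothetical client-side buffer between a TTS stream and the speaker."""

    def __init__(self) -> None:
        self._pending: deque = deque()
        self.played_bytes = 0      # audio the user has actually heard
        self.interrupted = False

    def push(self, chunk: bytes) -> None:
        """Queue audio from the stream unless the turn was already cut off."""
        if not self.interrupted:
            self._pending.append(chunk)

    def next_chunk(self) -> Optional[bytes]:
        """Hand the next chunk to the audio device and count it as played."""
        if not self._pending:
            return None
        chunk = self._pending.popleft()
        self.played_bytes += len(chunk)
        return chunk

    def barge_in(self) -> int:
        """User started talking: drop unplayed audio, return ms already heard."""
        self.interrupted = True
        self._pending.clear()
        return self.played_bytes // BYTES_PER_MS


queue = PlaybackQueue()
for _ in range(5):
    queue.push(b"\x00" * 960)      # five 20 ms chunks arrive from the stream
queue.next_chunk()
queue.next_chunk()                 # the user hears 40 ms...
heard_ms = queue.barge_in()        # ...then starts talking
queue.push(b"\x00" * 960)          # a late chunk from the stream is ignored
print(heard_ms, queue.next_chunk())
```

A real client would also cancel the in-flight TTS request on barge-in so it stops paying for audio nobody will hear.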

40+ languages, mid-call

Switch language inside a conversation without swapping models or reloading voices.

SIP-ready

Drop into Twilio, Telnyx, or any SIP trunk. Tested for telephony codecs and 8kHz fallback.
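Telephony codecs such as G.711 run at 8 kHz, so a 24 kHz stream has to be resampled somewhere. The page suggests the service handles 8 kHz fallback itself. If your stack needs to resample on its own instead, the minimal sketch below decimates by 3 after averaging each group of three samples. This is a crude low-pass filter, not a production resampler. It also assumes 16-bit little-endian mono input, and the function name is hypothetical.

```python
import array
import struct
import sys


def downsample_24k_to_8k(pcm: bytes) -> bytes:
    """Convert 24 kHz 16-bit little-endian mono PCM to 8 kHz.

    Each output sample is the rounded mean of three input samples, a crude
    low-pass filter followed by decimation. A trailing partial group is dropped.
    """
    samples = array.array("h")
    samples.frombytes(pcm)          # raises ValueError on an odd byte count
    if sys.byteorder == "big":
        samples.byteswap()          # input is assumed little-endian
    usable = len(samples) - len(samples) % 3
    out = array.array(
        "h",
        (round((samples[i] + samples[i + 1] + samples[i + 2]) / 3) for i in range(0, usable, 3)),
    )
    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()


demo = struct.pack("<7h", 3, 3, 3, 6, 0, 0, 9)
print(struct.unpack("<2h", downsample_24k_to_8k(demo)))
```

A three-tap average leaves audible aliasing above 4 kHz. For production audio, use a windowed-sinc or polyphase resampler.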

Compare

MiniMax vs ElevenLabs vs Cartesia.

Where each platform actually wins. Numbers reflect public pricing and documented capabilities at the time of writing.

Capability | MiniMax Voice | ElevenLabs | Cartesia
HD model, per 1M chars | $100 | ~$300 | $99
Turbo model, per 1M chars | $60 | ~$99 | $25
Languages supported | 40+ | 70+ | 15+
Streaming TTFB | <200 ms | ~250 ms | <90 ms
Voice cloning | IVC + PVC | IVC + PVC | IVC only
On-premise deployment | Available now | Early access, Apr 2026 | Not offered
Emotion / style control | Native, per-sentence | Native | Limited
Long-form audio (>10 min) | Single call up to 30 min | Chunking required | Chunking required

Sources: vendor pricing pages and product changelogs as of April 2026. ElevenLabs' on-premise offering, announced for April 2026, is in early access and not yet production-proven.
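To turn the per-character prices in the comparison into a monthly budget, multiply by your volume. The sketch below uses only the table's figures, taking the approximate ElevenLabs prices at face value. It assumes list price with no volume discounts, and the helper names are hypothetical.

```python
# Per-1M-character list prices from the comparison table (USD).
# ElevenLabs figures are marked "~" (approximate) in the table.
PRICES = {
    "HD": {"MiniMax": 100, "ElevenLabs": 300, "Cartesia": 99},
    "Turbo": {"MiniMax": 60, "ElevenLabs": 99, "Cartesia": 25},
}


def monthly_cost(chars: int, tier: str) -> dict[str, float]:
    """Cost of `chars` characters per month for each vendor at one tier."""
    return {vendor: chars / 1_000_000 * price for vendor, price in PRICES[tier].items()}


def ratio_vs(tier: str, vendor: str, baseline: str = "ElevenLabs") -> float:
    """A vendor's price as a fraction of the baseline vendor's price."""
    return PRICES[tier][vendor] / PRICES[tier][baseline]


print(monthly_cost(10_000_000, "Turbo"))
print(round(ratio_vs("HD", "MiniMax"), 2), round(ratio_vs("Turbo", "MiniMax"), 2))
```

On these figures, MiniMax costs about a third of ElevenLabs on HD and about 61% on Turbo. Cartesia's listed Turbo price is lower than both.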

Ready when you are

Ship voice this quarter.

Book a 30-minute walkthrough with our solutions team. We'll generate samples in your target language and scope a deployment that fits your constraints, whether cloud, hybrid, or fully on-premise.