Choosing a Model

The MARS 8 family has six model variants, each optimized for different trade-offs between latency, quality, controllability, and deployment footprint. Use this guide to find the best fit for your application.

Quick Comparison

	MARS-8.1-Flash-Beta	MARS-8.1-Pro-Beta	MARS-Flash	MARS-Pro	MARS-Instruct	MARS-Nano
Parameters	600M	600M	600M	600M	1.2B	50M
TTFB	Faster than MARS-8.1-Pro-Beta	800 ms – 2 s	~150 ms	800 ms – 2 s	Higher (offline)	500 ms – 2 s
Real-time capable	Yes	Yes	Yes	Yes	No	Yes
Emotion/prosody control	No	No	No	No	Yes	No
On-device friendly	No	No	No	No	No	Yes
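
If you want to filter the table programmatically, the sketch below transcribes its yes/no rows into a Python dictionary. The dictionary layout and the `candidates` helper are illustrative assumptions, not part of any MARS SDK; only the model names and values come from the table above.

```python
# Rows transcribed from the Quick Comparison table.
# The structure and key names here are illustrative, not an official schema.
MODELS = {
    "MARS-8.1-Flash-Beta": {"params": "600M", "real_time": True, "emotion_control": False, "on_device": False},
    "MARS-8.1-Pro-Beta": {"params": "600M", "real_time": True, "emotion_control": False, "on_device": False},
    "MARS-Flash": {"params": "600M", "real_time": True, "emotion_control": False, "on_device": False},
    "MARS-Pro": {"params": "600M", "real_time": True, "emotion_control": False, "on_device": False},
    "MARS-Instruct": {"params": "1.2B", "real_time": False, "emotion_control": True, "on_device": False},
    "MARS-Nano": {"params": "50M", "real_time": True, "emotion_control": False, "on_device": True},
}


def candidates(real_time=None, emotion_control=None, on_device=None):
    """Return model names whose table entries match every filter that is not None."""
    wanted = {
        "real_time": real_time,
        "emotion_control": emotion_control,
        "on_device": on_device,
    }
    return [
        name
        for name, row in MODELS.items()
        if all(row[key] == value for key, value in wanted.items() if value is not None)
    ]
```

For example, `candidates(on_device=True)` narrows the list to the one model the table marks as on-device friendly.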

Decision Flowchart

Start with the primary constraint of your application:

1. Do you want to evaluate the newest MARS 8.1 beta models?

Yes - try MARS-8.1-Flash-Beta or MARS-8.1-Pro-Beta. Use speech_model: "mars-8.1-flash-beta" when you want the same MARS 8.1 quality target with faster generation. Use speech_model: "mars-8.1-pro-beta" when you want to compare the newest Pro-quality model against mars-pro.

The 8.1 models may perform much better on pronunciation, overall prosody, accent control, accent coverage, quality across different accents, and expressiveness with high-pitched reference voices. Because these are beta models, validate them with your target voices, languages, accents, and formats before rolling them out to production. For best results, use a reference voice in the same language and accent as your target output.
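
One way to run that comparison is to send the same text and reference voice to each model and listen to the results side by side. The sketch below only builds the request bodies. The `speech_model` values come from this guide, while the `text` and `voice_id` field names and the `build_requests` helper are assumptions for illustration, so check the API reference for the actual request schema.

```python
BASELINE = "mars-pro"
BETA_MODELS = ("mars-8.1-flash-beta", "mars-8.1-pro-beta")


def build_requests(text, reference_voice_id, models=(BASELINE,) + BETA_MODELS):
    """Build one request body per model so the outputs can be compared directly.

    Only "speech_model" is taken from this guide; "text" and "voice_id"
    are hypothetical field names.
    """
    return [
        {"speech_model": model, "text": text, "voice_id": reference_voice_id}
        for model in models
    ]


if __name__ == "__main__":
    for body in build_requests("Hello from the evaluation set.", "voice-123"):
        print(body)
```

Keeping the text and reference voice fixed across requests means any difference you hear is attributable to the model.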

2. Do you need real-time, low-latency speech?

Yes - use MARS-Flash. Flash is purpose-built for conversational AI. With TTFB as low as 150 ms, it handles agentic conversations, call center bots, and live assistants where every millisecond counts. Choose Flash when:

You’re building a voice agent or assistant
End-to-end latency is a hard requirement
Speed and real-time responsiveness matter most

3. Do you need high-fidelity speech with voice and emotion transfer?

Yes - use MARS-Pro. Pro offers balanced speed and fidelity, delivering the best overall performance when speed is not the primary constraint. It preserves speaker identity and emotion from reference audio, making it ideal for dubbing, audiobooks, and expressive media. Choose Pro when:

You need expressive, natural-sounding speech
You need real-time translation with voice transfer
You can tolerate 800 ms – 2 s of latency
Your reference audio is short or challenging

4. Do you need fine-grained control over delivery style?

Yes - use MARS-Instruct. Instruct gives you director-level control. You can independently adjust speaker identity, emotion, and prosody using either a reference audio sample or a text description (e.g., “warm and reassuring tone”). Choose Instruct when:

You’re producing content for TV, film, or post-production
You need to guide emotion and delivery with text instructions
Latency is not a concern (batch/offline workflows)
You want to decouple speaker identity from speaking style

See the Emotional Voice Control tutorial for examples of emotion tags, prosody control, and best practices.

5. Do you need to run TTS on-device?

Yes - use MARS-Nano. At just 50M parameters, Nano runs in memory-constrained environments where cloud access isn’t available or practical. Choose Nano when:

You’re deploying on edge hardware or mobile devices
Strict memory and compute budgets apply
Cloud connectivity is unreliable or unavailable
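
The five questions above can be sketched as a first-match function. This is a minimal illustration of the flowchart order, not an official selection tool: the function and parameter names are hypothetical, and the final fallback to MARS-Pro follows the "Still Not Sure?" advice in this guide.

```python
def choose_model(
    evaluate_beta=False,
    prefer_faster_beta=True,
    needs_low_latency=False,
    needs_voice_emotion_transfer=False,
    needs_style_control=False,
    needs_on_device=False,
):
    """Walk the questions in the order this guide asks them; the first 'yes' wins."""
    if evaluate_beta:
        # Question 1: newest 8.1 beta models.
        return "MARS-8.1-Flash-Beta" if prefer_faster_beta else "MARS-8.1-Pro-Beta"
    if needs_low_latency:
        # Question 2: real-time, low-latency speech.
        return "MARS-Flash"
    if needs_voice_emotion_transfer:
        # Question 3: high fidelity with voice and emotion transfer.
        return "MARS-Pro"
    if needs_style_control:
        # Question 4: fine-grained control over delivery style.
        return "MARS-Instruct"
    if needs_on_device:
        # Question 5: on-device TTS.
        return "MARS-Nano"
    # No question answered "yes": fall back to the guide's default suggestion.
    return "MARS-Pro"
```

Because the first match wins, a hard constraint such as on-device deployment should be passed on its own rather than combined with earlier flags.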

Common Scenarios

Scenario	Recommended Model
Faster MARS 8.1 output with stronger accent quality	MARS-8.1-Flash-Beta
Evaluating the newest Pro-quality TTS output	MARS-8.1-Pro-Beta
Voice agent for customer support	MARS-Flash
Real-time language translation with voice cloning	MARS-Pro
Audiobook production	MARS-Pro
Movie dubbing with director-guided emotion	MARS-Instruct
Smart speaker / IoT device	MARS-Nano
Live conversational assistant	MARS-Flash
Post-production voice editing	MARS-Instruct
Offline batch TTS pipeline	MARS-Pro or MARS-Instruct

Still Not Sure?

Start with MARS-Pro if you want the best balance of speed and fidelity and have no hard latency or footprint constraints.
Switch to MARS-Flash if latency testing shows that MARS-Pro is too slow for your real-time pipeline.
Reach out on Discord if your use case doesn’t fit neatly into one of these categories.
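
For the latency test mentioned above, a small timing helper is usually enough. The sketch below assumes your client exposes the synthesized audio as an iterable of byte chunks. That streaming interface, the helper names, and the budget value are assumptions, not a documented MARS API.

```python
import time


def time_to_first_byte(chunks):
    """Return seconds until the iterable yields its first non-empty chunk.

    Returns None if the stream ends without producing any audio.
    Pass a lazy iterable so the request itself is included in the timing.
    """
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:
            return time.perf_counter() - start
    return None


def pick_by_latency(pro_ttfb_seconds, budget_seconds):
    """Keep MARS-Pro while it fits the latency budget; otherwise switch to MARS-Flash."""
    return "MARS-Pro" if pro_ttfb_seconds <= budget_seconds else "MARS-Flash"
```

Measure over many requests and compare a high percentile against your budget rather than a single run, since this guide quotes a wide 800 ms – 2 s range for MARS-Pro.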
