Quick Comparison
| | MARS-Flash | MARS-Pro | MARS-Instruct | MARS-Nano |
|---|---|---|---|---|
| Parameters | 600M | 600M | 1.2B | 50M |
| TTFB | ~150 ms | 800 ms – 2 s | Higher (offline) | 500 ms – 2 s |
| Real-time capable | Yes | Yes | No | Yes |
| Emotion/prosody control | No | No | Yes | No |
| On-device friendly | No | No | No | Yes |
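
The yes/no rows of the table can also be encoded as data and filtered by requirement. The following is a minimal sketch, not part of any MARS SDK: `ModelSpec`, `MODELS`, and `candidates` are hypothetical names introduced here for illustration, and the values are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """One column of the comparison table (hypothetical helper type)."""
    name: str
    parameters: str
    realtime: bool
    emotion_prosody_control: bool
    on_device: bool


# Values copied from the Quick Comparison table.
MODELS = [
    ModelSpec("MARS-Flash", "600M", True, False, False),
    ModelSpec("MARS-Pro", "600M", True, False, False),
    ModelSpec("MARS-Instruct", "1.2B", False, True, False),
    ModelSpec("MARS-Nano", "50M", True, False, True),
]


def candidates(*, realtime=False, emotion_prosody_control=False, on_device=False):
    """Return the names of models that satisfy every requested capability."""
    return [
        m.name
        for m in MODELS
        if (not realtime or m.realtime)
        and (not emotion_prosody_control or m.emotion_prosody_control)
        and (not on_device or m.on_device)
    ]
```

An empty result means that no single model in the table covers the combination, which is a signal to pick the primary constraint first, as the flowchart below suggests.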
Decision Flowchart
Start with the primary constraint of your application:
1. Do you need real-time, low-latency speech?
Yes: use MARS-Flash. Flash is purpose-built for conversational AI. With TTFB as low as 150 ms, it handles agentic conversations, call center bots, and live assistants where every millisecond counts. Choose Flash when:
- You’re building a voice agent or assistant
- End-to-end latency is a hard requirement
- Speed and real-time responsiveness matter most
2. Do you need high-fidelity speech with voice and emotion transfer?
Yes: use MARS-Pro. Pro offers balanced speed and fidelity, delivering the best overall performance when speed is not the primary constraint. It preserves speaker identity and emotion from reference audio, making it ideal for dubbing, audiobooks, and expressive media. Choose Pro when:
- You need expressive, natural-sounding speech
- Real-time translation with voice transfer is required
- You can tolerate 800 ms – 2 s of latency
- Short or challenging reference audio is involved
3. Do you need fine-grained control over delivery style?
Yes: use MARS-Instruct. Instruct gives you director-level control. You can independently adjust speaker identity, emotion, and prosody using either a reference audio sample or a text description (e.g., “warm and reassuring tone”). Choose Instruct when:
- You’re producing content for TV, film, or post-production
- You need to guide emotion and delivery with text instructions
- Latency is not a concern (batch/offline workflows)
- You want to decouple speaker identity from speaking style
4. Do you need to run TTS on-device?
Yes: use MARS-Nano. At just 50M parameters, Nano runs in memory-constrained environments where cloud access isn’t available or practical. Choose Nano when:
- You’re deploying on edge hardware or mobile devices
- Strict memory and compute budgets apply
- Cloud connectivity is unreliable or unavailable
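
The four questions above can be sketched as a lookup keyed by the primary constraint. This is an illustrative sketch only: `Constraint`, `RECOMMENDATION`, and `recommend` are hypothetical names, not part of any MARS API, and the fallback to MARS-Pro is an assumption taken from the "Still Not Sure?" guidance at the end of this page.

```python
from enum import Enum
from typing import Optional


class Constraint(Enum):
    """Primary constraint of the application, one per flowchart question."""
    LOW_LATENCY = 1             # question 1: real-time, low-latency speech
    VOICE_EMOTION_TRANSFER = 2  # question 2: high fidelity with voice and emotion transfer
    STYLE_CONTROL = 3           # question 3: fine-grained control over delivery style
    ON_DEVICE = 4               # question 4: TTS must run on-device


# Each "Yes" branch of the flowchart maps to exactly one model.
RECOMMENDATION = {
    Constraint.LOW_LATENCY: "MARS-Flash",
    Constraint.VOICE_EMOTION_TRANSFER: "MARS-Pro",
    Constraint.STYLE_CONTROL: "MARS-Instruct",
    Constraint.ON_DEVICE: "MARS-Nano",
}


def recommend(primary: Optional[Constraint] = None) -> str:
    """Return the recommended model name for the given primary constraint.

    With no constraint given, fall back to MARS-Pro (assumed default).
    """
    if primary is None:
        return "MARS-Pro"
    return RECOMMENDATION[primary]
```

Taking a single primary constraint, rather than several boolean flags, mirrors the instruction to start with the one constraint that matters most and avoids having to rank conflicting requirements.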
Common Scenarios
| Scenario | Recommended Model |
|---|---|
| Voice agent for customer support | MARS-Flash |
| Real-time language translation with voice cloning | MARS-Pro |
| Audiobook production | MARS-Pro |
| Movie dubbing with director-guided emotion | MARS-Instruct |
| Smart speaker / IoT device | MARS-Nano |
| Live conversational assistant | MARS-Flash |
| Post-production voice editing | MARS-Instruct |
| Offline batch TTS pipeline | MARS-Pro or MARS-Instruct |
Still Not Sure?
- Start with MARS-Pro if you want the best balance of speed and fidelity and have no hard latency or on-device constraints.
- Switch to MARS-Flash if latency testing reveals that Pro is too slow for your real-time pipeline.
- Reach out on Discord if your use case doesn’t fit neatly into one of these categories.
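
For the latency testing mentioned above, TTFB can be measured with a small, client-agnostic helper. The sketch below assumes only that your TTS client exposes audio as an iterable of byte chunks; `measure_ttfb_ms` and `fake_stream` are hypothetical names, and `fake_stream` is a stand-in for a real streaming call, not an actual MARS API.

```python
import time
from typing import Callable, Iterable, Optional


def measure_ttfb_ms(stream_factory: Callable[[], Iterable[bytes]]) -> Optional[float]:
    """Return milliseconds until the first non-empty audio chunk arrives.

    Returns None if the stream ends without producing any audio.
    """
    start = time.perf_counter()
    for chunk in stream_factory():
        if chunk:
            return (time.perf_counter() - start) * 1000.0
    return None


def fake_stream():
    """Stand-in for a streaming TTS call (assumption, for demonstration only)."""
    time.sleep(0.05)     # simulated model and network delay
    yield b"\x00" * 320  # first audio chunk
    yield b"\x00" * 320


if __name__ == "__main__":
    ttfb = measure_ttfb_ms(fake_stream)
    print(f"TTFB: {ttfb:.1f} ms")
```

Run the measurement several times against each model under realistic network conditions and compare the results with your latency budget before deciding whether to move from Pro to Flash.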