The MARS 8 family has four model variants, each optimized for a different trade-off among latency, quality, controllability, and deployment footprint. Use this guide to find the best fit for your application.

Quick Comparison

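| | MARS-Flash | MARS-Pro | MARS-Instruct | MARS-Nano |
|---|---|---|---|---|
| Parameters | 600M | 600M | 1.2B | 50M |
| TTFB (time to first byte) | ~150 ms | 800 ms – 2 s | Higher (offline) | 500 ms – 2 s |
| Real-time capable | Yes | Yes | No | Yes |
| Emotion/prosody control | No | No | Yes | No |
| On-device friendly | No | No | No | Yes |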
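If you select a model programmatically, for example in a config-driven deployment, the capability rows of the Quick Comparison table can be encoded as data and filtered by hard requirements. This is a minimal sketch: `ModelSpec`, `MODELS`, and `candidates` are hypothetical helpers, not part of any MARS SDK, and the values are transcribed from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Capability flags transcribed from the Quick Comparison table (hypothetical helper)."""
    name: str
    params_millions: int
    real_time: bool
    emotion_control: bool
    on_device: bool


# Values copied from the comparison table; 1.2B is written as 1200M.
MODELS = [
    ModelSpec("MARS-Flash", 600, real_time=True, emotion_control=False, on_device=False),
    ModelSpec("MARS-Pro", 600, real_time=True, emotion_control=False, on_device=False),
    ModelSpec("MARS-Instruct", 1200, real_time=False, emotion_control=True, on_device=False),
    ModelSpec("MARS-Nano", 50, real_time=True, emotion_control=False, on_device=True),
]


def candidates(need_real_time=False, need_emotion_control=False, need_on_device=False):
    """Return the names of models that satisfy every requirement set to True."""
    return [
        m.name
        for m in MODELS
        if (m.real_time or not need_real_time)
        and (m.emotion_control or not need_emotion_control)
        and (m.on_device or not need_on_device)
    ]
```

For example, `candidates(need_on_device=True)` returns only `MARS-Nano`. An empty result, such as when you require both real-time output and emotion/prosody control, means no single variant covers every requirement; in that case, pick your primary constraint using the decision flowchart.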

Decision Flowchart

Start with the primary constraint of your application:

1. Do you need real-time, low-latency speech?

Yes — use MARS-Flash. Flash is purpose-built for conversational AI. With TTFB as low as 150 ms, it handles agentic conversations, call center bots, and live assistants where every millisecond counts. Choose Flash when:
  • You’re building a voice agent or assistant
  • End-to-end latency is a hard requirement
  • Speed and real-time responsiveness matter most

2. Do you need high-fidelity speech with voice and emotion transfer?

Yes — use MARS-Pro. Pro offers balanced speed and fidelity, delivering the best overall performance when speed is not the primary constraint. It preserves speaker identity and emotion from reference audio, making it ideal for dubbing, audiobooks, and expressive media. Choose Pro when:
  • You need expressive, natural-sounding speech
  • Real-time translation with voice transfer is required
  • You can tolerate 800 ms – 2 s of latency
  • Short or challenging reference audio is involved

3. Do you need fine-grained control over delivery style?

Yes — use MARS-Instruct. Instruct gives you director-level control. You can independently adjust speaker identity, emotion, and prosody using either a reference audio sample or a text description (e.g., “warm and reassuring tone”). Choose Instruct when:
  • You’re producing content for TV, film, or post-production
  • You need to guide emotion and delivery with text instructions
  • Latency is not a concern (batch/offline workflows)
  • You want to decouple speaker identity from speaking style

4. Do you need to run TTS on-device?

Yes — use MARS-Nano. At just 50M parameters, Nano runs in memory-constrained environments where cloud access isn’t available or practical. Choose Nano when:
  • You’re deploying on edge hardware or mobile devices
  • Strict memory and compute budgets apply
  • Cloud connectivity is unreliable or unavailable
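For tooling that routes requests automatically, the flowchart reduces to a lookup from your primary constraint to a variant, plus the fallback from the Still Not Sure? advice (switch from Pro to Flash when latency testing shows Pro is too slow). The `PrimaryConstraint` enum and `recommend` function below are hypothetical names for illustration, not part of any MARS SDK.

```python
from enum import Enum


class PrimaryConstraint(Enum):
    """The four flowchart questions, one member per primary constraint (hypothetical names)."""
    LOW_LATENCY = "real-time, low-latency speech"
    FIDELITY = "high-fidelity speech with voice and emotion transfer"
    STYLE_CONTROL = "fine-grained control over delivery style"
    ON_DEVICE = "on-device TTS"


# Each "Yes" answer in the flowchart maps to exactly one model variant.
RECOMMENDATION = {
    PrimaryConstraint.LOW_LATENCY: "MARS-Flash",
    PrimaryConstraint.FIDELITY: "MARS-Pro",
    PrimaryConstraint.STYLE_CONTROL: "MARS-Instruct",
    PrimaryConstraint.ON_DEVICE: "MARS-Nano",
}


def recommend(primary: PrimaryConstraint, pro_too_slow: bool = False) -> str:
    """Pick a variant from the primary constraint, following the flowchart."""
    model = RECOMMENDATION[primary]
    # Fallback: if latency testing shows Pro is too slow, use Flash instead.
    if model == "MARS-Pro" and pro_too_slow:
        return "MARS-Flash"
    return model
```

Keeping the mapping in a single dictionary means the routing logic stays in sync with the flowchart if you ever revise which variant answers which question.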

Common Scenarios

| Scenario | Recommended Model |
|---|---|
| Voice agent for customer support | MARS-Flash |
| Real-time language translation with voice cloning | MARS-Pro |
| Audiobook production | MARS-Pro |
| Movie dubbing with director-guided emotion | MARS-Instruct |
| Smart speaker / IoT device | MARS-Nano |
| Live conversational assistant | MARS-Flash |
| Post-production voice editing | MARS-Instruct |
| Offline batch TTS pipeline | MARS-Pro or MARS-Instruct |

Still Not Sure?

  • Start with MARS-Pro if no single constraint dominates; it offers the best balance of speed and fidelity.
  • Switch to MARS-Flash if latency testing reveals Pro is too slow for your real-time pipeline.
  • Reach out on Discord if your use case doesn’t fit neatly into one of these categories.
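A latency test can be as simple as timing how long a streaming request takes to deliver its first non-empty audio chunk. This sketch assumes your client exposes synthesis as a callable that returns an iterable of audio byte chunks; `fake_stream` is a hypothetical stand-in, not a MARS API, so substitute your actual streaming call.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator


def measure_ttfb_ms(start_stream: Callable[[], Iterable[bytes]]) -> float:
    """Time from issuing a streaming request to the first non-empty audio chunk, in ms."""
    start = time.perf_counter()  # start the clock before the request is issued
    for chunk in start_stream():
        if chunk:  # skip empty chunks, e.g. keep-alives
            return (time.perf_counter() - start) * 1000.0
    raise ValueError("stream ended without producing any audio")


def median_ttfb_ms(start_stream: Callable[[], Iterable[bytes]], runs: int = 5) -> float:
    """Median TTFB over several runs, to smooth out network jitter."""
    return statistics.median(measure_ttfb_ms(start_stream) for _ in range(runs))


def fake_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS call: waits, then yields one chunk."""
    time.sleep(delay_s)
    yield b"\x00" * 320  # placeholder audio bytes


if __name__ == "__main__":
    print(f"median TTFB: {median_ttfb_ms(lambda: fake_stream(0.05)):.1f} ms")
```

Compare the median against your pipeline's latency budget; if MARS-Pro's figure exceeds it, that is the signal to switch to MARS-Flash. Taking the median over several runs keeps one slow network round trip from skewing the decision.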