MARS-Flash (22.05 kHz / 48 kHz)
Ultra-low latency TTS for real-time agents and assistants
- Parameters: 600M
- TTFB: As low as 150 ms on certain GPUs like Blackwell
- Primary Use Cases:
  - Agentic conversations
  - Call center agents
  - Live conversational assistants
MARS-Pro (48 kHz)
Balanced speed and fidelity for expressive real-time speech
- Parameters: 600M
- TTFB: 800 ms to 2 s
- Primary Use Cases:
  - Real-time translation with voice and emotion transfer
  - Expressive dubbing
  - Audiobooks and digital media
- Notes: Delivers the best overall performance when speed is not the primary constraint, especially with short or challenging reference audio.
MARS-Instruct (22.05 kHz)
Director-level emotional and prosodic control
- Parameters: 1.2B
- TTFB: Higher latency (not intended for real-time use)
- Primary Use Cases:
  - High-end TV and film production
  - Movie dubbing and post-production editing
- Capabilities:
  - Independent control of speaker identity and prosody
  - Style and emotion can be tuned using both:
    - A reference audio sample
    - A textual description of desired prosody
MARS-Nano
Highly efficient TTS for on-device deployment
- Parameters: 50M
- TTFB: 500 ms to 2 s, depending on available compute
- Primary Use Cases:
  - On-device applications
  - Environments with strict memory and compute constraints
- Deployment Notes: Currently deployed with partners and providers such as Broadcom.
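The trade-offs between the four models can be condensed into a small decision helper. The sketch below is only a rule of thumb distilled from the descriptions above, not official selection logic; the function name, its flags, and the priority order are assumptions made for illustration.

```python
def pick_model(on_device: bool = False,
               latency_critical: bool = False,
               prosody_control: bool = False) -> str:
    """Suggest a MARS model from coarse requirements (illustrative heuristic only)."""
    if on_device:
        # Strict memory and compute limits: the 50M-parameter model.
        return "MARS-Nano"
    if latency_critical:
        # Real-time agents: TTFB as low as 150 ms on certain GPUs.
        return "MARS-Flash"
    if prosody_control:
        # Director-level control; higher latency, not intended for real-time use.
        return "MARS-Instruct"
    # Balanced speed and fidelity when speed is not the primary constraint.
    return "MARS-Pro"
```

For example, `pick_model(latency_critical=True)` suggests MARS-Flash for a call center agent, while calling it with no flags falls back to MARS-Pro.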
Tips For Best Results:
- For text that contains numbers, expand the numbers into words. For example, write "123" as "one hundred twenty three" or "one two three", depending on how it should be read.
- For code-switched sentences, transliterate the text into your chosen TTS language. We're improving the model to handle these nuances better, but in practice most LLM outputs fed into the model already have these conversions applied. We've focused more on other parameters related to quality.
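The number-expansion tip can be applied as a preprocessing step before text is sent to the model. The following is a minimal sketch for English only, assuming plain non-negative integers (no decimals, currency, or dates); the function names are hypothetical and not part of any MARS SDK.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999,999 as a cardinal number."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (" " + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return number_to_words(thousands) + " thousand" + (" " + number_to_words(rest) if rest else "")

def digits_to_words(digits: str) -> str:
    """Read a digit string one digit at a time, e.g. '123' as 'one two three'."""
    return " ".join(ONES[int(d)] for d in digits)

def expand_numbers(text: str, digit_by_digit: bool = False) -> str:
    """Replace every run of ASCII digits in text with words."""
    def replace(match: re.Match) -> str:
        token = match.group(0)
        # Long runs and runs with leading zeros are usually IDs or phone
        # numbers, so they are read digit by digit.
        if digit_by_digit or len(token) > 6 or (len(token) > 1 and token[0] == "0"):
            return digits_to_words(token)
        return number_to_words(int(token))
    return re.sub(r"[0-9]+", replace, text)
```

Whether a number should be read as a cardinal or digit by digit depends on context (a quantity versus a room or phone number), which is why the choice is left to the caller through `digit_by_digit`.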
Advanced Customization
Fine-tune the audio with additional parameters to control the performance, style, and quality of the generated speech. These can be sent in the payload. More details available in the API Reference.
- user_instructions: Guide the voice's delivery (e.g., "Warm, clear, and conversational"). Only supported with mars-instruct.
- output_configuration: Set the audio format (wav, mp3), and apply enhancements.
- voice_settings: Enhance reference audio quality or maintain the source accent.
- inference_options: Adjust stability, temperature, and speaker similarity for unique results.
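A request payload using these parameters might be assembled as in the sketch below. Only the four top-level parameter names, the wav and mp3 formats, and the mars-instruct restriction come from the text above; the `text`, `model`, and `language` fields, the `mars-flash` identifier, and the nested `format` and `temperature` keys are assumptions made for illustration, so check the API Reference for the actual schema.

```python
import json

def build_tts_payload(text, model="mars-flash", language="en-us",
                      user_instructions=None, audio_format="wav",
                      temperature=None):
    """Assemble a request body as a plain dict (field layout is hypothetical)."""
    if audio_format not in ("wav", "mp3"):
        raise ValueError("audio_format must be 'wav' or 'mp3'")
    payload = {
        "text": text,          # assumed field name
        "model": model,        # assumed field name and identifier format
        "language": language,  # assumed field name
        "output_configuration": {"format": audio_format},  # nested key assumed
    }
    if temperature is not None:
        payload["inference_options"] = {"temperature": temperature}  # nested key assumed
    if user_instructions is not None:
        # Documented restriction: only mars-instruct accepts user_instructions.
        if model != "mars-instruct":
            raise ValueError("user_instructions is only supported with mars-instruct")
        payload["user_instructions"] = user_instructions
    return payload

if __name__ == "__main__":
    body = build_tts_payload("Hello there.", model="mars-instruct",
                             user_instructions="Warm, clear, and conversational")
    print(json.dumps(body, indent=2))
```

Rejecting `user_instructions` on the client for other models surfaces the mistake before a request is sent, rather than relying on however the server handles the unsupported field.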
Language Support
MARS-Flash, MARS-Pro, and MARS-Instruct are released across multiple languages, collectively covering 99% of the world's speaking population.
- en-us: English (United States)
- hi-in: Hindi (India)
- fr-fr: French (France)
- es-es: Spanish (Spain)
- de-de: German
- ja-jp: Japanese
- ar-xa: Modern Standard Arabic
- ko-kr: Korean
- zh-cn: Chinese (Simplified)
- it-it: Italian
- es-mx: Spanish (Mexico)
- pt-pt: Portuguese (Portugal)
- pt-br: Portuguese (Brazil)
- id-id: Indonesian
- nl-nl: Dutch
- ru-ru: Russian
- ar-sa: Arabic (Saudi Arabia)
- ta-in: Tamil
- te-in: Telugu
- bn-in: Bengali (India)
- ar-eg: Arabic (Egypt)
- ar-sy: Arabic (Syria)
- ar-ma: Arabic (Morocco)
- mr-in: Marathi
- kn-in: Kannada
- bn-bd: Bengali (Bangladesh)
- as-in: Assamese
- ml-in: Malayalam
- fr-ca: French (Canada)
- pl-pl: Polish
- tr-tr: Turkish
- pa-in: Punjabi
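The language codes above can be checked on the client before a request is made. This is a minimal sketch: the set mirrors the list above, while the helper name and its tolerance for inputs such as `en_US` are assumptions, not documented API behavior.

```python
SUPPORTED_LANGUAGES = {
    "en-us", "hi-in", "fr-fr", "es-es", "de-de", "ja-jp", "ar-xa", "ko-kr",
    "zh-cn", "it-it", "es-mx", "pt-pt", "pt-br", "id-id", "nl-nl", "ru-ru",
    "ar-sa", "ta-in", "te-in", "bn-in", "ar-eg", "ar-sy", "ar-ma", "mr-in",
    "kn-in", "bn-bd", "as-in", "ml-in", "fr-ca", "pl-pl", "tr-tr", "pa-in",
}

def normalize_language_code(code: str) -> str:
    """Lowercase a locale tag, accept '_' as a separator, and validate it."""
    normalized = code.strip().lower().replace("_", "-")
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized
```

Normalizing first lets callers pass tags in the form most locale libraries produce (for example `pt_BR`) while still sending the lowercase, hyphenated form shown in the list.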