The MARS 8 family consists of zero-shot, multilingual Text-to-Speech (TTS) models designed to cover a wide range of production needs. The models differ in latency, quality, controllability, and ideal use case, so you can choose the right fit for your application. The MARS-Pro and MARS-Flash models are also available on all major cloud providers.

Model Summary

Model | Sample Rate | Notes
mars-8.1-flash-beta | 48 kHz | Beta MARS 8.1 model, optimized for faster generation. Try this model when quality is the priority, as it may perform much better for pronunciation, expressiveness with high-pitch references, overall prosody, accent control, accent coverage, and quality across different accents.
mars-8.1-pro-beta | 48 kHz | Beta MARS Pro model. Try this model when quality is the priority, as it may perform much better for pronunciation, expressiveness with high-pitch references, overall prosody, accent control, accent coverage, and quality across different accents.
mars-flash | 22.05 kHz / 48 kHz | Ultra-low latency TTS for real-time agents, call center agents, and live conversational assistants.
mars-pro | 48 kHz | Balanced speed and fidelity for expressive real-time speech, dubbing, audiobooks, and digital media.
mars-instruct | 22.05 kHz | Director-level emotional and prosodic control with embedded expressive text tags.
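
The summary above can be captured as a small lookup, which is handy when an application needs a specific output sample rate. This is a minimal sketch: the sample rates are copied from the table, while the dictionary name and the `models_supporting` helper are illustrative and not part of any official SDK.

```python
# Sample rates (in Hz) copied from the Model Summary table.
# The dict and helper names are hypothetical, for illustration only.
MODEL_SAMPLE_RATES_HZ = {
    "mars-8.1-flash-beta": (48_000,),
    "mars-8.1-pro-beta": (48_000,),
    "mars-flash": (22_050, 48_000),
    "mars-pro": (48_000,),
    "mars-instruct": (22_050,),
}


def models_supporting(sample_rate_hz: int) -> list[str]:
    """Return the model IDs whose listed sample rates include sample_rate_hz."""
    return [
        model_id
        for model_id, rates in MODEL_SAMPLE_RATES_HZ.items()
        if sample_rate_hz in rates
    ]


print(models_supporting(22_050))  # ['mars-flash', 'mars-instruct']
```

Sample rate is only one selection criterion; latency and controllability, described in the per-model sections, usually matter more.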

MARS-8.1-Flash-Beta (48 kHz)

Faster preview access to MARS 8.1 quality improvements
  • Model ID: mars-8.1-flash-beta
  • Primary Use Cases:
    • Faster evaluation of MARS 8.1 generation quality
    • Lower-latency workflows that still need stronger accent handling
    • Comparing output quality and speed against mars-flash and mars-8.1-pro-beta
  • Quality Improvements: Provides the same quality as MARS 8.1, including much better quality across different accents, while being faster than mars-8.1-pro-beta.
  • Notes: This is a beta model. For both MARS 8.1 beta models, use a reference voice in the same language and accent as your target language for the best results. Test with your target voices, languages, accents, and audio formats before using either beta model in production workflows.

MARS-8.1-Pro-Beta (48 kHz)

Preview access to the newest Pro-quality speech model
  • Model ID: mars-8.1-pro-beta
  • Primary Use Cases:
    • Evaluating the latest MARS Pro generation quality
    • High-fidelity TTS experiments before production rollout
    • Comparing output quality against mars-pro
  • Quality Improvements: May perform much better than mars-pro for pronunciation, overall prosody, accent control, accent coverage, and quality across different accents. It is also designed to improve expressiveness, especially for high-pitch reference voices.
  • Notes: This is a beta model. For MARS 8.1 beta models, use a reference voice in the same language and accent as your target language for the best results. Test it with your target voices, languages, accents, and audio formats before using it in production workflows.

MARS-Flash (22.05 kHz / 48 kHz)

Ultra-low latency TTS for real-time agents and assistants
  • Parameters: 600M
  • TTFB (time to first byte): As low as 150 ms on certain GPUs, such as Blackwell
  • Primary Use Cases:
    • Agentic conversations
    • Call center agents
    • Live conversational assistants

MARS-Pro (48 kHz)

Balanced speed and fidelity for expressive real-time speech
  • Parameters: 600M
  • TTFB: 800 ms – 2 s
  • Primary Use Cases:
    • Real-time translation with voice and emotion transfer
    • Expressive dubbing
    • Audiobooks and digital media
  • Notes: Delivers best overall performance when speed is not the primary constraint, especially with short or challenging reference audio.

MARS-Instruct (22.05 kHz)

Director-level emotional and prosodic control
  • Parameters: 1.2B
  • TTFB: Higher latency (not intended for real-time use)
  • Primary Use Cases:
    • High-end TV and film production
    • Movie dubbing and post-production editing
  • Capabilities:
    • Independent control of speaker identity and prosody
    • Style and emotion can be tuned using both:
      • A reference audio sample
      • A textual description of desired prosody

MARS-Nano

Highly efficient TTS for on-device deployment
  • Parameters: 50M
  • TTFB: 500 ms – 2 s, depending on available compute
  • Primary Use Cases:
    • On-device applications
    • Environments with strict memory and compute constraints
  • Deployment Notes: Currently deployed with partners and providers such as Broadcom.

Tips For Best Results:

  • For text that contains numbers, expand the numbers into words. For example, write “one hundred twenty three” or “one two three” instead of “123”, depending on how you want it read.
  • For code-switched sentences, transliterate the text into your chosen TTS language. We are improving the models to handle these cases better, but in practice most LLM outputs fed into TTS already include these conversions, so we have focused more on other aspects of quality.
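
The number-expansion tip can be applied as a preprocessing step before text is sent to a model. The following is a minimal sketch for English only, covering both readings mentioned above; the function names and the `mode` parameter are illustrative, and it does not handle decimals, currencies, dates, or ordinals.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def cardinal(n: int) -> str:
    """Spell out 0 <= n < 1_000_000 in English words, without hyphens."""
    if not 0 <= n < 1_000_000:
        raise ValueError("cardinal() supports 0 to 999999 only")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + ("" if rest == 0 else " " + cardinal(rest))
    thousands, rest = divmod(n, 1000)
    return cardinal(thousands) + " thousand" + ("" if rest == 0 else " " + cardinal(rest))


def digit_by_digit(digits: str) -> str:
    """Read a digit string one digit at a time, e.g. '123' -> 'one two three'."""
    return " ".join(ONES[int(d)] for d in digits)


def expand_numbers(text: str, mode: str = "cardinal") -> str:
    """Replace each run of ASCII digits in text with English words.

    mode="cardinal" reads the run as a quantity; mode="digits" reads it digit
    by digit. Runs with a leading zero or more than six digits are always
    read digit by digit.
    """
    def replace(match: re.Match) -> str:
        digits = match.group()
        if mode == "digits" or len(digits) > 6 or (len(digits) > 1 and digits[0] == "0"):
            return digit_by_digit(digits)
        return cardinal(int(digits))

    return re.sub(r"[0-9]+", replace, text)


print(expand_numbers("Gate 123 is open."))                 # Gate one hundred twenty three is open.
print(expand_numbers("Gate 123 is open.", mode="digits"))  # Gate one two three is open.
```

Which reading is right depends on context: quantities usually read best as cardinals, while identifiers such as phone numbers or PINs usually read best digit by digit.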

Advanced Customization

Fine-tune the audio with additional parameters that control the performance, style, and quality of the generated speech. Send these parameters in the request payload; see the API Reference for details.
  • output_configuration: Set the audio format (wav, mp3), and apply enhancements.
  • voice_settings: Enhance reference audio quality, maintain the source accent, or adjust the speaking rate.
  • inference_options: Adjust stability, temperature, and speaker similarity for unique results.
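
As a rough illustration of how these three groups might sit in a request body, the sketch below assembles a payload as a plain Python dict and prints it as JSON. Only the group names `output_configuration`, `voice_settings`, and `inference_options` come from this page. Every other field (`model`, `language`, `text`, `format`, `speaking_rate`, `temperature`) and the default values are assumptions made for illustration, so check the API Reference for the real field names and ranges.

```python
import json

ALLOWED_FORMATS = {"wav", "mp3"}  # the two formats named on this page


def build_tts_payload(text, model="mars-pro", language="en-us",
                      audio_format="wav", speaking_rate=1.0, temperature=0.7):
    """Assemble a request body as a plain dict. Nothing is sent anywhere."""
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format!r}")
    return {
        # Hypothetical top-level fields; the real names may differ.
        "model": model,
        "language": language,
        "text": text,
        # Group names are from this page; the nested keys are hypothetical.
        "output_configuration": {"format": audio_format},
        "voice_settings": {"speaking_rate": speaking_rate},
        "inference_options": {"temperature": temperature},
    }


payload = build_tts_payload("Hello from MARS.", audio_format="mp3")
print(json.dumps(payload, indent=2))
```

Validating the format locally, as done here, surfaces simple mistakes before a request is made.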

Language Support

MARS-8.1-Flash-Beta, MARS-8.1-Pro-Beta, MARS-Flash, MARS-Pro, and MARS-Instruct are released across multiple languages, collectively covering 99% of the world’s speaking population.
  • en-us - English (United States)
  • hi-in - Hindi (India)
  • fr-fr - French (France)
  • es-es - Spanish (Spain)
  • de-de - German
  • ja-jp - Japanese
  • ar-xa - Modern Standard Arabic
  • ko-kr - Korean
  • zh-cn - Chinese (Simplified)
  • it-it - Italian
  • es-mx - Spanish (Mexico)
  • pt-pt - Portuguese (Portugal)
  • pt-br - Portuguese (Brazil)
  • id-id - Indonesian
  • nl-nl - Dutch
  • ru-ru - Russian
  • ar-sa - Arabic (Saudi Arabia)
  • ta-in - Tamil
  • te-in - Telugu
  • bn-in - Bengali (India)
  • ar-eg - Arabic (Egypt)
  • ar-sy - Arabic (Syria)
  • ar-ma - Arabic (Morocco)
  • mr-in - Marathi
  • kn-in - Kannada
  • bn-bd - Bengali (Bangladesh)
  • as-in - Assamese
  • ml-in - Malayalam
  • fr-ca - French (Canada)
  • pl-pl - Polish
  • tr-tr - Turkish
  • pa-in - Punjabi
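
Client code often receives locale strings in mixed forms such as `en-US` or `pt_BR`. The sketch below normalizes such input to the lowercase, hyphenated form used in the list above and rejects anything not listed. The set is copied from this page; the helper name is illustrative, and whether the API itself accepts other spellings is not stated here.

```python
# Language codes copied from the Language Support list.
SUPPORTED_LANGUAGES = frozenset({
    "en-us", "hi-in", "fr-fr", "es-es", "de-de", "ja-jp", "ar-xa", "ko-kr",
    "zh-cn", "it-it", "es-mx", "pt-pt", "pt-br", "id-id", "nl-nl", "ru-ru",
    "ar-sa", "ta-in", "te-in", "bn-in", "ar-eg", "ar-sy", "ar-ma", "mr-in",
    "kn-in", "bn-bd", "as-in", "ml-in", "fr-ca", "pl-pl", "tr-tr", "pa-in",
})


def normalize_language(code: str) -> str:
    """Lowercase the code, convert underscores to hyphens, and check the list."""
    normalized = code.strip().replace("_", "-").lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"language not in the supported list: {code!r}")
    return normalized


print(normalize_language("pt_BR"))  # pt-br
```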