Convert text to speech in real time with customizable voice characteristics. Audio is delivered as it is generated, so your application can begin playback immediately.
Submit Your Text & Configuration
Receive the Audio Stream
- speech_model: Specify the model for synthesis.
- user_instructions: Guide the voice’s delivery (e.g., “Warm, clear, and conversational”). Only supported when speech_model is set to mars-instruct.
- output_configuration: Set the audio format (for example, wav or mp3) and apply enhancements.
- voice_settings: Enhance reference audio quality or maintain the source accent.
- inference_options: Adjust stability, temperature, and speaker similarity for unique results.

The values accepted for output_configuration.format depend on the selected speech_model:

| Speech Model | Supported output formats |
|---|---|
| mars-pro | wav, mp3, flac, adts, pcm_s16le, pcm_s16be, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be |
| mars-flash | wav, mp3, flac, adts, pcm_s16le, pcm_s16be, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be |
| mars-instruct | wav, flac, adts, pcm_s16le, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be |
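
The model-dependent rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the names `SUPPORTED_FORMATS` and `config_problems` are hypothetical helpers, and the format sets are simply transcribed from the table.

```python
# Supported output formats per speech model, transcribed from the table above.
_COMMON = {
    "wav", "flac", "adts",
    "pcm_s16le", "pcm_s32be", "pcm_s32le", "pcm_f32le", "pcm_f32be",
}
SUPPORTED_FORMATS = {
    "mars-pro": _COMMON | {"mp3", "pcm_s16be"},
    "mars-flash": _COMMON | {"mp3", "pcm_s16be"},
    "mars-instruct": set(_COMMON),
}


def config_problems(speech_model, output_format, user_instructions=None):
    """Return a list of problems; an empty list means the combination looks valid."""
    problems = []
    formats = SUPPORTED_FORMATS.get(speech_model)
    if formats is None:
        problems.append(f"unknown speech_model: {speech_model!r}")
    elif output_format not in formats:
        problems.append(f"{speech_model} does not support format {output_format!r}")
    # user_instructions is only supported with mars-instruct.
    if user_instructions is not None and speech_model != "mars-instruct":
        problems.append("user_instructions requires speech_model 'mars-instruct'")
    return problems
```

For example, `config_problems("mars-instruct", "mp3")` reports one problem, because mp3 is not listed for mars-instruct in the table.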
The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
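
As a rough illustration of attaching the header, the sketch below builds a headers dictionary. The helper name `auth_headers` and the environment variable `TTS_API_KEY` are hypothetical, and the `Content-Type` value is an assumption based on the JSON request body; only the `x-api-key` header name comes from this page.

```python
import os


def auth_headers(api_key=None):
    """Build request headers that carry the API key in x-api-key."""
    # TTS_API_KEY is a hypothetical environment variable name used as a fallback.
    key = api_key or os.environ.get("TTS_API_KEY")
    if not key:
        raise ValueError("no API key provided")
    # Content-Type is an assumption: the request body is JSON.
    return {"x-api-key": key, "Content-Type": "application/json"}
```

Reading the key from the environment keeps it out of source control, which is the usual reason for this pattern.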
Streaming Text-to-Speech request parameters.
Request body for /tts-stream.
The text to synthesize into speech (3–3000 characters).
Required length: 3 - 3000 characters
Example: "Jupiter, the largest planet in our solar system, is a gas giant with swirling storms."
BCP-47 locale for the input text (for example, en-us).
Available options: ro-ro, nl-nl, es-es, zh-tw, en-uk, el-gr, cs-cz, vi-vn, bn-bd, ar-tn, de-de, fr-ca, ar-xa, th-th, ar-eg, ar-sa, ar-sy, pa-in, zh-cn, ar-jo, ru-ru, bn-in, uk-ua, es-us, ja-jp, ar-ae, mr-in, en-au, de-ch, pt-pt, ar-kw, ar-qa, as-in, hi-in, fr-be, fi-fi, fr-fr, ar-dz, fr-ch, it-it, de-at, en-in, ko-kr, en-us, zh-hk, ar-om, ar-ma, pl-pl, ar-ly, es-mx, tr-tr, ar-iq, ar-lb, ml-in, pt-br, id-id, ar-bh, kn-in, nl-be, te-in, ar-ye, ta-in
Example: "en-us"
Voice profile ID to use for synthesis. Get available IDs from /list-voices.
Required range: x >= 1147320
Speech model variant to use for synthesis.
Available options: mars-pro, mars-flash, mars-instruct
Example: "mars-pro"
Optional guidance for style, tone, pronunciation, or delivery.
Required length: 3 - 1000 characters
If true, improves pronunciation of names, brands, and other named entities.
Example: true
Controls output format and enhancement options for the stream.
{
  "format": "wav",
  "duration": null,
  "apply_enhancement": true
}
Voice behavior preferences such as accent preservation and reference enhancement.
{
  "enhance_reference_audio_quality": false,
  "maintain_source_accent": false
}
Model sampling controls that trade off stability, variation, and latency.
{
  "stability": 0.6,
  "temperature": 0.8,
  "speaker_similarity": 0.7
}
Streaming audio response
Binary audio stream in WAV format.
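
To tie the request body and the streamed response together, here is a hedged end-to-end sketch using only the Python standard library. Several details are assumptions rather than facts from this page: the base URL is a placeholder, the POST method is assumed, and the key names `text`, `language`, and `voice_id` are guesses at the fields described above (the page names only speech_model, user_instructions, output_configuration, voice_settings, and inference_options). The function names are hypothetical.

```python
import json
import urllib.request

# Placeholder base URL (assumption); this page only names the /tts-stream path.
BASE_URL = "https://api.example.com"


def build_payload(text, voice_id, language="en-us", speech_model="mars-pro", output_format="wav"):
    """Assemble a request body from the example values shown above."""
    if not 3 <= len(text) <= 3000:
        raise ValueError("text must be 3-3000 characters")
    return {
        # "text", "language", and "voice_id" are assumed key names.
        "text": text,
        "language": language,
        "voice_id": voice_id,
        "speech_model": speech_model,
        "output_configuration": {
            "format": output_format,
            "duration": None,  # serialized as JSON null
            "apply_enhancement": True,
        },
        "voice_settings": {
            "enhance_reference_audio_quality": False,
            "maintain_source_accent": False,
        },
        "inference_options": {
            "stability": 0.6,
            "temperature": 0.8,
            "speaker_similarity": 0.7,
        },
    }


def build_request(payload, api_key, base_url=BASE_URL):
    """Create (but do not send) a request for /tts-stream; POST is assumed."""
    return urllib.request.Request(
        base_url + "/tts-stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def save_stream(response, path, chunk_size=4096):
    """Write a binary response to disk chunk by chunk; return the bytes written."""
    total = 0
    with open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
    return total


def synthesize_to_file(text, voice_id, api_key, path):
    """Send the request and stream the audio to a file (performs a network call)."""
    request = build_request(build_payload(text, voice_id), api_key)
    with urllib.request.urlopen(request) as response:
        return save_stream(response, path)
```

Reading in fixed-size chunks means audio can be written (or handed to a player) as it arrives rather than after the whole response has been buffered. A call might look like `synthesize_to_file("Hello from the stream.", 1147320, api_key, "speech.wav")`, with a real base URL and a voice ID taken from /list-voices.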