🚀 Introducing MARS8 Series — Four Powerful Variants | Available on All Major Clouds | Learn about the model here
Convert text to speech in real time with customizable voice characteristics, delivering audio as it is generated for immediate playback in your applications.
curl --request POST \
  --url https://client.camb.ai/apis/tts-stream \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "text": "[laughter] He plays the [B EY1 S] guitar while catching a [B AE1 S] fish.",
  "language": "en-us",
  "voice_id": 147320,
  "speech_model": "mars-8.1-flash-beta",
  "enhance_named_entities_pronunciation": true,
  "output_configuration": {
    "format": "wav"
  },
  "voice_settings": {
    "enhance_reference_audio_quality": false,
    "maintain_source_accent": false,
    "speaking_rate": 1.5
  }
}
'
Documentation Index
Fetch the complete documentation index at: https://docs.camb.ai/llms.txt
Use this file to discover all available pages before exploring further.
Submit Your Text & Configuration
Receive the Audio Stream
The language field takes a BCP-47 locale code (e.g. en-us, hi-in, zh-cn). It controls the accent and pronunciation of the generated speech. The model does not translate the input text, so the text you supply should already be written in the target language.
| Speech model | Locales supported |
|---|---|
| mars-flash, mars-pro | 33 |
| mars-8.1-flash-beta, mars-8.1-pro-beta | 158 |
| mars-instruct | 141 |
- Prefer the most specific locale available: es-mx over es-es for a Mexican Spanish accent, or zh-cn-sichuan over zh-cn for Sichuan-flavored Mandarin.
- Write locale codes in lowercase (pt-br, not pt-BR).
- If language is not supported by the selected speech_model, the API responds with HTTP 422 and a ValidationError body that lists the allowed locales for that model. Example:
{
"detail": [{
"loc": ["body"],
"msg": "Value error, Language 'zh-tw' is not supported for speech model 'mars-flash'. Allowed languages are: ['en-us', 'en-in', 'zh-cn', ...]"
}]
}
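A client can surface this error to the caller before retrying with a supported locale. The following is a minimal sketch, assuming the 422 body has the shape shown above (a `detail` list whose entries carry a `msg` string). `normalize_locale` and `summarize_validation_error` are hypothetical helper names, not part of the Camb AI SDK, and the sample message is an abbreviated copy of the example above.

```python
import json


def normalize_locale(code: str) -> str:
    """Trim whitespace and lowercase a locale code (pt-BR -> pt-br)."""
    return code.strip().lower()


def summarize_validation_error(status_code: int, body_text: str) -> list[str]:
    """Collect the `msg` strings from a 422 ValidationError body.

    Assumes the body shape shown in the example above. Returns an empty
    list for any other status code or for a body that is not valid JSON.
    """
    if status_code != 422:
        return []
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return []
    details = body.get("detail", []) if isinstance(body, dict) else []
    return [d["msg"] for d in details if isinstance(d, dict) and "msg" in d]


# Abbreviated copy of the example error body (illustrative only).
sample_body = json.dumps({
    "detail": [{
        "loc": ["body"],
        "msg": "Value error, Language 'zh-tw' is not supported for speech model 'mars-flash'.",
    }]
})

for message in summarize_validation_error(422, sample_body):
    print(message)
```

With `requests`, the two arguments would typically come from `response.status_code` and `response.text`.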
- speech_model: Specify the model for synthesis. Available values include mars-8.1-flash-beta, mars-8.1-pro-beta, mars-flash, mars-pro, and mars-instruct. With mars-instruct, you can also embed delivery tags directly in the text (for example, emotion tags or SSML-style pauses) to shape pacing and tone.
- output_configuration: Set the audio format (for example, wav or mp3), sample rate, and toggle output enhancement.
  - apply_enhancement (boolean, optional): Applies output audio enhancement (loudness, denoising, polish). Defaults to true for most models, and to false for the speed-oriented mars-flash and mars-8.1-flash-beta models. Set it explicitly to override the default.
- voice_settings: Enhance reference audio quality, maintain the source accent, or adjust the speaking rate.
- inference_options: Adjust stability, temperature, and speaker similarity to tune the output.

The mars-8.1-flash-beta and mars-8.1-pro-beta models do not support the following parameters:
- acoustic_quality_boost
- temperature
- speaker_similarity
- maintain_source_accent
- stability
- output_enhancement
- enhance_named_entities_pronunciation
- localize_speaker_weight

The mars-8.1-flash-beta and mars-8.1-pro-beta models support inline controls for English pronunciation and expressive non-verbal sounds. Add these controls directly in the text field.
# Pronunciation control: inline CMU phonemes in the text field
payload = {
    "text": "He plays the [B EY1 S] guitar while catching a [B AE1 S] fish.",
    "language": "en-us",
    "voice_id": 147320,
    "speech_model": "mars-8.1-flash-beta"
}
# Expressive delivery: inline non-verbal tag in the text field
payload = {
    "text": "[laughter] You really got me. I didn't see that coming at all.",
    "language": "en-us",
    "voice_id": 147320,
    "speech_model": "mars-8.1-flash-beta"
}
Non-verbal tags: [laughter], [sigh], [confirmation], [question], [surprise], [dissatisfaction].
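Because the 8.1 beta models do not support several parameters, a payload written for another model may need trimming before it is reused with them. The sketch below removes the listed parameter names wherever they appear in the payload. The source does not say where each parameter sits in the request body, so the recursive search is an assumption, and `prepare_payload` is a hypothetical helper name rather than part of the Camb AI SDK.

```python
import copy

# Parameter names the section above lists as unsupported by the 8.1 beta models.
UNSUPPORTED_ON_MARS_8_1 = {
    "acoustic_quality_boost",
    "temperature",
    "speaker_similarity",
    "maintain_source_accent",
    "stability",
    "output_enhancement",
    "enhance_named_entities_pronunciation",
    "localize_speaker_weight",
}
MARS_8_1_MODELS = {"mars-8.1-flash-beta", "mars-8.1-pro-beta"}


def _drop_keys(value, keys):
    """Rebuild nested dicts without `keys`; non-dict values pass through."""
    if isinstance(value, dict):
        return {k: _drop_keys(v, keys) for k, v in value.items() if k not in keys}
    return value


def prepare_payload(payload: dict) -> dict:
    """Return a copy of `payload` without parameters the 8.1 models do not support.

    Payloads for other models are returned as an unchanged deep copy.
    """
    if payload.get("speech_model") not in MARS_8_1_MODELS:
        return copy.deepcopy(payload)
    return _drop_keys(payload, UNSUPPORTED_ON_MARS_8_1)


example = {
    "text": "[laughter] You really got me.",
    "language": "en-us",
    "voice_id": 147320,
    "speech_model": "mars-8.1-flash-beta",
    "enhance_named_entities_pronunciation": True,
    "voice_settings": {"maintain_source_accent": False, "speaking_rate": 1.5},
}
cleaned = prepare_payload(example)
print(cleaned)
```

The original dictionary is left untouched, so the same base payload can be reused across models. A nested object whose keys are all removed is kept as an empty object.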
Delivery tag examples (mars-instruct):
[speaking slowly] You need to understand this. It is very important. We should do this the right way.
[angry] You need to understand this! It is very important, we should do this the right way!
[gentle, reassuring] Take a deep breath. You're doing well. Let's go step by step.
Please pause here <break time="500ms"/> then continue in a calm, clear tone.
To adjust speed, use voice_settings.speaking_rate. The streaming TTS endpoint does not support a duration parameter.
Supported output_configuration.format values depend on the selected speech_model:
| Speech Model | Supported output formats |
|---|---|
| mars-8.1-flash-beta | wav, mp3, flac, adts, pcm_s16le, pcm_s16be, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be |
| mars-8.1-pro-beta | wav, mp3, flac, adts, pcm_s16le, pcm_s16be, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be |
| mars-flash | wav, mp3, flac, adts, pcm_s16le, pcm_s16be, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be |
| mars-pro | wav, mp3, flac, adts, pcm_s16le, pcm_s16be, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be |
| mars-instruct | wav, flac, adts, pcm_s16le, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be |
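The table above can be encoded as a lookup so that an unsupported model and format pair is caught before a request is sent. This is a small sketch that mirrors the table as written here; `is_format_supported` is a hypothetical helper name, and the mapping would need updating if the API's supported formats change.

```python
# Every format named in the table above.
ALL_FORMATS = frozenset({
    "wav", "mp3", "flac", "adts",
    "pcm_s16le", "pcm_s16be", "pcm_s32be", "pcm_s32le",
    "pcm_f32le", "pcm_f32be",
})

# Per-model support, copied from the table. In the table, mars-instruct
# omits mp3 and pcm_s16be.
SUPPORTED_FORMATS = {
    "mars-8.1-flash-beta": ALL_FORMATS,
    "mars-8.1-pro-beta": ALL_FORMATS,
    "mars-flash": ALL_FORMATS,
    "mars-pro": ALL_FORMATS,
    "mars-instruct": ALL_FORMATS - {"mp3", "pcm_s16be"},
}


def is_format_supported(speech_model: str, audio_format: str) -> bool:
    """Return True if the table lists `audio_format` for `speech_model`.

    Unknown model names return False.
    """
    return audio_format in SUPPORTED_FORMATS.get(speech_model, frozenset())


print(is_format_supported("mars-instruct", "mp3"))
print(is_format_supported("mars-flash", "mp3"))
```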
import requests

# Request body: text, locale, voice, model, and optional tuning blocks
payload = {
    "text": "Jupiter, the largest planet in our solar system, is a gas giant with swirling storms like the iconic Great Red Spot.",
    "language": "en-us",
    "voice_id": 147320,
    "speech_model": "mars-instruct",
    "enhance_named_entities_pronunciation": True,
    "output_configuration": {
        "format": "wav"
    },
    "voice_settings": {
        "enhance_reference_audio_quality": False,
        "maintain_source_accent": False,
        "speaking_rate": 1.0
    },
    "inference_options": {
        "inference_steps": 60,
    }
}

headers = {
    "x-api-key": "your-api-key"
}

# stream=True lets the client read audio chunks as they arrive
response = requests.post(
    "https://client.camb.ai/apis/tts-stream",
    json=payload,
    headers=headers,
    stream=True
)
response.raise_for_status()

# Write each chunk to disk as it is received
with open("output.wav", "wb") as audio_file:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:
            audio_file.write(chunk)

print("✨ Stream complete. Audio saved to output.wav")
import asyncio

from camb.client import AsyncCambAI, save_async_stream_to_file
from camb.types.stream_tts_output_configuration import StreamTtsOutputConfiguration
from camb.types.stream_tts_voice_settings import StreamTtsVoiceSettings

# Initialize the async client
client = AsyncCambAI(api_key="your-api-key")


async def main():
    # Stream the TTS generation
    response = client.text_to_speech.tts(
        text="Experience high quality realistic sounds with Camb AI.",
        language="en-us",
        speech_model="mars-8.1-flash-beta",
        voice_id=147320,  # replace with a voice ID from /list-voices
        voice_settings=StreamTtsVoiceSettings(
            speaking_rate=1.0
        ),
        output_configuration=StreamTtsOutputConfiguration(
            format="wav"
        )
    )
    # Save the stream to a file (or process chunks as they arrive)
    await save_async_stream_to_file(response, "async_stream_output.wav")
    print("Audio stream saved to async_stream_output.wav")


if __name__ == "__main__":
    asyncio.run(main())
The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
Streaming Text-to-Speech request parameters.
Request body for /tts-stream.
text: The text to synthesize into speech (3–3000 characters). For mars-8.1-flash-beta and mars-8.1-pro-beta, you can include inline controls such as CMU phonemes ([B EY1 S]) and non-verbal tags ([laughter]).
Length: 3 - 3000. Example: "[laughter] He plays the [B EY1 S] guitar while catching a [B AE1 S] fish."
language: BCP-47 locale for the input text (for example, en-us).
Available options: ro-ro, nl-nl, es-es, zh-tw, en-uk, el-gr, cs-cz, vi-vn, bn-bd, ar-tn, de-de, fr-ca, ar-xa, th-th, ar-eg, ar-sa, ar-sy, pa-in, zh-cn, ar-jo, ru-ru, bn-in, uk-ua, es-us, ja-jp, ar-ae, mr-in, en-au, de-ch, pt-pt, ar-kw, ar-qa, as-in, hi-in, fr-be, fi-fi, fr-fr, ar-dz, fr-ch, it-it, de-at, en-in, ko-kr, en-us, zh-hk, ar-om, ar-ma, pl-pl, ar-ly, es-mx, tr-tr, ar-iq, ar-lb, ml-in, pt-br, id-id, ar-bh, kn-in, nl-be, te-in, ar-ye, ta-in, af-za, am-et, az-az, bg-bg, bs-ba, ca-es, cy-gb, da-dk, en-ca, en-gb, en-hk, en-ie, en-ke, en-ng, en-nz, en-ph, en-sg, en-tz, en-za, es-ar, es-bo, es-cl, es-co, es-cr, es-cu, es-do, es-ec, es-gq, es-gt, es-hn, es-ni, es-pa, es-pe, es-pr, es-py, es-sv, es-uy, es-ve, et-ee, eu-es, fa-ir, fil-ph, ga-ie, gl-es, gu-in, he-il, hr-hr, hu-hu, hy-am, is-is, jv-id, ka-ge, kk-kz, km-kh, lo-la, lt-lt, lv-lv, mk-mk, mn-mn, ms-my, mt-mt, my-mm, nb-no, ps-af, si-lk, sk-sk, sl-si, so-so, sq-al, sr-rs, sv-se, sw-ke, sw-tz, ta-lk, ta-my, ta-sg, ur-in, ur-pk, uz-uz, zh-cn-henan, zh-cn-liaoning, zh-cn-shaanxi, zh-cn-shandong, zh-cn-sichuan, zu-za, sa-in, tl-ph, es-xl, or-in, mai-in, sd-in, kok-in, mni-in, ks-in, doi-in, brx-in, sat-in
Example: "en-us"
voice_id: Voice profile ID to use for synthesis. Get available IDs from /list-voices.
Range: x >= 1. Example: 147320
speech_model: Speech model variant to use for synthesis. Use mars-8.1-flash-beta or mars-8.1-pro-beta to leverage inline pronunciation and non-verbal controls in text.
Available options: mars-8.1-flash-beta, mars-8.1-pro-beta, mars-flash, mars-pro, mars-instruct
Example: "mars-8.1-flash-beta"
enhance_named_entities_pronunciation: If true, improves pronunciation of names, brands, and other named entities.
Example: true
output_configuration: Controls output format and enhancement options for the stream.
output_configuration.format: Audio format for the streamed response. Choose a container (mp3, wav, flac, adts) or a raw PCM format (pcm_*).
Available options: wav, flac, adts, mp3, pcm_s16le, pcm_s16be, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be
Example: "mp3"
Sample rate: Optional, in Hz. Use this to control audio quality and compatibility with different devices.
Example: 48000
output_configuration.apply_enhancement: If true, applies output audio enhancement (loudness, denoising, polish). Defaults to true for most models; false for the speed-oriented mars-flash and mars-8.1-flash-beta models. Set explicitly to override the per-model default.
Example: true
Example object: { "format": "wav" }
voice_settings: Voice behavior preferences such as accent preservation and reference enhancement.
voice_settings.enhance_reference_audio_quality: Removes noise from the reference audio (useful when the reference has background noise or compression).
Example: false
voice_settings.maintain_source_accent: Maintain the accent from the original source audio.
Example: false
voice_settings.speaking_rate: Controls playback speed for generated speech.
Example: 1.5
Example object:
{
  "enhance_reference_audio_quality": false,
  "maintain_source_accent": false,
  "speaking_rate": 1.5
}
Streaming audio response
Binary audio stream in WAV format.
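The request-body constraints described above (text of 3 to 3000 characters, voice_id of at least 1, lowercase locale codes) can be checked on the client before a request is sent. The sketch below is one way to do that. `find_request_problems` is a hypothetical helper name, and counting characters with `len()` is an assumption, since the source does not say how the API measures text length.

```python
def find_request_problems(text: str, language: str, voice_id: int) -> list[str]:
    """Return a list of constraint violations; an empty list means none found.

    Checks only the limits stated in the reference above. It does not
    verify that the locale or voice ID actually exists.
    """
    problems = []
    # Assumption: the 3-3000 limit counts Python string characters.
    if not 3 <= len(text) <= 3000:
        problems.append(f"text must be 3-3000 characters, got {len(text)}")
    if language != language.lower():
        problems.append(f"language should be lowercase, got {language!r}")
    if voice_id < 1:
        problems.append(f"voice_id must be >= 1, got {voice_id}")
    return problems


print(find_request_problems("Hi", "pt-BR", 0))
print(find_request_problems("Hello there.", "en-us", 147320))
```

Returning every problem at once, rather than raising on the first, lets a caller report all issues in a single pass.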