🚀 Introducing MARS8 Series — Four Powerful Variants | Available on All Major Clouds | Learn about the model here
Convert text to speech in real time with customizable voice characteristics. Audio is delivered as it is generated, so your application can start playback immediately.
curl --request POST \
--url https://client.camb.ai/apis/tts-stream \
--header 'Content-Type: application/json' \
--header 'x-api-key: <api-key>' \
--data '
{
"text": "Jupiter, the largest planet in our solar system, is a gas giant with swirling storms.",
"language": "en-us",
"voice_id": 147320,
"speech_model": "mars-8.1-flash-beta",
"enhance_named_entities_pronunciation": true,
"output_configuration": {
"format": "wav"
},
"voice_settings": {
"enhance_reference_audio_quality": false,
"maintain_source_accent": false,
"speaking_rate": 1.5
},
"inference_options": {
"stability": 0.6,
"temperature": 0.8,
"speaker_similarity": 0.7
}
}
'
Documentation Index
Fetch the complete documentation index at: https://docs.camb.ai/llms.txt
Use this file to discover all available pages before exploring further.
Submit Your Text & Configuration
Receive the Audio Stream
- speech_model: Specify the model for synthesis. Available values include mars-8.1-flash-beta, mars-8.1-pro-beta, mars-flash, mars-pro, and mars-instruct. With mars-instruct, you can also embed delivery tags directly in the text (for example, emotion tags or SSML-style pauses) to shape pacing and tone.
- output_configuration: Set the audio format (for example, wav or mp3) and apply enhancements.
- voice_settings: Enhance reference audio quality, maintain the source accent, or adjust the speaking rate.
- inference_options: Adjust stability, temperature, and speaker similarity for unique results.

The mars-8.1-flash-beta and mars-8.1-pro-beta models do not support the following parameters:
- acoustic_quality_boost
- temperature
- speaker_similarity
- maintain_source_accent
- stability
- output_enhancement
- enhance_named_entities_pronunciation
- localize_speaker_weight

The mars-8.1-flash-beta and mars-8.1-pro-beta models support inline controls for English pronunciation and expressive non-verbal sounds. Add these controls directly in the text field.
# Inline pronunciation control: phonemes in square brackets
payload = {
    "text": "He plays the [B EY1 S] guitar while catching a [B AE1 S] fish.",
    "language": "en-us",
    "voice_id": 147320,
    "speech_model": "mars-8.1-flash-beta"
}

# Non-verbal sound: tag in square brackets
payload = {
    "text": "[laughter] You really got me. I didn't see that coming at all.",
    "language": "en-us",
    "voice_id": 147320,
    "speech_model": "mars-8.1-flash-beta"
}
Supported non-verbal tags: [laughter], [sigh], [confirmation], [question], [surprise], [dissatisfaction].
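Because inline controls share the square-bracket syntax, a typo in a tag name is easy to miss. The following is a minimal sketch of a client-side check, not part of the Camb AI SDK: the function name `classify_inline_controls` is hypothetical, and it assumes that any bracketed span that is not one of the tags listed above is meant as a pronunciation override.

```python
import re

# Non-verbal tags listed in this section for the mars-8.1 beta models.
NON_VERBAL_TAGS = {
    "laughter", "sigh", "confirmation",
    "question", "surprise", "dissatisfaction",
}

# Matches the content of any [...] span that contains no nested brackets.
_BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def classify_inline_controls(text):
    """Return (non_verbal, other) lists of bracketed controls found in text.

    Assumption: anything in brackets that is not a known non-verbal tag is
    either a phoneme sequence or a mistyped tag, so it is reported separately.
    """
    non_verbal, other = [], []
    for match in _BRACKETED.finditer(text):
        body = match.group(1).strip()
        if body.lower() in NON_VERBAL_TAGS:
            non_verbal.append(body)
        else:
            other.append(body)
    return non_verbal, other
```

Reviewing the second list before sending a request is a cheap way to catch a tag such as `[laugther]` that would otherwise reach the model unchanged.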
Delivery tag examples (mars-instruct):
[speaking slowly] You need to understand this. It is very important. We should do this the right way.
[angry] You need to understand this! It is very important, we should do this the right way!
[gentle, reassuring] Take a deep breath. You're doing well. Let's go step by step.
Please pause here <break time="500ms"/> then continue in a calm, clear tone.

To adjust pacing, use voice_settings.speaking_rate. The streaming TTS endpoint does not support a duration parameter.

Supported output_configuration.format values depend on the selected speech_model:
| Speech Model | Supported output formats |
|---|---|
| mars-8.1-flash-beta | wav, mp3, flac, adts, pcm_s16le, pcm_s16be, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be |
| mars-8.1-pro-beta | wav, mp3, flac, adts, pcm_s16le, pcm_s16be, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be |
| mars-flash | wav, mp3, flac, adts, pcm_s16le, pcm_s16be, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be |
| mars-pro | wav, mp3, flac, adts, pcm_s16le, pcm_s16be, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be |
| mars-instruct | wav, flac, adts, pcm_s16le, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be |
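The table can be turned into a lookup so that an unsupported model and format pair is rejected before a request is sent. This is a sketch only: the dictionary simply transcribes the rows above, and the helper name `is_format_supported` is hypothetical rather than part of any SDK.

```python
# Formats shared by every model except mars-instruct, as listed in the table.
_ALL_FORMATS = frozenset({
    "wav", "mp3", "flac", "adts",
    "pcm_s16le", "pcm_s16be", "pcm_s32be", "pcm_s32le",
    "pcm_f32le", "pcm_f32be",
})

SUPPORTED_FORMATS = {
    "mars-8.1-flash-beta": _ALL_FORMATS,
    "mars-8.1-pro-beta": _ALL_FORMATS,
    "mars-flash": _ALL_FORMATS,
    "mars-pro": _ALL_FORMATS,
    # The mars-instruct row omits mp3 and pcm_s16be.
    "mars-instruct": _ALL_FORMATS - {"mp3", "pcm_s16be"},
}


def is_format_supported(speech_model, audio_format):
    """Return True if the table lists audio_format for speech_model."""
    return audio_format in SUPPORTED_FORMATS.get(speech_model, frozenset())
```

For example, `is_format_supported("mars-instruct", "mp3")` is false, so a caller could fall back to `wav` for that model.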
import requests

payload = {
    "text": "Jupiter, the largest planet in our solar system, is a gas giant with swirling storms like the iconic Great Red Spot.",
    "language": "en-us",
    "voice_id": 147320,
    "speech_model": "mars-instruct",
    "enhance_named_entities_pronunciation": True,
    "output_configuration": {
        "format": "wav"
    },
    "voice_settings": {
        "enhance_reference_audio_quality": False,
        "maintain_source_accent": False,
        "speaking_rate": 1.0
    },
    "inference_options": {
        "inference_steps": 60
    }
}

headers = {
    "x-api-key": "your-api-key"
}

# stream=True lets the body be read as it arrives instead of being buffered in full
response = requests.post(
    "https://client.camb.ai/apis/tts-stream",
    json=payload,
    headers=headers,
    stream=True
)
response.raise_for_status()

# Write each audio chunk to disk as soon as it is received
with open("output.wav", "wb") as audio_file:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:
            audio_file.write(chunk)

print("✨ Stream complete. Audio saved to output.wav")
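The request above asks for `wav`, so the saved file can be played as-is. If one of the raw formats such as `pcm_s16le` is requested instead, the stream presumably carries bare samples without a header, and a player needs the sample layout supplied separately. The sketch below wraps such bytes in a WAV container with Python's standard `wave` module. The helper name is hypothetical, and the defaults are assumptions: 48000 Hz is the example sample rate shown in this page's parameter reference, and mono output is a guess, since the page does not state the channel count.

```python
import io
import wave


def pcm_s16le_to_wav(pcm_bytes, sample_rate=48000, channels=1):
    """Wrap raw signed 16-bit little-endian PCM in a WAV container.

    Assumptions: sample_rate and channels must match what the API actually
    produced; the defaults here are illustrative, not documented values.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    # wave does not close a file object it did not open, so the buffer is still readable
    return buffer.getvalue()
```

With this helper, the bytes collected from `response.iter_content(...)` can be joined and converted once the stream ends, or passed straight to an audio device that accepts raw PCM.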
import asyncio

from camb.client import AsyncCambAI, save_async_stream_to_file
from camb.types.stream_tts_output_configuration import StreamTtsOutputConfiguration
from camb.types.stream_tts_voice_settings import StreamTtsVoiceSettings

# Initialize the async client
client = AsyncCambAI(api_key="your-api-key")


async def main():
    # Stream the TTS generation
    response = client.text_to_speech.tts(
        text="Experience high quality realistic sounds with Camb AI.",
        language="en-us",
        speech_model="mars-8.1-flash-beta",
        voice_id=147320,  # example ID used elsewhere on this page; get yours from /list-voices
        voice_settings=StreamTtsVoiceSettings(
            speaking_rate=1.0
        ),
        output_configuration=StreamTtsOutputConfiguration(
            format="wav"
        )
    )
    # Save the stream to a file (or process chunks as they arrive)
    await save_async_stream_to_file(response, "async_stream_output.wav")
    print("Audio stream saved to async_stream_output.wav")


if __name__ == "__main__":
    asyncio.run(main())
The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
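To keep the key out of source code, it can be read from the environment when the request headers are built. This is a minimal sketch: the environment variable name `CAMB_API_KEY` and the helper `build_headers` are hypothetical choices for illustration, not names defined by the API.

```python
import os


def build_headers(env_var="CAMB_API_KEY"):
    """Build request headers using an API key stored in an environment variable.

    Assumption: env_var is a name chosen for this example; use whatever
    variable your deployment defines.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return {
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as `headers=` to `requests.post`, in place of a hard-coded key.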
Streaming Text-to-Speech request parameters.
Request body for /tts-stream.
text: The text to synthesize into speech (3–3000 characters).
Length: 3 - 3000
Example: "Jupiter, the largest planet in our solar system, is a gas giant with swirling storms."
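Input longer than 3000 characters has to be split across several requests. The sketch below shows one way to do that, preferring sentence boundaries so that each chunk ends at a natural pause. The helper name `split_text` is hypothetical, and the approach is an assumption about what works well, not a documented recommendation; it also does not merge a trailing chunk that falls under the 3-character minimum.

```python
import re

MAX_CHARS = 3000  # upper bound on text length stated above


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as the `text` of its own request, with the resulting audio played or concatenated in order.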
language: BCP-47 locale for the input text (for example, en-us).
Available options: ro-ro, nl-nl, es-es, zh-tw, en-uk, el-gr, cs-cz, vi-vn, bn-bd, ar-tn, de-de, fr-ca, ar-xa, th-th, ar-eg, ar-sa, ar-sy, pa-in, zh-cn, ar-jo, ru-ru, bn-in, uk-ua, es-us, ja-jp, ar-ae, mr-in, en-au, de-ch, pt-pt, ar-kw, ar-qa, as-in, hi-in, fr-be, fi-fi, fr-fr, ar-dz, fr-ch, it-it, de-at, en-in, ko-kr, en-us, zh-hk, ar-om, ar-ma, pl-pl, ar-ly, es-mx, tr-tr, ar-iq, ar-lb, ml-in, pt-br, id-id, ar-bh, kn-in, nl-be, te-in, ar-ye, ta-in
Example: "en-us"

voice_id: Voice profile ID to use for synthesis. Get available IDs from /list-voices.
Range: x >= 1
Example: 147320

speech_model: Speech model variant to use for synthesis.
Available options: mars-8.1-flash-beta, mars-8.1-pro-beta, mars-flash, mars-pro, mars-instruct
Example: "mars-8.1-flash-beta"

enhance_named_entities_pronunciation: If true, improves pronunciation of names, brands, and other named entities.
Example: true
output_configuration: Controls output format and enhancement options for the stream.

output_configuration.format: Audio format for the streamed response. Choose a container (mp3, wav, flac, adts) or a raw PCM format (pcm_*).
Available options: wav, flac, adts, mp3, pcm_s16le, pcm_s16be, pcm_s32be, pcm_s32le, pcm_f32le, pcm_f32be
Example: "mp3"

Optional sample rate in Hz. Use this to control the audio quality and compatibility with different devices.
Example: 48000

Example (output_configuration): { "format": "wav" }

voice_settings: Voice behavior preferences such as accent preservation and reference enhancement.
voice_settings.enhance_reference_audio_quality: Remove noise from reference audio (useful when the reference has background noise or compression).
Example: false

voice_settings.maintain_source_accent: Maintain the accent from the original source audio.
Example: false

voice_settings.speaking_rate: Controls playback speed for generated speech.
Example: 1.5

Example (voice_settings):
{
  "enhance_reference_audio_quality": false,
  "maintain_source_accent": false,
  "speaking_rate": 1.5
}

inference_options: Model sampling controls that trade off stability, variation, and latency.
inference_options.stability: Balances voice consistency and audio quality. Lower values keep the voice closer to the original speaker; higher values improve clarity and smoothness, which is especially useful for weaker reference audio.
Range: 0 <= x <= 1
Example: 0.6

inference_options.temperature: Controls randomness. Higher values increase variation and expressiveness.
Range: 0.01 <= x <= 4
Example: 0.8

inference_options.inference_steps: Higher values may improve fidelity at the cost of latency.
Range: 5 <= x <= 200
Example: 15

inference_options.speaker_similarity: Adjusts how closely the generated voice matches the original speaker’s voice.
Range: 0 <= x <= 1
Example: 0.7

inference_options.localize_speaker_weight: Controls how much the voice shifts toward a native accent in the target language, at the cost of changing the speaker’s original voice identity slightly.
Range: 0 <= x <= 1

inference_options.acoustic_quality_boost: Reduce acoustic noise at the cost of slightly worse speaker similarity.

Example (inference_options):
{
  "stability": 0.6,
  "temperature": 0.8,
  "speaker_similarity": 0.7
}

Streaming audio response
Binary audio stream in WAV format.
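Two of the constraints described on this page can be checked on the client before a request is sent: the parameters that the mars-8.1 beta models do not support, and the numeric ranges for inference_options. The sketch below is one possible approach, not SDK behavior. The helper names are hypothetical, the bounds are read from this page's parameter reference and should be verified against the live API, and silently dropping unsupported keys (rather than raising) is a design choice made for this example.

```python
import copy

MARS_8_1_MODELS = {"mars-8.1-flash-beta", "mars-8.1-pro-beta"}

# Parameters this page lists as unsupported on the mars-8.1 beta models.
UNSUPPORTED_ON_MARS_8_1 = {
    "acoustic_quality_boost", "temperature", "speaker_similarity",
    "maintain_source_accent", "stability", "output_enhancement",
    "enhance_named_entities_pronunciation", "localize_speaker_weight",
}

# Assumed bounds, taken from the inference_options reference on this page.
INFERENCE_RANGES = {
    "stability": (0, 1),
    "temperature": (0.01, 4),
    "inference_steps": (5, 200),
    "speaker_similarity": (0, 1),
    "localize_speaker_weight": (0, 1),
}


def _strip_keys(value, keys):
    """Return a copy of a nested dict with the given keys removed at any depth."""
    if isinstance(value, dict):
        return {k: _strip_keys(v, keys) for k, v in value.items() if k not in keys}
    return value


def prepare_payload(payload):
    """Drop unsupported parameters and validate inference_options ranges."""
    cleaned = copy.deepcopy(payload)  # never mutate the caller's dict
    if cleaned.get("speech_model") in MARS_8_1_MODELS:
        cleaned = _strip_keys(cleaned, UNSUPPORTED_ON_MARS_8_1)
    for name, value in cleaned.get("inference_options", {}).items():
        if name in INFERENCE_RANGES:
            low, high = INFERENCE_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")
    return cleaned
```

Passing a request body through `prepare_payload` before `requests.post` keeps one payload template usable across models; for a mars-8.1 beta model, `inference_options` may end up as an empty object.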