Documentation Index
Fetch the complete documentation index at: https://docs.camb.ai/llms.txt
Use this file to discover all available pages before exploring further.
Overview
While SDKs and frameworks provide convenience, sometimes you need direct control over API calls. This tutorial shows how to call the Camb.ai TTS API directly using HTTP requests.
When to Use Direct API
- Building integrations in languages without an SDK
- Need fine-grained control over request/response handling
- Debugging or testing API behavior
- Building custom streaming implementations
Prerequisites
Create an account
Sign up at CAMB.AI Studio if you haven’t already.
Get your API key
Go to Settings → API Keys in Studio and copy your key. See Authentication for details.
Basic TTS Request
POST /tts-stream returns a binary audio byte stream (for example audio/wav or audio/mpeg), not Server-Sent Events or JSON chunks. The server sends the Content-Type that matches your output_configuration.format. You can buffer the full body for short clips, or read in chunks for lower latency—see Stream Text-to-Speech Audio.
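A minimal sketch of the basic request using only the Python standard library. The base URL (`https://client.camb.ai/apis`), the `x-api-key` header name, and the voice ID are assumptions here; confirm all three against the API reference before relying on them.

```python
# Hedged sketch: base URL, x-api-key header, and voice ID are assumptions,
# not confirmed values -- check the API reference.
import json
import os
import urllib.request

BASE_URL = "https://client.camb.ai/apis"  # assumed base URL


def build_tts_request(text: str, language: str, voice_id: int,
                      api_key: str) -> urllib.request.Request:
    """Build a POST /tts-stream request with the three required fields."""
    payload = {
        "text": text,                    # 3-3000 characters
        "language": language,            # lowercase BCP-47, e.g. "en-us"
        "voice_id": voice_id,            # a profile ID from /list-voices
        "output_configuration": {"format": "wav"},  # wav is the default
    }
    return urllib.request.Request(
        f"{BASE_URL}/tts-stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    api_key = os.environ.get("CAMB_API_KEY")
    if api_key:  # only hit the network when a key is configured
        req = build_tts_request("Hello from the direct API.", "en-us", 1234, api_key)
        with urllib.request.urlopen(req) as resp, open("out.wav", "wb") as f:
            f.write(resp.read())  # buffer the whole clip (fine for short text)
```

Buffering the full body, as above, is the simplest approach for short clips; the streaming variant below trades that simplicity for lower time-to-first-audio.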
Streaming Response
For real-time playback, iterate over the raw response body as chunks arrive. Always validate the status before reading the stream—non-success responses may return JSON (for example validation errors), not audio. Responses can include the X-Credits-Required header for usage tracking (see the API reference).
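The pattern above can be sketched with `urllib`: catch the HTTP error first (its body may be a JSON validation message rather than audio), then yield fixed-size chunks from the open response. The `X-Credits-Required` header read is taken from the note above; the endpoint and auth details remain assumptions.

```python
# Hedged sketch: status is validated before the body is consumed, since
# error responses may carry JSON instead of audio bytes.
import urllib.error
import urllib.request


def stream_chunks(req: urllib.request.Request, chunk_size: int = 4096):
    """Yield raw audio chunks; surface the error body on failure."""
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as err:
        # Non-success responses may be JSON (e.g. 422 validation errors).
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"TTS request failed ({err.code}): {detail}") from err
    with resp:
        credits = resp.headers.get("X-Credits-Required")  # usage tracking
        if credits:
            print(f"credits required: {credits}")
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Feeding each yielded chunk straight into an audio sink is what gives the latency win over buffering the whole clip.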
Request Parameters
These fields match Stream Text-to-Speech Audio and the OpenAPI schema for POST /tts-stream.
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to synthesize (3–3000 characters) |
| language | string | BCP-47 locale (e.g. en-us). Case-sensitive lowercase. Unsupported locales for the chosen speech_model return 422. |
| voice_id | integer | Voice profile ID from /list-voices |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| speech_model | string | mars-8.1-flash-beta | mars-8.1-flash-beta, mars-8.1-pro-beta, mars-flash, mars-pro, mars-instruct. MARS 8.1 beta models support inline pronunciation and non-verbal tags in text; mars-instruct uses a different expressive tag set (API reference). |
| output_configuration | object | format defaults to wav | format, optional sample_rate. Supported formats depend on speech_model (see the Output format support by model table in the API reference). |
| voice_settings | object | — | speaking_rate, reference quality, accent controls (API reference). |
| inference_options | object | — | e.g. inference_steps where applicable (API reference). |
| enhance_named_entities_pronunciation | boolean | false | Improves named-entity pronunciation when supported. Not supported for mars-8.1-flash-beta or mars-8.1-pro-beta (per the API reference note). |
mars-instruct does not support mp3 or pcm_s16be (see Output format support by model in the API reference).
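A small payload builder can encode the two constraints from the tables above (the enhance flag is unsupported on the beta models; mars-instruct rejects mp3 and pcm_s16be) so bad combinations fail before a request is ever sent. This is a local sketch, not server behavior; the server remains the source of truth.

```python
# Hedged sketch: mirrors constraints stated in the parameter tables.
# Anything beyond those tables (field names aside) is assumption.
BETA_MODELS = {"mars-8.1-flash-beta", "mars-8.1-pro-beta"}


def build_payload(text, language, voice_id, *,
                  speech_model="mars-8.1-flash-beta",
                  fmt="wav", sample_rate=None, enhance_entities=False):
    if not 3 <= len(text) <= 3000:
        raise ValueError("text must be 3-3000 characters")
    if enhance_entities and speech_model in BETA_MODELS:
        raise ValueError("enhance_named_entities_pronunciation is not "
                         f"supported for {speech_model}")
    if speech_model == "mars-instruct" and fmt in {"mp3", "pcm_s16be"}:
        raise ValueError(f"mars-instruct does not support {fmt}")
    output_configuration = {"format": fmt}
    if sample_rate is not None:
        output_configuration["sample_rate"] = sample_rate
    return {
        "text": text,
        "language": language,
        "voice_id": voice_id,
        "speech_model": speech_model,
        "output_configuration": output_configuration,
        "enhance_named_entities_pronunciation": enhance_entities,
    }
```

Catching these mistakes client-side saves a round trip, but a 422 from the server is still possible for conditions the tables do not cover (for example an unsupported locale).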
Expressive and pronunciation controls
- mars-8.1-flash-beta / mars-8.1-pro-beta: English CMU phoneme overrides (e.g. [B EY1 S]) and non-verbal tags such as [laughter]; see MARS 8.1 Beta Text Controls in the API reference.
- mars-instruct: Emotion and pacing tags and SSML-style pauses in text; examples below.
With speech_model set to "mars-instruct", you can encode expression directly in the text field.
English examples:
- [speaking slowly] This is very important. Please pay close attention.
- [excited] We shipped the feature, and the response has been fantastic!
- Let's pause for a moment <break time="400ms"/> and continue clearly.
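Because the tags ride inside the text field itself, the request body needs nothing beyond switching speech_model to mars-instruct. A sketch (the voice ID is hypothetical):

```python
# Hedged sketch: expressive tags are plain characters in "text";
# the voice_id here is a placeholder, not a real profile.
import json

payload = {
    "text": ('[speaking slowly] This is very important. '
             'Let\'s pause <break time="400ms"/> and continue clearly.'),
    "language": "en-us",
    "voice_id": 1234,                 # hypothetical voice ID
    "speech_model": "mars-instruct",  # required for this tag set
}
body = json.dumps(payload)
```

Note that json.dumps escapes nothing inside the tags that would confuse the server; they arrive exactly as written.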
For a comprehensive guide on emotional expression, pauses, and prosody control, see the Emotional Voice Control tutorial.
Listing Voices
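A hedged sketch of the voices call. The /list-voices route, the x-api-key header, and the response field names (`id`, `voice_name`) are all assumptions; verify them against the API reference.

```python
# Hedged sketch: route, auth header, and response field names are
# assumptions -- confirm against the API reference.
import urllib.request


def build_list_voices_request(api_key: str) -> urllib.request.Request:
    return urllib.request.Request(
        "https://client.camb.ai/apis/list-voices",  # assumed base URL + route
        headers={"x-api-key": api_key},
    )


def pick_voice(voices, name):
    """Return the id of the first voice whose (assumed) voice_name matches."""
    for v in voices:
        if v.get("voice_name") == name:
            return v.get("id")
    return None
```

Whatever the exact field names turn out to be, the integer you pass as voice_id in /tts-stream comes from this list.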
Get available voices:
Playing Audio
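Once the WAV bytes are on disk (or in memory), the stdlib wave module can sanity-check them before you hand them to a player; actual playback is platform-specific and left to your audio library of choice. A minimal sketch:

```python
# Hedged sketch: inspects WAV bytes with the stdlib wave module.
# Playback itself depends on your platform's audio stack.
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of an in-memory WAV clip, from its frame count and rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()
```

A quick duration check like this is a cheap way to confirm you received audio, not an error body, before wiring up real playback.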
Next Steps
Python SDK
Use the SDK for simpler integration
API Reference
Complete API documentation
Voice Agents
Build real-time voice applications
Voice Library
Browse available voices