

Overview

While SDKs and frameworks provide convenience, sometimes you need direct control over API calls. This tutorial shows how to call the Camb.ai TTS API directly using HTTP requests.

When to Use Direct API

  • Building integrations in languages without an SDK
  • Need fine-grained control over request/response handling
  • Debugging or testing API behavior
  • Building custom streaming implementations


Prerequisites

1. Create an account: sign up at CAMB.AI Studio if you haven't already.
2. Get your API key: go to Settings → API Keys in Studio and copy your key. See Authentication for details.
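The Python examples on this page read the key from a `CAMB_API_KEY` environment variable, so export it once in your shell before running them (replace the placeholder with your actual key):

```shell
# Make the key available to the examples below, which read it
# via os.getenv("CAMB_API_KEY"). Replace the placeholder value.
export CAMB_API_KEY="your-api-key-here"
```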

Basic TTS Request

POST /tts-stream returns a binary audio byte stream (for example audio/wav or audio/mpeg), not Server-Sent Events or JSON chunks. The server sends the Content-Type that matches your output_configuration.format. You can buffer the full body for short clips, or read in chunks for lower latency—see Stream Text-to-Speech Audio.
curl -X POST "https://client.camb.ai/apis/tts-stream" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from the command line!",
    "voice_id": 147320,
    "language": "en-us",
    "speech_model": "mars-8.1-flash-beta",
    "output_configuration": {
      "format": "wav"
    }
  }' \
  --output output.wav
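The same request can be made from Python without any third-party dependency. A minimal sketch using the standard-library `urllib.request` that buffers the full body, which is fine for short clips (the endpoint, headers, and payload mirror the curl example above; error handling is deliberately simplified):

```python
import json
import os
import urllib.request

API_URL = "https://client.camb.ai/apis/tts-stream"


def synthesize_to_file(text: str, out_path: str = "output.wav") -> None:
    """Request TTS audio and buffer the full body into a file."""
    payload = json.dumps({
        "text": text,
        "voice_id": 147320,
        "language": "en-us",
        "speech_model": "mars-8.1-flash-beta",
        "output_configuration": {"format": "wav"},
    }).encode("utf-8")

    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "x-api-key": os.getenv("CAMB_API_KEY", ""),
            "Content-Type": "application/json",
        },
        method="POST",
    )

    # Non-success responses raise HTTPError; their body is a JSON error, not audio.
    with urllib.request.urlopen(req, timeout=120) as resp:
        audio = resp.read()

    with open(out_path, "wb") as f:
        f.write(audio)
```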

Streaming Response

For real-time playback, iterate over the raw response body as chunks arrive. Always validate the status before reading the stream—non-success responses may return JSON (for example validation errors), not audio. Responses can include the X-Credits-Required header for usage tracking (see the API reference).
import asyncio
import os
import aiohttp


async def stream_tts(text: str, voice_id: int = 147320):
    """Yield audio chunks as they arrive; raises on non-success HTTP status."""
    api_key = os.getenv("CAMB_API_KEY")
    url = "https://client.camb.ai/apis/tts-stream"

    headers = {
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }

    payload = {
        "text": text,
        "voice_id": voice_id,
        "language": "en-us",
        "speech_model": "mars-8.1-flash-beta",
        "output_configuration": {"format": "wav"},
    }

    timeout = aiohttp.ClientTimeout(total=120)

    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(url, headers=headers, json=payload) as resp:
            print(f"Status: {resp.status}")
            print(f"Content-Type: {resp.headers.get('Content-Type')}")
            print(f"X-Credits-Required: {resp.headers.get('X-Credits-Required')}")
            resp.raise_for_status()

            async for chunk in resp.content.iter_chunked(4096):
                yield chunk


async def main():
    with open("streamed_output.wav", "wb") as f:
        async for chunk in stream_tts("This is streamed audio generation."):
            f.write(chunk)


if __name__ == "__main__":
    asyncio.run(main())
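Because error responses come back as JSON rather than audio, it can help to branch on the status before consuming the stream. A hedged sketch of that pattern (the error-body shape shown here is illustrative, not a documented schema; `resp` is expected to behave like an `aiohttp.ClientResponse`):

```python
async def read_stream_or_error(resp):
    """Return audio bytes on success; raise with the server's error body otherwise.

    `resp` is expected to behave like an aiohttp.ClientResponse from
    POST /tts-stream. The error-body shape is illustrative, not a
    documented schema.
    """
    if resp.status >= 400:
        content_type = resp.headers.get("Content-Type", "")
        if "application/json" in content_type:
            detail = await resp.json()  # validation errors typically come back as JSON
        else:
            detail = (await resp.read()).decode("utf-8", errors="replace")
        raise RuntimeError(f"TTS request failed ({resp.status}): {detail}")

    # Success: the body is raw audio; collect the chunks.
    chunks = []
    async for chunk in resp.content.iter_chunked(4096):
        chunks.append(chunk)
    return b"".join(chunks)
```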

Request Parameters

These fields match Stream Text-to-Speech Audio and the OpenAPI schema for POST /tts-stream.

Required Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| text | string | Text to synthesize (3–3000 characters) |
| language | string | BCP-47 locale (e.g. en-us). Case-sensitive lowercase. Unsupported locales for the chosen speech_model return 422. |
| voice_id | integer | Voice profile ID from /list-voices |

Optional Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| speech_model | string | mars-8.1-flash-beta | One of mars-8.1-flash-beta, mars-8.1-pro-beta, mars-flash, mars-pro, mars-instruct. MARS 8.1 beta models support inline pronunciation and non-verbal tags in text; mars-instruct uses a different expressive tag set (API reference). |
| output_configuration | object | format defaults to wav | format, optional sample_rate. Supported formats depend on speech_model (see the Output format support by model table in the API reference). |
| voice_settings | object | — | speaking_rate, reference quality, accent controls (API reference). |
| inference_options | object | — | e.g. inference_steps where applicable (API reference). |
| enhance_named_entities_pronunciation | boolean | false | Improves named-entity pronunciation when supported. Not supported for mars-8.1-flash-beta or mars-8.1-pro-beta (per the API reference note). |
More details available in the API Reference. mars-instruct does not support mp3 or pcm_s16be (see Output format support by model in the API reference).
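As an illustration, a request payload combining several optional fields might look like the following. The nested field names under voice_settings follow the summary above, and the sample_rate value is an assumption; verify both against the API reference before relying on them:

```python
# Illustrative payload using optional parameters. The speaking_rate value
# and the 24000 sample_rate are assumptions; check the API reference for
# the supported ranges and rates.
payload = {
    "text": "Reading a bit more slowly than usual.",
    "voice_id": 147320,
    "language": "en-us",
    "speech_model": "mars-8.1-flash-beta",
    "output_configuration": {
        "format": "wav",
        "sample_rate": 24000,
    },
    "voice_settings": {
        "speaking_rate": 0.9,
    },
}
```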

Expressive and pronunciation controls

  • mars-8.1-flash-beta / mars-8.1-pro-beta: English CMU phoneme overrides (e.g. [B EY1 S]) and non-verbal tags such as [laughter]—see MARS 8.1 Beta Text Controls in the API reference.
  • mars-instruct: Emotion and pacing tags and SSML-style pauses in text—examples below.
When you use speech_model: "mars-instruct", you can encode expression directly in the text field. English examples:
  • [speaking slowly] This is very important. Please pay close attention.
  • [excited] We shipped the feature, and the response has been fantastic!
  • Let's pause for a moment <break time="400ms"/> and continue clearly.
For a comprehensive guide on emotional expression, pauses, and prosody control, see the Emotional Voice Control tutorial.
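Putting the examples above together, a mars-instruct request simply embeds the tags in the text field. A minimal payload sketch (the voice_id is the example ID used elsewhere on this page; check /list-voices for a voice suited to mars-instruct):

```python
# mars-instruct encodes expression inline in the text field.
payload = {
    "text": '[excited] We shipped the feature! <break time="400ms"/> '
            "Let's review the results.",
    "voice_id": 147320,
    "language": "en-us",
    "speech_model": "mars-instruct",
    # mars-instruct does not support mp3 or pcm_s16be; wav is safe here.
    "output_configuration": {"format": "wav"},
}
```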

Listing Voices

Get available voices:
import asyncio
import os
import aiohttp


async def list_voices():
    """List all available voices."""
    api_key = os.getenv("CAMB_API_KEY")
    url = "https://client.camb.ai/apis/list-voices"

    headers = {"x-api-key": api_key}

    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as resp:
            if resp.status == 200:
                voices = await resp.json()
                return voices
            else:
                body = await resp.text()
                raise RuntimeError(f"Error listing voices ({resp.status}): {body}")


async def main():
    voices = await list_voices()

    print(f"Found {len(voices)} voices:\n")
    for voice in voices[:10]:  # Print first 10
        print(f"ID: {voice['id']}, Name: {voice['voice_name']}, Gender: {voice['gender']}")


if __name__ == "__main__":
    asyncio.run(main())
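Once you have the list, a small helper can turn a human-readable name into the voice_id the TTS endpoint needs. A sketch assuming each voice entry has the id and voice_name fields shown above:

```python
def find_voice_id(voices, name_substring):
    """Return the id of the first voice whose name contains the substring
    (case-insensitive), or None if no voice matches."""
    needle = name_substring.lower()
    for voice in voices:
        if needle in voice["voice_name"].lower():
            return voice["id"]
    return None
```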

Playing Audio

import asyncio
import io
import os
import wave
import aiohttp
import numpy as np
import sounddevice as sd


async def play_tts(text: str):
    """Generate and play TTS audio."""
    api_key = os.getenv("CAMB_API_KEY")
    url = "https://client.camb.ai/apis/tts-stream"

    headers = {
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }

    payload = {
        "text": text,
        "voice_id": 147320,
        "language": "en-us",
        "speech_model": "mars-8.1-flash-beta",
        "output_configuration": {"format": "wav"},
    }

    timeout = aiohttp.ClientTimeout(total=120)

    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(url, headers=headers, json=payload) as resp:
            resp.raise_for_status()
            audio_bytes = await resp.read()

    # Parse WAV and extract PCM data
    with wave.open(io.BytesIO(audio_bytes), 'rb') as wav_file:
        sample_rate = wav_file.getframerate()
        n_channels = wav_file.getnchannels()
        frames = wav_file.readframes(wav_file.getnframes())

    audio_data = np.frombuffer(frames, dtype=np.int16)
    if n_channels > 1:
        # sounddevice expects shape (frames, channels) for multi-channel audio
        audio_data = audio_data.reshape(-1, n_channels)

    sd.play(audio_data, samplerate=sample_rate)
    sd.wait()


if __name__ == "__main__":
    asyncio.run(play_tts("Hello! This audio is playing directly."))

Next Steps

Python SDK

Use the SDK for simpler integration

API Reference

Complete API documentation

Voice Agents

Build real-time voice applications

Voice Library

Browse available voices