## Overview

While SDKs and frameworks provide convenience, sometimes you need direct control over API calls. This tutorial shows how to call the Camb.ai TTS API directly using HTTP requests.

### When to Use Direct API

* Building integrations in languages without an SDK
* Need fine-grained control over request/response handling
* Debugging or testing API behavior
* Building custom streaming implementations

### Prerequisites

* Sign up at [CAMB.AI Studio](https://studio.camb.ai) if you haven't already.
* Go to **Settings → API Keys** in Studio and copy your key. See [Authentication](/getting-started/authentication) for details.

***

## Basic TTS Request

`POST /tts-stream` returns a **binary audio byte stream** (for example `audio/wav` or `audio/mpeg`), not Server-Sent Events or JSON chunks. The server sends the `Content-Type` that matches your `output_configuration.format`. You can buffer the full body for short clips, or read in chunks for lower latency; see [Stream Text-to-Speech Audio](/api-reference/endpoint/create-tts-stream).
```bash cURL
curl -X POST "https://client.camb.ai/apis/tts-stream" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from the command line!",
    "voice_id": 147320,
    "language": "en-us",
    "speech_model": "mars-8.1-flash-beta",
    "output_configuration": {
      "format": "wav"
    }
  }' \
  --output output.wav
```

```python Python (requests)
import os

import requests


def text_to_speech(text: str, voice_id: int = 147320) -> bytes:
    """Synchronous TTS call: stream the response and concatenate chunks (matches the streaming endpoint)."""
    api_key = os.getenv("CAMB_API_KEY")
    url = "https://client.camb.ai/apis/tts-stream"
    headers = {
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }
    payload = {
        "text": text,
        "voice_id": voice_id,
        "language": "en-us",
        "speech_model": "mars-8.1-flash-beta",
        "output_configuration": {"format": "wav"},
    }
    with requests.post(url, headers=headers, json=payload, stream=True) as response:
        response.raise_for_status()
        return b"".join(
            chunk for chunk in response.iter_content(chunk_size=1024) if chunk
        )


if __name__ == "__main__":
    audio = text_to_speech("Hello world!")
    with open("output.wav", "wb") as f:
        f.write(audio)
```

```python Python (aiohttp)
import asyncio
import os

import aiohttp


async def text_to_speech(text: str, voice_id: int = 147320) -> bytes:
    """Convert text to speech using direct API call; buffers full body after a successful response."""
    api_key = os.getenv("CAMB_API_KEY")
    url = "https://client.camb.ai/apis/tts-stream"
    headers = {
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }
    payload = {
        "text": text,
        "voice_id": voice_id,
        "language": "en-us",
        "speech_model": "mars-8.1-flash-beta",
        "output_configuration": {
            "format": "wav"
        },
    }
    timeout = aiohttp.ClientTimeout(total=120)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(url, headers=headers, json=payload) as resp:
            resp.raise_for_status()
            return await resp.read()


async def main():
    audio_data = await text_to_speech("Hello from the direct API!")
    with open("output.wav", "wb") as f:
        f.write(audio_data)
    print(f"Saved {len(audio_data)} bytes to output.wav")


if __name__ == "__main__":
    asyncio.run(main())
```

***

## Streaming Response

For real-time playback, iterate over the **raw response body** as chunks arrive. Always validate the status **before** reading the stream: non-success responses may return JSON (for example, validation errors), not audio. Responses can include the `X-Credits-Required` header for usage tracking (see the [API reference](/api-reference/endpoint/create-tts-stream)).

```python
import asyncio
import os

import aiohttp


async def stream_tts(text: str, voice_id: int = 147320):
    """Yield audio chunks as they arrive; raises on non-success HTTP status."""
    api_key = os.getenv("CAMB_API_KEY")
    url = "https://client.camb.ai/apis/tts-stream"
    headers = {
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }
    payload = {
        "text": text,
        "voice_id": voice_id,
        "language": "en-us",
        "speech_model": "mars-8.1-flash-beta",
        "output_configuration": {"format": "wav"},
    }
    timeout = aiohttp.ClientTimeout(total=120)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(url, headers=headers, json=payload) as resp:
            print(f"Status: {resp.status}")
            print(f"Content-Type: {resp.headers.get('Content-Type')}")
            print(f"X-Credits-Required: {resp.headers.get('X-Credits-Required')}")
            resp.raise_for_status()
            async for chunk in resp.content.iter_chunked(4096):
                yield chunk


async def main():
    with open("streamed_output.wav", "wb") as f:
        async for chunk in stream_tts("This is streamed audio generation."):
            f.write(chunk)


if __name__ == "__main__":
    asyncio.run(main())
```

***

## Request Parameters

These fields match [Stream Text-to-Speech Audio](/api-reference/endpoint/create-tts-stream) and the OpenAPI schema for `POST /tts-stream`.
### Required Parameters

| Parameter  | Type    | Description |
| ---------- | ------- | ----------- |
| `text`     | string  | Text to synthesize (**3–3000** characters) |
| `language` | string  | BCP-47 locale (e.g. `en-us`). Case-sensitive lowercase. Unsupported locales for the chosen `speech_model` return **422**. |
| `voice_id` | integer | Voice profile ID from [`/list-voices`](/api-reference/endpoint/list-voices) |

### Optional Parameters

| Parameter | Type | Default | Description |
| --------- | ---- | ------- | ----------- |
| `speech_model` | string | `mars-8.1-flash-beta` | `mars-8.1-flash-beta`, `mars-8.1-pro-beta`, `mars-flash`, `mars-pro`, `mars-instruct`. MARS 8.1 beta models support inline pronunciation and non-verbal tags in `text`; `mars-instruct` uses a different expressive tag set ([API reference](/api-reference/endpoint/create-tts-stream)). |
| `user_instructions` | string | `null` | Optional style, tone, pronunciation, or delivery guidance. Only supported with `speech_model: "mars-instruct"`. |
| `output_configuration` | object | format defaults to `wav` | `format`, optional `sample_rate`. Supported formats depend on `speech_model` (see the **Output format support by model** table in the API reference). |
| `voice_settings` | object | - | `speaking_rate`, reference quality, accent controls ([API reference](/api-reference/endpoint/create-tts-stream)). |
| `inference_options` | object | - | e.g. `inference_steps` where applicable ([API reference](/api-reference/endpoint/create-tts-stream)). |
| `enhance_named_entities_pronunciation` | boolean | `false` | Improves named-entity pronunciation when supported. **Not supported** for `mars-8.1-flash-beta` or `mars-8.1-pro-beta` (same as the API reference note). |

More details available in the [API Reference](/api-reference/endpoint/create-tts-stream).

`mars-instruct` does not support `mp3` or `pcm_s16be` (see **Output format support by model** in the API reference).

***

## Expressive and pronunciation controls

* **`mars-8.1-flash-beta` / `mars-8.1-pro-beta`**: English CMU phoneme overrides (e.g. `[B EY1 S]`) and non-verbal tags such as `[laughter]`; see **MARS 8.1 Beta Text Controls** in the [API reference](/api-reference/endpoint/create-tts-stream).
* **`mars-instruct`**: Emotion and pacing tags and SSML-style pauses in `text`, plus optional `user_instructions` for broader delivery guidance (examples below).

When you use `speech_model: "mars-instruct"`, you can encode expression directly in the `text` field and use `user_instructions` for the overall style. English examples:

* `[speaking slowly] This is very important. Please pay close attention.`
* `[excited] We shipped the feature, and the response has been fantastic!`
* `Let's pause for a moment and continue clearly.`

For a comprehensive guide on emotional expression, pauses, and prosody control, see the [Emotional Voice Control tutorial](/tutorials/emotional-voice-control).
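The constraints listed in the parameter tables and notes above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API or any SDK: `build_tts_payload` and its constants are hypothetical names introduced here for illustration, and the `user_instructions` string in the usage example is invented. It covers only the rules stated on this page (text length, lowercase locale, `user_instructions` only with `mars-instruct`, the named-entity flag on MARS 8.1 beta models, and the two formats `mars-instruct` does not support). The server remains the authority on what is valid.

```python
from typing import Any, Dict, Optional

# Model names and format restrictions copied from the tables and notes above.
MARS_8_1_BETA_MODELS = {"mars-8.1-flash-beta", "mars-8.1-pro-beta"}
INSTRUCT_UNSUPPORTED_FORMATS = {"mp3", "pcm_s16be"}


def build_tts_payload(
    text: str,
    voice_id: int,
    language: str = "en-us",
    speech_model: str = "mars-8.1-flash-beta",
    output_format: str = "wav",
    user_instructions: Optional[str] = None,
    enhance_named_entities_pronunciation: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: build a /tts-stream JSON body, rejecting combinations this page rules out."""
    if not 3 <= len(text) <= 3000:
        raise ValueError("text must be 3-3000 characters")
    if language != language.lower():
        raise ValueError("language must be lowercase, e.g. 'en-us'")
    if user_instructions is not None and speech_model != "mars-instruct":
        raise ValueError("user_instructions is only supported with mars-instruct")
    if enhance_named_entities_pronunciation and speech_model in MARS_8_1_BETA_MODELS:
        raise ValueError(
            "enhance_named_entities_pronunciation is not supported for MARS 8.1 beta models"
        )
    if speech_model == "mars-instruct" and output_format in INSTRUCT_UNSUPPORTED_FORMATS:
        raise ValueError(f"mars-instruct does not support format {output_format!r}")

    payload: Dict[str, Any] = {
        "text": text,
        "voice_id": voice_id,
        "language": language,
        "speech_model": speech_model,
        "output_configuration": {"format": output_format},
    }
    # Only send optional fields the caller actually set.
    if user_instructions is not None:
        payload["user_instructions"] = user_instructions
    if enhance_named_entities_pronunciation:
        payload["enhance_named_entities_pronunciation"] = True
    return payload


if __name__ == "__main__":
    # Expressive request: inline tag in `text`, overall style in `user_instructions`.
    payload = build_tts_payload(
        "[excited] We shipped the feature, and the response has been fantastic!",
        voice_id=147320,
        speech_model="mars-instruct",
        user_instructions="Warm, upbeat product-announcement tone.",  # illustrative value
    )
    print(payload)
```

The returned dictionary can be passed as the `json=` argument in any of the request examples on this page. Failing early with a `ValueError` avoids a round trip for mistakes that would otherwise come back as an error response such as **422**.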
***

## Listing Voices

Get available voices:

```python Python
import asyncio
import os

import aiohttp


async def list_voices():
    """List all available voices."""
    api_key = os.getenv("CAMB_API_KEY")
    url = "https://client.camb.ai/apis/list-voices"
    headers = {"x-api-key": api_key}
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as resp:
            if resp.status == 200:
                voices = await resp.json()
                return voices
            else:
                raise Exception(f"Error: {resp.status}")


async def main():
    voices = await list_voices()
    print(f"Found {len(voices)} voices:\n")
    for voice in voices[:10]:  # Print first 10
        print(f"ID: {voice['id']}, Name: {voice['voice_name']}, Gender: {voice['gender']}")


if __name__ == "__main__":
    asyncio.run(main())
```

```bash cURL
curl -X GET "https://client.camb.ai/apis/list-voices" \
  -H "x-api-key: YOUR_API_KEY"
```

***

## Playing Audio

```python Using sounddevice
import asyncio
import io
import os
import wave

import aiohttp
import numpy as np
import sounddevice as sd


async def play_tts(text: str):
    """Generate and play TTS audio."""
    api_key = os.getenv("CAMB_API_KEY")
    url = "https://client.camb.ai/apis/tts-stream"
    headers = {
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }
    payload = {
        "text": text,
        "voice_id": 147320,
        "language": "en-us",
        "speech_model": "mars-8.1-flash-beta",
        "output_configuration": {"format": "wav"},
    }
    timeout = aiohttp.ClientTimeout(total=120)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(url, headers=headers, json=payload) as resp:
            resp.raise_for_status()
            audio_bytes = await resp.read()

    # Parse WAV and extract PCM data
    with wave.open(io.BytesIO(audio_bytes), 'rb') as wav_file:
        sample_rate = wav_file.getframerate()
        audio_data = np.frombuffer(wav_file.readframes(-1), dtype=np.int16)

    sd.play(audio_data, samplerate=sample_rate)
    sd.wait()


if __name__ == "__main__":
    asyncio.run(play_tts("Hello! This audio is playing directly."))
```

```python Converting raw PCM to WAV
import wave


def save_as_wav(pcm_data: bytes, filename: str, sample_rate: int = 22050):
    """Save raw PCM16 mono bytes as a WAV file (use only when `output_configuration.format` is a raw `pcm_*` format)."""
    with wave.open(filename, "wb") as wav_file:
        wav_file.setnchannels(1)  # Mono
        wav_file.setsampwidth(2)  # 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_data)


# Example: wrap PCM from a `pcm_*` stream after you have the full byte buffer
# (If you use format "wav", the API already returns a WAV container—open it with wave.open(io.BytesIO(...)) instead.)
if __name__ == "__main__":
    pcm_bytes = b"..."  # your raw PCM payload
    save_as_wav(pcm_bytes, "output.wav", sample_rate=22050)
```

***

## Next Steps

* Use the SDK for simpler integration
* Complete API documentation
* Build real-time voice applications
* Browse available voices