Beta. Realtime speech-to-speech translation is available for testing in the Python SDK. Event shapes, configuration options, audio formats, and error semantics may change in backwards-incompatible ways before GA. Pin to the SDK version you test against.

Overview

Speak (or stream a file) in one language and receive the translation as live text and synthesized speech over a single WebSocket. The session exposes a typed event dispatcher, a built-in microphone helper, and a forward-compatible on_any subscription for events the server may add in future releases. Key features:
  • Translated text and audio: receive incremental translated text (response.text.delta / response.text.done) and translated speech audio (response.audio.delta) as you speak.
  • Typed events: a single ServerEventType enum with per-event typed payloads.
  • Microphone + file helpers: Microphone (via sounddevice) and FileAudioSource ship with the SDK.
  • Easy extensibility: new server event = one enum entry + one payload type + one parser entry.
Audio is PCM16, mono, 24 kHz in both directions; this is the rate the realtime endpoint expects for input and emits for output.
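To make the audio format concrete, the sketch below converts between byte counts and durations for PCM16 mono audio at 24 kHz. The helper names are hypothetical and are not part of the SDK; only the format itself (16-bit samples, one channel, 24,000 samples per second) comes from this page.

```python
SAMPLE_RATE = 24_000   # samples per second
BYTES_PER_SAMPLE = 2   # PCM16: 16 bits = 2 bytes per sample
CHANNELS = 1           # mono


def pcm_bytes_for(duration_s: float) -> int:
    """Number of bytes needed to hold duration_s seconds of audio."""
    return int(round(duration_s * SAMPLE_RATE)) * BYTES_PER_SAMPLE * CHANNELS


def pcm_duration(num_bytes: int) -> float:
    """Duration in seconds represented by num_bytes of audio."""
    return num_bytes / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)


print(pcm_bytes_for(0.1))    # a 100 ms chunk -> 4800 bytes
print(pcm_duration(48_000))  # 48,000 bytes -> 1.0 second
```

One second of audio in either direction is therefore 48,000 bytes, which is a useful figure when sizing playback buffers or estimating the length of captured output.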

Prerequisites

1. Create an account

Sign up at CAMB.AI Studio if you haven't already.

2. Get your API key

Go to Settings → API Keys in Studio and copy your key. See Authentication for details.

3. Install the SDK

pip install camb-sdk
Skip this step if you're using the direct API.

4. Set your API key to use in your code

export CAMB_API_KEY="your_api_key_here"
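On the Python side, reading the key through a small helper gives a clearer failure than a bare KeyError when the variable is missing. This is a minimal sketch; load_api_key is a hypothetical helper and not an SDK function.

```python
import os


def load_api_key(env_var: str = "CAMB_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before running the examples."
        )
    return key
```

The returned string can then be passed as the api_key argument shown in the quickstarts.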

Models

| Model | Cold boot | Notes |
| --- | --- | --- |
| iris | None (ready in ~1s) | Low-latency. Recommended for interactive use. |
| lilac (default) | ~30s+ | |
| violet | ~30s+ | |
| orchid | ~30s+ | |

Non-iris models cold-boot for 30+ seconds on the first connection; the server emits session.starting (and WebSocket keepalives) during that window. session.wait_until_ready() blocks until the session is active (up to 90s), so you don't need to handle the boot wait yourself.
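If an application wants its own deadline, for example a shorter one when using iris, the wait can be wrapped on the caller side. The sketch below assumes only that session.wait_until_ready() is awaitable, as in the quickstarts. What the SDK itself does when its 90-second limit is reached is not described here, so the wrapper relies on asyncio's own timeout. The function name is hypothetical.

```python
import asyncio


async def wait_ready_or_give_up(session, deadline_s: float = 90.0) -> bool:
    """Return True once the session is ready, False if deadline_s elapses first."""
    try:
        # wait_for cancels the inner wait when the deadline passes.
        await asyncio.wait_for(session.wait_until_ready(), timeout=deadline_s)
        return True
    except asyncio.TimeoutError:
        return False
```

A caller could then close the session and report the problem when the function returns False, rather than waiting for the full boot window.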

Supported languages

source_language and target_language accept the BCP-47 tags below (case-insensitive). The set is per model: lilac, violet, and orchid support all 22 languages, while iris supports the 14-language subset marked below. See the WebSocket API reference for the authoritative list.
| Code | Language | Models |
| --- | --- | --- |
| ar-ae | Arabic (United Arab Emirates) | all |
| ar-eg | Arabic (Egypt) | all |
| ar-sa | Arabic (Saudi Arabia) | all |
| cs-cz | Czech (Czechia) | lilac, violet, orchid |
| de-de | German (Germany) | all |
| en-gb | English (United Kingdom) | all |
| en-us | English (United States) | all |
| es-es | Spanish (Spain) | all |
| fi-fi | Finnish (Finland) | lilac, violet, orchid |
| fr-ca | French (Canada) | all |
| fr-fr | French (France) | all |
| hi-in | Hindi (India) | all |
| ja-jp | Japanese (Japan) | all |
| ko-kr | Korean (Korea) | all |
| no-no | Norwegian | lilac, violet, orchid |
| pl-pl | Polish (Poland) | lilac, violet, orchid |
| pt-br | Portuguese (Brazil) | all |
| sv-se | Swedish (Sweden) | lilac, violet, orchid |
| tr-tr | Turkish (Turkey) | lilac, violet, orchid |
| uk-ua | Ukrainian (Ukraine) | lilac, violet, orchid |
| ur-in | Urdu (India) | lilac, violet, orchid |
| zh-cn | Chinese (Mandarin, Simplified) | all |
iris supports: ar-ae, ar-eg, ar-sa, de-de, en-gb, en-us, es-es, fr-ca, fr-fr, hi-in, ja-jp, ko-kr, pt-br, zh-cn.
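A client can check a language pair against this list before opening a connection. The sketch below hard-codes the tags from this page as a snapshot; the WebSocket API reference remains the authoritative list. The constant and function names are hypothetical and not part of the SDK.

```python
IRIS_LANGUAGES = frozenset({
    "ar-ae", "ar-eg", "ar-sa", "de-de", "en-gb", "en-us", "es-es",
    "fr-ca", "fr-fr", "hi-in", "ja-jp", "ko-kr", "pt-br", "zh-cn",
})
ALL_LANGUAGES = IRIS_LANGUAGES | frozenset({
    "cs-cz", "fi-fi", "no-no", "pl-pl", "sv-se", "tr-tr", "uk-ua", "ur-in",
})
SUPPORTED = {
    "iris": IRIS_LANGUAGES,
    "lilac": ALL_LANGUAGES,
    "violet": ALL_LANGUAGES,
    "orchid": ALL_LANGUAGES,
}


def check_language_pair(model: str, source: str, target: str) -> None:
    """Raise ValueError if either tag is not listed for the model."""
    try:
        allowed = SUPPORTED[model]
    except KeyError:
        raise ValueError(f"unknown model: {model!r}") from None
    for role, tag in (("source_language", source), ("target_language", target)):
        # Tags are case-insensitive, so compare in lowercase.
        if tag.lower() not in allowed:
            raise ValueError(f"{role}={tag!r} is not listed for model {model!r}")
```

For example, check_language_pair("iris", "en-us", "cs-cz") raises ValueError, while the same pair passes for lilac.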

Get Started

Create an API Key

Generate a key at CAMB.AI Studio and export it as CAMB_API_KEY for the snippets below.

Install

pip install camb-sdk
sounddevice is installed with camb-sdk, so the Microphone helper and the Speaker class used in the quickstart work out of the box. On Linux you may also need the PortAudio system library (e.g. apt install libportaudio2).

Quickstart (microphone)

Speak into your mic; the translated speech plays back through your speakers and the translated text prints as it arrives.
import asyncio
import os
import threading

import sounddevice as sd

from camb.client import CambAI
from camb.live_transcription import Microphone
from camb.realtime import ServerEventType

SAMPLE_RATE = 24000  # PCM16 mono, both directions


class Speaker:
    """Plays raw PCM16 mono bytes through the default output device."""

    def __init__(self, sample_rate: int = SAMPLE_RATE) -> None:
        self._buf = bytearray()
        self._lock = threading.Lock()
        self._stream = sd.RawOutputStream(
            samplerate=sample_rate, channels=1, dtype="int16", callback=self._cb
        )

    def _cb(self, outdata, frames, time_info, status) -> None:
        want = len(outdata)
        with self._lock:
            take = min(want, len(self._buf))
            outdata[:take] = bytes(self._buf[:take])
            del self._buf[:take]
        if take < want:
            outdata[take:] = b"\x00" * (want - take)  # underrun → silence

    def start(self):
        self._stream.start()

    def feed(self, pcm: bytes):
        with self._lock:
            self._buf.extend(pcm)

    def close(self):
        self._stream.stop()
        self._stream.close()


async def main():
    client = CambAI(api_key=os.environ["CAMB_API_KEY"])
    session = await client.realtime.connect(
        source_language="en-us",
        target_language="de-de",
        model="iris",  # low-latency, no cold-boot wait
    )

    speaker = Speaker()

    @session.on(ServerEventType.TRANSCRIPT_COMPLETED)
    def _(event):
        print(f"\n[you]         {event.transcript}")

    @session.on(ServerEventType.TEXT_DONE)
    def _(event):
        print(f"[translation] {event.text}")

    @session.on(ServerEventType.AUDIO_DELTA)
    def _(event):
        speaker.feed(event.data)

    async with session:
        await session.wait_until_ready()
        speaker.start()
        mic = Microphone(sample_rate=SAMPLE_RATE, chunk_size=SAMPLE_RATE // 10)
        try:
            await session.stream_audio(mic)
        finally:
            speaker.close()


asyncio.run(main())

Quickstart (file → file)

Useful on machines with no microphone (CI, servers). The input WAV must be 16-bit PCM, mono, 24 kHz; the translated audio is written to an output WAV.
import asyncio
import os
import wave

from camb.client import CambAI
from camb.live_transcription import FileAudioSource
from camb.realtime import ServerEventType

SAMPLE_RATE = 24000


async def main(in_path: str, out_path: str):
    client = CambAI(api_key=os.environ["CAMB_API_KEY"])
    session = await client.realtime.connect(
        source_language="en-us",
        target_language="de-de",
        model="iris",
    )

    out_audio = bytearray()
    audio_done = asyncio.Event()

    @session.on(ServerEventType.TEXT_DONE)
    def _(event):
        print(f"[translation] {event.text}")

    @session.on(ServerEventType.AUDIO_DELTA)
    def _(event):
        out_audio.extend(event.data)

    @session.on(ServerEventType.AUDIO_DONE)
    def _(_):
        audio_done.set()

    async with session:
        await session.wait_until_ready()
        await session.stream_audio(FileAudioSource(in_path, real_time=True))
        try:
            await asyncio.wait_for(audio_done.wait(), timeout=30)
        except asyncio.TimeoutError:
            pass

    if out_audio:
        with wave.open(out_path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(SAMPLE_RATE)
            out.writeframes(bytes(out_audio))
        print(f"Wrote {len(out_audio) / (SAMPLE_RATE * 2):.1f}s to {out_path}")


asyncio.run(main("input_24k_mono.wav", "translated_output.wav"))
Re-encode any source file to the required format with:
ffmpeg -i input.wav -ar 24000 -ac 1 -sample_fmt s16 input_24k_mono.wav
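Before streaming, the input file can be checked against the required format using only the standard library. This is a minimal sketch; check_wav_format is a hypothetical helper and not part of the SDK. Note that wave.open raises wave.Error for files it cannot parse as PCM WAV.

```python
import wave


def check_wav_format(path: str, sample_rate: int = 24_000) -> list[str]:
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, found {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(
                f"expected 16-bit samples, found {wav.getsampwidth() * 8}-bit"
            )
        if wav.getframerate() != sample_rate:
            problems.append(
                f"expected {sample_rate} Hz, found {wav.getframerate()} Hz"
            )
    return problems
```

If the returned list is not empty, re-encode the file with the ffmpeg command above before passing it to FileAudioSource.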
Feed the session clear speech. Music, silence, or noisy/low-quality audio may not be recognized by the speech model, in which case no transcript or translation is produced for that audio.

Events and Payloads

Supported events

All events are exposed through the ServerEventType enum.
| Event | Wire type | Notes |
| --- | --- | --- |
| SESSION_STARTING | session.starting | Pipeline is booting (non-iris cold boot). Not yet ready for audio. |
| SESSION_CREATED | session.created | Session is authorized and ready. wait_until_ready() resolves here. |
| SESSION_UPDATED | session.updated | Echo of the active session configuration. |
| TRANSCRIPT_COMPLETED | conversation.item.input_audio_transcription.completed | Final transcript of a user utterance (source language). |
| TEXT_DELTA | response.text.delta | Incremental translated text; additive within one response. |
| TEXT_DONE | response.text.done | Complete translated text for the current response. |
| AUDIO_DELTA | response.audio.delta (or binary frame) | Chunk of synthesized translated speech (event.data is raw PCM16 bytes). |
| AUDIO_DONE | response.audio.done | Current translated audio response is complete. |
| ERROR | error | Server error, or a handler exception surfaced by the SDK. |
| CLOSED | Closed | Synthetic: emitted by the SDK when the WebSocket closes. Carries code and reason. |
Which text events fire depends on the model. iris emits translated text (TEXT_DELTA / TEXT_DONE); lilac and orchid also emit the source-language transcript (TRANSCRIPT_COMPLETED). All models emit translated audio (AUDIO_DELTA).
Catch-all subscription. A future server event the SDK doesn't model yet is still delivered to any handler registered via session.on_any(...) with the raw payload, so applications stay forward-compatible.
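The idea behind typed dispatch with a catch-all can be illustrated with a small standalone sketch. This is not the SDK's implementation: the class names, the PARSERS table, and the JSON field names ("type", "delta", "text") are assumptions made for illustration. It shows how modelled events can be parsed into typed payloads, while unknown events still reach catch-all handlers as raw dictionaries.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class EventType(str, Enum):
    TEXT_DELTA = "response.text.delta"
    TEXT_DONE = "response.text.done"


@dataclass
class TextDelta:
    delta: str


@dataclass
class TextDone:
    text: str


# One parser entry per modelled event: wire type -> (enum member, payload builder).
PARSERS: dict[str, tuple[EventType, Callable[[dict], Any]]] = {
    "response.text.delta": (EventType.TEXT_DELTA, lambda m: TextDelta(m["delta"])),
    "response.text.done": (EventType.TEXT_DONE, lambda m: TextDone(m["text"])),
}


class Dispatcher:
    def __init__(self) -> None:
        self._typed: dict[EventType, list[Callable]] = {}
        self._any: list[Callable] = []

    def on(self, event_type: EventType):
        def register(handler):
            self._typed.setdefault(event_type, []).append(handler)
            return handler
        return register

    def on_any(self, handler):
        self._any.append(handler)
        return handler

    def dispatch(self, message: dict) -> None:
        wire_type = message.get("type", "")
        # Catch-all handlers see every message, modelled or not, as the raw dict.
        for handler in self._any:
            handler(wire_type, message)
        entry = PARSERS.get(wire_type)
        if entry is None:
            return  # unknown event: only the catch-all handlers receive it
        event_type, build = entry
        payload = build(message)
        for handler in self._typed.get(event_type, []):
            handler(payload)
```

In this design, adding a new event means adding one enum member, one payload dataclass, and one PARSERS entry, which mirrors the extensibility claim made in the overview.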

Subscribing to events

@session.on(ServerEventType.TEXT_DELTA)
def on_text(event):
    print(event.delta, end="", flush=True)

@session.on(ServerEventType.ERROR)
def on_error(err):
    print("error:", err.message)

# Forward-compat: receive every event, including ones added later.
@session.on_any
def on_any(event_type, payload):
    print(event_type, payload)
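Because TEXT_DELTA is additive within one response, an application that wants the running translation can accumulate deltas and reset when TEXT_DONE arrives. The sketch below is a standalone helper with hypothetical names. It assumes that concatenating the deltas of one response yields that response's text.

```python
from typing import Optional


class TranslationBuffer:
    """Collects additive text deltas into one string per response."""

    def __init__(self) -> None:
        self._parts: list[str] = []
        self.completed: list[str] = []

    def add_delta(self, delta: str) -> str:
        """Append a delta and return the text accumulated so far."""
        self._parts.append(delta)
        return "".join(self._parts)

    def finish(self, final_text: Optional[str] = None) -> str:
        """Close the current response, preferring the server's final text."""
        text = final_text if final_text is not None else "".join(self._parts)
        self.completed.append(text)
        self._parts.clear()
        return text


# Wiring it to a session would look roughly like this:
#   buf = TranslationBuffer()
#   session.on(ServerEventType.TEXT_DELTA)(lambda e: buf.add_delta(e.delta))
#   session.on(ServerEventType.TEXT_DONE)(lambda e: buf.finish(e.text))
```

Preferring the text carried by TEXT_DONE over the locally joined deltas keeps the stored result correct even if a delta was missed.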

Configuration

OptionDefaultDescription
| Option | Default | Description |
| --- | --- | --- |
| source_language | (required) | BCP-47 tag of the input speech, e.g. en-us. Must be a supported language for the model. |
| target_language | (required) | BCP-47 tag of the translation, e.g. de-de. Must be a supported language for the model. |
| model | lilac | One of lilac, violet, iris, orchid. |
| output_modalities | ["text", "audio"] | Subset of text and audio. |
session = await client.realtime.connect(
    source_language="en-us",
    target_language="es-es",
    model="iris",
    output_modalities=["text", "audio"],
)
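To catch typos before the connection is attempted, the options can be validated locally and then passed to connect as keyword arguments. This is a sketch with a hypothetical helper name. It checks only the constraints stated in the table above, and the server may apply further validation.

```python
ALLOWED_MODALITIES = ("text", "audio")
MODELS = ("lilac", "violet", "iris", "orchid")


def connect_kwargs(
    source_language: str,
    target_language: str,
    model: str = "lilac",
    output_modalities=("text", "audio"),
) -> dict:
    """Validate the options and return them as keyword arguments."""
    if model not in MODELS:
        raise ValueError(f"model must be one of {MODELS}, got {model!r}")
    unknown = [m for m in output_modalities if m not in ALLOWED_MODALITIES]
    if unknown:
        raise ValueError(f"unsupported output modalities: {unknown}")
    return {
        "source_language": source_language,
        "target_language": target_language,
        "model": model,
        "output_modalities": list(output_modalities),
    }


# Example: request translated text only.
#   session = await client.realtime.connect(
#       **connect_kwargs("en-us", "es-es", model="iris", output_modalities=["text"])
#   )
```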

More Information