WSS /v1/realtime

Beta. The Speech to Speech WebSocket is available for testing, but session events, configuration, and audio formats may change in backwards-incompatible ways before general availability (GA).
Bidirectional WebSocket endpoint for real-time speech translation. This endpoint is served by realtime-api-server at wss://realtime.camb.ai/v1/realtime, separate from the /apis/live-tts/ws and /apis/transcription/listen WebSocket endpoints.
GET /v1/realtime?model=lilac
Host: realtime.camb.ai
x-api-key: <YOUR_API_KEY>
The model query parameter is optional. Supported values are lilac, violet, iris, and orchid; the default is lilac. Authenticate with the x-api-key WebSocket request header. If your client cannot set WebSocket headers, send credentials in the first session.update event instead.
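As a small illustration of the optional model parameter, the sketch below builds the connection URL and rejects model names that are not in the list above. The helper name `build_realtime_url` is hypothetical, and comparing model names exactly as written is an assumption; the text does not say whether the server accepts other casings.

```python
from urllib.parse import urlencode

# Values taken from the paragraph above.
REALTIME_URL = "wss://realtime.camb.ai/v1/realtime"
SUPPORTED_MODELS = ("lilac", "violet", "iris", "orchid")


def build_realtime_url(model=None):
    """Return the WebSocket URL, adding ?model=... only when a model is given."""
    if model is None:
        # No query parameter: the server falls back to its default, lilac.
        return REALTIME_URL
    if model not in SUPPORTED_MODELS:
        raise ValueError(
            f"unsupported model {model!r}; expected one of {SUPPORTED_MODELS}"
        )
    return f"{REALTIME_URL}?{urlencode({'model': model})}"
```

Checking the name on the client side avoids opening a socket that the server would reject anyway.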

Quickstart

Use the Python SDK — it handles the session lifecycle (including the session.starting cold-boot wait), normalizes binary and base64 audio frames, and exposes typed events. Input and output audio are PCM16, mono, 24 kHz. The example below streams a WAV file and writes the translated speech to another WAV.
import asyncio
import os
import wave

from camb.client import CambAI
from camb.realtime import ServerEventType
from camb.live_transcription import FileAudioSource


async def main():
    client = CambAI(api_key=os.environ["CAMB_API_KEY"])
    session = await client.realtime.connect(
        source_language="en-us",
        target_language="de-de",
        model="iris",  # low-latency; lilac/violet/orchid cold-boot ~30s+
    )

    out_audio = bytearray()

    @session.on(ServerEventType.TEXT_DONE)
    def _(event):
        print("translation:", event.text)

    @session.on(ServerEventType.AUDIO_DELTA)
    def _(event):
        out_audio.extend(event.data)  # raw PCM16 mono 24 kHz

    async with session:
        await session.wait_until_ready()
        # Input WAV must be 16-bit PCM, mono, 24 kHz.
        await session.stream_audio(FileAudioSource("input_24k_mono.wav", real_time=True))

    with wave.open("translated.wav", "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(24000)
        out.writeframes(bytes(out_audio))


asyncio.run(main())
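
The quickstart expects a 16-bit PCM, mono, 24 kHz input file. A minimal pre-flight check using only the standard library might look like the following; `check_wav_format` is a hypothetical helper, not part of the SDK.

```python
import wave

# Format stated in the Quickstart: PCM16, mono, 24 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample (16-bit)
EXPECTED_RATE = 24000


def check_wav_format(path):
    """Return the clip duration in seconds, or raise ValueError on a mismatch."""
    # wave.open raises wave.Error if the file is not a WAV it can parse.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        rate = wav.getframerate()
        frames = wav.getnframes()

    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"channels={channels} (need {EXPECTED_CHANNELS})")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"sample width={width} bytes (need {EXPECTED_SAMPLE_WIDTH})")
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate={rate} Hz (need {EXPECTED_RATE})")
    if problems:
        raise ValueError(f"{path}: " + "; ".join(problems))
    return frames / rate
```

Calling `check_wav_format("input_24k_mono.wav")` before connecting surfaces a format mismatch locally instead of as garbled or rejected audio mid-session.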
See the Realtime Speech Translation tutorial for the microphone quickstart, the full event list, and configuration. The sections below document the underlying wire protocol for reference (for example, if you are building a client in a language without an SDK).

Integration in 4 steps

1. Open the realtime socket

Connect to wss://realtime.camb.ai/v1/realtime. This endpoint is not under the client.camb.ai/apis namespace used by the other WebSocket API references.
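import websockets  # third-party package: pip install websockets


async def open_socket():
    # websockets 14+ accepts additional_headers; the legacy client names it extra_headers.
    async with websockets.connect(
        "wss://realtime.camb.ai/v1/realtime?model=lilac",
        additional_headers=[("x-api-key", "YOUR_API_KEY")],
    ) as ws:
        ...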

2. Send `session.update` as the first message

The first WebSocket message must be a JSON session.update event. The server waits up to 10 seconds for it.
{
  "type": "session.update",
  "session": {
    "model": "lilac",
    "source_language": "en-us",
    "target_language": "de-de",
    "output_modalities": ["text", "audio"]
  }
}
The server responds with session.created, then session.updated.

3. Stream input audio

Send microphone audio as base64-encoded bytes in input_audio_buffer.append. Only text WebSocket messages are parsed as realtime events.
{
  "type": "input_audio_buffer.append",
  "audio": "<base64_audio_bytes>"
}
Each decoded audio payload can be up to 256 KiB.

4. Read translated output

Listen for transcript, translated text, and translated audio events. response.text.delta values are additive for the current response: append each delta to the text received so far. response.audio.delta contains base64-encoded synthesized audio bytes.
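
The read side of step 4 can be sketched as a small accumulator that appends text deltas and decodes audio deltas until the matching done event arrives. The class name `TranslationCollector` is hypothetical. The field name `delta` is an assumption: this page names the events but not their fields, so check it against the AsyncAPI schema before relying on it.

```python
import base64
import json

# Assumption: both delta events carry their payload in a field named "delta".
# This page names the events but not their fields; adjust if the schema differs.
DELTA_FIELD = "delta"


class TranslationCollector:
    """Accumulates translated text and audio for one response at a time."""

    def __init__(self):
        self.text_parts = []      # text deltas of the response in progress
        self.audio = bytearray()  # decoded audio of the response in progress
        self.finished_texts = []  # one entry per response.text.done
        self.finished_audio = []  # one entry per response.audio.done

    def handle(self, raw_message):
        """Process one text WebSocket message and return its event type."""
        event = json.loads(raw_message)
        kind = event.get("type")
        if kind == "response.text.delta":
            self.text_parts.append(event[DELTA_FIELD])
        elif kind == "response.text.done":
            # Deltas are additive, so the full text is their concatenation.
            self.finished_texts.append("".join(self.text_parts))
            self.text_parts = []
        elif kind == "response.audio.delta":
            self.audio.extend(base64.b64decode(event[DELTA_FIELD]))
        elif kind == "response.audio.done":
            self.finished_audio.append(bytes(self.audio))
            self.audio = bytearray()
        return kind  # other event types pass through untouched
```

In a receive loop, each incoming text message would be passed to `handle`. The sketch rebuilds the final text from the deltas rather than reading it from response.text.done, which avoids guessing a second field name.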

Authentication

Prefer the WebSocket request header:
x-api-key: <YOUR_API_KEY>
The initial session.update event can also carry credentials:
{
  "type": "session.update",
  "session": {
    "model": "lilac",
    "source_language": "en-us",
    "target_language": "de-de",
    "output_modalities": ["text", "audio"]
  },
  "auth": {
    "api_key": "<YOUR_API_KEY>"
  }
}
If both the request header and auth object are present, the request header credential is used.
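
A client that may or may not be able to set WebSocket headers can pick one credential channel up front. The sketch below is one way to do that; the helper names `connection_auth` and `first_message` are hypothetical.

```python
import json


def connection_auth(api_key, can_set_headers):
    """Return (headers, auth) so that exactly one channel carries the key."""
    if can_set_headers:
        # Preferred path: the x-api-key request header.
        return [("x-api-key", api_key)], None
    # Fallback: carry the key in the first session.update event.
    return [], {"api_key": api_key}


def first_message(session, auth=None):
    """Serialize the initial session.update, adding auth only when provided."""
    event = {"type": "session.update", "session": session}
    if auth is not None:
        event["auth"] = auth
    return json.dumps(event)
```

Because the header wins when both are present, sending the key in only one place keeps it clear which credential the server actually used.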

Reference

The AsyncAPI spec for this endpoint documents every client and server event (summarized under Messages below). Quick lookup:

Session configuration

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| model | string | No | Defaults to the model query parameter, then lilac. Must be one of lilac, violet, iris, or orchid. |
| source_language | string | Yes | Source language tag, for example en-us. Must be a supported language for the chosen model. |
| target_language | string | Yes | Target language tag, for example de-de. Must be a supported language for the chosen model. |
| output_modalities | string[] | No | Defaults to ["text", "audio"]. |
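
To show how the required and optional fields fit together, here is a minimal sketch of a session.update builder. The function name `make_session_update` is hypothetical. It leaves out optional fields when they are not given, so the server-side defaults described above apply.

```python
import json


def make_session_update(source_language, target_language,
                        model=None, output_modalities=None):
    """Serialize a session.update event; optional fields are omitted if unset."""
    if not source_language or not target_language:
        raise ValueError("source_language and target_language are required")
    session = {
        "source_language": source_language,
        "target_language": target_language,
    }
    if model is not None:
        session["model"] = model  # otherwise: query parameter, then lilac
    if output_modalities is not None:
        session["output_modalities"] = list(output_modalities)
    return json.dumps({"type": "session.update", "session": session})
```

For example, `make_session_update("en-us", "de-de")` produces an event with only the two language fields, leaving the model and modalities to their defaults.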

Supported languages

source_language and target_language accept the BCP-47 tags below (case-insensitive). The set is per model: lilac, violet, and orchid support all 22 languages, while iris supports the 14-language subset marked below. Pick any supported language as the source and any supported language as the target.
| Code | Language | Models |
| --- | --- | --- |
| ar-ae | Arabic (United Arab Emirates) | all |
| ar-eg | Arabic (Egypt) | all |
| ar-sa | Arabic (Saudi Arabia) | all |
| cs-cz | Czech (Czechia) | lilac, violet, orchid |
| de-de | German (Germany) | all |
| en-gb | English (United Kingdom) | all |
| en-us | English (United States) | all |
| es-es | Spanish (Spain) | all |
| fi-fi | Finnish (Finland) | lilac, violet, orchid |
| fr-ca | French (Canada) | all |
| fr-fr | French (France) | all |
| hi-in | Hindi (India) | all |
| ja-jp | Japanese (Japan) | all |
| ko-kr | Korean (Korea) | all |
| no-no | Norwegian | lilac, violet, orchid |
| pl-pl | Polish (Poland) | lilac, violet, orchid |
| pt-br | Portuguese (Brazil) | all |
| sv-se | Swedish (Sweden) | lilac, violet, orchid |
| tr-tr | Turkish (Turkey) | lilac, violet, orchid |
| uk-ua | Ukrainian (Ukraine) | lilac, violet, orchid |
| ur-in | Urdu (India) | lilac, violet, orchid |
| zh-cn | Chinese (Mandarin, Simplified) | all |
iris supports: ar-ae, ar-eg, ar-sa, de-de, en-gb, en-us, es-es, fr-ca, fr-fr, hi-in, ja-jp, ko-kr, pt-br, zh-cn.
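
The per-model language sets lend themselves to a client-side check before a session is opened. The following is a sketch with the codes copied from the list above; `validate_language_pair` is a hypothetical helper name.

```python
# Codes copied from the supported-languages list above.
ALL_LANGUAGES = frozenset({
    "ar-ae", "ar-eg", "ar-sa", "cs-cz", "de-de", "en-gb", "en-us", "es-es",
    "fi-fi", "fr-ca", "fr-fr", "hi-in", "ja-jp", "ko-kr", "no-no", "pl-pl",
    "pt-br", "sv-se", "tr-tr", "uk-ua", "ur-in", "zh-cn",
})
IRIS_LANGUAGES = frozenset({
    "ar-ae", "ar-eg", "ar-sa", "de-de", "en-gb", "en-us", "es-es",
    "fr-ca", "fr-fr", "hi-in", "ja-jp", "ko-kr", "pt-br", "zh-cn",
})
MODEL_LANGUAGES = {
    "lilac": ALL_LANGUAGES,
    "violet": ALL_LANGUAGES,
    "orchid": ALL_LANGUAGES,
    "iris": IRIS_LANGUAGES,
}


def validate_language_pair(model, source_language, target_language):
    """Return the lower-cased (source, target) pair or raise ValueError."""
    try:
        supported = MODEL_LANGUAGES[model]
    except KeyError:
        raise ValueError(f"unknown model {model!r}") from None
    # Tags are case-insensitive, so normalize before comparing.
    pair = (source_language.lower(), target_language.lower())
    for role, code in zip(("source_language", "target_language"), pair):
        if code not in supported:
            raise ValueError(f"{role} {code!r} is not supported by {model}")
    return pair
```

With these sets, `validate_language_pair("iris", "en-us", "cs-cz")` raises, while the same pair passes for lilac, violet, or orchid.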

Client events

| Event | Support |
| --- | --- |
| session.update | Required first message. |
| input_audio_buffer.append | Supported after activation. |
| input_audio_buffer.clear | Recognized but not supported; returns error. |
| input_audio_buffer.commit | Recognized but not supported; returns error. |
| response.cancel | Recognized but not supported; returns error. |

Server events

| Event | Description |
| --- | --- |
| session.created | Session has been authorized, started, and activated. |
| session.updated | Active session configuration. |
| conversation.item.input_audio_transcription.completed | Completed user transcript. |
| response.text.delta | Additive translated text delta. |
| response.text.done | Final translated text. |
| response.audio.delta | Base64-encoded translated audio bytes. |
| response.audio.done | Current translated audio response is complete. |
| error | Unsupported recognized event or billing stop decision. |

Limits

| Limit | Value |
| --- | --- |
| Initial session.update timeout | 10 seconds |
| Maximum client event text size | 1 MiB |
| Maximum decoded audio payload per input_audio_buffer.append | 256 KiB |
| Audio encoding in input_audio_buffer.append.audio | Base64-encoded audio bytes |
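
One way to stay inside these limits is to split raw PCM into fixed-size chunks before encoding. The sketch below assumes PCM16 mono 24 kHz input, as in the Quickstart. The 4800-byte default (about 100 ms of audio) and the even-length rule are illustrative choices, not documented requirements, and `append_events` is a hypothetical name.

```python
import base64
import json

MAX_DECODED_AUDIO_BYTES = 256 * 1024  # per-event limit from the table above
# Assumption: ~100 ms of PCM16 mono 24 kHz per event (24000 samples/s * 2 bytes / 10).
DEFAULT_CHUNK_BYTES = 4800


def append_events(pcm, chunk_bytes=DEFAULT_CHUNK_BYTES):
    """Yield input_audio_buffer.append events (JSON text) covering pcm in order."""
    if not 0 < chunk_bytes <= MAX_DECODED_AUDIO_BYTES:
        raise ValueError(f"chunk_bytes must be in 1..{MAX_DECODED_AUDIO_BYTES}")
    if chunk_bytes % 2:
        # Conservative choice: never split a 16-bit sample across two events.
        raise ValueError("chunk_bytes must be even for PCM16 audio")
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
```

Even a maximum-size 256 KiB chunk encodes to roughly 342 KiB of base64, so a single event stays well under the 1 MiB text limit. Each yielded string is meant to be sent as one text WebSocket message.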

Billing

Active sessions are charged in billing windows and finalized on close, failure, or billing stop. If billing stops a session, the server sends an error event whose error.message is the billing close reason, then ends the realtime loop.
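
A receive loop therefore needs to keep track of error events. This page does not say how to tell a billing stop from an unsupported-event error, so the sketch below simply records every error.message and lets the end of the message stream mark the end of the session. The name `collect_errors` is hypothetical.

```python
import json


def collect_errors(messages, on_event):
    """Pass non-error events to on_event; return all error messages in order."""
    errors = []
    for raw in messages:
        event = json.loads(raw)
        if event.get("type") == "error":
            # error.message carries the billing close reason on a billing stop.
            errors.append((event.get("error") or {}).get("message"))
            continue
        on_event(event)
    return errors
```

If the stream ends right after an error, the last recorded message is the most likely close reason, though that pairing is an inference rather than documented behavior.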

Messages

Session Created
type:object

Sent after authorization, startup, and activation complete.

Session Updated
type:object

Sent immediately after session.created with the active session configuration.

Input Audio Transcription Completed
type:object

Completed user transcript produced by the realtime pipeline.

Response Text Delta
type:object

Incremental translated text. The delta is additive for the current response.

Response Text Done
type:object

Final translated text for the current response.

Response Audio Delta
type:object

Base64-encoded synthesized output audio bytes.

Response Audio Done
type:object

Current translated audio response is complete.

Error
type:object

Structured error for unsupported recognized events and billing stop decisions.

Update Session
type:object

First client event. Authorizes and activates the realtime session.

Append Input Audio
type:object

Append base64-encoded microphone audio bytes to the realtime input stream.

Clear Input Audio Buffer
type:object

Recognized but not supported in this version. The server responds with an error event.

Commit Input Audio Buffer
type:object

Recognized but not supported in this version. The server responds with an error event.

Cancel Response
type:object

Recognized but not supported in this version. The server responds with an error event.