Speech To Speech (Websocket)
Stream microphone audio into a realtime translation session and receive transcripts, translated text, and translated audio.
Beta. The Speech to Speech WebSocket is available for testing, but
session events, configuration, and audio formats may change in
backwards-incompatible ways before GA.
This API runs on realtime-api-server at wss://realtime.camb.ai/v1/realtime, separate from the /apis/live-tts/ws and /apis/transcription/listen WebSocket endpoints.
The model query parameter is optional. Supported values are lilac, violet, iris, and orchid; the default is lilac.
Authenticate with the x-api-key WebSocket request header. If your client cannot set WebSocket headers, send credentials in the first session.update event instead.
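As a minimal sketch of the connection parameters described above, the helpers below build the socket URL with an optional model query parameter and the x-api-key request header. The names realtime_url and auth_headers are illustrative and not part of any CAMB SDK; pass their results to whichever WebSocket client you use.

```python
from urllib.parse import urlencode

REALTIME_URL = "wss://realtime.camb.ai/v1/realtime"
SUPPORTED_MODELS = ("lilac", "violet", "iris", "orchid")


def realtime_url(model=None):
    """Return the socket URL; with no model the server falls back to lilac."""
    if model is None:
        return REALTIME_URL
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    return f"{REALTIME_URL}?{urlencode({'model': model})}"


def auth_headers(api_key):
    """Return the request header that carries the API key."""
    return {"x-api-key": api_key}
```

Validating the model name on the client is optional; it only surfaces a typo before the connection is opened.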
Quickstart
Use the Python SDK: it handles the session lifecycle (including the
session.starting cold-boot wait), normalizes binary and base64 audio
frames, and exposes typed events. Input and output audio are PCM16, mono,
24 kHz. A typical use is to stream a WAV file and write the translated
speech to another WAV.
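For the WAV handling alone, independent of the SDK, here is a minimal sketch using Python's standard wave module. It checks that a file matches the PCM16, mono, 24 kHz format and yields raw chunks for streaming. The helper names and the 100 ms chunk duration are assumptions chosen for illustration, not values required by the API.

```python
import wave

SAMPLE_RATE = 24_000  # Hz
SAMPLE_WIDTH = 2      # bytes per PCM16 sample
CHANNELS = 1          # mono


def read_pcm16_chunks(path, chunk_ms=100):
    """Yield raw PCM16 chunks of about chunk_ms each from a 24 kHz mono WAV."""
    with wave.open(path, "rb") as wav:
        actual = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if actual != (CHANNELS, SAMPLE_WIDTH, SAMPLE_RATE):
            raise ValueError("expected PCM16, mono, 24 kHz input")
        frames_per_chunk = SAMPLE_RATE * chunk_ms // 1000
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk


def write_pcm16_wav(path, pcm_chunks):
    """Write raw PCM16 chunks to a 24 kHz mono WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in pcm_chunks:
            wav.writeframes(chunk)
```

The reader would feed the input side of a session and the writer would collect decoded output audio; resampling is out of scope here, so files in other formats are rejected rather than converted.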
Integration in 4 steps
Open the realtime socket
Connect to wss://realtime.camb.ai/v1/realtime. This endpoint is not under the client.camb.ai/apis namespace used by the other WebSocket API references.
Send `session.update` as the first message
The first WebSocket message must be a JSON session.update event. The server waits up to 10 seconds for it. The server responds with session.created, then session.updated.
Stream input audio
Send microphone audio as base64-encoded bytes in input_audio_buffer.append. Each decoded audio payload can be up to 256 KiB. Only text WebSocket messages are parsed as realtime events.
Authentication
Prefer the x-api-key WebSocket request header. The session.update event can also carry credentials in an auth object. If both the request header and the auth object are present, the request header credential is used.
Reference
The AsyncAPI spec above documents every client and server event. Quick lookup:
Session configuration
| Field | Type | Required | Notes |
|---|---|---|---|
| model | string | No | Defaults to the model query parameter, then lilac. Must be one of lilac, violet, iris, or orchid. |
| source_language | string | Yes | Source language tag, for example en-us. Must be a supported language for the chosen model. |
| target_language | string | Yes | Target language tag, for example de-de. Must be a supported language for the chosen model. |
| output_modalities | string[] | No | Defaults to ["text", "audio"]. |
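The table above can be turned into a small builder for the first message. This is a sketch under assumptions: the "type" discriminator and the nesting of the fields under a "session" key are not confirmed by this page, so check the AsyncAPI spec for the exact envelope. The function name build_session_update is illustrative.

```python
SUPPORTED_MODELS = ("lilac", "violet", "iris", "orchid")


def build_session_update(source_language, target_language,
                         model=None, output_modalities=None):
    """Build a session.update event as a dict ready for json.dumps."""
    if not source_language or not target_language:
        raise ValueError("source_language and target_language are required")
    session = {
        "source_language": source_language,
        "target_language": target_language,
    }
    # Optional fields are omitted so the server applies its own defaults.
    if model is not None:
        if model not in SUPPORTED_MODELS:
            raise ValueError(f"unsupported model: {model!r}")
        session["model"] = model
    if output_modalities is not None:
        session["output_modalities"] = list(output_modalities)
    # Assumption: configuration fields are nested under "session".
    return {"type": "session.update", "session": session}
```

For example, build_session_update("en-us", "de-de", output_modalities=["text"]) requests text-only translation from US English to German and leaves the model to the server default.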
Supported languages
source_language and target_language accept the BCP-47 tags below (case-insensitive). The set is per model: lilac, violet, and orchid support all 22 languages, while iris supports the 14-language subset marked below. Pick any supported language as the source and any supported language as the target.
All supported realtime languages (22)
| Code | Language | Models |
|---|---|---|
| ar-ae | Arabic (United Arab Emirates) | all |
| ar-eg | Arabic (Egypt) | all |
| ar-sa | Arabic (Saudi Arabia) | all |
| cs-cz | Czech (Czechia) | lilac, violet, orchid |
| de-de | German (Germany) | all |
| en-gb | English (United Kingdom) | all |
| en-us | English (United States) | all |
| es-es | Spanish (Spain) | all |
| fi-fi | Finnish (Finland) | lilac, violet, orchid |
| fr-ca | French (Canada) | all |
| fr-fr | French (France) | all |
| hi-in | Hindi (India) | all |
| ja-jp | Japanese (Japan) | all |
| ko-kr | Korean (Korea) | all |
| no-no | Norwegian (Norway) | lilac, violet, orchid |
| pl-pl | Polish (Poland) | lilac, violet, orchid |
| pt-br | Portuguese (Brazil) | all |
| sv-se | Swedish (Sweden) | lilac, violet, orchid |
| tr-tr | Turkish (Turkey) | lilac, violet, orchid |
| uk-ua | Ukrainian (Ukraine) | lilac, violet, orchid |
| ur-in | Urdu (India) | lilac, violet, orchid |
| zh-cn | Chinese (Mandarin, Simplified) | all |
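A client can mirror the table above to catch an unsupported language pair before opening a session. The sketch below hard-codes the table as it stands on this page, so it would need updating whenever the supported set changes; the function name supports_language is illustrative.

```python
IRIS_LANGUAGES = frozenset({
    "ar-ae", "ar-eg", "ar-sa", "de-de", "en-gb", "en-us", "es-es",
    "fr-ca", "fr-fr", "hi-in", "ja-jp", "ko-kr", "pt-br", "zh-cn",
})
# Languages available on lilac, violet, and orchid but not on iris.
NON_IRIS_LANGUAGES = frozenset({
    "cs-cz", "fi-fi", "no-no", "pl-pl", "sv-se", "tr-tr", "uk-ua", "ur-in",
})
ALL_LANGUAGES = IRIS_LANGUAGES | NON_IRIS_LANGUAGES
SUPPORTED_MODELS = ("lilac", "violet", "iris", "orchid")


def supports_language(model, tag):
    """Return True if the model accepts the tag (matched case-insensitively)."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    languages = IRIS_LANGUAGES if model == "iris" else ALL_LANGUAGES
    return tag.lower() in languages
```

Both source_language and target_language should pass this check for the chosen model.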
iris supports: ar-ae, ar-eg, ar-sa, de-de, en-gb, en-us, es-es, fr-ca, fr-fr, hi-in, ja-jp, ko-kr, pt-br, zh-cn.
Client events
| Event | Support |
|---|---|
| session.update | Required first message. |
| input_audio_buffer.append | Supported after activation. |
| input_audio_buffer.clear | Recognized but not supported; returns error. |
| input_audio_buffer.commit | Recognized but not supported; returns error. |
| response.cancel | Recognized but not supported; returns error. |
Server events
| Event | Description |
|---|---|
| session.created | Session has been authorized, started, and activated. |
| session.updated | Active session configuration. |
| conversation.item.input_audio_transcription.completed | Completed user transcript. |
| response.text.delta | Additive translated text delta. |
| response.text.done | Final translated text. |
| response.audio.delta | Base64-encoded translated audio bytes. |
| response.audio.done | Current translated audio response is complete. |
| error | Unsupported recognized event or billing stop decision. |
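One way to consume these events is a small collector that dispatches on the event type. The sketch below is illustrative only: the "type" discriminator and the payload field names "transcript" and "delta" are assumptions not confirmed by this page (only error.message is documented here), so verify them against the AsyncAPI spec. The class name TranslationCollector is hypothetical.

```python
import base64
import json


class TranslationCollector:
    """Accumulate transcripts, translated text, and translated audio."""

    def __init__(self):
        self.transcripts = []
        self.text_parts = []
        self.audio = bytearray()
        self.errors = []

    @property
    def text(self):
        # Text deltas are additive, so concatenation rebuilds the translation.
        return "".join(self.text_parts)

    def handle(self, raw_message):
        """Process one JSON text frame received from the server."""
        event = json.loads(raw_message)
        kind = event.get("type")
        if kind == "conversation.item.input_audio_transcription.completed":
            # Assumption: the transcript is carried in a "transcript" field.
            self.transcripts.append(event.get("transcript", ""))
        elif kind == "response.text.delta":
            # Assumption: the text fragment is carried in a "delta" field.
            self.text_parts.append(event.get("delta", ""))
        elif kind == "response.audio.delta":
            # Assumption: the base64 audio is carried in a "delta" field.
            self.audio.extend(base64.b64decode(event.get("delta", "")))
        elif kind == "error":
            self.errors.append(event.get("error", {}).get("message"))
```

Events the collector does not recognize are ignored, which keeps it tolerant of additions while the API is in beta.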
Limits
| Limit | Value |
|---|---|
| Initial session.update timeout | 10 seconds |
| Maximum client event text size | 1 MiB |
| Maximum decoded audio payload per input_audio_buffer.append | 256 KiB |
| Audio encoding in input_audio_buffer.append.audio | Base64-encoded audio bytes |
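The two size limits interact: a 256 KiB payload grows to 349,528 base64 characters, which still fits well inside the 1 MiB event limit. The sketch below splits a PCM16 buffer into compliant input_audio_buffer.append messages. The "type" discriminator is an assumption about the event envelope, and the function name append_events is illustrative.

```python
import base64
import json

MAX_DECODED_AUDIO_BYTES = 256 * 1024  # per input_audio_buffer.append
MAX_EVENT_TEXT_BYTES = 1024 * 1024    # per client text message


def append_events(pcm, chunk_bytes=MAX_DECODED_AUDIO_BYTES):
    """Yield JSON append messages that respect both size limits."""
    if not 0 < chunk_bytes <= MAX_DECODED_AUDIO_BYTES:
        raise ValueError("chunk_bytes must be between 1 and 256 KiB")
    if chunk_bytes % 2:
        raise ValueError("chunk_bytes must be even to keep PCM16 samples whole")
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        message = json.dumps({
            "type": "input_audio_buffer.append",  # assumed discriminator
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
        if len(message.encode("utf-8")) > MAX_EVENT_TEXT_BYTES:
            raise ValueError("encoded event exceeds the 1 MiB text limit")
        yield message
```

For live microphone input, much smaller chunks are usually preferable for latency; the maximum is used as the default here only to show the limit.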
Billing
Active sessions are charged in billing windows and finalized on close, failure, or billing stop. If billing stops a session, the server sends an error event whose error.message is the billing close reason, then ends the realtime loop.