Live Transcription (WebSocket)

WSS /streaming-transcription/listen

Beta. The live transcription WebSocket is available for testing, but the schema, event shapes, and error codes may change in backwards-incompatible ways before general availability (GA). Pin the SDK versions you test against.

What you get back

After connecting, the server emits a single Ready event, then streams Results messages as your audio is consumed. Each Results message carries the cumulative transcript for the current utterance, so replace your in-progress UI state with it rather than concatenating. The is_final flag distinguishes interim refinements (false) from the frame that finalizes an utterance (true).

After a final frame, the next Results message starts a new utterance from an empty string. If you only ever overwrite your UI with the latest transcript, each new utterance erases the previous one. Show interim frames as a live preview, then commit the text when is_final is true. The server also emits an UtteranceEnd boundary marker around the same point, which reports last_word_end.

Word-level timing (channel.alternatives[0].words, each entry with word, start, end, and confidence) is populated only on the final frame (is_final: true); interim frames carry an empty words array. When the client stops sending audio, the connection closes cleanly with WebSocket close code 1000.
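The preview-then-commit handling described above can be sketched in a few lines. This is a minimal illustration, not part of the CAMB SDK: the TranscriptBuffer class and its method names are hypothetical, and it only assumes that each Results message yields a transcript string and an is_final flag.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Hypothetical helper: committed utterances plus the live interim preview."""

    committed: list[str] = field(default_factory=list)
    preview: str = ""

    def on_results(self, transcript: str, is_final: bool) -> None:
        text = transcript.strip()
        if is_final:
            # Final frame: commit the utterance and clear the preview, because
            # the next Results message starts again from an empty string.
            if text:
                self.committed.append(text)
            self.preview = ""
        else:
            # Interim frame: the transcript is cumulative, so replace, never append.
            self.preview = text

    def render(self) -> str:
        parts = self.committed + ([self.preview] if self.preview else [])
        return " ".join(parts)


buf = TranscriptBuffer()
for transcript, is_final in [
    ("hello", False),
    ("hello world", False),
    ("Hello, world.", True),
    ("how are", False),
]:
    buf.on_results(transcript, is_final)

print(buf.render())  # Hello, world. how are
```

Keeping committed text and the preview in separate fields is what prevents a new utterance from erasing the previous one.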

Supported source languages

Pass any of the codes below as the language query parameter, or omit it to default to en-us. You can also pass the Languages enum name (EN_US) or its numeric ID. The model performs automatic language detection regardless of the value passed.

All 143 supported source languages

Code	Language
`en-us`	English (United States)
`af-za`	Afrikaans (South Africa)
`am-et`	Amharic (Ethiopia)
`ar-ae`	Arabic (United Arab Emirates)
`ar-bh`	Arabic (Bahrain)
`ar-dz`	Arabic (Algeria)
`ar-eg`	Arabic (Egypt)
`ar-iq`	Arabic (Iraq)
`ar-jo`	Arabic (Jordan)
`ar-kw`	Arabic (Kuwait)
`ar-lb`	Arabic (Lebanon)
`ar-ly`	Arabic (Libya)
`ar-ma`	Arabic (Morocco)
`ar-om`	Arabic (Oman)
`ar-qa`	Arabic (Qatar)
`ar-sa`	Arabic (Saudi Arabia)
`ar-sy`	Arabic (Syria)
`ar-tn`	Arabic (Tunisia)
`ar-ye`	Arabic (Yemen)
`az-az`	Azerbaijani (Latin, Azerbaijan)
`bg-bg`	Bulgarian (Bulgaria)
`bn-bd`	Bangla (Bangladesh)
`bn-in`	Bengali (India)
`bs-ba`	Bosnian (Bosnia and Herzegovina)
`ca-es`	Catalan
`cs-cz`	Czech (Czechia)
`cy-gb`	Welsh (United Kingdom)
`da-dk`	Danish (Denmark)
`de-at`	German (Austria)
`de-ch`	German (Switzerland)
`de-de`	German (Germany)
`el-gr`	Greek (Greece)
`en-au`	English (Australia)
`en-ca`	English (Canada)
`en-gb`	English (United Kingdom)
`en-hk`	English (Hong Kong SAR)
`en-ie`	English (Ireland)
`en-in`	English (India)
`en-ke`	English (Kenya)
`en-ng`	English (Nigeria)
`en-nz`	English (New Zealand)
`en-ph`	English (Philippines)
`en-sg`	English (Singapore)
`en-tz`	English (Tanzania)
`en-za`	English (South Africa)
`es-ar`	Spanish (Argentina)
`es-bo`	Spanish (Bolivia)
`es-cl`	Spanish (Chile)
`es-co`	Spanish (Colombia)
`es-cr`	Spanish (Costa Rica)
`es-cu`	Spanish (Cuba)
`es-do`	Spanish (Dominican Republic)
`es-ec`	Spanish (Ecuador)
`es-es`	Spanish (Spain)
`es-gq`	Spanish (Equatorial Guinea)
`es-gt`	Spanish (Guatemala)
`es-hn`	Spanish (Honduras)
`es-mx`	Spanish (Mexico)
`es-ni`	Spanish (Nicaragua)
`es-pa`	Spanish (Panama)
`es-pe`	Spanish (Peru)
`es-pr`	Spanish (Puerto Rico)
`es-py`	Spanish (Paraguay)
`es-sv`	Spanish (El Salvador)
`es-us`	Spanish (United States)
`es-uy`	Spanish (Uruguay)
`es-ve`	Spanish (Venezuela)
`et-ee`	Estonian (Estonia)
`eu-es`	Basque
`fa-ir`	Persian (Iran)
`fi-fi`	Finnish (Finland)
`fr-be`	French (Belgium)
`fr-ca`	French (Canada)
`fr-ch`	French (Switzerland)
`fr-fr`	French (France)
`gl-es`	Galician
`gu-in`	Gujarati (India)
`he-il`	Hebrew (Israel)
`hi-in`	Hindi (India)
`hr-hr`	Croatian (Croatia)
`hu-hu`	Hungarian (Hungary)
`hy-am`	Armenian (Armenia)
`id-id`	Indonesian (Indonesia)
`is-is`	Icelandic (Iceland)
`it-it`	Italian (Italy)
`ja-jp`	Japanese (Japan)
`ka-ge`	Georgian (Georgia)
`kk-kz`	Kazakh (Kazakhstan)
`km-kh`	Khmer (Cambodia)
`kn-in`	Kannada (India)
`ko-kr`	Korean (Korea)
`lo-la`	Lao (Laos)
`lt-lt`	Lithuanian (Lithuania)
`lv-lv`	Latvian (Latvia)
`mk-mk`	Macedonian (North Macedonia)
`ml-in`	Malayalam (India)
`mn-mn`	Mongolian (Mongolia)
`mr-in`	Marathi (India)
`ms-my`	Malay (Malaysia)
`mt-mt`	Maltese (Malta)
`my-mm`	Burmese (Myanmar)
`nb-no`	Norwegian (Bokmål, Norway)
`ne-np`	Nepali (Nepal)
`nl-be`	Dutch (Belgium)
`nl-nl`	Dutch (Netherlands)
`pa-in`	Punjabi (India)
`pl-pl`	Polish (Poland)
`ps-af`	Pashto (Afghanistan)
`pt-br`	Portuguese (Brazil)
`pt-pt`	Portuguese (Portugal)
`ro-ro`	Romanian (Romania)
`ru-ru`	Russian (Russia)
`si-lk`	Sinhala (Sri Lanka)
`sk-sk`	Slovak (Slovakia)
`sl-si`	Slovenian (Slovenia)
`so-so`	Somali (Somalia)
`sq-al`	Albanian (Albania)
`sr-rs`	Serbian (Cyrillic, Serbia)
`su-id`	Sundanese (Indonesia)
`sv-se`	Swedish (Sweden)
`sw-ke`	Swahili (Kenya)
`sw-tz`	Swahili (Tanzania)
`ta-in`	Tamil (India)
`ta-lk`	Tamil (Sri Lanka)
`ta-my`	Tamil (Malaysia)
`ta-sg`	Tamil (Singapore)
`te-in`	Telugu (India)
`th-th`	Thai (Thailand)
`tl-ph`	Tagalog (Philippines)
`tr-tr`	Turkish (Turkey)
`uk-ua`	Ukrainian (Ukraine)
`ur-in`	Urdu (India)
`ur-pk`	Urdu (Pakistan)
`uz-uz`	Uzbek (Latin, Uzbekistan)
`vi-vn`	Vietnamese (Vietnam)
`zh-cn`	Chinese (Mandarin, Simplified)
`zh-cn-henan`	Chinese (Zhongyuan Mandarin Henan, Simplified)
`zh-cn-liaoning`	Chinese (Northeastern Mandarin, Simplified)
`zh-cn-shaanxi`	Chinese (Zhongyuan Mandarin Shaanxi, Simplified)
`zh-cn-shandong`	Chinese (Jilu Mandarin, Simplified)
`zh-cn-sichuan`	Chinese (Southwestern Mandarin, Simplified)
`zh-hk`	Chinese (Cantonese, Traditional)
`zh-tw`	Chinese (Taiwanese Mandarin, Traditional)

Sample Testing Script

The CAMB SDKs wrap the protocol below behind a typed event dispatcher. The snippets here stream a WAV file at real-time pace so the server sees arrival patterns equivalent to a live microphone capture. For mic input, swap FileAudioSource(...) for Microphone(...) (Python) or Microphone.fromBrowser / Microphone.fromNode (TypeScript). See the Live Transcription tutorial for details.

# pip install camb-sdk
import asyncio
import os
import wave

from camb.client import CambAI
from camb.live_transcription import FileAudioSource, ServerMessageType


async def main(audio_path: str = "sample.wav") -> None:
    with wave.open(audio_path, "rb") as wf:
        sample_rate, channels = wf.getframerate(), wf.getnchannels()

    client = CambAI(api_key=os.environ["CAMB_API_KEY"])
    session = await client.live_transcription.connect(
        model="boli-v5",
        language="en-us",
        encoding="linear16",
        sample_rate=sample_rate,
        channels=channels,
    )

    @session.on(ServerMessageType.READY)
    def _(_): print("Ready: streaming audio...")

    @session.on(ServerMessageType.RESULTS)
    def _(msg):
        text = msg.transcript.strip()
        if not text:
            return
        # Interim frames print as they arrive; commit the final on its own
        # line so the next utterance doesn't overwrite it.
        if not msg.is_final:
            print(f"[Interim] {text}\n", end="", flush=True)
        else:
            print(f"\r\033[K{text}\n", end="", flush=True)

    @session.on(ServerMessageType.CLOSED)
    def _(info): print(f"\nClosed: code={info.code}")

    async with session:
        await session.stream_audio(FileAudioSource(audio_path, real_time=True))


asyncio.run(main())

// npm install @camb-ai/sdk ws
import fs from "node:fs";
import { CambClient, ServerMessageType } from "@camb-ai/sdk";

// Minimal WAV reader — returns { sampleRate, channels, bitsPerSample, pcm }.
function readWav(path: string) {
  const buf = fs.readFileSync(path);
  let o = 12, sr = 0, ch = 0, bps = 0, dStart = -1, dLen = 0;
  while (o < buf.length - 8) {
    const id = buf.toString("ascii", o, o + 4);
    const size = buf.readUInt32LE(o + 4);
    if (id === "fmt ") { ch = buf.readUInt16LE(o + 10); sr = buf.readUInt32LE(o + 12); bps = buf.readUInt16LE(o + 22); }
    else if (id === "data") { dStart = o + 8; dLen = size; break; }
    o += 8 + size + (size % 2);  // RIFF chunks are padded to an even byte count
  }
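  // Fail loudly on a malformed file rather than streaming an empty buffer.
  if (dStart < 0 || sr === 0) throw new Error(`${path}: missing "fmt " or "data" chunk`);
  return { sampleRate: sr, channels: ch, bitsPerSample: bps, pcm: buf.subarray(dStart, dStart + dLen) };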
}

const { sampleRate, channels, bitsPerSample, pcm } = readWav("sample.wav");

const client = new CambClient({ apiKey: process.env.CAMB_API_KEY! });
const session = await client.liveTranscription.connect({
  model: "boli-v5",
  language: "en-us",
  encoding: "linear16",
  sampleRate,
  channels,
});

session.on(ServerMessageType.Ready, () => console.log("Ready: streaming audio..."));
session.on(ServerMessageType.Results, (m) => {
  const text = m.transcript.trim();
  if (!text) return;
  // Interim frames print as they arrive; commit the final on its own line.
  if (!m.isFinal) process.stdout.write(`[Interim] ${text}\n`);
  else process.stdout.write(`\r\x1b[K${text}\n`);
});
session.on(ServerMessageType.Closed, (i) => console.log(`\nClosed: code=${i.code}`));

// Real-time-paced send so the server sees live arrival patterns.
const frameBytes = channels * (bitsPerSample / 8);
const bytesPerSec = sampleRate * frameBytes;
const chunk = Math.floor(sampleRate / 10) * frameBytes;  // ~100 ms, whole frames only
const t0 = Date.now();
let sent = 0;
for (let i = 0; i < pcm.length; i += chunk) {
  await session.sendAudio(pcm.subarray(i, i + chunk));
  sent += Math.min(chunk, pcm.length - i);  // the last chunk may be shorter
  const drift = (sent / bytesPerSec) * 1000 - (Date.now() - t0);
  if (drift > 0) await new Promise((r) => setTimeout(r, drift));
}
await session.close();
await session.waitUntilClosed();
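The pacing arithmetic in the TypeScript snippet can also be sketched in Python with only the standard library. This is an illustration under stated assumptions, not SDK code: chunk_size_bytes and iter_pcm_chunks are hypothetical helper names, and the 100 ms chunk length simply mirrors the snippet above.

```python
import io
import wave
from typing import BinaryIO, Iterator, Union


def chunk_size_bytes(sample_rate: int, channels: int, sample_width: int,
                     chunk_ms: int = 100) -> int:
    """Bytes of PCM in one chunk, rounded down to a whole number of frames."""
    frame_bytes = channels * sample_width
    frames_per_chunk = max(1, sample_rate * chunk_ms // 1000)
    return frames_per_chunk * frame_bytes


def iter_pcm_chunks(source: Union[str, BinaryIO], chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield frame-aligned PCM chunks from a WAV file path or file-like object."""
    with wave.open(source, "rb") as wf:
        frames_per_chunk = max(1, wf.getframerate() * chunk_ms // 1000)
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data


# Demo: one second of 16 kHz mono 16-bit silence, written to memory.
wav_bytes = io.BytesIO()
with wave.open(wav_bytes, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)
wav_bytes.seek(0)

chunks = list(iter_pcm_chunks(wav_bytes))
print(len(chunks), len(chunks[0]))  # 10 3200
```

To pace the send at real time, sleep for the chunk duration (here 0.1 s) between chunks, or correct for drift against a start timestamp as the TypeScript loop does.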

Timeout

The Live Transcription API closes each connection after an internal timeout of 1 hour. For longer sessions, add retry logic that opens a new connection and resumes streaming.
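One way to structure such retries is a small wrapper with exponential backoff. This is a sketch under stated assumptions: run_with_reconnect is a hypothetical helper, run_session stands in for your own coroutine that connects and streams, and the choice of ConnectionError as the signal for a dropped connection is an assumption rather than documented SDK behavior.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_reconnect(
    run_session: Callable[[], Awaitable[None]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
) -> int:
    """Re-run run_session after a dropped connection; return the attempts used.

    max_attempts must be at least 1.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            await run_session()
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

A caller would pass a coroutine function that opens a fresh session on each invocation, for example `asyncio.run(run_with_reconnect(stream_once))`, where stream_once is your own function.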

Raw protocol (no SDK)

If you cannot use the SDKs, the wire protocol is small enough to drive directly. Open wss://realtime.camb.ai/streaming-transcription/listen?... with the x-api-key header, send binary PCM frames matching the encoding / sample_rate / channels query string, optionally send {"type":"KeepAlive"} during silence, and finish with {"type":"CloseStream"}. The server returns {"type":"Ready"} once, then {"type":"Results", ...} frames carrying the cumulative transcript ("is_final": false for interim refinements, true on the frame that finalizes each utterance), and closes with WebSocket code 1000 on a clean shutdown.
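The pieces a raw client has to assemble can be sketched with the standard library alone. The helper names below (build_listen_url, auth_headers) are hypothetical, and the query string covers only the parameters named on this page (language, encoding, sample_rate, channels); the elided part of the URL may accept others. Opening the socket and sending frames is left to whichever WebSocket client you use.

```python
import json
from urllib.parse import urlencode

BASE_URL = "wss://realtime.camb.ai/streaming-transcription/listen"


def build_listen_url(encoding: str, sample_rate: int, channels: int,
                     language: str = "en-us") -> str:
    """Build the connection URL from the query parameters named in this page."""
    query = urlencode({
        "language": language,
        "encoding": encoding,
        "sample_rate": sample_rate,
        "channels": channels,
    })
    return f"{BASE_URL}?{query}"


def auth_headers(api_key: str) -> dict[str, str]:
    """Headers to attach to the WebSocket handshake."""
    return {"x-api-key": api_key}


# Control messages are sent as text frames; audio is sent as binary frames.
KEEP_ALIVE = json.dumps({"type": "KeepAlive"})
CLOSE_STREAM = json.dumps({"type": "CloseStream"})

print(build_listen_url("linear16", 16000, 1))
# wss://realtime.camb.ai/streaming-transcription/listen?language=en-us&encoding=linear16&sample_rate=16000&channels=1
```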

Messages

Server Ready

type:object

Emitted once after the upstream transcription session is established.

type

type:string

required

Ready

Transcription Result

type:object

Cumulative transcript for the current utterance. Each event carries the full transcript-so-far, so update your UI by replacing the previous interim, not by concatenating. Interim refinements have is_final set to false; the frame that finalizes an utterance has is_final set to true. When input stops, the connection closes cleanly with WebSocket code 1000.

type

type:string

required

Results

is_final

type:boolean

false for interim refinements; true on the frame that finalizes an utterance.

start

type:number

Seconds since session start at which this segment begins.

duration

type:number

Segment duration in seconds. May be 0 for interim deltas.

channel

type:object

required

alternatives

type:array

transcript

type:string

confidence

type:number

words

type:array

word

type:string

start

type:number

end

type:number

confidence

type:number

metadata

type:object

request_id

type:string

model_uuid

type:string

model_info

type:object

name

type:string

version

type:string
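A raw client might pull the useful fields out of a Results message along these lines. This is a sketch, not SDK code: parse_results is a hypothetical helper, the sample payload is invented for illustration, and the nesting (channel, then alternatives, then transcript and words) is assumed from the channel.alternatives[0].words path mentioned earlier.

```python
import json
from typing import Any


def parse_results(raw: str) -> dict[str, Any]:
    """Extract is_final, transcript, and word timings from a Results frame."""
    msg = json.loads(raw)
    if msg.get("type") != "Results":
        raise ValueError(f"not a Results message: {msg.get('type')!r}")
    # Fall back to an empty alternative if the list is missing or empty.
    alternatives = msg.get("channel", {}).get("alternatives") or [{}]
    best = alternatives[0]
    return {
        "is_final": bool(msg.get("is_final", False)),
        "transcript": best.get("transcript", ""),
        "words": [(w["word"], w["start"], w["end"]) for w in best.get("words", [])],
    }


# Invented payload, shaped after the fields listed in this section.
sample = json.dumps({
    "type": "Results",
    "is_final": True,
    "start": 0.0,
    "duration": 1.2,
    "channel": {"alternatives": [{
        "transcript": "hello world",
        "confidence": 0.98,
        "words": [
            {"word": "hello", "start": 0.1, "end": 0.5, "confidence": 0.99},
            {"word": "world", "start": 0.6, "end": 1.1, "confidence": 0.97},
        ],
    }]},
})

parsed = parse_results(sample)
print(parsed["transcript"], parsed["words"][0])  # hello world ('hello', 0.1, 0.5)
```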

Server Error

type:object

Emitted when the server cannot continue the session (for example: invalid query parameters, upstream model failure, or unsupported audio encoding). After emitting Error, the server closes the connection with a non-1000 WebSocket close code.

type

type:string

required

Error

code

type:string

Stable, machine-readable error identifier (for example invalid_encoding, model_unavailable).

message

type:string

required

Human-readable explanation safe to surface to end-users.
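Because every server message carries a type field, a raw client can branch on it and turn Error frames into exceptions. The sketch below is illustrative only: TranscriptionError and handle_server_message are hypothetical names, and the sample error payload is invented using the example code given above.

```python
import json
from typing import Optional


class TranscriptionError(Exception):
    """Raised when the server sends an Error message."""

    def __init__(self, code: Optional[str], message: str) -> None:
        super().__init__(f"{code}: {message}" if code else message)
        self.code = code
        self.message = message


def handle_server_message(raw: str) -> str:
    """Return the message type, raising TranscriptionError for Error frames."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "Error":
        # code is optional in the schema; message is required.
        raise TranscriptionError(msg.get("code"), msg["message"])
    return kind


print(handle_server_message('{"type": "Ready"}'))  # Ready

try:
    handle_server_message(
        '{"type": "Error", "code": "invalid_encoding", "message": "Unsupported audio encoding."}'
    )
except TranscriptionError as err:
    print(err.code)  # invalid_encoding
```

Branching on the code field rather than the message text keeps client logic stable, since the code is described as the machine-readable identifier.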

Binary Audio Frame

type:string

Raw audio bytes in the encoding declared on the query string.

Close Stream

type:object

Signal end of input. The server flushes and closes the connection.

type

type:string

required

CloseStream

Keep Alive

type:object

Optional heartbeat. The server accepts and ignores the contents.

type

type:string

required

KeepAlive

Last modified on June 30, 2026
