Live Transcription

Beta. Live transcription is available for testing in the Python and TypeScript SDKs. Event shapes, configuration options, and error semantics may change in backwards-incompatible ways before GA. Pin to the SDK versions you test against.

Overview

Stream raw audio bytes to CAMB over a single WebSocket and receive cumulative transcripts in real time. The session exposes a typed event dispatcher (Ready, Results, Error, Closed), a built-in microphone helper, and a forward-compatible onAny subscription for events the server may add in future releases. Key features:

Interim and final results — interim Results (is_final: false) carry the cumulative transcript so far for the current utterance; render them as a live preview. A final Results (is_final: true) closes the utterance; commit it so each utterance is preserved instead of being overwritten by the next one.
Word-level timing — the final Results (is_final: true) includes per-word start/end timestamps and confidence. Interim frames carry an empty words array.
Typed events — the same typed event surface in both SDKs, with per-event typed payloads.
Easy extensibility — new server event = one enum entry + one payload type + one parser entry. Nothing else changes.
Microphone helpers — sounddevice in Python, AudioWorklet in the browser, node-record-lpcm16 in Node.

How segments and cumulative updates work

Speech arrives as a series of utterances. Within one utterance the server streams cumulative interim Results (is_final: false) — each adds whole words to the transcript so far (the model emits complete words, never partial-word fragments). After a short pause in the audio the server finalizes the utterance with a Results whose is_final is true, carrying the complete utterance; the next Results then starts a brand-new utterance from an empty string.

                             utterance 1                            utterance 2
Results     →  R            R            R*           R            R            R*
is_final    →  false        false        true         false        false        true
transcript  →  "good"       "good day"   "good day"   "see"        "see you"    "see you"
                                         └ commit ┘                             └ commit ┘

R* marks the final frame. Because the transcript resets on every new utterance, replacing your UI with the latest transcript and nothing else makes each utterance erase the previous one — the bug you see as text “rewriting the same line” after a pause. Instead, show the interim frames as a live preview, then commit the text when is_final is true (print it on its own line, or append it to a list) so finished utterances are preserved. When the client stops sending audio, the connection closes cleanly with WebSocket close code 1000.
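The commit-on-final pattern can be sketched without the SDK. The snippet below is a minimal illustration, not SDK code: TranscriptBuffer and its methods are hypothetical names, and the frames are plain (transcript, is_final) pairs replaying the diagram above.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Keeps finalized utterances separate from the in-progress preview."""

    committed: list[str] = field(default_factory=list)
    preview: str = ""

    def update(self, transcript: str, is_final: bool) -> None:
        text = transcript.strip()
        if is_final:
            # Final frame: commit the utterance and clear the preview,
            # because the next Results starts again from an empty string.
            if text:
                self.committed.append(text)
            self.preview = ""
        else:
            # Interim frame: the transcript is cumulative, so replace
            # the preview rather than appending to it.
            self.preview = text

    def render(self) -> str:
        lines = list(self.committed)
        if self.preview:
            lines.append(self.preview)
        return "\n".join(lines)


# Replay the frame sequence from the diagram.
frames = [
    ("good", False), ("good day", False), ("good day", True),
    ("see", False), ("see you", False), ("see you", True),
]
buf = TranscriptBuffer()
for transcript, is_final in frames:
    buf.update(transcript, is_final)
print(buf.render())
```

In a Results handler you would call buf.update(msg.transcript, msg.is_final) and redraw from buf.render(), so a new utterance never erases a finished one.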

Live Transcription SDK vs Async Transcription SDK

	Live Transcription	Async Transcription
Transport	WebSocket (`/transcription/listen`)	REST (`/transcription/transcribe`)
Input	Streamed PCM	File URL or upload
Latency	~hundreds of ms per partial	Job-based polling
Use when	Live captioning, voice UX, agents	Recordings, batch jobs

Prerequisites

Create an account

Get your API key

Go to Settings → API Keys in Studio and copy your key. See Authentication for details.

Install the SDK

pip install camb-sdk

npm install @camb-ai/sdk

Skip this step if you’re using the direct API.

Set your API key to use in your code

export CAMB_API_KEY="your_api_key_here"

Get Started

Create an API Key

Generate a key at CAMB.AI Studio and export it as CAMB_API_KEY for the snippets below.

Install

pip install camb-sdk

npm install @camb-ai/sdk node-record-lpcm16

npm install @camb-ai/sdk

The Python SDK ships sounddevice as a regular dependency, so the Microphone helper works out of the box. In Node the microphone adapter additionally requires the host sox binary. The browser adapter needs no extra packages — it uses getUserMedia and an inlined AudioWorklet.

Quickstart

import asyncio
import os

from camb.client import CambAI
from camb.live_transcription import Microphone, ServerMessageType

async def main():
    client = CambAI(api_key=os.environ["CAMB_API_KEY"])
    session = await client.live_transcription.connect(
        model="boli-v5",
        language="en-us",
        sample_rate=16000,
    )

    @session.on(ServerMessageType.RESULTS)
    def _(msg):
        text = msg.transcript.strip()
        if not text:
            return
        # Interim frames refine the current utterance; print them as they
        # arrive. On is_final the utterance is done — commit it on its own
        # line so the next utterance starts fresh instead of overwriting it.
        if not msg.is_final:
            print(f"[Interim] {text}\n", end="", flush=True)
        else:
            print(f"\r\033[K{text}\n", end="", flush=True)

    @session.on(ServerMessageType.CLOSED)
    def _(info):
        print(f"\nClosed code={info.code}")

    async with session:
        mic = Microphone(sample_rate=16000, chunk_size=1600)
        await session.stream_audio(mic)

asyncio.run(main())

import { CambClient, Microphone, ServerMessageType } from "@camb-ai/sdk";

const client = new CambClient({ apiKey: process.env.CAMB_API_KEY });

const session = await client.liveTranscription.connect({
    model: "boli-v5",
    language: "en-us",
    sampleRate: 16000,
});

session.on(ServerMessageType.Results, (msg) => {
    const text = msg.transcript.trim();
    if (!text) return;
    // Interim frames print as they arrive; commit the final on its own
    // line so the next utterance starts fresh instead of overwriting it.
    if (!msg.isFinal) {
        process.stdout.write(`[Interim] ${text}\n`);
    } else {
        process.stdout.write(`\r\x1b[K${text}\n`);
    }
});

const mic = Microphone.fromNode({ sampleRate: 16000 });
await mic.start();
await session.pipe(mic);

import { CambClient, Microphone, ServerMessageType } from "@camb-ai/sdk";

// API_KEY is a placeholder: supply your key from your app's own configuration.
const client = new CambClient({ apiKey: API_KEY });

const session = await client.liveTranscription.connect({
    model: "boli-v5",
    language: "en-us",
});

// Keep finalized utterances; show the in-progress one live beneath them.
let committed = "";
session.on(ServerMessageType.Results, (msg) => {
    const text = msg.transcript.trim();
    const caption = document.getElementById("caption");
    if (msg.isFinal) {
        committed += (committed ? "\n" : "") + text;
        caption.innerText = committed;
    } else {
        caption.innerText = committed ? `${committed}\n${text}` : text;
    }
});

const mic = await Microphone.fromBrowser({ sampleRate: 16000 });
await mic.start();
await session.pipe(mic);

Events and Payloads

Supported events

Both SDKs expose the typed events below through a single ServerMessageType enum. Source tells you who emits each one. UtteranceEnd is a raw wire event with no dedicated enum member — it arrives through the onAny catch-all.

Event	Wire `type`	Source	Notes
`Ready`	`"Ready"`	Server	Fires once, immediately after the WebSocket upgrade.
`Results`	`"Results"`	Server	Fires many times. Carries the cumulative transcript for the current utterance. `is_final` is `false` for interim refinements and `true` on the frame that finalizes an utterance; the next `Results` after a final begins a new utterance. Per-word timing and confidence (`words[]`) are populated only on the final frame (`is_final: true`); interim frames carry an empty `words` array. Replace your in-progress line with `transcript`, then commit it when `is_final` is `true`.
`UtteranceEnd`	`"UtteranceEnd"`	Server (VAD)	Boundary marker emitted around finals. No dedicated typed event — delivered via `on_any` / `onAny`. Prefer `is_final` on `Results` as your commit signal; use this only if you need the raw boundary.
`Error`	`"Error"`	Server or SDK	Server-side: protocol errors (invalid encoding, model failure, etc.). SDK-side: handler exceptions and transport-level failures are re-emitted through the same channel so applications have one place to look.
`Closed`	`"Closed"`	SDK (synthetic)	Emitted by the SDK when the underlying WebSocket closes. Carries the close `code` and `reason` (e.g. `1000` for a clean `CloseStream`, `1008` for an auth failure).

Catch-all subscription. If a future server release adds a new event type before the SDK does, the dispatcher still delivers it to any handler registered via session.on_any(...) (Python) / session.onAny(...) (TypeScript) with the raw payload. Applications stay forward-compatible without forking the SDK.

How events work

The session reads JSON frames off the WebSocket, looks up the wire type in a parser registry, builds the typed payload, and fans out to every handler registered for that event. Unknown event types are still delivered through onAny so applications keep working when the server adds new messages.
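The flow described above (read a frame, look up the parser, build the payload, fan out) can be illustrated with a small standalone dispatcher. This is a sketch under assumptions, not the SDK's implementation: the Dispatcher class, the PARSERS table, and the dictionary payloads are all hypothetical stand-ins.

```python
import json
from collections import defaultdict
from typing import Any, Callable

# Hypothetical parser registry: wire "type" -> function that builds a payload.
PARSERS: dict[str, Callable[[dict], Any]] = {
    "Ready": lambda raw: {},
    "Error": lambda raw: {"code": raw.get("code"), "message": raw.get("message")},
}


class Dispatcher:
    def __init__(self) -> None:
        self._handlers = defaultdict(list)
        self._any_handlers = []

    def on(self, event_type: str, handler: Callable[[Any], None]) -> None:
        self._handlers[event_type].append(handler)

    def on_any(self, handler: Callable[[str, Any], None]) -> None:
        self._any_handlers.append(handler)

    def dispatch(self, frame: str) -> None:
        raw = json.loads(frame)
        event_type = raw.get("type", "")
        parser = PARSERS.get(event_type)
        # Unknown wire types have no parser, so they keep the raw payload.
        payload = parser(raw) if parser else raw
        for handler in self._handlers.get(event_type, []):
            handler(payload)
        # Catch-all handlers see every event, known or not.
        for handler in self._any_handlers:
            handler(event_type, payload)


seen = []
d = Dispatcher()
d.on("Error", lambda err: seen.append(("typed", err["code"])))
d.on_any(lambda event_type, payload: seen.append(("any", event_type)))
d.dispatch('{"type": "Error", "code": "invalid_encoding", "message": "..."}')
d.dispatch('{"type": "UtteranceEnd"}')
print(seen)
```

The second frame has no registered parser, yet the catch-all handler still receives it, which is the forward-compatibility property the paragraph describes.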

Event payloads

{ "type": "Ready" }

{
  "type": "Results",
  "is_final": true,
  "start": 0.0,
  "duration": 1.24,
  "channel": {
    "alternatives": [{
      "transcript": "hello world",
      "confidence": 0.92,
      "words": [
        { "word": "hello", "start": 0.0, "end": 0.42, "confidence": 0.95 },
        { "word": "world", "start": 0.42, "end": 1.10, "confidence": 0.89 }
      ]
    }]
  },
  "metadata": {
    "request_id": "...",
    "model_info": { "name": "boli-v5", "version": "1.2.0" }
  }
}

{
  "type": "Results",
  "is_final": false,
  "channel": {
    "alternatives": [{ "transcript": "hello", "words": [] }]
  }
}

{ "type": "Error", "code": "invalid_encoding", "message": "Unsupported encoding 'opus'" }

{ "type": "Closed", "code": 1000, "reason": "" }

A final Results (is_final: true) carries the same fields as an interim one, with one difference: per-word timing is only populated on the final frame. Interim frames carry an empty words array ("words": []); the final frame fills in each word’s start, end, and confidence. There is no separate Final frame on the wire — finals arrive through the same Results handler — so branch on msg.is_final (Python) / msg.isFinal (TypeScript) to decide when to read word timing and commit an utterance.
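If you consume the wire protocol directly rather than through the SDK's typed payloads, a Results frame can be unpacked as sketched below. The field names come from the payloads shown above; the Word dataclass and parse_results helper are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Word:
    word: str
    start: float
    end: float
    confidence: float


def parse_results(frame: str):
    """Return (is_final, transcript, words) from a raw Results frame."""
    raw = json.loads(frame)
    if raw.get("type") != "Results":
        raise ValueError(f"not a Results frame: {raw.get('type')!r}")
    alternatives = raw.get("channel", {}).get("alternatives", [])
    best = alternatives[0] if alternatives else {}
    # Interim frames carry "words": [], so this list is empty until the final.
    words = [
        Word(w["word"], w["start"], w["end"], w["confidence"])
        for w in best.get("words", [])
    ]
    return bool(raw.get("is_final")), best.get("transcript", ""), words


final_frame = json.dumps({
    "type": "Results",
    "is_final": True,
    "channel": {"alternatives": [{
        "transcript": "hello world",
        "confidence": 0.92,
        "words": [
            {"word": "hello", "start": 0.0, "end": 0.42, "confidence": 0.95},
            {"word": "world", "start": 0.42, "end": 1.10, "confidence": 0.89},
        ],
    }]},
})

is_final, transcript, words = parse_results(final_frame)
if is_final:
    # Word timing is only worth reading on the final frame.
    for w in words:
        print(f"{w.word}: {w.start:.2f}s to {w.end:.2f}s")
```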

Subscribing to events

@session.on(ServerMessageType.RESULTS)
def on_results(msg):
    # msg.is_final → interim refinement (False) vs finalized utterance (True)
    print(msg.is_final, msg.transcript, msg.words)

@session.on(ServerMessageType.ERROR)
def on_error(err):
    print(err.code, err.message)

# Forward-compat: receive every event including new ones added later.
@session.on_any
def on_any(event_type, payload):
    print(event_type, payload)

session.on(ServerMessageType.Results, (msg) => {
    // msg.isFinal → interim refinement (false) vs finalized utterance (true)
    console.log(msg.isFinal, msg.transcript, msg.words);
});

session.on(ServerMessageType.Error, (err) => {
    console.error(err.code, err.message);
});

session.onAny((event, payload) => {
    console.log(event, payload);
});

Adding a custom event

If you fork the SDK or wrap it for an internal use-case, adding a new server event is a three-step change in either language:

Add a new member to ServerMessageType.
Define the payload (a Pydantic model in Python, an interface in TypeScript).
Register a parser in PARSER_REGISTRY.

No dispatch code outside the registry needs to change.
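The three steps might look roughly like this in Python. Everything here is a stand-in written for illustration: the enum is a local copy rather than the SDK's ServerMessageType, SpeechStarted is a made-up event, a dataclass replaces the Pydantic model to keep the sketch dependency-free, and the real PARSER_REGISTRY may be keyed or typed differently.

```python
from dataclasses import dataclass
from enum import Enum


# Step 1: add a member to the event enum (SPEECH_STARTED is a made-up event).
class ServerMessageType(str, Enum):
    RESULTS = "Results"
    SPEECH_STARTED = "SpeechStarted"


# Step 2: define the payload type.
@dataclass
class SpeechStarted:
    timestamp: float


# Step 3: register a parser keyed by the wire type.
PARSER_REGISTRY = {
    ServerMessageType.SPEECH_STARTED.value: lambda raw: SpeechStarted(
        timestamp=float(raw["timestamp"])
    ),
}

payload = PARSER_REGISTRY["SpeechStarted"]({"type": "SpeechStarted", "timestamp": 1.5})
print(payload)
```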

Basic Configuration

Every option below is optional. Omit any to inherit the server default documented in api-reference/websockets/asyncapi.json.

Option	Default	Description
`model`	`boli-v5`	Transcription model. `boli-v5` and `boli-v5-transcribe` are supported.
`language`	`en-us`	Source language hint. Accepts BCP-47 codes (`en-us`, `pt-br`), the Languages enum name (`EN_US`), or its numeric ID. The model also auto-detects.
`encoding`	`linear16`	Audio encoding of the bytes you send. One of `linear16`, `linear32`, `alaw`, `mulaw`.
`sample_rate` / `sampleRate`	`16000`	Sample rate of the bytes you send. The server resamples internally.
`channels`	`1`	Channel count of the bytes you send. Multi-channel input is downmixed to mono.
`base_url` / `baseUrl`	`wss://client.camb.ai/apis`	Override the WebSocket base URL.
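When you feed the session your own audio instead of using a microphone helper, encoding, sample rate, and channel count together determine how many bytes one chunk holds. The helper below is a rough sketch: chunk_bytes is a hypothetical name, and the byte widths assume linear16 and linear32 mean 16-bit and 32-bit PCM, with alaw and mulaw as 8-bit companded formats.

```python
# Bytes per sample for each supported encoding (assumed widths, see above).
BYTES_PER_SAMPLE = {"linear16": 2, "linear32": 4, "alaw": 1, "mulaw": 1}


def chunk_bytes(encoding: str, sample_rate: int, channels: int, chunk_ms: int) -> int:
    """Size in bytes of one audio chunk covering chunk_ms milliseconds."""
    if encoding not in BYTES_PER_SAMPLE:
        raise ValueError(f"unsupported encoding: {encoding}")
    frames = sample_rate * chunk_ms // 1000  # sample frames per chunk
    return frames * channels * BYTES_PER_SAMPLE[encoding]


print(chunk_bytes("linear16", 16000, 1, 100))  # 3200
print(chunk_bytes("linear16", 48000, 1, 100))  # 9600
print(chunk_bytes("mulaw", 8000, 1, 20))       # 160
```

With the defaults (linear16, 16000 Hz, mono), a 100 ms chunk is 1600 sample frames, or 3200 bytes.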

Basic configuration example

session = await client.live_transcription.connect(
    model="boli-v5-transcribe",
    language="pt-br",
    encoding="linear16",
    sample_rate=48000,
    channels=1,
)

const session = await client.liveTranscription.connect({
    model: "boli-v5-transcribe",
    language: "pt-br",
    encoding: "linear16",
    sampleRate: 48000,
    channels: 1,
});

Advanced Configuration

KeepAlive

Some intermediaries (load balancers, browser proxies) close idle WebSocket connections after a few seconds of silence. If your audio pipeline can be bursty, send a KeepAlive frame between bursts.

await session.keep_alive()

await session.keepAlive();
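One way to send KeepAlive only between bursts is a small idle timer running next to your audio loop. This is a sketch under assumptions: KeepAliveTimer is a hypothetical helper, the 3-second idle threshold is a guess at "a few seconds", and the only SDK call it relies on is the keep_alive() coroutine shown above, passed in as a callable.

```python
import asyncio
import time


class KeepAliveTimer:
    """Calls keep_alive() whenever no audio has been sent for idle_seconds."""

    def __init__(self, keep_alive, idle_seconds: float = 3.0):
        self._keep_alive = keep_alive
        self._idle_seconds = idle_seconds
        self._last_activity = time.monotonic()

    def mark_audio_sent(self) -> None:
        # Call this after every audio chunk you send.
        self._last_activity = time.monotonic()

    async def run(self) -> None:
        # Runs until cancelled; polls at half the idle threshold.
        while True:
            await asyncio.sleep(self._idle_seconds / 2)
            if time.monotonic() - self._last_activity >= self._idle_seconds:
                await self._keep_alive()
                self._last_activity = time.monotonic()


# Usage sketch (inside your own coroutine):
#   timer = KeepAliveTimer(session.keep_alive)
#   task = asyncio.create_task(timer.run())
#   ... send audio, calling timer.mark_audio_sent() after each chunk ...
#   task.cancel()
```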

CloseStream

session.close() (same name in both SDKs) sends {"type": "CloseStream"} and waits for the server’s clean 1000 close. Prefer this over dropping the connection, because it ensures the server flushes any pending transcript.

Bring-your-own transport

The Python SDK’s connect() accepts a transport argument implementing the Transport protocol. The TypeScript client accepts a transport: () => Transport factory. Use this to inject a mock during testing or to plug in a custom WebSocket implementation.

Microphone Helpers

Python — sounddevice

from camb.live_transcription import Microphone

mic = Microphone(sample_rate=16000, chunk_size=1600, device=None)
with mic:
    chunk = mic.read()  # blocking

sounddevice ships with camb-sdk, so no extra install step is needed. On Linux you may need to install PortAudio system libraries (e.g. apt install libportaudio2) — sounddevice’s docs cover platform prerequisites.

TypeScript — browser

const mic = await Microphone.fromBrowser({
    sampleRate: 16000,
    chunkMs: 100,
});
await mic.start();

Internally the helper requests the platform sample rate via getUserMedia, then downsamples to the requested rate inside an AudioWorklet so the server always sees little-endian 16-bit PCM (PCM16 LE).
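The target format is the same whichever language produces it. As a language-neutral illustration (written in Python, and not the helper's actual code), the sketch below converts float samples in the range -1.0 to 1.0 into little-endian 16-bit PCM bytes; floats_to_pcm16le is a hypothetical name, and resampling is left out.

```python
import struct


def floats_to_pcm16le(samples):
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range samples
        ints.append(int(round(s * 32767)))
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


pcm = floats_to_pcm16le([0.0, 1.0, -1.0])
print(len(pcm), pcm.hex())
```

Each sample becomes two bytes, so three samples yield six bytes.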

TypeScript — Node

const mic = Microphone.fromNode({ sampleRate: 16000 });
await mic.start();

The Node adapter is built on node-record-lpcm16, declared in package.json as an optionalDependencies entry. The host machine also needs the sox binary on PATH.

Error Handling and Close Codes

Server errors

Whenever the server cannot continue, it emits an Error frame and closes with a non-1000 code:

{ "type": "Error", "code": "invalid_encoding", "message": "..." }

Transport errors

Connection-level failures (DNS, TLS, mid-stream drops) are surfaced through the same Error event with code: "transport_error" (TypeScript) or code: "handler_exception" (Python), keeping a single observable channel for application code.

Close codes

Code	Meaning
`1000`	Normal close. Either side sent `CloseStream` / closed cleanly.
`1006`	Abnormal close (transport dropped without a frame).
`1008`	Policy violation. Reported for auth failures.
`4000+`	Application-specific. The server may use these for auth failures and quota errors.
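A Closed handler can use the code to decide whether reconnecting is worthwhile. The policy below is an assumption made for illustration, not documented SDK behavior: should_reconnect is a hypothetical helper that treats transport drops as transient and auth or application-level closes as permanent.

```python
def should_reconnect(close_code: int) -> bool:
    """Decide whether a closed session is worth retrying (assumed policy)."""
    if close_code == 1000:
        return False  # normal close: the stream ended on purpose
    if close_code == 1006:
        return True  # transport dropped without a close frame: likely transient
    if close_code == 1008 or close_code >= 4000:
        return False  # auth, quota, or other application-level rejection
    return True  # anything else: treat as transient


for code in (1000, 1006, 1008, 4001):
    print(code, should_reconnect(code))
```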

Timeout

The Live Transcription API has an internal timeout of 1 hour. For sessions that need to run longer, add retry logic that opens a new connection when the current one ends.
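A reconnect loop for long-running sessions could be structured as follows. This is a sketch under assumptions: stream_forever, open_and_stream, and should_stop are hypothetical names, open_and_stream stands for your own coroutine that connects, streams audio, and returns when the connection closes, and the built-in ConnectionError stands in for whatever exception your transport raises.

```python
import asyncio


async def stream_forever(open_and_stream, should_stop,
                         base_delay: float = 1.0, max_delay: float = 30.0) -> int:
    """Keep opening sessions until should_stop() is true; return the session count."""
    delay = base_delay
    sessions = 0
    while not should_stop():
        try:
            await open_and_stream()  # returns when the connection closes
            delay = base_delay       # a session that ran resets the backoff
        except ConnectionError:
            # Failed session: wait, then retry with exponential backoff.
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)
        sessions += 1
    return sessions
```

Resetting the delay after a session that ran to completion means a routine timeout reconnects immediately, while repeated connection failures back off up to max_delay.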

More Information

/transcription/listen WebSocket reference — the underlying wire protocol.
Python SDK · TypeScript SDK — the full SDK guides.
Source: cambai-python-sdk · cambai-typescript-sdk.
