Beta. Live transcription is available for testing in the Python and TypeScript SDKs. Event shapes, configuration options, and error semantics may change in backwards-incompatible ways before GA. Pin to the SDK versions you test against.

Overview

Stream raw audio bytes to CAMB over a single WebSocket and receive cumulative transcripts in real time. The session exposes a typed event dispatcher (Ready, Results, Error, Closed), a built-in microphone helper, and a forward-compatible onAny subscription for events the server may add in future releases. Key features:
  • Interim and final results — interim Results (is_final: false) carry the cumulative transcript so far for the current utterance; render them as a live preview. A final Results (is_final: true) closes the utterance; commit it so each utterance is preserved instead of being overwritten by the next one.
  • Word-level timing — every Results payload includes per-word start/end timestamps and confidence.
  • Typed events — the same typed event surface in both SDKs, with per-event typed payloads.
  • Easy extensibility — new server event = one enum entry + one payload type + one parser entry. Nothing else changes.
  • Microphone helpers: sounddevice in Python, AudioWorklet in the browser, node-record-lpcm16 in Node.

How segments and cumulative updates work

Speech arrives as a series of utterances. Within one utterance the server streams cumulative interim Results (is_final: false) — each adds whole words to the transcript so far (the model emits complete words, never partial-word fragments). After a short pause in the audio the server finalizes the utterance with a Results whose is_final is true, carrying the complete utterance; the next Results then starts a brand-new utterance from an empty string.
                             utterance 1                            utterance 2
Results     →  R            R            R*           R            R            R*
is_final    →  false        false        true         false        false        true
transcript  →  "good"       "good day"   "good day"   "see"        "see you"    "see you"
                                         └ commit ┘                             └ commit ┘
R* marks the final frame. Because the transcript resets on every new utterance, replacing your UI with the latest transcript and nothing else makes each utterance erase the previous one — the bug you see as text “rewriting the same line” after a pause. Instead, show the interim frames as a live preview, then commit the text when is_final is true (print it on its own line, or append it to a list) so finished utterances are preserved. When the client stops sending audio, the connection closes cleanly with WebSocket close code 1000.
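
The commit rule above can be sketched without the SDK. This is a minimal illustration, not part of the CAMB API: TranscriptBuffer and its methods are hypothetical names, and the frames are hard-coded (transcript, is_final) pairs taken from the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Hypothetical helper: keeps finished utterances apart from the live preview."""

    committed: list[str] = field(default_factory=list)
    preview: str = ""

    def update(self, transcript: str, is_final: bool) -> None:
        text = transcript.strip()
        if is_final:
            # Final frame: store the utterance and clear the preview,
            # because the next Results starts again from an empty string.
            if text:
                self.committed.append(text)
            self.preview = ""
        else:
            # Interim frames are cumulative, so the latest one replaces the preview.
            self.preview = text

    def render(self) -> str:
        lines = list(self.committed)
        if self.preview:
            lines.append(self.preview)
        return "\n".join(lines)


buffer = TranscriptBuffer()
frames = [
    ("good", False), ("good day", False), ("good day", True),
    ("see", False), ("see you", False),
]
for transcript, is_final in frames:
    buffer.update(transcript, is_final)

print(buffer.render())
```

In a real handler you would call buffer.update(msg.transcript, msg.is_final) from the Results callback and redraw from buffer.render().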

Live Transcription SDK vs Async Transcription SDK

            Live Transcription                  Async Transcription
Transport   WebSocket (/transcription/listen)   REST (/transcription/transcribe)
Input       Streamed PCM                        File URL or upload
Latency     ~hundreds of ms per partial         Job-based polling
Use when    Live captioning, voice UX, agents   Recordings, batch jobs

Prerequisites

  1. Create an account. Sign up at CAMB.AI Studio if you haven’t already.
  2. Get your API key. Go to Settings → API Keys in Studio and copy your key. See Authentication for details.
  3. Install the SDK. Skip this step if you’re using the direct API.
     pip install camb-sdk
  4. Set your API key to use in your code.
     export CAMB_API_KEY="your_api_key_here"

Get Started

Create an API Key

Generate a key at the CAMB API portal and export it as CAMB_API_KEY for the snippets below.

Install

pip install camb-sdk
The Python SDK ships sounddevice as a regular dependency, so the Microphone helper works out of the box. In Node the microphone adapter additionally requires the host sox binary. The browser adapter needs no extra packages — it uses getUserMedia and an inlined AudioWorklet.

Quickstart

import asyncio
import os

from camb.client import CambAI
from camb.live_transcription import Microphone, ServerMessageType

async def main():
    client = CambAI(api_key=os.environ["CAMB_API_KEY"])
    session = await client.live_transcription.connect(
        model="boli-v5",
        language="en-us",
        sample_rate=16000,
    )

    @session.on(ServerMessageType.RESULTS)
    def _(msg):
        text = msg.transcript.strip()
        if not text:
            return
        # Interim frames are cumulative: overwrite the current terminal line
        # with each one as a live preview. On is_final, clear the preview and
        # commit the utterance on its own line so the next one starts fresh.
        if not msg.is_final:
            print(f"\r\033[K[Interim] {text}", end="", flush=True)
        else:
            print(f"\r\033[K{text}", flush=True)

    @session.on(ServerMessageType.CLOSED)
    def _(info):
        print(f"\nClosed code={info.code}")

    async with session:
        mic = Microphone(sample_rate=16000, chunk_size=1600)
        await session.stream_audio(mic)

asyncio.run(main())

Events and Payloads

Supported events

Both SDKs expose the typed events below through a single ServerMessageType enum. The Source column shows who emits each event. UtteranceEnd is the exception: it is a raw wire event with no dedicated enum member, so it arrives only through the on_any / onAny catch-all.
Event          Wire type        Source            Notes
Ready          "Ready"          Server            Fires once, immediately after the WebSocket upgrade.
Results        "Results"        Server            Fires many times. Carries the cumulative transcript for the current utterance, plus word-level timing and confidence. is_final is false for interim refinements and true on the frame that finalizes an utterance; the next Results after a final begins a new utterance. Replace your in-progress line with transcript, then commit it when is_final is true.
UtteranceEnd   "UtteranceEnd"   Server (VAD)      Boundary marker emitted around finals. No dedicated typed event; delivered via on_any / onAny. Prefer is_final on Results as your commit signal; use this only if you need the raw boundary.
Error          "Error"          Server or SDK     Server-side: protocol errors (invalid encoding, model failure, etc.). SDK-side: handler exceptions and transport-level failures are re-emitted through the same channel so applications have one place to look.
Closed         "Closed"         SDK (synthetic)   Emitted by the SDK when the underlying WebSocket closes. Carries the close code and reason (e.g. 1000 for a clean CloseStream, 1008 for an auth failure).
Catch-all subscription. If a future server release adds a new event type before the SDK does, the dispatcher still delivers it to any handler registered via session.on_any(...) (Python) / session.onAny(...) (TypeScript) with the raw payload. Applications stay forward-compatible without forking the SDK.

How events work

The session reads JSON frames off the WebSocket, looks up the wire type in a parser registry, builds the typed payload, and fans out to every handler registered for that event. Unknown event types are still delivered through onAny so applications keep working when the server adds new messages.
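
The flow described above (read a frame, look up a parser, build a payload, fan out) can be sketched as follows. This is an illustration under stated assumptions, not the SDK's implementation: Dispatcher, PARSERS, and the wire keys transcript and is_final are hypothetical stand-ins, and FutureEvent is a made-up type used to show the catch-all path.

```python
import json
from collections import defaultdict
from typing import Any, Callable

# Hypothetical parser registry: wire type -> function that builds a payload.
PARSERS: dict[str, Callable[[dict], Any]] = {
    "Ready": lambda raw: {},
    "Results": lambda raw: {
        "transcript": raw.get("transcript", ""),
        "is_final": bool(raw.get("is_final", False)),
    },
}


class Dispatcher:
    def __init__(self) -> None:
        self._handlers = defaultdict(list)
        self._any_handlers = []

    def on(self, wire_type: str, handler) -> None:
        self._handlers[wire_type].append(handler)

    def on_any(self, handler) -> None:
        self._any_handlers.append(handler)

    def dispatch(self, frame: str) -> None:
        raw = json.loads(frame)
        wire_type = raw.get("type", "")
        parser = PARSERS.get(wire_type)
        # Unknown wire types have no parser, so the raw dict is passed through.
        payload = parser(raw) if parser else raw
        for handler in self._handlers.get(wire_type, []):
            handler(payload)
        # Catch-all handlers see every event, known or not.
        for handler in self._any_handlers:
            handler(wire_type, payload)


seen = []
dispatcher = Dispatcher()
dispatcher.on("Results", lambda p: seen.append(("typed", p["transcript"])))
dispatcher.on_any(lambda t, p: seen.append(("any", t)))
dispatcher.dispatch('{"type": "Results", "transcript": "good day", "is_final": true}')
dispatcher.dispatch('{"type": "FutureEvent", "detail": 1}')
print(seen)
```

The second frame has no registered parser or typed handler, yet the catch-all handler still receives it with the raw payload.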

Event payloads

{ "type": "Ready" }
A final Results (is_final: true) has the exact same shape as an interim one — only the flag differs. There is no separate Final frame on the wire; finals arrive through the same Results handler, so branch on msg.is_final (Python) / msg.isFinal (TypeScript) to decide when to commit an utterance.

Subscribing to events

@session.on(ServerMessageType.RESULTS)
def on_results(msg):
    # msg.is_final → interim refinement (False) vs finalized utterance (True)
    print(msg.is_final, msg.transcript, msg.words)

@session.on(ServerMessageType.ERROR)
def on_error(err):
    print(err.code, err.message)

# Forward-compat: receive every event including new ones added later.
@session.on_any
def on_any(event_type, payload):
    print(event_type, payload)

Adding a custom event

If you fork the SDK or wrap it for an internal use-case, adding a new server event is a three-step change in either language:
  1. Add a new member to ServerMessageType.
  2. Define the payload (a Pydantic model in Python, an interface in TypeScript).
  3. Register a parser in PARSER_REGISTRY.
No dispatch code outside the registry needs to change.
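
The three steps can be sketched in a self-contained toy. Everything here is illustrative: SpeechStarted is a made-up event, the enum and registry are local stand-ins rather than the SDK's own objects, and a standard-library dataclass is swapped in for the Pydantic model so the example has no dependencies.

```python
from dataclasses import dataclass
from enum import Enum


# Step 1: add the member (SPEECH_STARTED is a hypothetical event).
class ServerMessageType(str, Enum):
    READY = "Ready"
    SPEECH_STARTED = "SpeechStarted"


# Step 2: define the payload (a dataclass standing in for a Pydantic model).
@dataclass(frozen=True)
class SpeechStartedPayload:
    timestamp: float


# Step 3: register a parser keyed by the enum member.
PARSER_REGISTRY = {
    ServerMessageType.READY: lambda raw: None,
    ServerMessageType.SPEECH_STARTED: lambda raw: SpeechStartedPayload(
        timestamp=float(raw["timestamp"])
    ),
}


def parse(raw: dict):
    """Look up the wire type and build the typed payload."""
    event = ServerMessageType(raw["type"])
    return event, PARSER_REGISTRY[event](raw)


event, payload = parse({"type": "SpeechStarted", "timestamp": 1.25})
print(event.name, payload)
```

The parse function did not change when the new event was added, which is the property the registry design is meant to give you.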

Basic Configuration

Every option below is optional. Omit any to inherit the server default documented in api-reference/websockets/asyncapi.json.
Option                     Default                     Description
model                      boli-v5                     Transcription model. boli-v5 and boli-v5-transcribe are supported.
language                   en-us                       Source language hint. Accepts BCP-47 codes (en-us, pt-br), the Languages enum name (EN_US), or its numeric ID. The model also auto-detects.
encoding                   linear16                    Audio encoding of the bytes you send. One of linear16, linear32, alaw, mulaw.
sample_rate / sampleRate   16000                       Sample rate of the bytes you send. The server resamples internally.
channels                   1                           Channel count of the bytes you send. Multi-channel input is downmixed to mono.
base_url / baseUrl         wss://client.camb.ai/apis   Override the WebSocket base URL.

Basic configuration example

session = await client.live_transcription.connect(
    model="boli-v5-transcribe",
    language="pt-br",
    encoding="linear16",
    sample_rate=48000,
    channels=1,
)
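
When you change sample_rate or encoding, the size of each audio chunk changes with it. The arithmetic below is a sketch that assumes chunk_size (as passed to Microphone in the Quickstart) counts frames per chunk; that reading is consistent with chunk_size=1600 at 16000 Hz matching the 100 ms chunkMs used by the browser helper, but it is an assumption, and both function names are hypothetical.

```python
# Bytes per sample for each encoding listed in the options table.
BYTES_PER_SAMPLE = {"linear16": 2, "linear32": 4, "alaw": 1, "mulaw": 1}


def chunk_duration_ms(chunk_size: int, sample_rate: int) -> float:
    """Duration of one chunk, assuming chunk_size counts frames."""
    return 1000.0 * chunk_size / sample_rate


def chunk_bytes(chunk_size: int, encoding: str = "linear16", channels: int = 1) -> int:
    """Size in bytes of one chunk for the given encoding and channel count."""
    return chunk_size * BYTES_PER_SAMPLE[encoding] * channels


print(chunk_duration_ms(1600, 16000))  # 100.0
print(chunk_bytes(1600))               # 3200
print(chunk_bytes(4800, "linear16"))   # 9600 (100 ms at 48000 Hz)
```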

Advanced Configuration

KeepAlive

Some intermediaries (load balancers, browser proxies) close idle WebSocket connections after a few seconds of silence. If your audio pipeline can be bursty, send a KeepAlive frame between bursts.
await session.keep_alive()
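
One way to send KeepAlive frames during quiet periods is a background task that calls session.keep_alive() on a timer. The sketch below is an assumption about usage, not SDK behavior: keep_alive_loop and FakeSession are hypothetical, the interval is arbitrary, and the fake session exists only so the example runs without a network connection.

```python
import asyncio


async def keep_alive_loop(session, interval_s: float, stop: asyncio.Event) -> None:
    """Call session.keep_alive() every interval_s seconds until stop is set."""
    while not stop.is_set():
        await session.keep_alive()
        try:
            # Sleep for the interval, but wake immediately if stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass


class FakeSession:
    """Stand-in for a live session; counts keep_alive calls."""

    def __init__(self) -> None:
        self.pings = 0

    async def keep_alive(self) -> None:
        self.pings += 1


async def demo() -> int:
    session, stop = FakeSession(), asyncio.Event()
    task = asyncio.create_task(keep_alive_loop(session, 0.01, stop))
    await asyncio.sleep(0.05)  # pretend the audio source is silent
    stop.set()                 # audio resumed or session ending
    await task
    return session.pings


pings = asyncio.run(demo())
print(pings >= 2)
```

With a real session, pick an interval shorter than the idle timeout of whatever intermediary sits between you and the server.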

CloseStream

session.close() (same name in Python and TypeScript) sends {"type": "CloseStream"} and waits for the server’s clean 1000 close. Prefer this over dropping the connection: it ensures the server flushes any pending transcript.

Bring-your-own transport

The Python SDK’s connect() accepts a transport argument implementing the Transport protocol. The TypeScript client accepts a transport: () => Transport factory. Use this to inject a mock during testing or to plug in a custom WebSocket implementation.

Microphone Helpers

Python — sounddevice

from camb.live_transcription import Microphone

mic = Microphone(sample_rate=16000, chunk_size=1600, device=None)
with mic:
    chunk = mic.read()  # blocking
sounddevice ships with camb-sdk, so no extra install step is needed. On Linux you may need to install PortAudio system libraries (e.g. apt install libportaudio2) — sounddevice’s docs cover platform prerequisites.

TypeScript — browser

const mic = await Microphone.fromBrowser({
    sampleRate: 16000,
    chunkMs: 100,
});
await mic.start();
Internally the helper requests the platform sample rate via getUserMedia, then downsamples to the requested rate inside an AudioWorklet so the server always receives little-endian 16-bit PCM (PCM16 LE).

TypeScript — Node

const mic = Microphone.fromNode({ sampleRate: 16000 });
await mic.start();
The Node adapter is built on node-record-lpcm16, declared in package.json as an optionalDependencies entry. The host machine also needs the sox binary on PATH.

Error Handling and Close Codes

Server errors

Whenever the server cannot continue, it emits an Error frame and closes with a non-1000 code:
{ "type": "Error", "code": "invalid_encoding", "message": "..." }

Transport errors

Connection-level failures (DNS, TLS, mid-stream drops) are surfaced through the same Error event with code: "transport_error" (TypeScript) or code: "handler_exception" (Python), keeping a single observable channel for application code.

Close codes

Code    Meaning
1000    Normal close. Either side sent CloseStream / closed cleanly.
1006    Abnormal close (transport dropped without a frame).
4000+   Application-specific. The server may use these for auth failures and quota errors.
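
A Closed handler can branch on these codes with a small lookup. The function below is a sketch with a hypothetical name; it covers the codes in the table plus 1008, which the events table mentions for auth failures, and 4001 appears only as an arbitrary example of an application-specific code.

```python
def describe_close(code: int) -> str:
    """Map a WebSocket close code to a coarse category."""
    if code == 1000:
        return "normal"
    if code == 1006:
        return "abnormal"
    if code == 1008:
        return "policy-violation"
    if code >= 4000:
        return "application"
    return "other"


for code in (1000, 1006, 1008, 4001, 1011):
    print(code, describe_close(code))
```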
