WSS /apis/transcription/listen
Beta. The live transcription WebSocket is available for testing, but the schema, event shape, and error codes may change in backwards-incompatible ways before GA. Pin to the SDK versions you test against.

What you get back

After connect, the server emits a single Ready event, then streams Results messages as your audio is consumed. Each Results carries the cumulative transcript for the current utterance, so replace your in-progress UI state with it rather than concatenating. The is_final flag distinguishes interim refinements (false) from the frame that finalizes an utterance (true).

After a final, the next Results starts a brand-new utterance from an empty string, so if you only ever overwrite your UI with the latest transcript, each new utterance erases the previous one. Show the interim frames as a live preview, then commit the text when is_final is true (the server also emits an UtteranceEnd boundary marker around the same point).

Word-level timing (channel.alternatives[0].words, each entry with word/start/end/confidence) is populated only on the final frame (is_final: true); interim frames carry an empty words array. The UtteranceEnd marker also reports last_word_end. When the client stops sending audio, the connection closes cleanly with WebSocket close code 1000.
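The preview-then-commit pattern described above can be sketched as a small state holder. This is a minimal illustration, not SDK code: the TranscriptView class is hypothetical, and it assumes the raw JSON carries the text at channel.alternatives[0].transcript (only the words path is documented here), so adjust the lookup to the payload you actually receive.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptView:
    """Hypothetical helper: committed utterances plus the live interim preview."""

    committed: list[str] = field(default_factory=list)
    interim: str = ""

    def apply(self, message: dict) -> None:
        """Fold one decoded server message into the view."""
        if message.get("type") != "Results":
            return  # Ready, UtteranceEnd, etc. carry no transcript text here
        # Assumption: transcript text sits next to the documented words array.
        alternatives = message.get("channel", {}).get("alternatives") or [{}]
        text = alternatives[0].get("transcript", "").strip()
        if message.get("is_final"):
            # Final frame: commit the utterance and clear the preview.
            if text:
                self.committed.append(text)
            self.interim = ""
        else:
            # Interim frames are cumulative: replace, never concatenate.
            self.interim = text

    def render(self) -> str:
        """Committed utterances followed by the current preview, if any."""
        parts = self.committed + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

Keeping committed text separate from the interim string is what prevents a new utterance from erasing the previous one when the preview is overwritten.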

Supported source languages

Pass any of the codes below as the language query parameter, or omit it to default to en-us. You can also pass the Languages enum name (EN_US) or its numeric ID. The model performs automatic language detection regardless of the value passed.
Code            Language
en-us           English (United States)
af-za           Afrikaans (South Africa)
am-et           Amharic (Ethiopia)
ar-ae           Arabic (United Arab Emirates)
ar-bh           Arabic (Bahrain)
ar-dz           Arabic (Algeria)
ar-eg           Arabic (Egypt)
ar-iq           Arabic (Iraq)
ar-jo           Arabic (Jordan)
ar-kw           Arabic (Kuwait)
ar-lb           Arabic (Lebanon)
ar-ly           Arabic (Libya)
ar-ma           Arabic (Morocco)
ar-om           Arabic (Oman)
ar-qa           Arabic (Qatar)
ar-sa           Arabic (Saudi Arabia)
ar-sy           Arabic (Syria)
ar-tn           Arabic (Tunisia)
ar-ye           Arabic (Yemen)
az-az           Azerbaijani (Latin, Azerbaijan)
bg-bg           Bulgarian (Bulgaria)
bn-bd           Bangla (Bangladesh)
bn-in           Bengali (India)
bs-ba           Bosnian (Bosnia and Herzegovina)
ca-es           Catalan
cs-cz           Czech (Czechia)
cy-gb           Welsh (United Kingdom)
da-dk           Danish (Denmark)
de-at           German (Austria)
de-ch           German (Switzerland)
de-de           German (Germany)
el-gr           Greek (Greece)
en-au           English (Australia)
en-ca           English (Canada)
en-gb           English (United Kingdom)
en-hk           English (Hong Kong SAR)
en-ie           English (Ireland)
en-in           English (India)
en-ke           English (Kenya)
en-ng           English (Nigeria)
en-nz           English (New Zealand)
en-ph           English (Philippines)
en-sg           English (Singapore)
en-tz           English (Tanzania)
en-za           English (South Africa)
es-ar           Spanish (Argentina)
es-bo           Spanish (Bolivia)
es-cl           Spanish (Chile)
es-co           Spanish (Colombia)
es-cr           Spanish (Costa Rica)
es-cu           Spanish (Cuba)
es-do           Spanish (Dominican Republic)
es-ec           Spanish (Ecuador)
es-es           Spanish (Spain)
es-gq           Spanish (Equatorial Guinea)
es-gt           Spanish (Guatemala)
es-hn           Spanish (Honduras)
es-mx           Spanish (Mexico)
es-ni           Spanish (Nicaragua)
es-pa           Spanish (Panama)
es-pe           Spanish (Peru)
es-pr           Spanish (Puerto Rico)
es-py           Spanish (Paraguay)
es-sv           Spanish (El Salvador)
es-us           Spanish (United States)
es-uy           Spanish (Uruguay)
es-ve           Spanish (Venezuela)
et-ee           Estonian (Estonia)
eu-es           Basque
fa-ir           Persian (Iran)
fi-fi           Finnish (Finland)
fr-be           French (Belgium)
fr-ca           French (Canada)
fr-ch           French (Switzerland)
fr-fr           French (France)
gl-es           Galician
gu-in           Gujarati (India)
he-il           Hebrew (Israel)
hi-in           Hindi (India)
hr-hr           Croatian (Croatia)
hu-hu           Hungarian (Hungary)
hy-am           Armenian (Armenia)
id-id           Indonesian (Indonesia)
is-is           Icelandic (Iceland)
it-it           Italian (Italy)
ja-jp           Japanese (Japan)
ka-ge           Georgian (Georgia)
kk-kz           Kazakh (Kazakhstan)
km-kh           Khmer (Cambodia)
kn-in           Kannada (India)
ko-kr           Korean (Korea)
lo-la           Lao (Laos)
lt-lt           Lithuanian (Lithuania)
lv-lv           Latvian (Latvia)
mk-mk           Macedonian (North Macedonia)
ml-in           Malayalam (India)
mn-mn           Mongolian (Mongolia)
mr-in           Marathi (India)
ms-my           Malay (Malaysia)
mt-mt           Maltese (Malta)
my-mm           Burmese (Myanmar)
nb-no           Norwegian (Bokmål, Norway)
ne-np           Nepali (Nepal)
nl-be           Dutch (Belgium)
nl-nl           Dutch (Netherlands)
pa-in           Punjabi (India)
pl-pl           Polish (Poland)
ps-af           Pashto (Afghanistan)
pt-br           Portuguese (Brazil)
pt-pt           Portuguese (Portugal)
ro-ro           Romanian (Romania)
ru-ru           Russian (Russia)
si-lk           Sinhala (Sri Lanka)
sk-sk           Slovak (Slovakia)
sl-si           Slovenian (Slovenia)
so-so           Somali (Somalia)
sq-al           Albanian (Albania)
sr-rs           Serbian (Cyrillic, Serbia)
su-id           Sundanese (Indonesia)
sv-se           Swedish (Sweden)
sw-ke           Swahili (Kenya)
sw-tz           Swahili (Tanzania)
ta-in           Tamil (India)
ta-lk           Tamil (Sri Lanka)
ta-my           Tamil (Malaysia)
ta-sg           Tamil (Singapore)
te-in           Telugu (India)
th-th           Thai (Thailand)
tl-ph           Tagalog (Philippines)
tr-tr           Turkish (Turkey)
uk-ua           Ukrainian (Ukraine)
ur-in           Urdu (India)
ur-pk           Urdu (Pakistan)
uz-uz           Uzbek (Latin, Uzbekistan)
vi-vn           Vietnamese (Vietnam)
zh-cn           Chinese (Mandarin, Simplified)
zh-cn-henan     Chinese (Zhongyuan Mandarin Henan, Simplified)
zh-cn-liaoning  Chinese (Northeastern Mandarin, Simplified)
zh-cn-shaanxi   Chinese (Zhongyuan Mandarin Shaanxi, Simplified)
zh-cn-shandong  Chinese (Jilu Mandarin, Simplified)
zh-cn-sichuan   Chinese (Southwestern Mandarin, Simplified)
zh-hk           Chinese (Cantonese, Traditional)
zh-tw           Chinese (Taiwanese Mandarin, Traditional)

Sample Testing Script

The CAMB SDKs wrap the protocol below behind a typed event dispatcher. The snippets here stream a WAV file at real-time pace so the server sees arrival patterns equivalent to a live microphone capture. For mic input, swap FileAudioSource(...) for Microphone(...) (Python) or Microphone.fromBrowser / Microphone.fromNode (TypeScript) — see the Live Transcription tutorial.
# pip install camb-sdk
import asyncio
import os
import wave

from camb.client import CambAI
from camb.live_transcription import FileAudioSource, ServerMessageType


async def main(audio_path: str = "sample.wav") -> None:
    with wave.open(audio_path, "rb") as wf:
        sample_rate, channels = wf.getframerate(), wf.getnchannels()

    client = CambAI(api_key=os.environ["CAMB_API_KEY"])
    session = await client.live_transcription.connect(
        model="boli-v5",
        language="en-us",
        encoding="linear16",
        sample_rate=sample_rate,
        channels=channels,
    )

    @session.on(ServerMessageType.READY)
    def _(_): print("Ready: streaming audio...")

    @session.on(ServerMessageType.RESULTS)
    def _(msg):
        text = msg.transcript.strip()
        if not text:
            return
        # Redraw interim frames in place as a live preview; commit the final
        # on its own line so the next utterance doesn't overwrite it.
        if not msg.is_final:
            # \r returns to column 0 and \033[K clears the previous preview.
            print(f"\r\033[K[Interim] {text}", end="", flush=True)
        else:
            print(f"\r\033[K{text}", flush=True)

    @session.on(ServerMessageType.CLOSED)
    def _(info): print(f"\nClosed: code={info.code}")

    async with session:
        await session.stream_audio(FileAudioSource(audio_path, real_time=True))


asyncio.run(main())

Raw protocol (no SDK)

If you cannot use the SDKs, the wire protocol is small enough to drive directly. Open wss://client.camb.ai/apis/transcription/listen?... with the x-api-key header, send binary PCM frames matching the encoding / sample_rate / channels query string, optionally send {"type":"KeepAlive"} during silence, and finish with {"type":"CloseStream"}. The server returns {"type":"Ready"} once, then {"type":"Results", ...} frames carrying the cumulative transcript ("is_final": false for interim refinements, true on the frame that finalizes each utterance), and closes with WebSocket code 1000 on a clean shutdown.
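The client side of that exchange can be sketched with the standard library alone. The helper names below (build_listen_url, control_message, pcm_chunks) are hypothetical, and the 100 ms chunk size is an arbitrary choice for illustration, not a documented requirement. The sketch covers only the four query parameters named on this page; it assumes linear16 audio (2 bytes per sample) and leaves the actual socket I/O to whichever WebSocket client you use.

```python
import json
from typing import Iterator
from urllib.parse import urlencode

LISTEN_URL = "wss://client.camb.ai/apis/transcription/listen"


def build_listen_url(encoding: str, sample_rate: int, channels: int,
                     language: str = "en-us") -> str:
    """Build the connection URL; the audio you send must match these values."""
    query = urlencode({
        "language": language,
        "encoding": encoding,
        "sample_rate": sample_rate,
        "channels": channels,
    })
    return f"{LISTEN_URL}?{query}"


def control_message(kind: str) -> str:
    """Serialize a text control frame such as KeepAlive or CloseStream."""
    return json.dumps({"type": kind})


def pcm_chunks(pcm: bytes, sample_rate: int, channels: int,
               chunk_ms: int = 100) -> Iterator[tuple[bytes, float]]:
    """Yield (chunk, seconds_of_audio) pairs for 16-bit PCM.

    Sleeping for seconds_of_audio after sending each chunk approximates
    real-time pacing, as a live microphone would produce.
    """
    frame_bytes = channels * 2                 # one 16-bit sample per channel
    bytes_per_second = sample_rate * frame_bytes
    size = bytes_per_second * chunk_ms // 1000
    size -= size % frame_bytes                 # never split a sample frame
    size = max(size, frame_bytes)              # guard against a zero step
    for offset in range(0, len(pcm), size):
        chunk = pcm[offset:offset + size]
        yield chunk, len(chunk) / bytes_per_second
```

A session would then open build_listen_url(...) with the x-api-key header set, send each chunk as a binary frame, send control_message("KeepAlive") during silence if needed, and finish with control_message("CloseStream"). How headers are passed depends on the WebSocket client library, so that part is deliberately left out.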
Messages
Server Ready
type:object

Emitted once after the upstream transcription session is established.

Interim Transcription Result
type:object

Cumulative transcript for the current utterance. Each event carries the full transcript-so-far, so update your UI by replacing the previous interim, not by concatenating. is_final is false on interim refinements and true on the frame that finalizes the utterance. When input stops, the connection closes cleanly with WebSocket code 1000.

Server Error
type:object

Emitted when the server cannot continue the session (for example: invalid query parameters, upstream model failure, or unsupported audio encoding). After emitting Error, the server closes the connection with a non-1000 WebSocket close code.
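One way to tell a clean shutdown from a failed session is to track whether an Error message arrived and then inspect the close code. The sketch below is illustrative only: classify_message and session_outcome are hypothetical names, and because the Error payload fields are not described here, the code reads nothing beyond the type field.

```python
import json

CLEAN_CLOSE = 1000
KNOWN_TYPES = {"Ready", "Results", "UtteranceEnd", "Error"}


def classify_message(raw: str) -> str:
    """Return the server message type, or "Unknown" for anything unrecognized.

    Tolerating unknown types keeps the client from crashing if the beta
    schema gains new events.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return "Unknown"
    kind = payload.get("type") if isinstance(payload, dict) else None
    return kind if kind in KNOWN_TYPES else "Unknown"


def session_outcome(close_code: int, saw_error: bool) -> str:
    """Treat the session as clean only on code 1000 with no Error seen."""
    if close_code == CLEAN_CLOSE and not saw_error:
        return "clean"
    return "failed"
```

A receive loop would set saw_error when classify_message returns "Error" and call session_outcome once the socket reports its close code.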

Binary Audio Frame
type:string

Raw audio bytes in the encoding declared on the query string.

Close Stream
type:object

Signal end of input. The server flushes and closes the connection.

Keep Alive
type:object

Optional heartbeat. The server accepts and ignores the contents.