Transcription - Camb.ai

Overview

Convert spoken audio or video into accurate, timestamped text with speaker labels. The pipeline is asynchronous: submit a job with a local file or a public URL, poll until it completes, then retrieve the segmented transcript.

Prerequisites

Create an account

Get your API key

Go to Settings → API Keys in Studio and copy your key. See Authentication for details.

Install the SDK

pip install camb-sdk

Skip this step if you’re using the direct API.

Set your API key as an environment variable

export CAMB_API_KEY="your_api_key_here"

Code

import os
import time
from camb.client import CambAI
from camb.types.language_enums import Languages

client = CambAI(api_key=os.getenv("CAMB_API_KEY"))

def transcribe_audio():
    # Step 1: Submit the transcription job
    response = client.transcription.create_transcription(
        language=Languages.EN_US,  # English (US)
        media_url="https://example.com/meeting.mp3"
    )

    task_id = response.task_id
    print(f"Transcription task created: {task_id}")

    # Step 2: Poll until complete
    while True:
        status = client.transcription.get_transcription_task_status(task_id=task_id)
        print(f"Status: {status.status}")

        if status.status == "SUCCESS":
            # Step 3: Retrieve the transcript
            result = client.transcription.get_transcription_result(run_id=status.run_id)
            # Print only the first three segments as a preview
            for segment in result.transcript[:3]:
                print(f"[{segment.start:.2f}-{segment.end:.2f}] {segment.speaker}: {segment.text}")
            break
        elif status.status == "ERROR":
            print(f"Transcription failed: {status.exception_reason}")
            break

        time.sleep(5)

transcribe_audio()

Transcribing a local file

Pass media_file instead of media_url to upload a local file:

with open("meeting.mp3", "rb") as audio_file:
    response = client.transcription.create_transcription(
        language=Languages.EN_US,
        media_file=audio_file
    )
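Before uploading, it can help to check the file against the limits listed under Tips (supported extensions and the 20 MB upload cap). The following is a minimal sketch: `check_local_media` is a hypothetical helper, not part of the SDK, and treating "20 MB" as 20 × 1024 × 1024 bytes is an assumption.

```python
import os

# Extensions and size cap taken from the Tips section of this page.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".aac", ".flac", ".mp4", ".mov", ".mxf"}
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" means 20 MiB


def check_local_media(path):
    """Return a list of problems found with a local media file (empty if none).

    Hypothetical helper for illustration; not part of the camb SDK.
    """
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) >= MAX_UPLOAD_BYTES:
        problems.append("file is 20 MB or larger; host it and pass media_url instead")
    return problems
```

Calling `check_local_media("meeting.mp3")` before opening the file returns an empty list when the file looks acceptable, so the upload can be skipped early when it does not.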

Parameters

Required

Parameter	Type	Description
`language`	`Languages` enum	Source language for the spoken content (e.g. `Languages.EN_US`). The raw API also accepts locale-tag strings like `"en-us"`; numeric IDs still work but are deprecated. See Languages.
`media_file` or `media_url`	file or string	Provide exactly one — a local audio/video file, or a publicly accessible URL.

Optional

Parameter	Type	Description
`project_name`	string	Label for the job in your dashboard
`project_description`	string	Additional notes for the job
`folder_id`	integer	Place the run inside a specific folder
`word_level_timestamps`	boolean	Passed to `get_transcription_result` / `getTranscriptionResult` to return per-word timing in addition to segment-level timing
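The "exactly one of `media_file` or `media_url`" rule can be checked before a request is sent. The sketch below is illustrative only: `build_transcription_kwargs` is a hypothetical helper (not an SDK function) that assembles keyword arguments from the parameter names in the tables above. It omits `word_level_timestamps` because that option is passed when fetching the result, not when creating the job.

```python
def build_transcription_kwargs(language, media_file=None, media_url=None,
                               project_name=None, project_description=None,
                               folder_id=None):
    """Assemble keyword arguments for create_transcription.

    Hypothetical helper for illustration; it only uses parameter names
    documented in the tables above.
    """
    # Exactly one media source must be provided.
    if (media_file is None) == (media_url is None):
        raise ValueError("Provide exactly one of media_file or media_url")

    kwargs = {"language": language}
    optional = {
        "media_file": media_file,
        "media_url": media_url,
        "project_name": project_name,
        "project_description": project_description,
        "folder_id": folder_id,
    }
    # Drop unset options so only explicitly chosen values are sent.
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs
```

Assuming the SDK accepts these names as keyword arguments, the result could be unpacked into the call, for example `client.transcription.create_transcription(**build_transcription_kwargs(Languages.EN_US, media_url="https://example.com/meeting.mp3", project_name="Weekly sync"))`.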

Result shape

get_transcription_result returns a TranscriptionResult with a transcript array. Each segment contains:

Field	Type	Description
`start`	float	Segment start time in seconds
`end`	float	Segment end time in seconds
`text`	string	Transcribed text for the segment
`speaker`	string	Speaker label (e.g. `Speaker 1`)
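As an illustration of working with these fields, the sketch below renders segments as SRT subtitle cues. `Segment` is a stand-in dataclass that mirrors the four documented fields so the example runs without the SDK; `format_timestamp` and `segments_to_srt` are hypothetical helpers, not SDK functions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Stand-in for one transcript segment, mirroring the fields above."""
    start: float
    end: float
    text: str
    speaker: str


def format_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as SRT cues, prefixing each line with its speaker."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(segment.start)} --> {format_timestamp(segment.end)}\n"
            f"{segment.speaker}: {segment.text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Because the functions only read `start`, `end`, `text`, and `speaker`, passing `result.transcript` from `get_transcription_result` should work the same way, assuming the SDK objects expose those attributes as shown in the example above.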

Languages

The Python and TypeScript SDKs expect the Languages enum:

Language	Enum	Locale (raw API)
English (US)	`Languages.EN_US`	`en-us`
Spanish (Spain)	`Languages.ES_ES`	`es-es`
French (France)	`Languages.FR_FR`	`fr-fr`
German (Germany)	`Languages.DE_DE`	`de-de`
Mandarin Chinese	`Languages.ZH_CN`	`zh-cn`
Japanese (Japan)	`Languages.JA_JP`	`ja-jp`
Arabic	`Languages.AR_SA`	`ar-sa`

If you’re calling the API directly (not via the SDK), pass the locale tag. Numeric language IDs still work but are deprecated.

For the full list of supported source languages, see the Source Languages reference.

Tips

Supported formats: .mp3, .wav, .aac, .flac, .mp4, .mov, and .mxf (MXF is enterprise-only). For best quality use a lossless format like WAV or FLAC.
File size: Uploaded files must be under 20 MB. For longer recordings, host the file and pass a media_url instead, or split the recording into chunks.
Polling timeout: For long media, cap your polling loop (e.g. 60 attempts × 5 s = 5 minutes) and handle the timeout gracefully instead of looping forever.
Word-level timestamps: Pass word_level_timestamps=True (true in TypeScript) to get_transcription_result to get precise per-word timing, which is useful for karaoke-style highlighting and subtitle alignment.
Pick the right language: Specifying the correct source language significantly improves accuracy. For multilingual content, choose the predominant language.
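The polling-timeout tip can be sketched as a small helper that gives up after a fixed number of checks. `poll_until_done` is a hypothetical function, not part of the SDK; it takes any zero-argument callable that returns an object with a `.status` attribute, and it relies only on the `SUCCESS` and `ERROR` values used in the example above.

```python
import time


def poll_until_done(get_status, max_attempts=60, interval_seconds=5.0, sleep=time.sleep):
    """Call get_status() until it reports SUCCESS or ERROR, or attempts run out.

    Hypothetical helper for illustration. get_status is any zero-argument
    callable returning an object with a .status attribute.
    """
    for attempt in range(1, max_attempts + 1):
        status = get_status()
        if status.status in ("SUCCESS", "ERROR"):
            return status
        if attempt < max_attempts:
            sleep(interval_seconds)  # wait before the next check
    raise TimeoutError(f"Task still running after {max_attempts} checks")
```

With the client from the example above, this might be called as `poll_until_done(lambda: client.transcription.get_transcription_task_status(task_id=task_id))`, catching `TimeoutError` to report that the job is still running. The `sleep` parameter exists so the wait can be replaced in tests.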

Next Steps

Create Transcription API

Full API reference for the transcription endpoint.

Poll Transcription Result

Status polling endpoint reference.

Get Transcription Run Result

Retrieve the transcript segments and timing data.

Dubbing

Translate a video into another language while preserving the original voice.
