Overview
Convert spoken audio or video into accurate, timestamped text with speaker labels. The pipeline is asynchronous: submit a job with a local file or a public URL, poll until it completes, then retrieve the segmented transcript.
Prerequisites
Create an account
Sign up at CAMB.AI Studio if you haven’t already.
Get your API key
Go to Settings → API Keys in Studio and copy your key. See Authentication for details.
Install the SDK
Code
Transcribing a local file
Pass media_file instead of media_url to upload a local file:
Parameters
Required
| Parameter | Type | Description |
|---|---|---|
| language | integer | Source-language ID for the spoken content (e.g. 1 for English). See Language IDs. |
| media_file or media_url | file or string | Provide exactly one: a local audio/video file or a publicly accessible URL. |
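The "exactly one" rule, together with the format and size limits listed under Tips, can be checked locally before a job is submitted. The following is a minimal sketch and not part of the SDK: `check_media_source`, `MAX_UPLOAD_BYTES`, and `SUPPORTED_EXTENSIONS` are hypothetical names, and reading "20 MB" as 20 × 1024 × 1024 bytes is an assumption.

```python
import os
from typing import Optional

# Assumption: "under 20 MB" is interpreted as binary megabytes.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024
# Extensions taken from the Tips section (.mxf is enterprise-only).
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".aac", ".flac", ".mp4", ".mov", ".mxf"}


def check_media_source(media_file: Optional[str] = None,
                       media_url: Optional[str] = None) -> str:
    """Hypothetical helper: return "file" or "url" after validating the input."""
    # Reject both-missing and both-present in a single comparison.
    if (media_file is None) == (media_url is None):
        raise ValueError("Provide exactly one of media_file or media_url")
    if media_url is not None:
        return "url"
    extension = os.path.splitext(media_file)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported extension: {extension!r}")
    if os.path.getsize(media_file) >= MAX_UPLOAD_BYTES:
        raise ValueError("File is 20 MB or larger; host it and pass media_url")
    return "file"
```

Running a check like this first avoids a failed upload for inputs that the documented limits already rule out.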
Optional
| Parameter | Type | Description |
|---|---|---|
| project_name | string | Label for the job in your dashboard |
| project_description | string | Additional notes for the job |
| folder_id | integer | Place the run inside a specific folder |
| word_level_timestamps | boolean | Passed to get_transcription_result / getTranscriptionResult to return per-word timing in addition to segment-level timing |
Result shape
get_transcription_result returns a TranscriptionResult with a transcript array. Each segment contains:
| Field | Type | Description |
|---|---|---|
| start | float | Segment start time in seconds |
| end | float | Segment end time in seconds |
| text | string | Transcribed text for the segment |
| speaker | string | Speaker label (e.g. Speaker 1) |
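To illustrate how these four fields fit together, the sketch below renders segments as SubRip (SRT) subtitle text. It assumes each segment has already been converted to a plain dict with the keys from the table above; `format_timestamp` and `segments_to_srt` are hypothetical helpers, not SDK functions.

```python
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS,mmm, the SRT timestamp layout."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict[str, Any]]) -> str:
    """Build SRT text from dicts with start, end, text, and speaker keys."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(segment['start'])} --> "
            f"{format_timestamp(segment['end'])}\n"
            f"[{segment['speaker']}] {segment['text']}"
        )
    # SRT cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"
```

If the SDK returns segment objects rather than dicts, the same logic would apply with attribute access in place of key lookup.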
Language IDs
Unlike dubbing, transcription's language parameter is a raw integer source-language ID. Some commonly used values:
| Language | ID |
|---|---|
| English | 1 |
| Spanish | 54 |
| French | 76 |
| German | 31 |
| Mandarin Chinese | 139 |
| Japanese | 88 |
| Arabic | 4 |
For the full list of supported source languages and their IDs, see the Source Languages reference.
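Because the parameter is a bare integer, a small lookup table can keep calling code readable. This is a sketch only: `LANGUAGE_IDS` and `language_id` are hypothetical names, and the mapping covers just the values in the table above, not the full Source Languages list.

```python
# IDs copied from the table above; extend from the Source Languages reference.
LANGUAGE_IDS = {
    "english": 1,
    "spanish": 54,
    "french": 76,
    "german": 31,
    "mandarin chinese": 139,
    "japanese": 88,
    "arabic": 4,
}


def language_id(name: str) -> int:
    """Hypothetical helper: map a language name to its source-language ID."""
    key = name.strip().lower()
    if key not in LANGUAGE_IDS:
        raise KeyError(f"No ID recorded for language: {name!r}")
    return LANGUAGE_IDS[key]
```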
Tips
- Supported formats: .mp3, .wav, .aac, .flac, .mp4, .mov, and .mxf (MXF is enterprise-only). For best quality use a lossless format like WAV or FLAC.
- File size: Uploaded files must be under 20 MB. For longer recordings, host the file and pass a media_url instead, or split the recording into chunks.
- Polling timeout: For long media, cap your polling loop (e.g. 60 attempts × 5 s = 5 minutes) and handle the timeout gracefully.
- Word-level timestamps: Set word_level_timestamps=true when fetching the result to get precise per-word timing, which is useful for karaoke-style highlighting and subtitle alignment.
- Pick the right language: Specifying the correct source language ID significantly improves accuracy. For multilingual content, choose the predominant language.
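The polling cap mentioned above can be sketched as a generic loop. This example deliberately avoids any SDK call: `poll_until_ready` is a hypothetical helper, and `check` stands in for whatever function queries the job, assumed here to return None while the job is still running and a non-None value once it has finished.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_ready(check: Callable[[], Optional[T]],
                     max_attempts: int = 60,
                     interval_seconds: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call check() until it returns a non-None value or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = check()
        if result is not None:
            return result
        # Skip the wait after the final attempt; there is nothing left to poll.
        if attempt < max_attempts:
            sleep(interval_seconds)
    raise TimeoutError(f"Job not finished after {max_attempts} attempts")
```

Catching `TimeoutError` at the call site gives a single place to log the job, retry later, or report the delay to the user. The `sleep` parameter exists so the loop can be tested without real waiting.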
Next Steps
Create Transcription API
Full API reference for the transcription endpoint.
Poll Transcription Result
Status polling endpoint reference.
Get Transcription Run Result
Retrieve the transcript segments and timing data.
Dubbing
Translate a video into another language while preserving the original voice.