POST /transcribe
Create Transcription
curl --request POST \
  --url https://client.camb.ai/apis/transcribe \
  --header 'Content-Type: multipart/form-data' \
  --header 'x-api-key: <api-key>' \
  --form language=1 \
  --form media_file=@example-file
{
  "task_id": "<string>"
}

Converting Speech to Text with Precision

Our transcription service transforms spoken content into accurate, readable text, enabling you to make your media content searchable, accessible, and analytically valuable. This endpoint initiates a transcription task, processing your media file and returning a unique identifier that allows you to track and retrieve your results.

Understanding Speech Transcription

Speech transcription technology analyzes audio recordings of human speech, or video that contains speech, and converts them into written text. This process employs sophisticated machine learning models trained on diverse speech patterns, accents, and linguistic contexts to deliver high-quality text outputs. Our system handles various media formats and speaking situations, from clear studio recordings to more challenging environments with background noise. When you submit a media file for transcription, our system:
  1. Analyzes the audio signal to identify speech segments
  2. Processes these segments through advanced recognition models
  3. Applies language-specific rules and context awareness
  4. Generates a readable text transcript that captures the spoken content
This transformation creates valuable text assets from your media content, enabling new ways to search, analyze, and repurpose your spoken material.

Supported Languages

Our transcription service supports a wide range of languages. Some of the most commonly used include:
  • English (1)
  • Spanish (54)
  • French (76)
  • German (31)
  • Mandarin Chinese (139)
  • Japanese (88)
  • Arabic (4)

For a complete list of supported languages and their respective language codes, refer to our Language Support Documentation
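If you prefer to keep these identifiers in code, a small lookup table based on the codes listed above can make requests more readable. This is a hypothetical convenience helper, not part of the API:
LANGUAGE_IDS = {
    "english": 1,
    "spanish": 54,
    "french": 76,
    "german": 31,
    "mandarin_chinese": 139,
    "japanese": 88,
    "arabic": 4,
}

# Example: pass LANGUAGE_IDS["spanish"] as the `language` form field.
# Refer to the Language Support Documentation for the complete set of codes.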

Supported Media Formats

For optimal transcription quality, we recommend using high-quality media with clear speech and minimal background noise. Our service accepts the following media formats:
  • MP3 (.mp3)
  • WAV (.wav)
  • AAC (.aac)
  • FLAC (.flac)
  • MP4 (.mp4)
  • MOV (.mov)
  • MXF (.mxf)
Please note that MXF format support is exclusively available to customers on our Enterprise plan, offering professional broadcast-quality media handling for organizations with advanced needs.

Request Example

You can create a transcription task by either uploading a media file or providing a URL to a media resource. Here are examples using cURL:
  • Using a local media file:
curl -X POST "https://client.camb.ai/apis/transcribe" \
  -H "x-api-key: YOUR_API_KEY" \
  -F "language=1" \
  -F "media_file=@/path/to/your/media_file.mp3"
  • Using a remote media URL:
curl -X POST "https://client.camb.ai/apis/transcribe" \
  -H "x-api-key: YOUR_API_KEY" \
  -F "language=1" \
  -F "media_url=https://example.com/media_file.mp3"
Here’s how to handle both scenarios in Python with the requests library:
import requests

def create_transcription_task(api_key, language=1, media_file_path=None, media_url=None):
    """
    Initiates a transcription task using either a media file or a URL.

    Args:
        api_key (str): API authentication key
        language (int): Language ID (default: 1, English)
        media_file_path (str, optional): Path to local media file
        media_url (str, optional): URL of remote media resource

    Returns:
        dict: Response containing task_id for status tracking
    """
    url = "https://client.camb.ai/apis/transcribe"
    headers = {"x-api-key": api_key}
    data = {"language": language}

    # Validate input: exactly one media source must be provided
    if not (media_file_path or media_url):
        raise ValueError("Must provide either media_file_path or media_url")
    if media_file_path and media_url:
        raise ValueError("Provide only one of media_file_path or media_url")

    # Execute request, keeping the file handle open for the duration of the upload
    if media_file_path:
        with open(media_file_path, "rb") as media_file:
            files = {"media_file": (media_file_path.split("/")[-1], media_file)}
            response = requests.post(url, headers=headers, files=files, data=data)
    else:
        data["media_url"] = media_url
        response = requests.post(url, headers=headers, data=data)

    if response.ok:
        return response.json()
    else:
        print(f"Error {response.status_code}: {response.text}")
        return None

# Example with file upload
file_result = create_transcription_task(
    api_key="your_api_key",
    media_file_path="meeting_recording.mp3",
    language=1  # English
)

# Example with media URL
url_result = create_transcription_task(
    api_key="your_api_key",
    media_url="https://storage.example.com/interview.mp3",
    language=54  # Spanish
)

Processing Time Considerations

Transcription processing time depends on several factors:
  • Media Duration: Longer files naturally take more time to process
  • Media Quality: Clear, high-quality recordings process more efficiently
  • Language Complexity: Some languages may require more processing time
  • System Load: Processing time can vary based on current system demand

Next Steps: Monitoring Your Transcription Task

After submitting your transcription request, you’ll want to monitor its progress and retrieve the results once processing completes. To do this:
  1. Use the /transcribe/{task-id} endpoint to check your task’s status
  2. Poll the status endpoint at reasonable intervals (we recommend 5-15 second intervals for most cases)
  3. Once the status shows as SUCCESS, you can retrieve your full transcript
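Here is a minimal polling sketch in Python that follows these steps. It assumes the status endpoint returns a JSON payload with a "status" field; confirm the exact field names against the /transcribe/{task-id} documentation:
import time
import requests

def wait_for_transcription(api_key, task_id, poll_interval=10, timeout=600):
    """
    Polls the transcription task status until it completes or times out.

    Note: this is a sketch; the response fields assumed here (a "status"
    value such as "SUCCESS") should be verified against the task status
    endpoint documentation.
    """
    status_url = f"https://client.camb.ai/apis/transcribe/{task_id}"
    headers = {"x-api-key": api_key}
    deadline = time.time() + timeout

    while time.time() < deadline:
        response = requests.get(status_url, headers=headers)
        response.raise_for_status()
        payload = response.json()
        status = payload.get("status")

        if status == "SUCCESS":
            return payload
        if status in ("ERROR", "FAILED"):
            raise RuntimeError(f"Transcription task failed: {payload}")

        # Recommended: poll at 5-15 second intervals
        time.sleep(poll_interval)

    raise TimeoutError(f"Task {task_id} did not finish within {timeout} seconds")

# Example usage, reusing the task_id returned by create_transcription_task:
# result = wait_for_transcription("your_api_key", file_result["task_id"])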

Best Practices for Optimal Results

To get the most accurate transcriptions from our service:
  1. Use High-Quality Media: Whenever possible, use media recorded in quiet environments with minimal background noise.
  2. Appropriate Media Format: Submit uncompressed media formats like WAV or FLAC for best quality, or high-bitrate MP3s if file size is a concern.
  3. Speaker Clarity: Encourage clear speaking with moderate pace for best recognition accuracy.
  4. Specify the Correct Language: Always provide the correct language parameter to ensure our models apply the right language patterns.
  5. Segment Longer Content: For very long recordings (over 2 hours), consider splitting into multiple smaller files for more efficient processing.
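As an illustration of tip 5, the sketch below splits a long recording into fixed-length chunks before upload. It assumes the third-party pydub package (with FFmpeg installed) and is not part of our API:
from pydub import AudioSegment

def split_media(input_path, chunk_minutes=30, output_prefix="chunk"):
    """Splits a long recording into fixed-length chunks for separate uploads."""
    audio = AudioSegment.from_file(input_path)
    chunk_ms = chunk_minutes * 60 * 1000
    paths = []

    for index, start in enumerate(range(0, len(audio), chunk_ms)):
        chunk = audio[start:start + chunk_ms]
        chunk_path = f"{output_prefix}_{index:03d}.mp3"
        chunk.export(chunk_path, format="mp3")
        paths.append(chunk_path)

    return paths

# Each returned path can then be submitted as its own transcription task.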

Handling Common Issues

If you encounter problems with your transcription tasks, these troubleshooting steps may help:

Rejected File Uploads

Problem: Your request returns an error about the file upload.
Potential Solutions:
  • Verify your file is in one of our supported formats (.mp3, .wav, .aac, .flac, .mp4, .mov, .mxf)
  • Check that your file isn’t corrupted or empty
  • Ensure your file doesn’t exceed our size limit (20 MB)
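A lightweight pre-flight check can catch these issues before you submit a request. This sketch mirrors the supported extensions and the 20 MB limit listed above:
import os

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".aac", ".flac", ".mp4", ".mov", ".mxf"}
MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024  # 20 MB

def validate_media_file(path):
    """Raises ValueError if the file is unlikely to be accepted for upload."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {extension}")

    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("File is empty")
    if size > MAX_FILE_SIZE_BYTES:
        raise ValueError(f"File is {size} bytes, above the 20 MB limit")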

Incorrect Language Specification

Problem: Transcription results appear inaccurate or contain many errors.
Potential Solutions:
  • Verify you specified the correct language code
  • For multilingual content, choose the predominant language

Taking Your Transcriptions Further

Once you’ve successfully transcribed your media content, consider these next steps:
  1. Semantic Analysis: Extract key topics, sentiments, and entities from your transcribed text
  2. Content Indexing: Make your media searchable by indexing the transcript content
  3. Accessibility Compliance: Use transcripts to make your media content accessible to all users
  4. Translation: Convert your transcript into other languages for global reach
  5. Summary Generation: Create concise summaries of longer transcribed content

Authorizations

x-api-key
string
header
required

The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.

Body

multipart/form-data

Response

200
application/json

Successful Response

A JSON object containing the unique identifier for the task. Use this identifier to query the status of the running transcription task. It is returned when a create request is made to process speech into text.