GET /transcription-result/{run_id}
curl --request GET \
  --url https://client.camb.ai/apis/transcription-result/{run_id} \
  --header 'x-api-key: <api-key>'
[
  {
    "start": 123,
    "end": 123,
    "text": "<string>",
    "speaker": "<string>"
  }
]

Access the detailed transcription of your media content through this powerful endpoint. When your transcription process completes, this API provides a comprehensive text representation of your audio or video content with precise timing information and speaker identification. This transcription data serves as the foundation for numerous content enhancement workflows, including subtitling, content analysis, and accessibility improvements.

The Power of Accurate Transcription

Transcription transforms the spoken word into structured text data, unlocking numerous possibilities for your content. Our system delivers transcriptions that include:

  • Temporal Precision: Exact start and end timestamps for each spoken segment
  • Speaker Differentiation: Clear identification of different speakers throughout the content
  • Verbatim Text: Accurate textual representation of all spoken content

This structured approach to transcription enables content creators, researchers, and educators to work with audio-visual materials in entirely new ways, making the content searchable, analyzable, and more accessible.

Retrieving Your Transcription Data

To access your completed transcription, you’ll need the unique run_id that was assigned when you initially submitted your transcription request. This identifier allows our system to locate your specific transcription results within our processing infrastructure.

Customizing Your Transcription Format

Our API offers flexible output options to suit your specific workflow needs. When retrieving your transcription, you can specify:

  • Format Type (format_type): Choose how your transcription is structured

    • txt: Plain text format with speaker labels and timestamps (default)
    • srt: SubRip Text format, ready for subtitle integration in most video players
    • vtt: WebVTT format, optimized for web-based video players and HTML5 applications
  • Data Type (data_type): Determine how you’ll receive the transcription data

    • file: Receive a pre-signed URL to download the complete transcription file (default)
    • json: Receive the raw transcription data directly in the API response

These parameters allow you to tailor the transcription output to perfectly match your intended use case, whether you’re developing subtitles, creating a searchable transcript, or performing content analysis.

Let’s explore how to retrieve and work with your transcription data using Python:

import requests
import json
import pandas as pd
from datetime import timedelta

# Authentication details
headers = {
    "x-api-key": "your-api-key",  # Replace with your actual API key
    "Content-Type": "application/json"
}

def get_transcription(run_id, format_type="txt", data_type="file"):
    """
    Fetches the complete transcription for a processed media file.
    
    Parameters:
        run_id (int): The unique identifier for your transcription task
        format_type (str): Output format - 'txt', 'srt' or 'vtt'
        data_type (str): How to receive data - 'file' for URL or 'json' for raw data
    
    Returns:
        If data_type is 'file': A URL to download the transcription file
        If data_type is 'json': An array of transcription segments with timing and speaker information
    """
    try:
        # Build query parameters based on user preferences
        params = {
            "format_type": format_type,
            "data_type": data_type
        }
        
        # Request the transcription with specified format and type
        response = requests.get(
            f"https://client.camb.ai/apis/transcription-result/{run_id}",
            headers=headers,
            params=params  # Include our format preferences in the request
        )
        
        # Ensure successful response
        response.raise_for_status()
        
        # Parse the response based on the requested data_type
        result = response.json()
        
        if data_type == "file":
            # For file requests, we receive a URL to download the transcript
            print(f"Successfully retrieved transcription file URL in {format_type} format")
            print(f"Download URL: {result['file_url']}")
            print(f"URL will expire in 24 hours - download your file soon")
            return result['file_url']
        else:
            # For JSON requests, we receive the transcription data directly
            transcription_data = result['segments']
            print(f"Successfully retrieved transcription with {len(transcription_data)} segments")
            return transcription_data
        
    except requests.exceptions.RequestException as e:
        print(f"Error retrieving transcription: {e}")
        if hasattr(e, 'response') and e.response is not None:
            print(f"Response details: {e.response.text}")
        return None

# Example 1: Get a downloadable SRT file for subtitle integration
run_id = 12345
srt_url = get_transcription(run_id, format_type="srt", data_type="file")
print(f"You can now download the SRT file and import it directly into your video editor")

# Example 2: Get raw JSON data for analysis
run_id = 12345
transcription = get_transcription(run_id, format_type="txt", data_type="json")

# Let's analyze the JSON data if we received it
if transcription and isinstance(transcription, list):
    # Convert to DataFrame for easier manipulation
    df = pd.DataFrame(transcription)
    
    # Format timestamps as human-readable
    df['duration'] = df.apply(lambda row: round(row['end'] - row['start'], 2), axis=1)
    df['start_formatted'] = df['start'].apply(lambda x: str(timedelta(seconds=x)))
    df['end_formatted'] = df['end'].apply(lambda x: str(timedelta(seconds=x)))
    
    # Display a sample of the transcription
    print("\nTranscription Preview:")
    for i, row in df.head(5).iterrows():
        print(f"[{row['start_formatted']}{row['end_formatted']}] {row['speaker']}: {row['text']}")
    
    # Generate some basic analytics
    print("\nTranscription Statistics:")
    print(f"Total duration: {str(timedelta(seconds=df['end'].max()))}")
    print(f"Unique speakers: {df['speaker'].nunique()}")
    print(f"Most frequent speaker: {df['speaker'].value_counts().idxmax()}")
    
    # Export to various formats
    df.to_csv("transcription.csv", index=False)
    print("\nTranscription exported to CSV for further analysis")

Understanding the Response Structure

The structure of your transcription response will depend on the data_type parameter you specify in your request. Let’s explore both options in detail:

JSON Response Format (data_type=json)

When requesting raw JSON data, the transcription results are provided as an array of segment objects, each containing detailed information about a portion of spoken content:

Field      Description
start      The precise starting point of the speech segment (in seconds)
end        The exact ending point of the speech segment (in seconds)
text       The verbatim transcription of the spoken content in this segment
speaker    Identifier for the person speaking during this segment

This structured JSON format is ideal for programmatic processing, allowing you to:

  • Perform custom analysis on the speech patterns
  • Build interactive transcript interfaces
  • Integrate the data directly into your application’s database
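As an illustration of the first two points, here is a minimal sketch (assuming you hold the segment list returned by get_transcription(..., data_type="json") from the earlier example) that finds every segment mentioning a keyword and reports where in the media it occurs:

def find_keyword(segments, keyword):
    """Return (start, end, speaker, text) for every segment containing the keyword."""
    keyword = keyword.lower()
    return [
        (seg["start"], seg["end"], seg["speaker"], seg["text"])
        for seg in segments
        if keyword in seg["text"].lower()
    ]

# Usage: assumes `transcription` holds the JSON segments retrieved earlier
matches = find_keyword(transcription, "artificial intelligence")
for start, end, speaker, text in matches:
    print(f"{start:.1f}s-{end:.1f}s | {speaker}: {text}")

Because each segment carries its own timing, the matches can be turned directly into clickable jump points in a player or rows in a database table.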

File Response Format (data_type=file)

When requesting a file download (data_type=file), the response provides a pre-signed URL that points to a file containing your transcription in the format you specified:

Field      Description
file_url   A temporary URL allowing you to download the transcription file
expiry     Timestamp indicating when the download URL will expire
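Because the pre-signed URL is temporary, you will usually want to download the file as soon as you receive it. A minimal sketch, assuming the file_url value shown above (for example the srt_url returned by the get_transcription helper earlier):

import requests

def download_transcription(file_url, output_path):
    """Stream the transcription file from the pre-signed URL to disk."""
    response = requests.get(file_url, stream=True)
    response.raise_for_status()
    with open(output_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f"Saved transcription to {output_path}")

# Usage: assumes srt_url was obtained via get_transcription(run_id, "srt", "file")
download_transcription(srt_url, "transcription.srt")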

The content of this file will be formatted according to your format_type selection:

TXT Format (format_type=txt)

A human-readable text file with timestamps and speaker identification:

[00:00:05 - 00:00:12] Speaker 1: Welcome to our discussion on artificial intelligence.
[00:00:13 - 00:00:18] Speaker 2: Thank you for having me. I'm excited to share insights.

SRT Format (format_type=srt)

A numbered sequence of subtitle blocks with precise timing:

1
00:00:05,000 --> 00:00:12,000
Speaker 1: Welcome to our discussion on artificial intelligence.

2
00:00:13,000 --> 00:00:18,000
Speaker 2: Thank you for having me. I'm excited to share insights.

VTT Format (format_type=vtt)

Web Video Text Tracks format, optimized for HTML5 video:

WEBVTT

00:00:05.000 --> 00:00:12.000
Speaker 1: Welcome to our discussion on artificial intelligence.

00:00:13.000 --> 00:00:18.000
Speaker 2: Thank you for having me. I'm excited to share insights.
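If you retrieved JSON segments but later need a subtitle file, a small converter along these lines can produce SRT-style blocks from the segment data. This is a simplified sketch (no line-length wrapping or overlap handling), not the formatter used by the service itself:

def seconds_to_srt_time(seconds):
    """Convert seconds (float) to the SRT HH:MM:SS,mmm timestamp format."""
    millis_total = int(round(seconds * 1000))
    hours, remainder = divmod(millis_total, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def segments_to_srt(segments):
    """Build an SRT document string from a list of segment dicts."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{seconds_to_srt_time(seg['start'])} --> {seconds_to_srt_time(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    return "\n".join(blocks)

# Usage: assumes `transcription` holds JSON segments from get_transcription(..., data_type="json")
with open("generated.srt", "w", encoding="utf-8") as f:
    f.write(segments_to_srt(transcription))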

These structured formats enable seamless integration with various content management systems, video editing platforms, and analytical tools. The speaker identification makes the transcription particularly valuable for multi-participant content like interviews, panel discussions, and dramatized works.

For multi-language content, our system intelligently identifies the dominant language for each segment, ensuring accurate transcription across mixed-language media.

Selecting the Right Format for Your Use Case

The flexibility to choose different output formats opens up specific workflows tailored to your needs. Let’s explore when to use each format option:

When to Use TXT Format

The plain text format (format_type=txt) is ideal when you need:

  • Human-readable transcripts for review and editing
  • Content that can be easily imported into word processors
  • A foundation for creating articles or blog posts from video content
  • Material for qualitative research analysis

TXT format preserves speaker information alongside the dialogue, making it perfect for interview transcription where speaker attribution is essential.

When to Use SRT Format

The SubRip Text format (format_type=srt) is your best choice when:

  • Creating subtitles for video editing software like Adobe Premiere or Final Cut Pro
  • Preparing content for DVD or Blu-ray authoring
  • Working with offline video players like VLC or Media Player Classic
  • Translating content where precise timing is required

SRT is the most widely compatible subtitle format across editing platforms and media players, making it the industry standard for subtitle work.

When to Use VTT Format

The Web Video Text Tracks format (format_type=vtt) shines when:

  • Embedding subtitles in HTML5 video players
  • Working with streaming platforms that use HLS or DASH protocols
  • Creating accessible web content that meets WCAG guidelines
  • Developing educational platforms with interactive transcripts

VTT offers additional features beyond SRT, including better support for styling and positioning, making it perfect for web-centric workflows.

Transforming Transcriptions into Valuable Assets

The detailed transcription data opens up numerous possibilities for content enhancement:

Educational Applications

Transform lectures and educational content into searchable resources by using the transcription to:

  • Create interactive transcripts that synchronize with video playback
  • Generate study guides with direct quotes from instructors
  • Enable students to search for specific concepts within long-form content

Content Analysis

Dive deeper into your media with analytical approaches:

  • Track speaker participation and engagement patterns
  • Identify key themes and topics through text analysis
  • Create word clouds and frequency analyses of terminology
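For example, the frequency data behind a word cloud can be derived directly from the segments. A minimal sketch using only Python's standard library (the tiny stop-word list is a placeholder for illustration):

from collections import Counter
import re

def word_frequencies(segments, stop_words=None):
    """Count word occurrences across all segment texts, skipping stop words."""
    stop_words = stop_words or {"the", "a", "an", "and", "to", "of", "is", "in", "it"}
    counter = Counter()
    for seg in segments:
        words = re.findall(r"[a-zA-Z']+", seg["text"].lower())
        counter.update(w for w in words if w not in stop_words)
    return counter

# Usage: assumes `transcription` holds the JSON segments retrieved earlier
top_terms = word_frequencies(transcription).most_common(10)
for term, count in top_terms:
    print(f"{term}: {count}")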

Accessibility Enhancements

Make your content more inclusive with transcription-based features:

  • Generate accurate closed captions synchronized with your video
  • Create full transcripts for hearing-impaired audience members
  • Enable screen reader compatibility for audio content

Please note that our Professional and Enterprise plans offer additional transcription features, including enhanced speaker identification and industry-specific terminology support. The advanced MXF format processing is exclusively available to Enterprise customers.

Practical Workflow Integration

Incorporating transcription results into your content workflow creates numerous efficiencies:

  1. Content Creation: Use transcripts as the foundation for blog posts and articles
  2. Legal Documentation: Create verifiable records of interviews and statements
  3. Localization: Start with accurate transcriptions before beginning translation
  4. SEO Optimization: Improve discoverability with transcript-based metadata
  5. Research: Analyze verbal patterns and content themes systematically

By leveraging our transcription capabilities, you transform ephemeral spoken content into structured data assets that can be utilized across your organization’s content ecosystem.

Authorizations

x-api-key
string
header
required

The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.

Path Parameters

run_id
integer
required

The unique identifier for the run, which was generated during the transcription creation process and returned upon task completion.

Query Parameters

word_level_timestamps
boolean
default:false

When set to true, this parameter enables the generation of word-level timestamps in the response. These timestamps provide precise timing information for each word in the processed audio.
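A brief sketch of supplying this parameter with the Python approach used earlier (reusing the headers and run_id from that example). The exact shape of the word-level output is not shown in this document, so treat the handling below as illustrative only:

params = {"word_level_timestamps": True}  # request per-word timing

response = requests.get(
    f"https://client.camb.ai/apis/transcription-result/{run_id}",
    headers=headers,
    params=params,
)
response.raise_for_status()

# Inspect the raw response; the per-word structure is not documented here,
# so print it and adapt your parsing to what you receive.
print(json.dumps(response.json(), indent=2))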

Response

200
application/json

Successful Response

The response is of type DialogueItem · object[].