Get Transcription Result
Retrieves the result of the transcription run using the provided `run_id`.
Access the detailed transcription of your media content through this powerful endpoint. When your transcription process completes, this API provides a comprehensive text representation of your audio or video content with precise timing information and speaker identification. This transcription data serves as the foundation for numerous content enhancement workflows, including subtitling, content analysis, and accessibility improvements.
The Power of Accurate Transcription
Transcription transforms the spoken word into structured text data, unlocking numerous possibilities for your content. Our system delivers transcriptions that include:
- Temporal Precision: Exact start and end timestamps for each spoken segment
- Speaker Differentiation: Clear identification of different speakers throughout the content
- Verbatim Text: Accurate textual representation of all spoken content
This structured approach to transcription enables content creators, researchers, and educators to work with audio-visual materials in entirely new ways, making the content searchable, analyzable, and more accessible.
Retrieving Your Transcription Data
To access your completed transcription, you’ll need the unique `run_id` that was assigned when you initially submitted your transcription request. This identifier allows our system to locate your specific transcription results within our processing infrastructure.
Customizing Your Transcription Format
Our API offers flexible output options to suit your specific workflow needs. When retrieving your transcription, you can specify:
- Format Type (`format_type`): Choose how your transcription is structured
  - `txt`: Plain text format with speaker labels and timestamps (default)
  - `srt`: SubRip Text format, ready for subtitle integration in most video players
  - `vtt`: WebVTT format, optimized for web-based video players and HTML5 applications
- Data Type (`data_type`): Determine how you’ll receive the transcription data
  - `file`: Receive a pre-signed URL to download the complete transcription file (default)
  - `json`: Receive the raw transcription data directly in the API response
These parameters allow you to tailor the transcription output to perfectly match your intended use case, whether you’re developing subtitles, creating a searchable transcript, or performing content analysis.
Let’s explore how to retrieve and work with your transcription data using Python:
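The sketch below is a minimal example. The base URL and endpoint path are placeholders (shown here as https://api.example.com/transcription/{run_id}), so substitute the endpoint from your API reference along with your own API key and run ID; the header and query parameters match those documented on this page.

```python
import requests

API_KEY = "your-api-key"   # from the 'API' section of the studio website
RUN_ID = "your-run-id"     # returned when the transcription run was created

# NOTE: placeholder base URL and path; replace with the endpoint from your API reference.
url = f"https://api.example.com/transcription/{RUN_ID}"

response = requests.get(
    url,
    headers={"x-api-key": API_KEY},
    params={
        "format_type": "txt",   # txt (default), srt, or vtt
        "data_type": "json",    # file (default) or json
    },
    timeout=30,
)
response.raise_for_status()

# With data_type=json, the body is an array of transcript segments
segments = response.json()
for segment in segments:
    print(f"[{segment['start']:.2f}-{segment['end']:.2f}] "
          f"{segment['speaker']}: {segment['text']}")
```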
Understanding the Response Structure
The structure of your transcription response will depend on the `data_type` parameter you specify in your request. Let’s explore both options in detail:
JSON Response Format (`data_type=json`)
When requesting raw JSON data, the transcription results are provided as an array of segment objects, each containing detailed information about a portion of spoken content:
| Field | Description |
|---|---|
| `start` | The precise starting point of the speech segment (in seconds) |
| `end` | The exact ending point of the speech segment (in seconds) |
| `text` | The verbatim transcription of the spoken content in this segment |
| `speaker` | Identifier for the person speaking during this segment |
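For illustration, a JSON response with these fields might look like the following; the values and speaker labels are invented for this example, and your actual labels and timing precision may differ:

```json
[
  {
    "start": 0.0,
    "end": 4.2,
    "text": "Welcome back to the show.",
    "speaker": "SPEAKER_00"
  },
  {
    "start": 4.2,
    "end": 9.8,
    "text": "Thanks for having me, it's great to be here.",
    "speaker": "SPEAKER_01"
  }
]
```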
This structured JSON format is ideal for programmatic processing, allowing you to:
- Perform custom analysis on the speech patterns
- Build interactive transcript interfaces
- Integrate the data directly into your application’s database
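As a minimal sketch of this kind of processing, assuming segments shaped like the sample above, you could total speaking time per speaker and assemble a simple searchable transcript:

```python
from collections import defaultdict

def summarize(segments):
    """Total speaking time per speaker and build a plain-text transcript."""
    talk_time = defaultdict(float)
    lines = []
    for seg in segments:
        talk_time[seg["speaker"]] += seg["end"] - seg["start"]
        lines.append(f"{seg['speaker']}: {seg['text']}")
    return dict(talk_time), "\n".join(lines)

# `segments` is the JSON array returned by the request above
talk_time, transcript = summarize(segments)
print(talk_time)   # e.g. {'SPEAKER_00': 4.2, 'SPEAKER_01': 5.6}
print(transcript)
```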
File Response Format (`data_type=file`)
When requesting a file download (`data_type=file`), the response provides a pre-signed URL that points to a file containing your transcription in the format you specified:
| Field | Description |
|---|---|
| `file_url` | A temporary URL allowing you to download the transcription file |
| `expiry` | Timestamp indicating when the download URL will expire |
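An illustrative file response is shown below; the URL and expiry values are placeholders, and the exact timestamp format may differ:

```json
{
  "file_url": "https://storage.example.com/transcriptions/run-123.srt?signature=abc123",
  "expiry": "2024-01-01T12:00:00Z"
}
```

Once you have the URL, download the file before it expires, for example with requests.get(file_url).text in Python.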
The content of this file will be formatted according to your `format_type` selection:
TXT Format (`format_type=txt`)
A human-readable text file with timestamps and speaker identification:
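The exact layout may vary, but a transcript of this kind typically looks something like this (timings and speaker labels are illustrative):

```
[00:00:00 - 00:00:04] SPEAKER_00: Welcome back to the show.
[00:00:04 - 00:00:09] SPEAKER_01: Thanks for having me, it's great to be here.
```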
SRT Format (`format_type=srt`)
A numbered sequence of subtitle blocks with precise timing:
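For example (timings and speaker labels shown here are illustrative):

```
1
00:00:00,000 --> 00:00:04,200
SPEAKER_00: Welcome back to the show.

2
00:00:04,200 --> 00:00:09,800
SPEAKER_01: Thanks for having me, it's great to be here.
```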
VTT Format (`format_type=vtt`)
Web Video Text Tracks format, optimized for HTML5 video:
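For example (again with illustrative timings and speaker labels):

```
WEBVTT

00:00:00.000 --> 00:00:04.200
SPEAKER_00: Welcome back to the show.

00:00:04.200 --> 00:00:09.800
SPEAKER_01: Thanks for having me, it's great to be here.
```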
These structured formats enable seamless integration with various content management systems, video editing platforms, and analytical tools. The speaker identification makes the transcription particularly valuable for multi-participant content like interviews, panel discussions, and dramatized works.
Selecting the Right Format for Your Use Case
The flexibility to choose different output formats opens up specific workflows tailored to your needs. Let’s explore when to use each format option:
When to Use TXT Format
The plain text format (`format_type=txt`) is ideal when you need:
- Human-readable transcripts for review and editing
- Content that can be easily imported into word processors
- A foundation for creating articles or blog posts from video content
- Material for qualitative research analysis
TXT format preserves speaker information alongside the dialogue, making it perfect for interview transcription where speaker attribution is essential.
When to Use SRT Format
The SubRip Text format (`format_type=srt`) is your best choice when:
- Creating subtitles for video editing software like Adobe Premiere or Final Cut Pro
- Preparing content for DVD or Blu-ray authoring
- Working with offline video players like VLC or Media Player Classic
- Translating content where precise timing is required
SRT is the most widely compatible subtitle format across editing platforms and media players, making it the industry standard for subtitle work.
When to Use VTT Format
The Web Video Text Tracks format (`format_type=vtt`) shines when:
- Embedding subtitles in HTML5 video players
- Working with streaming platforms that use HLS or DASH protocols
- Creating accessible web content that meets WCAG guidelines
- Developing educational platforms with interactive transcripts
VTT offers additional features beyond SRT, including better support for styling and positioning, making it perfect for web-centric workflows.
Transforming Transcriptions into Valuable Assets
The detailed transcription data opens up numerous possibilities for content enhancement:
Educational Applications
Transform lectures and educational content into searchable resources by using the transcription to:
- Create interactive transcripts that synchronize with video playback
- Generate study guides with direct quotes from instructors
- Enable students to search for specific concepts within long-form content
Content Analysis
Dive deeper into your media with analytical approaches:
- Track speaker participation and engagement patterns
- Identify key themes and topics through text analysis
- Create word clouds and frequency analyses of terminology (see the sketch after this list)
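As a minimal sketch of the frequency-analysis idea, assuming segments in the JSON structure described above:

```python
import re
from collections import Counter

def term_frequencies(segments, top_n=10):
    """Count the most common words across all transcribed segments."""
    words = []
    for seg in segments:
        words.extend(re.findall(r"[a-z']+", seg["text"].lower()))
    return Counter(words).most_common(top_n)

# `segments` is the JSON array of transcript segments described above
print(term_frequencies(segments))
```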
Accessibility Enhancements
Make your content more inclusive with transcription-based features:
- Generate accurate closed captions synchronized with your video
- Create full transcripts for hearing-impaired audience members
- Enable screen reader compatibility for audio content
Please note that our Professional and Enterprise plans offer additional transcription features, including enhanced speaker identification and industry-specific terminology support. The advanced MXF format processing is exclusively available to Enterprise customers.
Practical Workflow Integration
Incorporating transcription results into your content workflow creates numerous efficiencies:
- Content Creation: Use transcripts as the foundation for blog posts and articles
- Legal Documentation: Create verifiable records of interviews and statements
- Localization: Start with accurate transcriptions before beginning translation
- SEO Optimization: Improve discoverability with transcript-based metadata
- Research: Analyze verbal patterns and content themes systematically
By leveraging our transcription capabilities, you transform ephemeral spoken content into structured data assets that can be utilized across your organization’s content ecosystem.
Authorizations
The `x-api-key` is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
Path Parameters
run_id
The unique identifier for the run, which was generated during the transcription creation process and returned upon task completion.
Query Parameters
When set to `true`, this parameter enables the generation of word-level timestamps in the response. These timestamps provide precise timing information for each word in the processed audio.
Response
Successful Response
The response is of type `DialogueItem · object[]`.