Retrieves the result of the transcription run identified by the provided run_id.
Use this endpoint to access the detailed transcription of your media content. Once your transcription run completes, the API returns a text representation of your audio or video content with timing information and speaker identification. This data can serve as the foundation for content workflows such as subtitling, content analysis, and accessibility improvements.
Transcription turns spoken words into structured text data. Each transcription is split into segments, and every segment carries the transcribed text, its start and end times, and an identifier for the speaker.
This structure lets content creators, researchers, and educators make audio and video material searchable, analyzable, and more accessible.
To access a completed transcription, you need the unique run_id that was assigned when you submitted your transcription request. The system uses this identifier to locate your specific transcription results.
Our API offers flexible output options to suit your specific workflow needs. When retrieving your transcription, you can specify:
Format type (format_type): choose how your transcription is structured.
- txt: plain text with speaker labels and timestamps (default)
- srt: SubRip Text, ready for subtitle integration in most video players
- vtt: WebVTT, optimized for web-based video players and HTML5 applications

Data type (data_type): choose how you receive the transcription data.
- file: a pre-signed URL to download the complete transcription file (default)
- json: the raw transcription data directly in the API response

These parameters let you tailor the transcription output to your intended use case, whether you are producing subtitles, building a searchable transcript, or performing content analysis.
You can retrieve and work with your transcription data in Python.
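A minimal sketch of retrieving a transcription with Python's standard library follows. The base URL is a hypothetical placeholder, and passing run_id in the URL path with format_type and data_type as query parameters is an assumption about the endpoint layout; check the API reference for the actual URL. Only the x-api-key header and the parameter names and values come from this page.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL: replace it with the endpoint from the API reference.
BASE_URL = "https://api.example.com/transcriptions"

VALID_FORMAT_TYPES = {"txt", "srt", "vtt"}
VALID_DATA_TYPES = {"file", "json"}


def build_transcription_request(
    run_id: str,
    api_key: str,
    format_type: str = "txt",
    data_type: str = "file",
) -> urllib.request.Request:
    """Build a GET request for a transcription result (URL layout is assumed)."""
    if format_type not in VALID_FORMAT_TYPES:
        raise ValueError(f"format_type must be one of {sorted(VALID_FORMAT_TYPES)}")
    if data_type not in VALID_DATA_TYPES:
        raise ValueError(f"data_type must be one of {sorted(VALID_DATA_TYPES)}")
    # Assumption: run_id goes in the path; the output options go in the query string.
    query = urllib.parse.urlencode({"format_type": format_type, "data_type": data_type})
    url = f"{BASE_URL}/{urllib.parse.quote(run_id, safe='')}?{query}"
    # The x-api-key header authenticates the request.
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


def fetch_transcription(request: urllib.request.Request) -> object:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example (needs a real endpoint, run_id, and API key):
# request = build_transcription_request("YOUR_RUN_ID", "YOUR_API_KEY", format_type="srt", data_type="json")
# segments = fetch_transcription(request)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any network call is made.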
The structure of the response depends on the data_type parameter you specify in your request. Both options are described below.
JSON data (data_type=json)
When you request raw JSON data, the transcription results are returned as an array of segment objects, each describing one portion of spoken content:
Field | Description |
---|---|
start | The precise starting point of the speech segment (in seconds) |
end | The exact ending point of the speech segment (in seconds) |
text | The verbatim transcription of the spoken content in this segment |
speaker | Identifier for the person speaking during this segment |
This structured JSON format is well suited to programmatic processing, such as searching segments, analyzing speakers, or converting the data to other formats in your own code.
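Once you have the JSON array, you can map each segment onto a small typed structure. This sketch assumes each object contains the four fields from the table above and that speaker identifiers can be treated as strings; the Segment class and the helper names are hypothetical.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    text: str     # verbatim transcription of the segment
    speaker: str  # speaker identifier (assumed to be string-like)


def parse_segments(raw: str) -> List[Segment]:
    """Parse a data_type=json response body into Segment objects."""
    return [
        Segment(float(item["start"]), float(item["end"]), item["text"], str(item["speaker"]))
        for item in json.loads(raw)
    ]


def lines_by_speaker(segments: List[Segment]) -> Dict[str, List[str]]:
    """Group segment text by speaker, preserving the original order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for seg in segments:
        grouped[seg.speaker].append(seg.text)
    return dict(grouped)
```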
File download (data_type=file)
When you request a file download, the response contains a pre-signed URL pointing to a file that holds your transcription in the format you specified:
Field | Description |
---|---|
file_url | A temporary URL allowing you to download the transcription file |
expiry | Timestamp indicating when the download URL will expire |
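Pre-signed URLs stop working once they expire, so it is worth checking expiry before downloading. This page does not specify the timestamp format, so the sketch below assumes expiry is either Unix epoch seconds or an ISO 8601 string, and that the file is UTF-8 text; the helper names are hypothetical.

```python
import urllib.request
from datetime import datetime, timezone
from typing import Optional, Union


def parse_expiry(expiry: Union[str, int, float]) -> datetime:
    """Convert an expiry value to a timezone-aware UTC datetime.

    Assumption: expiry is either Unix epoch seconds or an ISO 8601 string.
    """
    if isinstance(expiry, (int, float)):
        return datetime.fromtimestamp(expiry, tz=timezone.utc)
    # Older Python versions of fromisoformat do not accept a trailing "Z".
    text = expiry[:-1] + "+00:00" if expiry.endswith("Z") else expiry
    parsed = datetime.fromisoformat(text)
    # Assumption: a timestamp without an offset is in UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def is_expired(expiry: Union[str, int, float], now: Optional[datetime] = None) -> bool:
    """Return True if the download URL is no longer valid."""
    current = now or datetime.now(timezone.utc)
    return current >= parse_expiry(expiry)


def download_transcription(response: dict) -> str:
    """Download the file referenced by a data_type=file response."""
    if is_expired(response["expiry"]):
        raise RuntimeError("file_url has expired; request the transcription again")
    # Assumption: the pre-signed URL carries its own authorization,
    # so no x-api-key header is sent here.
    with urllib.request.urlopen(response["file_url"], timeout=30) as resp:
        return resp.read().decode("utf-8")  # assumes UTF-8 file content
```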
The file content is formatted according to your format_type selection:
Plain text (format_type=txt)
A human-readable text file with timestamps and speaker identification.
SubRip (format_type=srt)
A numbered sequence of subtitle blocks with precise timing.
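If you already requested data_type=json, you can also produce SRT locally from the segments. This sketch follows the standard SubRip layout (a cue number, an HH:MM:SS,mmm --> HH:MM:SS,mmm timing line, the text, then a blank line). It is not guaranteed to match the API's own srt output byte for byte, and prefixing each cue with the speaker is a choice made here.

```python
from typing import List


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[dict]) -> str:
    """Render JSON segments (start, end, text, speaker) as SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{format_srt_time(seg['start'])} --> {format_srt_time(seg['end'])}"
        # Prefix the speaker so attribution survives in plain subtitle players.
        blocks.append(f"{index}\n{timing}\n{seg['speaker']}: {seg['text']}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```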
WebVTT (format_type=vtt)
The Web Video Text Tracks format, optimized for HTML5 video.
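The JSON segments can also be rendered as WebVTT locally. This sketch writes the required WEBVTT header, uses period-separated milliseconds in cue timings, escapes &, < and > in cue text, and marks speakers with WebVTT voice spans (<v Speaker>). It is a local conversion and may differ from the API's own vtt file.

```python
from typing import List


def format_vtt_time(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp: HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def escape_vtt(text: str) -> str:
    """Escape characters that have special meaning in WebVTT cue text."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def segments_to_vtt(segments: List[dict]) -> str:
    """Render JSON segments (start, end, text, speaker) as a WebVTT document."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{format_vtt_time(seg['start'])} --> {format_vtt_time(seg['end'])}")
        # <v Name> is WebVTT's voice span for identifying the speaker.
        lines.append(f"<v {escape_vtt(str(seg['speaker']))}>{escape_vtt(seg['text'])}")
        lines.append("")  # blank line ends the cue
    return "\n".join(lines)
```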
These formats integrate with content management systems, video editing platforms, and analytical tools. Speaker identification makes the transcription particularly useful for multi-participant content such as interviews, panel discussions, and dramatized works.
Each output format suits a different workflow. Here is when to use each option.
The plain text format (format_type=txt) preserves speaker information alongside the dialogue, which makes it well suited to interview transcription where speaker attribution is essential.
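For interview-style transcripts, consecutive segments from the same speaker often read better as a single turn. This sketch works from the data_type=json segments and merges adjacent segments that share a speaker; merge_turns and format_interview are hypothetical helpers, and the "Speaker: text" layout is an assumption rather than the API's txt format.

```python
from typing import List


def merge_turns(segments: List[dict]) -> List[dict]:
    """Merge consecutive segments from the same speaker into one turn."""
    turns: List[dict] = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"]
        else:
            turns.append(dict(seg))  # copy so the input list is not mutated
    return turns


def format_interview(segments: List[dict]) -> str:
    """Render merged turns as 'Speaker: text' paragraphs."""
    return "\n\n".join(f"{t['speaker']}: {t['text']}" for t in merge_turns(segments))
```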
The SubRip Text format (format_type=srt) is one of the most widely supported subtitle formats across editing platforms and media players, which makes it a common default for subtitle work.
The Web Video Text Tracks format (format_type=vtt) offers features beyond SRT, including support for styling and positioning, which makes it a good fit for web-centric workflows.
Transcription data supports several kinds of content enhancement.
Transform lectures and educational content into searchable resources by indexing the transcription text together with its timestamps.
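A simple keyword search over the JSON segments shows how timestamps make lecture content navigable: each hit carries the start time a viewer could jump to. The helper names are hypothetical, and a case-insensitive substring match is used for brevity instead of a full-text index.

```python
from typing import List, Tuple


def find_mentions(segments: List[dict], keyword: str) -> List[Tuple[float, str]]:
    """Return (start_time, text) for every segment that mentions keyword."""
    needle = keyword.casefold()
    return [(seg["start"], seg["text"]) for seg in segments if needle in seg["text"].casefold()]


def to_timestamp(seconds: float) -> str:
    """Format seconds as M:SS for display next to a search hit."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"
```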
Analyze your media in more depth, for example by measuring how much each speaker contributes to a conversation.
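One straightforward analysis is how long each speaker talks. This sketch sums end minus start per speaker from the JSON segments and converts the totals into shares of overall speaking time; it assumes segments do not overlap, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def speaking_time(segments: List[dict]) -> Dict[str, float]:
    """Total seconds of speech per speaker."""
    totals: Counter = Counter()
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


def speaking_share(segments: List[dict]) -> Dict[str, float]:
    """Fraction of total speaking time per speaker (empty if there is no speech)."""
    totals = speaking_time(segments)
    overall = sum(totals.values())
    return {speaker: t / overall for speaker, t in totals.items()} if overall else {}
```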
Make your content more inclusive with transcription-based features such as captions and subtitles.
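Readable captions usually limit line length. This sketch wraps a segment's text with the standard textwrap module and groups the lines into caption blocks; the 42-character width and two-line limit are commonly cited captioning guidelines used here as assumptions, not values defined by this API.

```python
import textwrap
from typing import List


def caption_blocks(text: str, max_chars: int = 42, max_lines: int = 2) -> List[List[str]]:
    """Split text into caption blocks of at most max_lines lines of max_chars each."""
    lines = textwrap.wrap(text, width=max_chars)
    # Group consecutive wrapped lines into blocks that fit on screen together.
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]
```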
Professional and Enterprise plans offer additional transcription features, including enhanced speaker identification and support for industry-specific terminology. Advanced MXF format processing is available only to Enterprise customers.
Incorporating transcription results into your content workflow turns ephemeral spoken content into structured data that can be reused across your organization's content ecosystem.
The x-api-key header is required to authenticate requests to the API. Include it in every request, with your API key as its value. You can find your API key(s) in the 'API' section of the studio website.
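Keeping the API key out of source code is good practice. This sketch reads the key from an environment variable and builds the x-api-key header; the variable name TRANSCRIPTION_API_KEY and the helper name are hypothetical choices.

```python
import os
from typing import Dict


def auth_headers(env_var: str = "TRANSCRIPTION_API_KEY") -> Dict[str, str]:
    """Build the x-api-key header from an environment variable (name is hypothetical)."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} to the API key from the studio's 'API' section")
    return {"x-api-key": api_key}
```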
run_id: the unique identifier for the run, generated when the transcription was created and returned when the task completed.
When set to true, this parameter enables word-level timestamps in the response, giving precise timing information for each word in the processed audio.
Successful response
The response is an array of DialogueItem objects (DialogueItem[]).
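Before processing a response, you may want to confirm it has the expected shape. This sketch describes DialogueItem as a TypedDict using the fields from the JSON table on this page. Whether DialogueItem carries additional fields (for example, when word-level timestamps are enabled) is not specified here, so the check covers only those four fields and treats speaker as a string by assumption.

```python
from typing import List, TypedDict

REQUIRED_FIELDS = ("start", "end", "text", "speaker")


class DialogueItem(TypedDict):
    start: float  # seconds
    end: float    # seconds
    text: str
    speaker: str  # assumption: speaker identifiers arrive as strings


def validate_dialogue_items(data: object) -> List[DialogueItem]:
    """Check that a decoded response looks like DialogueItem[] before using it."""
    if not isinstance(data, list):
        raise TypeError("expected a JSON array of DialogueItem objects")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise TypeError(f"item {index} is not an object")
        missing = [field for field in REQUIRED_FIELDS if field not in item]
        if missing:
            raise ValueError(f"item {index} is missing {missing}")
        if item["end"] < item["start"]:
            raise ValueError(f"item {index} ends before it starts")
    return data
```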