The Power of Accurate Transcription
Transcription transforms the spoken word into structured text data that you can search, analyze, and repurpose. Our system delivers transcriptions that include:
- Temporal Precision: Exact start and end timestamps for each spoken segment
- Speaker Differentiation: Clear identification of different speakers throughout the content
- Verbatim Text: Accurate textual representation of all spoken content
Retrieving Your Transcription Data
To access your completed transcription, you'll need the unique `run_id` that was assigned when you submitted your transcription request. This identifier allows our system to locate your specific transcription results within our processing infrastructure.
Customizing Your Transcription Format
Our API offers flexible output options to suit your specific workflow needs. When retrieving your transcription, you can specify:
- Format Type (`format_type`): Choose how your transcription is structured:
  - `txt`: Plain text format with speaker labels and timestamps (default)
  - `srt`: SubRip Text format, ready for subtitle integration in most video players
  - `vtt`: WebVTT format, optimized for web-based video players and HTML5 applications
- Data Type (`data_type`): Determine how you'll receive the transcription data:
  - `file`: Receive a pre-signed URL to download the complete transcription file (default)
  - `json`: Receive the raw transcription data directly in the API response
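As a minimal sketch, the options above could be validated and assembled into a retrieval URL as follows. The base URL and path layout (`https://api.example.com/v1/transcriptions/...`) are hypothetical placeholders, and passing `format_type` and `data_type` as query parameters is an assumption; check the endpoint reference for the actual route.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL; replace with the endpoint from the API reference.
BASE_URL = "https://api.example.com/v1/transcriptions"

# Allowed values as documented above; the first entry in each is the default.
FORMAT_TYPES = ("txt", "srt", "vtt")
DATA_TYPES = ("file", "json")


def build_transcription_url(run_id: str, format_type: str = "txt",
                            data_type: str = "file") -> str:
    """Build a retrieval URL for a run, validating the output options locally."""
    if format_type not in FORMAT_TYPES:
        raise ValueError(f"format_type must be one of {FORMAT_TYPES}, got {format_type!r}")
    if data_type not in DATA_TYPES:
        raise ValueError(f"data_type must be one of {DATA_TYPES}, got {data_type!r}")
    # Percent-encode the run_id so it is safe to place in the URL path.
    path = f"{BASE_URL}/{quote(run_id, safe='')}"
    # Assumption: both options are sent as query parameters.
    query = urlencode({"format_type": format_type, "data_type": data_type})
    return f"{path}?{query}"


if __name__ == "__main__":
    print(build_transcription_url("my-run-id", format_type="srt"))
```

Validating locally catches typos such as `format_type="vtt "` before a request is ever sent.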
Understanding the Response Structure
The structure of your transcription response depends on the `data_type` parameter you specify in your request. Let's explore both options in detail:
JSON Response Format (`data_type=json`)
When requesting raw JSON data, the transcription results are provided as an array of segment objects, each containing detailed information about a portion of spoken content:
Field | Description |
---|---|
start | The precise starting point of the speech segment (in seconds) |
end | The exact ending point of the speech segment (in seconds) |
text | The verbatim transcription of the spoken content in this segment |
speaker | Identifier for the person speaking during this segment |
This raw format is useful when you want to:
- Perform custom analysis on the speech patterns
- Build interactive transcript interfaces
- Integrate the data directly into your application's database
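The sketch below shows one way to load the JSON segments and look up which segment is active at a given playback position, which is the core of an interactive transcript. It assumes the response body is the bare array of segment objects described in the table above and that segments do not overlap; the sample payload and the speaker labels `"A"` and `"B"` are invented for illustration.

```python
import bisect
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str
    speaker: str


def parse_segments(payload: str) -> List[Segment]:
    """Parse a JSON array of segment objects and sort them by start time."""
    raw = json.loads(payload)
    segments = [Segment(float(s["start"]), float(s["end"]), s["text"], str(s["speaker"]))
                for s in raw]
    segments.sort(key=lambda s: s.start)
    return segments


def segment_at(segments: List[Segment], t: float) -> Optional[Segment]:
    """Return the segment being spoken at playback time t (seconds), or None in a gap."""
    starts = [s.start for s in segments]
    # Index of the last segment that starts at or before t (assumes no overlaps).
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and t < segments[i].end:
        return segments[i]
    return None


# Illustrative payload only; real speaker identifiers may look different.
SAMPLE = """[
  {"start": 0.0, "end": 2.5, "text": "Welcome to the show.", "speaker": "A"},
  {"start": 2.8, "end": 5.1, "text": "Thanks for having me.", "speaker": "B"}
]"""

if __name__ == "__main__":
    segs = parse_segments(SAMPLE)
    print(segment_at(segs, 3.0))  # the second segment, spoken by "B"
```

A player can call `segment_at` on each time-update event and highlight the returned segment.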
File Response Format (`data_type=file`)
When requesting a file download (`data_type=file`), the response provides a pre-signed URL that points to a file containing your transcription in the format you specified:
Field | Description |
---|---|
file_url | A temporary URL allowing you to download the transcription file |
expiry | Timestamp indicating when the download URL will expire |
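Because the download URL is temporary, it is worth checking `expiry` before fetching the file. This sketch assumes `expiry` is an ISO 8601 timestamp and that the response body is a JSON object containing the two fields above; if the API uses another timestamp format (such as Unix epoch seconds), the parsing step would need to change.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.request import urlopen


def parse_expiry(expiry: str) -> datetime:
    """Parse an ISO 8601 expiry timestamp, treating a trailing 'Z' as UTC."""
    if expiry.endswith("Z"):
        expiry = expiry[:-1] + "+00:00"
    dt = datetime.fromisoformat(expiry)
    # Assumption: a timestamp without an explicit offset is in UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def url_is_usable(expiry: str, now: Optional[datetime] = None,
                  margin: timedelta = timedelta(seconds=30)) -> bool:
    """Return True if the URL stays valid for at least `margin` from now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return parse_expiry(expiry) - margin > now


def download_transcript(response_body: str) -> str:
    """Fetch the file referenced by a data_type=file response."""
    info = json.loads(response_body)
    if not url_is_usable(info["expiry"]):
        raise RuntimeError("Download URL has expired; request the transcription again.")
    with urlopen(info["file_url"]) as resp:
        return resp.read().decode("utf-8")
```

The safety margin avoids starting a download that would fail because the URL expires mid-request.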
The contents of the downloaded file depend on your `format_type` selection:
TXT Format (`format_type=txt`)
A human-readable text file with timestamps and speaker identification.
SRT Format (`format_type=srt`)
A numbered sequence of subtitle blocks with precise timing.
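To make the SRT structure concrete, the sketch below converts JSON segments (as returned with `data_type=json`) into SubRip text: a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the cue text, and a blank line between cues. Prefixing each cue with its speaker is an illustrative choice, not necessarily how the API lays out its own SRT files.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render segments as numbered SRT cue blocks separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        # Illustrative: label each cue with its speaker.
        blocks.append(f"{index}\n{timing}\n{seg['speaker']}: {seg['text']}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [{"start": 0.0, "end": 2.5, "text": "Welcome to the show.", "speaker": "A"},
            {"start": 62.04, "end": 65.1, "text": "Thanks for having me.", "speaker": "B"}]
    print(segments_to_srt(demo))
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids floating-point artifacts such as `,039` for 62.04 seconds.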
VTT Format (`format_type=vtt`)
Web Video Text Tracks format, optimized for HTML5 video.
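WebVTT differs from SRT mainly in its mandatory `WEBVTT` header line and its use of a period rather than a comma before the milliseconds; cue numbers are optional. It also defines a voice span (`<v Speaker>`) for marking who is speaking. The sketch below renders JSON segments that way; using voice spans for speaker labels is an illustrative choice, not a description of the API's own VTT output.

```python
from typing import Dict, List


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp: HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def escape_vtt(text: str) -> str:
    """Escape characters that WebVTT cue text reserves for markup."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def segments_to_vtt(segments: List[Dict]) -> str:
    """Render segments as a WebVTT document with speaker voice spans."""
    lines = ["WEBVTT", ""]  # header line followed by a blank line
    for seg in segments:
        lines.append(f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}")
        lines.append(f"<v {escape_vtt(str(seg['speaker']))}>{escape_vtt(seg['text'])}")
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)
```

The resulting text can be served as a `.vtt` file and referenced from an HTML5 `<track>` element.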
Selecting the Right Format for Your Use Case
The flexibility to choose different output formats lets you tailor the results to specific workflows. Let's explore when to use each format option:
When to Use TXT Format
The plain text format (`format_type=txt`) is ideal when you need:
- Human-readable transcripts for review and editing
- Content that can be easily imported into word processors
- A foundation for creating articles or blog posts from video content
- Material for qualitative research analysis
When to Use SRT Format
The SubRip Text format (`format_type=srt`) is your best choice when:
- Creating subtitles for video editing software like Adobe Premiere or Final Cut Pro
- Preparing content for DVD or Blu-ray authoring
- Working with offline video players like VLC or Media Player Classic
- Translating content where precise timing is required
When to Use VTT Format
The Web Video Text Tracks format (`format_type=vtt`) shines when:
- Embedding subtitles in HTML5 video players
- Working with streaming platforms that use HLS or DASH protocols
- Creating accessible web content that meets WCAG guidelines
- Developing educational platforms with interactive transcripts
Transforming Transcriptions into Valuable Assets
The detailed transcription data opens up numerous possibilities for content enhancement:
Educational Applications
Transform lectures and educational content into searchable resources by using the transcription to:
- Create interactive transcripts that synchronize with video playback
- Generate study guides with direct quotes from instructors
- Enable students to search for specific concepts within long-form content
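Searching for a concept within long-form content can be as simple as scanning segment text and returning the matching timestamps, which a video player can then seek to. This is a minimal case-insensitive substring search over JSON segments, with invented sample data; a production search would more likely use stemming or a full-text index.

```python
from typing import Dict, List, Tuple


def find_mentions(segments: List[Dict], term: str) -> List[Tuple[float, str, str]]:
    """Return (start_seconds, speaker, text) for each segment mentioning term."""
    needle = term.casefold()
    return [(seg["start"], seg["speaker"], seg["text"])
            for seg in segments
            if needle in seg["text"].casefold()]


if __name__ == "__main__":
    # Invented lecture segments for illustration.
    lecture = [
        {"start": 12.0, "end": 18.5, "text": "Entropy measures disorder.", "speaker": "A"},
        {"start": 40.2, "end": 47.0, "text": "Now compare that with enthalpy.", "speaker": "A"},
        {"start": 95.3, "end": 99.9, "text": "So is entropy always increasing?", "speaker": "B"},
    ]
    for start, speaker, text in find_mentions(lecture, "entropy"):
        print(f"{start:7.1f}s  {speaker}: {text}")
```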
Content Analysis
Dive deeper into your media with analytical approaches:
- Track speaker participation and engagement patterns
- Identify key themes and topics through text analysis
- Create word clouds and frequency analyses of terminology
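Speaker participation and term frequency both follow directly from the segment fields. The sketch below totals each speaker's talk time from `start` and `end`, and counts words with a deliberately crude regex tokenizer and a tiny stop-word list; both simplifications are illustrative choices, not part of the API.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Tiny illustrative stop-word list; real analyses use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "that", "it"}


def talk_time(segments: List[Dict]) -> Dict[str, float]:
    """Total seconds spoken per speaker."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


def top_terms(segments: List[Dict], n: int = 10) -> List[Tuple[str, int]]:
    """Most frequent non-stop-words across all segments, e.g. for a word cloud."""
    words: Counter = Counter()
    for seg in segments:
        for word in re.findall(r"[a-z']+", seg["text"].lower()):
            if word not in STOP_WORDS:
                words[word] += 1
    return words.most_common(n)
```

Dividing each speaker's total by the sum of all totals gives their share of the conversation.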
Accessibility Enhancements
Make your content more inclusive with transcription-based features:
- Generate accurate closed captions synchronized with your video
- Create full transcripts for hearing-impaired audience members
- Enable screen reader compatibility for audio content
Practical Workflow Integration
Incorporating transcription results into your content workflow creates numerous efficiencies:
- Content Creation: Use transcripts as the foundation for blog posts and articles
- Legal Documentation: Create verifiable records of interviews and statements
- Localization: Start with accurate transcriptions before beginning translation
- SEO: Improve discoverability with transcript-based metadata
- Research: Analyze verbal patterns and content themes systematically
Authorizations
The `x-api-key` header is required to authenticate requests to our API. Include it in every request, with your API key as its value, to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
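A minimal sketch of an authenticated request using only the Python standard library. It reads the key from an environment variable so the key stays out of source code; `TRANSCRIPTION_API_KEY` is a hypothetical variable name, and the URL passed in is whichever retrieval endpoint you are calling.

```python
import json
import os
from urllib.request import Request, urlopen


def build_request(url: str, api_key: str) -> Request:
    """Create a GET request carrying the x-api-key authentication header."""
    # urllib normalizes the header name to "X-api-key"; HTTP header names
    # are case-insensitive, so the server still sees x-api-key.
    return Request(url, headers={"x-api-key": api_key}, method="GET")


def get_json(url: str) -> object:
    """Send an authenticated GET request and decode the JSON body."""
    api_key = os.environ["TRANSCRIPTION_API_KEY"]  # hypothetical variable name
    with urlopen(build_request(url, api_key)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the headers in tests without any network access.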
Path Parameters
`run_id`: The unique identifier for the run, which was generated during the transcription creation process and returned upon task completion.
Query Parameters
When set to `true`, this parameter enables the generation of word-level timestamps in the response. These timestamps provide precise timing information for each word in the processed audio.
Response
Successful Response