Create Transcription
Creates a task to process speech into readable text. Submit either an audio file or a publicly accessible audio URL.
Converting Speech to Text with Precision
Our transcription service transforms spoken content into accurate, readable text, enabling you to make your audio content searchable, accessible, and analytically valuable. This endpoint initiates a transcription task, processing your audio file and returning a unique identifier that allows you to track and retrieve your results.
Understanding Speech Transcription
Speech transcription technology analyzes audio recordings of human speech and converts them into written text. This process employs sophisticated machine learning models trained on diverse speech patterns, accents, and linguistic contexts to deliver high-quality text outputs. Our system handles various audio formats and speaking situations, from clear studio recordings to more challenging environments with background noise.
When you submit an audio file for transcription, our system:
- Analyzes the audio signal to identify speech segments
- Processes these segments through advanced recognition models
- Applies language-specific rules and context awareness
- Generates a readable text transcript that captures the spoken content
This transformation creates valuable text assets from your audio content, enabling new ways to search, analyze, and repurpose your spoken material.
Supported Languages
Our transcription service supports a wide range of languages. Some of the most commonly used include:
- English (1)
- Spanish (54)
- French (76)
- German (31)
- Mandarin Chinese (139)
- Japanese (88)
- Arabic (4)
For a complete list of supported languages and their respective language codes, refer to our Language Support Documentation.
Supported Audio Formats
For optimal transcription quality, we recommend using high-quality audio with clear speech and minimal background noise. Our service accepts the following audio formats:
- MP3 (.mp3)
- WAV (.wav)
- AAC (.aac)
- FLAC (.flac)
Request Example
You can create a transcription task by either uploading an audio file or providing a URL to an audio resource. Here are examples using cURL:
- Using a local audio file:
- Using a remote audio URL:
Here’s how to handle both scenarios in Python with the requests library:
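The two scenarios could be sketched roughly as follows. Only the x-api-key header comes from this page. The base URL (https://api.example.com/transcribe) and the field names file, url, and language are placeholders assumed for illustration, so check them against the Body section before use. The helper names build_request and create_transcription are likewise hypothetical.

```python
import os

# Placeholder values: the base URL and the field names "file", "url", and
# "language" are assumptions for illustration, not confirmed by this page.
API_URL = "https://api.example.com/transcribe"


def build_request(api_key, language, audio_path=None, audio_url=None):
    """Return keyword arguments for requests.post() for one of the two input modes."""
    if (audio_path is None) == (audio_url is None):
        raise ValueError("Provide exactly one of audio_path or audio_url")
    kwargs = {
        "url": API_URL,
        "headers": {"x-api-key": api_key},  # header name taken from this page
        "data": {"language": str(language)},
        "timeout": 30,
    }
    if audio_url is not None:
        # Remote audio: send the publicly accessible URL as a form field.
        kwargs["data"]["url"] = audio_url
    return kwargs


def create_transcription(api_key, language, audio_path=None, audio_url=None):
    """Submit the task and return the parsed JSON response."""
    import requests  # third-party: pip install requests

    kwargs = build_request(api_key, language, audio_path, audio_url)
    if audio_path is not None:
        # Local audio: stream the file as a multipart upload.
        with open(audio_path, "rb") as audio_file:
            files = {"file": (os.path.basename(audio_path), audio_file)}
            response = requests.post(files=files, **kwargs)
    else:
        response = requests.post(**kwargs)
    response.raise_for_status()
    return response.json()
```

Keeping the request construction separate from the network call makes it possible to inspect or unit-test the payload without contacting the service.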
Processing Time Considerations
Transcription processing time depends on several factors:
- Audio Duration: Longer files naturally take more time to process
- Audio Quality: Clear, high-quality recordings process more efficiently
- Language Complexity: Some languages may require more processing time
- System Load: Processing time can vary based on current system demand
Next Steps: Monitoring Your Transcription Task
After submitting your transcription request, you’ll want to monitor its progress and retrieve the results once processing completes. To do this:
- Use the /transcribe/{task-id} endpoint to check your task’s status
- Poll the status endpoint at reasonable intervals (we recommend 5-15 second intervals for most cases)
- Once the status shows as SUCCESS, you can retrieve your full transcript
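The polling steps above might look like the sketch below. The SUCCESS status and the 5-15 second interval come from this page. The "status" field name, the "FAILURE" status value, and the function name wait_for_transcript are assumptions. The caller supplies fetch_status, a function that performs the GET request to /transcribe/{task-id} and returns the parsed JSON as a dict.

```python
import time


def wait_for_transcript(fetch_status, task_id, interval=10.0, timeout=600.0,
                        failure_states=("FAILURE",), sleep=time.sleep):
    """Poll until the task reports SUCCESS, a failure state, or the timeout elapses."""
    waited = 0.0
    while True:
        result = fetch_status(task_id)
        status = result.get("status")  # assumed field name
        if status == "SUCCESS":
            return result
        if status in failure_states:  # assumed failure status name
            raise RuntimeError(f"Task {task_id} ended with status {status}")
        if waited >= timeout:
            raise TimeoutError(f"Task {task_id} not finished after {timeout} seconds")
        # Wait before the next check; 5-15 seconds is the recommended range.
        sleep(interval)
        waited += interval
```

Passing the fetch function and the sleep function as parameters keeps the loop independent of any particular HTTP client and lets it be exercised in tests without real delays.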
Best Practices for Optimal Results
To get the most accurate transcriptions from our service:
- Use High-Quality Audio: Whenever possible, use audio recorded in quiet environments with minimal background noise.
- Appropriate Audio Format: Submit lossless audio formats like WAV or FLAC for best quality, or high-bitrate MP3s if file size is a concern.
- Speaker Clarity: Encourage clear speaking at a moderate pace for best recognition accuracy.
- Specify the Correct Language: Always provide the correct language parameter to ensure our models apply the right language patterns.
- Segment Longer Content: For very long recordings (over 2 hours), consider splitting into multiple smaller files for more efficient processing.
Handling Common Issues
If you encounter problems with your transcription tasks, these troubleshooting steps may help:
Rejected File Uploads
Problem: Your request returns an error about the file upload.
Potential Solutions:
- Verify your file is in one of our supported formats (.mp3, .wav, .aac, .flac)
- Check that your file isn’t corrupted or empty
- Ensure your file doesn’t exceed our size limit (20 MB)
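These checks can be run locally before uploading. The sketch below covers the extension, empty-file, and size checks listed above. It does not detect corrupted audio, and it assumes that "20 MB" means 20 × 1024 × 1024 bytes, which this page does not specify. The function name check_audio_file is hypothetical.

```python
import os

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".aac", ".flac"}
MAX_FILE_BYTES = 20 * 1024 * 1024  # assumes "20 MB" means 20 MiB


def check_audio_file(path):
    """Return a list of problems found; an empty list means the file looks acceptable."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file does not exist")
        return problems
    size = os.path.getsize(path)
    if size == 0:
        problems.append("file is empty")
    elif size > MAX_FILE_BYTES:
        problems.append(f"file is {size} bytes, over the 20 MB limit")
    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem with a file in one pass.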
Incorrect Language Specification
Problem: Transcription results appear inaccurate or contain many errors.
Potential Solutions:
- Verify you specified the correct language code
- For multilingual content, choose the predominant language
Taking Your Transcriptions Further
Once you’ve successfully transcribed your audio content, consider these next steps:
- Semantic Analysis: Extract key topics, sentiments, and entities from your transcribed text
- Content Indexing: Make your audio searchable by indexing the transcript content
- Accessibility Compliance: Use transcripts to make your audio content accessible to all users
- Translation: Convert your transcript into other languages for global reach
- Summary Generation: Create concise summaries of longer transcribed content
Authorizations
The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
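One way to supply the header without hard-coding the key is to read it from an environment variable, as in this sketch. The variable name TRANSCRIBE_API_KEY and the function name auth_headers are hypothetical. Only the x-api-key header name comes from this page.

```python
import os


def auth_headers(env_var="TRANSCRIBE_API_KEY"):
    """Build the x-api-key header from an environment variable (name is hypothetical)."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {"x-api-key": api_key}
```

The returned dict can be passed as the headers argument of whichever HTTP client you use.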
Body
Response
Successful Response
A JSON object that contains the unique identifier for the task. This identifier is used to query the status of the running transcription task. It is returned when a create request is made to process speech into text.