Converting Speech to Text with Precision
Our transcription service transforms spoken content into accurate, readable text, making your media content searchable, accessible, and analytically valuable. This endpoint initiates a transcription task, processing your media file and returning a unique identifier that allows you to track and retrieve your results.
Understanding Speech Transcription
Speech transcription technology analyzes audio recordings of human speech, or video that contains speech, and converts them into written text. This process employs machine learning models trained on diverse speech patterns, accents, and linguistic contexts to deliver high-quality text outputs. Our system handles various media formats and speaking situations, from clear studio recordings to more challenging environments with background noise. When you submit a media file for transcription, our system:
- Analyzes the audio signal to identify speech segments
- Processes these segments through advanced recognition models
- Applies language-specific rules and context awareness
- Generates a readable text transcript that captures the spoken content
Supported Languages
Our transcription service supports a wide range of languages. Some of the most commonly used include:
- English (1)
- Spanish (54)
- French (76)
- German (31)
- Mandarin Chinese (139)
- Japanese (88)
- Arabic (4)
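As a small illustration, the codes listed above can be kept in a lookup table so that callers pass language names rather than raw numbers. The dictionary keys and the helper name language_code are illustrative only and are not part of the API; the numeric values are the ones given in this section.

```python
# Language codes copied from the list above. The key spellings and the
# helper itself are illustrative, not part of the API.
LANGUAGE_CODES = {
    "english": 1,
    "spanish": 54,
    "french": 76,
    "german": 31,
    "mandarin chinese": 139,
    "japanese": 88,
    "arabic": 4,
}


def language_code(name: str) -> int:
    """Return the numeric code for a language name (case-insensitive)."""
    key = name.strip().lower()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"No code known for language: {name!r}")
    return LANGUAGE_CODES[key]
```

For example, language_code("French") returns 76, and an unlisted name raises ValueError instead of silently sending a wrong code.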
Supported Media Formats
For optimal transcription quality, we recommend using high-quality media with clear speech and minimal background noise. Our service accepts the following media formats:
- MP3 (.mp3)
- WAV (.wav)
- AAC (.aac)
- FLAC (.flac)
- MP4 (.mp4)
- MOV (.mov)
- MXF (.mxf)
Request Example
You can create a transcription task by either uploading a media file or providing a URL to a media resource. Here are examples using cURL:
- Using a local media file:
- Using a remote media URL:
Using Python's requests library:
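A minimal sketch of both submission styles with requests, under stated assumptions: the base URL, the /transcribe creation path, and the form field names (language, media_file, media_url) are placeholders chosen for illustration and should be checked against your API reference. Only the x-api-key header and the numeric language codes come from this page.

```python
from typing import Any, Dict, Optional

# Assumed values: replace with the base URL and path from your API reference.
BASE_URL = "https://api.example.com"
CREATE_PATH = "/transcribe"


def build_create_request(
    api_key: str,
    language: int,
    media_path: Optional[str] = None,
    media_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the pieces of the create request without sending anything.

    Exactly one of media_path (local upload) or media_url (remote file)
    must be given. The form field names are hypothetical.
    """
    if (media_path is None) == (media_url is None):
        raise ValueError("Provide exactly one of media_path or media_url")

    data: Dict[str, Any] = {"language": language}
    if media_url is not None:
        data["media_url"] = media_url

    return {
        "url": BASE_URL + CREATE_PATH,
        "headers": {"x-api-key": api_key},
        "data": data,
        "media_path": media_path,
    }


def create_transcription_task(**kwargs: Any) -> Dict[str, Any]:
    """Send the create request and return the decoded JSON response."""
    # Third-party dependency, imported here so the builder above can be
    # used and tested without it.
    import requests

    req = build_create_request(**kwargs)
    media_path = req.pop("media_path")
    if media_path is None:
        response = requests.post(timeout=30, **req)
    else:
        # Keep the file open only for the duration of the upload.
        with open(media_path, "rb") as fh:
            response = requests.post(files={"media_file": fh}, timeout=30, **req)
    response.raise_for_status()
    return response.json()
```

A call such as create_transcription_task(api_key="YOUR_KEY", language=1, media_url="https://example.com/talk.mp3") would submit a remote file; passing media_path instead uploads a local one.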
Processing Time Considerations
Transcription processing time depends on several factors:
- Media Duration: Longer files naturally take more time to process
- Media Quality: Clear, high-quality recordings process more efficiently
- Language Complexity: Some languages may require more processing time
- System Load: Processing time can vary based on current system demand
Next Steps: Monitoring Your Transcription Task
After submitting your transcription request, you'll want to monitor its progress and retrieve the results once processing completes. To do this:
- Use the /transcribe/{task-id} endpoint to check your task's status
- Poll the status endpoint at reasonable intervals (we recommend 5-15 second intervals for most cases)
- Once the status shows as SUCCESS, you can retrieve your full transcript
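The polling steps above might be sketched as follows. The fetch_status callable stands in for a GET request to /transcribe/{task-id}; the "status" key in its return value and the 10-minute default timeout are assumptions. Only the SUCCESS value and the 5-15 second interval come from this page.

```python
import time
from typing import Any, Callable, Dict


def wait_for_transcription(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task reports SUCCESS or the timeout elapses.

    fetch_status is a caller-supplied function that queries the status
    endpoint and returns the decoded JSON body. The "status" key is an
    assumed field name.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        payload = fetch_status()
        if payload.get("status") == "SUCCESS":
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError("Transcription did not finish before the timeout")
        sleep(interval_seconds)
```

Failure statuses are not documented on this page, so the sketch relies on the timeout to stop a task that never succeeds. The sleep parameter is injectable so the loop can be exercised in tests without real waiting.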
Best Practices for Optimal Results
To get the most accurate transcriptions from our service:
- Use High-Quality Media: Whenever possible, use media recorded in quiet environments with minimal background noise.
- Appropriate Media Format: Submit uncompressed or losslessly compressed formats like WAV or FLAC for best quality, or high-bitrate MP3s if file size is a concern.
- Speaker Clarity: Encourage clear speaking with moderate pace for best recognition accuracy.
- Specify the Correct Language: Always provide the correct language parameter to ensure our models apply the right language patterns.
- Segment Longer Content: For very long recordings (over 2 hours), consider splitting into multiple smaller files for more efficient processing.
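To illustrate the last point, the following sketch computes start and end offsets for splitting a long recording into fixed-length pieces (the final piece may be shorter). The 30-minute default is an arbitrary choice, not a service recommendation, and the actual cutting would be done with an audio tool of your choice.

```python
from typing import List, Tuple


def plan_segments(
    duration_seconds: float, max_segment_seconds: float = 1800.0
) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover the whole recording."""
    if duration_seconds <= 0 or max_segment_seconds <= 0:
        raise ValueError("Durations must be positive")
    segments = []
    start = 0.0
    while start < duration_seconds:
        # Clamp the final segment to the end of the recording.
        end = min(start + max_segment_seconds, duration_seconds)
        segments.append((start, end))
        start = end
    return segments
```

For a 2.5-hour recording (9000 seconds) and one-hour segments, plan_segments(9000, 3600) yields three ranges: 0-3600, 3600-7200, and 7200-9000.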
Handling Common Issues
If you encounter problems with your transcription tasks, these troubleshooting steps may help:
Rejected File Uploads
Problem: Your request returns an error about the file upload.
Potential Solutions:
- Verify your file is in one of our supported formats (.mp3, .wav, .aac, .flac, .mp4, .mov, .mxf)
- Check that your file isn't corrupted or empty
- Ensure your file doesn't exceed our size limit (20 MB)
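Most of these checks can be run locally before uploading. The sketch below is one possible pre-flight check. It assumes the 20 MB limit means 20 * 1024 * 1024 bytes, which this page does not specify, and the function name is illustrative.

```python
import os
from typing import List

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".aac", ".flac", ".mp4", ".mov", ".mxf"}
# Assumption: "20 MB" is interpreted as 20 MiB; the page does not say which.
MAX_SIZE_BYTES = 20 * 1024 * 1024


def check_media_file(path: str) -> List[str]:
    """Return a list of problems found; an empty list means the file looks OK."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"Unsupported format: {extension or '(none)'}")
    if not os.path.isfile(path):
        problems.append("File does not exist")
        return problems
    size = os.path.getsize(path)
    if size == 0:
        problems.append("File is empty")
    elif size > MAX_SIZE_BYTES:
        problems.append("File exceeds the 20 MB size limit")
    return problems
```

This check cannot detect a corrupted file; it only covers the extension, existence, emptiness, and size.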
Incorrect Language Specification
Problem: Transcription results appear inaccurate or contain many errors.
Potential Solutions:
- Verify you specified the correct language code
- For multilingual content, choose the predominant language
Taking Your Transcriptions Further
Once you've successfully transcribed your media content, consider these next steps:
- Semantic Analysis: Extract key topics, sentiments, and entities from your transcribed text
- Content Indexing: Make your media searchable by indexing the transcript content
- Accessibility Compliance: Use transcripts to make your media content accessible to all users
- Translation: Convert your transcript into other languages for global reach
- Summary Generation: Create concise summaries of longer transcribed content
Authorizations
The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
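One common way to supply the key without hard-coding it is to read it from an environment variable. The variable name TRANSCRIBE_API_KEY and the helper name are hypothetical; only the x-api-key header name comes from this page.

```python
import os
from typing import Dict


def auth_headers(env_var: str = "TRANSCRIBE_API_KEY") -> Dict[str, str]:
    """Build the authentication header from an environment variable.

    The environment variable name is an illustrative choice.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"x-api-key": api_key}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.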
Body
Language code. Available options: 1-71, 73-76, 78-88, 90-104, 106-136, 139-146, 148
Media file for transcription
Media URL for transcription. (Supports both video and audio files)
Enter a distinctive name for your project that reflects its purpose or content. This name is displayed in your CAMB.AI workspace dashboard and used to organize related assets, transcriptions, and so on. Choose something memorable that helps you quickly identify this project among your other voice, audio, and localization tasks.
Required length: 3 - 255 characters
Provide details about your project's goals and specifications. Include information such as the target languages for translation or dubbing, desired voice characteristics, emotional tones to capture, or specific audio processing requirements. Outlining the workflow here can also serve as valuable documentation for organizational purposes.
Required length: 3 - 5000 characters
Specify the organizational folder within your CAMB.AI workspace where this task should be created and stored. The folder must already exist in your workspace and be accessible through your current API key authentication. This helps maintain project organization by grouping related tasks together, making it easier to manage and locate your projects.
Required range: x >= 1
Response
Successful Response
A JSON object containing the unique identifier for the task. Use this identifier to query the status of the running transcription task. It is returned when a create request is made to process speech into text.