Create Voice from Description
Submits a request to initiate a task for creating a human-like voice from a given text prompt.
This endpoint generates custom, human-like synthetic voices from descriptive text prompts. Rather than selecting from predefined voice options, you describe the voice characteristics you want and the system creates a voice to match. The endpoint starts an asynchronous process and returns a task_id that you can use to monitor generation progress and, once the task completes, retrieve your custom voice.
How Voice Generation Works
The voice generation process follows these steps:
- You submit a detailed description of the voice you want to create, along with sample text for the voice to speak.
- The system analyzes your description and generates two sample voices matching those characteristics for you to choose from.
- The system returns a task_id that you can use to track the generation process.
- You periodically check the status using the /text-to-voice/{task_id} endpoint.
- Once complete, you can access and use the generated voice in your applications.
Creating Effective Voice Descriptions
The quality and specificity of your voice description directly affect the resulting voice. When crafting your description, consider including details about:
- Gender and age range: "A middle-aged woman" or "An elderly man"
- Accent and regional characteristics: "With a mild Scottish accent" or "Speaking American English with Southern inflections"
- Emotional qualities: "A warm, nurturing tone" or "An authoritative, confident delivery"
- Speaking style: "Who speaks slowly and deliberately" or "With an energetic, rapid-fire delivery"
- Cultural context: "A voice that would be at home narrating documentaries" or "Like a friendly teacher explaining concepts"
- Vocal characteristics: "With a slightly raspy quality" or "With a deep, resonant tone"
Example Request
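A minimal sketch of such a request in Python, using only the standard library. Only the x-api-key header comes from this page; the base URL, the POST path /text-to-voice, and the JSON field names text and voice_description are assumptions for illustration and should be checked against the API reference. The sketch builds the request without sending it.

```python
import json
import urllib.request

# ASSUMPTIONS (not confirmed by this page): the base URL, the POST path,
# and the JSON field names "text" and "voice_description".
BASE_URL = "https://client.camb.ai/apis"


def build_create_voice_request(
    api_key: str, text: str, voice_description: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a voice generation task."""
    payload = {"text": text, "voice_description": voice_description}
    return urllib.request.Request(
        url=f"{BASE_URL}/text-to-voice",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_create_voice_request(
    api_key="YOUR_API_KEY",
    text="Welcome aboard. Today we will explore how glaciers shape a valley.",
    voice_description=(
        "A middle-aged woman with a mild Scottish accent and a warm, nurturing "
        "tone who speaks slowly and deliberately, like a friendly teacher "
        "explaining concepts."
    ),
)
print(request.get_method(), request.full_url)
```

To actually submit the request, pass it to urllib.request.urlopen and read the JSON body of the response.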
Response
Upon successful submission, the endpoint returns a task_id that you can use to check the status of your voice generation task:
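As an illustration, the task_id can be read from the response body as follows. This is a sketch that assumes the body is a JSON object with a top-level task_id key; the identifier value shown is made up, and extract_task_id is a hypothetical helper name.

```python
import json


def extract_task_id(response_body: str) -> str:
    """Return the task_id from the JSON body of a successful create response."""
    data = json.loads(response_body)
    # Assumption: task_id is a top-level key of a JSON object.
    task_id = data.get("task_id") if isinstance(data, dict) else None
    if not task_id:
        raise ValueError("response did not contain a task_id")
    return str(task_id)


# Illustrative body only; real identifiers will look different.
print(extract_task_id('{"task_id": "example-task-id"}'))
```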
Monitoring Generation Progress
Voice generation is a computationally intensive process that typically takes some time to complete. To check the status of your generation task, periodically poll the /text-to-voice/{task_id} endpoint using the task_id received in the initial response.
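One way to structure the polling loop is sketched below. The page does not document the status payload, so the "status" key and the values "SUCCESS" and "ERROR" are assumptions; any other value is treated as still in progress. The HTTP call is passed in as a function (fetch_status, a hypothetical name) so the loop itself stays independent of the client library you use.

```python
import time
from typing import Callable

# ASSUMED status values; this page does not list them.
DONE_STATUSES = {"SUCCESS"}
FAILED_STATUSES = {"ERROR"}


def wait_for_voice(
    fetch_status: Callable[[str], dict],
    task_id: str,
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status(task_id) until the task finishes, fails, or attempts run out."""
    for attempt in range(max_attempts):
        result = fetch_status(task_id)
        status = result.get("status")
        if status in DONE_STATUSES:
            return result
        if status in FAILED_STATUSES:
            raise RuntimeError(f"task {task_id} failed: {result}")
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next check
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} checks")


# Demo with a fake fetcher standing in for GET /text-to-voice/{task_id}.
fake_responses = iter([{"status": "PENDING"}, {"status": "SUCCESS"}])
final = wait_for_voice(
    lambda _task_id: next(fake_responses),
    "example-task-id",
    sleep=lambda _seconds: None,  # skip real waiting in the demo
)
print(final["status"])
```

A fixed interval keeps the sketch short; a capped exponential backoff is a reasonable alternative for long-running tasks.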
Best Practices
- Be specific in your descriptions: The more detailed your voice description, the better the system can match your expectations.
- Consider the context: Tailor your voice to match the content and audience of your application.
- Start with longer descriptions: While 18 words is the minimum, starting with more detailed descriptions (30-50 words) often yields better results.
- Test variations: If your first voice isn't exactly what you need, try adjusting specific aspects of your description to refine the results.
- Include emotional context: Describing the emotional quality of the voice significantly improves the naturalness of the generated speech.
Limitations
- Voice descriptions must be at least 18 words (100+ characters) long.
- Very unusual or contradictory voice descriptions may yield unpredictable results.
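The length limit can be checked on the client before submitting. The sketch below reads "100+ characters" as at least 100 characters and counts words by splitting on whitespace; the service may count differently, so treat this as an approximate pre-check. The helper name description_problems is hypothetical.

```python
MIN_WORDS = 18        # minimum stated on this page
MIN_CHARACTERS = 100  # "100+ characters", read here as at least 100


def description_problems(description: str) -> list[str]:
    """Return a list of reasons the description is too short (empty if it passes)."""
    problems = []
    stripped = description.strip()
    word_count = len(stripped.split())  # assumption: words are whitespace-separated
    if word_count < MIN_WORDS:
        problems.append(f"only {word_count} words, need at least {MIN_WORDS}")
    if len(stripped) < MIN_CHARACTERS:
        problems.append(f"only {len(stripped)} characters, need at least {MIN_CHARACTERS}")
    return problems


too_short = "A deep male voice"
long_enough = (
    "An elderly man speaking American English with Southern inflections, with a "
    "deep, resonant tone and a slightly raspy quality, who speaks slowly and "
    "deliberately."
)
print(description_problems(too_short))
print(description_problems(long_enough))
```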
Authorizations
The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
Body
The text content that will be converted into synthesized speech. This text will be spoken by your generated voice and serves as a sample of the voice's capabilities.
A detailed description (minimum 18 words/100+ characters) of the desired voice characteristics. Be specific about gender, age, accent, emotional tone, speaking style, or cultural context to guide the synthesis engine in creating an authentic voice.
Response
Successful response
A JSON object that contains the task_id, the unique identifier for the task. It is returned when a request to create a text-to-voice task is made, and it is used to query the status of the running task.