Create a Custom Voice
Creates a new voice clone by uploading an audio file reference.
Understanding Voice Cloning Technology
This endpoint lets you create personalized synthetic voices using our voice cloning technology. When you submit an audio sample, our system analyzes the unique vocal characteristics of the speaker (including pitch, timbre, rhythm, and other subtle qualities) and generates a digital voice model that can speak any text with those same characteristics. This process effectively "clones" the voice from your sample, enabling you to create custom content that maintains the distinctive vocal identity of the original speaker.
For optimal cloning results:
- Provide a clear, high-quality recording with minimal background noise.
- The audio should feature natural speech patterns at a consistent pace and volume.
- Samples of 30-60 seconds typically provide sufficient data for accurate voice modeling.
The Voice Cloning Process
When you submit your request to this endpoint, several sophisticated processes occur:
- Audio Analysis: Our system processes your audio file, identifying the core vocal characteristics that make the voice unique.
- Feature Extraction: The system extracts key vocal features, including pitch range, tonal qualities, speech cadence, and articulation patterns.
- Voice Model Creation: These extracted features are used to build a voice model that can synthesize new speech matching the original voice's characteristics.
- Voice Registration: The newly created voice is registered in your account and assigned a unique identifier for future use in text-to-speech operations.
Once complete, your custom voice becomes immediately available for use across our platform.
Audio Sample Recommendations
The quality of your custom voice depends significantly on the quality of your audio sample. Consider these factors when preparing your sample:
- Content Quality:
  - Record in a quiet environment with minimal background noise or echo
  - Use a good quality microphone positioned at an appropriate distance
  - Maintain consistent volume and speaking pace throughout the recording
  - Avoid vocal strains, forced expressions, or unnatural speech patterns
- Content Selection:
  - Choose text with diverse phonetic content that exercises different sounds
  - Include a variety of sentence structures and intonation patterns
  - Select content similar to what you'll ultimately want the voice to produce
  - Avoid highly technical terms unless they're essential to your use case
- Sample Duration:
  - Aim for 30-60 seconds of clean audio at minimum
  - Longer samples (2-3 minutes) can provide better results for more complex use cases
  - Very short samples (15 seconds) may produce less consistent voice clones
Following these guidelines will help ensure that the resulting voice clone accurately captures the essence of the original speaker's voice.
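The duration guidance above can be checked locally before uploading. The following is a minimal sketch, assuming the sample is an uncompressed WAV file readable by Python's standard `wave` module; the function names and the label wording are illustrative and not part of the API. It only measures length and says nothing about noise or recording quality.

```python
import io
import wave

# Thresholds taken from the duration recommendations above (in seconds).
VERY_SHORT_S = 15
RECOMMENDED_MIN_S = 30
RECOMMENDED_MAX_S = 60


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path string or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_sample(duration_s: float) -> str:
    """Map a duration to a coarse label based on the guidance above."""
    if duration_s <= VERY_SHORT_S:
        return "very short: clone may be inconsistent"
    if duration_s < RECOMMENDED_MIN_S:
        return "below the recommended 30-60 second range"
    if duration_s <= RECOMMENDED_MAX_S:
        return "within the recommended 30-60 second range"
    return "longer than 60 seconds: may help for complex use cases"


def make_silent_wav(seconds: int, rate: int = 8000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    demo = make_silent_wav(45)
    print(classify_sample(wav_duration_seconds(demo)))
```

In practice you would pass the path of your own recording to `wav_duration_seconds` instead of the generated silence.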
Implementation Example
Here's a comprehensive Python example that demonstrates how to create a custom voice, including proper error handling:
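A minimal sketch of such a request, using only the Python standard library, might look like the following. Only the `x-api-key` header is taken from this page (see Authorizations). The base URL, the `/clone-voice` path, the multipart encoding, and the form field names `voice_name` and `file` are placeholders chosen for illustration, so check them against the request body schema before use.

```python
import json
import urllib.error
import urllib.request
import uuid

# ASSUMPTIONS: the base URL, path, and form field names are placeholders.
API_BASE = "https://api.example.com"
CLONE_PATH = "/clone-voice"


def build_clone_request(api_key, voice_name, audio_bytes, filename="sample.wav"):
    """Build a multipart/form-data POST carrying a voice name and an audio file."""
    boundary = uuid.uuid4().hex
    text_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="voice_name"\r\n\r\n'
        f"{voice_name}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = text_part + file_header + audio_bytes + closing
    return urllib.request.Request(
        API_BASE + CLONE_PATH,
        data=body,
        headers={
            "x-api-key": api_key,  # authentication header named on this page
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def create_custom_voice(api_key, voice_name, audio_path):
    """Send the request and return the decoded JSON object, or raise RuntimeError."""
    with open(audio_path, "rb") as f:
        request = build_clone_request(api_key, voice_name, f.read())
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # The server answered with an error status (bad key, invalid audio, ...).
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"Voice creation failed ({err.code}): {detail}") from err
    except urllib.error.URLError as err:
        # The request never got a response (DNS or connection failure).
        raise RuntimeError(f"Could not reach the API: {err.reason}") from err
```

Splitting request construction from sending keeps the network call in one place and lets the request be inspected without contacting the server.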
Working with Your Custom Voice
After successfully creating a custom voice, you can use it for various speech synthesis tasks:
- Text-to-Speech Conversion: Use the voice ID with the /tts endpoint to convert text to speech in your custom voice.
- Voice Management: Find your custom voice in the list returned by the /list-voices endpoint, alongside public and shared voices.
- Voice Updates: If needed, you can create a new version of the voice by submitting a new request with the same voice name but a different audio sample.
- Voice Publishing: If you initially created a private voice but later wish to share it, you can use our voice update endpoints to change its publication status.
Custom voices remain associated with your account indefinitely unless explicitly deleted, giving you a persistent library of voices for your various speech synthesis needs.
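As a rough illustration of the first two items, the sketch below builds requests for the /tts and /list-voices endpoints named above. The base URL, the HTTP methods, and the JSON field names `voice_id` and `text` are assumptions made for this example and are not confirmed by this page.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # ASSUMPTION: placeholder base URL


def build_tts_request(api_key, voice_id, text):
    """Build a POST to /tts; the JSON field names are assumptions."""
    payload = json.dumps({"voice_id": voice_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + "/tts",
        data=payload,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def build_list_voices_request(api_key):
    """Build a GET to /list-voices; the HTTP method is an assumption."""
    return urllib.request.Request(
        API_BASE + "/list-voices",
        headers={"x-api-key": api_key},
        method="GET",
    )
```

Either request can then be sent with `urllib.request.urlopen(request)` once the placeholders are replaced with the real values for your account.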
Use Cases for Custom Voices
Creating custom voices opens up numerous possibilities for personalized content:
- Brand Consistency: Create a consistent voice identity across all your digital touchpoints
- Character Development: Build unique voices for characters in games, animations, or interactive experiences
- Personalized Communication: Generate content that sounds like a specific individual (with appropriate permissions)
- Accessibility Solutions: Create synthetic versions of a person's voice for use in assistive technology
- Multilingual Content: Clone a voice and use it to speak multiple languages while maintaining the same vocal identity
By understanding the possibilities, you can better strategize how custom voices might enhance your specific applications and user experiences.
Ethical Considerations
When creating and using custom voices, please consider these ethical guidelines:
- Always obtain proper consent from the person whose voice you are cloning
- Use voice clones respectfully and avoid creating misleading content
- Clearly identify synthesized speech when there might be confusion about its origin
- Follow applicable laws and regulations regarding biometric data and privacy
Responsible use of voice cloning technology helps ensure its continued availability while respecting individual rights and social norms.
Authorizations
The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
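One way to supply the header without hard-coding the key is to read it from the environment. This is a small sketch; the variable name `VOICE_API_KEY` and the helper `auth_headers` are hypothetical and not defined by the API.

```python
import os


def auth_headers(env_var="VOICE_API_KEY"):
    """Return the x-api-key header, reading the key from an environment variable."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {"x-api-key": api_key}
```

The returned dictionary can be passed as the `headers` argument of whichever HTTP client you use.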
Body
Response
Successful Response
The response is of type object.