Understanding Voice Cloning Technology
This endpoint creates personalized synthetic voices using our voice cloning technology. When you submit an audio sample, our system analyzes the unique vocal characteristics of the speaker, including pitch, timbre, rhythm, and other subtle qualities, and generates a digital voice model that can speak any text with those same characteristics. This process effectively "clones" the voice from your sample, enabling you to create custom content that maintains the distinctive vocal identity of the original speaker.
- Provide a clear, high-quality recording with minimal background noise.
- The audio should feature natural speech patterns at a consistent pace and volume.
- Samples of 30-60 seconds typically provide sufficient data for accurate voice modeling.
The Voice Cloning Process
When you submit your request to this endpoint, several sophisticated processes occur:
- Audio Analysis: Our system processes your audio file, identifying the core vocal characteristics that make the voice unique.
- Feature Extraction: The system extracts key vocal features, including pitch range, tonal qualities, speech cadence, and articulation patterns.
- Voice Model Creation: These extracted features are used to build a voice model that can synthesize new speech matching the original voice's characteristics.
- Voice Registration: The newly created voice is registered in your account and assigned a unique identifier for future use in text-to-speech operations.
Audio Sample Recommendations
The quality of your custom voice depends significantly on the quality of your audio sample. Consider these factors when preparing your sample:
- Content Quality:
  - Record in a quiet environment with minimal background noise or echo
  - Use a good quality microphone positioned at an appropriate distance
  - Maintain consistent volume and speaking pace throughout the recording
  - Avoid vocal strains, forced expressions, or unnatural speech patterns
- Content Selection:
  - Choose text with diverse phonetic content that exercises different sounds
  - Include a variety of sentence structures and intonation patterns
  - Select content similar to what you'll ultimately want the voice to produce
  - Avoid highly technical terms unless they're essential to your use case
- Sample Duration:
  - Aim for 30-60 seconds of clean audio at minimum
  - Longer samples (2-3 minutes) can provide better results for more complex use cases
  - Very short samples (15 seconds) may produce less consistent voice clones
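The duration guidance above can be checked locally before uploading. The following is a minimal sketch using Python's standard `wave` module, so it only handles `.wav` files. The function names and the labels `too_short`, `below_recommended`, and `ok` are illustrative choices, not part of the API.

```python
import wave

# Thresholds taken from the recommendations above; adjust for your use case.
RECOMMENDED_MIN_SECONDS = 30.0
VERY_SHORT_SECONDS = 15.0


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_duration(seconds):
    """Map a duration to a rough readiness label (labels are illustrative)."""
    if seconds <= VERY_SHORT_SECONDS:
        return "too_short"
    if seconds < RECOMMENDED_MIN_SECONDS:
        return "below_recommended"
    return "ok"
```

Calling `classify_duration(wav_duration_seconds("sample.wav"))` before submitting a request gives an early warning when a recording is likely to produce a less consistent clone.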
Implementation Example
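The following is a minimal sketch of a voice creation request, not the official sample. It assumes the endpoint accepts a multipart form upload and returns JSON. The base URL, the path, the body field names (`voice_name`, `gender`, `description`, `age`, `file`), and the `VOICE_API_KEY` environment variable are all placeholders, so check them against your API reference. Only the `x-api-key` header name comes from this page.

```python
import os

# Hypothetical values: replace with the base URL and path from your API reference.
API_BASE_URL = "https://api.example.com"
CREATE_VOICE_PATH = "/create-custom-voice"


def build_form_fields(voice_name, gender, description=None, age=None):
    """Collect the non-file body fields, skipping unset optional ones.

    The field names are assumptions, not confirmed parameter names.
    """
    fields = {"voice_name": voice_name, "gender": str(gender)}
    if description is not None:
        fields["description"] = description
    if age is not None:
        fields["age"] = str(age)
    return fields


def create_custom_voice(audio_path, voice_name, gender, api_key=None, **optional):
    """Upload a reference recording and return the decoded JSON response."""
    api_key = api_key or os.environ.get("VOICE_API_KEY")
    if not api_key:
        raise ValueError("No API key: pass api_key or set VOICE_API_KEY")
    if not os.path.isfile(audio_path):
        raise FileNotFoundError(audio_path)

    # Imported here so the checks above work without the third-party package.
    import requests  # pip install requests

    with open(audio_path, "rb") as audio:
        try:
            response = requests.post(
                API_BASE_URL + CREATE_VOICE_PATH,
                headers={"x-api-key": api_key},
                data=build_form_fields(voice_name, gender, **optional),
                files={"file": audio},  # "file" is an assumed field name
                timeout=120,
            )
            # Turn 4xx/5xx responses into exceptions.
            response.raise_for_status()
        except requests.RequestException as exc:
            raise RuntimeError(f"Voice creation failed: {exc}") from exc
    return response.json()
```

A call might look like `create_custom_voice("sample.wav", "Narrator", 1, description="Warm narration voice")`. The shape of the returned JSON is not described on this page, so inspect the response before relying on any particular key.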
A Python implementation of this request should create the custom voice and include proper error handling, for example for a missing audio file or a failed request.
Working with Your Custom Voice
After successfully creating a custom voice, you can use it for various speech synthesis tasks:
- Text-to-Speech Conversion: Use the voice ID with the /tts endpoint to convert text to speech in your custom voice.
- Voice Management: Find your custom voice in the list returned by the /list-voices endpoint, alongside public and shared voices.
- Voice Updates: If needed, you can create a new version of the voice by submitting a new request with the same voice name but a different audio sample.
- Voice Publishing: If you initially created a private voice but later wish to share it, you can use our voice update endpoints to change its publication status.
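As a rough illustration of the voice management step, the helper below looks up a voice ID by name in data already fetched from /list-voices. It assumes that endpoint returns a JSON array of objects with `id` and `voice_name` keys. Both key names are hypothetical, so adjust them to the actual response.

```python
def find_voice_id(voices, name):
    """Return the ID of the first voice whose name matches, or None.

    `voices` is assumed to be a list of dicts with "id" and "voice_name"
    keys; both key names are hypothetical.
    """
    for voice in voices:
        if voice.get("voice_name") == name:
            return voice.get("id")
    return None
```

The returned ID is the value you would then pass to the /tts endpoint.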
Use Cases for Custom Voices
Creating custom voices opens up numerous possibilities for personalized content:
- Brand Consistency: Create a consistent voice identity across all your digital touchpoints
- Character Development: Build unique voices for characters in games, animations, or interactive experiences
- Personalized Communication: Generate content that sounds like a specific individual (with appropriate permissions)
- Accessibility Solutions: Create synthetic versions of a person's voice for use in assistive technology
- Multilingual Content: Clone a voice and use it to speak multiple languages while maintaining the same vocal identity
Ethical Considerations
When creating and using custom voices, please consider these ethical guidelines:
- Always obtain proper consent from the person whose voice you are cloning
- Use voice clones respectfully and avoid creating misleading content
- Clearly identify synthesized speech when there might be confusion about its origin
- Follow applicable laws and regulations regarding biometric data and privacy
Authorizations
The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
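One way to attach the header without hard-coding the key is to read it from an environment variable. This is a small sketch: the `x-api-key` header name comes from the description above, while the `VOICE_API_KEY` variable name and the `auth_headers` helper are illustrative.

```python
import os


def auth_headers(api_key=None, env_var="VOICE_API_KEY"):
    """Build request headers carrying the API key in x-api-key.

    Falls back to the environment variable named by env_var
    (a hypothetical name) when api_key is not given.
    """
    key = api_key if api_key is not None else os.environ.get(env_var, "")
    key = key.strip()
    if not key:
        raise ValueError(f"API key missing: pass api_key or set {env_var}")
    return {"x-api-key": key}
```

Failing early on an empty key avoids sending an unauthenticated request.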
Body
The name or label to be assigned to the voice.
Represents the gender of the speaker in the provided audio. Values are encoded as integers.
Available options: 0, 1, 2, 9
The reference audio file that will be used to create the custom voice. The file should have clear speech to ensure optimal cloning accuracy. Supported formats include .aac, .flac, .mp3 and .wav.
A brief summary of the custom voice, e.g. its intended use, tone or character traits.
Set this to true to publish this custom voice to the marketplace for others to use. By making it available in the marketplace you consent to the guidelines and terms & conditions.
The estimated or actual age of the speaker in the reference audio.
Required range: x >= 1
If set to true, the system will apply audio enhancement techniques such as noise reduction and volume normalization to improve voice clarity.
The language of the reference audio file. This field is optional.
Available options: 1 through 136 and 139 through 150 (137 and 138 are not listed)
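The constraints listed for the body fields can be checked on the client before sending a request. The sketch below encodes only what this page states: the gender options, the supported audio extensions, the minimum age, and the listed language codes. The `validate_body` function and its parameter names are illustrative, and treating `age` as optional is an assumption.

```python
import os

ALLOWED_GENDERS = {0, 1, 2, 9}
ALLOWED_EXTENSIONS = {".aac", ".flac", ".mp3", ".wav"}
# The documented list runs from 1 to 150 but omits 137 and 138.
ALLOWED_LANGUAGES = set(range(1, 151)) - {137, 138}


def validate_body(audio_path, gender, age=None, language=None):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    ext = os.path.splitext(audio_path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported audio format: {ext or '(none)'}")
    if gender not in ALLOWED_GENDERS:
        problems.append(f"gender must be one of {sorted(ALLOWED_GENDERS)}")
    if age is not None and age < 1:
        problems.append("age must be at least 1")
    if language is not None and language not in ALLOWED_LANGUAGES:
        problems.append(f"unknown language code: {language}")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every invalid field at once.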
Response
Successful Response