Create a Custom Voice

Understanding Voice Cloning Technology

This endpoint creates a personalized synthetic voice from an audio sample. When you submit a sample, the system analyzes the speaker's vocal characteristics, including pitch, timbre, and rhythm, and generates a voice model that can speak arbitrary text with those same characteristics. The process effectively “clones” the voice in your sample, so new content keeps the vocal identity of the original speaker.

For optimal cloning results:

- Provide a clear, high-quality recording with minimal background noise.
- The audio should feature natural speech patterns at a consistent pace and volume.
- Samples of 30-60 seconds typically provide sufficient data for accurate voice modeling.

The Voice Cloning Process

When you submit a request to this endpoint, the system performs the following steps:

Audio Analysis: Our system processes your audio file, identifying the core vocal characteristics that make the voice unique.
Feature Extraction: The system extracts key vocal features, including pitch range, tonal qualities, speech cadence, and articulation patterns.
Voice Model Creation: These extracted features are used to build a voice model that can synthesize new speech matching the original voice’s characteristics.
Voice Registration: The newly created voice is registered in your account and assigned a unique identifier for future use in text-to-speech operations.

Once complete, your custom voice becomes immediately available for use across our platform.

Audio Sample Recommendations

The quality of your custom voice depends significantly on the quality of your audio sample. Consider these factors when preparing your sample:

Content Quality:
- Record in a quiet environment with minimal background noise or echo
- Use a good-quality microphone positioned at an appropriate distance
- Maintain consistent volume and speaking pace throughout the recording
- Avoid vocal strain, forced expressions, or unnatural speech patterns
Content Selection:
- Choose text with diverse phonetic content that exercises different sounds
- Include a variety of sentence structures and intonation patterns
- Select content similar to what you’ll ultimately want the voice to produce
- Avoid highly technical terms unless they’re essential to your use case
Sample Duration:
- Aim for at least 30-60 seconds of clean audio
- Longer samples (2-3 minutes) can provide better results for more complex use cases
- Very short samples (about 15 seconds or less) may produce less consistent voice clones

Following these guidelines will help ensure that the resulting voice clone accurately captures the essence of the original speaker’s voice.
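As a rough pre-upload check, the duration guidance above can be applied to a WAV file using Python's standard wave module. This is a minimal sketch, not part of the API: the function names and category labels are illustrative assumptions, the thresholds simply mirror the 15-second and 30-second figures mentioned above, and only .wav input is handled.

```python
import wave

# Thresholds mirror the guidance above; the labels are illustrative only.
VERY_SHORT_LIMIT_S = 15.0
RECOMMENDED_MIN_S = 30.0


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def classify_sample_length(duration_s):
    """Map a duration in seconds to a rough category (hypothetical labels)."""
    if duration_s <= VERY_SHORT_LIMIT_S:
        return "very_short"
    if duration_s < RECOMMENDED_MIN_S:
        return "below_recommended"
    return "ok"
```

A caller might run classify_sample_length(wav_duration_seconds("narrator_sample.wav")) and warn the user before uploading anything other than "ok". Duration alone says nothing about noise or clarity, so this complements the other recommendations rather than replacing them.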

Implementation Example

The following Python example creates a custom voice and handles common request errors:

import os

import requests

API_URL = "https://client.camb.ai/apis/create-custom-voice"


def create_custom_voice(file_path, voice_name, gender, age, description=None,
                        language=None, is_published=False, enhance_audio=True):
    """
    Creates a custom voice clone from an audio sample

    Parameters:
        file_path (str): Path to the audio file containing the voice sample
        voice_name (str): Name to assign to the custom voice
        gender (int): Gender identifier (1=male, 2=female, etc.; accepted values are 0, 1, 2, 9)
        age (int): Approximate age of the speaker (1 or greater)
        description (str, optional): Detailed description of the voice
        language (int, optional): Integer language ID of the reference audio (see the Body section)
        is_published (bool, optional): Whether to publish the voice to the marketplace
        enhance_audio (bool, optional): Whether to apply enhancement to the reference audio for better voice cloning accuracy.

    Returns:
        dict | None: Parsed API response containing voice details, or None if the request failed
    """
    # Validate file exists
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Audio file not found at: {file_path}")

    # Fail early if the API key is missing instead of sending an unauthenticated request
    api_key = os.environ.get("API_KEY")
    if not api_key:
        raise RuntimeError("API_KEY environment variable is not set")

    # Prepare required metadata
    data = {
        'voice_name': voice_name,
        'gender': gender,
        'age': age
    }

    # Add optional parameters if provided
    if description:
        data['description'] = description
    if language is not None:
        data['language'] = language
    if is_published:
        # Field name as listed in the Body section of this page
        data['publish_voice_to_market_place'] = is_published
    if enhance_audio:
        data['enhance_audio'] = enhance_audio

    try:
        # The context manager closes the file even if the request fails
        with open(file_path, 'rb') as audio_file:
            response = requests.post(
                API_URL,
                files={'file': audio_file},
                data=data,
                headers={"x-api-key": api_key},
                timeout=120  # Arbitrary upper bound so the call cannot hang indefinitely
            )

        # Raise for 4xx/5xx responses
        response.raise_for_status()

        # Return response data
        return response.json()

    except requests.exceptions.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
        if http_err.response is not None and http_err.response.text:
            print(f"Response details: {http_err.response.text}")
        return None
    except Exception as err:
        print(f"Error creating custom voice: {err}")
        return None


# Example usage
if __name__ == "__main__":
    result = create_custom_voice(
        file_path="narrator_sample.wav",
        voice_name="Professional Narrator",
        gender=1,
        age=45,
        description="Clear, articulate voice with professional tone. Ideal for documentary narration.",
        language=1
    )

    if result:
        print(f"Custom voice created successfully with ID: {result['voice_id']}")

Working with Your Custom Voice

After successfully creating a custom voice, you can use it for various speech synthesis tasks:

Text-to-Speech Conversion: Use the voice ID with the /tts endpoint to convert text to speech in your custom voice.
Voice Management: Find your custom voice in the list returned by the /list-voices endpoint, alongside public and shared voices.
Voice Updates: If needed, you can create a new version of the voice by submitting a new request with the same voice name but a different audio sample.
Voice Publishing: If you initially created a private voice but later wish to share it, you can use our voice update endpoints to change its publication status.

Custom voices remain associated with your account indefinitely unless explicitly deleted, giving you a persistent library of voices for your various speech synthesis needs.

Use Cases for Custom Voices

Creating custom voices opens up numerous possibilities for personalized content:

Brand Consistency: Create a consistent voice identity across all your digital touchpoints
Character Development: Build unique voices for characters in games, animations, or interactive experiences
Personalized Communication: Generate content that sounds like a specific individual (with appropriate permissions)
Accessibility Solutions: Create synthetic versions of a person’s voice for use in assistive technology
Multilingual Content: Clone a voice and use it to speak multiple languages while maintaining the same vocal identity

Knowing these use cases can help you plan how custom voices might improve your applications and user experiences.

Ethical Considerations

When creating and using custom voices, please consider these ethical guidelines:

- Always obtain proper consent from the person whose voice you are cloning
- Use voice clones respectfully and avoid creating misleading content
- Clearly identify synthesized speech when there might be confusion about its origin
- Follow applicable laws and regulations regarding biometric data and privacy

Responsible use of voice cloning technology helps ensure its continued availability while respecting individual rights and social norms.

Authorizations

x-api-key (string, header, required)

The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
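One way to avoid sending a request with a missing key is to build the header in a small helper that fails early. The sketch below is illustrative: the helper name is hypothetical, and the API_KEY environment variable name is taken from the implementation example on this page rather than being required by the API.

```python
import os


def build_auth_headers(env_var="API_KEY"):
    """Return the x-api-key header, or raise if the key is not configured."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {"x-api-key": api_key}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.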

Body

multipart/form-data

voice_name (string, required)
The name or label to be assigned to the voice.

gender (enum<integer>, required)
Represents the gender of the speaker in the provided audio. Values are encoded as integers.
Available options: 0, 1, 2, 9

file (required)
The reference audio file that will be used to create the custom voice. The file should have clear speech to ensure optimal cloning accuracy. Supported formats include .aac, .flac, .mp3 and .wav.

description (string | null)
A brief summary of the custom voice, e.g. its intended use, tone or character traits.

publish_voice_to_market_place (boolean | null)
Set this to true to publish this custom voice to the marketplace for others to use. By making it available in the marketplace you consent to the guidelines and terms & conditions.

age (integer, default: 30)
The estimated or actual age of the speaker in the reference audio.
Required range: x >= 1

enhance_audio (boolean, default: false)
If set to true, the system will apply audio enhancement techniques such as noise reduction and volume normalization to improve voice clarity.

language (enum<integer>)
The language of the reference audio file. This field is optional.
Available options: 1 to 136 and 139 to 150 (137 and 138 are not listed)
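The field constraints in this section can be checked locally before uploading, which avoids a round trip for obviously invalid input. The following is a minimal sketch under the constraints listed here: the function name and error messages are hypothetical, and the server remains the authority on what it accepts.

```python
from pathlib import Path

# Values copied from the field descriptions in this section.
SUPPORTED_EXTENSIONS = {".aac", ".flac", ".mp3", ".wav"}
GENDER_OPTIONS = {0, 1, 2, 9}
LANGUAGE_OPTIONS = set(range(1, 137)) | set(range(139, 151))


def validate_voice_fields(file_path, voice_name, gender, age=30, language=None):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if Path(file_path).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("file must be one of: .aac, .flac, .mp3, .wav")
    if not isinstance(voice_name, str) or not voice_name.strip():
        problems.append("voice_name must be a non-empty string")
    if gender not in GENDER_OPTIONS:
        problems.append("gender must be one of 0, 1, 2, 9")
    if not isinstance(age, int) or age < 1:
        problems.append("age must be an integer >= 1")
    if language is not None and language not in LANGUAGE_OPTIONS:
        problems.append("language must be an integer ID from the listed options")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem at once. Note that a language passed as the string "1" is rejected, since the field is an integer enum.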

Response

Successful Response

voice_id (integer)
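Since the successful response carries an integer voice_id, a caller may want to validate it before storing it for later text-to-speech requests. The sketch below assumes the response body is a JSON object with that field at the top level; the helper name is hypothetical.

```python
import json


def extract_voice_id(response_body):
    """Parse a JSON response body and return its integer voice_id."""
    payload = json.loads(response_body)
    voice_id = payload.get("voice_id") if isinstance(payload, dict) else None
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(voice_id, int) or isinstance(voice_id, bool):
        raise ValueError("Response does not contain an integer voice_id")
    return voice_id
```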
