POST /create-custom-voice
curl --request POST \
  --url https://client.camb.ai/apis/create-custom-voice \
  --header 'Content-Type: multipart/form-data' \
  --header 'x-api-key: <api-key>' \
  --form 'voice_name=<string>' \
  --form gender=0
{
  "voice_id": 123
}

Understanding Voice Cloning Technology

This endpoint empowers you to create personalized synthetic voices by leveraging our advanced voice cloning technology. When you submit an audio sample, our system analyzes the unique vocal characteristics of the speaker, including pitch, timbre, rhythm, and other subtle qualities, and generates a digital voice model that can speak any text with those same characteristics. This process effectively “clones” the voice from your sample, enabling you to create custom content that maintains the distinctive vocal identity of the original speaker.

For optimal cloning results:

  • Provide a clear, high-quality recording with minimal background noise.
  • The audio should feature natural speech patterns at a consistent pace and volume.
  • Samples of 30-60 seconds typically provide sufficient data for accurate voice modeling.

The Voice Cloning Process

When you submit your request to this endpoint, several sophisticated processes occur:

  1. Audio Analysis: Our system processes your audio file, identifying the core vocal characteristics that make the voice unique.

  2. Feature Extraction: The system extracts key vocal features, including pitch range, tonal qualities, speech cadence, and articulation patterns.

  3. Voice Model Creation: These extracted features are used to build a voice model that can synthesize new speech matching the original voice’s characteristics.

  4. Voice Registration: The newly created voice is registered in your account and assigned a unique identifier for future use in text-to-speech operations.

Once complete, your custom voice becomes immediately available for use across our platform.

Audio Sample Recommendations

The quality of your custom voice depends significantly on the quality of your audio sample. Consider these factors when preparing your sample:

  1. Content Quality:

    • Record in a quiet environment with minimal background noise or echo
    • Use a good quality microphone positioned at an appropriate distance
    • Maintain consistent volume and speaking pace throughout the recording
    • Avoid vocal strains, forced expressions, or unnatural speech patterns
  2. Content Selection:

    • Choose text with diverse phonetic content that exercises different sounds
    • Include a variety of sentence structures and intonation patterns
    • Select content similar to what you’ll ultimately want the voice to produce
    • Avoid highly technical terms unless they’re essential to your use case
  3. Sample Duration:

    • Aim for at least 30-60 seconds of clean audio
    • Longer samples (2-3 minutes) can provide better results for more complex use cases
    • Very short samples (15 seconds) may produce less consistent voice clones

Following these guidelines will help ensure that the resulting voice clone accurately captures the essence of the original speaker’s voice.
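The duration guidance above can be checked before uploading. The following is a minimal sketch that assumes the sample is an uncompressed WAV file readable by Python's standard wave module; the helper names and the category labels are hypothetical and not part of the API.

```python
import wave

# Thresholds taken from the duration guidance above.
VERY_SHORT_SECONDS = 15.0
RECOMMENDED_MIN_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the length of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_sample_length(seconds):
    """Map a duration onto the guidance above (labels are illustrative)."""
    if seconds <= VERY_SHORT_SECONDS:
        return "very short"
    if seconds < RECOMMENDED_MIN_SECONDS:
        return "below recommended"
    return "ok"
```

Calling classify_sample_length(wav_duration_seconds("narrator_sample.wav")) before the upload lets you reject or warn about short recordings locally. Compressed formats such as MP3 would need a different library to measure.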

Implementation Example

Here’s a comprehensive Python example that demonstrates how to create a custom voice, including proper error handling:

import os

import requests

def create_custom_voice(file_path, voice_name, gender, age, description=None, language=None, is_published=False, enhance_audio=True):
    """
    Creates a custom voice clone from an audio sample

    Parameters:
        file_path (str): Path to the audio file containing the voice sample
        voice_name (str): Name to assign to the custom voice
        gender (int): Gender identifier (1=male, 2=female, etc.)
        age (int): Approximate age of the speaker
        description (str, optional): Detailed description of the voice
        language (str, optional): Language code of the voice (e.g., "en-US")
        is_published (bool, optional): Whether to make the voice publicly available
        enhance_audio (bool, optional): Whether to apply enhancement to the reference audio for better voice cloning accuracy.

    Returns:
        dict or None: Parsed JSON response containing the voice details,
        or None if the request failed
    """
    # Validate file exists
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Audio file not found at: {file_path}")

    # Prepare metadata
    data = {
        'voice_name': voice_name,
        'gender': gender,
        'age': age
    }

    # Add optional parameters if provided
    if description:
        data['description'] = description
    if language:
        data['language'] = language
    if is_published:
        data['is_published'] = is_published

    # Only sent when True; a False value is omitted from the request
    if enhance_audio:
        data["enhance_audio"] = enhance_audio

    try:
        # Open the sample in a context manager so the handle is always closed,
        # even if the request raises
        with open(file_path, 'rb') as audio_file:
            response = requests.post(
                "https://client.camb.ai/apis/create-custom-voice",
                files={'file': audio_file},
                data=data,
                headers={
                    "x-api-key": os.environ.get("API_KEY")  # Get API key from environment variable
                }
            )

        # Check for errors
        response.raise_for_status()

        # Return response data
        return response.json()

    except requests.exceptions.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
        if response.text:
            print(f"Response details: {response.text}")
        return None
    except Exception as err:
        print(f"Error creating custom voice: {err}")
        return None

# Example usage
if __name__ == "__main__":
    result = create_custom_voice(
        file_path="narrator_sample.wav",
        voice_name="Professional Narrator",
        gender=1,
        age=45,
        description="Clear, articulate voice with professional tone. Ideal for documentary narration.",
        language="1"
    )

    if result:
        print(f"Custom voice created successfully with ID: {result['voice_id']}")

Working with Your Custom Voice

After successfully creating a custom voice, you can use it for various speech synthesis tasks:

  1. Text-to-Speech Conversion: Use the voice ID with the /tts endpoint to convert text to speech in your custom voice.

  2. Voice Management: Find your custom voice in the list returned by the /list-voices endpoint, alongside public and shared voices.

  3. Voice Updates: If needed, you can create a new version of the voice by submitting a new request with the same voice name but a different audio sample.

  4. Voice Publishing: If you initially created a private voice but later wish to share it, you can use our voice update endpoints to change its publication status.

Custom voices remain associated with your account indefinitely unless explicitly deleted, giving you a persistent library of voices for your various speech synthesis needs.
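As a rough illustration of the first item above, the sketch below builds (but does not send) a request for the /tts endpoint using only the standard library. This page does not document that endpoint, so the full URL, the JSON body, and the field names "text" and "voice_id" are assumptions; check the /tts reference before relying on them. build_tts_request is a hypothetical helper.

```python
import json
import urllib.request

# Assumption: /tts lives under the same base URL as /create-custom-voice.
BASE_URL = "https://client.camb.ai/apis"


def build_tts_request(voice_id, text, api_key):
    """Build a POST request for /tts without sending it.

    The body fields are illustrative guesses, not confirmed by this page.
    """
    body = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/tts",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )
```

Passing the returned object to urllib.request.urlopen would send it. Keeping construction separate from sending makes the payload easy to inspect while the real field names are being confirmed.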

Use Cases for Custom Voices

Creating custom voices opens up numerous possibilities for personalized content:

  • Brand Consistency: Create a consistent voice identity across all your digital touchpoints
  • Character Development: Build unique voices for characters in games, animations, or interactive experiences
  • Personalized Communication: Generate content that sounds like a specific individual (with appropriate permissions)
  • Accessibility Solutions: Create synthetic versions of a person’s voice for use in assistive technology
  • Multilingual Content: Clone a voice and use it to speak multiple languages while maintaining the same vocal identity

By understanding the possibilities, you can better strategize how custom voices might enhance your specific applications and user experiences.

Ethical Considerations

When creating and using custom voices, please consider these ethical guidelines:

  1. Always obtain proper consent from the person whose voice you are cloning
  2. Use voice clones respectfully and avoid creating misleading content
  3. Clearly identify synthesized speech when there might be confusion about its origin
  4. Follow applicable laws and regulations regarding biometric data and privacy

Responsible use of voice cloning technology helps ensure its continued availability while respecting individual rights and social norms.

Authorizations

x-api-key (string, header, required)

The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
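One way to supply this header without hard-coding the key is to read it from the environment, as the Python example on this page does. The sketch below assumes the key is stored in an environment variable named API_KEY; build_auth_headers is a hypothetical helper, not part of any SDK.

```python
import os


def build_auth_headers(env=None):
    """Return the x-api-key header dict, reading the key from `env`.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env
    api_key = env.get("API_KEY")
    if not api_key:
        # Fail before any request is made rather than sending no key
        raise RuntimeError("API_KEY is not set")
    return {"x-api-key": api_key}
```

The returned dict can be passed as the headers argument of an HTTP client call.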

Body

multipart/form-data

Response

200
application/json

Successful Response

The response is of type object.
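The example response shown at the top of this page contains a single voice_id field. A minimal sketch of reading it defensively follows; it assumes the body is JSON shaped like that example, and extract_voice_id is a hypothetical helper.

```python
import json


def extract_voice_id(response_text):
    """Parse a JSON response body and return its voice_id as an int."""
    payload = json.loads(response_text)
    if not isinstance(payload, dict) or "voice_id" not in payload:
        raise ValueError("response does not contain a voice_id")
    return int(payload["voice_id"])
```

For the example body {"voice_id": 123} this returns 123; a body without the field raises ValueError instead of failing later with a KeyError.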