Voice Cloning

Overview

Create a custom voice clone from a reference audio file and use it to generate speech. The cloned voice captures the unique characteristics of the original speaker.

Requirements

Reference audio file (10-30 seconds of clear speech)
Supported formats: WAV, MP3, FLAC, OGG
Clean audio with minimal background noise works best
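The requirements above can be checked locally before uploading. The following is a minimal sketch using only the Python standard library, not part of the SDK: `check_reference_audio` is a hypothetical helper name, and the duration check covers WAV files only, because the standard library cannot decode MP3, FLAC, or OGG.

```python
import wave
from pathlib import Path

# Formats and duration range taken from the requirements list above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}
MIN_SECONDS, MAX_SECONDS = 10.0, 30.0


def check_reference_audio(path):
    """Return a list of problems found; an empty list means no issue was detected."""
    p = Path(path)
    problems = []
    ext = p.suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
        return problems
    # Duration is only measured for WAV; other formats pass the extension check alone.
    if ext == ".wav":
        with wave.open(str(p), "rb") as wav:
            duration = wav.getnframes() / wav.getframerate()
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            problems.append(
                f"duration {duration:.1f}s is outside "
                f"{MIN_SECONDS:.0f}-{MAX_SECONDS:.0f}s"
            )
    return problems
```

Calling `check_reference_audio("reference.wav")` before the upload gives an early warning for clips that are too short or too long. It does not assess background noise.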

Prerequisites

Create an account

Get your API key

Go to Settings → API Keys in Studio and copy your key. See Authentication for details.

Install the SDK

pip install camb-ai

Skip this step if you’re using the direct API.

Set your API key as an environment variable

export CAMB_API_KEY="your_api_key_here"
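If the variable is not exported, `os.getenv("CAMB_API_KEY")` returns `None`, and the failure may only surface later as an authentication error. A small guard can fail fast instead. This is a sketch, and `require_api_key` is a hypothetical helper rather than an SDK function.

```python
import os


def require_api_key(env=os.environ, name="CAMB_API_KEY"):
    """Return the API key from the environment, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script")
    return key
```

The returned value can then be passed as `api_key` when constructing the client.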

Code

import os
from camb.client import CambAI, save_stream_to_file
from camb.types import StreamTtsOutputConfiguration
from camb.types.language_enums import Languages

client = CambAI(api_key=os.getenv("CAMB_API_KEY"))

def clone_voice():
    # Create custom voice from reference audio
    print("Creating custom voice from reference audio...")
    # Open the file in a with-block so the handle is closed after the upload
    with open("reference.wav", "rb") as audio_file:
        custom_voice = client.voice_cloning.create_custom_voice(
            file=audio_file,
            voice_name="my-cloned-voice",
            gender=1,  # 1 = male, 2 = female
            description="Custom cloned voice",
            language=Languages.EN_US,
        )

    print(f"Voice created! ID: {custom_voice.voice_id}")

    # Generate speech with the cloned voice
    print("Generating speech with cloned voice...")
    response = client.text_to_speech.tts(
        text="Hello! This is my cloned voice speaking.",
        voice_id=custom_voice.voice_id,
        language="en-us",
        speech_model="mars-flash",
        output_configuration=StreamTtsOutputConfiguration(format="wav")
    )

    save_stream_to_file(response, "cloned_output.wav")
    print("Audio saved to cloned_output.wav")

clone_voice()

Parameters

Parameter	Description	Values
`gender`	Voice gender	`1` = male, `2` = female
`language`	Voice language	Use `Languages` enum (e.g., `Languages.EN_US`, `Languages.ES_ES`, `Languages.FR_FR`)
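Because `gender` is passed as a bare integer, a small lookup can make call sites easier to read. The sketch below encodes only the two values listed in the table. `GENDER_CODES` and `gender_code` are hypothetical names, not part of the SDK.

```python
# Values from the parameters table: 1 = male, 2 = female.
GENDER_CODES = {"male": 1, "female": 2}


def gender_code(label):
    """Translate a human-readable label into the integer the API expects."""
    try:
        return GENDER_CODES[label.strip().lower()]
    except KeyError:
        raise ValueError(
            f"unknown gender label {label!r}; expected one of {sorted(GENDER_CODES)}"
        ) from None
```

With this in place, `gender=gender_code("male")` is equivalent to `gender=1`.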

Tips

Use high-quality reference audio for best results
15-20 seconds of speech is ideal
Avoid background music or noise in reference audio
The cloned voice is saved to your account for future use
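Since the cloned voice stays in your account, later runs can reuse its ID instead of cloning again. One way to keep track of it is to store the ID locally. The sketch below uses a JSON file and the standard library only; the file name `voices.json` and both helper functions are assumptions for illustration, not SDK features.

```python
import json
from pathlib import Path


def remember_voice(name, voice_id, store="voices.json"):
    """Record a voice ID under a friendly name in a local JSON file."""
    path = Path(store)
    voices = json.loads(path.read_text()) if path.exists() else {}
    voices[name] = voice_id
    path.write_text(json.dumps(voices, indent=2))


def recall_voice(name, store="voices.json"):
    """Return the stored voice ID for a name, or None if it was never saved."""
    path = Path(store)
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(name)
```

After cloning, `remember_voice("my-cloned-voice", custom_voice.voice_id)` stores the ID, and a later script can pass `recall_voice("my-cloned-voice")` as `voice_id` in the text-to-speech call.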

Next Steps

Emotional Voice Control

Add emotional expression to your cloned voices with mars-instruct.

Text to Speech

Generate speech with any voice using the SDK.

TTS with Accents

Speak in 140+ language accents with the same voice.

API Reference

Full voice cloning API specification.

Text to Sound Effects

Generate sound effects and music from text.
