Overview

Create a custom voice clone from a reference audio file and use it to generate speech. The cloned voice captures the unique characteristics of the original speaker.

Requirements

  • Reference audio file (10-30 seconds of clear speech)
  • Supported formats: WAV, MP3, FLAC, OGG
  • Clean audio with minimal background noise works best
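
The length and format requirements above can be checked locally before uploading. The following is a minimal sketch, not part of the CAMB.AI SDK. It uses only Python's standard library, so it can measure duration for WAV files only; other supported formats pass the extension check without a duration check. The function names and constants are illustrative assumptions based on the 10-30 second guidance.

```python
import wave
from pathlib import Path

# Values taken from the Requirements list; the names are hypothetical.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_reference_audio(path):
    """Return a list of problems; an empty list means the basic checks passed."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        return [f"unsupported format: {suffix or '(no extension)'}"]

    problems = []
    # Duration is only measured for WAV, since the stdlib cannot decode the others.
    if suffix == ".wav":
        duration = wav_duration_seconds(path)
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            problems.append(
                f"duration {duration:.1f}s is outside "
                f"{MIN_SECONDS:.0f}-{MAX_SECONDS:.0f}s"
            )
    return problems
```

Calling `check_reference_audio("reference.wav")` before the upload gives an early warning for clips that are too short or too long. It does not assess noise or recording quality.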

Prerequisites

1. Create an account

Sign up at CAMB.AI Studio if you haven’t already.

2. Get your API key

Go to Settings → API Keys in Studio and copy your key. See Authentication for details.

3. Install the SDK

pip install camb-ai

Skip this step if you’re using the direct API.

4. Set your API key as an environment variable so your code can read it

export CAMB_API_KEY="your_api_key_here"
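
If the variable is missing, `os.getenv("CAMB_API_KEY")` returns `None`, and the problem may only surface later as an authentication error. A small guard can fail earlier with a clearer message. This is a minimal sketch: `require_api_key` is a hypothetical helper, not an SDK function.

```python
import os


def require_api_key(env=None, name="CAMB_API_KEY"):
    """Return the API key from the environment, or raise with a clear message.

    Hypothetical helper; `env` defaults to os.environ and exists so the
    function can be exercised with a plain dict.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script")
    if key == "your_api_key_here":
        raise RuntimeError(f"{name} still holds the placeholder value")
    return key
```

The returned value can then be passed as `api_key` when constructing the client.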

Code

import os
from camb.client import CambAI, save_stream_to_file
from camb.types import StreamTtsOutputConfiguration
from camb.types.language_enums import Languages

client = CambAI(api_key=os.getenv("CAMB_API_KEY"))

def clone_voice():
    # Create a custom voice from the reference audio.
    # The with block closes the file once the upload call returns.
    print("Creating custom voice from reference audio...")
    with open("reference.wav", "rb") as reference_file:
        custom_voice = client.voice_cloning.create_custom_voice(
            file=reference_file,
            voice_name="my-cloned-voice",
            gender=1,  # 1 = male, 2 = female
            description="Custom cloned voice",
            language=Languages.EN_US,
        )

    print(f"Voice created! ID: {custom_voice.voice_id}")

    # Generate speech with the cloned voice
    print("Generating speech with cloned voice...")
    response = client.text_to_speech.tts(
        text="Hello! This is my cloned voice speaking.",
        voice_id=custom_voice.voice_id,
        language="en-us",
        speech_model="mars-flash",
        output_configuration=StreamTtsOutputConfiguration(format="wav")
    )

    save_stream_to_file(response, "cloned_output.wav")
    print("Audio saved to cloned_output.wav")

# Run the example only when this file is executed directly.
if __name__ == "__main__":
    clone_voice()

Parameters

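| Parameter | Description    | Values                                                                      |
|-----------|----------------|-----------------------------------------------------------------------------|
| gender    | Voice gender   | 1 = male, 2 = female                                                        |
| language  | Voice language | Use Languages enum (e.g., Languages.EN_US, Languages.ES_ES, Languages.FR_FR) |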
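
The numeric gender codes are easy to mix up in calling code. One option is to name them with a small enum. `VoiceGender` below is a hypothetical helper, not an SDK type; the sketch assumes only the two codes listed in the table.

```python
from enum import IntEnum


class VoiceGender(IntEnum):
    """Named aliases for the documented gender codes (hypothetical helper)."""

    MALE = 1
    FEMALE = 2
```

Passing `gender=VoiceGender.MALE.value` sends the plain integer `1`, so the request itself is unchanged.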

Tips

  • Use high-quality reference audio for best results
  • 15-20 seconds of speech is ideal
  • Avoid background music or noise in reference audio
  • The cloned voice is saved to your account for future use
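
Because the cloned voice stays in your account, a later script only needs its voice ID rather than the reference audio. One way to keep that ID at hand is a small local JSON file. The sketch below is a hypothetical helper, not part of the SDK, and the default file name `voices.json` is an assumption.

```python
import json
from pathlib import Path


def remember_voice(name, voice_id, path="voices.json"):
    """Store voice_id under name in a local JSON file (hypothetical helper)."""
    path = Path(path)
    voices = json.loads(path.read_text()) if path.exists() else {}
    voices[name] = voice_id
    path.write_text(json.dumps(voices, indent=2))


def lookup_voice(name, path="voices.json"):
    """Return the stored voice ID for name, or None if it is not recorded."""
    path = Path(path)
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(name)
```

After `create_custom_voice` returns, `remember_voice("my-cloned-voice", custom_voice.voice_id)` records the ID. A later run can pass `lookup_voice("my-cloned-voice")` as `voice_id` to `client.text_to_speech.tts` without cloning again.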

Next Steps

Emotional Voice Control

Add emotional expression to your cloned voices with mars-instruct.

Text to Speech

Generate speech with any voice using the SDK.

TTS with Accents

Speak in 140+ language accents with the same voice.

API Reference

Full voice cloning API specification.

Text to Sound Effects

Generate sound effects and music from text.