Create Text to Sound

Transform your descriptive text into rich, evocative audio with our powerful text-to-sound generation system. This innovative endpoint enables you to create immersive sound effects, atmospheric audio, and expressive soundscapes directly from textual descriptions. Whether you’re developing interactive applications, enhancing storytelling experiences, or creating dynamic content, this capability allows you to rapidly generate audio assets that perfectly match your creative vision.

For best results, keep your prompt concise. The more specific and focused your text description, the more precisely the system can generate the corresponding audio effect.

The Sound Generation Process

When you submit a text-to-sound request, the system works through the following stages:

Task Creation

The system registers your request as a dedicated processing task and returns a unique task identifier (task_id). You use this task_id to track the task, and the run_id returned with its status to retrieve your generated audio.

Text Analysis

Our advanced natural language processing algorithms analyze your text prompt, identifying key acoustic elements, emotional tones, and sonic characteristics described in your text.

Audio Synthesis

Based on the analysis, our specialized audio generation models synthesize a sound that captures the essence of your description, carefully crafting the audio to match your specified duration.

Please note that our system currently limits text-to-sound generation to a maximum duration of 10 seconds per request. This duration constraint helps ensure optimal audio quality and efficient processing for all users.

Throughout this process, you can monitor the status of your generation task by polling the /text-to-sound/{task_id} endpoint with the task_id provided in your initial response.
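
Because requests are limited to 10 seconds of audio, it can help to check the duration on the client before submitting. The following is a minimal sketch, not part of the API itself: the helper name validate_duration is hypothetical, and the lower bound (a duration greater than zero) is an assumption, since only the 10-second maximum is stated here.

```python
MAX_DURATION_SECONDS = 10.0  # maximum stated in this guide


def validate_duration(duration):
    """Return the duration as a float, or raise ValueError if it is out of range.

    Assumption: durations must be greater than zero. Only the upper limit
    is documented above.
    """
    duration = float(duration)
    if not 0 < duration <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be in (0, {MAX_DURATION_SECONDS}] seconds, got {duration}"
        )
    return duration
```

Calling validate_duration before building the request body lets an out-of-range value fail locally instead of costing a round trip to the server.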

Creating Your First Text-to-Sound Request

Let’s examine how to initiate a sound generation task using Python:

import requests
import json

# Your API authentication
headers = {
    "x-api-key": "your-api-key",  # Replace with your actual API key
    "Content-Type": "application/json"
}

def create_text_to_sound(prompt, duration=8.0):
    """
    Submits a new text-to-sound generation task and returns the task ID for tracking.

    Parameters:
    - prompt: A descriptive text explaining the sound to generate
    - duration: The desired length of the audio in seconds (default: 8.0)
    """
    try:
        # Prepare the request body
        payload = {
            "prompt": prompt,
            "duration": duration
        }

        # Submit the generation request
        response = requests.post(
            "https://client.camb.ai/apis/text-to-sound",
            headers=headers,
            data=json.dumps(payload)
        )

        # Verify the request was successful
        response.raise_for_status()

        # Extract the task ID from the response
        result = response.json()
        task_id = result.get("task_id")

        print(f"Sound generation task submitted successfully! Task ID: {task_id}")
        return task_id

    except requests.exceptions.RequestException as e:
        print(f"Error submitting text-to-sound task: {e}")
        if hasattr(e, 'response') and e.response is not None:
            print(f"Response content: {e.response.text}")
        return None

# Example usage
prompt = "A gentle rainfall on a tin roof, gradually intensifying into a thunderstorm"
duration = 10.0  # Generate a 10-second audio clip
task_id = create_text_to_sound(prompt, duration)

Monitoring Your Sound Generation Progress

After submission, your sound generation task enters our processing pipeline. You can monitor the progress by polling the status endpoint:

def check_sound_generation_status(task_id):
    """
    Checks the status of a text-to-sound generation task.
    Returns the current status and any available result information.

    Parameters:
    - task_id: The ID of the generation task to check
    """
    if not task_id:
        print("No task ID provided.")
        return None

    try:
        response = requests.get(
            f"https://client.camb.ai/apis/text-to-sound/{task_id}",
            headers=headers
        )

        # Verify the request was successful
        response.raise_for_status()

        # Parse the status information; .get avoids a KeyError if the field is missing
        status_data = response.json()
        status = status_data.get("status")
        print(f"Current status: {status}")

        # If the generation is complete, display the results
        if status == "SUCCESS":
            print("Sound generation completed successfully!")
            print(f"Audio URL: {status_data.get('audio_url')}")

        return status_data

    except requests.exceptions.RequestException as e:
        print(f"Error checking generation status: {e}")
        return None

# Check the status of your generation task
status_info = check_sound_generation_status(task_id)
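
A single status check rarely catches the task at completion, so polling usually runs in a loop. The sketch below shows one way to do that with a fixed interval and an overall timeout. The helper name wait_for_sound, the interval and timeout defaults, and the failure status names are all assumptions; only the "SUCCESS" status appears in this guide. The status function is passed in as an argument so the loop can be exercised without network access.

```python
import time

# Only "SUCCESS" is confirmed by this guide; these failure names are assumptions.
ASSUMED_FAILURE_STATUSES = ("ERROR", "FAILED")


def wait_for_sound(task_id, fetch_status, interval=2.0, timeout=120.0,
                   failure_statuses=ASSUMED_FAILURE_STATUSES, sleep=time.sleep):
    """Call fetch_status(task_id) repeatedly until the task succeeds.

    fetch_status should return the parsed status dictionary, or None if the
    check itself failed (a None result is treated as "not ready yet").
    """
    deadline = time.monotonic() + timeout
    while True:
        status_data = fetch_status(task_id) or {}
        status = status_data.get("status")

        if status == "SUCCESS":
            return status_data
        if status in failure_statuses:
            raise RuntimeError(f"Task {task_id} ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} did not finish within {timeout} seconds")

        # Wait before the next check to avoid hammering the endpoint
        sleep(interval)
```

With the functions defined earlier, a call would look like wait_for_sound(task_id, check_sound_generation_status).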

Crafting Effective Text Prompts

The quality of your generated audio depends significantly on how well you craft your text prompts. Here are some professional recommendations for creating effective descriptions:

Be Specific: Instead of “ocean sounds,” try “gentle waves lapping against a sandy shore with seagulls calling in the distance.”
Include Context: Mentioning the environment helps, such as “footsteps echoing in a large empty cathedral” rather than just “footsteps.”
Describe Dynamics: Indicate how the sound should evolve, like “a violin note that starts softly, gradually crescendos, then fades away.”
Mention Emotional Qualities: Terms like “eerie,” “cheerful,” or “melancholic” help guide the emotional tone of the generated audio.
Reference Familiar Sounds: Comparing to common sounds can be helpful, such as “similar to the hum of an old refrigerator but with a metallic quality.”
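
If you assemble prompts programmatically, the recommendations above can be kept as separate fields and joined at the end. This is only an illustrative sketch: build_prompt and its parameter names are hypothetical, and joining the parts with commas is a design choice, not something the API requires.

```python
def build_prompt(subject, context=None, dynamics=None, mood=None):
    """Join the non-empty parts of a sound description into one prompt string."""
    parts = (subject, context, dynamics, mood)
    cleaned = [part.strip() for part in parts if part and part.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty description part is required")
    return ", ".join(cleaned)


prompt = build_prompt(
    subject="footsteps",
    context="echoing in a large empty cathedral",
    mood="eerie",
)
print(prompt)  # footsteps, echoing in a large empty cathedral, eerie
```

Keeping the parts separate makes it easy to vary one aspect, such as the mood, while holding the rest of the description fixed.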

Authorizations

x-api-key (string, header, required)

The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
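
To keep the key out of source code, one common approach is to read it from an environment variable when building the headers. The sketch below assumes a variable named CAMB_API_KEY; that name and the build_headers helper are hypothetical and not defined by the API.

```python
import os


def build_headers(env=os.environ):
    """Build request headers using an API key taken from the environment.

    Assumption: the key is stored in an environment variable named
    CAMB_API_KEY. Any other name works if you change it here.
    """
    api_key = env.get("CAMB_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("CAMB_API_KEY is not set")
    return {
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers argument of the requests calls shown earlier.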

Body

application/json

Response

200

application/json

Successful Response

A JSON object that contains the unique identifier for the task. It is returned when a create request is made to generate sound from text, and it is used to query the status of the running text-to-sound task.
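
When handling this response, it is worth checking that the identifier is actually present before polling. The sketch below assumes the field is named task_id, as in the Python example earlier in this guide; the helper extract_task_id is hypothetical.

```python
def extract_task_id(response_body):
    """Return the task identifier from a parsed create response.

    Raises ValueError if the body is not a JSON object or has no usable task_id.
    """
    if not isinstance(response_body, dict):
        raise ValueError("expected a JSON object in the response body")
    task_id = response_body.get("task_id")
    if task_id is None or task_id == "":
        raise ValueError("response does not contain a task_id")
    return task_id
```

Raising an error here, rather than returning None, stops a missing identifier from silently flowing into later status checks.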
