Transform descriptive text into evocative audio with our text-to-sound generation system. This endpoint creates sound effects, atmospheric audio, and expressive soundscapes directly from textual descriptions. Whether you’re developing interactive applications, enhancing storytelling experiences, or creating dynamic content, it lets you rapidly generate audio assets that match your creative vision.
For the most accurate results, use a concise prompt. The more specific and focused your text description, the more precisely our system can generate the corresponding audio effect.
When you submit a text-to-sound request, our system begins a sophisticated workflow:
1. Task Creation
The system registers your request and creates a dedicated processing task. It returns a unique task identifier (task_id) that you use to track the task. The generated audio is then retrieved based on the run_id returned for that task.
2. Text Analysis
Our natural language processing algorithms analyze your text prompt, identifying the key acoustic elements, emotional tones, and sonic characteristics it describes.
3. Audio Synthesis
Based on the analysis, our audio generation models synthesize a sound that captures the essence of your description and matches your specified duration.
Please note that our system currently limits text-to-sound generation to a maximum duration of 10 seconds per request. This duration constraint helps ensure optimal audio quality and efficient processing for all users.
Throughout this process, you can monitor the status of your generation task by polling the /text-to-sound/{task_id} endpoint with the task_id provided in your initial response.
Let’s examine how to initiate a sound generation task using Python:
    import requests
    import json

    # Your API authentication
    headers = {
        "x-api-key": "your-api-key",  # Replace with your actual API key
        "Content-Type": "application/json"
    }

    def create_text_to_sound(prompt, duration=8.0):
        """
        Submits a new text-to-sound generation task and returns the task ID for tracking.

        Parameters:
        - prompt: A descriptive text explaining the sound to generate
        - duration: The desired length of the audio in seconds (default: 8.0)
        """
        try:
            # Prepare the request body
            payload = {
                "prompt": prompt,
                "duration": duration
            }

            # Submit the generation request
            response = requests.post(
                "https://client.camb.ai/apis/text-to-sound",
                headers=headers,
                data=json.dumps(payload)
            )

            # Verify the request was successful
            response.raise_for_status()

            # Extract the task ID from the response
            result = response.json()
            task_id = result.get("task_id")
            print(f"Sound generation task submitted successfully! Task ID: {task_id}")
            return task_id
        except requests.exceptions.RequestException as e:
            print(f"Error submitting text-to-sound task: {e}")
            if hasattr(e, 'response') and e.response is not None:
                print(f"Response content: {e.response.text}")
            return None

    # Example usage
    prompt = "A gentle rainfall on a tin roof, gradually intensifying into a thunderstorm"
    duration = 10.0  # Generate a 10-second audio clip
    task_id = create_text_to_sound(prompt, duration)
After submission, your sound generation task enters our processing pipeline. You can monitor the progress by polling the status endpoint:
    def check_sound_generation_status(task_id):
        """
        Checks the status of a text-to-sound generation task.
        Returns the current status and any available result information.

        Parameters:
        - task_id: The ID of the generation task to check
        """
        if not task_id:
            print("No task ID provided.")
            return None

        try:
            response = requests.get(
                f"https://client.camb.ai/apis/text-to-sound/{task_id}",
                headers=headers
            )

            # Verify the request was successful
            response.raise_for_status()

            # Parse the status information
            status_data = response.json()
            print(f"Current status: {status_data['status']}")

            # If the generation is complete, display the results
            if status_data['status'] == "SUCCESS":
                print("Sound generation completed successfully!")
                print(f"Audio URL: {status_data.get('audio_url')}")

            return status_data
        except requests.exceptions.RequestException as e:
            print(f"Error checking generation status: {e}")
            return None

    # Check the status of your generation task
    status_info = check_sound_generation_status(task_id)
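A single status check rarely catches the task at completion, so in practice the check is repeated until the task finishes. The following is a minimal sketch of such a polling loop, not an official client. The helper name wait_for_sound, the polling interval, and the timeout are illustrative choices, and the set of terminal statuses is an assumption: only "SUCCESS" appears in this guide.

```python
import time
from typing import Callable, Optional

# Assumption: only "SUCCESS" is documented in this guide; the failure
# names below are hypothetical placeholders for other terminal states.
TERMINAL_STATUSES = {"SUCCESS", "ERROR", "FAILED"}


def wait_for_sound(
    fetch_status: Callable[[], Optional[dict]],
    interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch_status repeatedly until it reports a terminal status.

    Returns the final status dictionary, or None if the timeout elapses
    before a terminal status is seen.
    """
    deadline = time.monotonic() + timeout
    while True:
        status_data = fetch_status()
        # fetch_status may return None on a transient request error.
        if status_data and status_data.get("status") in TERMINAL_STATUSES:
            return status_data
        if time.monotonic() >= deadline:
            return None
        sleep(interval)
```

With the function defined earlier, a call could look like `wait_for_sound(lambda: check_sound_generation_status(task_id))`. Passing the status check in as a callable keeps the loop independent of the HTTP details and makes it easy to exercise without network access.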
The quality of your generated audio depends significantly on how well you craft your text prompts. Here are some professional recommendations for creating effective descriptions:
Be Specific: Instead of “ocean sounds,” try “gentle waves lapping against a sandy shore with seagulls calling in the distance.”
Include Context: Mentioning the environment helps, such as “footsteps echoing in a large empty cathedral” rather than just “footsteps.”
Describe Dynamics: Indicate how the sound should evolve, like “a violin note that starts softly, gradually crescendos, then fades away.”
Mention Emotional Qualities: Terms like “eerie,” “cheerful,” or “melancholic” help guide the emotional tone of the generated audio.
Reference Familiar Sounds: Comparing to common sounds can be helpful, such as “similar to the hum of an old refrigerator but with a metallic quality.”
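The recommendations above can be applied mechanically when prompts are assembled from structured data. The sketch below is purely illustrative: compose_prompt and its parameters are hypothetical names, not part of the API, and the API itself only receives the final string.

```python
def compose_prompt(subject: str, context: str = "", dynamics: str = "", mood: str = "") -> str:
    """Join a sound subject with optional context, dynamics, and mood.

    The mood is placed before the subject as an adjective; the context and
    dynamics follow as comma-separated phrases. Empty parts are skipped.
    """
    head = f"{mood.strip()} {subject.strip()}".strip()
    parts = [head, context.strip(), dynamics.strip()]
    return ", ".join(part for part in parts if part)


# Example: the "Include Context" and "Mention Emotional Qualities" tips combined.
cathedral_prompt = compose_prompt(
    "footsteps",
    context="echoing in a large empty cathedral",
    mood="eerie",
)
```

Because concise prompts tend to give more accurate results, it is usually better to fill in only the parts that matter for a given sound rather than all of them.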
The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
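To keep the key out of source code, the headers can be built from an environment variable. This is a minimal sketch under assumptions: the helper build_headers and the variable name CAMB_API_KEY are hypothetical choices, not names defined by the API.

```python
import os


def build_headers(env_var: str = "CAMB_API_KEY") -> dict:
    """Build request headers with the API key read from the environment.

    env_var is a hypothetical variable name; use whatever your deployment
    defines. Raises RuntimeError if the variable is unset or empty.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }
```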
A textual description of the sound you want to generate. This required field should contain a clear, descriptive explanation of the desired audio effect. While our system can process lengthy descriptions, concise prompts typically yield more accurate results.
Specify how long you want your generated audio to be, measured in seconds. This optional parameter defaults to 8.0 seconds if not explicitly set. The duration value directly impacts how the audio evolves over time, with longer durations allowing for more complex sonic development.
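Validating the request body locally can catch an out-of-range duration before a request is sent. The sketch below uses the 8.0-second default and the 10-second maximum stated in this guide. The helper name build_payload is hypothetical, and the requirement that the duration be greater than zero is an assumption, since no lower bound is documented here.

```python
MAX_DURATION_SECONDS = 10.0      # maximum stated in this guide
DEFAULT_DURATION_SECONDS = 8.0   # default stated in this guide


def build_payload(prompt: str, duration: float = DEFAULT_DURATION_SECONDS) -> dict:
    """Return a request body after basic client-side validation."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt is required and must not be empty")
    # Assumption: durations must be positive; only the upper limit is documented.
    if not 0 < duration <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be greater than 0 and at most {MAX_DURATION_SECONDS} seconds"
        )
    return {"prompt": prompt, "duration": float(duration)}
```

The returned dictionary has the same shape as the payload used in the submission example, so it can be passed to the request unchanged.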
Enter a distinctive name for your project that reflects its purpose or content. This name will be displayed in your CAMB.AI workspace dashboard and used to organize related assets, transcriptions, and other items. Choose something memorable that helps you quickly identify this specific project among your other voice, audio, and localization tasks.
Provide details about your project's goals and specifications. Include information such as the target languages for translation or dubbing, desired voice characteristics, emotional tones to capture, or specific audio processing requirements. Outlining the workflow here can also serve as valuable documentation for organizational purposes.
Specify the organizational folder within your CAMB.AI workspace where this task should be created and stored. The folder must already exist in your workspace and be accessible through your current API key authentication. This helps maintain project organization by grouping related tasks together, making it easier to manage and locate your projects.
Required range: x >= 1
Response
Successful Response
A JSON object that contains the unique identifier for the task. It is returned when a create request is made to generate sound from text, and it is used to query the status of the running text-to-sound task.