Create Text to Sound
Creates a task to generate an audio file from a given text prompt.
Turn descriptive text into audio with the text-to-sound generation endpoint. It creates sound effects, atmospheric audio, and soundscapes directly from textual descriptions. Whether you’re developing interactive applications, enhancing storytelling experiences, or creating dynamic content, you can use it to generate audio assets quickly from a written description.
The Sound Generation Process
When you submit a text-to-sound request, our system begins a sophisticated workflow:
Task Creation
The system registers your request and establishes a dedicated processing task, returning a unique task identifier (task_id) that you’ll use to track and retrieve your generated audio based on the returned run_id.
Text Analysis
Our advanced natural language processing algorithms analyze your text prompt, identifying key acoustic elements, emotional tones, and sonic characteristics described in your text.
Audio Synthesis
Based on the analysis, our specialized audio generation models synthesize a sound that captures the essence of your description, carefully crafting the audio to match your specified duration.
Throughout this process, you can monitor the status of your generation task by polling the /text-to-sound/{task_id} endpoint with the task_id provided in your initial response.
Creating Your First Text-to-Sound Request
Let’s examine how to initiate a sound generation task using Python:
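A minimal sketch of such a request, using only the Python standard library. The base URL, the `/text-to-sound` creation path, and the `prompt` and `duration` field names are assumptions made for illustration; consult the Body section for the parameters the endpoint actually accepts.

```python
import json
import urllib.request

# Hypothetical values: replace with the real API base URL and your own key.
BASE_URL = "https://api.example.com"
API_KEY = "your-api-key"


def build_create_request(prompt, duration=None):
    """Build (but do not send) a POST request that creates a text-to-sound task."""
    # Assumed field names; the real request schema may differ.
    payload = {"prompt": prompt}
    if duration is not None:
        payload["duration"] = duration
    return urllib.request.Request(
        url=f"{BASE_URL}/text-to-sound",  # assumed creation path
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )


def create_text_to_sound(prompt, duration=None):
    """Send the request and return the task_id from the JSON response."""
    request = build_create_request(prompt, duration)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["task_id"]
```

Calling `create_text_to_sound("gentle waves lapping against a sandy shore", duration=8)` would submit the task and return its `task_id`, provided the assumed URL and field names match the real API.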
Monitoring Your Sound Generation Progress
After submission, your sound generation task enters our processing pipeline. You can monitor the progress by polling the status endpoint:
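A polling loop along these lines is one way to do it. This is a sketch under stated assumptions: the base URL is a placeholder, and the `status` field and its values (`"SUCCESS"`, `"ERROR"`) are hypothetical names, so adjust them to whatever the status endpoint actually returns.

```python
import json
import time
import urllib.request

# Hypothetical values: replace with the real API base URL and your own key.
BASE_URL = "https://api.example.com"
API_KEY = "your-api-key"


def fetch_status(task_id):
    """GET /text-to-sound/{task_id} and return the decoded JSON body."""
    request = urllib.request.Request(
        url=f"{BASE_URL}/text-to-sound/{task_id}",
        headers={"x-api-key": API_KEY},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_task(task_id, fetch=fetch_status, interval=2.0, max_attempts=60,
                  terminal_statuses=("SUCCESS", "ERROR")):
    """Poll until the task reports a terminal status, then return the last body.

    The "status" key and the terminal status names are assumptions.
    """
    for _ in range(max_attempts):
        body = fetch(task_id)
        if body.get("status") in terminal_statuses:
            return body
        time.sleep(interval)  # wait before asking again
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} polls")
```

The `fetch` parameter is injectable so the loop can be exercised without network access; in normal use, `wait_for_task(task_id)` is enough.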
Crafting Effective Text Prompts
The quality of your generated audio depends significantly on how well you craft your text prompts. Here are some professional recommendations for creating effective descriptions:
- Be Specific: Instead of “ocean sounds,” try “gentle waves lapping against a sandy shore with seagulls calling in the distance.”
- Include Context: Mentioning the environment helps, such as “footsteps echoing in a large empty cathedral” rather than just “footsteps.”
- Describe Dynamics: Indicate how the sound should evolve, like “a violin note that starts softly, gradually crescendos, then fades away.”
- Mention Emotional Qualities: Terms like “eerie,” “cheerful,” or “melancholic” help guide the emotional tone of the generated audio.
- Reference Familiar Sounds: Comparing to common sounds can be helpful, such as “similar to the hum of an old refrigerator but with a metallic quality.”
Authorizations
The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
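One way to supply the header without hard-coding the key is to read it from an environment variable. The sketch below assumes a variable named `SOUND_API_KEY`, which is a hypothetical name chosen for illustration and not something the API defines.

```python
import os


def auth_headers(env_var="SOUND_API_KEY"):
    """Return the x-api-key header, reading the key from the environment.

    The environment variable name is an illustrative assumption.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"set the {env_var} environment variable to your API key")
    return {"x-api-key": api_key}
```

The returned dictionary can be passed as the headers of any request to the API.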
Body
Response
Successful Response
A JSON object containing the unique identifier for the task (task_id). Use it to query the status of the running text-to-sound task. It is returned when a create request is made to generate sound from text.
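A small sketch of reading that identifier from the response body. It assumes the identifier is returned under a top-level `task_id` key, as the process description above suggests; the helper name is hypothetical.

```python
import json


def extract_task_id(response_text):
    """Parse a create-task response body and return its task_id.

    Assumes a JSON object with a top-level "task_id" key.
    """
    body = json.loads(response_text)
    if not isinstance(body, dict) or not body.get("task_id"):
        raise ValueError("response does not contain a task_id")
    return body["task_id"]
```

Validating the response here gives a clear error at submission time, before any polling begins with a missing identifier.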