Get Voice from Description Result
Retrieve AI-generated voice interpretations and select your preferred vocal style from multiple audio options using run_id.
Understanding Voice Generation Results
This endpoint provides access to the results of your voice generation request after processing has completed. When you submit a text description through the /text-to-voice endpoint, our system analyzes your parameters and creates three distinct voice interpretations of your description rather than a single one. This lets you compare different vocal realizations and select the one that best matches your intended use case.
How to Access Your Voice Interpretations
To retrieve your voice generation results, you'll need the run_id that was provided when you initially submitted your voice generation request. This identifier uniquely references your specific generation task and allows you to access its results once processing is complete.
The endpoint follows this structure:
Replace {run_id} with your actual run identifier. For example:
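As an illustration of substituting the run_id and authenticating, here is a minimal Python sketch that builds (but does not send) the request. The base URL https://api.example.com and the /text-to-voice/{run_id} path are placeholder assumptions, not the documented values; the x-api-key header is the one described under Authorizations.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical values: replace with the base URL and result path from your API reference.
BASE_URL = "https://api.example.com"
RESULT_PATH_TEMPLATE = "/text-to-voice/{run_id}"


def build_result_request(run_id: str, api_key: str) -> Request:
    """Build (but do not send) a GET request for a voice generation result."""
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    # Percent-encode the run_id so unexpected characters cannot alter the path.
    path = RESULT_PATH_TEMPLATE.format(run_id=quote(run_id, safe=""))
    return Request(
        BASE_URL + path,
        headers={"x-api-key": api_key},  # authentication header from the Authorizations section
        method="GET",
    )


req = build_result_request("abc123", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Encoding the run_id with quote(..., safe="") keeps a stray slash or space from changing which path is requested. To actually send the request, pass it to urllib.request.urlopen and decode the JSON body.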
Understanding the Response
When you call this endpoint, the system returns a JSON object containing URLs to three audio previews. Each preview represents a different interpretation of your voice description, speaking the same text content you provided during submission. These previews have the following characteristics:
Preview Variations Explained
- Preview 1 (Primary Interpretation): This first interpretation closely adheres to the core parameters of your description. It represents the system's primary understanding of your requested voice characteristics, focusing on the fundamental attributes you specified, such as gender, age range, and basic tonal qualities.
- Preview 2 (Nuanced Alternative): The second interpretation explores alternative expressions of your description with modified pacing and emphasis patterns. This version might adjust factors like speaking rate, rhythmic patterns, and the relative emphasis placed on different syllables or words, creating a subtly different feel while still honoring your core description.
- Preview 3 (Boundary Exploration): The third interpretation takes a more experimental approach, testing the boundaries of your description parameters. This version might intensify certain characteristics or introduce complementary vocal qualities that weren't explicitly stated but might enhance the overall effect of the voice.
Each interpretation applies a different combination of:
- Emotional cadence variations: How emotional qualities rise and fall throughout the speech
- Prosody adjustments: Alterations in intonation patterns, rhythm, stress, and tonal qualities
- Phonetic emphasis patterns: Changes in how individual sounds are articulated and emphasized
These variations are carefully calibrated to give you meaningful choices rather than random alternatives. By comparing the three interpretations side by side, you can identify which vocal qualities best serve your specific needs.
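Because the three previews are meant to be compared side by side, a client will typically pull all three URLs out of the response before playing them. The field names in this sketch (a top-level "previews" list whose entries carry a "url" key) are a hypothetical shape assumed for illustration; adapt them to the actual response schema.

```python
import json
from typing import List
from urllib.parse import urlparse

EXPECTED_PREVIEWS = 3  # the endpoint is described as returning three interpretations


def extract_preview_urls(body: str) -> List[str]:
    """Return the preview URLs from a result body.

    Assumes a hypothetical response shape: {"previews": [{"url": "..."}, ...]}.
    Adjust the key names to match the actual response schema.
    """
    data = json.loads(body)
    previews = data.get("previews", [])
    urls = [p["url"] for p in previews if isinstance(p, dict) and "url" in p]
    if len(urls) != EXPECTED_PREVIEWS:
        raise ValueError(f"expected {EXPECTED_PREVIEWS} preview URLs, got {len(urls)}")
    for url in urls:
        # Reject anything that is not a playable HTTP(S) link.
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"not an HTTP(S) URL: {url!r}")
    return urls


sample = json.dumps({"previews": [
    {"url": "https://cdn.example.com/preview-1.mp3"},
    {"url": "https://cdn.example.com/preview-2.mp3"},
    {"url": "https://cdn.example.com/preview-3.mp3"},
]})
for i, url in enumerate(extract_preview_urls(sample), start=1):
    print(f"Preview {i}: {url}")
```

Checking the count up front surfaces an incomplete or still-processing result early, instead of failing later when a third preview is missing.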
From Preview to Permanent Voice
The preview voices you receive are temporary by design, intended to help you evaluate different interpretations of your description. However, you can convert any preview you particularly like into a permanent custom voice that becomes available throughout our speech synthesis ecosystem.
Creating a Permanent Voice Profile
The process of converting a preview into a permanent voice involves these key steps:
1. Retrieve and evaluate your preview samples. First, call this endpoint to access your three preview options and determine which one best meets your needs.
2. Create a permanent voice profile. Once you've selected your preferred preview, you can use the /voices/create-custom-voice endpoint to establish a permanent voice profile based on that preview.
3. Integrate your custom voice. After creation, your new voice becomes available for use across all compatible speech synthesis services within our platform.
This workflow allows you to move from concept (your text description) to evaluation (the three previews) to implementation (your permanent custom voice) in a streamlined process.
Implementation Example
The following Python code demonstrates how to retrieve your preview samples and convert your preferred option into a permanent custom voice:
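The sketch below strings the two steps together in Python using only the standard library. It is an illustration under stated assumptions, not a reference implementation: the base URL, the /text-to-voice/{run_id} result path, the "previews"/"url" response fields, and the run_id/preview_index/name request fields for /voices/create-custom-voice are all hypothetical placeholders. The send parameter lets you swap the real network call (default_send) for an offline stand-in (fake_send), which the demo at the bottom uses.

```python
import json
from typing import Callable, List
from urllib.request import Request, urlopen

# Hypothetical placeholders: the base URL, the result path, and every JSON
# field name below are assumptions, not confirmed parts of the API.
BASE_URL = "https://api.example.com"
NUM_PREVIEWS = 3


def default_send(req: Request) -> bytes:
    """Send a request over the network and return the raw response body."""
    with urlopen(req) as resp:
        return resp.read()


def get_preview_urls(run_id: str, api_key: str,
                     send: Callable[[Request], bytes] = default_send) -> List[str]:
    """Step 1: fetch the generation result and return its preview URLs."""
    req = Request(f"{BASE_URL}/text-to-voice/{run_id}",  # hypothetical result path
                  headers={"x-api-key": api_key}, method="GET")
    body = json.loads(send(req))
    return [p["url"] for p in body["previews"]]  # hypothetical response shape


def create_custom_voice(run_id: str, preview_index: int, voice_name: str,
                        api_key: str,
                        send: Callable[[Request], bytes] = default_send) -> dict:
    """Step 2: promote the chosen preview (0-based index) to a permanent voice."""
    if not 0 <= preview_index < NUM_PREVIEWS:
        raise ValueError(f"preview_index must be between 0 and {NUM_PREVIEWS - 1}")
    # Hypothetical request fields for /voices/create-custom-voice.
    payload = {"run_id": run_id, "preview_index": preview_index, "name": voice_name}
    req = Request(f"{BASE_URL}/voices/create-custom-voice",
                  data=json.dumps(payload).encode("utf-8"),
                  headers={"x-api-key": api_key, "Content-Type": "application/json"},
                  method="POST")
    return json.loads(send(req))


def fake_send(req: Request) -> bytes:
    """Offline stand-in for default_send so the workflow runs without network access."""
    if req.get_method() == "GET":
        previews = [{"url": f"https://cdn.example.com/preview-{i}.mp3"} for i in range(1, 4)]
        return json.dumps({"previews": previews}).encode("utf-8")
    return json.dumps({"voice_id": "example-voice-id"}).encode("utf-8")


urls = get_preview_urls("YOUR_RUN_ID", "YOUR_API_KEY", send=fake_send)
for number, url in enumerate(urls, start=1):
    print(f"Preview {number}: {url}")
# After listening, keep the second preview (index 1) as a permanent voice.
print(create_custom_voice("YOUR_RUN_ID", 1, "My Narrator", "YOUR_API_KEY", send=fake_send))
```

Passing the sender in as a parameter keeps the request-building logic testable offline. For real calls, drop the send=fake_send arguments and substitute the documented base URL, path, and field names.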
Best Practices for Voice Selection
When evaluating your preview options, consider these factors to make the most effective choice:
- Content context: Different voices may be more suitable depending on whether your content is informational, narrative, promotional, or instructional.
- Audience expectations: Consider what vocal qualities would resonate best with your target audience based on their demographic and psychographic characteristics.
- Brand alignment: If the voice will represent your brand, assess which interpretation best embodies your brand personality and values.
- Emotional impact: Listen for how each voice conveys emotional subtext and choose the one that elicits the desired emotional response.
- Technical qualities: Consider practical aspects like clarity, intelligibility across different listening environments, and how well the voice handles specialized vocabulary.
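To make a comparison across these factors more systematic, you could rate each preview per factor and combine the ratings with weights that reflect your priorities. The sketch below is a generic weighted-sum rubric written in Python, not a feature of the API; the factor keys, the weights, and the 1 to 5 rating scale are illustrative assumptions.

```python
from typing import Dict, List

# Illustrative weights for the five factors listed above (they sum to 1.0).
WEIGHTS: Dict[str, float] = {
    "content_context": 0.25,
    "audience": 0.20,
    "brand": 0.20,
    "emotion": 0.15,
    "technical": 0.20,
}


def weighted_score(ratings: Dict[str, float]) -> float:
    """Combine per-factor ratings (1 to 5) into a single weighted score."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[factor] * ratings[factor] for factor in WEIGHTS)


def pick_best(all_ratings: List[Dict[str, float]]) -> int:
    """Return the 1-based number of the highest-scoring preview (first wins ties)."""
    scores = [weighted_score(r) for r in all_ratings]
    return scores.index(max(scores)) + 1


ratings = [
    {"content_context": 4, "audience": 3, "brand": 4, "emotion": 3, "technical": 5},
    {"content_context": 5, "audience": 4, "brand": 4, "emotion": 4, "technical": 4},
    {"content_context": 3, "audience": 5, "brand": 3, "emotion": 5, "technical": 3},
]
print(pick_best(ratings))  # prints 2
```

A weighted sum is only one way to aggregate; if one factor (for example, brand alignment) is non-negotiable, treat it as a pass/fail filter before scoring the rest.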
By comparing your three preview options with these considerations in mind, you can select the interpretation that most effectively serves your communication objectives and establish a voice profile that authentically conveys your message and connects with your audience.
Authorizations
The x-api-key is a custom header required for authenticating requests to our API. Include this header in every request, with your API key as its value, to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
Path Parameters
run_id: The unique identifier for the run, which was generated during the text-to-voice creation process and returned upon task completion.
Response
Successful Response
The response is of type object.