Understanding Voice Generation Results
This endpoint returns the results of your voice generation request once processing has completed. When you submit a text description through the /text-to-voice endpoint, our system analyzes your parameters and creates three distinct voice interpretations of your description. This lets you compare different vocal realizations and select the one that best matches your intended use case.
How to Access Your Voice Interpretations
To retrieve your voice generation results, you'll need the run_id that was provided when you initially submitted your voice generation request. This identifier uniquely references your specific generation task and allows you to access its results once processing is complete.
The endpoint follows this structure:
{run_id} should be replaced with your actual run identifier. For example:
Understanding the Response
When you call this endpoint, the system returns a JSON response containing URLs to three audio previews. Each preview represents a different interpretation of your voice description, speaking the same text content you provided during submission. These previews have the following characteristics:
- Emotional cadence variations: How emotional qualities rise and fall throughout the speech
- Prosody adjustments: Alterations in intonation patterns, rhythm, stress, and tonal qualities
- Phonetic emphasis patterns: Changes in how individual sounds are articulated and emphasized
Preview Variations Explained
- Preview 1: Primary Interpretation. This first interpretation closely adheres to the core parameters of your description. It represents the system's primary understanding of your requested voice characteristics, focusing on the fundamental attributes you specified, such as gender, age range, and basic tonal qualities.
- Preview 2: Nuanced Alternative. The second interpretation explores alternative expressions of your description with modified pacing and emphasis patterns. This version might adjust speaking rate, rhythmic patterns, and the relative emphasis placed on different syllables or words, creating a subtly different feel while still honoring your core description.
- Preview 3: Boundary Exploration. The third interpretation takes a more experimental approach, testing the boundaries of your description parameters. This version might intensify certain characteristics or introduce complementary vocal qualities that weren't explicitly stated but might enhance the overall effect of the voice.
From Preview to Permanent Voice
The preview voices you receive are temporary by design, intended to help you evaluate different interpretations of your description. However, you can convert any preview you particularly like into a permanent custom voice that becomes available throughout our speech synthesis ecosystem.
Creating a Permanent Voice Profile
The process of converting a preview into a permanent voice involves these key steps:
- Retrieve and evaluate your preview samples
- Create a permanent voice profile: use the /voices/create-custom-voice endpoint to establish a permanent voice profile based on that preview.
- Integrate your custom voice
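The steps above can be sketched in Python as a small orchestration function. This is a minimal sketch under stated assumptions, not the documented client: the three callables (`fetch_previews`, `choose`, `create_voice`) are hypothetical stand-ins for your own HTTP and review logic, and the payload field names `run_id` and `preview_index` are illustrative guesses, because the request body for /voices/create-custom-voice is not specified here.

```python
from typing import Callable, Dict, Sequence

# Path named in the steps above; the host and request body are NOT given there.
CREATE_PATH = "/voices/create-custom-voice"


def promote_preview(
    run_id: str,
    fetch_previews: Callable[[str], Sequence[str]],
    choose: Callable[[Sequence[str]], int],
    create_voice: Callable[[str, Dict[str, object]], Dict[str, object]],
) -> Dict[str, object]:
    """Retrieve previews, pick one, and request a permanent voice from it."""
    # Step 1: retrieve the preview URLs for this run.
    previews = fetch_previews(run_id)
    if not previews:
        raise ValueError(f"no previews returned for run {run_id!r}")

    # Step 1 (continued): evaluate. `choose` returns the index of the
    # preferred preview, for example after a person has listened to each one.
    index = choose(previews)
    if not 0 <= index < len(previews):
        raise IndexError(f"choice {index} is outside 0..{len(previews) - 1}")

    # Step 2: create the permanent voice profile.
    # ASSUMPTION: these field names are hypothetical placeholders.
    payload = {"run_id": run_id, "preview_index": index}
    return create_voice(CREATE_PATH, payload)
```

Passing the network calls in as arguments keeps the selection logic testable without contacting the API. The third step, integrating the voice, would use whatever identifier the create call returns in its response.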
Implementation Example
The following Python code demonstrates how to retrieve your preview samples and convert your preferred option into a permanent custom voice:
Best Practices for Voice Selection
When evaluating your preview options, consider these factors to make the most effective choice:
- Content context: Different voices may be more suitable depending on whether your content is informational, narrative, promotional, or instructional.
- Audience expectations: Consider what vocal qualities would resonate best with your target audience based on their demographic and psychographic characteristics.
- Brand alignment: If the voice will represent your brand, assess which interpretation best embodies your brand personality and values.
- Emotional impact: Listen for how each voice conveys emotional subtext and choose the one that elicits the desired emotional response.
- Technical qualities: Consider practical aspects like clarity, intelligibility across different listening environments, and how well the voice handles specialized vocabulary.
Authorizations
The x-api-key is a custom header required for authenticating requests to our API. Include this header in your request with the appropriate API key value to securely access our endpoints. You can find your API key(s) in the 'API' section of our studio website.
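As an illustration of attaching the header, the sketch below builds a request with Python's standard `urllib.request` module without sending it. The host `https://api.example.com` and the function name `build_authenticated_request` are placeholders chosen for this example, not values taken from the API.

```python
import urllib.request

# ASSUMPTION: placeholder host; substitute the real API base URL.
API_BASE = "https://api.example.com"


def build_authenticated_request(path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the x-api-key header."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    # urllib stores the header name as "X-api-key"; HTTP header names are
    # case-insensitive, so the server sees the same header.
    return urllib.request.Request(
        API_BASE + path,
        headers={"x-api-key": api_key},
        method="GET",
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. Reading the key from an environment variable rather than hard-coding it keeps it out of source control.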
Path Parameters
run_id: The unique identifier for the run, which was generated during the text-to-voice creation process and returned upon task completion.
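When substituting the run identifier into a path, it is generally safer to percent-encode it first. A minimal sketch follows; the helper name `fill_run_id` is hypothetical, and the path template is passed in by the caller because the actual route is not reproduced here.

```python
from urllib.parse import quote


def fill_run_id(path_template: str, run_id: str) -> str:
    """Replace the {run_id} placeholder with a percent-encoded identifier."""
    if "{run_id}" not in path_template:
        raise ValueError("path_template must contain the {run_id} placeholder")
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    # safe="" also encodes "/", so the identifier cannot add path segments.
    return path_template.replace("{run_id}", quote(run_id, safe=""))
```

For example, with a made-up template, `fill_run_id("/placeholder/{run_id}", "abc 123")` yields `/placeholder/abc%20123`.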
Response
Successful Response
An array of three distinct URL strings, each pointing to a unique voice sample generated from your text prompt. These samples represent different voice interpretations based on your description, allowing you to compare options before selecting your preferred voice for further use.
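A client may want to validate this response before using it. The sketch below assumes the body is a bare JSON array of three URL strings, as described above; the function name `parse_preview_urls` and the strictness of the checks are choices made for this example, not part of the API.

```python
import json
from typing import List
from urllib.parse import urlparse

EXPECTED_PREVIEWS = 3  # the response is described as three distinct URLs


def parse_preview_urls(body: str) -> List[str]:
    """Parse the response body and return the three preview URLs."""
    data = json.loads(body)
    if not isinstance(data, list) or len(data) != EXPECTED_PREVIEWS:
        raise ValueError(f"expected a JSON array of {EXPECTED_PREVIEWS} URLs")
    for item in data:
        if not isinstance(item, str):
            raise ValueError(f"preview entry is not a string: {item!r}")
        parsed = urlparse(item)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"preview entry is not an http(s) URL: {item!r}")
    if len(set(data)) != len(data):
        raise ValueError("preview URLs are not distinct")
    return data
```

Failing fast on an unexpected shape makes it easier to spot cases where the run has not finished or the response format differs from what this sketch assumes.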