> ## Documentation Index
> Fetch the complete documentation index at: https://docs.camb.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Stream Text-to-Speech Audio

> Convert text to speech in real-time with customizable voice characteristics, delivering audio content as it's generated for immediate playback in your applications.

[Camb AI Python SDK Examples](https://github.com/camb-ai/cambai-python-sdk/tree/main/examples/async_tts_call.py)

[Detailed Models Overview](/models)

## How the Streaming Process Works

Our streaming service is designed for simplicity and speed. Here’s how it works from request to playback:

<Steps>
  <Step title="Submit Your Text & Configuration">
    Send a POST request containing your text and desired audio configuration, including the voice, language, and output format.
  </Step>

  <Step title="Receive the Audio Stream">
    The server immediately begins processing and sends audio data back in chunks over the same connection. Your application can start playing the audio as soon as the first chunk arrives.
  </Step>

  <Step title="Manage Playback & Usage">
    Continue reading the byte stream until the connection closes, which signals the end of the audio. You can also read the number of credits required for the request from the `X-Credits-Required` response header.
  </Step>
</Steps>
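
The steps above can be sketched as a small helper. This is a minimal sketch, assuming `response` behaves like a `requests.Response` created with `stream=True` (as in the full example later on this page); `consume_stream` is a hypothetical name used for illustration, not part of any SDK.

```python
from typing import BinaryIO, Optional, Tuple


def consume_stream(response, sink: BinaryIO, chunk_size: int = 1024) -> Tuple[Optional[str], int]:
    """Write a streamed audio response to `sink`, chunk by chunk.

    Assumption: `response` exposes `.headers` and `.iter_content()` like a
    `requests.Response` opened with `stream=True`.
    Returns the `X-Credits-Required` header value (or None) and the byte count.
    """
    # Response headers are available before the audio body is read.
    credits = response.headers.get("X-Credits-Required")

    total_bytes = 0
    # Iteration stops when the server closes the connection (end of audio).
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # skip empty keep-alive chunks
            sink.write(chunk)
            total_bytes += len(chunk)
    return credits, total_bytes
```

Passing an open file as `sink` saves the audio to disk; a player buffer with a `write` method could be substituted to start playback as chunks arrive.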

### Language Support

The `language` field takes a **BCP-47 locale code** (e.g. `en-us`, `hi-in`, `zh-cn`). It controls the accent and pronunciation of the generated speech — the model does **not** translate the input text, so the text you supply should already be written in the target language.

#### Coverage by model

| Speech model                               | Locales supported |
| :----------------------------------------- | :---------------- |
| `mars-flash`, `mars-pro`                   | 33                |
| `mars-8.1-flash-beta`, `mars-8.1-pro-beta` | 158               |
| `mars-instruct`                            | 141               |

See the full per-model locale list in [Language Support](/language-support).

#### Choosing a locale

* Use the most specific regional variant available for the accent you want. For example, prefer `es-mx` over `es-es` for a Mexican Spanish accent, or `zh-cn-sichuan` over `zh-cn` for a Sichuan-flavored Mandarin.
* For best results on the MARS 8.1 beta models, supply a **reference voice in the same language and accent** as the target locale.
* Codes are case-sensitive and must be lowercase (e.g. `pt-br`, not `pt-BR`).
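
Because codes must be lowercase, it can help to normalize user-supplied locales before building a request. A minimal sketch follows; `normalize_locale` is a hypothetical helper, and converting underscores to hyphens is an added convenience that the API documentation does not mention.

```python
def normalize_locale(code: str) -> str:
    """Return a locale code in lowercase, hyphen-separated form.

    Assumption: inputs such as "pt-BR" or "zh_CN" should map to "pt-br"
    and "zh-cn". This does not check that the model supports the locale.
    """
    return code.strip().replace("_", "-").lower()
```

For example, `normalize_locale("pt-BR")` yields `pt-br`, which matches the form the `language` field expects.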

#### Validation behavior

If the requested `language` is not supported by the selected `speech_model`, the API responds with **HTTP 422** and a `ValidationError` body that lists the allowed locales for that model. Example:

```json theme={null}
{
  "detail": [{
    "loc": ["body"],
    "msg": "Value error, Language 'zh-tw' is not supported for speech model 'mars-flash'. Allowed languages are: ['en-us', 'en-in', 'zh-cn', ...]"
  }]
}
```
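
A client can surface these messages to the caller instead of failing with a generic error. The sketch below assumes the body has the `detail` / `loc` / `msg` shape shown above; `validation_messages` is a hypothetical helper name, and the sample message is an abbreviated copy of the example.

```python
from typing import Any, Dict, List


def validation_messages(body: Dict[str, Any]) -> List[str]:
    """Collect human-readable messages from a 422 response body."""
    messages = []
    for item in body.get("detail", []):
        # `loc` is a list of path segments (strings or integers).
        location = ".".join(str(part) for part in item.get("loc", []))
        message = item.get("msg", "")
        messages.append(f"{location}: {message}" if location else message)
    return messages


example_body = {
    "detail": [{
        "loc": ["body"],
        "msg": "Value error, Language 'zh-tw' is not supported for speech model 'mars-flash'.",
    }]
}
print(validation_messages(example_body))
```

With `requests`, the body would typically come from `response.json()` after checking that `response.status_code == 422`.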

### Advanced Customization

Fine-tune the audio with additional parameters to control the performance, style, and quality of the generated speech. These are sent in the same JSON payload.

* **`speech_model`**: Specify the model for synthesis. Available values include `mars-8.1-flash-beta`, `mars-8.1-pro-beta`, `mars-flash`, `mars-pro`, and `mars-instruct`.
* **Expressive text tags**: With `mars-instruct`, you can also embed delivery tags directly in the text (for example, emotion tags or SSML-style pauses) to shape pacing and tone.
* **`output_configuration`**: Set the audio format (`wav`, `mp3`), sample rate, and toggle output enhancement.
  * `apply_enhancement` (boolean, optional): Applies output audio enhancement (loudness, denoising, polish). Defaults to `true` for most models, `false` for the speed-oriented `mars-flash` and `mars-8.1-flash-beta` models. Set explicitly to override.
* **`voice_settings`**: Enhance reference audio quality, maintain the source accent, or adjust the speaking rate.
* **`inference_options`**: Adjust stability, temperature, and speaker similarity for unique results.

<Note>
  The `mars-8.1-flash-beta` and `mars-8.1-pro-beta` models do not support the following parameters:

  * `acoustic_quality_boost`
  * `temperature`
  * `speaker_similarity`
  * `maintain_source_accent`
  * `stability`
  * `output_enhancement`
  * `enhance_named_entities_pronunciation`
  * `localize_speaker_weight`
</Note>
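
If the same payload-building code serves several models, one option is to drop the unsupported keys before sending a request to a MARS 8.1 beta model. This is a rough sketch, not an official client behavior: `strip_unsupported` is a hypothetical helper, and because this page does not say where each parameter sits in the payload, it removes matching keys at any nesting level.

```python
from typing import Any, Dict

# Parameter names copied from the note above.
UNSUPPORTED_ON_MARS_8_1_BETA = frozenset({
    "acoustic_quality_boost",
    "temperature",
    "speaker_similarity",
    "maintain_source_accent",
    "stability",
    "output_enhancement",
    "enhance_named_entities_pronunciation",
    "localize_speaker_weight",
})
MARS_8_1_BETA_MODELS = frozenset({"mars-8.1-flash-beta", "mars-8.1-pro-beta"})


def _without_keys(value: Any, blocked: frozenset) -> Any:
    """Return a copy of `value` with blocked keys removed from nested dicts."""
    if isinstance(value, dict):
        return {k: _without_keys(v, blocked) for k, v in value.items() if k not in blocked}
    return value


def strip_unsupported(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `payload` that is safe to send to a MARS 8.1 beta model."""
    # The OpenAPI schema on this page lists mars-8.1-flash-beta as the default model.
    model = payload.get("speech_model", "mars-8.1-flash-beta")
    if model not in MARS_8_1_BETA_MODELS:
        return dict(payload)
    return _without_keys(payload, UNSUPPORTED_ON_MARS_8_1_BETA)
```

The helper returns a new dictionary and leaves the original untouched, so the full payload can still be reused with other models.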

### MARS 8.1 Beta Text Controls

The `mars-8.1-flash-beta` and `mars-8.1-pro-beta` models support inline controls for English pronunciation and expressive non-verbal sounds. Add these controls directly in the `text` field.

#### Pronunciation Control (English)

Use CMU pronunciation dictionary phonemes in uppercase, wrapped in brackets, to override default English pronunciations.

```python theme={null}
payload = {
    "text": "He plays the [B EY1 S] guitar while catching a [B AE1 S] fish.",
    "language": "en-us",
    "voice_id": 147320,
    "speech_model": "mars-8.1-flash-beta"
}
```

#### Non-verbal Symbols

Insert supported tags directly in the text to add expressive non-verbal sounds.

```python theme={null}
payload = {
    "text": "[laughter] You really got me. I didn't see that coming at all.",
    "language": "en-us",
    "voice_id": 147320,
    "speech_model": "mars-8.1-flash-beta"
}
```

Supported tags: `[laughter]`, `[sigh]`, `[confirmation]`, `[question]`, `[surprise]`, `[dissatisfaction]`.
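
A quick pre-flight check can catch misspelled tags before a request is sent. The sketch below is illustrative only: `unsupported_tags` is a hypothetical helper, and it assumes that any lowercase word in square brackets is meant as a non-verbal tag, while uppercase CMU phoneme groups such as `[B EY1 S]` are left alone.

```python
import re
from typing import List

# Tag names copied from the list above.
SUPPORTED_TAGS = frozenset({
    "laughter", "sigh", "confirmation", "question", "surprise", "dissatisfaction",
})

# Matches a single lowercase word in brackets; uppercase phonemes do not match.
_TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def unsupported_tags(text: str) -> List[str]:
    """Return bracketed lowercase tags in `text` that are not in the supported list."""
    return [tag for tag in _TAG_PATTERN.findall(text) if tag not in SUPPORTED_TAGS]
```

An empty result means every tag found is in the supported list; it does not guarantee how the model renders it.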

### Expressive Text Tags (`mars-instruct`)

You can directly convey expression in the input text by adding short tags for delivery. For a deeper guide to emotion tag intensity, see the [Emotion Tag Gradation Guide](/tutorials/emotional-voice-control#emotion-tag-gradation-guide).

* `[speaking slowly] You need to understand this. It is very important. We should do this the right way.`
* `[angry] You need to understand this! It is very important, we should do this the right way!`
* `[gentle, reassuring] Take a deep breath. You're doing well. Let's go step by step.`
* `Please pause here <break time="500ms"/> then continue in a calm, clear tone.`

Keep tags short and place them near the sentence you want to influence.
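
When text is assembled programmatically, small helpers keep the tag syntax consistent. This is a sketch based only on the examples above; `with_delivery_tag` and `break_tag` are hypothetical names, and the source does not state which descriptors or pause lengths the model accepts.

```python
def with_delivery_tag(text: str, *descriptors: str) -> str:
    """Prefix `text` with a bracketed delivery tag, e.g. "[gentle, reassuring] ..."."""
    if not descriptors:
        return text
    return f"[{', '.join(descriptors)}] {text}"


def break_tag(milliseconds: int) -> str:
    """Build an SSML-style pause in the form used in the examples above."""
    if milliseconds < 0:
        raise ValueError("pause length must not be negative")
    return f'<break time="{milliseconds}ms"/>'


tagged = with_delivery_tag("Take a deep breath.", "gentle", "reassuring")
paused = f"Please pause here {break_tag(500)} then continue."
```

Here `tagged` and `paused` reproduce the format of the third and fourth examples in the list above.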

<Note>
  For comprehensive examples and best practices, see the [Emotional Voice Control tutorial](/tutorials/emotional-voice-control).
</Note>

### Tips for Best Results

* For text containing numbers, expand the numbers into words. For example, write "123" as "one hundred twenty three" or "one two three", depending on how it should be read.
* For code-switched sentences, transliterate the text into your chosen TTS language.
  Both of these steps can be done with a small LLM.
* To adjust pacing or approximate length, use `voice_settings.speaking_rate`. The streaming TTS endpoint does not support a duration parameter.
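
For the simplest case of number expansion, reading digits one by one in English, a few lines of code are enough. The sketch below is a hypothetical helper covering only that case; the cardinal reading ("one hundred twenty three") and other languages would need a dedicated library or the LLM approach mentioned above.

```python
import re

_DIGIT_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]


def spell_out_digits(text: str) -> str:
    """Replace each run of ASCII digits with its digit-by-digit English reading."""
    def _replace(match: "re.Match[str]") -> str:
        return " ".join(_DIGIT_WORDS[int(digit)] for digit in match.group())

    return re.sub(r"[0-9]+", _replace, text)
```

For example, `spell_out_digits("Dial 123 now.")` produces `Dial one two three now.`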

### Output format support by model

Supported `output_configuration.format` values depend on the selected `speech_model`:

| Speech Model          | Supported output formats                                                                                   |
| :-------------------- | :--------------------------------------------------------------------------------------------------------- |
| `mars-8.1-flash-beta` | `wav`, `mp3`, `flac`, `adts`, `pcm_s16le`, `pcm_s16be`, `pcm_s32be`, `pcm_s32le`, `pcm_f32le`, `pcm_f32be` |
| `mars-8.1-pro-beta`   | `wav`, `mp3`, `flac`, `adts`, `pcm_s16le`, `pcm_s16be`, `pcm_s32be`, `pcm_s32le`, `pcm_f32le`, `pcm_f32be` |
| `mars-flash`          | `wav`, `mp3`, `flac`, `adts`, `pcm_s16le`, `pcm_s16be`, `pcm_s32be`, `pcm_s32le`, `pcm_f32le`, `pcm_f32be` |
| `mars-pro`            | `wav`, `mp3`, `flac`, `adts`, `pcm_s16le`, `pcm_s16be`, `pcm_s32be`, `pcm_s32le`, `pcm_f32le`, `pcm_f32be` |
| `mars-instruct`       | `wav`, `flac`, `adts`, `pcm_s16le`, `pcm_s32be`, `pcm_s32le`, `pcm_f32le`, `pcm_f32be`                     |
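
The table can be mirrored in client code to reject an unsupported combination before making a request. This is a minimal sketch with the values copied from the table above; `is_format_supported` is a hypothetical helper, and the table on this page remains the reference if the two ever differ.

```python
ALL_FORMATS = frozenset({
    "wav", "mp3", "flac", "adts",
    "pcm_s16le", "pcm_s16be", "pcm_s32be", "pcm_s32le", "pcm_f32le", "pcm_f32be",
})

SUPPORTED_FORMATS = {
    "mars-8.1-flash-beta": ALL_FORMATS,
    "mars-8.1-pro-beta": ALL_FORMATS,
    "mars-flash": ALL_FORMATS,
    "mars-pro": ALL_FORMATS,
    # Per the table, mars-instruct lacks mp3 and pcm_s16be.
    "mars-instruct": ALL_FORMATS - {"mp3", "pcm_s16be"},
}


def is_format_supported(speech_model: str, output_format: str) -> bool:
    """Return True if the table lists `output_format` for `speech_model`."""
    return output_format in SUPPORTED_FORMATS.get(speech_model, frozenset())
```

Unknown model names return `False` rather than raising, so the check can run before any other validation.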

## Example: Real-time Audio Streaming

This example shows how to call the endpoint and save the incoming audio stream to a file.

```python [expandable] theme={null}
import requests

payload = {
    "text": "Jupiter, the largest planet in our solar system, is a gas giant with swirling storms like the iconic Great Red Spot.",
    "language": "en-us",
    "voice_id": 147320,
    "speech_model": "mars-instruct",
    "enhance_named_entities_pronunciation": True,
    "output_configuration": {
        "format": "wav"
    },
    "voice_settings": {
        "enhance_reference_audio_quality": False,
        "maintain_source_accent": False,
        "speaking_rate": 1.0
    },
    "inference_options": {
        "inference_steps": 60,
    }
}

headers = {
    "x-api-key": "your-api-key"
}

response = requests.post(
    "https://client.camb.ai/apis/tts-stream",
    json=payload,
    headers=headers,
    stream=True
)

response.raise_for_status()

with open("output.wav", "wb") as audio_file:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:
            audio_file.write(chunk)

print("✨ Stream complete. Audio saved to output.wav")
```

## SDK Example: Async Streaming

```python theme={null}
import asyncio
from camb.client import AsyncCambAI, save_async_stream_to_file
from camb.types.stream_tts_output_configuration import StreamTtsOutputConfiguration
from camb.types.stream_tts_voice_settings import StreamTtsVoiceSettings

# Initialize the async client
client = AsyncCambAI(api_key="your-api-key")

async def main():
    # Stream the TTS generation
    response = client.text_to_speech.tts(
        text="Experience high quality realistic sounds with Camb AI.",
        language="en-us",
        speech_model="mars-8.1-flash-beta",
        voice_id=147320,  # replace with your own voice ID
        voice_settings=StreamTtsVoiceSettings(
            speaking_rate=1.0
        ),
        output_configuration=StreamTtsOutputConfiguration(
            format="wav"
        )
    )
    
    # Save the stream to a file (or process chunks as they arrive)
    await save_async_stream_to_file(response, "async_stream_output.wav")
    print("Audio stream saved to async_stream_output.wav")

if __name__ == "__main__":
    asyncio.run(main())
```

## Streaming vs. Asynchronous: Which to Choose?

Select the right tool for your job by understanding the key differences between our TTS endpoints.

<CardGroup>
  <Card title="Use Streaming" icon="bolt">
    Ideal for real-time, interactive experiences where immediate audio feedback is crucial.
  </Card>

  <Card title="Use Asynchronous" icon="clock">
    Perfect for non-real-time tasks, long-form content, or when you need to retrieve a complete audio file later.
  </Card>
</CardGroup>


## OpenAPI

````yaml post /tts-stream
openapi: 3.1.0
info:
  title: FastAPI
  version: 0.1.0
servers:
  - url: https://client.camb.ai/apis
security: []
paths:
  /tts-stream:
    post:
      tags:
        - Apis
        - Text-to-Speech
      summary: Stream Text-to-Speech Audio
      description: >-
        Generate speech from text and stream audio bytes back as they’re
        produced for low-latency playback.
      operationId: tts_tts_stream_post
      requestBody:
        description: Streaming Text-to-Speech request parameters.
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateStreamTTSRequestPayload'
        required: true
      responses:
        '200':
          description: Streaming audio response
          headers:
            X-Credits-Required:
              schema:
                type: string
                description: Number of credits required for this request
          content:
            audio/wav:
              schema:
                type: string
                format: binary
                description: Binary audio stream in WAV format.
            audio/flac:
              schema:
                type: string
                format: binary
                description: Binary audio stream in FLAC format.
            audio/aac:
              schema:
                type: string
                format: binary
                description: Binary audio stream in AAC/ADTS format.
            audio/mpeg:
              schema:
                type: string
                format: binary
                description: Binary audio stream in MP3 format.
            audio/x-pcm:
              schema:
                type: string
                format: binary
                description: >-
                  Raw PCM audio stream (use `output_configuration.format` to
                  control sample format and endianness).
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
      security:
        - APIKeyHeader: []
components:
  schemas:
    CreateStreamTTSRequestPayload:
      properties:
        text:
          type: string
          maxLength: 3000
          minLength: 3
          title: Text
          description: >-
            The text to synthesize into speech (3–3000 characters). For
            `mars-8.1-flash-beta` and `mars-8.1-pro-beta`, you can include
            inline controls such as CMU phonemes (`[B EY1 S]`) and non-verbal
            tags (`[laughter]`).
          example: >-
            [laughter] He plays the [B EY1 S] guitar while catching a [B AE1 S]
            fish.
        language:
          type: string
          enum:
            - ro-ro
            - nl-nl
            - es-es
            - zh-tw
            - en-uk
            - el-gr
            - cs-cz
            - vi-vn
            - bn-bd
            - ar-tn
            - de-de
            - fr-ca
            - ar-xa
            - th-th
            - ar-eg
            - ar-sa
            - ar-sy
            - pa-in
            - zh-cn
            - ar-jo
            - ru-ru
            - bn-in
            - uk-ua
            - es-us
            - ja-jp
            - ar-ae
            - mr-in
            - en-au
            - de-ch
            - pt-pt
            - ar-kw
            - ar-qa
            - as-in
            - hi-in
            - fr-be
            - fi-fi
            - fr-fr
            - ar-dz
            - fr-ch
            - it-it
            - de-at
            - en-in
            - ko-kr
            - en-us
            - zh-hk
            - ar-om
            - ar-ma
            - pl-pl
            - ar-ly
            - es-mx
            - tr-tr
            - ar-iq
            - ar-lb
            - ml-in
            - pt-br
            - id-id
            - ar-bh
            - kn-in
            - nl-be
            - te-in
            - ar-ye
            - ta-in
            - af-za
            - am-et
            - az-az
            - bg-bg
            - bs-ba
            - ca-es
            - cy-gb
            - da-dk
            - en-ca
            - en-gb
            - en-hk
            - en-ie
            - en-ke
            - en-ng
            - en-nz
            - en-ph
            - en-sg
            - en-tz
            - en-za
            - es-ar
            - es-bo
            - es-cl
            - es-co
            - es-cr
            - es-cu
            - es-do
            - es-ec
            - es-gq
            - es-gt
            - es-hn
            - es-ni
            - es-pa
            - es-pe
            - es-pr
            - es-py
            - es-sv
            - es-uy
            - es-ve
            - et-ee
            - eu-es
            - fa-ir
            - fil-ph
            - ga-ie
            - gl-es
            - gu-in
            - he-il
            - hr-hr
            - hu-hu
            - hy-am
            - is-is
            - jv-id
            - ka-ge
            - kk-kz
            - km-kh
            - lo-la
            - lt-lt
            - lv-lv
            - mk-mk
            - mn-mn
            - ms-my
            - mt-mt
            - my-mm
            - nb-no
            - ps-af
            - si-lk
            - sk-sk
            - sl-si
            - so-so
            - sq-al
            - sr-rs
            - sv-se
            - sw-ke
            - sw-tz
            - ta-lk
            - ta-my
            - ta-sg
            - ur-in
            - ur-pk
            - uz-uz
            - zh-cn-henan
            - zh-cn-liaoning
            - zh-cn-shaanxi
            - zh-cn-shandong
            - zh-cn-sichuan
            - zu-za
            - sa-in
            - tl-ph
            - es-xl
            - or-in
            - mai-in
            - sd-in
            - kok-in
            - mni-in
            - ks-in
            - doi-in
            - brx-in
            - sat-in
          title: Language
          description: BCP-47 locale for the input text (for example, `en-us`).
          example: en-us
          default: en-us
        voice_id:
          type: integer
          minimum: 1
          title: Voice Id
          description: >-
            Voice profile ID to use for synthesis. Get available IDs from
            `/list-voices`.
          example: 147320
          default: 147320
        speech_model:
          $ref: '#/components/schemas/SpeechModels'
          default: mars-8.1-flash-beta
          description: >-
            Speech model variant to use for synthesis. Use `mars-8.1-flash-beta`
            or `mars-8.1-pro-beta` to leverage inline pronunciation and
            non-verbal controls in `text`.
          example: mars-8.1-flash-beta
        enhance_named_entities_pronunciation:
          type: boolean
          title: Enhance Named Entities Pronunciation
          default: false
          description: >-
            If `true`, improves pronunciation of names, brands, and other named
            entities.
          example: true
        output_configuration:
          $ref: '#/components/schemas/StreamTTSOutputConfiguration'
          description: Controls output format and enhancement options for the stream.
          example:
            format: wav
        voice_settings:
          $ref: '#/components/schemas/StreamTTSVoiceSettings'
          description: >-
            Voice behavior preferences such as accent preservation and reference
            enhancement.
          example:
            enhance_reference_audio_quality: false
            maintain_source_accent: false
            speaking_rate: 1.5
      type: object
      required:
        - text
        - language
        - voice_id
      title: CreateStreamTTSRequestPayload
      description: Request body for `/tts-stream`.
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    SpeechModels:
      type: string
      enum:
        - mars-8.1-flash-beta
        - mars-8.1-pro-beta
        - mars-flash
        - mars-pro
        - mars-instruct
      description: Selects which speech model variant to use for synthesis.
      example: mars-8.1-flash-beta
      title: SpeechModels
    StreamTTSOutputConfiguration:
      properties:
        format:
          $ref: '#/components/schemas/OutputFormat'
          default: wav
          description: >-
            Audio format for the streamed response. Choose a container (`mp3`,
            `wav`, `flac`, `adts`) or a raw PCM format (`pcm_*`).
        sample_rate:
          type: integer
          title: Sample Rate
          description: >-
            Optional sample rate in Hz. Use this to control the audio quality
            and compatibility with different devices.
          example: 48000
          nullable: true
        apply_enhancement:
          type: boolean
          title: Apply Enhancement
          description: >-
            If `true`, applies output audio enhancement (loudness, denoising,
            polish). Defaults to `true` for most models; `false` for the
            speed-oriented `mars-flash` and `mars-8.1-flash-beta` models. Set
            explicitly to override the per-model default.
          nullable: true
          example: true
      type: object
      title: StreamTTSOutputConfiguration
    StreamTTSVoiceSettings:
      properties:
        enhance_reference_audio_quality:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Enhance Reference Audio Quality
          default: false
          description: >-
            Remove noise from reference audio (useful when the reference has
            background noise or compression artifacts).
          example: false
        maintain_source_accent:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Maintain Source Accent
          default: false
          description: Maintain the accent from the original source audio.
          example: false
        speaking_rate:
          anyOf:
            - type: number
            - type: 'null'
          title: Speaking Rate
          default: 1
          description: Controls playback speed for generated speech.
          example: 1.5
      type: object
      title: StreamTTSVoiceSettings
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    OutputFormat:
      type: string
      enum:
        - wav
        - flac
        - adts
        - mp3
        - pcm_s16le
        - pcm_s16be
        - pcm_s32be
        - pcm_s32le
        - pcm_f32le
        - pcm_f32be
      description: Supported audio formats for streaming output.
      example: mp3
      title: OutputFormat
  securitySchemes:
    APIKeyHeader:
      type: apiKey
      in: header
      name: x-api-key
      description: >-
        The `x-api-key` is a custom header required for authenticating requests
        to our API. Include this header in your request with the appropriate API
        key value to securely access our endpoints. You can find your API key(s)
        in the 'API' section of our studio website.

````
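
The request constraints in the schema above (required fields, 3 to 3000 characters of text, a `voice_id` of at least 1) can be checked client-side to avoid a round trip that ends in a 422. The sketch below is illustrative; `payload_problems` is a hypothetical helper and covers only these three rules, not locale or model validation.

```python
from typing import Any, Dict, List

# Required fields as listed in CreateStreamTTSRequestPayload.
REQUIRED_FIELDS = ("text", "language", "voice_id")


def payload_problems(payload: Dict[str, Any]) -> List[str]:
    """Return a list of schema violations found in `payload` (empty if none)."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED_FIELDS if name not in payload]

    text = payload.get("text")
    if isinstance(text, str) and not 3 <= len(text) <= 3000:
        problems.append("text must be 3 to 3000 characters long")

    if "voice_id" in payload:
        voice_id = payload["voice_id"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(voice_id, bool) or not isinstance(voice_id, int) or voice_id < 1:
            problems.append("voice_id must be an integer >= 1")

    return problems
```

An empty list means the payload passes these checks; the server still performs the authoritative validation.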