Overview

Transform text from information to experience. With mars-instruct, you can craft speech that captures subtle emotional states, dramatic pacing, and conversational dynamics: not just reading text, but performing it. You control expression by adding concise tags directly in the text, such as [happy], [sad], [laughing], or [sighing].

Emotion Tags

Emotion Tone Tags

For emotional tone (happy, sad, angry), use tags and match your text content to the emotion:
Tag | Example Text
[happy] | "[happy] We won the match! This is the best day ever!"
[sad] | "[sad] I... I don't know if I can do this anymore..."
Important: The text content and punctuation must match the emotion for best results.
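
As a minimal sketch of the pattern above, the tone tag can be prepended to the text before it is sent for synthesis. The helper name `with_emotion` is hypothetical and not part of any SDK; it only builds the tagged string.

```python
def with_emotion(tag: str, text: str) -> str:
    """Prefix text with a bracketed tone tag such as [happy].

    Hypothetical helper for illustration; accepts the tag with or
    without surrounding brackets.
    """
    name = tag.strip().strip("[]").strip()
    if not name:
        raise ValueError("tag must not be empty")
    return f"[{name}] {text.strip()}"


# The wording and punctuation are chosen to match the emotion.
print(with_emotion("happy", "We won the match! This is the best day ever!"))
```

The helper does not check that the text suits the tag; matching content and punctuation to the emotion remains the writer's job.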

Sound Effect Tags

Sound effect tags go within your sentence where the action naturally occurs:
Tag | Example | Notes
[laughing] | "That's ridiculous! [laughing] I can't believe that!" | Produces laughter sound
[sighing] | "I guess we have to start over. [sighing] Alright, let's begin." | Produces sigh sound
ahem ahem | "So what I was going to say is... ahem ahem... never mind." | Produces throat-clearing sound
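
The placement rule can be sketched as a small string helper that puts the sound effect tag between two parts of a sentence, where the action would naturally occur. `insert_sound_effect` is a hypothetical name used only for illustration.

```python
def insert_sound_effect(before: str, effect: str, after: str) -> str:
    """Place a bracketed sound effect tag between two pieces of text.

    Hypothetical helper: `effect` is the bare tag name, e.g. "laughing".
    """
    return f"{before.strip()} [{effect.strip()}] {after.strip()}"


print(insert_sound_effect("That's ridiculous!", "laughing", "I can't believe that!"))
```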

Delivery Tags

Delivery tags provide tone guidance for the words that follow them.
Tag | Effect
[shouting, angry, threatening] | Agitated, confrontational delivery
[whispering, secretive] | Quiet, intimate delivery
[empathetic, helpful] | Caring, supportive delivery
[happy, excited, promotional] | Upbeat, promotional delivery
[patient, teaching] | Educational, measured delivery
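
The delivery tags in the table are comma-separated descriptors inside one pair of brackets. A minimal sketch of building such a tag follows; `delivery_tag` and `with_delivery` are hypothetical helper names, not SDK functions.

```python
def delivery_tag(*descriptors: str) -> str:
    """Join descriptors into one bracketed, comma-separated delivery tag."""
    cleaned = [d.strip().lower() for d in descriptors if d.strip()]
    if not cleaned:
        raise ValueError("at least one descriptor is required")
    return "[" + ", ".join(cleaned) + "]"


def with_delivery(text: str, *descriptors: str) -> str:
    """Put the delivery tag before the words it should guide."""
    return f"{delivery_tag(*descriptors)} {text.strip()}"


print(with_delivery("Come closer, I need to tell you something.", "whispering", "secretive"))
```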

Emotion Tag Gradation Guide

How To Use This

  • Read each tag list from left to right.
  • Left side means more balanced, subtle, or restrained.
  • Right side means more extreme, forceful, or obvious.
  • If you want the strongest controllable result, start from the rightmost tag.
  • If you want a more natural or less exaggerated result, move one or two steps left.
This is a practical TTS guide, not a dictionary guide. Some tags are ordered by how strongly they tend to push delivery, not just by literal meaning.

Examples Of Use

  1. [angry] Who stole my cash!
  2. [trembling] I don't know who did it...
  3. [cheerful] Welcome back. I saved you a seat.
  4. [commanding] Stop right there and listen carefully.

Tag Ladders

Balanced -> Extreme
  1. Nervousness: [uneasy] -> [nervous] -> [anxious] -> [trembling]
  2. Fear: [fearful] -> [scared] -> [terrified] -> [panicked]
  3. Anger: [irritated] -> [angry] -> [furious] -> [enraged]
  4. Sadness: [down] -> [sad] -> [melancholic] -> [depressed]
  5. Joy: [cheerful] -> [happy] -> [joyful] -> [delighted]
  6. Excitement: [energetic] -> [excited] -> [thrilled] -> [hyped]
  7. Calmness: [relaxed] -> [calm] -> [peaceful] -> [serene]
  8. Confidence: [assured] -> [confident] -> [certain] -> [bold]
  9. Doubt: [uncertain] -> [doubtful] -> [hesitant] -> [skeptical]
  10. Surprise: [surprised] -> [startled] -> [shocked] -> [astonished]
  11. Disgust: [grossed_out] -> [disgusted] -> [repulsed] -> [revolted]
  12. Pride: [satisfied] -> [accomplished] -> [proud] -> [fulfilled]
  13. Shame: [embarrassed] -> [guilty] -> [ashamed] -> [humiliated]
  14. Love: [warm] -> [affectionate] -> [loving] -> [tender]
  15. Flirtation: [charming] -> [playful] -> [flirty] -> [teasing]
  16. Sarcasm: [dry] -> [ironic] -> [sarcastic] -> [mocking]
  17. Determination: [focused] -> [determined] -> [driven] -> [resolute]
  18. Frustration: [annoyed] -> [irritated] -> [frustrated] -> [exasperated]
  19. Relief: [calmed] -> [reassured] -> [relieved] -> [grateful]
  20. Curiosity: [interested] -> [curious] -> [inquiring] -> [intrigued]
  21. Boredom: [dull] -> [uninterested] -> [bored] -> [apathetic]
  22. Awe: [inspired] -> [amazed] -> [awed] -> [wonderstruck]
  23. Suspicion: [wary] -> [suspicious] -> [guarded] -> [distrustful]
  24. Urgency: [urgent] -> [rushed] -> [intense] -> [pressured]
  25. Authority: [firm] -> [authoritative] -> [directive] -> [commanding]
  26. Politeness: [polite] -> [courteous] -> [respectful] -> [formal]
  27. Gratitude: [appreciative] -> [thankful] -> [grateful] -> [warm]
  28. Confusion: [uncertain] -> [puzzled] -> [confused] -> [lost]
  29. Hopelessness: [resigned] -> [defeated] -> [hopeless] -> [despairing]
  30. Playfulness: [lighthearted] -> [playful] -> [fun] -> [silly]
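
One way to work with these ladders in code is to store each one as an ordered list and count steps back from the rightmost (strongest) tag. The sketch below copies three ladders from the list above; `LADDERS` and `pick_tag` are hypothetical names chosen for illustration.

```python
# A few ladders from the list above, ordered balanced -> extreme.
LADDERS = {
    "nervousness": ["uneasy", "nervous", "anxious", "trembling"],
    "anger": ["irritated", "angry", "furious", "enraged"],
    "joy": ["cheerful", "happy", "joyful", "delighted"],
}


def pick_tag(family: str, steps_left: int = 0) -> str:
    """Return a bracketed tag, starting at the rightmost and moving left.

    steps_left=0 gives the most extreme tag; larger values give more
    restrained tags and stop at the leftmost one.
    """
    if steps_left < 0:
        raise ValueError("steps_left must be zero or positive")
    ladder = LADDERS[family]
    index = max(0, len(ladder) - 1 - steps_left)
    return f"[{ladder[index]}]"


print(pick_tag("anger"), "Who stole my cash!")
print(pick_tag("anger", steps_left=2), "Who stole my cash!")
```

Starting from the right and stepping left mirrors the guidance above: begin with the strongest tag, then back off if the result sounds exaggerated.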

Practical Rule Of Thumb

  • Use the leftmost tag when you want the emotion to be present but not overpower the sentence.
  • Use the middle tags when you want clear emotional color without sounding theatrical.
  • Use the rightmost tag when you need the emotion to come through strongly and consistently.
Example:
  • Nervousness, subtle: [uneasy]
  • Nervousness, clear: [anxious]
  • Nervousness, strongest: [trembling]

How To Generalize This To New Emotions

This same principle generalizes well to new emotions:
  • Start with 3 to 4 tags for the same emotional family.
  • Arrange them from balanced to extreme.
  • Test them on the same sentence.
  • Keep the tag that gives the clearest emotional control without distorting the sentence too much.
  • When in doubt, the most extreme tag often gives the strongest controllability.
General rule: same emotion family + left-to-right intensity ladder + same test sentence = reliable, controllable TTS.
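
The testing step can be sketched as generating one variant of the same sentence per tag, so that the only difference between outputs is the tag. `ladder_variants` is a hypothetical helper; synthesizing and listening to each variant is left out of the sketch.

```python
def ladder_variants(ladder: list[str], sentence: str) -> list[str]:
    """Return the same sentence once per tag, in ladder order."""
    return [f"[{tag}] {sentence}" for tag in ladder]


# Candidate ladder for one emotional family, balanced -> extreme.
nervousness = ["uneasy", "nervous", "anxious", "trembling"]

for variant in ladder_variants(nervousness, "I don't know who did it..."):
    print(variant)
```

Each printed line can then be synthesized with the same voice and settings and compared by ear.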

Combining Tags

For precise control, combine multiple embedded emotion and delivery tags:
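# Assumes `client` is an already-initialized SDK client and that
# StreamTtsOutputConfiguration has been imported (see the Text to Speech guide).
response = client.text_to_speech.tts(
    # Each bracketed tag guides the words that follow it.
    text="[sighing, secretive] I have a secret to tell you... [happy, excited] We're going to Paris!",
    voice_id=147320,
    language="en-us",
    speech_model="mars-instruct",
    output_configuration=StreamTtsOutputConfiguration(format="wav")
)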
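
When a script has several segments, the tagged text can be assembled from (descriptors, text) pairs before it is passed as `text`. This is a sketch under the assumption that plain string concatenation is all that is needed; `compose` is a hypothetical helper and makes no API calls.

```python
from typing import Iterable, Sequence, Tuple


def compose(segments: Iterable[Tuple[Sequence[str], str]]) -> str:
    """Build one tagged string from (descriptors, text) pairs.

    A segment with no descriptors is emitted without a tag.
    """
    parts = []
    for descriptors, text in segments:
        tag = ", ".join(d.strip() for d in descriptors if d.strip())
        parts.append(f"[{tag}] {text.strip()}" if tag else text.strip())
    return " ".join(parts)


text = compose([
    (["sighing", "secretive"], "I have a secret to tell you..."),
    (["happy", "excited"], "We're going to Paris!"),
])
print(text)
```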

Pauses

Add SSML-style breaks anywhere in your text for dramatic pauses:
You... must... understand... this. <break time='600ms'/> The future begins NOW.
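
A minimal sketch of generating the break tag from a duration in milliseconds, using the same `<break time='...ms'/>` form as the example above. `pause` and `join_with_pause` are hypothetical helper names.

```python
def pause(milliseconds: int) -> str:
    """Return an SSML-style break tag for the given duration."""
    if milliseconds <= 0:
        raise ValueError("milliseconds must be positive")
    return f"<break time='{milliseconds}ms'/>"


def join_with_pause(parts: list[str], milliseconds: int) -> str:
    """Join text parts with the same pause between each pair."""
    return f" {pause(milliseconds)} ".join(p.strip() for p in parts)


print(join_with_pause(
    ["You... must... understand... this.", "The future begins NOW."],
    600,
))
```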


Best Practices

  1. Use specific tags - Place concise delivery tags near the sentence they should affect
  2. Match content to emotion - Text and punctuation should reflect the emotional tone
  3. Place sound effects naturally - Tags like [laughing] and [sighing] work best within sentences
  4. Keep tags short - Tags like [happy], [sad], or [whispering] work best when focused
  5. Add pauses - Use <break time='600ms'/> for dramatic effect
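
The "keep tags short" practice can be checked before synthesis with a small script that extracts bracketed tags and flags long ones. This is only a sketch: the limit of three descriptors is an assumption (it is the longest tag shown on this page, not a documented limit), and `find_tags` and `overly_long_tags` are hypothetical names.

```python
import re

# Matches one bracketed tag and captures its contents, e.g. "happy, excited".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_tags(text: str) -> list[list[str]]:
    """Return each tag in the text as a list of its descriptors."""
    return [[d.strip() for d in body.split(",")] for body in TAG_PATTERN.findall(text)]


def overly_long_tags(text: str, max_descriptors: int = 3) -> list[str]:
    """Return tags with more descriptors than the assumed limit."""
    return [
        "[" + ", ".join(tag) + "]"
        for tag in find_tags(text)
        if len(tag) > max_descriptors
    ]


script = "[happy, excited] We did it! [laughing] I still can't believe it."
print(find_tags(script))
print(overly_long_tags(script))
```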

Next Steps

Text to Speech

Get started with basic TTS using the Python or TypeScript SDK.

Choosing a Model

Compare mars-instruct with mars-flash and mars-pro.

Voice Cloning

Create custom voices for your emotional speech.

TTS with Accents

Generate speech in 140+ language accents.