Emotional & Controllable TTS

Emotional Text to Speech with CosyVoice

Make your text sound human. CosyVoice generates expressive speech across five emotions and follows natural-language instructions to control style, dialect, speed, emphasis and breathing.

Enter your text

Emotion markers

Colors are moods. Highlight text to add emotion, then Generate.

Each generation is limited to 120 characters.
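
Longer scripts need to be broken up to fit the 120-character limit. The sketch below is one possible way to do that before submitting each piece separately; it is not part of CosyVoice. The function name `split_for_generation` and the splitting rules (sentence boundaries first, then words, then a hard cut) are assumptions made for illustration.

```python
import re

MAX_CHARS = 120  # per-generation limit stated on this page

# Split after ASCII sentence punctuation followed by whitespace,
# or directly after CJK sentence punctuation (which has no trailing space).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def split_for_generation(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Hypothetical helper: keeps sentences together where possible, falls back
    to word boundaries, and hard-cuts any single word longer than the limit.
    Pieces placed in the same chunk are joined with a single space.
    """
    chunks: list[str] = []
    current = ""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
            continue
        # The sentence does not fit: flush what we have and pack it word by word.
        if current:
            chunks.append(current)
            current = ""
        for word in sentence.split():
            while len(word) > limit:  # a single over-long token: hard cut
                if current:
                    chunks.append(current)
                    current = ""
                chunks.append(word[:limit])
                word = word[limit:]
            candidate = f"{current} {word}".strip()
            if len(candidate) <= limit:
                current = candidate
            else:
                chunks.append(current)
                current = word
    if current:
        chunks.append(current)
    return chunks


script = "I can't believe we won! " * 10
for piece in split_for_generation(script):
    print(len(piece), piece)
```

Each returned piece can then be generated on its own with the same voice, so the speaker stays consistent across the whole script.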

Select a voice

Athena · Audiobook

Clear, formal, and perfectly cadenced British professional voice.

Luna · Conversational

A warm, natural, and expressive storytelling voice.

Awnie · Kids Storyteller

Warm, maternal and soothing delivery for children’s stories and bedtime reading.

Angus · Warm Narrator

Warm, rich, and highly conversational male voice, perfect for narrating stories and books.

Seán · Podcast Host

Charismatic and effortless male voice with a friendly Irish lilt, ideal for hosting podcasts and discussions.

Orpheus · Explainer

Clear, confident voiceover for explainer videos, product demos, and YouTube tutorials.

Arcas · Commercial

Persuasive, polished reads for ads, promos, and brand commercials.

Expressive, controllable speech

Five core emotions

Render text as happy, sad, angry, fearful or surprised, in both Chinese and English.

Instruction control

Steer delivery with plain-language prompts like “speak slowly and gently” or “sound excited”.

Fine-grained markers

Place breaths, add emphasis and adjust pace at the word level for precise direction.

Consistent identity

Keep the same speaker identity across every emotion, style and speed.

Where expressive TTS shines

Games & characters

Voice NPCs and characters with emotion that matches the scene.

Video & social content

Add lively narration that holds viewer attention.

Conversational AI

Give assistants an empathetic, situation-aware tone.

Audiobooks & drama

Perform dialogue with believable emotional range.

Emotional TTS FAQ

What emotions does CosyVoice support?

CosyVoice can generate happy, sad, angry, fearful and surprised speech, plus neutral delivery, in both Chinese and English.

How do I control emotion and style?

Guide CosyVoice with natural-language instructions, for example “speak in a cheerful tone”, or use fine-grained markers for emphasis, pauses and speed.
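
As a rough illustration of combining the two approaches, the sketch below pairs a plain-language instruction with text that carries inline markers. The marker spellings (`[breath]` and `<strong>…</strong>`) are assumed from the open-source CosyVoice examples and should be checked against the version you run. `SynthesisRequest` and `build_request` are hypothetical names that only assemble strings; they do not call CosyVoice.

```python
from dataclasses import dataclass

BREATH = "[breath]"  # assumed breath marker token


def emphasize(phrase: str) -> str:
    """Wrap a phrase in the assumed emphasis marker."""
    return f"<strong>{phrase}</strong>"


@dataclass(frozen=True)
class SynthesisRequest:
    """Hypothetical container: one instruction plus the marked-up text."""
    instruction: str
    text: str


def build_request(
    instruction: str, text: str, emphasized: tuple[str, ...] = ()
) -> SynthesisRequest:
    """Mark the first occurrence of each phrase in `emphasized` for emphasis."""
    marked = text
    for phrase in emphasized:
        if phrase not in marked:
            raise ValueError(f"phrase not found in text: {phrase!r}")
        marked = marked.replace(phrase, emphasize(phrase), 1)
    return SynthesisRequest(instruction=instruction.strip(), text=marked)


request = build_request(
    "speak in a cheerful tone",
    f"We did it! {BREATH} I knew we could.",
    emphasized=("did",),
)
print(request.instruction)
print(request.text)
```

The instruction sets the overall delivery, while the markers direct individual words, so the two can be adjusted independently.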

Can I control speaking speed and emphasis?

Yes. CosyVoice lets you speed up or slow down speech, and supports word-level emphasis and breath markers for precise delivery.

Is emotional text to speech free to try?

Yes. Try expressive synthesis in the playground above. CosyVoice is open source under Apache-2.0 for self-hosting.

Explore more CosyVoice tools

Voice Cloning

Cantonese & Dialects

Multilingual TTS