Emotional & Controllable TTS

Emotional Text to Speech with CosyVoice

Make your text sound human. CosyVoice generates expressive speech across five emotions and follows natural-language instructions to control style, dialect, speed, emphasis and breathing.

Enter your text

Each generation is limited to 120 characters. The trial includes 3 samples with a 20-second wait; Pro skips the wait.
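Longer scripts have to be broken up to fit the 120-character limit. The sketch below shows one possible way to do that in Python, preferring sentence boundaries and hard-wrapping only when a single sentence is too long. The function name `split_for_tts` and the splitting rules are illustrative assumptions, not part of CosyVoice or this playground.

```python
import re

MAX_CHARS = 120  # per-generation limit stated on this page


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Hypothetical helper: prefers breaks after sentence-ending punctuation
    followed by whitespace, and falls back to word or hard breaks.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is wrapped on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            cut = sentence.rfind(" ", 0, limit + 1)
            if cut <= 0:  # no space to break on, so cut mid-word
                cut = limit
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be submitted as its own generation. Splitting on sentence boundaries keeps the pauses between clips in natural places.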

Select a voice

Awnie · Kids Storyteller (US)

Warm, maternal and soothing delivery for children’s stories and bedtime reading.

Luna · Conversational (US)

A warm, natural, and expressive storytelling voice.

Athena · Audiobook (UK)

Clear, formal, and perfectly cadenced British professional voice.

Angus · Warm Narrator (US)

Warm, rich, and highly conversational male voice, perfect for narrating stories and books.

Seán · Podcast Host (IE)

Charismatic and effortless male voice with a friendly Irish lilt, ideal for hosting podcasts and discussions.

Orpheus · Explainer (US)

Clear, confident voiceover for explainer videos, product demos, and YouTube tutorials.

Arcas · Commercial (US)

Persuasive, polished reads for ads, promos, and brand commercials.

Expressive, controllable speech

Five core emotions

Render text as happy, sad, angry, fearful or surprised, in both Chinese and English.

Instruction control

Steer delivery with plain-language prompts like “speak slowly and gently” or “sound excited”.

Fine-grained markers

Place breaths, add emphasis and adjust pace at the word level for precise direction.

Consistent identity

Keep the same speaker identity across every emotion, style and speed.

Where expressive TTS shines

Games & characters

Voice NPCs and characters with emotion that matches the scene.

Video & social content

Add lively narration that holds viewer attention.

Conversational AI

Give assistants an empathetic, situation-aware tone.

Audiobooks & drama

Perform dialogue with believable emotional range.

Emotional TTS FAQ

What emotions does CosyVoice support?

CosyVoice can generate happy, sad, angry, fearful and surprised speech, plus neutral delivery, in both Chinese and English.

How do I control emotion and style?

Guide CosyVoice with natural-language instructions, for example “speak in a cheerful tone”, or use fine-grained markers for emphasis, pauses and speed.

Can I control speaking speed and emphasis?

Yes. CosyVoice can speed up or slow down delivery, and supports word-level emphasis and breath markers for precise control.

Is emotional text to speech free to try?

Yes. Try expressive synthesis in the playground above. CosyVoice is open source under Apache-2.0 for self-hosting.