Multilingual & Cross-Lingual TTS

Multilingual Text to Speech with CosyVoice

Speak to the world with one model. CosyVoice synthesizes natural speech in nine languages, switches languages mid-sentence, and offers cross-lingual voice cloning that preserves the speaker's identity.

Enter your text

Limit: 120 characters per generation.
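Text longer than the 120-character limit has to be sent as several generations. The sketch below shows one way to split it at sentence boundaries first. It assumes the limit counts Unicode characters (code points), and `split_for_tts` is a hypothetical helper for illustration, not part of CosyVoice.

```python
import re

MAX_CHARS = 120  # per-generation limit; assumed to count Unicode code points


def split_for_tts(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Break after Latin sentence punctuation that is followed by whitespace,
    # or directly after CJK sentence punctuation (which has no trailing space).
    pieces = re.split(r"(?<=[.!?])(?=\s)|(?<=[。！？])", text)
    chunks = []
    current = ""
    for piece in pieces:
        candidate = (current + piece).strip()
        if len(candidate) <= limit:
            current += piece
            continue
        # The next sentence does not fit: close the current chunk.
        if current.strip():
            chunks.append(current.strip())
        piece = piece.strip()
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(piece) > limit:
            chunks.append(piece[:limit])
            piece = piece[limit:]
        current = piece
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Each returned chunk can then be submitted as its own generation. Hard-wrapping may cut mid-word, so very long sentences are better rephrased by hand.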

Select a voice

Athena · Audiobook

Clear, formal, and perfectly cadenced British professional voice.

Luna · Conversational

Warm, natural, and expressive storytelling voice.

Awnie · Kids Storyteller

Warm, maternal and soothing delivery for children’s stories and bedtime reading.

Angus · Warm Narrator

Warm, rich, and highly conversational male voice, perfect for narrating stories and books.

Seán · Podcast Host

Charismatic and effortless male voice with a friendly Irish lilt, ideal for hosting podcasts and discussions.

Orpheus · Explainer

Clear, confident voiceover for explainer videos, product demos, and YouTube tutorials.

Arcas · Commercial

Persuasive, polished reads for ads, promos, and brand commercials.

Nine languages, one model

9 languages

Chinese, English, Japanese, Korean, German, Spanish, French, Italian and Russian.

Cross-lingual cloning

Clone a voice in one language and have it speak another, keeping its identity.

Code-switching

Mix languages naturally within a single sentence for bilingual content.

Consistent quality

Trained on 1 million hours of speech for strong content consistency and speaker similarity across languages.

Go-global use cases

Localization & dubbing

Ship one piece of content in many languages with a consistent voice.

Global assistants

Serve users worldwide with one multilingual voice model.

Language learning

Generate native-sounding examples across languages.

International media

Produce multilingual narration and ads at scale.

Multilingual TTS FAQ

How many languages does CosyVoice support?

CosyVoice supports nine languages (Chinese, English, Japanese, Korean, German, Spanish, French, Italian and Russian), plus 18 Chinese dialects.

What is cross-lingual voice cloning?

Cross-lingual cloning lets a voice recorded in one language speak another language while keeping the same timbre and identity.

Can CosyVoice switch languages in one sentence?

Yes. CosyVoice handles code-switching, so a single utterance can mix, for example, Chinese and English naturally.
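To see where a mixed-language input switches, you can tag its script runs before sending it. This is only a rough sketch: `script_of` and `script_runs` are hypothetical helpers, not part of CosyVoice, and a script is not a language (Latin script alone covers English, German, Spanish, French and Italian).

```python
def script_of(ch):
    """Return a coarse script label for one character, or None if neutral."""
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:
        return "han"       # CJK Unified Ideographs (Chinese, Japanese kanji)
    if 0x3040 <= cp <= 0x30FF:
        return "kana"      # Hiragana and Katakana
    if 0xAC00 <= cp <= 0xD7A3:
        return "hangul"    # Hangul syllables
    if 0x0400 <= cp <= 0x04FF:
        return "cyrillic"
    if cp < 0x0250 and ch.isalpha():
        return "latin"     # Basic Latin through Latin Extended-B
    return None            # spaces, digits, punctuation, anything else


def script_runs(text):
    """Group text into (script, substring) runs; neutral characters join the current run."""
    runs = []  # list of [script, substring]
    for ch in text:
        script = script_of(ch)
        if not runs:
            runs.append([script, ch])
        elif script is None or script == runs[-1][0]:
            runs[-1][1] += ch
        elif runs[-1][0] is None:
            # A run that began with neutral characters adopts the first real script.
            runs[-1][0] = script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, chunk) for script, chunk in runs]


print(script_runs("我想订一张去 New York 的机票"))
```

The runs concatenate back to the original string, so nothing is lost. The model itself needs no such preprocessing; this is only for inspecting your own input.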

Is multilingual TTS free to use?

Yes. CosyVoice is open source under Apache-2.0, and you can try multilingual synthesis in the playground above.
