Multilingual Text to Speech with CosyVoice
Speak to the world with one model. CosyVoice synthesizes natural speech in nine languages, switches languages mid-sentence, and clones voices across languages while preserving the speaker's identity.
Nine languages, one model
9 languages
Chinese, English, Japanese, Korean, German, Spanish, French, Italian and Russian.
Cross-lingual cloning
Clone a voice in one language and have it speak another, keeping its identity.
Code-switching
Mix languages naturally within a single sentence for bilingual content.
Consistent quality
Trained on 1M hours of audio for strong content consistency and speaker similarity across languages.
Go-global use cases
Localization & dubbing
Ship one piece of content in many languages with a consistent voice.
Global assistants
Serve users worldwide with one multilingual voice model.
Language learning
Generate native-sounding examples across languages.
International media
Produce multilingual narration and ads at scale.
Multilingual TTS FAQ
How many languages does CosyVoice support?
CosyVoice supports nine languages (Chinese, English, Japanese, Korean, German, Spanish, French, Italian and Russian), plus 18 Chinese dialects.
What is cross-lingual voice cloning?
Cross-lingual cloning lets a voice recorded in one language speak another language while keeping the same timbre and identity.
Can CosyVoice switch languages in one sentence?
Yes. CosyVoice handles code-switching, so a single utterance can naturally mix languages, for example Chinese and English.
Is multilingual TTS free to use?
Yes. CosyVoice is open source under Apache-2.0, and you can try multilingual synthesis in the playground above.