Zero-Shot Voice Cloning

AI Voice Cloning with CosyVoice

Clone a voice from just a few seconds of audio, provided you own it or have permission to use it. CosyVoice is an open-source, zero-shot voice cloning model that reproduces timbre, accent and prosody, then speaks your text in multiple languages.

Enter your text


Each generation is limited to 120 characters. 3 trial samples · 20-second wait · Pro skips the wait
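For text longer than the 120-character limit, one option is to split it into chunks before generating each one. The following is a minimal sketch, not part of CosyVoice or this playground: the function names `split_for_generation` and `_split_long` are hypothetical, and the choice to break at sentence boundaries first, then at word boundaries, is an assumption about what sounds most natural.

```python
import re

MAX_CHARS = 120  # per-generation limit stated on this page


def _split_long(sentence: str, limit: int) -> list[str]:
    """Break one sentence into pieces of at most `limit` characters.

    Prefers word boundaries; a single word longer than `limit` is hard-cut.
    """
    pieces: list[str] = []
    current = ""
    for word in sentence.split():
        # Hard-cut any word that cannot fit in a chunk on its own.
        while len(word) > limit:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def split_for_generation(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters each."""
    chunks: list[str] = []
    current = ""
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for sentence in sentences:
        for piece in _split_long(sentence, limit):
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= limit:
                current = candidate  # pack short sentences together
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be submitted as its own generation. Text written without spaces, such as Chinese or Japanese, falls through to the hard-cut path in this sketch, so a language-aware splitter would be a better fit there.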

Select a voice

Awnie · Kids Storyteller (US)

Warm, maternal and soothing delivery for children’s stories and bedtime reading.

Luna · Conversational (US)

A warm, natural, and expressive storytelling voice.

Athena · Audiobook (UK)

Clear, formal, and perfectly cadenced British professional voice.

Angus · Warm Narrator (US)

Warm, rich, and highly conversational male voice, perfect for narrating stories and books.

Seán · Podcast Host (IE)

Charismatic and effortless male voice with a friendly Irish lilt, ideal for hosting podcasts and discussions.

Orpheus · Explainer (US)

Clear, confident voiceover for explainer videos, product demos, and YouTube tutorials.

Arcas · Commercial (US)

Persuasive, polished reads for ads, promos, and brand commercials.

Why clone voices with CosyVoice

Zero-shot cloning

Reproduce a target voice from a short reference clip. No fine-tuning or hours of training data required.

Open source & free

CosyVoice is released under Apache-2.0. Run it locally, self-host, or try it free online — no vendor lock-in.

Cross-lingual voices

Clone a voice in one language and have it speak another, keeping the same identity across Chinese, English, Japanese, Korean and more.

Natural prosody

Supervised speech tokens capture rhythm, stress and emotion, so cloned voices sound human, not robotic.

What you can build

Audiobooks & narration

Narrate long-form content in a consistent, recognizable voice.

AI agents & assistants

Give virtual agents a branded, real-time voice with sub-second latency.

Video dubbing

Dub videos into new languages while preserving the original speaker’s identity.

Accessibility

Restore or personalize voices for assistive reading and communication tools.

Voice cloning FAQ

What is zero-shot voice cloning?

Zero-shot voice cloning generates speech in a target voice using only a short reference sample — no per-speaker training. CosyVoice extracts the voice identity on the fly and applies it to any text you provide.

Is CosyVoice voice cloning free and open source?

Yes. CosyVoice is released under the Apache-2.0 license, so you can use, modify and self-host it for free. In-browser voice cloning in the playground above is coming soon.

How much audio do I need to clone a voice?

A few seconds of clean reference audio is usually enough for CosyVoice to capture timbre and speaking style. Longer, higher-quality samples improve similarity.
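Before cloning, it can help to confirm that a reference clip is actually a few seconds long. Below is a minimal sketch using Python's standard `wave` module. The 3-second and 30-second bounds are assumptions chosen for illustration, not CosyVoice requirements, and the function names are hypothetical. The check covers duration only; it says nothing about noise or recording quality.

```python
import wave

# Assumed bounds for illustration only; not documented CosyVoice limits.
MIN_SECONDS = 3.0
MAX_SECONDS = 30.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(
    path: str,
    min_seconds: float = MIN_SECONDS,
    max_seconds: float = MAX_SECONDS,
) -> tuple[bool, str]:
    """Report whether a clip's duration falls inside the assumed range."""
    duration = clip_duration_seconds(path)
    if duration < min_seconds:
        return False, f"too short: {duration:.1f}s (minimum {min_seconds:.1f}s)"
    if duration > max_seconds:
        return False, f"too long: {duration:.1f}s (maximum {max_seconds:.1f}s)"
    return True, f"ok: {duration:.1f}s"
```

Calling `check_reference_clip("reference.wav")` returns a pass/fail flag and a short message; the file name here is a placeholder for your own recording.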

Can I clone a voice in one language and speak another?

Yes. CosyVoice supports cross-lingual cloning, so a voice recorded in English can speak Chinese, Japanese, Korean and other supported languages while keeping its identity.

Is voice cloning ethical and legal?

Only clone voices you own or have explicit permission to use. Cloning someone’s voice without consent may violate privacy and publicity rights. Use CosyVoice responsibly.