Blog

Insights, updates and stories from our team

What's New in CosyVoice 3: In-the-Wild Speech Generation

Release · 2026/05/20

CosyVoice 3 scales training data to 1M hours and 1.5B parameters, adds a new supervised speech tokenizer and reinforcement-learning post-training, and supports 9 languages plus 18 Chinese dialects. Here is everything that changed.

CosyVoice 3 vs CosyVoice 2 vs CosyVoice 1: What Changed

Comparison · 2026/05/18

A side-by-side comparison of CosyVoice 1, 2, and 3 across training data, model size, languages, tokenizer, latency, and voice cloning quality, to help you choose the right version.

How to Install and Use CosyVoice (Step-by-Step Guide)

Guide · 2026/05/15

A practical guide to installing CosyVoice from GitHub, downloading the pretrained models, and generating your first zero-shot voice clone in Python — plus Docker and web UI options.

CosyVoice 3 Architecture Explained: Tokenizer, LLM, and Reward Model

Technical · 2026/05/12

A clear walkthrough of how CosyVoice 3 works — the supervised multi-task speech tokenizer, the LLM with chunk-aware flow matching, and the differentiable reward model used for post-training.

CosyVoice 3 Benchmarks: Content Consistency and Speaker Similarity

Benchmarks · 2026/05/10

How CosyVoice 3 is evaluated on content consistency (WER/CER) and speaker similarity (SS): what the metrics mean, how it compares to CosyVoice 2, and how to read TTS benchmark numbers.