Blog
Insights, updates and stories from our team
Release 2026/05/20
What's New in CosyVoice 3: In-the-Wild Speech Generation
CosyVoice 3 scales training data to 1M hours and 1.5B parameters, adds a new supervised speech tokenizer and reinforcement-learning post-training, and supports 9 languages plus 18 Chinese dialects. Here is everything that changed.
Comparison 2026/05/18
CosyVoice 3 vs CosyVoice 2 vs CosyVoice 1: What Changed
A side-by-side comparison of CosyVoice 1, 2, and 3 — training data, model size, languages, tokenizer, latency, and voice cloning quality — to help you choose the right version.
Guide 2026/05/15
How to Install and Use CosyVoice (Step-by-Step Guide)
A practical guide to installing CosyVoice from GitHub, downloading the pretrained models, and generating your first zero-shot voice clone in Python — plus Docker and web UI options.
Technical 2026/05/12
CosyVoice 3 Architecture Explained: Tokenizer, LLM, and Reward Model
A clear walkthrough of how CosyVoice 3 works — the supervised multi-task speech tokenizer, the LLM with chunk-aware flow matching, and the differentiable reward model used for post-training.
Benchmarks 2026/05/10
CosyVoice 3 Benchmarks: Content Consistency and Speaker Similarity
How CosyVoice 3 is evaluated on content consistency (WER/CER) and speaker similarity (SS): what the metrics mean, how the model compares to CosyVoice 2, and how to read TTS benchmark numbers.