This report introduces the Qwen2 series, the latest addition to the large language models and large multimodal models developed by the Qwen Team at Alibaba Group. The release comprises a comprehensive suite of foundational and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including both dense models and a Mixture-of-Experts model. Qwen2 outperforms most prior open-weight models, including its predecessor Qwen1.5, and performs competitively with proprietary models on benchmarks covering language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning.
As a base language model, the flagship Qwen2-72B scores 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH. The instruction-tuned variant, Qwen2-72B-Instruct, achieves 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench. Qwen2 also shows strong multilingual capabilities, with proficiency in about 30 languages, including English, Chinese, Spanish, French, German, Arabic, Russian, Korean, Japanese, Thai, and Vietnamese.
To foster community innovation and accessibility, the Qwen Team has made the Qwen2 model weights openly available on Hugging Face and ModelScope, with supplementary materials, including example code, on GitHub. These resources also cover quantization, fine-tuning, and deployment, supporting a wide range of applications and research projects.