Qwen2.5-Coder

In early April, the Alibaba Cloud developer team launched CodeQwen1.5, which quickly captured the attention of the community. Since then, the team has been focused on advancing its coding models, and today it is excited to introduce the next generation of open-source coding models: Qwen2.5-Coder. Along with this release, CodeQwen is being rebranded as Qwen-Coder. “We believe the name ‘Coder’ better reflects our vision of creating a more human-like, agile coding assistant,” the team said. Qwen2.5-Coder is part of the Qwen2.5 series and will be available in three sizes: 1.5B, 7B, and a forthcoming 32B version.

(Figure: Qwen2.5-Coder models)

This new iteration brings two key improvements: an expanded training dataset and enhanced coding capabilities, while still excelling in other critical areas like mathematics and general tasks.

  • 💻 Code More: Qwen2.5-Coder builds on the strengths of Qwen2.5 and is further trained on a significantly larger dataset, including source code, text-code grounding data, and synthetic data, totaling 5.5 trillion tokens. This results in substantial improvements in code-centric tasks.
  • 📚 Learn More: While boosting coding performance, we have ensured that Qwen2.5-Coder retains the robust math and general capabilities of the base model. It has been further enriched with data related to mathematics and general skills, making it a versatile tool for real-world applications like Code Agent.

Key Features

  • Supports long context understanding and generation with a context length of up to 128K tokens.
  • Capable of understanding and generating code in 92 programming languages, including:
    • ['ada', 'agda', 'alloy', 'antlr', 'applescript', 'assembly', 'augeas', 'awk', 'batchfile', 'bluespec', 'c', 'c#', 'c++', 'clojure', 'cmake', 'coffeescript', 'common-lisp', 'css', 'cuda', 'dart', 'dockerfile', 'elixir', 'elm', 'emacs-lisp', 'erlang', 'f#', 'fortran', 'glsl', 'go', 'groovy', 'haskell', 'html', 'idris', 'isabelle', 'java', 'java-server-pages', 'javascript', 'json', 'julia', 'jupyter-notebook', 'kotlin', 'lean', 'literate-agda', 'literate-coffeescript', 'literate-haskell', 'lua', 'makefile', 'maple', 'markdown', 'mathematica', 'matlab', 'objective-c++', 'ocaml', 'pascal', 'perl', 'php', 'powershell', 'prolog', 'protocol-buffer', 'python', 'r', 'racket', 'restructuredtext', 'rmarkdown', 'ruby', 'rust', 'sas', 'scala', 'scheme', 'shell', 'smalltalk', 'solidity', 'sparql', 'sql', 'stan', 'standard-ml', 'stata', 'swift', 'systemverilog', 'tcl', 'tcsh', 'tex', 'thrift', 'typescript', 'verilog', 'vhdl', 'visual-basic', 'vue', 'xslt', 'yacc', 'yaml', 'zig'].
  • Retains strengths in math and general capabilities from the base model.

Important Update:

We’ve updated both the special tokens and their corresponding token IDs to ensure consistency with Qwen2.5. The new special tokens include:

{
  "<|fim_prefix|>": 151659, 
  "<|fim_middle|>": 151660, 
  "<|fim_suffix|>": 151661, 
  "<|fim_pad|>": 151662, 
  "<|repo_name|>": 151663, 
  "<|file_sep|>": 151664, 
  "<|im_start|>": 151644, 
  "<|im_end|>": 151645
}
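For reference, here is a minimal sketch of how these tokens are typically assembled for fill-in-the-middle completion with 🤗 Transformers. The prefix/suffix/middle layout, the repository ID, and the example snippet are assumptions for illustration; please verify the exact prompt format against the official model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch, assuming the common fill-in-the-middle layout
# <|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>
# (check the official model card for the exact format).
model_id = "Qwen/Qwen2.5-Coder-7B"  # assumed Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prefix = "def quicksort(items):\n    if len(items) <= 1:\n        return items\n"
suffix = "\n    return quicksort(left) + [pivot] + quicksort(right)\n"
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated middle segment.
middle = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(middle)
```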
| Model Name | Type | Length | Download |
|---|---|---|---|
| Qwen2.5-Coder-1.5B | base | 128K | 🤗 Hugging Face • 🤖 ModelScope |
| Qwen2.5-Coder-7B | base | 128K | 🤗 Hugging Face • 🤖 ModelScope |
| Qwen2.5-Coder-1.5B-Instruct | instruct | 128K | 🤗 Hugging Face • 🤖 ModelScope |
| Qwen2.5-Coder-7B-Instruct | instruct | 128K | 🤗 Hugging Face • 🤖 ModelScope |
| Qwen2.5-Coder-1.5B-Instruct-GGUF | instruct | 128K | 🤗 Hugging Face • 🤖 ModelScope |
| Qwen2.5-Coder-1.5B-Instruct-AWQ | instruct | 128K | 🤗 Hugging Face • 🤖 ModelScope |
| Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4 | instruct | 128K | 🤗 Hugging Face • 🤖 ModelScope |
| Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8 | instruct | 128K | 🤗 Hugging Face • 🤖 ModelScope |
| Qwen2.5-Coder-7B-Instruct-GGUF | instruct | 128K | 🤗 Hugging Face • 🤖 ModelScope |
| Qwen2.5-Coder-7B-Instruct-AWQ | instruct | 128K | 🤗 Hugging Face • 🤖 ModelScope |
| Qwen2.5-Coder-7B-Instruct-GPTQ-Int4 | instruct | 128K | 🤗 Hugging Face • 🤖 ModelScope |
| Qwen2.5-Coder-7B-Instruct-GPTQ-Int8 | instruct | 128K | 🤗 Hugging Face • 🤖 ModelScope |
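All checkpoints can also be fetched programmatically. As one hedged illustration, assuming the repository IDs mirror the model names in the table above:

```python
# Minimal sketch: download a checkpoint from the Hugging Face Hub.
# The repo ID is assumed to mirror the model name listed in the table.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Qwen/Qwen2.5-Coder-1.5B-Instruct")
print(f"Checkpoint downloaded to: {local_dir}")

# The same checkpoints are mirrored on ModelScope, e.g.:
# from modelscope import snapshot_download
# local_dir = snapshot_download("Qwen/Qwen2.5-Coder-1.5B-Instruct")
```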

Detailed performance results and model descriptions are presented below.

Qwen2.5-Coder: Base Models

Qwen2.5-Coder supports context lengths of up to 128K tokens and is proficient in 92 programming languages. It has achieved significant advancements across various code-related evaluation benchmarks, including code generation, multi-language code generation, code completion, and code repair. Impressively, the open-source 7B version of Qwen2.5-Coder outperforms larger models such as DeepSeek-Coder-V2-Lite and Codestral-22B, establishing itself as one of the most robust base code models available.
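For repository-level completion in particular, the base models also recognize the <|repo_name|> and <|file_sep|> tokens listed earlier. Below is a minimal, hedged sketch of how a multi-file prompt might be assembled; the layout is an assumption for illustration and should be checked against the model card.

```python
# Hedged sketch: assemble a repository-level completion prompt.
# Assumed layout: the repository name after <|repo_name|>, then <|file_sep|>
# before each file path and its contents (verify against the model card).
files = {
    "utils/math_ops.py": "def add(a, b):\n    return a + b\n",
    "main.py": "from utils.math_ops import add\n\ndef run():\n",  # intentionally unfinished
}

prompt = "<|repo_name|>example-repo\n"
for path, content in files.items():
    prompt += f"<|file_sep|>{path}\n{content}"

# `prompt` is then tokenized and passed to a Qwen2.5-Coder base model,
# which continues main.py with cross-file context in view.
print(prompt)
```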

In addition to its code-specific capabilities, Qwen2.5-Coder exhibits strong performance on mathematical tasks, as seen in evaluations like GSM8K and MATH. For general tasks, assessments on MMLU and ARC demonstrate that Qwen2.5-Coder retains the overall general ability of the Qwen2.5 model series.

(Figure: Base Models benchmark results)

Qwen2.5-Coder-Instruct: Instruction-Tuned Models

Qwen2.5-Coder-Instruct is built upon Qwen2.5-Coder through fine-tuning with instruction data. This refinement enhances its ability to understand and execute specific commands, leading to improved task performance. The model also demonstrates exceptional generalization, performing well across a wide range of benchmarks, making it an ideal choice for diverse coding and problem-solving tasks.
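As a quick, hedged usage sketch with 🤗 Transformers (the tokenizer's chat template handles the <|im_start|>/<|im_end|> formatting; the repository ID is assumed from the table above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]

# The chat template wraps each turn in <|im_start|>/<|im_end|> markers.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```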

(Figure: Instruction-Tuned Models benchmark results)

Qwen2.5-Coder-Instruct: Key Strengths

Qwen2.5-Coder-Instruct demonstrates exceptional capabilities across several key areas:

  • Outstanding Multi-language Expertise: Leveraging the McEval benchmark, we expanded our evaluations to include over 40 programming languages. Qwen2.5-Coder-Instruct performs exceptionally well across this diverse range, including niche languages, proving its versatility in multi-language coding tasks.
  • Code Reasoning: Recognizing the close relationship between code reasoning and general reasoning, we evaluated Qwen2.5-Coder-Instruct using the CRUXEval benchmark. The model excels in code reasoning tasks, and as its code reasoning abilities improve, so does its capacity to handle complex instructions. This synergy inspires further exploration of how coding can enhance broader reasoning skills.
  • Math Reasoning: Given the natural connection between math and coding—where math forms the foundation of code and coding is a crucial tool for mathematical tasks—Qwen2.5-Coder-Instruct excels in both domains. It is particularly strong in evaluations like GSM8K, GaoKao2023en, OlympiadBench, CollegeMath, and AIME24, earning the reputation of a “science student.”
| Model | MATH | GSM8K | GaoKao2023en | OlympiadBench | CollegeMath | AIME24 |
|---|---|---|---|---|---|---|
| DeepSeek-Coder-V2-Lite-Instruct | 61.0 | 87.6 | 56.1 | 26.4 | 39.8 | 6.7 |
| Qwen2.5-Coder-7B-Instruct | 66.8 | 86.7 | 60.5 | 29.8 | 43.5 | 10.0 |
  • Core Capabilities: Qwen2.5-Coder-Instruct maintains the robust general abilities of Qwen2.5, performing strongly in evaluations like AMC23, MMLU, IFEval, and more, retaining its advantage in diverse tasks.
| Model | AMC23 | MMLU | MMLU-Pro | IFEval | CEval | GPQA |
|---|---|---|---|---|---|---|
| DeepSeek-Coder-V2-Lite-Instruct | 40.4 | 42.5 | 60.6 | 38.6 | 60.1 | 27.6 |
| Qwen2.5-Coder-7B-Instruct | 42.5 | 45.6 | 68.7 | 58.6 | 61.4 | 35.6 |

License

Qwen2.5-Coder is released under the Apache 2.0 license, fostering openness and encouraging its application in advancing code intelligence.

What’s Next for Qwen2.5-Coder?

We are preparing the release of the 32B version of Qwen2.5-Coder, with aspirations to challenge leading proprietary models. Additionally, we are exploring advanced, code-centric reasoning models to further push the boundaries of code intelligence. Stay tuned for updates in our Blog!