Qwen2.5-Max

Qwen2.5-Max is a large-scale Mixture-of-Experts (MoE) model, pretrained on more than 20 trillion tokens and further refined through curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). Today, the Qwen team is excited to share its performance results and to announce that its API is available via Alibaba Cloud. We also invite you to experience Qwen2.5-Max on Qwen Chat!

Performance

We evaluate Qwen2.5-Max against top proprietary and open-weight models across key benchmarks that are highly relevant to the community. These include MMLU-Pro for assessing knowledge through college-level problems, LiveCodeBench for coding capabilities, LiveBench for overall general performance, and Arena-Hard, which approximates human preferences. Our analysis covers both base and instruct models.

We start by directly comparing the instruct models, which are designed for downstream applications such as chat and coding. The performance results of Qwen2.5-Max are presented alongside leading state-of-the-art models, including DeepSeek V3, GPT-4o, and Claude-3.5-Sonnet.

Qwen2.5-Max surpasses DeepSeek V3 in benchmarks such as Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond, while also delivering strong performance in other assessments like MMLU-Pro.

For base model comparisons, we are unable to access proprietary models like GPT-4o and Claude-3.5-Sonnet. Instead, we evaluate Qwen2.5-Max against DeepSeek V3, a leading open-weight MoE model, Llama-3.1-405B, the largest open-weight dense model, and Qwen2.5-72B, another top-tier open-weight dense model. The results of this comparison are detailed below.

Use Qwen2.5-Max

Qwen2.5-Max is now available in Qwen Chat! You can chat directly with the model, explore artifacts, perform searches, and more. Try it out and experience its capabilities firsthand!

The API of Qwen2.5-Max (model name: qwen-max-2025-01-25) is now available. To use it, first register an Alibaba Cloud account and activate the Alibaba Cloud Model Studio service, then navigate to the console and create an API key.
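Once the key is created, a common convention (assumed here; it matches the `API_KEY` environment variable read in the Python example) is to export it in your shell rather than hard-coding it:

```shell
# Export the DashScope API key so client code can read it from the environment.
# "sk-your-api-key" is a placeholder for the key created in the Model Studio console.
export API_KEY="sk-your-api-key"
```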

Since the Qwen APIs are OpenAI-API compatible, we can follow the common practice of using the OpenAI SDK. Below is an example of calling Qwen2.5-Max in Python:

from openai import OpenAI
import os

# Point the OpenAI client at the DashScope OpenAI-compatible endpoint,
# reading the API key from the environment rather than hard-coding it.
client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-max-2025-01-25",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Which number is larger, 9.11 or 9.8?"},
    ],
)

# Print the assistant's reply (a ChatCompletionMessage object).
print(completion.choices[0].message)
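Because the endpoint is OpenAI-compatible, the request body is plain JSON, so any HTTP client can be used instead of the SDK. A minimal sketch of assembling that body (the helper name `build_chat_payload` is our own; the model name and endpoint are taken from the example above):

```python
import json


def build_chat_payload(model, system_prompt, user_prompt):
    """Assemble the JSON body for an OpenAI-compatible /chat/completions call."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }


payload = build_chat_payload(
    "qwen-max-2025-01-25",
    "You are a helpful assistant.",
    "Which number is larger, 9.11 or 9.8?",
)

# Any HTTP client can POST this body to
# https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
# with an "Authorization: Bearer <API_KEY>" header.
print(json.dumps(payload, indent=2))
```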

Future Work

Scaling data and model size not only drives advancements in AI intelligence but also underscores our commitment to cutting-edge research. We aim to further enhance the reasoning and problem-solving abilities of large language models through innovative applications of scaled reinforcement learning. This approach holds the potential to push our models beyond human intelligence, opening new frontiers of knowledge and discovery.

Explore other Qwen Models: