Key Features:
• Advanced Reasoning
  1. Scores 92.3% on AIME 2025, approaching OpenAI’s o4-mini (92.7%) and surpassing Gemini 2.5 Pro (88%)
  2. Achieves state-of-the-art results among open-source thinking models in logical reasoning, math, science, and coding
  3. Uses a deepseek_r1-style reasoning parser to separate step-by-step thinking from the final answer
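A deepseek_r1-style parser works by splitting the raw completion into the chain of thought (emitted between `<think>` tags) and the final answer. A minimal sketch of that split, assuming the raw completion text is already in hand (serving frameworks normally do this for you and return a separate reasoning field):

```python
def split_reasoning(raw: str, open_tag: str = "<think>", close_tag: str = "</think>"):
    """Split a completion into (reasoning, answer), mimicking a
    deepseek_r1-style reasoning parser. Returns ("", answer) when no
    reasoning block is present."""
    start = raw.find(open_tag)
    end = raw.find(close_tag)
    if start == -1 or end == -1 or end < start:
        return "", raw.strip()
    reasoning = raw[start + len(open_tag):end].strip()
    answer = raw[end + len(close_tag):].strip()
    return reasoning, answer

# Example completion with an inline reasoning block:
sample = "<think>2 + 2 = 4</think>The answer is 4."
reasoning, answer = split_reasoning(sample)
```

In practice the parser runs server-side, so clients simply read the reasoning and answer as separate fields of the response.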
• Mixture of Experts (MoE) Architecture
  1. 235 billion total parameters, with 22 billion activated per forward pass for efficiency
  2. FP8-quantized version reduces memory footprint to ~30GB VRAM, compared to 88GB for BF16
  3. Optimized for complex tasks with an increased thinking-length budget
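The efficiency comes from routed sparsity: a gating network scores all experts but runs only the top-k per token, so most of the 235B parameters sit idle on any given forward pass. A toy sketch of top-k gating (illustrative only; the real router operates on learned per-token logits inside each MoE layer):

```python
import math

def top_k_route(logits, k=2):
    """Toy MoE router: pick the top-k experts by gate logit and
    renormalize their softmax weights. Only the chosen experts execute,
    which is how a 235B-total model activates only ~22B parameters."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = {i: math.exp(logits[i]) for i in top}
    z = sum(exps.values())
    return {i: exps[i] / z for i in top}  # expert index -> mixing weight

weights = top_k_route([1.0, 3.0, 0.5, 2.0], k=2)  # experts 1 and 3 win
```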
• Extended Context
  1. Native 262,144-token context window, ideal for long-form reasoning and large codebases
  2. Extendable to 1M tokens with specific serving configurations (e.g., vLLM, SGLang)
  3. Handles roughly 800 pages of text or extensive multi-step workflows
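A quick pre-flight check against that window can avoid truncated requests. This sketch uses a coarse ~4-characters-per-token heuristic for English text (an assumption, not the model's tokenizer); use the actual tokenizer when exact counts matter:

```python
def fits_in_context(text: str, max_tokens: int = 262_144,
                    chars_per_token: float = 4.0) -> bool:
    """Rough pre-flight check against the native 262,144-token window.
    chars_per_token is a coarse English-text heuristic; swap in a real
    tokenizer for exact counts."""
    est_tokens = len(text) / chars_per_token
    return est_tokens <= max_tokens
```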
• Performance Excellence
  1. Outperforms DeepSeek-R1-0528, o3, and Claude Opus 4 on reasoning benchmarks
  2. Excels in coding (LiveCodeBench) and academic tasks (MMLU-Redux)
  3. Recommended sampling settings for best quality: temperature=0.6, top_p=0.95, top_k=20
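Those sampling settings slot directly into an OpenAI-compatible chat request. A sketch of the request body (the model name is illustrative; `top_k` is not part of the core OpenAI schema, but servers such as vLLM and SGLang accept it as an extra sampling field):

```python
def build_request(prompt: str, model: str = "Qwen3-235B-A22B-Thinking-2507"):
    """Assemble an OpenAI-compatible chat payload with the recommended
    sampling settings for the thinking model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,  # accepted by vLLM/SGLang, not core OpenAI schema
    }
```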
• Cost and Accessibility
  1. Priced at $0.70 per million input tokens and $8.40 per million output tokens, below the $2.63 blended market average
  2. Open-source under Apache 2.0, available on Hugging Face and ModelScope
  3. Supports local deployment via Ollama, LMStudio, llama.cpp, and KTransformers
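At those rates, per-request cost is a simple linear function of token counts. A small calculator using the listed prices (thinking models tend to emit long reasoning traces, so output tokens usually dominate):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price: float = 0.70, out_price: float = 8.40) -> float:
    """Cost in USD at the listed per-million-token rates
    ($0.70 input, $8.40 output)."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# 100k input + 10k output tokens: 0.07 + 0.084 = ~$0.154
cost = request_cost_usd(100_000, 10_000)
```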
• Developer-Friendly Features
  1. Supports tool calling via Qwen-Agent, simplifying agentic workflows
  2. Compatible with vLLM and SGLang for high-performance inference
  3. Fine-tuning supported with Unsloth, cutting VRAM requirements by about 70%
• Practical Applications
  1. Powers research, coding, and enterprise AI for complex reasoning tasks
  2. Well suited to academic benchmarks, multilingual tasks, and long-context analysis