    AI Product Observation

    Light-R1: 360 Zhinao's open-source long chain-of-thought reasoning model

    Tina
    March 27, 2025
    What is Light-R1?

    Light-R1 is an open-source AI model from 360 Zhinao that focuses on long chain-of-thought (long-CoT) reasoning in mathematics; the model discussed here is Light-R1-32B. It starts from Qwen2.5-32B-Instruct, that is, from a model without long-CoT training, and is trained on about 70,000 math examples using curriculum learning (SFT followed by DPO). The result surpasses DeepSeek-R1-Distill-Qwen-32B: on AIME24, Light-R1 scores 76.6 against 72.6 for the DeepSeek distilled model. Training is cheap: about 6 hours on 12 H800 machines, at a cost of roughly $1,000. The release is fully open source, covering the models, datasets, training framework, and evaluation code, and it offers a reference for low-cost training of domain-specialized models.
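
    The cost figures above can be sanity-checked with a little arithmetic. The sketch below uses only the numbers quoted in this article, plus one assumption that the article does not state: 8 GPUs per H800 machine. The per-GPU-hour price it prints is therefore an implied estimate, not a published figure.

```python
# Back-of-the-envelope check of the quoted training cost.
MACHINES = 12            # from the article
HOURS = 6                # from the article
TOTAL_COST_USD = 1000    # approximate figure from the article
GPUS_PER_MACHINE = 8     # ASSUMPTION: not stated in the article

machine_hours = MACHINES * HOURS
gpu_hours = machine_hours * GPUS_PER_MACHINE
cost_per_machine_hour = TOTAL_COST_USD / machine_hours
cost_per_gpu_hour = TOTAL_COST_USD / gpu_hours

print(f"machine-hours: {machine_hours}")
print(f"GPU-hours (assuming {GPUS_PER_MACHINE} GPUs per machine): {gpu_hours}")
print(f"implied cost per machine-hour: ${cost_per_machine_hour:.2f}")
print(f"implied cost per GPU-hour: ${cost_per_gpu_hour:.2f}")
```

    Under that assumption the run amounts to 576 GPU-hours, which puts the implied price at well under two dollars per GPU-hour.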

    Main functions of Light-R1

    Efficient math problem solving: solves complex math problems quickly and accurately, in areas including algebra, geometry, and probability.

    Stronger reasoning: has strong logical reasoning ability and handles problems that need a long chain of thought.

    Generalization: the reasoning ability also carries over to other areas, such as logical reasoning and language comprehension.

    Low-cost training and deployment: reaches high performance at very low cost, which suits individuals and enterprises with limited resources who need to deploy quickly.

    Technical principles of Light-R1

    Base model and starting point: the model is built on Qwen2.5-32B-Instruct, which has no long-CoT training, and is improved from that starting point until it surpasses DeepSeek-R1-Distill-Qwen-32B.

    Curriculum learning:

    SFT (Supervised Fine-Tuning): the data is filtered by difficulty and supervised fine-tuning runs in two stages. The first stage uses about 70,000 examples; the second stage selects the 3,000 hardest examples for further fine-tuning.

    DPO (Direct Preference Optimization): starting from the SFT model, multiple responses are sampled per problem and used to build preference pairs, which are then used to optimize the quality of the model's output.
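
    The two data-selection steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the `Example` class, the `difficulty` score, the `is_correct` check, and the simple one-to-one pairing rule are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Example:
    prompt: str
    difficulty: float  # hypothetical score, higher means harder


def select_stage2(examples, k):
    """Return the k hardest examples for a second SFT stage."""
    return sorted(examples, key=lambda e: e.difficulty, reverse=True)[:k]


def build_preference_pairs(prompt, samples, is_correct):
    """Pair correct samples (chosen) with incorrect ones (rejected).

    Pairing is one-to-one in sampling order; surplus samples on either
    side are dropped. This pairing rule is an assumption for illustration.
    """
    chosen = [s for s in samples if is_correct(s)]
    rejected = [s for s in samples if not is_correct(s)]
    return [
        {"prompt": prompt, "chosen": c, "rejected": r}
        for c, r in zip(chosen, rejected)
    ]


# Toy usage: three examples, keep the two hardest.
pool = [Example("a", 0.1), Example("b", 0.9), Example("c", 0.5)]
print([e.prompt for e in select_stage2(pool, 2)])  # ['b', 'c']

# Toy usage: four sampled answers to one prompt, judged by exact match.
pairs = build_preference_pairs("2+2?", ["4", "5", "4", "3"], lambda s: s == "4")
print(len(pairs))  # 2
```

    In a real setup the difficulty score might come from how often a reference model solves the problem, and correctness from an answer verifier; neither detail is specified in this article.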

    Data processing and deduplication: the training data comes from several open-source math datasets (such as OpenR1-Math-220k and OpenThoughts-114k) and is strictly deduplicated, so that leaked test data cannot inflate the measured performance.
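
    One common way to implement this kind of filtering is to drop exact duplicates and then remove any training prompt that shares a long word n-gram with a benchmark prompt. The sketch below shows that idea; the article does not describe the project's actual method, so the function names, the normalization, and the n-gram length are assumptions.

```python
import re


def tokens(text):
    """Lowercase and keep only alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(text, n):
    """Set of word n-grams in text (empty if text has fewer than n tokens)."""
    toks = tokens(text)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def decontaminate(train_prompts, test_prompts, n=8):
    """Drop normalized duplicates and prompts overlapping the test set."""
    test_grams = set()
    for prompt in test_prompts:
        test_grams |= ngrams(prompt, n)

    seen = set()
    kept = []
    for prompt in train_prompts:
        key = " ".join(tokens(prompt))
        if key in seen:
            continue  # duplicate of an earlier training prompt
        seen.add(key)
        if ngrams(prompt, n) & test_grams:
            continue  # shares an n-gram with a benchmark prompt
        kept.append(prompt)
    return kept


train = [
    "Find x if 2x + 3 = 7.",
    "find x if 2x + 3 = 7",
    "What is the area of a circle of radius 2?",
    "Compute the sum of the first ten primes.",
]
test = ["Compute the sum of the first ten primes modulo 7."]
clean = decontaminate(train, test, n=5)
print(len(clean))  # 2
```

    A shorter n catches more paraphrased leaks but also removes more innocent prompts, so the threshold is a trade-off to tune per dataset.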

    Model merging: the final Light-R1-32B is obtained by merging the SFT stage 2 model, the DPO model, and another DPO version of the model, which further improves performance and stability.
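
    The article does not say how the three checkpoints are combined. A simple and widely used option is to average their parameters, and the sketch below illustrates that technique on toy data. Here `merge_checkpoints` is a hypothetical helper, and checkpoints are represented as plain dicts of float lists rather than real model tensors.

```python
def merge_checkpoints(checkpoints, weights=None):
    """Weighted average of parameter dicts with identical names and shapes.

    With weights=None every checkpoint contributes equally.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        weights = [1.0 / len(checkpoints)] * len(checkpoints)
    if len(weights) != len(checkpoints):
        raise ValueError("one weight per checkpoint is required")

    names = checkpoints[0].keys()
    for ckpt in checkpoints:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must share parameter names")

    merged = {}
    for name in names:
        # zip(*) walks the same position of this parameter in every checkpoint
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [
            sum(w * v for w, v in zip(weights, column)) for column in columns
        ]
    return merged


# Toy stand-ins for the SFT stage 2 model and two DPO variants.
sft_stage2 = {"layer.weight": [1.0, 2.0]}
dpo_a = {"layer.weight": [2.0, 4.0]}
dpo_b = {"layer.weight": [3.0, 6.0]}

merged = merge_checkpoints([sft_stage2, dpo_a, dpo_b])
print(merged["layer.weight"])  # approximately [2.0, 4.0]
```

    Averaging only makes sense when the checkpoints share an architecture and a common starting point, which holds here because all three descend from the same base model.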

    Training framework and optimization: training uses the 360-LLaMA-Factory framework, which supports sequence parallelism and efficient distributed training. With the optimized training process, Light-R1 completes training in just 6 hours on 12 H800 machines.

    Light-R1 project address

    GitHub repository: https://github.com/Qihoo360/Light-R1

    HuggingFace model library: https://huggingface.co/collections/qihoo360/light-r1z

    Application scenarios of Light-R1

    Education: as a mathematics learning tool, it helps students solve complex problems and shows the solution steps and reasoning, which suits both math competitions and everyday study.

    Scientific research and academia: assists with mathematical research and interdisciplinary problem solving, such as physical modeling and engineering optimization.

    Enterprise applications: helps with complex problems such as data analysis, risk assessment, and supply chain optimization.

    Software integration: can be integrated into smart assistants and mathematical software to strengthen their reasoning and problem-solving features.

    Open source and developers: developers can customize and extend the models, which supports the growth of the open-source community.



    Summary

    Light-R1 is 360 Zhinao's open-source AI model for mathematics. It combines efficient problem solving and long chain-of-thought reasoning with a low training cost.