Files
Open-RS2/README.md
ModelHub XC cd0e2ab2a2 初始化项目,由ModelHub XC社区提供模型
Model: knoveleng/Open-RS2
Source: Original Platform
2026-05-29 02:20:13 +08:00

57 lines
2.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
datasets:
- knoveleng/open-rs
- knoveleng/open-s1
- knoveleng/open-deepscaler
license: mit
pipeline_tag: text-generation
inference: true
library_name: transformers
---
# Model Summary
This repository hosts model for the **Open RS** project, accompanying the paper *Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesnt*. The project explores enhancing reasoning capabilities in small large language models (LLMs) using reinforcement learning (RL) under resource-constrained conditions.
We focus on a 1.5-billion-parameter model, `DeepSeek-R1-Distill-Qwen-1.5B`, trained on 4 NVIDIA A40 GPUs (48 GB VRAM each) within 24 hours. By adapting the Group Relative Policy Optimization (GRPO) algorithm and leveraging a curated, compact mathematical reasoning dataset, we conducted three experiments to assess performance and behavior. Key findings include:
- Significant reasoning improvements, e.g., AMC23 accuracy rising from 63% to 80% and AIME24 reaching 46.7%, outperforming `o1-preview`.
- Efficient training with just 7,000 samples at a cost of $42, compared to thousands of dollars for baseline models.
- Challenges like optimization instability and length constraints with extended training.
These results showcase RL-based fine-tuning as a cost-effective approach for small LLMs, making reasoning capabilities accessible in resource-limited settings. We open-source our code, models, and datasets to support further research.
For more details, please refer our [github](https://github.com/knoveleng/open-rs).
## Evaluation
### Performance Highlights
- **Open-RS1**: 53.0% avg. score
- **Open-RS2**: 55.7% avg. score, 80.0% on AMC23
- **Open-RS3**: 56.3% avg. score, 46.7% on AIME24 (outperforms `o1-preview` at 44.6%)
- Competitive MATH-500 scores; Minerva lags behind 7B models.
![Performance Metrics](assets/performances.png)
### Cost Efficiency
Our approach uses 7,000 samples (42,000 total outputs) and costs ~$42 on 4x A40 GPUs in 24 hours, compared to:
- 7B models: `Qwen2.5-7B-SimpleRL` ($1,633), `Eurus-2-7B-PRIME` ($1,088)
- 1.5B models: `DeepScaleR-1.5B-Preview` ($3,629), `Still-3-1.5B-Preview` ($2,268)
![7B Model Costs](assets/costs-7b.png)
![1.5B Model Costs](assets/costs-1.5b.png)
## Citation
If this project aids your work, please cite it as:
```
@inproceedings{
dang2026reinforcement,
title={Reinforcement Learning for Reasoning in Small {LLM}s: What Works and What Doesn{\textquoteright}t},
author={Quy-Anh Dang and Chris Ngo},
booktitle={Logical and Symbolic Reasoning in Language Models @ AAAI 2026},
year={2026},
url={https://openreview.net/forum?id=3pWL6Zxc4A}
}
```