MiroMind-M1-RL-7B/README.md

---
frameworks:
- Pytorch
license: Apache License 2.0
tasks:
- text-generation
base_model:
- okwinds/Miromind-M1-SFT-7B
---

本模型转载自 huggingface 【[miromind-ai](https://huggingface.co/miromind-ai)】

#### 📖 关于项目相关的研究，可阅读公众号“觉察流”文章👇</br>

《[MiroMind-M1：如何用CAMPO算法打造高效且可复现的全栈开源推理模型](https://mp.weixin.qq.com/s/REPzzgsUjDMikg4jIo9KRg)》

#### _本仓库作者在此 👇🏻 扫一扫_

<img src="https://www.modelscope.cn/models/okwinds/GPT-2/resolve/master/qrcode_for_jcl_258.jpg" />

---

SDK下载
```bash
#安装ModelScope
pip install modelscope
```
```python
#SDK模型下载
from modelscope import snapshot_download
model_dir = snapshot_download('okwinds/MiroMind-M1-RL-7B')
```
Git下载
```
#Git模型下载
git clone https://www.modelscope.cn/okwinds/MiroMind-M1-RL-7B.git
```

# 官方 MiroMind-M1-RL-7B 简介

<div align="center">
  <img src="https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-7B/resolve/master/assets/MiromindAI_H.svg" width="50%" alt="MiroMindM1" />
</div>
<!-- <hr> -->
<div align="center">

[![Models](https://img.shields.io/badge/Models-5EDDD2?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-7B)
[![Data](https://img.shields.io/badge/Data-0040A1?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://www.modelscope.cn/datasets/okwinds/MiroMind-M1-RL-62K)
[![Paper](https://img.shields.io/badge/Paper-000000?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2507.14683)
[![Github](https://img.shields.io/badge/Code-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/MiroMindAsia/MiroMind-M1)
[![Website](https://img.shields.io/badge/Website-000000?style=for-the-badge&logo=google-chrome&logoColor=white)](https://miromind.ai/)

</div>


# MiroMind-M1


## 🧾 Overview
<div align="center">
  <img src="https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-7B/resolve/master/assets/7b_performance_training.png" width="80%" alt="7B Model Training Performance" />
  <p><i>Training performance of MiroMind-M1-RL-7B on AIME24 and AIME25.</i></p>
</div>

**MiroMind-M1** is a fully open-source series of reasoning language models built on `Qwen-2.5`, focused on advancing mathematical reasoning. It is trained through supervised fine-tuning (**SFT**) on 719K curated problems and reinforcement learning with verifiable rewards (**RLVR**) on 62K challenging examples, using a context-aware multi-stage policy optimization method (**CAMPO**). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500, with all models (`MiroMind-M1-SFT-7B`, `MiroMind-M1-RL-7B`, `MiroMind-M1-RL-32B`), data (`MiroMind-M1-SFT-719K`, `MiroMind-M1-RL-62K`), and training setups openly released.


## 📊 Evaluation

### MiroMind-M1-SFT
| Model           | Initial Checkpoint         | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|------------------|----------------------------|--------|--------|---------|
| DeepSeek-R1-Distill                  | Qwen2.5-Math-7B             | 55.5   | 40.4†  | 92.8    |
| OpenThoughts                         | Qwen2.5-7-Instruct           | 31.3   | 23.3   | 83.2    |
| Open-R1                              | Qwen2.5-Math-7B-Instruct     | 36.7   | 40.0   | 90.6    |
| Synthetic-1                          | Qwen2.5-7B-Instruct          | 30.0   | 26.6   | 85.6    |
| **MiroMind-SFT-7B**                  | Qwen2.5-Math-7B             | 60.4   | 45.0   | 94.6    |

*† means that the score of DeepSeek-R1 on AIME25 is from our evaluation.*

### MiroMind-M1-RL
| Model                            | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|----------------------------------|--------|--------|---------|
| DeepSeek-R1                      | 79.8   | 70.0   | –       |
| DeepSeek-R1-0528                 | 91.4   | 87.5   | –       |
| Qwen3-8B                         | 76.0   | 67.3   | –       |
| DeepSeek-R1-0528-Qwen3-8B        | 86.0   | 76.3   | –       |
| <tr><td colspan="4" align="center"><em>**32B Models trained from Qwen2.5 series**</em></td></tr> |
| DeepSeek-R1-Distill-Qwen-32B     | 70.8   | 52.1   | 95.8    |
| Skywork-OR1-32B-Preview          | 77.1   | 68.2   | 97.5    |
| **MiroMind-M1-RL-32B**           | 77.5   | 65.6   | 96.4    |
| <tr><td colspan="4" align="center"><em>**7B Models trained from Qwen2.5 series**</em></td></tr> |
| DeepSeek-R1-Distill-Qwen-7B      | 55.5   | 39.2   | –       |
| **MiroMind-M1-SFT-7B**           | 60.4   | 45.0   | 94.6    |
| Light-R1-7B-DS                   | 59.1   | 44.3   | –       |
| Skywork-OR1-7B                   | 72.2   | 54.6   | –       |
| **MiroMind-M1-RL-7B**            | 73.4   | 57.8   | 96.7    |


## 🔗 Resources
### Models
[`MiroMind-M1-SFT-7B`](https://www.modelscope.cn/models/okwinds/MiroMind-M1-SFT-7B)<br>
[`MiroMind-M1-RL-7B`](https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-7B)<br>
[`MiroMind-M1-RL-32B`](https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-32B)<br>

### Data
[`MiroMind-M1-SFT-719K`](https://www.modelscope.cn/datasets/okwinds/MiroMind-M1-SFT-719K)<br>
[`MiroMind-M1-RL-62K`](https://www.modelscope.cn/datasets/okwinds/MiroMind-M1-RL-62K)<br>