ModelHub XC 93204eba2b 初始化项目,由ModelHub XC社区提供模型
Model: okwinds/MiroMind-M1-RL-7B
Source: Original Platform
2026-06-23 19:20:12 +08:00

frameworks, license, tasks, base_model
frameworks license tasks base_model
Pytorch
Apache License 2.0
text-generation
okwinds/Miromind-M1-SFT-7B

本模型转载自 huggingface 【miromind-ai

📖 关于项目相关的研究,可阅读公众号“觉察流”文章👇

MiroMind-M1如何用CAMPO算法打造高效且可复现的全栈开源推理模型

本仓库作者在此 👇🏻 扫一扫


SDK下载

#安装ModelScope
pip install modelscope
#SDK模型下载
from modelscope import snapshot_download
model_dir = snapshot_download('okwinds/MiroMind-M1-RL-7B')

Git下载

#Git模型下载
git clone https://www.modelscope.cn/okwinds/MiroMind-M1-RL-7B.git

官方 MiroMind-M1-RL-7B 简介

MiroMindM1

Models Data Paper Github Website

MiroMind-M1

🧾 Overview

7B Model Training Performance

Training performance of MiroMind-M1-RL-7B on AIME24 and AIME25.

MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning. It is trained through supervised fine-tuning (SFT) on 719K curated problems and reinforcement learning with verifiable rewards (RLVR) on 62K challenging examples, using a context-aware multi-stage policy optimization method (CAMPO). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500, with all models (MiroMind-M1-SFT-7B, MiroMind-M1-RL-7B, MiroMind-M1-RL-32B), data (MiroMind-M1-SFT-719K, MiroMind-M1-RL-62K), and training setups openly released.

📊 Evaluation

MiroMind-M1-SFT

Model Initial Checkpoint AIME24 (avg@64) AIME25 (avg@64) MATH500 (avg@5)
DeepSeek-R1-Distill Qwen2.5-Math-7B 55.5 40.4† 92.8
OpenThoughts Qwen2.5-7-Instruct 31.3 23.3 83.2
Open-R1 Qwen2.5-Math-7B-Instruct 36.7 40.0 90.6
Synthetic-1 Qwen2.5-7B-Instruct 30.0 26.6 85.6
MiroMind-SFT-7B Qwen2.5-Math-7B 60.4 45.0 94.6

† means that the score of DeepSeek-R1 on AIME25 is from our evaluation.

MiroMind-M1-RL

Model AIME24 (avg@64) AIME25 (avg@64) MATH500 (avg@5)
DeepSeek-R1 79.8 70.0
DeepSeek-R1-0528 91.4 87.5
Qwen3-8B 76.0 67.3
DeepSeek-R1-0528-Qwen3-8B 86.0 76.3
32B Models trained from Qwen2.5 series
DeepSeek-R1-Distill-Qwen-32B 70.8 52.1 95.8
Skywork-OR1-32B-Preview 77.1 68.2 97.5
MiroMind-M1-RL-32B 77.5 65.6 96.4
7B Models trained from Qwen2.5 series
DeepSeek-R1-Distill-Qwen-7B 55.5 39.2
MiroMind-M1-SFT-7B 60.4 45.0 94.6
Light-R1-7B-DS 59.1 44.3
Skywork-OR1-7B 72.2 54.6
MiroMind-M1-RL-7B 73.4 57.8 96.7

🔗 Resources

Models

MiroMind-M1-SFT-7B
MiroMind-M1-RL-7B
MiroMind-M1-RL-32B

Data

MiroMind-M1-SFT-719K
MiroMind-M1-RL-62K

Description
Model synced from source: okwinds/MiroMind-M1-RL-7B
Readme 4.6 MiB
Languages
SVG 100%