---
frameworks:
- Pytorch
license: Apache License 2.0
tasks:
- text-generation
---

# For a walkthrough of this model's paper, see the WeChat official account article 👇🏻

### <img src="https://www.modelscope.cn/datasets/okwinds/Human-Like-DPO-Dataset/resolve/master/wechat.png" width="30" height="30" align="absmiddle"> 觉察流 - [Open-R1: An In-Depth Look at the Progress of the DeepSeek-R1 Open-Source Reproduction](https://mp.weixin.qq.com/s/TxRaI8amE_N__1VU4XHvMg)

> <span style="color:red;font-size:16px"> Notice: this model is reposted in full from [open-r1/OpenR1-Qwen-7B](https://huggingface.co/open-r1/OpenR1-Qwen-7B) on Hugging Face. <br/>For more model information, see below 👇🏻, which is the Chinese-language edition of the original dataset repository's description.</span>

<br/>

#### _The repository author is here 👇🏻 scan the QR code_

<img src="https://www.modelscope.cn/models/okwinds/GPT-2/resolve/master/qrcode_for_jcl_258.jpg" />
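
# Download

### The contributors of this model have not provided a more detailed model description. Model files and weights are available on the "Model Files" page.
#### You can download the model with the git clone command below, or with the ModelScope SDK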

SDK download
```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model with the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('okwinds/OpenR1-Qwen-7B')
```
Git download
```
# Download the model with Git
git clone https://www.modelscope.cn/okwinds/OpenR1-Qwen-7B.git
```

# Model description

# OpenR1-Qwen-7B

This is a fine-tune of [Qwen2.5-Math-7B-Instruct](https://www.modelscope.cn/models/Qwen/Qwen2.5-Math-7B-Instruct) on [okwinds/OpenR1-Math-220k](https://www.modelscope.cn/datasets/okwinds/OpenR1-Math-220k) (`default` split).
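
## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face model ID; a local directory also works here.
model_name = "open-r1/OpenR1-Qwen-7B"

# device_map="auto" lets transformers place the weights on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."

messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]

# The generation step below is not in the original snippet; it follows the usual
# transformers chat-template pattern, and max_new_tokens is an assumed value.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=4096)
# Drop the prompt tokens so only the newly generated answer is decoded.
new_tokens = generated_ids[0][model_inputs.input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```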
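
The system prompt in the quick start asks the model to put its final answer within `\boxed{}`, so the answer can usually be read back out of the generated text. The helper below is a minimal sketch of that step. `extract_boxed` is a hypothetical name, not part of any library, and `sample_response` is a hand-written stand-in for model output rather than a real generation.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent.

    Braces are matched by depth so nested LaTeX such as \\frac{1}{2} survives.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # unbalanced braces, e.g. a truncated generation


# Hand-written stand-in for a model response (assumption, not real output).
sample_response = (
    "Subtract $4x$ from both sides: $5 = 2x + 7$. "
    "Then $2x = -2$, so the final answer is \\boxed{-1}."
)
print(extract_boxed(sample_response))  # -1
```

Returning `None` for unbalanced braces is a design choice that lets a caller tell a truncated generation apart from an empty answer.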

## Training

We train the model on the `default` split of [okwinds/OpenR1-Math-220k](https://www.modelscope.cn/datasets/okwinds/OpenR1-Math-220k) for 3 epochs. We use a learning rate of 5e-5 and extend the context length from 4k to 32k by increasing the RoPE frequency to 300k. The training follows a linear learning rate schedule with a 10% warmup phase. The table below compares the performance of OpenR1-Qwen-7B to DeepSeek-Distill-Qwen-7B and OpenThinker-7B using [lighteval](https://github.com/huggingface/open-r1/tree/main?tab=readme-ov-file#evaluating-models).

You can find the training and evaluation code at: https://github.com/huggingface/open-r1/
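
The learning rate schedule described above (a 5e-5 peak, linear schedule, 10% warmup) can be sketched in plain Python. This is an illustration under stated assumptions, not the open-r1 training code: `linear_schedule_lr` is a hypothetical helper, the total of 1000 steps is made up for the example, and the decay is assumed to run linearly to zero after warmup.

```python
def linear_schedule_lr(step: int, total_steps: int,
                       peak_lr: float = 5e-5, warmup_ratio: float = 0.10) -> float:
    """Learning rate at a given optimizer step.

    Ramps linearly from 0 to peak_lr over the warmup steps, then decays
    linearly back to 0 at total_steps (assumed shape of the schedule).
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    decay_steps = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / decay_steps)


TOTAL_STEPS = 1000  # hypothetical; the real step count depends on batch size and data
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

With these assumed numbers the rate peaks at step 100, which is 10% of the run, and returns to half the peak at step 550.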

| Model | MATH-500 | AIME24 | AIME25 |
| --- | --- | --- | --- |
| DeepSeek-Distill-Qwen-7B | 91.6 | 43.3 | 40.0 |
| OpenR1-Qwen-7B | 90.6 | 36.7 | 40.0 |
| OpenThinker-7B | 89.6 | 30.0 | 33.3 |
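
The training description mentions raising the RoPE frequency to 300k to extend the context from 4k to 32k. The sketch below shows why a larger base helps: each pair of rotary dimensions rotates with its own wavelength, and a larger base stretches the longest wavelengths so that distant positions remain distinguishable. The head dimension of 128 and the 10,000 baseline are assumptions made for illustration, and `rope_inv_freq` and `longest_wavelength` are hypothetical helper names.

```python
import math


def rope_inv_freq(base: float, head_dim: int) -> list:
    """Standard RoPE inverse frequencies: base ** (-2i / head_dim) per dimension pair."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(base: float, head_dim: int) -> float:
    """Number of positions the slowest-rotating pair needs for one full turn."""
    return 2.0 * math.pi / rope_inv_freq(base, head_dim)[-1]


HEAD_DIM = 128  # assumed head dimension, not stated in this card
for base in (10_000.0, 300_000.0):  # 10,000 is an assumed baseline for comparison
    print(f"base={base:>9.0f}  longest wavelength ~ {longest_wavelength(base, HEAD_DIM):,.0f} positions")
```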