Initialize project; model provided by the ModelHub XC community
Model: RoadQAQ/Qwen2.5-Math-7B-16k-think Source: Original Platform
22
README.md
Normal file
@@ -0,0 +1,22 @@
---
license: mit
library_name: transformers
pipeline_tag: text-generation
---

The base Qwen2.5-Math-7B model used by ReLIFT.
We change `rope_theta` from 10000 to 40000 and extend the context window to 16k.
We also modify the `chat_template` for the system prompt and add `<think>`.

GitHub: https://github.com/TheRoadQaQ/ReLIFT
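
To give a rough feel for the `rope_theta` change described above, here is a minimal sketch using only the Python standard library. It computes the wavelength of each rotary frequency pair under the standard RoPE formula and applies the two overrides to a plain config dictionary. The head dimension of 128, the reading of "16k" as 16384 positions, and the helper names `rope_wavelengths` and `patch_config` are assumptions made for illustration; they are not taken from the ReLIFT code.

```python
import math


def rope_wavelengths(theta: float, head_dim: int) -> list[float]:
    """Wavelength, in token positions, of each rotary frequency pair.

    Standard RoPE uses inv_freq_i = theta ** (-2i / head_dim) for
    i = 0 .. head_dim/2 - 1, so the wavelength is 2*pi / inv_freq_i.
    """
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]


def patch_config(config: dict, rope_theta: float = 40000.0,
                 max_positions: int = 16384) -> dict:
    """Return a copy of a config dict with the long-context fields overridden.

    Hypothetical helper: `rope_theta` and `max_position_embeddings` are the
    usual Hugging Face config keys; 16384 is an assumed reading of "16k".
    """
    patched = dict(config)
    patched["rope_theta"] = rope_theta
    patched["max_position_embeddings"] = max_positions
    return patched


HEAD_DIM = 128  # assumption: per-head dimension of the 7B model

old = rope_wavelengths(10000.0, HEAD_DIM)
new = rope_wavelengths(40000.0, HEAD_DIM)

base = {"rope_theta": 10000.0}  # only the field named in the model card
patched = patch_config(base)

print(f"longest wavelength at theta=10000: {old[-1]:.0f} positions")
print(f"longest wavelength at theta=40000: {new[-1]:.0f} positions")
print(patched)
```

A larger base stretches the low-frequency wavelengths, so positions across a longer window map to rotation angles in a range closer to what the model saw during pretraining. The highest-frequency pair is unaffected, since its wavelength is always 2π.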
# Citation
If you find our model, data, or evaluation code useful, please cite our paper:

```bib
@article{ma2025learning,
  title={Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions},
  author={Ma, Lu and Liang, Hao and Qiang, Meiyi and Tang, Lexiang and Ma, Xiaochen and Wong, Zhen Hao and Niu, Junbo and Shen, Chengyu and He, Runming and Cui, Bin and others},
  journal={arXiv preprint arXiv:2506.07527},
  year={2025}
}
```