---
license: mit
library_name: transformers
pipeline_tag: text-generation
---

The base Qwen2.5-Math-7B model used by ReLIFT.
We change `rope_theta` from 10000 to 40000 and extend the context window to 16k.
We also modify the `chat_template` for the system prompt and add `<think>`.
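
As a rough illustration of the configuration change described above, the sketch below applies the two overrides to a plain dictionary standing in for `config.json`. It is not the authors' conversion script: the helper name `extend_context`, the reading of "16k" as 16384 positions, and the base value of 4096 are assumptions made for the example.

```python
import copy

# Values stated in this README.
OLD_ROPE_THETA = 10000
NEW_ROPE_THETA = 40000
# Assumption: "16k" is read as 16 * 1024 = 16384 positions.
NEW_MAX_POSITIONS = 16 * 1024


def extend_context(config: dict) -> dict:
    """Return a copy of a config dict with the long-context overrides applied."""
    updated = copy.deepcopy(config)
    updated["rope_theta"] = NEW_ROPE_THETA
    updated["max_position_embeddings"] = NEW_MAX_POSITIONS
    return updated


# Hypothetical stand-in for the relevant fields of the base model's config.json.
# The 4096 value is an assumption for illustration, not taken from this README.
base_config = {"rope_theta": OLD_ROPE_THETA, "max_position_embeddings": 4096}
long_config = extend_context(base_config)
print(long_config)
```

The helper copies the input so that the base values stay available for comparison. With `transformers`, the values actually shipped with this model can be inspected by loading its config through `AutoConfig.from_pretrained` and reading `rope_theta` and `max_position_embeddings`.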

GitHub: https://github.com/TheRoadQaQ/ReLIFT

# Citation
If you find our model, data, or evaluation code useful, please kindly cite our paper:
```bib
@article{ma2025learning,
  title={Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions},
  author={Ma, Lu and Liang, Hao and Qiang, Meiyi and Tang, Lexiang and Ma, Xiaochen and Wong, Zhen Hao and Niu, Junbo and Shen, Chengyu and He, Runming and Cui, Bin and others},
  journal={arXiv preprint arXiv:2506.07527},
  year={2025}
}
```