---
license: mit
library_name: transformers
pipeline_tag: text-generation
---

This is the base Qwen2.5-Math-7B model used by ReLIFT.
We change `rope_theta` from 10000 to 40000 and extend the context window to 16k tokens.
We also modify the `chat_template` for the system prompt and add `<think>`.
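
As a rough illustration of why `rope_theta` is raised alongside the longer context window, the sketch below compares rotary wavelengths for the two values. It assumes the standard RoPE frequency formula and a head dimension of 128; both are assumptions made for illustration and are not stated in this card, and the helper `rope_wavelengths` is hypothetical rather than part of any library.

```python
import math


def rope_wavelengths(theta: float, head_dim: int = 128) -> list[float]:
    """Wavelength, in token positions, of each rotary frequency pair.

    Assumes the standard RoPE definition inv_freq_i = theta ** (-2i / head_dim),
    which gives a wavelength of 2 * pi * theta ** (2i / head_dim) for pair i.
    head_dim=128 is an assumed value, not taken from this model card.
    """
    return [
        2 * math.pi * theta ** (2 * i / head_dim)
        for i in range(head_dim // 2)
    ]


old = rope_wavelengths(10_000.0)  # original rope_theta
new = rope_wavelengths(40_000.0)  # rope_theta used by this model

# The last pair rotates slowest, so it has the longest wavelength.
print(f"longest wavelength, theta=10000: {old[-1]:,.0f} positions")
print(f"longest wavelength, theta=40000: {new[-1]:,.0f} positions")
print(f"ratio: {new[-1] / old[-1]:.2f}x")
```

Under these assumptions, a larger base stretches the low-frequency wavelengths, so positions far apart in a longer sequence are assigned less ambiguous rotation angles.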

GitHub: https://github.com/TheRoadQaQ/ReLIFT

# Citation

If you find our model, data, or evaluation code useful, please cite our paper:

```bib
@article{ma2025learning,
  title={Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions},
  author={Ma, Lu and Liang, Hao and Qiang, Meiyi and Tang, Lexiang and Ma, Xiaochen and Wong, Zhen Hao and Niu, Junbo and Shen, Chengyu and He, Runming and Cui, Bin and others},
  journal={arXiv preprint arXiv:2506.07527},
  year={2025}
}
```