---
license: mit
library_name: transformers
pipeline_tag: text-generation
---

The base Qwen2.5-Math-1.5B model used by ReLIFT. We change `rope_theta` from 10000 to 40000 and extend the context window to 16k. We also modify the `chat_template` for the system prompt and add .

GitHub: https://github.com/TheRoadQaQ/ReLIFT

# Citation

If you find our model, data, or evaluation code useful, please kindly cite our paper:

```bib
@article{ma2025learning,
  title={Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions},
  author={Ma, Lu and Liang, Hao and Qiang, Meiyi and Tang, Lexiang and Ma, Xiaochen and Wong, Zhen Hao and Niu, Junbo and Shen, Chengyu and He, Runming and Cui, Bin and others},
  journal={arXiv preprint arXiv:2506.07527},
  year={2025}
}
```
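
# Configuration sketch

The `rope_theta` and context-window change described in this card can be pictured as a small edit to a `config.json`-style dictionary. This is only a minimal sketch, not the script used to build this checkpoint: the helper name `extend_context` is hypothetical, "16k" is assumed to mean 16384 tokens, `max_position_embeddings` is assumed to be the field that holds the context length, and the starting value of 4096 is illustrative.

```python
import json

# Values taken from the model card text.
NEW_ROPE_THETA = 40000
# Assumption: "16k" in the card means 16 * 1024 = 16384 tokens.
NEW_MAX_POSITIONS = 16 * 1024


def extend_context(config: dict) -> dict:
    """Return a copy of a config.json-style dict with the long-context settings applied.

    Hypothetical helper for illustration; the input dict is left unmodified.
    """
    patched = dict(config)
    patched["rope_theta"] = NEW_ROPE_THETA
    # Assumption: the context length is stored under this key.
    patched["max_position_embeddings"] = NEW_MAX_POSITIONS
    return patched


if __name__ == "__main__":
    # Illustrative starting values, not read from a real checkpoint.
    original = {"rope_theta": 10000, "max_position_embeddings": 4096}
    print(json.dumps(extend_context(original), indent=2))
```

All other keys in the dictionary pass through unchanged, so the same helper could be applied to a full config loaded with `json.load` before writing it back out.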