
ModelHub XC da8103bbe4 Initialize project; model provided by the ModelHub XC community

Model: m-a-p/TreePO-Qwen2.5-7B
Source: Original Platform

2026-06-09 15:30:12 +08:00

library_name: transformers
pipeline_tag: text-generation
datasets: m-a-p/TreePO_data
base_model: Qwen/Qwen2.5-7B

We release the resources for the paper TreePO:

Checkpoint with average weighted subgroup advantages + more diverse initial divergence (the final one). ← You are here.
Checkpoint with average weighted subgroup advantages + fixed divergence.
The training data combined the deepscaler and simplerl math reasoning datasets.

More links:

If you find this work useful, please consider citing the paper:

@misc{li2025treepo,
  title={TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling},
  author={Yizhi Li and Qingshui Gu and Zhoufutu Wen and Ziniu Li and Tianshun Xing and Shuyue Guo and Tianyu Zheng and Xin Zhou and Xingwei Qu and Wangchunshu Zhou and Zheng Zhang and Wei Shen and Qian Liu and Chenghua Lin and Jian Yang and Ge Zhang and Wenhao Huang},
  year={2025},
  eprint={2508.17445},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.17445},
  howpublished={\url{https://m-a-p.ai/TreePO}}
}
