Initialize project; model provided by the ModelHub XC community

Model: wh-zhu/Qwen2.5-7B-PSFT-RL-DAPO-90 Source: Original Platform
2026-05-28 10:14:20 +08:00
commit f3b20dd1f4
16 changed files with 152171 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,27 @@
+---
+library_name: transformers
+pipeline_tag: text-generation
+---
+
+# Hybrid Policy Distillation for LLMs
+
+This repository contains the weights for the model described in the paper [Hybrid Policy Distillation for LLMs](https://huggingface.co/papers/2604.20244).
+
+Hybrid Policy Distillation (HPD) is a framework for compressing large language models (LLMs). It reformulates knowledge distillation (KD) as a token-level reweighted log-likelihood objective and combines forward and reverse KL divergence to balance mode coverage against mode seeking. The paper reports improved computational efficiency and final performance across diverse model families and scales.
+
+## Resources
+- **Paper:** [Hybrid Policy Distillation for LLMs](https://huggingface.co/papers/2604.20244)
+- **Code:** [GitHub Repository](https://github.com/zwhong714/Hybrid-Policy-Distillation)
+
+## Citation
+
+If you find this work useful in your research, please cite:
+
+```bibtex
+@article{hong2024hybrid,
+  title={Hybrid Policy Distillation for LLMs},
+  author={Hong, Zhiwei and others},
+  journal={arXiv preprint arXiv:2604.20244},
+  year={2026}
+}
+```