Initialize project; model provided by the ModelHub XC community

Model: daviddavidlu/PrAg-PO-Qwen3-1.7b-step720
Source: Original Platform
2026-05-30 09:43:18 +08:00
commit 3bee53c6b1
14 changed files with 152257 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,44 @@
+---
+license: apache-2.0
+base_model:
+- Qwen/Qwen3-1.7B
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- mathematical-reasoning
+- reinforcement-learning
+- grpo
+---
+
+# Model Card for PrAg-PO Qwen3-1.7B step720
+
+**If you use this model, please consider giving our [GitHub repo](https://github.com/wenquanlu/PrAg-PO) a star; it would be much appreciated! 😊**
+
+This is the step-720 checkpoint from training Qwen3-1.7B with PrAg-PO on the Level 3 to Level 5 problems of the MATH dataset. The training procedure is described in the paper [PrAg-PO: Prompt Augmented Policy Optimization for Robust and Diverse Mathematical Reasoning](https://arxiv.org/abs/2602.03190).
+
+### Model Sources
+
+- **Repository 🤖:** [https://github.com/wenquanlu/PrAg-PO](https://github.com/wenquanlu/PrAg-PO)
+- **Paper 📝:** [PrAg-PO: Prompt Augmented Policy Optimization for Robust and Diverse Mathematical Reasoning](https://arxiv.org/abs/2602.03190)
+
+## Uses
+
+This model is intended for mathematical reasoning tasks. During RL training, PrAg-PO applies prompt augmentation: rollouts are generated under diverse prompt templates, which increases rollout diversity and improves training stability.
+
+### Results
+
+<p align="center">
+<img src="https://raw.githubusercontent.com/wenquanlu/prompt-augmentation-GRPO/master/imgs/table_result.png" alt="PrAg-PO results table">
+</p>
+
+
+## Citation
+```bibtex
+@misc{lu2026pragpopromptaugmentedpolicy,
+      title={PrAg-PO: Prompt Augmented Policy Optimization for Robust and Diverse Mathematical Reasoning}, 
+      author={Wenquan Lu and Hai Huang and Enqi Liu and Randall Balestriero},
+      journal={arXiv preprint arXiv:2602.03190},
+      url={https://arxiv.org/abs/2602.03190},
+      year={2026},
+}
+```
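The model card's Uses section states that PrAg-PO generates rollouts under diverse prompt templates during RL (GRPO) training. The following is a minimal Python sketch of that idea under explicit assumptions; it is not the authors' implementation. The template strings, the `build_group` and `group_advantages` helpers, and the binary rewards are all hypothetical. The sketch cycles templates across the rollouts in one group, so a single group mixes prompt formats, and then computes GRPO-style group-normalized advantages: each reward minus the group mean, divided by the group's standard deviation.

```python
import statistics
from typing import List

# Hypothetical prompt templates for illustration only; the actual PrAg-PO
# templates are defined in the paper and repository, not here.
TEMPLATES = [
    "{question}\nPlease reason step by step, and put your final answer within \\boxed{{}}.",
    "Solve the following problem.\n{question}\nEnd with the answer in \\boxed{{}}.",
    "Question: {question}\nThink carefully, then state the final answer in \\boxed{{}}.",
]


def build_group(question: str, group_size: int) -> List[str]:
    """Build one rollout group, cycling through templates so the group mixes formats."""
    return [
        TEMPLATES[i % len(TEMPLATES)].format(question=question)
        for i in range(group_size)
    ]


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: (reward - group mean) / (group population std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    prompts = build_group("What is 3 + 4?", group_size=4)
    for p in prompts:
        print(repr(p))
    # Hypothetical binary correctness rewards for the four rollouts.
    rewards = [1.0, 0.0, 1.0, 1.0]
    print([round(a, 3) for a in group_advantages(rewards)])
```

When every rollout in a group receives the same reward, the standard deviation is zero, so the `eps` term makes all advantages zero and that group contributes no learning signal. Varying the template within a group is one plausible way to make such all-equal groups less frequent.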