Initialize project; model provided by the ModelHub XC community

Model: sail/Qwen2.5-Math-7B-Oat-Zero Source: Original Platform
2026-05-06 09:35:31 +08:00
commit 1fc83072a1
16 changed files with 152187 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,88 @@
+---
+library_name: transformers
+license: apache-2.0
+base_model:
+- Qwen/Qwen2.5-Math-7B
+---
+
+# Qwen2.5-Math-7B-Oat-Zero
+
+## Links
+
+- 📜 [Paper](https://github.com/sail-sg/understand-r1-zero/blob/main/understand-r1-zero.pdf)
+- 💻 [GitHub](https://github.com/sail-sg/understand-r1-zero)
+- 🤗 [Oat-Zero Collection](https://huggingface.co/collections/sail/oat-zero-understanding-r1-zero-like-training-67dcdb07b9f3eb05f1501c4a)
+
+## Introduction
+
+This model was trained with the minimalist R1-Zero recipe introduced in our paper:
+- **Algorithm**: Dr. GRPO
+- **Data**: Level 3-5 questions from the MATH dataset
+- **Base model**: [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)
+- **Template**: Qwen-Math
+
+Evaluation results on widely used math benchmarks are shown below:
+
+<img src="https://raw.githubusercontent.com/sail-sg/understand-r1-zero/refs/heads/main/assets/benchmark_table.png" width=100%/>
+
+## Usage
+
+```python
+import vllm
+
+
+def apply_qwen_math_template(question: str):
+    return (
+        "<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n<|im_start|>user\n"
+        + question
+        + "<|im_end|>\n<|im_start|>assistant\n"
+    )
+
+def apply_r1_template(question: str):
+    return (
+        "A conversation between User and Assistant. The User asks a question, and the Assistant solves it. The Assistant first thinks about the reasoning process in the mind and then provides the User with the answer. "
+        "The reasoning process is enclosed within <think> </think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.\nUser: "
+        + question
+        + "\nAssistant: <think>"
+    )
+
+model_name = "sail/Qwen2.5-Math-7B-Oat-Zero"
+
+sampling_params = vllm.SamplingParams(
+    n=1,
+    temperature=0,
+    top_p=1,
+    max_tokens=3000,
+)
+
+model = vllm.LLM(
+    model_name,
+    max_model_len=4096,
+    dtype="bfloat16",
+    enable_prefix_caching=True,
+)
+
+if "Llama-3.2-3B-Oat-Zero" in model_name:
+    apply_template = apply_r1_template
+else:
+    apply_template = apply_qwen_math_template
+
+prompts = [
+    "How many positive whole-number divisors does 196 have?"
+]
+prompts = list(map(apply_template, prompts))
+outputs = model.generate(prompts, sampling_params)
+
+print(outputs)
+```
+
+## Citation
+
+```bibtex
+@article{liu2025understanding,
+  title={Understanding r1-zero-like training: A critical perspective},
+  author={Liu, Zichen and Chen, Changyu and Li, Wenjun and Qi, Penghui and Pang, Tianyu and Du, Chao and Lee, Wee Sun and Lin, Min},
+  journal={arXiv preprint arXiv:2503.20783},
+  year={2025}
+}
+```