初始化项目，由ModelHub XC社区提供模型

Model: Intel/deepmath-v1 Source: Original Platform
2026-05-27 03:08:12 +08:00
commit b0a95f529c
15 changed files with 1101 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,203 @@
+---
+language:
+- en
+license: apache-2.0
+tags:
+- math
+- reasoning
+- agent
+- qwen
+- grpo
+- reinforcement-learning
+base_model: Qwen/Qwen3-4B-Thinking-2507
+datasets:
+- nvidia/OpenMathReasoning
+metrics:
+- accuracy
+library_name: transformers
+pipeline_tag: text-generation
+---
+
+# DeepMath: A Lightweight Math Reasoning Agent
+
+<img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/ndb_WmPavW1MONAjsGpYT.jpeg" style="width:600px" alt="An LLM is using a calculator to answer questions." />
+
+## Model Description
+
+**DeepMath** is a 4B parameter mathematical reasoning model that combines a fine-tuned LLM with a sandboxed Python executor. Built on [Qwen3-4B Thinking](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) and trained with **GRPO (Group Relative Policy Optimization)**, DeepMath generates concise Python snippets for computational steps instead of verbose text explanations, significantly reducing errors and output length.
+
+- **Developed by:** Intel AI Labs
+- **Model type:** Causal language model with agent capabilities
+- **Language:** English
+- **Base model:** [Qwen3-4B Thinking](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
+- **License:** Apache 2.0
+- **Blog:**: 🔗 <https://huggingface.co/blog/intel-deepmath>
+- **Repository:** 💻 [https://github.com/IntelLabs/DeepMath](https://github.com/IntelLabs/DeepMath)
+
+## Key Features
+
+✅ **Code-driven reasoning:** Generates short Python snippets for intermediate computational steps  
+✅ **Sandboxed execution:** No file I/O, no network calls, strict timeouts  
+✅ **Improved accuracy:** Offloading computation reduces arithmetic errors  
+✅ **Reduced verbosity:** Up to 66% shorter outputs compared to baseline  
+✅ **Safe and auditable:** Deterministic execution with readable code snippets  
+
+## Model Architecture
+
+DeepMath uses a LoRA adapter fine-tuned on top of Qwen3-4B Thinking with the following components:
+
+- **Agent Interface:** Outputs special tokens for Python code execution during reasoning
+- **Executor:** Sandboxed Python environment with allow-listed modules
+- **Safety Constraints:** Per-snippet timeouts, no file/network access
+- **Training Method:** GRPO with accuracy and code generation rewards
+
+<figure>
+<img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/zOcvJ2DY61QZyozarsKbT.png" style="width:400px" alt="Changes to vLLM client and server in TRL library." />
+<figcaption><p><em>Figure 1: The vLLM client and server were modified to use the DeepMath agent in generating the candidates, while using the vLLM backend.</em></p></figcaption>
+</figure>
+
+## Training Details
+
+### Training Data
+
+- **Dataset:** [OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) (tool-usage subset)
+- **Note:** GRPO training only uses problems, not solutions
+- **In-context Learning:** 4 solved examples demonstrating agent call syntax and patterns
+
+### Training Procedure
+
+**GRPO (Group Relative Policy Optimization)** fine-tuning with:
+
+- **Accuracy Reward:** +1 for correct answers
+- **Code Generation Reward:** +1 for using code snippets (weighted 10:1 vs. accuracy)
+- **Length Constraint:** GRPO completions limited to 5k tokens
+- **Temperature Scheduling:** Linear schedule from T=1.2 → T=0.7 during training
+- **Infrastructure:** Modified TRL library's vLLM client and server
+
+### Training Infrastructure
+
+- Base inference engine: [vLLM](https://github.com/vllm-project/vllm)
+- Agent framework: Based on [SmolAgents](https://github.com/huggingface/smolagents/)
+- Training framework: Modified [TRL](https://github.com/huggingface/trl) GRPO trainer
+
+## Performance
+
+### Benchmark Results
+
+We evaluated DeepMath on four mathematical reasoning datasets using **majority@16** and mean output length metrics:
+
+<img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/mBuINzNvjDKdZEuIqzJeO.png" style="width:800px" alt="Main results table showing performance across MATH500, AIME, HMMT, and HLE datasets."/>
+
+**Key Findings:**
+
+- **Accuracy:** Improved performance on challenging datasets (AIME, HMMT, HLE)
+- **Efficiency:** Up to **66% reduction** in output length
+- **Robustness:** Consistent improvements when combining agent + GRPO training
+
+### Evaluation Datasets
+
+- **MATH500:** Subset of the MATH dataset
+- **AIME:** American Invitational Mathematics Examination problems
+- **HMMT:** Harvard-MIT Mathematics Tournament problems
+- **HLE:** High-level exam problems
+
+<figure>
+<img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/a-kn3oHdlxTP_L-63N9LX.png" style="width:700px" alt="Output example showing Python code generation and execution." />
+<figcaption><p><em>Figure 2: Example output where Python code is generated, evaluated, and the result is inserted into the reasoning trace.</em></p></figcaption>
+</figure>
+
+## Usage
+
+### Installation
+
+```bash
+# Install uv package manager
+curl -LsSf https://astral.sh/uv/install.sh | sh
+
+# Clone repository
+git clone https://github.com/IntelLabs/DeepMath.git
+cd DeepMath
+
+# Install dependencies
+uv pip install -r requirements.txt
+uv pip install -e .
+```
+
+### Basic Inference
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "Intel/deepmath-v1"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name)
+
+# Example problem
+problem = "What is the sum of the first 100 positive integers?"
+
+inputs = tokenizer(problem, return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=3000)
+print(tokenizer.decode(outputs[0]))
+```
+
+### Inference with Agent
+
+For full agent capabilities with sandboxed Python execution:
+
+```bash
+python inference.py \
+    +model.use_vllm=true \
+    +model.math_agent=true \
+    +model.examples=deep_math/fewshot.txt \
+    model.generation.max_new_tokens=3000 \
+    +model.max_agent_output=20000 \
+    +model.max_steps=50 \
+    model.model_name_or_path=Intel/deepmath-v1 \
+    hf_tag=HuggingFaceH4/MATH-500 \
+    generated_file=output.jsonl
+```
+
+See the [repository](https://github.com/IntelLabs/DeepMath) for complete usage examples.
+
+## Limitations and Biases
+
+### Limitations
+
+- **Scope:** Optimized for mathematical reasoning tasks; may not generalize to other domains
+- **Problem Types:** Evaluated on contest-style math problems; performance on open-ended mathematical creativity or formal proofs is unknown
+- **Model Size:** 4B parameters may limit reasoning depth on extremely complex problems
+- **Code Execution:** Requires sandboxed environment for full agent capabilities
+
+### Safety Considerations
+
+⚠️ **Code Execution Risk:** This model generates and executes Python code. While DeepMath uses strict sandboxing and resource limits, any deployment should:
+
+- Carefully manage attack surfaces
+- Enforce rate limits
+- Use proper isolation (containers, VMs)
+- Monitor resource usage
+- Validate generated code before execution in production
+
+### Ethical Considerations
+
+- The model is trained on mathematical problem-solving datasets and should not be used for decision-making in critical applications without human oversight
+- Generated code should be reviewed before execution in production environments
+- The model may reflect biases present in the training data
+
+## Citation
+
+If you use DeepMath in your research, please cite:
+
+```bibtex
+@software{deepmath2025,
+  author = {Fleischer, Daniel and Berchansky, Moshe and Wasserblat, Moshe},
+  title = {DeepMath: A Lightweight Math Reasoning Agent for LLMs},
+  year = {2025},
+  publisher = {Intel AI Labs},
+  url = {https://github.com/IntelLabs/DeepMath}
+}
+```
+
+## Model Card Contact
+
+For questions or issues, please open an issue on the [GitHub repository](https://github.com/IntelLabs/DeepMath).