初始化项目,由ModelHub XC社区提供模型
Model: Intel/deepmath-v1 Source: Original Platform
This commit is contained in:
203
README.md
Normal file
203
README.md
Normal file
@@ -0,0 +1,203 @@
|
||||
---
|
||||
language:
|
||||
- en
|
||||
license: apache-2.0
|
||||
tags:
|
||||
- math
|
||||
- reasoning
|
||||
- agent
|
||||
- qwen
|
||||
- grpo
|
||||
- reinforcement-learning
|
||||
base_model: Qwen/Qwen3-4B-Thinking-2507
|
||||
datasets:
|
||||
- nvidia/OpenMathReasoning
|
||||
metrics:
|
||||
- accuracy
|
||||
library_name: transformers
|
||||
pipeline_tag: text-generation
|
||||
---
|
||||
|
||||
# DeepMath: A Lightweight Math Reasoning Agent
|
||||
|
||||
<img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/ndb_WmPavW1MONAjsGpYT.jpeg" style="width:600px" alt="An LLM is using a calculator to answer questions." />
|
||||
|
||||
## Model Description
|
||||
|
||||
**DeepMath** is a 4B parameter mathematical reasoning model that combines a fine-tuned LLM with a sandboxed Python executor. Built on [Qwen3-4B Thinking](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) and trained with **GRPO (Group Relative Policy Optimization)**, DeepMath generates concise Python snippets for computational steps instead of verbose text explanations, significantly reducing errors and output length.
|
||||
|
||||
- **Developed by:** Intel AI Labs
|
||||
- **Model type:** Causal language model with agent capabilities
|
||||
- **Language:** English
|
||||
- **Base model:** [Qwen3-4B Thinking](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
|
||||
- **License:** Apache 2.0
|
||||
- **Blog:**: 🔗 <https://huggingface.co/blog/intel-deepmath>
|
||||
- **Repository:** 💻 [https://github.com/IntelLabs/DeepMath](https://github.com/IntelLabs/DeepMath)
|
||||
|
||||
## Key Features
|
||||
|
||||
✅ **Code-driven reasoning:** Generates short Python snippets for intermediate computational steps
|
||||
✅ **Sandboxed execution:** No file I/O, no network calls, strict timeouts
|
||||
✅ **Improved accuracy:** Offloading computation reduces arithmetic errors
|
||||
✅ **Reduced verbosity:** Up to 66% shorter outputs compared to baseline
|
||||
✅ **Safe and auditable:** Deterministic execution with readable code snippets
|
||||
|
||||
## Model Architecture
|
||||
|
||||
DeepMath uses a LoRA adapter fine-tuned on top of Qwen3-4B Thinking with the following components:
|
||||
|
||||
- **Agent Interface:** Outputs special tokens for Python code execution during reasoning
|
||||
- **Executor:** Sandboxed Python environment with allow-listed modules
|
||||
- **Safety Constraints:** Per-snippet timeouts, no file/network access
|
||||
- **Training Method:** GRPO with accuracy and code generation rewards
|
||||
|
||||
<figure>
|
||||
<img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/zOcvJ2DY61QZyozarsKbT.png" style="width:400px" alt="Changes to vLLM client and server in TRL library." />
|
||||
<figcaption><p><em>Figure 1: The vLLM client and server were modified to use the DeepMath agent in generating the candidates, while using the vLLM backend.</em></p></figcaption>
|
||||
</figure>
|
||||
|
||||
## Training Details
|
||||
|
||||
### Training Data
|
||||
|
||||
- **Dataset:** [OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) (tool-usage subset)
|
||||
- **Note:** GRPO training only uses problems, not solutions
|
||||
- **In-context Learning:** 4 solved examples demonstrating agent call syntax and patterns
|
||||
|
||||
### Training Procedure
|
||||
|
||||
**GRPO (Group Relative Policy Optimization)** fine-tuning with:
|
||||
|
||||
- **Accuracy Reward:** +1 for correct answers
|
||||
- **Code Generation Reward:** +1 for using code snippets (weighted 10:1 vs. accuracy)
|
||||
- **Length Constraint:** GRPO completions limited to 5k tokens
|
||||
- **Temperature Scheduling:** Linear schedule from T=1.2 → T=0.7 during training
|
||||
- **Infrastructure:** Modified TRL library's vLLM client and server
|
||||
|
||||
### Training Infrastructure
|
||||
|
||||
- Base inference engine: [vLLM](https://github.com/vllm-project/vllm)
|
||||
- Agent framework: Based on [SmolAgents](https://github.com/huggingface/smolagents/)
|
||||
- Training framework: Modified [TRL](https://github.com/huggingface/trl) GRPO trainer
|
||||
|
||||
## Performance
|
||||
|
||||
### Benchmark Results
|
||||
|
||||
We evaluated DeepMath on four mathematical reasoning datasets using **majority@16** and mean output length metrics:
|
||||
|
||||
<img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/mBuINzNvjDKdZEuIqzJeO.png" style="width:800px" alt="Main results table showing performance across MATH500, AIME, HMMT, and HLE datasets."/>
|
||||
|
||||
**Key Findings:**
|
||||
|
||||
- **Accuracy:** Improved performance on challenging datasets (AIME, HMMT, HLE)
|
||||
- **Efficiency:** Up to **66% reduction** in output length
|
||||
- **Robustness:** Consistent improvements when combining agent + GRPO training
|
||||
|
||||
### Evaluation Datasets
|
||||
|
||||
- **MATH500:** Subset of the MATH dataset
|
||||
- **AIME:** American Invitational Mathematics Examination problems
|
||||
- **HMMT:** Harvard-MIT Mathematics Tournament problems
|
||||
- **HLE:** High-level exam problems
|
||||
|
||||
<figure>
|
||||
<img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/a-kn3oHdlxTP_L-63N9LX.png" style="width:700px" alt="Output example showing Python code generation and execution." />
|
||||
<figcaption><p><em>Figure 2: Example output where Python code is generated, evaluated, and the result is inserted into the reasoning trace.</em></p></figcaption>
|
||||
</figure>
|
||||
|
||||
## Usage
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
# Install uv package manager
|
||||
curl -LsSf https://astral.sh/uv/install.sh | sh
|
||||
|
||||
# Clone repository
|
||||
git clone https://github.com/IntelLabs/DeepMath.git
|
||||
cd DeepMath
|
||||
|
||||
# Install dependencies
|
||||
uv pip install -r requirements.txt
|
||||
uv pip install -e .
|
||||
```
|
||||
|
||||
### Basic Inference
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
|
||||
model_name = "Intel/deepmath-v1"
|
||||
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
||||
model = AutoModelForCausalLM.from_pretrained(model_name)
|
||||
|
||||
# Example problem
|
||||
problem = "What is the sum of the first 100 positive integers?"
|
||||
|
||||
inputs = tokenizer(problem, return_tensors="pt")
|
||||
outputs = model.generate(**inputs, max_new_tokens=3000)
|
||||
print(tokenizer.decode(outputs[0]))
|
||||
```
|
||||
|
||||
### Inference with Agent
|
||||
|
||||
For full agent capabilities with sandboxed Python execution:
|
||||
|
||||
```bash
|
||||
python inference.py \
|
||||
+model.use_vllm=true \
|
||||
+model.math_agent=true \
|
||||
+model.examples=deep_math/fewshot.txt \
|
||||
model.generation.max_new_tokens=3000 \
|
||||
+model.max_agent_output=20000 \
|
||||
+model.max_steps=50 \
|
||||
model.model_name_or_path=Intel/deepmath-v1 \
|
||||
hf_tag=HuggingFaceH4/MATH-500 \
|
||||
generated_file=output.jsonl
|
||||
```
|
||||
|
||||
See the [repository](https://github.com/IntelLabs/DeepMath) for complete usage examples.
|
||||
|
||||
## Limitations and Biases
|
||||
|
||||
### Limitations
|
||||
|
||||
- **Scope:** Optimized for mathematical reasoning tasks; may not generalize to other domains
|
||||
- **Problem Types:** Evaluated on contest-style math problems; performance on open-ended mathematical creativity or formal proofs is unknown
|
||||
- **Model Size:** 4B parameters may limit reasoning depth on extremely complex problems
|
||||
- **Code Execution:** Requires sandboxed environment for full agent capabilities
|
||||
|
||||
### Safety Considerations
|
||||
|
||||
⚠️ **Code Execution Risk:** This model generates and executes Python code. While DeepMath uses strict sandboxing and resource limits, any deployment should:
|
||||
|
||||
- Carefully manage attack surfaces
|
||||
- Enforce rate limits
|
||||
- Use proper isolation (containers, VMs)
|
||||
- Monitor resource usage
|
||||
- Validate generated code before execution in production
|
||||
|
||||
### Ethical Considerations
|
||||
|
||||
- The model is trained on mathematical problem-solving datasets and should not be used for decision-making in critical applications without human oversight
|
||||
- Generated code should be reviewed before execution in production environments
|
||||
- The model may reflect biases present in the training data
|
||||
|
||||
## Citation
|
||||
|
||||
If you use DeepMath in your research, please cite:
|
||||
|
||||
```bibtex
|
||||
@software{deepmath2025,
|
||||
author = {Fleischer, Daniel and Berchansky, Moshe and Wasserblat, Moshe},
|
||||
title = {DeepMath: A Lightweight Math Reasoning Agent for LLMs},
|
||||
year = {2025},
|
||||
publisher = {Intel AI Labs},
|
||||
url = {https://github.com/IntelLabs/DeepMath}
|
||||
}
|
||||
```
|
||||
|
||||
## Model Card Contact
|
||||
|
||||
For questions or issues, please open an issue on the [GitHub repository](https://github.com/IntelLabs/DeepMath).
|
||||
Reference in New Issue
Block a user