---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- qwen2.5
- math
- reasoning
- grpo
- reinforcement-learning
- unsloth
- gsm8k
- structured-output
datasets:
- openai/gsm8k
- open-r1/OpenR1-Math-220k
pipeline_tag: text-generation
library_name: transformers
---

# Q-SS-0.5B-Reasoning-Math

> *A compact, fast, and structured mathematical reasoning model, built to think before it answers.*

**Q-SS-0.5B-Reasoning-Math** is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct), trained with **Group Relative Policy Optimization (GRPO)**, the reinforcement learning algorithm also used to train DeepSeek-R1. The model is designed to reason explicitly through mathematical problems before producing a clean, parseable final answer.

> 💾 Looking for the lightweight CPU version? See [Q-SS-0.5B-Reasoning-Math-GGUF](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math-GGUF) for the Q4_K_M quantized model (~300MB).

---

## ✨ Highlights

- 🧠 **Thinks out loud**: explicit step-by-step reasoning inside `<thought>` tags before every answer
- 🎯 **Clean structured output**: final answer isolated in `<answer>` tags, trivial to parse
- 🔁 **RL-trained**: learned through reward signals, not just imitation
- 🔧 **Fine-tunable**: full FP16 weights, ready for further training
- 🔓 **Apache 2.0**: free for personal and commercial use

---

## 📋 Model Details

| Property | Details |
|---|---|
| **Model Name** | Q-SS-0.5B-Reasoning-Math |
| **Base Model** | Qwen/Qwen2.5-0.5B-Instruct |
| **Parameters** | ~0.5B (about 494M) |
| **Training Method** | SFT Warm-up + GRPO Reinforcement Learning |
| **Trained On** | GSM8K + OpenR1-Math-220k |
| **Precision** | FP16 (merged, no adapter needed) |
| **License** | Apache 2.0 |
| **Developer** | Saad Salman |

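
For readers unfamiliar with GRPO, its central idea is that several completions are sampled for the same prompt and each one is scored relative to the rest of its group, so no separate value model is needed. The sketch below illustrates only that normalization step. It is not the training code used for this model; the function name, the reward values, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per completion sampled from the SAME
    prompt. `eps` (an illustrative choice) avoids division by zero when all
    completions receive an identical reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; real implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 completions of one prompt
# (e.g. 1.0 = correct and well formatted, 0.0 = otherwise).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Completions above the group mean get a positive advantage, those below a
# negative one; the policy update then favors the former.
print(advantages)
```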
---

## 💬 Output Format

The model is trained to format every response as follows:

```
<thought>
[Step-by-step reasoning and calculations]
</thought>
<answer>
[Final numerical answer only]
</answer>
```

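
Because the structure is fixed, both parts can be pulled out with two regular expressions. The following is a minimal sketch rather than part of the released code: `ParsedResponse` and `parse_response` are hypothetical names, and a field is returned as `None` when its tags are missing so callers can detect malformed output.

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    thought: Optional[str]
    answer: Optional[str]


# DOTALL lets "." match newlines, since both sections span multiple lines.
_THOUGHT_RE = re.compile(r"<thought>(.*?)</thought>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_response(text: str) -> ParsedResponse:
    """Extract reasoning and final answer; a field is None if its tags are absent."""
    thought = _THOUGHT_RE.search(text)
    answer = _ANSWER_RE.search(text)
    return ParsedResponse(
        thought=thought.group(1).strip() if thought else None,
        answer=answer.group(1).strip() if answer else None,
    )


sample = (
    "<thought>\n3 × 2 = 6 cans per day.\n6 × 7 = 42 cans total.\n</thought>\n"
    "<answer>\n42\n</answer>"
)
parsed = parse_response(sample)
print(parsed.answer)  # 42
```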
---

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "saadxsalman/Q-SS-0.5B-Reasoning-Math"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

SYSTEM_PROMPT = """You are a mathematical reasoning engine.
Solve the problem step-by-step inside <thought> tags, then give ONLY the
final numerical or LaTeX result inside <answer> tags.

<thought>
[Your internal reasoning and calculations here]
</thought>
<answer>
[Final answer only]
</answer>"""

def solve(problem):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": problem},
    ]
    # return_dict=True returns the attention mask alongside input_ids
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=384,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens: the prompt itself contains
    # <answer> tags (in SYSTEM_PROMPT), which would confuse the parsing below.
    prompt_length = inputs["input_ids"].shape[-1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
    if "<answer>" in response:
        return response.split("<answer>")[-1].split("</answer>")[0].strip()
    return response

print(solve("Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?"))
# Expected output: 42
```

---

## 📝 Example Outputs

**Problem:** Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?

```
<thought>
Each cat eats 2 cans per day.
Janet has 3 cats, so they eat 3 × 2 = 6 cans per day together.
For 7 days: 6 × 7 = 42 cans total.
</thought>
<answer>
42
</answer>
```

**Problem:** Tom has $50. He buys a book for $12 and a pen for $3. How much money does he have left?

```
<thought>
Tom starts with $50.
He spends $12 on a book and $3 on a pen.
Total spent: 12 + 3 = $15.
Money remaining: 50 - 15 = $35.
</thought>
<answer>
35
</answer>
```

---

## ✅ What It's Good At

| Problem Type | Support |
|---|---|
| Basic arithmetic | ✅ Reliable |
| Multi-step word problems | ✅ Reliable |
| Problems with units and currency | ✅ Reliable |
| Basic algebra | ⚠️ Partial |
| Competition math (AMC/AIME) | ❌ Beyond capacity |

---

## 📦 Related Models

| Repo | Format | Size | Best For |
|---|---|---|---|
| [Q-SS-0.5B-Reasoning-Math](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math) | FP16 | ~988MB | GPU inference & further fine-tuning |
| [Q-SS-0.5B-Reasoning-Math-GGUF](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math-GGUF) | Q4_K_M | ~300MB | Local CPU inference |

---

## ⚠️ Limitations

- Optimized for English-language math problems only
- Complex abstract reasoning, geometry, and calculus are beyond what a 0.5B model can handle reliably
- Always verify critical calculations: the model may occasionally produce confident but incorrect answers

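
One lightweight way to act on the last point is to compare the model's extracted answer with an independently computed reference value. The helper below is a sketch under assumptions: `to_number` and `answers_match` are hypothetical names, and only plain numbers, simple fractions, thousands separators, and a leading `$` are handled. LaTeX answers would need their own normalization.

```python
from fractions import Fraction


def to_number(text):
    """Parse '42', '$1,250.50' or '3/4' into an exact Fraction, else None."""
    cleaned = text.strip().replace(",", "").replace("$", "")
    try:
        return Fraction(cleaned)
    except (ValueError, ZeroDivisionError):
        return None


def answers_match(model_answer, reference):
    """True only if both strings parse as numbers and are exactly equal."""
    a, b = to_number(model_answer), to_number(reference)
    return a is not None and a == b


# Recompute the cat-food example independently: 3 cats × 2 cans × 7 days.
print(answers_match("42", str(3 * 2 * 7)))   # True
print(answers_match("$35", "35.0"))          # True
print(answers_match("forty-two", "42"))      # False
```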
---

## 🙏 Acknowledgements

- [Unsloth](https://github.com/unslothai/unsloth): efficient fine-tuning framework
- [Qwen Team](https://huggingface.co/Qwen): Qwen2.5-0.5B-Instruct base model
- [Hugging Face TRL](https://github.com/huggingface/trl): GRPO implementation
- [OpenR1](https://huggingface.co/open-r1): OpenR1-Math-220k dataset
- [OpenAI](https://huggingface.co/openai): GSM8K dataset

---

## 📄 Citation

```bibtex
@misc{qss-reasoning-math-2025,
  author       = {Saad Salman},
  title        = {Q-SS-0.5B-Reasoning-Math},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math}},
}
```