---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- qwen2.5
- math
- reasoning
- grpo
- reinforcement-learning
- unsloth
- gsm8k
- structured-output
datasets:
- openai/gsm8k
- open-r1/OpenR1-Math-220k
pipeline_tag: text-generation
library_name: transformers
---

# Q-SS-0.5B-Reasoning-Math

> *A compact, fast, structured mathematical reasoning model, built to think before it answers.*

**Q-SS-0.5B-Reasoning-Math** is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct), trained with a supervised fine-tuning (SFT) warm-up followed by **Group Relative Policy Optimization (GRPO)**, the reinforcement learning algorithm used to train DeepSeek-R1. The model is designed to reason explicitly through a math problem before producing a clean, parseable final answer.

> 💾 Looking for a lightweight CPU version? See [Q-SS-0.5B-Reasoning-Math-GGUF](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math-GGUF) for the Q4_K_M quantized model (~300 MB).

---

## ✨ Highlights

- 🧠 **Thinks out loud**: writes explicit step-by-step reasoning inside `<thought>` tags before answering
- 🎯 **Clean structured output**: the final answer is isolated in `<answer>` tags, so it is easy to parse
- 🔁 **RL-trained**: learned from reward signals, not just imitation
- 🔧 **Fine-tunable**: full FP16 weights, ready for further fine-tuning or training
- 🔓 **Apache 2.0**: free for personal and commercial use
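
The **RL-trained** point refers to GRPO. Its core idea: sample a group of completions for the same prompt, score each one with a reward function, and use each completion's reward, normalized against the group's mean and standard deviation, as its advantage, so no separate value (critic) model is needed. The sketch below is illustrative only: the tag-format and exact-match rewards and their 0.5/1.0 weights are assumptions, not the rewards actually used to train this model.

```python
import re
import statistics

# Hypothetical reward pieces; the weights are illustrative assumptions,
# not the rewards actually used to train this model.
TAG_PATTERN = re.compile(r"<thought>.*?</thought>\s*<answer>.*?</answer>", re.DOTALL)
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """0.5 if the completion contains the <thought>/<answer> structure, else 0."""
    return 0.5 if TAG_PATTERN.search(completion) else 0.0


def correctness_reward(completion: str, gold: str) -> float:
    """1.0 if the text inside <answer> exactly matches the reference answer."""
    match = ANSWER_PATTERN.search(completion)
    return 1.0 if match and match.group(1).strip() == gold else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantage: normalize each reward against its group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# A group of sampled completions for one prompt whose reference answer is "42".
group = [
    "<thought>3 x 2 = 6, 6 x 7 = 42</thought>\n<answer>42</answer>",
    "<thought>3 x 7 = 21</thought>\n<answer>21</answer>",
    "The answer is 42.",
    "<thought>2 x 7 = 14, 14 x 3 = 42</thought>\n<answer>42</answer>",
]
rewards = [format_reward(c) + correctness_reward(c, "42") for c in group]
advantages = group_relative_advantages(rewards)
print(rewards)  # [1.5, 0.5, 0.0, 1.5]
```

Completions that beat their group's average receive positive advantages and are reinforced; the rest are pushed down. Libraries such as TRL's `GRPOTrainer` perform this group normalization internally, so in practice you mainly supply the reward functions.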

---

## 📋 Model Details

| Property | Details |
|---|---|
| **Model Name** | Q-SS-0.5B-Reasoning-Math |
| **Base Model** | Qwen/Qwen2.5-0.5B-Instruct |
| **Parameters** | ~0.5B (same architecture as the base model) |
| **Training Method** | SFT warm-up + GRPO reinforcement learning |
| **Training Data** | GSM8K + OpenR1-Math-220k |
| **Precision** | FP16 (merged weights, no adapter needed) |
| **License** | Apache 2.0 |
| **Developer** | Saad Salman |

---

## 💬 Output Format

The model is trained to respond in this structure:

```
<thought>
[Step-by-step reasoning and calculations]
</thought>
<answer>
[Final numerical answer only]
</answer>
```
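
Because the answer sits between fixed tags, a response can be split with a regular expression rather than chained string slicing. A minimal sketch, assuming each tag pair appears at most once per response; the `parse_response` helper and its `None` fallback for missing tags are illustrative, not part of the model's API.

```python
import re
from typing import Optional, Tuple

THOUGHT_RE = re.compile(r"<thought>(.*?)</thought>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_response(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (thought, answer); either is None if its tag pair is missing."""
    thought = THOUGHT_RE.search(text)
    answer = ANSWER_RE.search(text)
    return (
        thought.group(1).strip() if thought else None,
        answer.group(1).strip() if answer else None,
    )


sample = """<thought>
Total spent: 12 + 3 = $15.
Money remaining: 50 - 15 = $35.
</thought>
<answer>
35
</answer>"""

thought, answer = parse_response(sample)
print(answer)  # 35
```

Returning `None` instead of the raw text makes malformed outputs easy to detect and count, for example when measuring how often the model breaks format.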

---

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "saadxsalman/Q-SS-0.5B-Reasoning-Math"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

SYSTEM_PROMPT = """You are a mathematical reasoning engine.
Solve the problem step-by-step inside <thought> tags, then give ONLY the
final numerical or LaTeX result inside <answer> tags.

<thought>
[Your internal reasoning and calculations here]
</thought>
<answer>
[Final answer only]
</answer>"""


def solve(problem):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": problem},
    ]
    # Build the chat prompt and move the token IDs to the model's device
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs,
            max_new_tokens=384,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens: the prompt itself contains
    # <answer> tags (in the system prompt) and must not be parsed
    response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    if "<answer>" in response:
        return response.split("<answer>")[-1].split("</answer>")[0].strip()
    return response


print(solve("Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?"))
# Expected output: 42
```
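
To score answers against GSM8K, note that each reference solution in `openai/gsm8k` ends with a line of the form `#### <number>`. A minimal sketch, assuming that dropping commas, `$`, and trailing periods and then comparing numerically is enough; the helper names are hypothetical, and LaTeX answers such as `\frac{1}{2}` would need more careful handling.

```python
def gsm8k_gold(solution: str) -> str:
    """Extract the final answer from a GSM8K-style reference ('... #### 72')."""
    return solution.split("####")[-1].strip()


def normalize(answer: str) -> str:
    """Strip formatting that should not affect correctness."""
    return answer.replace(",", "").replace("$", "").strip().rstrip(".")


def is_correct(model_answer: str, gold_solution: str) -> bool:
    pred = normalize(model_answer)
    gold = normalize(gsm8k_gold(gold_solution))
    try:
        # Numeric comparison so that "3.0" and "3" count as equal
        return float(pred) == float(gold)
    except ValueError:
        return pred == gold


# A made-up reference written in the GSM8K style, with calculator annotations
reference = "3 cats eat 3*2=<<3*2=6>>6 cans a day.\nIn 7 days: 6*7=<<6*7=42>>42 cans.\n#### 42"
print(is_correct("42", reference))  # True
print(is_correct("$1,200", "She earns 1200 dollars.\n#### 1200"))  # True
```

With the `datasets` library, `load_dataset("openai/gsm8k", "main", split="test")` yields rows with `question` and `answer` fields that can be passed to `solve` and `is_correct` respectively.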

---

## 📝 Example Outputs

**Problem:** Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?

```
<thought>
Each cat eats 2 cans per day.
Janet has 3 cats, so they eat 3 × 2 = 6 cans per day together.
For 7 days: 6 × 7 = 42 cans total.
</thought>
<answer>
42
</answer>
```

**Problem:** Tom has $50. He buys a book for $12 and a pen for $3. How much money does he have left?

```
<thought>
Tom starts with $50.
He spends $12 on a book and $3 on a pen.
Total spent: 12 + 3 = $15.
Money remaining: 50 - 15 = $35.
</thought>
<answer>
35
</answer>
```

---

## ✅ What It's Good At

| Problem Type | Support |
|---|---|
| Basic arithmetic | ✅ Reliable |
| Multi-step word problems | ✅ Reliable |
| Problems with units and currency | ✅ Reliable |
| Basic algebra | ⚠️ Partial |
| Competition math (AMC/AIME) | ❌ Beyond capacity |

---

## 📦 Related Models

| Repo | Format | Size | Best For |
|---|---|---|---|
| [Q-SS-0.5B-Reasoning-Math](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math) | FP16 | ~988 MB | GPU inference & further fine-tuning |
| [Q-SS-0.5B-Reasoning-Math-GGUF](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math-GGUF) | Q4_K_M | ~300 MB | Local CPU inference |

---

## ⚠️ Limitations

- Optimized for English-language math problems only
- Complex abstract reasoning, geometry, and calculus are beyond its reliable capacity at the 0.5B scale
- The model can produce confident but incorrect answers, so always verify critical calculations
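
One common way to act on the last point is self-consistency: sample several responses to the same problem and keep the most frequent final answer, which tends to be more robust than a single sample from a small model. This is a general technique, not part of this model's documented usage; the `majority_vote` helper is hypothetical and works on already-extracted answer strings, for example from repeated calls to the Quick Start `solve` function.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most common non-empty answer, or None if there is none.

    Ties go to the answer seen first, because Counter.most_common orders
    equal counts by first occurrence.
    """
    cleaned = [a.strip() for a in answers if a and a.strip()]
    if not cleaned:
        return None
    return Counter(cleaned).most_common(1)[0][0]


# Answers extracted from five sampled responses to the same problem
samples = ["42", "42", "21", None, "42 "]
print(majority_vote(samples))  # 42

# With the Quick Start code, samples could come from repeated calls, e.g.:
# answers = [solve(problem) for _ in range(5)]
```

Voting only helps if the samples differ, so a higher `temperature` than the Quick Start's 0.1 is usually needed for this to add anything.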

---

## 🙏 Acknowledgements

- [Unsloth](https://github.com/unslothai/unsloth): efficient fine-tuning framework
- [Qwen Team](https://huggingface.co/Qwen): Qwen2.5-0.5B-Instruct base model
- [Hugging Face TRL](https://github.com/huggingface/trl): GRPO implementation
- [OpenR1](https://huggingface.co/open-r1): OpenR1-Math-220k dataset
- [OpenAI](https://huggingface.co/openai): GSM8K dataset

---

## 📄 Citation

```bibtex
@misc{qss-reasoning-math-2025,
  author       = {Saad Salman},
  title        = {Q-SS-0.5B-Reasoning-Math},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math}},
}
```