Initialize project; model provided by the ModelHub XC community
Model: saadxsalman/Q-SS-0.5B-Reasoning-Math Source: Original Platform

---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- qwen2.5
- math
- reasoning
- grpo
- reinforcement-learning
- unsloth
- gsm8k
- structured-output
datasets:
- openai/gsm8k
- open-r1/OpenR1-Math-220k
pipeline_tag: text-generation
library_name: transformers
---

# Q-SS-0.5B-Reasoning-Math

> *A compact, fast, and structured mathematical reasoning model, built to think before it answers.*

**Q-SS-0.5B-Reasoning-Math** is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct), trained with **Group Relative Policy Optimization (GRPO)**, the reinforcement learning technique used to train DeepSeek-R1. The model is designed to reason explicitly and transparently through mathematical problems before producing a clean, parseable final answer.

> 💾 Looking for the lightweight CPU version? See [Q-SS-0.5B-Reasoning-Math-GGUF](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math-GGUF) for the Q4_K_M quantized model (~300 MB).

---
## ✨ Highlights

- 🧠 **Thinks out loud**: explicit step-by-step reasoning inside `<thought>` tags before every answer
- 🎯 **Clean structured output**: the final answer is isolated in `<answer>` tags, so it is trivial to parse
- 🔁 **RL-trained**: learned through reward signals, not just imitation
- 🔧 **Fine-tunable**: full FP16 weights, ready for further training
- 🔓 **Apache 2.0**: free for personal and commercial use

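
To make the "reward signals" idea concrete, here is a minimal sketch of how a format reward and a correctness reward could be combined for responses in the `<thought>`/`<answer>` structure. The function names, the regular expression, and the reward values (1.0 and 2.0) are illustrative assumptions; they are not the rewards actually used to train this model.

```python
import re

# Hypothetical reward shaping for illustration only; the real training
# rewards for this model are not documented in this README.
RESPONSE_RE = re.compile(
    r"^\s*<thought>(.+?)</thought>\s*<answer>(.+?)</answer>\s*$",
    re.DOTALL,
)


def format_reward(completion: str) -> float:
    """1.0 if the completion starts with a <thought> block and ends with an <answer> block."""
    return 1.0 if RESPONSE_RE.match(completion) else 0.0


def correctness_reward(completion: str, reference: str) -> float:
    """2.0 if the text inside <answer> equals the reference answer, else 0.0."""
    match = RESPONSE_RE.match(completion)
    if match is None:
        return 0.0
    predicted = match.group(2).strip()
    return 2.0 if predicted == reference.strip() else 0.0


def total_reward(completion: str, reference: str) -> float:
    """Sum of the format and correctness rewards."""
    return format_reward(completion) + correctness_reward(completion, reference)
```
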
---
## 📋 Model Details

| Property | Details |
|---|---|
| **Model Name** | Q-SS-0.5B-Reasoning-Math |
| **Base Model** | Qwen/Qwen2.5-0.5B-Instruct |
| **Parameters** | 500M |
| **Training Method** | SFT Warm-up + GRPO Reinforcement Learning |
| **Trained On** | GSM8K + OpenR1-Math-220k |
| **Precision** | FP16 (merged, no adapter needed) |
| **License** | Apache 2.0 |
| **Developer** | Saad Salman |

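
GRPO scores each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below shows that group-relative normalization on a plain list of scalar rewards. The function name, the `eps` term, and the choice of population standard deviation are assumptions made for illustration and may differ from what a given training library does.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Completions that beat the group average get a positive advantage,
    those below it get a negative one. `eps` avoids division by zero
    when every completion in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions sampled for one prompt, with made-up rewards.
advantages = group_relative_advantages([3.0, 1.0, 0.0, 3.0])
```
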
---
## 💬 Output Format

Every response follows this strict structure:

```
<thought>
[Step-by-step reasoning and calculations]
</thought>
<answer>
[Final numerical answer only]
</answer>
```

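
Because the structure is tag-based, a response can be split into its reasoning and its answer with a small regular expression. The following is a minimal sketch; `ParsedResponse` and `parse_response` are hypothetical helper names, and the choice to return `None` for a missing or unterminated tag is an assumption about how a caller would want to handle malformed output.

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    thought: Optional[str]  # reasoning text, or None if no complete <thought> block
    answer: Optional[str]   # final answer text, or None if no complete <answer> block


def _last_tag_body(tag: str, text: str) -> Optional[str]:
    """Return the stripped contents of the last complete <tag>...</tag> block."""
    matches = re.findall(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def parse_response(text: str) -> ParsedResponse:
    """Split a model response into its <thought> and <answer> parts."""
    return ParsedResponse(
        thought=_last_tag_body("thought", text),
        answer=_last_tag_body("answer", text),
    )
```
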
---
## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "saadxsalman/Q-SS-0.5B-Reasoning-Math"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the `accelerate` package
)

SYSTEM_PROMPT = """You are a mathematical reasoning engine.
Solve the problem step-by-step inside <thought> tags, then give ONLY the
final numerical or LaTeX result inside <answer> tags.

<thought>
[Your internal reasoning and calculations here]
</thought>
<answer>
[Final answer only]
</answer>"""

def solve(problem):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": problem},
    ]
    # return_dict=True yields both input_ids and attention_mask
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=384,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, so the <answer> placeholder
    # in the system prompt is never mistaken for the model's answer.
    prompt_length = inputs["input_ids"].shape[-1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
    if "<answer>" in response:
        return response.split("<answer>")[-1].split("</answer>")[0].strip()
    return response

print(solve("Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?"))
# Expected output: 42
```

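
To check answers returned by `solve` against GSM8K, note that each GSM8K reference solution ends with a line of the form `#### <number>`. The sketch below extracts that number and compares it numerically with a predicted answer. The helper names and the normalization rules (dropping `$`, thousands separators, and a trailing period) are assumptions for illustration, not an official scoring script.

```python
from decimal import Decimal, InvalidOperation


def gsm8k_reference(solution: str) -> str:
    """Return the text after the final '####' marker of a GSM8K solution."""
    return solution.split("####")[-1].strip()


def to_number(text: str):
    """Parse a numeric answer, tolerating '$', thousands separators, and a trailing period."""
    cleaned = text.strip().replace(",", "").replace("$", "").rstrip(".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None  # not a plain number (for example a LaTeX expression)


def is_correct(predicted: str, solution: str) -> bool:
    """True if the predicted answer equals the GSM8K reference numerically."""
    pred = to_number(predicted)
    ref = to_number(gsm8k_reference(solution))
    return pred is not None and ref is not None and pred == ref
```
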
---
## 📝 Example Outputs

**Problem:** Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?

```
<thought>
Each cat eats 2 cans per day.
Janet has 3 cats, so they eat 3 × 2 = 6 cans per day together.
For 7 days: 6 × 7 = 42 cans total.
</thought>
<answer>
42
</answer>
```

**Problem:** Tom has $50. He buys a book for $12 and a pen for $3. How much money does he have left?

```
<thought>
Tom starts with $50.
He spends $12 on a book and $3 on a pen.
Total spent: 12 + 3 = $15.
Money remaining: 50 - 15 = $35.
</thought>
<answer>
35
</answer>
```

---
## ✅ What It's Good At

| Problem Type | Support |
|---|---|
| Basic arithmetic | ✅ Reliable |
| Multi-step word problems | ✅ Reliable |
| Problems with units and currency | ✅ Reliable |
| Basic algebra | ⚠️ Partial |
| Competition math (AMC/AIME) | ❌ Beyond capacity |

---
## 📦 Related Models

| Repo | Format | Size | Best For |
|---|---|---|---|
| [Q-SS-0.5B-Reasoning-Math](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math) | FP16 | ~988 MB | GPU inference & further fine-tuning |
| [Q-SS-0.5B-Reasoning-Math-GGUF](https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math-GGUF) | Q4_K_M | ~300 MB | Local CPU inference |

---
## ⚠️ Limitations

- Optimized for English-language math problems only
- Complex abstract reasoning, geometry, and calculus are beyond reliable capacity at 0.5B scale
- Always verify critical calculations: the model may occasionally produce confident but incorrect answers

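
One inexpensive way to verify calculations is to recompute the simple arithmetic steps that appear in the `<thought>` block. The sketch below looks for single steps of the form `a op b = c` and checks each one exactly. `STEP_RE` and `check_steps` are hypothetical names, and the pattern is an assumption about how steps are usually written: chained expressions, numbers with thousands separators, and rounded divisions are skipped or may be flagged, so a clean result here does not prove the final answer is right.

```python
import re
from fractions import Fraction

# Matches one binary step such as "3 × 2 = 6" or "12 + 3 = $15".
# The lookbehinds skip operands that sit in the middle of a longer chain.
STEP_RE = re.compile(
    r"(?<![+\-×*/÷] )(?<![+\-×*/÷\d.,])"
    r"(\d+(?:\.\d+)?)\s*([+\-×*/÷])\s*\$?(\d+(?:\.\d+)?)"
    r"\s*=\s*\$?\s*(\d+(?:\.\d+)?)(?![,.]?\d)"
)

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "×": lambda a, b: a * b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "÷": lambda a, b: a / b,
}


def check_steps(thought: str):
    """Return (step, is_consistent) for each simple 'a op b = c' step found."""
    results = []
    for a, op, b, claimed in STEP_RE.findall(thought):
        left, right = Fraction(a), Fraction(b)
        if op in "/÷" and right == 0:
            ok = False  # division by zero can never match a claimed result
        else:
            ok = OPS[op](left, right) == Fraction(claimed)
        results.append((f"{a} {op} {b} = {claimed}", ok))
    return results
```
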
---
## 🙏 Acknowledgements

- [Unsloth](https://github.com/unslothai/unsloth): efficient fine-tuning framework
- [Qwen Team](https://huggingface.co/Qwen): Qwen2.5-0.5B-Instruct base model
- [Hugging Face TRL](https://github.com/huggingface/trl): GRPO implementation
- [OpenR1](https://huggingface.co/open-r1): OpenR1-Math-220k dataset
- [OpenAI](https://huggingface.co/openai): GSM8K dataset

---
## 📄 Citation

```bibtex
@misc{qss-reasoning-math-2025,
  author       = {Saad Salman},
  title        = {Q-SS-0.5B-Reasoning-Math},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math}},
}
```