Math-RL/README.md
ModelHub XC 5b140022fc Initialize project; model provided by the ModelHub XC community
Model: khazarai/Math-RL
Source: Original Platform
2026-05-04 07:21:46 +08:00

---
base_model: unsloth/Qwen2.5-0.5B-Instruct
library_name: transformers
license: apache-2.0
datasets:
- HoangHa/pensez-grpo
language:
- en
pipeline_tag: text-generation
tags:
- math
- trl
- unsloth
- grpo
- transformers
---
# Model Card for Math-RL
## Model Details
This model is a fine-tuned version of Qwen2.5-0.5B-Instruct, optimized with Group Relative Policy Optimization (GRPO) on a curated math dataset of 700 problems.
The fine-tuning process aims to enhance the model's step-by-step reasoning ability in mathematical problem solving, improving its performance on structured reasoning tasks.
### Model Description
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** Qwen2.5-0.5B-Instruct
- **Fine-tuning Method**: GRPO with LoRA
- **Domain**: Mathematics (problem-solving, reasoning)
- **Dataset Size**: ~700 examples
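
This card does not describe the training configuration, so the following is only a general illustration of the group-relative idea behind GRPO, not the code used to train this model. For one prompt, several completions are sampled and scored, and each completion's advantage is its reward normalized against the rest of its group. The function name, the example reward values, the use of the population standard deviation, and the `eps` term are assumptions made for this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per completion sampled for the same
    prompt. `eps` (an assumption of this sketch) avoids division by zero
    when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one math problem:
# 1.0 for a correct final answer, 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above their group's average receive a positive advantage and the others a negative one, so no separate value model is needed to provide a baseline.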
## Uses
### Direct Use
The model is intended for:
- Educational purposes: assisting students with math problems
- Research on small-scale RLHF-style fine-tuning (GRPO)
- Experiments in reasoning with small instruction-tuned models
- Serving as a lightweight math reasoning assistant in constrained environments
## Bias, Risks, and Limitations
- Small Dataset: Fine-tuned only on 700 math problems, so generalization is limited.
- Reasoning Errors: May produce incorrect or hallucinated answers. Always verify results.
- Not a Math Oracle: Should not be used in high-stakes scenarios (e.g., exams, grading, critical calculations).
- Limited Scope: Performance is strongest on problems similar to those in the fine-tuning dataset and may degrade on problems outside that domain.
- Language: While the base model supports multiple languages, math-specific fine-tuning was primarily English-based.
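
As one way to verify a result independently, the sketch below numerically checks the trigonometry problem used in the getting-started example of this card. It is a minimal sketch under stated assumptions: the function `f` is derived by hand from the problem statement (shift `sin 2x` left by π/6, then double the ordinate), and the grid size `n` is an arbitrary choice.

```python
import math


def f(x: float) -> float:
    # y = sin(2x) shifted left by pi/6, then stretched vertically by 2.
    return 2.0 * math.sin(2.0 * (x + math.pi / 6.0))


# Sample the closed interval [0, pi/2] on a fine grid (n is an arbitrary choice).
n = 10_000
xs = [i / n * (math.pi / 2.0) for i in range(n + 1)]
f_min = min(f(x) for x in xs)

# The problem states that min(f(x) + a) = sqrt(3), so a = sqrt(3) - min f(x).
a = math.sqrt(3.0) - f_min
```

Here the minimum of `f` is attained at the right endpoint, so `a` comes out as 2√3 (about 3.4641), which can be compared against the content of the model's `<answer>` block.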
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("khazarai/Math-RL")
model = AutoModelForCausalLM.from_pretrained(
    "khazarai/Math-RL",
    device_map={"": 0},  # place the whole model on GPU 0
)

# Raw string so LaTeX backslashes (\right, \boxed, ...) are not treated as escapes.
question = r"""
Translate the graph of the function $y=\sin 2x$ along the $x$-axis to the left by $\dfrac{\pi }{6}$ units, and stretch the ordinate to twice its original length (the abscissa remains unchanged) to obtain the graph of the function $y=f(x)$. If the minimum value of the function $y=f(x)+a$ on the interval $\left[ 0,\dfrac{\pi }{2} \right]$ is $\sqrt{3}$, then $a=\boxed{\_\_\_\_\_}$.
"""

# System prompt asking for the reasoning/answer output format.
system = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": question},
]

# Render the chat as a prompt string; add_generation_prompt appends the
# assistant turn header so the model starts its reply.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Stream generated tokens to stdout as they are produced.
_ = model.generate(
    **tokenizer(text, return_tensors="pt").to(model.device),
    max_new_tokens=2048,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
```
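
The system prompt above asks the model to wrap its output in `<reasoning>` and `<answer>` tags. If you decode the generated tokens into a string (for example with `tokenizer.decode`) instead of only streaming them, a small parser can pull the two sections apart. This is a minimal sketch: `parse_response` is a hypothetical helper, not part of this model or of `transformers`, and it assumes the model actually followed the requested format.

```python
import re


def parse_response(text: str) -> dict:
    """Extract the <reasoning> and <answer> sections from a response.

    A section whose tags are missing or unclosed is returned as None,
    since a small model may not always follow the requested format.
    """
    def grab(tag: str):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        return match.group(1).strip() if match else None

    return {"reasoning": grab("reasoning"), "answer": grab("answer")}


# Hand-written sample text, not real model output.
sample = "<reasoning>\n2 + 2 = 4\n</reasoning>\n<answer>\n4\n</answer>"
parsed = parse_response(sample)
```

Checking `parsed["answer"] is None` is a cheap way to detect responses that ignored the format before using them further.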