---
base_model: unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
license: apache-2.0
language:
- en
---
# Reasoning Qwen2.5 1.5B
A Qwen2.5 1.5B model fine-tuned to solve grade-school math with an explicit output structure: a short scratchpad in `<reasoning>…</reasoning>` and a single final number in `<answer>…</answer>`.

Training code: https://github.com/KickItLikeShika/llm-reasoning
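Given that output structure, responses can be parsed with a small helper. The tag names come from the format above; the helper itself (`parse_response`) is a hypothetical sketch, not part of the released code:

```python
import re

def parse_response(text: str):
    """Extract the scratchpad and final answer from a response shaped as
    <reasoning>...</reasoning> followed by <answer>...</answer>."""
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        reasoning.group(1).strip() if reasoning else None,
        answer.group(1).strip() if answer else None,
    )

sample = "<reasoning>3 * 4 = 12</reasoning>\n<answer>12</answer>"
print(parse_response(sample))  # ('3 * 4 = 12', '12')
```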
I split the training into two stages:
1. A short LoRA SFT run on 100 random GSM8K training examples, to teach the output format and roughly sensible traces, not to maximize benchmark score.
2. GRPO on top of that adapter for 2,000 steps.
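The exact reward functions used for GRPO are not listed here; a minimal sketch of the kind of rewards commonly paired with this tag format (a format check plus exact-match correctness, both assumptions on my part, not the released training code) might look like:

```python
import re

def format_reward(completion: str) -> float:
    # Hypothetical: reward completions that follow the tag structure exactly.
    pattern = r"^<reasoning>.*?</reasoning>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0

def correctness_reward(completion: str, gold: str) -> float:
    # Hypothetical: exact match between the extracted answer and the
    # GSM8K gold answer, weighted higher than the format reward.
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 2.0 if m and m.group(1).strip() == gold.strip() else 0.0
```

Rewards of this shape are what GRPO averages over sampled completions per prompt; see the linked repository and W&B report for the actual setup.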
W&B Report: https://api.wandb.ai/links/egyttsteam/uja0job9