| base_model | tags | license | language |
|---|---|---|---|
| unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit | | apache-2.0 | |
Reasoning Qwen2.5 1.5B
A Qwen2.5 1.5B model fine-tuned to solve grade-school math problems with an explicit output structure: a short scratchpad in <reasoning>…</reasoning> followed by a single final number in <answer>…</answer>.
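To make the output format concrete, here is a minimal sketch of how a completion in this format could be parsed. Only the two tag names come from this card; the helper name `parse_completion`, its return shape, and the sample text are hypothetical and not part of the training repo.

```python
import re

# Hypothetical helper: the <reasoning>/<answer> tag names come from the model
# card; the function name and return shape are assumptions for illustration.
_PATTERN = re.compile(
    r"<reasoning>(?P<reasoning>.*?)</reasoning>\s*<answer>(?P<answer>.*?)</answer>",
    re.DOTALL,
)


def parse_completion(text):
    """Return (reasoning, answer) as stripped strings, or None if the tags are missing."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return match.group("reasoning").strip(), match.group("answer").strip()


# Made-up sample completion, not real model output.
sample = "<reasoning>3 boxes of 4 apples is 3 * 4 = 12.</reasoning>\n<answer>12</answer>"
print(parse_completion(sample))
```

Returning `None` on a malformed completion lets the caller decide whether to retry generation or count the sample as a format failure.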
Training: https://github.com/KickItLikeShika/llm-reasoning
I split the training into two stages:
- Short LoRA SFT on 100 random GSM8K training examples, to teach the output format and roughly sensible reasoning traces, not to maximize benchmark score.
- GRPO on top of that adapter for 2,000 steps.
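This card does not say which reward functions the GRPO stage used. As an illustration only, the sketch below shows one common shape for rewards on this kind of task: a format reward that checks the tag structure and a correctness reward that compares the number in <answer> with the reference answer. The function names, the 0/1 reward values, and the number normalization are all assumptions, not the author's implementation.

```python
import re

# Assumption: a completion is "well formed" if it is exactly one reasoning
# block followed by one answer block, with optional surrounding whitespace.
STRICT = re.compile(
    r"\s*<reasoning>.*?</reasoning>\s*<answer>(.*?)</answer>\s*",
    re.DOTALL,
)


def format_reward(completion):
    """1.0 if the completion follows the tag structure, else 0.0 (assumed values)."""
    return 1.0 if STRICT.fullmatch(completion) else 0.0


def _to_number(text):
    """Parse a number, ignoring thousands separators and a dollar sign."""
    try:
        return float(text.replace(",", "").replace("$", "").strip())
    except ValueError:
        return None


def correctness_reward(completion, gold):
    """1.0 if the number inside <answer> equals the reference answer, else 0.0."""
    match = STRICT.fullmatch(completion)
    if match is None:
        return 0.0
    predicted = _to_number(match.group(1))
    target = _to_number(gold)
    if predicted is None or target is None:
        return 0.0
    return 1.0 if predicted == target else 0.0
```

Keeping the two signals separate makes it easy to see during training whether the policy is losing reward on formatting or on arithmetic.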
W&B Report: https://api.wandb.ai/links/egyttsteam/uja0job9