---
base_model: unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
license: apache-2.0
language:
- en
---
# Reasoning Qwen2.5 1.5B
Reasoning Qwen2.5 1.5B is a model fine-tuned to solve grade-school math problems with explicit structure: a short scratchpad in `<reasoning>…</reasoning>` followed by a single final number in `<answer>…</answer>`.
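A completion in this format can be parsed with a small helper. This is a sketch of my own; only the tag names come from this card:

```python
import re

def parse_completion(text: str):
    """Extract the scratchpad and final answer from a model completion.

    Returns (reasoning, answer) as stripped strings, or (None, None)
    if either tag pair is missing.
    """
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    if reasoning is None or answer is None:
        return None, None
    return reasoning.group(1).strip(), answer.group(1).strip()

completion = (
    "<reasoning>Natalia sold 48 clips in April and half as many, 24, "
    "in May, so 48 + 24 = 72.</reasoning>\n<answer>72</answer>"
)
reasoning, answer = parse_completion(completion)
print(answer)  # 72
```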
Training: https://github.com/KickItLikeShika/llm-reasoning
I split training into two stages:
1. A short LoRA SFT run on 100 random GSM8K training examples, to teach the output format and roughly sensible reasoning traces, not to maximize benchmark score.
2. GRPO on top of that adapter for 2,000 steps.
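The GRPO stage needs reward functions over sampled completions. Below is a minimal illustrative sketch, not the repo's actual code: one reward for following the `<reasoning>`/`<answer>` structure and one for an exact match on the final number (the 0.5/2.0 magnitudes are my own placeholder choices):

```python
import re

ANSWER_RE = re.compile(r"<answer>\s*([^<]+?)\s*</answer>")
FORMAT_RE = re.compile(
    r"^<reasoning>.*?</reasoning>\s*<answer>.*?</answer>$", re.DOTALL
)

def format_reward(completion: str) -> float:
    """0.5 if the completion follows the <reasoning>/<answer> structure."""
    return 0.5 if FORMAT_RE.match(completion.strip()) else 0.0

def correctness_reward(completion: str, gold: str) -> float:
    """2.0 if the extracted answer exactly matches the gold answer."""
    m = ANSWER_RE.search(completion)
    return 2.0 if m and m.group(1) == gold.strip() else 0.0

good = "<reasoning>48 + 24 = 72.</reasoning>\n<answer>72</answer>"
print(format_reward(good) + correctness_reward(good, "72"))  # 2.5
```

GRPO then normalizes these rewards across a group of completions per prompt to form advantages, so only the relative ordering of the scores matters.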
W&B Report: https://api.wandb.ai/links/egyttsteam/uja0job9