Model: KickItLikeShika/Qwen2.5-1.5B-Instruct-SFT-GRPO-GSM8K

---
base_model: unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
license: apache-2.0
language:
- en
---

# Reasoning Qwen2.5 1.5B

A Qwen2.5 1.5B reasoning model fine-tuned to solve grade-school math (GSM8K) with an explicit output structure: a short scratchpad in `<reasoning>…</reasoning>` followed by a single final number in `<answer>…</answer>`.

Training code: https://github.com/KickItLikeShika/llm-reasoning

I split training into two stages:

  1. A short LoRA SFT run on 100 random GSM8K training examples, to teach the output format and roughly sensible reasoning traces, not to maximize benchmark score.
  2. GRPO on top of that adapter for 2,000 steps.
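GRPO scores each sampled completion with scalar rewards. A common recipe for GSM8K, and an assumption here rather than the repo's exact code, combines a format reward (did the model emit the tag structure?) with a larger exact-match correctness reward; the function names and reward magnitudes below are illustrative:

```python
import re

# A completion counts as well-formed if it has a <reasoning> block followed
# by an <answer> block; group(1) captures the predicted final number.
ANSWER_RE = re.compile(
    r"<reasoning>.*?</reasoning>\s*<answer>\s*(.+?)\s*</answer>", re.DOTALL
)

def format_reward(completion: str) -> float:
    """Small reward for following the <reasoning>/<answer> structure."""
    return 0.5 if ANSWER_RE.search(completion) else 0.0

def correctness_reward(completion: str, gold: str) -> float:
    """Larger reward for an exact match against the gold answer."""
    m = ANSWER_RE.search(completion)
    return 2.0 if m and m.group(1) == gold else 0.0

good = "<reasoning>2 + 2 = 4</reasoning>\n<answer>4</answer>"
bad = "The answer is 4."
print(format_reward(good) + correctness_reward(good, "4"))  # → 2.5
print(format_reward(bad) + correctness_reward(bad, "4"))    # → 0.0
```

Keeping the format reward smaller than the correctness reward nudges the policy toward the tag structure early without letting it plateau on formatting alone.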

W&B Report: https://api.wandb.ai/links/egyttsteam/uja0job9