Model: KickItLikeShika/Qwen2.5-1.5B-Instruct-SFT-GRPO-GSM8K

---
base_model: unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
license: apache-2.0
language:
- en
---

# Reasoning Qwen2.5 1.5B

A Qwen2.5 1.5B reasoning model fine-tuned to solve grade-school math (GSM8K) with an explicit output structure: a short scratchpad in `<reasoning>…</reasoning>` followed by a single final number in `<answer>…</answer>`.

Training code: https://github.com/KickItLikeShika/llm-reasoning

I split training into two stages:

  1. A short LoRA SFT run on 100 random GSM8K training examples, to teach the output format and roughly sensible reasoning traces, not to maximize benchmark score.
  2. GRPO on top of that adapter for 2,000 steps.
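GRPO scores each sampled completion with scalar rewards. A common recipe for GSM8K, and an assumption here rather than the repo's exact code, combines a format reward (did the model emit the tag structure?) with a larger exact-match correctness reward; the function names and reward magnitudes below are illustrative:

```python
import re

# A completion counts as well-formed if it has a <reasoning> block followed
# by an <answer> block; group(1) captures the predicted final number.
ANSWER_RE = re.compile(
    r"<reasoning>.*?</reasoning>\s*<answer>\s*(.+?)\s*</answer>", re.DOTALL
)

def format_reward(completion: str) -> float:
    """Small reward for following the <reasoning>/<answer> structure."""
    return 0.5 if ANSWER_RE.search(completion) else 0.0

def correctness_reward(completion: str, gold: str) -> float:
    """Larger reward for an exact match against the gold answer."""
    m = ANSWER_RE.search(completion)
    return 2.0 if m and m.group(1) == gold else 0.0

good = "<reasoning>2 + 2 = 4</reasoning>\n<answer>4</answer>"
bad = "The answer is 4."
print(format_reward(good) + correctness_reward(good, "4"))  # → 2.5
print(format_reward(bad) + correctness_reward(bad, "4"))    # → 0.0
```

Keeping the format reward smaller than the correctness reward nudges the policy toward the tag structure early without letting it plateau on formatting alone.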

W&B Report: https://api.wandb.ai/links/egyttsteam/uja0job9