Initialize the project; model provided by the ModelHub XC community
Model: KickItLikeShika/Qwen2.5-1.5B-Instruct-SFT-GRPO-GSM8K
Source: Original Platform
This commit is contained in:
README.md (new file, 25 lines)
---
base_model: unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
license: apache-2.0
language:
- en
---

# Reasoning Qwen2.5 1.5B

A Reasoning Qwen2.5 1.5B model for grade-school math word problems, trained to produce explicit structure: a short scratchpad in `<reasoning>…</reasoning>` followed by a single final number in `<answer>…</answer>`.
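Completions in this layout are easy to post-process. A minimal sketch of a parser for the tagged output (the helper name and example completion are illustrative, not from the training repo):

```python
import re


def parse_tagged_output(text: str):
    """Extract the <reasoning> and <answer> spans from a completion.

    Returns (reasoning, answer); either element is None if its tag is missing.
    """
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        reasoning.group(1).strip() if reasoning else None,
        answer.group(1).strip() if answer else None,
    )


completion = (
    "<reasoning>3 apples + 4 apples = 7 apples.</reasoning>\n"
    "<answer>7</answer>"
)
print(parse_tagged_output(completion))
# → ('3 apples + 4 apples = 7 apples.', '7')
```

Only the `<answer>` span is needed for GSM8K scoring; the scratchpad is there for inspection.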
Training: https://github.com/KickItLikeShika/llm-reasoning
I split the training into two stages:
1. A short LoRA SFT run on 100 random GSM8K training examples, meant to teach the output format and roughly sensible reasoning traces, not to maximize benchmark score.
2. GRPO on top of that adapter for 2,000 steps.
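The GRPO stage is driven by reward functions. The exact rewards used here live in the linked repo; below is only a sketch of the common format/correctness pair for this tag layout (function names and reward magnitudes are illustrative):

```python
import re


def format_reward(completion: str) -> float:
    """Reward completions that follow <reasoning>…</reasoning><answer>…</answer>."""
    pattern = r"^<reasoning>.*?</reasoning>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0


def correctness_reward(completion: str, gold: str) -> float:
    """Reward completions whose <answer> matches the gold GSM8K answer."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 2.0 if m and m.group(1).strip() == gold.strip() else 0.0


good = "<reasoning>2 + 2 = 4.</reasoning>\n<answer>4</answer>"
print(format_reward(good), correctness_reward(good, "4"))
# → 1.0 2.0
```

Keeping format and correctness as separate rewards lets GRPO shape the tag structure early, before correct answers become frequent enough to carry signal.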
W&B Report: https://api.wandb.ai/links/egyttsteam/uja0job9