---
base_model: unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
license: apache-2.0
language:
- en
---

# Reasoning Qwen2.5 1.5B

Reasoning Qwen2.5 1.5B is a model trained to solve grade-school math with explicit structure: a short scratchpad in `` and a single final number in ``.

Training: https://github.com/KickItLikeShika/llm-reasoning

I split the training into two stages:

1. Short LoRA SFT on 100 random GSM8K training examples, to teach the format and roughly sensible traces, not to maximize benchmark score.
2. GRPO on top of that adapter for 2,000 steps.

W&B Report: https://api.wandb.ai/links/egyttsteam/uja0job9
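
As a rough illustration of the stage 1 data preparation described above, here is a minimal sketch of how 100 random GSM8K examples could be sampled and turned into structured targets. It is not the code from the training repository. The helper names (`split_gsm8k_answer`, `sample_sft_subset`, `to_target`), the seed, and the default tag names `reasoning` and `answer` are all hypothetical placeholders. The only dataset fact relied on is that GSM8K answers end with a `#### <number>` line.

```python
import random


def split_gsm8k_answer(answer: str) -> tuple[str, str]:
    """Split a GSM8K answer string into (rationale, final number)."""
    if "####" not in answer:
        raise ValueError("expected a GSM8K answer ending in '#### <number>'")
    rationale, _, final = answer.rpartition("####")
    # GSM8K may write large numbers with thousands separators, e.g. "1,000".
    return rationale.strip(), final.strip().replace(",", "")


def sample_sft_subset(records: list[dict], k: int = 100, seed: int = 0) -> list[dict]:
    """Draw k records without replacement, reproducibly (seed is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(records, k)


def to_target(
    rationale: str,
    final: str,
    reasoning_tag: str = "reasoning",  # hypothetical tag name
    answer_tag: str = "answer",        # hypothetical tag name
) -> str:
    """Wrap the scratchpad and the final number in two separate tags."""
    return (
        f"<{reasoning_tag}>\n{rationale}\n</{reasoning_tag}>\n"
        f"<{answer_tag}>\n{final}\n</{answer_tag}>"
    )


if __name__ == "__main__":
    # Tiny stand-in record in GSM8K's question/answer layout.
    record = {
        "question": "Tom has 3 bags with 4 apples each. How many apples?",
        "answer": "3 bags times 4 apples is 12 apples.\n#### 12",
    }
    rationale, final = split_gsm8k_answer(record["answer"])
    print(to_target(rationale, final))
```

In practice `records` would be the GSM8K training split, and the tag names passed to `to_target` should match whatever format the model was actually trained to emit.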