Initialize the project; model provided by the ModelHub XC community
Model: KickItLikeShika/Qwen2.5-1.5B-Instruct-SFT-GRPO-GSM8K
Source: Original Platform
This commit is contained in:
README.md (new file, 25 lines)
---
base_model: unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
license: apache-2.0
language:
- en
---

# Reasoning Qwen2.5 1.5B

A Reasoning Qwen2.5 1.5B model for grade-school math word problems, trained to produce explicit structure: a short scratchpad in `<reasoning>…</reasoning>` followed by a single final number in `<answer>…</answer>`.
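Completions in this layout are easy to post-process. A minimal sketch of a parser for the tagged output (the helper name and example completion are illustrative, not from the training repo):

```python
import re


def parse_tagged_output(text: str):
    """Extract the <reasoning> and <answer> spans from a completion.

    Returns (reasoning, answer); either element is None if its tag is missing.
    """
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        reasoning.group(1).strip() if reasoning else None,
        answer.group(1).strip() if answer else None,
    )


completion = (
    "<reasoning>3 apples + 4 apples = 7 apples.</reasoning>\n"
    "<answer>7</answer>"
)
print(parse_tagged_output(completion))
# → ('3 apples + 4 apples = 7 apples.', '7')
```

Only the `<answer>` span is needed for GSM8K scoring; the scratchpad is there for inspection.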
Training: https://github.com/KickItLikeShika/llm-reasoning
I split the training into two stages:
1. A short LoRA SFT run on 100 random GSM8K training examples, meant to teach the output format and roughly sensible reasoning traces, not to maximize benchmark score.
2. GRPO on top of that adapter for 2,000 steps.
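The GRPO stage is driven by reward functions. The exact rewards used here live in the linked repo; below is only a sketch of the common format/correctness pair for this tag layout (function names and reward magnitudes are illustrative):

```python
import re


def format_reward(completion: str) -> float:
    """Reward completions that follow <reasoning>…</reasoning><answer>…</answer>."""
    pattern = r"^<reasoning>.*?</reasoning>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0


def correctness_reward(completion: str, gold: str) -> float:
    """Reward completions whose <answer> matches the gold GSM8K answer."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 2.0 if m and m.group(1).strip() == gold.strip() else 0.0


good = "<reasoning>2 + 2 = 4.</reasoning>\n<answer>4</answer>"
print(format_reward(good), correctness_reward(good, "4"))
# → 1.0 2.0
```

Keeping format and correctness as separate rewards lets GRPO shape the tag structure early, before correct answers become frequent enough to carry signal.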
W&B Report: https://api.wandb.ai/links/egyttsteam/uja0job9