Model: KeeganCarey/gemma-3-1b-it-amr_thinking
| license | tags | library_name | language | base_model |
|---|---|---|---|---|
| gemma | | transformers | | chimbiwide/gemma-3-1b-it-thinking-32k-sft-base |
# GemmaThink-32k (GRPO Trained)
This model was trained using GRPO (Group Relative Policy Optimization) to generate structured reasoning traces.
## Training Details
- Base Model: chimbiwide/gemma-3-1b-it-thinking-32k-sft-base
- Training Method: SFT + GRPO
- LoRA Rank: 32
- LoRA Alpha: 64.0
- Framework: Tunix (JAX)
- Hardware: v6e-1 TPU in Colab
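The rank and alpha above imply a scaling factor of alpha / rank = 64.0 / 32 = 2.0 applied to the low-rank update. A minimal pure-Python sketch of that update rule, W' = W + (alpha / rank) · B·A, using toy 2×2 matrices (not the model's actual weights):

```python
# LoRA hyperparameters from this card; the matrices below are toy values.
RANK = 32
ALPHA = 64.0
scaling = ALPHA / RANK  # multiplier applied to the low-rank delta (= 2.0)

def apply_lora(w, a, b, scaling):
    """Return W' = W + scaling * (B @ A) with plain nested-list matrices.

    w: out_dim x in_dim frozen base weight
    a: r x in_dim  (LoRA "A" matrix)
    b: out_dim x r (LoRA "B" matrix)
    """
    rows, cols = len(w), len(w[0])
    r = len(a)  # rank of the toy adapter (1 here, 32 in the real model)
    out = [[w[i][j] for j in range(cols)] for i in range(rows)]
    for i in range(rows):
        for j in range(cols):
            out[i][j] += scaling * sum(b[i][k] * a[k][j] for k in range(r))
    return out

w = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (toy identity)
b = [[0.5], [0.0]]            # B: out_dim x r, r = 1 for brevity
a = [[1.0, 1.0]]              # A: r x in_dim
print(apply_lora(w, a, b, scaling))  # → [[2.0, 1.0], [0.0, 1.0]]
```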
## Output Format

```
<reasoning>step-by-step thinking process</reasoning>
<answer>final answer</answer>
```
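The two tags above can be separated with a small regex-based parser. This is an illustrative sketch, not a helper shipped with the model:

```python
import re

def parse_trace(text):
    """Split the model's <reasoning>...</reasoning><answer>...</answer>
    output into (reasoning, answer); either part is None if its tag
    is missing from the generation."""
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        reasoning.group(1).strip() if reasoning else None,
        answer.group(1).strip() if answer else None,
    )

sample = "<reasoning>2 + 2 combines two pairs.</reasoning><answer>4</answer>"
print(parse_trace(sample))  # → ('2 + 2 combines two pairs.', '4')
```

`re.DOTALL` lets the reasoning span multiple lines, and the non-greedy `.*?` stops at the first closing tag.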
## Quicklinks
- SFT Base Model
- SFT Base Model Q8 GGUF
- GRPO Full Model <-- You're here
- Q8-GGUF
- Article
## Description