| license | tags | library_name | language | base_model |
|---|---|---|---|---|
| gemma | gemma, gemma3, tunix, grpo, reasoning, thinking | transformers | | chimbiwide/gemma-3-1b-it-thinking-32k-sft-base |
GemmaThink-32k (GRPO Trained)
This model was trained using GRPO (Group Relative Policy Optimization) to generate structured reasoning traces.
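As a rough illustration of the "group relative" part of GRPO: several completions are sampled for the same prompt, each is scored by a reward function, and each reward is normalized against the mean and standard deviation of its own group. The sketch below shows only that normalization step. The function name, the `eps` value, the use of the population standard deviation, and the example rewards are assumptions for illustration, not details taken from this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one prompt's completion rewards against their own group.

    Hypothetical helper: returns (r - group_mean) / (group_std + eps)
    for each reward r in the group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four completions sampled from the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions that score above their group's average receive a positive advantage and those below receive a negative one, so no separate value network is needed to provide a baseline.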
Training Details
- Base Model: chimbiwide/gemma-3-1b-it-thinking-32k-sft-base
- Training Method: SFT + GRPO
- LoRA Rank: 32
- LoRA Alpha: 64.0
- Framework: Tunix (JAX)
- Hardware: v6e-1 TPU in Colab
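To make the LoRA settings above concrete: in the standard LoRA formulation, the low-rank update `B @ A` is multiplied by `alpha / rank` before being added to the frozen weight, so rank 32 with alpha 64.0 would give a scale of 2.0. The sketch below assumes that standard scaling; whether Tunix applies exactly this convention is not stated here (some variants scale by `alpha / sqrt(rank)` instead). The helper names and the toy matrices are hypothetical.

```python
def lora_scale(alpha, rank):
    """Scaling factor under the standard LoRA convention (assumed here)."""
    return alpha / rank


def lora_delta(B, A, alpha):
    """Return (alpha / rank) * (B @ A) using plain nested lists.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    rank = len(A)
    scale = lora_scale(alpha, rank)
    return [
        [scale * sum(B[i][k] * A[k][j] for k in range(rank))
         for j in range(len(A[0]))]
        for i in range(len(B))
    ]


# Values from the list above: alpha 64.0, rank 32.
scale = lora_scale(64.0, 32)

# Toy rank-2 example with made-up numbers (not real adapter weights).
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = [[1.0, 2.0], [3.0, 4.0]]
delta = lora_delta(B, A, alpha=4.0)
```

Under this convention, doubling alpha at a fixed rank doubles the size of the adapter's contribution to each adapted weight matrix.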
Output Format
Quicklinks: