  • license: gemma
  • tags: gemma, gemma3, tunix, grpo, reasoning, thinking
  • library_name: transformers
  • language: en
  • base_model: chimbiwide/gemma-3-1b-it-thinking-32k-sft-base

GemmaThink-32k (GRPO Trained)

This model was trained using GRPO (Group Relative Policy Optimization) to generate structured reasoning traces.
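
To illustrate what "group relative" means, here is a minimal sketch of the advantage computation commonly described for GRPO: several completions are sampled for one prompt, each is scored, and every reward is normalized against the mean and standard deviation of its own group. The function name `group_relative_advantages` and the `eps` parameter are assumptions made for this example; this is not the Tunix implementation and may differ from what was used to train this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other completions in its group.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. `eps` (an assumed value) avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two rewarded, two not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group mean get a positive advantage and those below get a negative one, so no separate value model is needed to form a baseline.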

Training Details

  • Base Model: chimbiwide/gemma-3-1b-it-thinking-32k-sft-base
  • Training Method: SFT + GRPO
  • LoRA Rank: 32
  • LoRA Alpha: 64.0
  • Framework: Tunix (JAX)
  • Hardware: v6e-1 TPU in Colab
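
For readers unfamiliar with the LoRA settings above, the following sketch shows how rank and alpha interact under the usual LoRA convention, where the low-rank update B·A is scaled by alpha / rank. With rank 32 and alpha 64.0 that factor is 2.0. The helper names `lora_scaling` and `lora_delta` are hypothetical, and whether Tunix applies exactly this scaling is an assumption.

```python
def lora_scaling(alpha, rank):
    """Scaling factor applied to the low-rank update (usual convention)."""
    return alpha / rank


def lora_delta(B, A, alpha, rank):
    """Return (alpha / rank) * (B @ A) for small list-of-lists matrices.

    B has shape (d_out, rank) and A has shape (rank, d_in); the result
    is the weight update added to the frozen base weight.
    """
    s = lora_scaling(alpha, rank)
    inner = len(A)
    return [
        [s * sum(B[i][k] * A[k][j] for k in range(inner)) for j in range(len(A[0]))]
        for i in range(len(B))
    ]


# Values from the training details above.
scale = lora_scaling(64.0, 32)
```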

Output Format

<reasoning>step-by-step thinking process</reasoning>
<answer>final answer</answer>
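
A small sketch of how a caller might split a completion in this format into its two parts. The function name `parse_output` is hypothetical, and the sketch assumes the model emits one `<reasoning>` block followed by one `<answer>` block; it returns `None` for anything else.

```python
import re

# Non-greedy groups with DOTALL so multi-line reasoning is captured.
_OUTPUT_PATTERN = re.compile(
    r"<reasoning>(.*?)</reasoning>\s*<answer>(.*?)</answer>",
    re.DOTALL,
)


def parse_output(text):
    """Return {'reasoning': ..., 'answer': ...}, or None if tags are missing."""
    match = _OUTPUT_PATTERN.search(text)
    if match is None:
        return None
    reasoning, answer = (part.strip() for part in match.groups())
    return {"reasoning": reasoning, "answer": answer}


sample = "<reasoning>2 + 2 equals 4.</reasoning>\n<answer>4</answer>"
parsed = parse_output(sample)
```

Returning `None` rather than raising lets downstream code treat a malformed completion as a formatting failure, which is often how format rewards are scored.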

Description

Model synced from source: KeeganCarey/gemma-3-1b-it-amr_thinking