---
license: gemma
tags:
- gemma
- gemma3
- tunix
- grpo
- reasoning
- thinking
library_name: transformers
language:
- en
base_model:
- chimbiwide/gemma-3-1b-it-thinking-32k-sft-base
---

# GemmaThink-32k (GRPO Trained)

This model was trained with GRPO (Group Relative Policy Optimization) to generate structured reasoning traces.

## Training Details

- **Base Model**: chimbiwide/gemma-3-1b-it-thinking-32k-sft-base
- **Training Method**: SFT + GRPO
- **LoRA Rank**: 32
- **LoRA Alpha**: 64.0
- **Framework**: Tunix (JAX)
- **Hardware**: v6e-1 TPU in Colab

## Output Format

```
step-by-step thinking process
final answer
```

## Quicklinks

- ***[SFT Base Model](https://huggingface.co/chimbiwide/gemma-3-1b-it-thinking-32k-sft-base)***
- ***[SFT Base Model Q8 GGUF](https://huggingface.co/chimbiwide/gemma-3-1b-it-thinking-32k-sft-base-Q8_0-GGUF)***
- ***[GRPO Full Model](https://huggingface.co/chimbiwide/gemma-3-1b-it-thinking-32k-grpo-merged)*** <-- You're here
- ***[Q8-GGUF](https://huggingface.co/chimbiwide/gemma-3-1b-it-thinking-32k-grpo-merged-Q8_0-GGUF)***
- ***[Article](https://huggingface.co/blog/chimbiwide/gemma3think)***
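Downstream code usually needs to separate the thinking trace from the final answer. The card does not document the exact delimiters the model emits, so the tags below (`<reasoning>`/`<answer>`) are purely hypothetical placeholders; a minimal splitting sketch under that assumption:

```python
import re

# Hypothetical delimiters: the exact tags this model emits are not stated
# in this card. Swap in the real markers observed in actual generations.
TRACE_RE = re.compile(
    r"<reasoning>(.*?)</reasoning>\s*<answer>(.*?)</answer>",
    re.DOTALL,
)

def split_trace(text: str):
    """Return (thinking, answer); if no trace markers are found,
    treat the whole generation as the answer."""
    match = TRACE_RE.search(text)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return None, text.strip()
```

This keeps the fallback lenient on purpose: a reasoning fine-tune can occasionally skip the trace, and the caller still gets a usable answer string.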