---
library_name: transformers
base_model:
- google/gemma-3-1b-it
tags:
- gemma3
- math
- merged
license: gemma
---
# Gemma 3 1B IT — MetaMathQA Merged (α=0.5)
A merged model created by interpolating the weights of a MetaMathQA-finetuned Gemma 3 1B IT with the original base model.
## Method
1. **Fine-tune** `google/gemma-3-1b-it` on 7,000 samples from [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) using SFT.
2. **Merge** the fine-tuned weights back into the base model via linear interpolation with α=0.5:

$$\theta_{\text{merged}} = \alpha \cdot \theta_{\text{FT}} + (1 - \alpha) \cdot \theta_{\text{base}}$$

On the benchmarks below, this simple average scores higher on GSM8K than the fine-tuned model alone, while recovering much of the base model's instruction-following ability (IFEval) that fine-tuning degrades.
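
The card does not say how the merge was implemented. The following is a minimal sketch of the interpolation formula above, assuming both checkpoints expose the same parameter names. The function name `interpolate_state_dicts` and the toy values are illustrative and not taken from the original work.

```python
def interpolate_state_dicts(base_state, ft_state, alpha=0.5):
    """Return alpha * ft + (1 - alpha) * base for every parameter.

    Values may be torch tensors, NumPy arrays, or plain floats: anything
    that supports scalar multiplication and addition.
    """
    if base_state.keys() != ft_state.keys():
        raise ValueError("base and fine-tuned checkpoints have different parameter names")
    return {
        name: alpha * ft_state[name] + (1.0 - alpha) * base_state[name]
        for name in base_state
    }


# Toy check with scalar "weights" (hypothetical values, not real parameters).
base = {"w": 1.0, "b": 0.0}
ft = {"w": 3.0, "b": 2.0}
merged = interpolate_state_dicts(base, ft, alpha=0.5)
print(merged)  # {'w': 2.0, 'b': 1.0}
```

With real checkpoints, one option would be to pass the `state_dict()` of the base and fine-tuned models and load the result with `load_state_dict()`. With α=0.5 the formula reduces to a plain average of the two weight sets.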
## Results
| Method | MMLU Redux | GSM8K | IFEval |
|---|---|---|---|
| Base | 39.79 | 33.66 | **40.48** |
| FT | **41.02** | 37.15 | 28.84 |
| **Merged** | 40.53 | **39.58** | 36.41 |
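
To make the trade-off easier to read, the scores above can be expressed as differences from the base model. This is a small arithmetic sketch over the reported numbers only. The helper name `delta_vs_base` is hypothetical.

```python
# Scores copied from the results table.
results = {
    "Base": {"MMLU Redux": 39.79, "GSM8K": 33.66, "IFEval": 40.48},
    "FT": {"MMLU Redux": 41.02, "GSM8K": 37.15, "IFEval": 28.84},
    "Merged": {"MMLU Redux": 40.53, "GSM8K": 39.58, "IFEval": 36.41},
}


def delta_vs_base(method):
    """Per-benchmark score difference between `method` and the base model."""
    base = results["Base"]
    return {
        bench: round(score - base[bench], 2)
        for bench, score in results[method].items()
    }


for method in ("FT", "Merged"):
    print(method, delta_vs_base(method))
```

Relative to the base model, fine-tuning alone changes GSM8K by +3.49 and IFEval by -11.64, while the merged model changes them by +5.92 and -4.07.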