---
license: apache-2.0
base_model:
- HuggingFaceTB/SmolLM-135M
- HuggingFaceTB/SmolLM-135M-Instruct
tags:
- merge
- slerp
- smollm
- model-merge
- chimera
language:
- en
pipeline_tag: text-generation
---
# SmolLM-135M-SLERP-Merge
## Overview
A **SLERP (Spherical Linear Interpolation) merge** of SmolLM-135M (base) and
SmolLM-135M-Instruct. This "chimera" model blends raw language modeling capabilities
with instruction-following abilities, creating a balanced model that inherits
strengths from both parents.
## Key Features
- **Dual heritage**: Combines base + instruct capabilities
- **SLERP merge**: Uses spherical interpolation for better weight preservation
- **Ultra-small**: Only ~513 MB total
- **No training required**: Pure weight-space interpolation
## Merge Details
- **Model A (base)**: [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M)
- **Model B (instruct)**: [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct)
- **Method**: SLERP (Spherical Linear Interpolation)
- **Interpolation factor**: t=0.6 (60% instruct, 40% base)
- **Weight matrices**: SLERP interpolation
- **Biases/norms**: Linear interpolation (see the sketch after this list)
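For intuition, the per-tensor rule above can be sketched in a few lines of PyTorch. This is a simplified illustration with hypothetical helper names (`slerp`, `merge_tensor`), not mergekit's actual implementation:
```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical interpolation between two tensors, treated as flat vectors."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    # Angle between the two weight vectors
    cos_omega = torch.clamp(
        (a_flat @ b_flat) / (a_flat.norm() * b_flat.norm() + eps), -1.0, 1.0
    )
    omega = torch.acos(cos_omega)
    if omega < eps:  # nearly parallel vectors: SLERP degenerates to lerp
        return (1 - t) * a + t * b
    so = torch.sin(omega)
    merged = (torch.sin((1 - t) * omega) / so) * a_flat + (torch.sin(t * omega) / so) * b_flat
    return merged.reshape(a.shape).to(a.dtype)

def merge_tensor(t: float, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """SLERP for weight matrices, plain lerp for 1-D biases and norm scales."""
    if a.ndim >= 2:               # attention / MLP / embedding weight matrices
        return slerp(t, a, b)
    return (1 - t) * a + t * b    # biases and norm parameters
```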
## Why SLERP?
Linear interpolation (lerp) can reduce the magnitude of weight vectors, potentially
degrading model quality. SLERP interpolates along the surface of a hypersphere,
preserving vector magnitudes while smoothly transitioning between the two models.
This typically produces higher-quality merges.
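Concretely, for flattened weight vectors $w_a$ and $w_b$ separated by angle $\Omega$, SLERP follows the arc between them:

$$
\mathrm{slerp}(t;\, w_a, w_b) = \frac{\sin\big((1-t)\,\Omega\big)}{\sin \Omega}\, w_a + \frac{\sin(t\,\Omega)}{\sin \Omega}\, w_b,
\qquad
\Omega = \arccos\!\left(\frac{w_a \cdot w_b}{\lVert w_a \rVert \, \lVert w_b \rVert}\right)
$$

At $t = 0$ this returns $w_a$ exactly, at $t = 1$ it returns $w_b$, and at the $t = 0.6$ used here it lands on the arc closer to the instruct weights.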
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")
tokenizer = AutoTokenizer.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")

inputs = tokenizer("Explain what photosynthesis is:", return_tensors="pt")
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
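Because one parent is an instruct model, the merged tokenizer may carry over the instruct chat template. If `tokenizer.chat_template` is set (an assumption worth checking before relying on it), the model can also be prompted chat-style:
```python
# Assumes the instruct parent's chat template survived the merge;
# verify tokenizer.chat_template is not None before using this.
messages = [{"role": "user", "content": "Explain what photosynthesis is."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```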
## Merge Recipe (for reproduction with mergekit)
```yaml
slices:
  - sources:
      - model: HuggingFaceTB/SmolLM-135M
        layer_range: [0, 30]
      - model: HuggingFaceTB/SmolLM-135M-Instruct
        layer_range: [0, 30]
merge_method: slerp
base_model: HuggingFaceTB/SmolLM-135M
parameters:
  t:
    - value: 0.6
dtype: float16
```
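With [mergekit](https://github.com/arcee-ai/mergekit) installed (`pip install mergekit`), saving the YAML above as `config.yaml` and running `mergekit-yaml config.yaml ./merged-model` should reproduce the merge.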
## Parent Models
| Model | Role | Description |
|-------|------|-------------|
| [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) | Base | Raw language modeling |
| [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) | Instruct | Instruction following |