---
license: apache-2.0
base_model:
- HuggingFaceTB/SmolLM-135M
- HuggingFaceTB/SmolLM-135M-Instruct
tags:
- merge
- slerp
- smollm
- model-merge
- chimera
language:
- en
pipeline_tag: text-generation
---
# SmolLM-135M-SLERP-Merge
## Overview

A **SLERP (Spherical Linear Interpolation) merge** of SmolLM-135M (base) and SmolLM-135M-Instruct. This "chimera" model blends raw language modeling capabilities with instruction-following abilities, creating a balanced model that inherits strengths from both parents.

## Key Features

- **Dual heritage**: Combines base + instruct capabilities
- **SLERP merge**: Uses spherical interpolation for better weight preservation
- **Ultra-small**: Only ~513 MB total
- **No training required**: Pure weight-space interpolation

## Merge Details

- **Model A (base)**: [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M)
- **Model B (instruct)**: [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct)
- **Method**: SLERP (Spherical Linear Interpolation)
- **Interpolation factor**: t=0.6 (60% instruct, 40% base)
- **Weight matrices**: SLERP interpolation
- **Biases/norms**: Linear interpolation (see the sketch below)

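A minimal sketch of that per-tensor routing, assuming both checkpoints share a state-dict layout. The `ndim >= 2` rule, the near-parallel fallback, and the final `load_state_dict` step are illustrative choices, not the exact script used to build this model:

```python
import torch
from transformers import AutoModelForCausalLM

def slerp(a: torch.Tensor, b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherical interpolation between two tensors, treated as flat vectors."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    # Angle between the two weight vectors, from their normalized copies.
    cos_omega = torch.clamp(
        (a_flat / (a_flat.norm() + eps)) @ (b_flat / (b_flat.norm() + eps)),
        -1.0, 1.0,
    )
    omega = torch.acos(cos_omega)
    if omega < eps:
        # Near-parallel vectors: slerp degenerates, fall back to lerp.
        out = (1 - t) * a_flat + t * b_flat
    else:
        so = torch.sin(omega)
        out = (torch.sin((1 - t) * omega) / so) * a_flat \
            + (torch.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape).to(a.dtype)

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")
inst = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")

t = 0.6  # 60% instruct, 40% base
sd_base, sd_inst = base.state_dict(), inst.state_dict()
merged = {}
with torch.no_grad():
    for name, wa in sd_base.items():
        wb = sd_inst[name]
        if wa.ndim >= 2:   # weight matrices: SLERP
            merged[name] = slerp(wa, wb, t)
        else:              # biases / norm vectors: linear interpolation
            merged[name] = (1 - t) * wa + t * wb
base.load_state_dict(merged)  # `base` now holds the merged weights
```

With t=0.6, each merged tensor sits closer to the instruct parent along the interpolation path.
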
## Why SLERP?

Linear interpolation (lerp) can reduce the magnitude of weight vectors, potentially degrading model quality. SLERP interpolates along the surface of a hypersphere, preserving vector magnitudes while smoothly transitioning between the two models. This typically produces higher-quality merges.

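The shrinkage is easy to see numerically. A standalone toy example (not part of the merge itself): the linear midpoint of two orthogonal unit vectors has norm ~0.707, while the spherical midpoint keeps norm 1.

```python
import torch

# Two unit vectors 90 degrees apart.
a = torch.tensor([1.0, 0.0])
b = torch.tensor([0.0, 1.0])

# Linear midpoint falls inside the unit circle: norm shrinks to ~0.707.
print((0.5 * a + 0.5 * b).norm())

# Spherical midpoint stays on the unit circle: norm ~1.0.
omega = torch.acos(a @ b)  # angle = pi/2
print(((torch.sin(0.5 * omega) / torch.sin(omega)) * (a + b)).norm())
```
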
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")
tokenizer = AutoTokenizer.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")

inputs = tokenizer("Explain what photosynthesis is:", return_tensors="pt")
# do_sample=True is required for temperature to take effect.
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
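
Since one parent is an instruct model, the saved tokenizer may also carry its chat template. If it does, prompts can be formatted with `apply_chat_template`; a hedged variant of the example above (whether this repo ships the template is an assumption):

```python
# Assumes the repo's tokenizer carries the instruct parent's chat template.
messages = [{"role": "user", "content": "Explain what photosynthesis is."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
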
## Merge Recipe (for reproduction with mergekit)

```yaml
slices:
  - sources:
      - model: HuggingFaceTB/SmolLM-135M
        layer_range: [0, 30]
      - model: HuggingFaceTB/SmolLM-135M-Instruct
        layer_range: [0, 30]
merge_method: slerp
base_model: HuggingFaceTB/SmolLM-135M
parameters:
  t:
    - value: 0.6
dtype: float16
```
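
To reproduce the merge, save the recipe as a YAML file (e.g. `config.yaml`, name arbitrary) and run it through mergekit, for example with `pip install mergekit` followed by `mergekit-yaml config.yaml ./SmolLM-135M-SLERP-Merge`.
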
## Parent Models

| Model | Role | Description |
|-------|------|-------------|
| [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) | Base | Raw language modeling |
| [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) | Instruct | Instruction following |