---
license: apache-2.0
base_model:
- HuggingFaceTB/SmolLM-135M
- HuggingFaceTB/SmolLM-135M-Instruct
tags:
- merge
- slerp
- smollm
- model-merge
- chimera
language:
- en
pipeline_tag: text-generation
---
# SmolLM-135M-SLERP-Merge
## Overview

A **SLERP (Spherical Linear Interpolation) merge** of SmolLM-135M (base) and SmolLM-135M-Instruct. This "chimera" model blends raw language modeling capabilities with instruction-following abilities, creating a balanced model that inherits strengths from both parents.

## Key Features

- **Dual heritage**: Combines base + instruct capabilities
- **SLERP merge**: Uses spherical interpolation for better weight preservation
- **Ultra-small**: Only ~513 MB total
- **No training required**: Pure weight-space interpolation

## Merge Details

- **Model A (base)**: [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M)
- **Model B (instruct)**: [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct)
- **Method**: SLERP (Spherical Linear Interpolation)
- **Interpolation factor**: t=0.6 (60% instruct, 40% base)
- **Weight matrices**: SLERP interpolation
- **Biases/norms**: Linear interpolation (see the sketch below)

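A minimal sketch of that per-tensor routing, assuming both checkpoints share a state-dict layout. The `ndim >= 2` rule, the near-parallel fallback, and the final `load_state_dict` step are illustrative choices, not the exact script used to build this model:

```python
import torch
from transformers import AutoModelForCausalLM

def slerp(a: torch.Tensor, b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherical interpolation between two tensors, treated as flat vectors."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    # Angle between the two weight vectors, from their normalized copies.
    cos_omega = torch.clamp(
        (a_flat / (a_flat.norm() + eps)) @ (b_flat / (b_flat.norm() + eps)),
        -1.0, 1.0,
    )
    omega = torch.acos(cos_omega)
    if omega < eps:
        # Near-parallel vectors: slerp degenerates, fall back to lerp.
        out = (1 - t) * a_flat + t * b_flat
    else:
        so = torch.sin(omega)
        out = (torch.sin((1 - t) * omega) / so) * a_flat \
            + (torch.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape).to(a.dtype)

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")
inst = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")

t = 0.6  # 60% instruct, 40% base
sd_base, sd_inst = base.state_dict(), inst.state_dict()
merged = {}
with torch.no_grad():
    for name, wa in sd_base.items():
        wb = sd_inst[name]
        if wa.ndim >= 2:   # weight matrices: SLERP
            merged[name] = slerp(wa, wb, t)
        else:              # biases / norm vectors: linear interpolation
            merged[name] = (1 - t) * wa + t * wb
base.load_state_dict(merged)  # `base` now holds the merged weights
```

With t=0.6, each merged tensor sits closer to the instruct parent along the interpolation path.
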
## Why SLERP?

Linear interpolation (lerp) can reduce the magnitude of weight vectors, potentially degrading model quality. SLERP interpolates along the surface of a hypersphere, preserving vector magnitudes while smoothly transitioning between the two models. This typically produces higher-quality merges.

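The shrinkage is easy to see numerically. A standalone toy example (not part of the merge itself): the linear midpoint of two orthogonal unit vectors has norm ~0.707, while the spherical midpoint keeps norm 1.

```python
import torch

# Two unit vectors 90 degrees apart.
a = torch.tensor([1.0, 0.0])
b = torch.tensor([0.0, 1.0])

# Linear midpoint falls inside the unit circle: norm shrinks to ~0.707.
print((0.5 * a + 0.5 * b).norm())

# Spherical midpoint stays on the unit circle: norm ~1.0.
omega = torch.acos(a @ b)  # angle = pi/2
print(((torch.sin(0.5 * omega) / torch.sin(omega)) * (a + b)).norm())
```
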
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")
tokenizer = AutoTokenizer.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")

inputs = tokenizer("Explain what photosynthesis is:", return_tensors="pt")
# do_sample=True is required for temperature to take effect.
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
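
Since one parent is an instruct model, the saved tokenizer may also carry its chat template. If it does, prompts can be formatted with `apply_chat_template`; a hedged variant of the example above (whether this repo ships the template is an assumption):

```python
# Assumes the repo's tokenizer carries the instruct parent's chat template.
messages = [{"role": "user", "content": "Explain what photosynthesis is."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
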
## Merge Recipe (for reproduction with mergekit)

```yaml
slices:
  - sources:
      - model: HuggingFaceTB/SmolLM-135M
        layer_range: [0, 30]
      - model: HuggingFaceTB/SmolLM-135M-Instruct
        layer_range: [0, 30]
merge_method: slerp
base_model: HuggingFaceTB/SmolLM-135M
parameters:
  t:
    - value: 0.6
dtype: float16
```
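
To reproduce the merge, save the recipe as a YAML file (e.g. `config.yaml`, name arbitrary) and run it through mergekit, for example with `pip install mergekit` followed by `mergekit-yaml config.yaml ./SmolLM-135M-SLERP-Merge`.
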
## Parent Models

| Model | Role | Description |
|-------|------|-------------|
| [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) | Base | Raw language modeling |
| [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) | Instruct | Instruction following |