---
license: apache-2.0
base_model:
- HuggingFaceTB/SmolLM-135M
- HuggingFaceTB/SmolLM-135M-Instruct
tags:
- merge
- slerp
- smollm
- model-merge
- chimera
language:
- en
pipeline_tag: text-generation
---
# SmolLM-135M-SLERP-Merge
## Overview
A **SLERP (Spherical Linear Interpolation) merge** of SmolLM-135M (base) and
SmolLM-135M-Instruct. This "chimera" model blends raw language modeling capabilities
with instruction-following abilities, creating a balanced model that inherits
strengths from both parents.
## Key Features
- **Dual heritage**: Combines base + instruct capabilities
- **SLERP merge**: Uses spherical interpolation for better weight preservation
- **Ultra-small**: Only ~513 MB total
- **No training required**: Pure weight-space interpolation
## Merge Details
- **Model A (base)**: [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M)
- **Model B (instruct)**: [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct)
- **Method**: SLERP (Spherical Linear Interpolation)
- **Interpolation factor**: t=0.6 (60% instruct, 40% base)
- **Weight matrices**: SLERP interpolation
- **Biases/norms**: Linear interpolation (see the sketch after this list)
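For intuition, the per-tensor rule above can be sketched in a few lines of PyTorch. This is a simplified illustration with hypothetical helper names (`slerp`, `merge_tensor`), not mergekit's actual implementation:
```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical interpolation between two tensors, treated as flat vectors."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    # Angle between the two weight vectors
    cos_omega = torch.clamp(
        (a_flat @ b_flat) / (a_flat.norm() * b_flat.norm() + eps), -1.0, 1.0
    )
    omega = torch.acos(cos_omega)
    if omega < eps:  # nearly parallel vectors: SLERP degenerates to lerp
        return (1 - t) * a + t * b
    so = torch.sin(omega)
    merged = (torch.sin((1 - t) * omega) / so) * a_flat + (torch.sin(t * omega) / so) * b_flat
    return merged.reshape(a.shape).to(a.dtype)

def merge_tensor(t: float, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """SLERP for weight matrices, plain lerp for 1-D biases and norm scales."""
    if a.ndim >= 2:               # attention / MLP / embedding weight matrices
        return slerp(t, a, b)
    return (1 - t) * a + t * b    # biases and norm parameters
```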
## Why SLERP?
Linear interpolation (lerp) can reduce the magnitude of weight vectors, potentially
degrading model quality. SLERP interpolates along the surface of a hypersphere,
preserving vector magnitudes while smoothly transitioning between the two models.
This typically produces higher-quality merges.
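Concretely, for flattened weight vectors $w_a$ and $w_b$ separated by angle $\Omega$, SLERP follows the arc between them:

$$
\mathrm{slerp}(t;\, w_a, w_b) = \frac{\sin\big((1-t)\,\Omega\big)}{\sin \Omega}\, w_a + \frac{\sin(t\,\Omega)}{\sin \Omega}\, w_b,
\qquad
\Omega = \arccos\!\left(\frac{w_a \cdot w_b}{\lVert w_a \rVert \, \lVert w_b \rVert}\right)
$$

At $t = 0$ this returns $w_a$ exactly, at $t = 1$ it returns $w_b$, and at the $t = 0.6$ used here it lands on the arc closer to the instruct weights.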
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")
tokenizer = AutoTokenizer.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")

inputs = tokenizer("Explain what photosynthesis is:", return_tensors="pt")
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
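Because one parent is an instruct model, the merged tokenizer may carry over the instruct chat template. If `tokenizer.chat_template` is set (an assumption worth checking before relying on it), the model can also be prompted chat-style:
```python
# Assumes the instruct parent's chat template survived the merge;
# verify tokenizer.chat_template is not None before using this.
messages = [{"role": "user", "content": "Explain what photosynthesis is."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```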
## Merge Recipe (for reproduction with mergekit)
```yaml
slices:
  - sources:
      - model: HuggingFaceTB/SmolLM-135M
        layer_range: [0, 30]
      - model: HuggingFaceTB/SmolLM-135M-Instruct
        layer_range: [0, 30]
merge_method: slerp
base_model: HuggingFaceTB/SmolLM-135M
parameters:
  t:
    - value: 0.6
dtype: float16
```
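With [mergekit](https://github.com/arcee-ai/mergekit) installed (`pip install mergekit`), saving the YAML above as `config.yaml` and running `mergekit-yaml config.yaml ./merged-model` should reproduce the merge.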
## Parent Models
| Model | Role | Description |
|-------|------|-------------|
| [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) | Base | Raw language modeling |
| [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) | Instruct | Instruction following |