Initialize project; model provided by the ModelHub XC community
Model: Abhinav-Anand/MiniMerge-0.1B · Source: Original Platform
---
license: apache-2.0
base_model:
- HuggingFaceTB/SmolLM-135M
- HuggingFaceTB/SmolLM-135M-Instruct
tags:
- merge
- slerp
- smollm
- model-merge
- chimera
language:
- en
pipeline_tag: text-generation
---

# SmolLM-135M-SLERP-Merge

## Overview

A **SLERP (Spherical Linear Interpolation) merge** of SmolLM-135M (base) and SmolLM-135M-Instruct. This "chimera" model blends raw language modeling capabilities with instruction-following abilities, creating a balanced model that inherits strengths from both parents.

## Key Features

- **Dual heritage**: Combines base + instruct capabilities
- **SLERP merge**: Uses spherical interpolation for better weight preservation
- **Ultra-small**: Only ~513 MB total
- **No training required**: Pure weight-space interpolation

## Merge Details

- **Model A (base)**: [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M)
- **Model B (instruct)**: [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct)
- **Method**: SLERP (Spherical Linear Interpolation)
- **Interpolation factor**: t = 0.6 (60% instruct, 40% base)
- **Weight matrices**: SLERP interpolation
- **Biases/norms**: Linear interpolation

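The per-tensor rule above (SLERP for 2-D weight matrices, plain linear interpolation for biases and norms) can be sketched as a name-based dispatch. This is an illustrative sketch, not mergekit's internals; the parameter names and the `merge_param` helper are hypothetical:

```python
import math

def lerp(a, b, t):
    """Linear interpolation, used here for 1-D tensors (biases, norms)."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def slerp(a, b, t):
    """Spherical interpolation, used here for (flattened) weight matrices."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    cos = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b)) / (na * nb)))
    omega = math.acos(cos)
    if omega < 1e-8:                      # near-parallel vectors: lerp is fine
        return lerp(a, b, t)
    so = math.sin(omega)
    return [(math.sin((1 - t) * omega) * x + math.sin(t * omega) * y) / so
            for x, y in zip(a, b)]

def merge_param(name, base, instruct, t=0.6):
    # t=0.6 as in this merge: 60% instruct, 40% base.
    # Matrices (".weight", not a norm) -> SLERP; biases/norms -> lerp.
    kind = slerp if name.endswith(".weight") and "norm" not in name else lerp
    return kind(base, instruct, t)

# Toy 2-element "weights" standing in for flattened weight matrices.
merged = merge_param("layers.0.mlp.up_proj.weight", [1.0, 0.0], [0.0, 1.0])
```

Note that for orthogonal unit inputs the SLERP result stays unit-length, which is exactly the property the merge relies on.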
## Why SLERP?

Linear interpolation (lerp) can reduce the magnitude of weight vectors, potentially degrading model quality. SLERP interpolates along the surface of a hypersphere, preserving vector magnitudes while smoothly transitioning between the two models. This typically produces higher-quality merges.

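A quick sketch of why this holds, for unit-norm vectors $a$, $b$ separated by angle $\Omega$:

$$\text{lerp}(a,b;t)=(1-t)\,a+t\,b,\qquad \|\text{lerp}\|^2=(1-t)^2+t^2+2t(1-t)\cos\Omega\le 1,$$

$$\text{slerp}(a,b;t)=\frac{\sin\!\big((1-t)\Omega\big)}{\sin\Omega}\,a+\frac{\sin(t\Omega)}{\sin\Omega}\,b,\qquad \|\text{slerp}\|=1.$$

High-dimensional weight vectors from separately trained checkpoints are typically far from parallel, so with $\Omega \approx \pi/2$ and $t=0.6$, lerp would shrink the norm to $\sqrt{0.4^2+0.6^2}=\sqrt{0.52}\approx 0.72$, while slerp keeps it at 1.
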
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")
tokenizer = AutoTokenizer.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")

inputs = tokenizer("Explain what photosynthesis is:", return_tensors="pt")
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Merge Recipe (for reproduction with mergekit)

```yaml
slices:
  - sources:
      - model: HuggingFaceTB/SmolLM-135M
        layer_range: [0, 30]
      - model: HuggingFaceTB/SmolLM-135M-Instruct
        layer_range: [0, 30]
merge_method: slerp
base_model: HuggingFaceTB/SmolLM-135M
parameters:
  t:
    - value: 0.6
dtype: float16
```

## Parent Models

| Model | Role | Description |
|-------|------|-------------|
| [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) | Base | Raw language modeling |
| [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) | Instruct | Instruction following |