Initialize project; model provided by the ModelHub XC community
Model: Abhinav-Anand/MiniMerge-0.1B · Source: Original Platform
---
license: apache-2.0
base_model:
- HuggingFaceTB/SmolLM-135M
- HuggingFaceTB/SmolLM-135M-Instruct
tags:
- merge
- slerp
- smollm
- model-merge
- chimera
language:
- en
pipeline_tag: text-generation
---

# SmolLM-135M-SLERP-Merge

## Overview

A **SLERP (Spherical Linear Interpolation) merge** of SmolLM-135M (base) and SmolLM-135M-Instruct. This "chimera" model blends raw language modeling capabilities with instruction-following abilities, creating a balanced model that inherits strengths from both parents.

## Key Features

- **Dual heritage**: Combines base + instruct capabilities
- **SLERP merge**: Uses spherical interpolation for better weight preservation
- **Ultra-small**: Only ~513 MB total
- **No training required**: Pure weight-space interpolation

## Merge Details

- **Model A (base)**: [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M)
- **Model B (instruct)**: [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct)
- **Method**: SLERP (Spherical Linear Interpolation)
- **Interpolation factor**: t = 0.6 (60% instruct, 40% base)
- **Weight matrices**: SLERP interpolation
- **Biases/norms**: Linear interpolation

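The per-tensor rule above (SLERP for 2-D weight matrices, plain linear interpolation for biases and norms) can be sketched as a name-based dispatch. This is an illustrative sketch, not mergekit's internals; the parameter names and the `merge_param` helper are hypothetical:

```python
import math

def lerp(a, b, t):
    """Linear interpolation, used here for 1-D tensors (biases, norms)."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def slerp(a, b, t):
    """Spherical interpolation, used here for (flattened) weight matrices."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    cos = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b)) / (na * nb)))
    omega = math.acos(cos)
    if omega < 1e-8:                      # near-parallel vectors: lerp is fine
        return lerp(a, b, t)
    so = math.sin(omega)
    return [(math.sin((1 - t) * omega) * x + math.sin(t * omega) * y) / so
            for x, y in zip(a, b)]

def merge_param(name, base, instruct, t=0.6):
    # t=0.6 as in this merge: 60% instruct, 40% base.
    # Matrices (".weight", not a norm) -> SLERP; biases/norms -> lerp.
    kind = slerp if name.endswith(".weight") and "norm" not in name else lerp
    return kind(base, instruct, t)

# Toy 2-element "weights" standing in for flattened weight matrices.
merged = merge_param("layers.0.mlp.up_proj.weight", [1.0, 0.0], [0.0, 1.0])
```

Note that for orthogonal unit inputs the SLERP result stays unit-length, which is exactly the property the merge relies on.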
## Why SLERP?

Linear interpolation (lerp) can reduce the magnitude of weight vectors, potentially degrading model quality. SLERP interpolates along the surface of a hypersphere, preserving vector magnitudes while smoothly transitioning between the two models. This typically produces higher-quality merges.

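A quick sketch of why this holds, for unit-norm vectors $a$, $b$ separated by angle $\Omega$:

$$\text{lerp}(a,b;t)=(1-t)\,a+t\,b,\qquad \|\text{lerp}\|^2=(1-t)^2+t^2+2t(1-t)\cos\Omega\le 1,$$

$$\text{slerp}(a,b;t)=\frac{\sin\!\big((1-t)\Omega\big)}{\sin\Omega}\,a+\frac{\sin(t\Omega)}{\sin\Omega}\,b,\qquad \|\text{slerp}\|=1.$$

High-dimensional weight vectors from separately trained checkpoints are typically far from parallel, so with $\Omega \approx \pi/2$ and $t=0.6$, lerp would shrink the norm to $\sqrt{0.4^2+0.6^2}=\sqrt{0.52}\approx 0.72$, while slerp keeps it at 1.
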
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")
tokenizer = AutoTokenizer.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")

inputs = tokenizer("Explain what photosynthesis is:", return_tensors="pt")
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Merge Recipe (for reproduction with mergekit)

```yaml
slices:
  - sources:
      - model: HuggingFaceTB/SmolLM-135M
        layer_range: [0, 30]
      - model: HuggingFaceTB/SmolLM-135M-Instruct
        layer_range: [0, 30]
merge_method: slerp
base_model: HuggingFaceTB/SmolLM-135M
parameters:
  t:
    - value: 0.6
dtype: float16
```

## Parent Models

| Model | Role | Description |
|-------|------|-------------|
| [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) | Base | Raw language modeling |
| [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) | Instruct | Instruction following |