A SLERP (Spherical Linear Interpolation) merge of SmolLM-135M (base) and
SmolLM-135M-Instruct. This "chimera" model blends raw language modeling capabilities
with instruction-following abilities, creating a balanced model that inherits
strengths from both parents.
## Key Features

- **Dual heritage**: Combines base + instruct capabilities
- **SLERP merge**: Uses spherical interpolation for better weight preservation
- **Ultra-small**: Only ~513 MB total
- **No training required**: Pure weight-space interpolation
Linear interpolation (lerp) can reduce the magnitude of weight vectors, potentially
degrading model quality. SLERP interpolates along the surface of a hypersphere,
preserving vector magnitudes while smoothly transitioning between the two models.
This typically produces higher quality merges.
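To make the distinction concrete, here is a minimal sketch of SLERP on flattened weight vectors (a simplification; real merge tools such as mergekit apply this per-tensor with extra handling):

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight vectors."""
    n0, n1 = np.linalg.norm(v0), np.linalg.norm(v1)
    # Angle between the two vectors on the hypersphere
    dot = np.clip(np.dot(v0, v1) / (n0 * n1 + eps), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel vectors: plain lerp is numerically safer here
        return (1 - t) * v0 + t * v1
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1
```

For two unit vectors at 90°, the SLERP midpoint still has norm 1.0, whereas the lerp midpoint shrinks to about 0.71 — exactly the magnitude loss described above.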
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")
tokenizer = AutoTokenizer.from_pretrained("Ringkvist/SmolLM-135M-SLERP-Merge")

inputs = tokenizer("Explain what photosynthesis is:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,  # required for temperature to take effect
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```