---
library_name: transformers
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
pipeline_tag: text-generation
tags:
- liquid
- lfm2.5
- edge
- heretic
- uncensored
- decensored
- abliterated
base_model: LiquidAI/LFM2.5-1.2B-Instruct
---
This is an **LFM2.5-1.2B-Instruct** fine-tune, produced through P-E-W's [Heretic](https://github.com/p-e-w/heretic) (v1.1.0) abliteration engine with [Magnitude-Preserving Orthogonal Ablation](https://github.com/p-e-w/heretic/pull/52) enabled and [Hybrid Layer Support](https://github.com/p-e-w/heretic/pull/43) experimentally forward-ported.
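For intuition, abliteration locates a "refusal direction" in the model's residual stream and projects it out of the weight matrices that write into that stream. Below is a minimal NumPy sketch of the core operation (illustrative only: the magnitude-preserving rescaling shown is one plausible reading of the linked PR, not Heretic's actual code, and Heretic additionally optimizes per-layer ablation parameters):
```python
import numpy as np

def orthogonal_ablation(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Project the refusal direction d out of a weight matrix W.

    W writes into the residual stream (rows = output features), so
    (I - d d^T) @ W prevents the layer from writing along d.
    """
    d = d / np.linalg.norm(d)          # unit refusal direction
    return W - np.outer(d, d) @ W      # remove d's component from every output

def magnitude_preserving_ablation(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Hypothetical magnitude-preserving variant: ablate, then rescale each
    row back to its pre-ablation norm so overall weight magnitudes survive."""
    ablated = orthogonal_ablation(W, d)
    old_norms = np.linalg.norm(W, axis=1, keepdims=True)
    new_norms = np.linalg.norm(ablated, axis=1, keepdims=True)
    return ablated * (old_norms / np.clip(new_norms, 1e-12, None))

# Sanity check: the ablated matrix can no longer write along d.
rng = np.random.default_rng(0)
W, d = rng.normal(size=(8, 8)), rng.normal(size=8)
d_hat = d / np.linalg.norm(d)
print(np.abs(d_hat @ orthogonal_ablation(W, d)).max())  # ~1e-16
```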
---
**Heretication Results**
| Score Metric | Value | Parameter | Value |
| :--- | :--- | :--- | :--- |
| **Refusals** | 7/100 | **direction_index** | 10.10 |
| **KL Divergence** | 0.0679 | **attn.o_proj.max_weight** | 3.02 |
| **Initial Refusals** | 99/100 | **attn.o_proj.max_weight_position** | 9.29 |
||| **attn.out_proj.min_weight** | 2.51 |
||| **attn.out_proj.min_weight_distance** | 3.99 |
||| **conv.out_proj.max_weight** | 1.95 |
||| **conv.out_proj.max_weight_position** | 11.58 |
||| **conv.out_proj.min_weight** | 0.67 |
||| **conv.out_proj.min_weight_distance** | 3.59 |
||| **mlp.w2.max_weight** | 1.68 |
||| **mlp.w2.max_weight_position** | 14.18 |
||| **mlp.w2.min_weight** | 0.59 |
||| **mlp.w2.min_weight_distance** | 8.52 |
---
## Degree of Heretication
The **Heresy Index** weighs the model's corruption by the abliteration process (KL divergence) against its abolition of doctrine (refusals) to reach a final classification.
| Classification | Criteria |
| :--- | :--- |
| **Absolute Heresy** | Fewer than 10/100 refusals and KL divergence below 0.10 |
| **Tainted Heresy** | 11-25/100 refusals and/or KL divergence of 0.11-0.20 |
| **Impotent Heresy** | More than 25/100 refusals and KL divergence of 0.21 or above |
**Note**: This classification is arbitrary, inspired by Warhammer 40K, and carries no tangible indication of the model's performance.
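Transcribed as code, the rule reduces to the following (a hypothetical helper using one reasonable reading of the table's "and/or"; the thresholds are the card's own):
```python
def heresy_index(refusals: int, kl_divergence: float) -> str:
    """Classify a hereticated model; refusals are counted out of 100."""
    if refusals < 10 and kl_divergence < 0.10:
        return "Absolute Heresy"
    if refusals <= 25 and kl_divergence <= 0.20:
        return "Tainted Heresy"
    return "Impotent Heresy"

# This model: 7/100 refusals at 0.0679 KL divergence.
print(heresy_index(7, 0.0679))  # Absolute Heresy
```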
---
## 🏃 How to Run
| Name | Description | Docs |
|------|-------------|------|
| [vLLM](https://github.com/vllm-project/vllm) | High-throughput production deployments with GPU. | Link |
| [llama.cpp](https://github.com/ggml-org/llama.cpp) | Cross-platform inference with CPU offloading. | Link |
| [MLX](https://github.com/ml-explore/mlx) | Apple's machine learning framework optimized for Apple Silicon. | Link |
| [LM Studio](https://lmstudio.ai/) | Desktop application for running LLMs locally. | Link |
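For example, a minimal vLLM sketch (assuming the installed vLLM release supports the LFM2.5 architecture; sampling values mirror the Transformers example below):
```python
from vllm import LLM, SamplingParams

llm = LLM(model="LiquidAI/LFM2.5-1.2B-Instruct")
params = SamplingParams(temperature=0.1, top_k=50, top_p=0.1, max_tokens=512)

# chat() applies the model's chat template before generation.
outputs = llm.chat([{"role": "user", "content": "What is C. elegans?"}], params)
print(outputs[0].outputs[0].text)
```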
Here's a quick start example with Transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "LiquidAI/LFM2.5-1.2B-Instruct"

# Load the model in bfloat16 and place it automatically on available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    # attn_implementation="flash_attention_2" <- uncomment on compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Stream generated tokens to stdout as they are produced.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "What is C. elegans?"

# Apply the chat template and tokenize the conversation.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    top_p=0.1,
    repetition_penalty=1.05,
    max_new_tokens=512,
    streamer=streamer,
)
```
## 🔧 Fine-Tuning
We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.
| Name | Description | Docs | Notebook |
|------|-------------|------|----------|
| CPT ([Unsloth](https://github.com/unslothai/unsloth)) | Continued Pre-Training using Unsloth for text completion. | Link | |
| CPT ([Unsloth](https://github.com/unslothai/unsloth)) | Continued Pre-Training using Unsloth for translation. | Link | |
| SFT ([Unsloth](https://github.com/unslothai/unsloth)) | Supervised Fine-Tuning with LoRA using Unsloth. | Link | |
| SFT ([TRL](https://github.com/huggingface/trl)) | Supervised Fine-Tuning with LoRA using TRL. | Link | |
| DPO ([TRL](https://github.com/huggingface/trl)) | Direct Preference Optimization with LoRA using TRL. | Link | |
| GRPO ([Unsloth](https://github.com/unslothai/unsloth)) | GRPO with LoRA using Unsloth. | Link | |
| GRPO ([TRL](https://github.com/huggingface/trl)) | GRPO with LoRA using TRL. | Link | |
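As a concrete starting point, here is a minimal SFT-with-LoRA sketch using TRL and PEFT (the dataset and hyperparameters are placeholders, not Liquid's recipe, and argument names vary slightly across TRL versions):
```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder chat-formatted dataset; substitute your own data.
dataset = load_dataset("trl-lib/Capybara", split="train")

peft_config = LoraConfig(
    r=16,                         # LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",  # adapt every linear projection
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="LiquidAI/LFM2.5-1.2B-Instruct",
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="lfm2.5-sft-lora"),
)
trainer.train()
```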
## 📊 Performance
### Benchmarks
We compared LFM2.5-1.2B-Instruct with relevant sub-2B models on a diverse suite of benchmarks.
| Model | GPQA | MMLU-Pro | IFEval | IFBench | Multi-IF | AIME25 | BFCLv3 |
|-------|------|----------|--------|---------|----------|--------|--------|
| **LFM2.5-1.2B-Instruct** | 38.89 | 44.35 | 86.23 | 47.33 | 60.98 | 14.00 | 49.12 |
| Qwen3-1.7B (instruct) | 34.85 | 42.91 | 73.68 | 21.33 | 56.48 | 9.33 | 46.30 |
| Granite 4.0-1B | 24.24 | 33.53 | 79.61 | 21.00 | 43.65 | 3.33 | 52.43 |
| Llama 3.2 1B Instruct | 16.57 | 20.80 | 52.37 | 15.93 | 30.16 | 0.33 | 21.44 |
| Gemma 3 1B IT | 24.24 | 14.04 | 63.25 | 20.47 | 44.31 | 1.00 | 16.64 |
GPQA, MMLU-Pro, IFBench, and AIME25 follow [ArtificialAnalysis's methodology](https://artificialanalysis.ai/methodology/intelligence-benchmarking). For IFEval and Multi-IF, we report the average score across strict and loose prompt and instruction accuracies. For BFCLv3, we report the final weighted average score with a custom Liquid handler to support our tool use template.
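To make the IFEval aggregation concrete, the reported number is the plain mean of the four sub-metrics (the sub-scores below are invented for illustration):
```python
# Hypothetical sub-scores: {strict, loose} x {prompt, instruction} accuracy.
sub_scores = [84.0, 88.0, 85.0, 89.0]
print(sum(sub_scores) / len(sub_scores))  # reported IFEval score: 86.5
```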
### Inference speed
LFM2.5-1.2B-Instruct offers extremely fast CPU inference with a low memory footprint compared to similar-sized models.

In addition, we are partnering with AMD, Qualcomm, and Nexa AI to bring the LFM2.5 family to NPUs. These optimized models are available through our partners, enabling highly efficient on-device inference.
The following numbers were measured with a 1K-token prefill and 100 decode tokens:
| Device | Inference | Framework | Model | Prefill (tok/s) | Decode (tok/s) | Memory |
| --- | --- | --- | --- | --- | --- | --- |
| Qualcomm Snapdragon® X Elite | NPU | NexaML | LFM2.5-1.2B-Instruct | 2591 | 63 | 0.9 GB |
| Qualcomm Snapdragon® Gen4 (ROG Phone 9 Pro) | NPU | NexaML | LFM2.5-1.2B-Instruct | 4391 | 82 | 0.9 GB |
| Qualcomm Snapdragon® Gen4 (Samsung Galaxy S25 Ultra) | CPU | llama.cpp (Q4_0) | LFM2.5-1.2B-Instruct | 335 | 70 | 719 MB |
| Qualcomm Snapdragon® Gen4 (Samsung Galaxy S25 Ultra) | CPU | llama.cpp (Q4_0) | Qwen3-1.7B | 181 | 40 | 1306 MB |
These capabilities unlock new deployment scenarios across various devices, including vehicles, mobile devices, laptops, IoT devices, and embedded systems.
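To gather comparable numbers on your own hardware, here is a rough timing sketch with Transformers (a simplification: the table's figures come from vendor-specific runtimes and quantized builds):
```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2.5-1.2B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Synthetic ~1K-token prompt, matching the table's 1K prefill / 100 decode setup.
input_ids = tokenizer("word " * 1000, return_tensors="pt").input_ids.to(model.device)

# Prefill: one forward pass over the full prompt.
start = time.perf_counter()
with torch.no_grad():
    model(input_ids)
prefill_tps = input_ids.shape[1] / (time.perf_counter() - start)

# Decode: generate exactly 100 new tokens (timing includes one extra prefill).
start = time.perf_counter()
with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=100, min_new_tokens=100, do_sample=False)
decode_tps = (out.shape[1] - input_ids.shape[1]) / (time.perf_counter() - start)

print(f"prefill: {prefill_tps:.0f} tok/s, decode: {decode_tps:.0f} tok/s")
```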
## Contact
For enterprise solutions and edge deployment, contact [sales@liquid.ai](mailto:sales@liquid.ai).
## Citation
```bibtex
@article{liquidai2025lfm2,
  title={LFM2 Technical Report},
  author={Liquid AI},
  journal={arXiv preprint arXiv:2511.23404},
  year={2025}
}
```