--- library_name: transformers license: other license_name: lfm1.0 license_link: LICENSE language: - en - ar - zh - fr - de - ja - ko - es pipeline_tag: text-generation tags: - liquid - lfm2.5 - edge - heretic - uncensored - decensored - abliterated base_model: - LiquidAI/LFM2.5-1.2B-Instruct --- # This is a decensored version of [LiquidAI/LFM2.5-1.2B-Instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct), made using [Heretic](https://github.com/p-e-w/heretic) v1.1.0 ## Abliteration parameters | Parameter | Value | | :-------- | :---: | | **direction_index** | 8.14 | | **attn.o_proj.max_weight** | 1.07 | | **attn.o_proj.max_weight_position** | 9.05 | | **attn.o_proj.min_weight** | 0.90 | | **attn.o_proj.min_weight_distance** | 8.96 | | **mlp.down_proj.max_weight** | 1.43 | | **mlp.down_proj.max_weight_position** | 11.39 | | **mlp.down_proj.min_weight** | 1.27 | | **mlp.down_proj.min_weight_distance** | 8.82 | ## Performance | Metric | This model | Original model ([LiquidAI/LFM2.5-1.2B-Instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct)) | | :----- | :--------: | :---------------------------: | | **KL divergence** | 0.7207 | 0 *(by definition)* | | **Refusals** | 10/100 | 98/100 | -----
|
| [vLLM](https://github.com/vllm-project/vllm) | High-throughput production deployments with GPU. | Link |
|
| [llama.cpp](https://github.com/ggml-org/llama.cpp) | Cross-platform inference with CPU offloading. | Link |
|
| [MLX](https://github.com/ml-explore/mlx) | Apple's machine learning framework optimized for Apple Silicon. | Link | — |
| [LM Studio](https://lmstudio.ai/) | Desktop application for running LLMs locally. | Link | — |
Here's a quick start example with Transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
model_id = "LiquidAI/LFM2.5-1.2B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
model_id,
device_map="auto",
dtype="bfloat16",
# attn_implementation="flash_attention_2" <- uncomment on compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
prompt = "What is C. elegans?"
input_ids = tokenizer.apply_chat_template(
[{"role": "user", "content": prompt}],
add_generation_prompt=True,
return_tensors="pt",
tokenize=True,
).to(model.device)
output = model.generate(
input_ids,
do_sample=True,
temperature=0.1,
top_k=50,
top_p=0.1,
repetition_penalty=1.05,
max_new_tokens=512,
streamer=streamer,
)
```
## 🔧 Fine-Tuning
We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.
| Name | Description | Docs | Notebook |
|------|-------------|------|----------|
| CPT ([Unsloth](https://github.com/unslothai/unsloth)) | Continued Pre-Training using Unsloth for text completion. | Link |
|
| CPT ([Unsloth](https://github.com/unslothai/unsloth)) | Continued Pre-Training using Unsloth for translation. | Link |
|
| SFT ([Unsloth](https://github.com/unslothai/unsloth)) | Supervised Fine-Tuning with LoRA using Unsloth. | Link |
|
| SFT ([TRL](https://github.com/huggingface/trl)) | Supervised Fine-Tuning with LoRA using TRL. | Link |
|
| DPO ([TRL](https://github.com/huggingface/trl)) | Direct Preference Optimization with LoRA using TRL. | Link |
|
| GRPO ([Unsloth](https://github.com/unslothai/unsloth)) | GRPO with LoRA using Unsloth. | Link |
|
## 📊 Performance
### Benchmarks
We compared LFM2.5-1.2B-Instruct with relevant sub-2B models on a diverse suite of benchmarks.
| Model | GPQA | MMLU-Pro | IFEval | IFBench | Multi-IF | AIME25 | BFCLv3 |
|-------|------|----------|--------|---------|----------|--------|--------|
| **LFM2.5-1.2B-Instruct** | 38.89 | 44.35 | 86.23 | 47.33 | 60.98 | 14.00 | 49.12 |
| Qwen3-1.7B (instruct)| 34.85 | 42.91 | 73.68 | 21.33 | 56.48 | 9.33 | 46.30 |
| Granite 4.0-1B | 24.24 | 33.53 | 79.61 | 21.00 | 43.65 | 3.33 | 52.43 |
| Llama 3.2 1B Instruct | 16.57 | 20.80 | 52.37 | 15.93 | 30.16 | 0.33 | 21.44 |
| Gemma 3 1B IT | 24.24 | 14.04 | 63.25 | 20.47 | 44.31 | 1.00 | 16.64 |
GPQA, MMLU-Pro, IFBench, and AIME25 follow [ArtificialAnalysis's methodology](https://artificialanalysis.ai/methodology/intelligence-benchmarking). For IFEval and Multi-IF, we report the average score across strict and loose prompt and instruction accuracies. For BFCLv3, we report the final weighted average score with a custom Liquid handler to support our tool use template.
### Inference speed
LFM2.5-1.2B-Instruct offers extremely fast inference speed on CPUs with a low memory profile compared to similar-sized models.

In addition, we are partnering with AMD, Qualcomm, and Nexa AI to bring the LFM2.5 family to NPUs. These optimized models are available through our partners, enabling highly efficient on-device inference.
The following numbers have been calculated using 1K prefill and 100 decode tokens:
| Device | Inference | Framework | Model | Prefill (tok/s) | Decode (tok/s) | Memory (GB) |
| ---------------------------------------------------- | --------- | ---------------- | -------------------- | --------------- | -------------- | ----------- |
| Qualcomm Snapdragon® X Elite | NPU | NexaML | LFM2.5-1.2B-Instruct | 2591 | 63 | 0.9GB |
| Qualcomm Snapdragon® Gen4 (ROG Phone9 Pro) | NPU | NexaML | LFM2.5-1.2B-Instruct | 4391 | 82 | 0.9GB |
| Qualcomm Snapdragon® Gen4 (Samsung Galaxy S25 Ultra) | CPU | llama.cpp (Q4_0) | LFM2.5-1.2B-Instruct | 335 | 70 | 719MB |
| Qualcomm Snapdragon® Gen4 (Samsung Galaxy S25 Ultra) | CPU | llama.cpp (Q4_0) | Qwen3-1.7B | 181 | 40 | 1306MB |
These capabilities unlock new deployment scenarios across various devices, including vehicles, mobile devices, laptops, IoT devices, and embedded systems.
## Contact
For enterprise solutions and edge deployment, contact [sales@liquid.ai](mailto:sales@liquid.ai).
## Citation
```bibtex
@article{liquidai2025lfm2,
title={LFM2 Technical Report},
author={Liquid AI},
journal={arXiv preprint arXiv:2511.23404},
year={2025}
}
```