---
license: other
license_name: lfm-1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking/blob/main/LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
pipeline_tag: text-generation
tags:
- gguf
- llama.cpp
- quantized
- q8_0
- liquid-ai
- lfm
- lfm2
- conversational
base_model: LiquidAI/LFM2.5-1.2B-Thinking
---
# LFM 2.5 1.2B Thinking (GGUF)

## Description

This repository contains the **GGUF** quantized version of [LiquidAI/LFM2.5-1.2B-Thinking](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking), a 1.2-billion-parameter "thinking" language model by **Liquid AI**.

The model uses the novel `Lfm2ForCausalLM` architecture featuring a hybrid design of **10 double-gated LIV convolution blocks + 6 GQA attention blocks** — a departure from standard transformer-only designs. This architecture alternates between local convolution-based mixing and sparse global attention, enabling efficient sequence processing with strong reasoning capabilities.
## Model Details

| Property | Value |
|---|---|
| **Architecture** | Lfm2ForCausalLM |
| **Parameter Count** | 1.17B |
| **Layers** | 16 (10 conv blocks + 6 GQA blocks) |
| **Hidden Size** | 2048 |
| **Intermediate (FFN)** | 8192 |
| **Attention Heads** | 32 |
| **KV Heads (GQA)** | 8 (on attention layers) |
| **Context Length** | 32,768 tokens |
| **Vocabulary Size** | 65,536 |
| **Languages** | English, Arabic, Chinese, French, German, Japanese, Korean, Spanish |
| **Quantization** | Q8_0 (8-bit) |
| **File Type** | GGUF |
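The GQA figures in the table translate into a concrete memory saving at inference time. A rough sketch, assuming `head_dim = hidden_size / attention_heads` (not stated in the card) and an f16 KV cache:

```python
# Back-of-envelope KV-cache estimate from the table above.
# Assumption: head_dim = hidden_size / attention_heads.

hidden_size = 2048
n_heads = 32
n_kv_heads = 8      # GQA: 4x fewer KV heads than query heads
n_attn_layers = 6   # only the 6 GQA blocks keep a KV cache
ctx_len = 32_768
bytes_per_elem = 2  # f16 cache

head_dim = hidden_size // n_heads
# K and V, per token, across all attention layers
per_token = 2 * n_kv_heads * head_dim * bytes_per_elem * n_attn_layers
total_mib = per_token * ctx_len / 2**20

print(head_dim, per_token, round(total_mib))  # 64 12288 384
```

Under these assumptions, a full 32K-token context needs roughly 384 MiB of KV cache; with 32 KV heads (plain multi-head attention) it would be four times that.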
## Quantization Details

This model was quantized using **llama.cpp** with the `Q8_0` scheme:

- **Source format**: F16 (converted from Hugging Face safetensors)
- **Quantization**: Q8_0 — 8-bit quantization with block-wise scaling
- **Quality**: Near-lossless; suitable for deployments where precision matters
- **Size reduction**: ~50% smaller than F16 while retaining virtually all model quality
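The "~50% smaller" figure follows from how Q8_0 stores weights in llama.cpp: each block of 32 weights is 32 int8 values plus one f16 scale (34 bytes), i.e. 8.5 bits per weight versus 16 for F16. A rough estimate, ignoring tensors kept at higher precision and file metadata:

```python
# Rough Q8_0 vs F16 size estimate; real file sizes will differ slightly
# because some tensors and metadata are not quantized at 8.5 bits.

params = 1.17e9          # parameter count from the table above
bits_q8_0 = 34 * 8 / 32  # 34-byte block per 32 weights -> 8.5 bits/weight
bits_f16 = 16

size_f16_gib = params * bits_f16 / 8 / 2**30
size_q8_gib = params * bits_q8_0 / 8 / 2**30

print(f"F16  ~{size_f16_gib:.2f} GiB")
print(f"Q8_0 ~{size_q8_gib:.2f} GiB ({bits_q8_0 / bits_f16:.0%} of F16)")
```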
## Usage with llama.cpp

```bash
# Build llama.cpp from source
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build && cmake --build build --config Release -j$(nproc)

# Download and run directly from the Hugging Face Hub
./build/bin/llama-cli \
  -hf Kelexine/LFM2.5-1.2B-Thinking-GGUF \
  --temp 0.05 --top-k 50 --repeat-penalty 1.05 -n 4096 -cnv
```
Or with a local file:

```bash
./build/bin/llama-cli \
  -m LFM2.5-1.2B-Thinking-Q8_0.gguf \
  -p "<|im_start|>user\nYour prompt here<|im_end|>\n<|im_start|>assistant\n" \
  --temp 0.05 --top-k 50 --repeat-penalty 1.05 -n 4096
```
## Usage with Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(
    model_path="LFM2.5-1.2B-Thinking-Q8_0.gguf",
    n_ctx=4096,
)

# Sampling parameters belong on the completion call, not the constructor.
response = llm(
    "<|im_start|>user\nWhat is machine learning?<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=4096,
    temperature=0.05,
    top_k=50,
    repeat_penalty=1.05,
    stop=["<|im_end|>"],
)
print(response["choices"][0]["text"])
```
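Both examples above pass the same raw ChatML-style prompt string. If you build prompts in several places, a tiny helper (hypothetical; it just reproduces the literal template shown in the examples) keeps the markers in one spot:

```python
# Hypothetical helper reproducing the <|im_start|>/<|im_end|> template
# used verbatim in the usage examples above.

def build_prompt(user_message: str) -> str:
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(build_prompt("What is machine learning?"))
```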
## Provided Files

| File | Description |
|---|---|
| `LFM2.5-1.2B-Thinking-Q8_0.gguf` | 8-bit quantized GGUF (recommended) |
## Limitations

- This is a 1.17B-parameter model — suited to lightweight tasks, quick prototyping, and edge deployment.
- The "Thinking" variant is designed for chain-of-thought reasoning and may produce verbose `<think>...</think>` blocks; strip these in downstream integrations.
- Requires a recent version of llama.cpp with support for the `Lfm2ForCausalLM` architecture.
- Not recommended for knowledge-intensive tasks or programming, per Liquid AI's own guidance.
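Stripping the reasoning blocks mentioned in the limitations above can be sketched with a simple regex (assuming the literal `<think>...</think>` markers; verify against your model's actual output):

```python
import re

# Remove <think>...</think> reasoning blocks before showing output to users.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(text: str) -> str:
    return THINK_RE.sub("", text).strip()

print(strip_thinking("<think>2+2 is 4</think>The answer is 4."))  # The answer is 4.
```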
## License

This repository inherits the [LFM 1.0 License](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking/blob/main/LICENSE) from the base model [LiquidAI/LFM2.5-1.2B-Thinking](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking).
## Credits

- **Base model**: [Liquid AI](https://www.liquid.ai/)
- **Quantization**: kelexine
- **Framework**: [llama.cpp](https://github.com/ggml-org/llama.cpp) by ggml-org