---
license: other
license_name: lfm-1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking/blob/main/LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
pipeline_tag: text-generation
tags:
- gguf
- llama.cpp
- quantized
- q8_0
- liquid-ai
- lfm
- lfm2
- conversational
base_model: LiquidAI/LFM2.5-1.2B-Thinking
---
# LFM 2.5 1.2B Thinking (GGUF)

## Description

This repository contains the **GGUF** quantized version of [LiquidAI/LFM2.5-1.2B-Thinking](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking), a 1.2-billion-parameter "thinking" language model by **Liquid AI**.

The model uses the novel `Lfm2ForCausalLM` architecture featuring a hybrid design of **10 double-gated LIV convolution blocks + 6 GQA attention blocks** — a departure from standard transformer-only designs. This architecture alternates between local convolution-based mixing and sparse global attention, enabling efficient sequence processing with strong reasoning capabilities.
## Model Details

| Property | Value |
|---|---|
| **Architecture** | Lfm2ForCausalLM |
| **Parameter Count** | 1.17B |
| **Layers** | 16 (10 conv blocks + 6 GQA blocks) |
| **Hidden Size** | 2048 |
| **Intermediate (FFN)** | 8192 |
| **Attention Heads** | 32 |
| **KV Heads (GQA)** | 8 (on attention layers) |
| **Context Length** | 32,768 tokens |
| **Vocabulary Size** | 65,536 |
| **Languages** | English, Arabic, Chinese, French, German, Japanese, Korean, Spanish |
| **Quantization** | Q8_0 (8-bit) |
| **File Type** | GGUF |
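The GQA figures in the table translate into a concrete memory saving at inference time. A rough sketch, assuming `head_dim = hidden_size / attention_heads` (not stated in the card) and an f16 KV cache:

```python
# Back-of-envelope KV-cache estimate from the table above.
# Assumption: head_dim = hidden_size / attention_heads.

hidden_size = 2048
n_heads = 32
n_kv_heads = 8      # GQA: 4x fewer KV heads than query heads
n_attn_layers = 6   # only the 6 GQA blocks keep a KV cache
ctx_len = 32_768
bytes_per_elem = 2  # f16 cache

head_dim = hidden_size // n_heads
# K and V, per token, across all attention layers
per_token = 2 * n_kv_heads * head_dim * bytes_per_elem * n_attn_layers
total_mib = per_token * ctx_len / 2**20

print(head_dim, per_token, round(total_mib))  # 64 12288 384
```

Under these assumptions, a full 32K-token context needs roughly 384 MiB of KV cache; with 32 KV heads (plain multi-head attention) it would be four times that.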
## Quantization Details

This model was quantized using **llama.cpp** with the `Q8_0` scheme:

- **Source format**: F16 (converted from Hugging Face safetensors)
- **Quantization**: Q8_0 — 8-bit quantization with block-wise scaling
- **Quality**: Near-lossless; suitable for deployments where precision matters
- **Size reduction**: ~50% smaller than F16 while retaining virtually all model quality
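The "~50% smaller" figure follows from how Q8_0 stores weights in llama.cpp: each block of 32 weights is 32 int8 values plus one f16 scale (34 bytes), i.e. 8.5 bits per weight versus 16 for F16. A rough estimate, ignoring tensors kept at higher precision and file metadata:

```python
# Rough Q8_0 vs F16 size estimate; real file sizes will differ slightly
# because some tensors and metadata are not quantized at 8.5 bits.

params = 1.17e9          # parameter count from the table above
bits_q8_0 = 34 * 8 / 32  # 34-byte block per 32 weights -> 8.5 bits/weight
bits_f16 = 16

size_f16_gib = params * bits_f16 / 8 / 2**30
size_q8_gib = params * bits_q8_0 / 8 / 2**30

print(f"F16  ~{size_f16_gib:.2f} GiB")
print(f"Q8_0 ~{size_q8_gib:.2f} GiB ({bits_q8_0 / bits_f16:.0%} of F16)")
```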
## Usage with llama.cpp

```bash
# Build llama.cpp from source
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build && cmake --build build --config Release -j$(nproc)

# Download and run directly from the Hugging Face Hub
./build/bin/llama-cli \
  -hf Kelexine/LFM2.5-1.2B-Thinking-GGUF \
  --temp 0.05 --top-k 50 --repeat-penalty 1.05 -n 4096 -cnv
```
Or with a local file:

```bash
./build/bin/llama-cli \
  -m LFM2.5-1.2B-Thinking-Q8_0.gguf \
  -p "<|im_start|>user\nYour prompt here<|im_end|>\n<|im_start|>assistant\n" \
  --temp 0.05 --top-k 50 --repeat-penalty 1.05 -n 4096
```
## Usage with Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(
    model_path="LFM2.5-1.2B-Thinking-Q8_0.gguf",
    n_ctx=4096,
)

# Sampling parameters belong on the completion call, not the constructor.
response = llm(
    "<|im_start|>user\nWhat is machine learning?<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=4096,
    temperature=0.05,
    top_k=50,
    repeat_penalty=1.05,
    stop=["<|im_end|>"],
)
print(response["choices"][0]["text"])
```
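Both examples above pass the same raw ChatML-style prompt string. If you build prompts in several places, a tiny helper (hypothetical; it just reproduces the literal template shown in the examples) keeps the markers in one spot:

```python
# Hypothetical helper reproducing the <|im_start|>/<|im_end|> template
# used verbatim in the usage examples above.

def build_prompt(user_message: str) -> str:
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(build_prompt("What is machine learning?"))
```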
## Provided Files

| File | Description |
|---|---|
| `LFM2.5-1.2B-Thinking-Q8_0.gguf` | 8-bit quantized GGUF (recommended) |
## Limitations

- This is a 1.17B-parameter model — suited to lightweight tasks, quick prototyping, and edge deployment.
- The "Thinking" variant is designed for chain-of-thought reasoning and may produce verbose `<think>...</think>` blocks; strip these in downstream integrations.
- Requires a recent version of llama.cpp with support for the `Lfm2ForCausalLM` architecture.
- Not recommended for knowledge-intensive tasks or programming, per Liquid AI's own guidance.
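Stripping the reasoning blocks mentioned in the limitations above can be sketched with a simple regex (assuming the literal `<think>...</think>` markers; verify against your model's actual output):

```python
import re

# Remove <think>...</think> reasoning blocks before showing output to users.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(text: str) -> str:
    return THINK_RE.sub("", text).strip()

print(strip_thinking("<think>2+2 is 4</think>The answer is 4."))  # The answer is 4.
```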
## License

This repository inherits the [LFM 1.0 License](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking/blob/main/LICENSE) from the base model [LiquidAI/LFM2.5-1.2B-Thinking](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking).
## Credits

- **Base model**: [Liquid AI](https://www.liquid.ai/)
- **Quantization**: kelexine
- **Framework**: [llama.cpp](https://github.com/ggml-org/llama.cpp) by ggml-org