---
license: other
license_name: lfm-1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking/blob/main/LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
pipeline_tag: text-generation
tags:
- gguf
- llama.cpp
- quantized
- q8_0
- liquid-ai
- lfm
- lfm2
- conversational
base_model: LiquidAI/LFM2.5-1.2B-Thinking
---

# LFM 2.5 1.2B Thinking (GGUF)

## Description

This repository contains the **GGUF** quantized version of [LiquidAI/LFM2.5-1.2B-Thinking](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking), a 1.2-billion-parameter "thinking" language model by **Liquid AI**.

The model uses the `Lfm2ForCausalLM` architecture, a hybrid design of **10 double-gated LIV convolution blocks and 6 GQA attention blocks**, a departure from standard transformer-only designs. The architecture interleaves local convolution-based token mixing with global grouped-query attention, enabling efficient sequence processing with strong reasoning capabilities.

## Model Details

| Property | Value |
|---|---|
| **Architecture** | Lfm2ForCausalLM |
| **Parameter Count** | 1.17B |
| **Layers** | 16 (10 conv blocks + 6 GQA blocks) |
| **Hidden Size** | 2048 |
| **Intermediate (FFN)** | 8192 |
| **Attention Heads** | 32 |
| **KV Heads (GQA)** | 8 (on attention layers) |
| **Context Length** | 32,768 tokens |
| **Vocabulary Size** | 65,536 |
| **Languages** | English, Arabic, Chinese, French, German, Japanese, Korean, Spanish |
| **Quantization** | Q8_0 (8-bit) |
| **File Type** | GGUF |

## Quantization Details

This model was quantized using **llama.cpp** with the `Q8_0` scheme:

- **Source format**: F16 (converted from HuggingFace safetensors)
- **Quantization**: Q8_0, 8-bit quantization with block-wise scaling
- **Quality**: Near-lossless; suitable for deployments where precision matters
- **Size reduction**: ~50% smaller than F16 while retaining virtually all model quality

## Usage with llama.cpp

```bash
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build && cmake --build build --config Release -j$(nproc)

./build/bin/llama-cli \
  -hf Kelexine/LFM2.5-1.2B-Thinking-GGUF \
  --temp 0.05 --top-k 50 --repeat-penalty 1.05 -n 4096 -cnv
```

Or with a local file:

```bash
./build/bin/llama-cli \
  -m LFM2.5-1.2B-Thinking-Q8_0.gguf \
  -p "<|im_start|>user\nYour prompt here<|im_end|>\n<|im_start|>assistant\n" \
  --temp 0.05 --top-k 50 --repeat-penalty 1.05 -n 4096
```

## Usage with Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(
    model_path="LFM2.5-1.2B-Thinking-Q8_0.gguf",
    n_ctx=4096,
)

# Sampling parameters are per-request arguments, not constructor arguments.
response = llm(
    "<|im_start|>user\nWhat is machine learning?<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=4096,
    temperature=0.05,
    top_k=50,
    repeat_penalty=1.05,
    stop=["<|im_end|>"],
)
print(response["choices"][0]["text"])
```

## Provided Files

| File | Description |
|---|---|
| `LFM2.5-1.2B-Thinking-Q8_0.gguf` | 8-bit quantized GGUF (recommended) |

## Limitations

- This is a 1.17B-parameter model, suited for lightweight tasks, quick prototyping, and edge deployment.
- The "Thinking" variant is designed for chain-of-thought reasoning and may produce verbose `<think>…</think>` reasoning blocks; strip these in downstream integrations (see the sketch after this list).
- Requires a recent build of llama.cpp with support for the `Lfm2ForCausalLM` (LFM2) architecture.
- Not recommended for knowledge-intensive tasks or programming, per Liquid AI's own guidance.
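A minimal sketch of such stripping, assuming the reasoning trace is delimited by `<think>`/`</think>` tags (the delimiters and the `strip_reasoning` helper are illustrative; verify the actual tokens against the model's chat template):

```python
import re

# Assumed delimiters: <think>...</think>. Check the model's chat template
# before relying on these in production.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(text: str) -> str:
    """Return the completion with any reasoning blocks removed."""
    return THINK_BLOCK.sub("", text).strip()

raw = "<think>The user asks for 2 + 2; that is 4.</think>2 + 2 = 4."
print(strip_reasoning(raw))  # -> 2 + 2 = 4.
```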
## License

This repository inherits the [LFM 1.0 License](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking/blob/main/LICENSE) from the base model [LiquidAI/LFM2.5-1.2B-Thinking](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking).

## Credits

- **Base model**: [Liquid AI](https://www.liquid.ai/)
- **Quantization**: kelexine
- **Framework**: [llama.cpp](https://github.com/ggml-org/llama.cpp) by ggml-org