Initialize project; model provided by the ModelHub XC community
Model: Kelexine/LFM2.5-1.2B-Thinking-GGUF Source: Original Platform

---
license: other
license_name: lfm-1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking/blob/main/LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
pipeline_tag: text-generation
tags:
- gguf
- llama.cpp
- quantized
- q8_0
- liquid-ai
- lfm
- lfm2
- conversational
base_model: LiquidAI/LFM2.5-1.2B-Thinking
---

# LFM 2.5 1.2B Thinking (GGUF)

## Description

This repository contains the **GGUF** quantized version of [LiquidAI/LFM2.5-1.2B-Thinking](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking), a 1.2-billion-parameter "thinking" language model by **Liquid AI**.

The model uses the `Lfm2ForCausalLM` architecture, a hybrid design of **10 double-gated LIV convolution blocks + 6 GQA attention blocks** that departs from standard transformer-only stacks. It interleaves local convolution-based mixing with grouped-query global attention, enabling efficient sequence processing with strong reasoning capabilities.
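
One way to verify this hybrid layout is to list the tensors stored in the GGUF file. A minimal sketch, assuming the `gguf` Python package that ships with llama.cpp; the exact tensor names are whatever the converter emitted, so treat the conv/attention naming as an expectation, not a guarantee:

```python
from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("LFM2.5-1.2B-Thinking-Q8_0.gguf")

# Convolution blocks and attention blocks carry differently named weights,
# so the per-block tensor names show which kind each of the 16 blocks is.
for tensor in reader.tensors:
    print(tensor.name, list(tensor.shape))
```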
## Model Details

| Property | Value |
|---|---|
| **Architecture** | Lfm2ForCausalLM |
| **Parameter Count** | 1.17B |
| **Layers** | 16 (10 conv blocks + 6 GQA blocks) |
| **Hidden Size** | 2048 |
| **Intermediate (FFN) Size** | 8192 |
| **Attention Heads** | 32 |
| **KV Heads (GQA)** | 8 (on attention layers) |
| **Context Length** | 32,768 tokens |
| **Vocabulary Size** | 65,536 |
| **Languages** | English, Arabic, Chinese, French, German, Japanese, Korean, Spanish |
| **Quantization** | Q8_0 (8-bit) |
| **File Format** | GGUF |
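
One practical consequence of the hybrid layout: only the 6 attention blocks hold a KV cache (the conv blocks keep just a small rolling state), so long contexts stay cheap. A back-of-the-envelope estimate from the table above; my own arithmetic, assuming an f16 cache and head_dim = hidden_size / heads = 64:

```python
# KV cache per token: 6 attention layers x (K + V) x 8 KV heads x 64 dims x 2 bytes (f16)
attn_layers, kv_heads, head_dim, f16_bytes = 6, 8, 64, 2
per_token = attn_layers * 2 * kv_heads * head_dim * f16_bytes
print(per_token)                         # 12288 bytes, i.e. 12 KiB per token
print(per_token * 32768 / 2**20, "MiB")  # ~384 MiB at the full 32k context
```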

## Quantization Details

This model was quantized using **llama.cpp** with the `Q8_0` scheme:

- **Source format**: F16 (converted from the Hugging Face safetensors checkpoint)
- **Quantization**: Q8_0, 8-bit quantization with block-wise scaling
- **Quality**: near-lossless; ideal for deployment where precision matters
- **Size reduction**: roughly 50% smaller than F16 while retaining virtually all model quality
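
For reference, this is roughly the pipeline that produces such a file with llama.cpp's stock tooling; a sketch that assumes a local copy of the base checkpoint in `./LFM2.5-1.2B-Thinking`:

```bash
# HF safetensors -> F16 GGUF (converter script ships in the llama.cpp repo)
python convert_hf_to_gguf.py ./LFM2.5-1.2B-Thinking \
  --outtype f16 --outfile LFM2.5-1.2B-Thinking-F16.gguf

# F16 -> Q8_0
./build/bin/llama-quantize \
  LFM2.5-1.2B-Thinking-F16.gguf LFM2.5-1.2B-Thinking-Q8_0.gguf Q8_0
```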
## Usage with llama.cpp

```bash
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build && cmake --build build --config Release -j$(nproc)

./build/bin/llama-cli \
  -hf Kelexine/LFM2.5-1.2B-Thinking-GGUF \
  --temp 0.05 --top-k 50 --repeat-penalty 1.05 -n 4096 -cnv
```

Or with a local file:

```bash
./build/bin/llama-cli \
  -m LFM2.5-1.2B-Thinking-Q8_0.gguf \
  -p "<|im_start|>user\nYour prompt here<|im_end|>\n<|im_start|>assistant\n" \
  --temp 0.05 --top-k 50 --repeat-penalty 1.05 -n 4096
```
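
llama.cpp also bundles an OpenAI-compatible HTTP server, which can be more convenient than the CLI for applications; a minimal sketch (the port and context size here are arbitrary choices, not defaults from this repo):

```bash
./build/bin/llama-server \
  -m LFM2.5-1.2B-Thinking-Q8_0.gguf \
  -c 4096 --port 8080
# then POST chat requests to http://localhost:8080/v1/chat/completions
```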
## Usage with Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model. Sampling settings (temperature, top_k, repeat_penalty)
# are per-call options, not constructor arguments.
llm = Llama(
    model_path="LFM2.5-1.2B-Thinking-Q8_0.gguf",
    n_ctx=4096,
)

response = llm(
    "<|im_start|>user\nWhat is machine learning?<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=4096,
    temperature=0.05,
    top_k=50,
    repeat_penalty=1.05,
    stop=["<|im_end|>"],
)
print(response["choices"][0]["text"])
```
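
If you would rather not build the prompt string by hand, llama-cpp-python can apply the chat template embedded in the GGUF; a sketch, assuming a recent llama-cpp-python and that the file carries a usable template:

```python
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is machine learning?"}],
    max_tokens=4096,
    temperature=0.05,
)
print(out["choices"][0]["message"]["content"])
```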
## Provided Files

| File | Description |
|---|---|
| `LFM2.5-1.2B-Thinking-Q8_0.gguf` | 8-bit quantized GGUF (recommended) |
## Limitations

- This is a 1.17B-parameter model, suited for lightweight tasks, quick prototyping, and edge deployment.
- The "Thinking" variant is designed for chain-of-thought reasoning and may produce verbose `<think>...</think>` blocks; strip these in downstream integrations (see the sketch below).
- Requires a recent version of llama.cpp with support for the LFM2 (`Lfm2ForCausalLM`) architecture.
- Per Liquid AI's own guidance, the model is not recommended for knowledge-intensive tasks or programming.
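
A minimal post-processing sketch for that second point, assuming the reasoning is wrapped in literal `<think>`/`</think>` tags:

```python
import re

def strip_think(text: str) -> str:
    """Drop <think>...</think> reasoning blocks, keeping only the final answer."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)

print(strip_think("<think>2 + 2 = 4 ...</think>The answer is 4."))
# -> The answer is 4.
```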
## License

This repository inherits the [LFM 1.0 License](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking/blob/main/LICENSE) from the base model, [LiquidAI/LFM2.5-1.2B-Thinking](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking).
## Credits

- **Base model**: [Liquid AI](https://www.liquid.ai/)
- **Quantization**: kelexine
- **Framework**: [llama.cpp](https://github.com/ggml-org/llama.cpp) by ggml-org