---
pipeline_tag: text-generation
base_model:
- LiquidAI/LFM2.5-8B-A1B
---
These are **MXFP4** quantizations of the model [LiquidAI/LFM2.5-8B-A1B](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B).

## Quick Start
1. Download the latest release of [**llama.cpp**](https://github.com/ggml-org/llama.cpp/releases).
2. Download your preferred model variant from below.
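
Once both downloads are in place, the model can be served with `llama-server` from the llama.cpp release. The snippet below is a minimal sketch that only assembles the command line. The path `path/to/model.gguf`, the context size, the GPU layer count, and the port are placeholder assumptions, so replace them with values that match your download and hardware.

```python
import shlex


def server_command(model_path, ctx_size=4096, gpu_layers=99, port=8080):
    """Assemble a llama-server invocation for a local GGUF file."""
    return [
        "llama-server",
        "-m", str(model_path),     # path to the downloaded GGUF file
        "-c", str(ctx_size),       # context window in tokens (placeholder value)
        "-ngl", str(gpu_layers),   # layers to offload to the GPU; 0 keeps everything on the CPU
        "--port", str(port),       # HTTP port the server listens on
        "--jinja",                 # enable Jinja chat-template processing
    ]


# Print a shell command that can be copied into a terminal.
print(shlex.join(server_command("path/to/model.gguf")))
```

Building the command as a list keeps each flag next to its value, which makes it easy to pass the same list to `subprocess.run` later.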

## Which version should I choose?
All FP4 variants use **MXFP4** for the MoE (Mixture of Experts) weights to keep the model efficient; they differ only in how the remaining tensors are handled.
I've also included a new type, **Q8_XL_MOE**, which uses Q8_0 for the MoE tensors and BF16 for everything else.

| Variant            | Quality    | Performance | MoE Tensors | Other Tensors | Size     | Recommendation                                                                    |
| :----------------- | :--------- | :---------- | :---------: | :-----------: | -------: | :-------------------------------------------------------------------------------- |
| **Q8_XL_MOE**      | ⭐⭐⭐⭐⭐ | Variable\*  |    Q8_0     |     BF16      | 9.02 GiB | Maximum quality; uses Q8_0 instead of MXFP4 for the MoE weights.                  |
| **MXFP4_MOE_BF16** | ⭐⭐⭐     | Variable\*  |    MXFP4    |     BF16      | 5.18 GiB | Best accuracy among the MXFP4 variants; non-MoE tensors keep the original unquantized weights. |
| **MXFP4_MOE_F16**  | ⭐⭐       | Fast        |    MXFP4    |      F16      | 5.18 GiB | Great alternative if BF16 is slow on your hardware.                               |
| **MXFP4_MOE**      | ⭐         | Fastest     |    MXFP4    |     Q8_0      | 4.79 GiB | Balanced performance and memory usage.                                            |

\* **Note:** On some older architectures, BF16 may be slower than F16.
Check whether your GPU supports native BF16 acceleration; if it does not, the F16 version is the better choice.
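
The choice between the BF16 and F16 files can be sketched as a small decision rule. The helper below is hypothetical and rests on two assumptions: the GPU is an NVIDIA card, where native BF16 support starts at compute capability 8.0 (Ampere), and the memory check compares only the file sizes from the table, ignoring context and KV-cache overhead.

```python
# File sizes in GiB, taken from the table above.
SIZES_GIB = {
    "Q8_XL_MOE": 9.02,
    "MXFP4_MOE_BF16": 5.18,
    "MXFP4_MOE_F16": 5.18,
    "MXFP4_MOE": 4.79,
}


def pick_variant(compute_capability, free_memory_gib):
    """Suggest a variant from a (major, minor) compute capability and free memory in GiB."""
    # Assumption: NVIDIA GPUs gained native BF16 support with Ampere (8.0).
    has_native_bf16 = tuple(compute_capability) >= (8, 0)
    if has_native_bf16:
        preferred = ["Q8_XL_MOE", "MXFP4_MOE_BF16", "MXFP4_MOE"]
    else:
        # Skip the variants that store non-MoE tensors in BF16.
        preferred = ["MXFP4_MOE_F16", "MXFP4_MOE"]
    for name in preferred:
        if SIZES_GIB[name] <= free_memory_gib:
            return name
    return None  # nothing fits in the given memory budget


print(pick_variant((8, 6), 8.0))  # Ampere-class GPU, 8 GiB free -> MXFP4_MOE_BF16
print(pick_variant((7, 5), 8.0))  # Turing-class GPU, 8 GiB free -> MXFP4_MOE_F16
```

Treat the result as a starting point only; measuring tokens per second on your own hardware is the reliable way to compare the BF16 and F16 files.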

Recommended parameters from LiquidAI:
- temperature 0.2
- top_p 80
- repetition_penalty 1.05
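
In llama.cpp these settings go by slightly different names: temperature is `--temp` and repetition penalty is `--repeat-penalty`. The sketch below shows one way to translate them; the helper name `sampling_flags` is hypothetical. Because `--top-p` takes a cumulative probability between 0 and 1, the helper validates that range, and the usage line passes only the two values that map directly.

```python
def sampling_flags(temperature, repetition_penalty, top_p=None):
    """Translate generic sampler settings into llama.cpp command-line flags."""
    flags = [
        "--temp", str(temperature),
        "--repeat-penalty", str(repetition_penalty),
    ]
    if top_p is not None:
        # top-p is a cumulative probability, so it must lie in (0, 1].
        if not 0.0 < top_p <= 1.0:
            raise ValueError(f"top_p must be in (0, 1], got {top_p}")
        flags += ["--top-p", str(top_p)]
    return flags


# Flags for the temperature and repetition penalty listed above.
print(" ".join(sampling_flags(temperature=0.2, repetition_penalty=1.05)))
```

The printed flags can be appended to a `llama-cli` or `llama-server` command.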

The chat template has been updated to fix the tool calling issues.
If you don't want to download the model again, you can use the template from the parent model.
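
llama.cpp can load a chat template from a file instead of the one stored in the GGUF, which is one way to apply the parent model's template to an older download. A minimal sketch, assuming you have saved that template locally as `chat_template.jinja` (a hypothetical file name) and that the model lives at the placeholder path `path/to/model.gguf`:

```python
def template_override_args(template_file):
    """Return llama.cpp flags that replace the chat template embedded in the GGUF."""
    return [
        "--jinja",                                  # enable Jinja chat-template processing
        "--chat-template-file", str(template_file),  # template file to use instead
    ]


command = ["llama-server", "-m", "path/to/model.gguf"]
command += template_override_args("chat_template.jinja")
print(" ".join(command))
```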