pipeline_tag: text-generation
base_model: LiquidAI/LFM2.5-8B-A1B

These are MXFP4 quantizations of the model LiquidAI/LFM2.5-8B-A1B.

Quick Start

  1. Download the latest release of llama.cpp.
  2. Download your preferred model variant from below.
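
Once both downloads are in place, a first run could look like the sketch below. It assumes the llama.cpp release provides a `llama-cli` binary on your PATH; the model file name and the prompt are placeholders, not names taken from the release page.

```python
import shlex
from pathlib import Path

# Hypothetical file name: use the name of the variant you actually downloaded.
MODEL_PATH = Path("LFM2.5-8B-A1B-MXFP4_MOE.gguf")


def build_cli_command(model_path: Path, prompt: str, binary: str = "llama-cli") -> list:
    """Return the argument list that starts llama-cli with an initial prompt."""
    return [
        binary,
        "-m", str(model_path),  # path to the GGUF file
        "-p", prompt,           # initial prompt text
        "-n", "128",            # cap on the number of generated tokens
    ]


# Print the command instead of running it, so the sketch is safe to execute anywhere.
print(shlex.join(build_cli_command(MODEL_PATH, "Hello!")))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` once the binary and the model file are available.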

Which version should I choose?

All FP4 variants use MXFP4 for the MoE (Mixture of Experts) weights to keep the model efficient. They differ in how the remaining tensors are handled.
I've also included a new type, Q8_XL_MOE, which uses Q8_0 for the MoE tensors and BF16 for everything else:

| Variant | Quality/Performance | MoE Tensors | Other Tensors | Size | Recommendation |
|---|---|---|---|---|---|
| Q8_XL_MOE | Variable* | Q8_0 | BF16 | 9.02 GiB | Maximum quality; uses Q8_0 instead of MXFP4 for the MoE weights. |
| MXFP4_MOE_BF16 | Variable* | MXFP4 | BF16 | 5.18 GiB | Best for maximum accuracy; the other tensors keep the original unquantized weights. |
| MXFP4_MOE_F16 | Fast | MXFP4 | F16 | 5.18 GiB | Great alternative if BF16 is slow on your hardware. |
| MXFP4_MOE | Fastest | MXFP4 | Q8_0 | 4.79 GiB | Balanced performance and memory usage. |

* Note: On some older architectures, BF16 may be slower than F16.
Check whether your GPU supports native BF16 acceleration; if it does not, the F16 version is the better choice.
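
The selection logic described above can be sketched as a small helper. This is only an illustration: the sizes come from the table, the 1 GiB headroom for context and runtime buffers is an assumed figure, and the BF16 check covers NVIDIA GPUs only (BF16 tensor-core support arrived with Ampere, compute capability 8.0). Other vendors need their own check.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Variant:
    name: str
    size_gib: float   # file size as listed in the table
    uses_bf16: bool   # True when the non-MoE tensors are stored as BF16


# Ordered from highest quality to smallest file, following the table.
VARIANTS = [
    Variant("Q8_XL_MOE", 9.02, True),
    Variant("MXFP4_MOE_BF16", 5.18, True),
    Variant("MXFP4_MOE_F16", 5.18, False),
    Variant("MXFP4_MOE", 4.79, False),
]


def nvidia_has_native_bf16(compute_capability: Tuple[int, int]) -> bool:
    """NVIDIA-only heuristic: Ampere (8.0) and newer accelerate BF16 natively."""
    return compute_capability >= (8, 0)


def pick_variant(memory_gib: float, native_bf16: bool,
                 headroom_gib: float = 1.0) -> Optional[str]:
    """Return the first variant that fits, skipping BF16 files on hardware without BF16.

    headroom_gib is an assumed allowance for context and buffers, not a measured value.
    """
    budget = memory_gib - headroom_gib
    for variant in VARIANTS:
        if variant.uses_bf16 and not native_bf16:
            continue  # BF16 may be slow here, so prefer the F16 or Q8_0 files
        if variant.size_gib <= budget:
            return variant.name
    return None  # nothing fits in the given memory budget


print(pick_variant(memory_gib=8.0, native_bf16=nvidia_has_native_bf16((8, 6))))
```

Treat the result as a starting point; actual memory use depends on context length and on how many layers are offloaded.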

Recommended parameters from LiquidAI:

  • temperature 0.2
  • top_k 80
  • repetition_penalty 1.05
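
As a rough sketch, these settings map onto llama.cpp sampling flags as shown below. The flag names (`--temp`, `--top-k`, `--repeat-penalty`) are llama.cpp's. Passing 80 as a top-k cutoff is an assumption made here, since top_p is a probability between 0 and 1 and cannot take that value; confirm the setting against LiquidAI's model card.

```python
def sampling_args(temperature: float = 0.2, top_k: int = 80,
                  repeat_penalty: float = 1.05) -> list:
    """Translate the recommended sampling settings into llama.cpp CLI flags."""
    return [
        "--temp", str(temperature),
        "--top-k", str(top_k),                  # assumption: 80 is a top-k cutoff
        "--repeat-penalty", str(repeat_penalty),
    ]


print(" ".join(sampling_args()))
```

The resulting list can be appended to a `llama-cli` or `llama-server` argument list.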

The chat template has been updated to fix the tool calling issues. If you don't want to download the model again, you can use the template from the parent model.
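
If you keep an older download, one way to apply a separate template is llama.cpp's `--jinja` and `--chat-template-file` options. The sketch below only builds the command. Both file names are placeholders, and it assumes you have saved the parent model's template to a local file yourself.

```python
import shlex
from pathlib import Path


def build_server_command(model_path: Path, template_path: Path,
                         binary: str = "llama-server") -> list:
    """Return a llama-server command that overrides the chat template embedded in the GGUF."""
    return [
        binary,
        "-m", str(model_path),
        "--jinja",                                   # enable Jinja chat template processing
        "--chat-template-file", str(template_path),  # template file that replaces the embedded one
    ]


# Hypothetical local paths, for illustration only.
cmd = build_server_command(Path("LFM2.5-8B-A1B-MXFP4_MOE.gguf"),
                           Path("chat_template.jinja"))
print(shlex.join(cmd))
```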

Description
Model synced from source: noctrex/LFM2.5-8B-A1B-MXFP4_MOE-GGUF