LFM2.5-8B-A1B-MXFP4_MOE-GGUF


ModelHub XC 9cd025f83b Initialize project; model provided by the ModelHub XC community

Model: noctrex/LFM2.5-8B-A1B-MXFP4_MOE-GGUF
Source: Original Platform

2026-06-06 08:54:15 +08:00

Files:

.gitattributes
LFM2.5-8B-A1B-MXFP4_MOE_BF16.gguf
LFM2.5-8B-A1B-MXFP4_MOE_F16.gguf
LFM2.5-8B-A1B-MXFP4_MOE.gguf
LFM2.5-8B-A1B-Q8_XL_MOE.gguf
README.md

README.md

pipeline_tag: text-generation
base_model: LiquidAI/LFM2.5-8B-A1B

These are MXFP4 quantizations of the model LiquidAI/LFM2.5-8B-A1B.

Quick Start

Download the latest release of llama.cpp.
Download your preferred model variant from below.
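
As a rough illustration of the second step, the sketch below maps each variant name to its file in this repository and builds a llama.cpp command line for it. The `models` directory and the choice of the `llama-cli` binary are assumptions made for the example, not something this README prescribes; adjust both to your setup.

```python
from pathlib import Path

# Variant name -> GGUF file name, as listed in this repository.
VARIANT_FILES = {
    "Q8_XL_MOE": "LFM2.5-8B-A1B-Q8_XL_MOE.gguf",
    "MXFP4_MOE_BF16": "LFM2.5-8B-A1B-MXFP4_MOE_BF16.gguf",
    "MXFP4_MOE_F16": "LFM2.5-8B-A1B-MXFP4_MOE_F16.gguf",
    "MXFP4_MOE": "LFM2.5-8B-A1B-MXFP4_MOE.gguf",
}


def llama_cli_command(variant, model_dir="models", binary="llama-cli"):
    """Return the argument list for starting llama.cpp with one variant.

    model_dir and binary are hypothetical defaults for this sketch.
    """
    model_path = Path(model_dir) / VARIANT_FILES[variant]
    # -m selects the GGUF file to load.
    return [binary, "-m", model_path.as_posix()]


if __name__ == "__main__":
    print(" ".join(llama_cli_command("MXFP4_MOE")))
```

The returned list can be passed directly to `subprocess.run` once the binary and the model file are in place.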

Which version should I choose?

All FP4 variants use MXFP4 for the MoE (Mixture of Experts) weights to keep the model efficient; they differ in how the remaining tensors are handled.
I've also included a new type, Q8_XL_MOE, which uses Q8_0 for the MoE tensors and BF16 for everything else:

Variant	Quality	Performance	MoE Tensors	Other Tensors	Size	Recommendation
Q8_XL_MOE	⭐⭐⭐⭐⭐	Variable*	Q8_0	BF16	9.02GiB	Maximum quality, uses Q8_0 instead of MXFP4 for the MoE weights.
MXFP4_MOE_BF16	⭐⭐⭐	Variable*	MXFP4	BF16	5.18GiB	Best accuracy among the MXFP4 variants; the non-MoE tensors keep their original BF16 weights.
MXFP4_MOE_F16	⭐⭐	Fast	MXFP4	F16	5.18GiB	Great alternative if BF16 is slow on your hardware.
MXFP4_MOE	⭐	Fastest	MXFP4	Q8_0	4.79GiB	Balanced performance and memory usage.

* Note: On some older architectures, BF16 may be slower than F16.
Check whether your GPU supports native BF16 acceleration; if it does not, the F16 version is the better choice.
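
The selection logic above can be sketched as a small helper. This is only one possible reading of the table: the sizes are the file sizes listed there, while the 1 GiB of headroom for context and runtime buffers is an assumption of this example, and the `choose_variant` function itself is hypothetical.

```python
# File sizes in GiB, copied from the table above.
SIZES_GIB = {
    "Q8_XL_MOE": 9.02,
    "MXFP4_MOE_BF16": 5.18,
    "MXFP4_MOE_F16": 5.18,
    "MXFP4_MOE": 4.79,
}


def choose_variant(free_memory_gib, native_bf16, want_max_quality=False,
                   headroom_gib=1.0):
    """Pick the first variant, in order of preference, that fits in memory.

    headroom_gib is an assumed allowance on top of the file size;
    tune it for your context length and backend.
    """
    preference = []
    if want_max_quality:
        preference.append("Q8_XL_MOE")
    # Prefer BF16 only when the hardware accelerates it natively.
    preference.append("MXFP4_MOE_BF16" if native_bf16 else "MXFP4_MOE_F16")
    # Smallest variant as the fallback.
    preference.append("MXFP4_MOE")

    for name in preference:
        if SIZES_GIB[name] + headroom_gib <= free_memory_gib:
            return name
    return None  # nothing fits in the given memory budget


if __name__ == "__main__":
    print(choose_variant(free_memory_gib=8, native_bf16=False))
```

Whether your GPU has native BF16 support still has to be checked against the vendor's documentation; the helper only encodes the decision once you know the answer.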

Recommended parameters from LiquidAI:

temperature 0.2
top_k 80
repetition_penalty 1.05
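
For use with llama.cpp, these settings have to be translated into command-line flags. The sketch below shows one way to do that; the `sampling_flags` helper is hypothetical, and the demo passes only the temperature and the repetition penalty. The remaining sampler from the list can be added to the dictionary in the same way.

```python
# Parameter name as used in the list above -> llama.cpp command-line flag.
FLAG_NAMES = {
    "temperature": "--temp",
    "top_k": "--top-k",
    "top_p": "--top-p",
    "repetition_penalty": "--repeat-penalty",
}


def sampling_flags(params):
    """Turn a dict of sampling parameters into llama.cpp arguments."""
    flags = []
    for name, value in params.items():
        # An unknown parameter name raises KeyError rather than being dropped.
        flags += [FLAG_NAMES[name], str(value)]
    return flags


if __name__ == "__main__":
    print(sampling_flags({"temperature": 0.2, "repetition_penalty": 1.05}))
```

The resulting list can be appended to whatever `llama-cli` or `llama-server` command you already use.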

The chat template has been updated to fix the tool calling issues. If you don't want to download the model again, you can use the template from the parent model.
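
If you keep an older download, llama.cpp can be told to use a template file instead of the one embedded in the GGUF. The sketch below assumes you have saved the parent model's chat template locally as `chat_template.jinja`; that file name and the `template_override_flags` helper are hypothetical.

```python
def template_override_flags(template_path):
    """Return llama.cpp arguments that load a chat template from a file."""
    # --jinja enables Jinja template processing;
    # --chat-template-file points llama.cpp at the external template.
    return ["--jinja", "--chat-template-file", str(template_path)]


if __name__ == "__main__":
    print(" ".join(template_override_flags("chat_template.jinja")))
```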