---
pipeline_tag: text-generation
base_model:
- LiquidAI/LFM2.5-8B-A1B
---
These are **MXFP4** quantizations of [LiquidAI/LFM2.5-8B-A1B](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B).

## Quick Start
1. Download the latest release of [**llama.cpp**](https://github.com/ggml-org/llama.cpp/releases).
2. Download your preferred model variant (see the table below).

## Which version should I choose?
All FP4 variants use **MXFP4** for the MoE (Mixture of Experts) weights to keep the model efficient; they differ in how the remaining tensors are handled.
I've also included a new type, **Q8_XL_MOE**, which uses Q8_0 for the MoE tensors and BF16 for everything else.
The table below compares all variants:

| Variant            | Quality    | Performance | MoE Tensors | Other Tensors |     Size | Recommendation                                                                                  |
| :----------------- | :--------- | :---------- | :---------: | :-----------: | -------: | :---------------------------------------------------------------------------------------------- |
| **Q8_XL_MOE**      | ⭐⭐⭐⭐⭐ | Variable\*  |    Q8_0     |     BF16      | 9.02 GiB | Maximum quality; uses Q8_0 instead of MXFP4 for the MoE weights.                                |
| **MXFP4_MOE_BF16** | ⭐⭐⭐     | Variable\*  |    MXFP4    |     BF16      | 5.18 GiB | Best accuracy among the MXFP4 variants; non-MoE tensors keep the original unquantized weights. |
| **MXFP4_MOE_F16**  | ⭐⭐       | Fast        |    MXFP4    |      F16      | 5.18 GiB | Great alternative if BF16 is slow on your hardware.                                             |
| **MXFP4_MOE**      | ⭐         | Fastest     |    MXFP4    |     Q8_0      | 4.79 GiB | Balanced performance and memory usage.                                                          |

\* **Note:** On some older GPU architectures, BF16 may be slower than F16.
Check whether your GPU supports native BF16 acceleration; if it does not, the F16 version is the better choice.

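
The selection logic described above can be sketched as a small helper. This is only an illustration, not an official tool: the sizes come from the table, `pick_variant` and `headroom_gib` are hypothetical names, the 1 GiB of headroom for context and runtime buffers is an assumed placeholder, and skipping BF16 variants outright on hardware without native BF16 is a simplification of the note (BF16 still runs there, it may just be slower). It also assumes Q8_XL_MOE stores its non-MoE tensors as BF16, as described in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    size_gib: float   # file size as listed in the table
    uses_bf16: bool   # True if the non-MoE tensors are stored as BF16


# Ordered from highest to lowest quality rating, as in the table.
VARIANTS = [
    Variant("Q8_XL_MOE", 9.02, True),
    Variant("MXFP4_MOE_BF16", 5.18, True),
    Variant("MXFP4_MOE_F16", 5.18, False),
    Variant("MXFP4_MOE", 4.79, False),
]


def pick_variant(
    free_memory_gib: float,
    native_bf16: bool,
    headroom_gib: float = 1.0,  # assumed margin, tune for your context size
) -> Optional[Variant]:
    """Return the highest-rated variant that fits, or None if none fits."""
    for variant in VARIANTS:
        # Simplification: avoid BF16 variants when BF16 is not accelerated.
        if variant.uses_bf16 and not native_bf16:
            continue
        if variant.size_gib + headroom_gib <= free_memory_gib:
            return variant
    return None
```

For example, `pick_variant(8.0, native_bf16=False)` skips both BF16 variants and returns the `MXFP4_MOE_F16` entry.
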

Recommended parameters from LiquidAI:
- temperature 0.2
- top_p 80
- repetition_penalty 1.05

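
As a rough sketch of how these settings map onto llama.cpp, the snippet below builds a `llama-cli` command line with the temperature and repetition penalty listed above. The model file name is a hypothetical placeholder, and `build_llama_cli_args` is an illustrative helper, not part of llama.cpp; other sampling settings can be appended the same way (for example via `--top-p`).

```python
import shlex


def build_llama_cli_args(
    model_path: str,
    temperature: float = 0.2,
    repetition_penalty: float = 1.05,
) -> list[str]:
    """Build an argument list for llama-cli with the sampling settings."""
    return [
        "llama-cli",
        "-m", model_path,                              # path to the GGUF file
        "--temp", str(temperature),                    # sampling temperature
        "--repeat-penalty", str(repetition_penalty),   # repetition penalty
    ]


# Hypothetical file name; use the variant you actually downloaded.
args = build_llama_cli_args("LFM2.5-8B-A1B-MXFP4_MOE.gguf")
print(shlex.join(args))
```

The resulting list can be passed directly to `subprocess.run(args)` once `llama-cli` is on your `PATH`.
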
The chat template has been updated to fix the tool-calling issues.
If you don't want to download the model again, you can use the template from the parent model.