Initialize project; model provided by the ModelHub XC community
Model: noctrex/LFM2.5-8B-A1B-MXFP4_MOE-GGUF Source: Original Platform
33
README.md
Normal file
@@ -0,0 +1,33 @@
---
pipeline_tag: text-generation
base_model:
- LiquidAI/LFM2.5-8B-A1B
---
These are **MXFP4** quantizations of the model [LiquidAI / LFM2.5-8B-A1B](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B).

## Quick Start
1. Download the latest release of [**llama.cpp**](https://github.com/ggml-org/llama.cpp/releases).
2. Download your preferred model variant from below.
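
With both downloads in place, a first run might look like the sketch below. It only assembles and prints a `llama-cli` command line and does not execute anything. The model file name is a hypothetical placeholder (substitute the file you actually downloaded), and the helper name `build_llama_cli_command` is made up for this illustration; `-m`, `-p` and `-n` are the llama.cpp options for the model path, the prompt, and the number of tokens to generate.

```python
import shlex


def build_llama_cli_command(model_path, prompt, n_predict=256, binary="llama-cli"):
    """Assemble an argument list for llama.cpp's command-line client.

    model_path: path to the downloaded GGUF file.
    prompt:     text to send to the model.
    n_predict:  upper bound on the number of generated tokens.
    binary:     name or path of the llama-cli executable from the release archive.
    """
    return [binary, "-m", str(model_path), "-p", prompt, "-n", str(n_predict)]


# Hypothetical file name, used only for illustration.
cmd = build_llama_cli_command("models/LFM2.5-8B-A1B-MXFP4_MOE.gguf", "Hello!")

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(cmd))
```

Passing the same list to `subprocess.run(cmd, check=True)` would start the program, provided the binary is on your `PATH`.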

## Which version should I choose?
All FP4 variants use **MXFP4** for the MoE (Mixture of Experts) weights to keep the model efficient.
I've also included a new type, Q8_XL_MOE, which uses Q8_0 for the MoE tensors and BF16 for everything else.
Among the FP4 variants, the difference lies in how the remaining tensors are handled:

| Variant            | Quality       | Performance | MoE Tensors | Other Tensors |    Size | Recommendation                                                   |
| :----------------- | :------------ | :---------- | :---------: | :-----------: | ------: | :--------------------------------------------------------------- |
| **Q8_XL_MOE**      | ⭐⭐⭐⭐⭐ | Variable\*  |    Q8_0     |     BF16      | 9.02GiB | Maximum quality; uses Q8_0 instead of MXFP4 for the MoE weights. |
| **MXFP4_MOE_BF16** | ⭐⭐⭐       | Variable\*  |    MXFP4    |     BF16      | 5.18GiB | Best accuracy among the MXFP4 variants; the non-MoE tensors are left unquantized. |
| **MXFP4_MOE_F16**  | ⭐⭐         | Fast        |    MXFP4    |      F16      | 5.18GiB | Great alternative if BF16 is slow on your hardware.              |
| **MXFP4_MOE**      | ⭐           | Fastest     |    MXFP4    |     Q8_0      | 4.79GiB | Balanced performance and memory usage.                           |

**Note:** On some older architectures, BF16 may be slower than F16.
Check whether your GPU supports native BF16 acceleration; if it does not, the F16 version is the better choice.
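
The selection logic described above can be written down as a small helper. This is only a sketch under stated assumptions: the sizes are copied from the table, the non-MoE precisions follow the variant names and the description of Q8_XL_MOE, the memory budget is compared against the file size alone (context and KV cache are not accounted for), and `pick_variant` and `native_bf16` are names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    other_tensors: str  # precision of the non-MoE tensors
    size_gib: float     # file size as listed in the table


# Ordered from the highest to the lowest quality rating in the table.
VARIANTS = [
    Variant("Q8_XL_MOE", "BF16", 9.02),
    Variant("MXFP4_MOE_BF16", "BF16", 5.18),
    Variant("MXFP4_MOE_F16", "F16", 5.18),
    Variant("MXFP4_MOE", "Q8_0", 4.79),
]


def pick_variant(budget_gib: float, native_bf16: bool) -> Optional[Variant]:
    """Return the highest-rated variant that fits the budget, or None.

    Variants with BF16 tensors are skipped when the hardware has no
    native BF16 acceleration, following the note about older architectures.
    """
    for variant in VARIANTS:
        if variant.other_tensors == "BF16" and not native_bf16:
            continue
        if variant.size_gib <= budget_gib:
            return variant
    return None
```

For example, `pick_variant(8.0, native_bf16=False)` returns the `MXFP4_MOE_F16` entry, because both BF16 variants are skipped and the F16 file fits within 8 GiB.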

Recommended parameters from LiquidAI:
- temperature 0.2
- top_p 80
- repetition_penalty 1.05

The chat template has been updated to fix the tool-calling issues.
If you don't want to download the model again, you can use the template from the parent model.
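
One way to reuse the parent model's template with an already downloaded file is to pass it to llama.cpp explicitly. The sketch below assumes you have saved that template locally as a Jinja file; both file names are hypothetical placeholders, and `build_llama_server_command` is a helper name invented here. It relies on the `--jinja` and `--chat-template-file` options of `llama-server`, and it only builds the argument list without starting the server.

```python
import shlex


def build_llama_server_command(model_path, template_path, binary="llama-server"):
    """Assemble a llama-server command that overrides the embedded chat template.

    model_path:    path to the GGUF file you already have.
    template_path: path to a local copy of the parent model's Jinja chat template.
    """
    return [
        binary,
        "-m", str(model_path),
        "--jinja",                                   # use the Jinja template engine
        "--chat-template-file", str(template_path),  # replace the template stored in the GGUF
    ]


# Hypothetical file names, used only for illustration.
server_cmd = build_llama_server_command(
    "models/LFM2.5-8B-A1B-MXFP4_MOE.gguf",
    "templates/lfm2.5-chat-template.jinja",
)
print(shlex.join(server_cmd))
```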