Model: yasserrmd/glm5.1-distill-GGUF Source: Original Platform
license, language, library_name, pipeline_tag, base_model, tags
| license | language | library_name | pipeline_tag | base_model | tags | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| apache-2.0 |
|
gguf | text-generation | yasserrmd/glm5.1-distill |
|
glm5.1-distill-GGUF
GGUF quantizations of yasserrmd/glm5.1-distill,
produced with convert_hf_to_gguf.py and llama-quantize from
ggml-org/llama.cpp.
The quant ladder mirrors Liquid AI's own LFM2.5-1.2B GGUF releases
(e.g. LiquidAI/LFM2.5-1.2B-Base-GGUF).
Files
| File | Quantization | Size |
|---|---|---|
glm5.1-distill-BF16.gguf |
BF16 | 2.18 GB |
glm5.1-distill-Q4_0.gguf |
Q4_0 | 664 MB |
glm5.1-distill-Q4_K_M.gguf |
Q4_K_M | 697 MB |
glm5.1-distill-Q5_K_M.gguf |
Q5_K_M | 804 MB |
glm5.1-distill-Q6_K.gguf |
Q6_K | 918 MB |
glm5.1-distill-Q8_0.gguf |
Q8_0 | 1.16 GB |
Quickstart with llama.cpp
Run any quant directly from the Hub:
llama-cli -hf yasserrmd/glm5.1-distill-GGUF:Q4_K_M --jinja --ctx-size 32768 --temp 0.1 --top-k 50 --top-p 0.1 --repeat-penalty 1.05
Or download manually and serve via llama-server:
huggingface-cli download yasserrmd/glm5.1-distill-GGUF --include "*Q4_K_M*" --local-dir ./glm5.1-distill-GGUF
llama-server --model ./glm5.1-distill-GGUF/glm5.1-distill-Q4_K_M.gguf --alias "yasserrmd/glm5.1-distill" --threads -1 --n-gpu-layers 99 --ctx-size 32768 --port 8001 --temp 0.1 --top-k 50 --top-p 0.1 --repeat-penalty 1.05 --jinja
The recommended sampling parameters above are the official ones published by Liquid AI for the LFM2.5 family.
Quickstart with Ollama
ollama run hf.co/yasserrmd/glm5.1-distill-GGUF:Q4_K_M
Choosing a quant
| Use case | Recommended |
|---|---|
| Maximum quality, plenty of RAM | Q8_0 or Q6_K |
| Balanced default | Q4_K_M (matches Liquid AI's recommendation) |
| Smallest footprint, mobile / IoT | Q4_0 |
| Lossless reference | BF16 (only if you need it for further re-quantization) |
Note
: imatrix-based quantization is currently not supported for the LFM2 architecture in upstream llama.cpp (issue #14979). These files are plain k-quants, the same scheme used in Liquid AI's official GGUF releases.
Source model
For training data, hyperparameters, evaluation, and limitations see the source
repo: yasserrmd/glm5.1-distill.