Files
glm5.1-distill-GGUF/README.md
ModelHub XC e4aa527960 初始化项目,由ModelHub XC社区提供模型
Model: yasserrmd/glm5.1-distill-GGUF
Source: Original Platform
2026-06-19 01:17:18 +08:00

2.6 KiB

license, language, library_name, pipeline_tag, base_model, tags
license language library_name pipeline_tag base_model tags
apache-2.0
en
gguf text-generation yasserrmd/glm5.1-distill
gguf
llama.cpp
lfm2
liquid-ai
edge
text-generation-inference
conversational

glm5.1-distill-GGUF

GGUF quantizations of yasserrmd/glm5.1-distill, produced with convert_hf_to_gguf.py and llama-quantize from ggml-org/llama.cpp.

The quant ladder mirrors Liquid AI's own LFM2.5-1.2B GGUF releases (e.g. LiquidAI/LFM2.5-1.2B-Base-GGUF).

Files

File Quantization Size
glm5.1-distill-BF16.gguf BF16 2.18 GB
glm5.1-distill-Q4_0.gguf Q4_0 664 MB
glm5.1-distill-Q4_K_M.gguf Q4_K_M 697 MB
glm5.1-distill-Q5_K_M.gguf Q5_K_M 804 MB
glm5.1-distill-Q6_K.gguf Q6_K 918 MB
glm5.1-distill-Q8_0.gguf Q8_0 1.16 GB

Quickstart with llama.cpp

Run any quant directly from the Hub:

llama-cli -hf yasserrmd/glm5.1-distill-GGUF:Q4_K_M     --jinja --ctx-size 32768     --temp 0.1 --top-k 50 --top-p 0.1 --repeat-penalty 1.05

Or download manually and serve via llama-server:

huggingface-cli download yasserrmd/glm5.1-distill-GGUF     --include "*Q4_K_M*" --local-dir ./glm5.1-distill-GGUF

llama-server     --model ./glm5.1-distill-GGUF/glm5.1-distill-Q4_K_M.gguf     --alias "yasserrmd/glm5.1-distill"     --threads -1 --n-gpu-layers 99     --ctx-size 32768 --port 8001     --temp 0.1 --top-k 50 --top-p 0.1 --repeat-penalty 1.05     --jinja

The recommended sampling parameters above are the official ones published by Liquid AI for the LFM2.5 family.

Quickstart with Ollama

ollama run hf.co/yasserrmd/glm5.1-distill-GGUF:Q4_K_M

Choosing a quant

Use case Recommended
Maximum quality, plenty of RAM Q8_0 or Q6_K
Balanced default Q4_K_M (matches Liquid AI's recommendation)
Smallest footprint, mobile / IoT Q4_0
Lossless reference BF16 (only if you need it for further re-quantization)

Note

: imatrix-based quantization is currently not supported for the LFM2 architecture in upstream llama.cpp (issue #14979). These files are plain k-quants, the same scheme used in Liquid AI's official GGUF releases.

Source model

For training data, hyperparameters, evaluation, and limitations see the source repo: yasserrmd/glm5.1-distill.