Go to file

ModelHub XC e4aa527960 初始化项目，由ModelHub XC社区提供模型

Model: yasserrmd/glm5.1-distill-GGUF
Source: Original Platform

2026-06-19 01:17:18 +08:00

.gitattributes

初始化项目，由ModelHub XC社区提供模型

2026-06-19 01:17:18 +08:00

glm5.1-distill-BF16.gguf

初始化项目，由ModelHub XC社区提供模型

2026-06-19 01:17:18 +08:00

glm5.1-distill-Q4_0.gguf

初始化项目，由ModelHub XC社区提供模型

2026-06-19 01:17:18 +08:00

glm5.1-distill-Q4_K_M.gguf

初始化项目，由ModelHub XC社区提供模型

2026-06-19 01:17:18 +08:00

glm5.1-distill-Q5_K_M.gguf

初始化项目，由ModelHub XC社区提供模型

2026-06-19 01:17:18 +08:00

glm5.1-distill-Q6_K.gguf

初始化项目，由ModelHub XC社区提供模型

2026-06-19 01:17:18 +08:00

glm5.1-distill-Q8_0.gguf

初始化项目，由ModelHub XC社区提供模型

2026-06-19 01:17:18 +08:00

README.md

初始化项目，由ModelHub XC社区提供模型

2026-06-19 01:17:18 +08:00

README.md

license, language, library_name, pipeline_tag, base_model, tags

license

language

library_name

pipeline_tag

base_model

glm5.1-distill-GGUF

GGUF quantizations of yasserrmd/glm5.1-distill, produced with convert_hf_to_gguf.py and llama-quantize from ggml-org/llama.cpp.

The quant ladder mirrors Liquid AI's own LFM2.5-1.2B GGUF releases (e.g. LiquidAI/LFM2.5-1.2B-Base-GGUF).

Files

File	Quantization	Size
`glm5.1-distill-BF16.gguf`	BF16	2.18 GB
`glm5.1-distill-Q4_0.gguf`	Q4_0	664 MB
`glm5.1-distill-Q4_K_M.gguf`	Q4_K_M	697 MB
`glm5.1-distill-Q5_K_M.gguf`	Q5_K_M	804 MB
`glm5.1-distill-Q6_K.gguf`	Q6_K	918 MB
`glm5.1-distill-Q8_0.gguf`	Q8_0	1.16 GB

Quickstart with `llama.cpp`

Run any quant directly from the Hub:

llama-cli -hf yasserrmd/glm5.1-distill-GGUF:Q4_K_M     --jinja --ctx-size 32768     --temp 0.1 --top-k 50 --top-p 0.1 --repeat-penalty 1.05

Or download manually and serve via llama-server:

huggingface-cli download yasserrmd/glm5.1-distill-GGUF     --include "*Q4_K_M*" --local-dir ./glm5.1-distill-GGUF

llama-server     --model ./glm5.1-distill-GGUF/glm5.1-distill-Q4_K_M.gguf     --alias "yasserrmd/glm5.1-distill"     --threads -1 --n-gpu-layers 99     --ctx-size 32768 --port 8001     --temp 0.1 --top-k 50 --top-p 0.1 --repeat-penalty 1.05     --jinja

The recommended sampling parameters above are the official ones published by Liquid AI for the LFM2.5 family.

Quickstart with Ollama

ollama run hf.co/yasserrmd/glm5.1-distill-GGUF:Q4_K_M

Choosing a quant

Use case	Recommended
Maximum quality, plenty of RAM	`Q8_0` or `Q6_K`
Balanced default	`Q4_K_M` (matches Liquid AI's recommendation)
Smallest footprint, mobile / IoT	`Q4_0`
Lossless reference	`BF16` (only if you need it for further re-quantization)

Note

: imatrix-based quantization is currently not supported for the LFM2 architecture in upstream llama.cpp (issue #14979). These files are plain k-quants, the same scheme used in Liquid AI's official GGUF releases.

Source model

For training data, hyperparameters, evaluation, and limitations see the source repo: yasserrmd/glm5.1-distill.

README.md

glm5.1-distill-GGUF

Files

Quickstart with llama.cpp

Quickstart with Ollama

Choosing a quant

Source model

Quickstart with `llama.cpp`