---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model: LiquidAI/LFM2.5-1.2B-Base
tags:
- lfm2
- liquid-ai
- distillation
- reasoning
- glm
- unsloth
- trl
- sft
- text-generation-inference
- conversational
datasets:
- Jackrong/GLM-5.1-Reasoning-1M-Cleaned
model-index:
- name: glm5.1-distill
  results: []
---

# glm5.1-distill

`yasserrmd/glm5.1-distill` is a 1.2B-parameter instruction-tuned chat model
built on top of [`LiquidAI/LFM2.5-1.2B-Base`](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base).
It was trained with supervised fine-tuning (SFT) on a 50k-row subset of
[`Jackrong/GLM-5.1-Reasoning-1M-Cleaned`](https://huggingface.co/datasets/Jackrong/GLM-5.1-Reasoning-1M-Cleaned),
a cleaned reasoning-style chat corpus distilled from the GLM-5.1 family.

The goal is to bring some of the conversational reasoning behavior of larger
GLM-5.1 teacher models into the small, efficient LFM2.5 architecture so it
can run comfortably on a single consumer GPU, on edge devices, or via
quantized runtimes such as ONNX, GGUF, or MLX.

> **Note:** This is an independent community fine-tune. It is not affiliated
> with or endorsed by Liquid AI or Z.ai/THUDM (the GLM authors).

---

## Model summary

| Property | Value |
|---|---|
| Architecture | LFM2 (hybrid conv + attention) |
| Parameters | ~1.2B |
| Tensor dtype | BF16 |
| Context length | 4096 (trained at 2048 with packing) |
| Base model | `LiquidAI/LFM2.5-1.2B-Base` |
| Fine-tuning method | LoRA SFT (merged back to base) |
| Trainer | [Unsloth](https://github.com/unslothai/unsloth) + [TRL](https://github.com/huggingface/trl) `SFTTrainer` |
| Chat template | LFM2 / ChatML-style (`<|im_start|>` … `<|im_end|>`) |
| License | Apache 2.0 |
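
As a rough guide to the footprint implied by the table, weights-only memory can be estimated from the parameter count and the bytes stored per parameter. The sketch below assumes exactly 1.2e9 parameters (the card only says ~1.2B) and ignores activations, caches, and runtime overhead. The 8-bit and 4-bit entries are illustrative quantization targets, not artifacts shipped in this repository.

```python
# Back-of-the-envelope weight memory, assuming 1.2e9 parameters (approximate).
PARAMS = 1.2e9

# Bytes per parameter for a few storage formats. The int8 and 4bit entries are
# hypothetical targets and ignore per-block scale/zero-point overhead.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_gib(params: float, bytes_per_param: float) -> float:
    """Return the weights-only size in GiB (no activations or caches)."""
    return params * bytes_per_param / 1024**3


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_gib(PARAMS, nbytes):.2f} GiB")
```

Under these assumptions the BF16 weights come to a little over 2 GiB, which is consistent with the single-consumer-GPU goal stated above.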

---

## Intended use

This model is designed for:

- General assistant-style chat
- Lightweight reasoning, step-by-step answers, and explanations
- On-device and edge deployments where a 1B-class model is appropriate
- A starting checkpoint for further domain-specific fine-tuning

It is **not** a safety-aligned, production-ready assistant on its own. Treat
its output as that of a small distilled student model: it can be confidently
wrong, especially on long-horizon math, code correctness, current events,
and anything safety-critical.

### Out of scope

- Medical, legal, financial, or other high-stakes advice
- Any setting that requires guaranteed factuality
- Any use that violates the Apache 2.0 license terms or the upstream
  LFM2.5 base model license

---

## Quickstart (Transformers)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "yasserrmd/glm5.1-distill"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain why the sky is blue in two short paragraphs."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

streamer = TextStreamer(tokenizer, skip_prompt=True)

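_ = model.generate(
    **inputs,
    max_new_tokens=512,
    # Enable sampling explicitly; with greedy decoding, temperature, top_k,
    # and top_p have no effect.
    do_sample=True,
    temperature=0.1,
    top_k=50,
    top_p=0.1,
    repetition_penalty=1.05,
    streamer=streamer,
)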
```

### Recommended sampling

The base LFM2.5 family is sensitive to sampling settings. The following
defaults (inherited from Liquid AI's reference settings) work well:

| Use case | temperature | top_k | top_p | repetition_penalty |
|---|---|---|---|---|
| Factual / short answers | 0.1 | 50 | 0.1 | 1.05 |
| Creative / longer text | 0.7 | 50 | 0.9 | 1.10 |
| Code / structured output | 0.2 | 40 | 0.9 | 1.05 |
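
One way to keep these settings in one place is a small preset helper. This is a minimal sketch: the preset names (`factual`, `creative`, `code`) and the `generation_kwargs` function are hypothetical, while the numeric values are copied from the table above. It sets `do_sample=True` on the assumption that the checkpoint's generation config does not already enable sampling.

```python
# Presets copied from the table above; the dictionary keys are hypothetical names.
SAMPLING_PRESETS = {
    "factual": {"temperature": 0.1, "top_k": 50, "top_p": 0.1, "repetition_penalty": 1.05},
    "creative": {"temperature": 0.7, "top_k": 50, "top_p": 0.9, "repetition_penalty": 1.10},
    "code": {"temperature": 0.2, "top_k": 40, "top_p": 0.9, "repetition_penalty": 1.05},
}


def generation_kwargs(use_case: str, max_new_tokens: int = 512) -> dict:
    """Build keyword arguments for `model.generate` from a named preset."""
    if use_case not in SAMPLING_PRESETS:
        raise ValueError(f"unknown use case: {use_case!r}")
    # do_sample=True so that temperature/top_k/top_p are actually applied.
    return {"do_sample": True, "max_new_tokens": max_new_tokens, **SAMPLING_PRESETS[use_case]}
```

With the `model` and `inputs` objects from the Quickstart, a call would look like `model.generate(**inputs, **generation_kwargs("code"))`.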

---

## Chat template

The tokenizer ships with a ChatML-style template. A two-turn example
serializes to:

```
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
Hey there!<|im_end|>
```

Always use `tokenizer.apply_chat_template(..., add_generation_prompt=True)`
at inference time. Do not hand-roll the prompt.
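
To make the layout concrete, the sketch below reproduces the serialization shown above in plain Python. It is for inspection and debugging only, not a replacement for `apply_chat_template`: the `render_chatml` helper is hypothetical, and the real template may add tokens (such as a beginning-of-sequence marker) that this approximation omits.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Approximate the ChatML-style layout shown above, for inspection only."""
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hey there!"},
]
print(render_chatml(conversation))
```

Comparing this string with `tokenizer.apply_chat_template(conversation, tokenize=False)` is a quick way to see exactly what the shipped template adds.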

---

## Training details

### Data

- Source: `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`, `main` config
- Slice: first 50,000 rows of the `train` split
- Format: ShareGPT-style multi-turn conversations, normalized via
  `unsloth.chat_templates.standardize_data_formats`
- Loss masking: `train_on_responses_only` so only assistant tokens
  contribute to the loss
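
The loss-masking idea can be sketched without any libraries. This is a conceptual illustration, not Unsloth's implementation: `train_on_responses_only` locates assistant spans from the chat template markers, whereas the hypothetical `mask_non_assistant` helper below assumes a per-token boolean mask is already available. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_non_assistant(token_ids, is_assistant):
    """Return labels where only assistant positions keep their token id."""
    if len(token_ids) != len(is_assistant):
        raise ValueError("token_ids and is_assistant must have equal length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(token_ids, is_assistant)]


# Toy example: 3 prompt tokens followed by 2 assistant tokens (ids are made up).
labels = mask_non_assistant([11, 12, 13, 21, 22], [False, False, False, True, True])
print(labels)  # [-100, -100, -100, 21, 22]
```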

### LoRA configuration

| Hyperparameter | Value |
|---|---|
| Rank `r` | 16 |
| `lora_alpha` | 16 |
| `lora_dropout` | 0 |
| Bias | none |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `out_proj`, `in_proj`, `w1`, `w2`, `w3` |
| Gradient checkpointing | `unsloth` |
| Random seed | 3407 |
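
For intuition about what these values imply, the sketch below computes the LoRA scaling factor and the number of trainable parameters added to a single linear layer. It assumes standard (non-rank-stabilized) LoRA scaling of `lora_alpha / r`; the 2048 x 2048 layer shape is a hypothetical example, not a dimension reported in this card.

```python
def lora_scaling(lora_alpha: float, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return lora_alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


print(lora_scaling(16, 16))         # 1.0
print(lora_params(2048, 2048, 16))  # 65536, versus 4194304 in the full matrix
```

With `r` equal to `lora_alpha`, the low-rank update is applied at unit scale.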

### SFT hyperparameters

| Hyperparameter | Value |
|---|---|
| Epochs | 1 |
| Per-device batch size | 32 |
| Gradient accumulation | 1 |
| Effective batch size | 32 |
| Packing | True |
| Max sequence length | 2048 |
| Optimizer | `adamw_torch` |
| Learning rate | 2e-5 |
| LR scheduler | linear |
| Warmup steps | 50 |
| Weight decay | 0.01 |
| Precision | BF16 |
| Seed | 3407 |
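
Two quantities follow directly from this table: the upper bound on tokens per optimizer step, and the shape of the learning-rate curve. The sketch below assumes the usual linear-warmup-then-linear-decay form of a `linear` scheduler. `TOTAL_STEPS` is a hypothetical value, since the card does not report the actual number of optimizer steps after packing.

```python
def linear_schedule_lr(step: int, total_steps: int,
                       peak_lr: float = 2e-5, warmup_steps: int = 50) -> float:
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Upper bound on tokens per optimizer step with packing:
# per-device batch size x gradient accumulation x max sequence length.
tokens_per_step = 32 * 1 * 2048
print(tokens_per_step)  # 65536

# TOTAL_STEPS is hypothetical; the card does not report the real step count.
TOTAL_STEPS = 1000
for step in (0, 25, 50, 525, 1000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```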

### Merge & export

After SFT, the LoRA adapters were merged into the base weights using
Unsloth's `push_to_hub_merged(..., save_method="merged_16bit")`. The
repository contains the resulting full BF16 model, not adapters.
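
The arithmetic behind merging can be shown on a toy matrix. This is only an illustration of the standard LoRA merge, `W + (lora_alpha / r) * B @ A`, using nested lists. It is not how Unsloth performs the merge internally, which operates on tensors and handles dtype conversion; the `matmul` and `merge_lora` helpers and all values are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, lora_alpha=16, r=16):
    """Return W + (lora_alpha / r) * (B @ A), the merged weight matrix."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy 2x2 weight with a rank-1 adapter (all values are made up).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape (r=1, d_in=2)
lora_b = [[0.5], [0.0]]  # shape (d_out=2, r=1)
print(merge_lora(w, lora_a, lora_b, lora_alpha=1, r=1))  # [[1.5, 1.0], [0.0, 1.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the repository ships a single set of full weights.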

### Hardware

Trained on a single GPU using Unsloth's optimized kernels. End-to-end
training memory and time are dominated by the 50k-row, packed-2048 setup
described above.

---

## Evaluation

No formal benchmark scores are reported for this checkpoint yet. It has
been smoke-tested on:

- General Q&A (e.g. "Why is the sky blue?")
- Short creative writing prompts
- Multi-turn instruction following

Quantitative evaluations on benchmarks such as MMLU, GSM8K, IFEval, or
MT-Bench are left as future work. Contributions via the HF community tab
are welcome.

---

## Limitations and biases

- Inherits all limitations and biases of the LFM2.5 base model and of the
  GLM-5.1-derived training data.
- 1.2B parameters is small. Expect weaker performance than 7B+ chat
  models on hard reasoning, long context, and code generation.
- The training corpus is predominantly English. Other languages will work
  to varying degrees but are not the target.
- The model can hallucinate facts confidently. Verify anything important.

---

## ONNX version

An ONNX export of this model is available at:

**`yasserrmd/glm5.1-distill-onnx`**

It can be used with `onnxruntime` and `optimum` for CPU and accelerated
inference. See that repository's README for usage details.

---

## Citation

If you use this checkpoint, please cite it along with the upstream work:

```bibtex
@misc{yasserrmd_glm51_distill_2026,
  title  = {glm5.1-distill: a small LFM2.5 student fine-tuned on GLM-5.1 reasoning data},
  author = {Mohamed Yasser},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/yasserrmd/glm5.1-distill}}
}
```

Please also credit the base model and dataset:

- LiquidAI, *LFM2.5-1.2B-Base*, 2025.
- Jackrong, *GLM-5.1-Reasoning-1M-Cleaned*, Hugging Face Datasets.

---

## Acknowledgements

- [Liquid AI](https://huggingface.co/LiquidAI) for the LFM2.5 base model.
- [Jackrong](https://huggingface.co/Jackrong) for the cleaned GLM-5.1
  reasoning dataset.
- [Unsloth](https://github.com/unslothai/unsloth) for the 2x faster SFT
  pipeline and memory-efficient LoRA kernels.
- [Hugging Face TRL](https://github.com/huggingface/trl) for `SFTTrainer`.

[![Made with Unsloth](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png)](https://github.com/unslothai/unsloth)