---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model: LiquidAI/LFM2.5-1.2B-Base
tags:
- lfm2
- liquid-ai
- distillation
- reasoning
- glm
- unsloth
- trl
- sft
- text-generation-inference
- conversational
datasets:
- Jackrong/GLM-5.1-Reasoning-1M-Cleaned
model-index:
- name: glm5.1-distill
  results: []
---

# glm5.1-distill

`yasserrmd/glm5.1-distill` is a 1.2B parameter instruction-tuned chat model built
on top of [`LiquidAI/LFM2.5-1.2B-Base`](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base).
It is supervised-fine-tuned (SFT) on a 50k subset of
[`Jackrong/GLM-5.1-Reasoning-1M-Cleaned`](https://huggingface.co/datasets/Jackrong/GLM-5.1-Reasoning-1M-Cleaned),
a cleaned reasoning-style chat corpus distilled from the GLM-5.1 family.

The goal is to bring some of the conversational reasoning behavior of larger
GLM-5.1 teacher models into the small, efficient LFM2.5 architecture so it can
run comfortably on a single consumer GPU, on edge devices, or via quantized
runtimes such as ONNX, GGUF, or MLX.

> **Note:** This is an independent community fine-tune. It is not affiliated
> with or endorsed by Liquid AI or Z.ai/THUDM (the GLM authors).

---

## Model summary

| Property | Value |
|---|---|
| Architecture | LFM2 (hybrid conv + attention) |
| Parameters | ~1.2B |
| Tensor dtype | BF16 |
| Context length | 4096 (trained at 2048 with packing) |
| Base model | `LiquidAI/LFM2.5-1.2B-Base` |
| Fine-tuning method | LoRA SFT (merged back to base) |
| Trainer | [Unsloth](https://github.com/unslothai/unsloth) + [TRL](https://github.com/huggingface/trl) `SFTTrainer` |
| Chat template | LFM2 / ChatML-style (`<\|im_start\|>` … `<\|im_end\|>`) |
| License | Apache 2.0 |

---

## Intended use

This model is designed for:

- General assistant-style chat
- Lightweight reasoning, step-by-step answers, and explanations
- On-device and edge deployments where a 1B class model is appropriate
- A starting checkpoint for further domain-specific fine-tuning

It is **not** a safety-aligned, production-ready assistant on its own. Treat its
output as that of a small distilled student model: it can be confidently wrong,
especially on long-horizon math, code correctness, current events, and anything
safety-critical.

### Out of scope

- Medical, legal, financial, or other high-stakes advice
- Any setting that requires guaranteed factuality
- Generating content that violates the Apache 2.0 license terms or the upstream
  LFM2.5 base model license

---

## Quickstart (Transformers)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "yasserrmd/glm5.1-distill"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain why the sky is blue in two short paragraphs."},
]

# Render the conversation with the tokenizer's chat template and move it to the model's device.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

# Stream tokens to stdout as they are generated, without echoing the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True)

_ = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # sampling must be enabled for temperature/top_k/top_p to take effect
    temperature=0.1,
    top_k=50,
    top_p=0.1,
    repetition_penalty=1.05,
    streamer=streamer,
)
```

### Recommended sampling

The base LFM2.5 family is sensitive to sampling settings. The following defaults
(inherited from Liquid AI's reference settings) work well:

| Use case | temperature | top_k | top_p | repetition_penalty |
|---|---|---|---|---|
| Factual / short answers | 0.1 | 50 | 0.1 | 1.05 |
| Creative / longer text | 0.7 | 50 | 0.9 | 1.10 |
| Code / structured output | 0.2 | 40 | 0.9 | 1.05 |

---

## Chat template

The tokenizer ships with a ChatML-style template. A two-turn example serializes to:

```
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
Hey there!<|im_end|>
```

Always use `tokenizer.apply_chat_template(..., add_generation_prompt=True)` at
inference time. Do not hand-roll the prompt.
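
For multi-turn use, the template call and the sampling table can be wrapped in a small helper. The following is a minimal sketch, not part of the released code: `SAMPLING_PRESETS`, `generation_kwargs`, and `chat_turn` are hypothetical names introduced here for illustration, and the preset keys (`"factual"`, `"creative"`, `"code"`) are shorthand for the three table rows. It also sets `do_sample=True`, which Transformers needs before it applies `temperature`, `top_k`, and `top_p`.

```python
from typing import Any, Dict, List

# Values transcribed from the "Recommended sampling" table.
# The preset names are illustrative shorthand, not an official API.
SAMPLING_PRESETS: Dict[str, Dict[str, Any]] = {
    "factual": {"temperature": 0.1, "top_k": 50, "top_p": 0.1, "repetition_penalty": 1.05},
    "creative": {"temperature": 0.7, "top_k": 50, "top_p": 0.9, "repetition_penalty": 1.10},
    "code": {"temperature": 0.2, "top_k": 40, "top_p": 0.9, "repetition_penalty": 1.05},
}


def generation_kwargs(use_case: str = "factual", max_new_tokens: int = 512) -> Dict[str, Any]:
    """Return keyword arguments for `model.generate` for one preset."""
    if use_case not in SAMPLING_PRESETS:
        raise ValueError(
            f"unknown use case {use_case!r}; expected one of {sorted(SAMPLING_PRESETS)}"
        )
    # do_sample=True enables sampling so the preset values are actually used.
    return {"do_sample": True, "max_new_tokens": max_new_tokens, **SAMPLING_PRESETS[use_case]}


def chat_turn(
    model,
    tokenizer,
    history: List[Dict[str, str]],
    user_text: str,
    use_case: str = "factual",
) -> str:
    """Append a user message, generate a reply, and record the reply in `history`."""
    history.append({"role": "user", "content": user_text})
    # Let the tokenizer's own chat template build the prompt.
    inputs = tokenizer.apply_chat_template(
        history,
        add_generation_prompt=True,
        return_tensors="pt",
        tokenize=True,
        return_dict=True,
    ).to(model.device)
    output_ids = model.generate(**inputs, **generation_kwargs(use_case))
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    reply = tokenizer.decode(new_tokens, skip_special_tokens=True)
    history.append({"role": "assistant", "content": reply})
    return reply
```

With `model` and `tokenizer` loaded as in the Quickstart, a conversation would start from `history = []` and call `chat_turn(model, tokenizer, history, "Hello!")` once per user message. Passing the whole `history` back through the template each turn keeps the prompt format consistent with what the model saw during training.
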

---

## Training details

### Data

- Source: `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`, `main` config
- Slice: first 50,000 rows of the `train` split
- Format: ShareGPT-style multi-turn conversations, normalized via
  `unsloth.chat_templates.standardize_data_formats`
- Loss masking: `train_on_responses_only` so only assistant tokens contribute
  to the loss

### LoRA configuration

| Hyperparameter | Value |
|---|---|
| Rank `r` | 16 |
| `lora_alpha` | 16 |
| `lora_dropout` | 0 |
| Bias | none |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `out_proj`, `in_proj`, `w1`, `w2`, `w3` |
| Gradient checkpointing | `unsloth` |
| Random seed | 3407 |

### SFT hyperparameters

| Hyperparameter | Value |
|---|---|
| Epochs | 1 |
| Per-device batch size | 32 |
| Gradient accumulation | 1 |
| Effective batch size | 32 |
| Packing | True |
| Max sequence length | 2048 |
| Optimizer | `adamw_torch` |
| Learning rate | 2e-5 |
| LR scheduler | linear |
| Warmup steps | 50 |
| Weight decay | 0.01 |
| Precision | BF16 |
| Seed | 3407 |

### Merge & export

After SFT, the LoRA adapters were merged into the base weights using Unsloth's
`push_to_hub_merged(..., save_method="merged_16bit")`. The repository contains
the resulting full BF16 model, not adapters.

### Hardware

Trained on a single GPU using Unsloth's optimized kernels. End-to-end training
memory and time are dominated by the 50k-row, packed-2048 setup described above.

---

## Evaluation

No formal benchmark scores are reported for this checkpoint yet. It has been
smoke-tested on:

- General Q&A (e.g. "Why is the sky blue?")
- Short creative writing prompts
- Multi-turn instruction following

Quantitative evaluations on benchmarks such as MMLU, GSM8K, IFEval, or MT-Bench
are left as future work. Contributions via the HF community tab are welcome.

---

## Limitations and biases

- Inherits all limitations and biases of the LFM2.5 base model and of the
  GLM-5.1-derived training data.
- 1.2B parameters is small.
  Expect weaker performance than 7B+ chat models on hard reasoning, long
  context, and code generation.
- The training corpus is predominantly English. Other languages will work to
  varying degrees but are not the target.
- The model can hallucinate facts confidently. Verify anything important.

---

## ONNX version

An ONNX export of this model is available at:

**`yasserrmd/glm5.1-distill-onnx`**

It can be used with `onnxruntime` and `optimum` for CPU and accelerated
inference. See that repository's README for usage details.

---

## Citation

If you use this checkpoint, please cite the upstream work as well:

```bibtex
@misc{yasserrmd_glm51_distill_2026,
  title        = {glm5.1-distill: a small LFM2.5 student fine-tuned on GLM-5.1 reasoning data},
  author       = {Mohamed Yasser},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/yasserrmd/glm5.1-distill}}
}
```

And the base model and dataset:

- LiquidAI, *LFM2.5-1.2B-Base*, 2025.
- Jackrong, *GLM-5.1-Reasoning-1M-Cleaned*, Hugging Face Datasets.

---

## Acknowledgements

- [Liquid AI](https://huggingface.co/LiquidAI) for the LFM2.5 base model.
- [Jackrong](https://huggingface.co/Jackrong) for the cleaned GLM-5.1 reasoning dataset.
- [Unsloth](https://github.com/unslothai/unsloth) for the 2x faster SFT pipeline
  and memory-efficient LoRA kernels.
- [Hugging Face TRL](https://github.com/huggingface/trl) for `SFTTrainer`.

[![Made with Unsloth](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png)](https://github.com/unslothai/unsloth)