Initialize project; model provided by the ModelHub XC community

Model: yasserrmd/glm5.1-distill Source: Original Platform
2026-05-31 01:31:28 +08:00
commit 7f9f483011
8 changed files with 328293 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,275 @@
+---
+license: apache-2.0
+language:
+- en
+library_name: transformers
+pipeline_tag: text-generation
+base_model: LiquidAI/LFM2.5-1.2B-Base
+tags:
+- lfm2
+- liquid-ai
+- distillation
+- reasoning
+- glm
+- unsloth
+- trl
+- sft
+- text-generation-inference
+- conversational
+datasets:
+- Jackrong/GLM-5.1-Reasoning-1M-Cleaned
+model-index:
+- name: glm5.1-distill
+  results: []
+---
+
+# glm5.1-distill
+
+`yasserrmd/glm5.1-distill` is a 1.2B parameter instruction-tuned chat model
+built on top of [`LiquidAI/LFM2.5-1.2B-Base`](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base).
+It is supervised-fine-tuned (SFT) on a 50k subset of
+[`Jackrong/GLM-5.1-Reasoning-1M-Cleaned`](https://huggingface.co/datasets/Jackrong/GLM-5.1-Reasoning-1M-Cleaned),
+a cleaned reasoning-style chat corpus distilled from the GLM-5.1 family.
+
+The goal is to bring some of the conversational reasoning behavior of larger
+GLM-5.1 teacher models into the small, efficient LFM2.5 architecture so it
+can run comfortably on a single consumer GPU, on edge devices, or via
+quantized runtimes such as ONNX, GGUF, or MLX.
+
+> **Note:** This is an independent community fine-tune. It is not affiliated
+> with or endorsed by Liquid AI or Z.ai/THUDM (the GLM authors).
+
+---
+
+## Model summary
+
+| Property | Value |
+|---|---|
+| Architecture | LFM2 (hybrid conv + attention) |
+| Parameters | ~1.2B |
+| Tensor dtype | BF16 |
+| Context length | 4096 (trained at 2048 with packing) |
+| Base model | `LiquidAI/LFM2.5-1.2B-Base` |
+| Fine-tuning method | LoRA SFT (merged back to base) |
+| Trainer | [Unsloth](https://github.com/unslothai/unsloth) + [TRL](https://github.com/huggingface/trl) `SFTTrainer` |
+| Chat template | LFM2 / ChatML-style (`<|im_start|>` … `<|im_end|>`) |
+| License | Apache 2.0 |
+
+---
+
+## Intended use
+
+This model is designed for:
+
+- General assistant-style chat
+- Lightweight reasoning, step-by-step answers, and explanations
+- On-device and edge deployments where a 1B class model is appropriate
+- A starting checkpoint for further domain-specific fine-tuning
+
+It is **not** a safety-aligned, production-ready assistant on its own. Treat
+its output as that of a small distilled student model: it can be confidently
+wrong, especially on long-horizon math, code correctness, current events,
+and anything safety-critical.
+
+### Out of scope
+
+- Medical, legal, financial, or other high-stakes advice
+- Any setting that requires guaranteed factuality
+- Any use that violates the Apache 2.0 license terms or the license of
+  the upstream LFM2.5 base model
+
+---
+
+## Quickstart (Transformers)
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+
+model_id = "yasserrmd/glm5.1-distill"
+
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+
+messages = [
+    {"role": "user", "content": "Explain why the sky is blue in two short paragraphs."},
+]
+
+inputs = tokenizer.apply_chat_template(
+    messages,
+    add_generation_prompt=True,
+    return_tensors="pt",
+    tokenize=True,
+    return_dict=True,
+).to(model.device)
+
+streamer = TextStreamer(tokenizer, skip_prompt=True)
+
+_ = model.generate(
+    **inputs,
+    max_new_tokens=512,
+    do_sample=True,  # enable sampling so the settings below take effect
+    temperature=0.1,
+    top_k=50,
+    top_p=0.1,
+    repetition_penalty=1.05,
+    streamer=streamer,
+)
+```
+
+### Recommended sampling
+
+The base LFM2.5 family is sensitive to sampling settings. The following
+defaults (inherited from Liquid AI's reference settings) work well:
+
+| Use case | temperature | top_k | top_p | repetition_penalty |
+|---|---|---|---|---|
+| Factual / short answers | 0.1 | 50 | 0.1 | 1.05 |
+| Creative / longer text | 0.7 | 50 | 0.9 | 1.10 |
+| Code / structured output | 0.2 | 40 | 0.9 | 1.05 |
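As a minimal sketch, the table above can be expressed as keyword arguments for `model.generate`. The names `SAMPLING_PRESETS` and `sampling_kwargs` are hypothetical helpers, not part of the model repository, and `do_sample=True` is added on the assumption that sampling (rather than greedy decoding) is intended, since Transformers otherwise does not apply `temperature`, `top_k`, or `top_p`.

```python
# Values copied from the "Recommended sampling" table; the keys are made-up labels.
SAMPLING_PRESETS = {
    "factual": {"temperature": 0.1, "top_k": 50, "top_p": 0.1, "repetition_penalty": 1.05},
    "creative": {"temperature": 0.7, "top_k": 50, "top_p": 0.9, "repetition_penalty": 1.10},
    "code": {"temperature": 0.2, "top_k": 40, "top_p": 0.9, "repetition_penalty": 1.05},
}


def sampling_kwargs(use_case: str, max_new_tokens: int = 512) -> dict:
    """Return generate() keyword arguments for one of the table's use cases."""
    if use_case not in SAMPLING_PRESETS:
        raise ValueError(f"unknown use case: {use_case!r}")
    # do_sample=True is an assumption: it switches generate() from greedy
    # decoding to sampling so the preset values are actually used.
    return {"do_sample": True, "max_new_tokens": max_new_tokens, **SAMPLING_PRESETS[use_case]}
```

With the Quickstart objects in scope, `model.generate(**inputs, streamer=streamer, **sampling_kwargs("creative"))` would apply the creative preset.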
+
+---
+
+## Chat template
+
+The tokenizer ships with a ChatML-style template. A two-turn example
+serializes to:
+
+```
+<|im_start|>user
+Hello!<|im_end|>
+<|im_start|>assistant
+Hey there!<|im_end|>
+```
+
+Always use `tokenizer.apply_chat_template(..., add_generation_prompt=True)`
+at inference time. Do not hand-roll the prompt.
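To make the layout concrete, here is a small sketch that reproduces the serialization shown above from a list of messages. It is for inspection only and is not the tokenizer's actual template: `render_chatml` is a hypothetical helper, and the real template may add tokens (for example a beginning-of-sequence marker) that this sketch omits, which is one reason to rely on `apply_chat_template` for real prompts.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Illustrative ChatML-style rendering; not the tokenizer's real template."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hey there!"},
]
print(render_chatml(example))
```

Comparing this output with `tokenizer.apply_chat_template(example, tokenize=False)` is a quick way to see what the shipped template adds or changes.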
+
+---
+
+## Training details
+
+### Data
+
+- Source: `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`, `main` config
+- Slice: first 50,000 rows of the `train` split
+- Format: ShareGPT-style multi-turn conversations, normalized via
+  `unsloth.chat_templates.standardize_data_formats`
+- Loss masking: `train_on_responses_only` so only assistant tokens
+  contribute to the loss
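The effect of response-only loss masking can be sketched without any library: positions outside assistant turns get the label `-100`, the index that PyTorch's cross-entropy loss ignores by default. This illustrates the idea only and is not Unsloth's implementation; `mask_non_assistant` and the toy token IDs are made up.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_assistant(input_ids, is_assistant):
    """Keep labels for assistant tokens; mask every other position."""
    if len(input_ids) != len(is_assistant):
        raise ValueError("input_ids and is_assistant must have the same length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, is_assistant)]


# Toy sequence: three prompt tokens followed by two assistant tokens.
labels = mask_non_assistant([11, 12, 13, 21, 22], [False, False, False, True, True])
print(labels)  # [-100, -100, -100, 21, 22]
```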
+
+### LoRA configuration
+
+| Hyperparameter | Value |
+|---|---|
+| Rank `r` | 16 |
+| `lora_alpha` | 16 |
+| `lora_dropout` | 0 |
+| Bias | none |
+| Target modules | `q_proj`, `k_proj`, `v_proj`, `out_proj`, `in_proj`, `w1`, `w2`, `w3` |
+| Gradient checkpointing | `unsloth` |
+| Random seed | 3407 |
+
+### SFT hyperparameters
+
+| Hyperparameter | Value |
+|---|---|
+| Epochs | 1 |
+| Per-device batch size | 32 |
+| Gradient accumulation | 1 |
+| Effective batch size | 32 |
+| Packing | True |
+| Max sequence length | 2048 |
+| Optimizer | `adamw_torch` |
+| Learning rate | 2e-5 |
+| LR scheduler | linear |
+| Warmup steps | 50 |
+| Weight decay | 0.01 |
+| Precision | BF16 |
+| Seed | 3407 |
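A quick arithmetic sketch of what these settings imply. The effective batch size and the per-step token ceiling follow directly from the table; the learning-rate function mirrors the usual linear-warmup-then-linear-decay shape, and `total_steps` is a placeholder because the README does not report the number of optimizer steps.

```python
PER_DEVICE_BATCH = 32
GRAD_ACCUM = 1
MAX_SEQ_LEN = 2048
PEAK_LR = 2e-5
WARMUP_STEPS = 50

# Single GPU, so the effective batch is per-device batch times accumulation.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
# Upper bound on tokens per optimizer step when every packed row is full.
max_tokens_per_step = effective_batch * MAX_SEQ_LEN


def linear_lr(step, total_steps, peak=PEAK_LR, warmup=WARMUP_STEPS):
    """Linear warmup to `peak`, then linear decay to zero at `total_steps`."""
    if step < warmup:
        return peak * step / warmup
    remaining = max(0, total_steps - step)
    return peak * remaining / max(1, total_steps - warmup)


print(effective_batch, max_tokens_per_step)
```

For example, with a hypothetical `total_steps=1000`, `linear_lr(25, 1000)` is halfway through warmup and returns half the peak rate.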
+
+### Merge & export
+
+After SFT, the LoRA adapters were merged into the base weights using
+Unsloth's `push_to_hub_merged(..., save_method="merged_16bit")`. The
+repository contains the resulting full BF16 model, not adapters.
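Conceptually, merging folds each low-rank update into its base weight: `W_merged = W + (lora_alpha / r) * B @ A`. With `r = 16` and `lora_alpha = 16` as listed in the LoRA configuration, the scale factor is 1.0. The sketch below shows this on tiny plain-Python matrices; `merge_lora` is a hypothetical helper with toy values, and it is not how Unsloth performs the merge internally.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, a, b, lora_alpha):
    """Return w + (lora_alpha / r) * (b @ a), where r is the adapter rank."""
    r = len(a)  # A has shape (r x in_features)
    scale = lora_alpha / r
    delta = matmul(b, a)  # B has shape (out_features x r)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy rank-1 adapter with lora_alpha == r, so the scale is 1.0 as in the 16/16 setup.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]
a = [[0.5, 0.25]]
merged = merge_lora(w, a, b, lora_alpha=1)
print(merged)  # [[1.5, 0.25], [1.0, 1.5]]
```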
+
+### Hardware
+
+Trained on a single GPU using Unsloth's optimized kernels. Memory use and
+training time are driven mainly by the 50k-row slice and the packed
+2048-token sequences described above.
+
+---
+
+## Evaluation
+
+No formal benchmark scores are reported for this checkpoint yet. It has
+been smoke-tested on:
+
+- General Q&A (e.g. "Why is the sky blue?")
+- Short creative writing prompts
+- Multi-turn instruction following
+
+Quantitative evaluations on benchmarks such as MMLU, GSM8K, IFEval, or
+MT-Bench are left as future work. Contributions via the HF community tab
+are welcome.
+
+---
+
+## Limitations and biases
+
+- Inherits all limitations and biases of the LFM2.5 base model and of the
+  GLM-5.1-derived training data.
+- 1.2B parameters is small. Expect weaker performance than 7B+ chat
+  models on hard reasoning, long context, and code generation.
+- The training corpus is predominantly English. Other languages will work
+  to varying degrees but are not the target.
+- The model can hallucinate facts confidently. Verify anything important.
+
+---
+
+## ONNX version
+
+An ONNX export of this model is available at:
+
+**`yasserrmd/glm5.1-distill-onnx`**
+
+It can be used with `onnxruntime` and `optimum` for CPU and accelerated
+inference. See that repository's README for usage details.
+
+---
+
+## Citation
+
+If you use this checkpoint, please cite the upstream work as well:
+
+```bibtex
+@misc{yasserrmd_glm51_distill_2026,
+  title  = {glm5.1-distill: a small LFM2.5 student fine-tuned on GLM-5.1 reasoning data},
+  author = {Mohamed Yasser},
+  year   = {2026},
+  howpublished = {\url{https://huggingface.co/yasserrmd/glm5.1-distill}}
+}
+```
+
+And the base model and dataset:
+
+- LiquidAI, *LFM2.5-1.2B-Base*, 2025.
+- Jackrong, *GLM-5.1-Reasoning-1M-Cleaned*, Hugging Face Datasets.
+
+---
+
+## Acknowledgements
+
+- [Liquid AI](https://huggingface.co/LiquidAI) for the LFM2.5 base model.
+- [Jackrong](https://huggingface.co/Jackrong) for the cleaned GLM-5.1
+  reasoning dataset.
+- [Unsloth](https://github.com/unslothai/unsloth) for the 2x faster SFT
+  pipeline and memory-efficient LoRA kernels.
+- [Hugging Face TRL](https://github.com/huggingface/trl) for `SFTTrainer`.
+
+[![Made with Unsloth](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png)](https://github.com/unslothai/unsloth)