Initialize project; model provided by the ModelHub XC community
Model: yasserrmd/glm5.1-distill Source: Original Platform
35
.gitattributes
vendored
Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
275
README.md
Normal file
@@ -0,0 +1,275 @@
---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model: LiquidAI/LFM2.5-1.2B-Base
tags:
- lfm2
- liquid-ai
- distillation
- reasoning
- glm
- unsloth
- trl
- sft
- text-generation-inference
- conversational
datasets:
- Jackrong/GLM-5.1-Reasoning-1M-Cleaned
model-index:
- name: glm5.1-distill
  results: []
---

# glm5.1-distill

`yasserrmd/glm5.1-distill` is a 1.2B parameter instruction-tuned chat model
built on top of [`LiquidAI/LFM2.5-1.2B-Base`](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base).
It is supervised-fine-tuned (SFT) on a 50k subset of
[`Jackrong/GLM-5.1-Reasoning-1M-Cleaned`](https://huggingface.co/datasets/Jackrong/GLM-5.1-Reasoning-1M-Cleaned),
a cleaned reasoning-style chat corpus distilled from the GLM-5.1 family.

The goal is to bring some of the conversational reasoning behavior of larger
GLM-5.1 teacher models into the small, efficient LFM2.5 architecture so it
can run comfortably on a single consumer GPU, on edge devices, or via
quantized runtimes such as ONNX, GGUF, or MLX.

> **Note:** This is an independent community fine-tune. It is not affiliated
> with or endorsed by Liquid AI or Z.ai/THUDM (the GLM authors).

---

## Model summary

| Property | Value |
|---|---|
| Architecture | LFM2 (hybrid conv + attention) |
| Parameters | ~1.2B |
| Tensor dtype | BF16 |
| Context length | 4096 (trained at 2048 with packing) |
| Base model | `LiquidAI/LFM2.5-1.2B-Base` |
| Fine-tuning method | LoRA SFT (merged back to base) |
| Trainer | [Unsloth](https://github.com/unslothai/unsloth) + [TRL](https://github.com/huggingface/trl) `SFTTrainer` |
| Chat template | LFM2 / ChatML-style (`<|im_start|>` … `<|im_end|>`) |
| License | Apache 2.0 |
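
As a rough consistency check on the table above, the sketch below counts the layer types listed in this repository's `config.json` and estimates the parameter count from the `model.safetensors` size recorded in its Git LFS pointer. The two-bytes-per-parameter figure is an assumption (every tensor stored in BF16, safetensors header ignored), so treat the result as approximate.

```python
from collections import Counter

# Values copied from this repository's config.json.
layer_types = [
    "conv", "conv", "full_attention",
    "conv", "conv", "full_attention",
    "conv", "conv", "full_attention",
    "conv", "full_attention",
    "conv", "full_attention",
    "conv", "full_attention",
    "conv",
]
num_hidden_layers = 16
num_attention_heads = 32
num_key_value_heads = 8

counts = Counter(layer_types)
# Grouped-query attention: how many query heads share one key/value head.
queries_per_kv_head = num_attention_heads // num_key_value_heads

# Size in bytes from the model.safetensors LFS pointer.
safetensors_bytes = 2_340_697_936
BYTES_PER_BF16 = 2  # assumption: all tensors are BF16, header ignored
approx_params = safetensors_bytes / BYTES_PER_BF16

print(dict(counts))
print(queries_per_kv_head)
print(f"{approx_params / 1e9:.2f}B parameters (approx.)")
```
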

---

## Intended use

This model is designed for:

- General assistant-style chat
- Lightweight reasoning, step-by-step answers, and explanations
- On-device and edge deployments where a 1B-class model is appropriate
- A starting checkpoint for further domain-specific fine-tuning

It is **not** a safety-aligned, production-ready assistant on its own. Treat
its output as that of a small distilled student model: it can be confidently
wrong, especially on long-horizon math, code correctness, current events,
and anything safety-critical.

### Out of scope

- Medical, legal, financial, or other high-stakes advice
- Any setting that requires guaranteed factuality
- Generating content that violates the Apache 2.0 license terms or the
  upstream LFM2.5 base model license

---

## Quickstart (Transformers)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "yasserrmd/glm5.1-distill"

# Load the tokenizer and the merged BF16 weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain why the sky is blue in two short paragraphs."},
]

# Render the chat template and tokenize in one step.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

# Stream generated tokens to stdout as they are produced.
streamer = TextStreamer(tokenizer, skip_prompt=True)

_ = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # required for temperature/top_k/top_p to take effect
    temperature=0.1,
    top_k=50,
    top_p=0.1,
    repetition_penalty=1.05,
    streamer=streamer,
)
```

### Recommended sampling

The base LFM2.5 family is sensitive to sampling settings. The following
defaults (inherited from Liquid AI's reference settings) work well:

| Use case | temperature | top_k | top_p | repetition_penalty |
|---|---|---|---|---|
| Factual / short answers | 0.1 | 50 | 0.1 | 1.05 |
| Creative / longer text | 0.7 | 50 | 0.9 | 1.10 |
| Code / structured output | 0.2 | 40 | 0.9 | 1.05 |
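
One way to keep these presets in code is a small lookup table whose entries can be unpacked into `model.generate`. The preset names and the `generation_kwargs` helper are hypothetical and not part of any library. `do_sample=True` is included because Transformers only applies `temperature`, `top_k`, and `top_p` when sampling is enabled.

```python
# Sampling presets copied from the table above; the key names are illustrative.
SAMPLING_PRESETS = {
    "factual": {"temperature": 0.1, "top_k": 50, "top_p": 0.1, "repetition_penalty": 1.05},
    "creative": {"temperature": 0.7, "top_k": 50, "top_p": 0.9, "repetition_penalty": 1.10},
    "code": {"temperature": 0.2, "top_k": 40, "top_p": 0.9, "repetition_penalty": 1.05},
}


def generation_kwargs(use_case: str, max_new_tokens: int = 512) -> dict:
    """Return keyword arguments for `model.generate` for a named preset."""
    if use_case not in SAMPLING_PRESETS:
        raise ValueError(f"unknown use case: {use_case!r}")
    # Copy so callers can tweak the result without mutating the presets.
    kwargs = dict(SAMPLING_PRESETS[use_case])
    kwargs.update(do_sample=True, max_new_tokens=max_new_tokens)
    return kwargs


# With the Quickstart objects: model.generate(**inputs, **generation_kwargs("code"))
print(generation_kwargs("code"))
```
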

---

## Chat template

The tokenizer ships with a ChatML-style template. A two-turn example
serializes to:

```
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
Hey there!<|im_end|>
```

Always use `tokenizer.apply_chat_template(..., add_generation_prompt=True)`
at inference time. Do not hand-roll the prompt.
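
To sanity-check what the template produces, the pure-Python sketch below mirrors the basic message path of the repository's `chat_template.jinja` (plain-text content, no tools or tool responses). `render_chatml` is a hypothetical helper for inspecting prompts or comparing against `tokenizer.apply_chat_template(..., tokenize=False)`; it is not a substitute for the tokenizer's template at inference time. The `<|startoftext|>` BOS string is taken from `special_tokens_map.json`.

```python
BOS_TOKEN = "<|startoftext|>"  # bos_token from special_tokens_map.json


def render_chatml(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Serialize plain-text chat messages the way the basic template path does."""
    parts = [BOS_TOKEN]
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml(
    [
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hey there!"},
        {"role": "user", "content": "What can you do?"},
    ],
    add_generation_prompt=True,
)
print(prompt)
```
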

---

## Training details

### Data

- Source: `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`, `main` config
- Slice: first 50,000 rows of the `train` split
- Format: ShareGPT-style multi-turn conversations, normalized via
  `unsloth.chat_templates.standardize_data_formats`
- Loss masking: `train_on_responses_only` so only assistant tokens
  contribute to the loss
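
The idea behind response-only loss masking can be sketched without any library: label positions outside assistant turns are set to `-100`, the index that PyTorch's cross-entropy loss ignores by default. The helper below is a hypothetical illustration that works on role-tagged token spans; Unsloth's `train_on_responses_only` operates on tokenized chat-template text and its internals may differ.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_assistant(spans: list[tuple[str, list[int]]]) -> tuple[list[int], list[int]]:
    """Build (input_ids, labels) where only assistant tokens keep their labels.

    `spans` is a list of (role, token_ids) pairs in conversation order.
    """
    input_ids: list[int] = []
    labels: list[int] = []
    for role, token_ids in spans:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)  # these positions contribute to the loss
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # masked out
    return input_ids, labels


# Toy token ids, chosen only for illustration.
ids, labels = mask_non_assistant([
    ("user", [11, 12, 13]),
    ("assistant", [21, 22]),
    ("user", [14]),
    ("assistant", [23, 24, 25]),
])
print(ids)
print(labels)
```
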
### LoRA configuration

| Hyperparameter | Value |
|---|---|
| Rank `r` | 16 |
| `lora_alpha` | 16 |
| `lora_dropout` | 0 |
| Bias | none |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `out_proj`, `in_proj`, `w1`, `w2`, `w3` |
| Gradient checkpointing | `unsloth` |
| Random seed | 3407 |
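
For intuition about what these settings imply, the sketch below computes the LoRA scaling factor `lora_alpha / r` (assuming standard, non rank-stabilized scaling) and the number of trainable parameters a rank-`r` adapter adds to one linear layer, `r * (d_in + d_out)`. The 2048 to 2048 shape used for `q_proj` is an assumption derived from `hidden_size` and `num_attention_heads` in `config.json` (32 heads of width 64); other target modules have different shapes.

```python
def lora_scaling(r: int, lora_alpha: int) -> float:
    """Standard LoRA scaling applied to the low-rank update B @ A."""
    return lora_alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted linear layer: A is (r, d_in), B is (d_out, r)."""
    return r * d_in + d_out * r


r, lora_alpha = 16, 16  # from the table above
scaling = lora_scaling(r, lora_alpha)
# Assumed q_proj shape: hidden_size 2048 in, 32 heads * 64 dims out.
q_proj_params = lora_param_count(d_in=2048, d_out=2048, r=r)

print(scaling)
print(q_proj_params)
```

With `r` equal to `lora_alpha`, the update is applied at unit scale, so the learning rate alone controls how strongly the adapters move the weights.
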
### SFT hyperparameters

| Hyperparameter | Value |
|---|---|
| Epochs | 1 |
| Per-device batch size | 32 |
| Gradient accumulation | 1 |
| Effective batch size | 32 |
| Packing | True |
| Max sequence length | 2048 |
| Optimizer | `adamw_torch` |
| Learning rate | 2e-5 |
| LR scheduler | linear |
| Warmup steps | 50 |
| Weight decay | 0.01 |
| Precision | BF16 |
| Seed | 3407 |
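
Two quantities follow directly from the table: the effective batch size and the upper bound on tokens per optimizer step when sequences are packed to the maximum length. The sketch also shows linear warmup followed by linear decay, which is how a `linear` scheduler with warmup is commonly defined. `TOTAL_STEPS` is a placeholder because the number of packed sequences is not reported here, and a single GPU is assumed.

```python
def linear_schedule_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate with linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


per_device_batch, grad_accum, num_gpus = 32, 1, 1  # num_gpus=1 is assumed
max_seq_len = 2048

effective_batch = per_device_batch * grad_accum * num_gpus
tokens_per_step = effective_batch * max_seq_len  # upper bound with full packing

TOTAL_STEPS = 1000  # placeholder; the real step count is not reported
print(effective_batch, tokens_per_step)
print(linear_schedule_lr(25, 2e-5, 50, TOTAL_STEPS))  # halfway through warmup
print(linear_schedule_lr(50, 2e-5, 50, TOTAL_STEPS))  # peak learning rate
```
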
### Merge & export

After SFT, the LoRA adapters were merged into the base weights using
Unsloth's `push_to_hub_merged(..., save_method="merged_16bit")`. The
repository contains the resulting full BF16 model, not adapters.
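
Conceptually, merging folds each adapter into its base weight as `W' = W + (lora_alpha / r) * B @ A`, after which the adapter matrices are no longer needed. The toy example below illustrates this with plain Python lists. It is a sketch of the arithmetic under standard LoRA scaling, not Unsloth's implementation, and the matrix values are made up.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two small dense matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, r: int, lora_alpha: float) -> Matrix:
    """Return W + (lora_alpha / r) * (B @ A) for one linear layer."""
    scale = lora_alpha / r
    delta = matmul(b, a)  # (d_out, r) @ (r, d_in) -> (d_out, d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 layer with a rank-1 adapter (values are illustrative only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]   # shape (r=1, d_in=2)
B = [[2.0], [4.0]]  # shape (d_out=2, r=1)
merged = merge_lora(W, A, B, r=1, lora_alpha=1)
print(merged)
```
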
### Hardware

Trained on a single GPU using Unsloth's optimized kernels. End-to-end
training memory and time are dominated by the 50k-row, packed-2048 setup
described above.

---

## Evaluation

No formal benchmark scores are reported for this checkpoint yet. It has
been smoke-tested on:

- General Q&A (e.g. "Why is the sky blue?")
- Short creative writing prompts
- Multi-turn instruction following

Quantitative evaluations on benchmarks such as MMLU, GSM8K, IFEval, or
MT-Bench are left as future work. Contributions via the HF community tab
are welcome.

---

## Limitations and biases

- Inherits all limitations and biases of the LFM2.5 base model and of the
  GLM-5.1-derived training data.
- 1.2B parameters is small. Expect weaker performance than 7B+ chat
  models on hard reasoning, long context, and code generation.
- The training corpus is predominantly English. Other languages will work
  to varying degrees but are not the target.
- The model can hallucinate facts confidently. Verify anything important.

---

## ONNX version

An ONNX export of this model is available at:

**`yasserrmd/glm5.1-distill-onnx`**

It can be used with `onnxruntime` and `optimum` for CPU and accelerated
inference. See that repository's README for usage details.

---

## Citation

If you use this checkpoint, please cite the upstream work as well:

```bibtex
@misc{yasserrmd_glm51_distill_2026,
  title = {glm5.1-distill: a small LFM2.5 student fine-tuned on GLM-5.1 reasoning data},
  author = {Mohamed Yasser},
  year = {2026},
  howpublished = {\url{https://huggingface.co/yasserrmd/glm5.1-distill}}
}
```

And the base model and dataset:

- LiquidAI, *LFM2.5-1.2B-Base*, 2025.
- Jackrong, *GLM-5.1-Reasoning-1M-Cleaned*, Hugging Face Datasets.

---

## Acknowledgements

- [Liquid AI](https://huggingface.co/LiquidAI) for the LFM2.5 base model.
- [Jackrong](https://huggingface.co/Jackrong) for the cleaned GLM-5.1
  reasoning dataset.
- [Unsloth](https://github.com/unslothai/unsloth) for the 2x faster SFT
  pipeline and memory-efficient LoRA kernels.
- [Hugging Face TRL](https://github.com/huggingface/trl) for `SFTTrainer`.
7
chat_template.jinja
Normal file
@@ -0,0 +1,7 @@
{{- bos_token -}}{%- set system_prompt = "" -%}{%- set ns = namespace(system_prompt="") -%}{%- if messages[0]["role"] == "system" -%} {%- set ns.system_prompt = messages[0]["content"] -%} {%- set messages = messages[1:] -%}{%- endif -%}{%- if tools -%} {%- set ns.system_prompt = ns.system_prompt + ("
" if ns.system_prompt else "") + "List of tools: <|tool_list_start|>[" -%} {%- for tool in tools -%} {%- if tool is not string -%} {%- set tool = tool | tojson -%} {%- endif -%} {%- set ns.system_prompt = ns.system_prompt + tool -%} {%- if not loop.last -%} {%- set ns.system_prompt = ns.system_prompt + ", " -%} {%- endif -%} {%- endfor -%} {%- set ns.system_prompt = ns.system_prompt + "]<|tool_list_end|>" -%}{%- endif -%}{%- if ns.system_prompt -%} {{- "<|im_start|>system
" + ns.system_prompt + "<|im_end|>
" -}}{%- endif -%}{%- for message in messages -%} {{- "<|im_start|>" + message["role"] + "
" -}} {%- set content = message["content"] -%} {%- if content is not string -%} {%- set content = content | tojson -%} {%- endif -%} {%- if message["role"] == "tool" -%} {%- set content = "<|tool_response_start|>" + content + "<|tool_response_end|>" -%} {%- endif -%} {{- content + "<|im_end|>
" -}}{%- endfor -%}{%- if add_generation_prompt -%} {{- "<|im_start|>assistant
" -}}{%- endif -%}
58
config.json
Normal file
@@ -0,0 +1,58 @@
{
  "architectures": [
    "Lfm2ForCausalLM"
  ],
  "block_auto_adjust_ff_dim": true,
  "block_dim": 2048,
  "block_ff_dim": 12288,
  "block_ffn_dim_multiplier": 1.0,
  "block_mlp_init_scale": 1.0,
  "block_multiple_of": 256,
  "block_norm_eps": 1e-05,
  "block_out_init_scale": 1.0,
  "block_use_swiglu": true,
  "block_use_xavier_init": true,
  "bos_token_id": 1,
  "conv_L_cache": 3,
  "conv_bias": false,
  "conv_dim": 2048,
  "conv_use_xavier_init": true,
  "torch_dtype": "bfloat16",
  "eos_token_id": 7,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "layer_types": [
    "conv",
    "conv",
    "full_attention",
    "conv",
    "conv",
    "full_attention",
    "conv",
    "conv",
    "full_attention",
    "conv",
    "full_attention",
    "conv",
    "full_attention",
    "conv",
    "full_attention",
    "conv"
  ],
  "max_position_embeddings": 128000,
  "model_name": "LiquidAI/LFM2.5-1.2B-Base",
  "model_type": "lfm2",
  "norm_eps": 1e-05,
  "num_attention_heads": 32,
  "num_heads": 32,
  "num_hidden_layers": 16,
  "num_key_value_heads": 8,
  "pad_token_id": 0,
  "rope_theta": 1000000.0,
  "tie_embedding": true,
  "unsloth_version": "2026.4.8",
  "use_cache": true,
  "use_pos_enc": true,
  "vocab_size": 65536
}
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:66b718a04a412036b152218ca83f93a6517bd1939bad83fe36fd863b3b7c3e53
size 2340697936
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<|startoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|pad|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
323812
tokenizer.json
Normal file
File diff suppressed because it is too large
4080
tokenizer_config.json
Normal file
File diff suppressed because it is too large