[Doc][v0.18.0] Add GLM-5.1 to models tutorials (#8778)
### What this PR does / why we need it?

Add a description of GLM-5.1 to the document.

Signed-off-by: ZYang6263 <zy626375@gmail.com>
@@ -1,10 +1,12 @@
-# GLM-5
+# GLM-5/GLM-5.1
 
 ## Introduction
 
+This document applies to both `GLM-5` and `GLM-5.1`. Unless otherwise specified, all descriptions, configurations, and deployment procedures for `GLM-5` in this document also apply to `GLM-5.1`. For brevity, `GLM-5` is used hereafter to refer to both `GLM-5` and `GLM-5.1`.
+
 [GLM-5](https://huggingface.co/zai-org/GLM-5) uses a Mixture-of-Experts (MoE) architecture and targets complex systems engineering and long-horizon agentic tasks.
 
-The `GLM-5` model is first supported in `vllm-ascend:v0.17.0rc1`. In `vllm-ascend:v0.17.0rc1` and `vllm-ascend:v0.18.0rc1`, the version of transformers needs to be upgraded to 5.2.0.
+The `GLM-5` model is first supported in `vllm-ascend:v0.17.0rc1`. The version of transformers needs to be upgraded to 5.2.0.
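To make the version requirement above easy to verify at deploy time, here is a minimal sketch; the use of the `packaging` helper and the pip upgrade hint are assumptions about the environment, not part of the original tutorial:

```python
# Minimal sketch: fail fast if the installed transformers is older than the
# 5.2.0 requirement noted above. `packaging` is an assumed extra dependency.
import transformers
from packaging.version import Version

required = Version("5.2.0")
installed = Version(transformers.__version__)
if installed < required:
    raise RuntimeError(
        f"transformers {installed} found, but GLM-5 needs >= {required}; "
        "upgrade with: pip install -U 'transformers>=5.2.0'"
    )
print(f"transformers {installed} satisfies the >= {required} requirement")
```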
 This document will show the main verification steps of the model, including supported features, feature configuration, environment preparation, single-node and multi-node deployment, and accuracy and performance evaluation.
@@ -21,6 +23,9 @@ Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the fea
 - `GLM-5` (BF16 version): [Download model weight](https://www.modelscope.cn/models/ZhipuAI/GLM-5).
 - `GLM-5-w4a8`: [Download model weight](https://modelscope.cn/models/Eco-Tech/GLM-5-w4a8).
 - `GLM-5-w8a8`: [Download model weight](https://www.modelscope.cn/models/Eco-Tech/GLM-5-w8a8).
+- `GLM-5.1` (BF16 version): [Download model weight](https://huggingface.co/zai-org/GLM-5.1).
+- `GLM-5.1-w4a8`: [Download model weight](https://modelers.cn/models/Eco-Tech/GLM-5.1-w4a8).
+- `GLM-5.1-w8a8`: [Download model weight](https://modelers.cn/models/Eco-Tech/GLM-5.1-w8a8).
 - You can use [msmodelslim](https://gitcode.com/Ascend/msmodelslim) to quantize the model yourself.
 
 It is recommended to download the model weights to a directory shared by all nodes, such as `/root/.cache/`.
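As an example, the BF16 weights listed above can be fetched into the shared cache directory with a short ModelScope sketch; the model ID is taken from the download link above, while the target path and the use of `snapshot_download` are assumptions about your setup:

```python
# Minimal sketch: download the BF16 GLM-5 weights from ModelScope into the
# shared directory recommended above. Requires `pip install modelscope`.
from modelscope import snapshot_download

local_path = snapshot_download(
    "ZhipuAI/GLM-5",            # model ID from the download link above
    cache_dir="/root/.cache/",  # directory shared across all serving nodes
)
print(f"Weights available at: {local_path}")
```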
@@ -508,9 +513,9 @@ if __name__ == "__main__":
         new_dict[new_key] = tensor_dict[key]
 
     new_file_name = os.path.join(directory_path, "mtp-others.safetensors")
-    new_key = ["model.layers.78.embed_tokens.weight", "model.layers.78.shared_head.head.weight"]
+    new_keys = ["model.layers.78.embed_tokens.weight", "model.layers.78.shared_head.head.weight"]
     save_file(tensors=new_dict, filename=new_file_name)
-    for key in new_key:
+    for key in new_keys:
         json_data["weight_map"][key] = "mtp-others.safetensors"
     with open(json_path, 'w', encoding='utf-8') as f:
         json.dump(json_data, f, indent=2)
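For readers who want the pattern above in one self-contained piece, here is a minimal sketch that moves the MTP tensors into their own safetensors shard and repoints the index's `weight_map`; the weight directory, the source shard name, and the layer-78 key names are assumptions carried over from the snippet above, not a verified script:

```python
# Minimal sketch of the step shown above: move the MTP-related tensors into a
# dedicated "mtp-others.safetensors" file and update the index's weight_map.
# The directory, source shard name, and key names are assumptions.
import json
import os

from safetensors import safe_open
from safetensors.torch import save_file

directory_path = "/root/.cache/GLM-5"  # assumed weight directory
json_path = os.path.join(directory_path, "model.safetensors.index.json")
source_shard = os.path.join(directory_path, "model-00001.safetensors")  # assumed shard
new_keys = [
    "model.layers.78.embed_tokens.weight",
    "model.layers.78.shared_head.head.weight",
]

# Collect the tensors that should live in the new shard.
new_dict = {}
with safe_open(source_shard, framework="pt") as f:
    for key in new_keys:
        new_dict[key] = f.get_tensor(key)

# Write them out and repoint the weight_map entries at the new file.
new_file_name = os.path.join(directory_path, "mtp-others.safetensors")
save_file(tensors=new_dict, filename=new_file_name)

with open(json_path, "r", encoding="utf-8") as f:
    json_data = json.load(f)
for key in new_keys:
    json_data["weight_map"][key] = "mtp-others.safetensors"
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(json_data, f, indent=2)
```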
@@ -1289,8 +1294,6 @@ python load_balance_proxy_server_example.py \
     --host 0.0.0.0 \
     --prefiller-hosts \
         $node_p0_ip \
-        $node_p0_ip \
-        $node_p1_ip \
         $node_p1_ip \
     --prefiller-ports \
         6700 \
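Once the proxy is up with each prefiller host listed once, a quick end-to-end check can be done with a short request sketch; the proxy port (8000), the OpenAI-compatible `/v1/completions` route, and the served model name are assumptions about the example proxy, not documented facts:

```python
# Minimal sketch: smoke-test the load-balance proxy after it starts.
# The port, route, and model name below are assumptions.
import requests

resp = requests.post(
    "http://0.0.0.0:8000/v1/completions",
    json={
        "model": "GLM-5",           # assumed served model name
        "prompt": "Hello, GLM-5!",
        "max_tokens": 16,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```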