diff --git a/docs/source/tutorials/models/GLM5.md b/docs/source/tutorials/models/GLM5.md
index 86ec5de7..42e02616 100644
--- a/docs/source/tutorials/models/GLM5.md
+++ b/docs/source/tutorials/models/GLM5.md
@@ -1,10 +1,12 @@
-# GLM-5
+# GLM-5/GLM-5.1
 
 ## Introduction
 
+This document applies to both `GLM-5` and `GLM-5.1`. Unless otherwise specified, all descriptions, configurations, and deployment procedures for `GLM-5` in this document also apply to `GLM-5.1`. For brevity, `GLM-5` is used hereafter to refer to both models.
+
 [GLM-5](https://huggingface.co/zai-org/GLM-5) use a Mixture-of-Experts (MoE) architecture and targets at complex systems engineering and long-horizon agentic tasks.
 
-The `GLM-5` model is first supported in `vllm-ascend:v0.17.0rc1`. In `vllm-ascend:v0.17.0rc1` and `vllm-ascend:v0.18.0rc1` , the version of transformers need to be upgraded to 5.2.0.
+The `GLM-5` model is first supported in `vllm-ascend:v0.17.0rc1`. The `transformers` version needs to be upgraded to 5.2.0.
 
 This document will show the main verification steps of the model, including supported features, feature configuration, environment preparation, single-node and multi-node deployment, accuracy and performance evaluation.
@@ -21,6 +23,9 @@ Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the fea
 - `GLM-5`(BF16 version): [Download model weight](https://www.modelscope.cn/models/ZhipuAI/GLM-5).
 - `GLM-5-w4a8`: [Download model weight](https://modelscope.cn/models/Eco-Tech/GLM-5-w4a8).
 - `GLM-5-w8a8`: [Download model weight](https://www.modelscope.cn/models/Eco-Tech/GLM-5-w8a8).
+- `GLM-5.1`(BF16 version): [Download model weight](https://huggingface.co/zai-org/GLM-5.1).
+- `GLM-5.1-w4a8`: [Download model weight](https://modelers.cn/models/Eco-Tech/GLM-5.1-w4a8).
+- `GLM-5.1-w8a8`: [Download model weight](https://modelers.cn/models/Eco-Tech/GLM-5.1-w8a8).
 
 - You can use [msmodelslim](https://gitcode.com/Ascend/msmodelslim) to quantify the model naively. It is recommended to download the model weight to the shared directory of multiple nodes, such as `/root/.cache/`
 
@@ -508,9 +513,9 @@ if __name__ == "__main__":
             new_dict[new_key] = tensor_dict[key]
 
     new_file_name = os.path.join(directory_path, "mtp-others.safetensors")
-    new_key = ["model.layers.78.embed_tokens.weight", "model.layers.78.shared_head.head.weight"]
+    new_keys = ["model.layers.78.embed_tokens.weight", "model.layers.78.shared_head.head.weight"]
     save_file(tensors=new_dict, filename=new_file_name)
-    for key in new_key:
+    for key in new_keys:
         json_data["weight_map"][key] = "mtp-others.safetensors"
     with open(json_path, 'w', encoding='utf-8') as f:
         json.dump(json_data, f, indent=2)
@@ -1289,8 +1294,6 @@ python load_balance_proxy_server_example.py \
     --host 0.0.0.0 \
     --prefiller-hosts \
     $node_p0_ip \
-    $node_p0_ip \
-    $node_p1_ip \
     $node_p1_ip \
     --prefiller-ports \
     6700 \
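The `new_key` → `new_keys` rename in the second hunk fixes a shadowing bug: the loop variable would otherwise reuse the same name as the list it iterates, so only the intended MTP keys should be remapped in the index. Below is a minimal, self-contained sketch of just the index-rewrite step, for verification in isolation. The fabricated `weight_map` contents and the temporary directory are illustrative assumptions, not taken from the real checkpoint; `save_file` and the tensor dictionary from the full script are omitted here.

```python
import json
import os
import tempfile

# The two MTP-related keys the patch redirects to a dedicated shard file.
new_keys = [
    "model.layers.78.embed_tokens.weight",
    "model.layers.78.shared_head.head.weight",
]

with tempfile.TemporaryDirectory() as directory_path:
    json_path = os.path.join(directory_path, "model.safetensors.index.json")

    # Fabricated index for illustration: both keys initially point at an
    # ordinary weight shard, as they would before the rewrite.
    json_data = {"weight_map": {k: "model-00001.safetensors" for k in new_keys}}
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(json_data, f, indent=2)

    # Core of the patched logic: iterate the renamed list `new_keys` so that
    # every listed key is remapped to the new shard, then rewrite the index.
    for key in new_keys:
        json_data["weight_map"][key] = "mtp-others.safetensors"
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(json_data, f, indent=2)

    # Reload to confirm the remapping survived the round trip to disk.
    with open(json_path, encoding="utf-8") as f:
        reloaded = json.load(f)
```

After running, every entry in `reloaded["weight_map"]` points at `mtp-others.safetensors`, which is what the full script relies on when it saves the extracted tensors into that file.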