[Doc][v0.18.0] Add GLM-5.1 to models tutorials (#8778)
### What this PR does / why we need it?

Add a description of GLM-5.1 to the document.

Signed-off-by: ZYang6263 <zy626375@gmail.com>
@@ -1,10 +1,12 @@
-# GLM-5
+# GLM-5/GLM-5.1
 
 ## Introduction
 
+This document applies to both `GLM-5` and `GLM-5.1`. Unless otherwise specified, all descriptions, configurations, and deployment procedures for `GLM-5` in this document also apply to `GLM-5.1`. For brevity, `GLM-5` is used hereafter to refer to both `GLM-5` and `GLM-5.1`.
+
 [GLM-5](https://huggingface.co/zai-org/GLM-5) uses a Mixture-of-Experts (MoE) architecture and targets complex systems engineering and long-horizon agentic tasks.
 
-The `GLM-5` model is first supported in `vllm-ascend:v0.17.0rc1`. In `vllm-ascend:v0.17.0rc1` and `vllm-ascend:v0.18.0rc1`, the version of transformers needs to be upgraded to 5.2.0.
+The `GLM-5` model is first supported in `vllm-ascend:v0.17.0rc1`. The version of transformers needs to be upgraded to 5.2.0.
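To make the version requirement above easy to verify at deploy time, here is a minimal sketch; the use of the `packaging` helper and the pip upgrade hint are assumptions about the environment, not part of the original tutorial:

```python
# Minimal sketch: fail fast if the installed transformers is older than the
# 5.2.0 requirement noted above. `packaging` is an assumed extra dependency.
import transformers
from packaging.version import Version

required = Version("5.2.0")
installed = Version(transformers.__version__)
if installed < required:
    raise RuntimeError(
        f"transformers {installed} found, but GLM-5 needs >= {required}; "
        "upgrade with: pip install -U 'transformers>=5.2.0'"
    )
print(f"transformers {installed} satisfies the >= {required} requirement")
```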
 This document will show the main verification steps of the model, including supported features, feature configuration, environment preparation, single-node and multi-node deployment, and accuracy and performance evaluation.
@@ -21,6 +23,9 @@ Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the fea
 - `GLM-5` (BF16 version): [Download model weight](https://www.modelscope.cn/models/ZhipuAI/GLM-5).
 - `GLM-5-w4a8`: [Download model weight](https://modelscope.cn/models/Eco-Tech/GLM-5-w4a8).
 - `GLM-5-w8a8`: [Download model weight](https://www.modelscope.cn/models/Eco-Tech/GLM-5-w8a8).
+- `GLM-5.1` (BF16 version): [Download model weight](https://huggingface.co/zai-org/GLM-5.1).
+- `GLM-5.1-w4a8`: [Download model weight](https://modelers.cn/models/Eco-Tech/GLM-5.1-w4a8).
+- `GLM-5.1-w8a8`: [Download model weight](https://modelers.cn/models/Eco-Tech/GLM-5.1-w8a8).
 - You can use [msmodelslim](https://gitcode.com/Ascend/msmodelslim) to quantize the model yourself.
 
 It is recommended to download the model weights to a directory shared by all nodes, such as `/root/.cache/`.
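As an example, the BF16 weights listed above can be fetched into the shared cache directory with a short ModelScope sketch; the model ID is taken from the download link above, while the target path and the use of `snapshot_download` are assumptions about your setup:

```python
# Minimal sketch: download the BF16 GLM-5 weights from ModelScope into the
# shared directory recommended above. Requires `pip install modelscope`.
from modelscope import snapshot_download

local_path = snapshot_download(
    "ZhipuAI/GLM-5",            # model ID from the download link above
    cache_dir="/root/.cache/",  # directory shared across all serving nodes
)
print(f"Weights available at: {local_path}")
```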
@@ -508,9 +513,9 @@ if __name__ == "__main__":
         new_dict[new_key] = tensor_dict[key]
 
     new_file_name = os.path.join(directory_path, "mtp-others.safetensors")
-    new_key = ["model.layers.78.embed_tokens.weight", "model.layers.78.shared_head.head.weight"]
+    new_keys = ["model.layers.78.embed_tokens.weight", "model.layers.78.shared_head.head.weight"]
     save_file(tensors=new_dict, filename=new_file_name)
-    for key in new_key:
+    for key in new_keys:
         json_data["weight_map"][key] = "mtp-others.safetensors"
     with open(json_path, 'w', encoding='utf-8') as f:
         json.dump(json_data, f, indent=2)
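For readers who want the pattern above in one self-contained piece, here is a minimal sketch that moves the MTP tensors into their own safetensors shard and repoints the index's `weight_map`; the weight directory, the source shard name, and the layer-78 key names are assumptions carried over from the snippet above, not a verified script:

```python
# Minimal sketch of the step shown above: move the MTP-related tensors into a
# dedicated "mtp-others.safetensors" file and update the index's weight_map.
# The directory, source shard name, and key names are assumptions.
import json
import os

from safetensors import safe_open
from safetensors.torch import save_file

directory_path = "/root/.cache/GLM-5"  # assumed weight directory
json_path = os.path.join(directory_path, "model.safetensors.index.json")
source_shard = os.path.join(directory_path, "model-00001.safetensors")  # assumed shard
new_keys = [
    "model.layers.78.embed_tokens.weight",
    "model.layers.78.shared_head.head.weight",
]

# Collect the tensors that should live in the new shard.
new_dict = {}
with safe_open(source_shard, framework="pt") as f:
    for key in new_keys:
        new_dict[key] = f.get_tensor(key)

# Write them out and repoint the weight_map entries at the new file.
new_file_name = os.path.join(directory_path, "mtp-others.safetensors")
save_file(tensors=new_dict, filename=new_file_name)

with open(json_path, "r", encoding="utf-8") as f:
    json_data = json.load(f)
for key in new_keys:
    json_data["weight_map"][key] = "mtp-others.safetensors"
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(json_data, f, indent=2)
```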
@@ -1289,8 +1294,6 @@ python load_balance_proxy_server_example.py \
     --host 0.0.0.0 \
     --prefiller-hosts \
         $node_p0_ip \
-        $node_p0_ip \
-        $node_p1_ip \
         $node_p1_ip \
     --prefiller-ports \
         6700 \
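Once the proxy is up with each prefiller host listed once, a quick end-to-end check can be done with a short request sketch; the proxy port (8000), the OpenAI-compatible `/v1/completions` route, and the served model name are assumptions about the example proxy, not documented facts:

```python
# Minimal sketch: smoke-test the load-balance proxy after it starts.
# The port, route, and model name below are assumptions.
import requests

resp = requests.post(
    "http://0.0.0.0:8000/v1/completions",
    json={
        "model": "GLM-5",           # assumed served model name
        "prompt": "Hello, GLM-5!",
        "max_tokens": 16,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```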