[Docs] Fix GLM-5 deploy command (#6711)

This pull request fixes the GLM-5 deployment tutorial in two places:

- The `docker run` command now maps `/dev/davinci8` through `/dev/davinci15` into the container, in addition to the devices already listed.
- The `--quantization ascend` flag is removed from both `vllm serve` commands, which load the unquantized bf16 weights (`GLM5-bf16`).

With these corrections, users can follow the instructions as written to set up and run GLM-5.

- vLLM version: v0.15.0
- vLLM main: 9562912cea

Signed-off-by: Canlin Guo <961750412@qq.com>
2026-02-12 08:55:48 +08:00
parent a0315f6697
commit 052cc4e61b
1 changed file with 8 additions and 2 deletions
--- a/docs/source/tutorials/models/GLM5.md
+++ b/docs/source/tutorials/models/GLM5.md
@@ -48,6 +48,14 @@ docker run --rm \
 --device /dev/davinci5 \
 --device /dev/davinci6 \
 --device /dev/davinci7 \
+--device /dev/davinci8 \
+--device /dev/davinci9 \
+--device /dev/davinci10 \
+--device /dev/davinci11 \
+--device /dev/davinci12 \
+--device /dev/davinci13 \
+--device /dev/davinci14 \
+--device /dev/davinci15 \
 --device /dev/davinci_manager \
 --device /dev/devmm_svm \
 --device /dev/hisi_hdc \
@@ -181,7 +189,6 @@ vllm serve /root/.cache/modelscope/hub/models/vllm-ascend/GLM5-bf16 \
 --data-parallel-address $node0_ip \
 --data-parallel-rpc-port 12890 \
 --tensor-parallel-size 16 \
---quantization ascend \
 --seed 1024 \
 --served-model-name glm-5 \
 --enable-expert-parallel \
@@ -228,7 +235,6 @@ vllm serve /root/.cache/modelscope/hub/models/vllm-ascend/GLM5-bf16 \
 --data-parallel-address $node0_ip \
 --data-parallel-rpc-port 12890 \
 --tensor-parallel-size 16 \
---quantization ascend \
 --seed 1024 \
 --served-model-name glm-5 \
 --enable-expert-parallel \
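The first hunk adds eight `--device` lines by hand. The following is a minimal sketch of how the same device flags could be built in a loop instead. It assumes the host exposes 16 NPUs as `/dev/davinci0` through `/dev/davinci15`. `NUM_NPUS` and `device_args` are hypothetical names introduced only for this example and are not part of the tutorial.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: NPUs appear as /dev/davinci0 ... /dev/davinci15.
# NUM_NPUS is a hypothetical override, not a variable used by the tutorial.
NUM_NPUS="${NUM_NPUS:-16}"

device_args=()
for ((i = 0; i < NUM_NPUS; i++)); do
  device_args+=(--device "/dev/davinci${i}")
done

# Management and driver devices that the tutorial's docker run command also maps.
device_args+=(
  --device /dev/davinci_manager
  --device /dev/devmm_svm
  --device /dev/hisi_hdc
)

# Print the flags so they can be checked before being passed to docker run.
printf '%s ' "${device_args[@]}"
echo
```

The array can then be expanded as `docker run --rm "${device_args[@]}"`, followed by the remaining options and image from the tutorial's command. Quoting the expansion keeps each flag and each path as a separate argument. Spelling out every device explicitly, as the patch does, remains the simplest option for copy-and-paste documentation.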