[Doc] Upgrade env VLLM_ASCEND_ENABLE_FUSED_MC2 used in nightly test and tutorials (#8441)
### What this PR does / why we need it?

The env `VLLM_ASCEND_ENABLE_FUSED_MC2` should only be enabled on the decoder node in the Prefill-Decode Disaggregation scenario.

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
@@ -10,7 +10,6 @@ env_common:
   OMP_PROC_BIND: false
   OMP_NUM_THREADS: 1
   VLLM_ASCEND_ENABLE_FLASHCOMM1: 1
-  VLLM_ASCEND_ENABLE_FUSED_MC2: 2
   TASK_QUEUE_ENABLE: 1
   SERVER_PORT: 8080

@@ -21,6 +20,9 @@ disaggregated_prefill:

 deployment:
   -
+    envs:
+      # should disable this in the prefiller node
+      VLLM_ASCEND_ENABLE_FUSED_MC2: 0
     server_cmd: >
       vllm serve "Qwen/Qwen3-235B-A22B"
       --host 0.0.0.0
@@ -57,6 +59,8 @@ deployment:
         }'

   -
+    envs:
+      VLLM_ASCEND_ENABLE_FUSED_MC2: 2
     server_cmd: >
       vllm serve "Qwen/Qwen3-235B-A22B"
       --host 0.0.0.0
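
The rule this change encodes (the fused MC2 env is set to `2` on the decoder node and `0` on the prefiller node) can be sketched in Python for anyone generating per-node environments from a script. This is only an illustration under stated assumptions: `node_envs` and the role names `"prefiller"` and `"decoder"` are hypothetical and are not part of vLLM Ascend; only the variable name and the values `0` and `2` come from the diff above.

```python
# Hypothetical helper, not part of vLLM Ascend: builds the env mapping for one
# node in a Prefill-Decode Disaggregation deployment.
FUSED_MC2_ENV = "VLLM_ASCEND_ENABLE_FUSED_MC2"


def node_envs(role: str, common: dict[str, str]) -> dict[str, str]:
    """Return the common envs plus the per-role fused MC2 setting."""
    if role not in ("prefiller", "decoder"):
        raise ValueError(f"unknown role: {role!r}")
    envs = dict(common)  # copy so the shared mapping is never mutated
    # Values taken from the diff: 2 on the decoder, 0 on the prefiller.
    envs[FUSED_MC2_ENV] = "2" if role == "decoder" else "0"
    return envs


common = {"TASK_QUEUE_ENABLE": "1", "VLLM_ASCEND_ENABLE_FLASHCOMM1": "1"}
prefiller_env = node_envs("prefiller", common)
decoder_env = node_envs("decoder", common)
```

Keeping the variable out of the shared mapping, as the diff does with `env_common`, means neither role can inherit the other's value by accident.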