[0.18.0][Bugfix] Restore VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT to original value for nightly test (#8794)

### What this PR does / why we need it?
PR #8618 renamed `VLLM_NIXL_ABORT_REQUEST_TIMEOUT` to
`VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT` and simultaneously reduced the
timeout value from 300000 to 480 seconds in the nightly test configs.
The 480s value is far too short for heavy multi-node workloads (DeepSeek
V3/R1 under W8A8 + EP), causing [spurious abort-request
timeouts](https://github.com/vllm-project/vllm-ascend/actions/runs/25067539406/job/73441223206)
in CI.

This PR restores the timeout value to the original 300000 to fix the
nightly test failures introduced by #8618.
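
The sketch below shows why the value matters. It assumes the variable holds a timeout in seconds and that a pending request is aborted once it has waited longer than that. The helper names `abort_request_timeout_s` and `should_abort`, and the 480 s fallback, are hypothetical. They are not the vllm-ascend implementation.

```python
import os

# Hypothetical fallback; mirrors the value this PR reverts, not a confirmed default.
ASSUMED_DEFAULT_TIMEOUT_S = 480.0


def abort_request_timeout_s(default: float = ASSUMED_DEFAULT_TIMEOUT_S) -> float:
    """Read VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT, assumed here to be in seconds."""
    return float(os.environ.get("VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT", default))


def should_abort(started_at: float, now: float, timeout_s: float) -> bool:
    """Hypothetical check: True once a pending request has waited longer than timeout_s."""
    return now - started_at > timeout_s
```

Under these assumptions, a request that has been pending for 600 s is aborted with the 480 s setting. With 300000 it is kept.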

Signed-off-by: hfadzxy <starmoon_zhang@163.com>

Author: zhangxinyuehfad, 2026-04-29 14:31:12 +08:00 (committed by GitHub)
Parent: 96b90ad625
Commit: bc5ca2c856
3 changed files with 3 additions and 3 deletions

```diff
@@ -13,7 +13,7 @@ env_common:
 HCCL_DETERMINISTIC: True
 TASK_QUEUE_ENABLE: 1
 HCCL_OP_RETRY_ENABLE: "L0:0, L1:0"
-VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT: 480
+VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT: 300000
 disaggregated_prefill:
 enabled: true
```

```diff
@@ -15,7 +15,7 @@ env_common:
 ASCEND_TRANSPORT_PRINT: 1
 ACL_OP_INIT_MODE: 1
 ASCEND_A3_ENABLE: 1
-VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT: 480
+VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT: 300000
 VLLM_ENGINE_READY_TIMEOUT_S: 1800
 HCCL_CONNECT_TIMEOUT: 1200
 HCCL_INTRA_PCIE_ENABLE: 1
```

```diff
@@ -15,7 +15,7 @@ env_common:
 ASCEND_TRANSPORT_PRINT: 1
 ACL_OP_INIT_MODE: 1
 ASCEND_A3_ENABLE: 1
-VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT: 480
+VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT: 300000
 VLLM_ENGINE_READY_TIMEOUT_S: 1800
 HCCL_CONNECT_TIMEOUT: 1200
 HCCL_INTRA_PCIE_ENABLE: 1
```
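
All three configs must agree on the restored value. A small guard over the YAML text could catch a similar silent change in the future. The sketch below is hypothetical. The helpers `find_abort_timeouts` and `check_config` are illustrative only, and the real config file paths are not shown on this page. It uses a regex instead of a YAML parser, so it only recognizes unquoted integer values.

```python
import re

# Hypothetical expected value, taken from the change in this PR.
EXPECTED_TIMEOUT = 300000

# Matches "VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT: <int>" on its own line (unquoted ints only).
PATTERN = re.compile(
    r"^\s*VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT:\s*(\d+)\s*$", re.MULTILINE
)


def find_abort_timeouts(config_text: str) -> list[int]:
    """Return every VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT value found in the text."""
    return [int(value) for value in PATTERN.findall(config_text)]


def check_config(config_text: str, expected: int = EXPECTED_TIMEOUT) -> bool:
    """True if the variable is set at least once and every occurrence equals expected."""
    values = find_abort_timeouts(config_text)
    return bool(values) and all(v == expected for v in values)
```

Each nightly config would be read as text and passed to `check_config`. A `False` result flags a missing or changed timeout.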