[Chore] Prevents use of ASCEND_LAUNCH_BLOCKING with ACL Graph (#3574)

### What this PR does / why we need it?
Adds a validation check to prevent running with an incompatible
configuration.

The `ASCEND_LAUNCH_BLOCKING=1` environment variable, used for debugging,
enforces synchronous execution, which is incompatible with ACL Graph.

This change raises an explicit error to inform the user about the
conflict and how to resolve it, preventing a more obscure failure later.
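
The check can be sketched in isolation as follows. This is a minimal, standalone approximation rather than the code in `NPUPlatform`: `GraphMode` is a hypothetical stand-in for vLLM's `CUDAGraphMode`, and `check_launch_blocking` and its `environ` parameter are names invented here so the logic can be exercised without an NPU.

```python
import os
from enum import Enum


class GraphMode(Enum):
    """Hypothetical stand-in for vLLM's CUDAGraphMode (assumption)."""
    NONE = 0
    PIECEWISE = 1


def check_launch_blocking(graph_mode, environ=None):
    """Raise ValueError when graph mode is enabled and ASCEND_LAUNCH_BLOCKING=1.

    `environ` defaults to os.environ; passing a plain dict makes the
    check easy to test without touching the real environment.
    """
    env = os.environ if environ is None else environ
    blocking = env.get("ASCEND_LAUNCH_BLOCKING", "0") == "1"
    if graph_mode != GraphMode.NONE and blocking:
        raise ValueError(
            "ACL graph is incompatible with ASCEND_LAUNCH_BLOCKING=1. "
            "Please unset ASCEND_LAUNCH_BLOCKING or set it to 0.")


# Graph mode disabled: the variable is allowed and nothing is raised.
check_launch_blocking(GraphMode.NONE, {"ASCEND_LAUNCH_BLOCKING": "1"})

# Graph mode enabled with the variable set: the conflict is reported early.
try:
    check_launch_blocking(GraphMode.PIECEWISE, {"ASCEND_LAUNCH_BLOCKING": "1"})
except ValueError as err:
    print(f"rejected: {err}")
```

Only the exact string `"1"` triggers the error, mirroring the comparison in the diff; an unset variable or any other value passes through.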

### Does this PR introduce _any_ user-facing change?
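Only for the conflicting configuration: running with ACL Graph enabled and
`ASCEND_LAUNCH_BLOCKING=1` now raises a `ValueError` up front. Otherwise, none.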

### How was this patch tested?
None needed.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Yizhou
2025-10-21 20:17:33 +08:00
committed by GitHub
parent 220df60c61
commit ef3fabf399

@@ -270,6 +270,17 @@ class NPUPlatform(Platform):
            compilation_config.cudagraph_mode = CUDAGraphMode.NONE
            compilation_config.level = CompilationLevel.NO_COMPILATION

        # TODO: Remove this check when ACL Graph supports ASCEND_LAUNCH_BLOCKING=1
        # Then, we will have to discuss the error handling strategy and user experience
        if compilation_config.cudagraph_mode != CUDAGraphMode.NONE and \
            os.environ.get("ASCEND_LAUNCH_BLOCKING", "0") == "1":
            raise ValueError(
                "ACL graph is incompatible with ASCEND_LAUNCH_BLOCKING=1. "
                "Please unset ASCEND_LAUNCH_BLOCKING or set it to 0. If you "
                "need ASCEND_LAUNCH_BLOCKING for debugging, consider other methods — "
                "for example, check the plog files (default: $HOME/ascend/log/debug) "
                "for more information about runtime errors.")

        if parallel_config and parallel_config.worker_cls == "auto":
            # TODO: this is a tricky way to disable `use_sequence_parallel_moe` in vllm.
            os.environ["VLLM_ALL2ALL_BACKEND"] = "flashinfer_all2allv"