From 9e6c547d9808eb5fa532d49102969c91b79be905 Mon Sep 17 00:00:00 2001
From: bazingazhou233-hub
Date: Sat, 14 Mar 2026 22:38:36 +0800
Subject: [PATCH] [Doc] Replace deprecated full_cuda_graph with cudagraph_mode in Qwen2.5-Omni (#7286)

## Summary

- Replace `full_cuda_graph: 1` with `cudagraph_mode: FULL_DECODE_ONLY` in both single-NPU and multi-NPU examples
- `full_cuda_graph` is deprecated and falls back to `NONE` on NPU

Fixes #4696

- vLLM version: v0.17.0
- vLLM main: https://github.com/vllm-project/vllm/commit/4034c3d32e30d01639459edd3ab486f56993876d

Signed-off-by: bazingazhou233-hub
Co-authored-by: bazingazhou233-hub
---
 docs/source/tutorials/models/Qwen2.5-Omni.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/tutorials/models/Qwen2.5-Omni.md b/docs/source/tutorials/models/Qwen2.5-Omni.md
index a869dd6f..5757091a 100644
--- a/docs/source/tutorials/models/Qwen2.5-Omni.md
+++ b/docs/source/tutorials/models/Qwen2.5-Omni.md
@@ -82,7 +82,7 @@ vllm serve "${MODEL_PATH}" \
 --served-model-name Qwen-Omni \
 --allowed-local-media-path ${LOCAL_MEDIA_PATH} \
 --trust-remote-code \
---compilation-config '{"full_cuda_graph": 1}' \
+--compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}' \
 --no-enable-prefix-caching
 ```
 
@@ -113,7 +113,7 @@ vllm serve ${MODEL_PATH}\
 --served-model-name Qwen-Omni \
 --allowed-local-media-path ${LOCAL_MEDIA_PATH} \
 --trust-remote-code \
---compilation-config {"full_cuda_graph": 1} \
+--compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}' \
 --data-parallel-size ${DP_SIZE} \
 --no-enable-prefix-caching
 ```
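Besides swapping the key, the multi-NPU hunk wraps the `--compilation-config` JSON in single quotes. Below is a minimal sketch of why that matters. It uses Python's `shlex.split` (POSIX mode) as a stand-in for how a POSIX shell splits words and removes quotes. This is an approximation: `shlex` performs no variable or brace expansion, although neither would change these particular fragments. The names `UNQUOTED`, `QUOTED`, `config_tokens`, and `parses_as_json` are hypothetical helpers introduced only for this illustration.

```python
import json
import shlex

# Hypothetical command-line fragments taken from the old and new multi-NPU example.
UNQUOTED = '--compilation-config {"full_cuda_graph": 1}'
QUOTED = '--compilation-config \'{"cudagraph_mode": "FULL_DECODE_ONLY"}\''


def config_tokens(fragment: str) -> list[str]:
    """Split a fragment with POSIX shell rules and return the words after the flag."""
    return shlex.split(fragment)[1:]


def parses_as_json(tokens: list[str]) -> bool:
    """True only if the flag received exactly one word and that word is valid JSON."""
    if len(tokens) != 1:
        return False
    try:
        json.loads(tokens[0])
    except json.JSONDecodeError:
        return False
    return True


unquoted_args = config_tokens(UNQUOTED)
quoted_args = config_tokens(QUOTED)

# Unquoted: the space splits the object into two words and the double quotes are stripped.
print(unquoted_args, parses_as_json(unquoted_args))

# Quoted: a single word that decodes to the intended mapping.
print(quoted_args, parses_as_json(quoted_args))
print(json.loads(quoted_args[0]))
```

Without the quotes, the command would not pass a single valid JSON string to `--compilation-config`. With single quotes, the whole object reaches the program as one intact argument.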