[Doc] Replace deprecated full_cuda_graph with cudagraph_mode in Qwen2.5-Omni (#7286)

## Summary

- Replace `full_cuda_graph: 1` with `cudagraph_mode: FULL_DECODE_ONLY` in both single-NPU and multi-NPU examples
- `full_cuda_graph` is deprecated and falls back to `NONE` on NPU

Fixes #4696

- vLLM version: v0.17.0
- vLLM main: 4034c3d32e

Signed-off-by: bazingazhou233-hub <bazingazhou233-hub@users.noreply.github.com>
Co-authored-by: bazingazhou233-hub <bazingazhou233-hub@users.noreply.github.com>
2026-03-14 22:38:36 +08:00
parent bb506a1c99
commit 9e6c547d98
1 changed files with 2 additions and 2 deletions
--- a/docs/source/tutorials/models/Qwen2.5-Omni.md
+++ b/docs/source/tutorials/models/Qwen2.5-Omni.md
@@ -82,7 +82,7 @@ vllm serve "${MODEL_PATH}" \
 --served-model-name Qwen-Omni \
 --allowed-local-media-path ${LOCAL_MEDIA_PATH} \
 --trust-remote-code \
--compilation-config '{"full_cuda_graph": 1}' \
+--compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}' \
 --no-enable-prefix-caching
 ```

@@ -113,7 +113,7 @@ vllm serve ${MODEL_PATH}\
 --served-model-name Qwen-Omni \
 --allowed-local-media-path ${LOCAL_MEDIA_PATH} \
 --trust-remote-code \
--compilation-config {"full_cuda_graph": 1} \
+--compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}' \
 --data-parallel-size ${DP_SIZE} \
 --no-enable-prefix-caching
 ```
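
Note on the second hunk: besides swapping the key, the added line wraps the JSON in single quotes, which the removed line lacked. The sketch below illustrates why that matters. It uses Python's `shlex.split` as an approximation of POSIX shell word splitting (an assumption; a real shell may differ in details such as brace expansion), and the variable names are illustrative only.

```python
import json
import shlex

# Argument text taken from the removed and added lines of the second hunk.
old_line = '--compilation-config {"full_cuda_graph": 1}'
new_line = """--compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}'"""

# Unquoted: the double quotes are consumed and the JSON is split at the space.
old_args = shlex.split(old_line)

# Single-quoted: the JSON survives as one argument with its quotes intact.
new_args = shlex.split(new_line)

print(old_args)
print(new_args)

# Only the quoted form arrives as parseable JSON.
config = json.loads(new_args[1])
print(config["cudagraph_mode"])
```

Under this approximation, the unquoted form reaches the program as two fragments that are no longer valid JSON, while the quoted form arrives as a single JSON string.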