[Doc] Replace deprecated full_cuda_graph with cudagraph_mode in Qwen2.5-Omni (#7286)
## Summary
- Replace `full_cuda_graph: 1` with `cudagraph_mode: FULL_DECODE_ONLY` in both the single-NPU and multi-NPU examples
- `full_cuda_graph` is deprecated and falls back to `NONE` on NPU
Fixes #4696
- vLLM version: v0.17.0
- vLLM main: 4034c3d32e
Signed-off-by: bazingazhou233-hub <bazingazhou233-hub@users.noreply.github.com>
Co-authored-by: bazingazhou233-hub <bazingazhou233-hub@users.noreply.github.com>
Committed by GitHub
Parent: bb506a1c99
Commit: 9e6c547d98
@@ -82,7 +82,7 @@ vllm serve "${MODEL_PATH}" \
  --served-model-name Qwen-Omni \
  --allowed-local-media-path ${LOCAL_MEDIA_PATH} \
  --trust-remote-code \
- --compilation-config '{"full_cuda_graph": 1}' \
+ --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}' \
  --no-enable-prefix-caching
  ```

@@ -113,7 +113,7 @@ vllm serve ${MODEL_PATH}\
  --served-model-name Qwen-Omni \
  --allowed-local-media-path ${LOCAL_MEDIA_PATH} \
  --trust-remote-code \
- --compilation-config {"full_cuda_graph": 1} \
+ --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}' \
  --data-parallel-size ${DP_SIZE} \
  --no-enable-prefix-caching
  ```
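
Besides swapping the option, the multi-NPU hunk adds the single quotes that the removed line lacked. The following is a minimal sketch of why that quoting matters, not part of the change itself: `show_args` is a hypothetical helper that prints each argument as the shell delivers it, and nothing here invokes `vllm`.

```shell
# Hypothetical helper: print every argument the shell passes, one per line.
show_args() {
  printf '[%s]\n' "$@"
}

# Quoted, as in the updated examples: the JSON arrives as one intact argument.
quoted=$(show_args --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}')

# Unquoted, as in the removed multi-NPU line: the shell strips the double
# quotes and splits the text at the space.
unquoted=$(show_args --compilation-config {"full_cuda_graph": 1})

printf '%s\n\n%s\n' "$quoted" "$unquoted"
```

In the quoted case the helper receives two arguments, and the second is the complete JSON string. In the unquoted case it receives three, `--compilation-config`, `{full_cuda_graph:` and `1}`, so the value following the flag is no longer valid JSON.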