[Doc] Add note for unsupported PCP + FULL (#7559)

### What this PR does / why we need it?
This PR adds a note to the docs stating that FULL mode is not supported in the PCP
scenario.

Signed-off-by: Zetong Li <slippersss@126.com>
Zetong Li
2026-03-23 17:34:51 +08:00
committed by GitHub
parent 9976e685b7
commit a253235a59


@@ -4,6 +4,10 @@
This feature is currently experimental. Future versions may change its behavior around configuration, coverage, and performance.
```
```{note}
In a context parallel scenario (i.e. `prefill_context_parallel_size * decode_context_parallel_size > 1`), setting "cudagraph_mode" to "FULL" is not yet sufficiently supported.
```
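The condition in the note above can be sketched as a small pre-flight check. This is only an illustration: `full_graph_mode_allowed` and `check_cudagraph_mode` are hypothetical helper names, not part of vLLM or vLLM Ascend, and the sketch assumes both context parallel sizes default to 1 when unset.

```python
def full_graph_mode_allowed(prefill_context_parallel_size: int = 1,
                            decode_context_parallel_size: int = 1) -> bool:
    """Return True when context parallelism is off (product of sizes is 1).

    Hypothetical helper mirroring the condition stated in the note.
    """
    if prefill_context_parallel_size < 1 or decode_context_parallel_size < 1:
        raise ValueError("context parallel sizes must be >= 1")
    product = prefill_context_parallel_size * decode_context_parallel_size
    return product == 1


def check_cudagraph_mode(cudagraph_mode: str, pcp: int = 1, dcp: int = 1) -> str:
    """Reject "FULL" when context parallelism is enabled; pass other values through.

    Hypothetical helper; the real project may handle this case differently.
    """
    if cudagraph_mode == "FULL" and not full_graph_mode_allowed(pcp, dcp):
        raise ValueError(
            f'cudagraph_mode="FULL" is not supported with context parallel '
            f"(pcp={pcp}, dcp={dcp})"
        )
    return cudagraph_mode
```

Running a check like this before launching makes the unsupported combination fail early with a clear message instead of surfacing later at runtime.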
This guide provides instructions for using Ascend Graph Mode with vLLM Ascend. Graph mode is only available on the V1 Engine, and only the Qwen and DeepSeek series models have been well tested since 0.9.0rc1. We will make it stable and generalized in the next release.
## Getting Started