From a253235a596c43cae1de7edd5c937470aff4b2ff Mon Sep 17 00:00:00 2001
From: Zetong Li <48438720+slippersss@users.noreply.github.com>
Date: Mon, 23 Mar 2026 17:34:51 +0800
Subject: [PATCH] [Doc] Add note for unsupported PCP + FULL (#7559)

### What this PR does / why we need it?

This PR aims to add note in doc that FULL mode is not supported in PCP scenario.

Signed-off-by: Zetong Li
---
 docs/source/user_guide/feature_guide/graph_mode.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/source/user_guide/feature_guide/graph_mode.md b/docs/source/user_guide/feature_guide/graph_mode.md
index c2e07956..029c77b6 100644
--- a/docs/source/user_guide/feature_guide/graph_mode.md
+++ b/docs/source/user_guide/feature_guide/graph_mode.md
@@ -4,6 +4,10 @@
 This feature is currently experimental. In future versions, there may be behavioral changes around configuration, coverage, performance improvement.
 ```
 
+```{note}
+In context parallel scenario (i.e. prefill_context_parallel_size * decode_context_parallel_size > 1), "cudagraph_mode" is not sufficiently supported to be set to "FULL" yet.
+```
+
 This guide provides instructions for using Ascend Graph Mode with vLLM Ascend. Please note that graph mode is only available on V1 Engine. And only Qwen, DeepSeek series models are well tested from 0.9.0rc1. We will make it stable and generalized in the next release.
 
 ## Getting Started