diff --git a/docs/source/user_guide/feature_guide/graph_mode.md b/docs/source/user_guide/feature_guide/graph_mode.md index c2e07956..029c77b6 100644 --- a/docs/source/user_guide/feature_guide/graph_mode.md +++ b/docs/source/user_guide/feature_guide/graph_mode.md @@ -4,6 +4,10 @@ This feature is currently experimental. In future versions, there may be behavioral changes around configuration, coverage, performance improvement. ``` +```{note} +In context parallel scenario (i.e. prefill_context_parallel_size * decode_context_parallel_size > 1), "cudagraph_mode" is not sufficiently supported to be set to "FULL" yet. +``` + This guide provides instructions for using Ascend Graph Mode with vLLM Ascend. Please note that graph mode is only available on V1 Engine. And only Qwen, DeepSeek series models are well tested from 0.9.0rc1. We will make it stable and generalized in the next release. ## Getting Started