[doc] Add context parallel user guide (#5358)

1. Add context parallel user guide
2. Add context parallel related message in supported features/models

- vLLM version: release/v0.13.0
- vLLM main: bc0a5a0c08

Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>
2025-12-26 17:03:47 +08:00
parent 09390eaf32
commit 8ed87dfa84
5 changed files with 126 additions and 37 deletions
--- a/docs/source/user_guide/support_matrix/supported_features.md
+++ b/docs/source/user_guide/support_matrix/supported_features.md
@@ -28,6 +28,7 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
 | Quantization                  | 🟢 Functional  | W8A8 available; working on more quantization method support (W4A8, etc) |
 | Graph Mode                    | 🔵 Experimental| Experimental, see detailed note: [vllm-ascend#767][graph_mode]           |
 | Sleep Mode                    | 🟢 Functional  |                                                                        |
+| Context Parallel              | 🔵 Experimental|                                                                        |

 - 🟢 Functional: Fully operational, with ongoing optimizations.
 - 🔵 Experimental: Experimental support, interfaces and functions may change.
--- a/docs/source/user_guide/support_matrix/supported_models.md
+++ b/docs/source/user_guide/support_matrix/supported_models.md
@@ -6,41 +6,41 @@ Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/160

 ### Generative Models

-| Model                         | Support   | Note                                                                 | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
-|-------------------------------|-----------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|
-| DeepSeek V3/3.1               | ✅        |                                                                      | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 240k || [DeepSeek-V3.1](../../tutorials/DeepSeek-V3.1.md) |
-| DeepSeek V3.2                 | ✅        |                                                                      | ✅ | A2/A3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 160k | ✅ | [DeepSeek-V3.2](../../tutorials/DeepSeek-V3.2.md) |
-| DeepSeek R1                   | ✅        |                                                                      | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 128k || [DeepSeek-R1](../../tutorials/DeepSeek-R1.md) |
-| DeepSeek Distill (Qwen/Llama) | ✅        |                                                                      |||||||||||||||||||
-| Qwen3                         | ✅        |                                                                      | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ ||| ✅ || ✅ | ✅ | 128k | ✅ | [Qwen3-Dense](../../tutorials/Qwen3-Dense.md) |
-| Qwen3-based                   | ✅        |                                                                      |||||||||||||||||||
-| Qwen3-Coder                   | ✅        |                                                                      | ✅ | A2/A3 ||✅|✅|✅|||✅|✅|✅|✅||||||[Qwen3-Coder-30B-A3B tutorial](../../tutorials/Qwen3-Coder-30B-A3B.md)|
-| Qwen3-Moe                     | ✅        |                                                                      | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | 256k || [Qwen3-235B-A22B](../../tutorials/Qwen3-235B-A22B.md) |
-| Qwen3-Next                    | ✅        |                                                                      | ✅ | A2/A3 | ✅ |||||| ✅ ||| ✅ || ✅ | ✅ ||| [Qwen3-Next](../../tutorials/Qwen3-Next.md) |
-| Qwen2.5                       | ✅        |                                                                      | ✅ | A2/A3 | ✅ | ✅ | ✅ |||| ✅ ||| ✅ |||||| [Qwen2.5-7B](../../tutorials/Qwen2.5-7B.md) |
-| Qwen2                         | ✅        |                                                                      |||||||||||||||||||
-| Qwen2-based                   | ✅        |                                                                      |||||||||||||||||||
-| QwQ-32B                       | ✅        |                                                                      |||||||||||||||||||
-| Llama2/3/3.1                  | ✅        |                                                                      |||||||||||||||||||
-| Internlm                      | ✅        | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962)     |||||||||||||||||||
-| Baichuan                      | ✅        |                                                                      |||||||||||||||||||
-| Baichuan2                     | ✅        |                                                                      |||||||||||||||||||
-| Phi-4-mini                    | ✅        |                                                                      |||||||||||||||||||
-| MiniCPM                       | ✅        |                                                                      |||||||||||||||||||
-| MiniCPM3                      | ✅        |                                                                      |||||||||||||||||||
-| Ernie4.5                      | ✅        |                                                                      |||||||||||||||||||
-| Ernie4.5-Moe                  | ✅        |                                                                      |||||||||||||||||||
-| Gemma-2                       | ✅        |                                                                      |||||||||||||||||||
-| Gemma-3                       | ✅        |                                                                      |||||||||||||||||||
-| Phi-3/4                       | ✅        |                                                                      |||||||||||||||||||
-| Mistral/Mistral-Instruct      | ✅        |                                                                      |||||||||||||||||||
-| GLM-4.5                       | ✅        |                                                                      |||||||||||||||||||
-| GLM-4                         | ❌        | [#2255](https://github.com/vllm-project/vllm-ascend/issues/2255)     |||||||||||||||||||
-| GLM-4-0414                    | ❌        | [#2258](https://github.com/vllm-project/vllm-ascend/issues/2258)     |||||||||||||||||||
-| ChatGLM                       | ❌        | [#554](https://github.com/vllm-project/vllm-ascend/issues/554)       |||||||||||||||||||
-| DeepSeek V2.5                 | 🟡        | Need test                                                            |||||||||||||||||||
-| Mllama                        | 🟡        | Need test                                                            |||||||||||||||||||
-| MiniMax-Text                  | 🟡        | Need test                                                            |||||||||||||||||||
+| Model                         | Support   | Note                                                                 | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Context Parallel | Doc |
+|-------------------------------|-----------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|-----|
+| DeepSeek V3/3.1               | ✅        |                                                                      | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 240k || ✅ | [DeepSeek-V3.1](../../tutorials/DeepSeek-V3.1.md) |
+| DeepSeek V3.2                 | ✅        |                                                                      | ✅ | A2/A3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 160k | ✅ || [DeepSeek-V3.2](../../tutorials/DeepSeek-V3.2.md) |
+| DeepSeek R1                   | ✅        |                                                                      | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 128k || ✅ | [DeepSeek-R1](../../tutorials/DeepSeek-R1.md) |
+| DeepSeek Distill (Qwen/Llama) | ✅        |                                                                      ||||||||||||||||||||
+| Qwen3                         | ✅        |                                                                      | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ ||| ✅ || ✅ | ✅ | 128k | ✅ | ✅ | [Qwen3-Dense](../../tutorials/Qwen3-Dense.md) |
+| Qwen3-based                   | ✅        |                                                                      ||||||||||||||||||||
+| Qwen3-Coder                   | ✅        |                                                                      | ✅ | A2/A3 ||✅|✅|✅|||✅|✅|✅|✅|||||| ✅ | [Qwen3-Coder-30B-A3B tutorial](../../tutorials/Qwen3-Coder-30B-A3B.md)|
+| Qwen3-Moe                     | ✅        |                                                                      | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | 256k || ✅ | [Qwen3-235B-A22B](../../tutorials/Qwen3-235B-A22B.md) |
+| Qwen3-Next                    | ✅        |                                                                      | ✅ | A2/A3 | ✅ |||||| ✅ ||| ✅ || ✅ | ✅ |||| [Qwen3-Next](../../tutorials/Qwen3-Next.md) |
+| Qwen2.5                       | ✅        |                                                                      | ✅ | A2/A3 | ✅ | ✅ | ✅ |||| ✅ ||| ✅ ||||||| [Qwen2.5-7B](../../tutorials/Qwen2.5-7B.md) |
+| Qwen2                         | ✅        |                                                                      ||||||||||||||||||||
+| Qwen2-based                   | ✅        |                                                                      ||||||||||||||||||||
+| QwQ-32B                       | ✅        |                                                                      ||||||||||||||||||||
+| Llama2/3/3.1                  | ✅        |                                                                      ||||||||||||||||||||
+| Internlm                      | ✅        | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962)     ||||||||||||||||||||
+| Baichuan                      | ✅        |                                                                      ||||||||||||||||||||
+| Baichuan2                     | ✅        |                                                                      ||||||||||||||||||||
+| Phi-4-mini                    | ✅        |                                                                      ||||||||||||||||||||
+| MiniCPM                       | ✅        |                                                                      ||||||||||||||||||||
+| MiniCPM3                      | ✅        |                                                                      ||||||||||||||||||||
+| Ernie4.5                      | ✅        |                                                                      ||||||||||||||||||||
+| Ernie4.5-Moe                  | ✅        |                                                                      ||||||||||||||||||||
+| Gemma-2                       | ✅        |                                                                      ||||||||||||||||||||
+| Gemma-3                       | ✅        |                                                                      ||||||||||||||||||||
+| Phi-3/4                       | ✅        |                                                                      ||||||||||||||||||||
+| Mistral/Mistral-Instruct      | ✅        |                                                                      ||||||||||||||||||||
+| GLM-4.5                       | ✅        |                                                                      ||||||||||||||||||||
+| GLM-4                         | ❌        | [#2255](https://github.com/vllm-project/vllm-ascend/issues/2255)     ||||||||||||||||||||
+| GLM-4-0414                    | ❌        | [#2258](https://github.com/vllm-project/vllm-ascend/issues/2258)     ||||||||||||||||||||
+| ChatGLM                       | ❌        | [#554](https://github.com/vllm-project/vllm-ascend/issues/554)       ||||||||||||||||||||
+| DeepSeek V2.5                 | 🟡        | Need test                                                            ||||||||||||||||||||
+| Mllama                        | 🟡        | Need test                                                            ||||||||||||||||||||
+| MiniMax-Text                  | 🟡        | Need test                                                            ||||||||||||||||||||

 ### Pooling Models