diff --git a/docs/source/user_guide/support_matrix/supported_features.md b/docs/source/user_guide/support_matrix/supported_features.md
index 852843ff..6210313c 100644
--- a/docs/source/user_guide/support_matrix/supported_features.md
+++ b/docs/source/user_guide/support_matrix/supported_features.md
@@ -10,11 +10,11 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
 |-------------------------------|----------------|------------------------------------------------------------------------|
 | Chunked Prefill | 🟢 Functional | Functional, see detailed note: [Chunked Prefill][cp] |
 | Automatic Prefix Caching | 🟢 Functional | Functional, see detailed note: [vllm-ascend#732][apc] |
-| LoRA | 🟢 Functional | [vllm-ascend#396][multilora], [vllm-ascend#893][v1 multilora] |
+| LoRA | 🟢 Functional | Functional, see detailed note: [LoRA][LoRA] |
 | Speculative decoding | 🟢 Functional | Basic support |
 | Pooling | 🟢 Functional | CI needed to adapt to more models; V1 support relies on vLLM support. |
 | Enc-dec | 🟡 Planned | vLLM should support this feature first. |
-| Multi Modality | 🟢 Functional | [Tutorial][multimodal], optimizing and adapting more models |
+| Multi Modality | 🟢 Functional | [Multi Modality][multimodal], optimizing and adapting more models |
 | LogProbs | 🟢 Functional | CI needed |
 | Prompt logProbs | 🟢 Functional | CI needed |
 | Async output | 🟢 Functional | CI needed |
@@ -26,9 +26,9 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
 | Data Parallel | 🟢 Functional | Data Parallel support for Qwen3 MoE. |
 | Prefill Decode Disaggregation | 🟢 Functional | Functional, xPyD is supported. |
 | Quantization | 🟢 Functional | W8A8 available; working on more quantization method support (W4A8, etc) |
-| Graph Mode | 🔵 Experimental| Experimental, see detailed note: [vllm-ascend#767][graph_mode] |
-| Sleep Mode | 🟢 Functional | |
-| Context Parallel | 🔵 Experimental| |
+| Graph Mode | 🟢 Functional | Functional, see detailed note: [Graph Mode][graph_mode] |
+| Sleep Mode | 🟢 Functional | Functional, see detailed note: [Sleep Mode][sleep_mode] |
+| Context Parallel | 🟢 Functional | Functional, see detailed note: [Context Parallel][context_parallel] |
 
 - 🟢 Functional: Fully operational, with ongoing optimizations.
 - 🔵 Experimental: Experimental support, interfaces and functions may change.
@@ -37,12 +37,12 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
 - 🔴 NO plan/Deprecated: No plan or deprecated by vLLM.
 
 [v1_user_guide]: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html
-[multimodal]: https://docs.vllm.ai/projects/ascend/en/latest/tutorials/single_npu_multimodal.html
+[multimodal]: https://docs.vllm.ai/projects/ascend/en/latest/tutorials/Qwen-VL-Dense.html
 [guided_decoding]: https://github.com/vllm-project/vllm-ascend/issues/177
-[multilora]: https://github.com/vllm-project/vllm-ascend/issues/396
-[v1 multilora]: https://github.com/vllm-project/vllm-ascend/pull/893
-[graph_mode]: https://github.com/vllm-project/vllm-ascend/issues/767
+[LoRA]: https://docs.vllm.ai/projects/ascend/en/latest/user_guide/feature_guide/lora.html
+[graph_mode]: https://docs.vllm.ai/projects/ascend/en/latest/user_guide/feature_guide/graph_mode.html
 [apc]: https://github.com/vllm-project/vllm-ascend/issues/732
 [cp]: https://docs.vllm.ai/en/stable/performance/optimization.html#chunked-prefill
 [1P1D]: https://github.com/vllm-project/vllm-ascend/pull/950
-[ray]: https://github.com/vllm-project/vllm-ascend/issues/1751
+[context_parallel]: https://docs.vllm.ai/projects/ascend/en/latest/user_guide/feature_guide/context_parallel.html
+[sleep_mode]: https://docs.vllm.ai/projects/ascend/en/latest/user_guide/feature_guide/sleep_mode.html