xc-llm-ascend/index.md at fc2bcbe21c86f7684c80e42771b128da9fc17571 - xc-llm-ascend - Gitea: Git with a cup of tea

EngineX/xc-llm-ascend

yupeng a746f8274f [DOC] Qwen3 PD disaggregation user guide (#2751)

### What this PR does / why we need it?
This PR adds the deployment guide for prefiller/decoder (PD)
disaggregation.

The guide covers the following scenario:
- 3 nodes in total, with 2 NPUs on each node
- Qwen3-30B-A3B
- 1P2D
- Expert Parallel

This deployment can be used to verify the PD disaggregation and Expert
Parallel features with relatively few resources.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
No.


- vLLM version: v0.10.1.1
- vLLM main: e599e2c65e

---------

Signed-off-by: paulyu12 <507435917@qq.com>

2025-09-07 10:35:37 +08:00

# Tutorials

:::{toctree}
:caption: Deployment
:maxdepth: 1
single_npu
single_npu_multimodal
single_npu_audio
single_npu_qwen3_embedding
single_npu_qwen3_quantization
multi_npu
multi_npu_moge
multi_npu_qwen3_moe
multi_npu_quantization
single_node_300i
multi_node
multi_node_kimi
multi_node_pd_disaggregation
:::