SILONG ZENG
09b3f9d91b
[CI]Add Disaggregated PD Nightly Test for Qwen3-235B and Qwen3-VL-235B (#5502)
### What this PR does / why we need it?
This PR adds online **Disaggregated Prefill/Decode** performance and
accuracy tests for the **Qwen3-235B-A22B** and
**Qwen3-VL-235B-A22B-Instruct** models to the Nightly test suite.
These test configurations simulate deploying large MoE and
vision-language models in **a dual-node (32 NPU)** environment,
using Mooncake for efficient KV cache transfer between the Prefill
node and the Decode node.
#### Test Configuration
**Qwen3-235B-A22B**
- Model: Qwen/Qwen3-235B-A22B
- Hardware: A3, 2 Nodes (32 NPUs total, 16 NPUs per node)
- Architecture: Disaggregated Prefill & Decode
  - Node 0 (Producer/Prefill): **DP2 + TP8 + EP + FLASHCOMM1 + FUSED_MC2**.
  - Node 1 (Consumer/Decode): **DP4 + TP4 + EP + FLASHCOMM1 + FUSED_MC2 + FULL_DECODE_ONLY**.
- Benchmarks:
  - Performance: vllm-ascend/GSM8K-in3500-bs2800.
  - Accuracy: vllm-ascend/gsm8k-lite.
**Qwen3-VL-235B-A22B-Instruct**
- Model: Qwen/Qwen3-VL-235B-A22B-Instruct
- Hardware: A3, 2 Nodes (32 NPUs total, 16 NPUs per node)
- Architecture: Disaggregated Prefill & Decode
  - Node 0 (Producer/Prefill): **DP2 + TP8 + EP**.
  - Node 1 (Consumer/Decode): **DP4 + TP4 + EP + FULL_DECODE_ONLY**.
- Benchmarks:
  - Performance: vllm-ascend/textvqa-perf-1080p.
  - Accuracy: vllm-ascend/textvqa-lite.
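As a quick sanity check on the layouts above, the sketch below verifies that each node's parallel configuration fills its 16 NPUs. It assumes that each data-parallel replica occupies `TP` devices and that EP reuses those same devices rather than adding new ones. The `NodeLayout` class and `fills_node` helper are hypothetical names used for illustration only and are not part of the test suite.

```python
from dataclasses import dataclass

# From the hardware description: 2 nodes, 32 NPUs total, 16 per node.
NPUS_PER_NODE = 16


@dataclass(frozen=True)
class NodeLayout:
    """Hypothetical helper describing one node's parallel layout."""
    role: str
    dp: int  # number of data-parallel replicas
    tp: int  # tensor-parallel size of each replica

    @property
    def npus(self) -> int:
        # Assumption: each DP replica spans `tp` devices; EP adds none.
        return self.dp * self.tp


# Both models in this PR use the same DP/TP split per node.
LAYOUTS = [
    NodeLayout("prefill", dp=2, tp=8),  # Node 0: DP2 + TP8
    NodeLayout("decode", dp=4, tp=4),   # Node 1: DP4 + TP4
]


def fills_node(layout: NodeLayout, npus_per_node: int = NPUS_PER_NODE) -> bool:
    """Return True if the layout uses exactly one node's worth of NPUs."""
    return layout.npus == npus_per_node


for layout in LAYOUTS:
    print(layout.role, layout.npus, fills_node(layout))
```

Under these assumptions both nodes are fully occupied, which matches the stated total of 32 NPUs.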
### How was this patch tested?
Verified by running the Nightly test action on CI.
- vLLM version: v0.13.0
- vLLM main: 45c1ca1ca1
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
2026-01-09 16:25:20 +08:00