[CI]Add Disaggregated PD Nightly Test for Qwen3-235B and Qwen3-VL-235B (#5502)
### What this PR does / why we need it?
This PR adds online **Disaggregated Prefill/Decode** performance and
accuracy tests for the **Qwen3-235B-A22B** and
**Qwen3-VL-235B-A22B-Instruct** models to the Nightly test suite.
The test configurations deploy a large MoE model and a large
Vision-Language model in **a dual-node (32 NPU)** environment, using
Mooncake to transfer the KV cache from the Prefill node to the Decode
node.
#### Test Configuration
**Qwen3-235B-A22B**
- Model: Qwen/Qwen3-235B-A22B
- Hardware: A3, 2 Nodes (32 NPUs total, 16 NPUs per node)
- Architecture: Disaggregated Prefill & Decode
  - Node 0 (Producer/Prefill): **DP2 + TP8 + EP + FLASHCOMM1 + FUSED_MC2**.
  - Node 1 (Consumer/Decode): **DP4 + TP4 + EP + FLASHCOMM1 + FUSED_MC2 + FULL_DECODE_ONLY**.
- Benchmarks:
  - Performance: vllm-ascend/GSM8K-in3500-bs2800.
  - Accuracy: vllm-ascend/gsm8k-lite.
**Qwen3-VL-235B-A22B-Instruct**
- Model: Qwen/Qwen3-VL-235B-A22B-Instruct
- Hardware: A3, 2 Nodes (32 NPUs total, 16 NPUs per node)
- Architecture: Disaggregated Prefill & Decode
  - Node 0 (Producer/Prefill): **DP2 + TP8 + EP**.
  - Node 1 (Consumer/Decode): **DP4 + TP4 + EP + FULL_DECODE_ONLY**.
- Benchmarks:
  - Performance: vllm-ascend/textvqa-perf-1080p.
  - Accuracy: vllm-ascend/textvqa-lite.
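
As a quick sanity check on the layouts listed above, the data-parallel and tensor-parallel degrees of each node should multiply to the 16 NPUs available per node. The sketch below assumes each DP replica occupies its own TP group; the `NodeLayout` type and `NPUS_PER_NODE` constant are hypothetical helpers for illustration and are not part of the test suite.

```python
from dataclasses import dataclass

NPUS_PER_NODE = 16  # from the hardware line: 32 NPUs total, 16 per node


@dataclass(frozen=True)
class NodeLayout:
    """Hypothetical description of one node's parallel layout."""
    role: str
    dp: int  # number of data-parallel replicas
    tp: int  # tensor-parallel ranks per replica

    @property
    def npus(self) -> int:
        # Assumption: every DP replica uses its own group of `tp` NPUs.
        return self.dp * self.tp


layouts = [
    NodeLayout("prefill", dp=2, tp=8),  # Node 0: DP2 + TP8
    NodeLayout("decode", dp=4, tp=4),   # Node 1: DP4 + TP4
]

for layout in layouts:
    assert layout.npus == NPUS_PER_NODE, f"{layout.role} does not fill the node"

total_npus = sum(layout.npus for layout in layouts)
print(total_npus)  # 32
```

Both nodes use the same number of NPUs but split them differently: the prefill node favors wider tensor parallelism, while the decode node favors more replicas.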
### How was this patch tested?
Verified by the Nightly test action on CI.
- vLLM version: v0.13.0
- vLLM main: 45c1ca1ca1
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
@@ -108,8 +108,12 @@ class AscendConfig:
     decode_tp_size = min(decode_tp_size, num_kv_head)
     self.pd_head_ratio = prefill_tp_size // decode_tp_size
 except Exception:
-    raise AssertionError(
-        "Can not get num_key_value_heads from model_config")
+    raise ValueError(
+        "The text_config extracted from the model config does not have "
+        "`num_key_value_heads` attribute. This indicates a mismatch "
+        "between the model config and vLLM's expectations. Please "
+        "ensure that the model config is compatible with vLLM."
+    )
 
 if self.pd_tp_ratio == 0:
     raise AssertionError(
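
The behavior change in the hunk above can be illustrated with a small standalone sketch. It is not the code in `AscendConfig`: the function name `compute_pd_head_ratio` is hypothetical, only the steps visible in the hunk are reproduced (lines above the hunk are not shown and are not guessed at), and the `num_key_value_heads=4` value is an illustrative input rather than a claim about either model. The sketch also narrows the handler to `AttributeError` and chains the original exception, which is a design choice of this example; the hunk itself catches `Exception`.

```python
from types import SimpleNamespace


def compute_pd_head_ratio(text_config, prefill_tp_size: int, decode_tp_size: int) -> int:
    """Hypothetical standalone version of the steps shown in the hunk."""
    try:
        num_kv_head = text_config.num_key_value_heads
    except AttributeError as exc:
        # Same message as the new ValueError in the diff.
        raise ValueError(
            "The text_config extracted from the model config does not have "
            "`num_key_value_heads` attribute. This indicates a mismatch "
            "between the model config and vLLM's expectations. Please "
            "ensure that the model config is compatible with vLLM."
        ) from exc

    # Clamp the decode TP size to the number of KV heads, as in the hunk.
    decode_tp_size = min(decode_tp_size, num_kv_head)
    return prefill_tp_size // decode_tp_size


# Illustrative config object; the value 4 is an assumption for the example.
good_config = SimpleNamespace(num_key_value_heads=4)
ratio = compute_pd_head_ratio(good_config, prefill_tp_size=8, decode_tp_size=4)

# A config without the attribute now surfaces as ValueError.
try:
    compute_pd_head_ratio(SimpleNamespace(), prefill_tp_size=8, decode_tp_size=4)
    missing_attr_error = None
except ValueError as err:
    missing_attr_error = err
```

Callers that previously caught `AssertionError` for this case would need to catch `ValueError` instead.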