[CI] Add Disaggregated PD Nightly Test for Qwen3-235B and Qwen3-VL-235B (#5502)

### What this PR does / why we need it?

This PR adds online **Disaggregated Prefill/Decode** performance and accuracy tests for the **Qwen3-235B-A22B** and **Qwen3-VL-235B-A22B-Instruct** models to the Nightly test suite. These test configurations simulate deploying large MoE and Vision-Language models in **a dual-node (32 NPU)** environment, using Mooncake (KVCache Transfer) to move the KV cache efficiently from the Prefill node to the Decode node.

#### Test Configuration

**Qwen3-235B-A22B**

- Model: Qwen/Qwen3-235B-A22B
- Hardware: A3, 2 Nodes (32 NPUs total, 16 NPUs per node)
- Architecture: Disaggregated Prefill & Decode
  - Node 0 (Producer/Prefill): **DP2 + TP8 + EP + FLASHCOMM1 + FUSED_MC2**.
  - Node 1 (Consumer/Decode): **DP4 + TP4 + EP + FLASHCOMM1 + FUSED_MC2 + FULL_DECODE_ONLY**.
- Benchmarks:
  - Performance: vllm-ascend/GSM8K-in3500-bs2800.
  - Accuracy: vllm-ascend/gsm8k-lite.

**Qwen3-VL-235B-A22B-Instruct**

- Model: Qwen/Qwen3-VL-235B-A22B-Instruct
- Hardware: A3, 2 Nodes (32 NPUs total, 16 NPUs per node)
- Architecture: Disaggregated Prefill & Decode
  - Node 0 (Producer/Prefill): **DP2 + TP8 + EP**.
  - Node 1 (Consumer/Decode): **DP4 + TP4 + EP + FULL_DECODE_ONLY**.
- Benchmarks:
  - Performance: vllm-ascend/textvqa-perf-1080p.
  - Accuracy: vllm-ascend/textvqa-lite.

### How was this patch tested?

Nightly test action on CI.

- vLLM version: v0.13.0
- vLLM main: 45c1ca1ca1

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
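The per-node layouts in the test configuration above must each fill one 16-NPU node (DP × TP = 16), and the `pd_head_ratio` context lines in the `ascend_config.py` hunk divide the prefill TP size by the decode TP size after clamping the latter to the number of KV heads. The sketch below reproduces that arithmetic for illustration only. `NodeLayout`, `check_layout`, and `pd_head_ratio` are hypothetical helpers, not vllm-ascend APIs, and `num_kv_heads=4` is an assumed value; read `num_key_value_heads` from the model's own config instead.

```python
from dataclasses import dataclass

NPUS_PER_NODE = 16  # A3 node size stated in the test configuration


@dataclass(frozen=True)
class NodeLayout:
    """Hypothetical description of one node's DP/TP layout."""
    role: str
    dp: int
    tp: int

    @property
    def npus(self) -> int:
        # Each DP rank owns a TP group, so a node uses DP * TP NPUs.
        return self.dp * self.tp


PREFILL = NodeLayout("prefill", dp=2, tp=8)  # Node 0: DP2 + TP8
DECODE = NodeLayout("decode", dp=4, tp=4)    # Node 1: DP4 + TP4


def check_layout(node: NodeLayout, npus_per_node: int = NPUS_PER_NODE) -> None:
    """Raise if the layout does not exactly fill one node."""
    if node.npus != npus_per_node:
        raise ValueError(
            f"{node.role}: DP{node.dp} x TP{node.tp} = {node.npus} NPUs, "
            f"expected {npus_per_node}"
        )


def pd_head_ratio(prefill_tp: int, decode_tp: int, num_kv_heads: int) -> int:
    """Sketch of the ratio in the context lines: clamp decode TP to the
    KV head count, then take the integer quotient."""
    decode_tp = min(decode_tp, num_kv_heads)
    return prefill_tp // decode_tp


if __name__ == "__main__":
    for node in (PREFILL, DECODE):
        check_layout(node)
        print(f"{node.role}: DP{node.dp} x TP{node.tp} = {node.npus} NPUs")
    print("total NPUs:", PREFILL.npus + DECODE.npus)
    # num_kv_heads=4 is an assumed value for illustration only.
    print("pd_head_ratio:", pd_head_ratio(PREFILL.tp, DECODE.tp, num_kv_heads=4))
```

With TP8 on the prefill node and TP4 on the decode node, each decode rank receives KV data from two prefill TP ranks whenever the model has at least four KV heads.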
2026-01-09 16:25:20 +08:00
parent f63c1341d9
commit 09b3f9d91b
4 changed files with 241 additions and 2 deletions
--- a/vllm_ascend/ascend_config.py
+++ b/vllm_ascend/ascend_config.py
@@ -108,8 +108,12 @@ class AscendConfig:
                    decode_tp_size = min(decode_tp_size, num_kv_head)
                    self.pd_head_ratio = prefill_tp_size // decode_tp_size
                except Exception:
-                    raise AssertionError(
-                        "Can not get num_key_value_heads from model_config")
+                    raise ValueError(
+                        "The text_config extracted from the model config does not have "
+                        "`num_key_value_heads` attribute. This indicates a mismatch "
+                        "between the model config and vLLM's expectations. Please "
+                        "ensure that the model config is compatible with vLLM."
+                    )

            if self.pd_tp_ratio == 0:
                raise AssertionError(