[CI] Add long and short prompt tests for DeepSeek-V3.2 (#6536)
### What this PR does / why we need it?
This version has no divisibility constraint between `tp` and `mtp + 1`. However, `cudagraph_capture_sizes` must contain common multiples of `tp` and `mtp + 1`, with a maximum of `tp * (mtp + 1)`. We therefore fixed `cudagraph_capture_sizes` accordingly.

We added a long-sequence test (64k input, 3k output) for the two-node mixed-deployment scenario. Because a full performance benchmark would take too long, we only verify functionality. The single-node scenario is skipped because VRAM limitations prevent launching the model with a max-model-len of 68,000. We also added an AIME2025 test to the dual-node DeepSeek-V3.2 nightly test.

### How was this patch tested?
Tested in the nightly environment.

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: guozr <guozr1997@hotmail.com>
Co-authored-by: guozr <guozr1997@hotmail.com>
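The capture-size rule described above (sizes must be common multiples of `tp` and `mtp + 1`) can be sketched as follows. This is a minimal illustration only; `capture_sizes` is a hypothetical helper, not a vLLM API, and the upper bound is taken as a plain parameter rather than modeling the `tp * (mtp + 1)` limit.

```python
from math import lcm


def capture_sizes(tp: int, mtp: int, max_size: int) -> list[int]:
    """Hypothetical helper: valid CUDA-graph capture sizes.

    Returns all multiples of lcm(tp, mtp + 1) up to max_size, so every
    size is a common multiple of the tensor-parallel degree and the
    number of MTP draft tokens plus one.
    """
    step = lcm(tp, mtp + 1)
    return list(range(step, max_size + 1, step))


# With tp=2 and num_speculative_tokens=1 (as in the updated test),
# this yields the sizes used in the new config.
print(capture_sizes(2, 1, 12))  # [2, 4, 6, 8, 10, 12]
```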
```diff
@@ -255,18 +255,18 @@ def test_deepseek3_2_w8a8_pruning_mtp_tp2_ep():
     long_example_prompts = [
         "Hello " * (163839 - 500) + "Hello"
     ]
     max_tokens = 500
     with VllmRunner("vllm-ascend/DeepSeek-V3.2-W8A8-Pruning",
                     tensor_parallel_size=2,
                     quantization="ascend",
                     enable_expert_parallel=True,
                     max_model_len=163840,
                     compilation_config={
-                        "cudagraph_capture_sizes": [3, 6, 9, 12],
+                        "cudagraph_capture_sizes": [2, 4, 6, 8, 10, 12],
                         "cudagraph_mode": "FULL_DECODE_ONLY"
                     },
                     speculative_config={
-                        "num_speculative_tokens": 2,
+                        "num_speculative_tokens": 1,
                         "method": "deepseek_mtp"
                     },
                     additional_config={
```