Mengqing Cao 58db21f56a [DP] Fix dp padding logic in dummyrun (#4705)
### What this PR does / why we need it?
Fix the DP padding logic in dummy_run. Since
https://github.com/vllm-project/vllm/pull/28579, `num_tokens` is
padded in `CudagraphDispatcher`, so the same padding must also be applied in
`dummy_run`.
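
For context, here is a minimal sketch of the padding idea (the helper name is hypothetical, not the actual vllm-ascend code): the dummy run should round `num_tokens` up to the nearest captured cudagraph size, the same way `CudagraphDispatcher` pads real batches.

```python
# Sketch only: pad num_tokens up to the nearest captured cudagraph size so
# dummy_run warms up / captures the same padded shape as real decode batches.
import bisect

def pad_to_cudagraph_size(num_tokens: int, capture_sizes: list[int]) -> int:
    """Return the smallest captured size >= num_tokens, or num_tokens if none fits."""
    sizes = sorted(capture_sizes)
    idx = bisect.bisect_left(sizes, num_tokens)
    return sizes[idx] if idx < len(sizes) else num_tokens

# With cudagraph_capture_sizes=[96], a decode batch of 48 tokens is padded to 96,
# so the dummy run must also execute with 96 tokens.
assert pad_to_cudagraph_size(48, [96]) == 96
assert pad_to_cudagraph_size(200, [96]) == 200
```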

### How was this patch tested?
Tested locally with the following scripts:
```bash
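# Launch an OpenAI-compatible server with DP=4, TP=1, expert parallelism,
# and full-decode-only cudagraph capture at size 96.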
VLLM_USE_MODELSCOPE=true python3 -m vllm.entrypoints.openai.api_server \
         --model wemaster/deepseek_mtp_main_random_bf16 \
         --trust-remote-code \
         --data-parallel-size 4 \
         --tensor-parallel-size 1 \
         --compilation-config '{"cudagraph_capture_sizes":[96],"cudagraph_mode":"FULL_DECODE_ONLY"}' \
         --enable-expert-parallel
```
```bash
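# Benchmark the server with 48 random prompts (512 input / 100 output tokens) at 1 request/s.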
vllm bench serve --model wemaster/deepseek_mtp_main_random_bf16 \
         --endpoint /v1/completions \
         --dataset-name random \
         --random-input 512 \
         --random-output 100 \
         --num-prompts 48 \
         --request-rate 1 \
         --ready-check-timeout-sec 0
```

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-12-08 20:32:35 +08:00