xc-llm-ascend/vllm_ascend at 58db21f56a2874c935a5c18e4b86b5cdce84f59f - xc-llm-ascend

EngineX/xc-llm-ascend

Mengqing Cao 58db21f56a [DP] Fix dp padding logic in dummyrun (#4705 )

### What this PR does / why we need it?
Fix the DP (data parallel) padding logic in the dummy run. Since
https://github.com/vllm-project/vllm/pull/28579, `num_tokens` is
padded in `CudagraphDispatcher`, so the same padding must also be
applied in `dummy_run`.
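
As a rough illustration of the idea (not the code from this PR), the sketch below rounds a token count up to a captured graph size and then gives every DP rank the same padded count. The helper names `pad_to_capture_size` and `pad_across_dp` are hypothetical, and the assumption that all DP ranks must agree on one padded size is mine rather than something stated above.

```python
from bisect import bisect_left


def pad_to_capture_size(num_tokens, capture_sizes):
    """Round num_tokens up to the smallest captured graph size that fits.

    Hypothetical helper for illustration; not a vLLM or vllm-ascend API.
    """
    sizes = sorted(capture_sizes)
    idx = bisect_left(sizes, num_tokens)
    # Larger than every captured size: no graph to replay, leave unpadded.
    return sizes[idx] if idx < len(sizes) else num_tokens


def pad_across_dp(tokens_per_rank, capture_sizes):
    """Return one padded token count per DP rank, identical on all ranks.

    Assumption: ranks must run with the same padded size, so the largest
    padded value across ranks is used everywhere.
    """
    target = max(pad_to_capture_size(n, capture_sizes) for n in tokens_per_rank)
    return [target] * len(tokens_per_rank)


if __name__ == "__main__":
    # Four DP ranks and a single capture size of 96, as in the test script.
    print(pad_across_dp([3, 17, 48, 1], [96]))
```

If the dummy run skipped this step while the real run padded, the two paths would execute with different `num_tokens`, which is the mismatch the fix addresses.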

### How was this patch tested?
Tested locally with the following scripts:
```bash
VLLM_USE_MODELSCOPE=true python3 -m vllm.entrypoints.openai.api_server \
         --model wemaster/deepseek_mtp_main_random_bf16 \
         --trust-remote-code \
         --data-parallel-size 4 \
         --tensor-parallel-size 1 \
         --compilation-config '{"cudagraph_capture_sizes":[96],"cudagraph_mode":"FULL_DECODE_ONLY"}' \
         --enable-expert-parallel
```
```bash
vllm bench serve --model wemaster/deepseek_mtp_main_random_bf16 \
         --endpoint /v1/completions \
         --dataset-name random \
         --random-input 512 \
         --random-output 100 \
         --num-prompts 48 \
         --request-rate 1 \
         --ready-check-timeout-sec 0
```

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: MengqingCao <cmq0113@163.com>

2025-12-08 20:32:35 +08:00

..

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804 )

2025-11-28 18:06:39 +08:00

[Bugfix] Fix Dcp dimension mismatch when enable Mlapo (#4687 )

2025-12-08 17:19:58 +08:00

remove useless patch (#4699 )

2025-12-08 11:02:42 +08:00

Drop ascend scheduler (#4623 )

2025-12-05 09:03:45 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049 )

2025-07-28 16:01:59 +08:00

[P/D] check kv extra config and del hccl backend (#4547 )

2025-12-07 15:19:42 +08:00

[EPLB] Add log Info for moe_load Imbalance Ratio (#4482 )

2025-12-08 14:28:13 +08:00

upgrade vLLM to main (#4608 )

2025-12-02 22:10:52 +08:00

[refact] unified soc_version code (#4359 )

2025-11-26 14:28:55 +08:00

Drop 0.11.0 support (#4377 )

2025-11-24 17:08:20 +08:00

fix qwen3vl mrope op (#4484 )

2025-12-08 19:19:17 +08:00

remove useless patch (#4699 )

2025-12-08 11:02:42 +08:00

[Bugifx] fix quant_apply_mlp w1_scale type error & fix getting num_local_expert (#4632 )

2025-12-05 16:04:24 +08:00

[BugFix][Triton] Fix ub overflow bug of sample_recover_tokens_kernel (#4673 )

2025-12-05 15:16:19 +08:00

remove useless patch (#4699 )

2025-12-08 11:02:42 +08:00

remove useless patch (#4699 )

2025-12-08 11:02:42 +08:00

[DP] Fix dp padding logic in dummyrun (#4705 )

2025-12-08 20:32:35 +08:00

[Feat] Add Euler xlite graph wrapper support (#4526 )

2025-12-08 08:27:46 +08:00

__init__.py

clean up model module (#4611 )

2025-12-02 17:35:47 +08:00

ascend_config.py

[Feat] Add Euler xlite graph wrapper support (#4526 )

2025-12-08 08:27:46 +08:00

ascend_forward_context.py

add dispatch_gmm_combine kernel (#3532 )

2025-12-04 23:00:59 +08:00

cpu_binding.py

[main] support cpu binding (#3546 )

2025-10-21 09:17:03 +08:00

envs.py

[refact] unified soc_version code (#4359 )

2025-11-26 14:28:55 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786 )

2025-09-13 11:58:52 +08:00

platform.py

[Feat] Add Euler xlite graph wrapper support (#4526 )

2025-12-08 08:27:46 +08:00

profiling_config.py

Drop ascend scheduler (#4623 )

2025-12-05 09:03:45 +08:00

utils.py

[BugFix] Refactor ACL graph size adjustment for speculative decoding (#4640 )

2025-12-07 17:32:45 +08:00