xc-llm-ascend/vllm_ascend at e945e919331d8856cd01db7c467b61da4bf76c61

EngineX/xc-llm-ascend

wujinyuan1 06f6cc1c81 [Bugfix]Fix the hang issue of multimodal model when running with DP>1 (#4392)

### What this PR does / why we need it?
When cudagraph_mode is set to FULL_DECODE_ONLY and dp > 1, the dummy-run
process is triggered. The dummy run calls update_attn_params, which needs
the num_tokens parameter; this value was previously read from
positions.shape[0]. Multimodal models, however, use mRope
(multi-dimensional rotary positional embeddings), which makes positions a
two-dimensional tensor, so positions.shape[0] no longer equals the number
of tokens. This PR fixes the problem by passing num_tokens instead of
positions.shape[0].
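
The mismatch described above can be sketched in a few lines. This is a minimal illustration, not the vllm-ascend code: plain nested lists stand in for tensors, `shape` and `update_attn_params_stub` are hypothetical helpers, and the three-row layout used for the mRope positions is an assumption made for the example.

```python
# Illustrative only: nested lists stand in for tensors, and the helper
# names below are hypothetical, not vllm-ascend APIs.

def shape(x):
    """Return the shape of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)


def update_attn_params_stub(num_tokens):
    """Stand-in for update_attn_params: records the token count it received."""
    return {"num_tokens": num_tokens}


num_tokens = 8

# Text-only model: one position id per token, shape (num_tokens,).
positions_1d = list(range(num_tokens))

# mRope model (assumed layout): several rows of position ids,
# shape (3, num_tokens).
positions_mrope = [list(range(num_tokens)) for _ in range(3)]

# Reading the leading dimension gives the token count only in the 1-D case.
buggy_1d = update_attn_params_stub(shape(positions_1d)[0])
buggy_mrope = update_attn_params_stub(shape(positions_mrope)[0])

# Passing num_tokens explicitly is correct for both layouts.
fixed_mrope = update_attn_params_stub(num_tokens)
```

With the two-dimensional layout, the leading dimension counts rows rather than tokens, which is why passing num_tokens directly avoids depending on the shape of positions.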

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

- vLLM version: v0.11.0
- vLLM main: 2918c1b49c

---------

Signed-off-by: wujinyuan1 <wjy9595@qq.com>
Co-authored-by: wujinyuan1 <wjy9595@qq.com>

2025-11-25 09:33:49 +08:00

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

[Feat] Support MTP to running in full graph mode (#3892)

2025-11-20 20:34:54 +08:00

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

eplb redundant expert bugfix (#4291)

2025-11-21 14:24:35 +08:00

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

[Bugfix][MoE] enable force_load_balance in aclgraph (#4366)

2025-11-24 20:33:56 +08:00

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

[Refactor] remove moe type of multicast. (#4224)

2025-11-24 17:32:37 +08:00

[Refactor] remove moe type of multicast. (#4224)

2025-11-24 17:32:37 +08:00

[Bugfix]Fix the hang issue of multimodal model when running with DP>1 (#4392)

2025-11-25 09:33:49 +08:00

__init__.py

[Misc][Doc] Add service profiling feature with user guide (#3756)

2025-11-12 09:07:14 +08:00

ascend_config.py

[feature] vllm-ascend support msprobe (eager mode dump) (#4241)

2025-11-24 21:58:31 +08:00

ascend_forward_context.py

[Refactor] remove moe type of multicast. (#4224)

2025-11-24 17:32:37 +08:00

cpu_binding.py

[main] support cpu binding (#3546)

2025-10-21 09:17:03 +08:00

envs.py

[Bugfix] fix nightly multi-node EPLB tests' "DYNAMIC_EPLB=true" environment not working (#4223)

2025-11-19 21:31:58 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

profiling_config.py

[Misc][Doc] Add service profiling feature with user guide (#3756)

2025-11-12 09:07:14 +08:00

utils.py

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00