xc-llm-ascend/vllm_ascend at c3c265648f6fb3bf9ea2f6c0e43a4a2e67973d40

EngineX/xc-llm-ascend

Zhujiyang2 c3c265648f [Ops][BugFix] Fix RoPE shape mismatch for mtp models with flashcomm v1 enabled (#6939)

What this PR does / why we need it?
When using a draft model (e.g., in MTP speculative decoding) with shared
expert data parallelism (enabled via flashcomm), a shape mismatch error
occurs in the rotary embedding calculation for models like GLM-4.7. This
is because the positions tensor has an incorrect shape for this specific
configuration.

This PR fixes the issue by adding a check in
AscendRotaryEmbedding.forward_oot. If the model is a draft model and
shared expert DP is enabled, it processes the positions tensor using
torch.ops.vllm.maybe_all_gather_and_maybe_unpad to ensure its shape is
correct before applying the rotary embedding. This resolves the shape
mismatch error.
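
The fix described above can be illustrated with a minimal, framework-free sketch. It is not the code from the PR: `ForwardState`, `gather_and_unpad`, and `positions_for_rope` are hypothetical names, plain Python lists stand in for tensors, and the behavior assumed for torch.ops.vllm.maybe_all_gather_and_maybe_unpad (concatenate the per-rank shards, then drop trailing padding) is inferred only from its name.

```python
from dataclasses import dataclass


@dataclass
class ForwardState:
    # Hypothetical flags; the real check lives in AscendRotaryEmbedding.forward_oot.
    is_draft_model: bool
    shared_expert_dp: bool


def gather_and_unpad(shards, pad_size):
    """Assumed stand-in for the gather op: concatenate shards, drop trailing pad."""
    gathered = [pos for shard in shards for pos in shard]
    return gathered[: len(gathered) - pad_size]


def positions_for_rope(local_positions, shards, pad_size, state):
    """Return the positions list that the rotary embedding should consume."""
    if state.is_draft_model and state.shared_expert_dp:
        # Draft model with shared expert DP: rebuild the full positions list.
        return gather_and_unpad(shards, pad_size)
    # Any other configuration: positions are used unchanged.
    return local_positions


# Toy setup: two ranks, five real tokens, one padding slot on the second rank.
shards = [[0, 1, 2], [3, 4, 0]]
state = ForwardState(is_draft_model=True, shared_expert_dp=True)
full = positions_for_rope(shards[0], shards, pad_size=1, state=state)
```

In this toy case the local shard holds three positions, while the gathered and unpadded list holds five. If the query and key tensors cover all five tokens, only the gathered list has a matching length, which is the kind of mismatch the PR description refers to.
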
- vLLM version: v0.16.0
- vLLM main: 15d76f74e2

---------

Signed-off-by: Zhu Jiyang <zhujiyang2@huawei.com>

2026-03-04 16:02:08 +08:00

..

[300I][Bugfix] fix unquant model weight nd2nz error (#6851)

2026-03-03 15:57:26 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

[feat]ds3.2 pcp support mtp and chunkprefill (#6917)

2026-03-03 19:03:50 +08:00

[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)

2026-03-02 17:54:25 +08:00

[P/D][v0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 (#6898)

2026-03-02 23:24:03 +08:00

[misc] move mxfp_compat into device to decouple from quantization init chain (#6918)

2026-03-02 18:17:01 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

[BugFix][PCP] Fix precision bugs for pcp/dcp in PD disaggregate (#6876)

2026-03-02 16:11:00 +08:00

[EPLB] Reduce the memory used for heat aggregation (#6729)

2026-02-24 18:02:24 +08:00

[Main2Main] Upgrade vLLM to 0226 (#6813)

2026-02-27 16:05:21 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5) (#5996)

2026-01-24 22:45:38 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #6) (#6001)

2026-01-24 22:08:33 +08:00

[Ops][BugFix] Fix RoPE shape mismatch for mtp models with flashcomm v1 enabled (#6939)

2026-03-04 16:02:08 +08:00

[Bugfix] Fix the acceptance rate drop issue when applying eagle3 to QuaRot model (#6914)

2026-03-04 11:29:49 +08:00

[Model]Add Qwen3-Omni quantization Ascend NPU adaptation and optimization (#6828)

2026-03-03 00:07:23 +08:00

clean 0.15.0 support (#6852)

2026-02-28 09:20:57 +08:00

[Ops][BugFix] Fix RoPE shape mismatch for mtp models with flashcomm v1 enabled (#6939)

2026-03-04 16:02:08 +08:00

[Bugfix] Fix the acceptance rate drop issue when applying eagle3 to QuaRot model (#6914)

2026-03-04 11:29:49 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173)

2026-02-06 15:35:06 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

ascend_config.py

[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)

2026-03-02 17:54:25 +08:00

ascend_forward_context.py

add mxfp8 moe quantization (#6670)

2026-03-02 11:04:06 +08:00

batch_invariant.py

implement batch invariant with ascendc (#6590)

2026-02-10 14:15:26 +08:00

cpu_binding.py

[CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (#6945)

2026-03-03 17:20:52 +08:00

envs.py

[MISC] Clean up useless env USE_OPTIMIZED_MODEL (#6618)

2026-02-09 15:38:58 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

meta_registration.py

[Ops][Refactor] Remove custom rotary_embedding operator (#6523)

2026-02-07 09:24:05 +08:00

platform.py

[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)

2026-03-02 17:54:25 +08:00

profiling_config.py

[Core][Misc] Clean up ProfileExecuteDuration (#6461)

2026-02-01 20:06:01 +08:00

utils.py

[BugFix] Improve GDN layer detection for multimodal models (#6941)

2026-03-03 20:08:39 +08:00