[Bugfix] Fix Triton operator usage for multimodal models based on the mrope_interleaved parameter (#6042)

### What this PR does / why we need it?

When running the Qwen2.5-Omni-7B model on Ascend NPU, the engine fails
during the profiling/warmup stage with the following error:
`AclNN_Runtime_Error(EZ9903): rtKernelLaunchWithHandleV2 failed: 507035.
The vector core execution is abnormal.`

Error log:
https://github.com/vllm-project/vllm-ascend/actions/runs/21144534911/job/60806765393#step:17:6412

This error is specifically triggered by the `triton_mrope` kernel when
handling the unique `mrope_section` configurations of the Omni model.
Other multimodal models with standard sections (e.g., [16, 24, 24]) or
standard LLMs work correctly with Triton.

This PR modifies `vllm_ascend/ops/rotary_embedding.py` to add a conditional
check before calling `forward_triton`:

1. For standard LLMs (`mrope_interleaved = True`), it continues to use
Triton for acceleration.

2. For complex configurations (such as Qwen2.5-Omni, where
`mrope_interleaved = False`), it now falls back to the native
`super().forward_oot()` path, which uses the stable torch_npu or PyTorch
implementation.
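
The dispatch rule described above can be sketched in isolation. This is a minimal illustration, not the actual vllm-ascend code: `select_mrope_path` and its string return values are hypothetical names introduced here, and the three arguments stand in for `HAS_TRITON`, `positions.ndim`, and `self.mrope_interleaved` from the patched condition.

```python
def select_mrope_path(has_triton: bool, positions_ndim: int,
                      mrope_interleaved: bool) -> str:
    """Hypothetical helper mirroring the patched condition.

    Returns "triton" when the Triton kernel would be used, and "native"
    when the call would fall through to the parent implementation.
    """
    # The Triton path now additionally requires mrope_interleaved to be True.
    if has_triton and positions_ndim == 2 and mrope_interleaved:
        return "triton"
    # Everything else (e.g. Qwen2.5-Omni with mrope_interleaved=False)
    # takes the fallback path.
    return "native"


if __name__ == "__main__":
    # Interleaved configuration with 2-D positions: Triton is still used.
    print(select_mrope_path(True, 2, True))
    # Non-interleaved configuration: falls back to the native path.
    print(select_mrope_path(True, 2, False))
```

Before this change, the second call would also have selected the Triton path, which is the case that triggered the vector core error during warmup.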

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
d68209402d

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
zhangxinyuehfad
2026-01-22 15:46:05 +08:00
committed by GitHub
parent 38edfd585a
commit 9bba0a2a68

@@ -586,7 +586,7 @@ class AscendMRotaryEmbedding(MRotaryEmbedding):
         query: torch.Tensor,
         key: torch.Tensor,
     ):
-        if HAS_TRITON and positions.ndim == 2:
+        if HAS_TRITON and positions.ndim == 2 and self.mrope_interleaved:
             # todo: need cann update in 8.5.0
             return self.forward_triton(positions, query, key)