[BugFix] Fix implementation bug of triton rope_siso (#7082)

### What this PR does / why we need it?

The previous implementation of the Triton `rope_siso` kernel never stored the second half of the RoPE results, which caused:

1. an accuracy problem in the neox-style scenario;
2. a UB (Unified Buffer) overflow in the non-neox-style scenario.

This PR adds the missing store and a nightly test case that covers it.

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

Signed-off-by: whx-sjtu <2952154980@qq.com>
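To make the neox-style symptom concrete, here is a minimal pure-Python sketch of rotate-half RoPE applied in place to a single head vector. It assumes, as the `qk_start_ptr` stores in the diff suggest, that the kernel writes its results back over its input. The function `rope_neox_inplace` and its `store_second_half` flag are hypothetical, introduced only to mimic the pre-fix behavior of writing back the first half alone. The sketch does not model Triton tiling or the UB overflow in the non-neox path.

```python
import math


def rope_neox_inplace(x, cos, sin, store_second_half=True):
    """Rotate one head vector `x` in place, neox style (rotate-half).

    Hypothetical helper. `cos` and `sin` hold one value per rotation pair,
    i.e. len(x) // 2 entries. store_second_half=False mimics the pre-fix
    kernel, which wrote only the first half back to memory.
    """
    half = len(x) // 2
    # List slices are copies, so the writes below do not alter x1/x2.
    x1 = x[:half]
    x2 = x[half:]
    for i in range(half):
        x[i] = x1[i] * cos[i] - x2[i] * sin[i]
        if store_second_half:
            x[half + i] = x2[i] * cos[i] + x1[i] * sin[i]
    return x


theta = [0.5, 1.0]
cos = [math.cos(t) for t in theta]
sin = [math.sin(t) for t in theta]

fixed = rope_neox_inplace([1.0, 2.0, 3.0, 4.0], cos, sin)
buggy = rope_neox_inplace([1.0, 2.0, 3.0, 4.0], cos, sin, store_second_half=False)
print(fixed[2:], buggy[2:])  # the buggy second half is still [3.0, 4.0]
```

Because the rotation happens in place, skipping the second store leaves the unrotated input in the second half without raising any error, so the output is only partially rotated. This is consistent with the accuracy problem described above.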
2026-03-09 23:08:43 +08:00
parent 542258ac9d
commit 48b624e4cc
2 changed files with 88 additions and 1 deletion
--- a/vllm_ascend/ops/triton/rope.py
+++ b/vllm_ascend/ops/triton/rope.py
@@ -218,6 +218,9 @@ def _triton_rope_siso(
         new_qk_tile_1 = qk_tile_1 * cos_row - qk_tile_2 * sin_row
         tl.store(qk_start_ptr + first_half_offsets, new_qk_tile_1, mask=first_mask)
 
+        new_qk_tile_2 = qk_tile_2 * cos_row + qk_tile_1 * sin_row
+        tl.store(qk_start_ptr + second_half_offsets, new_qk_tile_2, mask=second_mask)
+
 
 def rope_forward_triton(
     q: torch.Tensor,
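The two stores in this hunk compute `x1 * cos - x2 * sin` and `x2 * cos + x1 * sin`, which together form a 2-D rotation of each `(x1, x2)` pair. Equivalently, each pair is multiplied as a complex number `x1 + i*x2` by `e^(i*theta)`. The pure-Python reference below checks that equivalence for both pairings. The helper names are hypothetical. The interleaved pairing `(2i, 2i + 1)` used for the non-neox case is the common GPT-J-style convention, assumed here rather than read from the kernel's offset computation.

```python
import cmath
import math


def rotate_pairs(x, theta, is_neox=True):
    """Reference RoPE for one head vector (hypothetical helper).

    is_neox=True pairs element i with i + d/2 (rotate-half layout);
    is_neox=False pairs elements 2i and 2i + 1 (interleaved layout, assumed).
    `theta` holds one rotation angle per pair (d/2 entries).
    """
    half = len(x) // 2
    out = list(x)
    for i in range(half):
        a, b = (i, i + half) if is_neox else (2 * i, 2 * i + 1)
        c, s = math.cos(theta[i]), math.sin(theta[i])
        out[a] = x[a] * c - x[b] * s  # analogous to new_qk_tile_1
        out[b] = x[b] * c + x[a] * s  # analogous to new_qk_tile_2 (the fix)
    return out


def rotate_pairs_complex(x, theta, is_neox=True):
    """Same rotation written as (x_a + i*x_b) * e^(i*theta)."""
    half = len(x) // 2
    out = list(x)
    for i in range(half):
        a, b = (i, i + half) if is_neox else (2 * i, 2 * i + 1)
        z = complex(x[a], x[b]) * cmath.exp(1j * theta[i])
        out[a], out[b] = z.real, z.imag
    return out


x = [0.3, -1.2, 2.0, 0.7]
theta = [0.25, 1.5]
for neox in (True, False):
    ref = rotate_pairs(x, theta, neox)
    alt = rotate_pairs_complex(x, theta, neox)
    # Both formulations agree up to floating-point rounding.
    assert all(abs(p - q) < 1e-12 for p, q in zip(ref, alt))
```

A check of this shape, comparing kernel output against an independent reference for both layouts, is one way to catch a missing store. The actual nightly test added by this PR is not shown here.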