[BugFix] Fix implementation bug of triton rope_siso (#7082)
### What this PR does / why we need it?
The previous implementation of the triton rope_siso kernel did not store
the second half of the RoPE results, which caused:
1. accuracy problems in the neox-style scenario
2. UB overflow in the non-neox-style scenario
This PR fixes the kernel and adds a nightly test case for it.
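
To illustrate the neox-style accuracy problem, here is a minimal pure-Python sketch of the rotation for a single head vector. It is not the Triton kernel: `rope_neox_reference` and its `store_second_half` flag are hypothetical names introduced here, and the sketch assumes the result is written back into the same buffer the input was read from. It does not model the non-neox-style layout or the UB overflow.

```python
import math


def rope_neox_reference(x, cos, sin, store_second_half=True):
    """Neox-style RoPE on one head vector x of even length 2 * half.

    cos and sin each hold `half` values. `store_second_half=False`
    mimics a kernel that forgets to write the rotated second half
    (hypothetical flag, for illustration only).
    """
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    out = list(x)  # assumed in-place semantics: output starts as the input
    # First half: x1 * cos - x2 * sin
    for i in range(half):
        out[i] = x1[i] * cos[i] - x2[i] * sin[i]
    # Second half: x2 * cos + x1 * sin
    if store_second_half:
        for i in range(half):
            out[half + i] = x2[i] * cos[i] + x1[i] * sin[i]
    return out


angles = [0.3, 1.1]
cos = [math.cos(a) for a in angles]
sin = [math.sin(a) for a in angles]
x = [1.0, 2.0, 3.0, 4.0]

fixed = rope_neox_reference(x, cos, sin)
buggy = rope_neox_reference(x, cos, sin, store_second_half=False)
```

In this sketch `buggy` keeps the unrotated second half of `x`, so each pair `(out[i], out[half + i])` is no longer a pure rotation of `(x[i], x[half + i])`, while `fixed` preserves the norm of every pair.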
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
Signed-off-by: whx-sjtu <2952154980@qq.com>
@@ -218,6 +218,9 @@ def _triton_rope_siso(
    new_qk_tile_1 = qk_tile_1 * cos_row - qk_tile_2 * sin_row
    tl.store(qk_start_ptr + first_half_offsets, new_qk_tile_1, mask=first_mask)

    new_qk_tile_2 = qk_tile_2 * cos_row + qk_tile_1 * sin_row
    tl.store(qk_start_ptr + second_half_offsets, new_qk_tile_2, mask=second_mask)


def rope_forward_triton(
    q: torch.Tensor,