[bugfix] fix rope_forward_triton error (#6404)
### What this PR does / why we need it?
The rope_forward_triton method raises a runtime error. For example:
```
(Worker_DP0_TP1_EP1 pid=5298) ERROR 01-29 02:01:11 [multiproc_executor.py:822] q, k = rope_forward_triton(q, k, cos, sin, rope_dim=self.qk_rope_head_dim, is_neox_style=True)
(Worker_DP0_TP1_EP1 pid=5298) ERROR 01-29 02:01:11 [multiproc_executor.py:822] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_TP1_EP1 pid=5298) ERROR 01-29 02:01:11 [multiproc_executor.py:822] File "/vllm-workspace/vllm-ascend/vllm_ascend/ops/triton/rope.py", line 155, in rope_forward_triton
(Worker_DP0_TP1_EP1 pid=5298) ERROR 01-29 02:01:11 [multiproc_executor.py:822] cos = cos.view(num_tokens, -1)
(Worker_DP0_TP1_EP1 pid=5298) ERROR 01-29 02:01:11 [multiproc_executor.py:822] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_TP1_EP1 pid=5298) ERROR 01-29 02:01:11 [multiproc_executor.py:822] RuntimeError: shape '[14, -1]' is invalid for input of size 768
```
This is because an incorrect num_tokens_padded was passed in.
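A minimal sketch of the failure mode, with the numbers taken from the traceback above (the 64 for rope_dim is an assumption chosen so that 12 tokens × 64 = 768; the real value comes from qk_rope_head_dim):

```python
import torch

# cos was built from the unpadded token count: 12 tokens * rope_dim 64 = 768 elements
num_tokens_unpadded = 12
rope_dim = 64  # assumed value of self.qk_rope_head_dim for illustration
cos = torch.randn(num_tokens_unpadded * rope_dim)

# The kernel, however, received the padded token count (14), so the
# view inside rope_forward_triton fails with the error shown above:
num_tokens_padded = 14
try:
    cos.view(num_tokens_padded, -1)
except RuntimeError as e:
    print(e)  # shape '[14, -1]' is invalid for input of size 768

# With a consistent token count the view succeeds:
print(cos.view(num_tokens_unpadded, -1).shape)  # torch.Size([12, 64])
```

The fix makes the attention metadata and the tensors agree on the (padded) token count, so the view is always valid.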
Related-RFC: https://github.com/vllm-project/vllm-ascend/issues/5449
Co-authored-by: @zhenwenqi2024
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.14.1
- vLLM main: dc917cceb8
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
```diff
@@ -1210,9 +1210,9 @@ class NPUModelRunner(GPUModelRunner):
         (attn_metadata, spec_decode_common_attn_metadata) = (
             self._build_attention_metadata(
                 num_tokens=num_tokens_unpadded,
-                num_tokens_padded=num_tokens_padded if pad_attn else None,
+                num_tokens_padded=num_tokens_padded,
                 num_reqs=num_reqs,
-                num_reqs_padded=num_reqs_padded if pad_attn else None,
+                num_reqs_padded=num_reqs_padded,
                 max_query_len=max_num_scheduled_tokens,
                 ubatch_slices=ubatch_slices_attn,
                 logits_indices=logits_indices,
```