[Bugfix]fix bmm_transpose ops in dsv32 (#4791)
### What this PR does / why we need it?
bmm transpose ops can't be used in cp, so add judgement in the modeling
### Does this PR introduce _any_ user-facing change?
No
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
Signed-off-by: hust17yixuan <303660421@qq.com>
This commit is contained in:
@@ -485,7 +485,8 @@ class AscendSFAImpl(MLAAttentionImpl):
|
|||||||
|
|
||||||
def _v_up_proj(self, x):
|
def _v_up_proj(self, x):
|
||||||
if x.dtype in [torch.float16, torch.bfloat16] \
|
if x.dtype in [torch.float16, torch.bfloat16] \
|
||||||
and hasattr(torch.ops._C_ascend, "batch_matmul_transpose"):
|
and hasattr(torch.ops._C_ascend, "batch_matmul_transpose") \
|
||||||
|
and not self.enable_sfa_cp:
|
||||||
x = x.view(-1, self.num_heads, self.kv_lora_rank)
|
x = x.view(-1, self.num_heads, self.kv_lora_rank)
|
||||||
b, _, _ = x.shape
|
b, _, _ = x.shape
|
||||||
res = torch.empty((b, self.num_heads, self.v_head_dim),
|
res = torch.empty((b, self.num_heads, self.v_head_dim),
|
||||||
|
|||||||
Reference in New Issue
Block a user