[Bugfix] Remove swa parameter of fia (#5602)
### What this PR does / why we need it?
The swa (sliding-window attention) parameters of fia (fused infer attention) do not yet support a headDim of 256, so gemma3, whose headDim is 256, fails with an error when the swa path is used. This change therefore rolls the code back; the swa path will be re-introduced once CANN supports it.
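For context, a minimal Python sketch of the conditional kwarg selection this rollback removes, reconstructed from the diff below; the helper name `fia_swa_kwargs` and the concrete value of `SWA_INT_MAX` are illustrative assumptions, not code taken from the repository:

```python
# Hypothetical reconstruction of the pre-rollback swa kwarg logic.
# SWA_INT_MAX is assumed here to be the int32 "no window" sentinel.
SWA_INT_MAX = 2**31 - 1

def fia_swa_kwargs(sliding_window, swa_mask, attn_mask):
    """Build the swa-related fia kwargs as they looked before this rollback."""
    if sliding_window:
        # Band (sliding-window) attention: attend back `sliding_window` tokens.
        return dict(pre_tokens=sliding_window, next_tokens=0,
                    atten_mask=swa_mask, sparse_mode=4)
    # No window configured: plain causal attention.
    return dict(pre_tokens=SWA_INT_MAX, next_tokens=SWA_INT_MAX,
                atten_mask=attn_mask, sparse_mode=3)

# e.g. a 512-token window selects the band path; no window falls back to causal
assert fia_swa_kwargs(512, "swa_mask", "attn_mask")["sparse_mode"] == 4
assert fia_swa_kwargs(None, "swa_mask", "attn_mask")["sparse_mode"] == 3
```

After the rollback only the causal branch remains, which is why gemma3's headDim of 256 no longer reaches the unsupported swa path.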
### Does this PR introduce _any_ user-facing change?
Yes. The swa parameter of fia is removed.
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: 7157596103
---------
Signed-off-by: nsdie <yeyifan@huawei.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
```diff
@@ -574,11 +574,7 @@ class AscendAttentionBackendImpl(AttentionImpl):
                 query=query,
                 key=key,
                 value=value,
-                pre_tokens=self.sliding_window
-                if self.sliding_window else SWA_INT_MAX,
-                next_tokens=0 if self.sliding_window else SWA_INT_MAX,
-                atten_mask=attn_metadata.swa_mask
-                if self.sliding_window else attn_metadata.attn_mask,
+                atten_mask=attn_metadata.attn_mask,
                 block_table=block_table,
                 input_layout="TND",
                 block_size=block_size,
@@ -587,7 +583,7 @@ class AscendAttentionBackendImpl(AttentionImpl):
                 num_key_value_heads=self.num_kv_heads,
                 num_heads=self.num_heads,
                 scale=self.scale,
-                sparse_mode=4 if self.sliding_window else 3,
+                sparse_mode=3,
             )

         attn_output = attn_output.view(num_tokens, self.num_heads,
```
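A note on the `sparse_mode` change: per the public CANN documentation, mode 3 selects the plain causal mask and mode 4 the band (sliding-window) mask, so removing the swa path pins the mode back to 3; this reading of the mode values is worth re-checking against the installed CANN version before the swa path is re-enabled.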