[Bugfix] Remove swa parameter of fia (#5602)
### What this PR does / why we need it?
The swa (sliding-window attention) parameters of fia (fused infer attention) do not yet support a headDim of 256, so gemma3, whose headDim is 256, fails with an error when the swa path is used. This change therefore rolls the code back; the swa path will be re-introduced once CANN supports it.
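For context, a minimal Python sketch of the conditional kwarg selection this rollback removes, reconstructed from the diff below; the helper name `fia_swa_kwargs` and the concrete value of `SWA_INT_MAX` are illustrative assumptions, not code taken from the repository:

```python
# Hypothetical reconstruction of the pre-rollback swa kwarg logic.
# SWA_INT_MAX is assumed here to be the int32 "no window" sentinel.
SWA_INT_MAX = 2**31 - 1

def fia_swa_kwargs(sliding_window, swa_mask, attn_mask):
    """Build the swa-related fia kwargs as they looked before this rollback."""
    if sliding_window:
        # Band (sliding-window) attention: attend back `sliding_window` tokens.
        return dict(pre_tokens=sliding_window, next_tokens=0,
                    atten_mask=swa_mask, sparse_mode=4)
    # No window configured: plain causal attention.
    return dict(pre_tokens=SWA_INT_MAX, next_tokens=SWA_INT_MAX,
                atten_mask=attn_mask, sparse_mode=3)

# e.g. a 512-token window selects the band path; no window falls back to causal
assert fia_swa_kwargs(512, "swa_mask", "attn_mask")["sparse_mode"] == 4
assert fia_swa_kwargs(None, "swa_mask", "attn_mask")["sparse_mode"] == 3
```

After the rollback only the causal branch remains, which is why gemma3's headDim of 256 no longer reaches the unsupported swa path.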
### Does this PR introduce _any_ user-facing change?
Yes. The swa parameter of fia is removed.
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: 7157596103
---------
Signed-off-by: nsdie <yeyifan@huawei.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
```diff
@@ -574,11 +574,7 @@ class AscendAttentionBackendImpl(AttentionImpl):
                 query=query,
                 key=key,
                 value=value,
-                pre_tokens=self.sliding_window
-                if self.sliding_window else SWA_INT_MAX,
-                next_tokens=0 if self.sliding_window else SWA_INT_MAX,
-                atten_mask=attn_metadata.swa_mask
-                if self.sliding_window else attn_metadata.attn_mask,
+                atten_mask=attn_metadata.attn_mask,
                 block_table=block_table,
                 input_layout="TND",
                 block_size=block_size,
@@ -587,7 +583,7 @@ class AscendAttentionBackendImpl(AttentionImpl):
                 num_key_value_heads=self.num_kv_heads,
                 num_heads=self.num_heads,
                 scale=self.scale,
-                sparse_mode=4 if self.sliding_window else 3,
+                sparse_mode=3,
             )

         attn_output = attn_output.view(num_tokens, self.num_heads,
```
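A note on the `sparse_mode` change: per the public CANN documentation, mode 3 selects the plain causal mask and mode 4 the band (sliding-window) mask, so removing the swa path pins the mode back to 3; this reading of the mode values is worth re-checking against the installed CANN version before the swa path is re-enabled.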