[Cherry-pick][0.11.0] Adapted to torch_npu.npu_fused_infer_attention_score (#4202)

### What this PR does / why we need it?

Fixes a compatibility bug with torch_npu.npu_fused_infer_attention_score, described in https://github.com/vllm-project/vllm-ascend/issues/4020. The solution was suggested by @momo609.

Cherry-pick of https://github.com/vllm-project/vllm-ascend/pull/4025.

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

CI passed with newly added and existing tests.

Signed-off-by: Icey <1790571317@qq.com>
2025-11-17 10:56:23 +08:00
parent a7eb42cf0a
commit 378e92a2a2
2 changed files with 2 additions and 2 deletions
--- a/vllm_ascend/attention/attention_v1.py
+++ b/vllm_ascend/attention/attention_v1.py
@@ -115,7 +115,7 @@ class AscendAttentionBackend(AttentionBackend):

    @staticmethod
    def get_supported_block_size() -> list[int]:
-        return [64]
+        return [128]


 class AscendAttentionState(Enum):
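The diff only changes the block size that `AscendAttentionBackend.get_supported_block_size()` advertises, from 64 to 128. A minimal sketch of how a caller might check a configured KV-cache block size against such a list is shown below. `AscendAttentionBackendSketch` and `validate_block_size` are hypothetical names used for illustration only; they are not vllm-ascend or vLLM APIs, and the real consumer of this method in vLLM may behave differently.

```python
class AscendAttentionBackendSketch:
    """Hypothetical stand-in that mirrors the patched static method."""

    @staticmethod
    def get_supported_block_size() -> list[int]:
        # After this commit the Ascend backend advertises 128 instead of 64.
        return [128]


def validate_block_size(requested: int, supported: list[int]) -> int:
    """Hypothetical check: accept `requested` only if the backend lists it."""
    if requested not in supported:
        raise ValueError(
            f"block_size={requested} is not supported; expected one of {supported}"
        )
    return requested


if __name__ == "__main__":
    supported = AscendAttentionBackendSketch.get_supported_block_size()
    print(validate_block_size(128, supported))  # accepted
    try:
        validate_block_size(64, supported)  # the old value is now rejected
    except ValueError as err:
        print(err)
```

With this change, a configuration that still requests 64 would be rejected by a check of this kind, while 128 is accepted.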