[Fixbug] Fix shape not match when sliding_window and dynamic batch_size (#2830)

### What this PR does / why we need it?
Fix the shape mismatch that occurs when testing the accuracy of LLM-Research/Phi-4-mini-instruct.

### Does this PR introduce _any_ user-facing change?
Before this fix, users could not set a dynamic batch_size or run lm_eval accuracy tests with models that use sliding_window.

### How was this patch tested?
The accuracy of LLM-Research/Phi-4-mini-instruct is OK:

```
vllm (pretrained=LLM-Research/Phi-4-mini-instruct,max_model_len=4096,dtype=auto,tensor_parallel_size=1), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8105|±  |0.0108|
|     |       |strict-match    |     5|exact_match|↑  |0.8097|±  |0.0108|
```

- vLLM version: v0.10.2
- vLLM main: 3c96e7b8a1

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-09-19 22:35:14 +08:00
parent cf549b976d
commit a22b532d38
2 changed files with 37 additions and 1 deletions
--- a/vllm_ascend/attention/attention_v1.py
+++ b/vllm_ascend/attention/attention_v1.py
@@ -378,7 +378,8 @@ class AscendAttentionBackendImpl(AttentionImpl):
            # seq_lens_tensor needs to be transferred to the device for 310P.
            attn_metadata.seq_lens = \
                attn_metadata.seq_lens.to(device=query.device)
-        if self.sliding_window is not None:
+        if self.sliding_window is not None and attn_metadata.seq_lens.shape[
+                0] == query.size(0):
            batch_size = attn_metadata.seq_lens.shape[0]
            block_size = 128
            query = query.view(batch_size, 1, self.num_heads * self.head_size)
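The guard added in this hunk can be illustrated with a small, framework-free sketch. It assumes the mismatch arises when a batch carries more query tokens than sequences, so that `query.view(batch_size, 1, num_heads * head_size)` would ask for fewer elements than `query` holds. The helper names and the sizes below are hypothetical and are not part of vllm_ascend.

```python
from math import prod


def view_is_valid(numel: int, target_shape: tuple) -> bool:
    """A view only succeeds when the element count is unchanged."""
    return numel == prod(target_shape)


def takes_sliding_window_path(sliding_window, num_seqs: int,
                              num_query_tokens: int) -> bool:
    """Mirror of the new condition: one query token per sequence."""
    return sliding_window is not None and num_seqs == num_query_tokens


# Hypothetical sizes, chosen only for illustration.
num_heads, head_size = 8, 64
hidden = num_heads * head_size

# Case 1: 4 sequences, 4 query tokens (one token per sequence).
ok_numel = 4 * hidden
ok_view = view_is_valid(ok_numel, (4, 1, hidden))
ok_path = takes_sliding_window_path(2048, num_seqs=4, num_query_tokens=4)

# Case 2: 4 sequences, 10 query tokens (assumed mixed or prefill batch).
bad_numel = 10 * hidden
bad_view = view_is_valid(bad_numel, (4, 1, hidden))
bad_path = takes_sliding_window_path(2048, num_seqs=4, num_query_tokens=10)

print(ok_view, ok_path)    # the reshape is valid and the branch is taken
print(bad_view, bad_path)  # the reshape would fail, so the branch is skipped
```

With the old condition (`self.sliding_window is not None` alone), the second case would still enter the branch and attempt the invalid reshape.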