[Bugfix] fix pcp qwen full graph FIA bug (#6037)

### What this PR does / why we need it?
In the PCP full-graph scenario with Qwen models, this PR fixes an
inconsistency between the shape of Q and the actual q length passed to the FIA operator.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
2c24bc6996

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
weiguihua2
2026-01-21 08:49:05 +08:00
committed by GitHub
parent b6d55fc48e
commit b399117e89


```diff
@@ -440,11 +440,8 @@ def update_attn_dcp_pcp_params(update_stream, forward_context, runtime_shape):
     pad_tensor = np.zeros(pad_length, dtype=actual_seq_lengths_kv.dtype)
     actual_seq_lengths_kv = np.concatenate([actual_seq_lengths_kv, pad_tensor])
-    actual_seq_lengths_q = attn_metadata.actual_seq_lengths_q[: attn_metadata.num_decode_tokens]
-    if runtime_shape - len(actual_seq_lengths_q):
-        actual_seq_lengths_q = actual_seq_lengths_q + [actual_seq_lengths_q[-1]] * (
-            runtime_shape - len(actual_seq_lengths_q)
-        )
+    actual_seq_lengths_q = attn_metadata.actual_seq_lengths_q
     if dcp_size > 1:
         num_heads = num_heads * dcp_size
```
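To make the change concrete, here is a minimal standalone sketch (the function names and the call below are illustrative, not the actual vllm-ascend API) contrasting the removed code path, which truncated `actual_seq_lengths_q` to `num_decode_tokens` and then padded it by repeating the last entry up to `runtime_shape`, with the fixed path, which hands the metadata list to the FIA operator unchanged:

```python
# Sketch of the removed (buggy) Q-length handling vs. the fixed behaviour.
# Names here are illustrative, not the actual vllm-ascend API.

def old_q_lens(actual_seq_lengths_q, num_decode_tokens, runtime_shape):
    # Removed code path: truncate to num_decode_tokens, then pad by
    # repeating the last cumulative length up to the captured graph shape.
    q = actual_seq_lengths_q[:num_decode_tokens]
    if runtime_shape - len(q):
        q = q + [q[-1]] * (runtime_shape - len(q))
    return q

def new_q_lens(actual_seq_lengths_q):
    # Fixed code path: pass the metadata list through unchanged, so its
    # length matches the Q tensor actually given to the FIA operator.
    return actual_seq_lengths_q

lens = [1, 2, 3, 4]
print(old_q_lens(lens, num_decode_tokens=3, runtime_shape=6))  # [1, 2, 3, 3, 3, 3]
print(new_q_lens(lens))                                        # [1, 2, 3, 4]
```

With the old path, the lengths list no longer described the Q tensor that was actually passed in, which is the shape/length mismatch this PR removes.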