fix torchair execute issue on padding data, and mtp padding logic (#1160)

### What this PR does / why we need it?
The former PR https://github.com/vllm-project/vllm-ascend/pull/736 selects only the valid tokens inside `input_ids` and `position_ids`, which breaks the padding that torchair requires. This PR applies the padding logic after the multimodal part instead.

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
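
As a rough illustration of the padding requirement described above, here is a minimal sketch in plain Python rather than the actual vllm-ascend code. It assumes the captured graph expects a fixed batch size, so shorter decode batches are padded up to it. `AttnState`, `pad_for_graph`, and `graph_batch_size` are hypothetical names. The pad value of 1 for sequence lengths follows the diff; padding positions with 0 is an assumption.

```python
from enum import Enum, auto


class AttnState(Enum):
    # Hypothetical stand-in for AscendAttentionState. Only DecodeOnly and
    # SpecDecoding appear in the diff; Other is a placeholder.
    Other = auto()
    DecodeOnly = auto()
    SpecDecoding = auto()


# States for which this PR enables graph padding.
GRAPH_STATES = (AttnState.DecodeOnly, AttnState.SpecDecoding)


def pad_for_graph(seq_lens, positions, graph_batch_size, state,
                  use_graph=True, pad_value=1):
    """Pad per-sequence metadata up to a fixed graph batch size."""
    if not (use_graph and state in GRAPH_STATES):
        # No graph replay for this state: leave the batch untouched.
        return list(seq_lens), list(positions)
    pad = graph_batch_size - len(seq_lens)
    if pad < 0:
        raise ValueError("batch exceeds the captured graph size")
    # Dummy sequences get length pad_value (1 in the diff) and
    # position 0 (an assumption made for this sketch).
    return (list(seq_lens) + [pad_value] * pad,
            list(positions) + [0] * pad)
```

For example, `pad_for_graph([5, 7], [4, 6], 4, AttnState.SpecDecoding)` pads both lists to length 4, while a batch in `AttnState.Other` is returned unchanged.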
2025-06-10 22:20:40 +08:00
parent 95414bae70
commit 291c216898
2 changed files with 9 additions and 6 deletions
--- a/vllm_ascend/attention/mla_v1.py
+++ b/vllm_ascend/attention/mla_v1.py
@@ -376,7 +376,10 @@ class AscendMLAMetadataBuilder:
            seq_lens = seq_lens[:self._num_decode_tokens]
            input_positions = input_positions[:self._num_decode_tokens]
            block_table = block_table[:self._num_decode_tokens, ...]
-            if use_torchair_graph and self.runner.attn_state == AscendAttentionState.DecodeOnly:
+            if use_torchair_graph and self.runner.attn_state in [
+                    AscendAttentionState.DecodeOnly,
+                    AscendAttentionState.SpecDecoding
+            ]:
                num_seqs = len(seq_lens)
                if graph_pad_size != 0:
                    pad_value = 1