[Bugfix] Fix mtp torchair in pd Disaggregation scenario (#2951)

### What this PR does / why we need it?

1. In memory of #2509, fix MTP with torchair in the PD disaggregation scenario.
2. Fix an MLA bug in the speculative decoding scenario, where num_decodes != num_decode_tokens.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main: 5206ab20ba

Signed-off-by: xuyexiong <xuyexiong@huawei.com>
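
A minimal sketch of why the two counts differ under speculative decoding, using plain Python lists in place of the tensors in `mla_v1.py`. It assumes that decode requests come first in the batch, that `seq_lens` and `block_table` hold one entry per request, and that `input_positions` holds one entry per token. The helper `slice_decode_metadata` and all values are hypothetical and are not part of vllm-ascend.

```python
def slice_decode_metadata(query_start_loc, seq_lens, block_table,
                          input_positions, num_decodes):
    """Hypothetical helper: slice batch metadata down to the decode part.

    Per-request data is cut at num_decodes; per-token data is cut at
    num_decode_tokens. Assumes decode requests are ordered first.
    """
    # Cumulative token offsets: the entry at index num_decodes is the
    # total number of tokens that belong to decode requests.
    num_decode_tokens = query_start_loc[num_decodes]
    return {
        "num_decode_tokens": num_decode_tokens,
        "actual_seq_lengths_q": query_start_loc[1:num_decodes + 1],
        "seq_lens": seq_lens[:num_decodes],                   # per request
        "block_table": block_table[:num_decodes],             # per request
        "input_positions": input_positions[:num_decode_tokens],  # per token
    }


# Toy batch: two decode requests with 2 query tokens each (1 verified token
# plus 1 speculative token), followed by one prefill request with 5 tokens.
query_start_loc = [0, 2, 4, 9]
seq_lens = [10, 20, 5]
block_table = [[0, 1], [2, 3], [4, 5]]
input_positions = [8, 9, 18, 19, 0, 1, 2, 3, 4]

decode = slice_decode_metadata(query_start_loc, seq_lens, block_table,
                               input_positions, num_decodes=2)

# Slicing per-request data by the token count would also pick up the
# prefill request, which is the kind of mismatch this PR describes.
wrong_seq_lens = seq_lens[:decode["num_decode_tokens"]]
```

In this toy batch `num_decodes` is 2 while `num_decode_tokens` is 4, so `wrong_seq_lens` also contains the prefill request's length. Without speculative tokens each decode request contributes exactly one token, the two counts coincide, and the mismatch stays hidden.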
2025-09-17 09:07:58 +08:00
parent 6b7117dbb7
commit ae758dda05
3 changed files with 58 additions and 9 deletions
--- a/vllm_ascend/attention/mla_v1.py
+++ b/vllm_ascend/attention/mla_v1.py
@@ -379,11 +379,12 @@ class AscendMLAMetadataBuilder:

        decode_metadata = None
        if num_decodes > 0:
+            # Notice that num_decodes != num_decode_tokens in SpecDecoding Scenario
            actual_seq_lengths_q = query_start_loc[1:num_decodes + 1].tolist()
            max_seq_lens = seq_lens[:num_decodes].max().item()
-            seq_lens = seq_lens[:num_decode_tokens]
+            seq_lens = seq_lens[:num_decodes]
            input_positions = input_positions[:num_decode_tokens]
-            block_table = block_table[:num_decode_tokens, ...]
+            block_table = block_table[:num_decodes, ...]
            seq_lens_list = seq_lens.tolist()

            cos = self.cos_cache[input_positions].unsqueeze(  # type: ignore