[FEATURE][MTP] Support MTP > 1 (#2708)

### What this PR does / why we need it?
[RFC: Support MTP > 1 for DeepSeek](https://github.com/vllm-project/vllm-ascend/issues/2745)

- [x] dp1 tp16
- [x] dp4 tp4
- [x] dp2 tp8
- [x] torchair graph

- vLLM version: v0.10.1.1
- vLLM main: c9f7081f9c

Signed-off-by: 1092626063 <1092626063@qq.com>
1092626063
2025-09-05 09:11:22 +08:00
committed by GitHub
parent 83eb40a51c
commit 5b3646ab21
5 changed files with 206 additions and 88 deletions

@@ -1020,7 +1020,6 @@ class AscendMLATorchairImpl(MLAAttentionImpl):
input_layout = "BNSD"
if attn_metadata.attn_state == AscendAttentionState.SpecDecoding:
    assert num_tokens % self.spec_token_num == 0
    input_layout = "TND"
    # [bs * q_seq_len, num_heads_per_rank, dim]
    q_nope = q_nope.view(num_tokens, self.num_heads, -1)
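The comment in the hunk describes the "TND" layout as `[bs * q_seq_len, num_heads_per_rank, dim]`. A minimal sketch of that shape arithmetic follows, in plain Python with no tensor library. The helper names `tokens_per_step` and `tnd_view_shape` are hypothetical and do not appear in the PR, and the concrete sizes (batch of 4, 3 query tokens per request, 8 heads, head dim 512) are assumptions chosen only for illustration.

```python
def tokens_per_step(batch_size: int, q_seq_len: int) -> int:
    """Flattened token count T when every request contributes q_seq_len query tokens."""
    return batch_size * q_seq_len


def tnd_view_shape(numel: int, num_tokens: int, num_heads: int) -> tuple:
    """Shape that a view(num_tokens, num_heads, -1) call would infer for numel elements.

    Hypothetical helper: it mirrors the -1 dimension inference, not any vllm-ascend API.
    """
    per_row = num_tokens * num_heads
    if per_row == 0 or numel % per_row != 0:
        raise ValueError("numel is not divisible by num_tokens * num_heads")
    return (num_tokens, num_heads, numel // per_row)


# Assumed sizes: 4 requests, 3 query tokens each, 8 heads per rank, head dim 512.
T = tokens_per_step(batch_size=4, q_seq_len=3)
shape = tnd_view_shape(numel=T * 8 * 512, num_tokens=T, num_heads=8)
print(T, shape)
```

In this layout the batch and per-request query length are folded into the single leading dimension T, so the last dimension is fully determined by the element count and the head count.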