[FEATURE][MTP] Support MTP > 1 (#2708)
### What this PR does / why we need it?
[RFC: Support MTP > 1 for DeepSeek](https://github.com/vllm-project/vllm-ascend/issues/2745)
- [x] dp1 tp16
- [x] dp4 tp4
- [x] dp2 tp8
- [x] torchair graph
- vLLM version: v0.10.1.1
- vLLM main: c9f7081f9c
Signed-off-by: 1092626063 <1092626063@qq.com>
@@ -1020,7 +1020,6 @@ class AscendMLATorchairImpl(MLAAttentionImpl):
input_layout = "BNSD"

if attn_metadata.attn_state == AscendAttentionState.SpecDecoding:
    assert num_tokens % self.spec_token_num == 0
    input_layout = "TND"
    # [bs * q_seq_len, num_heads_per_rank, dim]
    q_nope = q_nope.view(num_tokens, self.num_heads, -1)
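The shape comment in the hunk, `[bs * q_seq_len, num_heads_per_rank, dim]`, can be illustrated with a shape-only sketch. This is not code from the PR: the helper names `tnd_shape` and `batch_size` and all numbers are hypothetical, and it assumes every request contributes the same number of query tokens, which is what the divisibility assert in the hunk appears to check.

```python
from typing import Tuple


def tnd_shape(num_tokens: int, num_heads: int, num_elements: int) -> Tuple[int, int, int]:
    """Shape that view(num_tokens, num_heads, -1) would infer for a tensor
    holding num_elements values (hypothetical helper, shapes only)."""
    if num_elements % (num_tokens * num_heads) != 0:
        raise ValueError("element count is not divisible by num_tokens * num_heads")
    return (num_tokens, num_heads, num_elements // (num_tokens * num_heads))


def batch_size(num_tokens: int, q_seq_len: int) -> int:
    """Number of requests when each one contributes q_seq_len query tokens
    (an assumption of this sketch, mirroring the divisibility assert)."""
    if num_tokens % q_seq_len != 0:
        raise ValueError("num_tokens must be a multiple of q_seq_len")
    return num_tokens // q_seq_len


# Illustrative numbers only, not taken from the PR.
shape = tnd_shape(num_tokens=8, num_heads=16, num_elements=8 * 16 * 512)
bs = batch_size(num_tokens=8, q_seq_len=2)
print(shape, bs)
```

In the "TND" layout the tokens of all requests are packed along the first axis, so the batch size is not a separate dimension and has to be recovered from the token count as in `batch_size`.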