[Feat] Shared expert dp for deepseek and deepseek_mtp (#3495)

### What this PR does / why we need it?

Adds shared expert DP for deepseek and deepseek_mtp. It can be combined with SP to improve performance.

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: zhaozx-cn <zhaozx2116@163.com>
Co-authored-by: realliujiaxu <realliujiaxu@163.com>
2025-10-17 15:06:37 +08:00
parent d9ee491f70
commit bf87606932
9 changed files with 57 additions and 10 deletions
--- a/vllm_ascend/attention/mla_v1.py
+++ b/vllm_ascend/attention/mla_v1.py
@@ -1245,7 +1245,8 @@ class AscendMLAImpl(MLAAttentionImpl):
            current_ms_metadata = get_multistream_comm_context()
            if current_ms_metadata is not None:
                with torch.npu.stream(current_ms_metadata.comm_stream):
-                    o_proj_input[num_decode_tokens:] = output_prefill
+                    o_proj_input[
+                        num_decode_tokens:num_actual_tokens] = output_prefill
                    current_ms_metadata.after_comm_event.record()
            else:
                o_proj_input[