[Feat] Shared expert dp for deepseek and deepseek_mtp (#3495)

### What this PR does / why we need it?
Add shared expert dp for deepseek and deepseek_mtp. It can be combined
with sp to improve performance.
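
One change in this commit bounds the prefill write in `AscendMLAImpl` to `num_decode_tokens:num_actual_tokens` instead of the open-ended `num_decode_tokens:`. The sketch below illustrates why that can matter. It assumes the buffer holds more rows than `num_actual_tokens` (for example, padding), which the commit message does not state, and it uses NumPy arrays as a stand-in for NPU tensors. All sizes and the two helper functions are hypothetical.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
num_decode_tokens = 2
num_actual_tokens = 5   # decode tokens + prefill tokens
padded_tokens = 8       # assumption: buffer is padded past the actual tokens
hidden = 4

o_proj_input = np.zeros((padded_tokens, hidden))
output_prefill = np.ones((num_actual_tokens - num_decode_tokens, hidden))


def write_open_ended(buf, prefill):
    # Old form: targets every row after the decode tokens, padding included.
    buf[num_decode_tokens:] = prefill


def write_bounded(buf, prefill):
    # New form: targets exactly the prefill rows.
    buf[num_decode_tokens:num_actual_tokens] = prefill


# With padding present, the open-ended target has 6 rows but the source has 3,
# so NumPy cannot broadcast the assignment and raises ValueError.
try:
    write_open_ended(o_proj_input.copy(), output_prefill)
    open_ended_ok = True
except ValueError:
    open_ended_ok = False

# The bounded write succeeds and leaves the padding rows untouched.
write_bounded(o_proj_input, output_prefill)
```

When the buffer has exactly `num_actual_tokens` rows the two forms are equivalent, so the bounded slice is the safer choice in either case.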

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: zhaozx-cn <zhaozx2116@163.com>
Co-authored-by: realliujiaxu <realliujiaxu@163.com>
zhaozx-cn
2025-10-17 15:06:37 +08:00
committed by GitHub
parent d9ee491f70
commit bf87606932
9 changed files with 57 additions and 10 deletions


@@ -1245,7 +1245,8 @@ class AscendMLAImpl(MLAAttentionImpl):
 current_ms_metadata = get_multistream_comm_context()
 if current_ms_metadata is not None:
     with torch.npu.stream(current_ms_metadata.comm_stream):
-        o_proj_input[num_decode_tokens:] = output_prefill
+        o_proj_input[
+            num_decode_tokens:num_actual_tokens] = output_prefill
         current_ms_metadata.after_comm_event.record()
 else:
     o_proj_input[