[Perf][MoE] Improve MoE multistream parallel performance. (#1891)

This PR refines the multi-stream parallel execution of the shared experts
in the w8a8 dynamic-quantized MoE stage to improve performance.
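The idea behind shared-expert multi-stream parallelism is to launch the shared expert on a side stream while the main stream runs the routed experts, then synchronize before combining the two outputs. The sketch below is a CPU analogy only and makes several assumptions. A one-worker thread pool stands in for the device side stream, and `future.result()` stands in for a stream/event wait. All function names (`shared_expert`, `routed_experts`, `moe_forward_overlapped`) are hypothetical, not the vllm-ascend API. Because of Python's GIL, this example gives no real speedup. It only shows the launch/overlap/wait structure and checks that overlapping does not change the result.

```python
# CPU-thread analogy of shared-expert multi-stream overlap.
# Hypothetical names; real NPU code would use device streams and events.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Vector = List[float]


def shared_expert(x: Vector) -> Vector:
    # Hypothetical shared expert, applied to every token.
    return [2.0 * v for v in x]


def routed_experts(x: Vector, experts: List[Callable[[Vector], Vector]],
                   weights: List[float]) -> Vector:
    # Weighted sum over the routed experts selected for this token.
    out = [0.0] * len(x)
    for expert, w in zip(experts, weights):
        y = expert(x)
        out = [o + w * v for o, v in zip(out, y)]
    return out


def moe_forward_overlapped(x: Vector, experts, weights) -> Vector:
    with ThreadPoolExecutor(max_workers=1) as side_stream:
        # Launch the shared expert on the "side stream" ...
        shared_future = side_stream.submit(shared_expert, x)
        # ... while the "main stream" computes the routed experts.
        routed = routed_experts(x, experts, weights)
        # Synchronization point: wait for the side stream before combining.
        shared = shared_future.result()
    return [r + s for r, s in zip(routed, shared)]


def moe_forward_sequential(x: Vector, experts, weights) -> Vector:
    # Reference: same math, no overlap.
    shared = shared_expert(x)
    routed = routed_experts(x, experts, weights)
    return [r + s for r, s in zip(routed, shared)]


if __name__ == "__main__":
    experts = [lambda v: [a + 1.0 for a in v], lambda v: [a * 3.0 for a in v]]
    print(moe_forward_overlapped([1.0, 2.0], experts, [0.25, 0.75]))
```

The design point is that the shared expert has no data dependency on routing. Only the final sum needs both results, so the single wait can be deferred until just before that sum.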

- vLLM version: v0.10.0
- vLLM main: 2cc571199b

Signed-off-by: whx-sjtu <2952154980@qq.com>
whx
2025-07-29 23:53:19 +08:00
committed by GitHub
parent 4df8e0027c
commit b6a7f07c70
3 changed files with 124 additions and 14 deletions


@@ -393,7 +393,7 @@ class CustomDeepseekV2MoE(nn.Module):
         # router_logits: (num_tokens, n_experts)
         router_logits = None
-        if not self.rm_router_logits:
+        if not self.rm_router_logits and not self.enable_multistream_moe:
             router_logits, _ = self.gate(hidden_states)
         experts_hidden_states = self.experts(
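In the hunk above, the gate is no longer run eagerly when `enable_multistream_moe` is set, so `router_logits` stays `None` on that path. The sketch below isolates that condition as a standalone function. `MoEFlags` and `maybe_compute_router_logits` are hypothetical names. It also assumes, without confirmation from this hunk, that the experts layer computes the routing itself whenever it receives `None`.

```python
# Hypothetical isolation of the gating condition changed in this hunk.
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class MoEFlags:
    rm_router_logits: bool
    enable_multistream_moe: bool


def maybe_compute_router_logits(
    flags: MoEFlags,
    gate: Callable[[Any], Tuple[Any, Any]],
    hidden_states: Any,
) -> Optional[Any]:
    # Run the gate eagerly only when neither flag defers it.
    if not flags.rm_router_logits and not flags.enable_multistream_moe:
        router_logits, _ = gate(hidden_states)
        return router_logits
    # Deferred: the experts layer is assumed to compute routing itself.
    return None
```

Deferring the gate in multi-stream mode plausibly lets the layer that owns the stream scheduling decide where the gate runs relative to the shared experts. This is an inference from the diff, not something the commit text states.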