[v0.18.0][BugFix] Fix Qwen3.5 MoE flash comm v1 shared expert shape error of mtp layer on A2 (#8004)
### What this PR does / why we need it?
Fix the shared expert shape error in the Qwen3.5 MoE MTP layer when flash comm v1
is enabled.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- vLLM version: v0.18.0
- vLLM main:
35141a7eed
Signed-off-by: Wangbingjie <wangbj1207@126.com>
```diff
@@ -1595,10 +1595,8 @@ class SpecDecodeBaseProposer(EagleProposer):
         hidden_states: torch.Tensor,
         positions: torch.Tensor,
     ) -> tuple[torch.Tensor, torch.Tensor]:
-        if self.is_multimodal_model and _EXTRA_CTX.flash_comm_v1_enabled:
-            return hidden_states, positions
         if self.method == "mtp":
-            if _EXTRA_CTX.flash_comm_v1_enabled:
+            if _EXTRA_CTX.flash_comm_v1_enabled and not self.is_multimodal_model:
                 hidden_states = torch.ops.vllm.maybe_pad_and_reduce(hidden_states)
                 positions = positions.unsqueeze(-1)
                 positions = torch.ops.vllm.maybe_pad_and_reduce(positions)
```
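For context, here is a minimal single-process sketch of the pad-and-split step that a flash comm v1 style reduce-scatter requires. This is an illustrative assumption, not the actual `torch.ops.vllm.maybe_pad_and_reduce` implementation (which performs a distributed reduce-scatter): the token dimension is padded up to a multiple of the tensor-parallel world size so it splits evenly across ranks. It also shows why the diff unsqueezes `positions` to 2-D first: the padding is applied along the first (token) dimension of a 2-D tensor.

```python
import torch
import torch.nn.functional as F


def pad_and_take_chunk(x: torch.Tensor, tp_size: int, rank: int = 0) -> torch.Tensor:
    """Illustrative sketch (hypothetical helper, not vLLM's op):
    pad dim 0 so it divides evenly by tp_size, then return the chunk
    that a reduce-scatter would leave on the given rank."""
    num_tokens = x.shape[0]
    pad = (-num_tokens) % tp_size  # tokens to add so tp_size divides evenly
    if pad:
        # For a 2-D (tokens, hidden) tensor, (0, 0, 0, pad) pads only dim 0.
        x = F.pad(x, (0, 0, 0, pad))
    return x.chunk(tp_size, dim=0)[rank]


# hidden_states: 5 tokens do not split across tp_size=2 without padding.
hidden_states = torch.randn(5, 4)
chunk = pad_and_take_chunk(hidden_states, tp_size=2)

# positions is 1-D, so it must be unsqueezed to (tokens, 1) before padding,
# mirroring the `positions.unsqueeze(-1)` line in the diff above.
positions = torch.arange(5).unsqueeze(-1)
pos_chunk = pad_and_take_chunk(positions, tp_size=2)
```

Without the padding step, splitting 5 tokens across 2 ranks would fail, which is the kind of shape mismatch this PR guards against for the MTP layer's shared experts.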