[Bugfix] Qwen3Next support FlashComm1 (#6830)

### What this PR does / why we need it?
Support FlashComm1 for Qwen3-Next. Fix padding problems in Sequence
Parallel (SP) and resolve precision problems in shared_out when
FlashComm1 is enabled.
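
For context, a minimal sketch of the kind of padding SP typically needs, assuming the token dimension is split evenly across tensor-parallel ranks. The helper name `pad_for_sequence_parallel` is hypothetical and is not taken from this patch:

```python
def pad_for_sequence_parallel(num_tokens: int, tp_size: int) -> tuple[int, int]:
    """Hypothetical helper: round num_tokens up to a multiple of tp_size.

    Returns (padded_num_tokens, pad_size). Assumes each of the tp_size
    ranks must receive the same number of tokens.
    """
    if tp_size <= 0:
        raise ValueError("tp_size must be positive")
    # Ceiling division using integer arithmetic only.
    padded = -(-num_tokens // tp_size) * tp_size
    return padded, padded - num_tokens


if __name__ == "__main__":
    padded, pad = pad_for_sequence_parallel(10, 4)
    print(padded, pad)
```

Any padded rows would have to be dropped again after the gather step, so that padding does not leak into downstream outputs.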

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI

- vLLM version: v0.15.0
- vLLM main: 83b47f67b1

---------

Signed-off-by: zhaojiangjiang <zhaojiangjiang1@h-partners.com>
Co-authored-by: zhaojiangjiang <zhaojiangjiang1@h-partners.com>
ZhaoJiangJiang
2026-03-06 17:14:08 +08:00
committed by GitHub
parent a2696006d1
commit a51d6366b9
4 changed files with 63 additions and 8 deletions


@@ -705,7 +705,11 @@ def _get_row_parallel_op(
 def get_parallel_op(disable_tp, prefix, layer, direct):
-    if disable_tp or ("shared_experts" in prefix and shared_expert_dp_enabled()):
+    if (
+        disable_tp
+        or ("shared_experts" in prefix and shared_expert_dp_enabled())
+        or ("shared_expert" in prefix and shared_expert_dp_enabled())
+    ):
         return None, 0, 1
     custom_op: (
         MLPColumnParallelOp