[Bugfix][MoE] Remove All2All in w4a8_dynamic (#4977)
### What this PR does / why we need it?
GatherEP was fixed in
https://github.com/vllm-project/vllm-ascend/pull/3279, so this PR removes
the all2all fallback for the w4a8_dynamic scenario.
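
As a rough illustration of the branch this PR touches, here is a simplified sketch, not the actual vllm-ascend code. `select_comm_type_sketch` and `first_condition` are hypothetical names (the latter stands in for the part of the `if` condition that lies outside the hunk), and `MoECommType` is a local stand-in that lists only the members named in the diff.

```python
from enum import Enum


class MoECommType(Enum):
    # Local stand-in for the real enum; only the members named in the diff.
    MC2 = "mc2"
    ALLTOALL = "alltoall"
    ALLGATHER = "allgather"


def select_comm_type_sketch(first_condition: bool,
                            world_size_across_dp: int,
                            pipeline_parallel_size: int,
                            quant_type: str) -> MoECommType:
    """Hypothetical reduction of the branch after this PR."""
    if (first_condition
            and world_size_across_dp / pipeline_parallel_size >= 16):
        return MoECommType.MC2
    # Before this PR, quant_type == "w4a8_dynamic" selected ALLTOALL here.
    # quant_type no longer affects this branch.
    return MoECommType.ALLGATHER
```

With this shape, `select_comm_type_sketch(True, 8, 1, "w4a8_dynamic")` falls through to `MoECommType.ALLGATHER`, the same result as for any other quantization type.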
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
End-to-end (e2e) tests and unit tests (ut).
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
@@ -248,10 +248,6 @@ def select_moe_comm_method(num_tokens: int,
                 and vllm_config.parallel_config.world_size_across_dp /
                 vllm_config.parallel_config.pipeline_parallel_size >= 16):
             moe_comm_type = MoECommType.MC2
         else:
-            # Currently, w4a8_dynamic does not support allgatherep
-            if quant_type == "w4a8_dynamic":
-                moe_comm_type = MoECommType.ALLTOALL
-            else:
-                moe_comm_type = MoECommType.ALLGATHER
+            moe_comm_type = MoECommType.ALLGATHER