bugfix for mtp when running torchair in a2 (#3354)
### What this PR does / why we need it?

When the op `torchair_fused_experts_with_mc2` is called, a TP group must be passed in. Currently it is only passed in the quantized scenario; this change also passes it in the unquantized scenario.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
```diff
@@ -899,6 +899,7 @@ class TorchairAscendUnquantizedFusedMoEMethod(UnquantizedFusedMoEMethod):
                 expert_map=expert_map,
+                moe_all_to_all_group_name=self.moe_all_to_all_group_name,
                 shared_experts=shared_experts,
                 is_torchair=self.torchair_graph_enabled,
                 mc2_mask=kwargs.get("mc2_mask", None))
         elif fused_moe_state in [
                 FusedMoEState.AllGather, FusedMoEState.NaiveMulticast
```
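The bug pattern fixed here can be sketched as follows. This is a minimal, hypothetical illustration, not the actual vllm-ascend code: both the quantized and unquantized methods call the same MC2 fused-experts kernel, but only one of them forwarded the all-to-all group name. All identifiers other than `moe_all_to_all_group_name` are assumed names for the sketch.

```python
# Hypothetical sketch of the bug pattern: the MC2 (all-to-all) kernel needs a
# communication group name, but one caller branch forgot to forward it.
# Identifiers other than `moe_all_to_all_group_name` are illustrative only.

def fused_experts_with_mc2(hidden_states, moe_all_to_all_group_name=None, **kwargs):
    # The MC2 dispatch requires a group to perform its all-to-all over.
    if moe_all_to_all_group_name is None:
        raise ValueError("MC2 path requires moe_all_to_all_group_name")
    return f"dispatched {hidden_states} over group {moe_all_to_all_group_name}"


class QuantizedMoEMethod:
    """Quantized path: already forwarded the group name before the fix."""
    moe_all_to_all_group_name = "tp_group_0"  # assumed group name

    def apply(self, x):
        return fused_experts_with_mc2(
            x, moe_all_to_all_group_name=self.moe_all_to_all_group_name)


class UnquantizedMoEMethod:
    """Unquantized path: after the fix, the group name is forwarded too."""
    moe_all_to_all_group_name = "tp_group_0"  # assumed group name

    def apply(self, x):
        # Before the fix this call omitted moe_all_to_all_group_name,
        # which broke MTP with torchair on A2 hardware.
        return fused_experts_with_mc2(
            x, moe_all_to_all_group_name=self.moe_all_to_all_group_name)


print(UnquantizedMoEMethod().apply("h"))  # → dispatched h over group tp_group_0
```

The one-line diff above is exactly this second forwarding: the unquantized `apply` now passes `moe_all_to_all_group_name=self.moe_all_to_all_group_name` like the quantized path already did.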