[EPLB][Bugfix] EPLB support fp/bf16 (#5531)

### What this PR does / why we need it?

Adds EPLB support for the fp16/bf16 dtypes.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

w8a8_dynamic baseline:

| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |

w8a8_dynamic with EPLB:

| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |

The fp16 conversation is normal. The fp16 test is in progress.

fp16 baseline:

| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |

fp16 with EPLB:

| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 83.33 |

- vLLM version: v0.13.0
- vLLM main: 45c1ca1ca1

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
2026-01-26 14:28:16 +08:00
parent 52d4acfa51
commit 611e223b7d
4 changed files with 67 additions and 118 deletions
--- a/vllm_ascend/ops/fused_moe/fused_moe.py
+++ b/vllm_ascend/ops/fused_moe/fused_moe.py
@@ -102,6 +102,7 @@ class AscendUnquantizedFusedMoEMethod(UnquantizedFusedMoEMethod):
              expert_map: Optional[torch.Tensor] = None,
              apply_router_weight_on_input: bool = False,
              enable_force_load_balance: bool = False,
+              log2phy: torch.Tensor = None,
              **kwargs) -> torch.Tensor:
        zero_expert_num = getattr(layer, "zero_expert_num", 0)
        zero_expert_type = getattr(layer, "zero_expert_type", None)
@@ -149,6 +150,7 @@ class AscendUnquantizedFusedMoEMethod(UnquantizedFusedMoEMethod):
            expert_map=expert_map,
            apply_router_weight_on_input=apply_router_weight_on_input,
            dynamic_eplb=self.dynamic_eplb,
+            log2phy=log2phy,
            mc2_mask=kwargs.get("mc2_mask", None))
        if zero_expert_num > 0 and zero_expert_type is not None:
            final_hidden_states += zero_expert_result
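
The two hunks above thread a `log2phy` argument from `apply` down into the fused-experts call. As a rough illustration of what such a table is typically used for, the sketch below assumes `log2phy` is a logical-to-physical expert lookup: entry `i` holds the physical expert slot that currently serves logical expert `i`. That reading is an assumption based on the parameter name, not something this diff confirms. The helper `remap_logical_to_physical` is hypothetical and uses plain Python lists; the real code path works on `torch.Tensor` and may differ.

```python
from typing import List


def remap_logical_to_physical(topk_ids: List[List[int]],
                              log2phy: List[int]) -> List[List[int]]:
    """Translate router-selected logical expert ids into physical slots.

    Hypothetical helper for illustration only. `log2phy[i]` is assumed to be
    the physical expert slot that serves logical expert `i`.
    """
    return [[log2phy[expert_id] for expert_id in row] for row in topk_ids]


# Assumed example placement: four logical experts shuffled across four slots.
log2phy = [2, 0, 3, 1]

# Top-2 routing result for two tokens, expressed as logical expert ids.
topk_ids = [[0, 3], [1, 2]]

physical_ids = remap_logical_to_physical(topk_ids, log2phy)
print(physical_ids)
```

With an identity table (`[0, 1, 2, 3]`) the routing result is unchanged, which is one way to picture the behavior when no rebalancing has been applied.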