xc-llm-ascend

Author	SHA1	Message	Date
Mercykid-bash	8f45f9ce29	BugFix: Resolve shape mismatch in eplb update and calculation issues in quant_apply_mlp (#4777) ## Description This PR fixes a shape mismatch in the MoE module's EPLB map updates when redundant experts are enabled, and a calculation precision bug in the forward inference of the quantized MoE MLP: ### 1. Shape Mismatch in EPLB Expert Map Update - Root Cause: When redundant experts are enabled, a shape inconsistency occurs during the expert map update in `Vllm_apaptor`: - The shape of `self.expert_map_per_layer[layer_id]` is `[num_physical_experts,]` (aligned with the physical expert count). - The shape of `updated_expert_map` is `[num_logical_experts,]` (aligned with the logical expert count). - Indices in `self.expert_map_per_layer[layer_id]` that exceed the logical expert count cannot be mapped, which leads to tensor shape mismatch errors. - The same shape mismatch exists in the `log2phy` map update (between `self.log2phy_map_per_layer[layer_id]` and `updated_log2phy_map`). - Fix: - Initialize `expert_map_per_layer` and `log2phy_map_per_layer` consistently with shape `[num_physical_experts,]` across the module lifecycle. - During updates, align the shapes of `updated_expert_map` and `updated_log2phy_map` with the pre-initialized physical-expert-sized tensors, so that index mapping stays shape-consistent. ### 2. Calculation Precision Issue in Quantized MoE MLP Forward Inference - Root Cause: In the forward pass of `moe_mlp`, the `torch_npu.npu_dequant_swiglu_quant` operator only accepts group lists in Count format. However, the group list provided by `quant_apply_mlp` was in Cumsum format, which caused an input format mismatch and degraded calculation precision. - Fix: - Convert the cumsum-formatted group list from `quant_apply_mlp` to Count format before passing it to `torch_npu.npu_dequant_swiglu_quant`. - Ensure the input format of the dequantization operator meets its requirements, restoring the expected calculation precision of quantized MoE MLP layers. ## Impact - Resolves shape mismatch errors in EPLB expert/log2phy map updates when redundant experts are enabled, ensuring stable expert routing. - Fixes quantized MoE MLP forward precision issues on NPU by aligning operator input formats with NPU kernel requirements. - No breaking changes to existing interfaces; the fixes are backward-compatible for scenarios without redundant experts. --------- Signed-off-by: Che Ruan <cr623@ic.ac.uk> Signed-off-by: Mercykid-bash <ruanche0218@gmail.com> Co-authored-by: Che Ruan <cr623@ic.ac.uk> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-09 15:46:58 +08:00
LI SHENGYONG	593a96056c	【EPLB】Eplb Redundant Experts Bugfix (#4232) ### What this PR does / why we need it? Redundant experts bugfix. The calculation logic for redundant experts has been fixed so that the correct number of redundant experts is derived from the map. Therefore, the redundant expert parameter no longer needs to be set when a map is passed. ### Does this PR introduce _any_ user-facing change? After configuring the path for experts_map, users no longer need to configure init_redundancy_expert. ### How was this patch tested? EPLB accuracy was tested with and without redundant experts. --------- Signed-off-by: shenchuxiaofugui <1311027364@qq.com>	2025-12-03 12:00:05 +08:00
offline893	4e21b1537e	[BugFix] Check all expert maps when using multiple instances. (#3662) ### What this PR does / why we need it? Check all expert maps when using multiple instances. ### Does this PR introduce _any_ user-facing change? None. ### How was this patch tested? Qwen 235B on double A3. Case 1: master has an expert map, slave has none. Case 2: master has an expert map, slave has an erroneous expert map. Case 3: master has an expert map, slave has a correct expert map. - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: offline0806 <3337230449@qq.com> Co-authored-by: offline0806 <3337230449@qq.com>	2025-10-24 17:10:31 +08:00
Yuxiao-Xu	6b853f15fe	Add static EPLB (#1116) ### What this PR does / why we need it? Add the ability to import an EPLB expert map. ### Does this PR introduce _any_ user-facing change? To import an EPLB expert map, pass the expert map file through the vLLM `additional_config` argument. ### How was this patch tested? 1. Collect expert hotness and generate an expert placement file from the hotness using the EPLB algorithm, or use an existing expert placement table directly. 2. When launching vLLM, enable EC2 and pass the configuration via the command-line argument: --additional-config '{"expert_map_path": "/xxx/xxx/xx.json"}' Co-authored-by: songshanhu07 <1763685535@qq.com> --------- Signed-off-by: songshanhu07 <1763685535@qq.com> Signed-off-by: Yuxiao-Xu <664988918@qq.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: songshanhu07 <1763685535@qq.com> Co-authored-by: Xu Yuxiao <xuyuxiao2@huawei.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-06-09 19:28:11 +08:00
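
The two fixes described in #4777 can be illustrated with a minimal pure-Python sketch. This is not the vllm-ascend code: the helper names `pad_to_physical` and `cumsum_to_count` are hypothetical, plain lists stand in for NPU tensors, and filling the extra physical slots with -1 is an assumption borrowed from vLLM's convention of marking experts absent from a rank with -1.

```python
from typing import List


def pad_to_physical(updated_map: List[int], num_physical: int, fill: int = -1) -> List[int]:
    """Extend a logical-sized map to the physical-sized buffer.

    Hypothetical helper: slots beyond the logical expert count are filled
    with `fill` (assumed -1, meaning "no expert mapped here").
    """
    if len(updated_map) > num_physical:
        raise ValueError("updated map is larger than the physical buffer")
    return updated_map + [fill] * (num_physical - len(updated_map))


def cumsum_to_count(group_list: List[int]) -> List[int]:
    """Convert cumulative token offsets per expert group into per-group counts."""
    counts = []
    prev = 0
    for total in group_list:
        # Each group's count is its cumulative total minus the previous total.
        counts.append(total - prev)
        prev = total
    return counts


# Usage: a 4-entry logical map padded into a 6-slot physical buffer,
# and a Cumsum-format group list converted to Count format.
print(pad_to_physical([0, 1, -1, 2], 6))   # [0, 1, -1, 2, -1, -1]
print(cumsum_to_count([3, 3, 7, 10]))      # [3, 0, 4, 3]
```

With PyTorch tensors the same steps could be expressed with `torch.nn.functional.pad(t, (0, pad_len), value=-1)` and `torch.diff(group_list, prepend=group_list.new_zeros(1))`; how the actual patch performs them is not shown in this log.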

4 Commits
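
#4232 derives the number of redundant experts from the expert map itself instead of requiring a separate parameter. A minimal sketch of that idea follows, assuming a hypothetical JSON schema (`layer_list` → `device_list` → `device_expert`, where each device lists the logical expert IDs it hosts); the real file format passed via `expert_map_path` may differ. Every physical slot beyond the first copy of a logical expert is counted as redundant.

```python
import json
from typing import Dict, List


def redundant_experts_per_layer(expert_map: Dict) -> List[int]:
    """Per layer, count physical slots that duplicate an already-placed logical expert.

    Assumed (hypothetical) schema:
    {"layer_list": [{"layer_id": 0,
                     "device_list": [{"device_id": 0, "device_expert": [0, 1, 2]}, ...]}]}
    """
    result = []
    for layer in expert_map["layer_list"]:
        # Flatten all logical expert IDs placed on every device of this layer.
        placed = [e for dev in layer["device_list"] for e in dev["device_expert"]]
        # Physical slots minus distinct logical experts = redundant copies.
        result.append(len(placed) - len(set(placed)))
    return result


def redundant_experts_from_file(path: str) -> List[int]:
    """Load a map file in the assumed schema and count redundant experts per layer."""
    with open(path, encoding="utf-8") as f:
        return redundant_experts_per_layer(json.load(f))


# Usage: two devices with three slots each; expert 0 is placed twice.
sample = {"layer_list": [{"layer_id": 0, "device_list": [
    {"device_id": 0, "device_expert": [0, 1, 2]},
    {"device_id": 1, "device_expert": [3, 4, 0]},
]}]}
print(redundant_experts_per_layer(sample))  # [1]
```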