xc-llm-ascend

Files

wangqiankun13 bec8641876 [BugFix] Fix input parameter bug of dispatch_gmm_combine_decode[RFC: issue 5476] (#5932 )

### What this PR does / why we need it?

In [PR 5040](https://github.com/vllm-project/vllm-ascend/pull/5040), the
`dispatch_gmm_combine_decode` operator was configured with an incorrect
global_bs parameter. This PR is to fix the bug.

The global_bs provided as input should have the same meaning as in the
`moe_distributed_dispatch` operator, specifically: (the maximum batch
size across all cards) * (expert parallel world size).
However, the implementation incorrectly used the variable
max_num_tokens, which does not account for tensor parallelism. This
error likely resulted in an unnecessarily large (overestimated) value.

More info about this operator, please refer to RFC: issue
https://github.com/vllm-project/vllm-ascend/issues/5476

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Acc
test qwen3-235b eplb on a single A3 node(ep16),
with dispatch_gmm_combine_decode

| dataset | version | metric | mode | vllm-api-stream-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 80.00 |
- vLLM version: v0.13.0
- vLLM main:
11b6af5280

Signed-off-by: wangqiankun <wangqiankun13@huawei.com>

2026-01-21 09:26:40 +08:00

__init__.py

[Refactor] [MoE] Rename moe-related classes & files (#3646 )

2025-10-25 11:22:03 +08:00

comm_utils.py

[Refactor] [MoE] Rename moe-related classes & files (#3646 )

2025-10-25 11:22:03 +08:00

experts_selector.py

[Kernel] Add moe_gating_top_k operator support for Ascend NPU (#5579 )

2026-01-07 21:42:31 +08:00

fused_moe.py

[EPLB][Bugfix] Dispatch Allgather use log2phy if enable eplb (#5933 )