xc-llm-ascend/vllm_ascend at e67608041dc29ca0d9d41210d7f5f6eb812d397e

EngineX/xc-llm-ascend

wangqiankun13 d840f153f4 [Bugfix] Fix acc bug when enable dispatch_gmm_combine_decode and eplb (#5806 )

### What this PR does / why we need it?

Fix an accuracy bug that occurs when dispatch_gmm_combine_decode and EPLB are both enabled.

After EPLB runs, the expert table may change, so a mapping is needed.
The fused_mc2 path was missing this mapping.
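
As a rough illustration of why the mapping matters, the sketch below remaps router-selected expert ids through a lookup table before dispatch. It is not the vllm-ascend implementation: `remap_topk_ids`, `log2phy`, and the example placements are hypothetical names and values chosen for illustration, and it assumes the mapping is a per-expert table from logical expert id to physical slot.

```python
# Hypothetical sketch; none of these names come from vllm-ascend.

def remap_topk_ids(topk_ids, log2phy):
    """Translate each logical expert id into its current physical slot."""
    return [[log2phy[expert] for expert in row] for row in topk_ids]


# Assumed initial placement: logical expert i sits in physical slot i.
identity_map = [0, 1, 2, 3]
# Assumed placement after rebalancing: the experts have moved.
rebalanced_map = [2, 0, 3, 1]

# Per-token top-2 logical expert ids, as a router might produce them.
topk_ids = [[0, 3], [1, 2]]

# Skipping the updated table is equivalent to using the stale identity map,
# so tokens are sent to slots that now hold different experts.
stale = remap_topk_ids(topk_ids, identity_map)
fixed = remap_topk_ids(topk_ids, rebalanced_map)

print("stale:", stale)
print("fixed:", fixed)
```

In this toy setup, a dispatch path that ignores the updated table keeps routing with the stale ids, which would be consistent with the accuracy drop reported in the test section.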

For more information about this operator, see the RFC issue:
https://github.com/vllm-project/vllm-ascend/issues/5476

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Without this PR, qwen3-235b with EPLB and dispatch_gmm_combine_decode gets
3.33% accuracy on aime2024.

With this PR, testing qwen3-235b with EPLB on a single A3 node (ep16):

Without dispatch_gmm_combine_decode:
| dataset | version | metric | mode | vllm-api-stream-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |

With dispatch_gmm_combine_decode:
| dataset | version | metric | mode | vllm-api-stream-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |


- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

Signed-off-by: wangqiankun <wangqiankun13@huawei.com>

2026-01-15 09:21:18 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804 )

2025-11-28 18:06:39 +08:00

[bugfix]support dsv3.2 enable both mtp and full_decode_only (#5849 )

2026-01-14 22:57:38 +08:00

[bugfix]limit graph replay sync (#5761 )

2026-01-12 16:46:21 +08:00

[CI] fix lint (#5216 )

2025-12-20 17:03:25 +08:00

[Refactor] Provide a framework to accommodate operators for different hardware devices (#5735 )

2026-01-13 09:53:26 +08:00

device_allocator

[Refactor] Cleanup platform (#5566 )

2026-01-07 09:25:55 +08:00

[Refactor]Refactor of vllm_ascend/distributed module (#5719 )

2026-01-15 08:57:40 +08:00

[EPLB][Bugfix] Get expert map from layers (#5817 )

2026-01-14 09:16:51 +08:00

[BugFix] Fix npu-cpu offloading interface change bug. (#5290 )

2025-12-27 10:21:20 +08:00

[BugFix]Fix the error when using Ascend custom operators with rank=128 (#5394 )

2026-01-09 15:57:43 +08:00

[BugFix] NetLoader: No backend type associated with device type npu (#5700 )

2026-01-09 15:54:54 +08:00

[Bugfix] Fix acc bug when enable dispatch_gmm_combine_decode and eplb (#5806 )

2026-01-15 09:21:18 +08:00

[Refactor]Refactor of vllm_ascend/distributed module (#5719 )

2026-01-15 08:57:40 +08:00

[Quantization] Support compressed tensors moe w8a8 int8 dynamic weight (#5718 )

2026-01-14 09:17:26 +08:00

[Feature] add the magicmtp speculative decoding acceleration algorithm (#5542 )

2026-01-08 09:15:55 +08:00

Revert "[BugFix] Support setting tp=1 for the Eagle draft model to take effect (#5519 )"(#5902 )

2026-01-14 20:55:10 +08:00

Revert "[BugFix] Support setting tp=1 for the Eagle draft model to take effect (#5519 )"(#5902 )

2026-01-14 20:55:10 +08:00

[BugFix] Xlite: Bypass the padding of the graph mode in non-MTP cases to obtain the correct decode num. (#5711 )

2026-01-09 15:55:30 +08:00

__init__.py

[Refactor]Refactor of vllm_ascend/distributed module (#5719 )

2026-01-15 08:57:40 +08:00

ascend_config.py

[CI]Add Disaggregated PD Nightly Test for Qwen3-235B and Qwen3-VL-235B (#5502 )

2026-01-09 16:25:20 +08:00

ascend_forward_context.py

[Bugfix] Fixed an accuracy problem of sp with eagle3 (#5816 )

2026-01-14 09:00:37 +08:00

batch_invariant.py

[Feature] implement basic framework for batch invariant (#5517 )

2026-01-07 09:11:26 +08:00

cpu_binding.py

[Refactor] Modify the binding logic to allocate CPU cores for each NPU card (#5555 )

2026-01-13 09:21:28 +08:00

envs.py

enable ep32 for dispatch_ffn_combine (#5787 )

2026-01-13 14:35:52 +08:00

flash_common3_context.py

[Perf]enable prefill flashcommon3 (#4065 )

2025-12-14 09:34:13 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786 )

2025-09-13 11:58:52 +08:00

platform.py

[Feature] implement set_additional_forward_context for model runner v2 (#5720 )

2026-01-15 09:18:28 +08:00

profiling_config.py

Drop ascend scheduler (#4623 )

2025-12-05 09:03:45 +08:00

utils.py

[Bugfix] Fixed an accuracy problem of sp with eagle3 (#5816 )

2026-01-14 09:00:37 +08:00