xc-llm-ascend/csrc at 3cb0af0bcf3299089ca7e72159fa36e825a470f8

EngineX/xc-llm-ascend

wangqiankun13 e67608041d [main][BugFix]Fix DispatchGmmCombineDecode acc bug when big batch (#5808 )

### What this PR does / why we need it?
If a single expert handles more than 48 * 8 tokens, DispatchGmmCombineDecode
may produce accuracy errors because a synchronization flag is set too early.

> Reason: the LocalTensors ubInputRightHalf, ubInputTmp, ubQuantF32,
ubQuantS32, and ubQuantF16 share the same space as ubAbs, so the copy
from GM into ubInputRightHalf can start only after all of them are free.
Before this PR, AscendC::SetFlag<AscendC::HardEvent::V_MTE2>(0) was
called too early.

This PR sets the flag at the right time.
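The hazard described above can be illustrated with a toy, single-threaded model. This is only a sketch under stated assumptions: it does not use the AscendC API, and `ProcessFirstTile`, `sharedUb`, and `flagAfterLastUse` are hypothetical names standing in for the aliased unified-buffer tensors and for the position of the `V_MTE2` flag. The "vector" stage reads the shared region twice, and the next GM-to-UB copy is allowed to overwrite that region either between the two reads (flag too early) or after the last one (flag at the right time).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>

// Toy model only, NOT AscendC: one buffer region stands in for the UB space
// that several tensor names alias. All identifiers here are hypothetical.
constexpr std::size_t kTileLen = 4;
using Tile = std::array<int, kTileLen>;

// Stand-in for one vector-pipeline pass over the shared region.
int SumTile(const Tile& t) { return std::accumulate(t.begin(), t.end(), 0); }

// Returns the result computed for tile 0.
// flagAfterLastUse == false models setting the flag too early, so the
// copy of tile 1 overwrites the shared region before the second pass.
int ProcessFirstTile(const Tile& gmTile0, const Tile& gmTile1, bool flagAfterLastUse) {
    Tile sharedUb = gmTile0;                 // "MTE2": copy tile 0 from GM into the shared region
    const int firstPass = SumTile(sharedUb); // first vector pass reads tile 0

    if (!flagAfterLastUse) {
        sharedUb = gmTile1;                  // next copy starts while tile 0 is still needed
    }

    const int secondPass = SumTile(sharedUb); // second pass reads through an alias of the same space

    if (flagAfterLastUse) {
        sharedUb = gmTile1;                  // next copy starts only after every alias is free
    }
    return firstPass + secondPass;
}
```

With tile 0 = {1, 2, 3, 4} and tile 1 = {10, 20, 30, 40}, the late flag yields 20 (both passes see tile 0), while the early flag yields 110 because the second pass reads tile 1's data.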

For more information about this operator, see the RFC issue:
https://github.com/vllm-project/vllm-ascend/issues/5476
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested qwen3-235b EPLB with DispatchGmmCombineDecode on a single A3
node (ep16).

| dataset | version | metric | mode | vllm-api-stream-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |


- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

Signed-off-by: wangqiankun <wangqiankun13@huawei.com>

2026-01-15 09:29:34 +08:00

aclnn_torch_adapter

[feature] Add Custom Op grouped_matmul_swiglu_quant (#4431 )

2025-11-27 21:56:18 +08:00

batch_matmul_transpose

[Bugfix]fix bmm_transpose ops for cann version (#4653 )

2025-12-06 10:52:46 +08:00

add dispatch_gmm_combine kernel (#3532 )

2025-12-04 23:00:59 +08:00

dispatch_ffn_combine

[CustomOp] support TensorList for dispatchFFNCombine (#5665 )

2026-01-09 15:56:29 +08:00

dispatch_gmm_combine_decode

[main][BugFix]Fix DispatchGmmCombineDecode acc bug when big batch (#5808 )

2026-01-15 09:29:34 +08:00

dispatch_layout

[Kernel] Add moe normal ops (#4810 )

2025-12-10 17:15:28 +08:00

grouped_matmul_swiglu_quant_weight_nz_tensor_list

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804 )

2025-11-28 18:06:39 +08:00

[BugFix]Fix precision issue for LoRA feature (#4141 )

2025-12-19 14:22:06 +08:00

lightning_indexer

[kernel] add AscendC op: lightning_indexer and sparse_flash_attention (#4625 )

2025-12-03 09:53:10 +08:00

matmul_allreduce_add_rmsnorm

[BugFix][kernel] fix matmul_allreduce_add_rmsnorm_kernel (#5335 )

2026-01-05 15:19:54 +08:00

[feat] mlapo add bf16 no_quant support (#4852 )

2025-12-11 11:06:56 +08:00

moe_combine_normal

[Kernel] Add moe normal ops (#4810 )

2025-12-10 17:15:28 +08:00

moe_dispatch_normal

[Kernel] Add moe normal ops (#4810 )

2025-12-10 17:15:28 +08:00

moe_gating_top_k

[Kernel] Add moe_gating_top_k operator support for Ascend NPU (#5579 )

2026-01-07 21:42:31 +08:00

moe_init_routing_custom

[OP] add custom op aclnnMoeInitRoutingCustom (#5251 )

2025-12-29 19:29:40 +08:00

notify_dispatch

[Kernel] Add moe normal ops (#4810 )

2025-12-10 17:15:28 +08:00

sparse_flash_attention

[kernel] add AscendC op: lightning_indexer and sparse_flash_attention (#4625 )

2025-12-03 09:53:10 +08:00

add dispatch_gmm_combine kernel (#3532 )

2025-12-04 23:00:59 +08:00

[Kernel] Add moe normal ops (#4810 )

2025-12-10 17:15:28 +08:00

build_aclnn.sh

[Kernel] Add moe_gating_top_k operator support for Ascend NPU (#5579 )

2026-01-07 21:42:31 +08:00

build.sh

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804 )

2025-11-28 18:06:39 +08:00

camem_allocator.cpp

Quality enhancement: Immediately interrupt execution when memory OOM (#3932 )

2025-11-04 08:55:09 +08:00

CMakeLists.txt

[Kernel]update csrc cmakelist for open-source cann (#5458 )

2025-12-29 20:34:53 +08:00

ops.h

mlapo add qdown output (#4707 )

2025-12-06 11:18:53 +08:00

torch_binding_meta.cpp

[CustomOp] support TensorList for dispatchFFNCombine (#5665 )

2026-01-09 15:56:29 +08:00

torch_binding.cpp

[CustomOp] support TensorList for dispatchFFNCombine (#5665 )

2026-01-09 15:56:29 +08:00

utils.h

[Kernel] Add moe normal ops (#4810 )

2025-12-10 17:15:28 +08:00