xc-llm-ascend/csrc at 42bcad7e9b68cf92d9b98a9072eff10edc8d6f33

EngineX/xc-llm-ascend

chenxi-hh 42bcad7e9b GMM custom operator optimization in small batch scenarios (#7100)

### What this PR does / why we need it?
Optimize the GMM custom operator for small-batch scenarios.

### How was this patch tested?

Qwen3-30B input: 4k, output: 1k

batch 1:
TPOT 7.9 ms -> 7.0 ms
Output Token Throughput 125.4651 token/s -> 140.6278 token/s

batch 2:
TPOT 9.4 ms -> 8.8 ms
Output Token Throughput 211.8187 token/s -> 225.2254 token/s

batch 16:
TPOT 13.6 ms -> 13.5 ms
Output Token Throughput 1159.8213 token/s -> 1165.0982 token/s
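
The relative gains implied by the figures above can be worked out with a short script. This is only an illustrative sketch: the `results` dictionary and the `pct_change` helper are hypothetical names introduced here, and the numbers are copied from the measurements listed above.

```python
# Before/after figures copied from the benchmark above
# (Qwen3-30B, input: 4k, output: 1k).
# `results` and `pct_change` are illustrative names, not part of the repository.
results = {
    1: {"tpot_ms": (7.9, 7.0), "throughput": (125.4651, 140.6278)},
    2: {"tpot_ms": (9.4, 8.8), "throughput": (211.8187, 225.2254)},
    16: {"tpot_ms": (13.6, 13.5), "throughput": (1159.8213, 1165.0982)},
}


def pct_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for batch, metrics in results.items():
    tpot = pct_change(*metrics["tpot_ms"])        # negative means lower latency
    thr = pct_change(*metrics["throughput"])      # positive means more tokens/s
    print(f"batch {batch}: TPOT {tpot:+.1f}%, throughput {thr:+.1f}%")
```

The gain shrinks as the batch grows: TPOT drops by roughly 11% at batch 1 but by less than 1% at batch 16, which matches the PR's focus on small batches.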

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

---------

Signed-off-by: chenxi-hh <chen464822955@163.com>

2026-03-19 16:10:30 +08:00

aclnn_torch_adapter

[feature] Add Custom Op grouped_matmul_swiglu_quant (#4431)

2025-11-27 21:56:18 +08:00

add_rms_norm_bias

[csrc][bugfix] Add compile-time Ascend950/910_95 compatibility for custom ops between CANN8.5 and 9.0 (#6936)

2026-03-03 17:08:22 +08:00

apply_top_k_top_p_custom

Refactor the ops PyTorch adapter, cleanup for csrc/torch_binding.cpp (#6732)

2026-02-24 09:12:43 +08:00

batch_matmul_transpose

Refactor the ops PyTorch adapter, cleanup for csrc/torch_binding.cpp (#6732)

2026-02-24 09:12:43 +08:00

[qwen3 next] add ascend c casual_conv1d_fn (#6661)

2026-03-09 23:29:49 +08:00

[csrc][bugfix] Add compile-time Ascend950/910_95 compatibility for custom ops between CANN8.5 and 9.0 (#6936)

2026-03-03 17:08:22 +08:00

copy_and_expand_eagle_inputs

[feat][spec decode] Unified draft parallel (#6766)

2026-03-13 14:07:35 +08:00

dispatch_ffn_combine

[Build] Add support for Ascend950 chip (#7151)

2026-03-12 10:25:51 +08:00

dispatch_ffn_combine_bf16

[fix] Resolve compilation errors that occur when building versions subsequent to b020 (#7059)

2026-03-09 16:09:35 +08:00

dispatch_gmm_combine_decode

[Build] Add support for Ascend950 chip (#7151)

2026-03-12 10:25:51 +08:00

dispatch_layout

Refactor the ops PyTorch adapter, cleanup for csrc/torch_binding.cpp (#6732)

2026-02-24 09:12:43 +08:00

grouped_matmul_swiglu_quant_weight_nz_tensor_list

Refactor the ops PyTorch adapter, cleanup for csrc/torch_binding.cpp (#6732)

2026-02-24 09:12:43 +08:00

[Build] Add support for Ascend950 chip (#7151)

2026-03-12 10:25:51 +08:00

lightning_indexer_quant

[bugfix] restore pr-7029 and fix patch error (#7294)

2026-03-16 15:39:42 +08:00

lightning_indexer_vllm

Refactor the ops PyTorch adapter, cleanup for csrc/torch_binding.cpp (#6732)

2026-02-24 09:12:43 +08:00

matmul_allreduce_add_rmsnorm

Refactor the ops PyTorch adapter, cleanup for csrc/torch_binding.cpp (#6732)

2026-02-24 09:12:43 +08:00

[feat] parameterize hardcoded MLA dimensions to support GLM5-W8A8 (#6902)

2026-03-09 20:17:21 +08:00

moe_combine_normal

Refactor the ops PyTorch adapter, cleanup for csrc/torch_binding.cpp (#6732)

2026-02-24 09:12:43 +08:00

moe_dispatch_normal

[Kernel] Add moe normal ops (#4810)

2025-12-10 17:15:28 +08:00

moe_gating_top_k

[csrc][bugfix] Add compile-time Ascend950/910_95 compatibility for custom ops between CANN8.5 and 9.0 (#6936)

2026-03-03 17:08:22 +08:00

moe_grouped_matmul

[MOE] commit GMM custom operator (#7010)

2026-03-09 09:56:31 +08:00

moe_init_routing_custom

[csrc][bugfix] Add compile-time Ascend950/910_95 compatibility for custom ops between CANN8.5 and 9.0 (#6936)

2026-03-03 17:08:22 +08:00

notify_dispatch

[Build] Add support for Ascend950 chip (#7151)

2026-03-12 10:25:51 +08:00

sparse_flash_attention

[Build] Add support for Ascend950 chip (#7151)

2026-03-12 10:25:51 +08:00

add dispatch_gmm_combine kernel (#3532)

2025-12-04 23:00:59 +08:00

transpose_kv_cache_by_block

[Bugfix] fix TransposeKvCacheByBlock op error report in plog (#7235)

2026-03-17 10:08:32 +08:00

[Build] Add support for Ascend950 chip (#7151)

2026-03-12 10:25:51 +08:00

build_aclnn.sh

[bugfix] restore pr-7029 and fix patch error (#7294)

2026-03-16 15:39:42 +08:00

build.sh

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

camem_allocator.cpp

Quality enhancement: Immediately interrupt execution when memory OOM (#3932)

2025-11-04 08:55:09 +08:00

CMakeLists.txt

[Kernel] update csrc cmakelist for open-source cann (#5458)

2025-12-29 20:34:53 +08:00

ops.h

[Ops][Refactor] Remove custom rotary_embedding operator (#6523)

2026-02-07 09:24:05 +08:00

torch_binding_meta.cpp

[bugfix] restore pr-7029 and fix patch error (#7294)

2026-03-16 15:39:42 +08:00

torch_binding.cpp

GMM custom operator optimization in small batch scenarios (#7100)

2026-03-19 16:10:30 +08:00

utils.h

[Kernel] Add moe normal ops (#4810)

2025-12-10 17:15:28 +08:00