xc-llm-ascend

Files

chenxi-hh 42bcad7e9b GMM custom operator optimization in small batch scenarios (#7100)

### What this PR does / why we need it?
Optimizes the GMM custom operator for small-batch scenarios.

### How was this patch tested?

Model: Qwen3-30B, input: 4k, output: 1k

batch 1:
TPOT 7.9 ms -> 7.0 ms
Output Token Throughput 125.4651 token/s -> 140.6278 token/s

batch 2:
TPOT 9.4 ms -> 8.8 ms
Output Token Throughput 211.8187 token/s -> 225.2254 token/s

batch 16:
TPOT 13.6 ms -> 13.5 ms
Output Token Throughput 1159.8213 token/s -> 1165.0982 token/s
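
The figures above can be turned into relative changes with a few lines of arithmetic. The following is a minimal sketch, not part of the patch: the names `RESULTS`, `relative_change`, and `summarize` are hypothetical and exist only for this illustration, and the numbers are copied from the results listed above.

```python
# Benchmark figures copied from the results above (Qwen3-30B, input 4k, output 1k).
# Layout (an assumption made for this sketch):
#   batch size -> ((TPOT before, TPOT after) in ms,
#                  (throughput before, throughput after) in token/s)
RESULTS = {
    1: ((7.9, 7.0), (125.4651, 140.6278)),
    2: ((9.4, 8.8), (211.8187, 225.2254)),
    16: ((13.6, 13.5), (1159.8213, 1165.0982)),
}


def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


def summarize(results):
    """Return (batch, TPOT change %, throughput change %) tuples, ordered by batch size."""
    rows = []
    for batch, (tpot, throughput) in sorted(results.items()):
        rows.append((batch, relative_change(*tpot), relative_change(*throughput)))
    return rows


if __name__ == "__main__":
    for batch, tpot_pct, tput_pct in summarize(RESULTS):
        # A negative TPOT change means lower latency per output token.
        print(f"batch {batch}: TPOT {tpot_pct:+.1f}%, throughput {tput_pct:+.1f}%")
```

Read this way, the improvement is largest at batch 1 and shrinks as the batch size grows, which matches the small-batch focus stated in the PR title.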

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

---------

Signed-off-by: chenxi-hh <chen464822955@163.com>

2026-03-19 16:10:30 +08:00

__init__.py

[Refactor] [MoE] Rename moe-related classes & files (#3646)

2025-10-25 11:22:03 +08:00

comm_utils.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #11) (#6176)

2026-02-06 15:28:49 +08:00

experts_selector.py

[Refact]Refact MLA/SFA weight prefetch to consist with moe weight prefetch (#6629)

2026-02-10 14:14:37 +08:00

fused_moe.py

[bugfix]Enable dispatch_ffn_combine feature for qwen3.5 (#7066)