xc-llm-ascend


1092626063 9328f377b4 [refactor]support gatingtopk operator generalization (#2958)

### What this PR does / why we need it?

Past:
`npu_moe_gating_top_k` could only support the `group_count=256` pattern.

Now:
1. `npu_moe_gating_top_k` supports any `group_count` size.
2. The functionality of `torch_npu.npu_moe_gating_top_k_softmax` is now
included in `torch_npu.npu_moe_gating_top_k`.
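To make "any `group_count`" and the softmax merge concrete, here is a minimal pure-Python sketch of grouped top-k gating for a single token. It is an illustration under stated assumptions, not the `torch_npu` kernel: the PR does not show the operator's signature or exact semantics, so the names `grouped_topk`, `topk_group`, and `renormalize` are hypothetical, and scoring each group by its best expert is only one common choice (other routers score groups differently). It also shows why a separate softmax variant becomes redundant: with `group_count=1` and `topk_group=1`, grouped top-k reduces to plain softmax top-k.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_topk(
    logits: List[float],
    group_count: int,
    topk_group: int,
    k: int,
    renormalize: bool = True,
) -> Tuple[List[int], List[float]]:
    """Hypothetical reference for grouped top-k gating of one token.

    Experts are split into `group_count` equal, contiguous groups. Each group
    is scored by its best expert, only the `topk_group` best groups stay
    eligible, and the top `k` experts are chosen from those groups.
    """
    num_experts = len(logits)
    if num_experts % group_count != 0:
        raise ValueError("number of experts must be divisible by group_count")
    if not 1 <= topk_group <= group_count:
        raise ValueError("topk_group must be in [1, group_count]")
    group_size = num_experts // group_count
    if not 1 <= k <= topk_group * group_size:
        raise ValueError("k must be in [1, number of eligible experts]")

    scores = softmax(logits)

    # Assumption: a group's score is the maximum expert score inside it.
    group_scores = [
        max(scores[g * group_size:(g + 1) * group_size]) for g in range(group_count)
    ]
    kept_groups = sorted(
        range(group_count), key=lambda g: group_scores[g], reverse=True
    )[:topk_group]

    # Only experts that belong to a kept group are candidates.
    candidates = [
        e for g in kept_groups for e in range(g * group_size, (g + 1) * group_size)
    ]
    chosen = sorted(candidates, key=lambda e: scores[e], reverse=True)[:k]
    weights = [scores[e] for e in chosen]
    if renormalize:
        # Rescale the selected weights so they sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]
    return chosen, weights


if __name__ == "__main__":
    logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.3, 3.0, -0.2]
    # One group: plain softmax top-k over all eight experts.
    idx, _ = grouped_topk(logits, group_count=1, topk_group=1, k=3)
    print(idx)  # [6, 1, 4]
    # Four groups of two experts; only the two best groups are eligible,
    # so expert 4 (in a dropped group) is replaced by expert 0.
    idx, _ = grouped_topk(logits, group_count=4, topk_group=2, k=3)
    print(idx)  # [6, 1, 0]
```

The same function handles any `group_count` that divides the expert count, which is the kind of generalization the PR describes; the real operator may differ in how it scores groups, applies scaling, or breaks ties.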

CANN: depends on 8.3.RC1

Performance:
1. GLM4.5-w8a8: TPS improved by 6%.
2. Qwen3: unchanged.

- vLLM version: v0.11.0
- vLLM main: 2918c1b49c

Signed-off-by: 1092626063 <1092626063@qq.com>

2025-11-19 10:38:56 +08:00

fused_moe

[refactor]support gatingtopk operator generalization (#2958)

2025-11-19 10:38:56 +08:00

__init__.py

[Refactor] [MoE] Rename moe-related classes & files (#3646)

2025-10-25 11:22:03 +08:00

activation.py

[main] mlp weight prefetch in Qwen Dense Models (#2816)

2025-09-11 21:20:09 +08:00

attention.py

remove useless code (#3685)