### What this PR does / why we need it?

This PR cherry-picks https://github.com/vllm-project/vllm-ascend/pull/2958 and https://github.com/vllm-project/vllm-ascend/pull/4340.

Past: `npu_moe_gating_top_k` only supported the `group_count=256` pattern.

Now:
1. `npu_moe_gating_top_k` supports all `group_count` sizes.
2. The functionality of `torch_npu.npu_moe_gating_top_k_softmax` is included in `torch_npu.npu_moe_gating_top_k`.

CANN: depends on 8.3.RC1.

Performance:
1. GLM4.5-w8a8: TPS improves by ~6%.
2. Qwen3: unchanged.

---------

Signed-off-by: 1092626063 <1092626063@qq.com>
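For readers unfamiliar with the op being extended: grouped top-k gating is the expert-selection scheme fused ops like `npu_moe_gating_top_k` accelerate. Below is a plain-NumPy reference sketch of the general pattern (softmax over experts, keep the best `k_group` groups, then top-k experts within them). It is an illustrative CPU model only; the actual `torch_npu` op's signature, defaults, and numerics may differ, and the function/parameter names here are my own.

```python
import numpy as np

def grouped_topk_gating(logits, k, group_count, k_group):
    """CPU reference sketch of grouped top-k MoE gating.

    logits: [num_tokens, num_experts]; requires num_experts % group_count == 0.
    Steps: softmax over all experts -> score each group by its best expert,
    keep the top k_group groups -> pick top-k experts from the kept groups
    and renormalize their weights.
    """
    num_tokens, num_experts = logits.shape
    assert num_experts % group_count == 0
    group_size = num_experts // group_count

    # 1. numerically stable softmax over the expert dimension
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=-1, keepdims=True)

    # 2. group score = best expert probability within each group
    grouped = probs.reshape(num_tokens, group_count, group_size)
    group_scores = grouped.max(axis=-1)                         # [T, G]
    top_groups = np.argsort(-group_scores, axis=-1)[:, :k_group]

    # 3. mask experts outside the selected groups, then take top-k
    mask = np.zeros((num_tokens, group_count), dtype=bool)
    np.put_along_axis(mask, top_groups, True, axis=-1)
    expert_mask = np.repeat(mask, group_size, axis=-1)          # [T, E]
    masked = np.where(expert_mask, probs, -np.inf)
    topk_ids = np.argsort(-masked, axis=-1)[:, :k]
    topk_w = np.take_along_axis(probs, topk_ids, axis=-1)
    topk_w /= topk_w.sum(axis=-1, keepdims=True)                # renormalize
    return topk_w, topk_ids
```

The point of this PR is that the fused NPU kernel no longer restricts `group_count` to 256; any divisor of the expert count (as in the sketch's assertion) is accepted.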