Files
xc-llm-ascend/vllm_ascend/ops
weichen 17259cb265 [Perf] [MoE] optimize all2allv (#3738)
### What this PR does / why we need it?
1. Replace init_routing_v2 with token_permute to optimize performance.

Note: This pr will be merged after switching ci to CANN 8.3
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
vllm bench serve bs = 48 / rr = 10000 / 2k input -> 20k output:
before:
<img width="489" height="488" alt="image"
src="https://github.com/user-attachments/assets/268a19e6-9ab2-47f0-84a1-4f6d3bc342e2"
/>
 after:
<img width="480" height="500" alt="image"
src="https://github.com/user-attachments/assets/d9b1e628-0520-42d5-8a21-b42f7cd7abc7"
/>
- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
2025-11-13 09:38:11 +08:00
..
2025-10-24 16:29:08 +08:00
2025-11-03 20:21:07 +08:00