### What this PR does / why we need it?
1. Replace init_routing_v2 with token_permute to optimize performance.
Note: This pr will be merged after switching ci to CANN 8.3
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
vllm bench serve bs = 48 / rr = 10000 / 2k input -> 20k output:
before:
<img width="489" height="488" alt="image"
src="https://github.com/user-attachments/assets/268a19e6-9ab2-47f0-84a1-4f6d3bc342e2"
/>
after:
<img width="480" height="500" alt="image"
src="https://github.com/user-attachments/assets/d9b1e628-0520-42d5-8a21-b42f7cd7abc7"
/>
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>