sherie
f86596a66c
allgather use fusedop. (#2689)
### What this PR does / why we need it?
Use 'npu_moe_init_routing_v2' and 'npu_moe_token_unpermute' to replace
'npu_moe_init_routing', 'npu_moe_compute_expert_tokens', and
'npu_moe_finalize_routing' in order to improve performance.
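
For readers unfamiliar with these ops, here is a minimal NumPy sketch of the general permute-then-unpermute pattern used in MoE token routing. It is a reference illustration only: the function names `permute_tokens` and `unpermute_tokens` are hypothetical, the signatures are not those of the torch_npu kernels named above, and the assumption that the fused ops follow this pattern is ours, not something the PR states.

```python
import numpy as np


def permute_tokens(hidden, topk_ids, num_experts):
    """Group (token, expert) pairs so rows for the same expert are contiguous.

    hidden:   (num_tokens, hidden_dim) activations
    topk_ids: (num_tokens, top_k) expert index chosen for each token slot
    """
    top_k = topk_ids.shape[1]
    flat_ids = topk_ids.reshape(-1)
    # A stable sort keeps the original token order within each expert.
    sort_idx = np.argsort(flat_ids, kind="stable")
    # Each flat slot i belongs to token i // top_k.
    permuted = hidden[sort_idx // top_k]
    tokens_per_expert = np.bincount(flat_ids, minlength=num_experts)
    return permuted, sort_idx, tokens_per_expert


def unpermute_tokens(expert_out, sort_idx, topk_weights):
    """Undo the permutation and combine the top_k expert outputs per token."""
    num_tokens, top_k = topk_weights.shape
    restored = np.empty_like(expert_out)
    # Scatter each sorted row back to its original flat slot.
    restored[sort_idx] = expert_out
    restored = restored.reshape(num_tokens, top_k, -1)
    # Weighted sum over the top_k experts selected for each token.
    return (restored * topk_weights[..., None]).sum(axis=1)
```

With identity experts and routing weights that sum to 1 per token, `unpermute_tokens(permute_tokens(...)[0], ...)` returns the original activations, which makes for a convenient sanity check of the round trip.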
### Does this PR introduce _any_ user-facing change?
| Branch | TPS | TTFT | TPOT |
| --- | --- | --- | --- |
| main | 733.98 | 280.05 | 34.30 |
| main+fusedop | 740.33 | 273.34 | 33.99 |
### How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main: 6997a25ac6
Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
2025-09-04 11:56:29 +08:00