sherie
f86596a66c
allgather: use fused op. (#2689)
### What this PR does / why we need it?
Replace 'npu_moe_init_routing', 'npu_moe_compute_expert_tokens', and
'npu_moe_finalize_routing' with 'npu_moe_init_routing_v2' and
'npu_moe_token_unpermute' to improve performance.
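
As a rough illustration of the permute/unpermute pattern that MoE routing operators of this kind implement, here is a minimal pure-Python sketch. It assumes top-1 routing, and the helper names `permute_by_expert` and `unpermute` are hypothetical. It does not reproduce the signatures or exact semantics of the NPU operators named above.

```python
# Conceptual sketch only: group tokens by expert, then restore the original order.
# Helper names are hypothetical and are not the torch_npu API.

def permute_by_expert(tokens, expert_ids):
    """Return tokens grouped by expert id, plus the source index of each slot."""
    # sorted() is stable, so tokens routed to the same expert keep their relative order.
    order = sorted(range(len(tokens)), key=lambda i: expert_ids[i])
    permuted = [tokens[i] for i in order]
    return permuted, order


def unpermute(permuted, order):
    """Scatter permuted tokens back to their original positions."""
    restored = [None] * len(permuted)
    for pos, src in enumerate(order):
        restored[src] = permuted[pos]
    return restored


tokens = ["a", "b", "c", "d"]
expert_ids = [1, 0, 1, 0]  # assumed top-1 expert assignment per token

permuted, order = permute_by_expert(tokens, expert_ids)
restored = unpermute(permuted, order)
```

Grouping tokens by expert lets each expert process one contiguous slice, and the unpermute step puts the results back in token order.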
### Does this PR introduce _any_ user-facing change?
| branch | tps | TTFT | TPOT |
| --- | --- | --- | --- |
| main | 733.98 | 280.05 | 34.30 |
| main+fusedop | 740.33 | 273.34 | 33.99 |
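
To put the figures in the table above in perspective, the relative change per metric can be computed as follows. This is a small sketch: the function name `pct_change` is chosen here for illustration, and the source does not state the units of each column.

```python
# Relative change of main+fusedop against main, using the values from the table.

def pct_change(baseline, candidate):
    """Percentage change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


main = {"tps": 733.98, "TTFT": 280.05, "TPOT": 34.30}
fused = {"tps": 740.33, "TTFT": 273.34, "TPOT": 33.99}

changes = {name: pct_change(main[name], fused[name]) for name in main}

for name, delta in changes.items():
    print(f"{name}: {delta:+.2f}%")
```

This works out to roughly +0.87% tps, -2.40% TTFT, and -0.90% TPOT.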
### How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main: 6997a25ac6
Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
2025-09-04 11:56:29 +08:00