sherie
f86596a66c
allgather: use fused ops (#2689)
### What this PR does / why we need it?
Replace `npu_moe_init_routing`, `npu_moe_compute_expert_tokens`, and `npu_moe_finalize_routing` with the fused operators `npu_moe_init_routing_v2` and `npu_moe_token_unpermute` to improve performance.
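The data flow these operators cover can be pictured with a small NumPy reference of MoE token routing: permute tokens so rows routed to the same expert are contiguous, run each expert on its slice, then unpermute and combine the top-k results with the routing weights. This is a hypothetical sketch that assumes the NPU operators follow this usual dispatch/combine pattern. `init_routing`, `run_experts`, and `unpermute` are illustrative names, not the `torch_npu` operators or their signatures. The per-expert counts play roughly the role that `npu_moe_compute_expert_tokens` presumably served in the old path.

```python
import numpy as np


def init_routing(hidden, topk_ids, num_experts):
    """Hypothetical reference: group token copies by expert id.

    Returns the expanded (permuted) hidden states, the permutation used,
    and the number of rows assigned to each expert.
    """
    num_tokens, top_k = topk_ids.shape
    flat_experts = topk_ids.reshape(-1)  # flat index i = token * top_k + k
    # Stable sort keeps token order within each expert's segment.
    order = np.argsort(flat_experts, kind="stable")
    expanded = hidden[order // top_k]  # source token of each sorted slot
    counts = np.bincount(flat_experts, minlength=num_experts)
    return expanded, order, counts


def run_experts(expanded, counts, scales):
    """Toy experts: expert e multiplies its contiguous segment by scales[e]."""
    out = np.empty_like(expanded)
    start = 0
    for e, n in enumerate(counts):
        out[start:start + n] = expanded[start:start + n] * scales[e]
        start += n
    return out


def unpermute(expert_out, order, topk_weights):
    """Hypothetical reference: undo the permutation, then weight-sum top-k outputs."""
    num_tokens, top_k = topk_weights.shape
    unsorted = np.empty_like(expert_out)
    unsorted[order] = expert_out  # row i now belongs to flat index i
    unsorted = unsorted.reshape(num_tokens, top_k, -1)
    return (unsorted * topk_weights[..., None]).sum(axis=1)


if __name__ == "__main__":
    hidden = np.arange(12, dtype=np.float64).reshape(4, 3)
    topk_ids = np.array([[0, 2], [1, 0], [2, 1], [0, 1]])
    topk_weights = np.array([[0.7, 0.3], [0.5, 0.5], [0.9, 0.1], [0.6, 0.4]])
    expanded, order, counts = init_routing(hidden, topk_ids, num_experts=3)
    out = unpermute(run_experts(expanded, counts, np.array([1.0, 2.0, 3.0])),
                    order, topk_weights)
    print(counts)  # rows per expert
    print(out)
```

Fusing steps like these avoids materializing and re-reading intermediate tensors between separate kernel launches. That is the usual reason fused operators are faster, though the exact gains on NPU depend on the kernels themselves.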
### Does this PR introduce _any_ user-facing change?
| Branch | Throughput (tokens/s) | TTFT (time to first token) | TPOT (time per output token) |
| --- | --- | --- | --- |
| main | 733.98 | 280.05 | 34.30 |
| main + fused ops | 740.33 | 273.34 | 33.99 |
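Expressed as relative changes, the table shows a modest gain on every metric. The short calculation below simply reuses the numbers from the table; higher throughput is better, while lower TTFT and TPOT are better.

```python
# Benchmark numbers copied from the table.
baseline = {"tps": 733.98, "TTFT": 280.05, "TPOT": 34.30}
fused = {"tps": 740.33, "TTFT": 273.34, "TPOT": 33.99}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


for metric in baseline:
    print(f"{metric}: {pct_change(baseline[metric], fused[metric]):+.2f}%")
# tps: +0.87%
# TTFT: -2.40%
# TPOT: -0.90%
```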
### How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main: 6997a25ac6
Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
2025-09-04 11:56:29 +08:00