Nengjun Ma
66b60c9440
[Refact] Refactor MLA/SFA weight prefetch to be consistent with MoE weight prefetch (#6629)
### What this PR does / why we need it?
1. Refactor MLA/SFA weight prefetch to be consistent with MoE weight prefetch.
2. Remove the duplicated o_proj weight prefetch in forward for MLA/SFA.
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
1) Performance result:
Perf test data:
*) MLA:
| Configuration | 1st test | 2nd test | Output Token Throughput (Avg) | Performance improvement percentage |
| --- | --- | --- | --- | --- |
| o_proj duplicate prefetch | 11.9669 token/s | 12.0287 token/s | 11.9978 token/s | |
| o_proj no duplicate prefetch | 12.5594 token/s | 12.6216 token/s | 12.5905 token/s | 4.94% |
Single-layer performance improvement: 5% to 8%
*) SFA:
| Configuration | 1st test | 2nd test | Output Token Throughput (Avg) | Performance improvement percentage |
| --- | --- | --- | --- | --- |
| o_proj duplicate prefetch | 13.0523 token/s | 13.1084 token/s | 13.08035 token/s | |
| o_proj no duplicate prefetch | 13.9844 token/s | 14.1678 token/s | 14.0761 token/s | 7.6% |
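
The improvement percentages reported above can be reproduced from the per-run throughputs. The following is a minimal sketch, assuming the percentage is the relative change between the two-run averages; the function and variable names are illustrative and do not come from the PR.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def improvement_pct(baseline_runs, candidate_runs):
    """Relative throughput gain of candidate over baseline, in percent.

    Assumption: each list holds output token throughput (token/s) per run,
    and the runs are averaged before comparison.
    """
    base = mean(baseline_runs)
    cand = mean(candidate_runs)
    return (cand - base) / base * 100.0


# Per-run throughputs copied from the MLA and SFA tables (token/s).
mla_pct = improvement_pct([11.9669, 12.0287], [12.5594, 12.6216])
sfa_pct = improvement_pct([13.0523, 13.1084], [13.9844, 14.1678])

print(f"MLA improvement: {mla_pct:.2f}%")
print(f"SFA improvement: {sfa_pct:.2f}%")
```

With only two runs per configuration, these figures are point estimates; run-to-run variance is not captured.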
- vLLM version: v0.15.0
- vLLM main: d7e17aaacd
---------
Signed-off-by: leo-pony <nengjunma@outlook.com>
2026-02-10 14:14:37 +08:00