chenxi-hh
42bcad7e9b
GMM custom operator optimization in small batch scenarios (#7100)
### What this PR does / why we need it?
Optimizes the GMM custom operator for small-batch scenarios.
### How was this patch tested?
Qwen3-30B, input: 4k, output: 1k

| Batch size | TPOT (ms) | Output token throughput (tokens/s) |
|---|---|---|
| 1 | 7.9 -> 7.0 | 125.4651 -> 140.6278 |
| 2 | 9.4 -> 8.8 | 211.8187 -> 225.2254 |
| 16 | 13.6 -> 13.5 | 1159.8213 -> 1165.0982 |
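
The before/after figures above can be turned into relative changes with a short script. This is only an illustrative sketch: the names `RESULTS`, `relative_change`, and `summarize` are hypothetical and not part of this patch, and the numbers are copied verbatim from the results above.

```python
# Benchmark figures copied from the results above, as (before, after) pairs.
# All names here are illustrative and not part of the patch.
RESULTS = {
    1: {"tpot_ms": (7.9, 7.0), "throughput": (125.4651, 140.6278)},
    2: {"tpot_ms": (9.4, 8.8), "throughput": (211.8187, 225.2254)},
    16: {"tpot_ms": (13.6, 13.5), "throughput": (1159.8213, 1165.0982)},
}


def relative_change(before, after):
    """Return the signed percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


def summarize(results):
    """Compute the percentage change of each metric for every batch size."""
    summary = {}
    for batch, metrics in results.items():
        summary[batch] = {
            "tpot_pct": relative_change(*metrics["tpot_ms"]),
            "throughput_pct": relative_change(*metrics["throughput"]),
        }
    return summary


if __name__ == "__main__":
    for batch, s in summarize(RESULTS).items():
        print(
            f"batch {batch}: TPOT {s['tpot_pct']:+.1f}%, "
            f"throughput {s['throughput_pct']:+.1f}%"
        )
```

On these figures the throughput gain is roughly 12% at batch 1, roughly 6% at batch 2, and under 1% at batch 16, which matches the small-batch focus of the change.
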
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
---------
Signed-off-by: chenxi-hh <chen464822955@163.com>
2026-03-19 16:10:30 +08:00