chenxi-hh
42bcad7e9b
GMM custom operator optimization in small batch scenarios (#7100)
### What this PR does / why we need it?
Optimizes the GMM custom operator for small-batch scenarios.
### How was this patch tested?
Qwen3-30B, input: 4k, output: 1k

| Batch | TPOT before (ms) | TPOT after (ms) | Output token throughput before (token/s) | Output token throughput after (token/s) |
|---|---|---|---|---|
| 1 | 7.9 | 7.0 | 125.4651 | 140.6278 |
| 2 | 9.4 | 8.8 | 211.8187 | 225.2254 |
| 16 | 13.6 | 13.5 | 1159.8213 | 1165.0982 |
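
The relative gains implied by the figures above can be checked with a short script. This is a minimal sketch: the `RESULTS` layout and the `pct_change` helper are illustrative names, not part of the PR, and the numbers are copied from the measurements above. It shows the gain shrinking as batch size grows (roughly 11% lower TPOT at batch 1, under 1% at batch 16).

```python
# Before/after figures copied from the benchmark results above.
# Layout (an assumption for illustration):
#   batch size -> ((TPOT before, after) in ms,
#                  (output token throughput before, after) in token/s)
RESULTS = {
    1: ((7.9, 7.0), (125.4651, 140.6278)),
    2: ((9.4, 8.8), (211.8187, 225.2254)),
    16: ((13.6, 13.5), (1159.8213, 1165.0982)),
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for batch, (tpot, throughput) in RESULTS.items():
    print(
        f"batch {batch}: "
        f"TPOT {pct_change(*tpot):+.1f}%, "
        f"throughput {pct_change(*throughput):+.1f}%"
    )
```

A negative TPOT change and a positive throughput change both indicate an improvement.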
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
---------
Signed-off-by: chenxi-hh <chen464822955@163.com>
2026-03-19 16:10:30 +08:00