chenxi-hh
42bcad7e9b
GMM custom operator optimization in small batch scenarios (#7100)
### What this PR does / why we need it?
Optimizes the GMM custom operator for small-batch scenarios.
### How was this patch tested?
Qwen3-30B, input: 4k, output: 1k

| Batch size | TPOT (ms) | Output token throughput (token/s) |
| --- | --- | --- |
| 1 | 7.9 -> 7.0 | 125.4651 -> 140.6278 |
| 2 | 9.4 -> 8.8 | 211.8187 -> 225.2254 |
| 16 | 13.6 -> 13.5 | 1159.8213 -> 1165.0982 |
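
The relative gains implied by the figures above can be checked with a short script. This is only a sketch for reading the numbers: the `RESULTS` dictionary and the `relative_change` helper are illustrative names, not part of this PR, and the values are copied from the measurements listed above.

```python
# Before/after figures copied from the benchmark results above.
# Keys are batch sizes; each tuple is (before, after).
RESULTS = {
    1: {"tpot_ms": (7.9, 7.0), "throughput": (125.4651, 140.6278)},
    2: {"tpot_ms": (9.4, 8.8), "throughput": (211.8187, 225.2254)},
    16: {"tpot_ms": (13.6, 13.5), "throughput": (1159.8213, 1165.0982)},
}


def relative_change(before: float, after: float) -> float:
    """Return the fractional change from `before` to `after`."""
    return (after - before) / before


if __name__ == "__main__":
    for batch, metrics in RESULTS.items():
        tpot = relative_change(*metrics["tpot_ms"])
        throughput = relative_change(*metrics["throughput"])
        # A negative TPOT change is an improvement (lower latency per token).
        print(f"batch {batch}: TPOT {tpot:+.1%}, throughput {throughput:+.1%}")
```

Read this way, the improvement is concentrated at batch sizes 1 and 2 and is under 1% at batch 16, which matches the small-batch focus of the change.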
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
---------
Signed-off-by: chenxi-hh <chen464822955@163.com>
2026-03-19 16:10:30 +08:00