chenxi-hh
42bcad7e9b
GMM custom operator optimization in small batch scenarios (#7100)
### What this PR does / why we need it?
Optimizes the GMM custom operator for small-batch scenarios.
### How was this patch tested?
Model: Qwen3-30B, input: 4k, output: 1k

| Batch | TPOT before (ms) | TPOT after (ms) | Output token throughput before (token/s) | Output token throughput after (token/s) |
|-------|------------------|-----------------|------------------------------------------|-----------------------------------------|
| 1     | 7.9              | 7.0             | 125.4651                                 | 140.6278                                |
| 2     | 9.4              | 8.8             | 211.8187                                 | 225.2254                                |
| 16    | 13.6             | 13.5            | 1159.8213                                | 1165.0982                               |
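
To make the size of the change easier to compare across batch sizes, the before/after figures above can be turned into relative changes. The following is a minimal sketch for illustration only; the names `results`, `pct_change`, and `summarize` are hypothetical helpers and not part of this patch, and the numbers are copied from the test results above.

```python
# Before/after figures copied from the test results above.
# Keys are batch sizes; each tuple is (before, after).
results = {
    1: {"tpot_ms": (7.9, 7.0), "throughput": (125.4651, 140.6278)},
    2: {"tpot_ms": (9.4, 8.8), "throughput": (211.8187, 225.2254)},
    16: {"tpot_ms": (13.6, 13.5), "throughput": (1159.8213, 1165.0982)},
}


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def summarize(data):
    """Return the percent change of TPOT and throughput per batch size."""
    summary = {}
    for batch, metrics in data.items():
        summary[batch] = {
            "tpot_pct": pct_change(*metrics["tpot_ms"]),
            "throughput_pct": pct_change(*metrics["throughput"]),
        }
    return summary


if __name__ == "__main__":
    for batch, s in summarize(results).items():
        print(
            f"batch {batch}: TPOT {s['tpot_pct']:+.1f}%, "
            f"throughput {s['throughput_pct']:+.1f}%"
        )
```

With these figures, TPOT drops by roughly 11% and throughput rises by roughly 12% at batch 1, while at batch 16 both changes are under 1%, which matches the small-batch focus of this PR.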
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
---------
Signed-off-by: chenxi-hh <chen464822955@163.com>
2026-03-19 16:10:30 +08:00