sglang

Author	SHA1	Message	Date
Xiaoyu Zhang	7a4309cc8a	[sgl-kernel performance] fix fp8 quant kernels dispatch __nv_fp8_e4m3 bug to improve performance by 10%-20% (#8499) Co-authored-by: Ke Bao <ispobaoke@gmail.com>	2025-07-29 23:31:54 +08:00
strgrb	fb4ce17de6	Fix per_token_group_quant_8bit when hidden_dim // group_size is not divisible by 4. (#8449) Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>	2025-07-28 01:32:46 -07:00
likesen-alibaba	4a0d19198b	Fix bug of deepseek-v3 under DP+EP mode with large batchsize/seqlen (#6449)	2025-07-10 01:19:56 -07:00
fzyzcjy	5c66c4424f	Support new DeepGEMM format in per token group quant (#7146)	2025-06-13 02:00:22 -07:00
Xiaoyu Zhang	2c8fd99363	[sgl-kernel] per token group quant support COLUMN MAJOR (#4817)	2025-04-02 18:29:59 -07:00
Chunan Zeng	65c24c28f9	[Quant Kernel] refactored per token group quant fp8 to support int8 up to 2x faster (#4396)	2025-03-23 23:44:17 -07:00