sglang

Author	SHA1	Message	Date
Yineng Zhang	6578cf27de	chore: bump sgl-kernel 0.1.2 (#6131)	2025-05-08 15:16:28 -07:00
Stefan He	087751a8f2	Remove unecessary is_fa3_supported check (#6112)	2025-05-08 14:45:33 -07:00
Yineng Zhang	d353d08b4e	chore: bump sgl-kernel 0.1.1 (#5932)	2025-04-30 14:01:49 -07:00
PGFLMG	08acdb5c3d	[Feat] Scale up fa3 kernel to sm8x arch (#5912) Co-authored-by: zhyncs <me@zhyncs.com>	2025-04-30 13:59:36 -07:00
Johnny	2c7dbb7cc2	[FEATURE] Enhance platform compatibility for ARM (#5746)	2025-04-29 15:06:16 -07:00
PGFLMG	ee71ed8a41	[Feat] QWen-1M context support[1/2]: Update block sparse attention backend utils kernel (#5847) Co-authored-by: sighingnow <sighingnow@gmail.com>	2025-04-28 11:03:17 -07:00
Trevor Morris	84810da4ae	Add Cutlass MLA attention backend (#5390)	2025-04-27 20:58:53 -07:00
Yineng Zhang	7d0edf3cae	chore: bump sgl-kernel 0.1.0 (#5688)	2025-04-23 14:23:59 -07:00
Yineng Zhang	15fabcc07f	fix sgl-kernel unit tests (#5666)	2025-04-23 01:18:30 -07:00
Elfie Guo	e62c49557d	[1/2] Add FP8 Blockscale MoE CUTLASS kernel for Blackwell (#5281)	2025-04-22 22:28:20 -07:00
Yubo Wang	20f1c8e374	Fix sampler nan check when calling top_k_top_p_sampling_from_probs (#5546)	2025-04-19 21:47:23 -07:00
Yineng Zhang	f28d82997a	chore: bump sgl-kernel 0.0.9.post2 (#5518)	2025-04-17 23:42:39 -07:00
Xiaoyu Zhang	8e09b37077	Sgl kernel fused_moe_gate support n_shared_experts (#5440)	2025-04-17 23:05:15 -07:00
PGFLMG	c08a717c77	[Feat] Update sgl-kernel flashinfer to latest main version (#5500) Co-authored-by: zhyncs <me@zhyncs.com>	2025-04-17 12:43:23 -07:00
Trevor Morris	e8f62b20ca	BLackwell cutlass mla: Add check for bad page size/block num combinations (#5431)	2025-04-15 14:07:42 -07:00
Yineng Zhang	6f509d5503	chore: bump sgl-kernel v0.0.9.post1 (#5430)	2025-04-15 11:00:21 -07:00
Yineng Zhang	e940dc4f06	chore: bump sgl-kernel 0.0.9 (#5400)	2025-04-14 21:34:04 -07:00
DefTruth	388e15c0db	kernel: support slightly faster merge_state_v2 cuda kernel (#5381)	2025-04-14 21:28:23 -07:00
Yineng Zhang	b62e7e99b8	feat: adapt merge_state (#5337)	2025-04-12 21:14:04 -07:00
Yineng Zhang	b371f7cd36	chore: bump sgl-kernel v0.0.8.post3 (#5332)	2025-04-12 12:53:37 -07:00
PGFLMG	4879e50c6d	[Feat] Add sparse attn to sgl-kernel (#5327)	2025-04-12 11:36:36 -07:00
Yineng Zhang	115ae2e728	chore: bump sgl-kernel v0.0.8.post2 (#5317)	2025-04-11 23:42:03 -07:00
Baizhou Zhang	e4155e96d0	Add flash_attn_varlen_func to sgl-kernel (#5315)	2025-04-11 23:36:36 -07:00
Trevor Morris	f65b8d5c89	Blackwell Cutlass MLA kernel (#5142)	2025-04-11 22:16:51 -07:00
Yineng Zhang	4f288113ce	fix: update flash attn (#5308)	2025-04-11 16:23:09 -07:00
Yineng Zhang	136b8e6afb	fix: remove cublas_grouped_gemm (#5307)	2025-04-11 16:22:37 -07:00
Yineng Zhang	c163bf4ff1	chore: bump sgl-kernel v0.0.8.post1 (#5289)	2025-04-11 02:11:53 -07:00
Yineng Zhang	496dde8491	bump sgl-kernel 0.0.8 (#5089)	2025-04-05 14:28:04 -07:00
Yi Zhang	bcbbf519f9	sgl-kernel transfer custom allreduce from trt kernel to vllm kernel (#5079)	2025-04-05 14:23:20 -07:00
Yineng Zhang	d7954b7682	bump sgl-kernel v0.0.7 (#5046)	2025-04-03 13:38:13 -07:00
yinfan98	b8b6008f47	[Fix] fix fa3 build at cu118 (#5036)	2025-04-03 11:52:35 -07:00
Yineng Zhang	6384d31776	bump sgl-kernel v0.0.6 (#4950)	2025-03-31 11:24:09 -07:00
yinfan98	37c66ec856	[feat] add fa3 in sgl-kernel (#4902) Co-authored-by: Sleepcoo <Sleepcoo@gmail.com>	2025-03-30 12:57:10 -07:00
yinfan98	0d7fe866f9	[Misc] Clean m.def and add Development Tips (#4890)	2025-03-29 23:06:18 -07:00
yinfan98	8e7b31546c	quick fix: add default for new kernel (#4898)	2025-03-29 12:31:59 -07:00
Qingquan Song	45dcfc2e76	Add deepseek style fused moe group gate selection kernel (#4530)	2025-03-29 11:51:45 -07:00
Yineng Zhang	92941ce7b5	bump sgl-kernel 0.0.5.post4 (#4768)	2025-03-28 14:40:53 -07:00
Yineng Zhang	31dfff7da7	use default for torch.ops (#4835)	2025-03-27 19:09:58 -07:00
Yineng Zhang	8bf6d7f406	support cmake for sgl-kernel (#4706) Co-authored-by: hebiao064 <hebiaobuaa@gmail.com> Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-27 01:42:28 -07:00
Trevor Morris	e9f8e42318	Support FP4 gemm (1/2) (#3899)	2025-03-24 19:50:23 -07:00
Chunan Zeng	65c24c28f9	[Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396)	2025-03-23 23:44:17 -07:00
Yineng Zhang	988ab646ec	bump v0.0.5.post3 (#4520)	2025-03-17 13:05:59 -07:00
Lianmin Zheng	3db35c1af4	Release sgl-kernel v0.0.5.post2 (#4469)	2025-03-16 01:01:53 -07:00
Ying Sheng	52a34d7448	Add greedy verification kernel (#4383)	2025-03-16 00:58:26 -07:00
Yineng Zhang	862fe52241	bump v0.0.5.post1 (#4437)	2025-03-14 15:00:26 -07:00
Qingquan Song	61e4433caf	Add moe topk softmax templated from vllm (#4302)	2025-03-14 12:03:33 -07:00
Yineng Zhang	2a4cbad8e9	bump 0.0.5 sgl-kernel (#4377)	2025-03-13 02:08:35 -07:00
Yineng Zhang	6e7239f912	release 0.0.4.post3 sgl-kernel (#4331)	2025-03-12 01:05:16 -07:00
Yineng Zhang	0a3960f21f	fix awq_dequantize (#4333)	2025-03-12 01:04:38 -07:00
Rex	07f944631e	Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104)	2025-03-12 00:10:02 -07:00

55 Commits