sglang

Author	SHA1	Message	Date
Yineng Zhang	c1dd773c19	fix: use fa3 unit test on hopper only (#5304)	2025-04-11 15:10:49 -07:00
Yineng Zhang	c163bf4ff1	chore: bump sgl-kernel v0.0.8.post1 (#5289)	2025-04-11 02:11:53 -07:00
Yineng Zhang	5598634326	chore: relax the torch version restriction for sgl-kernel compilation (#5288)	2025-04-11 02:05:53 -07:00
Yineng Zhang	b75275b6f2	feat: add cu128 identifier for sgl-kernel (#5287)	2025-04-11 01:58:46 -07:00
Yineng Zhang	7074e9ca20	fix: enable fp4 compilation on cu128 (#5286)	2025-04-11 01:43:44 -07:00
Elfie Guo	a222945df2	Update Makefile / build script to avoid installing incompatible torch dependency (#5245)	2025-04-10 22:21:02 +00:00
PGFLMG	ed01b4515e	[Misc] Clean sgl-kernel test (#5216)	2025-04-10 11:28:41 -07:00
HAI	d050df368c	ROCm sgl-kernel: compatible to later torch (#5167)	2025-04-10 09:18:36 -07:00
Richard Zou	76f44c2a8d	Fix deepseek-v3 with torch.compile in PyTorch 2.6. (#5213)	2025-04-10 09:14:38 -07:00
Xiaoyu Zhang	f730362ee2	reduce moe_align_block_size_kernel small batch mode overhead (#5086)	2025-04-09 17:59:35 -07:00
Yi Zhang	ebf495f013	sgl-kernel use cutlass latest version for fp8 blockwise gemm (#5207)	2025-04-09 11:47:04 -07:00
yinfan98	d2e507df3c	[Misc] clean up vllm in sgl-kernel test (#5189)	2025-04-09 01:22:13 -07:00
Trevor Morris	11d760d56a	FP4 weight loading and inference (2/2) (#3972)	2025-04-08 17:26:21 -07:00
Ma Mingfei	a73c4df438	Add optimized native kernels in sgl-kernel (#5150) Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com> Co-authored-by: YanbingJiang <yanbing.jiang@intel.com> Co-authored-by: blzheng <beilei.zheng@intel.com>	2025-04-08 09:37:46 -07:00
yinfan98	9798e72baa	[Misc] Use pytest.mark.skipif in sgl-kernel test (#5137)	2025-04-07 21:35:14 -07:00
Yineng Zhang	496dde8491	bump sgl-kernel 0.0.8 (#5089)	2025-04-05 14:28:04 -07:00
Yi Zhang	bcbbf519f9	sgl-kernel transfer custom allreduce from trt kernel to vllm kernel (#5079)	2025-04-05 14:23:20 -07:00
Yineng Zhang	3f287b8579	support sgl-kernel on blackwell (#5074)	2025-04-04 16:59:32 -07:00
Xiaoyu Zhang	924ca7c92c	Add DeepSeek V3/R1 shared experts fusion (#4918)	2025-04-04 01:59:29 -07:00
Yineng Zhang	d7954b7682	bump sgl-kernel v0.0.7 (#5046)	2025-04-03 13:38:13 -07:00
yinfan98	b8b6008f47	[Fix] fix fa3 build at cu118 (#5036)	2025-04-03 11:52:35 -07:00
Zhiqiang Xie	9d0b36c47a	fix deepgemm as well (#5030)	2025-04-03 02:41:37 -07:00
Yuhong Guo	7d8c0ce7ce	[Build] Support build sgl-kernel with ccache (#5020)	2025-04-03 00:22:37 -07:00
Zhiqiang Xie	a2aea59b6e	update cutlass tag (#5011)	2025-04-02 18:30:30 -07:00
Xiaoyu Zhang	2c8fd99363	[sgl-kernel] per token group quant support COLUMN MAJOR (#4817)	2025-04-02 18:29:59 -07:00
Yuhong Guo	ee47a6c1c3	[Build] Fix cuda12.8 build error in nvfp4_scaled_mm_kernels.cu (#4953)	2025-03-31 12:00:34 -07:00
Yineng Zhang	6384d31776	bump sgl-kernel v0.0.6 (#4950)	2025-03-31 11:24:09 -07:00
yinfan98	c7457191a0	[Fix] revert clean m.def for cudagraph (#4944)	2025-03-31 02:08:55 -07:00
Yineng Zhang	4814ecaff9	cleanup sgl-kernel (#4933)	2025-03-30 14:12:30 -07:00
yinfan98	37c66ec856	[feat] add fa3 in sgl-kernel (#4902) Co-authored-by: Sleepcoo <Sleepcoo@gmail.com>	2025-03-30 12:57:10 -07:00
Yineng Zhang	195a09f57c	fix bmm fp8 (#4926)	2025-03-30 12:15:20 -07:00
Adarsh Shirawalmath	9fccda3111	[Feature] use pytest for sgl-kernel (#4896)	2025-03-30 10:36:52 -07:00
Yi Zhang	5ec5eaf760	fix allreduce test (#4909)	2025-03-29 23:16:53 -07:00
yinfan98	0d7fe866f9	[Misc] Clean m.def and add Development Tips (#4890)	2025-03-29 23:06:18 -07:00
Yineng Zhang	54b9a2de0a	remove setup for sgl-kernel (#4899)	2025-03-29 12:47:38 -07:00
yinfan98	8e7b31546c	quick fix: add default for new kernel (#4898)	2025-03-29 12:31:59 -07:00
Qingquan Song	45dcfc2e76	Add deepseek style fused moe group gate selection kernel (#4530)	2025-03-29 11:51:45 -07:00
yinfan98	ddf8981d91	Delete test_deep_gemm.py (#4891)	2025-03-29 10:46:11 -07:00
yinfan98	05625b9792	[Docs] Update DeepGEMM at README.md (#4886)	2025-03-29 09:53:39 -07:00
Yineng Zhang	ec3ee0289d	fix sgl-kernel cu118 build (#4872)	2025-03-28 17:23:51 -07:00
Yineng Zhang	92941ce7b5	bump sgl-kernel 0.0.5.post4 (#4768)	2025-03-28 14:40:53 -07:00
Yineng Zhang	2bb0e7cf43	fix sampling issue (#4871)	2025-03-28 14:07:21 -07:00
yinfan98	4db29e82ec	[Feat] support deepgemm for cmake (#4864)	2025-03-28 10:51:44 -07:00
Yineng Zhang	6dea5c96bf	Revert "get the python version from env (#4729)" (#4863)	2025-03-28 08:07:48 -07:00
DavidChan	5eae67cb1f	get the python version from env (#4729)	2025-03-27 22:26:42 -07:00
Yineng Zhang	31dfff7da7	use default for torch.ops (#4835)	2025-03-27 19:09:58 -07:00
Yineng Zhang	8bf6d7f406	support cmake for sgl-kernel (#4706) Co-authored-by: hebiao064 <hebiaobuaa@gmail.com> Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-27 01:42:28 -07:00
Yi Pan	45fdf1f7f3	Fix shared memory OOM on sm86 GPUs. (#4797)	2025-03-26 10:41:53 -07:00
Trevor Morris	e9f8e42318	Support FP4 gemm (1/2) (#3899)	2025-03-24 19:50:23 -07:00
Chunan Zeng	65c24c28f9	[Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396)	2025-03-23 23:44:17 -07:00

237 Commits