sglang

Author	SHA1	Message	Date
Yi Zhang	bcbbf519f9	sgl-kernel transfer custom allreduce from trt kernel to vllm kernel (#5079)	2025-04-05 14:23:20 -07:00
Yineng Zhang	3f287b8579	support sgl-kernel on blackwell (#5074)	2025-04-04 16:59:32 -07:00
Xiaoyu Zhang	924ca7c92c	Add DeepSeek V3/R1 shared experts fusion (#4918)	2025-04-04 01:59:29 -07:00
Yineng Zhang	d7954b7682	bump sgl-kernel v0.0.7 (#5046)	2025-04-03 13:38:13 -07:00
yinfan98	b8b6008f47	[Fix] fix fa3 build at cu118 (#5036)	2025-04-03 11:52:35 -07:00
Zhiqiang Xie	9d0b36c47a	fix deepgemm as well (#5030)	2025-04-03 02:41:37 -07:00
Yuhong Guo	7d8c0ce7ce	[Build] Support build sgl-kernel with ccache (#5020)	2025-04-03 00:22:37 -07:00
Zhiqiang Xie	a2aea59b6e	update cutlass tag (#5011)	2025-04-02 18:30:30 -07:00
Xiaoyu Zhang	2c8fd99363	[sgl-kernel] per token group quant support COLUMN MAJOR (#4817)	2025-04-02 18:29:59 -07:00
Yuhong Guo	ee47a6c1c3	[Build] Fix cuda12.8 build error in nvfp4_scaled_mm_kernels.cu (#4953)	2025-03-31 12:00:34 -07:00
Yineng Zhang	6384d31776	bump sgl-kernel v0.0.6 (#4950)	2025-03-31 11:24:09 -07:00
yinfan98	c7457191a0	[Fix] revert clean m.def for cudagraph (#4944)	2025-03-31 02:08:55 -07:00
Yineng Zhang	4814ecaff9	cleanup sgl-kernel (#4933)	2025-03-30 14:12:30 -07:00
yinfan98	37c66ec856	[feat] add fa3 in sgl-kernel (#4902) Co-authored-by: Sleepcoo <Sleepcoo@gmail.com>	2025-03-30 12:57:10 -07:00
Yineng Zhang	195a09f57c	fix bmm fp8 (#4926)	2025-03-30 12:15:20 -07:00
Adarsh Shirawalmath	9fccda3111	[Feature] use pytest for sgl-kernel (#4896)	2025-03-30 10:36:52 -07:00
Yi Zhang	5ec5eaf760	fix allreduce test (#4909)	2025-03-29 23:16:53 -07:00
yinfan98	0d7fe866f9	[Misc] Clean m.def and add Development Tips (#4890)	2025-03-29 23:06:18 -07:00
Yineng Zhang	54b9a2de0a	remove setup for sgl-kernel (#4899)	2025-03-29 12:47:38 -07:00
yinfan98	8e7b31546c	quick fix: add default for new kernel (#4898)	2025-03-29 12:31:59 -07:00
Qingquan Song	45dcfc2e76	Add deepseek style fused moe group gate selection kernel (#4530)	2025-03-29 11:51:45 -07:00
yinfan98	ddf8981d91	Delete test_deep_gemm.py (#4891)	2025-03-29 10:46:11 -07:00
yinfan98	05625b9792	[Docs] Update DeepGEMM at README.md (#4886)	2025-03-29 09:53:39 -07:00
Yineng Zhang	ec3ee0289d	fix sgl-kernel cu118 build (#4872)	2025-03-28 17:23:51 -07:00
Yineng Zhang	92941ce7b5	bump sgl-kernel 0.0.5.post4 (#4768)	2025-03-28 14:40:53 -07:00
Yineng Zhang	2bb0e7cf43	fix sampling issue (#4871)	2025-03-28 14:07:21 -07:00
yinfan98	4db29e82ec	[Feat] support deepgemm for cmake (#4864)	2025-03-28 10:51:44 -07:00
Yineng Zhang	6dea5c96bf	Revert "get the python version from env (#4729)" (#4863)	2025-03-28 08:07:48 -07:00
DavidChan	5eae67cb1f	get the python version from env (#4729)	2025-03-27 22:26:42 -07:00
Yineng Zhang	31dfff7da7	use default for torch.ops (#4835)	2025-03-27 19:09:58 -07:00
Yineng Zhang	8bf6d7f406	support cmake for sgl-kernel (#4706) Co-authored-by: hebiao064 <hebiaobuaa@gmail.com> Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-27 01:42:28 -07:00
Yi Pan	45fdf1f7f3	Fix shared memory OOM on sm86 GPUs. (#4797)	2025-03-26 10:41:53 -07:00
Trevor Morris	e9f8e42318	Support FP4 gemm (1/2) (#3899)	2025-03-24 19:50:23 -07:00
Chunan Zeng	65c24c28f9	[Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396)	2025-03-23 23:44:17 -07:00
Alex Sun	af6535e7aa	[ROCm] Enable MTP (NextN) on AMD GPU (#4631)	2025-03-23 22:58:05 -07:00
AniZpZ	321ab756bc	[1/3] fix dsv3 awq issue (#4556) Co-authored-by: leoneo <1320612015@qq.com>	2025-03-22 01:07:17 -07:00
Chunan Zeng	6a384d5c01	Speed up per token and per tensor quant by 15% (#4639)	2025-03-22 00:37:57 -07:00
Shu Wang	ad4e58bf67	Support fp8 gemm for blackwell (#4558)	2025-03-20 12:40:28 -07:00
strgrb	f9c53cbb42	Create col-major and tma-aligned x_scale for deep_gemm.gemm_fp8_fp8_bf16_nt (#4515) Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>	2025-03-19 00:02:43 -07:00
Yineng Zhang	988ab646ec	bump v0.0.5.post3 (#4520)	2025-03-17 13:05:59 -07:00
Wenbo Yang	75b656488a	Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418)	2025-03-17 00:03:43 -07:00
yiakwy-xpu-ml-framework-team	9b8333d992	[ROCm] enable moe topk softmax in amd (#4448)	2025-03-16 18:16:55 -07:00
Yi Zhang	25e1816eff	fix custom allreduce performance/accuracy problem (#4477)	2025-03-16 12:16:30 -07:00
Ying Sheng	1b859295f4	[Eagle] Remove the greedy branch and some redundant code (#4363) Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-16 02:48:55 -07:00
Yineng Zhang	9971dc2283	Revert "feat: Add FlashMLA submodule (#4449)" (#4470)	2025-03-16 01:30:05 -07:00
Lianmin Zheng	3db35c1af4	Release sgl-kernel v0.0.5.post2 (#4469)	2025-03-16 01:01:53 -07:00
Ying Sheng	52a34d7448	Add greedy verification kernel (#4383)	2025-03-16 00:58:26 -07:00
JieXin Liang	1a3fa75f2f	[Fix] use `torch.cat` instead of `torch.concat` to prevent entering the `Autograd` backends. (#4466)	2025-03-16 00:02:47 -07:00
Shi Shuai	81f431eded	feat: Add FlashMLA submodule (#4449) Co-authored-by: sleepcoo <sleepcoo@gmail.com>	2025-03-15 23:30:25 -07:00
Yineng Zhang	862fe52241	bump v0.0.5.post1 (#4437)	2025-03-14 15:00:26 -07:00

221 Commits