Commit Graph

50 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| DefTruth | 12ef7e3bc3 | bugfix: fix merge_state_v2 cuda graph (#5419) | 2025-04-15 10:18:47 -07:00 |
| DefTruth | 388e15c0db | kernel: support slightly faster merge_state_v2 cuda kernel (#5381) | 2025-04-14 21:28:23 -07:00 |
| Yineng Zhang | b62e7e99b8 | feat: adapt merge_state (#5337) | 2025-04-12 21:14:04 -07:00 |
| Yineng Zhang | 812e82f35e | fix: solve cu118 issue for cutlass mla (#5331) | 2025-04-12 12:51:09 -07:00 |
| PGFLMG | 4879e50c6d | [Feat] Add sparse attn to sgl-kernel (#5327) | 2025-04-12 11:36:36 -07:00 |
| Zhaoyi Li | 3c9740d200 | update variable naming and comments for rocm (#5299) | 2025-04-11 23:15:05 -07:00 |
| Trevor Morris | f65b8d5c89 | Blackwell Cutlass MLA kernel (#5142) | 2025-04-11 22:16:51 -07:00 |
| Yineng Zhang | 136b8e6afb | fix: remove cublas_grouped_gemm (#5307) | 2025-04-11 16:22:37 -07:00 |
| Richard Zou | 76f44c2a8d | Fix deepseek-v3 with torch.compile in PyTorch 2.6. (#5213) | 2025-04-10 09:14:38 -07:00 |
| Xiaoyu Zhang | f730362ee2 | reduce moe_align_block_size_kernel small batch mode overhead (#5086) | 2025-04-09 17:59:35 -07:00 |
| Yi Zhang | ebf495f013 | sgl-kernel use cutlass latest version for fp8 blockwise gemm (#5207) | 2025-04-09 11:47:04 -07:00 |
| Ma Mingfei | a73c4df438 | Add optimized native kernels in sgl-kernel (#5150) (Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>, YanbingJiang <yanbing.jiang@intel.com>, blzheng <beilei.zheng@intel.com>) | 2025-04-08 09:37:46 -07:00 |
| Yi Zhang | bcbbf519f9 | sgl-kernel transfer custom allreduce from trt kernel to vllm kernel (#5079) | 2025-04-05 14:23:20 -07:00 |
| yinfan98 | b8b6008f47 | [Fix] fix fa3 build at cu118 (#5036) | 2025-04-03 11:52:35 -07:00 |
| Xiaoyu Zhang | 2c8fd99363 | [sgl-kernel] per token group quant support COLUMN MAJOR (#4817) | 2025-04-02 18:29:59 -07:00 |
| Yuhong Guo | ee47a6c1c3 | [Build] Fix cuda12.8 build error in nvfp4_scaled_mm_kernels.cu (#4953) | 2025-03-31 12:00:34 -07:00 |
| yinfan98 | c7457191a0 | [Fix] revert clean m.def for cudagraph (#4944) | 2025-03-31 02:08:55 -07:00 |
| yinfan98 | 37c66ec856 | [feat] add fa3 in sgl-kernel (#4902) (Co-authored-by: Sleepcoo <Sleepcoo@gmail.com>) | 2025-03-30 12:57:10 -07:00 |
| Yineng Zhang | 195a09f57c | fix bmm fp8 (#4926) | 2025-03-30 12:15:20 -07:00 |
| yinfan98 | 0d7fe866f9 | [Misc] Clean m.def and add Development Tips (#4890) | 2025-03-29 23:06:18 -07:00 |
| Qingquan Song | 45dcfc2e76 | Add deepseek style fused moe group gate selection kernel (#4530) | 2025-03-29 11:51:45 -07:00 |
| Yineng Zhang | ec3ee0289d | fix sgl-kernel cu118 build (#4872) | 2025-03-28 17:23:51 -07:00 |
| Yineng Zhang | 8bf6d7f406 | support cmake for sgl-kernel (#4706) (Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>, yinfan98 <1106310035@qq.com>) | 2025-03-27 01:42:28 -07:00 |
| Yi Pan | 45fdf1f7f3 | Fix shared memory OOM on sm86 GPUs. (#4797) | 2025-03-26 10:41:53 -07:00 |
| Trevor Morris | e9f8e42318 | Support FP4 gemm (1/2) (#3899) | 2025-03-24 19:50:23 -07:00 |
| Chunan Zeng | 65c24c28f9 | [Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396) | 2025-03-23 23:44:17 -07:00 |
| Alex Sun | af6535e7aa | [ROCm] Enable MTP (NextN) on AMD GPU (#4631) | 2025-03-23 22:58:05 -07:00 |
| AniZpZ | 321ab756bc | [1/3] fix dsv3 awq issue (#4556) (Co-authored-by: leoneo <1320612015@qq.com>) | 2025-03-22 01:07:17 -07:00 |
| Chunan Zeng | 6a384d5c01 | Speed up per token and per tensor quant by 15% (#4639) | 2025-03-22 00:37:57 -07:00 |
| Shu Wang | ad4e58bf67 | Support fp8 gemm for blackwell (#4558) | 2025-03-20 12:40:28 -07:00 |
| Wenbo Yang | 75b656488a | Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418) | 2025-03-17 00:03:43 -07:00 |
| yiakwy-xpu-ml-framework-team | 9b8333d992 | [ROCm] enable moe topk softmax in amd (#4448) | 2025-03-16 18:16:55 -07:00 |
| Yi Zhang | 25e1816eff | fix custom allreduce performance/accuracy problem (#4477) | 2025-03-16 12:16:30 -07:00 |
| Ying Sheng | 1b859295f4 | [Eagle] Remove the greedy branch and some redundant code (#4363) (Co-authored-by: Sehoon Kim <sehoon@x.ai>) | 2025-03-16 02:48:55 -07:00 |
| Ying Sheng | 52a34d7448 | Add greedy verification kernel (#4383) | 2025-03-16 00:58:26 -07:00 |
| Qingquan Song | 61e4433caf | Add moe topk softmax templated from vllm (#4302) | 2025-03-14 12:03:33 -07:00 |
| Yineng Zhang | 2937387a50 | fix accuracy issue (#4376) | 2025-03-13 02:06:22 -07:00 |
| Qingquan Song | 4068e01292 | Fix per token fp8 quant precision (#4362) | 2025-03-12 21:19:05 -07:00 |
| Shi Shuai | 817d43705c | feat: support ep size < 32 for sgl kernel (#4348) | 2025-03-12 20:50:46 -07:00 |
| Elfie Guo | 7c86671131 | Support Blackwell Block Scale FP8 Gemm (#4278) | 2025-03-12 14:17:11 -07:00 |
| Rex | 07f944631e | Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104) | 2025-03-12 00:10:02 -07:00 |
| Stefan He | e0917e6bd0 | Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215) (Co-authored-by: Stefan He <bhe@linkedin.com>) | 2025-03-12 00:08:03 -07:00 |
| yigex | 690e1f2371 | [AMD] Fix rocm sgl-kernel missing modules error (#4311) (Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com>) | 2025-03-11 10:35:28 -07:00 |
| Lianmin Zheng | cf0ccd406e | Optimize rope in sgl kernel (#4267) | 2025-03-10 10:07:45 -07:00 |
| Xiaoyu Zhang | 23308a9032 | fix per_token_group_quant_fp8 illegal memory when num_groups % 16 != 0 (#4231) | 2025-03-10 01:42:58 -07:00 |
| Lianmin Zheng | aa957102a9 | Simplify tests & Fix trtllm custom allreduce registration (#4252) | 2025-03-10 01:24:22 -07:00 |
| Lianmin Zheng | 7c0541b385 | Move activation.cu to sgl-kernel/elementwise (#4250) | 2025-03-09 22:41:13 -07:00 |
| Lianmin Zheng | 730d084f2a | Minor style fix for sgl-kernel (#4243) | 2025-03-09 20:15:13 -07:00 |
| Lianmin Zheng | eb06dbcbf8 | Move rope and bmm into sgl-kernel (#4241) | 2025-03-09 18:38:15 -07:00 |
| Lianmin Zheng | 8abf74e3c9 | Rename files in sgl kernel to avoid nested folder structure (#4213) (Co-authored-by: zhyncs <me@zhyncs.com>) | 2025-03-08 22:54:51 -08:00 |