sglang

Author	SHA1	Message	Date
Johnny	e7aa4664b3	[NVIDIA] Build CUDA 13 (#11299) Co-authored-by: ishandhanani <ishandhanani@gmail.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-22 20:03:12 -07:00
Johnny	4b65ed42cc	[NVIDIA] upstream FA4 and fix cccl path (#11929)	2025-10-21 21:18:25 -07:00
Fan Yin	23afdfd1c2	[sgl-kernel] support flashmla libtorch (#11717)	2025-10-21 21:17:50 -07:00
Johnny	252dc4e112	[NVIDIA] FA3/FA4 Fix (#11606) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-19 17:10:10 -07:00
Fan Yin	3289da5b41	[sgl-kernel] support hadamard (#11663)	2025-10-15 19:00:44 -07:00
Yineng Zhang	f792e3c561	Revert "[NVIDIA] BUMP FA3 (#11444)" (#11582)	2025-10-13 20:51:45 -07:00
Johnny	b8c430f1ce	[NVIDIA] BUMP FA3 (#11444) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>	2025-10-13 09:30:57 -07:00
Qi Yuhang	9a30914e94	[sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: PGFLMG <1106310035@qq.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-10-12 20:19:21 -07:00
PGFLMG	8fdcd98efe	[7/n] decouple quantization impl from vllm dependency - gguf kernel (#11019)	2025-10-11 14:04:57 -07:00
fzyzcjy	21337b22b9	Reland [1/2] Optimizations and refactors about quant kernel (#10312) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-10-11 15:59:03 +08:00
Mick	a3c2ea4451	fix: fix revision for sgl-flash-attn in sgl-kernel (#11327)	2025-10-08 15:50:44 -07:00
Lianmin Zheng	148d8d485d	Update DeepGEMM repository tag to specific commit (#11229)	2025-10-05 13:47:36 -07:00
DarkSharpness	e0b2d3eebe	[Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-05 10:19:03 -07:00
Zhihao Zhang	24f7cb1ece	[speculative decoding] rename lookahead to ngram (#11010) Co-authored-by: a4zhangfei <a4zhangfei@qq.com>	2025-09-28 21:06:59 -07:00
Kangyan-Zhou	0c9174108a	Unify SGL Kernel Releases (#10701)	2025-09-28 19:48:28 -07:00
Yuan Luo	616a3e20df	[sgl-kernel] Support moe_sum_reduce cuda kernel (#10321) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-09-19 14:12:09 +08:00
Zhihao Zhang	e7bc600304	[Feature] Speculative decoding support lookahead (#9873) Co-authored-by: a4zhangfei <a4zhangfei@qq.com> Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>	2025-09-18 16:42:41 -07:00
Qi Yuhang	9b876889b7	Update CUTLASS. Refine KernelSchedule for fp8 (grouped) gemm. (#10491)	2025-09-16 02:47:37 -07:00
Yineng Zhang	55025b9282	fix: use latest flashinfer (#10428)	2025-09-14 15:07:14 -07:00
Lianmin Zheng	c9ec4cae5b	Fix the style of sgl kernel (#10398)	2025-09-12 22:20:21 -07:00
Yineng Zhang	de15d1405a	Revert "Fix flashinfer version in sgl-kernel (#10135)" (#10310)	2025-09-11 01:27:58 -07:00
Yi Zhang	8cbe1538ef	Add mamba kernel (#10234)	2025-09-09 12:58:43 -07:00
Yineng Zhang	94fb4e9e54	feat: support fa cute in sgl-kernel (#10205) Co-authored-by: cicirori <32845984+cicirori@users.noreply.github.com>	2025-09-09 00:14:39 -07:00
fzyzcjy	0096798ed6	[1/2] Speed up prefill mla attention (#10156)	2025-09-08 09:00:33 -07:00
Rain Jiang	6049ca209e	move compile threads to an option to avoid OOM on low memory host (#10123)	2025-09-07 21:36:14 -07:00
Lianmin Zheng	76a2c86b88	Fix flashinfer version in sgl-kernel (#10135)	2025-09-07 12:54:07 -07:00
hlu1	5f1eb20484	[chore] Remove unused ep_moe cuda kernels (#9956)	2025-09-06 01:35:50 -07:00
hlu1	039cef76aa	Remove non-accelerated targets(100 and up) from cmake (#10041) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-09-06 01:35:28 -07:00
fzyzcjy	bd7f882142	Support copying tensor from cpu to gpu without using copy engines (#10007)	2025-09-05 20:07:19 +08:00
Lianmin Zheng	d631290e32	Remove annoying warnings in sgl kernel build (#9905)	2025-09-02 20:18:25 -07:00
PGFLMG	7fe89f7cdb	[sgl-kernel] fix: fix missing FetchContent_Populate for fmt (#9826)	2025-08-30 12:57:42 -07:00
Rain Jiang	6b39f9cf8c	Support compile sgl-kernel on cuda 13.0 (#9721)	2025-08-28 10:18:03 -07:00
PGFLMG	aa3eba8eb4	[sgl-kernel] misc: update deepgemm version for sgl-kernel (#9340) Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: fzyzcjy <ch271828n@outlook.com>	2025-08-27 12:01:30 -07:00
Rain Jiang	79e6a8a6ac	support cuda 13.0 and trtllm kernel by Aug 25 2025 (#9495)	2025-08-26 23:13:27 -07:00
Qi Yuhang	fda4792620	Update CUTLASS 4.2 & Enable K-Major Scale Factor for SM90 FP8 Blockwise Group GEMM (#9559)	2025-08-24 23:24:43 -07:00
EduardDurech	720cd308ba	Add `CMakeLists.txt` binary_dir (#7019)	2025-08-18 18:36:33 -07:00
Lianmin Zheng	c480a3f6ea	Minor style fixes for sgl-kernel (#9289)	2025-08-18 09:38:35 -07:00
Liangsheng Yin	4d98e48649	Revert "[Misc] feat: Deepgemm update for sgl-kernel (#8790)" to fix kernel CI (#9260)	2025-08-17 22:59:50 +08:00
Liangsheng Yin	0c8594e67d	Optional extension for green context (#9231)	2025-08-15 21:33:52 +08:00
PGFLMG	a3d99d6dcd	[Misc] feat: Deepgemm update for sgl-kernel (#8790)	2025-08-15 01:05:27 -07:00
Yineng Zhang	9d54c6e6dd	feat: remove sm75 (#9207)	2025-08-14 22:27:14 -07:00
strgrb	1f9d65f57d	use fast math for per_token_group_quant_8bit. (#9177) Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>	2025-08-14 22:19:56 -07:00
Peng Zhang	5aa1ebd242	[2/n]decouple quantization implementation from vLLM dependency (#8112) Co-authored-by: walker-ai <yiyun.wyt@antgroup.com> Co-authored-by: leoneo <1320612015@qq.com>	2025-08-14 03:19:03 -07:00
DarkSharpness	86a0be65d8	[Feature] Support custom set kv buffer kernel (#8884)	2025-08-12 16:56:51 -07:00
Lianmin Zheng	2c7f01bc89	Reorganize CI and test files (#9027)	2025-08-10 12:30:06 -07:00
Yineng Zhang	8e8545caf6	fix: update cmake (#8817)	2025-08-05 09:38:30 -07:00
Qiaolin Yu	fc8c8e5041	Integrate triton_kernels in sgl-kernel (#8762)	2025-08-04 12:12:14 -07:00
Baizhou Zhang	91e3d1542e	Update Cutlass in sgl-kernel to v4.1 (#8392)	2025-07-27 00:36:15 -07:00
Yineng Zhang	4c605235aa	fix: workaround for deepgemm warmup issue (#8302)	2025-07-23 12:01:51 -07:00
Baizhou Zhang	282eb59ff3	Add bf16 output option for dsv3_router_gemm kernel (#7999)	2025-07-20 09:49:37 +08:00

109 Commits