sglang

Author	SHA1	Message	Date
maxiao1	75cd34d172	change sgl_kernel WARP_SIZE to 64	2025-11-03 10:17:53 +08:00
maxiao1	32b1ccaf62	Modify setup_hip.py under sgl-kernel	2025-10-25 13:11:02 +08:00
maxiao	251235c229	Adapt to v0.5.4	2025-10-25 12:16:25 +08:00
blzheng	13fb8b5489	[CPU] Optimize FP16 decode_attention_cpu (#10652 )	2025-10-22 21:39:51 -07:00
Zaili Wang	007b849b0e	[CPU] misc updates (#11906 )	2025-10-22 21:10:05 -07:00
Johnny	e7aa4664b3	[NVIDIA] Build CUDA 13 (#11299 ) Co-authored-by: ishandhanani <ishandhanani@gmail.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-22 20:03:12 -07:00
Johnny	4b65ed42cc	[NVIDIA] upstream FA4 and fix cccl path (#11929 )	2025-10-21 21:18:25 -07:00
Fan Yin	23afdfd1c2	[sgl-kernel] support flashmla libtorch (#11717 )	2025-10-21 21:17:50 -07:00
Serge Panev	2b1da821b5	[NVIDIA] Add new SMs support for Spark & Thor (#11287 ) Signed-off-by: Serge Panev <spanev@nvidia.com>	2025-10-22 02:02:24 +08:00
Yuan Luo	271d3d0d50	Support mrope triton kernel and add unit test (#11722 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>	2025-10-20 11:51:07 +08:00
sglang-bot	283c8ba031	chore: bump sgl-kernel version to 0.3.16.post3 (#11733 )	2025-10-19 21:44:15 -05:00
Kangyan-Zhou	27a223aba4	Improve Kernel Build Time (#11508 )	2025-10-19 18:11:48 -07:00
hlu1	3b80232d06	[DeepseekV32] Add fast_topk_transform_ragged_fused kernel (#11815 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-10-19 17:13:39 -07:00
Johnny	252dc4e112	[NVIDIA] FA3/FA4 Fix (#11606 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-19 17:10:10 -07:00
fzyzcjy	a27825ae01	Support not officially supported high sgl-kernel version with low srt version (#11786 )	2025-10-19 16:11:59 +08:00
Fan Yin	3289da5b41	[sgl-kernel] support hadamard (#11663 )	2025-10-15 19:00:44 -07:00
Fan Yin	5464457251	[sgl-kernel] Optimize gguf test (#11667 )	2025-10-15 15:45:53 -07:00
Qi Yuhang	6c01844f45	[sgl-kernel][3/N]Support Expert Specialization Grouped GEMM (#11674 )	2025-10-15 13:39:31 -07:00
fzyzcjy	32803fb279	Super tiny improve FA3 import error message (#11590 )	2025-10-14 22:06:31 -07:00
sglang-bot	98923880bc	chore: bump sgl-kernel version to 0.3.16.post2 (#11583 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-13 20:52:38 -07:00
Yineng Zhang	f792e3c561	Revert "[NVIDIA] BUMP FA3 (#11444 )" (#11582 )	2025-10-13 20:51:45 -07:00
sglang-bot	60b0503227	chore: bump sgl-kernel version to 0.3.16.post1 (#11573 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-13 16:26:18 -07:00
Qi Yuhang	dc48c4c0e3	[sgl-kernel][2/N]Support Expert Specialization Grouped GEMM (#11534 )	2025-10-13 16:24:48 -07:00
Johnny	b8c430f1ce	[NVIDIA] BUMP FA3 (#11444 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>	2025-10-13 09:30:57 -07:00
Qi Yuhang	9a30914e94	[sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: PGFLMG <1106310035@qq.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-10-12 20:19:21 -07:00
sglang-bot	2db2cddd12	chore: bump sgl-kernel version to 0.3.16 (#11476 )	2025-10-11 22:04:49 -07:00
PGFLMG	8fdcd98efe	[7/n] decouple quantization impl from vllm dependency - gguf kernel (#11019 )	2025-10-11 14:04:57 -07:00
fzyzcjy	21337b22b9	Reland [1/2] Optimizations and refactors about quant kernel (#10312 ) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-10-11 15:59:03 +08:00
Lianmin Zheng	9b8ebb2798	move more files under srt/utils (#11285 )	2025-10-09 16:46:15 -07:00
Mick	a3c2ea4451	fix: fix revision for sgl-flash-attn in sgl-kernel (#11327 )	2025-10-08 15:50:44 -07:00
Yuan Luo	4f42c8cd3e	[sgl-kernel] Support float64 moe_sum_reduce cuda kernel (#11068 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-10-07 14:31:11 +00:00
sglang-bot	8c9670375f	chore: bump sgl-kernel version to 0.3.15 (#11281 )	2025-10-06 18:17:51 -07:00
Yineng Zhang	fb27d38305	docs: update sgl-kernel README (#11286 )	2025-10-06 17:55:22 -07:00
Lifu Huang	748f86f3de	[Bug] Fix incorrect assertion in FA4 and add UT. (#11182 )	2025-10-06 14:58:39 -07:00
Lianmin Zheng	d645ae90a3	Rename runner labels (#11228 )	2025-10-05 18:05:41 -07:00
Lianmin Zheng	148d8d485d	Update DeepGEMM repository tag to specific commit (#11229 )	2025-10-05 13:47:36 -07:00
PGFLMG	1a599509cc	chore: bump sgl-kernel v0.3.14.post1 (#11137 )	2025-10-05 13:46:43 -07:00
DarkSharpness	e0b2d3eebe	[Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-05 10:19:03 -07:00
PGFLMG	580051c5a8	chore: bump sgl-kernel v0.3.14 (#11067 )	2025-09-30 02:53:24 -07:00
Xiaoyu Zhang	11965b0daf	Fix sgl-kernel benchmark dead code (#11022 )	2025-09-29 15:06:40 +08:00
Zhihao Zhang	24f7cb1ece	[speculative decoding] rename lookahead to ngram (#11010 ) Co-authored-by: a4zhangfei <a4zhangfei@qq.com>	2025-09-28 21:06:59 -07:00
Lifu Huang	e98d9346c7	[1/2] Support FA4 for MHA Prefill in sgl-kernel (#10940 )	2025-09-28 19:59:14 -07:00
Kangyan-Zhou	0c9174108a	Unify SGL Kernel Releases (#10701 )	2025-09-28 19:48:28 -07:00
Lianmin Zheng	07440f5f34	Fix FusedSetKVBufferArg in RotaryEmbedding (#11003 )	2025-09-28 11:17:27 -07:00
Yuan Luo	42245551ef	[sgl-kernel] Optimize concat_mla_k kernel (#10543 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: PGFLMG <1106310035@qq.com>	2025-09-28 23:04:22 +08:00
Lianmin Zheng	35ec2a45a8	[minor] Remove deprecated function `get_ip` (#10883 )	2025-09-25 16:18:04 -07:00
Yuhao Yao	fe531d6f4e	[Bug] Fix Issue#10215 (#10572 )	2025-09-25 09:51:50 +08:00
Xiaoyu Zhang	c4e314f986	Restruct sgl-kernel benchmark (#10861 )	2025-09-25 07:45:25 +08:00
Yineng Zhang	e53df7c009	chore: bump sgl-kernel v0.3.12 (#10732 )	2025-09-22 14:39:25 -07:00
Qi Yuhang	0f04a5f428	Optimize cutlass int8 gemm kernel for large M on SM89 Ada GPU (#10714 )	2025-09-21 17:04:27 -07:00

587 Commits