Commit Graph

2796 Commits

Author SHA1 Message Date
Yineng Zhang
b371f7cd36 chore: bump sgl-kernel v0.0.8.post3 (#5332) 2025-04-12 12:53:37 -07:00
Yineng Zhang
812e82f35e fix: solve cu118 issue for cutlass mla (#5331) 2025-04-12 12:51:09 -07:00
PGFLMG
4879e50c6d [Feat] Add sparse attn to sgl-kernel (#5327) 2025-04-12 11:36:36 -07:00
tianlian yi
bc92107b03 Support server based rollout in Verlengine (#4848) 2025-04-12 10:07:52 -07:00
Co-authored-by: Jin Pan <jpan236@wisc.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>
Xiaoyu Zhang
3e4794aad8 refine fused_moe tuning docs (#5294) 2025-04-12 10:01:13 -07:00
Xiaoyu Zhang
690ec20587 Delete python/sglang/srt/layers/moe/fused_moe_triton/configs/E=257,N=… (#5321) 2025-04-12 10:00:03 -07:00
thyecust
2074a2e6b6 Fix: docs/backend/structured_outputs.ipynb (#4884) 2025-04-12 02:18:55 -07:00
Yineng Zhang
57de7c6b5f feat: use fa3 mla by default on hopper (#5210) 2025-04-12 01:09:25 -07:00
Co-authored-by: yundai424 <yundai424@gmail.com>
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Yineng Zhang
115ae2e728 chore: bump sgl-kernel v0.0.8.post2 (#5317) 2025-04-11 23:42:03 -07:00
Qingquan Song
aea98512a8 Fix fa3 window size setup (#5316) 2025-04-11 23:37:52 -07:00
Baizhou Zhang
e4155e96d0 Add flash_attn_varlen_func to sgl-kernel (#5315) 2025-04-11 23:36:36 -07:00
lambert0312
1b1b47a949 Fix w8a8_int8 model shared experts fusion load weights error (#5120) 2025-04-11 23:33:51 -07:00
Zhaoyi Li
3c9740d200 update variable naming and comments for rocm (#5299) 2025-04-11 23:15:05 -07:00
Yineng Zhang
2eb55770f9 misc: cleanup 3rdparty (#5311) 2025-04-11 22:53:50 -07:00
Trevor Morris
f65b8d5c89 Blackwell Cutlass MLA kernel (#5142) 2025-04-11 22:16:51 -07:00
Ke Bao
5ad0571903 Adjust ci test threshold (#5271) 2025-04-11 22:03:37 -07:00
Mick
34ef6c8135 [VLM] Adopt fast image processor by default (#5065) 2025-04-11 21:46:58 -07:00
Yineng Zhang
611720919d fix: use deepgemm only on hopper (#5310) 2025-04-11 20:48:24 -07:00
Yineng Zhang
4f288113ce fix: update flash attn (#5308) 2025-04-11 16:23:09 -07:00
Yineng Zhang
136b8e6afb fix: remove cublas_grouped_gemm (#5307) 2025-04-11 16:22:37 -07:00
Yineng Zhang
034c5256cc misc: update blackwell Dockerfile (#5306) 2025-04-11 15:58:04 -07:00
Yineng Zhang
c1dd773c19 fix: use fa3 unit test on hopper only (#5304) 2025-04-11 15:10:49 -07:00
Yineng Zhang
6f8593799b feat: add blackwell workflow (#5303) 2025-04-11 13:42:00 -07:00
Yineng Zhang
f774a0d275 feat: add blackwell Dockerfile (#5302) 2025-04-11 13:08:53 -07:00
Xiaoyu Zhang
60bcbf2a35 remove moe_align_block_size torch.zeros in small batch/expert mode (#5298) 2025-04-11 12:13:55 -07:00
Adarsh Shirawalmath
a0a9f6d64f [Docs] Remove the older supported docs section (#5301) 2025-04-11 11:30:18 -07:00
Yineng Zhang
80aa8ca84e fix: update update_wheel_index for cu128 (#5300) 2025-04-11 09:31:03 -07:00
Adarsh Shirawalmath
4aa6bab0b0 [Docs] Supported Model Docs - Major restructuring (#5290) 2025-04-11 09:17:47 -07:00
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Yusong Gao
c35dcfdb30 [PD] fix: skip warmup request in disaggregation mode to prevent crash on timeout (#5292) 2025-04-11 23:03:07 +08:00
Yineng Zhang
c163bf4ff1 chore: bump sgl-kernel v0.0.8.post1 (#5289) 2025-04-11 02:11:53 -07:00
Yineng Zhang
5598634326 chore: relax the torch version restriction for sgl-kernel compilation (#5288) 2025-04-11 02:05:53 -07:00
Yineng Zhang
b75275b6f2 feat: add cu128 identifier for sgl-kernel (#5287) 2025-04-11 01:58:46 -07:00
Yineng Zhang
7074e9ca20 fix: enable fp4 compilation on cu128 (#5286) 2025-04-11 01:43:44 -07:00
Michael Yao
fc14cca088 Fix a 404 link in send_request.ipynb (#5280) 2025-04-11 01:38:45 -07:00
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
XinyuanTong
e7beff8a13 fix: examples for token_in_token_out_vlm (#5193) 2025-04-11 01:38:23 -07:00
mlmz
4d2e305149 doc: nested loop code for offline engine (#5244) 2025-04-11 01:36:30 -07:00
Mick
e53a0b3d5b [fix] fix mrope positions not picked up (#5265) 2025-04-11 01:29:45 -07:00
Cheng Wan
038bc5d521 Support --enable-llama4-multimodal (#5254) 2025-04-11 01:24:14 -07:00
Chang Su
aee62d744b Optimize GPU memory usage in FlashAttentionBackend's strided indexing (#5262) 2025-04-11 00:34:17 -07:00
Co-authored-by: ch-wan <cwan39@gatech.edu>
fzyzcjy
cd7e32e2cb Optimize attention in llama4 (#5127) 2025-04-11 00:32:41 -07:00
HAI
8879944800 ROCm/AITER CK_MoE: update 2-stage kernels & support both Activations (#5228) 2025-04-10 18:19:57 -07:00
Richard Zou
a879811c4b Fix torch.compile cacheing (#5259) 2025-04-10 18:08:45 -07:00
Co-authored-by: zhyncs <me@zhyncs.com>
Elfie Guo
a222945df2 Update Makefile / build script to avoid installing incompatible torch dependency (#5245) 2025-04-10 22:21:02 +00:00
PGFLMG
ed01b4515e [Misc] Clean sgl-kernel test (#5216) 2025-04-10 11:28:41 -07:00
HAI
d050df368c ROCm sgl-kernel: compatible to later torch (#5167) 2025-04-10 09:18:36 -07:00
Richard Zou
76f44c2a8d Fix deepseek-v3 with torch.compile in PyTorch 2.6. (#5213) 2025-04-10 09:14:38 -07:00
Ke Bao
1078396f47 Update deps for mllama4 (#5215) 2025-04-10 09:12:44 -07:00
Teng Ma
7e4f72dd8c [PD] Add get_contiguous_buf_infos interface for MLATokenToKVPool (#5204) 2025-04-10 20:05:34 +08:00
Teng Ma
4c31ae9f6d [PD] Support KV transfer with mooncake (#4880) 2025-04-10 14:23:23 +08:00
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Co-authored-by: shangmingc <csmthu@gmail.com>
Xiaoyu Zhang
f730362ee2 reduce moe_align_block_size_kernel small batch mode overhead (#5086) 2025-04-09 17:59:35 -07:00