sglang

Author	SHA1	Message	Date
Byron Hsu	a9499885e9	[PD] Add transfer backend abstraction (#5328 )	2025-04-14 01:39:39 +08:00
Liangsheng Yin	f765579046	Fix typo: infight -> inflight (#5357 )	2025-04-14 01:25:30 +08:00
Yineng Zhang	f58b929a51	chore: upgrade sgl-kernel 0.0.8.post3 (#5342 )	2025-04-13 00:45:59 -07:00
Yineng Zhang	c1270aabc5	docs: update adoption and sponsorship list with Oracle (#5343 )	2025-04-12 22:55:25 -07:00
mlmz	8311b07fb9	Fix: Ensure tensors for dist.broadcast match NCCL backend device (#5322 )	2025-04-12 22:50:37 -07:00
Yineng Zhang	c138025731	misc: update sagemaker Dockerfile (#5341 )	2025-04-12 22:39:49 -07:00
Yineng Zhang	b62e7e99b8	feat: adapt merge_state (#5337 )	2025-04-12 21:14:04 -07:00
Yineng Zhang	7d3b7c87f5	fix: determine if flashinfer is installed (#5336 )	2025-04-12 19:59:13 -07:00
Yineng Zhang	75015bb688	ci: update release node (#5333 )	2025-04-12 14:22:45 -07:00
Yineng Zhang	b371f7cd36	chore: bump sgl-kernel v0.0.8.post3 (#5332 )	2025-04-12 12:53:37 -07:00
Yineng Zhang	812e82f35e	fix: solve cu118 issue for cutlass mla (#5331 )	2025-04-12 12:51:09 -07:00
PGFLMG	4879e50c6d	[Feat] Add sparse attn to sgl-kernel (#5327 )	2025-04-12 11:36:36 -07:00
tianlian yi	bc92107b03	Support server based rollout in Verlengine (#4848 ) Co-authored-by: Jin Pan <jpan236@wisc.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>	2025-04-12 10:07:52 -07:00
Xiaoyu Zhang	3e4794aad8	refine fused_moe tuning docs (#5294 )	2025-04-12 10:01:13 -07:00
Xiaoyu Zhang	690ec20587	Delete python/sglang/srt/layers/moe/fused_moe_triton/configs/E=257,N=… (#5321 )	2025-04-12 10:00:03 -07:00
thyecust	2074a2e6b6	Fix: docs/backend/structured_outputs.ipynb (#4884 )	2025-04-12 02:18:55 -07:00
Yineng Zhang	57de7c6b5f	feat: use fa3 mla by default on hopper (#5210 ) Co-authored-by: yundai424 <yundai424@gmail.com> Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>	2025-04-12 01:09:25 -07:00
Yineng Zhang	115ae2e728	chore: bump sgl-kernel v0.0.8.post2 (#5317 )	2025-04-11 23:42:03 -07:00
Qingquan Song	aea98512a8	Fix fa3 window size setup (#5316 )	2025-04-11 23:37:52 -07:00
Baizhou Zhang	e4155e96d0	Add flash_attn_varlen_func to sgl-kernel (#5315 )	2025-04-11 23:36:36 -07:00
lambert0312	1b1b47a949	Fix w8a8_int8 model shared experts fusion load weights error (#5120 )	2025-04-11 23:33:51 -07:00
Zhaoyi Li	3c9740d200	update variable naming and comments for rocm (#5299 )	2025-04-11 23:15:05 -07:00
Yineng Zhang	2eb55770f9	misc: cleanup 3rdparty (#5311 )	2025-04-11 22:53:50 -07:00
Trevor Morris	f65b8d5c89	Blackwell Cutlass MLA kernel (#5142 )	2025-04-11 22:16:51 -07:00
Ke Bao	5ad0571903	Adjust ci test threshold (#5271 )	2025-04-11 22:03:37 -07:00
Mick	34ef6c8135	[VLM] Adopt fast image processor by default (#5065 )	2025-04-11 21:46:58 -07:00
Yineng Zhang	611720919d	fix: use deepgemm only on hopper (#5310 )	2025-04-11 20:48:24 -07:00
Yineng Zhang	4f288113ce	fix: update flash attn (#5308 )	2025-04-11 16:23:09 -07:00
Yineng Zhang	136b8e6afb	fix: remove cublas_grouped_gemm (#5307 )	2025-04-11 16:22:37 -07:00
Yineng Zhang	034c5256cc	misc: update blackwell Dockerfile (#5306 )	2025-04-11 15:58:04 -07:00
Yineng Zhang	c1dd773c19	fix: use fa3 unit test on hopper only (#5304 )	2025-04-11 15:10:49 -07:00
Yineng Zhang	6f8593799b	feat: add blackwell workflow (#5303 )	2025-04-11 13:42:00 -07:00
Yineng Zhang	f774a0d275	feat: add blackwell Dockerfile (#5302 )	2025-04-11 13:08:53 -07:00
Xiaoyu Zhang	60bcbf2a35	remove moe_align_block_size torch.zeros in small batch/expert mode (#5298 )	2025-04-11 12:13:55 -07:00
Adarsh Shirawalmath	a0a9f6d64f	[Docs] Remove the older supported docs section (#5301 )	2025-04-11 11:30:18 -07:00
Yineng Zhang	80aa8ca84e	fix: update update_wheel_index for cu128 (#5300 )	2025-04-11 09:31:03 -07:00
Adarsh Shirawalmath	4aa6bab0b0	[Docs] Supported Model Docs - Major restructuring (#5290 ) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-04-11 09:17:47 -07:00
Yusong Gao	c35dcfdb30	[PD] fix: skip warmup request in disaggregation mode to prevent crash on timeout (#5292 )	2025-04-11 23:03:07 +08:00
Yineng Zhang	c163bf4ff1	chore: bump sgl-kernel v0.0.8.post1 (#5289 )	2025-04-11 02:11:53 -07:00
Yineng Zhang	5598634326	chore: relax the torch version restriction for sgl-kernel compilation (#5288 )	2025-04-11 02:05:53 -07:00
Yineng Zhang	b75275b6f2	feat: add cu128 identifier for sgl-kernel (#5287 )	2025-04-11 01:58:46 -07:00
Yineng Zhang	7074e9ca20	fix: enable fp4 compilation on cu128 (#5286 )	2025-04-11 01:43:44 -07:00
Michael Yao	fc14cca088	Fix a 404 link in send_request.ipynb (#5280 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-11 01:38:45 -07:00
XinyuanTong	e7beff8a13	fix: examples for token_in_token_out_vlm (#5193 )	2025-04-11 01:38:23 -07:00
mlmz	4d2e305149	doc: nested loop code for offline engine (#5244 )	2025-04-11 01:36:30 -07:00
Mick	e53a0b3d5b	[fix] fix mrope positions not picked up (#5265 )	2025-04-11 01:29:45 -07:00
Cheng Wan	038bc5d521	Support `--enable-llama4-multimodal` (#5254 )	2025-04-11 01:24:14 -07:00
Chang Su	aee62d744b	Optimize GPU memory usage in FlashAttentionBackend's strided indexing (#5262 ) Co-authored-by: ch-wan <cwan39@gatech.edu>	2025-04-11 00:34:17 -07:00
fzyzcjy	cd7e32e2cb	Optimize attention in llama4 (#5127 )	2025-04-11 00:32:41 -07:00
HAI	8879944800	ROCm/AITER CK_MoE: update 2-stage kernels & support both Activations (#5228 )	2025-04-10 18:19:57 -07:00


3455 Commits