Commit Graph

1873 Commits

Author SHA1 Message Date
Liangsheng Yin
33b16ad178 Distinguish bootstrap key only in decode server (#5422) 2025-04-15 20:59:28 +08:00
shangmingc
ffde65a094 [PD] Fix dynamic port support and MLA buffer for Mooncake (#5415) 2025-04-15 19:29:31 +08:00
    Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
    Co-authored-by: ybyang <ybyang7@iflytek.com>
lambert0312
471650dee0 Fix broadcast use cuda device lead to memory capacity unbalanced (#5416) 2025-04-15 02:47:26 -07:00
Yuan Luo
d06a83fb01 Support dynamic connection and TP 16 (#5351) 2025-04-15 17:08:07 +08:00
    Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Zhaoyang Hao
5d13440162 [FIX] Fix concatenation error in capture_bs when open --disable-cuda-graph-padding and without MTP (#5412) 2025-04-15 01:42:27 -07:00
Yuhong Guo
3dfc6023ce Fix bench_serving with random-ids (#5214) 2025-04-15 01:34:35 -07:00
fzyzcjy
15e91d721b Tiny fix DeepseekScalingRotaryEmbedding always use forward_native (#5406) 2025-04-15 01:33:47 -07:00
Yineng Zhang
8aab7fdb21 chore: upgrade sgl-kernel 0.0.9 (#5401) 2025-04-14 22:37:59 -07:00
Yangcheng Li
ee9d6ca677 [fix/misc] remove duplicate row in deepseek v2 model (#5279) 2025-04-14 18:41:24 -07:00
Ximingwang-09
2dd6489468 Add H20 dtype fp8_w8a8 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5291) 2025-04-14 18:40:31 -07:00
    Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
lambert0312
61e7c4dd21 Add A800 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5368) 2025-04-14 18:39:44 -07:00
Baizhou Zhang
f6772f1497 [Fix] Turn off DeepGEMM by default (#5263) 2025-04-14 17:45:44 -07:00
Xiaoyu Zhang
38076dea84 apply fused moe gate in ds v3/r1 (#5371) 2025-04-14 16:24:26 -07:00
    Co-authored-by: Yineng Zhang <me@zhyncs.com>
Ke Bao
5e0a9b0981 Apply deepseek cuda rope (#5385) 2025-04-14 15:22:43 -07:00
    Co-authored-by: Yineng Zhang <me@zhyncs.com>
JieXin Liang
bdde237562 [perf] experimental enhance fp8 per-tensor quant (#5370) 2025-04-14 12:35:43 -07:00
ybyang
e9fc2ac7b6 [PD Bug] fix MLA get_contiguous_buf_infos error (#5384) 2025-04-14 22:56:39 +08:00
Liangsheng Yin
44afde82d7 Fix PD disaggregation bugs (#5326) 2025-04-14 19:27:30 +08:00
yhyang201
072df75354 Support for Qwen2.5-VL Model in bitsandbytes Format (#5003) 2025-04-14 02:03:40 -07:00
fzyzcjy
defede5073 Fix DeepSeek DP Attention + torch compile (#5367) 2025-04-14 01:07:58 -07:00
    Co-authored-by: ispobock <ispobaoke@163.com>
Yongtong Wu
14e8bd889f Free metadata_buffer_index after transfer finished (#5364) 2025-04-14 16:04:46 +08:00
yulei
adca585bfb [DeepEP] Reduce routed scaling overhead (#5277) 2025-04-13 16:03:09 -07:00
    Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Yineng Zhang
39e411385c fix #5322 (#5359) 2025-04-13 13:57:36 -07:00
huangtingwei
5fbafbb8f8 fix MLATokenToKVPoolHost get_size_per_token bug (#5161) 2025-04-13 12:37:26 -07:00
    Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com>
Byron Hsu
a9499885e9 [PD] Add transfer backend abstraction (#5328) 2025-04-14 01:39:39 +08:00
Liangsheng Yin
f765579046 Fix typo: infight -> inflight (#5357) 2025-04-14 01:25:30 +08:00
Yineng Zhang
f58b929a51 chore: upgrade sgl-kernel 0.0.8.post3 (#5342) 2025-04-13 00:45:59 -07:00
mlmz
8311b07fb9 Fix: Ensure tensors for dist.broadcast match NCCL backend device (#5322) 2025-04-12 22:50:37 -07:00
Yineng Zhang
7d3b7c87f5 fix: determine if flashinfer is installed (#5336) 2025-04-12 19:59:13 -07:00
tianlian yi
bc92107b03 Support server based rollout in Verlengine (#4848) 2025-04-12 10:07:52 -07:00
    Co-authored-by: Jin Pan <jpan236@wisc.edu>
    Co-authored-by: Chayenne <zhaochen20@outlook.com>
    Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>
Xiaoyu Zhang
3e4794aad8 refine fused_moe tuning docs (#5294) 2025-04-12 10:01:13 -07:00
Xiaoyu Zhang
690ec20587 Delete python/sglang/srt/layers/moe/fused_moe_triton/configs/E=257,N=… (#5321) 2025-04-12 10:00:03 -07:00
Yineng Zhang
57de7c6b5f feat: use fa3 mla by default on hopper (#5210) 2025-04-12 01:09:25 -07:00
    Co-authored-by: yundai424 <yundai424@gmail.com>
    Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Qingquan Song
aea98512a8 Fix fa3 window size setup (#5316) 2025-04-11 23:37:52 -07:00
lambert0312
1b1b47a949 Fix w8a8_int8 model shared experts fusion load weights error (#5120) 2025-04-11 23:33:51 -07:00
Mick
34ef6c8135 [VLM] Adopt fast image processor by default (#5065) 2025-04-11 21:46:58 -07:00
Yineng Zhang
611720919d fix: use deepgemm only on hopper (#5310) 2025-04-11 20:48:24 -07:00
Yineng Zhang
f774a0d275 feat: add blackwell Dockerfile (#5302) 2025-04-11 13:08:53 -07:00
Xiaoyu Zhang
60bcbf2a35 remove moe_align_block_size torch.zeros in small batch/expert mode (#5298) 2025-04-11 12:13:55 -07:00
Yusong Gao
c35dcfdb30 [PD] fix: skip warmup request in disaggregation mode to prevent crash on timeout (#5292) 2025-04-11 23:03:07 +08:00
Mick
e53a0b3d5b [fix] fix mrope positions not picked up (#5265) 2025-04-11 01:29:45 -07:00
Cheng Wan
038bc5d521 Support --enable-llama4-multimodal (#5254) 2025-04-11 01:24:14 -07:00
Chang Su
aee62d744b Optimize GPU memory usage in FlashAttentionBackend's strided indexing (#5262) 2025-04-11 00:34:17 -07:00
    Co-authored-by: ch-wan <cwan39@gatech.edu>
fzyzcjy
cd7e32e2cb Optimize attention in llama4 (#5127) 2025-04-11 00:32:41 -07:00
HAI
8879944800 ROCm/AITER CK_MoE: update 2-stage kernels & support both Activations (#5228) 2025-04-10 18:19:57 -07:00
Richard Zou
a879811c4b Fix torch.compile cacheing (#5259) 2025-04-10 18:08:45 -07:00
    Co-authored-by: zhyncs <me@zhyncs.com>
Ke Bao
1078396f47 Update deps for mllama4 (#5215) 2025-04-10 09:12:44 -07:00
Teng Ma
7e4f72dd8c [PD] Add get_contiguous_buf_infos interface for MLATokenToKVPool (#5204) 2025-04-10 20:05:34 +08:00
Teng Ma
4c31ae9f6d [PD] Support KV transfer with mooncake (#4880) 2025-04-10 14:23:23 +08:00
    Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
    Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
    Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
    Co-authored-by: shangmingc <csmthu@gmail.com>
Xiaoyu Zhang
f730362ee2 reduce moe_align_block_size_kernel small batch mode overhead (#5086) 2025-04-09 17:59:35 -07:00
fzyzcjy
e3c4bd3153 Fix DeepSeek error when using DeepEP mode (#5190) 2025-04-09 17:43:22 -07:00
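
A listing in the same Author / SHA Message Date shape can be reproduced from any local clone with `git log` pretty-format placeholders. A minimal sketch, assuming `git` is installed; the throwaway repository, author name, and commit message below are illustrative, not part of the log above:

```shell
set -e
# Create a disposable repo just for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Make one empty commit with an inline identity (no global config needed).
git -c user.name='Example Author' -c user.email='author@example.com' \
    commit -q --allow-empty -m 'Example commit (#1234)'
# %an = author name, %h = abbreviated SHA, %s = subject, %ci = ISO-8601 date
git log --pretty=format:'%an%n%h %s %ci'
```

Trailers such as `Co-authored-by:` live in the commit message body and can be shown by adding `%n%b` to the format string.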