Commit Graph

1904 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Yineng Zhang | 2c11f9c2eb | chore: upgrade sgl-kernel 0.0.9.post2 (#5540) | 2025-04-18 21:17:23 -07:00 |
| Yineng Zhang | a6f892e5d0 | Revert "Avoid computing lse in Ragged Prefill when there's no prefix.… (#5544) | 2025-04-18 16:50:21 -07:00 |
| Yineng Zhang | 08b518d51f | fix util import (#5542) | 2025-04-18 15:06:46 -07:00 |
| yhyang201 | 4db463b1ad | [Model] Adding Qwen3 and Qwen3MoE (#4693) | 2025-04-18 09:51:29 -07:00 |
| Wenxuan Tan | bfa3922451 | Avoid computing lse in Ragged Prefill when there's no prefix. (#5476)<br>Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> | 2025-04-18 01:13:57 -07:00 |
| liwenju0 | e465b08ddb | fix bug of VLLM_AVAILABLE not defined (#5497) | 2025-04-18 00:59:03 -07:00 |
| Xiaoyu Zhang | bed05878f6 | fix kimi vl running bug after rebase main (#5461) | 2025-04-18 00:17:34 -07:00 |
| strgrb | b2a189dd11 | use sglang_per_token_group_quant_fp8 from sgl-kernel instead of trion kernel (#5473)<br>Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com> | 2025-04-18 00:05:24 -07:00 |
| fzyzcjy | 53dcf38876 | Introduce moe_dense_tp_size to fix dense layer errors in DeepSeek V3 + 4x8xH100 (#4836) | 2025-04-17 21:38:26 -07:00 |
| Michael Feil | 1effba4c70 | Configuration qwen2_moe.py - qkv_bias now in transformers (#5512) | 2025-04-17 21:23:22 -07:00 |
| mlmz | 27e9538a7e | Fix: fix the exception 'the memory capacity is unbalanced. Some GPUs … (#5426)<br>Co-authored-by: ocss884 <ocss.lin@gmail.com> | 2025-04-18 10:51:39 +08:00 |
| u4lr451 | 211c7b31b8 | Fix: Incorrect parameters passed to forward_batch_generation (#5506) (#5511) | 2025-04-17 18:49:59 -07:00 |
| Xuchun Shang | 06d0a3d92b | [Bug fix] use correct func path in deepseek (#5496)<br>Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com> | 2025-04-17 02:41:41 -07:00 |
| fzyzcjy | 8beb356f0d | Refactor DeepSeek decoder layer branches (#5205) | 2025-04-17 02:11:11 -07:00 |
| Chang Su | c776234b45 | Enable local attention during decode (#5479) | 2025-04-17 02:07:43 -07:00 |
| woodx | 3bface15e6 | Feat/support encoder model (like bert) (#4887) | 2025-04-17 01:50:48 -07:00 |
| Baizhou Zhang | 6fb29ffd9e | Deprecate enable-flashinfer-mla and enable-flashmla (#5480) | 2025-04-17 01:43:33 -07:00 |
| Baizhou Zhang | 4fb05583ef | Deprecate disable-mla (#5481) | 2025-04-17 01:43:14 -07:00 |
| eigen | 8f783c1943 | [Model Support] unsloth/Phi-4-mini bnb model (#4982)<br>Co-authored-by: yhyang201 <yhyang201@gmail.com><br>Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com><br>Co-authored-by: Chayenne <zhaochen20@outlook.com><br>Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-04-16 19:58:20 -07:00 |
| BearBiscuit | 90faf9018e | [verl] Modify the update_weights func to align with verl's resharding (#5345)<br>Co-authored-by: Chayenne <zhaochen20@outlook.com> | 2025-04-16 19:56:57 -07:00 |
| Lianmin Zheng | 177320a582 | Clean up imports (#5467) | 2025-04-16 15:26:49 -07:00 |
| Cheng Wan | 6aca583420 | Fix several minor issues in PD disaggregation (#5444) | 2025-04-15 23:04:41 -07:00 |
| Yineng Zhang | 5b5c7237c8 | chore: bump v0.4.5.post1 (#5445) | 2025-04-15 23:00:07 -07:00 |
| Baizhou Zhang | a42736bbb8 | Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113) | 2025-04-15 22:01:22 -07:00 |
| ybyang | dd83e7e9c3 | [Bug fix] need record start time in pd mode (#5425) | 2025-04-16 10:11:16 +08:00 |
| Lianmin Zheng | 0769b14bf9 | [Minor] Move torch.compile patch to a better place (#5397) | 2025-04-15 18:37:07 -07:00 |
| ryang | bc24205b32 | Support BNB quantization for llama/mllama (#5038)<br>Co-authored-by: Yuhao Yang <yyh073@foxmail.com> | 2025-04-15 18:00:31 -07:00 |
| Chang Su | 27a009bb00 | Fix ignore_eos parameter when loading a chat template (#5264) | 2025-04-15 17:09:45 -07:00 |
| Yineng Zhang | 8ec0bb7d55 | chore: upgrade sgl-kernel 0.0.9.post1 (#5436) | 2025-04-15 15:45:51 -07:00 |
| Yineng Zhang | fa909dc3c4 | feat: update model_specific_adjustment (#5344)<br>Co-authored-by: hebiao064 <hebiaobuaa@gmail.com> | 2025-04-15 14:45:15 -07:00 |
| shangmingc | f1b3b75fc6 | [PD] Remove unused bootstrap param and fix port table type (#5423) | 2025-04-15 21:21:20 +08:00 |
| Liangsheng Yin | 33b16ad178 | Distinguish bootstrap key only in decode server (#5422) | 2025-04-15 20:59:28 +08:00 |
| shangmingc | ffde65a094 | [PD] Fix dynamic port support and MLA buffer for Mooncake (#5415)<br>Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com><br>Co-authored-by: ybyang <ybyang7@iflytek.com> | 2025-04-15 19:29:31 +08:00 |
| lambert0312 | 471650dee0 | Fix broadcast use cuda device lead to memory capacity unbalanced (#5416) | 2025-04-15 02:47:26 -07:00 |
| Yuan Luo | d06a83fb01 | Support dynamic connection and TP 16 (#5351)<br>Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2025-04-15 17:08:07 +08:00 |
| Zhaoyang Hao | 5d13440162 | [FIX] Fix concatenation error in capture_bs when open --disable-cuda-graph-padding and without MTP (#5412) | 2025-04-15 01:42:27 -07:00 |
| Yuhong Guo | 3dfc6023ce | Fix bench_serving with random-ids (#5214) | 2025-04-15 01:34:35 -07:00 |
| fzyzcjy | 15e91d721b | Tiny fix DeepseekScalingRotaryEmbedding always use forward_native (#5406) | 2025-04-15 01:33:47 -07:00 |
| Yineng Zhang | 8aab7fdb21 | chore: upgrade sgl-kernel 0.0.9 (#5401) | 2025-04-14 22:37:59 -07:00 |
| Yangcheng Li | ee9d6ca677 | [fix/misc] remove duplicate row in deepseek v2 model (#5279) | 2025-04-14 18:41:24 -07:00 |
| Ximingwang-09 | 2dd6489468 | Add H20 dtype fp8_w8a8 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5291)<br>Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com> | 2025-04-14 18:40:31 -07:00 |
| lambert0312 | 61e7c4dd21 | Add A800 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5368) | 2025-04-14 18:39:44 -07:00 |
| Baizhou Zhang | f6772f1497 | [Fix] Turn off DeepGEMM by default (#5263) | 2025-04-14 17:45:44 -07:00 |
| Xiaoyu Zhang | 38076dea84 | apply fused moe gate in ds v3/r1 (#5371)<br>Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-04-14 16:24:26 -07:00 |
| Ke Bao | 5e0a9b0981 | Apply deepseek cuda rope (#5385)<br>Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-04-14 15:22:43 -07:00 |
| JieXin Liang | bdde237562 | [perf] experimental enhance fp8 per-tensor quant (#5370) | 2025-04-14 12:35:43 -07:00 |
| ybyang | e9fc2ac7b6 | [PD Bug] fix MLA get_contiguous_buf_infos error (#5384) | 2025-04-14 22:56:39 +08:00 |
| Liangsheng Yin | 44afde82d7 | Fix PD disaggregation bugs (#5326) | 2025-04-14 19:27:30 +08:00 |
| yhyang201 | 072df75354 | Support for Qwen2.5-VL Model in bitsandbytes Format (#5003) | 2025-04-14 02:03:40 -07:00 |
| fzyzcjy | defede5073 | Fix DeepSeek DP Attention + torch compile (#5367)<br>Co-authored-by: ispobock <ispobaoke@163.com> | 2025-04-14 01:07:58 -07:00 |