sglang

Author	SHA1	Message	Date
Baizhou Zhang	b54b5a96e4	[Doc]Add instruction for profiling with bench_one_batch (#5581 )	2025-04-20 14:05:36 -07:00
JieXin Liang	bca832c7c6	[Fix] fix outlines and xgrammar (#4947 )	2025-04-20 13:31:25 -07:00
Xiaoyu Zhang	d9dd529854	enable DeepSeek V3 shared_experts_fusion in sm90 (#5571 )	2025-04-20 12:46:42 -07:00
fzyzcjy	0a0dd34e6a	Fix BumpAllocator error when no input_ids (#5564 )	2025-04-20 02:20:53 -07:00
fzyzcjy	80ac527d22	[PD] Fix DeepSeek cannot be run on latest master (#5568 )	2025-04-20 02:19:48 -07:00
JieXin Liang	99456bcacb	[perf] introduce deep gemm group_gemm_masked as bmm (#5432 )	2025-04-20 00:38:27 -07:00
fzyzcjy	d07e797ace	Fix bench_one_batch producing unnatural results for expert parallel (#5149 )	2025-04-20 00:38:04 -07:00
Zhaoyi Li	c555d794f7	Minor update for ROCm variable style (#5562 )	2025-04-19 23:45:27 -07:00
Zhiqiang Xie	e2574ee986	fix hicache write back (#5543 )	2025-04-19 21:56:22 -07:00
Byron Hsu	ab4b5606e4	[PD] Support page size > 1 (#5561 )	2025-04-19 21:54:27 -07:00
Yubo Wang	20f1c8e374	Fix sampler nan check when calling top_k_top_p_sampling_from_probs (#5546 )	2025-04-19 21:47:23 -07:00
fzyzcjy	613b197e57	Remove one kernel in per_tensor_quant_mla_fp8 (#5549 )	2025-04-19 15:08:15 -07:00
Xiaoyu Zhang	d58e354472	simplify the control logic for using shared experts fusion (#5504 )	2025-04-19 13:17:35 -07:00
Xiaoyu Zhang	bf86c5e990	restruct compressed_tensors_w8a8_fp8 (#5475 )	2025-04-19 04:52:15 -07:00
shangmingc	dca90f1db8	[PD] Remove the requirement of config file for mooncake backend (#5460 )	2025-04-19 19:31:00 +08:00
Yineng Zhang	0961feefca	feat: use flashinfer jit package (#5547 )	2025-04-19 00:28:39 -07:00
ybyang	59dd090f1c	[PD] Fix no cache connect for recevier (#5534 )	2025-04-19 14:55:28 +08:00
fzyzcjy	569b032c58	[PD] Tiny fix timeout error when generate (#5545 )	2025-04-19 14:42:57 +08:00
fzyzcjy	f6a71139a8	Make profiler output file names consistent (#5548 )	2025-04-18 22:57:11 -07:00
fzyzcjy	1e0806f30b	Fix DeepGEMM masked cannot be run on groups not being multiple or 4 (#5340 )	2025-04-18 22:38:07 -07:00
Yineng Zhang	2c11f9c2eb	chore: upgrade sgl-kernel 0.0.9.post2 (#5540 )	2025-04-18 21:17:23 -07:00
Yineng Zhang	a6f892e5d0	Revert "Avoid computing lse in Ragged Prefill when there's no prefix.… (#5544 )	2025-04-18 16:50:21 -07:00
Yineng Zhang	08b518d51f	fix util import (#5542 )	2025-04-18 15:06:46 -07:00
yhyang201	4db463b1ad	[Model] Adding Qwen3 and Qwen3MoE (#4693 )	2025-04-18 09:51:29 -07:00
Wenxuan Tan	bfa3922451	Avoid computing lse in Ragged Prefill when there's no prefix. (#5476 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-04-18 01:13:57 -07:00
liwenju0	e465b08ddb	fix bug of VLLM_AVAILABLE not defined (#5497 )	2025-04-18 00:59:03 -07:00
Xiaoyu Zhang	bed05878f6	fix kimi vl running bug after rebase main (#5461 )	2025-04-18 00:17:34 -07:00
strgrb	b2a189dd11	use sglang_per_token_group_quant_fp8 from sgl-kernel instead of trion kernel (#5473 ) Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>	2025-04-18 00:05:24 -07:00
Yineng Zhang	f28d82997a	chore: bump sgl-kernel 0.0.9.post2 (#5518 )	2025-04-17 23:42:39 -07:00
Xiaoyu Zhang	8e09b37077	Sgl kernel fused_moe_gate support n_shared_experts (#5440 )	2025-04-17 23:05:15 -07:00
fzyzcjy	53dcf38876	Introduce moe_dense_tp_size to fix dense layer errors in DeepSeek V3 + 4x8xH100 (#4836 )	2025-04-17 21:38:26 -07:00
Michael Feil	1effba4c70	Configuration qwen2_moe.py - qkv_bias now in transformers (#5512 )	2025-04-17 21:23:22 -07:00
Michael Yao	a0fc5bc144	[docs] Fix several consistency issues in sampling_params.md (#5373 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-04-18 10:54:40 +08:00
mlmz	27e9538a7e	Fix: fix the exception 'the memory capacity is unbalanced. Some GPUs … (#5426 ) Co-authored-by: ocss884 <ocss.lin@gmail.com>	2025-04-18 10:51:39 +08:00
u4lr451	211c7b31b8	Fix: Incorrect parameters passed to forward_batch_generation (#5506 ) (#5511 )	2025-04-17 18:49:59 -07:00
PGFLMG	c08a717c77	[Feat] Update sgl-kernel flashinfer to latest main version (#5500 ) Co-authored-by: zhyncs <me@zhyncs.com>	2025-04-17 12:43:23 -07:00
mlmz	f13d65a7ea	Doc: fix problems of the 'Execute Notebooks / run-all-notebooks' ci caused by the unstability of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (#5503 )	2025-04-17 11:37:43 -07:00
Xuchun Shang	06d0a3d92b	[Bug fix] use correct func path in deepseek (#5496 ) Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>	2025-04-17 02:41:41 -07:00
Michael Yao	22c2a79dc5	Fix a link in sgl-kernel/README.md (#5493 )	2025-04-17 02:25:28 -07:00
fzyzcjy	8beb356f0d	Refactor DeepSeek decoder layer branches (#5205 )	2025-04-17 02:11:11 -07:00
Chang Su	c776234b45	Enable local attention during decode (#5479 )	2025-04-17 02:07:43 -07:00
woodx	3bface15e6	Feat/support encoder model (like bert) (#4887 )	2025-04-17 01:50:48 -07:00
Baizhou Zhang	6fb29ffd9e	Deprecate enable-flashinfer-mla and enable-flashmla (#5480 )	2025-04-17 01:43:33 -07:00
Baizhou Zhang	4fb05583ef	Deprecate disable-mla (#5481 )	2025-04-17 01:43:14 -07:00
Baizhou Zhang	81c891111f	Add test for flash_attn_varlen_func kernel (#5484 )	2025-04-17 01:42:56 -07:00
Didier Durand	92d1561b70	Update attention_backend.md: plural form (#5489 )	2025-04-17 01:42:40 -07:00
eigen	8f783c1943	[Model Support] unsloth/Phi-4-mini bnb model (#4982 ) Co-authored-by: yhyang201 <yhyang201@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-04-16 19:58:20 -07:00
BearBiscuit	90faf9018e	[verl] Modify the update_weights func to align with verl's resharding (#5345 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-04-16 19:56:57 -07:00
Lianmin Zheng	177320a582	Clean up imports (#5467 )	2025-04-16 15:26:49 -07:00
Ying Sheng	d7bc19a46a	add multi-lora feature in README.md (#5463 )	2025-04-16 03:25:25 -07:00

2906 Commits