sglang

Author	SHA1	Message	Date
Shenggui Li	3f23d8cdf1	added support for tied weights in qwen pipeline parallelism (#6546)	2025-05-25 00:00:56 -07:00
Lifu Huang	022012aae8	Support Phi-4 Multi-Modal (text + vision only) (#6494)	2025-05-24 21:43:38 -07:00
Xinyuan Tong	681fdc264b	Refactor vlm embedding routine to use precomputed feature (#6543) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-05-24 18:39:21 -07:00
fzyzcjy	0d47788025	Support overlapping two batches (#4068)	2025-05-24 17:39:07 -07:00
kk	7a5e6ce1cb	Fix GPU OOM (#6564) Co-authored-by: michael <michael.zhang@amd.com>	2025-05-24 16:38:39 -07:00
Byron Hsu	2d831c6ef9	[PD] Support structured output (#6560)	2025-05-23 21:49:00 -07:00
Chang Su	ed0c3035cd	feat(Tool Calling): Support `required` and specific function mode (#6550)	2025-05-23 21:00:37 -07:00
Shi Shuai	9c574585b3	fix: remove content=none test when tool called (#6347)	2025-05-23 15:12:55 -07:00
Byron Hsu	8233cc10fd	[PD] Support logprob & Add failure test (#6558)	2025-05-23 14:29:20 -07:00
Byron Hsu	d2e0881a34	[PD] support spec decode (#6507) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-05-23 12:03:05 -07:00
YanbingJiang	d8189660a9	Update sgl-kernel UTs for activation/topk/norm/rope kernels (#6452)	2025-05-23 02:03:15 -07:00
Chunyuan WU	3ded6235c9	Add fp8 fused_experts kernel for CPU in sgl-kernel and add UT (#6404)	2025-05-23 02:01:55 -07:00
blzheng	4ba1eea83f	Add fp8 qkv_proj_with_rope kernel for CPU in sgl-kernel and add UT (#6493)	2025-05-23 00:14:46 -07:00
Chang Su	4685fbb888	[VLM] Support chunk prefill for VLM (#6355) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-22 20:32:41 -07:00
ryang	a6ae3af15e	Support XiaomiMiMo inference with mtp (#6059)	2025-05-22 14:14:49 -07:00
Yineng Zhang	0b07c4a99f	chore: upgrade sgl-kernel v0.1.4 (#6532)	2025-05-22 13:28:16 -07:00
fzyzcjy	7a80f56513	Support dynamically rebalancing experts using EPLB (#6469)	2025-05-21 23:13:21 -07:00
fzyzcjy	fc992a09f9	Support updating expert locations dynamically (#6388)	2025-05-21 21:59:33 -07:00
Ke Bao	6ce0ed073b	Apply constraint grammar to EAGLE (#6499) Co-authored-by: merrymercy <lianminzheng@gmail.com>	2025-05-21 17:18:41 -07:00
blzheng	cfe48c5902	[CPU] Fix build issue (#6419)	2025-05-21 11:17:10 -07:00
Jiajun Li	4024e1d2a8	Implement Siglip Vision model, and support BNB quantization for gemma3-mm (#5339)	2025-05-20 23:53:46 -07:00
HAI	5c0b38f369	aiter attention-backend (default enabled on AMD/ROCm) (#6381)	2025-05-20 22:52:41 -07:00
YanbingJiang	32cc66efa5	Update extend/decode attention kernel for CPU in sgl-kernel and add UTs (#6405) Co-authored-by: mingfeima <mingfei.ma@intel.com>	2025-05-19 21:23:17 -07:00
fzyzcjy	f0653886a5	Expert distribution recording without overhead for EPLB (#4957)	2025-05-19 20:07:43 -07:00
Yineng Zhang	b146555749	Revert "Implement `return_hidden_states` for the OpenAI API (#6137)" (#6440)	2025-05-19 18:21:29 -07:00
Trevor Morris	7adf245ba2	[Metrics] Add KV events publishing (#6098)	2025-05-19 14:19:54 -07:00
Baizhou Zhang	299fd22f9e	Fix throughput threshold for amd ci test (#6414)	2025-05-19 14:17:41 -07:00
kyle-pena-kuzco	4f39bcf7ab	Implement `return_hidden_states` for the OpenAI API (#6137)	2025-05-18 22:30:25 -07:00
Chunyuan WU	5dd62c3a6f	Add fp8 shared_expert kernel for CPU in sgl-kernel and add UT (#6339) Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com> Co-authored-by: mingfeima <mingfei.ma@intel.com>	2025-05-18 12:42:15 -07:00
fzyzcjy	f11481b921	Add 4-GPU runner tests and split existing tests (#6383)	2025-05-18 11:56:51 -07:00
libra	11553c1a37	Add pipeline parallelism for Qwen2 and Qwen3 Model (#6250)	2025-05-18 00:42:55 -07:00
Mick	01dd39bac1	refactor: minor refactors regarding multimodal processing (#6187)	2025-05-17 22:53:20 -07:00
fzyzcjy	fd08c04821	Support custom DeepEP tuning config (#6257)	2025-05-17 17:09:42 -07:00
Lifu Huang	3cf1473a09	Use monotonic clock for interval measurement (#6211) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-17 16:49:18 -07:00
Kiv Chen	64825b8395	model(vlm): mistral 3.1 (#5099) Co-authored-by: KivenChen <sleigh-queue-0y@icloud.com>	2025-05-16 18:36:18 -07:00
Lianmin Zheng	c2b7ddca49	[Minor] cleanup unused imports (#6358)	2025-05-16 14:52:38 -07:00
Yury Sulsky	f19a9204cd	Support precomputed multimodal features for Qwen-VL and Gemma3 models. (#6136) Co-authored-by: Yury Sulsky <ysulsky@tesla.com>	2025-05-16 12:26:15 -07:00
Chunyuan WU	fb4959b2c5	Add fp8 gemm kernel for CPU in sgl-kernel and add gemm UT (#6216) Co-authored-by: YanbingJiang <yanbing.jiang@intel.com> Co-authored-by: mingfeima <mingfei.ma@intel.com>	2025-05-15 09:10:40 -07:00
quinnrong94	2e4babdb0a	[Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109) Co-authored-by: Yingyi <yingyihuang2000@outlook.com> Co-authored-by: neiltian <neiltian@tencent.com> Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com> Co-authored-by: kexueyu <kexueyu@tencent.com> Co-authored-by: vincentmeng <vincentmeng@tencent.com> Co-authored-by: pengmeng <pengmeng@tencent.com>	2025-05-15 00:48:09 -07:00
Zilin Zhu	44a3783d13	[fix][RL] Remove the incorrect barrier in init_weights_update_group (#5914)	2025-05-14 19:15:21 -07:00
Sai Enduri	73eb67c087	Enable unit tests for AMD CI. (#6283)	2025-05-14 12:55:36 -07:00
Kiv Chen	5380cd7ea3	model(vlm): pixtral (#5084)	2025-05-13 00:16:10 -07:00
Lianmin Zheng	ac2324c177	Skip the flaky test_stateful_custom_logit_processor (#6251)	2025-05-12 18:29:41 -07:00
shangmingc	f1c896007a	[PD] Add support for different TP sizes per DP rank (#5922) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-12 13:55:42 -07:00
Sai Enduri	983c663de6	Update AMD nightly deps. (#6241)	2025-05-12 13:39:20 -07:00
Lianmin Zheng	e8e18dcdcc	Revert "fix some typos" (#6244)	2025-05-12 12:53:26 -07:00
Ying Sheng	bad7c26fdc	[PP] Fix init_memory_pool desync & add PP for mixtral (#6223)	2025-05-12 12:38:09 -07:00
applesaucethebun	d738ab52f8	fix some typos (#6209) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-13 01:42:38 +08:00
shangmingc	3ee40ff919	[CI] Re-enable pd disaggregation test (#6231) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-12 10:09:12 -07:00
Lianmin Zheng	fba8eccd7e	Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-05-12 00:17:33 -07:00

659 Commits