sglang

Author	SHA1	Message	Date
b8zhong	d0a64c7e2c	vlm: enforce pybase64 for image and str encode/decode (#10700 )	2025-10-21 19:05:32 +08:00
Shangming Cai	05d3667ab9	[CI] disable glm4.1v and fix the flashinfer installation (#11902 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-21 18:38:35 +08:00
ybyang	dbb16bedd5	Support Thinking Budget (via custom_logit_processor for OpenAI API) [Fix #6572 ] (#11416 ) Signed-off-by: ybyang <ybyang7@iflytek.com> Co-authored-by: YorkSu <york_su@qq.com>	2025-10-21 16:27:56 +08:00
Neelabh Sinha	852c0578fd	[FEATURE] Add OpenAI-Compatible LoRA Adapter Selection (#11570 )	2025-10-21 15:44:33 +08:00
Meng, Hengyu	b113c72e7a	Init attention backend for Intel XPU (#10656 ) Co-authored-by: guangyey <guangye.yu@intel.com> Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>	2025-10-21 11:41:28 +08:00
Xiaoyu Zhang	8374a96e49	piecewise cuda graph support qwen3-moe (#11845 )	2025-10-21 10:55:49 +08:00
DarkSharpness	276e7b3e4e	[Feature] New structural tag support (#10691 )	2025-10-20 18:25:58 +08:00
Shane A	d383e6616e	[Model] Add Olmo 3 model support (#11396 )	2025-10-19 23:59:16 -07:00
Yuan Luo	271d3d0d50	Support mrope triton kernel and add unit test (#11722 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>	2025-10-20 11:51:07 +08:00
Johnny	252dc4e112	[NVIDIA] FA3/FA4 Fix (#11606 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-19 17:10:10 -07:00
Baizhou Zhang	cbb5fc2edc	[CI] Add CI test for DeepSeek V3.2 MTP (#11835 )	2025-10-19 17:00:25 -07:00
Night	53fb229f53	[logprobs] Enable local deterministic logrprobs testing with strict threshold (#10994 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-19 13:30:39 -07:00
Stefan He	4fff1ec1d9	Deterministic Mode: Add 1-stage triton kernel for prefill (#11147 ) Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com> Co-authored-by: Binyao Jiang <bijiang@linkedin.com>	2025-10-20 01:47:36 +08:00
Liangsheng Yin	7a020e0f3b	[Test] Add basic matched stop for beta eagle (#11833 )	2025-10-20 01:17:00 +08:00
b8zhong	f4f8a1b4d8	ci: update `lmms-eval` to speed up multimodal CI (#11000 )	2025-10-19 02:51:19 +08:00
Minglei Zhu	13219e1e48	completely remove mixed mode deterministic test as prefix mode could cover it (#11783 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-17 17:46:03 -07:00
Lianmin Zheng	b9a54e0968	[minor] sync code on python/sglang/test/test_deterministic.py and improve ci tests (#11777 ) Co-authored-by: Stefan He <hebiaobuaa@gmail.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2025-10-17 14:25:22 -07:00
Chunyuan WU	8fcc69e7c4	Turn on shm_allreduce and shm_allgather for fp16 (#10725 )	2025-10-17 12:35:20 -07:00
Yineng Zhang	da681f35d3	Revert "Set csgmv as default lora backend. (#11488 )" (#11735 )	2025-10-17 12:01:36 -05:00
Mick	3e4c7da2f5	ci: reduce and refactor vlm ut and combine test files (#11062 )	2025-10-17 15:24:50 +00:00
Mick	86b04d25b3	model: qwen3-omni (thinker-only) (#10911 ) Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-10-16 13:20:38 -07:00
Hank Han	0dd6cf16ba	[ci]use H20 to run disaggregation test (#11543 )	2025-10-16 11:42:42 -07:00
Shangming Cai	1de3924b18	[CI] Add GLM4MoE model test (#11706 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-16 16:25:58 +08:00
Even Zhou	3cceaa381a	[Bugfix] Fix Qwen3/DSV3/DSV3.2 model support (#11510 )	2025-10-16 15:14:09 +08:00
Lifu Huang	b0d20cdec7	Set csgmv as default lora backend. (#11488 )	2025-10-15 23:53:24 -05:00
YanbingJiang	cbac499750	Split test_intel_amx_attention_backend.py to pass CI of timeout (#11370 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2025-10-15 19:22:32 -07:00
Shangming Cai	868403f642	[PD] Add PD support for hybrid model (Qwen3-Next, DeepSeek V3.2 Exp) (#10912 ) Signed-off-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: hzh0425 <hzh0425@apache.org> Co-authored-by: ZeldaHuang <hzm414167@alibaba-inc.com>	2025-10-15 18:59:14 -07:00
Lianmin Zheng	cd7e1bd591	Sync code and test CI; rename some env vars (#11686 )	2025-10-15 18:37:03 -07:00
DiweiSun	4c03dbaaef	[CI][XPU]enable sglang CI on Intel XPU (#9493 ) Co-authored-by: huaiyuzh <huaiyu.zheng@intel.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-10-15 17:13:19 -07:00
Jinwu	825432fce6	[1/N]Support DeepSeek-R1 w4a8 normal deepep (#8247 ) Co-authored-by: Hank Han <hanhan7630@outlook.com>	2025-10-14 20:10:53 -07:00
Xun Sun	a40229f6f8	[1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423 ) Co-authored-by: Hank Han <hanhan7630@outlook.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-10-14 19:40:54 -07:00
yinghui	56222658ec	move eagle draft post process to cuda graph (#11434 ) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-10-14 22:50:53 +08:00
Chenxi Li	28f80b1244	Implement LRU eviction policy for LoRA adapters (#11041 )	2025-10-13 20:18:25 -07:00
Neelabh Sinha	aaf7af1b17	[FEATURE] Add Profile Trace Merger for Distributed Traces (#11413 )	2025-10-14 09:20:17 +08:00
Baizhou Zhang	9f1f699a7a	[CI] Add Basic Test for DeepSeek V3.2 (#11308 )	2025-10-13 11:41:02 -07:00
Liangsheng Yin	acc2327bbd	Move deep gemm related arguments to `sglang.srt.environ` (#11547 )	2025-10-14 00:34:35 +08:00
Mick	f35f120d70	fix: fix video input for qwen3-vl (#11442 )	2025-10-13 09:30:43 -07:00
Liangsheng Yin	516738b096	Depreate `global_server_args_dict` (#11528 )	2025-10-13 19:34:43 +08:00
Shangming Cai	c5fe3c0b75	Tiny fix test run estimated time (#11544 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-13 02:23:13 -07:00
hzh0425	318424e2c8	[HICache]: Support 3FS-Store with page_first_direct layout (#11460 )	2025-10-13 15:47:22 +08:00
Mick	0c0779d667	ci: improve nightly-ci (#11385 )	2025-10-12 21:19:34 -07:00
Yi Zhang	a55cf5304a	[Feature] Support mamba radix cache v0 (#11214 ) Co-authored-by: hanming-lu <hanming@x.ai> Co-authored-by: hzh0425 <hzh0425@apache.org> Co-authored-by: thalahors <ericalcaide1@gmail.com>	2025-10-12 20:57:15 -07:00
Yongtong Wu	a20e7df8d0	Improve dp attention port assignment scheme (#5889 ) Co-authored-by: Cheng Wan <cwan@x.ai>	2025-10-12 17:55:59 -07:00
Cheng Wan	1bdd010291	Revert "Deprecate `global_server_args_dict`" (#11520 )	2025-10-12 17:40:40 -07:00
Liangsheng Yin	1083e7e3df	Deprecate `global_server_args_dict` (#11331 )	2025-10-13 01:20:47 +08:00
Liangsheng Yin	2157d12ae8	[CI] fix lint (#11509 )	2025-10-13 01:07:21 +08:00
Mick	9f2b457cbe	doc: add doc for adding new models into nightly-ci (#11443 ) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-10-12 08:35:10 -07:00
Lianmin Zheng	5a6ec8f999	Fix unit tests (#11503 )	2025-10-12 07:45:57 -07:00
Lianmin Zheng	548a57b1f3	Fix port conflicts in CI (#11497 )	2025-10-12 06:46:36 -07:00
Yuwei An	4ac8e09df0	Piecewise CUDA Graph Support & Torch Compile Backend (#10062 ) Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>	2025-10-12 11:55:57 +08:00

1088 Commits