sglang

Author	SHA1	Message	Date
Liangsheng Yin	516738b096	Depreate `global_server_args_dict` (#11528 )	2025-10-13 19:34:43 +08:00
Shangming Cai	c5fe3c0b75	Tiny fix test run estimated time (#11544 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-13 02:23:13 -07:00
hzh0425	318424e2c8	[HICache]: Support 3FS-Store with page_first_direct layout (#11460 )	2025-10-13 15:47:22 +08:00
Mick	0c0779d667	ci: improve nightly-ci (#11385 )	2025-10-12 21:19:34 -07:00
Yi Zhang	a55cf5304a	[Feature] Support mamba radix cache v0 (#11214 ) Co-authored-by: hanming-lu <hanming@x.ai> Co-authored-by: hzh0425 <hzh0425@apache.org> Co-authored-by: thalahors <ericalcaide1@gmail.com>	2025-10-12 20:57:15 -07:00
Yongtong Wu	a20e7df8d0	Improve dp attention port assignment scheme (#5889 ) Co-authored-by: Cheng Wan <cwan@x.ai>	2025-10-12 17:55:59 -07:00
Cheng Wan	1bdd010291	Revert "Deprecate `global_server_args_dict`" (#11520 )	2025-10-12 17:40:40 -07:00
Liangsheng Yin	1083e7e3df	Deprecate `global_server_args_dict` (#11331 )	2025-10-13 01:20:47 +08:00
Liangsheng Yin	2157d12ae8	[CI] fix lint (#11509 )	2025-10-13 01:07:21 +08:00
Mick	9f2b457cbe	doc: add doc for adding new models into nightly-ci (#11443 ) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-10-12 08:35:10 -07:00
Lianmin Zheng	5a6ec8f999	Fix unit tests (#11503 )	2025-10-12 07:45:57 -07:00
Lianmin Zheng	548a57b1f3	Fix port conflicts in CI (#11497 )	2025-10-12 06:46:36 -07:00
Yuwei An	4ac8e09df0	Piecewise CUDA Graph Support & Torch Compile Backend (#10062 ) Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>	2025-10-12 11:55:57 +08:00
Liangsheng Yin	20a6c0a63d	Beta spec-overlap for EAGLE (#11398 ) Co-authored-by: Lianmin Zheng <15100009+merrymercy@users.noreply.github.com> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-10-12 11:02:22 +08:00
Glen Liu	47c606d3dc	[Feature] support regex strings as a stopping condition (#10635 )	2025-10-12 10:53:15 +08:00
Binyao Jiang	451d15c44b	[DPSKv3.2] Rewrite nsa tilelang act_quant kernel to triton (#11450 )	2025-10-10 23:13:46 -07:00
Stefan He	eae9a9fb9d	Fix batch invariant ops (#11368 )	2025-10-10 20:49:08 -07:00
Lianmin Zheng	61055cb309	Reorder PD disagg CI tests (#11438 )	2025-10-10 17:56:49 -07:00
cctry	b36afed4a7	Separate allocation logic from scheduler (#11313 )	2025-10-10 17:38:54 -07:00
Lianmin Zheng	b4408e6098	Revert "fix: fix video input for qwen3-vl" (#11437 )	2025-10-10 12:44:40 -07:00
Mick	a1a20b4c7c	fix: fix video input for qwen3-vl (#11361 )	2025-10-10 04:35:35 -07:00
hzh0425	ee3bd8a1c8	feat(hicache): Support passing prefix keys for l3 store. (#9045 ) Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-10-10 00:22:05 -07:00
Shangming Cai	70fbb3adf6	[CI] Refactor PD disaggregation test suite (#11363 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-09 18:50:39 -07:00
Glen Liu	9a7e7a6576	[Bug Fix] prevent lora adapter from being loaded into LoRAManager if it is already loaded (#11365 )	2025-10-09 18:43:03 -07:00
Sundara Raman Ramachandran	53bd00d975	[Generative Score API] Multi-Item scoring with custom attention mask. (#10979 )	2025-10-08 18:47:32 -07:00
Netanel Haber	d6837aea4d	model: Support Hybrid Mamba2 NemotronHForCausalLM (nvidia/NVIDIA-Nemotron-Nano-9B-v2) (#10909 ) Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-10-09 00:37:38 +08:00
Liangsheng Yin	c882b5ae75	[CI] improve disaggregation CI. (#11264 ) Signed-off-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-10-08 21:40:56 +08:00
Cheng Wan	3c06b673af	[8/N] MoE Refactor: deprecate `EPMoE` (#11211 )	2025-10-07 21:51:41 -07:00
Adarsh Shirawalmath	7c3f07dbcb	[Feature] Add /tokenize and /detokenize OpenAI compatible endpoints (#9545 )	2025-10-08 12:38:48 +08:00
Mick	64d1505c0a	ci: unify the model launch method of nightly ci (#11230 )	2025-10-07 18:13:14 -07:00
cctry	f3764c26a3	Clean match_prefix and prepare_for_extend for mem cache V2 (#11200 )	2025-10-07 17:54:18 -07:00
Chang Su	7ba3de0e92	[oai serving chat] Add argument `--sampling-defaults` and fix `ChatCompletionRequest` defaults (#11304 )	2025-10-08 00:36:05 +00:00
Ke Bao	24bc3fb0f9	EAGLE cache fix for SWARadixCache (#11231 ) Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-10-07 18:21:37 +08:00
Liangsheng Yin	8a8a608af9	[ci] fix pp test (#11294 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-07 14:20:04 +08:00
Alex Chi Z	9b4c449735	convert test_deterministic into unit tests (#11095 ) Signed-off-by: Alex Chi Z <iskyzh@gmail.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-06 20:33:11 -07:00
sunxxuns	a57f0e3d56	reverse the amd ci test back to 1200s and split the 8-gpu deepseek job into two. (#11238 ) Co-authored-by: root <root@smci350-zts-gtu-e17-15.zts-gtu.dcgpu>	2025-10-06 19:27:57 -04:00
Zhiyu	155cbb51f0	Enable native ModelOpt quantization support (1/3) (#7149 ) Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>	2025-10-06 13:24:15 -07:00
Lianmin Zheng	d645ae90a3	Rename runner labels (#11228 )	2025-10-05 18:05:41 -07:00
Xinyuan Tong	652c24a653	Update transformers package version to 4.57.0 (#11222 ) Co-authored-by: yhyang201 <yhyang201@gmail.com>	2025-10-05 23:45:14 +00:00
sunxxuns	5e142484e2	[Fix AMD CI] VRAM cleanup (#11174 ) Co-authored-by: root <root@smci350-zts-gtu-e17-15.zts-gtu.dcgpu>	2025-10-05 19:03:53 -04:00
Shangming Cai	c560410da7	Refactor and optimize mooncake CI (#11162 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-05 14:08:52 -07:00
Vincent Zhong	36a6b8dbfc	Update `v1/responses` to be more OpenAI-compatible. (#9624 )	2025-10-05 18:47:46 +00:00
Ke Bao	31b49c0b51	EAGLE cache fix for HiCache (#11215 )	2025-10-04 16:53:53 -07:00
Hank Han	666da3d59f	[fix]enable flashmla when using draft model P/D attention select (#11012 )	2025-10-04 20:59:34 +08:00
hzh0425	c70e58e837	[HICache]: Refactor HiCache CI (#11011 ) Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-10-03 20:51:56 -04:00
Liangsheng Yin	4726c9197f	[minor] fix the lint (#11198 )	2025-10-04 01:04:58 +08:00
vikram singh shekhawat	586e81a28a	[Test] Initialize mem_fraction_static in setUpClass to fix pytest VLM test crashes. (#10859 ) Co-authored-by: svc_repro_tool <svc_repro_tool@habana.ai>	2025-10-04 00:14:48 +08:00
shubham singhal	03def5e3b1	Fix [test]: Env:SGLANG_TORCH_PROFILER_DIR for pytest. (#10780 )	2025-10-03 22:59:32 +08:00
fzyzcjy	fdc4e1e570	Tiny move files to utils folder (#11166 )	2025-10-03 22:40:06 +08:00
Matt Nappo	8c57490210	[Feature] Option to save model weights to CPU when memory saver mode is enabled (#10873 ) Co-authored-by: molocule <34072934+molocule@users.noreply.github.com>	2025-10-03 16:48:19 +08:00

1051 Commits