sglang

Author	SHA1	Message	Date
Lifu Huang	b0d20cdec7	Set csgmv as default lora backend. (#11488)	2025-10-15 23:53:24 -05:00
YanbingJiang	cbac499750	Split test_intel_amx_attention_backend.py to pass CI of timeout (#11370) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2025-10-15 19:22:32 -07:00
Shangming Cai	868403f642	[PD] Add PD support for hybrid model (Qwen3-Next, DeepSeek V3.2 Exp) (#10912) Signed-off-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: hzh0425 <hzh0425@apache.org> Co-authored-by: ZeldaHuang <hzm414167@alibaba-inc.com>	2025-10-15 18:59:14 -07:00
Lianmin Zheng	cd7e1bd591	Sync code and test CI; rename some env vars (#11686)	2025-10-15 18:37:03 -07:00
DiweiSun	4c03dbaaef	[CI][XPU]enable sglang CI on Intel XPU (#9493) Co-authored-by: huaiyuzh <huaiyu.zheng@intel.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-10-15 17:13:19 -07:00
Jinwu	825432fce6	[1/N]Support DeepSeek-R1 w4a8 normal deepep (#8247) Co-authored-by: Hank Han <hanhan7630@outlook.com>	2025-10-14 20:10:53 -07:00
Xun Sun	a40229f6f8	[1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423) Co-authored-by: Hank Han <hanhan7630@outlook.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-10-14 19:40:54 -07:00
yinghui	56222658ec	move eagle draft post process to cuda graph (#11434) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-10-14 22:50:53 +08:00
Chenxi Li	28f80b1244	Implement LRU eviction policy for LoRA adapters (#11041)	2025-10-13 20:18:25 -07:00
Neelabh Sinha	aaf7af1b17	[FEATURE] Add Profile Trace Merger for Distributed Traces (#11413)	2025-10-14 09:20:17 +08:00
Baizhou Zhang	9f1f699a7a	[CI] Add Basic Test for DeepSeek V3.2 (#11308)	2025-10-13 11:41:02 -07:00
Liangsheng Yin	acc2327bbd	Move deep gemm related arguments to `sglang.srt.environ` (#11547)	2025-10-14 00:34:35 +08:00
Mick	f35f120d70	fix: fix video input for qwen3-vl (#11442)	2025-10-13 09:30:43 -07:00
Liangsheng Yin	516738b096	Deprecate `global_server_args_dict` (#11528)	2025-10-13 19:34:43 +08:00
Shangming Cai	c5fe3c0b75	Tiny fix test run estimated time (#11544) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-13 02:23:13 -07:00
hzh0425	318424e2c8	[HICache]: Support 3FS-Store with page_first_direct layout (#11460)	2025-10-13 15:47:22 +08:00
Mick	0c0779d667	ci: improve nightly-ci (#11385)	2025-10-12 21:19:34 -07:00
Yi Zhang	a55cf5304a	[Feature] Support mamba radix cache v0 (#11214) Co-authored-by: hanming-lu <hanming@x.ai> Co-authored-by: hzh0425 <hzh0425@apache.org> Co-authored-by: thalahors <ericalcaide1@gmail.com>	2025-10-12 20:57:15 -07:00
Yongtong Wu	a20e7df8d0	Improve dp attention port assignment scheme (#5889) Co-authored-by: Cheng Wan <cwan@x.ai>	2025-10-12 17:55:59 -07:00
Cheng Wan	1bdd010291	Revert "Deprecate `global_server_args_dict`" (#11520)	2025-10-12 17:40:40 -07:00
Liangsheng Yin	1083e7e3df	Deprecate `global_server_args_dict` (#11331)	2025-10-13 01:20:47 +08:00
Liangsheng Yin	2157d12ae8	[CI] fix lint (#11509)	2025-10-13 01:07:21 +08:00
Mick	9f2b457cbe	doc: add doc for adding new models into nightly-ci (#11443) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-10-12 08:35:10 -07:00
Lianmin Zheng	5a6ec8f999	Fix unit tests (#11503)	2025-10-12 07:45:57 -07:00
Lianmin Zheng	548a57b1f3	Fix port conflicts in CI (#11497)	2025-10-12 06:46:36 -07:00
Yuwei An	4ac8e09df0	Piecewise CUDA Graph Support & Torch Compile Backend (#10062) Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>	2025-10-12 11:55:57 +08:00
Liangsheng Yin	20a6c0a63d	Beta spec-overlap for EAGLE (#11398) Co-authored-by: Lianmin Zheng <15100009+merrymercy@users.noreply.github.com> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-10-12 11:02:22 +08:00
Glen Liu	47c606d3dc	[Feature] support regex strings as a stopping condition (#10635)	2025-10-12 10:53:15 +08:00
Binyao Jiang	451d15c44b	[DPSKv3.2] Rewrite nsa tilelang act_quant kernel to triton (#11450)	2025-10-10 23:13:46 -07:00
Stefan He	eae9a9fb9d	Fix batch invariant ops (#11368)	2025-10-10 20:49:08 -07:00
Lianmin Zheng	61055cb309	Reorder PD disagg CI tests (#11438)	2025-10-10 17:56:49 -07:00
cctry	b36afed4a7	Separate allocation logic from scheduler (#11313)	2025-10-10 17:38:54 -07:00
Lianmin Zheng	b4408e6098	Revert "fix: fix video input for qwen3-vl" (#11437)	2025-10-10 12:44:40 -07:00
Mick	a1a20b4c7c	fix: fix video input for qwen3-vl (#11361)	2025-10-10 04:35:35 -07:00
hzh0425	ee3bd8a1c8	feat(hicache): Support passing prefix keys for l3 store. (#9045) Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-10-10 00:22:05 -07:00
Shangming Cai	70fbb3adf6	[CI] Refactor PD disaggregation test suite (#11363) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-09 18:50:39 -07:00
Glen Liu	9a7e7a6576	[Bug Fix] prevent lora adapter from being loaded into LoRAManager if it is already loaded (#11365)	2025-10-09 18:43:03 -07:00
Sundara Raman Ramachandran	53bd00d975	[Generative Score API] Multi-Item scoring with custom attention mask. (#10979)	2025-10-08 18:47:32 -07:00
Netanel Haber	d6837aea4d	model: Support Hybrid Mamba2 NemotronHForCausalLM (nvidia/NVIDIA-Nemotron-Nano-9B-v2) (#10909) Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-10-09 00:37:38 +08:00
Liangsheng Yin	c882b5ae75	[CI] improve disaggregation CI. (#11264) Signed-off-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-10-08 21:40:56 +08:00
Cheng Wan	3c06b673af	[8/N] MoE Refactor: deprecate `EPMoE` (#11211)	2025-10-07 21:51:41 -07:00
Adarsh Shirawalmath	7c3f07dbcb	[Feature] Add /tokenize and /detokenize OpenAI compatible endpoints (#9545)	2025-10-08 12:38:48 +08:00
Mick	64d1505c0a	ci: unify the model launch method of nightly ci (#11230)	2025-10-07 18:13:14 -07:00
cctry	f3764c26a3	Clean match_prefix and prepare_for_extend for mem cache V2 (#11200)	2025-10-07 17:54:18 -07:00
Chang Su	7ba3de0e92	[oai serving chat] Add argument `--sampling-defaults` and fix `ChatCompletionRequest` defaults (#11304)	2025-10-08 00:36:05 +00:00
Ke Bao	24bc3fb0f9	EAGLE cache fix for SWARadixCache (#11231) Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-10-07 18:21:37 +08:00
Liangsheng Yin	8a8a608af9	[ci] fix pp test (#11294) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-07 14:20:04 +08:00
Alex Chi Z	9b4c449735	convert test_deterministic into unit tests (#11095) Signed-off-by: Alex Chi Z <iskyzh@gmail.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-06 20:33:11 -07:00
sunxxuns	a57f0e3d56	reverse the amd ci test back to 1200s and split the 8-gpu deepseek job into two. (#11238) Co-authored-by: root <root@smci350-zts-gtu-e17-15.zts-gtu.dcgpu>	2025-10-06 19:27:57 -04:00
Zhiyu	155cbb51f0	Enable native ModelOpt quantization support (1/3) (#7149) Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>	2025-10-06 13:24:15 -07:00

1064 Commits