sglang

Author	SHA1	Message	Date
Liangsheng Yin	f49419061d	Move args from `global_config` to `environ` (#11332 )	2025-10-12 21:29:31 +08:00
Liangsheng Yin	01e59e8247	Fix CI break by express-laned PRs. (#11499 )	2025-10-12 21:06:06 +08:00
Mike Qiu	99a0704a36	bailingMoE: Fix Key error of deepep_mode (#11465 ) Signed-off-by: Michael Qiu <qiudayu.qdy@antgroup.com> Co-authored-by: Mike_Qiu <qiudayu.qdy@antgroup.com>	2025-10-12 20:42:59 +08:00
Antoine Roux	ec1cd90ac9	Fix the GPT function calling regex to allow dash in the name (#10577 )	2025-10-12 20:34:58 +08:00
Kai-Hsun Chen	1103dc6204	[chore][2/N] Avoid using default mutable parameters (#11479 ) Signed-off-by: Kai-Hsun Chen <khchen@x.ai>	2025-10-12 20:34:04 +08:00
Vincent Zhong	a220536f40	[ perf ] Replace json-> orjson in hot path (#11221 ) Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>	2025-10-12 20:30:58 +08:00
Mahmoud Ashraf	7b064f04f8	[bugfix]: use correct causality condition for flashattention, flashinfer, and triton backends (#10172 )	2025-10-12 20:28:16 +08:00
Kai-Hsun Chen	43190becfa	[chore][1/N] Avoid using default mutable parameters (#11478 ) Signed-off-by: Kai-Hsun Chen <khchen@x.ai>	2025-10-12 20:26:39 +08:00
Vincent Zhong	be740acdb0	[smol] [perf] Qwen3-VL in place op. (#11481 ) Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>	2025-10-12 20:25:30 +08:00
Yuwei An	4ac8e09df0	Piecewise CUDA Graph Support & Torch Compile Backend (#10062 ) Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>	2025-10-12 11:55:57 +08:00
Liangsheng Yin	20a6c0a63d	Beta spec-overlap for EAGLE (#11398 ) Co-authored-by: Lianmin Zheng <15100009+merrymercy@users.noreply.github.com> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-10-12 11:02:22 +08:00
Glen Liu	47c606d3dc	[Feature] support regex strings as a stopping condition (#10635 )	2025-10-12 10:53:15 +08:00
Lorenzo Lu	b5dcfd4154	Add option to disable `any_whitespace` for `xgrammar` and `llguidance` backends. (#8919 ) Co-authored-by: Chang Su <chang.s.su@oracle.com>	2025-10-11 22:24:58 +08:00
ybyang	5061b8fd3e	fix stop when stream (#11462 ) Signed-off-by: ybyang <ybyang7@iflytek.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2025-10-11 22:06:31 +08:00
ykcombat	c8452551ce	[Fix] Fix split prefill with fa3. (#11428 )	2025-10-11 22:03:28 +08:00
fzyzcjy	bf3e7149be	Fix enable_v2 in int8 quant (#11470 )	2025-10-11 21:56:30 +08:00
ykcombat	f5754d1256	[Documentation][Configuration] Server args and documentation of PD-Multiplexing. (#11427 )	2025-10-11 21:36:07 +08:00
Liangsheng Yin	739daa63e4	Adjust logits metada init for target verify (#11467 )	2025-10-11 21:17:04 +08:00
fzyzcjy	21337b22b9	Reland [1/2] Optimizations and refactors about quant kernel (#10312 ) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-10-11 15:59:03 +08:00
Zhiyu	129d299278	Enable native ModelOpt quantization support (2/3) (#9991 ) Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>	2025-10-11 07:48:14 +00:00
Binyao Jiang	451d15c44b	[DPSKv3.2] Rewrite nsa tilelang act_quant kernel to triton (#11450 )	2025-10-10 23:13:46 -07:00
Liu-congo	c80a96dae9	[BugFix] test_mla_fp8.py fails on Cublas 12.9 (#11360 ) Signed-off-by: Liu-congo <1502632128@qq.com>	2025-10-10 21:14:24 -07:00
Stefan He	eae9a9fb9d	Fix batch invariant ops (#11368 )	2025-10-10 20:49:08 -07:00
wxsm	2674c1d280	fix: Change dsv32 hack temporary path to use system temp directory (#11445 )	2025-10-10 19:59:41 -07:00
Lianmin Zheng	61055cb309	Reorder PD disagg CI tests (#11438 )	2025-10-10 17:56:49 -07:00
Simo Lin	c495833186	[router] leverage RAII to actively cancel request during client disconnect (#11399 )	2025-10-10 20:43:38 -04:00
cctry	b36afed4a7	Separate allocation logic from scheduler (#11313 )	2025-10-10 17:38:54 -07:00
JinYan Su	9aa4502d11	feat(mooncake): support GB suffix for global_segment_size (#10745 ) Signed-off-by: Jinyang Su <751080330@qq.com> Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com>	2025-10-10 17:38:25 -07:00
Scott Lee	55b14656e6	Revert "Add metrics for speculative decoding (acceptance rate, average acceptance length)" (#11433 )	2025-10-10 12:54:57 -07:00
Lianmin Zheng	b4408e6098	Revert "fix: fix video input for qwen3-vl" (#11437 )	2025-10-10 12:44:40 -07:00
Cheng Wan	52fcbbb8bd	Revert "perf: optimize qwen-vl with symm mem allreduce" (#11436 )	2025-10-10 12:30:05 -07:00
Teng Ma	9082a7d323	[HiCache] feat: add multi tenant with prefix tag (#9256 )	2025-10-11 00:23:28 +08:00
Yuan Luo	3b9d97f335	perf: optimize qwen-vl with symm mem allreduce (#11381 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-10-10 22:24:45 +08:00
Mick	a1a20b4c7c	fix: fix video input for qwen3-vl (#11361 )	2025-10-10 04:35:35 -07:00
Yineng Zhang	4299aebdbb	chore: update pyproject (#11420 )	2025-10-10 00:56:30 -07:00
Scott Lee	0babd48736	Add metrics for speculative decoding (acceptance rate, average acceptance length) (#11144 )	2025-10-10 00:46:44 -07:00
Zaili Wang	f19613e6c3	Dedicated toml files for CPU/XPU (#10734 )	2025-10-10 00:44:55 -07:00
ziruiliu	8df4945559	fix file and object naming scheme in HiCacheNixl to avoid data corruption (#10969 ) Signed-off-by: Zirui Liu <ziliu@ddn.com>	2025-10-10 00:23:10 -07:00
hzh0425	ee3bd8a1c8	feat(hicache): Support passing prefix keys for l3 store. (#9045 ) Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-10-10 00:22:05 -07:00
Yuan Luo	b5044fbf12	Replace pad with cat for better performance (#11388 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-10-10 12:03:17 +08:00
Glen Liu	9a7e7a6576	[Bug Fix] prevent lora adapter from being loaded into LoRAManager if it is already loaded (#11365 )	2025-10-09 18:43:03 -07:00
Yingchun Lai	0fe87213bb	fix: fix gpu-proc affinity set incorrectly when pp_size > 1 (#11389 )	2025-10-09 18:40:05 -07:00
Xinyuan Tong	1f106ee365	[grammar] Avoid server crash when grammar backend is None (#11401 )	2025-10-09 18:38:10 -07:00
Lianmin Zheng	9b8ebb2798	move more files under srt/utils (#11285 )	2025-10-09 16:46:15 -07:00
sglang-bot	758b887ad1	chore: bump SGLang version to 0.5.3.post1 (#11324 )	2025-10-09 15:19:59 -07:00
Yineng Zhang	44cb060785	chore: upgrade flashinfer 0.4.0 (#11364 )	2025-10-09 14:17:54 -07:00
Chang Su	b520958ec8	[router][grpc] Replace fake health check with correct ones (#11387 )	2025-10-09 09:13:57 -07:00
shaharmor98	fa7e2c3049	fix bench_serving mishandling of internal states (#11376 ) Signed-off-by: Shahar Mor <smor@nvidia.com>	2025-10-09 19:24:50 +08:00
shaharmor98	8f2cd177af	add code pp support for nixl (#11375 ) Signed-off-by: Shahar Mor <smor@nvidia.com>	2025-10-09 19:24:32 +08:00
Trevor Morris	a4b424c632	[DeepSeek-V3.2] Include indexer kv cache when estimating kv cache size (#11309 )	2025-10-08 23:59:46 -07:00

3928 Commits