sglang

Author	SHA1	Message	Date
Baizhou Zhang	3fa3c22ae2	Fix fast decode plan for flashinfer v0.4.0rc1 and upgrade sgl-kernel 0.3.11 (#10634) Co-authored-by: zhyncs <me@zhyncs.com>	2025-09-19 01:25:29 -07:00
penguin_wwy	93f75778be	[RL] Add destroy process group api (#9979)	2025-09-19 00:31:56 +08:00
harrisonlimh	14fdd52740	feat: add priority based scheduling with priority based request acceptance and preemption (#8746)	2025-09-16 17:10:10 -07:00
cicirori	a2f7218a2e	support using fa4 on deepseek on blackwell (#9928)	2025-09-16 16:16:06 -07:00
Yineng Zhang	c0c6f543e4	chore: upgrade sgl-kernel 0.3.10 (#10500)	2025-09-16 02:00:53 -07:00
Liangsheng Yin	fa5d0bf6a5	Remove wrong imports `from sglang.python` (#10493)	2025-09-15 22:12:21 -07:00
Vincent Zhong	0b14159fc4	Add reasoning examples for GPT-OSS in Markdown examples (#9626)	2025-09-15 11:27:40 +08:00
Yingchun Lai	fc2c3a3d8e	metrics: support customer labels specified in request header (#10143)	2025-09-14 20:00:08 -07:00
Liangsheng Yin	305c9e8c2d	[4/N]DP refactor: support watching mode `get_load` and shortest queue strategy (#10201)	2025-09-15 10:06:08 +08:00
Feng Su	4c21b09074	[Feature] Sglang Tracing: Fine-Grained Tracking for Request Latency - Part 1 (#9962) Signed-off-by: Feng Su <sufeng@linux.alibaba.com> Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com> Signed-off-by: Peng Wang <rocking@linux.alibaba.com>	2025-09-15 02:08:02 +08:00
fzyzcjy	4da5533682	Support profile args in Engine API (#6539)	2025-09-14 01:21:10 -07:00
amysaq2023	30d20ce84f	Support loading weights from remote instance (#8215) Signed-off-by: Anqi Shen <amy.saq@antgroup.com> Co-authored-by: Chayenne <74843776+zhaochenyang20@users.noreply.github.com>	2025-09-12 17:40:22 +08:00
ybyang	6d40308905	Revert add mainprocess's proctitle (#10351)	2025-09-12 16:48:30 +08:00
Chang Su	53ca15529a	Implement Standalone gRPC Server for SGLang Python Scheduler (#10283)	2025-09-11 20:57:17 -07:00
Yineng Zhang	bfe01a5eef	chore: upgrade v0.3.9.post2 sgl-kernel (#10297)	2025-09-11 04:10:29 -07:00
Lianmin Zheng	033b75f559	[Auto Sync] Update serving_base.py, serving_chat.py, servin... (20250910) (#10282) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: cctry <shiyang@x.ai>	2025-09-10 16:58:59 -07:00
Xinyuan Tong	f3b5db6ee8	Feat: support disable tool parser (#10184)	2025-09-10 14:03:55 -07:00
Lianmin Zheng	bcf1955f7e	Revert "chore: upgrade v0.3.9 sgl-kernel" (#10245)	2025-09-09 19:05:20 -07:00
Yineng Zhang	d3ee70985f	chore: upgrade v0.3.9 sgl-kernel (#10220)	2025-09-09 03:16:25 -07:00
Liangsheng Yin	72f9fc5f11	Monkey patch uvicorn multi worker `is_alive` timeout (#10159) Co-authored-by: Huang Long <121648372+llll114@users.noreply.github.com>	2025-09-08 17:43:23 +08:00
Liangsheng Yin	e719bb0e84	[1/2] Refactor multi-tokenizer manager (#10074)	2025-09-07 19:13:34 +08:00
Jinyang Yuan	012584ecd5	perf: Avoid unnecessary data type conversions for DeepSeek-V3 on Blackwell (#9834) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-09-06 14:06:46 +08:00
hlu1	2985090084	Update flashinfer to 0.3.1 for B300 support (#10087) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-09-05 13:41:01 -07:00
Liangsheng Yin	6e95f5e5bd	Simplify `Router` arguments passing and build it in docker image (#9964)	2025-09-05 12:13:55 +08:00
Tony Lu	1e18a341e9	[Bugfix] fix pd chat completion protocol for batching support (#10016) Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>	2025-09-04 01:43:16 -07:00
Liangsheng Yin	5dfcd6c207	add proctitle for tokenizers (#9952) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-03 13:31:38 +08:00
Lianmin Zheng	60e37f8028	Move parsers under a single folder (#9912)	2025-09-02 18:25:04 -07:00
JieXin Liang	1db649ac02	[feat] apply deep_gemm compile_mode to skip launch (#9879)	2025-09-02 03:20:30 -07:00
Yineng Zhang	349b491c63	chore: upgrade flashinfer 0.3.0 (#9864)	2025-09-01 03:07:19 -07:00
ybyang	5f77e1292d	Support Multi Process Tokenizer Manager(#6555) (#8964) Signed-off-by: ybyang <ybyang7@iflytek.com> Signed-off-by: huanglong <huanglong@linux.alibaba.com> Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com> Co-authored-by: huanglong <huanglong@linux.alibaba.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-09-01 01:00:13 -07:00
Teng Ma	f05c68733e	[HiCache] Clear kvcache in storage backend with fastAPI (#9750) Co-authored-by: hzh0425 <hzh0425@apache.org>	2025-08-31 17:41:44 +08:00
Yineng Zhang	9970e3bf32	chore: upgrade sgl-kernel 0.3.7.post1 with deepgemm fix (#9822)	2025-08-30 04:02:25 -07:00
Yineng Zhang	3d8fc43400	chore: upgrade flashinfer 0.3.0rc1 (#9793)	2025-08-29 16:24:17 -07:00
gongwei-130	3fd1431df2	support enable in the reasoning field to enable thingking for thinkin… (#9715)	2025-08-29 10:57:32 -07:00
gongwei-130	9a7c8842ba	accomendate json schema in the "schema" field, not in "json_schema" field of response_format (#9786)	2025-08-28 23:51:50 -07:00
Yineng Zhang	b962a296ed	chore: upgrade sgl-kernel 0.3.7 (#9708)	2025-08-27 14:00:31 -07:00
Xinyuan Tong	68a54e063e	Sets default model name in request classes (#9683) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-27 10:43:03 -07:00
cicirori	b6c14ec0b4	add `response_format` support for `completion` API (#9665)	2025-08-26 15:01:29 -07:00
Xiaotong Jiang	0936c766ed	Fix kimi k2 function calling format (#9606)	2025-08-26 00:50:59 -07:00
GavinZhu-GMI	0ef583b7de	fix: allow user to specify function as role (#9635)	2025-08-26 00:47:20 -07:00
Jonas	a0a77d937b	Fix Harmony reasoning parser for and auto-separation for gpt-oss models (#9190) Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: minleminzui <2969413251@qq.com> Co-authored-by: maocheng23 <maocheng@berkeley.edu> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-25 15:26:26 -07:00
Binyao Jiang	3affa9dcc3	Fix GLM45 tool call multi-turn bug (#9500)	2025-08-25 13:46:13 -07:00
Yineng Zhang	938e986e15	chore: upgrade flashinfer 0.2.14.post1 (#9578)	2025-08-25 00:12:17 -07:00
Yuhao Zhou	17d5eda887	bugfix for undefined logging functions in HarmonyBrowserTool & HarmonyPythonTool (#9229)	2025-08-25 00:10:35 -07:00
fzyzcjy	2600fc0d47	Overlapped weight offload (#8034)	2025-08-23 02:06:46 -07:00
Chanh Nguyen	127d4b0d5e	Support GC Freezing to improve latency & throughput (#9241) Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2025-08-23 13:43:09 +08:00
Xinyuan Tong	6c855db82c	Revert "bugfix: Fix output_ids extraction in detokenizer_manager" (#9467)	2025-08-21 17:24:25 -07:00
Xinyuan Tong	e8449ab515	Add deepseek v3.1 thinking parser support and update docs (#9464) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-21 15:09:40 -07:00
gongwei-130	10d34f74e2	fix: should return a invalid request response when schema missing (#9461)	2025-08-21 14:06:50 -07:00
gongwei-130	9ba7253094	accomendate reasoning_effort set in chat_template_kwargs (#9458)	2025-08-21 13:22:03 -07:00

235 Commits