Commit Graph

211 Commits

Author SHA1 Message Date
Tony Lu
1e18a341e9 [Bugfix] fix pd chat completion protocol for batching support (#10016)
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
2025-09-04 01:43:16 -07:00
Liangsheng Yin
5dfcd6c207 add proctitle for tokenizers (#9952)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-03 13:31:38 +08:00
Lianmin Zheng
60e37f8028 Move parsers under a single folder (#9912) 2025-09-02 18:25:04 -07:00
JieXin Liang
1db649ac02 [feat] apply deep_gemm compile_mode to skip launch (#9879) 2025-09-02 03:20:30 -07:00
Yineng Zhang
349b491c63 chore: upgrade flashinfer 0.3.0 (#9864) 2025-09-01 03:07:19 -07:00
ybyang
5f77e1292d Support Multi Process Tokenizer Manager (#6555) (#8964)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-09-01 01:00:13 -07:00
Teng Ma
f05c68733e [HiCache] Clear kvcache in storage backend with fastAPI (#9750)
Co-authored-by: hzh0425 <hzh0425@apache.org>
2025-08-31 17:41:44 +08:00
Yineng Zhang
9970e3bf32 chore: upgrade sgl-kernel 0.3.7.post1 with deepgemm fix (#9822) 2025-08-30 04:02:25 -07:00
Yineng Zhang
3d8fc43400 chore: upgrade flashinfer 0.3.0rc1 (#9793) 2025-08-29 16:24:17 -07:00
gongwei-130
3fd1431df2 support enable in the reasoning field to enable thinking for thinkin… (#9715) 2025-08-29 10:57:32 -07:00
gongwei-130
9a7c8842ba accommodate JSON schema in the "schema" field, not the "json_schema" field, of response_format (#9786) 2025-08-28 23:51:50 -07:00
Yineng Zhang
b962a296ed chore: upgrade sgl-kernel 0.3.7 (#9708) 2025-08-27 14:00:31 -07:00
Xinyuan Tong
68a54e063e Sets default model name in request classes (#9683)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-27 10:43:03 -07:00
cicirori
b6c14ec0b4 add response_format support for completion API (#9665) 2025-08-26 15:01:29 -07:00
Xiaotong Jiang
0936c766ed Fix kimi k2 function calling format (#9606) 2025-08-26 00:50:59 -07:00
GavinZhu-GMI
0ef583b7de fix: allow user to specify function as role (#9635) 2025-08-26 00:47:20 -07:00
Jonas
a0a77d937b Fix Harmony reasoning parser and auto-separation for gpt-oss models (#9190)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: minleminzui <2969413251@qq.com>
Co-authored-by: maocheng23 <maocheng@berkeley.edu>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-25 15:26:26 -07:00
Binyao Jiang
3affa9dcc3 Fix GLM45 tool call multi-turn bug (#9500) 2025-08-25 13:46:13 -07:00
Yineng Zhang
938e986e15 chore: upgrade flashinfer 0.2.14.post1 (#9578) 2025-08-25 00:12:17 -07:00
Yuhao Zhou
17d5eda887 bugfix for undefined logging functions in HarmonyBrowserTool & HarmonyPythonTool (#9229) 2025-08-25 00:10:35 -07:00
fzyzcjy
2600fc0d47 Overlapped weight offload (#8034) 2025-08-23 02:06:46 -07:00
Chanh Nguyen
127d4b0d5e Support GC Freezing to improve latency & throughput (#9241)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2025-08-23 13:43:09 +08:00
Xinyuan Tong
6c855db82c Revert "bugfix: Fix output_ids extraction in detokenizer_manager" (#9467) 2025-08-21 17:24:25 -07:00
Xinyuan Tong
e8449ab515 Add deepseek v3.1 thinking parser support and update docs (#9464)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-21 15:09:40 -07:00
gongwei-130
10d34f74e2 fix: should return an invalid-request response when schema is missing (#9461) 2025-08-21 14:06:50 -07:00
gongwei-130
9ba7253094 accommodate reasoning_effort set in chat_template_kwargs (#9458) 2025-08-21 13:22:03 -07:00
hlu1
dae9a80f43 [fix] Fix mxfp4 weight loading bug with TP sharding in GPT-OSS (#9433)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-21 03:50:51 -07:00
fzyzcjy
42c8704560 Add PDL support for quant kernel and rope kernel (#9106) 2025-08-20 01:56:29 -07:00
Keyang Ru
f515449582 Fix gpt-oss response api streaming issue (#9368) 2025-08-19 20:19:42 -07:00
江家瑋
ca533580f2 [Docs] Correct and clarify notes in Engine docstring (#9313)
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
2025-08-18 13:24:19 -07:00
gongwei-130
0cf3fbeb18 should return invalid request for empty prompt (#9315) 2025-08-18 11:44:11 -07:00
Chengxing Xie
c1c7dc4534 feat: Add model version tracking with API endpoints and response metadata (#8795) 2025-08-14 12:13:46 -07:00
Hongbo Xu
2cc9eeab01 [4/n]decouple quantization implementation from vLLM dependency (#9191)
Co-authored-by: AniZpZ <aniz1905@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-14 12:05:46 -07:00
eigen
4dbf43601d fix: zero_init buffer (#9065)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-14 02:39:09 -07:00
Jiaqi Gu
c9ee738515 Fuse writing KV buffer into rope kernel (part 2: srt) (#9014)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
2025-08-12 13:15:30 -07:00
Chang Su
f2a5de284b [Bugfix] Fix accuracy-test-1-gpu failure caused by builtin_tools (#9114) 2025-08-12 09:56:13 -07:00
Chang Su
a218490136 (gpt-oss, oai, chat): Remove Harmony Integration and Implement Native GPT-OSS Tool Call Support (#9043) 2025-08-11 18:59:18 -07:00
Chang Su
a6452b7188 bugfix: Fix output_ids extraction in detokenizer_manager (#9047) 2025-08-11 03:17:32 -07:00
zhyncs
f4ae50e97c fix: use flashinfer v0.2.11.post1 2025-08-11 02:49:25 -07:00
Yineng Zhang
84cb449eec Revert "chore: upgrade flashinfer 0.2.11 (#9036)" (#9057) 2025-08-11 00:16:39 -07:00
Yineng Zhang
dd001a5477 chore: upgrade flashinfer 0.2.11 (#9036) 2025-08-10 17:35:37 -07:00
Lianmin Zheng
4ea9d74a3e Simplify health check (#9034) 2025-08-10 17:35:05 -07:00
Stefan He
8ecf6b9d24 Support Flatten Tensor Update Weights to speed up MOE Update Weights by 20% (#8079) 2025-08-10 16:08:59 -07:00
Lianmin Zheng
9a44b643c6 Fix CI (#9012) 2025-08-09 13:33:42 -07:00
Yineng Zhang
326a901df4 chore: upgrade sgl-kernel 0.3.3 (#8998) 2025-08-09 01:22:01 -07:00
Lianmin Zheng
706bd69cc5 Clean up server_args.py to have a dedicated function for model specific adjustments (#8983) 2025-08-08 19:56:50 -07:00
ishandhanani
4e7f025219 chore(gb200): update to CUDA 12.9 and improve build process (#8772) 2025-08-08 13:42:47 -07:00
Zilin Zhu
dd650e0e21 [RL] fix skip_server_warmup and rl health_generate logic (#8757) 2025-08-08 04:34:38 -07:00
Lianmin Zheng
a947154286 Revert "Support Multi Process Tokenizer Manager" (#8960) 2025-08-08 02:28:27 -07:00
ybyang
7490e3f67d Support Multi Process Tokenizer Manager (#6555)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: lw9527 <952799980@qq.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
2025-08-08 01:45:50 -07:00