Commit Graph

161 Commits

Author · SHA1 · Message · Date
Xinyuan Tong
3e7ff1ab1f fix: reasoning parser when request has enable_thinking flag (#8933)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-07 15:52:06 -07:00
Xinyuan Tong
3fa3c6cd6a Enables force reasoning based on chat template for Qwen3-Thinking (#8369)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Chang Su <csu272@usc.edu>
2025-08-06 20:02:47 -07:00
Lifu Huang
6210e2c4f0 Support GPU pinning for LoRA (#8697) 2025-08-06 19:39:45 -07:00
Chang Su
92cc32d9fc Support v1/responses and use harmony in serving_chat (#8837)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-06 16:20:34 -07:00
Yineng Zhang
3ae8e3ea8f chore: upgrade torch 2.8.0 (#8836) 2025-08-05 17:32:01 -07:00
Yineng Zhang
4f4e0e4162 chore: upgrade flashinfer 0.2.10 (#8827) 2025-08-05 12:04:01 -07:00
Yineng Zhang
1ea94d3b92 chore: upgrade flashinfer v0.2.9 (#8780) 2025-08-04 21:59:18 -07:00
ybyang
6f9baf1002 [Improvements] Merge health check route (#8444)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>
2025-08-03 01:59:06 -07:00
Guanhua Wang
f7b2853ff8 [feat] support minimum token load balance in dp attention (#7379) 2025-08-03 00:46:47 -07:00
Nicolas Castet
82e6c3a65a Add support for NCCL symmetric memory for TP allreduces (#8238) 2025-08-01 23:30:55 +00:00
Cheng Wan
7a1f7fc504 [Feature] Hybrid EP and TP (#8590) 2025-07-31 02:53:25 -07:00
Cheng Wan
e179e0b797 update sgl-kernel for EP: python part (#8550) 2025-07-31 00:14:39 -07:00
Chang Su
a79a5d7012 Revert "Fix the input tools format and history tool_calls in OpenAI API (#6556)" (#8584) 2025-07-30 13:12:05 -07:00
Lianmin Zheng
a4c3b121d8 Split the scheduler into multiple mixin classes to reduce the file size (#8483) 2025-07-29 12:46:50 -07:00
Timofey
c8f549d96d Fix parsing ChatCompletionMessage (#7273)
Co-authored-by: Timofey K <timosha1113@gmail.com>
2025-07-28 11:35:14 -07:00
harrisonlimh
747dd45077 feat: throttle requests at scheduler based on --max_queued_requests (#7565) 2025-07-28 22:32:33 +08:00
Chang Su
b47eda3316 bugfix: Fix multiple finish_reason chunks and tool_calls finish reason check (#8417) 2025-07-27 13:31:06 -07:00
Binyao Jiang
e983d66680 Fix: Improve test_openai_function_calling unit test and fix reasoning_parser.py think_start_token logic (#8316)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-07-27 13:12:59 -07:00
Yineng Zhang
10ee89559e chore: upgrade flashinfer v0.2.9rc2 (#8406) 2025-07-27 01:41:22 -07:00
Yingchun Lai
36d6f0ba5b fix: fix the missing metrics on non-rank0 nodes (#7720) 2025-07-27 00:55:25 -07:00
Lianmin Zheng
ed2e313eb6 Clean up server_args, triton cache manager (#8332) 2025-07-25 14:14:51 -07:00
Ying Wang
7ad6b766c5 fix: Fix failed functional tests https://github.com/meta-llama/llama-stack-evals (#8266) 2025-07-24 23:11:32 -07:00
Swipe4057
8d1c5b948e chore: upgrade flashinfer v0.2.9rc1 (#8301)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-07-24 14:29:56 -07:00
Simo Lin
5dd0f870ab [bug] fix pd completion protocol for batching support (#8317) 2025-07-23 23:18:17 -07:00
Yineng Zhang
4953f4ca9a chore: upgrade sgl-kernel 0.2.7 (#8304) 2025-07-23 15:07:27 -07:00
xianzhiT
c87d4fec99 Fix the issue of incorrect finish reason in final stream response chunk returned during tool call (#7708)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-07-23 13:28:53 -07:00
Yineng Zhang
74f59ae555 chore: upgrade sgl-kernel 0.2.6.post1 (#8202) 2025-07-21 02:10:24 -07:00
Lianmin Zheng
55381a46ac Revert "[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability" (#8181) 2025-07-19 22:41:30 -07:00
ybyang
4540a4666a [Feature] Simple Improve Health Check Mechanism for Production-Grade Stability (#8115)
Signed-off-by: ybyang <ybyang7@iflytek.com>
2025-07-19 18:10:00 -07:00
Yineng Zhang
561dd7b2ce chore: upgrade sgl-kernel 0.2.6 (#8166) 2025-07-19 03:17:08 -07:00
jiawei
f1f1d1d40d Fix the input tools format and history tool_calls in OpenAI API (#6556) 2025-07-15 00:58:55 -07:00
Xinyuan Tong
6e923dbd30 feat: update multimodal data handling in engine entrypoint (#8002)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-07-15 00:12:22 -07:00
Yineng Zhang
732fc8e405 chore: upgrade sgl-kernel 0.2.5 (#7971) 2025-07-11 20:35:06 -07:00
Mick
b5e3d6031c vlm: support video as an input modality (#5888) 2025-07-09 23:48:35 -07:00
kyleliang-nv
dd445a41f5 [feature] Add start step profile argument in /start_profile (#7608) 2025-07-09 18:42:15 -07:00
Brayden Zhong
a37e1247c1 [Multimodal][Perf] Use pybase64 instead of base64 (#7724) 2025-07-08 14:00:58 -07:00
Yineng Zhang
62f5522ffe chore: upgrade sgl-kernel v0.2.4 (#7801) 2025-07-05 17:37:40 -07:00
Yineng Zhang
77cfea689d chore: upgrade sgl-kernel v0.2.3 (#7786) 2025-07-05 01:55:55 -07:00
Zilin Zhu
af46f299f9 [RL] add pause and continue generation for async rl training (#7419) 2025-07-04 18:49:49 -07:00
Yi Zhang
489934be0a fuse renormal into moe topk softmax kernel python code (#7751)
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: zhyncs <me@zhyncs.com>
2025-07-03 16:22:14 -07:00
Albert
d3c275b117 Support updating weights at once by stopping all requests (#6698)
Signed-off-by: Tianyu Zhou <albert.zty@antgroup.com>
Co-authored-by: Zilin Zhu <zhuzilinallen@gmail.com>
2025-07-02 22:26:06 -07:00
Zilin Zhu
0626f678de [RL] support update_weights_from_distributed with different group and multiple weights (#7292) 2025-07-02 19:29:11 -07:00
Zilin Zhu
09e699bba4 [RL] add --skip-warmup (#7416) 2025-07-02 18:50:43 -07:00
Yineng Zhang
f18a8fddd4 chore: upgrade flashinfer v0.2.7.post1 (#7698) 2025-07-01 14:05:57 -07:00
Zhiqiang Xie
f9eb04ddb2 upgrade sgl kernel to 0.2.1 for main (#7676) 2025-07-01 00:00:13 -07:00
Yineng Zhang
392e441ad1 chore: upgrade flashinfer v0.2.7 jit (#7663) 2025-06-30 13:26:26 -07:00
Lianmin Zheng
22352d47a9 Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632)
Co-authored-by: Kan Wu <wukanustc@gmail.com>
2025-06-29 23:16:19 -07:00
Lianmin Zheng
071a1f51ae [Minor] clean up multimodal processor and tokenizer manager (#7624) 2025-06-29 02:50:14 -07:00
Lifu Huang
49538d111b Support dynamic LoRA loading / unloading in engine/server API (#7446) 2025-06-27 21:00:27 -07:00
mlmz
fe2a0f962f minor: 'role' must be system/assistant/tool, but case insensitive for now (#7499) 2025-06-25 02:11:03 -07:00