sglang

Author	SHA1	Message	Date
narutolhy	839c93bd2d	feat: add original logprobs to response (#8375 ) Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com>	2025-08-29 11:43:57 -07:00
gongwei-130	3fd1431df2	support enable in the reasoning field to enable thinking for thinkin… (#9715 )	2025-08-29 10:57:32 -07:00
gongwei-130	9a7c8842ba	accommodate json schema in the "schema" field, not in "json_schema" field of response_format (#9786 )	2025-08-28 23:51:50 -07:00
Hubert Lu	711390a971	[AMD] Support Hierarchical Caching on AMD GPUs (#8236 )	2025-08-28 15:27:07 -07:00
Qiaolin Yu	4a4772ae03	Support speculative decoding in hybrid attention backend (#9573 )	2025-08-28 01:11:42 -07:00
cicirori	b6c14ec0b4	add `response_format` support for `completion` API (#9665 )	2025-08-26 15:01:29 -07:00
Xiaotong Jiang	0936c766ed	Fix kimi k2 function calling format (#9606 )	2025-08-26 00:50:59 -07:00
Netanel Haber	4cd08dc592	model: Support nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 (#9301 )	2025-08-26 15:33:40 +08:00
ZhengdQin	f92b729d52	[new feat] ascend backend support fia fusion kernel (#8328 ) Co-authored-by: Even Zhou <even.y.zhou@outlook.com>	2025-08-25 23:13:08 -07:00
Jonas	a0a77d937b	Fix Harmony reasoning parser for and auto-separation for gpt-oss models (#9190 ) Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: minleminzui <2969413251@qq.com> Co-authored-by: maocheng23 <maocheng@berkeley.edu> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-25 15:26:26 -07:00
Yineng Zhang	ebd9dbe71b	fix: revert #8593 (#9581 )	2025-08-25 01:29:06 -07:00
Pavani Majety	3cc3d9b950	Add Support for Page Size greater than 1 for Flashinfer MLA Backend (#8593 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-08-21 18:15:06 -07:00
DiweiSun	029e0af31d	ci: enhance xeon ci (#9395 )	2025-08-21 03:35:17 -07:00
VDV1985	2c4b4b786b	[feature] Ascend NPU graph support (#9399 ) Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com> Co-authored-by: anon189Ty <Stari_Falcon@outlook.com> Co-authored-by: Maksim <makcum888e@mail.ru> Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>	2025-08-20 21:13:27 -07:00
Mick	ef3004d90a	misc: parse bench_serving result as markdown table (#9377 )	2025-08-20 16:44:20 -07:00
Lifu Huang	b0980af89f	Support pinning adapter via server args. (#9249 )	2025-08-20 16:25:01 -07:00
Even Zhou	de2dd73831	Revert "[feature] Rework Ascend NPU graph support" (#9385 )	2025-08-20 00:35:10 -07:00
Shangming Cai	d8ed60f254	[CI] Fix disaggregation failure tolerance CI (#9378 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-08-19 23:31:08 -07:00
Even Zhou	3680d6f88b	[feature] Rework Ascend NPU graph support (#9350 ) Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com> Co-authored-by: anon189Ty <Stari_Falcon@outlook.com> Co-authored-by: Maksim <makcum888e@mail.ru> Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>	2025-08-19 20:32:27 -07:00
Even Zhou	f4fafacc5d	Revert "[feature] Ascend NPU graph support (#8027 )" (#9348 )	2025-08-19 10:11:23 -07:00
Yuan Luo	968e181826	Fix triton_fused_moe unit test and benchmark (#9276 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-08-18 00:54:33 -07:00
Netanel Haber	845d12a979	model: support nvidia/Llama-3_3-Nemotron-Super-49B-v1 (#9067 ) Co-authored-by: Kyle Huang <kylhuang@nvidia.com>	2025-08-17 01:48:15 -07:00
Stefan He	e47800e176	Quick Fix GLM (#9264 )	2025-08-16 23:43:41 -07:00
Mick	1df84ff414	ci: simplify multi-modality tests by using mixins (#9006 )	2025-08-16 22:25:02 -07:00
Binyao Jiang	66d6be0874	Bug fix: use correct mm_items in embed_mm_inputs (#8893 )	2025-08-16 19:55:56 -07:00
Shangming Cai	384f8ab5ce	[PD] Support PD disaggregation with Prefill PP (#8846 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Signed-off-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: root <huzhiyuan@xiaohongshu.com> Co-authored-by: Ying Sheng <sqy1415@gmail.com> Co-authored-by: Francis <38564764+ssssnow@users.noreply.github.com> Co-authored-by: zitto <zhjc1124@gmail.com>	2025-08-16 18:31:31 -07:00
VDV1985	94371dbbd6	[feature] Ascend NPU graph support (#8027 ) Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com> Co-authored-by: anon189Ty <Stari_Falcon@outlook.com> Co-authored-by: Maksim <makcum888e@mail.ru> Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>	2025-08-16 17:25:17 -07:00
Hank Han	81da16f6d3	[CI] add deepseek w4a8 test on h20 ci (#7758 )	2025-08-16 01:54:13 -07:00
Hubert Lu	9c3e95d98b	[AMD] Expand test coverage for AMD CI and enable apply_token_bitmask_inplace_cuda in sgl-kernel (#8268 )	2025-08-15 12:32:51 -07:00
Cheng Wan	295895120d	[6/N] MoE Refactor: Cleanup MoE-related configs (#8849 )	2025-08-14 21:14:53 -07:00
Chengxing Xie	c1c7dc4534	feat: Add model version tracking with API endpoints and response metadata (#8795 )	2025-08-14 12:13:46 -07:00
Hongbo Xu	2cc9eeab01	[4/n]decouple quantization implementation from vLLM dependency (#9191 ) Co-authored-by: AniZpZ <aniz1905@gmail.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-08-14 12:05:46 -07:00
Sundara Raman Ramachandran	a027a9b4b3	[Generative Score API] Optimization to Remove Decode. (#8840 )	2025-08-14 05:12:24 +08:00
Kevin Xiang Li	3b3b3baf9f	Double vision prefill throughput by defaulting to optimal vision attention backend (#8484 ) Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com>	2025-08-13 02:08:30 -07:00
Stefan He	930fe467bd	Support Triton FP8 Gemm can handle hidden_dim not divisible by 16 (#9093 ) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-08-12 21:21:55 -07:00
jacky.cheng	25caa7a8a9	[AMD] Support Wave attention backend with AMD GPU optimizations (#8660 ) Signed-off-by: Stanley Winata <stanley.winata@amd.com> Signed-off-by: Harsh Menon <harsh@nod-labs.com> Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com> Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com> Signed-off-by: xintin <gaurav.verma@amd.com> Co-authored-by: Harsh Menon <harsh@nod-labs.com> Co-authored-by: Stanley Winata <stanley.winata@amd.com> Co-authored-by: Stanley Winata <68087699+raikonenfnu@users.noreply.github.com> Co-authored-by: Stanley Winata <stanley@nod-labs.com> Co-authored-by: Ivan Butygin <ivan.butygin@gmail.com> Co-authored-by: nithinsubbiah <nithinsubbiah@gmail.com> Co-authored-by: Nithin Meganathan <18070964+nithinsubbiah@users.noreply.github.com> Co-authored-by: Ivan Butygin <ibutygin@amd.com>	2025-08-12 13:49:11 -07:00
ichernob	83123f481e	[Quantization] Supported w8a8 int8 quantized Gemma3 and Qwen-VL models (#8619 ) Co-authored-by: ronnie_zheng <zl19940307@163.com>	2025-08-12 13:31:18 -07:00
Lifu Huang	29a610b4d9	Fix broken CI TestRequestLengthValidation (#9095 )	2025-08-11 22:59:56 -07:00
Baizhou Zhang	75e6a7cde1	Support radix cache for Lora feature (#7216 )	2025-08-11 10:14:11 -07:00
Cheng Wan	f003cd3548	[CI] Fix CI tests (#9050 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-10 23:52:05 -07:00
Lianmin Zheng	2449a0afe2	Refactor the docs (#9031 )	2025-08-10 19:49:45 -07:00
Stefan He	8ecf6b9d24	Support Flatten Tensor Update Weights to speed up MOE Update Weights by 20% (#8079 )	2025-08-10 16:08:59 -07:00
Lifu Huang	e322a94d1f	Reduce CI duration of test_lora_update. (#9024 )	2025-08-10 15:34:04 -07:00
Lianmin Zheng	2c7f01bc89	Reorganize CI and test files (#9027 )	2025-08-10 12:30:06 -07:00
Stefan He	6345069f6c	[RL] Add test for /abort_request (#7626 )	2025-08-10 09:14:19 -07:00
Lianmin Zheng	ef48d5547e	Fix CI (#9013 )	2025-08-09 16:00:10 -07:00
Lianmin Zheng	9a44b643c6	Fix CI (#9012 )	2025-08-09 13:33:42 -07:00
Binyao Jiang	f29aba8c6e	Support glm4.1v and glm4.5v (#8798 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com> Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com> Co-authored-by: Chang Su <csu272@usc.edu>	2025-08-09 00:59:13 -07:00
Binyao Jiang	7b81f956eb	Fix qwen2 audio not working bug (#8600 )	2025-08-09 00:42:29 -07:00
fzyzcjy	442534aa44	Add CI for gpt-oss model on hopper (#8851 )	2025-08-09 00:34:23 -07:00

911 Commits