Commit Graph

226 Commits

SHA1        Message  (Author, Date)

3fa3c6cd6a  Enables force reasoning based on chat template for Qwen3-Thinking (#8369)  (Xinyuan Tong, 2025-08-06 20:02:47 -07:00)
    Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
    Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
    Co-authored-by: Chang Su <csu272@usc.edu>
6210e2c4f0  Support GPU pinning for LoRA (#8697)  (Lifu Huang, 2025-08-06 19:39:45 -07:00)
ca47e24f5d  [Feature] improve TBO: two chunk overlap (#8144)  (HouseWest, 2025-08-05 21:11:01 -07:00)
f7b2853ff8  [feat] support minimum token load balance in dp attention (#7379)  (Guanhua Wang, 2025-08-03 00:46:47 -07:00)
8675bdf246  Support limiting max loaded loras in CPU. (#8650)  (Lifu Huang, 2025-08-03 00:02:23 -07:00)
82e6c3a65a  Add support for NCCL symmetric memory for TP allreduces (#8238)  (Nicolas Castet, 2025-08-01 23:30:55 +00:00)
b17c5b0118  fix arg typo for --disaggregation-transfer-backend (#8664)  (Zac, 2025-08-01 10:00:47 -07:00)
6c88f6c8d9  [5/N] MoE Refactor: Update MoE parallelism arguments (#8658)  (Cheng Wan, 2025-08-01 01:20:03 -07:00)
4b04998d38  TRTLLM Gen MLA Decode Kernel Integration (same as #7938) (#8632)  (Faraz, 2025-07-31 16:03:40 -07:00)
    Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
51c38163c1  model: support Step3V (#8583)  (Chang Su, 2025-07-31 02:41:00 -07:00)
    Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
    Co-authored-by: nnnobody-code <nnnobody@foxmail.com>
    Co-authored-by: ispobock <ispobaoke@gmail.com>
    Co-authored-by: Qiaolin-Yu <qy254@cornell.edu>
    Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
    Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
    Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
134fa43e19  [NVIDIA] Change to use num_local_experts (#8453)  (Kaixi Hou, 2025-07-28 10:38:19 -07:00)
2810338401  [feat] Support different attention backends for prefill and decode (#6338)  (Qiaolin Yu, 2025-07-28 11:42:29 +08:00)
    Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>
    Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
44d600cd67  Support precomputed_embeddings for Llama 4 (#8156)  (Kevin Xiang Li, 2025-07-27 01:14:49 -07:00)
    Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
    Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com>
    Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
    Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
d8ee15643b  [Feat] Add reasoning parser for Qwen/Qwen3-235B-A22B-Thinking-2507 (#8363)  (Chang Su, 2025-07-25 14:59:42 -07:00)
8430bfe3e9  [Refactor] simplify multimodal data processing (#8107)  (Xinyuan Tong, 2025-07-20 21:43:09 -07:00)
    Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
4e3defe5a7  Support start up LoRA server without initial adapters (#8019)  (Lifu Huang, 2025-07-19 15:38:09 -07:00)
bb0e8a32b5  Clean up server args (#8161)  (Lianmin Zheng, 2025-07-19 11:32:52 -07:00)
9c7a46180c  [Doc] Steps to add a new attention backend (#8155)  (Lianmin Zheng, 2025-07-18 16:38:26 -07:00)
e2ed9d049a  Refactor dynamic LoRA update to fix incorrect handling of variant weight shapes (#7844)  (Lifu Huang, 2025-07-13 18:36:01 -07:00)
86044712c6  [feature] kv transfer support of ascend npu (#7795)  (ronnie_zheng, 2025-07-11 00:07:51 -07:00)
    Co-authored-by: liupeng <liupeng374@huawei.com>
615553079d  Support Kimi K2 (#7940)  (Atream, 2025-07-11 00:02:21 -07:00)
0870232195  Update native_api doc to match the change in the get_model_info endpoint (#7660)  (Yikai Zhang, 2025-07-08 21:05:58 -07:00)
    Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
    Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
64c5907e12  [PD] Add guidance for prefill bootstrap timeout (#7846)  (Shangming Cai, 2025-07-08 21:00:34 -07:00)
    Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
43f93f632c  fix CI: update native api ipynb (#7754)  (Xinyuan Tong, 2025-07-03 15:25:00 -07:00)
    Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
1e0e549766  Ascend attention backend(PA&MLA) (#7722)  (ronnie_zheng, 2025-07-03 09:23:19 -07:00)
    Co-authored-by: Maksim <makcum888e@mail.ru>
    Co-authored-by: VDV1985 <vladdv85@mail.ru>
22352d47a9  Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632)  (Lianmin Zheng, 2025-06-29 23:16:19 -07:00)
    Co-authored-by: Kan Wu <wukanustc@gmail.com>
5c2142579a  [PD] Raise error for incompatible mooncake version and some minor fixes (#7527)  (Shangming Cai, 2025-06-25 18:55:24 -07:00)
    Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
30ceccc74a  Update hyperparameter_tuning.md (#7454)  (Lianmin Zheng, 2025-06-22 22:42:55 -07:00)
72676cd6c0  feat(oai refactor): Replace openai_api with entrypoints/openai (#7351)  (Chang Su, 2025-06-21 13:21:06 -07:00)
    Co-authored-by: Jin Pan <jpan236@wisc.edu>
ab74f8f09d  Remove batches api in docs & example (#7400)  (Jinn, 2025-06-20 19:46:31 -07:00)
97011abc8a  [Doc] add embedding rerank doc (#7364)  (woodx, 2025-06-19 21:53:54 -07:00)
a39d928782  support qwen2 running on ascend npu device (#7022)  (Yijie Zhu, 2025-06-17 11:24:10 -07:00)
    Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com>
21615cc3fe  Minor style and doc fix (#7228)  (Lianmin Zheng, 2025-06-16 01:03:13 -07:00)
bd7cfbd2f8  [Fix] Reduce busy polling when scheduler is idle (#6026)  (Povilas Kanapickas, 2025-06-12 14:58:22 -07:00)
dbdf76ca98  Clean up docs for server args and sampling parameters (generated by grok) (#7076)  (Lianmin Zheng, 2025-06-10 19:55:42 -07:00)
f2a75a66c4  update doc (#7046)  (Ximingwang-09, 2025-06-11 10:02:01 +08:00)
    Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
90bd3e32d6  Improve perf tuning docs (#7071)  (Lianmin Zheng, 2025-06-10 16:55:04 -07:00)
b56de8f943  Open AI API hidden states (#6716)  (kyle-pena-kuzco, 2025-06-10 14:37:29 -07:00)
dd1012fcbe  [PD] Fix potential perf spike caused by tracker gc and optimize doc (#6764)  (shangmingc, 2025-06-05 10:56:02 -07:00)
    Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
8e3797be1c  support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277)  (zyksir, 2025-06-04 22:11:24 -07:00)
cf9815ba69  [Refactor] Multimodal data processing for VLM (#6659)  (Xinyuan Tong, 2025-06-04 11:22:33 -07:00)
    Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
37f1547587  [FEAT] Add transformers backend support (#5929)  (Marc Sun, 2025-06-03 21:05:29 -07:00)
2d72fc47cf  Improve profiler and integrate profiler in bench_one_batch_server (#6787)  (Lianmin Zheng, 2025-05-31 15:53:55 -07:00)
6cb00c6398  [PD] Optimize time out logic and add env var doc for mooncake (#6761)  (shangmingc, 2025-05-30 00:45:02 -07:00)
    Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
e806f708c9  [PD] Make bootstrap code common between NIXL and Mooncake (#6473)  (Trevor Morris, 2025-05-27 12:47:38 -07:00)
45a31a82e4  docs: Update documentation to reflect xgrammar as default grammar backend (#6601)  (Vincent Zhong, 2025-05-27 13:29:13 +08:00)
    Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
7a0bbe6a64  update toc for doc and dockerfile code style format (#6450)  (linzhuo, 2025-05-27 13:05:11 +08:00)
    Co-authored-by: Chayenne <zhaochen20@outlook.com>
e235be16fe  Fix some issues with current docs. (#6588)  (simveit, 2025-05-26 01:04:34 +08:00)
ed0c3035cd  feat(Tool Calling): Support required and specific function mode (#6550)  (Chang Su, 2025-05-23 21:00:37 -07:00)
a6ae3af15e  Support XiaomiMiMo inference with mtp (#6059)  (ryang, 2025-05-22 14:14:49 -07:00)
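A listing in this shape (abbreviated SHA, subject, then author and date) can be reproduced locally from a clone of the repository with git's standard pretty-format placeholders `%h` (short SHA), `%s` (subject), `%an` (author name), and `%ad` (author date); this is a generic git sketch, not part of the page above:

```shell
# Inside a clone of the repository: print each commit as
#   <short SHA>  <subject>  (<author>, <ISO date>)
git log --date=iso-strict --pretty=format:'%h  %s  (%an, %ad)'
```

Appending `-n 20` limits the output to the 20 most recent commits, and `--first-parent` keeps the listing to the main branch's own history.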