sglang

Author	SHA1	Message	Date
Shangming Cai	5c2142579a	[PD] Raise error for incompatible mooncake version and some minor fixes (#7527) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-06-25 18:55:24 -07:00
Lianmin Zheng	30ceccc74a	Update hyperparameter_tuning.md (#7454)	2025-06-22 22:42:55 -07:00
Chang Su	72676cd6c0	feat(oai refactor): Replace `openai_api` with `entrypoints/openai` (#7351) Co-authored-by: Jin Pan <jpan236@wisc.edu>	2025-06-21 13:21:06 -07:00
Jinn	ab74f8f09d	Remove batches api in docs & example (#7400)	2025-06-20 19:46:31 -07:00
woodx	97011abc8a	[Doc] add embedding rerank doc (#7364)	2025-06-19 21:53:54 -07:00
Yijie Zhu	a39d928782	support qwen2 running on ascend npu device (#7022) Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com>	2025-06-17 11:24:10 -07:00
Lianmin Zheng	21615cc3fe	Minor style and doc fix (#7228)	2025-06-16 01:03:13 -07:00
Povilas Kanapickas	bd7cfbd2f8	[Fix] Reduce busy polling when scheduler is idle (#6026)	2025-06-12 14:58:22 -07:00
Lianmin Zheng	dbdf76ca98	Clean up docs for server args and sampling parameters (generated by grok) (#7076)	2025-06-10 19:55:42 -07:00
Ximingwang-09	f2a75a66c4	update doc (#7046) Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-06-11 10:02:01 +08:00
Lianmin Zheng	90bd3e32d6	Improve perf tuning docs (#7071)	2025-06-10 16:55:04 -07:00
kyle-pena-kuzco	b56de8f943	Open AI API hidden states (#6716)	2025-06-10 14:37:29 -07:00
shangmingc	dd1012fcbe	[PD] Fix potential perf spike caused by tracker gc and optimize doc (#6764) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-06-05 10:56:02 -07:00
zyksir	8e3797be1c	support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277)	2025-06-04 22:11:24 -07:00
Xinyuan Tong	cf9815ba69	[Refactor] Multimodal data processing for VLM (#6659) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-06-04 11:22:33 -07:00
Marc Sun	37f1547587	[FEAT] Add transformers backend support (#5929)	2025-06-03 21:05:29 -07:00
Lianmin Zheng	2d72fc47cf	Improve profiler and integrate profiler in bench_one_batch_server (#6787)	2025-05-31 15:53:55 -07:00
shangmingc	6cb00c6398	[PD] Optimize time out logic and add env var doc for mooncake (#6761) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-30 00:45:02 -07:00
Trevor Morris	e806f708c9	[PD] Make bootstrap code common between NIXL and Mooncake (#6473)	2025-05-27 12:47:38 -07:00
Vincent Zhong	45a31a82e4	docs: Update documentation to reflect xgrammar as default grammar backend (#6601) Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>	2025-05-27 13:29:13 +08:00
linzhuo	7a0bbe6a64	update toc for doc and dockerfile code style format (#6450) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-05-27 13:05:11 +08:00
simveit	e235be16fe	Fix some issues with current docs. (#6588)	2025-05-26 01:04:34 +08:00
Chang Su	ed0c3035cd	feat(Tool Calling): Support `required` and specific function mode (#6550)	2025-05-23 21:00:37 -07:00
ryang	a6ae3af15e	Support XiaomiMiMo inference with mtp (#6059)	2025-05-22 14:14:49 -07:00
Byron Hsu	7513558074	[PD] Add doc and simplify sender.send (#6019)	2025-05-21 21:22:21 -07:00
fzyzcjy	f0653886a5	Expert distribution recording without overhead for EPLB (#4957)	2025-05-19 20:07:43 -07:00
Yury Sulsky	f19a9204cd	Support precomputed multimodal features for Qwen-VL and Gemma3 models. (#6136) Co-authored-by: Yury Sulsky <ysulsky@tesla.com>	2025-05-16 12:26:15 -07:00
quinnrong94	2e4babdb0a	[Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109) Co-authored-by: Yingyi <yingyihuang2000@outlook.com> Co-authored-by: neiltian <neiltian@tencent.com> Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com> Co-authored-by: kexueyu <kexueyu@tencent.com> Co-authored-by: vincentmeng <vincentmeng@tencent.com> Co-authored-by: pengmeng <pengmeng@tencent.com>	2025-05-15 00:48:09 -07:00
Brayden Zhong	9a91fa0ed1	docs: fix a bad redirect (#6300)	2025-05-14 10:27:19 -07:00
Lianmin Zheng	e8e18dcdcc	Revert "fix some typos" (#6244)	2025-05-12 12:53:26 -07:00
applesaucethebun	d738ab52f8	fix some typos (#6209) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-13 01:42:38 +08:00
Cheng Wan	25c83fff6a	Performing Vocabulary Parallelism for LM Head across Attention TP Groups (#5558) Co-authored-by: liusy58 <liusy58@linux.alibaba.com>	2025-05-11 23:36:29 -07:00
Lianmin Zheng	01bdbf7f80	Improve structured outputs: fix race condition, server crash, metrics and style (#6188)	2025-05-11 08:36:16 -07:00
Adarsh Shirawalmath	94d42b6794	[Docs] minor Qwen3 and reasoning parser docs fix (#6032)	2025-05-11 08:22:46 -07:00
mlmz	69276f619a	doc: fix the erroneous documents and example codes about Alibaba-NLP/gme-Qwen2-VL-2B-Instruct (#6199)	2025-05-11 08:22:11 -07:00
applesaucethebun	2ce8793519	Add typo checker in pre-commit (#6179) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 12:55:00 +08:00
Yineng Zhang	66fc63d6b1	Revert "feat: add thinking_budget (#6089)" (#6181)	2025-05-10 16:07:45 -07:00
Ximingwang-09	921e4a8185	[Docs]Delete duplicate content (#6146) Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-05-10 15:02:15 -07:00
XinyuanTong	9d8ec2e67e	Fix and Clean up chat-template requirement for VLM (#6114) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-05-11 00:14:09 +08:00
thyecust	63484f9fd6	feat: add thinking_budget (#6089)	2025-05-09 08:22:09 -07:00
Zhu Chen	fa7d7fd9e5	[Feature] Add FlashAttention3 as a backend for VisionAttention (#5764) Co-authored-by: othame <chenzhu_912@zju.edu.cn> Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Yi Zhang <1109276519@qq.com>	2025-05-08 10:01:19 -07:00
Baizhou Zhang	8f508cc77f	Update doc for MLA attention backends (#6034)	2025-05-07 18:51:05 -07:00
Baizhou Zhang	fee37d9e8d	[Doc]Fix description for dp_size argument (#6063)	2025-05-08 00:04:22 +08:00
mlmz	a68ed76682	feat: append more comprehensive fields in messages instead of merely role and content (#5996)	2025-05-05 11:43:34 -07:00
Wenxuan Tan	22da3d978f	Fix "Avoid computing lse in Ragged Prefill when there's no prefix match" (#5555)	2025-05-05 10:32:17 -07:00
vzed	95c231e50d	Tool Call: Add `chat_template_kwargs` documentation (#5679)	2025-05-04 13:12:40 -07:00
Chayenne	73dcf2b326	Remove token in token out in Native API (#5967)	2025-05-01 21:59:43 -07:00
Chang Su	2b06484bd1	feat: support pythonic tool call and index in tool call streaming (#5725)	2025-04-29 17:30:44 -07:00
simveit	ae523675e5	[Doc] Tables instead of bulletpoints for sampling doc (#5841)	2025-04-29 13:49:39 -07:00
Qiaolin Yu	8c0cfca87d	Feat: support cuda graph for LoRA (#4115) Co-authored-by: Beichen Ma <mabeichen12@gmail.com>	2025-04-28 23:30:44 -07:00

200 Commits