sglang

Author	SHA1	Message	Date
tarinkk	eb6c2c1663	Hybrid kv cache for LLaMA4 (#6563) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> Co-authored-by: tarinkk <rt572@physics.rutger.edu> Co-authored-by: tarinkk <rt572@rutgers.physics.edu> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-06-27 18:58:55 -07:00
Yineng Zhang	69183f8808	chore: bump v0.4.8.post1 (#7559)	2025-06-26 02:21:12 -07:00
Shangming Cai	5c2142579a	[PD] Raise error for incompatible mooncake version and some minor fixes (#7527) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-06-25 18:55:24 -07:00
Yineng Zhang	7c3a12c000	chore: bump v0.4.8 (#7493)	2025-06-23 23:14:22 -07:00
Lianmin Zheng	30ceccc74a	Update hyperparameter_tuning.md (#7454)	2025-06-22 22:42:55 -07:00
Chang Su	72676cd6c0	feat(oai refactor): Replace `openai_api` with `entrypoints/openai` (#7351) Co-authored-by: Jin Pan <jpan236@wisc.edu>	2025-06-21 13:21:06 -07:00
Jinn	ab74f8f09d	Remove batches api in docs & example (#7400)	2025-06-20 19:46:31 -07:00
woodx	97011abc8a	[Doc] add embedding rerank doc (#7364)	2025-06-19 21:53:54 -07:00
Yineng Zhang	fadf18fdd5	docs: update installation (#7366)	2025-06-19 12:00:19 -07:00
linzhuo	1de4db9bef	update invalid link in doc (#7297)	2025-06-18 01:37:36 -07:00
Yijie Zhu	a39d928782	support qwen2 running on ascend npu device (#7022) Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com>	2025-06-17 11:24:10 -07:00
Yineng Zhang	f9dc9dd28b	chore: bump v0.4.7.post1 (#7248)	2025-06-16 15:20:29 -07:00
Lianmin Zheng	21615cc3fe	Minor style and doc fix (#7228)	2025-06-16 01:03:13 -07:00
Lifu Huang	98538822d5	Add Phi-4-mm to supported VLM supported model list. (#7178)	2025-06-13 23:17:40 -07:00
Povilas Kanapickas	bd7cfbd2f8	[Fix] Reduce busy polling when scheduler is idle (#6026)	2025-06-12 14:58:22 -07:00
Lianmin Zheng	dbdf76ca98	Clean up docs for server args and sampling parameters (generated by grok) (#7076)	2025-06-10 19:55:42 -07:00
Ximingwang-09	f2a75a66c4	update doc (#7046) Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-06-11 10:02:01 +08:00
Lianmin Zheng	0f218731e3	Do not run frontend_reasoning.ipynb to reduce the CI load (#7073)	2025-06-10 17:15:31 -07:00
Yudi Xue	14c18d25df	Frontend language separate reasoning support (#6031)	2025-06-10 17:11:29 -07:00
Lianmin Zheng	90bd3e32d6	Improve perf tuning docs (#7071)	2025-06-10 16:55:04 -07:00
kyle-pena-kuzco	b56de8f943	Open AI API hidden states (#6716)	2025-06-10 14:37:29 -07:00
Lianmin Zheng	bb185b0e92	Update README.md (#7040)	2025-06-10 01:59:14 -07:00
Yineng Zhang	4f723edd3b	chore: bump v0.4.7 (#7038)	2025-06-10 01:56:20 -07:00
Yueyang Pan	98c00a2df1	Fix torch profiler bugs for bench_offline_throughput.py (#6557)	2025-06-09 20:33:41 +08:00
HAI	b819381fec	AITER backend extension and workload optimizations (#6838) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>	2025-06-05 23:00:18 -07:00
shangmingc	dd1012fcbe	[PD] Fix potential perf spike caused by tracker gc and optimize doc (#6764) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-06-05 10:56:02 -07:00
zyksir	8e3797be1c	support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277)	2025-06-04 22:11:24 -07:00
Xinyuan Tong	cf9815ba69	[Refactor] Multimodal data processing for VLM (#6659) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-06-04 11:22:33 -07:00
Marc Sun	37f1547587	[FEAT] Add transformers backend support (#5929)	2025-06-03 21:05:29 -07:00
Lianmin Zheng	2d72fc47cf	Improve profiler and integrate profiler in bench_one_batch_server (#6787)	2025-05-31 15:53:55 -07:00
shangmingc	6cb00c6398	[PD] Optimize time out logic and add env var doc for mooncake (#6761) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-30 00:45:02 -07:00
Baizhou Zhang	791b3bfabb	[Feature] Support Flashinfer fp8 blockwise GEMM kernel on Blackwell (#6479)	2025-05-28 16:03:43 -07:00
Trevor Morris	e806f708c9	[PD] Make bootstrap code common between NIXL and Mooncake (#6473)	2025-05-27 12:47:38 -07:00
Vincent Zhong	45a31a82e4	docs: Update documentation to reflect xgrammar as default grammar backend (#6601) Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>	2025-05-27 13:29:13 +08:00
Brayden Zhong	1aa0fbf416	Add note to add supported model to documentation (#6640)	2025-05-27 13:18:46 +08:00
linzhuo	7a0bbe6a64	update toc for doc and dockerfile code style format (#6450) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-05-27 13:05:11 +08:00
fzyzcjy	25be63d0b2	Auto handle PD disaggregation in bench_serving (#6587) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-25 22:41:27 -07:00
simveit	e235be16fe	Fix some issues with current docs. (#6588)	2025-05-26 01:04:34 +08:00
Yineng Zhang	7e257cd666	chore: bump v0.4.6.post5 (#6566)	2025-05-24 00:48:05 -07:00
Chang Su	ed0c3035cd	feat(Tool Calling): Support `required` and specific function mode (#6550)	2025-05-23 21:00:37 -07:00
ryang	a6ae3af15e	Support XiaomiMiMo inference with mtp (#6059)	2025-05-22 14:14:49 -07:00
Byron Hsu	7513558074	[PD] Add doc and simplify sender.send (#6019)	2025-05-21 21:22:21 -07:00
Wenxuan Tan	66324895c6	[docs] Fix torch version (#6472)	2025-05-20 10:53:14 -07:00
fzyzcjy	f0653886a5	Expert distribution recording without overhead for EPLB (#4957)	2025-05-19 20:07:43 -07:00
simveit	506e5de8fe	Improve supported models doc (#6430)	2025-05-20 01:43:35 +08:00
applesaucethebun	6dc6b30637	Add missing model to doc (#6396) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-18 12:57:58 -07:00
Vincent Zhong	e9ef39d2e9	docs: Update the MD files (#6373) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-17 09:23:16 -07:00
Kiv Chen	64825b8395	model(vlm): mistral 3.1 (#5099) Co-authored-by: KivenChen <sleigh-queue-0y@icloud.com>	2025-05-16 18:36:18 -07:00
Yury Sulsky	f19a9204cd	Support precomputed multimodal features for Qwen-VL and Gemma3 models. (#6136) Co-authored-by: Yury Sulsky <ysulsky@tesla.com>	2025-05-16 12:26:15 -07:00
quinnrong94	2e4babdb0a	[Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109) Co-authored-by: Yingyi <yingyihuang2000@outlook.com> Co-authored-by: neiltian <neiltian@tencent.com> Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com> Co-authored-by: kexueyu <kexueyu@tencent.com> Co-authored-by: vincentmeng <vincentmeng@tencent.com> Co-authored-by: pengmeng <pengmeng@tencent.com>	2025-05-15 00:48:09 -07:00

541 Commits