sglang

Author	SHA1	Message	Date
Chayenne	ebddb65aed	Docs: add torch compile cache (#4151) Co-authored-by: ybyang <ybyang7@iflytek.com>	2025-03-06 14:27:09 -08:00
Adarsh Shirawalmath	19fd57bcd7	[docs] fix HF reference script command (#4148)	2025-03-06 13:21:54 -08:00
Lianmin Zheng	9c58e68b4c	Release v0.4.3.post4 (#4140)	2025-03-06 12:50:28 -08:00
Oliver Stanley	d03b3467b8	Fix constrained generation errors by adding datasets dependency (#4142)	2025-03-06 12:07:51 -08:00
yinfan98	ab7fba0ece	Fix nightly ci Gsm8k & Fix flashinfer backend kvcache quant (#4147)	2025-03-06 11:50:07 -08:00
Lianmin Zheng	bc1534ff32	Fix a draft model accuracy bug in eagle; support step=1; return logprob in eagle (#4134) Co-authored-by: Sehoon Kim <kssteven418@gmail.com> Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-06 06:13:59 -08:00
Lzhang-hub	3a3918121f	fix bench serving bug (#4135)	2025-03-06 05:34:02 -08:00
Lianmin Zheng	800bf018fb	Update CODEOWNER (#4138)	2025-03-06 03:42:10 -08:00
kk	b16af90bc3	AMD/ROCm: update base image string (#4137) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: yichiche <yichiche@amd.com>	2025-03-06 03:38:54 -08:00
Lianmin Zheng	98c73d71cb	[Minor] make the `__init__` function of model_runner.py shorter (#4132)	2025-03-06 01:51:12 -08:00
Lianmin Zheng	fcc2e37f69	Split the __init__ of scheduler as smaller functions. Improve the eagle tests (#4128)	2025-03-06 00:13:20 -08:00
Liu Jinjie	0804dd11a0	remove unused max_jobs in setup_rocm.py (#4126) Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>	2025-03-06 00:12:19 -08:00
saienduri	55dc8e4d52	Add tag suffix to nightly docker builds. (#4129)	2025-03-05 23:22:36 -08:00
Ying Sheng	02e9e9f1cf	Add codeowners for eagle implementations (#4131) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <kssteven418@gmail.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-03-05 23:16:49 -08:00
simveit	8f0b63139e	Docs: improve EAGLE docs (#4038)	2025-03-05 22:40:21 -08:00
samzong	b9b3b098b9	feat: support docs auto live-reload with sphinx-autobuild (#4111) Signed-off-by: samzong <samzong.lu@gmail.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-05 22:39:34 -08:00
Zhiqiang Xie	aee30630d8	Add a pointer to the real KV cache pool (#4113)	2025-03-05 21:39:07 -08:00
Lianmin Zheng	286e6540a6	Remove prefill-only-one-req (#4117)	2025-03-05 20:58:48 -08:00
Wenxuan Tan	718c391fd7	[Hoxfix] Fix incomplete token_to_kv_pool refactor (#4121)	2025-03-05 19:32:42 -08:00
Yineng Zhang	fc671f66c1	chore: bump v0.4.3.post3 (#4114)	2025-03-05 17:26:10 -08:00
samzong	197751e9a1	fix Non-consecutive header level increase in docs/router/router.md (#4099) Signed-off-by: samzong <samzong.lu@gmail.com>	2025-03-05 17:02:32 -08:00
samzong	d2d0d061d9	fix cross-reference error and spelling mistakes (#4101) Signed-off-by: samzong <samzong.lu@gmail.com>	2025-03-05 16:39:02 -08:00
Yueyang Pan	25482edb5c	Online serving benchmarks of real datasets for hierarchical KV caching (#3211) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-03-05 16:16:43 -08:00
luzengxiangcn	62b362b1f1	Debug radixcache: refactor recursive helper methods (#3029) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-03-05 16:11:42 -08:00
saienduri	44d7646371	remove testing on PR workflow change (#4110)	2025-03-05 16:03:18 -08:00
saienduri	cd85b78f94	Create release-docker-amd-nightly.yml (#4105)	2025-03-05 14:46:26 -08:00
Yineng Zhang	0aaccbbfec	revert deepseek docs (#4109)	2025-03-05 13:23:11 -08:00
Qiaolin Yu	357671e216	Add examples for server token-in-token-out (#4103) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-05 13:16:31 -08:00
Chayenne	e70fa279bc	Docs: reorganize dpsk docs (#4108)	2025-03-05 13:01:03 -08:00
Tommy Yang	abe74b7b59	Docs: Add DeepSeek optimization ablations documentation (#4107) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-05 12:25:51 -08:00
Jhin	70b3c6eeb1	Add update_weights_from_disk endpoint to Engine (#4102) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-05 12:25:18 -08:00
Ke Bao	ef9d3b3c2c	Fix triton kernel illegal memory issue for eagle (#4100)	2025-03-05 11:23:53 -08:00
Baizhou Zhang	fc91d08a8f	[Revision] Add fast decode plan for flashinfer mla (#4012)	2025-03-05 11:20:41 -08:00
HAI	71ab0dabe0	Fix the moe padding conditional logic (#4081)	2025-03-05 10:56:51 -08:00
Ying Sheng	d3d4d76758	[Eagle] Refactor eagle speculative decoding (#3986) Co-authored-by: Ke Bao <ISPObaoke@163.com>	2025-03-05 08:06:07 -08:00
yigex	5be8f1ed98	ROCM: AITER BLOCK GEMM (#4075)	2025-03-05 03:10:49 -08:00
Lu Changqi	e5760bc40a	bench: add dataset param for bench_multiturn (#3990)	2025-03-05 01:21:37 -08:00
Qubitium-ModelCloud	56a724eba3	[QUANT] Add GPTQModel Dynamic Quantization + `lm_head` Quantization (#3790) Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>	2025-03-05 01:11:00 -08:00
Mick	583d6af71b	example: add vlm to token in & out example (#3941) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-04 22:18:26 -08:00
Lianmin Zheng	e074d84e5b	[Minor] more code cleanup (#4077)	2025-03-04 21:23:47 -08:00
Qiaolin Yu	4725e3f652	Add examples for returning hidden states when using the server (#4074)	2025-03-04 19:31:50 -08:00
Lianmin Zheng	77a3954bf7	Simplify eagle tests and TP sync in grammar backend (#4066)	2025-03-04 13:40:40 -08:00
Ke Bao	03b0364f76	Update nextn ci test (#4071)	2025-03-04 13:01:24 -08:00
Lianmin Zheng	2dd7d0c533	Revert "Fix nightly-test CI" (#4065)	2025-03-04 05:38:24 -08:00
William	0d4e3228cf	[Feature] Add test for speculative_token_map (#4016)	2025-03-04 04:26:24 -08:00
Liu Jinjie	926f8efc0c	remove unused max_jobs (#3607) Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>	2025-03-04 04:23:39 -08:00
Xiuyu Li	9545bfb28a	fix: support gelu_new activation function in gpt2 (#3712)	2025-03-04 04:09:52 -08:00
Michael Feil	37373ef2bb	sgl-router - issues on routing and project build. (#3870) (#3948)	2025-03-04 04:06:30 -08:00
Chen Shengzhi	61261b3996	[XCCL] Use xccl for xpu backend since xccl is ready in latest PyTorch. (#3954)	2025-03-04 04:05:56 -08:00
DarkSharpness	19120f71f3	[Fix & Style] Refactor the grammar backend to reduce human errors and improve readability (#4030)	2025-03-04 03:56:45 -08:00

2270 Commits