sglang

Author	SHA1	Message	Date
xiaobochen	3d32e4a32c	Resubmit MoE-EP (#2371)	2024-12-06 15:05:21 +08:00
Lianmin Zheng	07ec07ad1f	Improve torch compile for fused moe (#2327)	2024-12-03 01:58:25 -08:00
Ying Sheng	aa47f64223	Revert "[feat] Enable chunked prefill for llava-onevision" (#2329)	2024-12-02 23:11:13 -08:00
Ying Sheng	480e38a733	[feat] Enable chunked prefill for llava-onevision (#2281)	2024-12-02 20:19:02 -08:00
Lianmin Zheng	18108abe5d	[Minor] Fix code style (#2311)	2024-12-02 02:27:36 -08:00
Chayenne	983bfcf386	Online weight updates from torch.distributed (#2279)	2024-12-01 23:23:18 -08:00
Qun Yang	62c516ac45	Add a simple torch native attention backend (#2241)	2024-12-01 03:01:25 -08:00
Lianmin Zheng	9449a95431	[CI] Balance CI tests (#2293)	2024-12-01 01:47:30 -08:00
Lianmin Zheng	0303ca918f	[CI] Fix missing files in run_suite.py (#2288)	2024-11-30 23:53:34 -08:00
Lianmin Zheng	4936be8acc	Revert "Revert "[FEAT] Support GGUF format"" (#2287)	2024-11-30 22:14:48 -08:00
Lianmin Zheng	1bfa511b95	[CI] Fix ci tests (#2284)	2024-11-30 21:12:03 -08:00
Lianmin Zheng	7e4c6dd8da	Revert "[FEAT] Support GGUF format" (#2285)	2024-11-30 19:03:26 -08:00
Yang Zheng	883c955489	[FEAT] Support GGUF format (#2215) Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>	2024-11-30 00:44:48 -08:00
Lianmin Zheng	ccaf1f997c	[CI] Print summary on github actions (#2274)	2024-11-29 23:48:54 -08:00
Chayenne	7d1485d376	Add get weights by parameter name for llama (#2266)	2024-11-29 23:36:38 -08:00
Chayenne	7d5d1d3d29	udate weights from disk (#2265)	2024-11-30 01:17:00 +00:00
Lianmin Zheng	fe97a2d40f	Simplify tokenizer manager (#2254)	2024-11-29 02:18:51 -08:00
Ying Sheng	8b48496aaf	Revert "Revert "Add simple CPU offloading support"" (#2253) Co-authored-by: Jani Monoses <jani.monoses@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-28 23:58:54 -08:00
Ying Sheng	4057ea82c9	Revert "Add simple CPU offloading support" (#2252) We'll re-add the commit to correctly ack Kaichao's authorship	2024-11-28 23:36:55 -08:00
Ying Sheng	b7038fec9b	[fix] Fix prefix caching for multi-image/video (#2239)	2024-11-28 12:08:13 -08:00
Lianmin Zheng	b2ccf36d4d	Fix memory leak during abort (#2238)	2024-11-28 02:22:15 -08:00
Lianmin Zheng	d4fc1a70e3	Crash the server correctly during error (#2231)	2024-11-28 00:22:39 -08:00
Jani Monoses	db674e3d24	Add OLMo2 model. (#2233)	2024-11-28 00:15:20 -08:00
Lianmin Zheng	fed4c6946a	Release v0.3.6.post2 (#2214) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-11-27 03:35:30 -08:00
Ying Sheng	37c8a5761f	[feat] Support session control for vision language models (#2210)	2024-11-27 00:03:29 -08:00
Lianmin Zheng	c754652fcd	Fix flasky tests (#2212)	2024-11-26 23:06:20 -08:00
Yudi Xue	19f33b3237	add sglang version to get_server_info (#2206)	2024-11-26 12:10:23 -08:00
Lianmin Zheng	ea34350d88	Rename double sparsity config file (#2188)	2024-11-25 17:12:08 -08:00
Lianmin Zheng	1605ae121e	[CI] Minor fix for CI (#2187)	2024-11-25 16:38:43 -08:00
Rin Intachuen	1aea19f64b	Input_embeds support (#2052)	2024-11-25 16:35:04 -08:00
Yixin Dong	7f076c2ce6	Update XGrammar to the latest API (#2176) Co-authored-by: Ben Gitter <gitterbd@gmail.com>	2024-11-25 15:58:30 -08:00
Lianmin Zheng	3c5538f781	Update CI threshold (#2186)	2024-11-25 15:24:17 -08:00
Ying Sheng	e1e595d702	[feat] Refactor session control interface and add CI (#2173)	2024-11-25 12:32:51 -08:00
Lianmin Zheng	254fd130e2	[CI] Split test cases in CI for better load balancing (#2180)	2024-11-25 04:58:16 -08:00
Lianmin Zheng	5652c56535	Update CI threshold & Improve code style (#2159)	2024-11-24 06:29:38 -08:00
Henry Hyeonmok Ko	dbe1729395	Merged three native APIs into one: get_server_info (#2152)	2024-11-24 01:37:58 -08:00
Lianmin Zheng	a78d8f8db3	[CI] Fix test cases (#2137)	2024-11-23 01:00:07 -08:00
Jani Monoses	d98fa1e93d	Add simple CPU offloading support. (#2081)	2024-11-23 06:23:53 +00:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Yineng Zhang	4f8c3aeafc	minor: update gsm8k threshold (#2125)	2024-11-22 19:23:58 +08:00
Lianmin Zheng	dfec7fca06	Rename sglang.bench_latency to sglang.bench_one_batch (#2118)	2024-11-21 20:07:48 -08:00
Jake Poznanski	8048c28c11	Fix #2037 - Context length check does not take into out pad tokens for visual models (#2106)	2024-11-21 19:05:41 -08:00
James Xu	f6f713797b	Add support for Qwen2-VL-based embedding models (#2055)	2024-11-21 14:24:25 -08:00
Lianmin Zheng	56a347f7d3	Move test_session_id.py to playground (#2104)	2024-11-20 01:28:27 -08:00
Ying Sheng	5942dfc00a	[feat] Add session control (#2073)	2024-11-20 00:36:53 -08:00
Lianmin Zheng	7d671e4ad2	Enable overlap by default (#2067)	2024-11-19 22:07:58 -08:00
Yineng Zhang	f239268fad	minor: update gsm8k eval (#2091)	2024-11-19 20:36:55 +08:00
Lianmin Zheng	b7a065eae3	Use cuda event wait and synchronization instead of busy waiting (#2089)	2024-11-19 00:21:46 -08:00
Lianmin Zheng	b110453802	Simplify logits penalizer (#2086)	2024-11-18 17:48:28 -08:00
Lianmin Zheng	80e2c4a8de	Fix chunked prefill with output logprob (#2083)	2024-11-18 13:16:28 -08:00

302 Commits