sglang

Author	SHA1	Message	Date
Ke Bao	e835a50021	Reorg moe code (#2563)	2024-12-24 01:10:22 +08:00
Lianmin Zheng	8496701934	[Misc] Fix metrics, weight update lock, request logging (#2543)	2024-12-22 06:27:22 -08:00
Lianmin Zheng	9cd9dc83b3	Temporarily disable unit test of torch native attention backend (#2492)	2024-12-16 14:17:27 -08:00
Ke Bao	2f9bd0fafd	Fix correctness issue for triton decoding kernel (#2479)	2024-12-14 16:50:54 +08:00
Fred Reiss	993956c6b1	Add support for IBM Granite 3.x models (#2437)	2024-12-11 06:30:23 -08:00
Ying Sheng	8586b72da0	[feat] Enable chunked prefill for llava-onevision (#2412)	2024-12-09 09:52:38 -08:00
Lianmin Zheng	641b7d0ae0	[Minor] Improve code style (#2422)	2024-12-09 06:30:35 -08:00
Xiaoyu Zhang	3844feb9bb	Add a unittest for fused_moe (#2416)	2024-12-08 22:46:10 -08:00
Lianmin Zheng	a6ca736c8e	Simplify stream_output (#2398)	2024-12-08 12:27:13 -08:00
Yineng Zhang	f62055b528	minor: add random flashinfer vs triton use case (#2409)	2024-12-09 04:15:21 +08:00
Yineng Zhang	74bc9184c3	minor: add random use case (#2408)	2024-12-09 03:21:35 +08:00
Yineng Zhang	0f8eb15323	feat: support custom task runner (#2407)	2024-12-09 02:29:55 +08:00
Yineng Zhang	67470bbb28	minor: update correct measurement unit (#2406)	2024-12-08 20:55:04 +08:00
Ke Bao	61dec545b0	Remove unused vars in the triton backend (#2401)	2024-12-08 03:37:03 -08:00
Ke Bao	7dc66fcb40	Optimize Triton decoding kernel for long context (#2394)	2024-12-08 01:17:37 -08:00
Lianmin Zheng	0e7409adb6	Fix the overlap for xgrammar (#2377)	2024-12-06 05:49:29 -08:00
xiaobochen	3d32e4a32c	Resubmit MoE-EP (#2371)	2024-12-06 15:05:21 +08:00
Lianmin Zheng	07ec07ad1f	Improve torch compile for fused moe (#2327)	2024-12-03 01:58:25 -08:00
Ying Sheng	aa47f64223	Revert "[feat] Enable chunked prefill for llava-onevision" (#2329)	2024-12-02 23:11:13 -08:00
Ying Sheng	480e38a733	[feat] Enable chunked prefill for llava-onevision (#2281)	2024-12-02 20:19:02 -08:00
Lianmin Zheng	18108abe5d	[Minor] Fix code style (#2311)	2024-12-02 02:27:36 -08:00
Chayenne	983bfcf386	Online weight updates from torch.distributed (#2279)	2024-12-01 23:23:18 -08:00
Qun Yang	62c516ac45	Add a simple torch native attention backend (#2241)	2024-12-01 03:01:25 -08:00
Lianmin Zheng	9449a95431	[CI] Balance CI tests (#2293)	2024-12-01 01:47:30 -08:00
Lianmin Zheng	0303ca918f	[CI] Fix missing files in run_suite.py (#2288)	2024-11-30 23:53:34 -08:00
Lianmin Zheng	4936be8acc	Revert "Revert "[FEAT] Support GGUF format"" (#2287)	2024-11-30 22:14:48 -08:00
Lianmin Zheng	1bfa511b95	[CI] Fix ci tests (#2284)	2024-11-30 21:12:03 -08:00
Lianmin Zheng	7e4c6dd8da	Revert "[FEAT] Support GGUF format" (#2285)	2024-11-30 19:03:26 -08:00
Yang Zheng	883c955489	[FEAT] Support GGUF format (#2215) Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>	2024-11-30 00:44:48 -08:00
Lianmin Zheng	ccaf1f997c	[CI] Print summary on github actions (#2274)	2024-11-29 23:48:54 -08:00
Chayenne	7d1485d376	Add get weights by parameter name for llama (#2266)	2024-11-29 23:36:38 -08:00
Chayenne	7d5d1d3d29	udate weights from disk (#2265)	2024-11-30 01:17:00 +00:00
Lianmin Zheng	fe97a2d40f	Simplify tokenizer manager (#2254)	2024-11-29 02:18:51 -08:00
Ying Sheng	8b48496aaf	Revert "Revert "Add simple CPU offloading support"" (#2253) Co-authored-by: Jani Monoses <jani.monoses@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-28 23:58:54 -08:00
Ying Sheng	4057ea82c9	Revert "Add simple CPU offloading support" (#2252) We'll re-add the commit to correctly ack Kaichao's authorship	2024-11-28 23:36:55 -08:00
Ying Sheng	b7038fec9b	[fix] Fix prefix caching for multi-image/video (#2239)	2024-11-28 12:08:13 -08:00
Lianmin Zheng	b2ccf36d4d	Fix memory leak during abort (#2238)	2024-11-28 02:22:15 -08:00
Lianmin Zheng	d4fc1a70e3	Crash the server correctly during error (#2231)	2024-11-28 00:22:39 -08:00
Jani Monoses	db674e3d24	Add OLMo2 model. (#2233)	2024-11-28 00:15:20 -08:00
Lianmin Zheng	fed4c6946a	Release v0.3.6.post2 (#2214) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-11-27 03:35:30 -08:00
Ying Sheng	37c8a5761f	[feat] Support session control for vision language models (#2210)	2024-11-27 00:03:29 -08:00
Lianmin Zheng	c754652fcd	Fix flasky tests (#2212)	2024-11-26 23:06:20 -08:00
Yudi Xue	19f33b3237	add sglang version to get_server_info (#2206)	2024-11-26 12:10:23 -08:00
Lianmin Zheng	ea34350d88	Rename double sparsity config file (#2188)	2024-11-25 17:12:08 -08:00
Lianmin Zheng	1605ae121e	[CI] Minor fix for CI (#2187)	2024-11-25 16:38:43 -08:00
Rin Intachuen	1aea19f64b	Input_embeds support (#2052)	2024-11-25 16:35:04 -08:00
Yixin Dong	7f076c2ce6	Update XGrammar to the latest API (#2176) Co-authored-by: Ben Gitter <gitterbd@gmail.com>	2024-11-25 15:58:30 -08:00
Lianmin Zheng	3c5538f781	Update CI threshold (#2186)	2024-11-25 15:24:17 -08:00
Ying Sheng	e1e595d702	[feat] Refactor session control interface and add CI (#2173)	2024-11-25 12:32:51 -08:00
Lianmin Zheng	254fd130e2	[CI] Split test cases in CI for better load balancing (#2180)	2024-11-25 04:58:16 -08:00

318 Commits