sglang

Author	SHA1	Message	Date
fzyzcjy	9183c23eca	Speed up `update_weights_from_tensor` (#2695)	2025-01-02 02:05:19 -08:00
Lianmin Zheng	8c3b420eec	[Docs] clean up structured outputs docs (#2654)	2024-12-29 23:57:16 -08:00
Ying Sheng	e0e09fceeb	[Session] Update session control interface (#2635)	2024-12-29 02:10:27 -08:00
Lianmin Zheng	3815b23ccb	Clean up wrapper in flashinfer backend (#2638)	2024-12-29 00:45:57 -08:00
fzyzcjy	fd28640dc5	Add `update_weights_from_tensor` (#2631)	2024-12-28 13:30:27 -08:00
Lianmin Zheng	855d0ba381	[CI] Fix nightly test and raise better error message (#2626) Co-authored-by: Sangbin <rkooo567@gmail.com>	2024-12-27 22:16:39 -08:00
fzyzcjy	b2ed5c8ea7	Tiny code cleanup in tokenizer_manager.py (#2586)	2024-12-26 17:53:09 -08:00
Lianmin Zheng	8496701934	[Misc] Fix metrics, weight update lock, request logging (#2543)	2024-12-22 06:27:22 -08:00
Lianmin Zheng	641b7d0ae0	[Minor] Improve code style (#2422)	2024-12-09 06:30:35 -08:00
Lianmin Zheng	f5b2a3aa67	Use proc.join instead of busy waiting (#2374)	2024-12-06 02:01:23 -08:00
Chayenne	786be44da5	Fix Docs CI When Compile Error (#2323)	2024-12-04 11:19:46 -08:00
Lianmin Zheng	18108abe5d	[Minor] Fix code style (#2311)	2024-12-02 02:27:36 -08:00
Chayenne	983bfcf386	Online weight updates from torch.distributed (#2279)	2024-12-01 23:23:18 -08:00
Chayenne	7d1485d376	Add get weights by parameter name for llama (#2266)	2024-11-29 23:36:38 -08:00
Chayenne	7d5d1d3d29	udate weights from disk (#2265)	2024-11-30 01:17:00 +00:00
Lianmin Zheng	fe97a2d40f	Simplify tokenizer manager (#2254)	2024-11-29 02:18:51 -08:00
Lianmin Zheng	d4fc1a70e3	Crash the server correctly during error (#2231)	2024-11-28 00:22:39 -08:00
bjmsong	91e5dbf554	add profile in offline benchmark & update doc (#2123) Co-authored-by: root <bjmsong@126.com>	2024-11-27 14:57:13 -08:00
Yudi Xue	19f33b3237	add sglang version to get_server_info (#2206)	2024-11-26 12:10:23 -08:00
Andrew Lyu	88c7763f53	Remove unresolved reference 'self' (#2198)	2024-11-26 00:59:58 -08:00
Henry Hyeonmok Ko	dbe1729395	Merged three native APIs into one: get_server_info (#2152)	2024-11-24 01:37:58 -08:00
Henry Hyeonmok Ko	c35cd1f8c7	Expose max total num tokens from Runtime & Engine API (#2092)	2024-11-22 15:10:10 -08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Byron Hsu	30af7dfb34	[router] add base_gpu_id server args & merged radix tree python reference (#2115)	2024-11-21 17:13:33 -08:00
Ying Sheng	5942dfc00a	[feat] Add session control (#2073)	2024-11-20 00:36:53 -08:00
Lianmin Zheng	c29b98e043	Fix json benchmark (#2043)	2024-11-15 05:33:43 -08:00
zolinthecow	f6dd648620	Offline LLM Engine Benchmark Throughput (#1968) Co-authored-by: ByronHsu <byronhsu1230@gmail.com>	2024-11-14 21:59:33 -08:00
James Xu	ddeb9d42de	Add engine encode (#1995) Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-11-11 11:48:17 -08:00
Lianmin Zheng	1929c06762	Simplify prometheus metrics (#1981) Co-authored-by: Mohit Reddy <mohitreddy1996@users.noreply.github.com>	2024-11-10 04:39:32 -08:00
Lianmin Zheng	520f0094e4	[CI] balance unit tests (#1977)	2024-11-09 16:46:14 -08:00
Lianmin Zheng	9c939a3d8b	Clean up metrics code (#1972)	2024-11-09 15:43:20 -08:00
Yudi Xue	95a4ed129a	Fix metrics (#1963)	2024-11-08 23:21:11 -08:00
Lianmin Zheng	a509552087	[minor] Improve code style and compatibility (#1961)	2024-11-08 02:19:41 -08:00
Chayenne	c77c1e05ba	fix black in pre-commit (#1940)	2024-11-08 07:42:47 +08:00
Xuehai Pan	a5e0defb5a	minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926)	2024-11-06 13:46:04 +00:00
Lzhang-hub	a146d9990e	support prometheus metrics (#1853) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-11-05 20:42:53 -08:00
Chayenne	02755768d3	Change judge to classify & Modify make file (#1920)	2024-11-04 23:53:44 -08:00
Lianmin Zheng	2ce32db6fb	Let reward model take text inputs instead of message lists (#1907) Co-authored-by: Kyle Corbitt <kyle@corbt.com>	2024-11-03 13:27:12 -08:00
Lianmin Zheng	c17c578108	Simplify tokenizer manager (#1904)	2024-11-03 08:38:26 -08:00
Chayenne	6aed0445ed	turn off log (#1895)	2024-11-03 00:19:12 -07:00
Lianmin Zheng	b548801ddb	Update docs (#1839)	2024-10-30 02:49:08 -07:00
Byron Hsu	680cad2023	fix get_memory_pool_size deadlock for DP (#1830)	2024-10-28 23:07:14 -07:00
Byron Hsu	6fcd6d7d6d	Support token ids in `engine.generate` (#1820)	2024-10-27 14:02:34 -07:00
Lianmin Zheng	eaade87a42	Fix unit tests (#1817)	2024-10-27 03:04:54 -07:00
Lianmin Zheng	86fc0d79d0	Add a watch dog thread (#1816)	2024-10-27 02:00:50 -07:00
Ying Sheng	2fce449b1c	[API] add get memory pool size (#1760) Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-10-23 07:02:29 +00:00
Lianmin Zheng	769bf11c05	Fix the race condition in overlap mode (#1712)	2024-10-19 06:50:56 -07:00
Lianmin Zheng	dd3809fad8	Fix engine unit test (#1701)	2024-10-17 09:53:32 -07:00
Lianmin Zheng	7feba41584	Fix failed ci tests on long prompts; Better error messages for embedding models (#1700)	2024-10-17 09:23:29 -07:00
Michael Feil	e5db40dcbc	ORJson. Faster Json serialization (#1694)	2024-10-17 08:03:08 -07:00

206 Commits