Commit Graph

206 Commits

Author SHA1 Message Date
fzyzcjy
9183c23eca Speed up update_weights_from_tensor (#2695) 2025-01-02 02:05:19 -08:00
Lianmin Zheng
8c3b420eec [Docs] clean up structured outputs docs (#2654) 2024-12-29 23:57:16 -08:00
Ying Sheng
e0e09fceeb [Session] Update session control interface (#2635) 2024-12-29 02:10:27 -08:00
Lianmin Zheng
3815b23ccb Clean up wrapper in flashinfer backend (#2638) 2024-12-29 00:45:57 -08:00
fzyzcjy
fd28640dc5 Add update_weights_from_tensor (#2631) 2024-12-28 13:30:27 -08:00
Lianmin Zheng
855d0ba381 [CI] Fix nightly test and raise better error message (#2626)
Co-authored-by: Sangbin <rkooo567@gmail.com>
2024-12-27 22:16:39 -08:00
fzyzcjy
b2ed5c8ea7 Tiny code cleanup in tokenizer_manager.py (#2586) 2024-12-26 17:53:09 -08:00
Lianmin Zheng
8496701934 [Misc] Fix metrics, weight update lock, request logging (#2543) 2024-12-22 06:27:22 -08:00
Lianmin Zheng
641b7d0ae0 [Minor] Improve code style (#2422) 2024-12-09 06:30:35 -08:00
Lianmin Zheng
f5b2a3aa67 Use proc.join instead of busy waiting (#2374) 2024-12-06 02:01:23 -08:00
Chayenne
786be44da5 Fix Docs CI When Compile Error (#2323) 2024-12-04 11:19:46 -08:00
Lianmin Zheng
18108abe5d [Minor] Fix code style (#2311) 2024-12-02 02:27:36 -08:00
Chayenne
983bfcf386 Online weight updates from torch.distributed (#2279) 2024-12-01 23:23:18 -08:00
Chayenne
7d1485d376 Add get weights by parameter name for llama (#2266) 2024-11-29 23:36:38 -08:00
Chayenne
7d5d1d3d29 udate weights from disk (#2265) 2024-11-30 01:17:00 +00:00
Lianmin Zheng
fe97a2d40f Simplify tokenizer manager (#2254) 2024-11-29 02:18:51 -08:00
Lianmin Zheng
d4fc1a70e3 Crash the server correctly during error (#2231) 2024-11-28 00:22:39 -08:00
bjmsong
91e5dbf554 add profile in offline benchmark & update doc (#2123)
Co-authored-by: root <bjmsong@126.com>
2024-11-27 14:57:13 -08:00
Yudi Xue
19f33b3237 add sglang version to get_server_info (#2206) 2024-11-26 12:10:23 -08:00
Andrew Lyu
88c7763f53 Remove unresolved reference 'self' (#2198) 2024-11-26 00:59:58 -08:00
Henry Hyeonmok Ko
dbe1729395 Merged three native APIs into one: get_server_info (#2152) 2024-11-24 01:37:58 -08:00
Henry Hyeonmok Ko
c35cd1f8c7 Expose max total num tokens from Runtime & Engine API (#2092) 2024-11-22 15:10:10 -08:00
Xuehai Pan
62a4a339eb docs: fix module docstrings and copyright headers (#2077) 2024-11-22 22:16:53 +08:00
Byron Hsu
30af7dfb34 [router] add base_gpu_id server args & merged radix tree python reference (#2115) 2024-11-21 17:13:33 -08:00
Ying Sheng
5942dfc00a [feat] Add session control (#2073) 2024-11-20 00:36:53 -08:00
Lianmin Zheng
c29b98e043 Fix json benchmark (#2043) 2024-11-15 05:33:43 -08:00
zolinthecow
f6dd648620 Offline LLM Engine Benchmark Throughput (#1968)
Co-authored-by: ByronHsu <byronhsu1230@gmail.com>
2024-11-14 21:59:33 -08:00
James Xu
ddeb9d42de Add engine encode (#1995)
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2024-11-11 11:48:17 -08:00
Lianmin Zheng
1929c06762 Simplify prometheus metrics (#1981)
Co-authored-by: Mohit Reddy <mohitreddy1996@users.noreply.github.com>
2024-11-10 04:39:32 -08:00
Lianmin Zheng
520f0094e4 [CI] balance unit tests (#1977) 2024-11-09 16:46:14 -08:00
Lianmin Zheng
9c939a3d8b Clean up metrics code (#1972) 2024-11-09 15:43:20 -08:00
Yudi Xue
95a4ed129a Fix metrics (#1963) 2024-11-08 23:21:11 -08:00
Lianmin Zheng
a509552087 [minor] Improve code style and compatibility (#1961) 2024-11-08 02:19:41 -08:00
Chayenne
c77c1e05ba fix black in pre-commit (#1940) 2024-11-08 07:42:47 +08:00
Xuehai Pan
a5e0defb5a minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) 2024-11-06 13:46:04 +00:00
Lzhang-hub
a146d9990e support prometheus metrics (#1853)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2024-11-05 20:42:53 -08:00
Chayenne
02755768d3 Change judge to classify & Modify make file (#1920) 2024-11-04 23:53:44 -08:00
Lianmin Zheng
2ce32db6fb Let reward model take text inputs instead of message lists (#1907)
Co-authored-by: Kyle Corbitt <kyle@corbt.com>
2024-11-03 13:27:12 -08:00
Lianmin Zheng
c17c578108 Simplify tokenizer manager (#1904) 2024-11-03 08:38:26 -08:00
Chayenne
6aed0445ed turn off log (#1895) 2024-11-03 00:19:12 -07:00
Lianmin Zheng
b548801ddb Update docs (#1839) 2024-10-30 02:49:08 -07:00
Byron Hsu
680cad2023 fix get_memory_pool_size deadlock for DP (#1830) 2024-10-28 23:07:14 -07:00
Byron Hsu
6fcd6d7d6d Support token ids in engine.generate (#1820) 2024-10-27 14:02:34 -07:00
Lianmin Zheng
eaade87a42 Fix unit tests (#1817) 2024-10-27 03:04:54 -07:00
Lianmin Zheng
86fc0d79d0 Add a watch dog thread (#1816) 2024-10-27 02:00:50 -07:00
Ying Sheng
2fce449b1c [API] add get memory pool size (#1760)
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2024-10-23 07:02:29 +00:00
Lianmin Zheng
769bf11c05 Fix the race condition in overlap mode (#1712) 2024-10-19 06:50:56 -07:00
Lianmin Zheng
dd3809fad8 Fix engine unit test (#1701) 2024-10-17 09:53:32 -07:00
Lianmin Zheng
7feba41584 Fix failed ci tests on long prompts; Better error messages for embedding models (#1700) 2024-10-17 09:23:29 -07:00
Michael Feil
e5db40dcbc ORJson. Faster Json serialization (#1694) 2024-10-17 08:03:08 -07:00