193 Commits

Author SHA1 Message Date
Chayenne
7d1485d376 Add get weights by parameter name for llama (#2266) 2024-11-29 23:36:38 -08:00
Chayenne
7d5d1d3d29 udate weights from disk (#2265) 2024-11-30 01:17:00 +00:00
Lianmin Zheng
fe97a2d40f Simplify tokenizer manager (#2254) 2024-11-29 02:18:51 -08:00
Lianmin Zheng
d4fc1a70e3 Crash the server correctly during error (#2231) 2024-11-28 00:22:39 -08:00
bjmsong
91e5dbf554 add profile in offline benchmark & update doc (#2123)
Co-authored-by: root <bjmsong@126.com>
2024-11-27 14:57:13 -08:00
Yudi Xue
19f33b3237 add sglang version to get_server_info (#2206) 2024-11-26 12:10:23 -08:00
Andrew Lyu
88c7763f53 Remove unresolved reference 'self' (#2198) 2024-11-26 00:59:58 -08:00
Henry Hyeonmok Ko
dbe1729395 Merged three native APIs into one: get_server_info (#2152) 2024-11-24 01:37:58 -08:00
Henry Hyeonmok Ko
c35cd1f8c7 Expose max total num tokens from Runtime & Engine API (#2092) 2024-11-22 15:10:10 -08:00
Xuehai Pan
62a4a339eb docs: fix module docstrings and copyright headers (#2077) 2024-11-22 22:16:53 +08:00
Byron Hsu
30af7dfb34 [router] add base_gpu_id server args & merged radix tree python reference (#2115) 2024-11-21 17:13:33 -08:00
Ying Sheng
5942dfc00a [feat] Add session control (#2073) 2024-11-20 00:36:53 -08:00
Lianmin Zheng
c29b98e043 Fix json benchmark (#2043) 2024-11-15 05:33:43 -08:00
zolinthecow
f6dd648620 Offline LLM Engine Benchmark Throughput (#1968)
Co-authored-by: ByronHsu <byronhsu1230@gmail.com>
2024-11-14 21:59:33 -08:00
James Xu
ddeb9d42de Add engine encode (#1995)
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2024-11-11 11:48:17 -08:00
Lianmin Zheng
1929c06762 Simplify prometheus metrics (#1981)
Co-authored-by: Mohit Reddy <mohitreddy1996@users.noreply.github.com>
2024-11-10 04:39:32 -08:00
Lianmin Zheng
520f0094e4 [CI] balance unit tests (#1977) 2024-11-09 16:46:14 -08:00
Lianmin Zheng
9c939a3d8b Clean up metrics code (#1972) 2024-11-09 15:43:20 -08:00
Yudi Xue
95a4ed129a Fix metrics (#1963) 2024-11-08 23:21:11 -08:00
Lianmin Zheng
a509552087 [minor] Improve code style and compatibility (#1961) 2024-11-08 02:19:41 -08:00
Chayenne
c77c1e05ba fix black in pre-commit (#1940) 2024-11-08 07:42:47 +08:00
Xuehai Pan
a5e0defb5a minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) 2024-11-06 13:46:04 +00:00
Lzhang-hub
a146d9990e support prometheus metrics (#1853)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2024-11-05 20:42:53 -08:00
Chayenne
02755768d3 Change judge to classify & Modify make file (#1920) 2024-11-04 23:53:44 -08:00
Lianmin Zheng
2ce32db6fb Let reward model take text inputs instead of message lists (#1907)
Co-authored-by: Kyle Corbitt <kyle@corbt.com>
2024-11-03 13:27:12 -08:00
Lianmin Zheng
c17c578108 Simplify tokenizer manager (#1904) 2024-11-03 08:38:26 -08:00
Chayenne
6aed0445ed turn off log (#1895) 2024-11-03 00:19:12 -07:00
Lianmin Zheng
b548801ddb Update docs (#1839) 2024-10-30 02:49:08 -07:00
Byron Hsu
680cad2023 fix get_memory_pool_size deadlock for DP (#1830) 2024-10-28 23:07:14 -07:00
Byron Hsu
6fcd6d7d6d Support token ids in engine.generate (#1820) 2024-10-27 14:02:34 -07:00
Lianmin Zheng
eaade87a42 Fix unit tests (#1817) 2024-10-27 03:04:54 -07:00
Lianmin Zheng
86fc0d79d0 Add a watch dog thread (#1816) 2024-10-27 02:00:50 -07:00
Ying Sheng
2fce449b1c [API] add get memory pool size (#1760)
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2024-10-23 07:02:29 +00:00
Lianmin Zheng
769bf11c05 Fix the race condition in overlap mode (#1712) 2024-10-19 06:50:56 -07:00
Lianmin Zheng
dd3809fad8 Fix engine unit test (#1701) 2024-10-17 09:53:32 -07:00
Lianmin Zheng
7feba41584 Fix failed ci tests on long prompts; Better error messages for embedding models (#1700) 2024-10-17 09:23:29 -07:00
Michael Feil
e5db40dcbc ORJson. Faster Json serialization (#1694) 2024-10-17 08:03:08 -07:00
Lianmin Zheng
02f7f3e488 Update the transformers version in CI (#1690) 2024-10-16 19:03:55 -07:00
Zeng Zhongchao
2782132be8 Add date to logging messages (#1623) (#1679) 2024-10-16 18:54:55 -07:00
Michael Feil
b0facb3316 add orjson for jsonresponse (#1688) 2024-10-16 18:14:30 -07:00
Lianmin Zheng
dbec2f1847 Launch a thread to overlap CPU and GPU (#1687) 2024-10-16 11:20:17 -07:00
Lianmin Zheng
9116b2896f Add a new event loop (#1677) 2024-10-16 01:33:20 -07:00
Patrick Yi
31fad29ab0 Add get_tokenizer function for Engine class (#1653) 2024-10-12 19:39:35 -07:00
Byron Hsu
862cd265e5 [engine] support async and streaming (#1614) 2024-10-11 15:26:25 -07:00
Lianmin Zheng
23cc66f7b6 Add back data parallelism (#1635) 2024-10-11 07:22:48 -07:00
科英
bbd72bfc86 Add the ability to enable and disable the Profiler via HTTP API. (#1626) 2024-10-11 02:34:25 -07:00
Byron Hsu
e8613df071 [Engine] Fix generate hanging issue after the first call (#1606) 2024-10-08 04:26:56 +00:00
Byron Hsu
565b05f02f Use atexit hook to implicitly shutdown Runtime (#1595) 2024-10-07 05:18:45 +00:00
Byron Hsu
551a3a9d38 Provide an offline engine API (#1567) 2024-10-06 20:27:03 -07:00
Lianmin Zheng
114bbc8651 Use ipc instead of tcp in zmq (#1566) 2024-10-04 00:45:52 -07:00