| Ying Sheng | 2fce449b1c | [API] add get memory pool size (#1760) Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> | 2024-10-23 07:02:29 +00:00 |
| Byron Hsu | 17536e7e3d | Fix edge case for truncated (#1747) | 2024-10-23 00:00:25 -04:00 |
| Liangsheng Yin | 5e1558f1f2 | Update max_req_len and max_req_input_len (#1748) | 2024-10-21 16:12:04 -07:00 |
| Liangsheng Yin | 94cde10920 | Llama3.2 vision model support (#1551) | 2024-10-21 15:01:21 -07:00 |
| Liangsheng Yin | efb099cdee | Fix prefill oom (#1743) | 2024-10-21 03:54:35 -07:00 |
| Lianmin Zheng | b121bc03a3 | Simplify batch result resolution (#1735) | 2024-10-20 19:47:14 -07:00 |
| Lianmin Zheng | e12358dc91 | Simplify the usage of device (#1734) | 2024-10-20 18:17:41 -07:00 |
| Lianmin Zheng | b48edff67f | Split the overlapped version of TpModelWorkerClient into a separate file (#1726) | 2024-10-20 00:29:29 -07:00 |
| Lianmin Zheng | 59cbf47626 | Unify the memory pool api and tp worker API (#1724) | 2024-10-19 23:19:26 -07:00 |
| Lianmin Zheng | 12cad0feae | Simplify the interface of tp_worker (#1718) | 2024-10-19 17:39:38 -07:00 |
| Lianmin Zheng | 769bf11c05 | Fix the race condition in overlap mode (#1712) | 2024-10-19 06:50:56 -07:00 |
| Lianmin Zheng | 2bcfba1b08 | Skip unnecessary penalizer (#1707) | 2024-10-18 17:54:03 -07:00 |
| Lianmin Zheng | bc12d4033f | Add grouped free operations (#1706) | 2024-10-18 13:21:05 -07:00 |
| Liangsheng Yin | 9e0dac1ad7 | Fix regex and logprob conflicts when chunked prefilling (#1703) | 2024-10-17 18:33:21 -07:00 |
| havetc | ecb8bad276 | Returning a per request metric for number of cached_tokens read (#1599) | 2024-10-16 11:49:22 -07:00 |
| Lianmin Zheng | dbec2f1847 | Launch a thread to overlap CPU and GPU (#1687) | 2024-10-16 11:20:17 -07:00 |
| Lianmin Zheng | 9116b2896f | Add a new event loop (#1677) | 2024-10-16 01:33:20 -07:00 |
| Lianmin Zheng | f1088e0fc8 | Fix memory leak during abort (#1674) | 2024-10-15 08:15:08 -07:00 |
| Lianmin Zheng | 4a292f670d | [Minor] Add some utility functions (#1671) | 2024-10-14 20:08:03 -07:00 |
| Lianmin Zheng | 02bc95796d | Simplify chunked prefill (#1667) | 2024-10-14 06:47:50 -07:00 |
| Lianmin Zheng | 24f3e1511c | [Minor] Improve style (#1666) | 2024-10-14 05:25:00 -07:00 |
| Lianmin Zheng | 0c1e87964b | Move filter_batch out of stream_output (#1663) | 2024-10-14 01:15:34 -07:00 |
| Lianmin Zheng | 869f1c02c4 | Add a test case to test retract (#1662) | 2024-10-13 20:32:37 -07:00 |
| Ying Sheng | 2725f8da61 | [Minor] Rename no_eos_trim to no_stop_trim (#1661) | 2024-10-13 20:30:03 -07:00 |
| Lianmin Zheng | da1ffed689 | Add output_ids into ScheduleBatch (#1659) | 2024-10-13 19:54:02 -07:00 |
| Ying Sheng | 4876117171 | [Fix] fix eos trim inconsistency (#1650) | 2024-10-13 01:07:09 -07:00 |
| Lianmin Zheng | 7ee6c259ff | Simplify the event loop and expose --num-continuous-decode-steps as an argument (#1652) | 2024-10-12 21:35:30 -07:00 |
| Lianmin Zheng | 9610fcd469 | Fix the batch_is_full check for jump-forward decoding (#1654) | 2024-10-12 19:47:24 -07:00 |
| Lianmin Zheng | 9da5a60b18 | Add an option to disable penalizer (#1651) | 2024-10-12 17:53:23 -07:00 |
| Lianmin Zheng | 69aa937aa5 | Fix unit tests and type annotations (#1648) | 2024-10-12 14:49:24 -07:00 |
| Lianmin Zheng | 23cc66f7b6 | Add back data parallelism (#1635) | 2024-10-11 07:22:48 -07:00 |
| 科英 | bbd72bfc86 | Add the ability to enable and disable the Profiler via HTTP API. (#1626) | 2024-10-11 02:34:25 -07:00 |
| Byron Hsu | 01fdb2f377 | Fix test_vision_openai_server on CI (#1620) | 2024-10-10 16:34:13 -07:00 |
| Ying Sheng | c5325aba75 | [Profile] Add pytorch profiler (#1604) | 2024-10-07 14:37:16 -07:00 |
| Lianmin Zheng | ebbc42d989 | Optimize broadcast & Reorg code (#1598) | 2024-10-07 13:19:23 -07:00 |
| Lianmin Zheng | b6aad70ab1 | [Fix] Fix the case where prompt_len = 0 (#1593) | 2024-10-06 20:30:02 -07:00 |
| Lianmin Zheng | 58d1082e39 | Clean up event loop (#1586) | 2024-10-06 03:24:04 -07:00 |
| Lianmin Zheng | 9244f27f0a | [Minor] Improve the style and fix flaky tests (#1584) | 2024-10-06 00:10:48 -07:00 |
| Liangsheng Yin | 5d0ba4038f | Refine the add request reasons to avoid corner cases. (#1574) | 2024-10-04 18:00:18 -07:00 |
| Ying Sheng | 04b262cd91 | [Fix] Fix major performance bug in certain cases (#1563) Co-authored-by: hnyls2002 <hnyls2002@gmail.com> | 2024-10-04 08:51:11 +00:00 |
| Lianmin Zheng | 114bbc8651 | Use ipc instead of tcp in zmq (#1566) | 2024-10-04 00:45:52 -07:00 |
| Lianmin Zheng | 32eb6e96f2 | Organize sampling batch info better (#1562) | 2024-10-03 18:29:49 -07:00 |
| Lianmin Zheng | 63ba2f8d7b | Clean up batch data structures: Introducing ModelWorkerBatch (#1544) | 2024-09-30 06:41:49 -07:00 |
| Lianmin Zheng | 36d5acfca5 | Rename InputMetadata -> ForwardBatch (#1543) | 2024-09-30 02:41:11 -07:00 |
| Lianmin Zheng | 3f0fe08d37 | Let ModelRunner take InputMetadata as input, instead of ScheduleBatch (#1541) | 2024-09-29 20:28:45 -07:00 |
| Lianmin Zheng | f86c1e611f | Move scheduler code from tp_worker.py to scheduler.py (#1538) | 2024-09-29 17:42:45 -07:00 |
| Lianmin Zheng | 048685430d | Improve process creation (#1534) | 2024-09-29 02:36:12 -07:00 |