Commit Graph

812 Commits

Author
SHA1 Message Date
Lianmin Zheng
59cbf47626 Unify the memory pool api and tp worker API (#1724) 2024-10-19 23:19:26 -07:00
Yineng Zhang
cbbc82b7b8 Support qwen2 vl model (#1721)
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: ispobock <ISPObaoke@163.com>
2024-10-19 21:44:38 -07:00
Yineng Zhang
8bee20f80b Update vllm to 0.6.3 (#1711) (#1720)
Co-authored-by: Ke Bao <ISPObaoke@163.com>
2024-10-19 20:45:41 -07:00
Lianmin Zheng
12cad0feae Simplify the interface of tp_worker (#1718) 2024-10-19 17:39:38 -07:00
Lianmin Zheng
b6cd903604 Update readme and workflow (#1716) 2024-10-19 13:01:44 -07:00
Lianmin Zheng
087257ea03 Release v0.3.4 (#1714) 2024-10-19 08:17:41 -07:00
Lianmin Zheng
769bf11c05 Fix the race condition in overlap mode (#1712) 2024-10-19 06:50:56 -07:00
Lianmin Zheng
3db43d1b08 Fix is_all_ready for overlap copy (#1710) 2024-10-18 21:01:52 -07:00
Lianmin Zheng
f0f8a7699b Simplify the nan detection and greedy check in sampler (#1709) 2024-10-18 20:21:24 -07:00
Lianmin Zheng
2bcfba1b08 Skip unnecessary penalizer (#1707) 2024-10-18 17:54:03 -07:00
Lianmin Zheng
bc12d4033f Add grouped free operations (#1706) 2024-10-18 13:21:05 -07:00
Lianmin Zheng
392f2863c8 Add dtype for more operations (#1705) 2024-10-18 12:18:15 -07:00
Lianmin Zheng
6d0fa73ece Simplify flashinfer utilities (#1704) 2024-10-17 22:54:14 -07:00
Liangsheng Yin
9e0dac1ad7 Fix regex and logprob conflicts when chunked prefilling (#1703) 2024-10-17 18:33:21 -07:00
Gleb Drozdov
a95d5589c3 Add matched_stop token or str to distinguish between eos or stop str finish_reason generation (#1684) 2024-10-17 18:06:52 +00:00
Lianmin Zheng
d17d19e5b8 Fix mixed batch for multi modal models (#1702) 2024-10-17 10:27:26 -07:00
Lianmin Zheng
dd3809fad8 Fix engine unit test (#1701) 2024-10-17 09:53:32 -07:00
Lianmin Zheng
7feba41584 Fix failed ci tests on long prompts; Better error messages for embedding models (#1700) 2024-10-17 09:23:29 -07:00
Michael Feil
e5db40dcbc ORJson. Faster Json serialization (#1694) 2024-10-17 08:03:08 -07:00
wxsm
b170930534 feat: radix tree code optimize (#1697) 2024-10-17 08:01:27 -07:00
Jani Monoses
5ab20cceba Use SGLang imports for linear layer (#1696) 2024-10-17 07:50:01 -07:00
Lianmin Zheng
02f7f3e488 Update the transformers version in CI (#1690) 2024-10-16 19:03:55 -07:00
Zeng Zhongchao
2782132be8 Add date to logging messages (#1623) (#1679) 2024-10-16 18:54:55 -07:00
Michael Feil
b0facb3316 add orjson for jsonresponse (#1688) 2024-10-16 18:14:30 -07:00
havetc
ecb8bad276 Returning a per request metric for number of cached_tokens read (#1599) 2024-10-16 11:49:22 -07:00
Lianmin Zheng
dbec2f1847 Launch a thread to overlap CPU and GPU (#1687) 2024-10-16 11:20:17 -07:00
Ke Bao
d10b933a36 Fix srt dependency (#1685) 2024-10-16 08:21:20 -07:00
Lianmin Zheng
9116b2896f Add a new event loop (#1677) 2024-10-16 01:33:20 -07:00
Jani Monoses
a5114b6f91 Add OLMo model (#1676) 2024-10-16 00:11:18 -07:00
Liangsheng Yin
b6b4094621 Fix filter_batch function call (#1681) 2024-10-15 22:59:26 -07:00
Lianmin Zheng
f1088e0fc8 Fix memory leak during abort (#1674) 2024-10-15 08:15:08 -07:00
Lianmin Zheng
175afed370 Improve benchmark scripts (#1672) 2024-10-14 21:53:01 -07:00
Lianmin Zheng
4a292f670d [Minor] Add some utility functions (#1671) 2024-10-14 20:08:03 -07:00
Byron Hsu
56503d9bc9 [1/N] Remove CacheConfig import in all model files (#1658) 2024-10-14 09:06:34 -07:00
Lianmin Zheng
02bc95796d Simplify chunked prefill (#1667) 2024-10-14 06:47:50 -07:00
Lianmin Zheng
24f3e1511c [Minor] Improve style (#1666) 2024-10-14 05:25:00 -07:00
Lianmin Zheng
6790240cc3 Fix unit test order to balance the tasks in CI (#1665) 2024-10-14 02:01:44 -07:00
Shuo Yang
061e546313 Support double sparsity (#1459) 2024-10-14 02:00:41 -07:00
Lianmin Zheng
0c1e87964b Move filter_batch out of stream_output (#1663) 2024-10-14 01:15:34 -07:00
Lianmin Zheng
869f1c02c4 Add a test case to test retract (#1662) 2024-10-13 20:32:37 -07:00
Ying Sheng
2725f8da61 [Minor] Rename no_eos_trim to no_stop_trim (#1661) 2024-10-13 20:30:03 -07:00
Lianmin Zheng
da1ffed689 Add output_ids into ScheduleBatch (#1659) 2024-10-13 19:54:02 -07:00
Ying Sheng
4876117171 [Fix] fix eos trim inconsistency (#1650) 2024-10-13 01:07:09 -07:00
Lianmin Zheng
7ee6c259ff Simplify the event loop and expose --num-continuous-decode-steps as an argument (#1652) 2024-10-12 21:35:30 -07:00
Lianmin Zheng
9610fcd469 Fix the batch_is_full check for jump-forward decoding (#1654) 2024-10-12 19:47:24 -07:00
Patrick Yi
31fad29ab0 Add get_tokenizer function for Engine class (#1653) 2024-10-12 19:39:35 -07:00
Lianmin Zheng
9da5a60b18 Add an option to disable penalizer (#1651) 2024-10-12 17:53:23 -07:00
Lianmin Zheng
69aa937aa5 Fix unit tests and type annotations (#1648) 2024-10-12 14:49:24 -07:00
Zhang, Liangang
5d638c92f5 [Feature, Hardware] Enable SGLang on XPU GPUs via PyTorch (#1480) 2024-10-12 18:10:32 +00:00
Lianmin Zheng
e37cdab0c6 Fix ignore_eos (#1645) 2024-10-12 00:36:28 -07:00