| Author | Commit | Message | Date |
|---|---|---|---|
| Liangsheng Yin | 94cde10920 | Llama3.2 vision model support (#1551) | 2024-10-21 15:01:21 -07:00 |
| Lianmin Zheng | 00611286a1 | Fix sliding window attention and gemma-2 unit tests in CI (#1746) | 2024-10-21 13:47:12 -07:00 |
| Lianmin Zheng | 7ce3606891 | Faster overlap mode scheduler (#1738) | 2024-10-21 04:30:52 -07:00 |
| Liangsheng Yin | efb099cdee | Fix prefill oom (#1743) | 2024-10-21 03:54:35 -07:00 |
| Lianmin Zheng | 09603c6dc9 | Maintain seq_lens_sum to make more FlashInfer operations non-blocking (#1741) | 2024-10-21 01:43:16 -07:00 |
| Lianmin Zheng | cf470fea32 | Make token mapping non-blocking in the overlapped mode (#1740) | 2024-10-20 23:25:14 -07:00 |
| sixgod | 45d5af2416 | Add GLM-4 TextGeneration Model support for SGLang (#1736) | 2024-10-21 04:08:30 +00:00 |
| Lianmin Zheng | b121bc03a3 | Simplify batch result resolution (#1735) | 2024-10-20 19:47:14 -07:00 |
| Lianmin Zheng | e12358dc91 | Simplify the usage of device (#1734) | 2024-10-20 18:17:41 -07:00 |
| yizhang2077 | 554fbf93cd | [Bugfix] qwen2vl forward_extend (#1727) | 2024-10-20 02:38:35 -07:00 |
| Lianmin Zheng | b48edff67f | Split the overlapped version of TpModelWorkerClient into a separate file (#1726) | 2024-10-20 00:29:29 -07:00 |
| Lianmin Zheng | 59cbf47626 | Unify the memory pool api and tp worker API (#1724) | 2024-10-19 23:19:26 -07:00 |
| Yineng Zhang | cbbc82b7b8 | Support qwen2 vl model (#1721); Co-authored-by: yizhang2077 <1109276519@qq.com>; Co-authored-by: ispobock <ISPObaoke@163.com> | 2024-10-19 21:44:38 -07:00 |
| Yineng Zhang | 8bee20f80b | Update vllm to 0.6.3 (#1711) (#1720); Co-authored-by: Ke Bao <ISPObaoke@163.com> | 2024-10-19 20:45:41 -07:00 |
| Lianmin Zheng | 12cad0feae | Simplify the interface of tp_worker (#1718) | 2024-10-19 17:39:38 -07:00 |
| Lianmin Zheng | b6cd903604 | Update readme and workflow (#1716) | 2024-10-19 13:01:44 -07:00 |
| Lianmin Zheng | 087257ea03 | Release v0.3.4 (#1714) | 2024-10-19 08:17:41 -07:00 |
| Lianmin Zheng | 769bf11c05 | Fix the race condition in overlap mode (#1712) | 2024-10-19 06:50:56 -07:00 |
| Lianmin Zheng | 3db43d1b08 | Fix is_all_ready for overlap copy (#1710) | 2024-10-18 21:01:52 -07:00 |
| Lianmin Zheng | f0f8a7699b | Simplify the nan detection and greedy check in sampler (#1709) | 2024-10-18 20:21:24 -07:00 |
| Lianmin Zheng | 2bcfba1b08 | Skip unnecessary penalizer (#1707) | 2024-10-18 17:54:03 -07:00 |
| Lianmin Zheng | bc12d4033f | Add grouped free operations (#1706) | 2024-10-18 13:21:05 -07:00 |
| Lianmin Zheng | 392f2863c8 | Add dtype for more operations (#1705) | 2024-10-18 12:18:15 -07:00 |
| Lianmin Zheng | 6d0fa73ece | Simplify flashinfer utilities (#1704) | 2024-10-17 22:54:14 -07:00 |
| Liangsheng Yin | 9e0dac1ad7 | Fix regex and logprob conflicts when chunked prefilling (#1703) | 2024-10-17 18:33:21 -07:00 |
| Gleb Drozdov | a95d5589c3 | Add matched_stop token or str to distinguish between eos or stop str finish_reason generation (#1684) | 2024-10-17 18:06:52 +00:00 |
| Lianmin Zheng | d17d19e5b8 | Fix mixed batch for multi modal models (#1702) | 2024-10-17 10:27:26 -07:00 |
| Lianmin Zheng | dd3809fad8 | Fix engine unit test (#1701) | 2024-10-17 09:53:32 -07:00 |
| Lianmin Zheng | 7feba41584 | Fix failed ci tests on long prompts; Better error messages for embedding models (#1700) | 2024-10-17 09:23:29 -07:00 |
| Michael Feil | e5db40dcbc | ORJson. Faster Json serialization (#1694) | 2024-10-17 08:03:08 -07:00 |
| wxsm | b170930534 | feat: radix tree code optimize (#1697) | 2024-10-17 08:01:27 -07:00 |
| Jani Monoses | 5ab20cceba | Use SGLang imports for linear layer (#1696) | 2024-10-17 07:50:01 -07:00 |
| Lianmin Zheng | 02f7f3e488 | Update the transformers version in CI (#1690) | 2024-10-16 19:03:55 -07:00 |
| Zeng Zhongchao | 2782132be8 | Add date to logging messages (#1623) (#1679) | 2024-10-16 18:54:55 -07:00 |
| Michael Feil | b0facb3316 | add orjson for jsonresponse (#1688) | 2024-10-16 18:14:30 -07:00 |
| havetc | ecb8bad276 | Returning a per request metric for number of cached_tokens read (#1599) | 2024-10-16 11:49:22 -07:00 |
| Lianmin Zheng | dbec2f1847 | Launch a thread to overlap CPU and GPU (#1687) | 2024-10-16 11:20:17 -07:00 |
| Ke Bao | d10b933a36 | Fix srt dependency (#1685) | 2024-10-16 08:21:20 -07:00 |
| Lianmin Zheng | 9116b2896f | Add a new event loop (#1677) | 2024-10-16 01:33:20 -07:00 |
| Jani Monoses | a5114b6f91 | Add OLMo model (#1676) | 2024-10-16 00:11:18 -07:00 |
| Liangsheng Yin | b6b4094621 | Fix filter_batch function call (#1681) | 2024-10-15 22:59:26 -07:00 |
| Lianmin Zheng | f1088e0fc8 | Fix memory leak during abort (#1674) | 2024-10-15 08:15:08 -07:00 |
| Lianmin Zheng | 175afed370 | Improve benchmark scripts (#1672) | 2024-10-14 21:53:01 -07:00 |
| Lianmin Zheng | 4a292f670d | [Minor] Add some utility functions (#1671) | 2024-10-14 20:08:03 -07:00 |
| Byron Hsu | 56503d9bc9 | [1/N] Remove CacheConfig import in all model files (#1658) | 2024-10-14 09:06:34 -07:00 |
| Lianmin Zheng | 02bc95796d | Simplify chunked prefill (#1667) | 2024-10-14 06:47:50 -07:00 |
| Lianmin Zheng | 24f3e1511c | [Minor] Improve style (#1666) | 2024-10-14 05:25:00 -07:00 |
| Lianmin Zheng | 6790240cc3 | Fix unit test order to balance the tasks in CI (#1665) | 2024-10-14 02:01:44 -07:00 |
| Shuo Yang | 061e546313 | Support double sparsity (#1459) | 2024-10-14 02:00:41 -07:00 |
| Lianmin Zheng | 0c1e87964b | Move filter_batch out of stream_output (#1663) | 2024-10-14 01:15:34 -07:00 |