Lianmin Zheng | 0d800090b4 | Fix missing additional_stop_token_ids (#1769) | 2024-10-23 12:18:59 -07:00 |
Lianmin Zheng | 80a905475d | Fix stop condition for <|eom_id|> (#1766) | 2024-10-23 10:47:12 -07:00 |
Lianmin Zheng | 9af7b88e3c | [Fix] Fix abort in dp (#1767) | 2024-10-23 10:46:29 -07:00 |
Lianmin Zheng | fbcbb26327 | Fix perf regression for set_kv_buffer (#1765) | 2024-10-23 09:57:08 -07:00 |
Ying Sheng | 2fce449b1c | [API] add get memory pool size (#1760) Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> | 2024-10-23 07:02:29 +00:00 |
Lianmin Zheng | ad4125d1a9 | Fuse more ops & Simplify token mapping (#1758) | 2024-10-22 23:20:43 -07:00 |
Byron Hsu | 17536e7e3d | Fix edge case for truncated (#1747) | 2024-10-23 00:00:25 -04:00 |
Lianmin Zheng | 1f26e8b8e4 | Release v0.3.4.post1 (#1749) | 2024-10-21 21:16:43 -07:00 |
Liangsheng Yin | 5e1558f1f2 | Update max_req_len and max_req_input_len (#1748) | 2024-10-21 16:12:04 -07:00 |
Liangsheng Yin | 94cde10920 | Llama3.2 vision model support (#1551) | 2024-10-21 15:01:21 -07:00 |
Lianmin Zheng | 00611286a1 | Fix sliding window attention and gemma-2 unit tests in CI (#1746) | 2024-10-21 13:47:12 -07:00 |
Lianmin Zheng | 7ce3606891 | Faster overlap mode scheduler (#1738) | 2024-10-21 04:30:52 -07:00 |
Liangsheng Yin | efb099cdee | Fix prefill oom (#1743) | 2024-10-21 03:54:35 -07:00 |
Lianmin Zheng | 09603c6dc9 | Maintain seq_lens_sum to make more FlashInfer operations non-blocking (#1741) | 2024-10-21 01:43:16 -07:00 |
Lianmin Zheng | cf470fea32 | Make token mapping non-blocking in the overlapped mode (#1740) | 2024-10-20 23:25:14 -07:00 |
sixgod | 45d5af2416 | Add GLM-4 TextGeneration Model support for SGLang (#1736) | 2024-10-21 04:08:30 +00:00 |
Lianmin Zheng | b121bc03a3 | Simplify batch result resolution (#1735) | 2024-10-20 19:47:14 -07:00 |
Lianmin Zheng | e12358dc91 | Simplify the usage of device (#1734) | 2024-10-20 18:17:41 -07:00 |
yizhang2077 | 554fbf93cd | [Bugfix] qwen2vl forward_extend (#1727) | 2024-10-20 02:38:35 -07:00 |
Lianmin Zheng | b48edff67f | Split the overlapped version of TpModelWorkerClient into a separate file (#1726) | 2024-10-20 00:29:29 -07:00 |
Lianmin Zheng | 59cbf47626 | Unify the memory pool api and tp worker API (#1724) | 2024-10-19 23:19:26 -07:00 |
Yineng Zhang | cbbc82b7b8 | Support qwen2 vl model (#1721) Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: ispobock <ISPObaoke@163.com> | 2024-10-19 21:44:38 -07:00 |
Yineng Zhang | 8bee20f80b | Update vllm to 0.6.3 (#1711) (#1720) Co-authored-by: Ke Bao <ISPObaoke@163.com> | 2024-10-19 20:45:41 -07:00 |
Lianmin Zheng | 12cad0feae | Simplify the interface of tp_worker (#1718) | 2024-10-19 17:39:38 -07:00 |
Lianmin Zheng | b6cd903604 | Update readme and workflow (#1716) | 2024-10-19 13:01:44 -07:00 |
Lianmin Zheng | 087257ea03 | Release v0.3.4 (#1714) | 2024-10-19 08:17:41 -07:00 |
Lianmin Zheng | 769bf11c05 | Fix the race condition in overlap mode (#1712) | 2024-10-19 06:50:56 -07:00 |
Lianmin Zheng | 3db43d1b08 | Fix is_all_ready for overlap copy (#1710) | 2024-10-18 21:01:52 -07:00 |
Lianmin Zheng | f0f8a7699b | Simplify the nan detection and greedy check in sampler (#1709) | 2024-10-18 20:21:24 -07:00 |
Lianmin Zheng | 2bcfba1b08 | Skip unnecessary penalizer (#1707) | 2024-10-18 17:54:03 -07:00 |
Lianmin Zheng | bc12d4033f | Add grouped free operations (#1706) | 2024-10-18 13:21:05 -07:00 |
Lianmin Zheng | 392f2863c8 | Add dtype for more operations (#1705) | 2024-10-18 12:18:15 -07:00 |
Lianmin Zheng | 6d0fa73ece | Simplify flashinfer utilities (#1704) | 2024-10-17 22:54:14 -07:00 |
Liangsheng Yin | 9e0dac1ad7 | Fix regex and logprob conflicts when chunked prefilling (#1703) | 2024-10-17 18:33:21 -07:00 |
Gleb Drozdov | a95d5589c3 | Add matched_stop token or str to distinguish between eos or stop str finish_reason generation (#1684) | 2024-10-17 18:06:52 +00:00 |
Lianmin Zheng | d17d19e5b8 | Fix mixed batch for multi modal models (#1702) | 2024-10-17 10:27:26 -07:00 |
Lianmin Zheng | dd3809fad8 | Fix engine unit test (#1701) | 2024-10-17 09:53:32 -07:00 |
Lianmin Zheng | 7feba41584 | Fix failed ci tests on long prompts; Better error messages for embedding models (#1700) | 2024-10-17 09:23:29 -07:00 |
Michael Feil | e5db40dcbc | ORJson. Faster Json serialization (#1694) | 2024-10-17 08:03:08 -07:00 |
wxsm | b170930534 | feat: radix tree code optimize (#1697) | 2024-10-17 08:01:27 -07:00 |
Jani Monoses | 5ab20cceba | Use SGLang imports for linear layer (#1696) | 2024-10-17 07:50:01 -07:00 |
Lianmin Zheng | 02f7f3e488 | Update the transformers version in CI (#1690) | 2024-10-16 19:03:55 -07:00 |
Zeng Zhongchao | 2782132be8 | Add date to logging messages (#1623) (#1679) | 2024-10-16 18:54:55 -07:00 |
Michael Feil | b0facb3316 | add orjson for jsonresponse (#1688) | 2024-10-16 18:14:30 -07:00 |
havetc | ecb8bad276 | Returning a per request metric for number of cached_tokens read (#1599) | 2024-10-16 11:49:22 -07:00 |
Lianmin Zheng | dbec2f1847 | Launch a thread to overlap CPU and GPU (#1687) | 2024-10-16 11:20:17 -07:00 |
Ke Bao | d10b933a36 | Fix srt dependency (#1685) | 2024-10-16 08:21:20 -07:00 |
Lianmin Zheng | 9116b2896f | Add a new event loop (#1677) | 2024-10-16 01:33:20 -07:00 |
Jani Monoses | a5114b6f91 | Add OLMo model (#1676) | 2024-10-16 00:11:18 -07:00 |
Liangsheng Yin | b6b4094621 | Fix filter_batch function call (#1681) | 2024-10-15 22:59:26 -07:00 |