Commit Graph

17 Commits

Author SHA1 Message Date
Liangsheng Yin
62757db6f0 Reduce the overhead when cache is disabled (#1010) 2024-08-09 16:36:57 -07:00
gryffindor-rr
9cf0a5bada Add skip_tokenizer_init args. (#959) 2024-08-09 12:14:13 -07:00
Co-authored-by: lzhang <zhanglei@modelbest.cn>
Juwan Yoo
10bca45bc6 bugfix: penalizers to be merged before reqs (#1001) 2024-08-09 21:46:24 +10:00
Ying Sheng
e040a2450b Add e5-mistral embedding model - step 3/3 (#988) 2024-08-08 16:31:19 -07:00
Juwan Yoo
ab7875941b feat: frequency, min_new_tokens, presence, and repetition penalties (#973) 2024-08-08 04:21:08 -07:00
yichuan~
3a79613c28 support more optioin about usage in stream mode (#985) 2024-08-08 09:41:57 +00:00
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Liangsheng Yin
1ac304eeb4 Adjust InputeMetadata and ScheduleBatch (#981) 2024-08-08 01:11:22 -07:00
Liangsheng Yin
2b8257f325 Adjust max prefix len (#980) 2024-08-08 00:41:26 +00:00
Liangsheng Yin
7623091d97 RadixCache method adjust (#977) 2024-08-07 15:52:24 -07:00
Zhiqiang Xie
6db27f7b3b misc: correct the int data type for token ids and indices (#969) 2024-08-08 04:40:07 +08:00
Liangsheng Yin
7fa54a1ab3 Make req_pool_indices on CPU (#960) 2024-08-07 01:41:25 -07:00
Liangsheng Yin
87e8c090e9 Organize code (rename, movement) (#953) 2024-08-06 20:50:32 -07:00
Ke Bao
e1eae1fd15 Support MLA for DeepSeek-V2 with Triton - step 1 (#905) 2024-08-05 03:40:33 +10:00
min-xu-et
7dd8a7e6d9 fixed an error handling in bench_latency.py (#904) 2024-08-03 17:42:17 -07:00
Liangsheng Yin
c020f9ceda Support chunked prefill when radix cache is disabled (#811) 2024-08-01 00:29:01 -07:00
Ying Sheng
e7487b08bc Adjust default mem fraction to avoid OOM (#823) 2024-07-30 01:58:31 -07:00
Liangsheng Yin
cdcbde5fc3 Code structure refactor (#807) 2024-07-29 23:04:48 -07:00