Commit Graph

473 Commits

Author SHA1 Message Date
Juwan Yoo
10bca45bc6 bugfix: penalizers to be merged before reqs (#1001) 2024-08-09 21:46:24 +10:00
liuyhwangyh
b91a4cb1b1 support models from www.modelscope.cn (#994)
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
2024-08-09 02:52:14 -07:00
Ying Sheng
e040a2450b Add e5-mistral embedding model - step 3/3 (#988) 2024-08-08 16:31:19 -07:00
Ying Sheng
9f662501a3 Move torch.compile configs into cuda_graph_runner.py (#993) 2024-08-08 13:20:30 -07:00
Juwan Yoo
ab7875941b feat: frequency, min_new_tokens, presence, and repetition penalties (#973) 2024-08-08 04:21:08 -07:00
yichuan~
3a79613c28 support more option about usage in stream mode (#985)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-08-08 09:41:57 +00:00
Liangsheng Yin
1ac304eeb4 Adjust InputMetadata and ScheduleBatch (#981) 2024-08-08 01:11:22 -07:00
Ying Sheng
20a4f927dc Add io struct for embedding models [unreachable code] - step 2/3 (#987) 2024-08-08 07:52:31 +00:00
Ying Sheng
0de7c2d09e Add e5-mistral modules [unreachable code] - step 1/3 (#983) 2024-08-08 00:04:15 -07:00
Liangsheng Yin
6ed4e3b8fb Fix chunked prefill (#984) 2024-08-07 22:28:42 -07:00
Ying Sheng
00023d622a [minor] Update type annotation in tokenizer_manager.py (#982) 2024-08-08 01:48:45 +00:00
foszto
c62d560c03 #590 Increase default , track changes in examples and documentation (#971)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-08-08 00:54:46 +00:00
Liangsheng Yin
2b8257f325 Adjust max prefix len (#980) 2024-08-08 00:41:26 +00:00
Liangsheng Yin
7623091d97 RadixCache method adjust (#977) 2024-08-07 15:52:24 -07:00
Liangsheng Yin
f724f1f1e9 PrefillAdder abstraction (#968) 2024-08-07 13:47:28 -07:00
Zhiqiang Xie
6db27f7b3b misc: correct the int data type for token ids and indices (#969) 2024-08-08 04:40:07 +08:00
Yineng Zhang
dc9d06d886 chore: bump v0.2.11 (#970) 2024-08-07 20:47:53 +08:00
Yineng Zhang
c31f084c71 chore: update vllm to 0.5.4 (#966) 2024-08-07 21:15:41 +10:00
Liangsheng Yin
a01ddd9605 misc: fix the req_to_token member change (#967) 2024-08-07 01:52:10 -07:00
Liangsheng Yin
7fa54a1ab3 Make req_pool_indices on CPU (#960) 2024-08-07 01:41:25 -07:00
Yineng Zhang
05abd1261c misc: add compute capability in check_env (#965) 2024-08-07 18:39:36 +10:00
Ying Sheng
ff68ae857a Show more error messages for warmup errors (#932) 2024-08-06 23:57:06 -07:00
yichuan~
795eab6dda Add support for Batch API test (#936) 2024-08-06 23:52:10 -07:00
Liangsheng Yin
87e8c090e9 Organize code (rename, movement) (#953) 2024-08-06 20:50:32 -07:00
Liangsheng Yin
ad56e68495 Fix stuck in get_new_prefill_batch (#948) 2024-08-06 01:05:58 -07:00
yichuan~
ffb15744b5 Support multiple args options (#941) 2024-08-06 04:12:53 +10:00
Ke Bao
a9c833d580 Fix union operator (#940) 2024-08-06 00:46:34 +08:00
Aidan Cooper
94e0115186 Feat: add alternative choices selection methods (#835) 2024-08-05 03:27:49 -07:00
Aidan Cooper
b216a545b3 Remove leftover auth_token (#934) 2024-08-05 03:25:48 -07:00
yichuan~
fd7926e46e Fix prompt len in parallel sampling (#928) 2024-08-05 00:56:08 -07:00
Ying Sheng
3bc99e6fe4 Test openai vision api (#925) 2024-08-05 13:51:55 +10:00
min-xu-et
ebf69964cd latency test enhancement - final part (#921) 2024-08-04 18:15:23 -07:00
Ying Sheng
141e8c71a3 Bump version to 0.2.10 (#923) 2024-08-04 16:52:51 -07:00
yichuan~
d53dcf9c98 Support more OpenAI API test (#916) 2024-08-04 16:43:09 -07:00
Liangsheng Yin
bb66cc4c52 Fix CI && python3.8 compatible (#920) 2024-08-04 16:02:05 -07:00
Ying Sheng
0d4f3a9fcd Make API Key OpenAI-compatible (#917) 2024-08-04 13:35:44 -07:00
min-xu-et
afd411d09f enhance latency test - part 2 (#915) 2024-08-04 12:27:25 -07:00
Ke Bao
e1eae1fd15 Support MLA for DeepSeek-V2 with Triton - step 1 (#905) 2024-08-05 03:40:33 +10:00
Yineng Zhang
f4d9953d9d misc: add triton in check_env PACKAGE_LIST (#914) 2024-08-04 23:20:59 +10:00
Ying Sheng
995af5a54b Improve the structure of CI (#911) 2024-08-03 23:09:21 -07:00
min-xu-et
539856455d latency test enhancement - part 1 (#909) 2024-08-03 22:44:58 -07:00
Ying Sheng
70cc0749ce Add model accuracy test - step 1 (#866) 2024-08-03 18:20:50 -07:00
min-xu-et
7dd8a7e6d9 fixed an error handling in bench_latency.py (#904) 2024-08-03 17:42:17 -07:00
Ying Sheng
b906c01592 Bump version to 0.2.9.post1 (#899) 2024-08-02 12:08:00 -07:00
Yineng Zhang
046c2b339e chore: add multipart dep for fastapi (#895) 2024-08-03 00:50:19 +10:00
Yineng Zhang
6b8f66efe1 misc: update cuda graph capture exception log (#894) 2024-08-03 00:40:52 +10:00
Ying Sheng
30a9b2ef20 Bump version to v0.2.9 (#890) 2024-08-02 01:45:48 -07:00
Ying Sheng
3cadecf0c4 Increase openai client limit (#886) 2024-08-02 00:47:23 -07:00
Ying Sheng
e90e3a50d4 Add benchmark: HumanEval (#889) 2024-08-02 00:46:41 -07:00
Ying Sheng
fbd6b94d69 Fix the double BOS problem in the HF chat template (#888) 2024-08-02 00:30:50 -07:00