Commit Graph

494 Commits

Author SHA1 Message Date
Lianmin Zheng
d84c5e70f7 Test the case when max_new_tokens is very large (#1038) 2024-08-11 16:41:03 -07:00
Lianmin Zheng
d785412077 Fix the case when max_new_tokens is too large (#1025) 2024-08-11 15:20:18 -07:00
Liangsheng Yin
7b6a5332ca Fix triton args init (#1034) 2024-08-11 12:11:26 -07:00
Lianmin Zheng
4080e82244 Fix the case where r.prefix_indices is None (#1031) 2024-08-11 04:53:51 -07:00
Yineng Zhang
c245b78973 hotfix: add CustomOp abstraction (#1027) 2024-08-11 02:45:59 -07:00
Lianmin Zheng
9dae407812 Improve type annotation (#1029) 2024-08-11 02:44:59 -07:00
Liangsheng Yin
fcc0f5ed99 Fix wrong assert (#1028) 2024-08-11 09:22:16 +00:00
Lianmin Zheng
a97df79124 Clean up readme and arguments of chunked prefill (#1022) 2024-08-11 01:18:52 -07:00
Yineng Zhang
94752ac811 feat: use FlashInfer rmsnorm and silu (#907) 2024-08-11 14:57:13 +10:00
Liangsheng Yin
43fbb6d919 Fix input_ids && rename to fill_ids (#1021) 2024-08-10 16:24:12 -07:00
Lianmin Zheng
54fb1c80c0 Clean up unit tests (#1020) 2024-08-10 15:09:03 -07:00
Ying Sheng
b68c4c073b fix: force max new tokens to be 1 for embedding request (#1019) 2024-08-10 13:46:42 -07:00
Yineng Zhang
e712837d38 misc: update test config (#990) 2024-08-11 04:20:30 +10:00
Ying Sheng
7599badeaf Support embedding input as a list (#1014) 2024-08-10 08:39:05 -07:00
Liangsheng Yin
62757db6f0 Reduce the overhead when cache is disabled (#1010) 2024-08-09 16:36:57 -07:00
Liangsheng Yin
73fa2d49d5 Some warnings to crash when CI (#1009) 2024-08-09 15:16:23 -07:00
Mingyi
61728884d7 Fix benchmark latency (#1007) 2024-08-09 13:18:58 -07:00
gryffindor-rr
9cf0a5bada Add skip_tokenizer_init args. (#959) Co-authored-by: lzhang <zhanglei@modelbest.cn> 2024-08-09 12:14:13 -07:00
Ying Sheng
b16e856f11 Add openai embedding API (#997) 2024-08-09 11:19:18 -07:00
Roger Wang
05c50a82b8 Minor bugfix on benchmark serving (#1005) 2024-08-10 02:53:50 +10:00
Yineng Zhang
b568df5d03 fix: resolve correctness_test issue (#1002) 2024-08-09 23:21:42 +10:00
Juwan Yoo
10bca45bc6 bugfix: penalizers to be merged before reqs (#1001) 2024-08-09 21:46:24 +10:00
liuyhwangyh
b91a4cb1b1 support models from www.modelscope.cn (#994) Co-authored-by: mulin.lyh <mulin.lyh@taobao.com> 2024-08-09 02:52:14 -07:00
Ying Sheng
e040a2450b Add e5-mistral embedding model - step 3/3 (#988) 2024-08-08 16:31:19 -07:00
Ying Sheng
9f662501a3 Move torch.compile configs into cuda_graph_runner.py (#993) 2024-08-08 13:20:30 -07:00
Juwan Yoo
ab7875941b feat: frequency, min_new_tokens, presence, and repetition penalties (#973) 2024-08-08 04:21:08 -07:00
yichuan~
3a79613c28 support more optioin about usage in stream mode (#985) Co-authored-by: Ying Sheng <sqy1415@gmail.com> 2024-08-08 09:41:57 +00:00
Liangsheng Yin
1ac304eeb4 Adjust InputeMetadata and ScheduleBatch (#981) 2024-08-08 01:11:22 -07:00
Ying Sheng
20a4f927dc Add io struct for embedding models [unreachable code] - step 2/3 (#987) 2024-08-08 07:52:31 +00:00
Ying Sheng
0de7c2d09e Add e5-mistral modules [unreachable code] - step 1/3 (#983) 2024-08-08 00:04:15 -07:00
Liangsheng Yin
6ed4e3b8fb Fix chunked prefill (#984) 2024-08-07 22:28:42 -07:00
Ying Sheng
00023d622a [minor] Update type annotation in tokenizer_manager.py (#982) 2024-08-08 01:48:45 +00:00
foszto
c62d560c03 #590 Increase default , track changes in examples and documentation (#971) Co-authored-by: Ying Sheng <sqy1415@gmail.com> 2024-08-08 00:54:46 +00:00
Liangsheng Yin
2b8257f325 Adjust max prefix len (#980) 2024-08-08 00:41:26 +00:00
Liangsheng Yin
7623091d97 RadixCache method adjust (#977) 2024-08-07 15:52:24 -07:00
Liangsheng Yin
f724f1f1e9 PrefillAdder abstraction (#968) 2024-08-07 13:47:28 -07:00
Zhiqiang Xie
6db27f7b3b misc: correct the int data type for token ids and indices (#969) 2024-08-08 04:40:07 +08:00
Yineng Zhang
dc9d06d886 chore: bump v0.2.11 (#970) 2024-08-07 20:47:53 +08:00
Yineng Zhang
c31f084c71 chore: update vllm to 0.5.4 (#966) 2024-08-07 21:15:41 +10:00
Liangsheng Yin
a01ddd9605 misc: fix the req_to_token member change (#967) 2024-08-07 01:52:10 -07:00
Liangsheng Yin
7fa54a1ab3 Make req_pool_indices on CPU (#960) 2024-08-07 01:41:25 -07:00
Yineng Zhang
05abd1261c misc: add compute capability in check_env (#965) 2024-08-07 18:39:36 +10:00
Ying Sheng
ff68ae857a Show more error messages for warmup errors (#932) 2024-08-06 23:57:06 -07:00
yichuan~
795eab6dda Add support for Batch API test (#936) 2024-08-06 23:52:10 -07:00
Liangsheng Yin
87e8c090e9 Organize code (rename, movement) (#953) 2024-08-06 20:50:32 -07:00
Liangsheng Yin
ad56e68495 Fix stuck in get_new_prefill_batch (#948) 2024-08-06 01:05:58 -07:00
yichuan~
ffb15744b5 Support multiple args options (#941) 2024-08-06 04:12:53 +10:00
Ke Bao
a9c833d580 Fix union operator (#940) 2024-08-06 00:46:34 +08:00
Aidan Cooper
94e0115186 Feat: add alternative choices selection methods (#835) 2024-08-05 03:27:49 -07:00
Aidan Cooper
b216a545b3 Remove leftover auth_token (#934) 2024-08-05 03:25:48 -07:00