Commit Graph

95 Commits

Author SHA1 Message Date
Shan Yu
cd10654e7e [Feat] Support update weights without restart server (#1157) 2024-08-20 13:48:24 -07:00
Juwan Yoo
d8476818ef feat: allow streaming for multi-prompt and/or parallel sampling (#1134) 2024-08-20 08:06:55 -07:00
yichuan~
b997a18d74 [Feat]Add support for optional start len of logprobs (#1035) 2024-08-18 23:45:41 -07:00
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Liangsheng Yin
5d0d40d0eb Fix CI accuracy && time out limit (#1133) 2024-08-16 21:41:11 -07:00
Liangsheng Yin
3694f8f996 Mixed style of chunked prefill (#1013) 2024-08-16 09:13:00 +00:00
Lianmin Zheng
e86b1ccbf0 Enable chunked prefill by default (#1040) 2024-08-14 21:56:20 -07:00
Liangsheng Yin
73cf6834f2 Support stop_token_ids in sglang API (#1092) 2024-08-15 00:31:39 +00:00
Liangsheng Yin
a34dd86a7d Use dtype to control generate (#1082) 2024-08-14 15:58:07 +00:00
Co-authored-by: zhyncs <me@zhyncs.com>
Yineng Zhang
c8423ca311 ci: update timeout and retry (#1086) 2024-08-14 00:27:35 -07:00
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Ying Sheng
0909bb0d2f [Feat] Add window attention for gemma-2 (#1056) 2024-08-13 17:01:26 -07:00
Lianmin Zheng
ad3e4f1619 Update the mixtral to use the better FusedMoE layer (#1081) 2024-08-13 15:44:25 -07:00
Yineng Zhang
cebd78d83e ci: add accuracy timeout (#1078) 2024-08-13 22:12:58 +10:00
Yineng Zhang
f7fb68d292 ci: add moe test (#1053) 2024-08-13 18:43:23 +10:00
Lianmin Zheng
c877292cc1 Re-organize CI tests (#1052) 2024-08-12 03:39:01 -07:00
Lianmin Zheng
0c1c72a0b4 Fix accuracy test (#1051) 2024-08-12 19:48:40 +10:00
Lianmin Zheng
41598e0d8e Add longer accuracy test on CI (#1049) 2024-08-12 09:21:38 +00:00
Ying Sheng
32f6144323 fix: Fix returned prefill logits and add output str test (#1046) 2024-08-12 06:13:45 +00:00
Lianmin Zheng
14b6493087 Delete the useless test/srt/test_throughput.py (#1045) 2024-08-11 21:31:52 -07:00
Lianmin Zheng
8207637029 Improve end-to-end throughput test and its coverage (#1039) 2024-08-11 18:27:33 -07:00
Lianmin Zheng
d84c5e70f7 Test the case when max_new_tokens is very large (#1038) 2024-08-11 16:41:03 -07:00
Lianmin Zheng
54fb1c80c0 Clean up unit tests (#1020) 2024-08-10 15:09:03 -07:00
Ying Sheng
b68c4c073b fix: force max new tokens to be 1 for embedding request (#1019) 2024-08-10 13:46:42 -07:00
Ying Sheng
7599badeaf Support embedding input as a list (#1014) 2024-08-10 08:39:05 -07:00
gryffindor-rr
9cf0a5bada Add skip_tokenizer_init args. (#959) 2024-08-09 12:14:13 -07:00
Co-authored-by: lzhang <zhanglei@modelbest.cn>
Ying Sheng
b16e856f11 Add openai embedding API (#997) 2024-08-09 11:19:18 -07:00
Juwan Yoo
10bca45bc6 bugfix: penalizers to be merged before reqs (#1001) 2024-08-09 21:46:24 +10:00
liuyhwangyh
b91a4cb1b1 support models from www.modelscope.cn (#994) 2024-08-09 02:52:14 -07:00
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
Juwan Yoo
95a28019ba test: negative value testing for frequency, presence penalizers (#995) 2024-08-08 23:30:50 -07:00
Ying Sheng
e040a2450b Add e5-mistral embedding model - step 3/3 (#988) 2024-08-08 16:31:19 -07:00
Juwan Yoo
ab7875941b feat: frequency, min_new_tokens, presence, and repetition penalties (#973) 2024-08-08 04:21:08 -07:00
yichuan~
3a79613c28 support more optioin about usage in stream mode (#985) 2024-08-08 09:41:57 +00:00
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Yineng Zhang
c31f084c71 chore: update vllm to 0.5.4 (#966) 2024-08-07 21:15:41 +10:00
yichuan~
5f6fa04a3f misc: simplify test (#964) 2024-08-07 01:23:27 -07:00
yichuan~
795eab6dda Add support for Batch API test (#936) 2024-08-06 23:52:10 -07:00
Aidan Cooper
94e0115186 Feat: add alternative choices selection methods (#835) 2024-08-05 03:27:49 -07:00
yichuan~
fd7926e46e Fix prompt len in parallel sampling (#928) 2024-08-05 00:56:08 -07:00
Ying Sheng
0a4f5f9bea Test regex in vision api (#926) 2024-08-04 22:52:41 -07:00
Ying Sheng
3bc99e6fe4 Test openai vision api (#925) 2024-08-05 13:51:55 +10:00
yichuan~
d53dcf9c98 Support more OpenAI API test (#916) 2024-08-04 16:43:09 -07:00
Liangsheng Yin
bb66cc4c52 Fix CI && python3.8 compatible (#920) 2024-08-04 16:02:05 -07:00
Ying Sheng
0d4f3a9fcd Make API Key OpenAI-compatible (#917) 2024-08-04 13:35:44 -07:00
Ying Sheng
995af5a54b Improve the structure of CI (#911) 2024-08-03 23:09:21 -07:00
Ying Sheng
70cc0749ce Add model accuracy test - step 1 (#866) 2024-08-03 18:20:50 -07:00
Yineng Zhang
2e218b9e04 fix: set env in runner (#891) 2024-08-02 20:48:56 +10:00
Ying Sheng
ae7ee01a8e Add accuracy test to CI: MMLU (#882) 2024-08-01 21:20:17 -07:00
Ying Sheng
60340a3643 Improve the coverage of the openai api server test (#878) 2024-08-01 16:01:30 -07:00
Ying Sheng
72b6ea88b4 Make scripts under /test/srt as unit tests (#875) 2024-08-01 14:34:55 -07:00
Ying Sheng
6f221d4ca0 Fix unit tests for the frontend language part (#872) 2024-08-01 12:39:12 -07:00
Ying Sheng
4075677621 Add OpenAI backend to the CI test (#869) 2024-08-01 09:25:24 -07:00
Lianmin Zheng
30db99b3d9 Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs (#776) 2024-07-27 19:50:34 -07:00