Lianmin Zheng
|
14b6493087
|
Delete the useless test/srt/test_throughput.py (#1045)
|
2024-08-11 21:31:52 -07:00 |
|
Lianmin Zheng
|
8207637029
|
Improve end-to-end throughput test and its coverage (#1039)
|
2024-08-11 18:27:33 -07:00 |
|
Lianmin Zheng
|
d84c5e70f7
|
Test the case when max_new_tokens is very large (#1038)
|
2024-08-11 16:41:03 -07:00 |
|
Lianmin Zheng
|
54fb1c80c0
|
Clean up unit tests (#1020)
|
2024-08-10 15:09:03 -07:00 |
|
Ying Sheng
|
b68c4c073b
|
fix: force max new tokens to be 1 for embedding request (#1019)
|
2024-08-10 13:46:42 -07:00 |
|
Ying Sheng
|
7599badeaf
|
Support embedding input as a list (#1014)
|
2024-08-10 08:39:05 -07:00 |
|
gryffindor-rr
|
9cf0a5bada
|
Add skip_tokenizer_init args. (#959)
Co-authored-by: lzhang <zhanglei@modelbest.cn>
|
2024-08-09 12:14:13 -07:00 |
|
Ying Sheng
|
b16e856f11
|
Add openai embedding API (#997)
|
2024-08-09 11:19:18 -07:00 |
|
Juwan Yoo
|
10bca45bc6
|
bugfix: penalizers to be merged before reqs (#1001)
|
2024-08-09 21:46:24 +10:00 |
|
liuyhwangyh
|
b91a4cb1b1
|
support models from www.modelscope.cn (#994)
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
|
2024-08-09 02:52:14 -07:00 |
|
Juwan Yoo
|
95a28019ba
|
test: negative value testing for frequency, presence penalizers (#995)
|
2024-08-08 23:30:50 -07:00 |
|
Ying Sheng
|
e040a2450b
|
Add e5-mistral embedding model - step 3/3 (#988)
|
2024-08-08 16:31:19 -07:00 |
|
Juwan Yoo
|
ab7875941b
|
feat: frequency, min_new_tokens, presence, and repetition penalties (#973)
|
2024-08-08 04:21:08 -07:00 |
|
yichuan~
|
3a79613c28
|
support more options about usage in stream mode (#985)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2024-08-08 09:41:57 +00:00 |
|
Yineng Zhang
|
c31f084c71
|
chore: update vllm to 0.5.4 (#966)
|
2024-08-07 21:15:41 +10:00 |
|
yichuan~
|
5f6fa04a3f
|
misc: simplify test (#964)
|
2024-08-07 01:23:27 -07:00 |
|
yichuan~
|
795eab6dda
|
Add support for Batch API test (#936)
|
2024-08-06 23:52:10 -07:00 |
|
Aidan Cooper
|
94e0115186
|
Feat: add alternative choices selection methods (#835)
|
2024-08-05 03:27:49 -07:00 |
|
yichuan~
|
fd7926e46e
|
Fix prompt len in parallel sampling (#928)
|
2024-08-05 00:56:08 -07:00 |
|
Ying Sheng
|
0a4f5f9bea
|
Test regex in vision api (#926)
|
2024-08-04 22:52:41 -07:00 |
|
Ying Sheng
|
3bc99e6fe4
|
Test openai vision api (#925)
|
2024-08-05 13:51:55 +10:00 |
|
yichuan~
|
d53dcf9c98
|
Support more OpenAI API test (#916)
|
2024-08-04 16:43:09 -07:00 |
|
Liangsheng Yin
|
bb66cc4c52
|
Fix CI && python3.8 compatible (#920)
|
2024-08-04 16:02:05 -07:00 |
|
Ying Sheng
|
0d4f3a9fcd
|
Make API Key OpenAI-compatible (#917)
|
2024-08-04 13:35:44 -07:00 |
|
Ying Sheng
|
995af5a54b
|
Improve the structure of CI (#911)
|
2024-08-03 23:09:21 -07:00 |
|
Ying Sheng
|
70cc0749ce
|
Add model accuracy test - step 1 (#866)
|
2024-08-03 18:20:50 -07:00 |
|
Yineng Zhang
|
2e218b9e04
|
fix: set env in runner (#891)
|
2024-08-02 20:48:56 +10:00 |
|
Ying Sheng
|
ae7ee01a8e
|
Add accuracy test to CI: MMLU (#882)
|
2024-08-01 21:20:17 -07:00 |
|
Ying Sheng
|
60340a3643
|
Improve the coverage of the openai api server test (#878)
|
2024-08-01 16:01:30 -07:00 |
|
Ying Sheng
|
72b6ea88b4
|
Make scripts under /test/srt as unit tests (#875)
|
2024-08-01 14:34:55 -07:00 |
|
Ying Sheng
|
6f221d4ca0
|
Fix unit tests for the frontend language part (#872)
|
2024-08-01 12:39:12 -07:00 |
|
Ying Sheng
|
4075677621
|
Add OpenAI backend to the CI test (#869)
|
2024-08-01 09:25:24 -07:00 |
|
Lianmin Zheng
|
30db99b3d9
|
Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs (#776)
|
2024-07-27 19:50:34 -07:00 |
|
Ying Sheng
|
51fda1439f
|
Update Readme (#660)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2024-07-19 09:54:01 -07:00 |
|
Ying Sheng
|
dc1b8bcfaa
|
Format (#593)
|
2024-07-05 10:06:17 -07:00 |
|
Lianmin Zheng
|
eb1ae6ae0c
|
Add sglang.bench_latency for offline benchmark (#564)
|
2024-06-25 03:38:04 -07:00 |
|
Liangsheng Yin
|
05471f2103
|
Update test_flashinfer (#560)
|
2024-06-24 15:23:57 +08:00 |
|
Lianmin Zheng
|
1fa15099d8
|
Add LlamaForClassification (#559)
|
2024-06-22 00:49:31 -07:00 |
|
Ying Sheng
|
fb9296f0ed
|
Higher priority for user input of max_prefill_tokens & format (#540)
|
2024-06-12 21:48:40 -07:00 |
|
胡译文
|
87260b7bfd
|
Litellm Backend (#502)
|
2024-06-07 12:24:28 -07:00 |
|
Ying Sheng
|
0463f7fb52
|
Support data parallelism (static) (#480)
Co-authored-by: Ying Sheng <ying.sheng@databricks.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2024-05-27 21:24:10 -07:00 |
|
Ying Sheng
|
3e684be7a3
|
Fix openai speculative execution (#456)
|
2024-05-20 17:01:13 -07:00 |
|
Liangsheng Yin
|
690d162d97
|
Format code (#441)
|
2024-05-14 22:40:46 +08:00 |
|
Lianmin Zheng
|
5dc55a5f02
|
Handle truncation errors (#436)
|
2024-05-13 15:56:00 -07:00 |
|
Lianmin Zheng
|
6e09cf6a15
|
Misc fixes (#432)
|
2024-05-12 15:05:40 -07:00 |
|
Lianmin Zheng
|
aee4f523cf
|
Fix logit processor bugs (#427)
|
2024-05-12 04:54:07 -07:00 |
|
Lianmin Zheng
|
7023f413c6
|
Clean up (#422)
|
2024-05-11 20:55:00 -07:00 |
|
Liangsheng Yin
|
19818b9c2f
|
Minor: style improvement of radix_cache and memory_pool (#395)
|
2024-04-26 01:01:36 +08:00 |
|
Liangsheng Yin
|
150d7020ed
|
Revert removing the unused imports (#385)
|
2024-04-23 22:36:33 +08:00 |
|
Liangsheng Yin
|
9acc6e3504
|
add .isort.cfg (#378)
|
2024-04-22 22:38:09 +08:00 |
|