564 Commits

Author SHA1 Message Date
min-xu-et
ebf69964cd latency test enhancement - final part (#921) 2024-08-04 18:15:23 -07:00
Ying Sheng
141e8c71a3 Bump version to 0.2.10 (#923) 2024-08-04 16:52:51 -07:00
yichuan~
d53dcf9c98 Support more OpenAI API test (#916) 2024-08-04 16:43:09 -07:00
Liangsheng Yin
bb66cc4c52 Fix CI && python3.8 compatible (#920) 2024-08-04 16:02:05 -07:00
Ying Sheng
975adb802b Update hyperparameter_tuning.md (#918) 2024-08-04 13:51:52 -07:00
Ying Sheng
0d4f3a9fcd Make API Key OpenAI-compatible (#917) 2024-08-04 13:35:44 -07:00
min-xu-et
afd411d09f enhance latency test - part 2 (#915) 2024-08-04 12:27:25 -07:00
Ke Bao
e1eae1fd15 Support MLA for DeepSeek-V2 with Triton - step 1 (#905) 2024-08-05 03:40:33 +10:00
Yineng Zhang
f4d9953d9d misc: add triton in check_env PACKAGE_LIST (#914) 2024-08-04 23:20:59 +10:00
Yineng Zhang
4f00525057 fix: use e2e and unit test only for original repo or pr (#912) 2024-08-04 16:34:50 +10:00
Ying Sheng
995af5a54b Improve the structure of CI (#911) 2024-08-03 23:09:21 -07:00
min-xu-et
539856455d latency test enhancement - part 1 (#909) 2024-08-03 22:44:58 -07:00
Ying Sheng
70cc0749ce Add model accuracy test - step 1 (#866) 2024-08-03 18:20:50 -07:00
min-xu-et
7dd8a7e6d9 fixed an error handling in bench_latency.py (#904) 2024-08-03 17:42:17 -07:00
Liangsheng Yin
947402c829 Reorder CI unit tests. (#908) 2024-08-03 16:18:50 -07:00
Ying Sheng
8c5382e62c Update README.md 2024-08-03 12:58:41 -07:00
Ying Sheng
001b0bdd08 Update the base image of the docker (#900) 2024-08-02 21:54:57 -07:00
Ying Sheng
b906c01592 Bump version to 0.2.9.post1 (#899) 2024-08-02 12:08:00 -07:00
min-xu-et
9319cd139c [minor] fixed code formatting doc (#896) 2024-08-03 02:39:28 +10:00
Yineng Zhang
046c2b339e chore: add multipart dep for fastapi (#895) 2024-08-03 00:50:19 +10:00
Yineng Zhang
6b8f66efe1 misc: update cuda graph capture exception log (#894) 2024-08-03 00:40:52 +10:00
Yineng Zhang
7937a886b2 docs: update setup runner (#884) 2024-08-02 21:03:53 +10:00
Yineng Zhang
2e218b9e04 fix: set env in runner (#891) 2024-08-02 20:48:56 +10:00
Ying Sheng
30a9b2ef20 Bump version to v0.2.9 (#890) 2024-08-02 01:45:48 -07:00
Ying Sheng
3cadecf0c4 Increase openai client limit (#886) 2024-08-02 00:47:23 -07:00
Ying Sheng
e90e3a50d4 Add benchmark: HumanEval (#889) 2024-08-02 00:46:41 -07:00
Ying Sheng
fbd6b94d69 Fix the double BOS problem in the HF chat template (#888) 2024-08-02 00:30:50 -07:00
Ying Sheng
4c8093c8db Update workflow name (#883) 2024-08-01 21:29:46 -07:00
Ying Sheng
ae7ee01a8e Add accuracy test to CI: MMLU (#882) 2024-08-01 21:20:17 -07:00
Ying Sheng
76e59088d8 Add more unit tests to CI (#880) 2024-08-01 18:14:33 -07:00
Liangsheng Yin
12ce3befb6 Update runner docs (#879) 2024-08-01 17:37:47 -07:00
任嘉
4013a4e1b0 Implement served_model_name to customize model id when use local mode… (#749) 2024-08-01 17:13:51 -07:00
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Ying Sheng
60340a3643 Improve the coverage of the openai api server test (#878) 2024-08-01 16:01:30 -07:00
Liangsheng Yin
70c78cfb03 Update runner docs (#876) 2024-08-01 15:32:33 -07:00
Ying Sheng
72b6ea88b4 Make scripts under /test/srt as unit tests (#875) 2024-08-01 14:34:55 -07:00
Ying Sheng
e4d3333c6c bump to 0.2.8 (#877) 2024-08-01 14:18:26 -07:00
Ying Sheng
6f221d4ca0 Fix unit tests for the frontend language part (#872) 2024-08-01 12:39:12 -07:00
Yineng Zhang
aba6f51f88 misc: update unit test config (#873) 2024-08-02 05:27:05 +10:00
Yineng Zhang
7f6c690b67 misc: use pip cache purge and add unit test ci (#871) 2024-08-02 05:12:20 +10:00
Ying Sheng
40e6f5131a Fix openai CI tests (#870) 2024-08-01 09:39:09 -07:00
Ying Sheng
4075677621 Add OpenAI backend to the CI test (#869) 2024-08-01 09:25:24 -07:00
Yineng Zhang
9e8d2c7f74 misc: add cancel previous at e2e (#864) 2024-08-01 18:26:54 +10:00
Yineng Zhang
c9bff5fcc8 misc: disable auto release (#862) 2024-08-01 17:46:51 +10:00
Ying Sheng
b04444ac01 Rename github workflows (#861) 2024-08-01 00:39:55 -07:00
Yineng Zhang
3d617a21ba misc: update e2e test paths config (#860) 2024-08-01 17:38:24 +10:00
Liangsheng Yin
c020f9ceda Support chunked prefill when radix cache is disabled (#811) 2024-08-01 00:29:01 -07:00
yichuan~
ca600e8cd6 Add support for logprobs in OpenAI chat API (#852) 2024-08-01 00:08:21 -07:00
Kai Fronsdal
0c0c81372e Fix #857 (#858) 2024-08-01 00:05:39 -07:00
Ying Sheng
90286d8576 Add troubleshooting doc (#856) 2024-08-01 00:05:26 -07:00
Ying Sheng
5e7dd984fe Fix llama for classification (#855) 2024-07-31 15:48:31 -07:00