| min-xu-et | 7dd8a7e6d9 | fixed an error handling in bench_latency.py (#904) | 2024-08-03 17:42:17 -07:00 |
| Liangsheng Yin | 947402c829 | Reorder CI unit tests. (#908) | 2024-08-03 16:18:50 -07:00 |
| Ying Sheng | 8c5382e62c | Update README.md | 2024-08-03 12:58:41 -07:00 |
| Ying Sheng | 001b0bdd08 | Update the base image of the docker (#900) | 2024-08-02 21:54:57 -07:00 |
| Ying Sheng | b906c01592 | Bump version to 0.2.9.post1 (#899) | 2024-08-02 12:08:00 -07:00 |
| min-xu-et | 9319cd139c | [minor] fixed code formatting doc (#896) | 2024-08-03 02:39:28 +10:00 |
| Yineng Zhang | 046c2b339e | chore: add multipart dep for fastapi (#895) | 2024-08-03 00:50:19 +10:00 |
| Yineng Zhang | 6b8f66efe1 | misc: update cuda graph capture exception log (#894) | 2024-08-03 00:40:52 +10:00 |
| Yineng Zhang | 7937a886b2 | docs: update setup runner (#884) | 2024-08-02 21:03:53 +10:00 |
| Yineng Zhang | 2e218b9e04 | fix: set env in runner (#891) | 2024-08-02 20:48:56 +10:00 |
| Ying Sheng | 30a9b2ef20 | Bump version to v0.2.9 (#890) | 2024-08-02 01:45:48 -07:00 |
| Ying Sheng | 3cadecf0c4 | Increase openai client limit (#886) | 2024-08-02 00:47:23 -07:00 |
| Ying Sheng | e90e3a50d4 | Add benchmark: HumanEval (#889) | 2024-08-02 00:46:41 -07:00 |
| Ying Sheng | fbd6b94d69 | Fix the double BOS problem in the HF chat template (#888) | 2024-08-02 00:30:50 -07:00 |
| Ying Sheng | 4c8093c8db | Update workflow name (#883) | 2024-08-01 21:29:46 -07:00 |
| Ying Sheng | ae7ee01a8e | Add accuracy test to CI: MMLU (#882) | 2024-08-01 21:20:17 -07:00 |
| Ying Sheng | 76e59088d8 | Add more unit tests to CI (#880) | 2024-08-01 18:14:33 -07:00 |
| Liangsheng Yin | 12ce3befb6 | Update runner docs (#879) | 2024-08-01 17:37:47 -07:00 |
| 任嘉 | 4013a4e1b0 | Implement served_model_name to customize model id when use local mode… (#749) Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-08-01 17:13:51 -07:00 |
| Ying Sheng | 60340a3643 | Improve the coverage of the openai api server test (#878) | 2024-08-01 16:01:30 -07:00 |
| Liangsheng Yin | 70c78cfb03 | Update runner docs (#876) | 2024-08-01 15:32:33 -07:00 |
| Ying Sheng | 72b6ea88b4 | Make scripts under /test/srt as unit tests (#875) | 2024-08-01 14:34:55 -07:00 |
| Ying Sheng | e4d3333c6c | bump to 0.2.8 (#877) | 2024-08-01 14:18:26 -07:00 |
| Ying Sheng | 6f221d4ca0 | Fix unit tests for the frontend language part (#872) | 2024-08-01 12:39:12 -07:00 |
| Yineng Zhang | aba6f51f88 | misc: update unit test config (#873) | 2024-08-02 05:27:05 +10:00 |
| Yineng Zhang | 7f6c690b67 | misc: use pip cache purge and add unit test ci (#871) | 2024-08-02 05:12:20 +10:00 |
| Ying Sheng | 40e6f5131a | Fix openai CI tests (#870) | 2024-08-01 09:39:09 -07:00 |
| Ying Sheng | 4075677621 | Add OpenAI backend to the CI test (#869) | 2024-08-01 09:25:24 -07:00 |
| Yineng Zhang | 9e8d2c7f74 | misc: add cancel previous at e2e (#864) | 2024-08-01 18:26:54 +10:00 |
| Yineng Zhang | c9bff5fcc8 | misc: disable auto release (#862) | 2024-08-01 17:46:51 +10:00 |
| Ying Sheng | b04444ac01 | Rename github workflows (#861) | 2024-08-01 00:39:55 -07:00 |
| Yineng Zhang | 3d617a21ba | misc: update e2e test paths config (#860) | 2024-08-01 17:38:24 +10:00 |
| Liangsheng Yin | c020f9ceda | Support chunked prefill when radix cache is disabled (#811) | 2024-08-01 00:29:01 -07:00 |
| yichuan~ | ca600e8cd6 | Add support for logprobs in OpenAI chat API (#852) | 2024-08-01 00:08:21 -07:00 |
| Kai Fronsdal | 0c0c81372e | Fix #857 (#858) | 2024-08-01 00:05:39 -07:00 |
| Ying Sheng | 90286d8576 | Add troubleshooting doc (#856) | 2024-08-01 00:05:26 -07:00 |
| Ying Sheng | 5e7dd984fe | Fix llama for classification (#855) | 2024-07-31 15:48:31 -07:00 |
| Yineng Zhang | bc3eaac2b8 | chore: update flashinfer to v0.1.3 (#850) | 2024-08-01 04:37:05 +10:00 |
| Yineng Zhang | a78d98de19 | misc: update e2e test paths config (#848) | 2024-07-31 18:37:29 +10:00 |
| Ikko Eltociear Ashimine | 7d5ed7c6ee | docs: update README.md (#843) | 2024-07-31 12:48:18 +10:00 |
| Liangsheng Yin | a6c7ebbbcb | Add req slots leaking check (#842) | 2024-07-30 18:29:01 -07:00 |
| yichuan~ | bb0501c0d9 | Fix List input bug (#838) | 2024-07-30 13:40:51 -07:00 |
| Liangsheng Yin | 6b0f2e9088 | Add --max-total-tokens (#840) | 2024-07-30 13:33:55 -07:00 |
| Yineng Zhang | 1edd4e07d6 | chore: bump v0.2.7 (#830) | 2024-07-30 20:41:10 +10:00 |
| Yineng Zhang | 62c673c46f | docs: add set up runner (#829) | 2024-07-30 19:43:40 +10:00 |
| Yineng Zhang | 377c5dc9a9 | misc: enable e2e test when push (#828) | 2024-07-30 19:26:23 +10:00 |
| Yineng Zhang | f52eda35ea | misc: update e2e test benchmark config (#825) | 2024-07-30 19:19:23 +10:00 |
| Ying Sheng | b579ecf028 | Add awq_marlin (#826) | 2024-07-30 02:04:51 -07:00 |
| Ying Sheng | e7487b08bc | Adjust default mem fraction to avoid OOM (#823) | 2024-07-30 01:58:31 -07:00 |
| Ying Sheng | ae5c0fc442 | Support disable_ignore_eos in bench_serving.py (#824) | 2024-07-30 01:42:07 -07:00 |