Juwan Yoo
|
ab7875941b
|
feat: frequency, min_new_tokens, presence, and repetition penalties (#973)
|
2024-08-08 04:21:08 -07:00 |
|
yichuan~
|
3a79613c28
|
support more options about usage in stream mode (#985)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2024-08-08 09:41:57 +00:00 |
|
Liangsheng Yin
|
1ac304eeb4
|
Adjust InputMetadata and ScheduleBatch (#981)
|
2024-08-08 01:11:22 -07:00 |
|
Ying Sheng
|
20a4f927dc
|
Add io struct for embedding models [unreachable code] - step 2/3 (#987)
|
2024-08-08 07:52:31 +00:00 |
|
Ying Sheng
|
0de7c2d09e
|
Add e5-mistral modules [unreachable code] - step 1/3 (#983)
|
2024-08-08 00:04:15 -07:00 |
|
Liangsheng Yin
|
6ed4e3b8fb
|
Fix chunked prefill (#984)
|
2024-08-07 22:28:42 -07:00 |
|
Ying Sheng
|
00023d622a
|
[minor] Update type annotation in tokenizer_manager.py (#982)
|
2024-08-08 01:48:45 +00:00 |
|
foszto
|
c62d560c03
|
#590 Increase default , track changes in examples and documentation (#971)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2024-08-08 00:54:46 +00:00 |
|
Liangsheng Yin
|
2b8257f325
|
Adjust max prefix len (#980)
|
2024-08-08 00:41:26 +00:00 |
|
Liangsheng Yin
|
7623091d97
|
RadixCache method adjust (#977)
|
2024-08-07 15:52:24 -07:00 |
|
Liangsheng Yin
|
f724f1f1e9
|
PrefillAdder abstraction (#968)
|
2024-08-07 13:47:28 -07:00 |
|
Zhiqiang Xie
|
6db27f7b3b
|
misc: correct the int data type for token ids and indices (#969)
|
2024-08-08 04:40:07 +08:00 |
|
Yineng Zhang
|
dc9d06d886
|
chore: bump v0.2.11 (#970)
|
2024-08-07 20:47:53 +08:00 |
|
Yineng Zhang
|
c31f084c71
|
chore: update vllm to 0.5.4 (#966)
|
2024-08-07 21:15:41 +10:00 |
|
Liangsheng Yin
|
a01ddd9605
|
misc: fix the req_to_token member change (#967)
|
2024-08-07 01:52:10 -07:00 |
|
Liangsheng Yin
|
7fa54a1ab3
|
Make req_pool_indices on CPU (#960)
|
2024-08-07 01:41:25 -07:00 |
|
Yineng Zhang
|
05abd1261c
|
misc: add compute capability in check_env (#965)
|
2024-08-07 18:39:36 +10:00 |
|
Ying Sheng
|
ff68ae857a
|
Show more error messages for warmup errors (#932)
|
2024-08-06 23:57:06 -07:00 |
|
yichuan~
|
795eab6dda
|
Add support for Batch API test (#936)
|
2024-08-06 23:52:10 -07:00 |
|
Liangsheng Yin
|
87e8c090e9
|
Organize code (rename, movement) (#953)
|
2024-08-06 20:50:32 -07:00 |
|
Liangsheng Yin
|
ad56e68495
|
Fix stuck in get_new_prefill_batch (#948)
|
2024-08-06 01:05:58 -07:00 |
|
yichuan~
|
ffb15744b5
|
Support multiple args options (#941)
|
2024-08-06 04:12:53 +10:00 |
|
Ke Bao
|
a9c833d580
|
Fix union operator (#940)
|
2024-08-06 00:46:34 +08:00 |
|
Aidan Cooper
|
94e0115186
|
Feat: add alternative choices selection methods (#835)
|
2024-08-05 03:27:49 -07:00 |
|
Aidan Cooper
|
b216a545b3
|
Remove leftover auth_token (#934)
|
2024-08-05 03:25:48 -07:00 |
|
yichuan~
|
fd7926e46e
|
Fix prompt len in parallel sampling (#928)
|
2024-08-05 00:56:08 -07:00 |
|
Ying Sheng
|
3bc99e6fe4
|
Test openai vision api (#925)
|
2024-08-05 13:51:55 +10:00 |
|
min-xu-et
|
ebf69964cd
|
latency test enhancement - final part (#921)
|
2024-08-04 18:15:23 -07:00 |
|
Ying Sheng
|
141e8c71a3
|
Bump version to 0.2.10 (#923)
|
2024-08-04 16:52:51 -07:00 |
|
yichuan~
|
d53dcf9c98
|
Support more OpenAI API test (#916)
|
2024-08-04 16:43:09 -07:00 |
|
Liangsheng Yin
|
bb66cc4c52
|
Fix CI && python3.8 compatible (#920)
|
2024-08-04 16:02:05 -07:00 |
|
Ying Sheng
|
0d4f3a9fcd
|
Make API Key OpenAI-compatible (#917)
|
2024-08-04 13:35:44 -07:00 |
|
min-xu-et
|
afd411d09f
|
enhance latency test - part 2 (#915)
|
2024-08-04 12:27:25 -07:00 |
|
Ke Bao
|
e1eae1fd15
|
Support MLA for DeepSeek-V2 with Triton - step 1 (#905)
|
2024-08-05 03:40:33 +10:00 |
|
Yineng Zhang
|
f4d9953d9d
|
misc: add triton in check_env PACKAGE_LIST (#914)
|
2024-08-04 23:20:59 +10:00 |
|
Ying Sheng
|
995af5a54b
|
Improve the structure of CI (#911)
|
2024-08-03 23:09:21 -07:00 |
|
min-xu-et
|
539856455d
|
latency test enhancement - part 1 (#909)
|
2024-08-03 22:44:58 -07:00 |
|
Ying Sheng
|
70cc0749ce
|
Add model accuracy test - step 1 (#866)
|
2024-08-03 18:20:50 -07:00 |
|
min-xu-et
|
7dd8a7e6d9
|
fixed an error handling in bench_latency.py (#904)
|
2024-08-03 17:42:17 -07:00 |
|
Ying Sheng
|
b906c01592
|
Bump version to 0.2.9.post1 (#899)
|
2024-08-02 12:08:00 -07:00 |
|
Yineng Zhang
|
046c2b339e
|
chore: add multipart dep for fastapi (#895)
|
2024-08-03 00:50:19 +10:00 |
|
Yineng Zhang
|
6b8f66efe1
|
misc: update cuda graph capture exception log (#894)
|
2024-08-03 00:40:52 +10:00 |
|
Ying Sheng
|
30a9b2ef20
|
Bump version to v0.2.9 (#890)
|
2024-08-02 01:45:48 -07:00 |
|
Ying Sheng
|
3cadecf0c4
|
Increase openai client limit (#886)
|
2024-08-02 00:47:23 -07:00 |
|
Ying Sheng
|
e90e3a50d4
|
Add benchmark: HumanEval (#889)
|
2024-08-02 00:46:41 -07:00 |
|
Ying Sheng
|
fbd6b94d69
|
Fix the double BOS problem in the HF chat template (#888)
|
2024-08-02 00:30:50 -07:00 |
|
Ying Sheng
|
ae7ee01a8e
|
Add accuracy test to CI: MMLU (#882)
|
2024-08-01 21:20:17 -07:00 |
|
任嘉
|
4013a4e1b0
|
Implement served_model_name to customize model id when use local mode… (#749)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2024-08-01 17:13:51 -07:00 |
|
Ying Sheng
|
60340a3643
|
Improve the coverage of the openai api server test (#878)
|
2024-08-01 16:01:30 -07:00 |
|
Ying Sheng
|
72b6ea88b4
|
Make scripts under /test/srt as unit tests (#875)
|
2024-08-01 14:34:55 -07:00 |
|