Commit Graph

340 Commits

Author SHA1 Message Date
Ying Sheng      0de7c2d09e  Add e5-mistral modules [unreachable code] - step 1/3 (#983)  2024-08-08 00:04:15 -07:00
Liangsheng Yin  6ed4e3b8fb  Fix chunked prefill (#984)  2024-08-07 22:28:42 -07:00
Ying Sheng      00023d622a  [minor] Update type annotation in tokenizer_manager.py (#982)  2024-08-08 01:48:45 +00:00
foszto          c62d560c03  #590 Increase default , track changes in examples and documentation (#971)  2024-08-08 00:54:46 +00:00
                            Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Liangsheng Yin  2b8257f325  Adjust max prefix len (#980)  2024-08-08 00:41:26 +00:00
Liangsheng Yin  7623091d97  RadixCache method adjust (#977)  2024-08-07 15:52:24 -07:00
Liangsheng Yin  f724f1f1e9  PrefillAdder abstraction (#968)  2024-08-07 13:47:28 -07:00
Zhiqiang Xie    6db27f7b3b  misc: correct the int data type for token ids and indices (#969)  2024-08-08 04:40:07 +08:00
Liangsheng Yin  a01ddd9605  misc: fix the req_to_token member change (#967)  2024-08-07 01:52:10 -07:00
Liangsheng Yin  7fa54a1ab3  Make req_pool_indices on CPU (#960)  2024-08-07 01:41:25 -07:00
Ying Sheng      ff68ae857a  Show more error messages for warmup errors (#932)  2024-08-06 23:57:06 -07:00
yichuan~        795eab6dda  Add support for Batch API test (#936)  2024-08-06 23:52:10 -07:00
Liangsheng Yin  87e8c090e9  Organize code (rename, movement) (#953)  2024-08-06 20:50:32 -07:00
Liangsheng Yin  ad56e68495  Fix stuck in get_new_prefill_batch (#948)  2024-08-06 01:05:58 -07:00
yichuan~        ffb15744b5  Support multiple args options (#941)  2024-08-06 04:12:53 +10:00
yichuan~        fd7926e46e  Fix prompt len in parallel sampling (#928)  2024-08-05 00:56:08 -07:00
Ying Sheng      3bc99e6fe4  Test openai vision api (#925)  2024-08-05 13:51:55 +10:00
yichuan~        d53dcf9c98  Support more OpenAI API test (#916)  2024-08-04 16:43:09 -07:00
Liangsheng Yin  bb66cc4c52  Fix CI && python3.8 compatible (#920)  2024-08-04 16:02:05 -07:00
Ying Sheng      0d4f3a9fcd  Make API Key OpenAI-compatible (#917)  2024-08-04 13:35:44 -07:00
Ke Bao          e1eae1fd15  Support MLA for DeepSeek-V2 with Triton - step 1 (#905)  2024-08-05 03:40:33 +10:00
Ying Sheng      70cc0749ce  Add model accuracy test - step 1 (#866)  2024-08-03 18:20:50 -07:00
min-xu-et       7dd8a7e6d9  fixed an error handling in bench_latency.py (#904)  2024-08-03 17:42:17 -07:00
Yineng Zhang    6b8f66efe1  misc: update cuda graph capture exception log (#894)  2024-08-03 00:40:52 +10:00
Ying Sheng      fbd6b94d69  Fix the double BOS problem in the HF chat template (#888)  2024-08-02 00:30:50 -07:00
任嘉            4013a4e1b0  Implement served_model_name to customize model id when use local mode… (#749)  2024-08-01 17:13:51 -07:00
                            Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Ying Sheng      60340a3643  Improve the coverage of the openai api server test (#878)  2024-08-01 16:01:30 -07:00
Ying Sheng      72b6ea88b4  Make scripts under /test/srt as unit tests (#875)  2024-08-01 14:34:55 -07:00
Ying Sheng      6f221d4ca0  Fix unit tests for the frontend language part (#872)  2024-08-01 12:39:12 -07:00
Liangsheng Yin  c020f9ceda  Support chunked prefill when radix cache is disabled (#811)  2024-08-01 00:29:01 -07:00
yichuan~        ca600e8cd6  Add support for logprobs in OpenAI chat API (#852)  2024-08-01 00:08:21 -07:00
Ying Sheng      5e7dd984fe  Fix llama for classification (#855)  2024-07-31 15:48:31 -07:00
Yineng Zhang    bc3eaac2b8  chore: update flashinfer to v0.1.3 (#850)  2024-08-01 04:37:05 +10:00
Liangsheng Yin  a6c7ebbbcb  Add req slots leaking check (#842)  2024-07-30 18:29:01 -07:00
yichuan~        bb0501c0d9  Fix List input bug (#838)  2024-07-30 13:40:51 -07:00
Liangsheng Yin  6b0f2e9088  Add --max-total-tokens (#840)  2024-07-30 13:33:55 -07:00
Ying Sheng      b579ecf028  Add awq_marlin (#826)  2024-07-30 02:04:51 -07:00
Ying Sheng      e7487b08bc  Adjust default mem fraction to avoid OOM (#823)  2024-07-30 01:58:31 -07:00
Liangsheng Yin  cdcbde5fc3  Code structure refactor (#807)  2024-07-29 23:04:48 -07:00
Liangsheng Yin  3520f75fb1  Remove inf value for chunked prefill size (#812)  2024-07-29 18:34:25 -07:00
yichuan~        084fa54d37  Add support for OpenAI API : offline batch(file) processing (#699)  2024-07-29 13:07:18 -07:00
                            Co-authored-by: hnyls2002 <hnyls2002@gmail.com>
Ying Sheng      eba458bd19  Revert "Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118"" (#806)  2024-07-29 12:20:42 -07:00
Yineng Zhang    3d1cb0af83  feat: add chat template for internlm2-chat (#802)  2024-07-30 03:18:03 +08:00
Ying Sheng      7d352b4fdd  Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118" (#805)  2024-07-29 11:39:12 -07:00
Yineng Zhang    87064015d9  fix: update flashinfer to 0.1.2 to fix sampling for cu118 (#803)  2024-07-29 11:00:52 -07:00
Liangsheng Yin  7cd4f244a4  Chunked prefill (#800)  2024-07-29 03:32:58 -07:00
Ying Sheng      98111fbe3e  Revert "Chunked prefill support" (#799)  2024-07-29 02:38:31 -07:00
Liangsheng Yin  2ec39ab712  Chunked prefill support (#797)  2024-07-29 02:21:50 -07:00
Ying Sheng      325a06c2de  Fix logging (#796)  2024-07-28 23:01:45 -07:00
Ying Sheng      8d908a937c  Fix echo + logprob for OpenAI API when the prompt is a list (#791)  2024-07-28 17:09:16 -07:00