Commit Graph

408 Commits

Author | SHA1 | Message | Date
Yineng Zhang | 1edd4e07d6 | chore: bump v0.2.7 (#830) | 2024-07-30 20:41:10 +10:00
Yineng Zhang | f52eda35ea | misc: update e2e test benchmark config (#825) | 2024-07-30 19:19:23 +10:00
Ying Sheng | b579ecf028 | Add awq_marlin (#826) | 2024-07-30 02:04:51 -07:00
Ying Sheng | e7487b08bc | Adjust default mem fraction to avoid OOM (#823) | 2024-07-30 01:58:31 -07:00
Ying Sheng | ae5c0fc442 | Support disable_ignore_eos in bench_serving.py (#824) | 2024-07-30 01:42:07 -07:00
ObjectNotFound | daf593a385 | Fix streaming bug (#820) | 2024-07-30 00:32:07 -07:00
Liangsheng Yin | cdcbde5fc3 | Code structure refactor (#807) | 2024-07-29 23:04:48 -07:00
Enrique Shockwave | 21e22b9e96 | Fix LiteLLM kwargs (#817) | 2024-07-29 22:38:02 -07:00
Ying Sheng | db6089e6f3 | Revert "Organize public APIs" (#815) | 2024-07-29 19:40:28 -07:00
Liangsheng Yin | 3520f75fb1 | Remove inf value for chunked prefill size (#812) | 2024-07-29 18:34:25 -07:00
Liangsheng Yin | c8e9fed87a | Organize public APIs (#809) | 2024-07-29 15:34:16 -07:00
yichuan~ | 084fa54d37 | Add support for OpenAI API: offline batch (file) processing (#699) (Co-authored-by: hnyls2002 <hnyls2002@gmail.com>) | 2024-07-29 13:07:18 -07:00
Ying Sheng | eba458bd19 | Revert "Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118"" (#806) | 2024-07-29 12:20:42 -07:00
Yineng Zhang | 3d1cb0af83 | feat: add chat template for internlm2-chat (#802) | 2024-07-30 03:18:03 +08:00
Ying Sheng | 7d352b4fdd | Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118" (#805) | 2024-07-29 11:39:12 -07:00
Yineng Zhang | 87064015d9 | fix: update flashinfer to 0.1.2 to fix sampling for cu118 (#803) | 2024-07-29 11:00:52 -07:00
Liangsheng Yin | 7cd4f244a4 | Chunked prefill (#800) | 2024-07-29 03:32:58 -07:00
Ying Sheng | 98111fbe3e | Revert "Chunked prefill support" (#799) | 2024-07-29 02:38:31 -07:00
Liangsheng Yin | 2ec39ab712 | Chunked prefill support (#797) | 2024-07-29 02:21:50 -07:00
ObjectNotFound | 8f6274c82b | Add role documentation, add system begin & end tokens (#793) | 2024-07-28 23:02:49 -07:00
Ying Sheng | 325a06c2de | Fix logging (#796) | 2024-07-28 23:01:45 -07:00
Ying Sheng | 79f816292e | Fix lazy import location (#795) | 2024-07-28 22:09:50 -07:00
Eric Yoon | b688fd858d | Lazy-import third-party backends (#794) | 2024-07-28 21:57:41 -07:00
Ying Sheng | 8d908a937c | Fix echo + lobprob for OpenAI API when the prompt is a list (#791) | 2024-07-28 17:09:16 -07:00
Yineng Zhang | dd7e8b9421 | chore: add copyright for srt (#790) | 2024-07-28 23:07:12 +10:00
Ying Sheng | c71880f896 | Vectorize logprobs computation (#787) | 2024-07-28 05:22:14 -07:00
Yineng Zhang | 948625799e | docs: init readthedocs support (#783) | 2024-07-28 16:50:31 +10:00
Yineng Zhang | 68e5262699 | fix: replace pillow with PIL in PACKAGE_LIST (#781) | 2024-07-28 14:06:24 +10:00
Lianmin Zheng | bc1154c399 | Bump version to 0.2.6 (#779) | 2024-07-27 20:29:33 -07:00
Lianmin Zheng | 752e643007 | Allow disabling flashinfer sampling kernel (#778) | 2024-07-27 20:18:56 -07:00
Lianmin Zheng | 30db99b3d9 | Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs (#776) | 2024-07-27 19:50:34 -07:00
Lianmin Zheng | 0a409bd438 | Fix return_log_probs with cuda graph (#775) | 2024-07-27 19:15:09 -07:00
Mingyi | e4db4e5ba5 | minor refactor: move check server args to server_args.py (#774) | 2024-07-27 19:03:40 -07:00
Lianmin Zheng | bbc07c4197 | Move sampling logits to float32 (#773) | 2024-07-27 17:30:12 -07:00
Lianmin Zheng | a036d41980 | Fix max new tokens (#772) | 2024-07-27 17:22:18 -07:00
Lianmin Zheng | f95e661757 | Fix max_tokens for OpenAI chat completion API (#766) | 2024-07-27 15:44:27 -07:00
Lianmin Zheng | 0736b27020 | [Minor] Improve the code style in TokenizerManager (#767) | 2024-07-27 05:05:15 -07:00
Ke Bao | 3fdab91912 | Fix TransformerTokenizer init for chatglm2 & 3 (#761) | 2024-07-27 02:44:46 -07:00
Liangsheng Yin | d9fccfefe2 | Fix context length (#757) | 2024-07-26 18:13:13 -07:00
Liangsheng Yin | 679ebcbbdc | Deepseek v2 support (#693) | 2024-07-26 17:10:07 -07:00
Yineng Zhang | 5bd06b4599 | fix: use REPO_TOKEN (#755) | 2024-07-27 05:56:30 +10:00
Yineng Zhang | 9a61182732 | fix: add release tag workflow (#754) | 2024-07-27 05:48:38 +10:00
Yineng Zhang | eeb2482186 | feat: add release tag workflow (#753) | 2024-07-27 05:37:02 +10:00
Yineng Zhang | 8628ab9c8b | feat: add docker workflow (#751) | 2024-07-27 03:54:51 +10:00
Yineng Zhang | 1b77670f39 | chore: bump v0.2.1 (#740) | 2024-07-26 21:27:41 +10:00
Yineng Zhang | 768e05d08f | fix benchmark (#743) (Co-authored-by: hnyls2002 <hnyls2002@gmail.com>, Ying Sheng <sqy1415@gmail.com>) | 2024-07-26 21:26:13 +10:00
Yineng Zhang | 6b32bb1c0b | misc: format (#741) | 2024-07-26 21:00:51 +10:00
Toshiki Kataoka | 40facad5f1 | feat: support token ids in /v1/completions (#736) | 2024-07-26 02:53:17 -07:00
Toshiki Kataoka | da504445dc | fix /generate without sampling_params (#734) | 2024-07-26 01:27:56 -07:00
Ying Sheng | 252e0f7bbd | fix: small bug for llama-405b fp16 (#733) | 2024-07-25 21:14:54 -07:00