Commit Graph

4977 Commits

Author SHA1 Message Date
Yineng Zhang
dd7e8b9421 chore: add copyright for srt (#790) 2024-07-28 23:07:12 +10:00
Yineng Zhang
1f013d64eb docs: make badges center (#789) 2024-07-28 22:27:52 +10:00
Yineng Zhang
628e1fa760 docs: update README (#788) 2024-07-28 22:24:27 +10:00
Ying Sheng
c71880f896 Vectorize logprobs computation (#787) 2024-07-28 05:22:14 -07:00
Ying Sheng
bcb6611a46 Update README.md 2024-07-28 01:00:06 -07:00
Yineng Zhang
fa2aa0db0a docs: update index (#786) 2024-07-28 17:22:00 +10:00
Yineng Zhang
6a387a69cc fix: exclude logo png in gitignore (#785) 2024-07-28 17:08:16 +10:00
Yineng Zhang
27f5ce0a6c fix: init readthedocs support (#784) 2024-07-28 16:55:54 +10:00
Yineng Zhang
948625799e docs: init readthedocs support (#783) 2024-07-28 16:50:31 +10:00
Yineng Zhang
68e5262699 fix: replace pillow with PIL in PACKAGE_LIST (#781) 2024-07-28 14:06:24 +10:00
Lianmin Zheng
bc1154c399 Bump version to 0.2.6 (#779) 2024-07-27 20:29:33 -07:00
Lianmin Zheng
752e643007 Allow disabling flashinfer sampling kernel (#778) 2024-07-27 20:18:56 -07:00
Lianmin Zheng
30db99b3d9 Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs (#776) 2024-07-27 19:50:34 -07:00
Lianmin Zheng
0a409bd438 Fix return_log_probs with cuda graph (#775) 2024-07-27 19:15:09 -07:00
Mingyi
e4db4e5ba5 minor refactor: move check server args to server_args.py (#774) 2024-07-27 19:03:40 -07:00
Lianmin Zheng
bbc07c4197 Move sampling logits to float32 (#773) 2024-07-27 17:30:12 -07:00
Lianmin Zheng
a036d41980 Fix max new tokens (#772) 2024-07-27 17:22:18 -07:00
Lianmin Zheng
f95e661757 Fix max_tokens for OpenAI chat completion API (#766) 2024-07-27 15:44:27 -07:00
Yineng Zhang
de854fb5c5 feat: add fake tag (#770) 2024-07-28 02:22:22 +10:00
Lianmin Zheng
f64b2a9bc0 Add slack invitation link. 2024-07-27 06:29:15 -07:00
Ying Sheng
9f95dcc64f Update readme (#769) 2024-07-27 06:12:16 -07:00
Co-authored-by: Mingyi <wisclmy0611@gmail.com>
Lianmin Zheng
0736b27020 [Minor] Improve the code style in TokenizerManager (#767) 2024-07-27 05:05:15 -07:00
Ke Bao
3fdab91912 Fix TransformerTokenizer init for chatglm2 & 3 (#761) 2024-07-27 02:44:46 -07:00
Liangsheng Yin
ba29504b21 Update supported models (#763) 2024-07-27 15:53:53 +10:00
Yineng Zhang
a72342f180 fix: not run workflows on fork repo (#762) 2024-07-27 14:51:33 +10:00
Yineng Zhang
c3c74bf874 docs: update model support (#760) 2024-07-27 14:07:37 +10:00
Liangsheng Yin
d9fccfefe2 Fix context length (#757) 2024-07-26 18:13:13 -07:00
Liangsheng Yin
679ebcbbdc Deepseek v2 support (#693) 2024-07-26 17:10:07 -07:00
Yineng Zhang
5bd06b4599 fix: use REPO_TOKEN (#755) 2024-07-27 05:56:30 +10:00
Yineng Zhang
9a61182732 fix: add release tag workflow (#754) 2024-07-27 05:48:38 +10:00
Yineng Zhang
eeb2482186 feat: add release tag workflow (#753) 2024-07-27 05:37:02 +10:00
Yineng Zhang
3e455b016e misc: replace deprecated variable HUGGING_FACE_HUB_TOKEN with HF_TOKEN (#752) 2024-07-27 04:19:30 +10:00
Yineng Zhang
8628ab9c8b feat: add docker workflow (#751) 2024-07-27 03:54:51 +10:00
Yineng Zhang
1b77670f39 chore: bump v0.2.1 (#740) 2024-07-26 21:27:41 +10:00
Yineng Zhang
768e05d08f fix benchmark (#743) 2024-07-26 21:26:13 +10:00
Co-authored-by: hnyls2002 <hnyls2002@gmail.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Yineng Zhang
01fbb11bb7 docs: fix typo (#742) 2024-07-26 21:05:53 +10:00
Yineng Zhang
05d216da32 docs: add llama 3.1 405b instruction (#739) 2024-07-26 21:03:20 +10:00
Co-authored-by: Ying1123 <sqy1415@gmail.com>
Yineng Zhang
6b32bb1c0b misc: format (#741) 2024-07-26 21:00:51 +10:00
Toshiki Kataoka
40facad5f1 feat: support token ids in /v1/completions (#736) 2024-07-26 02:53:17 -07:00
Toshiki Kataoka
da504445dc fix /generate without sampling_params (#734) 2024-07-26 01:27:56 -07:00
Ying Sheng
252e0f7bbd fix: small bug for llama-405b fp16 (#733) 2024-07-25 21:14:54 -07:00
Ying Sheng
7f6f2f0f09 Update readme (#731) 2024-07-25 09:16:10 -07:00
Ying Sheng
7802df1e2b Update readme 2024-07-25 08:45:06 -07:00
Ying Sheng
1a491d00cb Bump version to 0.2.0 (#730) 2024-07-25 08:03:36 -07:00
Ying Sheng
8fbba3de3d Fix bugs (fp8 checkpoints, triton cache manager) (#729) 2024-07-25 07:42:00 -07:00
Ying Sheng
ae0f6130cb Revert "fix: fp8 config" (#728) 2024-07-25 07:25:33 -07:00
Yineng Zhang
6010589783 misc: update bug issue template (#727) 2024-07-25 20:52:37 +10:00
Yineng Zhang
926ac01b64 fix: resolve the logo display issue on the PyPI page (#726) 2024-07-25 20:47:46 +10:00
Yineng Zhang
25c881a005 chore: bump v0.1.25 (#725) 2024-07-25 20:04:35 +10:00
Liangsheng Yin
04ec6ba2ac Fix dockerfile and triton cache manager (#720) 2024-07-25 03:04:21 -07:00