Yineng Zhang | dd7e8b9421 | chore: add copyright for srt (#790) | 2024-07-28 23:07:12 +10:00
Yineng Zhang | 1f013d64eb | docs: make badges center (#789) | 2024-07-28 22:27:52 +10:00
Yineng Zhang | 628e1fa760 | docs: update README (#788) | 2024-07-28 22:24:27 +10:00
Ying Sheng | c71880f896 | Vectorize logprobs computation (#787) | 2024-07-28 05:22:14 -07:00
Ying Sheng | bcb6611a46 | Update README.md | 2024-07-28 01:00:06 -07:00
Yineng Zhang | fa2aa0db0a | docs: update index (#786) | 2024-07-28 17:22:00 +10:00
Yineng Zhang | 6a387a69cc | fix: exclude logo png in gitignore (#785) | 2024-07-28 17:08:16 +10:00
Yineng Zhang | 27f5ce0a6c | fix: init readthedocs support (#784) | 2024-07-28 16:55:54 +10:00
Yineng Zhang | 948625799e | docs: init readthedocs support (#783) | 2024-07-28 16:50:31 +10:00
Yineng Zhang | 68e5262699 | fix: replace pillow with PIL in PACKAGE_LIST (#781) | 2024-07-28 14:06:24 +10:00
Lianmin Zheng | bc1154c399 | Bump version to 0.2.6 (#779) | 2024-07-27 20:29:33 -07:00
Lianmin Zheng | 752e643007 | Allow disabling flashinfer sampling kernel (#778) | 2024-07-27 20:18:56 -07:00
Lianmin Zheng | 30db99b3d9 | Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs (#776) | 2024-07-27 19:50:34 -07:00
Lianmin Zheng | 0a409bd438 | Fix return_log_probs with cuda graph (#775) | 2024-07-27 19:15:09 -07:00
Mingyi | e4db4e5ba5 | minor refactor: move check server args to server_args.py (#774) | 2024-07-27 19:03:40 -07:00
Lianmin Zheng | bbc07c4197 | Move sampling logits to float32 (#773) | 2024-07-27 17:30:12 -07:00
Lianmin Zheng | a036d41980 | Fix max new tokens (#772) | 2024-07-27 17:22:18 -07:00
Lianmin Zheng | f95e661757 | Fix max_tokens for OpenAI chat completion API (#766) | 2024-07-27 15:44:27 -07:00
Yineng Zhang | de854fb5c5 | feat: add fake tag (#770) | 2024-07-28 02:22:22 +10:00
Lianmin Zheng | f64b2a9bc0 | Add slack invitation link. | 2024-07-27 06:29:15 -07:00
Ying Sheng | 9f95dcc64f | Update readme (#769); Co-authored-by: Mingyi <wisclmy0611@gmail.com> | 2024-07-27 06:12:16 -07:00
Lianmin Zheng | 0736b27020 | [Minor] Improve the code style in TokenizerManager (#767) | 2024-07-27 05:05:15 -07:00
Ke Bao | 3fdab91912 | Fix TransformerTokenizer init for chatglm2 & 3 (#761) | 2024-07-27 02:44:46 -07:00
Liangsheng Yin | ba29504b21 | Update supported models (#763) | 2024-07-27 15:53:53 +10:00
Yineng Zhang | a72342f180 | fix: not run workflows on fork repo (#762) | 2024-07-27 14:51:33 +10:00
Yineng Zhang | c3c74bf874 | docs: update model support (#760) | 2024-07-27 14:07:37 +10:00
Liangsheng Yin | d9fccfefe2 | Fix context length (#757) | 2024-07-26 18:13:13 -07:00
Liangsheng Yin | 679ebcbbdc | Deepseek v2 support (#693) | 2024-07-26 17:10:07 -07:00
Yineng Zhang | 5bd06b4599 | fix: use REPO_TOKEN (#755) | 2024-07-27 05:56:30 +10:00
Yineng Zhang | 9a61182732 | fix: add release tag workflow (#754) | 2024-07-27 05:48:38 +10:00
Yineng Zhang | eeb2482186 | feat: add release tag workflow (#753) | 2024-07-27 05:37:02 +10:00
Yineng Zhang | 3e455b016e | misc: replace deprecated variable HUGGING_FACE_HUB_TOKEN with HF_TOKEN (#752) | 2024-07-27 04:19:30 +10:00
Yineng Zhang | 8628ab9c8b | feat: add docker workflow (#751) | 2024-07-27 03:54:51 +10:00
Yineng Zhang | 1b77670f39 | chore: bump v0.2.1 (#740) | 2024-07-26 21:27:41 +10:00
Yineng Zhang | 768e05d08f | fix benchmark (#743); Co-authored-by: hnyls2002 <hnyls2002@gmail.com>; Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-07-26 21:26:13 +10:00
Yineng Zhang | 01fbb11bb7 | docs: fix typo (#742) | 2024-07-26 21:05:53 +10:00
Yineng Zhang | 05d216da32 | docs: add llama 3.1 405b instruction (#739); Co-authored-by: Ying1123 <sqy1415@gmail.com> | 2024-07-26 21:03:20 +10:00
Yineng Zhang | 6b32bb1c0b | misc: format (#741) | 2024-07-26 21:00:51 +10:00
Toshiki Kataoka | 40facad5f1 | feat: support token ids in /v1/completions (#736) | 2024-07-26 02:53:17 -07:00
Toshiki Kataoka | da504445dc | fix /generate without sampling_params (#734) | 2024-07-26 01:27:56 -07:00
Ying Sheng | 252e0f7bbd | fix: small bug for llama-405b fp16 (#733) | 2024-07-25 21:14:54 -07:00
Ying Sheng | 7f6f2f0f09 | Update readme (#731) | 2024-07-25 09:16:10 -07:00
Ying Sheng | 7802df1e2b | Update readme | 2024-07-25 08:45:06 -07:00
Ying Sheng | 1a491d00cb | Bump version to 0.2.0 (#730) | 2024-07-25 08:03:36 -07:00
Ying Sheng | 8fbba3de3d | Fix bugs (fp8 checkpoints, triton cache manager) (#729) | 2024-07-25 07:42:00 -07:00
Ying Sheng | ae0f6130cb | Revert "fix: fp8 config" (#728) | 2024-07-25 07:25:33 -07:00
Yineng Zhang | 6010589783 | misc: update bug issue template (#727) | 2024-07-25 20:52:37 +10:00
Yineng Zhang | 926ac01b64 | fix: resolve the logo display issue on the PyPI page (#726) | 2024-07-25 20:47:46 +10:00
Yineng Zhang | 25c881a005 | chore: bump v0.1.25 (#725) | 2024-07-25 20:04:35 +10:00
Liangsheng Yin | 04ec6ba2ac | Fix dockerfile and triton cache manager (#720) | 2024-07-25 03:04:21 -07:00