josephrocca
dff2860a69
Fix CORS compatibility with OpenAI, vLLM, TGI, LMDeploy (#1373)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-09-11 02:35:03 +10:00

Lianmin Zheng
e4d68afcf0
[Minor] Many cleanup (#1357)
2024-09-09 04:14:11 -07:00

Kai-Hsun Chen
0836055324
[Chore] Rename model_overide_args to model_override_args (#1284)
Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-09-01 03:14:56 -07:00

Lianmin Zheng
0a97d7962d
[Fix] Fix OOM in llava base class (#1249)
2024-08-28 08:45:49 -07:00

Lianmin Zheng
bf53bf5142
[Fix] Fix llava on multi images (#1247)
2024-08-28 06:33:05 -07:00

Yineng Zhang
198974cd1a
feat: support sm75 with FlashInfer v0.1.6 (#1233)
2024-08-28 18:39:12 +10:00

caiyueliang
2f1d92834f
[FEAT] Support batches cancel (#1222)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-08-26 23:28:26 +00:00

Lianmin Zheng
902278008a
[Minor] Improve the function organization in TokenizerManager & improve loggers (#1208)
2024-08-25 14:46:34 -07:00

Chayenne
30b4f771b0
Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-08-25 10:29:12 -07:00

Ying Sheng
1cb4da5c5f
[Fix] the issue of random order when input is a list (#1199)
2024-08-24 21:43:03 -07:00

Lianmin Zheng
5623826f73
[Minor] Improve logging and rename the health check endpoint name (#1180)
2024-08-21 19:24:36 -07:00

Lianmin Zheng
bea2bb9eea
Improve multi-node stability (#1171)
2024-08-20 22:35:05 -07:00

Shan Yu
cd10654e7e
[Feat] Support update weights without restart server (#1157)
2024-08-20 13:48:24 -07:00

Lucien
6242c399ab
Generate 1 token to verify the health of the inference service in /health (#1154)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-08-21 03:14:34 +10:00

yichuan~
b997a18d74
[Feat]Add support for optional start len of logprobs (#1035)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2024-08-18 23:45:41 -07:00

Lianmin Zheng
cdc8d60752
Improve the code style: more comments and remove useless packages (#1139)
2024-08-17 14:37:52 -07:00

Liangsheng Yin
3694f8f996
Mixed style of chunked prefill (#1013)
2024-08-16 09:13:00 +00:00

Lianmin Zheng
0cb099e20a
set CUDA_DEVICE_MAX_CONNECTIONS=1 (#1113)
2024-08-16 03:47:39 +10:00

Lianmin Zheng
326df4bab2
Use a single workspace for flashinfer (#1077)
2024-08-14 19:25:37 -07:00

Ying Sheng
6767e2229f
Support jinja as chat template file (#1104)
2024-08-14 17:43:14 -07:00

rainred
616b59f384
[Feature] modify Runtime to support skip_tokenizer_init (#1088)
Co-authored-by: lzhang <zhanglei@modelbest.cn>
2024-08-14 00:28:04 -07:00

Lianmin Zheng
c877292cc1
Re-organize CI tests (#1052)
2024-08-12 03:39:01 -07:00

Yineng Zhang
94752ac811
feat: use FlashInfer rmsnorm and silu (#907)
2024-08-11 14:57:13 +10:00

gryffindor-rr
9cf0a5bada
Add skip_tokenizer_init args. (#959)
Co-authored-by: lzhang <zhanglei@modelbest.cn>
2024-08-09 12:14:13 -07:00

Ying Sheng
b16e856f11
Add openai embedding API (#997)
2024-08-09 11:19:18 -07:00

liuyhwangyh
b91a4cb1b1
support models from www.modelscope.cn (#994)
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
2024-08-09 02:52:14 -07:00

Ying Sheng
e040a2450b
Add e5-mistral embedding model - step 3/3 (#988)
2024-08-08 16:31:19 -07:00

Ying Sheng
9f662501a3
Move torch.compile configs into cuda_graph_runner.py (#993)
2024-08-08 13:20:30 -07:00

Ying Sheng
ff68ae857a
Show more error messages for warmup errors (#932)
2024-08-06 23:57:06 -07:00

yichuan~
795eab6dda
Add support for Batch API test (#936)
2024-08-06 23:52:10 -07:00

Ying Sheng
0d4f3a9fcd
Make API Key OpenAI-compatible (#917)
2024-08-04 13:35:44 -07:00

Ying Sheng
70cc0749ce
Add model accuracy test - step 1 (#866)
2024-08-03 18:20:50 -07:00

任嘉
4013a4e1b0
Implement served_model_name to customize model id when use local mode… (#749)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-08-01 17:13:51 -07:00

Ying Sheng
60340a3643
Improve the coverage of the openai api server test (#878)
2024-08-01 16:01:30 -07:00

Ying Sheng
72b6ea88b4
Make scripts under /test/srt as unit tests (#875)
2024-08-01 14:34:55 -07:00

Ying Sheng
6f221d4ca0
Fix unit tests for the frontend language part (#872)
2024-08-01 12:39:12 -07:00

Yineng Zhang
bc3eaac2b8
chore: update flashinfer to v0.1.3 (#850)
2024-08-01 04:37:05 +10:00

Liangsheng Yin
cdcbde5fc3
Code structure refactor (#807)
2024-07-29 23:04:48 -07:00

yichuan~
084fa54d37
Add support for OpenAI API : offline batch(file) processing (#699)
Co-authored-by: hnyls2002 <hnyls2002@gmail.com>
2024-07-29 13:07:18 -07:00

Ying Sheng
eba458bd19
Revert "Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118"" (#806)
2024-07-29 12:20:42 -07:00

Ying Sheng
7d352b4fdd
Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118" (#805)
2024-07-29 11:39:12 -07:00

Yineng Zhang
87064015d9
fix: update flashinfer to 0.1.2 to fix sampling for cu118 (#803)
2024-07-29 11:00:52 -07:00

Liangsheng Yin
7cd4f244a4
Chunked prefill (#800)
2024-07-29 03:32:58 -07:00

Ying Sheng
98111fbe3e
Revert "Chunked prefill support" (#799)
2024-07-29 02:38:31 -07:00

Liangsheng Yin
2ec39ab712
Chunked prefill support (#797)
2024-07-29 02:21:50 -07:00

Yineng Zhang
dd7e8b9421
chore: add copyright for srt (#790)
2024-07-28 23:07:12 +10:00

Lianmin Zheng
752e643007
Allow disabling flashinfer sampling kernel (#778)
2024-07-27 20:18:56 -07:00

Mingyi
e4db4e5ba5
minor refactor: move check server args to server_args.py (#774)
2024-07-27 19:03:40 -07:00

Ying Sheng
8fbba3de3d
Fix bugs (fp8 checkpoints, triton cache manager) (#729)
2024-07-25 07:42:00 -07:00

Liangsheng Yin
04ec6ba2ac
Fix dockerfile and triton cache manager (#720)
2024-07-25 03:04:21 -07:00