| Ying Sheng | 5be9eb8a8c | Add PUT for generate api (#448) | 2024-05-17 02:35:15 -07:00 |
| Lianmin Zheng | c05956e534 | Simplify port allocation (#447) | 2024-05-16 18:07:30 -07:00 |
| Matthias Gerstgrasser | d75dc20fae | Add finish_reason to OpenAI API (#446) | 2024-05-16 14:55:05 -07:00 |
| Liangsheng Yin | 690d162d97 | Format code (#441) | 2024-05-14 22:40:46 +08:00 |
| Kaichen Zhang - NTU | 664287b2a7 | [Feat] Add llava qwen, llava mistral (#419); Co-authored-by: Bo Li <drluodian@gmail.com> | 2024-05-13 22:17:50 -07:00 |
| Lianmin Zheng | e0ae5d42ec | Update version to 0.1.16 (#438) | 2024-05-13 17:29:17 -07:00 |
| Lianmin Zheng | 32de16ce2f | Fix streaming (#437) | 2024-05-13 17:26:18 -07:00 |
| Yuanhan Zhang | 0992d85f92 | support llava video (#426) | 2024-05-13 16:57:00 -07:00 |
| Lianmin Zheng | 5dc55a5f02 | Handle truncation errors (#436) | 2024-05-13 15:56:00 -07:00 |
| Lianmin Zheng | 4231a42fa8 | Fix import of global_config | 2024-05-13 12:11:55 -07:00 |
| Liangsheng Yin | 39191c8515 | Cache optimizations (#418) | 2024-05-13 12:47:13 +08:00 |
| Lianmin Zheng | 562b8857d8 | Improve error handling (#433) | 2024-05-12 20:49:04 -07:00 |
| Shannon Shen | 04c0b21488 | Allow input_ids in the input of the /generate endpoint (#363) | 2024-05-12 15:29:00 -07:00 |
| Lianmin Zheng | 6e09cf6a15 | Misc fixes (#432) | 2024-05-12 15:05:40 -07:00 |
| Lianmin Zheng | 72bb344388 | Update version to 0.1.15 (#431) | 2024-05-12 14:22:33 -07:00 |
| Lianmin Zheng | 2d580e7a89 | Fix flashinfer (#430) | 2024-05-12 08:18:53 -07:00 |
| Lianmin Zheng | 3fc97f6709 | Move openai api server into a separate file (#429) | 2024-05-12 06:41:32 -07:00 |
| Lianmin Zheng | abc548c707 | Minor fix for the import path (#428) | 2024-05-12 05:10:35 -07:00 |
| Lianmin Zheng | aee4f523cf | Fix logit processor bugs (#427) | 2024-05-12 04:54:07 -07:00 |
| Lianmin Zheng | 7023f413c6 | Clean up (#422) | 2024-05-11 20:55:00 -07:00 |
| Lianmin Zheng | 09deb20dee | Optimize the memory usage of logits processor (#420) | 2024-05-11 16:56:42 -07:00 |
| Qubitium | 33b242df30 | Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380); Co-authored-by: ZX <zx@lbx.dev>; Co-authored-by: ZhouXingg <165115237+ZhouXingg@users.noreply.github.com> | 2024-05-11 16:37:49 -07:00 |
| Lianmin Zheng | a511a2d089 | restrict vllm version | 2024-05-09 15:49:29 -07:00 |
| Liangsheng Yin | 6ec65f4555 | Make public APIs more standard. (#416) | 2024-05-09 15:39:22 +08:00 |
| Enrique Shockwave | e2c31fca5c | Include finish reason in meta info response (#415) | 2024-05-09 15:14:01 +08:00 |
| Liangsheng Yin | d5de20a3ee | Fix sync() when fork(1) (#412) | 2024-05-08 15:15:18 +08:00 |
| YoungJoong Noah Kim | 4a1c6ae2ce | Add Cohere Command R chat template (#411) | 2024-05-07 15:18:15 +08:00 |
| Liangsheng Yin | 14522e6a26 | Organize Benchmark (#381) | 2024-05-05 16:14:17 +08:00 |
| ZhouXingg | 183df47282 | SamplingParams add "spaces_between_special_tokens" argument (#392) | 2024-04-30 16:17:12 -07:00 |
| Joschka Braun | 5c5aba5900 | Adding RAG tracing & eval cookbook using Parea (#390) | 2024-04-30 16:13:28 -07:00 |
| Lianmin Zheng | ba67101f99 | Fix chatml template (#406) | 2024-04-30 15:53:39 -07:00 |
| Liangsheng Yin | 19818b9c2f | Minor: style improvement of radix_cache and memory_pool (#395) | 2024-04-26 01:01:36 +08:00 |
| Liangsheng Yin | 9216b10678 | Improve performance when running with full parallel (#394) | 2024-04-25 17:29:07 +08:00 |
| Liangsheng Yin | 150d7020ed | Revert removing the unused imports (#385) | 2024-04-23 22:36:33 +08:00 |
| Liangsheng Yin | 9acc6e3504 | add .isort.cfg (#378) | 2024-04-22 22:38:09 +08:00 |
| Enrique Shockwave | cf9d8efdd3 | llama3 instruct template (#372) | 2024-04-21 09:40:12 -07:00 |
| Liangsheng Yin | 1bf1cf1953 | Reduce overhead when fork(1) (#375) | 2024-04-21 17:25:14 +08:00 |
| Ke Bao | e822e5900b | Optimize radix tree matching (#364) | 2024-04-17 09:47:37 -07:00 |
| Fronx | 2b6d999191 | Fix issue #367 – System message not supported for Anthropic (anthropic.BadRequestError) (#368); Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-04-16 11:18:24 -07:00 |
| Lianmin Zheng | 65501a9cf1 | Fix commandr import; format code | 2024-04-16 18:10:12 +00:00 |
| ZhouXingg | db611066ad | support command-r (#369) | 2024-04-16 10:36:51 -07:00 |
| Liangsheng Yin | 62b3812b69 | Time cost utils (#355) | 2024-04-09 23:27:31 +08:00 |
| Tom Dörr | 550a4f78f3 | Fix typos in infer_batch.py (#354) | 2024-04-09 15:10:05 +08:00 |
| SimoneRaponi | ff99c38a07 | Add timeout to get_meta_info (#346); Co-authored-by: simone <simone.raponi@equixely.com> | 2024-04-03 22:22:06 +08:00 |
| Qubitium | c9de3e169c | Eliminate 2 gpu ops during sampling when logit_bias is zero (#338); Co-authored-by: hnyls2002 <hnyls2002@gmail.com> | 2024-04-03 13:56:06 +08:00 |
| Liangsheng Yin | ed27a6b992 | Revert "Eliminate 2 gpu ops during sampling when logit_bias is zero" (#345) | 2024-04-03 12:45:01 +08:00 |
| Liangsheng Yin | 463c6632a8 | Eliminate 2 gpu ops during sampling when logit_bias is zero (#343); Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com> | 2024-04-02 19:14:55 +08:00 |
| Ying Sheng | b0890631a0 | fix gemma import error | 2024-04-01 07:36:52 +00:00 |
| Junlong Li | cb389c91bc | Fix llava parallelism/fork bug (#315) | 2024-03-28 19:24:54 -07:00 |
| Qubitium | eddaa2b599 | Add support for new autogptq quant_config.checkpoint_format (#332) | 2024-03-28 19:24:16 -07:00 |