| Author | Commit | Message | Date |
|---|---|---|---|
| Lianmin Zheng | 562b8857d8 | Improve error handling (#433) | 2024-05-12 20:49:04 -07:00 |
| Shannon Shen | 04c0b21488 | Allow input_ids in the input of the /generate endpoint (#363) | 2024-05-12 15:29:00 -07:00 |
| Lianmin Zheng | 6e09cf6a15 | Misc fixes (#432) | 2024-05-12 15:05:40 -07:00 |
| Lianmin Zheng | 72bb344388 | Update version to 0.1.15 (#431) | 2024-05-12 14:22:33 -07:00 |
| Lianmin Zheng | 2d580e7a89 | Fix flashinfer (#430) | 2024-05-12 08:18:53 -07:00 |
| Lianmin Zheng | 3fc97f6709 | Move openai api server into a separate file (#429) | 2024-05-12 06:41:32 -07:00 |
| Lianmin Zheng | abc548c707 | Minor fix for the import path (#428) | 2024-05-12 05:10:35 -07:00 |
| Lianmin Zheng | aee4f523cf | Fix logit processor bugs (#427) | 2024-05-12 04:54:07 -07:00 |
| Lianmin Zheng | 7023f413c6 | Clean up (#422) | 2024-05-11 20:55:00 -07:00 |
| Lianmin Zheng | 09deb20dee | Optimize the memory usage of logits processor (#420) | 2024-05-11 16:56:42 -07:00 |
| Qubitium | 33b242df30 | Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380); Co-authored-by: ZX <zx@lbx.dev>; Co-authored-by: ZhouXingg <165115237+ZhouXingg@users.noreply.github.com> | 2024-05-11 16:37:49 -07:00 |
| Lianmin Zheng | a511a2d089 | restrict vllm version | 2024-05-09 15:49:29 -07:00 |
| Liangsheng Yin | 6ec65f4555 | Make public APIs more standard. (#416) | 2024-05-09 15:39:22 +08:00 |
| Enrique Shockwave | e2c31fca5c | Include finish reason in meta info response (#415) | 2024-05-09 15:14:01 +08:00 |
| Liangsheng Yin | d5de20a3ee | Fix sync() when fork(1) (#412) | 2024-05-08 15:15:18 +08:00 |
| YoungJoong Noah Kim | 4a1c6ae2ce | Add Cohere Command R chat template (#411) | 2024-05-07 15:18:15 +08:00 |
| Liangsheng Yin | 14522e6a26 | Organize Benchmark (#381) | 2024-05-05 16:14:17 +08:00 |
| ZhouXingg | 183df47282 | SamplingParams add "spaces_between_special_tokens" argument (#392) | 2024-04-30 16:17:12 -07:00 |
| Joschka Braun | 5c5aba5900 | Adding RAG tracing & eval cookbook using Parea (#390) | 2024-04-30 16:13:28 -07:00 |
| Lianmin Zheng | ba67101f99 | Fix chatml template (#406) | 2024-04-30 15:53:39 -07:00 |
| Liangsheng Yin | 19818b9c2f | Minor: style improvement of radix_cache and memory_pool (#395) | 2024-04-26 01:01:36 +08:00 |
| Liangsheng Yin | 9216b10678 | Improve performance when running with full parallel (#394) | 2024-04-25 17:29:07 +08:00 |
| Liangsheng Yin | 150d7020ed | Revert removing the unused imports (#385) | 2024-04-23 22:36:33 +08:00 |
| Liangsheng Yin | 9acc6e3504 | add .isort.cfg (#378) | 2024-04-22 22:38:09 +08:00 |
| Enrique Shockwave | cf9d8efdd3 | llama3 instruct template (#372) | 2024-04-21 09:40:12 -07:00 |
| Liangsheng Yin | 1bf1cf1953 | Reduce overhead when fork(1) (#375) | 2024-04-21 17:25:14 +08:00 |
| Ke Bao | e822e5900b | Optimize radix tree matching (#364) | 2024-04-17 09:47:37 -07:00 |
| Fronx | 2b6d999191 | Fix issue #367 – System message not supported for Anthropic (anthropic.BadRequestError) (#368); Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-04-16 11:18:24 -07:00 |
| Lianmin Zheng | 65501a9cf1 | Fix commandr import; format code | 2024-04-16 18:10:12 +00:00 |
| ZhouXingg | db611066ad | support command-r (#369) | 2024-04-16 10:36:51 -07:00 |
| Liangsheng Yin | 62b3812b69 | Time cost utils (#355) | 2024-04-09 23:27:31 +08:00 |
| Tom Dörr | 550a4f78f3 | Fix typos in infer_batch.py (#354) | 2024-04-09 15:10:05 +08:00 |
| SimoneRaponi | ff99c38a07 | Add timeout to get_meta_info (#346); Co-authored-by: simone <simone.raponi@equixely.com> | 2024-04-03 22:22:06 +08:00 |
| Qubitium | c9de3e169c | Eliminate 2 gpu ops during sampling when logit_bias is zero (#338); Co-authored-by: hnyls2002 <hnyls2002@gmail.com> | 2024-04-03 13:56:06 +08:00 |
| Liangsheng Yin | ed27a6b992 | Revert "Eliminate 2 gpu ops during sampling when logit_bias is zero" (#345) | 2024-04-03 12:45:01 +08:00 |
| Liangsheng Yin | 463c6632a8 | Eliminate 2 gpu ops during sampling when logit_bias is zero (#343); Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com> | 2024-04-02 19:14:55 +08:00 |
| Ying Sheng | b0890631a0 | fix gemma import error | 2024-04-01 07:36:52 +00:00 |
| Junlong Li | cb389c91bc | Fix llava parallelism/fork bug (#315) | 2024-03-28 19:24:54 -07:00 |
| Qubitium | eddaa2b599 | Add support for new autogptq quant_config.checkpoint_format (#332) | 2024-03-28 19:24:16 -07:00 |
| Liangsheng Yin | 2af565b3bb | [model] DBRX-instruct support (#337) | 2024-03-28 10:05:19 -07:00 |
| Liangsheng Yin | 3842eba5fa | Logprobs Refractor (#331) | 2024-03-28 14:34:49 +08:00 |
| Liangsheng Yin | 24e59f5350 | model_runner simplify (#329) | 2024-03-24 19:48:37 +08:00 |
| Liangsheng Yin | 7523541962 | model_rpc style improvement (#293) | 2024-03-24 15:41:24 +08:00 |
| Jani Monoses | 30d17840fc | Update dependencies (#326) | 2024-03-23 10:15:58 -07:00 |
| Qubitium | ce216c80dc | Cleanup codebase: removed unnecessary code/logic (#298) | 2024-03-23 10:15:16 -07:00 |
| Lianmin Zheng | 51104cd405 | Update version to v0.1.14 (#324) | 2024-03-22 13:42:22 -07:00 |
| Lianmin Zheng | e2b2f0a213 | Support oai in benchmark/mmlu (#323) | 2024-03-22 13:37:57 -07:00 |
| Jani Monoses | b57abe1663 | Add StableLM model. (#301) | 2024-03-22 13:24:08 -07:00 |
| Jani Monoses | e57f079275 | Use Anthropic messages API (#304) | 2024-03-22 13:23:31 -07:00 |
| Li Bo | 08df63a6f8 | [Fix/Potential Bugs] Can not correctly import models in python/sglang/srt/models (#311) | 2024-03-22 12:19:58 -07:00 |