Commit Graph

168 Commits

Author SHA1 Message Date
Lianmin Zheng
aee4f523cf Fix logit processor bugs (#427) 2024-05-12 04:54:07 -07:00
Lianmin Zheng
7023f413c6 Clean up (#422) 2024-05-11 20:55:00 -07:00
Lianmin Zheng
09deb20dee Optimize the memory usage of logits processor (#420) 2024-05-11 16:56:42 -07:00
Qubitium
33b242df30 Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380) 2024-05-11 16:37:49 -07:00
Co-authored-by: ZX <zx@lbx.dev>
Co-authored-by: ZhouXingg <165115237+ZhouXingg@users.noreply.github.com>
Lianmin Zheng
a511a2d089 restrict vllm version 2024-05-09 15:49:29 -07:00
Liangsheng Yin
6ec65f4555 Make public APIs more standard. (#416) 2024-05-09 15:39:22 +08:00
Enrique Shockwave
e2c31fca5c Include finish reason in meta info response (#415) 2024-05-09 15:14:01 +08:00
Liangsheng Yin
d5de20a3ee Fix sync() when fork(1) (#412) 2024-05-08 15:15:18 +08:00
YoungJoong Noah Kim
4a1c6ae2ce Add Cohere Command R chat template (#411) 2024-05-07 15:18:15 +08:00
Liangsheng Yin
14522e6a26 Organize Benchmark (#381) 2024-05-05 16:14:17 +08:00
ZhouXingg
183df47282 SamplingParams add "spaces_between_special_tokens" argument (#392) 2024-04-30 16:17:12 -07:00
Joschka Braun
5c5aba5900 Adding RAG tracing & eval cookbook using Parea (#390) 2024-04-30 16:13:28 -07:00
Lianmin Zheng
ba67101f99 Fix chatml template (#406) 2024-04-30 15:53:39 -07:00
Liangsheng Yin
19818b9c2f Minor: style improvement of radix_cache and memory_pool (#395) 2024-04-26 01:01:36 +08:00
Liangsheng Yin
9216b10678 Improve performance when running with full parallel (#394) 2024-04-25 17:29:07 +08:00
Liangsheng Yin
150d7020ed Revert removing the unused imports (#385) 2024-04-23 22:36:33 +08:00
Liangsheng Yin
9acc6e3504 add .isort.cfg (#378) 2024-04-22 22:38:09 +08:00
Enrique Shockwave
cf9d8efdd3 llama3 instruct template (#372) 2024-04-21 09:40:12 -07:00
Liangsheng Yin
1bf1cf1953 Reduce overhead when fork(1) (#375) 2024-04-21 17:25:14 +08:00
Ke Bao
e822e5900b Optimize radix tree matching (#364) 2024-04-17 09:47:37 -07:00
Fronx
2b6d999191 Fix issue #367 – System message not supported for Anthropic (anthropic.BadRequestError) (#368) 2024-04-16 11:18:24 -07:00
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Lianmin Zheng
65501a9cf1 Fix commandr import; format code 2024-04-16 18:10:12 +00:00
ZhouXingg
db611066ad support command-r (#369) 2024-04-16 10:36:51 -07:00
Liangsheng Yin
62b3812b69 Time cost utils (#355) 2024-04-09 23:27:31 +08:00
Tom Dörr
550a4f78f3 Fix typos in infer_batch.py (#354) 2024-04-09 15:10:05 +08:00
SimoneRaponi
ff99c38a07 Add timeout to get_meta_info (#346) 2024-04-03 22:22:06 +08:00
Co-authored-by: simone <simone.raponi@equixely.com>
Qubitium
c9de3e169c Eliminate 2 gpu ops during sampling when logit_bias is zero (#338) 2024-04-03 13:56:06 +08:00
Co-authored-by: hnyls2002 <hnyls2002@gmail.com>
Liangsheng Yin
ed27a6b992 Revert "Eliminate 2 gpu ops during sampling when logit_bias is zero" (#345) 2024-04-03 12:45:01 +08:00
Liangsheng Yin
463c6632a8 Eliminate 2 gpu ops during sampling when logit_bias is zero (#343) 2024-04-02 19:14:55 +08:00
Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>
Ying Sheng
b0890631a0 fix gemma import error 2024-04-01 07:36:52 +00:00
Junlong Li
cb389c91bc Fix llava parallelism/fork bug (#315) 2024-03-28 19:24:54 -07:00
Qubitium
eddaa2b599 Add support for new autogptq quant_config.checkpoint_format (#332) 2024-03-28 19:24:16 -07:00
Liangsheng Yin
2af565b3bb [model] DBRX-instruct support (#337) 2024-03-28 10:05:19 -07:00
Liangsheng Yin
3842eba5fa Logprobs Refractor (#331) 2024-03-28 14:34:49 +08:00
Liangsheng Yin
24e59f5350 model_runner simplify (#329) 2024-03-24 19:48:37 +08:00
Liangsheng Yin
7523541962 model_rpc style improvement (#293) 2024-03-24 15:41:24 +08:00
Jani Monoses
30d17840fc Update dependencies (#326) 2024-03-23 10:15:58 -07:00
Qubitium
ce216c80dc Cleanup codebase: removed unnecessary code/logic (#298) 2024-03-23 10:15:16 -07:00
Lianmin Zheng
51104cd405 Update version to v0.1.14 (#324) 2024-03-22 13:42:22 -07:00
Lianmin Zheng
e2b2f0a213 Support oai in benchmark/mmlu (#323) 2024-03-22 13:37:57 -07:00
Jani Monoses
b57abe1663 Add StableLM model. (#301) 2024-03-22 13:24:08 -07:00
Jani Monoses
e57f079275 Use Anthropic messages API (#304) 2024-03-22 13:23:31 -07:00
Li Bo
08df63a6f8 [Fix/Potential Bugs] Can not correctly import models in python/sglang/srt/models (#311) 2024-03-22 12:19:58 -07:00
ZhouGongZaiShi
77835756a7 Fix outlines-0.0.35 incompatibility (#291) 2024-03-22 12:19:11 -07:00
Co-authored-by: ZX <zx@lbx.dev>
Liurl
ed31579971 Fix marlin model loading compat with autogptq (#290) 2024-03-13 13:15:43 +08:00
Co-authored-by: LRL <lrl@lbx.dev>
Qubitium
92e2d74fd0 Fix env (docker) compat due to __file__ usage (#288) 2024-03-13 13:02:48 +08:00
Enrique Shockwave
d9b3b01883 enable marlin kernels (#286) 2024-03-12 22:10:12 -04:00
Qubitium
ad1dd74673 Fix flashinfer >= 0.0.3 compat (#282) 2024-03-12 21:45:58 +08:00
Qubitium
b2eb080501 Fix Runtime missing some ServerArgs options (#281) 2024-03-11 22:32:15 +08:00
Lianmin Zheng
4aa5dd2c5f Update version to v0.1.13 (#280) 2024-03-11 05:49:27 -07:00