Liangsheng Yin | 19818b9c2f | Minor: style improvement of radix_cache and memory_pool (#395) | 2024-04-26 01:01:36 +08:00
Liangsheng Yin | 9216b10678 | Improve performance when running with full parallel (#394) | 2024-04-25 17:29:07 +08:00
Liangsheng Yin | 150d7020ed | Revert removing the unused imports (#385) | 2024-04-23 22:36:33 +08:00
Liangsheng Yin | 9acc6e3504 | add .isort.cfg (#378) | 2024-04-22 22:38:09 +08:00
Enrique Shockwave | cf9d8efdd3 | llama3 instruct template (#372) | 2024-04-21 09:40:12 -07:00
Liangsheng Yin | 1bf1cf1953 | Reduce overhead when fork(1) (#375) | 2024-04-21 17:25:14 +08:00
Ke Bao | e822e5900b | Optimize radix tree matching (#364) | 2024-04-17 09:47:37 -07:00
Fronx | 2b6d999191 | Fix issue #367 – System message not supported for Anthropic (anthropic.BadRequestError) (#368) (Co-authored-by: Ying Sheng <sqy1415@gmail.com>) | 2024-04-16 11:18:24 -07:00
Lianmin Zheng | 65501a9cf1 | Fix commandr import; format code | 2024-04-16 18:10:12 +00:00
ZhouXingg | db611066ad | support command-r (#369) | 2024-04-16 10:36:51 -07:00
Liangsheng Yin | 62b3812b69 | Time cost utils (#355) | 2024-04-09 23:27:31 +08:00
Tom Dörr | 550a4f78f3 | Fix typos in infer_batch.py (#354) | 2024-04-09 15:10:05 +08:00
SimoneRaponi | ff99c38a07 | Add timeout to get_meta_info (#346) (Co-authored-by: simone <simone.raponi@equixely.com>) | 2024-04-03 22:22:06 +08:00
Qubitium | c9de3e169c | Eliminate 2 gpu ops during sampling when logit_bias is zero (#338) (Co-authored-by: hnyls2002 <hnyls2002@gmail.com>) | 2024-04-03 13:56:06 +08:00
Liangsheng Yin | ed27a6b992 | Revert "Eliminate 2 gpu ops during sampling when logit_bias is zero" (#345) | 2024-04-03 12:45:01 +08:00
Liangsheng Yin | 463c6632a8 | Eliminate 2 gpu ops during sampling when logit_bias is zero (#343) (Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>) | 2024-04-02 19:14:55 +08:00
Ying Sheng | b0890631a0 | fix gemma import error | 2024-04-01 07:36:52 +00:00
Junlong Li | cb389c91bc | Fix llava parallelism/fork bug (#315) | 2024-03-28 19:24:54 -07:00
Qubitium | eddaa2b599 | Add support for new autogptq quant_config.checkpoint_format (#332) | 2024-03-28 19:24:16 -07:00
Liangsheng Yin | 2af565b3bb | [model] DBRX-instruct support (#337) | 2024-03-28 10:05:19 -07:00
Liangsheng Yin | 3842eba5fa | Logprobs Refractor (#331) | 2024-03-28 14:34:49 +08:00
Liangsheng Yin | 24e59f5350 | model_runner simplify (#329) | 2024-03-24 19:48:37 +08:00
Liangsheng Yin | 7523541962 | model_rpc style improvement (#293) | 2024-03-24 15:41:24 +08:00
Jani Monoses | 30d17840fc | Update dependencies (#326) | 2024-03-23 10:15:58 -07:00
Qubitium | ce216c80dc | Cleanup codebase: removed unnecessary code/logic (#298) | 2024-03-23 10:15:16 -07:00
Lianmin Zheng | 51104cd405 | Update version to v0.1.14 (#324) | 2024-03-22 13:42:22 -07:00
Lianmin Zheng | e2b2f0a213 | Support oai in benchmark/mmlu (#323) | 2024-03-22 13:37:57 -07:00
Jani Monoses | b57abe1663 | Add StableLM model. (#301) | 2024-03-22 13:24:08 -07:00
Jani Monoses | e57f079275 | Use Anthropic messages API (#304) | 2024-03-22 13:23:31 -07:00
Li Bo | 08df63a6f8 | [Fix/Potential Bugs] Can not correctly import models in python/sglang/srt/models (#311) | 2024-03-22 12:19:58 -07:00
ZhouGongZaiShi | 77835756a7 | Fix outlines-0.0.35 incompatibility (#291) (Co-authored-by: ZX <zx@lbx.dev>) | 2024-03-22 12:19:11 -07:00
Liurl | ed31579971 | Fix marlin model loading compat with autogptq (#290) (Co-authored-by: LRL <lrl@lbx.dev>) | 2024-03-13 13:15:43 +08:00
Qubitium | 92e2d74fd0 | Fix env (docker) compat due to __file__ usage (#288) | 2024-03-13 13:02:48 +08:00
Enrique Shockwave | d9b3b01883 | enable marlin kernels (#286) | 2024-03-12 22:10:12 -04:00
Qubitium | ad1dd74673 | Fix flashinfer >= 0.0.3 compat (#282) | 2024-03-12 21:45:58 +08:00
Qubitium | b2eb080501 | Fix Runtime missing some ServerArgs options (#281) | 2024-03-11 22:32:15 +08:00
Lianmin Zheng | 4aa5dd2c5f | Update version to v0.1.13 (#280) | 2024-03-11 05:49:27 -07:00
Lianmin Zheng | 13662fd533 | Fix RuntimeEndpoint (#279) | 2024-03-11 05:24:24 -07:00
Alessio Dalla Piazza | d5ae2ebaa2 | Add Support for API Key Authentication (#230) | 2024-03-11 05:16:10 -07:00
Liangsheng Yin | 1b35547927 | Organize server_args (#277) | 2024-03-11 20:06:52 +08:00
Lianmin Zheng | faba293a0d | Improve gemma and documentations (#278) | 2024-03-11 04:43:39 -07:00
Liangsheng Yin | 89885b31ef | Gemma Support (#256) | 2024-03-11 12:14:27 +08:00
Geary.Z | 64fe311593 | replace skip_embed with input_embeds (#222) | 2024-03-10 19:04:52 -07:00
Liangsheng Yin | a7ace9c88d | Fix qwen config (#261) | 2024-03-10 18:54:18 -07:00
Lin Tianchuan | 30d67b2bca | Add set_var to interpreter.py (#263) | 2024-03-07 23:20:11 +08:00
Xinwei Xiong | b0b722ee8e | Refactor ChatTemplate for Enhanced Clarity and Efficiency (#201) | 2024-03-03 17:52:36 +08:00
Srinivas Billa | 01b07ea3ac | Add SSL Cert Functionality (#224) | 2024-03-03 17:41:41 +08:00
Liangsheng Yin | dfb13ac455 | Fix addr reuse in check_port (#253) | 2024-03-03 17:09:16 +08:00
Enrique Shockwave | 9759d927cf | fix chatml template (#195) | 2024-02-24 16:34:22 +08:00
Zhang Wenbin | 8d0a7fae3b | Fix interpreter.py get_var(var_name) in text iter when stream is not enabled (#198) | 2024-02-24 16:27:34 +08:00