Ke Bao | ec2150b294 | Fix kill process util (#666) | 2024-07-19 21:43:11 -07:00
Lianmin Zheng | e1792cca24 | Remove cached triton launcher (#656) | 2024-07-18 23:28:40 -07:00
Mingyi | d774acad5c | Remove the dependency of rpyc (#646) | 2024-07-18 02:13:54 -07:00
Liangsheng Yin | 564a898ad9 | Optimize mem indices mangement (#619) | 2024-07-13 23:39:37 -07:00
Tommy Yang | b38687226a | Make sglang compat with vllm 0.5.1 (#598) | 2024-07-08 23:44:22 -07:00
Liangsheng Yin | f25b76c02a | add LogitsMetadata (#604) | 2024-07-08 17:46:55 -07:00
Liangsheng Yin | 5304b4ef58 | Add --enable-p2p-check option (#599) | 2024-07-06 23:34:10 -07:00
Ying Sheng | dc1b8bcfaa | Format (#593) | 2024-07-05 10:06:17 -07:00
Lianmin Zheng | 63fbef9876 | fix flashinfer & http log level | 2024-07-03 23:19:33 -07:00
Lianmin Zheng | eb1ae6ae0c | Add sglang.bench_latency for offline benchmark (#564) | 2024-06-25 03:38:04 -07:00
Ying Sheng | 09593e9bc9 | Multi-node Tensor Parallelism (#550); Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> | 2024-06-17 20:41:24 -07:00
Ying Sheng | fb9296f0ed | Higher priority for user input of max_prefill_tokens & format (#540) | 2024-06-12 21:48:40 -07:00
ZhouXingg | 111991fe23 | Fix Regression: Disable p2p for 4090 (#531); Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com> | 2024-06-11 23:27:17 -07:00
Lianmin Zheng | 91f93f141f | Crash the server when error or OOM happens (#514) | 2024-06-07 19:22:34 -07:00
Lianmin Zheng | bf3e271fe0 | Update vllm to v0.4.3 (#511); Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>; Co-authored-by: ZX <zx@lbx.dev> | 2024-06-07 12:11:31 -07:00
Ying Sheng | 0463f7fb52 | Support data parallelism (static) (#480); Co-authored-by: Ying Sheng <ying.sheng@databricks.com>; Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>; Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>; Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> | 2024-05-27 21:24:10 -07:00
Lianmin Zheng | 09de730dee | Improve benchmark scripts & add more models (#484) | 2024-05-27 14:13:26 -07:00
Lianmin Zheng | 2cea6146d8 | Improve logging & add logit cap (#471) | 2024-05-24 03:48:53 -07:00
Lianmin Zheng | 0fafc5606b | port fp8 mixtral (#460) | 2024-05-21 11:46:35 -07:00
Lianmin Zheng | 8dbdc018a3 | Abort disconnected requests (#457) | 2024-05-20 18:41:21 -07:00
LiviaSun | ec380dfd30 | openai chat speculative execution (#250); Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-05-18 22:23:53 -07:00
Lianmin Zheng | c05956e534 | Simplify port allocation (#447) | 2024-05-16 18:07:30 -07:00
Liangsheng Yin | 690d162d97 | Format code (#441) | 2024-05-14 22:40:46 +08:00
Yuanhan Zhang | 0992d85f92 | support llava video (#426) | 2024-05-13 16:57:00 -07:00
Lianmin Zheng | 562b8857d8 | Improve error handling (#433) | 2024-05-12 20:49:04 -07:00
Lianmin Zheng | aee4f523cf | Fix logit processor bugs (#427) | 2024-05-12 04:54:07 -07:00
Lianmin Zheng | 7023f413c6 | Clean up (#422) | 2024-05-11 20:55:00 -07:00
Qubitium | 33b242df30 | Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380); Co-authored-by: ZX <zx@lbx.dev>; Co-authored-by: ZhouXingg <165115237+ZhouXingg@users.noreply.github.com> | 2024-05-11 16:37:49 -07:00
Liangsheng Yin | 150d7020ed | Revert removing the unused imports (#385) | 2024-04-23 22:36:33 +08:00
Liangsheng Yin | 9acc6e3504 | add .isort.cfg (#378) | 2024-04-22 22:38:09 +08:00
Liangsheng Yin | 62b3812b69 | Time cost utils (#355) | 2024-04-09 23:27:31 +08:00
Liangsheng Yin | dfb13ac455 | Fix addr reuse in check_port (#253) | 2024-03-03 17:09:16 +08:00
Lianmin Zheng | c51020cf0c | Fix the chat template for llava-v1.6-34b & format code (#177) | 2024-02-11 05:50:13 -08:00
Lianmin Zheng | 23f05005fd | Format code & move functions (#155) | 2024-02-06 13:27:46 -08:00
Arcmoon | 3ae78a09b3 | Add gptq quantization model support (#141) | 2024-02-06 11:35:04 -08:00
Liangsheng Yin | cd8c3ccd95 | Fix is_multimodal_model judge (#132) | 2024-02-03 11:48:01 +08:00
Christopher Chou | 864425300f | Yi-VL Model (#112) | 2024-02-01 08:33:22 -08:00
Lianmin Zheng | 74b3bfaaf8 | format code | 2024-01-30 16:36:10 +00:00
Jay Zhou | 4a634cf646 | [Feature] Allow specifying all ports to use in advance (#116) | 2024-01-30 08:34:51 -08:00
Lianmin Zheng | 94e05770db | Fix after QWen support (#82) | 2024-01-22 21:17:05 -08:00
Arcmoon | 63e97e5e4c | Suppport qwen model and solve some problems (#75) | 2024-01-22 20:14:51 -08:00
Christopher Chou | 5b27a1dce4 | Rename image_url to image_file (#15) | 2024-01-16 15:41:30 -08:00
Lianmin Zheng | 22085081bb | release initial code; Co-authored-by: Ying Sheng <sqy1415@gmail.com>; Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>; Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>; Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com>; Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com>; Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> | 2024-01-08 04:37:50 +00:00