43 Commits

Author SHA1 Message Date
Ke Bao
ec2150b294 Fix kill process util (#666) 2024-07-19 21:43:11 -07:00
Lianmin Zheng
e1792cca24 Remove cached triton launcher (#656) 2024-07-18 23:28:40 -07:00
Mingyi
d774acad5c Remove the dependency of rpyc (#646) 2024-07-18 02:13:54 -07:00
Liangsheng Yin
564a898ad9 Optimize mem indices mangement (#619) 2024-07-13 23:39:37 -07:00
Tommy Yang
b38687226a Make sglang compat with vllm 0.5.1 (#598) 2024-07-08 23:44:22 -07:00
Liangsheng Yin
f25b76c02a add LogitsMetadata (#604) 2024-07-08 17:46:55 -07:00
Liangsheng Yin
5304b4ef58 Add --enable-p2p-check option (#599) 2024-07-06 23:34:10 -07:00
Ying Sheng
dc1b8bcfaa Format (#593) 2024-07-05 10:06:17 -07:00
Lianmin Zheng
63fbef9876 fix flashinfer & http log level 2024-07-03 23:19:33 -07:00
Lianmin Zheng
eb1ae6ae0c Add sglang.bench_latency for offline benchmark (#564) 2024-06-25 03:38:04 -07:00
Ying Sheng
09593e9bc9 Multi-node Tensor Parallelism (#550)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2024-06-17 20:41:24 -07:00
Ying Sheng
fb9296f0ed Higher priority for user input of max_prefill_tokens & format (#540) 2024-06-12 21:48:40 -07:00
ZhouXingg
111991fe23 Fix Regression: Disable p2p for 4090 (#531)
Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>
2024-06-11 23:27:17 -07:00
Lianmin Zheng
91f93f141f Crash the server when error or OOM happens (#514) 2024-06-07 19:22:34 -07:00
Lianmin Zheng
bf3e271fe0 Update vllm to v0.4.3 (#511)
Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>
Co-authored-by: ZX <zx@lbx.dev>
2024-06-07 12:11:31 -07:00
Ying Sheng
0463f7fb52 Support data parallelism (static) (#480)
Co-authored-by: Ying Sheng <ying.sheng@databricks.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2024-05-27 21:24:10 -07:00
Lianmin Zheng
09de730dee Improve benchmark scripts & add more models (#484) 2024-05-27 14:13:26 -07:00
Lianmin Zheng
2cea6146d8 Improve logging & add logit cap (#471) 2024-05-24 03:48:53 -07:00
Lianmin Zheng
0fafc5606b port fp8 mixtral (#460) 2024-05-21 11:46:35 -07:00
Lianmin Zheng
8dbdc018a3 Abort disconnected requests (#457) 2024-05-20 18:41:21 -07:00
LiviaSun
ec380dfd30 openai chat speculative execution (#250)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-05-18 22:23:53 -07:00
Lianmin Zheng
c05956e534 Simplify port allocation (#447) 2024-05-16 18:07:30 -07:00
Liangsheng Yin
690d162d97 Format code (#441) 2024-05-14 22:40:46 +08:00
Yuanhan Zhang
0992d85f92 support llava video (#426) 2024-05-13 16:57:00 -07:00
Lianmin Zheng
562b8857d8 Improve error handling (#433) 2024-05-12 20:49:04 -07:00
Lianmin Zheng
aee4f523cf Fix logit processor bugs (#427) 2024-05-12 04:54:07 -07:00
Lianmin Zheng
7023f413c6 Clean up (#422) 2024-05-11 20:55:00 -07:00
Qubitium
33b242df30 Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380)
Co-authored-by: ZX <zx@lbx.dev>
Co-authored-by: ZhouXingg <165115237+ZhouXingg@users.noreply.github.com>
2024-05-11 16:37:49 -07:00
Liangsheng Yin
150d7020ed Revert removing the unused imports (#385) 2024-04-23 22:36:33 +08:00
Liangsheng Yin
9acc6e3504 add .isort.cfg (#378) 2024-04-22 22:38:09 +08:00
Liangsheng Yin
62b3812b69 Time cost utils (#355) 2024-04-09 23:27:31 +08:00
Liangsheng Yin
dfb13ac455 Fix addr reuse in check_port (#253) 2024-03-03 17:09:16 +08:00
Lianmin Zheng
c51020cf0c Fix the chat template for llava-v1.6-34b & format code (#177) 2024-02-11 05:50:13 -08:00
Lianmin Zheng
23f05005fd Format code & move functions (#155) 2024-02-06 13:27:46 -08:00
Arcmoon
3ae78a09b3 Add gptq quantization model support (#141) 2024-02-06 11:35:04 -08:00
Liangsheng Yin
cd8c3ccd95 Fix is_multimodal_model judge (#132) 2024-02-03 11:48:01 +08:00
Christopher Chou
864425300f Yi-VL Model (#112) 2024-02-01 08:33:22 -08:00
Lianmin Zheng
74b3bfaaf8 format code 2024-01-30 16:36:10 +00:00
Jay Zhou
4a634cf646 [Feature] Allow specifying all ports to use in advance (#116) 2024-01-30 08:34:51 -08:00
Lianmin Zheng
94e05770db Fix after QWen support (#82) 2024-01-22 21:17:05 -08:00
Arcmoon
63e97e5e4c Suppport qwen model and solve some problems (#75) 2024-01-22 20:14:51 -08:00
Christopher Chou
5b27a1dce4 Rename image_url to image_file (#15) 2024-01-16 15:41:30 -08:00
Lianmin Zheng
22085081bb release initial code
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com>
Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-01-08 04:37:50 +00:00