sglang

Author	SHA1	Message	Date
Ke Bao	ec2150b294	Fix kill process util (#666)	2024-07-19 21:43:11 -07:00
Lianmin Zheng	e1792cca24	Remove cached triton launcher (#656)	2024-07-18 23:28:40 -07:00
Mingyi	d774acad5c	Remove the dependency of rpyc (#646)	2024-07-18 02:13:54 -07:00
Liangsheng Yin	564a898ad9	Optimize mem indices mangement (#619)	2024-07-13 23:39:37 -07:00
Tommy Yang	b38687226a	Make sglang compat with vllm 0.5.1 (#598)	2024-07-08 23:44:22 -07:00
Liangsheng Yin	f25b76c02a	add `LogitsMetadata` (#604)	2024-07-08 17:46:55 -07:00
Liangsheng Yin	5304b4ef58	Add `--enable-p2p-check` option (#599)	2024-07-06 23:34:10 -07:00
Ying Sheng	dc1b8bcfaa	Format (#593)	2024-07-05 10:06:17 -07:00
Lianmin Zheng	63fbef9876	fix flashinfer & http log level	2024-07-03 23:19:33 -07:00
Lianmin Zheng	eb1ae6ae0c	Add sglang.bench_latency for offline benchmark (#564)	2024-06-25 03:38:04 -07:00
Ying Sheng	09593e9bc9	Multi-node Tensor Parallelism (#550) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-06-17 20:41:24 -07:00
Ying Sheng	fb9296f0ed	Higher priority for user input of max_prefill_tokens & format (#540)	2024-06-12 21:48:40 -07:00
ZhouXingg	111991fe23	Fix Regression: Disable p2p for 4090 (#531) Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>	2024-06-11 23:27:17 -07:00
Lianmin Zheng	91f93f141f	Crash the server when error or OOM happens (#514)	2024-06-07 19:22:34 -07:00
Lianmin Zheng	bf3e271fe0	Update vllm to v0.4.3 (#511) Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com> Co-authored-by: ZX <zx@lbx.dev>	2024-06-07 12:11:31 -07:00
Ying Sheng	0463f7fb52	Support data parallelism (static) (#480) Co-authored-by: Ying Sheng <ying.sheng@databricks.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2024-05-27 21:24:10 -07:00
Lianmin Zheng	09de730dee	Improve benchmark scripts & add more models (#484)	2024-05-27 14:13:26 -07:00
Lianmin Zheng	2cea6146d8	Improve logging & add logit cap (#471)	2024-05-24 03:48:53 -07:00
Lianmin Zheng	0fafc5606b	port fp8 mixtral (#460)	2024-05-21 11:46:35 -07:00
Lianmin Zheng	8dbdc018a3	Abort disconnected requests (#457)	2024-05-20 18:41:21 -07:00
LiviaSun	ec380dfd30	openai chat speculative execution (#250) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-05-18 22:23:53 -07:00
Lianmin Zheng	c05956e534	Simplify port allocation (#447)	2024-05-16 18:07:30 -07:00
Liangsheng Yin	690d162d97	Format code (#441)	2024-05-14 22:40:46 +08:00
Yuanhan Zhang	0992d85f92	support llava video (#426)	2024-05-13 16:57:00 -07:00
Lianmin Zheng	562b8857d8	Improve error handling (#433)	2024-05-12 20:49:04 -07:00
Lianmin Zheng	aee4f523cf	Fix logit processor bugs (#427)	2024-05-12 04:54:07 -07:00
Lianmin Zheng	7023f413c6	Clean up (#422)	2024-05-11 20:55:00 -07:00
Qubitium	33b242df30	Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380) Co-authored-by: ZX <zx@lbx.dev> Co-authored-by: ZhouXingg <165115237+ZhouXingg@users.noreply.github.com>	2024-05-11 16:37:49 -07:00
Liangsheng Yin	150d7020ed	Revert removing the unused imports (#385)	2024-04-23 22:36:33 +08:00
Liangsheng Yin	9acc6e3504	add `.isort.cfg` (#378)	2024-04-22 22:38:09 +08:00
Liangsheng Yin	62b3812b69	Time cost utils (#355)	2024-04-09 23:27:31 +08:00
Liangsheng Yin	dfb13ac455	Fix addr reuse in check_port (#253)	2024-03-03 17:09:16 +08:00
Lianmin Zheng	c51020cf0c	Fix the chat template for llava-v1.6-34b & format code (#177)	2024-02-11 05:50:13 -08:00
Lianmin Zheng	23f05005fd	Format code & move functions (#155)	2024-02-06 13:27:46 -08:00
Arcmoon	3ae78a09b3	Add gptq quantization model support (#141)	2024-02-06 11:35:04 -08:00
Liangsheng Yin	cd8c3ccd95	Fix `is_multimodal_model` judge (#132)	2024-02-03 11:48:01 +08:00
Christopher Chou	864425300f	Yi-VL Model (#112)	2024-02-01 08:33:22 -08:00
Lianmin Zheng	74b3bfaaf8	format code	2024-01-30 16:36:10 +00:00
Jay Zhou	4a634cf646	[Feature] Allow specifying all ports to use in advance (#116)	2024-01-30 08:34:51 -08:00
Lianmin Zheng	94e05770db	Fix after QWen support (#82)	2024-01-22 21:17:05 -08:00
Arcmoon	63e97e5e4c	Suppport qwen model and solve some problems (#75)	2024-01-22 20:14:51 -08:00
Christopher Chou	5b27a1dce4	Rename image_url to image_file (#15)	2024-01-16 15:41:30 -08:00
Lianmin Zheng	22085081bb	release initial code Co-authored-by: Ying Sheng <sqy1415@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com> Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2024-01-08 04:37:50 +00:00

43 Commits