sglang

Author	SHA1	Message	Date
Lianmin Zheng	eb1ae6ae0c	Add sglang.bench_latency for offline benchmark (#564 )	2024-06-25 03:38:04 -07:00
Lianmin Zheng	2187f36237	Add a new arguments log_level_http to control the HTTP logging (#563 )	2024-06-25 01:16:20 -07:00
Lianmin Zheng	9465b668b9	Allow running with vllm==0.4.3 (#561 )	2024-06-24 15:24:21 -07:00
Lianmin Zheng	1fa15099d8	Add LlamaForClassification (#559 )	2024-06-22 00:49:31 -07:00
Lianmin Zheng	303ef8883e	Clean up logits processor (#558 )	2024-06-22 00:25:24 -07:00
Lianmin Zheng	e94e60d6fb	make flashinfer workspace larger	2024-06-21 17:32:36 -07:00
Lianmin Zheng	d2f8bfb2e1	Follow-up fixes for flashinfer 0.0.5 (#556 )	2024-06-20 23:19:52 -07:00
Lianmin Zheng	b7e2f800ac	Update flashinfer to 0.0.5 (#554 )	2024-06-20 20:29:06 -07:00
Ying Sheng	09593e9bc9	Multi-node Tensor Parallelism (#550 ) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-06-17 20:41:24 -07:00
Lianmin Zheng	53a7ebd89a	Update fused_moe (#553 )	2024-06-17 09:47:58 -07:00
Liangsheng Yin	ad5f04d6ce	Fix the Jump-Forward with Chinese (#551 )	2024-06-16 21:45:04 +08:00
Qubitium-modelcloud	bbec01c9aa	Fix tp worker only checking req[0] for stream (#546 )	2024-06-14 22:56:10 -07:00
Ying Sheng	fb9296f0ed	Higher priority for user input of max_prefill_tokens & format (#540 )	2024-06-12 21:48:40 -07:00
Ying Sheng	1374334d38	Fix dependency & crash issues (#539 )	2024-06-12 21:23:19 -07:00
Lianmin Zheng	94aead9e8d	Fix dependency (#538 )	2024-06-12 13:17:35 -07:00
Liangsheng Yin	9c902b1954	Decode Incrementally (#517 )	2024-06-11 23:39:12 -07:00
ZhouXingg	111991fe23	Fix Regression: Disable p2p for 4090 (#531 ) Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>	2024-06-11 23:27:17 -07:00
Qubitium	a8c787d2b3	Add ChatGLM Model Support (#516 ) Co-authored-by: ZX <zx@lbx.dev>	2024-06-11 16:39:52 -07:00
Fabian Preiß	5f283991e9	[Minor] Correct Optional type hints in api (#526 )	2024-06-11 16:37:27 -07:00
Fabian Preiß	542bc733d6	Fix missing numpy dependency in pyproject.toml (#524 )	2024-06-10 12:13:50 -07:00
Lianmin Zheng	f6dbd24043	Improve doc strings (#518 )	2024-06-08 02:39:32 -07:00
Lianmin Zheng	e8a2327d52	Update version to 0.1.17 (#515 )	2024-06-07 19:49:18 -07:00
Lianmin Zheng	91f93f141f	Crash the server when error or OOM happens (#514 )	2024-06-07 19:22:34 -07:00
Qubitium	f70f72586a	Fix rid state map leak + Refractor .finished (#505 ) Co-authored-by: ZX <zx@lbx.dev>	2024-06-07 13:20:40 -07:00
Lianmin Zheng	c0ae70c8ed	Improve logging & fix litellm dependency. (#512 )	2024-06-07 13:10:32 -07:00
胡译文	87260b7bfd	Litellm Backend (#502 )	2024-06-07 12:24:28 -07:00
Amos You	651a23ee7c	remove redundant pad_input_ids function (#500 )	2024-06-07 12:23:29 -07:00
Lianmin Zheng	bf3e271fe0	Update vllm to v0.4.3 (#511 ) Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com> Co-authored-by: ZX <zx@lbx.dev>	2024-06-07 12:11:31 -07:00
Lianmin Zheng	3bc01ac137	[Minor] improve code style	2024-06-03 18:11:34 -07:00
Lianmin Zheng	159cc741e4	Make the server random by default (#493 )	2024-05-31 23:33:34 -07:00
Ying Sheng	83525a1df2	Revert "Make the server random by default" (#492 )	2024-05-31 12:00:21 -07:00
Lianmin Zheng	80a33ce8b0	Do not set the default value of global random seed (#488 )	2024-05-29 18:41:18 -04:00
Lianmin Zheng	1a57e41679	do not launch workers in parallel	2024-05-27 23:00:16 -07:00
Ying Sheng	0463f7fb52	Support data parallelism (static) (#480 ) Co-authored-by: Ying Sheng <ying.sheng@databricks.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2024-05-27 21:24:10 -07:00
Lianmin Zheng	565d727409	improve logging & fix vllm version	2024-05-27 15:04:23 -07:00
Lianmin Zheng	09de730dee	Improve benchmark scripts & add more models (#484 )	2024-05-27 14:13:26 -07:00
Lianmin Zheng	55c1643627	Improve benchmark scripts & rename some scripts (#477 )	2024-05-26 12:51:45 -07:00
Li Bo	2b605ab1d7	[Feat/Fix] Refactoring Llava models into single file (#475 )	2024-05-26 12:29:51 -07:00
Liangsheng Yin	f06e90c2cf	Optimize retract (#440 )	2024-05-26 00:07:26 +08:00
Lianmin Zheng	2cea6146d8	Improve logging & add logit cap (#471 )	2024-05-24 03:48:53 -07:00
Lianmin Zheng	0fafc5606b	port fp8 mixtral (#460 )	2024-05-21 11:46:35 -07:00
Lianmin Zheng	19d2135cb8	Use model loader from vllm (#459 )	2024-05-21 09:13:37 -07:00
Lianmin Zheng	ced77c6626	Rename api_num_spec_tokens -> num_api_spec_tokens (#458 )	2024-05-20 18:44:23 -07:00
Lianmin Zheng	8dbdc018a3	Abort disconnected requests (#457 )	2024-05-20 18:41:21 -07:00
Ying Sheng	3e684be7a3	Fix openai speculative execution (#456 )	2024-05-20 17:01:13 -07:00
LiviaSun	ec380dfd30	openai chat speculative execution (#250 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-05-18 22:23:53 -07:00
Liangsheng Yin	5b647543c1	Fix the broken `--disable-radix-cache` (#451 )	2024-05-19 13:00:12 +08:00
Lianmin Zheng	8210ec60f4	Improve error handling & abort disconnected requests (#449 )	2024-05-17 05:49:31 -07:00
Ying Sheng	5be9eb8a8c	Add PUT for generate api (#448 )	2024-05-17 02:35:15 -07:00
Lianmin Zheng	c05956e534	Simplify port allocation (#447 )	2024-05-16 18:07:30 -07:00

234 Commits