sglang

Author	SHA1	Message	Date
Liangsheng Yin	679ebcbbdc	Deepseek v2 support (#693)	2024-07-26 17:10:07 -07:00
Liangsheng Yin	268684439b	Use min new token ratio at start (#701)	2024-07-23 11:52:50 -07:00
Ying Sheng	c3f1aac811	Tune params (#696)	2024-07-22 03:19:24 -07:00
Liangsheng Yin	caaad53b52	Support gpt-bigcode model class (#681)	2024-07-20 18:34:37 -07:00
Ying Sheng	06487f126e	refactor model loader: initial refactor (#664)	2024-07-20 02:18:22 -07:00
Ying Sheng	51fda1439f	Update Readme (#660) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-07-19 09:54:01 -07:00
zhyncs	ac971ff633	perf: reduce ttft and itl with stream_interval 1 (#658)	2024-07-19 09:14:22 -07:00
Mingyi	d774acad5c	Remove the dependency of rpyc (#646)	2024-07-18 02:13:54 -07:00
Ying Sheng	6a2941f4d0	Improve tensor parallel performance (#625) Co-authored-by: Mingyi <wisclmy0611@gmail.com>	2024-07-15 07:10:51 -07:00
Lianmin Zheng	665815969a	Enable cuda graph by default (#612)	2024-07-13 05:29:46 -07:00
Lianmin Zheng	af4e7910e7	Clean up the usage of flashinfer (#610)	2024-07-12 13:00:03 -07:00
Liangsheng Yin	5304b4ef58	Add `--enable-p2p-check` option (#599)	2024-07-06 23:34:10 -07:00
Ying Sheng	dc1b8bcfaa	Format (#593)	2024-07-05 10:06:17 -07:00
Lianmin Zheng	63fbef9876	fix flashinfer & http log level	2024-07-03 23:19:33 -07:00
Lianmin Zheng	c7709d3abe	Update install commands (#583)	2024-07-03 02:10:59 -07:00
Ying Sheng	9380f50ff9	Turn on flashinfer by default (#578)	2024-07-02 02:25:07 -07:00
Lianmin Zheng	badf3fa020	Expose dtype argument (#569)	2024-06-27 23:30:39 -07:00
Lianmin Zheng	2187f36237	Add a new arguments log_level_http to control the HTTP logging (#563)	2024-06-25 01:16:20 -07:00
Ying Sheng	09593e9bc9	Multi-node Tensor Parallelism (#550) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-06-17 20:41:24 -07:00
Lianmin Zheng	159cc741e4	Make the server random by default (#493)	2024-05-31 23:33:34 -07:00
Ying Sheng	83525a1df2	Revert "Make the server random by default" (#492)	2024-05-31 12:00:21 -07:00
Lianmin Zheng	80a33ce8b0	Do not set the default value of global random seed (#488)	2024-05-29 18:41:18 -04:00
Ying Sheng	0463f7fb52	Support data parallelism (static) (#480) Co-authored-by: Ying Sheng <ying.sheng@databricks.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2024-05-27 21:24:10 -07:00
Lianmin Zheng	55c1643627	Improve benchmark scripts & rename some scripts (#477)	2024-05-26 12:51:45 -07:00
Lianmin Zheng	0fafc5606b	port fp8 mixtral (#460)	2024-05-21 11:46:35 -07:00
Yuanhan Zhang	0992d85f92	support llava video (#426)	2024-05-13 16:57:00 -07:00
Liangsheng Yin	39191c8515	Cache optimizations (#418)	2024-05-13 12:47:13 +08:00
Lianmin Zheng	3fc97f6709	Move openai api server into a separate file (#429)	2024-05-12 06:41:32 -07:00
Lianmin Zheng	aee4f523cf	Fix logit processor bugs (#427)	2024-05-12 04:54:07 -07:00
Lianmin Zheng	7023f413c6	Clean up (#422)	2024-05-11 20:55:00 -07:00
Liangsheng Yin	62b3812b69	Time cost utils (#355)	2024-04-09 23:27:31 +08:00
Alessio Dalla Piazza	d5ae2ebaa2	Add Support for API Key Authentication (#230)	2024-03-11 05:16:10 -07:00
Liangsheng Yin	1b35547927	Organize `server_args` (#277)	2024-03-11 20:06:52 +08:00
Lianmin Zheng	faba293a0d	Improve gemma and documentations (#278)	2024-03-11 04:43:39 -07:00
Liangsheng Yin	89885b31ef	Gemma Support (#256)	2024-03-11 12:14:27 +08:00
psych0v0yager	9de9a46815	Added the ability to Modify the Context Length (#210)	2024-02-20 16:22:56 -08:00
Liangsheng Yin	b1a3a454ee	add `--disable-disk-cache` (#160) Co-authored-by: Ja1Zhou <50169346+Ja1Zhou@users.noreply.github.com>	2024-02-08 00:50:12 +08:00
Lianmin Zheng	23f05005fd	Format code & move functions (#155)	2024-02-06 13:27:46 -08:00
Liangsheng Yin	26f0bedc8f	jump-forward rename (#144)	2024-02-05 16:50:37 +08:00
Ying Sheng	e095b16236	Add max_prefill_num_token into server arguments (#133)	2024-02-03 02:35:54 -08:00
Jay Zhou	4a634cf646	[Feature] Allow specifying all ports to use in advance (#116)	2024-01-30 08:34:51 -08:00
Lianmin Zheng	6f560c761b	Improve the control of streaming and improve the first token latency in streaming (#117)	2024-01-29 17:05:42 -08:00
Liangsheng Yin	01ee0fbc05	fast regex decode Auto-detect constant str path in regex FSM, then extend instead.	2024-01-25 01:16:25 +08:00
Liangsheng Yin	40ab1f0129	Fix the possible bug of decode out of memory (#36)	2024-01-19 11:01:15 -08:00
Cody Yu	23471f9aa3	Support v1/chat/completions (#50)	2024-01-18 23:43:09 -08:00
Lianmin Zheng	22ec7bc2a1	Expose more arguments to control the scheduling policy (#32)	2024-01-17 18:37:02 -08:00
Lianmin Zheng	8024fc5eec	Fix streaming (#30)	2024-01-17 16:38:20 -08:00
Lianmin Zheng	f9d723816a	Teak mem fraction (#20)	2024-01-17 04:43:17 -08:00
Lianmin Zheng	bf51ddc6e5	Improve docs & Rename Gemini -> VertexAI (#19)	2024-01-17 02:54:41 -08:00
Lianmin Zheng	70359bf31a	Update benchmark scripts (#8)	2024-01-15 16:12:57 -08:00

53 Commits