Commit Graph

87 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Qubitium | ad1dd74673 | Fix flashinfer >= 0.0.3 compat (#282) | 2024-03-12 21:45:58 +08:00 |
| Qubitium | b2eb080501 | Fix Runtime missing some ServerArgs options (#281) | 2024-03-11 22:32:15 +08:00 |
| Lianmin Zheng | 4aa5dd2c5f | Update version to v0.1.13 (#280) | 2024-03-11 05:49:27 -07:00 |
| Lianmin Zheng | 13662fd533 | Fix RuntimeEndpoint (#279) | 2024-03-11 05:24:24 -07:00 |
| Alessio Dalla Piazza | d5ae2ebaa2 | Add Support for API Key Authentication (#230) | 2024-03-11 05:16:10 -07:00 |
| Liangsheng Yin | 1b35547927 | Organize server_args (#277) | 2024-03-11 20:06:52 +08:00 |
| Lianmin Zheng | faba293a0d | Improve gemma and documentations (#278) | 2024-03-11 04:43:39 -07:00 |
| Liangsheng Yin | 89885b31ef | Gemma Support (#256) | 2024-03-11 12:14:27 +08:00 |
| Geary.Z | 64fe311593 | replace skip_embed with input_embeds (#222) | 2024-03-10 19:04:52 -07:00 |
| Liangsheng Yin | a7ace9c88d | Fix qwen config (#261) | 2024-03-10 18:54:18 -07:00 |
| Liangsheng Yin | dfb13ac455 | Fix addr reuse in check_port (#253) | 2024-03-03 17:09:16 +08:00 |
| Cody Yu | 3c2c5869ad | Support outlines > 0.0.31 (#219) | 2024-02-24 15:06:17 +08:00 |
| Cody Yu | 4cb9aaedf3 | Fix logprobs with logprob_start_len (#193) | 2024-02-22 10:33:03 -08:00 |
| psych0v0yager | 9de9a46815 | Added the ability to Modify the Context Length (#210) | 2024-02-20 16:22:56 -08:00 |
| Cody Yu | 63ba630bbb | Refactor decoding logprob and add completion_tokens_wo_jump_forward (#189) | 2024-02-15 10:54:20 -08:00 |
| Lianmin Zheng | 6493256b7d | improve print | 2024-02-12 12:43:48 +00:00 |
| Lianmin Zheng | 06008bc295 | Fix server launch for jupyter notebook (#186) | 2024-02-12 04:43:14 -08:00 |
| Lianmin Zheng | c51020cf0c | Fix the chat template for llava-v1.6-34b & format code (#177) | 2024-02-11 05:50:13 -08:00 |
| Cody Yu | 50afed4eaa | Support extra field regex in OpenAI API (#172) | 2024-02-10 17:21:33 -08:00 |
| Cody Yu | 4d303c4fa3 | Fix token usage with jump forward (#174) | 2024-02-09 20:06:15 -08:00 |
| Liangsheng Yin | 37b42297f8 | import outlines (#168) | 2024-02-09 10:13:02 +08:00 |
| Cody Yu | cba5027332 | Fix BaseCache metric (#170) | 2024-02-08 17:23:09 -08:00 |
| Liangsheng Yin | b1a3a454ee | add --disable-disk-cache (#160) (Co-authored-by: Ja1Zhou <50169346+Ja1Zhou@users.noreply.github.com>) | 2024-02-08 00:50:12 +08:00 |
| Cody Yu | 26c3494152 | [Submodule] Change FlashInfer to import (#156) | 2024-02-06 19:28:29 -08:00 |
| Lianmin Zheng | 23f05005fd | Format code & move functions (#155) | 2024-02-06 13:27:46 -08:00 |
| Cody Yu | a7334aeea1 | Support decode token logprobs (#130) | 2024-02-06 12:24:55 -08:00 |
| Arcmoon | 3ae78a09b3 | Add gptq quantization model support (#141) | 2024-02-06 11:35:04 -08:00 |
| Cody Yu | ccbe1e67d8 | Temporary fix OpenAI API for Pydantic v1/v2 (#153) | 2024-02-06 11:34:15 -08:00 |
| Cody Yu | 322421fae3 | Add warmup to SRT server (#146) | 2024-02-05 14:21:16 -08:00 |
| Liangsheng Yin | 26f0bedc8f | jump-forward rename (#144) | 2024-02-05 16:50:37 +08:00 |
| Liangsheng Yin | bb3a3b6675 | Support Faster JSON decoding for llava (#137): when sending fast-forwarded reqs to model_rpc, re-calculate `pad_input_ids` | 2024-02-03 23:32:05 +08:00 |
| Ying Sheng | 45d6592d40 | Fix no-cache mode (#136) | 2024-02-03 04:59:06 -08:00 |
| Ying Sheng | e095b16236 | Add max_prefill_num_token into server arguments (#133) | 2024-02-03 02:35:54 -08:00 |
| Liangsheng Yin | cd8c3ccd95 | Fix is_multimodal_model judge (#132) | 2024-02-03 11:48:01 +08:00 |
| Christopher Chou | 864425300f | Yi-VL Model (#112) | 2024-02-01 08:33:22 -08:00 |
| Lianmin Zheng | c7af9f7393 | Fix a bug in llava-hd | 2024-01-31 18:52:15 +00:00 |
| Lianmin Zheng | ad82bac6f5 | Fix model loading & format code (#125) | 2024-01-30 23:49:52 -08:00 |
| Cody Yu | 71b54eea7d | Add cache metrics (#119) | 2024-01-30 22:13:14 -08:00 |
| Lianmin Zheng | 74b3bfaaf8 | format code | 2024-01-30 16:36:10 +00:00 |
| Jay Zhou | 4a634cf646 | [Feature] Allow specifying all ports to use in advance (#116) | 2024-01-30 08:34:51 -08:00 |
| Lianmin Zheng | 873d0e8537 | Ignore detokenization error | 2024-01-30 14:52:06 +00:00 |
| Keith Stevens | 1d0fbe8e43 | [Feature] Adds basic support for image content in OpenAI chat routes (#113) | 2024-01-30 06:12:33 -08:00 |
| Lianmin Zheng | 97aa9b3284 | Improve docs & Add JSON decode example (#121) | 2024-01-30 05:45:27 -08:00 |
| Lianmin Zheng | 0617528632 | Update quick start examples (#120) | 2024-01-30 04:29:32 -08:00 |
| Lianmin Zheng | 4ea92f8307 | Format code (#118) | 2024-01-29 17:08:12 -08:00 |
| Junyang Lin | 6b0af2853c | Add qwen2 (#114) | 2024-01-29 17:06:02 -08:00 |
| Lianmin Zheng | 6f560c761b | Improve the control of streaming and improve the first token latency in streaming (#117) | 2024-01-29 17:05:42 -08:00 |
| Cody Yu | cd6872334e | Fix Mistral model loading (#108) (Co-authored-by: johndun <dunavent.jm@gmail.com>) | 2024-01-26 09:38:43 -08:00 |
| Liangsheng Yin | 81561f8e2d | Flush Cache API (#103) | 2024-01-25 21:32:59 -08:00 |
| Cody Yu | 3a581e9949 | Dynamic model class loading (#101) | 2024-01-25 15:29:07 -08:00 |