Author | Commit | Message | Date
Cody Yu | 26c3494152 | [Submodule] Change FlashInfer to import (#156) | 2024-02-06 19:28:29 -08:00
Lianmin Zheng | 23f05005fd | Format code & move functions (#155) | 2024-02-06 13:27:46 -08:00
Cody Yu | a7334aeea1 | Support decode token logprobs (#130) | 2024-02-06 12:24:55 -08:00
Arcmoon | 3ae78a09b3 | Add gptq quantization model support (#141) | 2024-02-06 11:35:04 -08:00
Cody Yu | ccbe1e67d8 | Temporary fix OpenAI API for Pydantic v1/v2 (#153) | 2024-02-06 11:34:15 -08:00
LiviaSun | e2bf732bc3 | add openai error handler with retry and logger (#148) | 2024-02-05 20:38:41 -08:00
Cody Yu | 322421fae3 | Add warmup to SRT server (#146) | 2024-02-05 14:21:16 -08:00
Liangsheng Yin | 26f0bedc8f | jump-forward rename (#144) | 2024-02-05 16:50:37 +08:00
Yaya Sy | 82fa69b3cc | fix undfined variable (#142) | 2024-02-04 14:27:52 -08:00
Liangsheng Yin | bb3a3b6675 | Support Faster JSON decoding for llava (#137) | 2024-02-03 23:32:05 +08:00
    When sending fast-forwarded reqs to model_rpc, re-calculate `pad_input_ids`
Ying Sheng | 45d6592d40 | Fix no-cache mode (#136) | 2024-02-03 04:59:06 -08:00
Ying Sheng | f6bfe3aaff | Release 0.1.11 (#134) | 2024-02-03 02:50:13 -08:00
Ying Sheng | e095b16236 | Add max_prefill_num_token into server arguments (#133) | 2024-02-03 02:35:54 -08:00
Ying Sheng | 67be11c790 | fix bug of race condition in copy() | 2024-02-03 01:38:00 -08:00
Liangsheng Yin | cd8c3ccd95 | Fix is_multimodal_model judge (#132) | 2024-02-03 11:48:01 +08:00
Christopher Chou | 864425300f | Yi-VL Model (#112) | 2024-02-01 08:33:22 -08:00
Lianmin Zheng | c7af9f7393 | Fix a bug in llava-hd | 2024-01-31 18:52:15 +00:00
Lianmin Zheng | ad82bac6f5 | Fix model loading & format code (#125) | 2024-01-30 23:49:52 -08:00
Cody Yu | 71b54eea7d | Add cache metrics (#119) | 2024-01-30 22:13:14 -08:00
Lianmin Zheng | 74b3bfaaf8 | format code | 2024-01-30 16:36:10 +00:00
Jay Zhou | 4a634cf646 | [Feature] Allow specifying all ports to use in advance (#116) | 2024-01-30 08:34:51 -08:00
Lianmin Zheng | a49dc52bfa | release v0.1.10 | 2024-01-30 15:37:52 +00:00
Lianmin Zheng | 873d0e8537 | Ignore detokenization error | 2024-01-30 14:52:06 +00:00
Keith Stevens | 1d0fbe8e43 | [Feature] Adds basic support for image content in OpenAI chat routes (#113) | 2024-01-30 06:12:33 -08:00
Lianmin Zheng | 97aa9b3284 | Improve docs & Add JSON decode example (#121) | 2024-01-30 05:45:27 -08:00
Lianmin Zheng | 0617528632 | Update quick start examples (#120) | 2024-01-30 04:29:32 -08:00
Lianmin Zheng | 4ea92f8307 | Format code (#118) | 2024-01-29 17:08:12 -08:00
Junyang Lin | 6b0af2853c | Add qwen2 (#114) | 2024-01-29 17:06:02 -08:00
Lianmin Zheng | 6f560c761b | Improve the control of streaming and improve the first token latency in streaming (#117) | 2024-01-29 17:05:42 -08:00
Cody Yu | cd6872334e | Fix Mistral model loading (#108) | 2024-01-26 09:38:43 -08:00
    Co-authored-by: johndun <dunavent.jm@gmail.com>
Liangsheng Yin | 81561f8e2d | Flush Cache API (#103) | 2024-01-25 21:32:59 -08:00
Cody Yu | 3a581e9949 | Dynamic model class loading (#101) | 2024-01-25 15:29:07 -08:00
shiyi.c_98 | 0147f940dd | fix batch error for llava-hd (#98) | 2024-01-25 07:56:25 -08:00
parasol-aser | 23950056f0 | support speculative execution for openai API (#48) | 2024-01-25 01:57:06 -08:00
    Co-authored-by: Ying Sheng <sqy1415@gmail.com>
isaac-vidas | 0c457bae8f | Handle grayscale images in expand2square (#97) | 2024-01-24 16:23:11 -08:00
Haotian Liu | d3fc86a43e | Improve Chinese character streaming when the last char is half Chinese word. (#95) | 2024-01-24 12:23:27 -08:00
Liangsheng Yin | 01ee0fbc05 | fast regex decode | 2024-01-25 01:16:25 +08:00
    Auto-detect constant str path in regex FSM, then extend instead.
Lianmin Zheng | 6dceab4d17 | bump version to 0.1.9 | 2024-01-24 11:37:25 +00:00
Lianmin Zheng | c70b3cfa9e | Bump the version to v0.1.8 (#93) | 2024-01-24 03:33:34 -08:00
Ying Sheng | 489796c7ea | minor performance fix | 2024-01-24 10:45:44 +00:00
Lianmin Zheng | fa7a696d04 | Fix max_new_tokens for limited memory | 2024-01-24 10:44:32 +00:00
Lianmin Zheng | bef0b35902 | Fix llava & Fix multiprocessing | 2024-01-24 10:35:31 +00:00
shiyi.c_98 | c6576e820c | Llava-hd Support (#92) | 2024-01-24 01:51:21 -08:00
    Co-authored-by: Haotian Liu <liuhaotian.cn@gmail.com>
Lianmin Zheng | 99258181c6 | set start method to spawn | 2024-01-24 08:55:38 +00:00
isaac-vidas | 3de54a1b55 | Add health endpoint to SGLang runtime server (#90) | 2024-01-23 19:00:28 -08:00
Lianmin Zheng | 7358fa64f7 | Fix a bug in runtime backend | 2024-01-23 22:10:17 +00:00
Lianmin Zheng | 9a16fea012 | Return logprob for choices (#87) | 2024-01-23 05:07:30 -08:00
Lianmin Zheng | 959c4174b2 | Fix the chat template for QWen (#83) | 2024-01-22 21:46:47 -08:00
Lianmin Zheng | 94e05770db | Fix after QWen support (#82) | 2024-01-22 21:17:05 -08:00
Arcmoon | 63e97e5e4c | Suppport qwen model and solve some problems (#75) | 2024-01-22 20:14:51 -08:00