sglang

Author	SHA1	Message	Date
Lianmin Zheng	6f560c761b	Improve the control of streaming and improve the first token latency in streaming (#117)	2024-01-29 17:05:42 -08:00
Cody Yu	cd6872334e	Fix Mistral model loading (#108) Co-authored-by: johndun <dunavent.jm@gmail.com>	2024-01-26 09:38:43 -08:00
Liangsheng Yin	81561f8e2d	Flush Cache API (#103)	2024-01-25 21:32:59 -08:00
Cody Yu	3a581e9949	Dynamic model class loading (#101)	2024-01-25 15:29:07 -08:00
shiyi.c_98	0147f940dd	fix batch error for llava-hd (#98)	2024-01-25 07:56:25 -08:00
parasol-aser	23950056f0	support speculative execution for openai API (#48) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-01-25 01:57:06 -08:00
Lianmin Zheng	93414c8238	Add a link to HF paper page	2024-01-24 22:25:33 -08:00
Lianmin Zheng	ed7c7eca0e	Update README.md	2024-01-24 16:52:21 -08:00
isaac-vidas	0c457bae8f	Handle grayscale images in expand2square (#97)	2024-01-24 16:23:11 -08:00
Haotian Liu	d3fc86a43e	Improve Chinese character streaming when the last char is half Chinese word. (#95)	2024-01-24 12:23:27 -08:00
Liangsheng Yin	01ee0fbc05	fast regex decode: Auto-detect constant str path in regex FSM, then extend instead.	2024-01-25 01:16:25 +08:00
Lianmin Zheng	711d343530	add a batch llava example	2024-01-24 11:52:10 +00:00
Lianmin Zheng	6dceab4d17	bump version to 0.1.9	2024-01-24 11:37:25 +00:00
Lianmin Zheng	c70b3cfa9e	Bump the version to v0.1.8 (#93)	2024-01-24 03:33:34 -08:00
Ying Sheng	489796c7ea	minor performance fix	2024-01-24 10:45:44 +00:00
Lianmin Zheng	fa7a696d04	Fix max_new_tokens for limited memory	2024-01-24 10:44:32 +00:00
Lianmin Zheng	bef0b35902	Fix llava & Fix multiprocessing	2024-01-24 10:35:31 +00:00
shiyi.c_98	c6576e820c	Llava-hd Support (#92) Co-authored-by: Haotian Liu <liuhaotian.cn@gmail.com>	2024-01-24 01:51:21 -08:00
Lianmin Zheng	99258181c6	set start method to spawn	2024-01-24 08:55:38 +00:00
isaac-vidas	3de54a1b55	Add health endpoint to SGLang runtime server (#90)	2024-01-23 19:00:28 -08:00
Lianmin Zheng	7358fa64f7	Fix a bug in runtime backend	2024-01-23 22:10:17 +00:00
Lianmin Zheng	9a16fea012	Return logprob for choices (#87)	2024-01-23 05:07:30 -08:00
Lianmin Zheng	9e037c822c	Update README.md	2024-01-23 03:43:19 -08:00
0xWe11es.eth	9076386d90	Fix SRT endpoint api json syntax (#84)	2024-01-23 00:25:26 -08:00
Lianmin Zheng	959c4174b2	Fix the chat template for QWen (#83)	2024-01-22 21:46:47 -08:00
Lianmin Zheng	94e05770db	Fix after QWen support (#82)	2024-01-22 21:17:05 -08:00
Arcmoon	63e97e5e4c	Support qwen model and solve some problems (#75)	2024-01-22 20:14:51 -08:00
isaac-vidas	e08bca2840	Support load fine-tuned LLaVA model (#80)	2024-01-22 18:15:48 -08:00
Lianmin Zheng	cd3ccb2ed7	Add a note about triton version for older GPUs (#72)	2024-01-21 16:51:45 -08:00
Ying Sheng	3f5c2f4c4a	Add an async example (#37)	2024-01-21 15:17:30 -08:00
Lianmin Zheng	007eeb4eb9	Fix the error message and dependency of openai backend (#71)	2024-01-21 14:56:25 -08:00
Ying Sheng	e8f2b155fe	Update README.md	2024-01-21 02:45:58 -08:00
Lianmin Zheng	723f042163	release v0.1.7 & fix bugs	2024-01-21 10:31:02 +00:00
Lianmin Zheng	585eababa1	Improve error message of openai	2024-01-21 10:13:45 +00:00
Lianmin Zheng	cc3ada983f	Bump version to 0.1.6 (#68)	2024-01-21 01:45:02 -08:00
Lianmin Zheng	a837166e6f	Fix select and normalized logprobs (#67)	2024-01-21 01:39:23 -08:00
Lianmin Zheng	11f3cca64f	Fix select (#64)	2024-01-20 23:20:35 -08:00
Liangsheng Yin	ca13f3b8c5	Disk FSM cache and adjust code. (#63)	2024-01-20 21:26:11 -08:00
Ikko Eltociear Ashimine	0b2efc2adc	Update README.md (#58)	2024-01-19 21:00:29 -08:00
Lianmin Zheng	f30abd090a	Improve error message & Add vicuna template (#57)	2024-01-19 17:03:33 -08:00
Liangsheng Yin	40ab1f0129	Fix the possible bug of decode out of memory (#36)	2024-01-19 11:01:15 -08:00
Lianmin Zheng	199e82a15d	Format code & Improve readme (#52)	2024-01-18 23:51:19 -08:00
Cody Yu	23471f9aa3	Support v1/chat/completions (#50)	2024-01-18 23:43:09 -08:00
Cody Yu	61d4c93962	Support stream=True in v1/completions (#49)	2024-01-18 17:00:56 -08:00
Lianmin Zheng	98a3e8ef78	Add a llava example (#47)	2024-01-18 13:46:38 -08:00
Lianmin Zheng	2b079f8931	Increase interpreter parallelism (#46)	2024-01-18 13:30:10 -08:00
Lianmin Zheng	05b4c398df	Document sampling parameters (#45)	2024-01-18 11:49:27 -08:00
Cody Yu	dafafe5b11	Use HTTP link in 3rdparty module (#42)	2024-01-18 11:18:22 -08:00
Lianmin Zheng	b240f75100	Add a parallel sampling case (#34)	2024-01-18 06:29:43 +00:00
Lianmin Zheng	501f944445	Bump version to 0.1.5 (#33)	2024-01-17 21:14:31 -08:00

76 Commits