Commit Graph

166 Commits

Author SHA1 Message Date
Lianmin Zheng
b548801ddb Update docs (#1839) 2024-10-30 02:49:08 -07:00
Byron Hsu
680cad2023 fix get_memory_pool_size deadlock for DP (#1830) 2024-10-28 23:07:14 -07:00
Byron Hsu
6fcd6d7d6d Support token ids in engine.generate (#1820) 2024-10-27 14:02:34 -07:00
Lianmin Zheng
eaade87a42 Fix unit tests (#1817) 2024-10-27 03:04:54 -07:00
Lianmin Zheng
86fc0d79d0 Add a watch dog thread (#1816) 2024-10-27 02:00:50 -07:00
Ying Sheng
2fce449b1c [API] add get memory pool size (#1760) 2024-10-23 07:02:29 +00:00
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
Lianmin Zheng
769bf11c05 Fix the race condition in overlap mode (#1712) 2024-10-19 06:50:56 -07:00
Lianmin Zheng
dd3809fad8 Fix engine unit test (#1701) 2024-10-17 09:53:32 -07:00
Lianmin Zheng
7feba41584 Fix failed ci tests on long prompts; Better error messages for embedding models (#1700) 2024-10-17 09:23:29 -07:00
Michael Feil
e5db40dcbc ORJson. Faster Json serialization (#1694) 2024-10-17 08:03:08 -07:00
Lianmin Zheng
02f7f3e488 Update the transformers version in CI (#1690) 2024-10-16 19:03:55 -07:00
Zeng Zhongchao
2782132be8 Add date to logging messages (#1623) (#1679) 2024-10-16 18:54:55 -07:00
Michael Feil
b0facb3316 add orjson for jsonresponse (#1688) 2024-10-16 18:14:30 -07:00
Lianmin Zheng
dbec2f1847 Launch a thread to overlap CPU and GPU (#1687) 2024-10-16 11:20:17 -07:00
Lianmin Zheng
9116b2896f Add a new event loop (#1677) 2024-10-16 01:33:20 -07:00
Patrick Yi
31fad29ab0 Add get_tokenizer function for Engine class (#1653) 2024-10-12 19:39:35 -07:00
Byron Hsu
862cd265e5 [engine] support async and streaming (#1614) 2024-10-11 15:26:25 -07:00
Lianmin Zheng
23cc66f7b6 Add back data parallelism (#1635) 2024-10-11 07:22:48 -07:00
科英
bbd72bfc86 Add the ability to enable and disable the Profiler via HTTP API. (#1626) 2024-10-11 02:34:25 -07:00
Byron Hsu
e8613df071 [Engine] Fix generate hanging issue after the first call (#1606) 2024-10-08 04:26:56 +00:00
Byron Hsu
565b05f02f Use atexit hook to implicitly shutdown Runtime (#1595) 2024-10-07 05:18:45 +00:00
Byron Hsu
551a3a9d38 Provide an offline engine API (#1567) 2024-10-06 20:27:03 -07:00
Lianmin Zheng
114bbc8651 Use ipc instead of tcp in zmq (#1566) 2024-10-04 00:45:52 -07:00
Lianmin Zheng
32eb6e96f2 Organize sampling batch info better (#1562) 2024-10-03 18:29:49 -07:00
Lianmin Zheng
63ba2f8d7b Clean up batch data structures: Introducing ModelWorkerBatch (#1544) 2024-09-30 06:41:49 -07:00
Lianmin Zheng
048685430d Improve process creation (#1534) 2024-09-29 02:36:12 -07:00
Lianmin Zheng
4e4459b91f Multiple minor fixes (#1530) 2024-09-28 14:43:35 -07:00
Ying Sheng
9aa6553d2a [Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525) 2024-09-27 23:32:11 -07:00
HAI
3a6e04185b [Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1420) 2024-09-17 07:43:52 +00:00
Lianmin Zheng
27b557aea7 Clean up model loader (#1440) 2024-09-16 18:16:27 -07:00
Ying Sheng
712216928f [Feature] Initial support for multi-LoRA serving (#1307) 2024-09-12 16:46:14 -07:00
Lianmin Zheng
fec185ce0c Refactor attention backend (#1381) 2024-09-11 11:44:26 -07:00
Lianmin Zheng
c03cece42f Improve error reporting during server launch (#1390) 2024-09-11 04:50:04 -07:00
Lianmin Zheng
46094e0c1b Deprecate --disable-flashinfer and introduce --attention-backend (#1380) 2024-09-10 17:11:16 -07:00
josephrocca
dff2860a69 Fix CORS compatibility with OpenAI, vLLM, TGI, LMDeploy (#1373) 2024-09-11 02:35:03 +10:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng
e4d68afcf0 [Minor] Many cleanup (#1357) 2024-09-09 04:14:11 -07:00
Kai-Hsun Chen
0836055324 [Chore] Rename model_overide_args to model_override_args (#1284) 2024-09-01 03:14:56 -07:00
Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng
0a97d7962d [Fix] Fix OOM in llava base class (#1249) 2024-08-28 08:45:49 -07:00
Lianmin Zheng
bf53bf5142 [Fix] Fix llava on multi images (#1247) 2024-08-28 06:33:05 -07:00
Yineng Zhang
198974cd1a feat: support sm75 with FlashInfer v0.1.6 (#1233) 2024-08-28 18:39:12 +10:00
caiyueliang
2f1d92834f [FEAT] Support batches cancel (#1222) 2024-08-26 23:28:26 +00:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng
902278008a [Minor] Improve the function organization in TokenizerManager & improve loggers (#1208) 2024-08-25 14:46:34 -07:00
Chayenne
30b4f771b0 Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186) 2024-08-25 10:29:12 -07:00
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Ying Sheng
1cb4da5c5f [Fix] the issue of random order when input is a list (#1199) 2024-08-24 21:43:03 -07:00
Lianmin Zheng
5623826f73 [Minor] Improve logging and rename the health check endpoint name (#1180) 2024-08-21 19:24:36 -07:00
Lianmin Zheng
bea2bb9eea Improve multi-node stability (#1171) 2024-08-20 22:35:05 -07:00
Shan Yu
cd10654e7e [Feat] Support update weights without restart server (#1157) 2024-08-20 13:48:24 -07:00
Lucien
6242c399ab Generate 1 token to verify the health of the inference service in /health (#1154) 2024-08-21 03:14:34 +10:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
yichuan~
b997a18d74 [Feat]Add support for optional start len of logprobs (#1035) 2024-08-18 23:45:41 -07:00
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Lianmin Zheng
cdc8d60752 Improve the code style: more comments and remove useless packages (#1139) 2024-08-17 14:37:52 -07:00