sglang

Author	SHA1	Message	Date
Chayenne	7d1485d376	Add get weights by parameter name for llama (#2266)	2024-11-29 23:36:38 -08:00
Chayenne	7d5d1d3d29	udate weights from disk (#2265)	2024-11-30 01:17:00 +00:00
Lianmin Zheng	f50a6cf443	Fix hash collision for multi modal models (#2256)	2024-11-29 03:15:58 -08:00
Lianmin Zheng	fe97a2d40f	Simplify tokenizer manager (#2254)	2024-11-29 02:18:51 -08:00
Lianmin Zheng	d4fc1a70e3	Crash the server correctly during error (#2231)	2024-11-28 00:22:39 -08:00
Rin Intachuen	1aea19f64b	Input_embeds support (#2052)	2024-11-25 16:35:04 -08:00
Ying Sheng	e1e595d702	[feat] Refactor session control interface and add CI (#2173)	2024-11-25 12:32:51 -08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Ying Sheng	5942dfc00a	[feat] Add session control (#2073)	2024-11-20 00:36:53 -08:00
Lianmin Zheng	1929c06762	Simplify prometheus metrics (#1981) Co-authored-by: Mohit Reddy <mohitreddy1996@users.noreply.github.com>	2024-11-10 04:39:32 -08:00
Lianmin Zheng	9c939a3d8b	Clean up metrics code (#1972)	2024-11-09 15:43:20 -08:00
Lianmin Zheng	a509552087	[minor] Improve code style and compatibility (#1961)	2024-11-08 02:19:41 -08:00
Chayenne	c77c1e05ba	fix black in pre-commit (#1940)	2024-11-08 07:42:47 +08:00
Xuehai Pan	a5e0defb5a	minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926)	2024-11-06 13:46:04 +00:00
Lianmin Zheng	0abbf289a8	Unify the model type checking (#1905)	2024-11-03 12:25:39 -08:00
Lianmin Zheng	c17c578108	Simplify tokenizer manager (#1904)	2024-11-03 08:38:26 -08:00
Lianmin Zheng	838dcda162	Simplify tokenizer manager (#1899)	2024-11-03 03:52:38 -08:00
Byron Hsu	438526a814	Refactor tokenizer manager (#1846)	2024-10-30 21:32:18 -07:00
Ying Sheng	4e2af03cfa	[Production] Drain requests before exit when receive SIGTERM (#1838)	2024-10-30 10:22:56 -07:00
Byron Hsu	680cad2023	fix get_memory_pool_size deadlock for DP (#1830)	2024-10-28 23:07:14 -07:00
Byron Hsu	0a24eb850a	Fix update_weights deadlock for DP (#1825)	2024-10-28 12:02:23 -07:00
Liangsheng Yin	1e8903414a	Fix possible ZMQ hanging (#1800)	2024-10-25 23:07:07 -07:00
Lianmin Zheng	fb99aaa527	[Fix] Fix --skip-tokenizer-init (#1798)	2024-10-25 18:51:59 -07:00
Ying Sheng	2fce449b1c	[API] add get memory pool size (#1760) Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-10-23 07:02:29 +00:00
Liangsheng Yin	94cde10920	Llama3.2 vision model support (#1551)	2024-10-21 15:01:21 -07:00
Lianmin Zheng	7feba41584	Fix failed ci tests on long prompts; Better error messages for embedding models (#1700)	2024-10-17 09:23:29 -07:00
科英	bbd72bfc86	Add the ability to enable and disable the Profiler via HTTP API. (#1626)	2024-10-11 02:34:25 -07:00
Lianmin Zheng	b6aad70ab1	[Fix] Fix the case where prompt_len = 0 (#1593)	2024-10-06 20:30:02 -07:00
Lianmin Zheng	114bbc8651	Use ipc instead of tcp in zmq (#1566)	2024-10-04 00:45:52 -07:00
Lianmin Zheng	32eb6e96f2	Organize sampling batch info better (#1562)	2024-10-03 18:29:49 -07:00
Ying Sheng	f202ed9712	[Refactor] Simplify io_struct and tokenizer_manager (#1549)	2024-10-01 10:25:32 -07:00
Liangsheng Yin	55b974f96f	Process image in parallel (#1539)	2024-09-29 18:52:43 -07:00
Lianmin Zheng	048685430d	Improve process creation (#1534)	2024-09-29 02:36:12 -07:00
Liangsheng Yin	fd9ad817ec	Organize image inputs (#1531)	2024-09-29 06:28:55 +00:00
Ying Sheng	9aa6553d2a	[Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525)	2024-09-27 23:32:11 -07:00
wellhowtosay	2a99993cd9	Pr fix max workers (#1456) Co-authored-by: baolujia <baolujia@shizhuang-inc.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-09-22 02:20:26 -07:00
Lianmin Zheng	9ba1f09760	[Fix] Fix logprob and normalized_logprob (#1428)	2024-09-15 06:36:06 -07:00
Ying Sheng	712216928f	[Feature] Initial support for multi-LoRA serving (#1307)	2024-09-12 16:46:14 -07:00
Lianmin Zheng	e4d68afcf0	[Minor] Many cleanup (#1357)	2024-09-09 04:14:11 -07:00
Kaichen Zhang - NTU	662ecd9368	[Feat] Add modalities for vision server when handling pixel values for llava (#1346)	2024-09-09 02:07:34 -07:00
Lianmin Zheng	f64eae3a29	[Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308)	2024-09-02 21:44:45 -07:00
Kai-Hsun Chen	0836055324	[Chore] Rename model_overide_args to model_override_args (#1284) Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-09-01 03:14:56 -07:00
Lianmin Zheng	bf53bf5142	[Fix] Fix llava on multi images (#1247)	2024-08-28 06:33:05 -07:00
Liangsheng Yin	632d506d0b	minor: improve CI and dependencies (#1212)	2024-08-26 04:26:31 +00:00
Kaichen Zhang - NTU	3579162ab1	[Fix] Multi-images loading error (#1218)	2024-08-26 03:58:51 +00:00
Lianmin Zheng	902278008a	[Minor] Improve the function organization in TokenizerManager & improve loggers (#1208)	2024-08-25 14:46:34 -07:00
Chayenne	30b4f771b0	Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-25 10:29:12 -07:00
Kaichen Zhang - NTU	66e7dcaf70	[Fix] Fixing the multi-images error for llava-onevision (#1205)	2024-08-25 10:28:23 -07:00
Ying Sheng	1cb4da5c5f	[Fix] the issue of random order when input is a list (#1199)	2024-08-24 21:43:03 -07:00
Lianmin Zheng	f6af3a6561	Cleanup readme, llava examples, usage examples and nccl init (#1194)	2024-08-24 08:02:23 -07:00

111 Commits