sglang

Author	SHA1	Message	Date
josephrocca	dff2860a69	Fix CORS compatibility with OpenAI, vLLM, TGI, LMDeploy (#1373) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-09-11 02:35:03 +10:00
Lianmin Zheng	e4d68afcf0	[Minor] Many cleanup (#1357)	2024-09-09 04:14:11 -07:00
Kai-Hsun Chen	0836055324	[Chore] Rename model_overide_args to model_override_args (#1284) Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-09-01 03:14:56 -07:00
Lianmin Zheng	0a97d7962d	[Fix] Fix OOM in llava base class (#1249)	2024-08-28 08:45:49 -07:00
Lianmin Zheng	bf53bf5142	[Fix] Fix llava on multi images (#1247)	2024-08-28 06:33:05 -07:00
Yineng Zhang	198974cd1a	feat: support sm75 with FlashInfer v0.1.6 (#1233)	2024-08-28 18:39:12 +10:00
caiyueliang	2f1d92834f	[FEAT] Support batches cancel (#1222) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-26 23:28:26 +00:00
Lianmin Zheng	902278008a	[Minor] Improve the function organization in TokenizerManager & improve loggers (#1208)	2024-08-25 14:46:34 -07:00
Chayenne	30b4f771b0	Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-25 10:29:12 -07:00
Ying Sheng	1cb4da5c5f	[Fix] the issue of random order when input is a list (#1199)	2024-08-24 21:43:03 -07:00
Lianmin Zheng	5623826f73	[Minor] Improve logging and rename the health check endpoint name (#1180)	2024-08-21 19:24:36 -07:00
Lianmin Zheng	bea2bb9eea	Improve multi-node stability (#1171)	2024-08-20 22:35:05 -07:00
Shan Yu	cd10654e7e	[Feat] Support update weights without restart server (#1157)	2024-08-20 13:48:24 -07:00
Lucien	6242c399ab	Generate 1 token to verify the health of the inference service in /health (#1154) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-21 03:14:34 +10:00
yichuan~	b997a18d74	[Feat]Add support for optional start len of logprobs (#1035) Co-authored-by: Ying Sheng <sqy1415@gmail.com> Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2024-08-18 23:45:41 -07:00
Lianmin Zheng	cdc8d60752	Improve the code style: more comments and remove useless packages (#1139)	2024-08-17 14:37:52 -07:00
Liangsheng Yin	3694f8f996	Mixed style of chunked prefill (#1013)	2024-08-16 09:13:00 +00:00
Lianmin Zheng	0cb099e20a	set CUDA_DEVICE_MAX_CONNECTIONS=1 (#1113)	2024-08-16 03:47:39 +10:00
Lianmin Zheng	326df4bab2	Use a single workspace for flashinfer (#1077)	2024-08-14 19:25:37 -07:00
Ying Sheng	6767e2229f	Support jinja as chat template file (#1104)	2024-08-14 17:43:14 -07:00
rainred	616b59f384	[Feature] modify Runtime to support skip_tokenizer_init (#1088) Co-authored-by: lzhang <zhanglei@modelbest.cn>	2024-08-14 00:28:04 -07:00
Lianmin Zheng	c877292cc1	Re-organize CI tests (#1052)	2024-08-12 03:39:01 -07:00
Yineng Zhang	94752ac811	feat: use FlashInfer rmsnorm and silu (#907)	2024-08-11 14:57:13 +10:00
gryffindor-rr	9cf0a5bada	Add skip_tokenizer_init args. (#959) Co-authored-by: lzhang <zhanglei@modelbest.cn>	2024-08-09 12:14:13 -07:00
Ying Sheng	b16e856f11	Add openai embedding API (#997)	2024-08-09 11:19:18 -07:00
liuyhwangyh	b91a4cb1b1	support models from www.modelscope.cn (#994) Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>	2024-08-09 02:52:14 -07:00
Ying Sheng	e040a2450b	Add e5-mistral embedding model - step 3/3 (#988)	2024-08-08 16:31:19 -07:00
Ying Sheng	9f662501a3	Move torch.compile configs into cuda_graph_runner.py (#993)	2024-08-08 13:20:30 -07:00
Ying Sheng	ff68ae857a	Show more error messages for warmup errors (#932)	2024-08-06 23:57:06 -07:00
yichuan~	795eab6dda	Add support for Batch API test (#936)	2024-08-06 23:52:10 -07:00
Ying Sheng	0d4f3a9fcd	Make API Key OpenAI-compatible (#917)	2024-08-04 13:35:44 -07:00
Ying Sheng	70cc0749ce	Add model accuracy test - step 1 (#866)	2024-08-03 18:20:50 -07:00
任嘉	4013a4e1b0	Implement served_model_name to customize model id when use local mode… (#749) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-01 17:13:51 -07:00
Ying Sheng	60340a3643	Improve the coverage of the openai api server test (#878)	2024-08-01 16:01:30 -07:00
Ying Sheng	72b6ea88b4	Make scripts under `/test/srt` as unit tests (#875)	2024-08-01 14:34:55 -07:00
Ying Sheng	6f221d4ca0	Fix unit tests for the frontend language part (#872)	2024-08-01 12:39:12 -07:00
Yineng Zhang	bc3eaac2b8	chore: update flashinfer to v0.1.3 (#850)	2024-08-01 04:37:05 +10:00
Liangsheng Yin	cdcbde5fc3	Code structure refactor (#807)	2024-07-29 23:04:48 -07:00
yichuan~	084fa54d37	Add support for OpenAI API : offline batch(file) processing (#699) Co-authored-by: hnyls2002 <hnyls2002@gmail.com>	2024-07-29 13:07:18 -07:00
Ying Sheng	eba458bd19	Revert "Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118"" (#806)	2024-07-29 12:20:42 -07:00
Ying Sheng	7d352b4fdd	Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118" (#805)	2024-07-29 11:39:12 -07:00
Yineng Zhang	87064015d9	fix: update flashinfer to 0.1.2 to fix sampling for cu118 (#803)	2024-07-29 11:00:52 -07:00
Liangsheng Yin	7cd4f244a4	Chunked prefill (#800)	2024-07-29 03:32:58 -07:00
Ying Sheng	98111fbe3e	Revert "Chunked prefill support" (#799)	2024-07-29 02:38:31 -07:00
Liangsheng Yin	2ec39ab712	Chunked prefill support (#797)	2024-07-29 02:21:50 -07:00
Yineng Zhang	dd7e8b9421	chore: add copyright for srt (#790)	2024-07-28 23:07:12 +10:00
Lianmin Zheng	752e643007	Allow disabling flashinfer sampling kernel (#778)	2024-07-27 20:18:56 -07:00
Mingyi	e4db4e5ba5	minor refactor: move check server args to server_args.py (#774)	2024-07-27 19:03:40 -07:00
Ying Sheng	8fbba3de3d	Fix bugs (fp8 checkpoints, triton cache manager) (#729)	2024-07-25 07:42:00 -07:00
Liangsheng Yin	04ec6ba2ac	Fix dockerfile and triton cache manager (#720)	2024-07-25 03:04:21 -07:00
132 Commits