sglang

Author	SHA1	Message	Date
Lianmin Zheng	0cb099e20a	set CUDA_DEVICE_MAX_CONNECTIONS=1 (#1113)	2024-08-16 03:47:39 +10:00
Lianmin Zheng	326df4bab2	Use a single workspace for flashinfer (#1077)	2024-08-14 19:25:37 -07:00
Ying Sheng	6767e2229f	Support jinja as chat template file (#1104)	2024-08-14 17:43:14 -07:00
rainred	616b59f384	[Feature] modify Runtime to support skip_tokenizer_init (#1088) Co-authored-by: lzhang <zhanglei@modelbest.cn>	2024-08-14 00:28:04 -07:00
Lianmin Zheng	c877292cc1	Re-organize CI tests (#1052)	2024-08-12 03:39:01 -07:00
Yineng Zhang	94752ac811	feat: use FlashInfer rmsnorm and silu (#907)	2024-08-11 14:57:13 +10:00
gryffindor-rr	9cf0a5bada	Add skip_tokenizer_init args. (#959) Co-authored-by: lzhang <zhanglei@modelbest.cn>	2024-08-09 12:14:13 -07:00
Ying Sheng	b16e856f11	Add openai embedding API (#997)	2024-08-09 11:19:18 -07:00
liuyhwangyh	b91a4cb1b1	support models from www.modelscope.cn (#994) Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>	2024-08-09 02:52:14 -07:00
Ying Sheng	e040a2450b	Add e5-mistral embedding model - step 3/3 (#988)	2024-08-08 16:31:19 -07:00
Ying Sheng	9f662501a3	Move torch.compile configs into cuda_graph_runner.py (#993)	2024-08-08 13:20:30 -07:00
Ying Sheng	ff68ae857a	Show more error messages for warmup errors (#932)	2024-08-06 23:57:06 -07:00
yichuan~	795eab6dda	Add support for Batch API test (#936)	2024-08-06 23:52:10 -07:00
Ying Sheng	0d4f3a9fcd	Make API Key OpenAI-compatible (#917)	2024-08-04 13:35:44 -07:00
Ying Sheng	70cc0749ce	Add model accuracy test - step 1 (#866)	2024-08-03 18:20:50 -07:00
任嘉	4013a4e1b0	Implement served_model_name to customize model id when use local mode… (#749) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-01 17:13:51 -07:00
Ying Sheng	60340a3643	Improve the coverage of the openai api server test (#878)	2024-08-01 16:01:30 -07:00
Ying Sheng	72b6ea88b4	Make scripts under `/test/srt` as unit tests (#875)	2024-08-01 14:34:55 -07:00
Ying Sheng	6f221d4ca0	Fix unit tests for the frontend language part (#872)	2024-08-01 12:39:12 -07:00
Yineng Zhang	bc3eaac2b8	chore: update flashinfer to v0.1.3 (#850)	2024-08-01 04:37:05 +10:00
Liangsheng Yin	cdcbde5fc3	Code structure refactor (#807)	2024-07-29 23:04:48 -07:00
yichuan~	084fa54d37	Add support for OpenAI API : offline batch(file) processing (#699) Co-authored-by: hnyls2002 <hnyls2002@gmail.com>	2024-07-29 13:07:18 -07:00
Ying Sheng	eba458bd19	Revert "Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118"" (#806)	2024-07-29 12:20:42 -07:00
Ying Sheng	7d352b4fdd	Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118" (#805)	2024-07-29 11:39:12 -07:00
Yineng Zhang	87064015d9	fix: update flashinfer to 0.1.2 to fix sampling for cu118 (#803)	2024-07-29 11:00:52 -07:00
Liangsheng Yin	7cd4f244a4	Chunked prefill (#800)	2024-07-29 03:32:58 -07:00
Ying Sheng	98111fbe3e	Revert "Chunked prefill support" (#799)	2024-07-29 02:38:31 -07:00
Liangsheng Yin	2ec39ab712	Chunked prefill support (#797)	2024-07-29 02:21:50 -07:00
Yineng Zhang	dd7e8b9421	chore: add copyright for srt (#790)	2024-07-28 23:07:12 +10:00
Lianmin Zheng	752e643007	Allow disabling flashinfer sampling kernel (#778)	2024-07-27 20:18:56 -07:00
Mingyi	e4db4e5ba5	minor refactor: move check server args to server_args.py (#774)	2024-07-27 19:03:40 -07:00
Ying Sheng	8fbba3de3d	Fix bugs (fp8 checkpoints, triton cache manager) (#729)	2024-07-25 07:42:00 -07:00
Liangsheng Yin	04ec6ba2ac	Fix dockerfile and triton cache manager (#720)	2024-07-25 03:04:21 -07:00
Lianmin Zheng	01d66ae2e8	Fix multi-node deadlock (#709)	2024-07-23 21:53:36 -07:00
Ying Sheng	444a02441a	Update vllm version to support llama3.1 (#705)	2024-07-23 13:49:34 -07:00
Liangsheng Yin	eedc12e12e	Support Deepseek MoE Model (#689)	2024-07-21 03:09:29 -07:00
Liangsheng Yin	caaad53b52	Support gpt-bigcode model class (#681)	2024-07-20 18:34:37 -07:00
Liangsheng Yin	69d19188fc	Decouple kv (#679)	2024-07-20 14:16:45 -07:00
Mingyi	e3046ea3a8	Update OpenAI API (#667)	2024-07-19 23:20:54 -07:00
Ying Sheng	e87c7fd501	Improve docs (#662)	2024-07-19 10:58:03 -07:00
Ying Sheng	51fda1439f	Update Readme (#660) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-07-19 09:54:01 -07:00
zhyncs	c126a6ccba	feat: add benchmark serving (#657)	2024-07-19 09:15:21 -07:00
Lianmin Zheng	e1792cca24	Remove cached triton launcher (#656)	2024-07-18 23:28:40 -07:00
zhyncs	b050d9283f	fix: set ulimit -n 65535 (#647)	2024-07-18 02:35:45 -07:00
Mingyi	d774acad5c	Remove the dependency of rpyc (#646)	2024-07-18 02:13:54 -07:00
Liangsheng Yin	abd5385ac5	Move `global_server_args_dict` (#642)	2024-07-17 13:49:15 -07:00
Liangsheng Yin	3de2f30a27	Flashinfer sample kernel (#617)	2024-07-17 13:24:43 -07:00
zhyncs	2e341cd493	misc: add pre-commit config (#637)	2024-07-17 11:55:39 -07:00
Ying Sheng	0aa189f150	Disable NCCL_NVLS by default (#631)	2024-07-16 09:05:10 -07:00
Ying Sheng	6a2941f4d0	Improve tensor parallel performance (#625) Co-authored-by: Mingyi <wisclmy0611@gmail.com>	2024-07-15 07:10:51 -07:00

115 Commits