sglang

Author	SHA1	Message	Date
Lianmin Zheng	68be2f6d3b	[CI] Include triton backend and online serving benchmark into CI (#1408)	2024-09-12 21:36:41 -07:00
Ying Sheng	712216928f	[Feature] Initial support for multi-LoRA serving (#1307)	2024-09-12 16:46:14 -07:00
Ying Sheng	689ff588ec	[CI] Return output logprobs in unit test (#1361)	2024-09-09 13:05:13 -07:00
Lianmin Zheng	e4d68afcf0	[Minor] Many cleanup (#1357)	2024-09-09 04:14:11 -07:00
Lianmin Zheng	1e495e0847	[Fix] Fix select by ensuring each request has at least one token (#1318)	2024-09-03 06:31:45 -07:00
Yineng Zhang	2561ed012c	feat: update nightly gsm8k eval (#1304)	2024-09-03 01:18:41 +10:00
Liangsheng Yin	381dd57bd6	Sampler cudagraph (#1253)	2024-08-28 18:58:52 -07:00
Yineng Zhang	b1a540ec42	feat: update GemmaRMSNorm (#1232)	2024-08-28 22:47:34 +10:00
Yineng Zhang	66975360e7	fix: increase max_new_tokens when testing generation models (#1244)	2024-08-28 22:12:36 +10:00
Yineng Zhang	f25f4dfde5	hotfix: revert sampler CUDA Graph (#1242)	2024-08-28 21:16:47 +10:00
Liangsheng Yin	75ce37f401	Move sampler into CUDA graph (#1201) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-26 07:02:50 -07:00
Mingyi	97589a60a2	[CI] Parallelize unit tests in CI (#1219)	2024-08-26 04:54:02 +00:00
Liangsheng Yin	632d506d0b	minor: improve CI and dependencies (#1212)	2024-08-26 04:26:31 +00:00
Mingyi	7514b9f8d3	[CI] Fix CI (#1217)	2024-08-26 02:56:42 +00:00
Mingyi	158e8f1e2d	improve the threshold and ports in tests (#1215)	2024-08-25 19:02:08 -07:00
Lianmin Zheng	15f1a49d2d	Update CI workflows (#1210)	2024-08-25 16:43:07 -07:00
Ying Sheng	308d024092	[CI] Fix the issue of unit test hanging (#1211)	2024-08-25 16:21:37 -07:00
Chayenne	30b4f771b0	Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-25 10:29:12 -07:00
Ying Sheng	1cb4da5c5f	[Fix] the issue of random order when input is a list (#1199)	2024-08-24 21:43:03 -07:00
Ying Sheng	e61d13acdf	[CI] Fix the problem of hf runner too slow (#1202)	2024-08-24 18:35:55 -07:00
Lianmin Zheng	f6af3a6561	Cleanup readme, llava examples, usage examples and nccl init (#1194)	2024-08-24 08:02:23 -07:00
Yineng Zhang	c9064e6fd9	feat: use gelu_tanh_and_mul (#1193)	2024-08-24 01:58:16 -07:00
yichuan~	b997a18d74	[Feat]Add support for optional start len of logprobs (#1035) Co-authored-by: Ying Sheng <sqy1415@gmail.com> Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2024-08-18 23:45:41 -07:00
Lianmin Zheng	3c1f5a9220	Fix duplicated imports in hf_transformers_utils.py (#1141)	2024-08-17 18:03:00 -07:00
Lianmin Zheng	57d0bd91ec	Improve benchmark (#1140)	2024-08-17 17:43:23 -07:00
Liangsheng Yin	f624f6a6cc	Fix port conflicts between local CI and runner CI. (#1131)	2024-08-16 15:12:38 -07:00
Liangsheng Yin	3694f8f996	Mixed style of chunked prefill (#1013)	2024-08-16 09:13:00 +00:00
Liangsheng Yin	73cf6834f2	Support `stop_token_ids` in sglang API (#1092)	2024-08-15 00:31:39 +00:00
Ying Sheng	96a2093ef0	[Fix] Compatibility of window attention and cuda graph (#1090)	2024-08-14 10:37:01 -07:00
Liangsheng Yin	a34dd86a7d	Use `dtype` to control generate (#1082) Co-authored-by: zhyncs <me@zhyncs.com>	2024-08-14 15:58:07 +00:00
Ying Sheng	0909bb0d2f	[Feat] Add window attention for gemma-2 (#1056)	2024-08-13 17:01:26 -07:00
Yineng Zhang	f7fb68d292	ci: add moe test (#1053)	2024-08-13 18:43:23 +10:00
Yineng Zhang	65e89baea9	fix: not use the default port (#1068)	2024-08-13 15:12:56 +10:00
Lianmin Zheng	0c1c72a0b4	Fix accuracy test (#1051)	2024-08-12 19:48:40 +10:00
Lianmin Zheng	41598e0d8e	Add longer accuracy test on CI (#1049)	2024-08-12 09:21:38 +00:00
Ying Sheng	32f6144323	fix: Fix returned prefill logits and add output str test (#1046)	2024-08-12 06:13:45 +00:00
Lianmin Zheng	8207637029	Improve end-to-end throughput test and its coverage (#1039)	2024-08-11 18:27:33 -07:00
Lianmin Zheng	d84c5e70f7	Test the case when max_new_tokens is very large (#1038)	2024-08-11 16:41:03 -07:00
Yineng Zhang	94752ac811	feat: use FlashInfer rmsnorm and silu (#907)	2024-08-11 14:57:13 +10:00
Lianmin Zheng	54fb1c80c0	Clean up unit tests (#1020)	2024-08-10 15:09:03 -07:00
Ying Sheng	7599badeaf	Support embedding input as a list (#1014)	2024-08-10 08:39:05 -07:00
Ying Sheng	e040a2450b	Add e5-mistral embedding model - step 3/3 (#988)	2024-08-08 16:31:19 -07:00
Juwan Yoo	ab7875941b	feat: frequency, min_new_tokens, presence, and repetition penalties (#973)	2024-08-08 04:21:08 -07:00
Ying Sheng	3bc99e6fe4	Test openai vision api (#925)	2024-08-05 13:51:55 +10:00
Liangsheng Yin	bb66cc4c52	Fix CI && python3.8 compatible (#920)	2024-08-04 16:02:05 -07:00
Ying Sheng	0d4f3a9fcd	Make API Key OpenAI-compatible (#917)	2024-08-04 13:35:44 -07:00
Ying Sheng	995af5a54b	Improve the structure of CI (#911)	2024-08-03 23:09:21 -07:00
Ying Sheng	70cc0749ce	Add model accuracy test - step 1 (#866)	2024-08-03 18:20:50 -07:00
Ying Sheng	3cadecf0c4	Increase openai client limit (#886)	2024-08-02 00:47:23 -07:00
Ying Sheng	e90e3a50d4	Add benchmark: HumanEval (#889)	2024-08-02 00:46:41 -07:00

180 Commits