sglang

Author	SHA1	Message	Date
Lianmin Zheng	761b2cebd6	[CI] merge all ci tests into one file (#1289)	2024-09-01 02:36:56 -07:00
Lianmin Zheng	1b5d56f7f8	[CI] Add more multi-gpu tests (#1280)	2024-09-01 00:27:25 -07:00
xiaobochen	d134c139a1	Optimize the update flashinfer indices (#1262)	2024-08-31 23:40:28 -07:00
Christopher Chou	51c554d812	Allow more flexible assistant and system response (#1256)	2024-08-30 11:51:44 -07:00
Yineng Zhang	c411f32e1c	feat: replace GeluAndMul (#1234)	2024-08-28 14:07:02 +00:00
Lianmin Zheng	bf53bf5142	[Fix] Fix llava on multi images (#1247)	2024-08-28 06:33:05 -07:00
Yineng Zhang	66975360e7	fix: increase max_new_tokens when testing generation models (#1244)	2024-08-28 22:12:36 +10:00
yichuan~	5ff25cdf5b	[Minor] add delete test and delete tmp file on ci server (#1227)	2024-08-26 22:04:52 -07:00
caiyueliang	2f1d92834f	[FEAT] Support batches cancel (#1222) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-26 23:28:26 +00:00
Liangsheng Yin	c61a1b6f97	Torch compile CI throughput test (#1223)	2024-08-26 13:52:58 -07:00
havetc	9935f97b3e	[FEAT] JSON constrained support (#1125) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-26 09:37:26 -07:00
Mingyi	97589a60a2	[CI] Parallelize unit tests in CI (#1219)	2024-08-26 04:54:02 +00:00
Kaichen Zhang - NTU	3579162ab1	[Fix] Multi-images loading error (#1218)	2024-08-26 03:58:51 +00:00
Mingyi	7514b9f8d3	[CI] Fix CI (#1217)	2024-08-26 02:56:42 +00:00
Mingyi	158e8f1e2d	improve the threshold and ports in tests (#1215)	2024-08-25 19:02:08 -07:00
Lianmin Zheng	15f1a49d2d	Update CI workflows (#1210)	2024-08-25 16:43:07 -07:00
Ying Sheng	308d024092	[CI] Fix the issue of unit test hanging (#1211)	2024-08-25 16:21:37 -07:00
Ying Sheng	ab4990e4bf	[Minor] Temporarily skip flaky test (#1209)	2024-08-25 14:49:23 -07:00
Chayenne	30b4f771b0	Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-25 10:29:12 -07:00
Kaichen Zhang - NTU	66e7dcaf70	[Fix] Fixing the multi-images error for llava-onevision (#1205)	2024-08-25 10:28:23 -07:00
Lianmin Zheng	bc4c7a3545	Relax the assert in moe throughput test to fix the flaky CI (#1207)	2024-08-25 10:27:02 -07:00
Ying Sheng	1cb4da5c5f	[Fix] the issue of random order when input is a list (#1199)	2024-08-24 21:43:03 -07:00
Lianmin Zheng	f6af3a6561	Cleanup readme, llava examples, usage examples and nccl init (#1194)	2024-08-24 08:02:23 -07:00
Kaichen Zhang - NTU	a5b14ad043	[Feat/WIP] add llava-onevision, with support for (1) siglip encoder, (2) qwen2 decoder (3) openai api compatible server. (#1123) Co-authored-by: Bo Li <drluodian@gmail.com>	2024-08-23 14:11:16 -07:00
Shan Yu	cd10654e7e	[Feat] Support update weights without restart server (#1157)	2024-08-20 13:48:24 -07:00
Juwan Yoo	d8476818ef	feat: allow streaming for multi-prompt and/or parallel sampling (#1134)	2024-08-20 08:06:55 -07:00
yichuan~	b997a18d74	[Feat]Add support for optional start len of logprobs (#1035) Co-authored-by: Ying Sheng <sqy1415@gmail.com> Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2024-08-18 23:45:41 -07:00
Liangsheng Yin	5d0d40d0eb	Fix CI accuracy && time out limit (#1133)	2024-08-16 21:41:11 -07:00
Liangsheng Yin	3694f8f996	Mixed style of chunked prefill (#1013)	2024-08-16 09:13:00 +00:00
Lianmin Zheng	e86b1ccbf0	Enable chunked prefill by default (#1040)	2024-08-14 21:56:20 -07:00
Liangsheng Yin	73cf6834f2	Support `stop_token_ids` in sglang API (#1092)	2024-08-15 00:31:39 +00:00
Liangsheng Yin	a34dd86a7d	Use `dtype` to control generate (#1082) Co-authored-by: zhyncs <me@zhyncs.com>	2024-08-14 15:58:07 +00:00
Yineng Zhang	c8423ca311	ci: update timeout and retry (#1086) Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2024-08-14 00:27:35 -07:00
Ying Sheng	0909bb0d2f	[Feat] Add window attention for gemma-2 (#1056)	2024-08-13 17:01:26 -07:00
Lianmin Zheng	ad3e4f1619	Update the mixtral to use the better FusedMoE layer (#1081)	2024-08-13 15:44:25 -07:00
Yineng Zhang	cebd78d83e	ci: add accuracy timeout (#1078)	2024-08-13 22:12:58 +10:00
Yineng Zhang	f7fb68d292	ci: add moe test (#1053)	2024-08-13 18:43:23 +10:00
Lianmin Zheng	c877292cc1	Re-organize CI tests (#1052)	2024-08-12 03:39:01 -07:00
Lianmin Zheng	0c1c72a0b4	Fix accuracy test (#1051)	2024-08-12 19:48:40 +10:00
Lianmin Zheng	41598e0d8e	Add longer accuracy test on CI (#1049)	2024-08-12 09:21:38 +00:00
Ying Sheng	32f6144323	fix: Fix returned prefill logits and add output str test (#1046)	2024-08-12 06:13:45 +00:00
Lianmin Zheng	14b6493087	Delete the useless test/srt/test_throughput.py (#1045)	2024-08-11 21:31:52 -07:00
Lianmin Zheng	8207637029	Improve end-to-end throughput test and its coverage (#1039)	2024-08-11 18:27:33 -07:00
Lianmin Zheng	d84c5e70f7	Test the case when max_new_tokens is very large (#1038)	2024-08-11 16:41:03 -07:00
Lianmin Zheng	54fb1c80c0	Clean up unit tests (#1020)	2024-08-10 15:09:03 -07:00
Ying Sheng	b68c4c073b	fix: force max new tokens to be 1 for embedding request (#1019)	2024-08-10 13:46:42 -07:00
Ying Sheng	7599badeaf	Support embedding input as a list (#1014)	2024-08-10 08:39:05 -07:00
gryffindor-rr	9cf0a5bada	Add skip_tokenizer_init args. (#959) Co-authored-by: lzhang <zhanglei@modelbest.cn>	2024-08-09 12:14:13 -07:00
Ying Sheng	b16e856f11	Add openai embedding API (#997)	2024-08-09 11:19:18 -07:00
Juwan Yoo	10bca45bc6	bugfix: penalizers to be merged before reqs (#1001)	2024-08-09 21:46:24 +10:00

119 Commits