| Author | Commit | Message | Date |
|---|---|---|---|
| Lianmin Zheng | 9ba1f09760 | [Fix] Fix logprob and normalized_logprob (#1428) | 2024-09-15 06:36:06 -07:00 |
| Lianmin Zheng | 9463bc1385 | Enable torch.compile for triton backend (#1422) | 2024-09-14 15:38:37 -07:00 |
| Yineng Zhang | e3fc4658f4 | fix: resolve nightly eval (#1426) | 2024-09-15 02:07:52 +10:00 |
| Ke Bao | 33b54e7c40 | Add pytorch sampling backend ut (#1425) | 2024-09-15 01:15:30 +10:00 |
| Lianmin Zheng | ad0ff62a4c | Balance test in CI (#1411) | 2024-09-12 23:29:44 -07:00 |
| Lianmin Zheng | 68be2f6d3b | [CI] Include triton backend and online serving benchmark into CI (#1408) | 2024-09-12 21:36:41 -07:00 |
| Ying Sheng | eb02c1618a | [Minor, CI] remove lora test from minimal suite (#1406) | 2024-09-12 16:49:50 -07:00 |
| Ying Sheng | 712216928f | [Feature] Initial support for multi-LoRA serving (#1307) | 2024-09-12 16:46:14 -07:00 |
| Lianmin Zheng | 3efa798116 | Support cuda graph in the triton attention backend (#1401) | 2024-09-12 00:36:55 -07:00 |
| Lianmin Zheng | fec185ce0c | Refactor attention backend (#1381) | 2024-09-11 11:44:26 -07:00 |
| Byron Hsu | 8c0efa514d | remove assertion in triton attention and add an unit test (#1385) | 2024-09-11 03:22:07 -07:00 |
| Liangsheng Yin | 144bc70fcc | Organize flashinfer indices update (#1378) | 2024-09-10 17:38:59 -07:00 |
| Lianmin Zheng | 46094e0c1b | Deprecate --disable-flashinfer and introduce --attention-backend (#1380) | 2024-09-10 17:11:16 -07:00 |
| Lianmin Zheng | 6c7cb90365 | [Minor] improve kill scripts and torchao import (#1375) | 2024-09-11 04:27:03 +10:00 |
| zifeitong | 9144ed1067 | Support OpenAI API json_schema response format (#1363) | 2024-09-09 19:08:25 -07:00 |
| Ying Sheng | 689ff588ec | [CI] Return output logprobs in unit test (#1361) | 2024-09-09 13:05:13 -07:00 |
| Jerry Zhang | a7c47e0f02 | Add torchao quant (int4/int8/fp8) to llama models (#1341) (Co-authored-by: Lianmin Zheng \<lianminzheng@gmail.com\>) | 2024-09-09 05:32:41 -07:00 |
| Lianmin Zheng | e4d68afcf0 | [Minor] Many cleanup (#1357) | 2024-09-09 04:14:11 -07:00 |
| Kai-Hsun Chen | c9b75917d5 | [server] Passing model_override_args to launch_server via the CLI. (#1298) (Signed-off-by: Kai-Hsun Chen \<kaihsun@anyscale.com\>) | 2024-09-09 02:14:25 -07:00 |
| Kaichen Zhang - NTU | 662ecd9368 | [Feat] Add modalities for vision server when handling pixel values for llava (#1346) | 2024-09-09 02:07:34 -07:00 |
| Lianmin Zheng | 843e63d809 | Fix the flaky test test_moe_eval_accuracy_large.py (#1326) | 2024-09-04 04:15:11 -07:00 |
| Lianmin Zheng | 1e495e0847 | [Fix] Fix select by ensuring each request has at least one token (#1318) | 2024-09-03 06:31:45 -07:00 |
| Yineng Zhang | 2561ed012c | feat: update nightly gsm8k eval (#1304) | 2024-09-03 01:18:41 +10:00 |
| Lianmin Zheng | 58fa607622 | Fix the flaky tests in test_moe_eval_accuracy_large.py (#1293) | 2024-09-01 12:20:46 -07:00 |
| Lianmin Zheng | 761b2cebd6 | [CI] merge all ci tests into one file (#1289) | 2024-09-01 02:36:56 -07:00 |
| Lianmin Zheng | 1b5d56f7f8 | [CI] Add more multi-gpu tests (#1280) | 2024-09-01 00:27:25 -07:00 |
| xiaobochen | d134c139a1 | Optimize the update flashinfer indices (#1262) | 2024-08-31 23:40:28 -07:00 |
| Christopher Chou | 51c554d812 | Allow more flexible assistant and system response (#1256) | 2024-08-30 11:51:44 -07:00 |
| Yineng Zhang | c411f32e1c | feat: replace GeluAndMul (#1234) | 2024-08-28 14:07:02 +00:00 |
| Lianmin Zheng | bf53bf5142 | [Fix] Fix llava on multi images (#1247) | 2024-08-28 06:33:05 -07:00 |
| Yineng Zhang | 66975360e7 | fix: increase max_new_tokens when testing generation models (#1244) | 2024-08-28 22:12:36 +10:00 |
| yichuan~ | 5ff25cdf5b | [Minor] add delete test and delete tmp file on ci server (#1227) | 2024-08-26 22:04:52 -07:00 |
| caiyueliang | 2f1d92834f | [FEAT] Support batches cancel (#1222) (Co-authored-by: Yineng Zhang \<me@zhyncs.com\>) | 2024-08-26 23:28:26 +00:00 |
| Liangsheng Yin | c61a1b6f97 | Torch compile CI throughput test (#1223) | 2024-08-26 13:52:58 -07:00 |
| havetc | 9935f97b3e | [FEAT] JSON constrained support (#1125) (Co-authored-by: Yineng Zhang \<me@zhyncs.com\>) | 2024-08-26 09:37:26 -07:00 |
| Mingyi | 97589a60a2 | [CI] Parallelize unit tests in CI (#1219) | 2024-08-26 04:54:02 +00:00 |
| Kaichen Zhang - NTU | 3579162ab1 | [Fix] Multi-images loading error (#1218) | 2024-08-26 03:58:51 +00:00 |
| Mingyi | 7514b9f8d3 | [CI] Fix CI (#1217) | 2024-08-26 02:56:42 +00:00 |
| Mingyi | 158e8f1e2d | improve the threshold and ports in tests (#1215) | 2024-08-25 19:02:08 -07:00 |
| Lianmin Zheng | 15f1a49d2d | Update CI workflows (#1210) | 2024-08-25 16:43:07 -07:00 |
| Ying Sheng | 308d024092 | [CI] Fix the issue of unit test hanging (#1211) | 2024-08-25 16:21:37 -07:00 |
| Ying Sheng | ab4990e4bf | [Minor] Temporarily skip flaky test (#1209) | 2024-08-25 14:49:23 -07:00 |
| Chayenne | 30b4f771b0 | Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186) (Co-authored-by: Ying Sheng \<sqy1415@gmail.com\>) | 2024-08-25 10:29:12 -07:00 |
| Kaichen Zhang - NTU | 66e7dcaf70 | [Fix] Fixing the multi-images error for llava-onevision (#1205) | 2024-08-25 10:28:23 -07:00 |
| Lianmin Zheng | bc4c7a3545 | Relax the assert in moe throughput test to fix the flaky CI (#1207) | 2024-08-25 10:27:02 -07:00 |
| Ying Sheng | 1cb4da5c5f | [Fix] the issue of random order when input is a list (#1199) | 2024-08-24 21:43:03 -07:00 |
| Lianmin Zheng | f6af3a6561 | Cleanup readme, llava examples, usage examples and nccl init (#1194) | 2024-08-24 08:02:23 -07:00 |
| Kaichen Zhang - NTU | a5b14ad043 | [Feat/WIP] add llava-onevision, with support for (1) siglip encoder, (2) qwen2 decoder (3) openai api compatible server. (#1123) (Co-authored-by: Bo Li \<drluodian@gmail.com\>) | 2024-08-23 14:11:16 -07:00 |
| Shan Yu | cd10654e7e | [Feat] Support update weights without restart server (#1157) | 2024-08-20 13:48:24 -07:00 |
| Juwan Yoo | d8476818ef | feat: allow streaming for multi-prompt and/or parallel sampling (#1134) | 2024-08-20 08:06:55 -07:00 |