Commit Graph

128 Commits

Author SHA1 Message Date
Ying Sheng
689ff588ec [CI] Return output logprobs in unit test (#1361) 2024-09-09 13:05:13 -07:00
Jerry Zhang
a7c47e0f02 Add torchao quant (int4/int8/fp8) to llama models (#1341)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2024-09-09 05:32:41 -07:00
Lianmin Zheng
e4d68afcf0 [Minor] Many cleanup (#1357) 2024-09-09 04:14:11 -07:00
Kai-Hsun Chen
c9b75917d5 [server] Passing model_override_args to launch_server via the CLI. (#1298)
Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
2024-09-09 02:14:25 -07:00
Kaichen Zhang - NTU
662ecd9368 [Feat] Add modalities for vision server when handling pixel values for llava (#1346) 2024-09-09 02:07:34 -07:00
Lianmin Zheng
843e63d809 Fix the flaky test test_moe_eval_accuracy_large.py (#1326) 2024-09-04 04:15:11 -07:00
Lianmin Zheng
1e495e0847 [Fix] Fix select by ensuring each request has at least one token (#1318) 2024-09-03 06:31:45 -07:00
Yineng Zhang
2561ed012c feat: update nightly gsm8k eval (#1304) 2024-09-03 01:18:41 +10:00
Lianmin Zheng
58fa607622 Fix the flaky tests in test_moe_eval_accuracy_large.py (#1293) 2024-09-01 12:20:46 -07:00
Lianmin Zheng
761b2cebd6 [CI] merge all ci tests into one file (#1289) 2024-09-01 02:36:56 -07:00
Lianmin Zheng
1b5d56f7f8 [CI] Add more multi-gpu tests (#1280) 2024-09-01 00:27:25 -07:00
xiaobochen
d134c139a1 Optimize the update flashinfer indices (#1262) 2024-08-31 23:40:28 -07:00
Christopher Chou
51c554d812 Allow more flexible assistant and system response (#1256) 2024-08-30 11:51:44 -07:00
Yineng Zhang
c411f32e1c feat: replace GeluAndMul (#1234) 2024-08-28 14:07:02 +00:00
Lianmin Zheng
bf53bf5142 [Fix] Fix llava on multi images (#1247) 2024-08-28 06:33:05 -07:00
Yineng Zhang
66975360e7 fix: increase max_new_tokens when testing generation models (#1244) 2024-08-28 22:12:36 +10:00
yichuan~
5ff25cdf5b [Minor] add delete test and delete tmp file on ci server (#1227) 2024-08-26 22:04:52 -07:00
caiyueliang
2f1d92834f [FEAT] Support batches cancel (#1222)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-08-26 23:28:26 +00:00
Liangsheng Yin
c61a1b6f97 Torch compile CI throughput test (#1223) 2024-08-26 13:52:58 -07:00
havetc
9935f97b3e [FEAT] JSON constrained support (#1125)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-08-26 09:37:26 -07:00
Mingyi
97589a60a2 [CI] Parallelize unit tests in CI (#1219) 2024-08-26 04:54:02 +00:00
Kaichen Zhang - NTU
3579162ab1 [Fix] Multi-images loading error (#1218) 2024-08-26 03:58:51 +00:00
Mingyi
7514b9f8d3 [CI] Fix CI (#1217) 2024-08-26 02:56:42 +00:00
Mingyi
158e8f1e2d improve the threshold and ports in tests (#1215) 2024-08-25 19:02:08 -07:00
Lianmin Zheng
15f1a49d2d Update CI workflows (#1210) 2024-08-25 16:43:07 -07:00
Ying Sheng
308d024092 [CI] Fix the issue of unit test hanging (#1211) 2024-08-25 16:21:37 -07:00
Ying Sheng
ab4990e4bf [Minor] Temporarily skip flaky test (#1209) 2024-08-25 14:49:23 -07:00
Chayenne
30b4f771b0 Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-08-25 10:29:12 -07:00
Kaichen Zhang - NTU
66e7dcaf70 [Fix] Fixing the multi-images error for llava-onevision (#1205) 2024-08-25 10:28:23 -07:00
Lianmin Zheng
bc4c7a3545 Relax the assert in moe throughput test to fix the flaky CI (#1207) 2024-08-25 10:27:02 -07:00
Ying Sheng
1cb4da5c5f [Fix] the issue of random order when input is a list (#1199) 2024-08-24 21:43:03 -07:00
Lianmin Zheng
f6af3a6561 Cleanup readme, llava examples, usage examples and nccl init (#1194) 2024-08-24 08:02:23 -07:00
Kaichen Zhang - NTU
a5b14ad043 [Feat/WIP] add llava-onevision, with support for (1) siglip encoder, (2) qwen2 decoder (3) openai api compatible server. (#1123)
Co-authored-by: Bo Li <drluodian@gmail.com>
2024-08-23 14:11:16 -07:00
Shan Yu
cd10654e7e [Feat] Support update weights without restart server (#1157) 2024-08-20 13:48:24 -07:00
Juwan Yoo
d8476818ef feat: allow streaming for multi-prompt and/or parallel sampling (#1134) 2024-08-20 08:06:55 -07:00
yichuan~
b997a18d74 [Feat]Add support for optional start len of logprobs (#1035)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2024-08-18 23:45:41 -07:00
Liangsheng Yin
5d0d40d0eb Fix CI accuracy && time out limit (#1133) 2024-08-16 21:41:11 -07:00
Liangsheng Yin
3694f8f996 Mixed style of chunked prefill (#1013) 2024-08-16 09:13:00 +00:00
Lianmin Zheng
e86b1ccbf0 Enable chunked prefill by default (#1040) 2024-08-14 21:56:20 -07:00
Liangsheng Yin
73cf6834f2 Support stop_token_ids in sglang API (#1092) 2024-08-15 00:31:39 +00:00
Liangsheng Yin
a34dd86a7d Use dtype to control generate (#1082)
Co-authored-by: zhyncs <me@zhyncs.com>
2024-08-14 15:58:07 +00:00
Yineng Zhang
c8423ca311 ci: update timeout and retry (#1086)
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2024-08-14 00:27:35 -07:00
Ying Sheng
0909bb0d2f [Feat] Add window attention for gemma-2 (#1056) 2024-08-13 17:01:26 -07:00
Lianmin Zheng
ad3e4f1619 Update the mixtral to use the better FusedMoE layer (#1081) 2024-08-13 15:44:25 -07:00
Yineng Zhang
cebd78d83e ci: add accuracy timeout (#1078) 2024-08-13 22:12:58 +10:00
Yineng Zhang
f7fb68d292 ci: add moe test (#1053) 2024-08-13 18:43:23 +10:00
Lianmin Zheng
c877292cc1 Re-organize CI tests (#1052) 2024-08-12 03:39:01 -07:00
Lianmin Zheng
0c1c72a0b4 Fix accuracy test (#1051) 2024-08-12 19:48:40 +10:00
Lianmin Zheng
41598e0d8e Add longer accuracy test on CI (#1049) 2024-08-12 09:21:38 +00:00
Ying Sheng
32f6144323 fix: Fix returned prefill logits and add output str test (#1046) 2024-08-12 06:13:45 +00:00