87 Commits

Author SHA1 Message Date
Lianmin Zheng
69aa937aa5 Fix unit tests and type annotations (#1648) 2024-10-12 14:49:24 -07:00
Lianmin Zheng
00c7e6368b Release v0.3.3.post1 (#1636) 2024-10-11 07:56:16 -07:00
Lianmin Zheng
23cc66f7b6 Add back data parallelism (#1635) 2024-10-11 07:22:48 -07:00
Ying Sheng
04b262cd91 [Fix] Fix major performance bug in certain cases (#1563) 2024-10-04 08:51:11 +00:00
Co-authored-by: hnyls2002 <hnyls2002@gmail.com>
Lianmin Zheng
048685430d Improve process creation (#1534) 2024-09-29 02:36:12 -07:00
Ying Sheng
9aa6553d2a [Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525) 2024-09-27 23:32:11 -07:00
Lianmin Zheng
bc068e9618 [CI] Move AMD test to a separate file (#1500) 2024-09-24 02:06:28 -07:00
Yineng Zhang
42a2d82ba7 minor: add mla fp8 test (#1494) 2024-09-23 20:40:17 +08:00
Ying Sheng
6f3cf1297e [CI, AMD] Add AMD tests to CI (#1491) 2024-09-22 04:45:10 -07:00
Lianmin Zheng
13f1357ef0 Add a unit test for data parallelism (#1489) 2024-09-22 02:21:05 -07:00
Ke Bao
b8ccaf4d73 Add MLA gsm8k eval (#1484) 2024-09-21 11:16:13 +08:00
Ke Bao
a68cb201dd Fix triton head num (#1482) 2024-09-21 10:25:20 +08:00
Lianmin Zheng
1acccb364a Fix oom issues with fp8 for llama (#1454) 2024-09-18 03:45:19 -07:00
Lianmin Zheng
9ba1f09760 [Fix] Fix logprob and normalized_logprob (#1428) 2024-09-15 06:36:06 -07:00
Yineng Zhang
f3d32f888a ci: fix finish (#1414) 2024-09-14 01:01:30 +10:00
Lianmin Zheng
8779da95d6 Update pr-test.yml (#1412) 2024-09-13 00:37:13 -07:00
Lianmin Zheng
ad0ff62a4c Balance test in CI (#1411) 2024-09-12 23:29:44 -07:00
Lianmin Zheng
68be2f6d3b [CI] Include triton backend and online serving benchmark into CI (#1408) 2024-09-12 21:36:41 -07:00
Lianmin Zheng
f64eae3a29 [Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308) 2024-09-02 21:44:45 -07:00
Yineng Zhang
2561ed012c feat: update nightly gsm8k eval (#1304) 2024-09-03 01:18:41 +10:00
Yineng Zhang
6487ef64c6 ci: add nightly eval (#1291) 2024-09-02 03:19:49 +10:00
Lianmin Zheng
761b2cebd6 [CI] merge all ci tests into one file (#1289) 2024-09-01 02:36:56 -07:00
Lianmin Zheng
1b5d56f7f8 [CI] Add more multi-gpu tests (#1280) 2024-09-01 00:27:25 -07:00
Lianmin Zheng
6c49831394 Add sglang.bench_latency to CI (#1243) 2024-08-28 21:20:54 +10:00
Yineng Zhang
f25f4dfde5 hotfix: revert sampler CUDA Graph (#1242) 2024-08-28 21:16:47 +10:00
Liangsheng Yin
1ece2cda3d Fix bench latency benchmark (#1225) 2024-08-28 00:37:32 -07:00
Mingyi
97589a60a2 [CI] Parallelize unit tests in CI (#1219) 2024-08-26 04:54:02 +00:00
Liangsheng Yin
632d506d0b minor: improve CI and dependencies (#1212) 2024-08-26 04:26:31 +00:00
Lianmin Zheng
d3efcb3930 Update workflow files (#1214) 2024-08-25 17:45:35 -07:00
Lianmin Zheng
61bb223e0f Update CI runner docs (#1213) 2024-08-25 17:31:52 -07:00
Lianmin Zheng
15f1a49d2d Update CI workflows (#1210) 2024-08-25 16:43:07 -07:00
Chayenne
30b4f771b0 Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186) 2024-08-25 10:29:12 -07:00
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Liangsheng Yin
5d0d40d0eb Fix CI accuracy && time out limit (#1133) 2024-08-16 21:41:11 -07:00
Yineng Zhang
26e9c12c15 ci: compatible with fork repo (#1115) 2024-08-16 04:26:44 +10:00
Lianmin Zheng
e86b1ccbf0 Enable chunked prefill by default (#1040) 2024-08-14 21:56:20 -07:00
Yineng Zhang
f14569f64a ci: remove workflow path trigger (#1096) 2024-08-14 20:36:24 +10:00
Yineng Zhang
c8423ca311 ci: update timeout and retry (#1086) 2024-08-14 00:27:35 -07:00
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Yineng Zhang
cebd78d83e ci: add accuracy timeout (#1078) 2024-08-13 22:12:58 +10:00
Yineng Zhang
f7fb68d292 ci: add moe test (#1053) 2024-08-13 18:43:23 +10:00
Yineng Zhang
396a13e6ad ci: add cancel pr workflow (#1070) 2024-08-13 18:16:50 +10:00
Lianmin Zheng
c877292cc1 Re-organize CI tests (#1052) 2024-08-12 03:39:01 -07:00
Lianmin Zheng
41598e0d8e Add longer accuracy test on CI (#1049) 2024-08-12 09:21:38 +00:00
Yineng Zhang
cb99ba4fc6 feat: update Dockerfile (#1033) 2024-08-12 16:24:06 +10:00
Co-authored-by: vhain <vhain6512@gmail.com>
Lianmin Zheng
8207637029 Improve end-to-end throughput test and its coverage (#1039) 2024-08-11 18:27:33 -07:00
Lianmin Zheng
54fb1c80c0 Clean up unit tests (#1020) 2024-08-10 15:09:03 -07:00
Yineng Zhang
e712837d38 misc: update test config (#990) 2024-08-11 04:20:30 +10:00
Ying Sheng
e040a2450b Add e5-mistral embedding model - step 3/3 (#988) 2024-08-08 16:31:19 -07:00
Liangsheng Yin
4d929107ae Run purge-cache only in sgl-project (#976) 2024-08-07 13:16:36 -07:00
Liangsheng Yin
fbe0c818c2 Purge self-runner's pip cache weekly (#975) 2024-08-07 12:43:12 -07:00
Yineng Zhang
c31f084c71 chore: update vllm to 0.5.4 (#966) 2024-08-07 21:15:41 +10:00