108 Commits

Author SHA1 Message Date
Chayenne
f4cd804073 Fix ci and link error (#1892)
Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>
2024-11-02 19:08:49 -07:00
Chayenne
3b60558dd7 Native api (#1886)
Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>
2024-11-02 01:02:17 -07:00
Lianmin Zheng
2565cb0f40 Update docs and workflow (#1881) 2024-11-01 20:29:41 -07:00
Yineng Zhang
104bf2609b minor: update nightly eval (#1867) 2024-11-01 21:38:29 +08:00
Yineng Zhang
d86a2d6562 minor: add human eval (#1754) 2024-11-01 14:29:20 +08:00
Chayenne
61cf00e112 change file tree (#1859)
Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>
2024-10-31 20:10:16 -07:00
Liangsheng Yin
b9fd178f1b Fix retraction + overlap (#1860)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2024-10-31 18:27:42 -07:00
Lianmin Zheng
a2e0424abf Fix memory leak for chunked prefill 2 (#1858)
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2024-10-31 14:51:51 -07:00
Chayenne
6e13b650a9 Fix docs deploy ci (#1821) 2024-10-27 21:03:41 -07:00
Chayenne
51c81e339b Add openAI compatible API (#1810)
Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>
2024-10-27 10:51:42 -07:00
Chayenne
9d6fb08457 Fix docs ci (#1808) 2024-10-26 11:23:51 -07:00
Chayenne
ced362f7c6 Simplify our docs with complicated functions into utils (#1807)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
2024-10-26 17:44:11 +00:00
Lianmin Zheng
9084a86445 Update links (#1805) 2024-10-26 04:46:01 -07:00
Lianmin Zheng
6aa94b967c Update ci workflows (#1804) 2024-10-26 04:32:36 -07:00
Chayenne
715b16c140 Add support for ipynb (#1786) 2024-10-25 20:48:35 -07:00
Lianmin Zheng
1701b0db31 Enhance the test case for chunked prefill (#1785) 2024-10-24 21:23:09 -07:00
Yineng Zhang
cbbc82b7b8 Support qwen2 vl model (#1721)
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: ispobock <ISPObaoke@163.com>
2024-10-19 21:44:38 -07:00
Lianmin Zheng
b6cd903604 Update readme and workflow (#1716) 2024-10-19 13:01:44 -07:00
Lianmin Zheng
d17d19e5b8 Fix mixed batch for multi modal models (#1702) 2024-10-17 10:27:26 -07:00
Lianmin Zheng
02f7f3e488 Update the transformers version in CI (#1690) 2024-10-16 19:03:55 -07:00
Lianmin Zheng
6790240cc3 Fix unit test order to balance the tasks in CI (#1665) 2024-10-14 02:01:44 -07:00
Lianmin Zheng
69aa937aa5 Fix unit tests and type annotations (#1648) 2024-10-12 14:49:24 -07:00
Lianmin Zheng
00c7e6368b Release v0.3.3.post1 (#1636) 2024-10-11 07:56:16 -07:00
Lianmin Zheng
23cc66f7b6 Add back data parallelism (#1635) 2024-10-11 07:22:48 -07:00
Ying Sheng
04b262cd91 [Fix] Fix major performance bug in certain cases (#1563)
Co-authored-by: hnyls2002 <hnyls2002@gmail.com>
2024-10-04 08:51:11 +00:00
Lianmin Zheng
048685430d Improve process creation (#1534) 2024-09-29 02:36:12 -07:00
Ying Sheng
9aa6553d2a [Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525) 2024-09-27 23:32:11 -07:00
Lianmin Zheng
bc068e9618 [CI] Move AMD test to a separate file (#1500) 2024-09-24 02:06:28 -07:00
Yineng Zhang
42a2d82ba7 minor: add mla fp8 test (#1494) 2024-09-23 20:40:17 +08:00
Ying Sheng
6f3cf1297e [CI, AMD] Add AMD tests to CI (#1491) 2024-09-22 04:45:10 -07:00
Lianmin Zheng
13f1357ef0 Add a unit test for data parallelism (#1489) 2024-09-22 02:21:05 -07:00
Ke Bao
b8ccaf4d73 Add MLA gsm8k eval (#1484) 2024-09-21 11:16:13 +08:00
Ke Bao
a68cb201dd Fix triton head num (#1482) 2024-09-21 10:25:20 +08:00
Lianmin Zheng
1acccb364a Fix oom issues with fp8 for llama (#1454) 2024-09-18 03:45:19 -07:00
Lianmin Zheng
9ba1f09760 [Fix] Fix logprob and normalized_logprob (#1428) 2024-09-15 06:36:06 -07:00
Yineng Zhang
f3d32f888a ci: fix finish (#1414) 2024-09-14 01:01:30 +10:00
Lianmin Zheng
8779da95d6 Update pr-test.yml (#1412) 2024-09-13 00:37:13 -07:00
Lianmin Zheng
ad0ff62a4c Balance test in CI (#1411) 2024-09-12 23:29:44 -07:00
Lianmin Zheng
68be2f6d3b [CI] Include triton backend and online serving benchmark into CI (#1408) 2024-09-12 21:36:41 -07:00
Lianmin Zheng
f64eae3a29 [Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308) 2024-09-02 21:44:45 -07:00
Yineng Zhang
2561ed012c feat: update nightly gsm8k eval (#1304) 2024-09-03 01:18:41 +10:00
Yineng Zhang
6487ef64c6 ci: add nightly eval (#1291) 2024-09-02 03:19:49 +10:00
Lianmin Zheng
761b2cebd6 [CI] merge all ci tests into one file (#1289) 2024-09-01 02:36:56 -07:00
Lianmin Zheng
1b5d56f7f8 [CI] Add more multi-gpu tests (#1280) 2024-09-01 00:27:25 -07:00
Lianmin Zheng
6c49831394 Add sglang.bench_latency to CI (#1243) 2024-08-28 21:20:54 +10:00
Yineng Zhang
f25f4dfde5 hotfix: revert sampler CUDA Graph (#1242) 2024-08-28 21:16:47 +10:00
Liangsheng Yin
1ece2cda3d Fix bench latency benchmark (#1225) 2024-08-28 00:37:32 -07:00
Mingyi
97589a60a2 [CI] Parallelize unit tests in CI (#1219) 2024-08-26 04:54:02 +00:00
Liangsheng Yin
632d506d0b minor: improve CI and dependencies (#1212) 2024-08-26 04:26:31 +00:00
Lianmin Zheng
d3efcb3930 Update workflow files (#1214) 2024-08-25 17:45:35 -07:00