Lianmin Zheng
|
048685430d
|
Improve process creation (#1534)
|
2024-09-29 02:36:12 -07:00 |
|
Ying Sheng
|
9aa6553d2a
|
[Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525)
|
2024-09-27 23:32:11 -07:00 |
|
Lianmin Zheng
|
bc068e9618
|
[CI] Move AMD test to a separate file (#1500)
|
2024-09-24 02:06:28 -07:00 |
|
Yineng Zhang
|
42a2d82ba7
|
minor: add mla fp8 test (#1494)
|
2024-09-23 20:40:17 +08:00 |
|
Ying Sheng
|
6f3cf1297e
|
[CI, AMD] Add AMD tests to CI (#1491)
|
2024-09-22 04:45:10 -07:00 |
|
Lianmin Zheng
|
13f1357ef0
|
Add a unit test for data parallelism (#1489)
|
2024-09-22 02:21:05 -07:00 |
|
Ke Bao
|
b8ccaf4d73
|
Add MLA gsm8k eval (#1484)
|
2024-09-21 11:16:13 +08:00 |
|
Ke Bao
|
a68cb201dd
|
Fix triton head num (#1482)
|
2024-09-21 10:25:20 +08:00 |
|
Lianmin Zheng
|
1acccb364a
|
Fix oom issues with fp8 for llama (#1454)
|
2024-09-18 03:45:19 -07:00 |
|
Lianmin Zheng
|
9ba1f09760
|
[Fix] Fix logprob and normalized_logprob (#1428)
|
2024-09-15 06:36:06 -07:00 |
|
Yineng Zhang
|
f3d32f888a
|
ci: fix finish (#1414)
|
2024-09-14 01:01:30 +10:00 |
|
Lianmin Zheng
|
8779da95d6
|
Update pr-test.yml (#1412)
|
2024-09-13 00:37:13 -07:00 |
|
Lianmin Zheng
|
ad0ff62a4c
|
Balance test in CI (#1411)
|
2024-09-12 23:29:44 -07:00 |
|
Lianmin Zheng
|
68be2f6d3b
|
[CI] Include triton backend and online serving benchmark into CI (#1408)
|
2024-09-12 21:36:41 -07:00 |
|
Lianmin Zheng
|
f64eae3a29
|
[Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308)
|
2024-09-02 21:44:45 -07:00 |
|
Yineng Zhang
|
2561ed012c
|
feat: update nightly gsm8k eval (#1304)
|
2024-09-03 01:18:41 +10:00 |
|
Yineng Zhang
|
6487ef64c6
|
ci: add nightly eval (#1291)
|
2024-09-02 03:19:49 +10:00 |
|
Lianmin Zheng
|
761b2cebd6
|
[CI] merge all ci tests into one file (#1289)
|
2024-09-01 02:36:56 -07:00 |
|
Lianmin Zheng
|
1b5d56f7f8
|
[CI] Add more multi-gpu tests (#1280)
|
2024-09-01 00:27:25 -07:00 |
|
Lianmin Zheng
|
6c49831394
|
Add sglang.bench_latency to CI (#1243)
|
2024-08-28 21:20:54 +10:00 |
|
Yineng Zhang
|
f25f4dfde5
|
hotfix: revert sampler CUDA Graph (#1242)
|
2024-08-28 21:16:47 +10:00 |
|
Liangsheng Yin
|
1ece2cda3d
|
Fix bench latency benchmark (#1225)
|
2024-08-28 00:37:32 -07:00 |
|
Mingyi
|
97589a60a2
|
[CI] Parallelize unit tests in CI (#1219)
|
2024-08-26 04:54:02 +00:00 |
|
Liangsheng Yin
|
632d506d0b
|
minor: improve CI and dependencies (#1212)
|
2024-08-26 04:26:31 +00:00 |
|
Lianmin Zheng
|
d3efcb3930
|
Update workflow files (#1214)
|
2024-08-25 17:45:35 -07:00 |
|
Lianmin Zheng
|
61bb223e0f
|
Update CI runner docs (#1213)
|
2024-08-25 17:31:52 -07:00 |
|
Lianmin Zheng
|
15f1a49d2d
|
Update CI workflows (#1210)
|
2024-08-25 16:43:07 -07:00 |
|
Chayenne
|
30b4f771b0
|
Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2024-08-25 10:29:12 -07:00 |
|
Lianmin Zheng
|
a8ae640328
|
Improve docs and warnings (#1164)
|
2024-08-20 08:31:29 -07:00 |
|
Liangsheng Yin
|
5d0d40d0eb
|
Fix CI accuracy && time out limit (#1133)
|
2024-08-16 21:41:11 -07:00 |
|
Yineng Zhang
|
26e9c12c15
|
ci: compatible with fork repo (#1115)
|
2024-08-16 04:26:44 +10:00 |
|
Lianmin Zheng
|
e86b1ccbf0
|
Enable chunked prefill by default (#1040)
|
2024-08-14 21:56:20 -07:00 |
|
Yineng Zhang
|
67c0d832a6
|
docs: update pr template (#1099)
|
2024-08-14 22:25:39 +10:00 |
|
Yineng Zhang
|
fe5024325b
|
docs: update README (#1098)
|
2024-08-14 04:40:05 -07:00 |
|
Yineng Zhang
|
f14569f64a
|
ci: remove workflow path trigger (#1096)
|
2024-08-14 20:36:24 +10:00 |
|
Yineng Zhang
|
c8423ca311
|
ci: update timeout and retry (#1086)
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2024-08-14 00:27:35 -07:00 |
|
Yineng Zhang
|
cebd78d83e
|
ci: add accuracy timeout (#1078)
|
2024-08-13 22:12:58 +10:00 |
|
Yineng Zhang
|
f7fb68d292
|
ci: add moe test (#1053)
|
2024-08-13 18:43:23 +10:00 |
|
Yineng Zhang
|
396a13e6ad
|
ci: add cancel pr workflow (#1070)
|
2024-08-13 18:16:50 +10:00 |
|
Lianmin Zheng
|
c877292cc1
|
Re-organize CI tests (#1052)
|
2024-08-12 03:39:01 -07:00 |
|
Lianmin Zheng
|
41598e0d8e
|
Add longer accuracy test on CI (#1049)
|
2024-08-12 09:21:38 +00:00 |
|
Yineng Zhang
|
cb99ba4fc6
|
feat: update Dockerfile (#1033)
Co-authored-by: vhain <vhain6512@gmail.com>
|
2024-08-12 16:24:06 +10:00 |
|
Lianmin Zheng
|
8207637029
|
Improve end-to-end throughput test and its coverage (#1039)
|
2024-08-11 18:27:33 -07:00 |
|
Yineng Zhang
|
33d61356b8
|
misc: update issue template (#1024)
|
2024-08-11 17:34:30 +10:00 |
|
Lianmin Zheng
|
54fb1c80c0
|
Clean up unit tests (#1020)
|
2024-08-10 15:09:03 -07:00 |
|
Yineng Zhang
|
e712837d38
|
misc: update test config (#990)
|
2024-08-11 04:20:30 +10:00 |
|
Ying Sheng
|
e040a2450b
|
Add e5-mistral embedding model - step 3/3 (#988)
|
2024-08-08 16:31:19 -07:00 |
|
Liangsheng Yin
|
4d929107ae
|
Run purge-cache only in sgl-project (#976)
|
2024-08-07 13:16:36 -07:00 |
|
Liangsheng Yin
|
fbe0c818c2
|
Purge self-runner's pip cache weekly (#975)
|
2024-08-07 12:43:12 -07:00 |
|
Yineng Zhang
|
c31f084c71
|
chore: update vllm to 0.5.4 (#966)
|
2024-08-07 21:15:41 +10:00 |
|