11 Commits

Author SHA1 Message Date
Lianmin Zheng d4fc1a70e3 Crash the server correctly during error (#2231) 2024-11-28 00:22:39 -08:00
Lianmin Zheng 7d671e4ad2 Enable overlap by default (#2067) 2024-11-19 22:07:58 -08:00
Lianmin Zheng a7164b620f Tune the threshold for accuracy tests in CI (#2071) 2024-11-17 21:51:00 -08:00
Lianmin Zheng 86fc0d79d0 Add a watch dog thread (#1816) 2024-10-27 02:00:50 -07:00
Lianmin Zheng 05b3bf5e8e Crash the server on warnings in CI (#1772) 2024-10-23 16:27:13 -07:00
Lianmin Zheng 9463bc1385 Enable torch.compile for triton backend (#1422) 2024-09-14 15:38:37 -07:00
Jerry Zhang a7c47e0f02 Add torchao quant (int4/int8/fp8) to llama models (#1341) (Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>) 2024-09-09 05:32:41 -07:00
Lianmin Zheng e4d68afcf0 [Minor] Many cleanup (#1357) 2024-09-09 04:14:11 -07:00
Lianmin Zheng 843e63d809 Fix the flaky test test_moe_eval_accuracy_large.py (#1326) 2024-09-04 04:15:11 -07:00
Lianmin Zheng 58fa607622 Fix the flaky tests in test_moe_eval_accuracy_large.py (#1293) 2024-09-01 12:20:46 -07:00
Lianmin Zheng 1b5d56f7f8 [CI] Add more multi-gpu tests (#1280) 2024-09-01 00:27:25 -07:00