11 Commits

Author | SHA1 | Message | Date
Lianmin Zheng | 3efa798116 | Support cuda graph in the triton attention backend (#1401) | 2024-09-12 00:36:55 -07:00
Lianmin Zheng | fec185ce0c | Refactor attention backend (#1381) | 2024-09-11 11:44:26 -07:00
Lianmin Zheng | 46094e0c1b | Deprecate --disable-flashinfer and introduce --attention-backend (#1380) | 2024-09-10 17:11:16 -07:00
Lianmin Zheng | 1b5d56f7f8 | [CI] Add more multi-gpu tests (#1280) | 2024-09-01 00:27:25 -07:00
Mingyi | 158e8f1e2d | improve the threshold and ports in tests (#1215) | 2024-08-25 19:02:08 -07:00
Lianmin Zheng | e86b1ccbf0 | Enable chunked prefill by default (#1040) | 2024-08-14 21:56:20 -07:00
Yineng Zhang | f7fb68d292 | ci: add moe test (#1053) | 2024-08-13 18:43:23 +10:00
Lianmin Zheng | c877292cc1 | Re-organize CI tests (#1052) | 2024-08-12 03:39:01 -07:00
Lianmin Zheng | 0c1c72a0b4 | Fix accuracy test (#1051) | 2024-08-12 19:48:40 +10:00
Lianmin Zheng | 41598e0d8e | Add longer accuracy test on CI (#1049) | 2024-08-12 09:21:38 +00:00
Lianmin Zheng | 8207637029 | Improve end-to-end throughput test and its coverage (#1039) | 2024-08-11 18:27:33 -07:00