Commit Graph

13 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Lianmin Zheng | d4fc1a70e3 | Crash the server correctly during error (#2231) | 2024-11-28 00:22:39 -08:00 |
| Lianmin Zheng | b7a065eae3 | Use cuda event wait and synchronization instead of busy waiting (#2089) | 2024-11-19 00:21:46 -08:00 |
| Lianmin Zheng | 520f0094e4 | [CI] balance unit tests (#1977) | 2024-11-09 16:46:14 -08:00 |
| Lianmin Zheng | 9c939a3d8b | Clean up metrics code (#1972) | 2024-11-09 15:43:20 -08:00 |
| Lianmin Zheng | 86fc0d79d0 | Add a watch dog thread (#1816) | 2024-10-27 02:00:50 -07:00 |
| Lianmin Zheng | 2b80978859 | Provide an argument to set the maximum batch size for cuda graph (#1809) | 2024-10-26 15:09:33 -07:00 |
| Lianmin Zheng | 1701b0db31 | Enhance the test case for chunked prefill (#1785) | 2024-10-24 21:23:09 -07:00 |
| Lianmin Zheng | dafb6a5266 | [Fix] Fix the style of test_large_max_new_tokens.py (#1638) | 2024-10-11 16:05:58 -07:00 |
| Ying Sheng | e4780cf839 | [API, Feature] Support response prefill for openai API (#1490) | 2024-09-22 06:46:17 -07:00 |
| Mingyi | 158e8f1e2d | improve the threshold and ports in tests (#1215) | 2024-08-25 19:02:08 -07:00 |
| Yineng Zhang | f7fb68d292 | ci: add moe test (#1053) | 2024-08-13 18:43:23 +10:00 |
| Lianmin Zheng | 8207637029 | Improve end-to-end throughput test and its coverage (#1039) | 2024-08-11 18:27:33 -07:00 |
| Lianmin Zheng | d84c5e70f7 | Test the case when max_new_tokens is very large (#1038) | 2024-08-11 16:41:03 -07:00 |