15 Commits

Author SHA1 Message Date
Yineng Zhang
67470bbb28 minor: update correct measurement unit (#2406) 2024-12-08 20:55:04 +08:00
Lianmin Zheng
ccaf1f997c [CI] Print summary on github actions (#2274) 2024-11-29 23:48:54 -08:00
Lianmin Zheng
3c5538f781 Update CI threshold (#2186) 2024-11-25 15:24:17 -08:00
Lianmin Zheng
5652c56535 Update CI threshold & Improve code style (#2159) 2024-11-24 06:29:38 -08:00
Lianmin Zheng
7d671e4ad2 Enable overlap by default (#2067) 2024-11-19 22:07:58 -08:00
Yineng Zhang
766192610e feat: update torch 2.5.1 (#2069) 2024-11-18 21:29:13 +08:00
Lianmin Zheng
38625e2139 Remove monkey_patch_vllm_dummy_weight_loader (#2064) 2024-11-17 15:48:12 -08:00
Lianmin Zheng
c1f401fc58 Revert "chore: update torch v2.5.1" (#2063) 2024-11-17 15:29:38 -08:00
Yineng Zhang
3b878863f7 chore: update torch v2.5.1 (#1849) 2024-11-18 00:06:00 +08:00
Ying Sheng
c98e84c21e [Minor, Performance] Use torch.argmax for greedy sampling (#1589) 2024-10-06 13:15:05 -07:00
Ying Sheng
04b262cd91 [Fix] Fix major performance bug in certain cases (#1563) (Co-authored-by: hnyls2002 <hnyls2002@gmail.com>) 2024-10-04 08:51:11 +00:00
Lianmin Zheng
1acccb364a Fix oom issues with fp8 for llama (#1454) 2024-09-18 03:45:19 -07:00
Lianmin Zheng
9463bc1385 Enable torch.compile for triton backend (#1422) 2024-09-14 15:38:37 -07:00
Lianmin Zheng
ad0ff62a4c Balance test in CI (#1411) 2024-09-12 23:29:44 -07:00
Lianmin Zheng
68be2f6d3b [CI] Include triton backend and online serving benchmark into CI (#1408) 2024-09-12 21:36:41 -07:00