Commit Graph

11 Commits

Author SHA1 Message Date
Lianmin Zheng
b7a065eae3 Use cuda event wait and synchronization instead of busy waiting (#2089) 2024-11-19 00:21:46 -08:00
Lianmin Zheng
116685337e Fix cuda illegal memory access in overlap mode (#2070) 2024-11-17 21:29:30 -08:00
zolinthecow
f6dd648620 Offline LLM Engine Benchmark Throughput (#1968)
Co-authored-by: ByronHsu <byronhsu1230@gmail.com>
2024-11-14 21:59:33 -08:00
James Xu
ddeb9d42de Add engine encode (#1995)
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2024-11-11 11:48:17 -08:00
Lianmin Zheng
9c939a3d8b Clean up metrics code (#1972) 2024-11-09 15:43:20 -08:00
Chayenne
c77c1e05ba fix black in pre-commit (#1940) 2024-11-08 07:42:47 +08:00
Byron Hsu
6fcd6d7d6d Support token ids in engine.generate (#1820) 2024-10-27 14:02:34 -07:00
Lianmin Zheng
dd3809fad8 Fix engine unit test (#1701) 2024-10-17 09:53:32 -07:00
Byron Hsu
862cd265e5 [engine] support async and streaming (#1614) 2024-10-11 15:26:25 -07:00
Byron Hsu
e8613df071 [Engine] Fix generate hanging issue after the first call (#1606) 2024-10-08 04:26:56 +00:00
Byron Hsu
551a3a9d38 Provide an offline engine API (#1567) 2024-10-06 20:27:03 -07:00