Lianmin Zheng | b7a065eae3 | Use cuda event wait and synchronization instead of busy waiting (#2089) | 2024-11-19 00:21:46 -08:00
Lianmin Zheng | 116685337e | Fix cuda illegal memory access in overlap mode (#2070) | 2024-11-17 21:29:30 -08:00
zolinthecow | f6dd648620 | Offline LLM Engine Benchmark Throughput (#1968); Co-authored-by: ByronHsu <byronhsu1230@gmail.com> | 2024-11-14 21:59:33 -08:00
James Xu | ddeb9d42de | Add engine encode (#1995); Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> | 2024-11-11 11:48:17 -08:00
Lianmin Zheng | 9c939a3d8b | Clean up metrics code (#1972) | 2024-11-09 15:43:20 -08:00
Chayenne | c77c1e05ba | fix black in pre-commit (#1940) | 2024-11-08 07:42:47 +08:00
Byron Hsu | 6fcd6d7d6d | Support token ids in engine.generate (#1820) | 2024-10-27 14:02:34 -07:00
Lianmin Zheng | dd3809fad8 | Fix engine unit test (#1701) | 2024-10-17 09:53:32 -07:00
Byron Hsu | 862cd265e5 | [engine] support async and streaming (#1614) | 2024-10-11 15:26:25 -07:00
Byron Hsu | e8613df071 | [Engine] Fix generate hanging issue after the first call (#1606) | 2024-10-08 04:26:56 +00:00
Byron Hsu | 551a3a9d38 | Provide an offline engine API (#1567) | 2024-10-06 20:27:03 -07:00