Commit Graph

428 Commits

Author SHA1 Message Date
Liangsheng Yin
04ec6ba2ac Fix dockerfile and triton cache manager (#720) 2024-07-25 03:04:21 -07:00
Ying Sheng
d63f13c13b fix: fp8 config (#723) 2024-07-25 02:01:56 -07:00
Yineng Zhang
fded67441d misc: update build instruction (#724) 2024-07-25 17:08:11 +10:00
Yineng Zhang
6e45394051 chore: add close inactive issues workflow (#722) 2024-07-25 16:31:23 +10:00
Yineng Zhang
97e0f7d250 docs: update comment (#721) 2024-07-25 10:51:18 +10:00
Yineng Zhang
d5146baec9 docs: update supported models (#719) 2024-07-25 09:34:01 +10:00
Ying Sheng
459abad261 Bump version to 0.1.24 (#718) 2024-07-24 15:55:01 -07:00
Ying Sheng
30d8e130e7 Improve benchmark scripts (#717) 2024-07-24 14:44:14 -07:00
Ying Sheng
08a3bd19cc docs: update doc (#716) 2024-07-24 20:44:03 +00:00
Yineng Zhang
321a963b01 misc: update doc (#715) 2024-07-24 13:05:46 -07:00
Yineng Zhang
e17deb27b5 fix: llama 3.1 405b fp8 (#714) 2024-07-24 09:37:41 -07:00
Yineng Zhang
2d3ae4e125 docs: update doc (#713) 2024-07-25 00:03:17 +10:00
Yineng Zhang
75f4ccb7dd docs: update README (#712) 2024-07-24 23:33:28 +10:00
Ying Sheng
83d2b30d75 format 2024-07-24 10:53:07 +00:00
Ying Sheng
4367f4bb8d Fix prefill size (#711) 2024-07-24 03:41:15 -07:00
Lianmin Zheng
00e4baa728 Update schedule_heuristic.py 2024-07-24 01:22:30 -07:00
Liangsheng Yin
4cd64b8ee6 Auto adjust new ratio (#708) 2024-07-23 22:06:02 -07:00
Lianmin Zheng
01d66ae2e8 Fix multi-node deadlock (#709) 2024-07-23 21:53:36 -07:00
Mingyi
a523a3c13a Reduce hardcoded logic of kernel usage (#707) 2024-07-23 16:42:21 -07:00
Ying Sheng
9f94728f5a bump version to 0.1.23 (#706) 2024-07-23 13:53:19 -07:00
Ying Sheng
444a02441a Update vllm version to support llama3.1 (#705) 2024-07-23 13:49:34 -07:00
zhyncs
fa7ccb3316 feat: add e2e latency (#704) 2024-07-24 05:51:10 +10:00
Liangsheng Yin
268684439b Use min new token ratio at start (#701) 2024-07-23 11:52:50 -07:00
Ke Bao
824a77d04d Fix hf config loading (#702) 2024-07-23 11:39:08 -07:00
Ying Sheng
cf99eab7d5 Fix flashinfer (#700) 2024-07-23 01:27:01 -07:00
zhyncs
9fdea29d05 misc: fix typo (#698) 2024-07-23 02:00:27 +10:00
Ying Sheng
df7c4c19b4 Fix trt benchmark (#697) 2024-07-22 23:32:41 +10:00
Ying Sheng
c3f1aac811 Tune params (#696) 2024-07-22 03:19:24 -07:00
zhyncs
d198791fe8 misc: update output token logic (#695) 2024-07-22 19:34:05 +10:00
zhyncs
c07526e46c fix: update bench serving (#694) 2024-07-22 18:23:33 +10:00
zhyncs
7b597475f2 docs: update README (#692) 2024-07-22 03:41:20 +10:00
Ke Bao
5303c1ed22 Support Mistral-Nemo (#691) 2024-07-22 03:36:53 +10:00
zhyncs
65bd13386b misc: recommend to use chat model for benchmark (#690) 2024-07-22 00:13:33 +10:00
Liangsheng Yin
eedc12e12e Support Deepseek MoE Model (#689) 2024-07-21 03:09:29 -07:00
Lianmin Zheng
5a4ef2b5c8 update readme 2024-07-21 02:58:57 -07:00
zhyncs
9dab947d56 docs: update README (#688) 2024-07-21 18:32:58 +10:00
Lianmin Zheng
33ee97b0bf Allow disabling streaming in bench (#687) 2024-07-21 01:12:34 -07:00
zhyncs
6a846bb1fd misc: update output file logic (#686) 2024-07-21 18:07:30 +10:00
zhyncs
0fdb3127a1 feat: update bench serving (#685) 2024-07-21 16:46:58 +10:00
Max Shawabkeh
5ad033a070 Fix StreamExecutor.fork() losing the current role start index. (#684) 2024-07-20 23:32:11 -07:00
Lianmin Zheng
77e592e8e0 support non-streaming benchmark (#682) 2024-07-20 18:36:42 -07:00
Liangsheng Yin
caaad53b52 Support gpt-bigcode model class (#681) 2024-07-20 18:34:37 -07:00
Liangsheng Yin
69d19188fc Decouple kv (#679) 2024-07-20 14:16:45 -07:00
zhyncs
4b4a67f814 feat: support TRT LLM benchmark and multiple benchmarks (#670) 2024-07-20 11:05:35 -07:00
Ke Bao
0ac94c36cb Fallback when sampling failed (#678) 2024-07-20 10:44:54 -07:00
Ying Sheng
2b4c646277 Update version to 0.1.22 (#677) 2024-07-20 03:39:50 -07:00
Liangsheng Yin
f424e76d96 Fix illegal tokens during sampling (#676) 2024-07-20 03:11:15 -07:00
Lianmin Zheng
490a1f39dd Fix cuda graph with flashinfer (#675) 2024-07-20 02:43:55 -07:00
Ying Sheng
06487f126e refactor model loader: initial refactor (#664) 2024-07-20 02:18:22 -07:00
Liangsheng Yin
39c57317e1 Revert "Temporary fix invalid sample results" (#673) 2024-07-20 02:06:31 -07:00