Pratyush Patel | 8f15789314 | Add more metrics to serving benchmark. (#2819) | 2025-01-10 23:30:44 +08:00
Lianmin Zheng | bdc1acf6cd | Misc fix for min_p_sampling, --cuda-graph-bs (#2761) | 2025-01-07 02:52:53 -08:00
Lianmin Zheng | dc3bee4815 | Fix test and benchmark scripts (#2598) | 2024-12-26 07:56:26 -08:00
Ying Sheng | 8a56b43175 | [Bench] Flush cache before benchmarking (#2566) | 2024-12-24 11:21:21 +08:00
Lianmin Zheng | a2486eb58f | Fix a bug with logprob streaming + chunked prefill (#2403) | 2024-12-08 03:55:27 -08:00
Lianmin Zheng | 3ddb1c4679 | [Minor] Fix logger and style (#2325) | 2024-12-02 20:45:53 -08:00
bjmsong | 01017d4c20 | Support LoRA in Completion API (#2243); Co-authored-by: root <bjmsong@126.com> | 2024-11-29 16:13:38 -08:00
Byron Hsu | 4b0a1c9365 | Replace prob based with threshold based load balancing (#2170) | 2024-11-24 23:17:11 -08:00
Lianmin Zheng | 8e1adb8441 | Allow overwrite flashinfer use_tensorcore (#2169) | 2024-11-24 20:58:17 -08:00
Byron Hsu | cbedd1db1d | [router] cache-aware load-balancing router v1 (#2114) | 2024-11-23 08:34:48 -08:00
Yineng Zhang | ad47749b82 | fix: resolve bench_serving args (#2139) | 2024-11-23 17:45:42 +08:00
Yunmeng | 60769be14d | Add concurrency option for benchmark (#2136) | 2024-11-23 17:07:07 +08:00
bjmsong | ad30d5cf9a | Benchmark with Pytorch Profiler easily (#2110); Co-authored-by: root <bjmsong@126.com> | 2024-11-21 23:29:50 -08:00
Yineng Zhang | 55bd97f3e5 | minor: add dataset dump and questions shuffle (#2093) | 2024-11-19 14:07:27 -08:00
Lianmin Zheng | 2558d6a675 | Fix the default arguments of bench_offline_throughput.py & simplify detokenizer manager (#2042) | 2024-11-15 05:02:44 -08:00
zolinthecow | f6dd648620 | Offline LLM Engine Benchmark Throughput (#1968); Co-authored-by: ByronHsu <byronhsu1230@gmail.com> | 2024-11-14 21:59:33 -08:00
Ke Bao | b808a38365 | Filter empty prompt in random bench serving (#2011) | 2024-11-12 14:53:41 +08:00
Byron Hsu | 8169c6f4ef | Add gen-shared-prefix dataset in bench_serving (#1990) | 2024-11-11 08:39:56 +08:00
Yineng Zhang | 793b79dbe9 | feat: support truss endpoint for benchmark serving (#1906) | 2024-11-03 12:56:10 -08:00
Lianmin Zheng | 175afed370 | Improve benchmark scripts (#1672) | 2024-10-14 21:53:01 -07:00
Lianmin Zheng | 4a292f670d | [Minor] Add some utility functions (#1671) | 2024-10-14 20:08:03 -07:00
Ying Sheng | 04b262cd91 | [Fix] Fix major performance bug in certain cases (#1563); Co-authored-by: hnyls2002 <hnyls2002@gmail.com> | 2024-10-04 08:51:11 +00:00
Lianmin Zheng | 5e62a6b706 | Add bench_server_latency.py (#1452) | 2024-09-18 00:56:06 -07:00
Lianmin Zheng | e4d68afcf0 | [Minor] Many cleanup (#1357) | 2024-09-09 04:14:11 -07:00
min-xu-et | fa13b95d6b | fixed a typo (#1143) | 2024-08-18 14:29:09 -07:00
Lianmin Zheng | 3c1f5a9220 | Fix duplicated imports in hf_transformers_utils.py (#1141) | 2024-08-17 18:03:00 -07:00
Lianmin Zheng | 57d0bd91ec | Improve benchmark (#1140) | 2024-08-17 17:43:23 -07:00
Lianmin Zheng | 41598e0d8e | Add longer accuracy test on CI (#1049) | 2024-08-12 09:21:38 +00:00
Lianmin Zheng | 8207637029 | Improve end-to-end throughput test and its coverage (#1039) | 2024-08-11 18:27:33 -07:00
Roger Wang | 05c50a82b8 | Minor bugfix on benchmark serving (#1005) | 2024-08-10 02:53:50 +10:00
Juwan Yoo | ab7875941b | feat: frequency, min_new_tokens, presence, and repetition penalties (#973) | 2024-08-08 04:21:08 -07:00
Ying Sheng | ae7ee01a8e | Add accuracy test to CI: MMLU (#882) | 2024-08-01 21:20:17 -07:00
Yineng Zhang | f52eda35ea | misc: update e2e test benchmark config (#825) | 2024-07-30 19:19:23 +10:00
Ying Sheng | ae5c0fc442 | Support disable_ignore_eos in bench_serving.py (#824) | 2024-07-30 01:42:07 -07:00
Ying Sheng | db6089e6f3 | Revert "Organize public APIs" (#815) | 2024-07-29 19:40:28 -07:00
Liangsheng Yin | c8e9fed87a | Organize public APIs (#809) | 2024-07-29 15:34:16 -07:00
Yineng Zhang | 768e05d08f | fix benchmark (#743); Co-authored-by: hnyls2002 <hnyls2002@gmail.com>; Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-07-26 21:26:13 +10:00
Ying Sheng | 30d8e130e7 | Improve benchmark scripts (#717) | 2024-07-24 14:44:14 -07:00
zhyncs | fa7ccb3316 | feat: add e2e latency (#704) | 2024-07-24 05:51:10 +10:00
zhyncs | 9fdea29d05 | misc: fix typo (#698) | 2024-07-23 02:00:27 +10:00
Ying Sheng | df7c4c19b4 | Fix trt benchmark (#697) | 2024-07-22 23:32:41 +10:00
zhyncs | d198791fe8 | misc: update output token logic (#695) | 2024-07-22 19:34:05 +10:00
zhyncs | c07526e46c | fix: update bench serving (#694) | 2024-07-22 18:23:33 +10:00
zhyncs | 65bd13386b | misc: recommend to use chat model for benchmark (#690) | 2024-07-22 00:13:33 +10:00
zhyncs | 6a846bb1fd | misc: update output file logic (#686) | 2024-07-21 18:07:30 +10:00
zhyncs | 0fdb3127a1 | feat: update bench serving (#685) | 2024-07-21 16:46:58 +10:00
Lianmin Zheng | 77e592e8e0 | support non-streaming benchmark (#682) | 2024-07-20 18:36:42 -07:00
zhyncs | 4b4a67f814 | feat: support TRT LLM benchmark and multiple benchmarks (#670) | 2024-07-20 11:05:35 -07:00
Lianmin Zheng | 9592a1f3bd | Fix random dataset (#671) | 2024-07-20 01:57:43 -07:00
Lianmin Zheng | 35759efa91 | Support random dataset in bench_serving.py (#669) | 2024-07-20 01:06:43 -07:00