100 Commits

Author SHA1 Message Date
Zhihao Zhang
e7bc600304 [Feature] Speculative decoding support lookahead (#9873)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2025-09-18 16:42:41 -07:00
Muqi Li
d5e2a37414 Benchmark: Support API_KEY without 'bearer' (#10380) 2025-09-12 16:29:04 -07:00
blzheng
97fff98c68 [CPU] Fix phi4-mm prompt issue in bench_serving (#9900) 2025-09-08 20:12:32 -07:00
Yineng Zhang
19d64f2b72 fix: resolve lint issue (#10181) 2025-09-08 15:09:55 -07:00
Teng Ma
a02071a12c [Bench] feat: mooncake trace integration (#9839)
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Signed-off-by: Teng Ma <sima.mt@alibaba-inc.com>
Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
2025-09-09 02:50:54 +08:00
Mick
16a6d21b95 chore: enhance bench_serving for vlms with a new dataset of configurable image count and resolution (#9583)
Co-authored-by: yhyang201 <yhyang201@gmail.com>
2025-08-26 17:42:54 -07:00
Mick
584e1ab2d0 fix: fix unsupported palette mode of images in bench_serving for mmmu (#9206) 2025-08-14 18:44:46 -07:00
Brayden Zhong
a37e1247c1 [Multimodal][Perf] Use pybase64 instead of base64 (#7724) 2025-07-08 14:00:58 -07:00
Lianmin Zheng
22352d47a9 Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632)
Co-authored-by: Kan Wu <wukanustc@gmail.com>
2025-06-29 23:16:19 -07:00
Xinyuan Tong
c45e49d817 oai: Adds support for OpenAI chat completions API in bench_serving (#7036)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2025-06-28 22:59:20 +00:00
Zijian
31d6dee5c4 Support VILA models (#6106) 2025-06-11 11:47:25 -07:00
Xinyuan Tong
697b0f71f0 [Refactor] image data process in bench_serving (#6879)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-06-06 21:11:17 -07:00
Yikai Zhang
fb507b7b10 [FIX] mmmu bench serving result display error (#6525) (#6791) 2025-05-31 13:48:06 -07:00
Yuhong Guo
d279d4990c Fix aiohttp 'Chunk too big' in bench_serving (#6737) 2025-05-30 00:50:36 -07:00
fzyzcjy
501efc3d36 Tiny fix CI (#6611) 2025-05-25 23:36:34 -07:00
fzyzcjy
6bebef60a7 Support accurate length control for bench serving (#6594) 2025-05-25 22:46:23 -07:00
fzyzcjy
25be63d0b2 Auto handle PD disaggregation in bench_serving (#6587)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-05-25 22:41:27 -07:00
fzyzcjy
2c3a6fe1de Fix bench_serving does not support changing warmup requests (#6439) 2025-05-25 22:35:36 -07:00
Yineng Zhang
a6970a17f3 misc: fix accept_length (#6536) 2025-05-22 14:27:10 -07:00
fzyzcjy
969660c762 Recover from corrupted cache file in bench serving (#6510) 2025-05-21 17:13:54 -07:00
fzyzcjy
7222e1dacc Let bench_one_batch_server use sharegpt data to make expert distribution more natural (#5573) 2025-05-21 02:08:43 -07:00
fzyzcjy
26ebb849eb Tiny refactor bench_serving to extract RequestFuncOutput.init_new (#6108) 2025-05-17 17:08:52 -07:00
fzyzcjy
02973cd9a4 Tiny refactor bench_serving to improve extensibility (#6134) 2025-05-17 17:07:58 -07:00
fzyzcjy
6d95a35abf Support outputing details for bench_serving (#6107) 2025-05-17 17:06:52 -07:00
Yineng Zhang
f24fc5b86d fix typo (#6248) 2025-05-12 15:45:12 -07:00
Lianmin Zheng
fba8eccd7e Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-05-12 00:17:33 -07:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 12:55:00 +08:00
XinyuanTong
e88dd482ed [CI]Add performance CI for VLM (#6038)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-05-07 19:20:03 -07:00
Lianmin Zheng
35ca04d2fa [CI] fix port conflicts (#5789) 2025-04-27 05:17:44 -07:00
vzed
094891c01a fix: Use is not None instead of != None for None checks. (#5687) 2025-04-26 19:26:57 -07:00
Yineng Zhang
b1f6d89b5f fix: update truss bench_serving (#5683) 2025-04-23 13:28:35 -07:00
Yineng Zhang
7282ab741a fix: update bench_speculative (#5649) 2025-04-22 16:08:15 -07:00
fzyzcjy
9924bbe153 Fix bench_serving fail when zero warmup requests (#5574) 2025-04-20 14:16:03 -07:00
Lianmin Zheng
177320a582 Clean up imports (#5467) 2025-04-16 15:26:49 -07:00
Yuhong Guo
3dfc6023ce Fix bench_serving with random-ids (#5214) 2025-04-15 01:34:35 -07:00
Xiaoyu Zhang
924ca7c92c Add DeepSeek V3/R1 shared experts fusion (#4918) 2025-04-04 01:59:29 -07:00
Yineng Zhang
fda6bb78da update bench_serving (#4958) 2025-04-01 15:10:56 -07:00
chaobo jia
ef9a378a20 [Feature] add multi-rank support for Lora (#4492)
Co-authored-by: rudy152 <czh1137892874@gmail.com>
2025-03-28 09:38:44 -07:00
Stefan He
5d7edc8e55 Support FA3 as Attention backend by using --attention-backend fa3 (#4680)
Co-authored-by: qsong <qsong@linkedin.com>
Co-authored-by: qingquansong <ustcsqq@gmail.com>
2025-03-23 23:28:11 -07:00
Xu Song
470b474075 Update bench_serving.py (#4454) 2025-03-15 16:33:58 -07:00
Mingshan
0fe7c13be1 Fix bench_serving flush cache not recognizing OPENAI_API_KEY (#4181)
Signed-off-by: Mingshan <git@brighill.com>
2025-03-08 01:03:38 -08:00
Lzhang-hub
3a3918121f fix bench serving bug (#4135) 2025-03-06 05:34:02 -08:00
Lianmin Zheng
e074d84e5b [Minor] more code cleanup (#4077) 2025-03-04 21:23:47 -08:00
Lianmin Zheng
66301e124f Improve code styles (#4021) 2025-03-03 03:20:23 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
Kebe
ec0a72c2d9 Fix bench_serving not recognizing OPENAI_API_KEY (#3870)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-02-27 20:18:53 -08:00
Lianmin Zheng
287d07a669 Misc fixes for eagle (flush_cache, CPU overhead) (#3014) 2025-01-20 20:27:38 -08:00
Lianmin Zheng
8f2c522aba Improve benchmark scripts and error message printing (#2922) 2025-01-16 06:24:31 -08:00
Pratyush Patel
8f15789314 Add more metrics to serving benchmark. (#2819) 2025-01-10 23:30:44 +08:00
Lianmin Zheng
bdc1acf6cd Misc fix for min_p_sampling, --cuda-graph-bs (#2761) 2025-01-07 02:52:53 -08:00