Zhihao Zhang
|
e7bc600304
|
[Feature] Speculative decoding support lookahead (#9873)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
|
2025-09-18 16:42:41 -07:00 |
|
Muqi Li
|
d5e2a37414
|
Benchmark: Support API_KEY without 'bearer' (#10380)
|
2025-09-12 16:29:04 -07:00 |
|
blzheng
|
97fff98c68
|
[CPU] Fix phi4-mm prompt issue in bench_serving (#9900)
|
2025-09-08 20:12:32 -07:00 |
|
Yineng Zhang
|
19d64f2b72
|
fix: resolve lint issue (#10181)
|
2025-09-08 15:09:55 -07:00 |
|
Teng Ma
|
a02071a12c
|
[Bench] feat: mooncake trace integration (#9839)
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Signed-off-by: Teng Ma <sima.mt@alibaba-inc.com>
Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
|
2025-09-09 02:50:54 +08:00 |
|
Mick
|
16a6d21b95
|
chore: enhance bench_serving for vlms with a new dataset of configurable image count and resolution (#9583)
Co-authored-by: yhyang201 <yhyang201@gmail.com>
|
2025-08-26 17:42:54 -07:00 |
|
Mick
|
584e1ab2d0
|
fix: fix unsupported palette mode of images in bench_serving for mmmu (#9206)
|
2025-08-14 18:44:46 -07:00 |
|
Brayden Zhong
|
a37e1247c1
|
[Multimodal][Perf] Use pybase64 instead of base64 (#7724)
|
2025-07-08 14:00:58 -07:00 |
|
Lianmin Zheng
|
22352d47a9
|
Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632)
Co-authored-by: Kan Wu <wukanustc@gmail.com>
|
2025-06-29 23:16:19 -07:00 |
|
Xinyuan Tong
|
c45e49d817
|
oai: Adds support for OpenAI chat completions API in bench_serving (#7036)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-06-28 22:59:20 +00:00 |
|
Zijian
|
31d6dee5c4
|
Support VILA models (#6106)
|
2025-06-11 11:47:25 -07:00 |
|
Xinyuan Tong
|
697b0f71f0
|
[Refactor] image data process in bench_serving (#6879)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
|
2025-06-06 21:11:17 -07:00 |
|
Yikai Zhang
|
fb507b7b10
|
[FIX] mmmu bench serving result display error (#6525) (#6791)
|
2025-05-31 13:48:06 -07:00 |
|
Yuhong Guo
|
d279d4990c
|
Fix aiohttp 'Chunk too big' in bench_serving (#6737)
|
2025-05-30 00:50:36 -07:00 |
|
fzyzcjy
|
501efc3d36
|
Tiny fix CI (#6611)
|
2025-05-25 23:36:34 -07:00 |
|
fzyzcjy
|
6bebef60a7
|
Support accurate length control for bench serving (#6594)
|
2025-05-25 22:46:23 -07:00 |
|
fzyzcjy
|
25be63d0b2
|
Auto handle PD disaggregation in bench_serving (#6587)
Co-authored-by: yizhang2077 <1109276519@qq.com>
|
2025-05-25 22:41:27 -07:00 |
|
fzyzcjy
|
2c3a6fe1de
|
Fix bench_serving does not support changing warmup requests (#6439)
|
2025-05-25 22:35:36 -07:00 |
|
Yineng Zhang
|
a6970a17f3
|
misc: fix accept_length (#6536)
|
2025-05-22 14:27:10 -07:00 |
|
fzyzcjy
|
969660c762
|
Recover from corrupted cache file in bench serving (#6510)
|
2025-05-21 17:13:54 -07:00 |
|
fzyzcjy
|
7222e1dacc
|
Let bench_one_batch_server use sharegpt data to make expert distribution more natural (#5573)
|
2025-05-21 02:08:43 -07:00 |
|
fzyzcjy
|
26ebb849eb
|
Tiny refactor bench_serving to extract RequestFuncOutput.init_new (#6108)
|
2025-05-17 17:08:52 -07:00 |
|
fzyzcjy
|
02973cd9a4
|
Tiny refactor bench_serving to improve extensibility (#6134)
|
2025-05-17 17:07:58 -07:00 |
|
fzyzcjy
|
6d95a35abf
|
Support outputing details for bench_serving (#6107)
|
2025-05-17 17:06:52 -07:00 |
|
Yineng Zhang
|
f24fc5b86d
|
fix typo (#6248)
|
2025-05-12 15:45:12 -07:00 |
|
Lianmin Zheng
|
fba8eccd7e
|
Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-05-12 00:17:33 -07:00 |
|
applesaucethebun
|
2ce8793519
|
Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-05-11 12:55:00 +08:00 |
|
XinyuanTong
|
e88dd482ed
|
[CI]Add performance CI for VLM (#6038)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
|
2025-05-07 19:20:03 -07:00 |
|
Lianmin Zheng
|
35ca04d2fa
|
[CI] fix port conflicts (#5789)
|
2025-04-27 05:17:44 -07:00 |
|
vzed
|
094891c01a
|
fix: Use is not None instead of != None for None checks. (#5687)
|
2025-04-26 19:26:57 -07:00 |
|
Yineng Zhang
|
b1f6d89b5f
|
fix: update truss bench_serving (#5683)
|
2025-04-23 13:28:35 -07:00 |
|
Yineng Zhang
|
7282ab741a
|
fix: update bench_speculative (#5649)
|
2025-04-22 16:08:15 -07:00 |
|
fzyzcjy
|
9924bbe153
|
Fix bench_serving fail when zero warmup requests (#5574)
|
2025-04-20 14:16:03 -07:00 |
|
Lianmin Zheng
|
177320a582
|
Clean up imports (#5467)
|
2025-04-16 15:26:49 -07:00 |
|
Yuhong Guo
|
3dfc6023ce
|
Fix bench_serving with random-ids (#5214)
|
2025-04-15 01:34:35 -07:00 |
|
Xiaoyu Zhang
|
924ca7c92c
|
Add DeepSeek V3/R1 shared experts fusion (#4918)
|
2025-04-04 01:59:29 -07:00 |
|
Yineng Zhang
|
fda6bb78da
|
update bench_serving (#4958)
|
2025-04-01 15:10:56 -07:00 |
|
chaobo jia
|
ef9a378a20
|
[Feature] add multi-rank support for Lora (#4492)
Co-authored-by: rudy152 <czh1137892874@gmail.com>
|
2025-03-28 09:38:44 -07:00 |
|
Stefan He
|
5d7edc8e55
|
Support FA3 as Attention backend by using --attention-backend fa3 (#4680)
Co-authored-by: qsong <qsong@linkedin.com>
Co-authored-by: qingquansong <ustcsqq@gmail.com>
|
2025-03-23 23:28:11 -07:00 |
|
Xu Song
|
470b474075
|
Update bench_serving.py (#4454)
|
2025-03-15 16:33:58 -07:00 |
|
Mingshan
|
0fe7c13be1
|
Fix bench_serving flush cache not recognizing OPENAI_API_KEY (#4181)
Signed-off-by: Mingshan <git@brighill.com>
|
2025-03-08 01:03:38 -08:00 |
|
Lzhang-hub
|
3a3918121f
|
fix bench serving bug (#4135)
|
2025-03-06 05:34:02 -08:00 |
|
Lianmin Zheng
|
e074d84e5b
|
[Minor] more code cleanup (#4077)
|
2025-03-04 21:23:47 -08:00 |
|
Lianmin Zheng
|
66301e124f
|
Improve code styles (#4021)
|
2025-03-03 03:20:23 -08:00 |
|
Lianmin Zheng
|
ac2387279e
|
Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
|
2025-03-03 00:12:04 -08:00 |
|
Kebe
|
ec0a72c2d9
|
Fix bench_serving not recognizing OPENAI_API_KEY (#3870)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-02-27 20:18:53 -08:00 |
|
Lianmin Zheng
|
287d07a669
|
Misc fixes for eagle (flush_cache, CPU overhead) (#3014)
|
2025-01-20 20:27:38 -08:00 |
|
Lianmin Zheng
|
8f2c522aba
|
Improve benchmark scripts and error message printing (#2922)
|
2025-01-16 06:24:31 -08:00 |
|
Pratyush Patel
|
8f15789314
|
Add more metrics to serving benchmark. (#2819)
|
2025-01-10 23:30:44 +08:00 |
|
Lianmin Zheng
|
bdc1acf6cd
|
Misc fix for min_p_sampling, --cuda-graph-bs (#2761)
|
2025-01-07 02:52:53 -08:00 |
|