fzyzcjy
|
501efc3d36
|
Tiny fix CI (#6611)
|
2025-05-25 23:36:34 -07:00 |
|
fzyzcjy
|
6bebef60a7
|
Support accurate length control for bench serving (#6594)
|
2025-05-25 22:46:23 -07:00 |
|
fzyzcjy
|
25be63d0b2
|
Auto handle PD disaggregation in bench_serving (#6587)
Co-authored-by: yizhang2077 <1109276519@qq.com>
|
2025-05-25 22:41:27 -07:00 |
|
fzyzcjy
|
2c3a6fe1de
|
Fix bench_serving does not support changing warmup requests (#6439)
|
2025-05-25 22:35:36 -07:00 |
|
Yineng Zhang
|
a6970a17f3
|
misc: fix accept_length (#6536)
|
2025-05-22 14:27:10 -07:00 |
|
fzyzcjy
|
969660c762
|
Recover from corrupted cache file in bench serving (#6510)
|
2025-05-21 17:13:54 -07:00 |
|
fzyzcjy
|
7222e1dacc
|
Let bench_one_batch_server use sharegpt data to make expert distribution more natural (#5573)
|
2025-05-21 02:08:43 -07:00 |
|
fzyzcjy
|
26ebb849eb
|
Tiny refactor bench_serving to extract RequestFuncOutput.init_new (#6108)
|
2025-05-17 17:08:52 -07:00 |
|
fzyzcjy
|
02973cd9a4
|
Tiny refactor bench_serving to improve extensibility (#6134)
|
2025-05-17 17:07:58 -07:00 |
|
fzyzcjy
|
6d95a35abf
|
Support outputing details for bench_serving (#6107)
|
2025-05-17 17:06:52 -07:00 |
|
Yineng Zhang
|
f24fc5b86d
|
fix typo (#6248)
|
2025-05-12 15:45:12 -07:00 |
|
Lianmin Zheng
|
fba8eccd7e
|
Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-05-12 00:17:33 -07:00 |
|
applesaucethebun
|
2ce8793519
|
Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-05-11 12:55:00 +08:00 |
|
XinyuanTong
|
e88dd482ed
|
[CI]Add performance CI for VLM (#6038)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
|
2025-05-07 19:20:03 -07:00 |
|
Lianmin Zheng
|
35ca04d2fa
|
[CI] fix port conflicts (#5789)
|
2025-04-27 05:17:44 -07:00 |
|
vzed
|
094891c01a
|
fix: Use is not None instead of != None for None checks. (#5687)
|
2025-04-26 19:26:57 -07:00 |
|
Yineng Zhang
|
b1f6d89b5f
|
fix: update truss bench_serving (#5683)
|
2025-04-23 13:28:35 -07:00 |
|
Yineng Zhang
|
7282ab741a
|
fix: update bench_speculative (#5649)
|
2025-04-22 16:08:15 -07:00 |
|
fzyzcjy
|
9924bbe153
|
Fix bench_serving fail when zero warmup requests (#5574)
|
2025-04-20 14:16:03 -07:00 |
|
Lianmin Zheng
|
177320a582
|
Clean up imports (#5467)
|
2025-04-16 15:26:49 -07:00 |
|
Yuhong Guo
|
3dfc6023ce
|
Fix bench_serving with random-ids (#5214)
|
2025-04-15 01:34:35 -07:00 |
|
Xiaoyu Zhang
|
924ca7c92c
|
Add DeepSeek V3/R1 shared experts fusion (#4918)
|
2025-04-04 01:59:29 -07:00 |
|
Yineng Zhang
|
fda6bb78da
|
update bench_serving (#4958)
|
2025-04-01 15:10:56 -07:00 |
|
chaobo jia
|
ef9a378a20
|
[Feature] add multi-rank support for Lora (#4492)
Co-authored-by: rudy152 <czh1137892874@gmail.com>
|
2025-03-28 09:38:44 -07:00 |
|
Stefan He
|
5d7edc8e55
|
Support FA3 as Attention backend by using --attention-backend fa3 (#4680)
Co-authored-by: qsong <qsong@linkedin.com>
Co-authored-by: qingquansong <ustcsqq@gmail.com>
|
2025-03-23 23:28:11 -07:00 |
|
Xu Song
|
470b474075
|
Update bench_serving.py (#4454)
|
2025-03-15 16:33:58 -07:00 |
|
Mingshan
|
0fe7c13be1
|
Fix bench_serving flush cache not recognizing OPENAI_API_KEY (#4181)
Signed-off-by: Mingshan <git@brighill.com>
|
2025-03-08 01:03:38 -08:00 |
|
Lzhang-hub
|
3a3918121f
|
fix bench serving bug (#4135)
|
2025-03-06 05:34:02 -08:00 |
|
Lianmin Zheng
|
e074d84e5b
|
[Minor] more code cleanup (#4077)
|
2025-03-04 21:23:47 -08:00 |
|
Lianmin Zheng
|
66301e124f
|
Improve code styles (#4021)
|
2025-03-03 03:20:23 -08:00 |
|
Lianmin Zheng
|
ac2387279e
|
Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
|
2025-03-03 00:12:04 -08:00 |
|
Kebe
|
ec0a72c2d9
|
Fix bench_serving not recognizing OPENAI_API_KEY (#3870)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-02-27 20:18:53 -08:00 |
|
Lianmin Zheng
|
287d07a669
|
Misc fixes for eagle (flush_cache, CPU overhead) (#3014)
|
2025-01-20 20:27:38 -08:00 |
|
Lianmin Zheng
|
8f2c522aba
|
Improve benchmark scripts and error message printing (#2922)
|
2025-01-16 06:24:31 -08:00 |
|
Pratyush Patel
|
8f15789314
|
Add more metrics to serving benchmark. (#2819)
|
2025-01-10 23:30:44 +08:00 |
|
Lianmin Zheng
|
bdc1acf6cd
|
Misc fix for min_p_sampling, --cuda-graph-bs (#2761)
|
2025-01-07 02:52:53 -08:00 |
|
Lianmin Zheng
|
dc3bee4815
|
Fix test and benchmark scripts (#2598)
|
2024-12-26 07:56:26 -08:00 |
|
Ying Sheng
|
8a56b43175
|
[Bench] Flush cache before benchmarking (#2566)
|
2024-12-24 11:21:21 +08:00 |
|
Lianmin Zheng
|
a2486eb58f
|
Fix a bug with logprob streaming + chunked prefill (#2403)
|
2024-12-08 03:55:27 -08:00 |
|
Lianmin Zheng
|
3ddb1c4679
|
[Minor] Fix logger and style (#2325)
|
2024-12-02 20:45:53 -08:00 |
|
bjmsong
|
01017d4c20
|
Support LoRA in Completion API (#2243)
Co-authored-by: root <bjmsong@126.com>
|
2024-11-29 16:13:38 -08:00 |
|
Byron Hsu
|
4b0a1c9365
|
Replace prob based with threshold based load balancing (#2170)
|
2024-11-24 23:17:11 -08:00 |
|
Lianmin Zheng
|
8e1adb8441
|
Allow overwrite flashinfer use_tensorcore (#2169)
|
2024-11-24 20:58:17 -08:00 |
|
Byron Hsu
|
cbedd1db1d
|
[router] cache-aware load-balancing router v1 (#2114)
|
2024-11-23 08:34:48 -08:00 |
|
Yineng Zhang
|
ad47749b82
|
fix: resolve bench_serving args (#2139)
|
2024-11-23 17:45:42 +08:00 |
|
Yunmeng
|
60769be14d
|
Add concurrency option for benchmark (#2136)
|
2024-11-23 17:07:07 +08:00 |
|
bjmsong
|
ad30d5cf9a
|
Benchmark with Pytorch Profiler easily (#2110)
Co-authored-by: root <bjmsong@126.com>
|
2024-11-21 23:29:50 -08:00 |
|
Yineng Zhang
|
55bd97f3e5
|
minor: add dataset dump and questions shuffle (#2093)
|
2024-11-19 14:07:27 -08:00 |
|
Lianmin Zheng
|
2558d6a675
|
Fix the default arguments of bench_offline_throughput.py & simplify detokenizer manager (#2042)
|
2024-11-15 05:02:44 -08:00 |
|
zolinthecow
|
f6dd648620
|
Offline LLM Engine Benchmark Throughput (#1968)
Co-authored-by: ByronHsu <byronhsu1230@gmail.com>
|
2024-11-14 21:59:33 -08:00 |
|