8 Commits

Author | SHA1 | Message | Date
Lzhang-hub | 4efe2c57c9 | support vlm model spec bench (#10173) | 2025-09-10 13:37:04 +08:00
Kay Yan | 975a5ec69c | [fix] update bench_speculative.py for compatibility (#7764); Signed-off-by: Kay Yan <kay.yan@daocloud.io> | 2025-07-04 16:32:54 +08:00
Yineng Zhang | 7282ab741a | fix: update bench_speculative (#5649) | 2025-04-22 16:08:15 -07:00
Baizhou Zhang | 6fb29ffd9e | Deprecate enable-flashinfer-mla and enable-flashmla (#5480) | 2025-04-17 01:43:33 -07:00
lukec | a53fe428f9 | Support FlashMLA backend (#4472); Co-authored-by: yinfan98 <1106310035@qq.com> | 2025-03-16 09:07:06 -07:00
Ke Bao | f1d09a6541 | Update bench speculative script (#4235) | 2025-03-09 12:19:01 -07:00
Lianmin Zheng | 935cda944b | Misc clean up; Remove the support of jump forward (#4032) | 2025-03-03 07:02:14 -08:00
Lianmin Zheng | ac2387279e | Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988); Co-authored-by: SangBin Cho <rkooo567@gmail.com>; Co-authored-by: dhou-xai <dhou@x.ai>; Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu> | 2025-03-03 00:12:04 -08:00