| Author | Commit | Message | Date |
|---|---|---|---|
| Lzhang-hub | 4efe2c57c9 | support vlm model spec bench (#10173) | 2025-09-10 13:37:04 +08:00 |
| Kay Yan | 975a5ec69c | [fix] update bench_speculative.py for compatibility (#7764); Signed-off-by: Kay Yan <kay.yan@daocloud.io> | 2025-07-04 16:32:54 +08:00 |
| Yineng Zhang | 7282ab741a | fix: update bench_speculative (#5649) | 2025-04-22 16:08:15 -07:00 |
| Baizhou Zhang | 6fb29ffd9e | Deprecate enable-flashinfer-mla and enable-flashmla (#5480) | 2025-04-17 01:43:33 -07:00 |
| lukec | a53fe428f9 | Support FlashMLA backend (#4472); Co-authored-by: yinfan98 <1106310035@qq.com> | 2025-03-16 09:07:06 -07:00 |
| Ke Bao | f1d09a6541 | Update bench speculative script (#4235) | 2025-03-09 12:19:01 -07:00 |
| Lianmin Zheng | 935cda944b | Misc clean up; Remove the support of jump forward (#4032) | 2025-03-03 07:02:14 -08:00 |
| Lianmin Zheng | ac2387279e | Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988); Co-authored-by: SangBin Cho <rkooo567@gmail.com>; Co-authored-by: dhou-xai <dhou@x.ai>; Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu> | 2025-03-03 00:12:04 -08:00 |