Commit Graph

12 Commits

Author SHA1 Message Date
Adarsh Shirawalmath
19fd57bcd7 [docs] fix HF reference script command (#4148) 2025-03-06 13:21:54 -08:00
Yineng Zhang
bc6ad367c2 fix lint (#2733) 2025-01-05 14:45:42 +08:00
Ce Gao
f5d0865b25 feat: Support VLM in reference_hf (#2726)
Signed-off-by: Ce Gao <gaocegege@hotmail.com>
2025-01-03 22:32:30 +08:00
Ke Bao
62832bb272 Support cuda graph for DP attention (#2061) 2024-11-17 16:29:20 -08:00
Chayenne
c77c1e05ba fix black in pre-commit (#1940) 2024-11-08 07:42:47 +08:00
Jani Monoses
916b3cdddc Allow passing dtype and max_new_tokens to HF reference script (#1903) 2024-11-03 08:24:37 -08:00
Lianmin Zheng
fb2d0680e0 [Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510) 2024-09-24 21:37:33 -07:00
Lianmin Zheng
2854a5ea9f Fix the overhead due to penalizer in bench_latency (#1496) 2024-09-23 07:38:14 -07:00
Lianmin Zheng
167591e864 Better unit tests for adding a new model (#1488) 2024-09-22 01:50:37 -07:00
Lianmin Zheng
f64eae3a29 [Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308) 2024-09-02 21:44:45 -07:00
Ying Sheng
0909bb0d2f [Feat] Add window attention for gemma-2 (#1056) 2024-08-13 17:01:26 -07:00
Ying Sheng
4075677621 Add OpenAI backend to the CI test (#869) 2024-08-01 09:25:24 -07:00