Commit Graph

11 Commits

Author SHA1 Message Date
Yineng Zhang
e0b9a423c8 chore: bump v0.4.3 (#3556) 2025-02-14 09:43:14 +08:00
Yineng Zhang
70f894b810 feat: support flashinfer mla attention for deepseek v3 (#3550) 2025-02-14 08:50:14 +08:00
Ke Bao
7e6d5fc694 Support Eagle cuda graph for Triton backend (#3500) 2025-02-12 02:27:45 +08:00
Ke Bao
2d61132374 Support Eagle2 for Triton backend (#3466) 2025-02-10 20:00:42 +08:00
Yineng Zhang
60abdb3e7c minor: cleanup test_eagle_infer (#3415) 2025-02-09 09:34:30 +08:00
Ying Sheng
7b4e61fff3 [Fix] Fix eagle with disable cuda graph (#3411) 2025-02-09 08:40:00 +08:00
Yineng Zhang
6222e1c228 add disable cuda graph unit test for eagle 2 (#3412) 2025-02-09 08:02:56 +08:00
Lianmin Zheng
a4331cd260 Add accuracy and latency tests of eagle into CI (#3027) 2025-01-21 02:55:14 -08:00
justdoit
a47bf39123 [Eagle2] Fix multiple concurrent request crashes (#2730) 2025-01-10 14:00:43 -08:00
JJJJOHNSON
694e41925e [eagle2] fix end check when target model verify (#2723) 2025-01-07 21:46:02 -08:00
yukavio
815dce0554 Eagle speculative decoding part 4: Add EAGLE2 worker (#2150) 2025-01-02 03:22:34 -08:00
Co-authored-by: kavioyu <kavioyu@tencent.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>