Commit Graph

21 Commits

Author SHA1 Message Date
Ke Bao
862dd76c76 Support NextN (MTP) speculative decoding for DeepSeek-V3/R1 (#3582) 2025-02-15 05:28:34 +08:00
Ying Sheng
d23cb9a01e [Eagle] reduce one draft forward (#3468) 2025-02-10 20:21:49 +08:00
Ke Bao
2d61132374 Support Eagle2 for Triton backend (#3466) 2025-02-10 20:00:42 +08:00
Yineng Zhang
1646149a83 fix draft cuda graph capture failure (#3431) 2025-02-09 23:16:20 +08:00
Yineng Zhang
fad315cb8e fix EAGLE 2 non greedy case (#3407)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2025-02-09 07:28:34 +08:00
Yineng Zhang
fa82dfccdd fix EagleVerifyInput (#3378) 2025-02-07 22:30:43 +08:00
Yineng Zhang
f9905d59a8 support speculative decoding kernel in sgl-kernel (#3373)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2025-02-07 20:29:51 +08:00
Yineng Zhang
d39899e85c upgrade flashinfer v0.2.0.post2 (#3288)
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
2025-02-04 21:41:40 +08:00
Yineng Zhang
013021b6a1 refactor EAGLE 2 (#3269)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: merrymercy <lianminzheng@gmail.com>
Co-authored-by: Ying1123 <sqy1415@gmail.com>
2025-02-03 20:52:30 +08:00
Lianmin Zheng
1dda8c5e4c Return more infos for computing average acceptance length (#3152) 2025-01-26 04:51:54 -08:00
Lianmin Zheng
287d07a669 Misc fixes for eagle (flush_cache, CPU overhead) (#3014) 2025-01-20 20:27:38 -08:00
996_icu
b730aa6b9e [EAGLE] Fix some boundary situation when retract reqs and req's max token = 1 (#2939)
Co-authored-by: josephyou <josephyou@tencent.com>
2025-01-20 17:46:43 -08:00
justdoit
4093aa4660 [Fix]eagle2 health_generate is first request,apiserver will core (#2853) 2025-01-13 01:01:21 -08:00
justdoit
a47bf39123 [Eagle2] Fix multiple concurrent request crashes (#2730) 2025-01-10 14:00:43 -08:00
Lianmin Zheng
8a6906127a Improve linear.py to load sharded weights & remove the dependency of Parameters from vllm (#2784)
Co-authored-by: SangBin Cho rkooo567@gmail.com
2025-01-07 23:29:10 -08:00
JJJJOHNSON
694e41925e [eagle2] fix end check when target model verify (#2723) 2025-01-07 21:46:02 -08:00
Lianmin Zheng
b8574f6953 Clean up eagle code (#2756) 2025-01-06 14:54:18 -08:00
yukavio
8c8779cd05 [Fix] fix retract error in eagle speculative decoding (#2711)
Co-authored-by: kavioyu <kavioyu@tencent.com>
2025-01-02 10:28:39 -08:00
yukavio
815dce0554 Eagle speculative decoding part 4: Add EAGLE2 worker (#2150)
Co-authored-by: kavioyu <kavioyu@tencent.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-01-02 03:22:34 -08:00
Lianmin Zheng
ad20b7957e Eagle speculative decoding part 3: small modifications to the general scheduler (#2709)
Co-authored-by: kavioyu <kavioyu@tencent.com>
2025-01-02 02:09:08 -08:00
Lianmin Zheng
3815b23ccb Clean up wrapper in flashinfer backend (#2638) 2024-12-29 00:45:57 -08:00