Ke Bao
|
f127355a30
|
Add batch test for draft extend (#6672)
|
2025-05-27 16:32:05 -07:00 |
|
Ke Bao
|
6ce0ed073b
|
Apply constraint grammar to EAGLE (#6499)
Co-authored-by: merrymercy <lianminzheng@gmail.com>
|
2025-05-21 17:18:41 -07:00 |
|
Lianmin Zheng
|
fba8eccd7e
|
Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-05-12 00:17:33 -07:00 |
|
Lianmin Zheng
|
981a2619d5
|
Fix eagle test case (#5776)
|
2025-04-27 01:00:54 -07:00 |
|
Lianmin Zheng
|
21514ff5bd
|
Disable flaky eagle tests (#5753)
|
2025-04-25 15:54:39 -07:00 |
|
Zhiqiang Xie
|
a169b9f813
|
Fix oom error for large page size (#4913)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-03-30 21:34:21 -07:00 |
|
Lianmin Zheng
|
9adf178cc2
|
Fix 2-gpu CI test and suppress some warnings (#4930)
|
2025-03-30 12:51:44 -07:00 |
|
Lianmin Zheng
|
b26bc86b36
|
Support page size > 1 + eagle (#4908)
|
2025-03-30 00:46:23 -07:00 |
|
Lianmin Zheng
|
74e0ac1dbd
|
Clean up import vllm in quantization/__init__.py (#4834)
|
2025-03-28 10:34:10 -07:00 |
|
Lianmin Zheng
|
47e6628aae
|
Fix CI tests (#4853)
|
2025-03-28 00:28:35 -07:00 |
|
fzyzcjy
|
15ddd84322
|
Add retry for flaky tests in CI (#4755)
|
2025-03-25 16:53:12 -07:00 |
|
James Liu
|
9e0186f352
|
[Feature] Support EAGLE 3 (#4247)
|
2025-03-18 07:35:23 -07:00 |
|
Ying Sheng
|
1b859295f4
|
[Eagle] Remove the greedy branch and some redundant code (#4363)
Co-authored-by: Sehoon Kim <sehoon@x.ai>
|
2025-03-16 02:48:55 -07:00 |
|
Lianmin Zheng
|
08c4d764a5
|
lazy import attn backends (#4200)
|
2025-03-08 00:41:35 -08:00 |
|
Lianmin Zheng
|
d4017a6b63
|
[EAGLE] many fixes for eagle (#4195)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Sehoon Kim <sehoon@x.ai>
|
2025-03-07 22:12:13 -08:00 |
|
Lianmin Zheng
|
bc1534ff32
|
Fix a draft model accuracy bug in eagle; support step=1; return logprob in eagle (#4134)
Co-authored-by: Sehoon Kim <kssteven418@gmail.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Sehoon Kim <sehoon@x.ai>
|
2025-03-06 06:13:59 -08:00 |
|
Lianmin Zheng
|
fcc2e37f69
|
Split the __init__ of scheduler as smaller functions. Improve the eagle tests (#4128)
|
2025-03-06 00:13:20 -08:00 |
|
Ying Sheng
|
d3d4d76758
|
[Eagle] Refactor eagle speculative decoding (#3986)
Co-authored-by: Ke Bao <ISPObaoke@163.com>
|
2025-03-05 08:06:07 -08:00 |
|
Lianmin Zheng
|
77a3954bf7
|
Simplify eagle tests and TP sync in grammar backend (#4066)
|
2025-03-04 13:40:40 -08:00 |
|
William
|
0d4e3228cf
|
[Feature] Add test for speculative_token_map (#4016)
|
2025-03-04 04:26:24 -08:00 |
|
Yineng Zhang
|
e0b9a423c8
|
chore: bump v0.4.3 (#3556)
|
2025-02-14 09:43:14 +08:00 |
|
Yineng Zhang
|
70f894b810
|
feat: support flashinfer mla attention for deepseek v3 (#3550)
|
2025-02-14 08:50:14 +08:00 |
|
Ke Bao
|
7e6d5fc694
|
Support Eagle cuda graph for Triton backend (#3500)
|
2025-02-12 02:27:45 +08:00 |
|
Ke Bao
|
2d61132374
|
Support Eagle2 for Triton backend (#3466)
|
2025-02-10 20:00:42 +08:00 |
|
Yineng Zhang
|
60abdb3e7c
|
minor: cleanup test_eagle_infer (#3415)
|
2025-02-09 09:34:30 +08:00 |
|
Ying Sheng
|
7b4e61fff3
|
[Fix] Fix eagle with disable cuda graph (#3411)
|
2025-02-09 08:40:00 +08:00 |
|
Yineng Zhang
|
6222e1c228
|
add disable cuda graph unit test for eagle 2 (#3412)
|
2025-02-09 08:02:56 +08:00 |
|
Lianmin Zheng
|
a4331cd260
|
Add accuracy and latency tests of eagle into CI (#3027)
|
2025-01-21 02:55:14 -08:00 |
|
justdoit
|
a47bf39123
|
[Eagle2] Fix multiple concurrent request crashes (#2730)
|
2025-01-10 14:00:43 -08:00 |
|
JJJJOHNSON
|
694e41925e
|
[eagle2] fix end check when target model verify (#2723)
|
2025-01-07 21:46:02 -08:00 |
|
yukavio
|
815dce0554
|
Eagle speculative decoding part 4: Add EAGLE2 worker (#2150)
Co-authored-by: kavioyu <kavioyu@tencent.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-01-02 03:22:34 -08:00 |
|