sglang

Author	SHA1	Message	Date
Ke Bao	f127355a30	Add batch test for draft extend (#6672)	2025-05-27 16:32:05 -07:00
Ke Bao	6ce0ed073b	Apply constraint grammar to EAGLE (#6499) Co-authored-by: merrymercy <lianminzheng@gmail.com>	2025-05-21 17:18:41 -07:00
Lianmin Zheng	fba8eccd7e	Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-05-12 00:17:33 -07:00
Lianmin Zheng	981a2619d5	Fix eagle test case (#5776)	2025-04-27 01:00:54 -07:00
Lianmin Zheng	21514ff5bd	Disable flaky eagle tests (#5753)	2025-04-25 15:54:39 -07:00
Zhiqiang Xie	a169b9f813	Fix oom error for large page size (#4913) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-03-30 21:34:21 -07:00
Lianmin Zheng	9adf178cc2	Fix 2-gpu CI test and suppress some warnings (#4930)	2025-03-30 12:51:44 -07:00
Lianmin Zheng	b26bc86b36	Support page size > 1 + eagle (#4908)	2025-03-30 00:46:23 -07:00
Lianmin Zheng	74e0ac1dbd	Clean up `import vllm` in quantization/__init__.py (#4834)	2025-03-28 10:34:10 -07:00
Lianmin Zheng	47e6628aae	Fix CI tests (#4853)	2025-03-28 00:28:35 -07:00
fzyzcjy	15ddd84322	Add retry for flaky tests in CI (#4755)	2025-03-25 16:53:12 -07:00
James Liu	9e0186f352	[Feature] Support EAGLE 3 (#4247)	2025-03-18 07:35:23 -07:00
Ying Sheng	1b859295f4	[Eagle] Remove the greedy branch and some redundant code (#4363) Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-16 02:48:55 -07:00
Lianmin Zheng	08c4d764a5	lazy import attn backends (#4200)	2025-03-08 00:41:35 -08:00
Lianmin Zheng	d4017a6b63	[EAGLE] many fixes for eagle (#4195) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-07 22:12:13 -08:00
Lianmin Zheng	bc1534ff32	Fix a draft model accuracy bug in eagle; support step=1; return logprob in eagle (#4134) Co-authored-by: Sehoon Kim <kssteven418@gmail.com> Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-06 06:13:59 -08:00
Lianmin Zheng	fcc2e37f69	Split the __init__ of scheduler as smaller functions. Improve the eagle tests (#4128)	2025-03-06 00:13:20 -08:00
Ying Sheng	d3d4d76758	[Eagle] Refactor eagle speculative decoding (#3986) Co-authored-by: Ke Bao <ISPObaoke@163.com>	2025-03-05 08:06:07 -08:00
Lianmin Zheng	77a3954bf7	Simplify eagle tests and TP sync in grammar backend (#4066)	2025-03-04 13:40:40 -08:00
William	0d4e3228cf	[Feature] Add test for speculative_token_map (#4016)	2025-03-04 04:26:24 -08:00
Yineng Zhang	e0b9a423c8	chore: bump v0.4.3 (#3556)	2025-02-14 09:43:14 +08:00
Yineng Zhang	70f894b810	feat: support flashinfer mla attention for deepseek v3 (#3550)	2025-02-14 08:50:14 +08:00
Ke Bao	7e6d5fc694	Support Eagle cuda graph for Triton backend (#3500)	2025-02-12 02:27:45 +08:00
Ke Bao	2d61132374	Support Eagle2 for Triton backend (#3466)	2025-02-10 20:00:42 +08:00
Yineng Zhang	60abdb3e7c	minor: cleanup test_eagle_infer (#3415)	2025-02-09 09:34:30 +08:00
Ying Sheng	7b4e61fff3	[Fix] Fix eagle with disable cuda graph (#3411)	2025-02-09 08:40:00 +08:00
Yineng Zhang	6222e1c228	add disable cuda graph unit test for eagle 2 (#3412)	2025-02-09 08:02:56 +08:00
Lianmin Zheng	a4331cd260	Add accuracy and latency tests of eagle into CI (#3027)	2025-01-21 02:55:14 -08:00
justdoit	a47bf39123	[Eagle2] Fix multiple concurrent request crashes (#2730)	2025-01-10 14:00:43 -08:00
JJJJOHNSON	694e41925e	[eagle2] fix end check when target model verify (#2723)	2025-01-07 21:46:02 -08:00
yukavio	815dce0554	Eagle speculative decoding part 4: Add EAGLE2 worker (#2150) Co-authored-by: kavioyu <kavioyu@tencent.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-01-02 03:22:34 -08:00

31 Commits