sglang

Author	SHA1	Message	Date
Yineng Zhang	230106304d	chore: upgrade sgl-kernel v0.1.2.post1 (#6196) Co-authored-by: alcanderian <alcanderian@gmail.com>	2025-05-11 22:41:37 +08:00
shangmingc	31d1f6e7f4	[PD] Add simple unit test for disaggregation feature (#5654) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-11 13:35:27 +08:00
applesaucethebun	2ce8793519	Add typo checker in pre-commit (#6179) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 12:55:00 +08:00
Jinyan Chen	8a828666a3	Add DeepEP to CI PR Test (#5655) Co-authored-by: Jinyan Chen <jinyanc@nvidia.com>	2025-05-06 17:36:03 -07:00
Huapeng Zhou	b8559764f6	[Test] Add flashmla attention backend test (#5587)	2025-05-05 10:32:02 -07:00
Yineng Zhang	9a6ad8916d	chore: upgrade sgl-kernel 0.1.1 (#5933)	2025-04-30 16:13:30 -07:00
Yineng Zhang	41ac0c6d48	chore: upgrade sgl-kernel 0.1.0 (#5690)	2025-04-27 21:00:50 -07:00
Lianmin Zheng	3dd3538c18	Pin torch audio to 2.6.0 (#5750)	2025-04-25 15:06:28 -07:00
Ravi Theja	7d9679b74d	Add MMMU benchmark results (#4491) Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>	2025-04-25 15:23:53 +08:00
Yineng Zhang	7282ab741a	fix: update bench_speculative (#5649)	2025-04-22 16:08:15 -07:00
Byron Hsu	bf98d2e377	[PD] Support prefill overlap + Ensure no race condition (#5609)	2025-04-21 12:12:56 -07:00
Byron Hsu	deded17f38	[PD] Fix edge case and simplify large page size + chunked prefill (#5589)	2025-04-21 10:27:02 -07:00
Byron Hsu	c951d312ed	[PD] Fix large page size + chunk prefill (#5588)	2025-04-20 17:21:54 -07:00
lukec	417b44eba8	[Feat] upgrade pytorch2.6 (#5417)	2025-04-20 16:06:34 -07:00
Yineng Zhang	0961feefca	feat: use flashinfer jit package (#5547)	2025-04-19 00:28:39 -07:00
Yineng Zhang	2c11f9c2eb	chore: upgrade sgl-kernel 0.0.9.post2 (#5540)	2025-04-18 21:17:23 -07:00
Baizhou Zhang	6fb29ffd9e	Deprecate enable-flashinfer-mla and enable-flashmla (#5480)	2025-04-17 01:43:33 -07:00
Yineng Zhang	8ec0bb7d55	chore: upgrade sgl-kernel 0.0.9.post1 (#5436)	2025-04-15 15:45:51 -07:00
Yineng Zhang	8aab7fdb21	chore: upgrade sgl-kernel 0.0.9 (#5401)	2025-04-14 22:37:59 -07:00
Yineng Zhang	f58b929a51	chore: upgrade sgl-kernel 0.0.8.post3 (#5342)	2025-04-13 00:45:59 -07:00
Adarsh Shirawalmath	a0a9f6d64f	[Docs] Remove the older supported docs section (#5301)	2025-04-11 11:30:18 -07:00
Yineng Zhang	80aa8ca84e	fix: update update_wheel_index for cu128 (#5300)	2025-04-11 09:31:03 -07:00
Yi Zhang	aba5ca154d	python transfer custom allreduce from trt kernel to vllm kernel (#5080)	2025-04-05 15:35:55 -07:00
Yineng Zhang	0d99adb715	upgrade transformers 4.51.0 (#5088)	2025-04-05 14:20:23 -07:00
Yineng Zhang	e53bf190bc	upgrade sgl-kernel v0.0.7 (#5049)	2025-04-03 17:07:54 -07:00
Xiaoyu Zhang	772d2a191d	try to fix ci oserror (#5024)	2025-04-03 02:45:05 -07:00
Yineng Zhang	1c63e79756	use fa3 in sgl-kernel (#4954)	2025-03-31 16:14:49 -07:00
Lianmin Zheng	b26bc86b36	Support page size > 1 + eagle (#4908)	2025-03-30 00:46:23 -07:00
Yineng Zhang	d8a136a113	upgrade sgl-kernel 0.0.5.post4 (#4873)	2025-03-28 19:48:56 -07:00
Lianmin Zheng	74e0ac1dbd	Clean up `import vllm` in quantization/__init__.py (#4834)	2025-03-28 10:34:10 -07:00
Xiaoyu Zhang	04e3ff6975	Support compressed tensors fp8w8a8 (#4743)	2025-03-26 13:21:25 -07:00
Yineng Zhang	9b7cf9ee6c	support cu128 sgl-kernel (#4744)	2025-03-24 20:53:23 -07:00
Adarsh Shirawalmath	f8f9244a61	[Bug Fix] Add partial rotary factor support for Phi-4 and upgrade to transformers v4.50.0 (#3984) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-03-22 14:27:39 -07:00
Xiaoyu Zhang	dd865befde	[Hotfix] solve fp8 w8a8 ci test fail (#4531)	2025-03-17 23:17:04 -07:00
萝卜菜	d6d21640d3	[Feature] Support Deepseek-VL2 (#2798) Co-authored-by: Edenzzzz <wtan45@wisc.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Yi Zhang <1109276519@qq.com>	2025-03-16 23:07:59 -07:00
lukec	a53fe428f9	Support FlashMLA backend (#4472) Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-16 09:07:06 -07:00
Ying Sheng	1b859295f4	[Eagle] Remove the greedy branch and some redundant code (#4363) Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-16 02:48:55 -07:00
Mick	035ac2ab74	ci: update transformers==4.48.3 (#4451)	2025-03-15 13:27:26 -07:00
Yineng Zhang	ad1ae7f7cd	use topk_softmax with sgl-kernel (#4439)	2025-03-14 15:59:06 -07:00
Lianmin Zheng	f141298a3c	Update ci_install_dependency.sh to use accelerate 1.4.0 (#4392) Co-authored-by: wangyu <wangyu.steph@bytedance.com> Co-authored-by: wangyu <yuwangauto@foxmail.com>	2025-03-13 07:16:11 -07:00
Yineng Zhang	3623b6a7f5	upgrade sgl-kernel 0.0.5 (#4381)	2025-03-13 02:37:56 -07:00
Yineng Zhang	ed91561f79	upgrade sgl-kernel 0.0.4.post3 (#4334)	2025-03-12 01:36:41 -07:00
Yineng Zhang	1cf63485c1	upgrade flashinfer 0.2.3 (#4317) Co-authored-by: qingquansong <qsong@linkedin.com>	2025-03-11 15:37:17 -07:00
Yineng Zhang	4d27eb9ad1	update sgl-kernel 0.0.4.post2 (#4291)	2025-03-11 00:34:33 -07:00
Lianmin Zheng	5a6400eec5	Test no vllm custom allreduce (#4256)	2025-03-10 10:08:25 -07:00
Ke Bao	f1d09a6541	Update bench speculative script (#4235)	2025-03-09 12:19:01 -07:00
Yineng Zhang	89ccb533ad	use sgl-kernel 0.0.4 (#4224)	2025-03-08 23:43:09 -08:00
Yineng Zhang	70866b6f4f	use same version for ci and pyproject (#4187)	2025-03-07 10:39:55 -08:00
Adarsh Shirawalmath	19fd57bcd7	[docs] fix HF reference script command (#4148)	2025-03-06 13:21:54 -08:00
Ke Bao	9fafa62db7	Share target model embed and head weights for nextn (#4033)	2025-03-03 13:30:04 -08:00

122 Commits