sglang

Author	SHA1	Message	Date
Yineng Zhang	d8a136a113	upgrade sgl-kernel 0.0.5.post4 (#4873)	2025-03-28 19:48:56 -07:00
Lianmin Zheng	74e0ac1dbd	Clean up `import vllm` in quantization/__init__.py (#4834)	2025-03-28 10:34:10 -07:00
fzyzcjy	d3f71f5e19	Fix torch.cuda.MemPool() internal assertion failure (#4687) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-03-27 22:29:36 -07:00
Junrong Lin	bb0fd749a6	[Fix] Add compressed_tensors as deps (#4819)	2025-03-27 18:08:24 -07:00
Yineng Zhang	bbab97a6a8	add partial_json_parser and einops (#4827)	2025-03-27 13:24:54 -07:00
Yineng Zhang	6f5cc5eb05	update xgrammar 0.1.17 (#4804)	2025-03-27 00:21:59 -07:00
Yineng Zhang	1099f6c974	bump v0.4.4.post2 (#4669)	2025-03-26 19:58:00 -07:00
Xiaoyu Zhang	04e3ff6975	Support compressed tensors fp8w8a8 (#4743)	2025-03-26 13:21:25 -07:00
fzyzcjy	26f07294f1	Warn users when release_memory_occupation is called without memory saver enabled (#4566)	2025-03-26 00:18:14 -07:00
Mick	1e86457c90	model: Minicpmo (#3023)	2025-03-24 20:08:40 -07:00
Yineng Zhang	c11cfda07b	update pyproject (#4731)	2025-03-24 09:50:28 -07:00
Yuhong Guo	64edeb798f	Support dynamic version name in sglang's pyproject.toml (#4720)	2025-03-24 08:56:31 -07:00
Adarsh Shirawalmath	f8f9244a61	[Bug Fix] Add partial rotary factor support for Phi-4 and upgrade to transformers v4.50.0 (#3984) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-03-22 14:27:39 -07:00
Yineng Zhang	f81a27f65e	upgrade sgl-kernel 0.0.5.post3 (#4522)	2025-03-17 14:49:56 -07:00
mlmz	452db50808	Constraint Decoding: Set xgrammar as the default grammar backend (#4386)	2025-03-16 18:53:43 -07:00
Ying Sheng	1b859295f4	[Eagle] Remove the greedy branch and some redundant code (#4363) Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-16 02:48:55 -07:00
Yineng Zhang	ad1ae7f7cd	use topk_softmax with sgl-kernel (#4439)	2025-03-14 15:59:06 -07:00
Yineng Zhang	ba80c102f9	bump v0.4.4.post1 (#4402)	2025-03-13 17:53:46 -07:00
Yineng Zhang	6aaeb84872	chore: bump v0.4.4 (#4041)	2025-03-13 02:49:58 -07:00
Yineng Zhang	3623b6a7f5	upgrade sgl-kernel 0.0.5 (#4381)	2025-03-13 02:37:56 -07:00
Lianmin Zheng	45de89719c	Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367)	2025-03-12 23:45:52 -07:00
Meng, Hengyu	71046fcd71	[XPU][CPU] Enable the native path of DeepSeek (#4086) Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>	2025-03-12 22:26:29 -07:00
YR Chen	ccdd10c84b	Move `aiohttp` into public dependencies (#3980)	2025-03-12 21:42:57 -07:00
Yineng Zhang	ed91561f79	upgrade sgl-kernel 0.0.4.post3 (#4334)	2025-03-12 01:36:41 -07:00
Yineng Zhang	1cf63485c1	upgrade flashinfer 0.2.3 (#4317) Co-authored-by: qingquansong <qsong@linkedin.com>	2025-03-11 15:37:17 -07:00
yigex	690e1f2371	[AMD] Fix rocm sgl-kernel missing modules error (#4311) Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com>	2025-03-11 10:35:28 -07:00
Yineng Zhang	4d27eb9ad1	update sgl-kernel 0.0.4.post2 (#4291)	2025-03-11 00:34:33 -07:00
Yineng Zhang	e187a3d595	upgrade xgrammar 0.1.15 (#4275)	2025-03-10 14:53:24 -07:00
Lianmin Zheng	5a6400eec5	Test no vllm custom allreduce (#4256)	2025-03-10 10:08:25 -07:00
Yineng Zhang	89ccb533ad	use sgl-kernel 0.0.4 (#4224)	2025-03-08 23:43:09 -08:00
Lianmin Zheng	d4017a6b63	[EAGLE] many fixes for eagle (#4195) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-07 22:12:13 -08:00
Lianmin Zheng	9c58e68b4c	Release v0.4.3.post4 (#4140)	2025-03-06 12:50:28 -08:00
Oliver Stanley	d03b3467b8	Fix constrained generation errors by adding datasets dependency (#4142)	2025-03-06 12:07:51 -08:00
Yineng Zhang	fc671f66c1	chore: bump v0.4.3.post3 (#4114)	2025-03-05 17:26:10 -08:00
Lianmin Zheng	e074d84e5b	[Minor] more code cleanup (#4077)	2025-03-04 21:23:47 -08:00
HAI	51d25405a7	ROCm: update aiter and its usage to fused moe (bloat16, fp8, fp8 block-quant) (#4053)	2025-03-04 03:00:46 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032)	2025-03-03 07:02:14 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Yineng Zhang	f3b99f73b3	update flashinfer-python version	2025-02-28 16:31:59 -08:00
Chayenne	90bc26a813	set a strict sgl-kernel version (#3950)	2025-02-27 22:44:57 -08:00
Yineng Zhang	564bdf29f7	upgrade flashinfer v0.2.2.post1 (#3934)	2025-02-27 09:53:48 -08:00
Enrique Shockwave	d281587989	Improve: Support xgrammar 0.1.14 (#3593)	2025-02-27 08:42:54 -08:00
JC1DA	7551498a69	[Feature] Support llguidance for constrained decoding (#3298)	2025-02-26 10:41:49 -08:00
Lianmin Zheng	27a46317b6	Fix dependency (#3813)	2025-02-24 03:50:58 -08:00
Yineng Zhang	058d199d4e	use transformers 4.48.3 (#3650)	2025-02-18 04:40:47 +08:00
Yineng Zhang	a5375adc3a	chore: bump v0.4.3.post2 (#3645) Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>	2025-02-18 02:48:30 +08:00
Yineng Zhang	75d171a9c5	chore: update flashinfer v0.2.1.post2 (#3644)	2025-02-18 02:47:42 +08:00
Yineng Zhang	e782eb7e6a	chore: bump v0.4.3.post1 (#3638)	2025-02-17 21:58:19 +08:00
Yineng Zhang	bbc47c348f	fix apply_token_bitmask_inplace_cuda (#3594)	2025-02-15 23:55:08 +08:00
Yineng Zhang	e0b9a423c8	chore: bump v0.4.3 (#3556)	2025-02-14 09:43:14 +08:00


222 Commits