Commit Graph

222 Commits

Author SHA1 Message Date
Yineng Zhang
d8a136a113 upgrade sgl-kernel 0.0.5.post4 (#4873) 2025-03-28 19:48:56 -07:00
Lianmin Zheng
74e0ac1dbd Clean up import vllm in quantization/__init__.py (#4834) 2025-03-28 10:34:10 -07:00
fzyzcjy
d3f71f5e19 Fix torch.cuda.MemPool() internal assertion failure (#4687)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-03-27 22:29:36 -07:00
Junrong Lin
bb0fd749a6 [Fix] Add compressed_tensors as deps (#4819) 2025-03-27 18:08:24 -07:00
Yineng Zhang
bbab97a6a8 add partial_json_parser and einops (#4827) 2025-03-27 13:24:54 -07:00
Yineng Zhang
6f5cc5eb05 update xgrammar 0.1.17 (#4804) 2025-03-27 00:21:59 -07:00
Yineng Zhang
1099f6c974 bump v0.4.4.post2 (#4669) 2025-03-26 19:58:00 -07:00
Xiaoyu Zhang
04e3ff6975 Support compressed tensors fp8w8a8 (#4743) 2025-03-26 13:21:25 -07:00
fzyzcjy
26f07294f1 Warn users when release_memory_occupation is called without memory saver enabled (#4566) 2025-03-26 00:18:14 -07:00
Mick
1e86457c90 model: Minicpmo (#3023) 2025-03-24 20:08:40 -07:00
Yineng Zhang
c11cfda07b update pyproject (#4731) 2025-03-24 09:50:28 -07:00
Yuhong Guo
64edeb798f Support dynamic version name in sglang's pyproject.toml (#4720) 2025-03-24 08:56:31 -07:00
Adarsh Shirawalmath
f8f9244a61 [Bug Fix] Add partial rotary factor support for Phi-4 and upgrade to transformers v4.50.0 (#3984)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-03-22 14:27:39 -07:00
Yineng Zhang
f81a27f65e upgrade sgl-kernel 0.0.5.post3 (#4522) 2025-03-17 14:49:56 -07:00
mlmz
452db50808 Constraint Decoding: Set xgrammar as the default grammar backend (#4386) 2025-03-16 18:53:43 -07:00
Ying Sheng
1b859295f4 [Eagle] Remove the greedy branch and some redundant code (#4363)
Co-authored-by: Sehoon Kim <sehoon@x.ai>
2025-03-16 02:48:55 -07:00
Yineng Zhang
ad1ae7f7cd use topk_softmax with sgl-kernel (#4439) 2025-03-14 15:59:06 -07:00
Yineng Zhang
ba80c102f9 bump v0.4.4.post1 (#4402) 2025-03-13 17:53:46 -07:00
Yineng Zhang
6aaeb84872 chore: bump v0.4.4 (#4041) 2025-03-13 02:49:58 -07:00
Yineng Zhang
3623b6a7f5 upgrade sgl-kernel 0.0.5 (#4381) 2025-03-13 02:37:56 -07:00
Lianmin Zheng
45de89719c Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367) 2025-03-12 23:45:52 -07:00
Meng, Hengyu
71046fcd71 [XPU][CPU] Enable the native path of DeepSeek (#4086)
Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>
2025-03-12 22:26:29 -07:00
YR Chen
ccdd10c84b Move aiohttp into public dependencies (#3980) 2025-03-12 21:42:57 -07:00
Yineng Zhang
ed91561f79 upgrade sgl-kernel 0.0.4.post3 (#4334) 2025-03-12 01:36:41 -07:00
Yineng Zhang
1cf63485c1 upgrade flashinfer 0.2.3 (#4317)
Co-authored-by: qingquansong <qsong@linkedin.com>
2025-03-11 15:37:17 -07:00
yigex
690e1f2371 [AMD] Fix rocm sgl-kernel missing modules error (#4311)
Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com>
2025-03-11 10:35:28 -07:00
Yineng Zhang
4d27eb9ad1 update sgl-kernel 0.0.4.post2 (#4291) 2025-03-11 00:34:33 -07:00
Yineng Zhang
e187a3d595 upgrade xgrammar 0.1.15 (#4275) 2025-03-10 14:53:24 -07:00
Lianmin Zheng
5a6400eec5 Test no vllm custom allreduce (#4256) 2025-03-10 10:08:25 -07:00
Yineng Zhang
89ccb533ad use sgl-kernel 0.0.4 (#4224) 2025-03-08 23:43:09 -08:00
Lianmin Zheng
d4017a6b63 [EAGLE] many fixes for eagle (#4195)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Sehoon Kim <sehoon@x.ai>
2025-03-07 22:12:13 -08:00
Lianmin Zheng
9c58e68b4c Release v0.4.3.post4 (#4140) 2025-03-06 12:50:28 -08:00
Oliver Stanley
d03b3467b8 Fix constrained generation errors by adding datasets dependency (#4142) 2025-03-06 12:07:51 -08:00
Yineng Zhang
fc671f66c1 chore: bump v0.4.3.post3 (#4114) 2025-03-05 17:26:10 -08:00
Lianmin Zheng
e074d84e5b [Minor] more code cleanup (#4077) 2025-03-04 21:23:47 -08:00
HAI
51d25405a7 ROCm: update aiter and its usage to fused moe (bloat16, fp8, fp8 block-quant) (#4053) 2025-03-04 03:00:46 -08:00
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032) 2025-03-03 07:02:14 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
Yineng Zhang
f3b99f73b3 update flashinfer-python version 2025-02-28 16:31:59 -08:00
Chayenne
90bc26a813 set a strict sgl-kernel version (#3950) 2025-02-27 22:44:57 -08:00
Yineng Zhang
564bdf29f7 upgrade flashinfer v0.2.2.post1 (#3934) 2025-02-27 09:53:48 -08:00
Enrique Shockwave
d281587989 Improve: Support xgrammar 0.1.14 (#3593) 2025-02-27 08:42:54 -08:00
JC1DA
7551498a69 [Feature] Support llguidance for constrained decoding (#3298) 2025-02-26 10:41:49 -08:00
Lianmin Zheng
27a46317b6 Fix dependency (#3813) 2025-02-24 03:50:58 -08:00
Yineng Zhang
058d199d4e use transformers 4.48.3 (#3650) 2025-02-18 04:40:47 +08:00
Yineng Zhang
a5375adc3a chore: bump v0.4.3.post2 (#3645)
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
2025-02-18 02:48:30 +08:00
Yineng Zhang
75d171a9c5 chore: update flashinfer v0.2.1.post2 (#3644) 2025-02-18 02:47:42 +08:00
Yineng Zhang
e782eb7e6a chore: bump v0.4.3.post1 (#3638) 2025-02-17 21:58:19 +08:00
Yineng Zhang
bbc47c348f fix apply_token_bitmask_inplace_cuda (#3594) 2025-02-15 23:55:08 +08:00
Yineng Zhang
e0b9a423c8 chore: bump v0.4.3 (#3556) 2025-02-14 09:43:14 +08:00