Commit Graph

191 Commits

Author SHA1 Message Date
Lianmin Zheng
9c58e68b4c Release v0.4.3.post4 (#4140) 2025-03-06 12:50:28 -08:00
Oliver Stanley
d03b3467b8 Fix constrained generation errors by adding datasets dependency (#4142) 2025-03-06 12:07:51 -08:00
Yineng Zhang
fc671f66c1 chore: bump v0.4.3.post3 (#4114) 2025-03-05 17:26:10 -08:00
Lianmin Zheng
e074d84e5b [Minor] more code cleanup (#4077) 2025-03-04 21:23:47 -08:00
HAI
51d25405a7 ROCm: update aiter and its usage to fused moe (bfloat16, fp8, fp8 block-quant) (#4053) 2025-03-04 03:00:46 -08:00
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032) 2025-03-03 07:02:14 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) 2025-03-03 00:12:04 -08:00
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
Yineng Zhang
f3b99f73b3 update flashinfer-python version 2025-02-28 16:31:59 -08:00
Chayenne
90bc26a813 set a strict sgl-kernel version (#3950) 2025-02-27 22:44:57 -08:00
Yineng Zhang
564bdf29f7 upgrade flashinfer v0.2.2.post1 (#3934) 2025-02-27 09:53:48 -08:00
Enrique Shockwave
d281587989 Improve: Support xgrammar 0.1.14 (#3593) 2025-02-27 08:42:54 -08:00
JC1DA
7551498a69 [Feature] Support llguidance for constrained decoding (#3298) 2025-02-26 10:41:49 -08:00
Lianmin Zheng
27a46317b6 Fix dependency (#3813) 2025-02-24 03:50:58 -08:00
Yineng Zhang
058d199d4e use transformers 4.48.3 (#3650) 2025-02-18 04:40:47 +08:00
Yineng Zhang
a5375adc3a chore: bump v0.4.3.post2 (#3645) 2025-02-18 02:48:30 +08:00
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
Yineng Zhang
75d171a9c5 chore: update flashinfer v0.2.1.post2 (#3644) 2025-02-18 02:47:42 +08:00
Yineng Zhang
e782eb7e6a chore: bump v0.4.3.post1 (#3638) 2025-02-17 21:58:19 +08:00
Yineng Zhang
bbc47c348f fix apply_token_bitmask_inplace_cuda (#3594) 2025-02-15 23:55:08 +08:00
Yineng Zhang
e0b9a423c8 chore: bump v0.4.3 (#3556) 2025-02-14 09:43:14 +08:00
Yineng Zhang
70f894b810 feat: support flashinfer mla attention for deepseek v3 (#3550) 2025-02-14 08:50:14 +08:00
yizhang2077
98eecbda54 integrate blockwise fp8 kernel (#3529) 2025-02-13 04:39:33 +08:00
Xiaoyu Zhang
45e3a7bc41 use sgl_per_token_group_quant_fp8 kernel (#3493) 2025-02-12 18:40:42 +08:00
Yineng Zhang
cddb1cdf8f chore: bump v0.4.2.post4 (#3459) 2025-02-10 14:12:16 +08:00
Yineng Zhang
85986bb978 compatible with new outlines (#3435) 2025-02-10 01:51:30 +08:00
Yineng Zhang
c1f5f99f60 chore: bump v0.4.2.post3 (#3369) 2025-02-07 08:20:03 -08:00
Yineng Zhang
f287037673 update sgl-kernel version (#3374) 2025-02-07 20:51:06 +08:00
Yineng Zhang
7aad8d1854 chore: bump v0.4.2.post2 (#3313) 2025-02-05 17:35:02 +08:00
HAI
2c1a695ff1 ROCm: sgl-kernel enablement starting with sgl_moe_align_block (#3287) 2025-02-04 21:44:44 +08:00
Yineng Zhang
d39899e85c upgrade flashinfer v0.2.0.post2 (#3288) 2025-02-04 21:41:40 +08:00
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
HAI
566d61d90f ROCm: bump 6.3.0 (#3259) 2025-02-03 04:13:40 +08:00
Yineng Zhang
34e405e01f update sgl-kernel version for sglang (#3238) 2025-02-01 02:14:41 +08:00
Yineng Zhang
cf0f7eafe6 chore: bump v0.4.2.post1 (#3233) 2025-01-31 20:35:55 +08:00
Yineng Zhang
4ab43cfb3e chore: bump v0.4.2 (#3180) 2025-01-27 21:42:05 +08:00
Yineng Zhang
2f79f58873 feat: use sgl-kernel 0.0.3 in sglang (#3179) 2025-01-27 21:39:52 +08:00
Yineng Zhang
7e0976133c update sgl-kernel version for srt (#3150) 2025-01-26 20:22:34 +08:00
Yineng Zhang
e94fb7cb10 chore: bump v0.4.1.post7 (#3009) 2025-01-20 21:50:55 +08:00
Enrique Shockwave
3bcf5ecea7 support regex in xgrammar backend (#2983) 2025-01-20 04:34:41 +08:00
Yineng Zhang
2c05f81f15 fix custom op version compatibility (#2988) 2025-01-20 04:21:29 +08:00
Chunyuan WU
63051738a9 Enable CPU device on SGLang (#2806) 2025-01-16 21:22:53 -08:00
yizhang2077
767c9dec03 adapt custom allreduce for tensorrt llm (#2511) 2025-01-16 04:57:35 +08:00
Yineng Zhang
b3e99dfb22 chore: bump v0.4.1.post6 (#2899) 2025-01-15 16:23:42 +08:00
fzyzcjy
923f518337 CUDA-graph-compatible releasing and resuming KV cache and model weight memory (#2630) 2025-01-13 11:38:51 -08:00
Xiaoyu Zhang
d08c77c434 Sampling penalties memory interface (#2870) 2025-01-13 23:09:00 +08:00
Lianmin Zheng
6249e4a19e Revert "Integration of TurboMind AWQ" (#2866) 2025-01-13 04:44:39 -08:00
bjmsong
17de02f98d Integration of TurboMind AWQ (#2828) 2025-01-13 20:14:16 +08:00
Co-authored-by: root <bjmsong@126.com>
Yineng Zhang
f624901cdd chore: bump v0.4.1.post5 (#2840) 2025-01-11 23:10:02 +08:00
Lianmin Zheng
bdc1acf6cd Misc fix for min_p_sampling, --cuda-graph-bs (#2761) 2025-01-07 02:52:53 -08:00
Yineng Zhang
2f0d386496 chore: bump v0.4.1.post4 (#2713) 2025-01-06 01:29:54 +08:00
kk
b6e0cfb5e1 ROCm base image update (#2692) 2025-01-01 12:12:19 +08:00
Co-authored-by: wunhuang <wunhuang@amd.com>
Lianmin Zheng
03d5fbfd44 Release 0.4.1.post3 - upload the config.json to PyPI (#2647) 2024-12-29 14:25:53 -08:00