Commit Graph

185 Commits

Author SHA1 Message Date
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
Yineng Zhang
f3b99f73b3 update flashinfer-python version 2025-02-28 16:31:59 -08:00
Chayenne
90bc26a813 set a strict sgl-kernel version (#3950) 2025-02-27 22:44:57 -08:00
Yineng Zhang
564bdf29f7 upgrade flashinfer v0.2.2.post1 (#3934) 2025-02-27 09:53:48 -08:00
Enrique Shockwave
d281587989 Improve: Support xgrammar 0.1.14 (#3593) 2025-02-27 08:42:54 -08:00
JC1DA
7551498a69 [Feature] Support llguidance for constrained decoding (#3298) 2025-02-26 10:41:49 -08:00
Lianmin Zheng
27a46317b6 Fix dependency (#3813) 2025-02-24 03:50:58 -08:00
Yineng Zhang
058d199d4e use transformers 4.48.3 (#3650) 2025-02-18 04:40:47 +08:00
Yineng Zhang
a5375adc3a chore: bump v0.4.3.post2 (#3645)
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
2025-02-18 02:48:30 +08:00
Yineng Zhang
75d171a9c5 chore: update flashinfer v0.2.1.post2 (#3644) 2025-02-18 02:47:42 +08:00
Yineng Zhang
e782eb7e6a chore: bump v0.4.3.post1 (#3638) 2025-02-17 21:58:19 +08:00
Yineng Zhang
bbc47c348f fix apply_token_bitmask_inplace_cuda (#3594) 2025-02-15 23:55:08 +08:00
Yineng Zhang
e0b9a423c8 chore: bump v0.4.3 (#3556) 2025-02-14 09:43:14 +08:00
Yineng Zhang
70f894b810 feat: support flashinfer mla attention for deepseek v3 (#3550) 2025-02-14 08:50:14 +08:00
yizhang2077
98eecbda54 integrate blockwise fp8 kernel (#3529) 2025-02-13 04:39:33 +08:00
Xiaoyu Zhang
45e3a7bc41 use sgl_per_token_group_quant_fp8 kernel (#3493) 2025-02-12 18:40:42 +08:00
Yineng Zhang
cddb1cdf8f chore: bump v0.4.2.post4 (#3459) 2025-02-10 14:12:16 +08:00
Yineng Zhang
85986bb978 compatible with new outlines (#3435) 2025-02-10 01:51:30 +08:00
Yineng Zhang
c1f5f99f60 chore: bump v0.4.2.post3 (#3369) 2025-02-07 08:20:03 -08:00
Yineng Zhang
f287037673 update sgl-kernel version (#3374) 2025-02-07 20:51:06 +08:00
Yineng Zhang
7aad8d1854 chore: bump v0.4.2.post2 (#3313) 2025-02-05 17:35:02 +08:00
HAI
2c1a695ff1 ROCm: sgl-kernel enablement starting with sgl_moe_align_block (#3287) 2025-02-04 21:44:44 +08:00
Yineng Zhang
d39899e85c upgrade flashinfer v0.2.0.post2 (#3288)
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
2025-02-04 21:41:40 +08:00
HAI
566d61d90f ROCm: bump 6.3.0 (#3259) 2025-02-03 04:13:40 +08:00
Yineng Zhang
34e405e01f update sgl-kernel version for sglang (#3238) 2025-02-01 02:14:41 +08:00
Yineng Zhang
cf0f7eafe6 chore: bump v0.4.2.post1 (#3233) 2025-01-31 20:35:55 +08:00
Yineng Zhang
4ab43cfb3e chore: bump v0.4.2 (#3180) 2025-01-27 21:42:05 +08:00
Yineng Zhang
2f79f58873 feat: use sgl-kernel 0.0.3 in sglang (#3179) 2025-01-27 21:39:52 +08:00
Yineng Zhang
7e0976133c update sgl-kernel version for srt (#3150) 2025-01-26 20:22:34 +08:00
Yineng Zhang
e94fb7cb10 chore: bump v0.4.1.post7 (#3009) 2025-01-20 21:50:55 +08:00
Enrique Shockwave
3bcf5ecea7 support regex in xgrammar backend (#2983) 2025-01-20 04:34:41 +08:00
Yineng Zhang
2c05f81f15 fix custom op version compatibility (#2988) 2025-01-20 04:21:29 +08:00
Chunyuan WU
63051738a9 Enable CPU device on SGLang (#2806) 2025-01-16 21:22:53 -08:00
yizhang2077
767c9dec03 adapt custom allreduce for tensorrt llm (#2511) 2025-01-16 04:57:35 +08:00
Yineng Zhang
b3e99dfb22 chore: bump v0.4.1.post6 (#2899) 2025-01-15 16:23:42 +08:00
fzyzcjy
923f518337 CUDA-graph-compatible releasing and resuming KV cache and model weight memory (#2630) 2025-01-13 11:38:51 -08:00
Xiaoyu Zhang
d08c77c434 Sampling penalties memory interface (#2870) 2025-01-13 23:09:00 +08:00
Lianmin Zheng
6249e4a19e Revert "Integration of TurboMind AWQ" (#2866) 2025-01-13 04:44:39 -08:00
bjmsong
17de02f98d Integration of TurboMind AWQ (#2828)
Co-authored-by: root <bjmsong@126.com>
2025-01-13 20:14:16 +08:00
Yineng Zhang
f624901cdd chore: bump v0.4.1.post5 (#2840) 2025-01-11 23:10:02 +08:00
Lianmin Zheng
bdc1acf6cd Misc fix for min_p_sampling, --cuda-graph-bs (#2761) 2025-01-07 02:52:53 -08:00
Yineng Zhang
2f0d386496 chore: bump v0.4.1.post4 (#2713) 2025-01-06 01:29:54 +08:00
kk
b6e0cfb5e1 ROCm base image update (#2692)
Co-authored-by: wunhuang <wunhuang@amd.com>
2025-01-01 12:12:19 +08:00
Lianmin Zheng
03d5fbfd44 Release 0.4.1.post3 - upload the config.json to PyPI (#2647) 2024-12-29 14:25:53 -08:00
Yineng Zhang
3ccf566b0d chore: bump v0.4.1.post2 (#2643) 2024-12-30 00:11:46 +08:00
Yineng Zhang
ef5b0ff90b chore: bump v0.4.1.post1 (#2616) 2024-12-28 00:11:06 +08:00
HandH1998
6e5305158c update sgl_moe_align_block_size usage (#2617) 2024-12-28 00:01:13 +08:00
yudian0504
531d6ea968 fix: package data missing (#2521) 2024-12-26 08:16:48 -08:00
Yineng Zhang
635a042623 docs: update deepseek v3 example (#2592) 2024-12-26 17:43:37 +08:00
Yineng Zhang
efc52f85e2 chore: bump v0.4.1 (#2582) 2024-12-26 07:14:51 +08:00