Commit Graph

389 Commits

Author SHA1 Message Date
Yineng Zhang
8c1ef0f914 chore: upgrade sgl-kernel 0.3.12 (#10782) 2025-09-23 00:18:54 -07:00
Yineng Zhang
6f993e8b9e chore: cleanup docker image (#10671) 2025-09-19 16:56:49 -07:00
Baizhou Zhang
3fa3c22ae2 Fix fast decode plan for flashinfer v0.4.0rc1 and upgrade sgl-kernel 0.3.11 (#10634)
Co-authored-by: zhyncs <me@zhyncs.com>
2025-09-19 01:25:29 -07:00
Zhihao Zhang
e7bc600304 [Feature] Speculative decoding support lookahead (#9873)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2025-09-18 16:42:41 -07:00
kyleliang-nv
e1d45bc280 Fix decord dependency for aarch64 docker build (#10529) 2025-09-16 17:34:37 -07:00
Yineng Zhang
c0c6f543e4 chore: upgrade sgl-kernel 0.3.10 (#10500) 2025-09-16 02:00:53 -07:00
Yineng Zhang
86a32bb5cd chore: bump v0.5.3rc0 (#10468) 2025-09-15 03:55:18 -07:00
Yineng Zhang
5afd036533 feat: support pip install sglang (#10465) 2025-09-15 03:09:17 -07:00
Feng Su
4c21b09074 [Feature] Sglang Tracing: Fine-Grained Tracking for Request Latency - Part 1 (#9962)
Signed-off-by: Feng Su <sufeng@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
2025-09-15 02:08:02 +08:00
Mohammad Miadh Angkad
321fecab74 Add sentencepiece to project dependencies (#10386) 2025-09-12 16:02:54 -07:00
chenge@xiaohongshu.com
1b1701f1f7 model: support dots.vlm1 model (#8778)
Co-authored-by: weishi <bushou@xiaohongshu.com>
Co-authored-by: Ezra-Yu <1105212286@qq.com>
Co-authored-by: Jianfei Wang <905787410@qq.com>
Co-authored-by: qianwu <wangjianfei@xiaohongshu.com>
2025-09-12 17:38:38 +08:00
Yineng Zhang
b0d25e72c4 chore: bump v0.5.2 (#10221) 2025-09-11 16:09:20 -07:00
Yineng Zhang
bfe01a5eef chore: upgrade v0.3.9.post2 sgl-kernel (#10297) 2025-09-11 04:10:29 -07:00
Zaili Wang
ef959d7b85 [CPU] fix OOM when mem-fraction is not set (#9090) 2025-09-10 23:52:22 -07:00
Lianmin Zheng
bcf1955f7e Revert "chore: upgrade v0.3.9 sgl-kernel" (#10245) 2025-09-09 19:05:20 -07:00
Yineng Zhang
d3ee70985f chore: upgrade v0.3.9 sgl-kernel (#10220) 2025-09-09 03:16:25 -07:00
Yineng Zhang
94fb4e9e54 feat: support fa cute in sgl-kernel (#10205)
Co-authored-by: cicirori <32845984+cicirori@users.noreply.github.com>
2025-09-09 00:14:39 -07:00
Swipe4057
bfd7a18d8d update xgrammar 0.1.24 and transformers 4.56.1 (#10155) 2025-09-08 01:20:31 -07:00
jacky.cheng
efb0de2c8d Update wave-lang to 3.7.0 and unify Wave kernel buffer options (#10069) 2025-09-05 16:01:52 -07:00
hlu1
2985090084 Update flashinfer to 0.3.1 for B300 support (#10087)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-09-05 13:41:01 -07:00
Yineng Zhang
fa9c82d339 chore: bump v0.5.2rc2 (#10050) 2025-09-04 20:07:27 -07:00
Yineng Zhang
18f91eb639 chore: bump v0.5.2rc1 (#9920) 2025-09-02 04:43:34 -07:00
JieXin Liang
1db649ac02 [feat] apply deep_gemm compile_mode to skip launch (#9879) 2025-09-02 03:20:30 -07:00
Yineng Zhang
16e56ea693 chore: bump v0.5.2rc0 (#9862) 2025-09-01 03:07:36 -07:00
Yineng Zhang
349b491c63 chore: upgrade flashinfer 0.3.0 (#9864) 2025-09-01 03:07:19 -07:00
Yineng Zhang
300676afac chore: upgrade transformers 4.56.0 (#9827) 2025-08-30 14:07:34 -07:00
Yineng Zhang
9970e3bf32 chore: upgrade sgl-kernel 0.3.7.post1 with deepgemm fix (#9822) 2025-08-30 04:02:25 -07:00
Yineng Zhang
3d8fc43400 chore: upgrade flashinfer 0.3.0rc1 (#9793) 2025-08-29 16:24:17 -07:00
Yineng Zhang
bc80dc4ce0 chore: bump v0.5.1.post3 (#9716) 2025-08-27 15:42:42 -07:00
Yineng Zhang
b962a296ed chore: upgrade sgl-kernel 0.3.7 (#9708) 2025-08-27 14:00:31 -07:00
Liangsheng Yin
0ff7241995 Improve bench_one_batch_server script (#9608)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-26 10:38:37 +08:00
Yineng Zhang
e3e97a120b chore: bump v0.5.1.post2 (#9592) 2025-08-25 03:45:09 -07:00
Yineng Zhang
938e986e15 chore: upgrade flashinfer 0.2.14.post1 (#9578) 2025-08-25 00:12:17 -07:00
Yineng Zhang
e0ab167db0 chore: bump v0.5.1.post1 (#9558) 2025-08-24 01:14:17 -07:00
Lianmin Zheng
97a38ee85b Release 0.5.1 (#9533) 2025-08-23 07:09:26 -07:00
Lianmin Zheng
f20b6a3f2b [minor] Sync style changes (#9376) 2025-08-19 21:35:01 -07:00
Swipe4057
6805f6da40 upgrade xgrammar 0.1.23 and openai-harmony 0.0.4 (#9284) 2025-08-18 14:02:00 -07:00
Lianmin Zheng
c480a3f6ea Minor style fixes for sgl-kernel (#9289) 2025-08-18 09:38:35 -07:00
Yineng Zhang
fab0f6e77d chore: bump v0.5.0rc2 (#9203) 2025-08-14 16:11:16 -07:00
Yineng Zhang
ac474869d4 chore: upgrade transformers 4.55.2 (#9197) 2025-08-14 13:51:02 -07:00
Hongbo Xu
2cc9eeab01 [4/n]decouple quantization implementation from vLLM dependency (#9191)
Co-authored-by: AniZpZ <aniz1905@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-14 12:05:46 -07:00
eigen
4dbf43601d fix: zero_init buffer (#9065)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-14 02:39:09 -07:00
Yineng Zhang
7b56e494be chore: bump v0.5.0rc1 (#9069) 2025-08-13 10:44:14 -07:00
Ke Bao
0ff6d1fce1 Support FA3 backend for gpt-oss (#9028) 2025-08-13 10:41:50 -07:00
jacky.cheng
25caa7a8a9 [AMD] Support Wave attention backend with AMD GPU optimizations (#8660)
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Harsh Menon <harsh@nod-labs.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
Co-authored-by: Harsh Menon <harsh@nod-labs.com>
Co-authored-by: Stanley Winata <stanley.winata@amd.com>
Co-authored-by: Stanley Winata <68087699+raikonenfnu@users.noreply.github.com>
Co-authored-by: Stanley Winata <stanley@nod-labs.com>
Co-authored-by: Ivan Butygin <ivan.butygin@gmail.com>
Co-authored-by: nithinsubbiah <nithinsubbiah@gmail.com>
Co-authored-by: Nithin Meganathan <18070964+nithinsubbiah@users.noreply.github.com>
Co-authored-by: Ivan Butygin <ibutygin@amd.com>
2025-08-12 13:49:11 -07:00
Jiaqi Gu
c9ee738515 Fuse writing KV buffer into rope kernel (part 2: srt) (#9014)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
2025-08-12 13:15:30 -07:00
zhyncs
f4ae50e97c fix: use flashinfer v0.2.11.post1 2025-08-11 02:49:25 -07:00
Yineng Zhang
84cb449eec Revert "chore: upgrade flashinfer 0.2.11 (#9036)" (#9057) 2025-08-11 00:16:39 -07:00
Yineng Zhang
dd001a5477 chore: upgrade flashinfer 0.2.11 (#9036) 2025-08-10 17:35:37 -07:00
Lianmin Zheng
b58ae7a2a0 Simplify frontend language (#9029) 2025-08-10 10:59:30 -07:00