351 Commits

Author SHA1 Message Date
Yineng Zhang
fab0f6e77d chore: bump v0.5.0rc2 (#9203) 2025-08-14 16:11:16 -07:00
Yineng Zhang
ac474869d4 chore: upgrade transformers 4.55.2 (#9197) 2025-08-14 13:51:02 -07:00
Hongbo Xu
2cc9eeab01 [4/n]decouple quantization implementation from vLLM dependency (#9191)
Co-authored-by: AniZpZ <aniz1905@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-14 12:05:46 -07:00
eigen
4dbf43601d fix: zero_init buffer (#9065)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-14 02:39:09 -07:00
Yineng Zhang
7b56e494be chore: bump v0.5.0rc1 (#9069) 2025-08-13 10:44:14 -07:00
Ke Bao
0ff6d1fce1 Support FA3 backend for gpt-oss (#9028) 2025-08-13 10:41:50 -07:00
jacky.cheng
25caa7a8a9 [AMD] Support Wave attention backend with AMD GPU optimizations (#8660)
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Harsh Menon <harsh@nod-labs.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
Co-authored-by: Harsh Menon <harsh@nod-labs.com>
Co-authored-by: Stanley Winata <stanley.winata@amd.com>
Co-authored-by: Stanley Winata <68087699+raikonenfnu@users.noreply.github.com>
Co-authored-by: Stanley Winata <stanley@nod-labs.com>
Co-authored-by: Ivan Butygin <ivan.butygin@gmail.com>
Co-authored-by: nithinsubbiah <nithinsubbiah@gmail.com>
Co-authored-by: Nithin Meganathan <18070964+nithinsubbiah@users.noreply.github.com>
Co-authored-by: Ivan Butygin <ibutygin@amd.com>
2025-08-12 13:49:11 -07:00
Jiaqi Gu
c9ee738515 Fuse writing KV buffer into rope kernel (part 2: srt) (#9014)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
2025-08-12 13:15:30 -07:00
zhyncs
f4ae50e97c fix: use flashinfer v0.2.11.post1 2025-08-11 02:49:25 -07:00
Yineng Zhang
84cb449eec Revert "chore: upgrade flashinfer 0.2.11 (#9036)" (#9057) 2025-08-11 00:16:39 -07:00
Yineng Zhang
dd001a5477 chore: upgrade flashinfer 0.2.11 (#9036) 2025-08-10 17:35:37 -07:00
Lianmin Zheng
b58ae7a2a0 Simplify frontend language (#9029) 2025-08-10 10:59:30 -07:00
Yineng Zhang
326a901df4 chore: upgrade sgl-kernel 0.3.3 (#8998) 2025-08-09 01:22:01 -07:00
ishandhanani
de8b8b6e5c chore(deps): update minimum python to 3.10 (#8984) 2025-08-09 00:30:23 -07:00
Lianmin Zheng
706bd69cc5 Clean up server_args.py to have a dedicated function for model specific adjustments (#8983) 2025-08-08 19:56:50 -07:00
Lianmin Zheng
67a7d1f699 Create cancel-all-pr-test-runs (#8986) 2025-08-08 15:53:51 -07:00
Yineng Zhang
9020f7fc32 chore: bump v0.5.0rc0 (#8959) 2025-08-08 09:16:18 -07:00
Yineng Zhang
4bf6e5a6b0 fix: use openai 1.99.1 (#8927) 2025-08-07 14:20:35 -07:00
Chang Su
92cc32d9fc Support v1/responses and use harmony in serving_chat (#8837)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-06 16:20:34 -07:00
Yineng Zhang
3ae8e3ea8f chore: upgrade torch 2.8.0 (#8836) 2025-08-05 17:32:01 -07:00
Yineng Zhang
4f4e0e4162 chore: upgrade flashinfer 0.2.10 (#8827) 2025-08-05 12:04:01 -07:00
Yineng Zhang
901ab758ec chore: upgrade transformers 4.55.0 (#8823)
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
2025-08-05 11:37:21 -07:00
Yineng Zhang
1ea94d3b92 chore: upgrade flashinfer v0.2.9 (#8780) 2025-08-04 21:59:18 -07:00
Yineng Zhang
8cd344586e chore: bump v0.4.10.post2 (#8727) 2025-08-03 03:43:29 -07:00
fzyzcjy
0e612dbf12 Tiny fix CI pytest error (#8524) 2025-08-02 22:48:42 -07:00
Swipe4057
5deab1283a upgrade xgrammar 0.1.22 (#8522) 2025-08-01 15:59:15 -07:00
pansicheng
20b5563eda Add hf3fs_utils.cpp to package-data (#8653) 2025-08-01 12:41:09 +08:00
Ke Bao
33f0de337d chore: bump v0.4.10.post1 (#8652) 2025-08-01 12:07:30 +08:00
Yineng Zhang
023288645b chore: bump v0.4.10 (#8608) 2025-07-31 20:50:17 +08:00
Cheng Wan
e179e0b797 update sgl-kernel for EP: python part (#8550) 2025-07-31 00:14:39 -07:00
Lifu Huang
67e53b16f5 Bump transfomers to 4.54.1 to fix Gemma cache issue. (#8541) 2025-07-30 19:50:54 -07:00
Yineng Zhang
6478831be9 chore: bump v0.4.9.post6 (#8517) 2025-07-29 02:30:07 -07:00
Yineng Zhang
ccfe52a057 fix: update dep (#8467) 2025-07-28 10:19:33 -07:00
Yineng Zhang
45bc170b36 chore: bump v0.4.9.post5 (#8458) 2025-07-28 02:11:06 -07:00
Stefan He
4ad9737045 chore: bump transformer to 4.54.0 (#8416)
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
2025-07-27 21:27:25 -07:00
Yineng Zhang
10ee89559e chore: upgrade flashinfer v0.2.9rc2 (#8406) 2025-07-27 01:41:22 -07:00
Yineng Zhang
2272c2a5b5 chore: bump v0.4.9.post4 (#8305) 2025-07-25 17:12:47 -07:00
Swipe4057
8d1c5b948e chore: upgrade flashinfer v0.2.9rc1 (#8301)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-07-24 14:29:56 -07:00
Haohui Mai
f7e102d56a Pin the version of petit kernel to fix the APIs (#8235) 2025-07-23 17:57:20 -07:00
Yineng Zhang
4953f4ca9a chore: upgrade sgl-kernel 0.2.7 (#8304) 2025-07-23 15:07:27 -07:00
Yineng Zhang
01c000043c chore: bump v0.4.9.post3 (#8265) 2025-07-22 15:55:48 -07:00
Yineng Zhang
74f59ae555 chore: upgrade sgl-kernel 0.2.6.post1 (#8202) 2025-07-21 02:10:24 -07:00
Yineng Zhang
561dd7b2ce chore: upgrade sgl-kernel 0.2.6 (#8166) 2025-07-19 03:17:08 -07:00
Haohui Mai
d918ab7985 Support NVFP4 quantized dense models on AMD CDNA2/CDNA3 GPUs (#7302)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>
2025-07-18 19:59:39 -07:00
Mick
497efe747d Revert "feat: replace Decord with video_reader-rs" (#8077) 2025-07-15 20:04:56 -07:00
Xinyuan Tong
7498522f7d update transformers to 4.53.2 (#8029)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-07-15 18:24:39 -07:00
kozo
ebff5fcb06 feat: replace Decord with video_reader-rs (#5163)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
2025-07-15 18:17:34 -07:00
mqhc2020
a562c8a35c [Dockerfile] Multi-arch support for ROCm (#7902)
Co-authored-by: Lin, Soga <soga.lin@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
2025-07-14 06:13:09 +00:00
Yineng Zhang
eb118d88c4 chore: bump v0.4.9.post2 (#7963) 2025-07-11 21:11:20 -07:00
Yineng Zhang
732fc8e405 chore: upgrade sgl-kernel 0.2.5 (#7971) 2025-07-11 20:35:06 -07:00