Commit Graph

122 Commits

Author SHA1 Message Date
Yineng Zhang
8c1ef0f914 chore: upgrade sgl-kernel 0.3.12 (#10782) 2025-09-23 00:18:54 -07:00
Baizhou Zhang
3fa3c22ae2 Fix fast decode plan for flashinfer v0.4.0rc1 and upgrade sgl-kernel 0.3.11 (#10634)
Co-authored-by: zhyncs <me@zhyncs.com>
2025-09-19 01:25:29 -07:00
penguin_wwy
93f75778be [RL] Add destroy process group api (#9979) 2025-09-19 00:31:56 +08:00
cicirori
a2f7218a2e support using fa4 on deepseek on blackwell (#9928) 2025-09-16 16:16:06 -07:00
Yineng Zhang
c0c6f543e4 chore: upgrade sgl-kernel 0.3.10 (#10500) 2025-09-16 02:00:53 -07:00
Feng Su
4c21b09074 [Feature] Sglang Tracing: Fine-Grained Tracking for Request Latency - Part 1 (#9962)
Signed-off-by: Feng Su <sufeng@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
2025-09-15 02:08:02 +08:00
fzyzcjy
4da5533682 Support profile args in Engine API (#6539) 2025-09-14 01:21:10 -07:00
Yineng Zhang
bfe01a5eef chore: upgrade v0.3.9.post2 sgl-kernel (#10297) 2025-09-11 04:10:29 -07:00
Lianmin Zheng
bcf1955f7e Revert "chore: upgrade v0.3.9 sgl-kernel" (#10245) 2025-09-09 19:05:20 -07:00
Yineng Zhang
d3ee70985f chore: upgrade v0.3.9 sgl-kernel (#10220) 2025-09-09 03:16:25 -07:00
Liangsheng Yin
e719bb0e84 [1/2] Refactor multi-tokenizer manager (#10074) 2025-09-07 19:13:34 +08:00
Jinyang Yuan
012584ecd5 perf: Avoid unnecessary data type conversions for DeepSeek-V3 on Blackwell (#9834)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-09-06 14:06:46 +08:00
hlu1
2985090084 Update flashinfer to 0.3.1 for B300 support (#10087)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-09-05 13:41:01 -07:00
JieXin Liang
1db649ac02 [feat] apply deep_gemm compile_mode to skip launch (#9879) 2025-09-02 03:20:30 -07:00
Yineng Zhang
349b491c63 chore: upgrade flashinfer 0.3.0 (#9864) 2025-09-01 03:07:19 -07:00
ybyang
5f77e1292d Support Multi Process Tokenizer Manager(#6555) (#8964)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-09-01 01:00:13 -07:00
Yineng Zhang
9970e3bf32 chore: upgrade sgl-kernel 0.3.7.post1 with deepgemm fix (#9822) 2025-08-30 04:02:25 -07:00
Yineng Zhang
3d8fc43400 chore: upgrade flashinfer 0.3.0rc1 (#9793) 2025-08-29 16:24:17 -07:00
Yineng Zhang
b962a296ed chore: upgrade sgl-kernel 0.3.7 (#9708) 2025-08-27 14:00:31 -07:00
Yineng Zhang
938e986e15 chore: upgrade flashinfer 0.2.14.post1 (#9578) 2025-08-25 00:12:17 -07:00
fzyzcjy
2600fc0d47 Overlapped weight offload (#8034) 2025-08-23 02:06:46 -07:00
Chanh Nguyen
127d4b0d5e Support GC Freezing to improve latency & throughput (#9241)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2025-08-23 13:43:09 +08:00
fzyzcjy
42c8704560 Add PDL support for quant kernel and rope kernel (#9106) 2025-08-20 01:56:29 -07:00
江家瑋
ca533580f2 [Docs] Correct and clarify notes in Engine docstring (#9313)
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
2025-08-18 13:24:19 -07:00
Hongbo Xu
2cc9eeab01 [4/n]decouple quantization implementation from vLLM dependency (#9191)
Co-authored-by: AniZpZ <aniz1905@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-14 12:05:46 -07:00
eigen
4dbf43601d fix: zero_init buffer (#9065)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-14 02:39:09 -07:00
Jiaqi Gu
c9ee738515 Fuse writing KV buffer into rope kernel (part 2: srt) (#9014)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
2025-08-12 13:15:30 -07:00
zhyncs
f4ae50e97c fix: use flashinfer v0.2.11.post1 2025-08-11 02:49:25 -07:00
Yineng Zhang
84cb449eec Revert "chore: upgrade flashinfer 0.2.11 (#9036)" (#9057) 2025-08-11 00:16:39 -07:00
Yineng Zhang
dd001a5477 chore: upgrade flashinfer 0.2.11 (#9036) 2025-08-10 17:35:37 -07:00
Stefan He
8ecf6b9d24 Support Flatten Tensor Update Weights to speed up MOE Update Weights by 20% (#8079) 2025-08-10 16:08:59 -07:00
Lianmin Zheng
9a44b643c6 Fix CI (#9012) 2025-08-09 13:33:42 -07:00
Yineng Zhang
326a901df4 chore: upgrade sgl-kernel 0.3.3 (#8998) 2025-08-09 01:22:01 -07:00
ishandhanani
4e7f025219 chore(gb200): update to CUDA 12.9 and improve build process (#8772) 2025-08-08 13:42:47 -07:00
Lifu Huang
6210e2c4f0 Support GPU pinning for LoRA (#8697) 2025-08-06 19:39:45 -07:00
Yineng Zhang
3ae8e3ea8f chore: upgrade torch 2.8.0 (#8836) 2025-08-05 17:32:01 -07:00
Yineng Zhang
4f4e0e4162 chore: upgrade flashinfer 0.2.10 (#8827) 2025-08-05 12:04:01 -07:00
Yineng Zhang
1ea94d3b92 chore: upgrade flashinfer v0.2.9 (#8780) 2025-08-04 21:59:18 -07:00
Guanhua Wang
f7b2853ff8 [feat] support minimum token load balance in dp attention (#7379) 2025-08-03 00:46:47 -07:00
Nicolas Castet
82e6c3a65a Add support for NCCL symmetric memory for TP allreduces (#8238) 2025-08-01 23:30:55 +00:00
Cheng Wan
7a1f7fc504 [Feature] Hybrid EP and TP (#8590) 2025-07-31 02:53:25 -07:00
Cheng Wan
e179e0b797 update sgl-kernel for EP: python part (#8550) 2025-07-31 00:14:39 -07:00
Lianmin Zheng
a4c3b121d8 Split the scheduler into multiple mixin classes to reduce the file size (#8483) 2025-07-29 12:46:50 -07:00
Yineng Zhang
10ee89559e chore: upgrade flashinfer v0.2.9rc2 (#8406) 2025-07-27 01:41:22 -07:00
Yingchun Lai
36d6f0ba5b fix: fix the missing metrics on non-rank0 nodes (#7720) 2025-07-27 00:55:25 -07:00
Lianmin Zheng
ed2e313eb6 Clean up server_args, triton cache manager (#8332) 2025-07-25 14:14:51 -07:00
Swipe4057
8d1c5b948e chore: upgrade flashinfer v0.2.9rc1 (#8301)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-07-24 14:29:56 -07:00
Yineng Zhang
4953f4ca9a chore: upgrade sgl-kernel 0.2.7 (#8304) 2025-07-23 15:07:27 -07:00
Yineng Zhang
74f59ae555 chore: upgrade sgl-kernel 0.2.6.post1 (#8202) 2025-07-21 02:10:24 -07:00
Lianmin Zheng
55381a46ac Revert "[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability" (#8181) 2025-07-19 22:41:30 -07:00