4977 Commits

Author SHA1 Message Date
DefTruth
388e15c0db kernel: support slightly faster merge_state_v2 cuda kernel (#5381) 2025-04-14 21:28:23 -07:00
Yineng Zhang
11421a3f44 fix: update pr-test-sgl-kernel (#5399) 2025-04-14 21:14:59 -07:00
Yineng Zhang
6c41fcf0e4 chore: upgrade DeepGEMM (#5395) 2025-04-14 20:32:46 -07:00
Yangcheng Li
ee9d6ca677 [fix/misc] remove duplicate row in deepseek v2 model (#5279) 2025-04-14 18:41:24 -07:00
Ximingwang-09
2dd6489468 Add H20 dtype fp8_w8a8 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5291)
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
2025-04-14 18:40:31 -07:00
lambert0312
61e7c4dd21 Add A800 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5368) 2025-04-14 18:39:44 -07:00
Lianmin Zheng
dae7944440 minor clean up of sgl-kernel/CMakeLists.txt (#5393) 2025-04-14 18:38:44 -07:00
Baizhou Zhang
f6772f1497 [Fix] Turn off DeepGEMM by default (#5263) 2025-04-14 17:45:44 -07:00
Yineng Zhang
ac5b78baf6 fix: update test config (#5392) 2025-04-14 17:39:47 -07:00
Xiaoyu Zhang
38076dea84 apply fused moe gate in ds v3/r1 (#5371)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-04-14 16:24:26 -07:00
Ke Bao
5e0a9b0981 Apply deepseek cuda rope (#5385)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-04-14 15:22:43 -07:00
JieXin Liang
bdde237562 [perf] experimental enhance fp8 per-tensor quant (#5370) 2025-04-14 12:35:43 -07:00
ybyang
e9fc2ac7b6 [PD Bug] fix MLA get_contiguous_buf_infos error (#5384) 2025-04-14 22:56:39 +08:00
Liangsheng Yin
44afde82d7 Fix PD disaggregation bugs (#5326) 2025-04-14 19:27:30 +08:00
yhyang201
072df75354 Support for Qwen2.5-VL Model in bitsandbytes Format (#5003) 2025-04-14 02:03:40 -07:00
fzyzcjy
defede5073 Fix DeepSeek DP Attention + torch compile (#5367)
Co-authored-by: ispobock <ispobaoke@163.com>
2025-04-14 01:07:58 -07:00
Yongtong Wu
fc72871975 Free metadata_buffer_index after transfer finished (#5364) 2025-04-14 01:06:14 -07:00
Yongtong Wu
14e8bd889f Free metadata_buffer_index after transfer finished (#5364) 2025-04-14 16:04:46 +08:00
yulei
adca585bfb [DeepEP] Reduce routed scaling overhead (#5277)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
2025-04-13 16:03:09 -07:00
Yineng Zhang
39d90449f3 feat: update experiment_runner (#5360) 2025-04-13 15:37:05 -07:00
Yineng Zhang
39e411385c fix #5322 (#5359) 2025-04-13 13:57:36 -07:00
huangtingwei
5fbafbb8f8 fix MLATokenToKVPoolHost get_size_per_token bug (#5161)
Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com>
2025-04-13 12:37:26 -07:00
Byron Hsu
a9499885e9 [PD] Add transfer backend abstraction (#5328) 2025-04-14 01:39:39 +08:00
Liangsheng Yin
f765579046 Fix typo: infight -> inflight (#5357) 2025-04-14 01:25:30 +08:00
Yineng Zhang
f58b929a51 chore: upgrade sgl-kernel 0.0.8.post3 (#5342) 2025-04-13 00:45:59 -07:00
Yineng Zhang
c1270aabc5 docs: update adoption and sponsorship list with Oracle (#5343) 2025-04-12 22:55:25 -07:00
mlmz
8311b07fb9 Fix: Ensure tensors for dist.broadcast match NCCL backend device (#5322) 2025-04-12 22:50:37 -07:00
Yineng Zhang
c138025731 misc: update sagemaker Dockerfile (#5341) 2025-04-12 22:39:49 -07:00
Yineng Zhang
b62e7e99b8 feat: adapt merge_state (#5337) 2025-04-12 21:14:04 -07:00
Yineng Zhang
7d3b7c87f5 fix: determine if flashinfer is installed (#5336) 2025-04-12 19:59:13 -07:00
Yineng Zhang
75015bb688 ci: update release node (#5333) 2025-04-12 14:22:45 -07:00
Yineng Zhang
b371f7cd36 chore: bump sgl-kernel v0.0.8.post3 (#5332) 2025-04-12 12:53:37 -07:00
Yineng Zhang
812e82f35e fix: solve cu118 issue for cutlass mla (#5331) 2025-04-12 12:51:09 -07:00
PGFLMG
4879e50c6d [Feat] Add sparse attn to sgl-kernel (#5327) 2025-04-12 11:36:36 -07:00
tianlian yi
bc92107b03 Support server based rollout in Verlengine (#4848)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>
2025-04-12 10:07:52 -07:00
Xiaoyu Zhang
3e4794aad8 refine fused_moe tuning docs (#5294) 2025-04-12 10:01:13 -07:00
Xiaoyu Zhang
690ec20587 Delete python/sglang/srt/layers/moe/fused_moe_triton/configs/E=257,N=… (#5321) 2025-04-12 10:00:03 -07:00
thyecust
2074a2e6b6 Fix: docs/backend/structured_outputs.ipynb (#4884) 2025-04-12 02:18:55 -07:00
Yineng Zhang
57de7c6b5f feat: use fa3 mla by default on hopper (#5210)
Co-authored-by: yundai424 <yundai424@gmail.com>
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
2025-04-12 01:09:25 -07:00
Yineng Zhang
115ae2e728 chore: bump sgl-kernel v0.0.8.post2 (#5317) 2025-04-11 23:42:03 -07:00
Qingquan Song
aea98512a8 Fix fa3 window size setup (#5316) 2025-04-11 23:37:52 -07:00
Baizhou Zhang
e4155e96d0 Add flash_attn_varlen_func to sgl-kernel (#5315) 2025-04-11 23:36:36 -07:00
lambert0312
1b1b47a949 Fix w8a8_int8 model shared experts fusion load weights error (#5120) 2025-04-11 23:33:51 -07:00
Zhaoyi Li
3c9740d200 update variable naming and comments for rocm (#5299) 2025-04-11 23:15:05 -07:00
Yineng Zhang
2eb55770f9 misc: cleanup 3rdparty (#5311) 2025-04-11 22:53:50 -07:00
Trevor Morris
f65b8d5c89 Blackwell Cutlass MLA kernel (#5142) 2025-04-11 22:16:51 -07:00
Ke Bao
5ad0571903 Adjust ci test threshold (#5271) 2025-04-11 22:03:37 -07:00
Mick
34ef6c8135 [VLM] Adopt fast image processor by default (#5065) 2025-04-11 21:46:58 -07:00
Yineng Zhang
611720919d fix: use deepgemm only on hopper (#5310) 2025-04-11 20:48:24 -07:00
Yineng Zhang
4f288113ce fix: update flash attn (#5308) 2025-04-11 16:23:09 -07:00