DefTruth | 12ef7e3bc3 | bugfix: fix merge_state_v2 cuda graph (#5419) | 2025-04-15 10:18:47 -07:00 |
| Lianmin Zheng | 838fa0f218 | [minor] cleanup cmakelists.txt (#5420) | 2025-04-15 07:07:07 -07:00 |
| shangmingc | f1b3b75fc6 | [PD] Remove unused bootstrap param and fix port table type (#5423) | 2025-04-15 21:21:20 +08:00 |
| Liangsheng Yin | 33b16ad178 | Distinguish bootstrap key only in decode server (#5422) | 2025-04-15 20:59:28 +08:00 |
| shangmingc | ffde65a094 | [PD] Fix dynamic port support and MLA buffer for Mooncake (#5415); Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>; Co-authored-by: ybyang <ybyang7@iflytek.com> | 2025-04-15 19:29:31 +08:00 |
| lambert0312 | 471650dee0 | Fix broadcast use cuda device lead to memory capacity unbalanced (#5416) | 2025-04-15 02:47:26 -07:00 |
| Yuan Luo | d06a83fb01 | Support dynamic connection and TP 16 (#5351); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2025-04-15 17:08:07 +08:00 |
| Zhaoyang Hao | 5d13440162 | [FIX] Fix concatenation error in capture_bs when open --disable-cuda-graph-padding and without MTP (#5412) | 2025-04-15 01:42:27 -07:00 |
| JieXin Liang | f88f7e1943 | [misc] fix ci flaky case (#5352) | 2025-04-15 01:37:16 -07:00 |
| Yuhong Guo | 3dfc6023ce | Fix bench_serving with random-ids (#5214) | 2025-04-15 01:34:35 -07:00 |
| fzyzcjy | 15e91d721b | Tiny fix DeepseekScalingRotaryEmbedding always use forward_native (#5406) | 2025-04-15 01:33:47 -07:00 |
| Yineng Zhang | 8aab7fdb21 | chore: upgrade sgl-kernel 0.0.9 (#5401) | 2025-04-14 22:37:59 -07:00 |
| Yineng Zhang | e940dc4f06 | chore: bump sgl-kernel 0.0.9 (#5400) | 2025-04-14 21:34:04 -07:00 |
| DefTruth | 388e15c0db | kernel: support slightly faster merge_state_v2 cuda kernel (#5381) | 2025-04-14 21:28:23 -07:00 |
| Yineng Zhang | 11421a3f44 | fix: update pr-test-sgl-kernel (#5399) | 2025-04-14 21:14:59 -07:00 |
| Yineng Zhang | 6c41fcf0e4 | chore: upgrade DeepGEMM (#5395) | 2025-04-14 20:32:46 -07:00 |
| Yangcheng Li | ee9d6ca677 | [fix/misc] remove duplicate row in deepseek v2 model (#5279) | 2025-04-14 18:41:24 -07:00 |
| Ximingwang-09 | 2dd6489468 | Add H20 dtype fp8_w8a8 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5291); Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com> | 2025-04-14 18:40:31 -07:00 |
| lambert0312 | 61e7c4dd21 | Add A800 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5368) | 2025-04-14 18:39:44 -07:00 |
| Lianmin Zheng | dae7944440 | minor clean up of sgl-kernel/CMakeLists.txt (#5393) | 2025-04-14 18:38:44 -07:00 |
| Baizhou Zhang | f6772f1497 | [Fix] Turn off DeepGEMM by default (#5263) | 2025-04-14 17:45:44 -07:00 |
| Yineng Zhang | ac5b78baf6 | fix: update test config (#5392) | 2025-04-14 17:39:47 -07:00 |
| Xiaoyu Zhang | 38076dea84 | apply fused moe gate in ds v3/r1 (#5371); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-04-14 16:24:26 -07:00 |
| Ke Bao | 5e0a9b0981 | Apply deepseek cuda rope (#5385); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-04-14 15:22:43 -07:00 |
| JieXin Liang | bdde237562 | [perf] experimental enhance fp8 per-tensor quant (#5370) | 2025-04-14 12:35:43 -07:00 |
| ybyang | e9fc2ac7b6 | [PD Bug] fix MLA get_contiguous_buf_infos error (#5384) | 2025-04-14 22:56:39 +08:00 |
| Liangsheng Yin | 44afde82d7 | Fix PD disaggregation bugs (#5326) | 2025-04-14 19:27:30 +08:00 |
| yhyang201 | 072df75354 | Support for Qwen2.5-VL Model in bitsandbytes Format (#5003) | 2025-04-14 02:03:40 -07:00 |
| fzyzcjy | defede5073 | Fix DeepSeek DP Attention + torch compile (#5367); Co-authored-by: ispobock <ispobaoke@163.com> | 2025-04-14 01:07:58 -07:00 |
| Yongtong Wu | fc72871975 | Free metadata_buffer_index after transfer finished (#5364) | 2025-04-14 01:06:14 -07:00 |
| Yongtong Wu | 14e8bd889f | Free metadata_buffer_index after transfer finished (#5364) | 2025-04-14 16:04:46 +08:00 |
| yulei | adca585bfb | [DeepEP] Reduce routed scaling overhead (#5277); Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> | 2025-04-13 16:03:09 -07:00 |
| Yineng Zhang | 39d90449f3 | feat: update experiment_runner (#5360) | 2025-04-13 15:37:05 -07:00 |
| Yineng Zhang | 39e411385c | fix #5322 (#5359) | 2025-04-13 13:57:36 -07:00 |
| huangtingwei | 5fbafbb8f8 | fix MLATokenToKVPoolHost get_size_per_token bug (#5161); Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com> | 2025-04-13 12:37:26 -07:00 |
| Byron Hsu | a9499885e9 | [PD] Add transfer backend abstraction (#5328) | 2025-04-14 01:39:39 +08:00 |
| Liangsheng Yin | f765579046 | Fix typo: infight -> inflight (#5357) | 2025-04-14 01:25:30 +08:00 |
| Yineng Zhang | f58b929a51 | chore: upgrade sgl-kernel 0.0.8.post3 (#5342) | 2025-04-13 00:45:59 -07:00 |
| Yineng Zhang | c1270aabc5 | docs: update adoption and sponsorship list with Oracle (#5343) | 2025-04-12 22:55:25 -07:00 |
| mlmz | 8311b07fb9 | Fix: Ensure tensors for dist.broadcast match NCCL backend device (#5322) | 2025-04-12 22:50:37 -07:00 |
| Yineng Zhang | c138025731 | misc: update sagemaker Dockerfile (#5341) | 2025-04-12 22:39:49 -07:00 |
| Yineng Zhang | b62e7e99b8 | feat: adapt merge_state (#5337) | 2025-04-12 21:14:04 -07:00 |
| Yineng Zhang | 7d3b7c87f5 | fix: determine if flashinfer is installed (#5336) | 2025-04-12 19:59:13 -07:00 |
| Yineng Zhang | 75015bb688 | ci: update release node (#5333) | 2025-04-12 14:22:45 -07:00 |
| Yineng Zhang | b371f7cd36 | chore: bump sgl-kernel v0.0.8.post3 (#5332) | 2025-04-12 12:53:37 -07:00 |
| Yineng Zhang | 812e82f35e | fix: solve cu118 issue for cutlass mla (#5331) | 2025-04-12 12:51:09 -07:00 |
| PGFLMG | 4879e50c6d | [Feat] Add sparse attn to sgl-kernel (#5327) | 2025-04-12 11:36:36 -07:00 |
| tianlian yi | bc92107b03 | Support server based rollout in Verlengine (#4848); Co-authored-by: Jin Pan <jpan236@wisc.edu>; Co-authored-by: Chayenne <zhaochen20@outlook.com>; Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com> | 2025-04-12 10:07:52 -07:00 |
| Xiaoyu Zhang | 3e4794aad8 | refine fused_moe tuning docs (#5294) | 2025-04-12 10:01:13 -07:00 |
| Xiaoyu Zhang | 690ec20587 | Delete python/sglang/srt/layers/moe/fused_moe_triton/configs/E=257,N=… (#5321) | 2025-04-12 10:00:03 -07:00 |
|