maxiao1
|
75cd34d172
|
change sgl_kernel WARP_SIZE to 64
|
2025-11-03 10:17:53 +08:00 |
|
maxiao1
|
32b1ccaf62
|
Modify setup_hip.py under sgl-kernel
|
2025-10-25 13:11:02 +08:00 |
|
maxiao
|
251235c229
|
Adapt to v0.5.4
|
2025-10-25 12:16:25 +08:00 |
|
blzheng
|
13fb8b5489
|
[CPU] Optimize FP16 decode_attention_cpu (#10652)
|
2025-10-22 21:39:51 -07:00 |
|
Zaili Wang
|
007b849b0e
|
[CPU] misc updates (#11906)
|
2025-10-22 21:10:05 -07:00 |
|
Johnny
|
e7aa4664b3
|
[NVIDIA] Build CUDA 13 (#11299)
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-10-22 20:03:12 -07:00 |
|
Johnny
|
4b65ed42cc
|
[NVIDIA] upstream FA4 and fix cccl path (#11929)
|
2025-10-21 21:18:25 -07:00 |
|
Fan Yin
|
23afdfd1c2
|
[sgl-kernel] support flashmla libtorch (#11717)
|
2025-10-21 21:17:50 -07:00 |
|
Serge Panev
|
2b1da821b5
|
[NVIDIA] Add new SMs support for Spark & Thor (#11287)
Signed-off-by: Serge Panev <spanev@nvidia.com>
|
2025-10-22 02:02:24 +08:00 |
|
Yuan Luo
|
271d3d0d50
|
Support mrope triton kernel and add unit test (#11722)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
|
2025-10-20 11:51:07 +08:00 |
|
sglang-bot
|
283c8ba031
|
chore: bump sgl-kernel version to 0.3.16.post3 (#11733)
|
2025-10-19 21:44:15 -05:00 |
|
Kangyan-Zhou
|
27a223aba4
|
Improve Kernel Build Time (#11508)
|
2025-10-19 18:11:48 -07:00 |
|
hlu1
|
3b80232d06
|
[DeepseekV32] Add fast_topk_transform_ragged_fused kernel (#11815)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
|
2025-10-19 17:13:39 -07:00 |
|
Johnny
|
252dc4e112
|
[NVIDIA] FA3/FA4 Fix (#11606)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-10-19 17:10:10 -07:00 |
|
fzyzcjy
|
a27825ae01
|
Support not officially supported high sgl-kernel version with low srt version (#11786)
|
2025-10-19 16:11:59 +08:00 |
|
Fan Yin
|
3289da5b41
|
[sgl-kernel] support hadamard (#11663)
|
2025-10-15 19:00:44 -07:00 |
|
Fan Yin
|
5464457251
|
[sgl-kernel] Optimize gguf test (#11667)
|
2025-10-15 15:45:53 -07:00 |
|
Qi Yuhang
|
6c01844f45
|
[sgl-kernel][3/N]Support Expert Specialization Grouped GEMM (#11674)
|
2025-10-15 13:39:31 -07:00 |
|
fzyzcjy
|
32803fb279
|
Super tiny improve FA3 import error message (#11590)
|
2025-10-14 22:06:31 -07:00 |
|
sglang-bot
|
98923880bc
|
chore: bump sgl-kernel version to 0.3.16.post2 (#11583)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-13 20:52:38 -07:00 |
|
Yineng Zhang
|
f792e3c561
|
Revert "[NVIDIA] BUMP FA3 (#11444)" (#11582)
|
2025-10-13 20:51:45 -07:00 |
|
sglang-bot
|
60b0503227
|
chore: bump sgl-kernel version to 0.3.16.post1 (#11573)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-13 16:26:18 -07:00 |
|
Qi Yuhang
|
dc48c4c0e3
|
[sgl-kernel][2/N]Support Expert Specialization Grouped GEMM (#11534)
|
2025-10-13 16:24:48 -07:00 |
|
Johnny
|
b8c430f1ce
|
[NVIDIA] BUMP FA3 (#11444)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
|
2025-10-13 09:30:57 -07:00 |
|
Qi Yuhang
|
9a30914e94
|
[sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: PGFLMG <1106310035@qq.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2025-10-12 20:19:21 -07:00 |
|
sglang-bot
|
2db2cddd12
|
chore: bump sgl-kernel version to 0.3.16 (#11476)
|
2025-10-11 22:04:49 -07:00 |
|
PGFLMG
|
8fdcd98efe
|
[7/n] decouple quantization impl from vllm dependency - gguf kernel (#11019)
|
2025-10-11 14:04:57 -07:00 |
|
fzyzcjy
|
21337b22b9
|
Reland [1/2] Optimizations and refactors about quant kernel (#10312)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-10-11 15:59:03 +08:00 |
|
Lianmin Zheng
|
9b8ebb2798
|
move more files under srt/utils (#11285)
|
2025-10-09 16:46:15 -07:00 |
|
Mick
|
a3c2ea4451
|
fix: fix revision for sgl-flash-attn in sgl-kernel (#11327)
|
2025-10-08 15:50:44 -07:00 |
|
Yuan Luo
|
4f42c8cd3e
|
[sgl-kernel] Support float64 moe_sum_reduce cuda kernel (#11068)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-10-07 14:31:11 +00:00 |
|
sglang-bot
|
8c9670375f
|
chore: bump sgl-kernel version to 0.3.15 (#11281)
|
2025-10-06 18:17:51 -07:00 |
|
Yineng Zhang
|
fb27d38305
|
docs: update sgl-kernel README (#11286)
|
2025-10-06 17:55:22 -07:00 |
|
Lifu Huang
|
748f86f3de
|
[Bug] Fix incorrect assertion in FA4 and add UT. (#11182)
|
2025-10-06 14:58:39 -07:00 |
|
Lianmin Zheng
|
d645ae90a3
|
Rename runner labels (#11228)
|
2025-10-05 18:05:41 -07:00 |
|
Lianmin Zheng
|
148d8d485d
|
Update DeepGEMM repository tag to specific commit (#11229)
|
2025-10-05 13:47:36 -07:00 |
|
PGFLMG
|
1a599509cc
|
chore: bump sgl-kernel v0.3.14.post1 (#11137)
|
2025-10-05 13:46:43 -07:00 |
|
DarkSharpness
|
e0b2d3eebe
|
[Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-10-05 10:19:03 -07:00 |
|
PGFLMG
|
580051c5a8
|
chore: bump sgl-kernel v0.3.14 (#11067)
|
2025-09-30 02:53:24 -07:00 |
|
Xiaoyu Zhang
|
11965b0daf
|
Fix sgl-kernel benchmark dead code (#11022)
|
2025-09-29 15:06:40 +08:00 |
|
Zhihao Zhang
|
24f7cb1ece
|
[speculative decoding] rename lookahead to ngram (#11010)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
|
2025-09-28 21:06:59 -07:00 |
|
Lifu Huang
|
e98d9346c7
|
[1/2] Support FA4 for MHA Prefill in sgl-kernel (#10940)
|
2025-09-28 19:59:14 -07:00 |
|
Kangyan-Zhou
|
0c9174108a
|
Unify SGL Kernel Releases (#10701)
|
2025-09-28 19:48:28 -07:00 |
|
Lianmin Zheng
|
07440f5f34
|
Fix FusedSetKVBufferArg in RotaryEmbedding (#11003)
|
2025-09-28 11:17:27 -07:00 |
|
Yuan Luo
|
42245551ef
|
[sgl-kernel] Optimize concat_mla_k kernel (#10543)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: PGFLMG <1106310035@qq.com>
|
2025-09-28 23:04:22 +08:00 |
|
Lianmin Zheng
|
35ec2a45a8
|
[minor] Remove deprecated function get_ip (#10883)
|
2025-09-25 16:18:04 -07:00 |
|
Yuhao Yao
|
fe531d6f4e
|
[Bug] Fix Issue#10215 (#10572)
|
2025-09-25 09:51:50 +08:00 |
|
Xiaoyu Zhang
|
c4e314f986
|
Restruct sgl-kernel benchmark (#10861)
|
2025-09-25 07:45:25 +08:00 |
|
Yineng Zhang
|
e53df7c009
|
chore: bump sgl-kernel v0.3.12 (#10732)
|
2025-09-22 14:39:25 -07:00 |
|
Qi Yuhang
|
0f04a5f428
|
Optimize cutlass int8 gemm kernel for large M on SM89 Ada GPU (#10714)
|
2025-09-21 17:04:27 -07:00 |
|