Byron Hsu
|
a9499885e9
|
[PD] Add transfer backend abstraction (#5328)
|
2025-04-14 01:39:39 +08:00 |
|
Liangsheng Yin
|
f765579046
|
Fix typo: infight -> inflight (#5357)
|
2025-04-14 01:25:30 +08:00 |
|
Yineng Zhang
|
f58b929a51
|
chore: upgrade sgl-kernel 0.0.8.post3 (#5342)
|
2025-04-13 00:45:59 -07:00 |
|
Yineng Zhang
|
c1270aabc5
|
docs: update adoption and sponsorship list with Oracle (#5343)
|
2025-04-12 22:55:25 -07:00 |
|
mlmz
|
8311b07fb9
|
Fix: Ensure tensors for dist.broadcast match NCCL backend device (#5322)
|
2025-04-12 22:50:37 -07:00 |
|
Yineng Zhang
|
c138025731
|
misc: update sagemaker Dockerfile (#5341)
|
2025-04-12 22:39:49 -07:00 |
|
Yineng Zhang
|
b62e7e99b8
|
feat: adapt merge_state (#5337)
|
2025-04-12 21:14:04 -07:00 |
|
Yineng Zhang
|
7d3b7c87f5
|
fix: determine if flashinfer is installed (#5336)
|
2025-04-12 19:59:13 -07:00 |
|
Yineng Zhang
|
75015bb688
|
ci: update release node (#5333)
|
2025-04-12 14:22:45 -07:00 |
|
Yineng Zhang
|
b371f7cd36
|
chore: bump sgl-kernel v0.0.8.post3 (#5332)
|
2025-04-12 12:53:37 -07:00 |
|
Yineng Zhang
|
812e82f35e
|
fix: solve cu118 issue for cutlass mla (#5331)
|
2025-04-12 12:51:09 -07:00 |
|
PGFLMG
|
4879e50c6d
|
[Feat] Add sparse attn to sgl-kernel (#5327)
|
2025-04-12 11:36:36 -07:00 |
|
tianlian yi
|
bc92107b03
|
Support server based rollout in Verlengine (#4848)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>
|
2025-04-12 10:07:52 -07:00 |
|
Xiaoyu Zhang
|
3e4794aad8
|
refine fused_moe tuning docs (#5294)
|
2025-04-12 10:01:13 -07:00 |
|
Xiaoyu Zhang
|
690ec20587
|
Delete python/sglang/srt/layers/moe/fused_moe_triton/configs/E=257,N=… (#5321)
|
2025-04-12 10:00:03 -07:00 |
|
thyecust
|
2074a2e6b6
|
Fix: docs/backend/structured_outputs.ipynb (#4884)
|
2025-04-12 02:18:55 -07:00 |
|
Yineng Zhang
|
57de7c6b5f
|
feat: use fa3 mla by default on hopper (#5210)
Co-authored-by: yundai424 <yundai424@gmail.com>
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
|
2025-04-12 01:09:25 -07:00 |
|
Yineng Zhang
|
115ae2e728
|
chore: bump sgl-kernel v0.0.8.post2 (#5317)
|
2025-04-11 23:42:03 -07:00 |
|
Qingquan Song
|
aea98512a8
|
Fix fa3 window size setup (#5316)
|
2025-04-11 23:37:52 -07:00 |
|
Baizhou Zhang
|
e4155e96d0
|
Add flash_attn_varlen_func to sgl-kernel (#5315)
|
2025-04-11 23:36:36 -07:00 |
|
lambert0312
|
1b1b47a949
|
Fix w8a8_int8 model shared experts fusion load weights error (#5120)
|
2025-04-11 23:33:51 -07:00 |
|
Zhaoyi Li
|
3c9740d200
|
update variable naming and comments for rocm (#5299)
|
2025-04-11 23:15:05 -07:00 |
|
Yineng Zhang
|
2eb55770f9
|
misc: cleanup 3rdparty (#5311)
|
2025-04-11 22:53:50 -07:00 |
|
Trevor Morris
|
f65b8d5c89
|
Blackwell Cutlass MLA kernel (#5142)
|
2025-04-11 22:16:51 -07:00 |
|
Ke Bao
|
5ad0571903
|
Adjust ci test threshold (#5271)
|
2025-04-11 22:03:37 -07:00 |
|
Mick
|
34ef6c8135
|
[VLM] Adopt fast image processor by default (#5065)
|
2025-04-11 21:46:58 -07:00 |
|
Yineng Zhang
|
611720919d
|
fix: use deepgemm only on hopper (#5310)
|
2025-04-11 20:48:24 -07:00 |
|
Yineng Zhang
|
4f288113ce
|
fix: update flash attn (#5308)
|
2025-04-11 16:23:09 -07:00 |
|
Yineng Zhang
|
136b8e6afb
|
fix: remove cublas_grouped_gemm (#5307)
|
2025-04-11 16:22:37 -07:00 |
|
Yineng Zhang
|
034c5256cc
|
misc: update blackwell Dockerfile (#5306)
|
2025-04-11 15:58:04 -07:00 |
|
Yineng Zhang
|
c1dd773c19
|
fix: use fa3 unit test on hopper only (#5304)
|
2025-04-11 15:10:49 -07:00 |
|
Yineng Zhang
|
6f8593799b
|
feat: add blackwell workflow (#5303)
|
2025-04-11 13:42:00 -07:00 |
|
Yineng Zhang
|
f774a0d275
|
feat: add blackwell Dockerfile (#5302)
|
2025-04-11 13:08:53 -07:00 |
|
Xiaoyu Zhang
|
60bcbf2a35
|
remove moe_align_block_size torch.zeros in small batch/expert mode (#5298)
|
2025-04-11 12:13:55 -07:00 |
|
Adarsh Shirawalmath
|
a0a9f6d64f
|
[Docs] Remove the older supported docs section (#5301)
|
2025-04-11 11:30:18 -07:00 |
|
Yineng Zhang
|
80aa8ca84e
|
fix: update update_wheel_index for cu128 (#5300)
|
2025-04-11 09:31:03 -07:00 |
|
Adarsh Shirawalmath
|
4aa6bab0b0
|
[Docs] Supported Model Docs - Major restructuring (#5290)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-04-11 09:17:47 -07:00 |
|
Yusong Gao
|
c35dcfdb30
|
[PD] fix: skip warmup request in disaggregation mode to prevent crash on timeout (#5292)
|
2025-04-11 23:03:07 +08:00 |
|
Yineng Zhang
|
c163bf4ff1
|
chore: bump sgl-kernel v0.0.8.post1 (#5289)
|
2025-04-11 02:11:53 -07:00 |
|
Yineng Zhang
|
5598634326
|
chore: relax the torch version restriction for sgl-kernel compilation (#5288)
|
2025-04-11 02:05:53 -07:00 |
|
Yineng Zhang
|
b75275b6f2
|
feat: add cu128 identifier for sgl-kernel (#5287)
|
2025-04-11 01:58:46 -07:00 |
|
Yineng Zhang
|
7074e9ca20
|
fix: enable fp4 compilation on cu128 (#5286)
|
2025-04-11 01:43:44 -07:00 |
|
Michael Yao
|
fc14cca088
|
Fix a 404 link in send_request.ipynb (#5280)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-11 01:38:45 -07:00 |
|
XinyuanTong
|
e7beff8a13
|
fix: examples for token_in_token_out_vlm (#5193)
|
2025-04-11 01:38:23 -07:00 |
|
mlmz
|
4d2e305149
|
doc: nested loop code for offline engine (#5244)
|
2025-04-11 01:36:30 -07:00 |
|
Mick
|
e53a0b3d5b
|
[fix] fix mrope positions not picked up (#5265)
|
2025-04-11 01:29:45 -07:00 |
|
Cheng Wan
|
038bc5d521
|
Support --enable-llama4-multimodal (#5254)
|
2025-04-11 01:24:14 -07:00 |
|
Chang Su
|
aee62d744b
|
Optimize GPU memory usage in FlashAttentionBackend's strided indexing (#5262)
Co-authored-by: ch-wan <cwan39@gatech.edu>
|
2025-04-11 00:34:17 -07:00 |
|
fzyzcjy
|
cd7e32e2cb
|
Optimize attention in llama4 (#5127)
|
2025-04-11 00:32:41 -07:00 |
|
HAI
|
8879944800
|
ROCm/AITER CK_MoE: update 2-stage kernels & support both Activations (#5228)
|
2025-04-10 18:19:57 -07:00 |
|