sglang-bot
|
1053e1be17
|
chore: bump SGLang version to 0.5.4 (#12027)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-23 18:01:40 -07:00 |
|
Zhengyi Lai
|
81fd2b0ee0
|
fix(deepep): resolve benchmark failure on 4×IB-card setup by aligning tuning config with DeepEP commit bdd119f8 (#11965)
|
2025-10-22 21:20:54 -07:00 |
|
Liangsheng Yin
|
9d61205dac
|
[lint] improve ruff check (#11922)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2025-10-22 11:32:50 +08:00 |
|
b8zhong
|
d0a64c7e2c
|
vlm: enforce pybase64 for image and str encode/decode (#10700)
|
2025-10-21 19:05:32 +08:00 |
|
Cheng Wan
|
5b214b50b6
|
[Refactor] move deep_gemm_wrapper out of quantization (#11784)
|
2025-10-17 18:57:54 -07:00 |
|
Yineng Zhang
|
da681f35d3
|
Revert "Set csgmv as default lora backend. (#11488)" (#11735)
|
2025-10-17 12:01:36 -05:00 |
|
sglang-bot
|
85ebeecf06
|
chore: bump SGLang version to 0.5.3.post3 (#11693)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-16 13:14:55 -07:00 |
|
Lifu Huang
|
b0d20cdec7
|
Set csgmv as default lora backend. (#11488)
|
2025-10-15 23:53:24 -05:00 |
|
sglang-bot
|
baf277a9bf
|
chore: bump SGLang version to 0.5.3.post2 (#11680)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-15 16:49:14 -07:00 |
|
sglang-bot
|
758b887ad1
|
chore: bump SGLang version to 0.5.3.post1 (#11324)
|
2025-10-09 15:19:59 -07:00 |
|
Cheng Wan
|
3c06b673af
|
[8/N] MoE Refactor: deprecate EPMoE (#11211)
|
2025-10-07 21:51:41 -07:00 |
|
Yuan Luo
|
4f42c8cd3e
|
[sgl-kernel] Support float64 moe_sum_reduce cuda kernel (#11068)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-10-07 14:31:11 +00:00 |
|
sglang-bot
|
a4a3d82393
|
chore: bump SGLang version to 0.5.3 (#11263)
|
2025-10-06 20:07:02 +08:00 |
|
sglang-bot
|
0b13cbb7c9
|
chore: bump SGLang version to 0.5.3rc2 (#11259)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-06 01:12:10 -07:00 |
|
Yuan Luo
|
590f2da052
|
[Feat] Support Torch Symm Mem AllReduce (#10571)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-10-05 13:55:19 -07:00 |
|
yhyang201
|
48e9e71930
|
Add --max-new-tokens CLI flag for MMMU evaluation (#11217)
|
2025-10-04 17:35:53 -07:00 |
|
fzyzcjy
|
fdc4e1e570
|
Tiny move files to utils folder (#11166)
|
2025-10-03 22:40:06 +08:00 |
|
Yuan Luo
|
42245551ef
|
[sgl-kernel] Optimize concat_mla_k kernel (#10543)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: PGFLMG <1106310035@qq.com>
|
2025-09-28 23:04:22 +08:00 |
|
lukec
|
77830a265e
|
Add fuse_moe per-channel tune (#10915)
|
2025-09-25 21:12:09 +08:00 |
|
Xiaoyu Zhang
|
c4e314f986
|
Restruct sgl-kernel benchmark (#10861)
|
2025-09-25 07:45:25 +08:00 |
|
Yiakwy
|
984730b732
|
add tunning files for QWEN-3-NEXT (#10794)
|
2025-09-23 12:46:30 -07:00 |
|
ZhengHSI
|
adc24a3a0c
|
fix ceval (#10504)
Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com>
|
2025-09-24 02:35:25 +08:00 |
|
Yuan Luo
|
616a3e20df
|
[sgl-kernel] Support moe_sum_reduce cuda kernel (#10321)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2025-09-19 14:12:09 +08:00 |
|
zhannngchen
|
7a68b4225a
|
[improvement] add average input/output token length for hicache benchmark stats output (#10525)
|
2025-09-18 00:38:03 -07:00 |
|
zhannngchen
|
541551cefe
|
[bugfix]hicache bench_long_context.py run failed (#10523)
|
2025-09-17 11:27:06 +08:00 |
|
ykwd
|
4bb08f6e07
|
[Hicache] Evaluate Per-Round Metrics in Multiturn Bench (#10203)
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
|
2025-09-15 19:34:40 -07:00 |
|
Yineng Zhang
|
86a32bb5cd
|
chore: bump v0.5.3rc0 (#10468)
|
2025-09-15 03:55:18 -07:00 |
|
hzh0425
|
2a37b24d23
|
[HotFix]: Hot fix import path in 3fs_bench_client.py (#10463)
|
2025-09-14 23:45:46 -07:00 |
|
Vincent Zhong
|
1489cd6c02
|
[docs / oneliner] update mmmu docs instruction (#9768)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-15 11:26:39 +08:00 |
|
chenge@xiaohongshu.com
|
1b1701f1f7
|
model: support dots.vlm1 model (#8778)
Co-authored-by: weishi <bushou@xiaohongshu.com>
Co-authored-by: Ezra-Yu <1105212286@qq.com>
Co-authored-by: Jianfei Wang <905787410@qq.com>
Co-authored-by: qianwu <wangjianfei@xiaohongshu.com>
|
2025-09-12 17:38:38 +08:00 |
|
strgrb
|
fac07c9b08
|
Support LingV2 model (#10359)
Co-authored-by: 羽癫 <yudian.zy@antgroup.com>
Co-authored-by: guoyuhong <yuhong.gyh@antgroup.com>
|
2025-09-11 23:53:52 -07:00 |
|
Yineng Zhang
|
b0d25e72c4
|
chore: bump v0.5.2 (#10221)
|
2025-09-11 16:09:20 -07:00 |
|
Sundara Raman Ramachandran
|
a1d038924b
|
[Benchmark] Prefil-only benchmark scripts (#10240)
|
2025-09-10 10:59:07 +08:00 |
|
Yuan Luo
|
cb3918a091
|
Optimize moe_sum_reduce_kernel (#9477)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2025-09-07 09:16:18 +08:00 |
|
Baizhou Zhang
|
beac202bfd
|
Add lora_path argument to bench_multiturn.py (#10092)
|
2025-09-05 19:20:42 -07:00 |
|
DevashishLal-CB
|
13705dae06
|
[Fix] Add speculative_draft_model_revision to server_args (#5255)
Signed-off-by: Devashish Lal <devashish@rivosinc.com>
|
2025-09-05 19:45:46 +08:00 |
|
Yineng Zhang
|
fa9c82d339
|
chore: bump v0.5.2rc2 (#10050)
|
2025-09-04 20:07:27 -07:00 |
|
Xiaoyu Zhang
|
b1fb7e458c
|
[benchmark] add flashinfer_allreduce_fusion benchmark (#9937)
|
2025-09-03 16:31:01 +08:00 |
|
Yineng Zhang
|
18f91eb639
|
chore: bump v0.5.2rc1 (#9920)
|
2025-09-02 04:43:34 -07:00 |
|
Lifu Huang
|
1fbfdebe6b
|
[chore] fix dead links in doc (#9913)
|
2025-09-02 00:28:26 -07:00 |
|
Yineng Zhang
|
16e56ea693
|
chore: bump v0.5.2rc0 (#9862)
|
2025-09-01 03:07:36 -07:00 |
|
hzh0425
|
8c2ffaaf0f
|
fix(hicahce-long-bench): adjust context workload generator to use full query set (#9847)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-08-31 14:51:18 -07:00 |
|
Pawel Kowalski
|
20445327b2
|
fix inconsistent arguments for generated shared prefix bench (#9073)
Co-authored-by: Pawel Kowalski <pawel.kowalski@silo.ai>
|
2025-08-31 14:27:33 -07:00 |
|
Kaixi Hou
|
5c34b4f1c7
|
[NVIDIA] [2/N] Optimize silu_and_mul_scaled_fp4_grouped_quant perf (#9556)
|
2025-08-29 17:17:03 -07:00 |
|
pansicheng
|
09a1df2231
|
add bench_mix.py (#9788)
|
2025-08-28 23:44:26 -07:00 |
|
Xinyuan Tong
|
f84b57c80e
|
Move git clone command up from README (#9740)
|
2025-08-28 00:27:00 -07:00 |
|
Liangsheng Yin
|
d0934a5192
|
gpt-oss blog reproduction document (#9728)
|
2025-08-28 10:15:08 +08:00 |
|
Yineng Zhang
|
bc80dc4ce0
|
chore: bump v0.5.1.post3 (#9716)
|
2025-08-27 15:42:42 -07:00 |
|
ehuaa
|
8f7b1c31e8
|
Add A100 fused MoE kernel configs for Dpsk (#9677)
|
2025-08-26 20:49:48 -07:00 |
|
hzh0425
|
c04c17edfa
|
refactor(hicache): Introduce generic HiCacheStorageConfig for improved configuration management (#9555)
Co-authored-by: Teng Ma <805522925@qq.com>
|
2025-08-26 17:55:20 -07:00 |
|