sglang

Author	SHA1	Message	Date
sglang-bot	1053e1be17	chore: bump SGLang version to 0.5.4 (#12027) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-23 18:01:40 -07:00
Zhengyi Lai	81fd2b0ee0	fix(deepep): resolve benchmark failure on 4×IB-card setup by aligning tuning config with DeepEP commit bdd119f8 (#11965)	2025-10-22 21:20:54 -07:00
Liangsheng Yin	9d61205dac	[lint] improve ruff check (#11922) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-10-22 11:32:50 +08:00
b8zhong	d0a64c7e2c	vlm: enforce pybase64 for image and str encode/decode (#10700)	2025-10-21 19:05:32 +08:00
Cheng Wan	5b214b50b6	[Refactor] move `deep_gemm_wrapper` out of `quantization` (#11784)	2025-10-17 18:57:54 -07:00
Yineng Zhang	da681f35d3	Revert "Set csgmv as default lora backend. (#11488)" (#11735)	2025-10-17 12:01:36 -05:00
sglang-bot	85ebeecf06	chore: bump SGLang version to 0.5.3.post3 (#11693) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-16 13:14:55 -07:00
Lifu Huang	b0d20cdec7	Set csgmv as default lora backend. (#11488)	2025-10-15 23:53:24 -05:00
sglang-bot	baf277a9bf	chore: bump SGLang version to 0.5.3.post2 (#11680) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-15 16:49:14 -07:00
sglang-bot	758b887ad1	chore: bump SGLang version to 0.5.3.post1 (#11324)	2025-10-09 15:19:59 -07:00
Cheng Wan	3c06b673af	[8/N] MoE Refactor: deprecate `EPMoE` (#11211)	2025-10-07 21:51:41 -07:00
Yuan Luo	4f42c8cd3e	[sgl-kernel] Support float64 moe_sum_reduce cuda kernel (#11068) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-10-07 14:31:11 +00:00
sglang-bot	a4a3d82393	chore: bump SGLang version to 0.5.3 (#11263)	2025-10-06 20:07:02 +08:00
sglang-bot	0b13cbb7c9	chore: bump SGLang version to 0.5.3rc2 (#11259) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-06 01:12:10 -07:00
Yuan Luo	590f2da052	[Feat] Support Torch Symm Mem AllReduce (#10571) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-10-05 13:55:19 -07:00
yhyang201	48e9e71930	Add --max-new-tokens CLI flag for MMMU evaluation (#11217)	2025-10-04 17:35:53 -07:00
fzyzcjy	fdc4e1e570	Tiny move files to utils folder (#11166)	2025-10-03 22:40:06 +08:00
Yuan Luo	42245551ef	[sgl-kernel] Optimize concat_mla_k kernel (#10543) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: PGFLMG <1106310035@qq.com>	2025-09-28 23:04:22 +08:00
lukec	77830a265e	Add fuse_moe per-channel tune (#10915)	2025-09-25 21:12:09 +08:00
Xiaoyu Zhang	c4e314f986	Restruct sgl-kernel benchmark (#10861)	2025-09-25 07:45:25 +08:00
Yiakwy	984730b732	add tunning files for QWEN-3-NEXT (#10794)	2025-09-23 12:46:30 -07:00
ZhengHSI	adc24a3a0c	fix ceval (#10504) Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com>	2025-09-24 02:35:25 +08:00
Yuan Luo	616a3e20df	[sgl-kernel] Support moe_sum_reduce cuda kernel (#10321) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-09-19 14:12:09 +08:00
zhannngchen	7a68b4225a	[improvement] add average input/output token length for hicache benchmark stats output (#10525)	2025-09-18 00:38:03 -07:00
zhannngchen	541551cefe	[bugfix]hicache bench_long_context.py run failed (#10523)	2025-09-17 11:27:06 +08:00
ykwd	4bb08f6e07	[Hicache] Evaluate Per-Round Metrics in Multiturn Bench (#10203) Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>	2025-09-15 19:34:40 -07:00
Yineng Zhang	86a32bb5cd	chore: bump v0.5.3rc0 (#10468)	2025-09-15 03:55:18 -07:00
hzh0425	2a37b24d23	[HotFix]: Hot fix import path in 3fs_bench_client.py (#10463)	2025-09-14 23:45:46 -07:00
Vincent Zhong	1489cd6c02	[docs / oneliner] update mmmu docs instruction (#9768) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-15 11:26:39 +08:00
chenge@xiaohongshu.com	1b1701f1f7	model: support dots.vlm1 model (#8778) Co-authored-by: weishi <bushou@xiaohongshu.com> Co-authored-by: Ezra-Yu <1105212286@qq.com> Co-authored-by: Jianfei Wang <905787410@qq.com> Co-authored-by: qianwu <wangjianfei@xiaohongshu.com>	2025-09-12 17:38:38 +08:00
strgrb	fac07c9b08	Support LingV2 model (#10359) Co-authored-by: 羽癫 <yudian.zy@antgroup.com> Co-authored-by: guoyuhong <yuhong.gyh@antgroup.com>	2025-09-11 23:53:52 -07:00
Yineng Zhang	b0d25e72c4	chore: bump v0.5.2 (#10221)	2025-09-11 16:09:20 -07:00
Sundara Raman Ramachandran	a1d038924b	[Benchmark] Prefil-only benchmark scripts (#10240)	2025-09-10 10:59:07 +08:00
Yuan Luo	cb3918a091	Optimize moe_sum_reduce_kernel (#9477) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-09-07 09:16:18 +08:00
Baizhou Zhang	beac202bfd	Add lora_path argument to bench_multiturn.py (#10092)	2025-09-05 19:20:42 -07:00
DevashishLal-CB	13705dae06	[Fix] Add speculative_draft_model_revision to server_args (#5255) Signed-off-by: Devashish Lal <devashish@rivosinc.com>	2025-09-05 19:45:46 +08:00
Yineng Zhang	fa9c82d339	chore: bump v0.5.2rc2 (#10050)	2025-09-04 20:07:27 -07:00
Xiaoyu Zhang	b1fb7e458c	[benchmark] add flashinfer_allreduce_fusion benchmark (#9937)	2025-09-03 16:31:01 +08:00
Yineng Zhang	18f91eb639	chore: bump v0.5.2rc1 (#9920)	2025-09-02 04:43:34 -07:00
Lifu Huang	1fbfdebe6b	[chore] fix dead links in doc (#9913)	2025-09-02 00:28:26 -07:00
Yineng Zhang	16e56ea693	chore: bump v0.5.2rc0 (#9862)	2025-09-01 03:07:36 -07:00
hzh0425	8c2ffaaf0f	fix(hicahce-long-bench): adjust context workload generator to use full query set (#9847) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-08-31 14:51:18 -07:00
Pawel Kowalski	20445327b2	fix inconsistent arguments for generated shared prefix bench (#9073) Co-authored-by: Pawel Kowalski <pawel.kowalski@silo.ai>	2025-08-31 14:27:33 -07:00
Kaixi Hou	5c34b4f1c7	[NVIDIA] [2/N] Optimize `silu_and_mul_scaled_fp4_grouped_quant` perf (#9556)	2025-08-29 17:17:03 -07:00
pansicheng	09a1df2231	add bench_mix.py (#9788)	2025-08-28 23:44:26 -07:00
Xinyuan Tong	f84b57c80e	Move git clone command up from README (#9740)	2025-08-28 00:27:00 -07:00
Liangsheng Yin	d0934a5192	gpt-oss blog reproduction document (#9728)	2025-08-28 10:15:08 +08:00
Yineng Zhang	bc80dc4ce0	chore: bump v0.5.1.post3 (#9716)	2025-08-27 15:42:42 -07:00
ehuaa	8f7b1c31e8	Add A100 fused MoE kernel configs for Dpsk (#9677)	2025-08-26 20:49:48 -07:00
hzh0425	c04c17edfa	refactor(hicache): Introduce generic HiCacheStorageConfig for improved configuration management (#9555) Co-authored-by: Teng Ma <805522925@qq.com>	2025-08-26 17:55:20 -07:00

352 Commits