Commit Graph

266 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Mick | 4fa44d63c6 | chore: improve mmmu benchmark (#7000) (Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>; Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>) | 2025-07-26 16:19:45 +08:00 |
| Yineng Zhang | 2272c2a5b5 | chore: bump v0.4.9.post4 (#8305) | 2025-07-25 17:12:47 -07:00 |
| Zhiqiang Xie | ce86e201df | bug fix and tag (#8282) | 2025-07-23 16:50:31 +08:00 |
| Yineng Zhang | 01c000043c | chore: bump v0.4.9.post3 (#8265) | 2025-07-22 15:55:48 -07:00 |
| zhongwei | ff45ab7a5f | [Benchmark] add disable-auto-run param for hicache/bench_multiturn (#7822) (Co-authored-by: zhongwei.ren <zhongwei.ren@bytedance.com>; Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>) | 2025-07-22 14:02:40 -07:00 |
| Cheng Wan | abda2542d5 | Fix tuning_fused_moe_triton.py (#8175) | 2025-07-19 17:33:50 -07:00 |
| Hongbo Xu | 1f76fc8747 | [3/n] chore: decouple AWQ implementation from vLLM dependency (#8113) (Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com>) | 2025-07-18 11:45:22 -07:00 |
| Yineng Zhang | eb118d88c4 | chore: bump v0.4.9.post2 (#7963) | 2025-07-11 21:11:20 -07:00 |
| Yineng Zhang | 066f4ec91f | chore: bump v0.4.9.post1 (#7882) | 2025-07-09 00:28:17 -07:00 |
| Yuan Luo | 253454de9b | Integrate triton moe kernel (#7689) (Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>) | 2025-07-06 20:05:49 -07:00 |
| Yineng Zhang | ec5f9c6269 | chore: bump v0.4.9 (#7802) | 2025-07-05 17:40:29 -07:00 |
| Yineng Zhang | 69183f8808 | chore: bump v0.4.8.post1 (#7559) | 2025-06-26 02:21:12 -07:00 |
| Xiaoyu Zhang | 8ecad0b16f | [benchmark] fbgemm benchmark support bandwidth report and support fbgemm_cutlass_gmm (#7422) | 2025-06-24 09:44:55 -07:00 |
| Yineng Zhang | 7c3a12c000 | chore: bump v0.4.8 (#7493) | 2025-06-23 23:14:22 -07:00 |
| Chang Su | 72676cd6c0 | feat(oai refactor): Replace openai_api with entrypoints/openai (#7351) (Co-authored-by: Jin Pan <jpan236@wisc.edu>) | 2025-06-21 13:21:06 -07:00 |
| Binyao Jiang | b783c1cb82 | Fix hicache benchmark script bug - some sampled input_request is [] (#7300) | 2025-06-17 23:47:11 -07:00 |
| Zhiqiang Xie | e56685ac1b | Upstreaming hicache bug fixes (#7267) | 2025-06-17 17:44:57 -07:00 |
| Yineng Zhang | f9dc9dd28b | chore: bump v0.4.7.post1 (#7248) | 2025-06-16 15:20:29 -07:00 |
| Xiaoyu Zhang | 0ae1e9a755 | refine fused_moe benchmark (#7221) | 2025-06-15 21:21:32 -07:00 |
| Lifu Huang | e07d064729 | Support LoRA in MMMU benchmark script. (#7218) | 2025-06-15 21:17:57 -07:00 |
| Quanfeng Li | ef32677444 | Fix positional argument (#7093) | 2025-06-11 18:31:13 -07:00 |
| Yineng Zhang | 4f723edd3b | chore: bump v0.4.7 (#7038) | 2025-06-10 01:56:20 -07:00 |
| Xiaoyu Zhang | 3712abfaf9 | Fuse routed scaling factor in deepseek (#6970) | 2025-06-08 15:24:24 -07:00 |
| Xiaoyu Zhang | fa3592cfeb | rebase h20 fused_moe config (#6966) | 2025-06-08 05:01:34 -07:00 |
| Yineng Zhang | 1fb76ebb93 | Revert "Fuse routed scaling factor in topk_reduce kernel (#6220)" (#6968) | 2025-06-07 21:02:49 -07:00 |
| Xiaoyu Zhang | 515ef4facb | Fuse routed scaling factor in topk_reduce kernel (#6220) | 2025-06-07 11:06:50 -07:00 |
| Xiaoyu Zhang | bae4fdc7ab | add fbgemm moe grouped gemm kernel benchmark (#6924) | 2025-06-07 02:57:30 -07:00 |
| zyksir | 8e3797be1c | support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277) | 2025-06-04 22:11:24 -07:00 |
| Cheng Wan | 81964328b7 | Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled (#6736) | 2025-06-04 15:53:22 -07:00 |
| Cheng Wan | 8a5480528d | [Refactor] Rename n_share_experts_fusion as num_fused_shared_experts (#6735) | 2025-06-03 17:48:24 -07:00 |
| JieXin Liang | d9d35def3d | [test] add ut and bm for get_last_loc (#6746) | 2025-05-29 11:47:21 -07:00 |
| fzyzcjy | 6df81e8a39 | Support tuning DeepEP configs (#6742) | 2025-05-29 08:12:22 -07:00 |
| ChangyiYang | 485a023bd8 | refactor apply_w8a8_block_fp8_linear in fp (#6545) | 2025-05-29 00:15:11 -07:00 |
| Wenxuan Tan | 844a8f42c7 | Fix LoRA bench (#6719) | 2025-05-28 16:38:55 -07:00 |
| Xiaoyu Zhang | 076103535c | fix log_info_on_rank0 error when run benchmark (#6260) | 2025-05-28 00:20:01 -07:00 |
| Yuan Luo | c087ddd686 | Refine pre_reorder_triton_kernel slightly to improve performance (#6627) (Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>) | 2025-05-28 00:15:23 -07:00 |
| Yineng Zhang | 7e257cd666 | chore: bump v0.4.6.post5 (#6566) | 2025-05-24 00:48:05 -07:00 |
| Qiaolin Yu | cd8d4b9dfc | Fix lora bench (#6302) | 2025-05-15 10:09:55 -07:00 |
| Yineng Zhang | 16267d4fa7 | chore: bump v0.4.6.post4 (#6245) | 2025-05-13 01:57:51 -07:00 |
| fzyzcjy | ef8ec07b2c | Support tuning moe for llama 4 model (#6042) | 2025-05-12 15:47:01 -07:00 |
| Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00 |
| applesaucethebun | d738ab52f8 | fix some typos (#6209) (Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>) | 2025-05-13 01:42:38 +08:00 |
| Lifu Huang | 6e2da51561 | Replace time.time() to time.perf_counter() for benchmarking. (#6178) (Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>) | 2025-05-11 14:32:49 -07:00 |
| applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179) (Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>) | 2025-05-11 12:55:00 +08:00 |
| XinyuanTong | 9d8ec2e67e | Fix and Clean up chat-template requirement for VLM (#6114) (Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>) | 2025-05-11 00:14:09 +08:00 |
| Yineng Zhang | 678d8cc987 | chore: bump v0.4.6.post3 (#6165) | 2025-05-09 15:38:47 -07:00 |
| XinyuanTong | 6ea1e6ac6e | Support MMMU benchmark for InternVL (#5968) | 2025-05-02 00:17:21 -07:00 |
| XinyuanTong | c5645e928f | feat: add concurrency evaluation logic in mmmu benchmark (#5782) | 2025-05-01 18:20:08 -07:00 |
| Yineng Zhang | 9858113c33 | chore: bump v0.4.6.post2 (#5939) | 2025-04-30 22:04:40 -07:00 |
| Yi Zhang | d50e36a79d | support vlm benchmark profile (#5905) | 2025-04-29 23:48:27 -07:00 |