Commit Graph

255 Commits

Author | SHA1 | Message | Date
Yineng Zhang | 69183f8808 | chore: bump v0.4.8.post1 (#7559) | 2025-06-26 02:21:12 -07:00
Xiaoyu Zhang | 8ecad0b16f | [benchmark] fbgemm benchmark support bandwidth report and support fbgemm_cutlass_gmm (#7422) | 2025-06-24 09:44:55 -07:00
Yineng Zhang | 7c3a12c000 | chore: bump v0.4.8 (#7493) | 2025-06-23 23:14:22 -07:00
Chang Su | 72676cd6c0 | feat(oai refactor): Replace openai_api with entrypoints/openai (#7351) | 2025-06-21 13:21:06 -07:00
    Co-authored-by: Jin Pan <jpan236@wisc.edu>
Binyao Jiang | b783c1cb82 | Fix hicache benchmark script bug - some sampled input_request is [] (#7300) | 2025-06-17 23:47:11 -07:00
Zhiqiang Xie | e56685ac1b | Upstreaming hicache bug fixes (#7267) | 2025-06-17 17:44:57 -07:00
Yineng Zhang | f9dc9dd28b | chore: bump v0.4.7.post1 (#7248) | 2025-06-16 15:20:29 -07:00
Xiaoyu Zhang | 0ae1e9a755 | refine fused_moe benchmark (#7221) | 2025-06-15 21:21:32 -07:00
Lifu Huang | e07d064729 | Support LoRA in MMMU benchmark script. (#7218) | 2025-06-15 21:17:57 -07:00
Quanfeng Li | ef32677444 | Fix positional argument (#7093) | 2025-06-11 18:31:13 -07:00
Yineng Zhang | 4f723edd3b | chore: bump v0.4.7 (#7038) | 2025-06-10 01:56:20 -07:00
Xiaoyu Zhang | 3712abfaf9 | Fuse routed scaling factor in deepseek (#6970) | 2025-06-08 15:24:24 -07:00
Xiaoyu Zhang | fa3592cfeb | rebase h20 fused_moe config (#6966) | 2025-06-08 05:01:34 -07:00
Yineng Zhang | 1fb76ebb93 | Revert "Fuse routed scaling factor in topk_reduce kernel (#6220)" (#6968) | 2025-06-07 21:02:49 -07:00
Xiaoyu Zhang | 515ef4facb | Fuse routed scaling factor in topk_reduce kernel (#6220) | 2025-06-07 11:06:50 -07:00
Xiaoyu Zhang | bae4fdc7ab | add fbgemm moe grouped gemm kernel benchmark (#6924) | 2025-06-07 02:57:30 -07:00
zyksir | 8e3797be1c | support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277) | 2025-06-04 22:11:24 -07:00
Cheng Wan | 81964328b7 | Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled (#6736) | 2025-06-04 15:53:22 -07:00
Cheng Wan | 8a5480528d | [Refactor] Rename n_share_experts_fusion as num_fused_shared_experts (#6735) | 2025-06-03 17:48:24 -07:00
JieXin Liang | d9d35def3d | [test] add ut and bm for get_last_loc (#6746) | 2025-05-29 11:47:21 -07:00
fzyzcjy | 6df81e8a39 | Support tuning DeepEP configs (#6742) | 2025-05-29 08:12:22 -07:00
ChangyiYang | 485a023bd8 | refactor apply_w8a8_block_fp8_linear in fp (#6545) | 2025-05-29 00:15:11 -07:00
Wenxuan Tan | 844a8f42c7 | Fix LoRA bench (#6719) | 2025-05-28 16:38:55 -07:00
Xiaoyu Zhang | 076103535c | fix log_info_on_rank0 error when run benchmark (#6260) | 2025-05-28 00:20:01 -07:00
Yuan Luo | c087ddd686 | Refine pre_reorder_triton_kernel slightly to improve performance (#6627) | 2025-05-28 00:15:23 -07:00
    Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Yineng Zhang | 7e257cd666 | chore: bump v0.4.6.post5 (#6566) | 2025-05-24 00:48:05 -07:00
Qiaolin Yu | cd8d4b9dfc | Fix lora bench (#6302) | 2025-05-15 10:09:55 -07:00
Yineng Zhang | 16267d4fa7 | chore: bump v0.4.6.post4 (#6245) | 2025-05-13 01:57:51 -07:00
fzyzcjy | ef8ec07b2c | Support tuning moe for llama 4 model (#6042) | 2025-05-12 15:47:01 -07:00
Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00
applesaucethebun | d738ab52f8 | fix some typos (#6209) | 2025-05-13 01:42:38 +08:00
    Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Lifu Huang | 6e2da51561 | Replace time.time() to time.perf_counter() for benchmarking. (#6178) | 2025-05-11 14:32:49 -07:00
    Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179) | 2025-05-11 12:55:00 +08:00
    Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
XinyuanTong | 9d8ec2e67e | Fix and Clean up chat-template requirement for VLM (#6114) | 2025-05-11 00:14:09 +08:00
    Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Yineng Zhang | 678d8cc987 | chore: bump v0.4.6.post3 (#6165) | 2025-05-09 15:38:47 -07:00
XinyuanTong | 6ea1e6ac6e | Support MMMU benchmark for InternVL (#5968) | 2025-05-02 00:17:21 -07:00
XinyuanTong | c5645e928f | feat: add concurrency evaluation logic in mmmu benchmark (#5782) | 2025-05-01 18:20:08 -07:00
Yineng Zhang | 9858113c33 | chore: bump v0.4.6.post2 (#5939) | 2025-04-30 22:04:40 -07:00
Yi Zhang | d50e36a79d | support vlm benchmark profile (#5905) | 2025-04-29 23:48:27 -07:00
Qiaolin Yu | 8c0cfca87d | Feat: support cuda graph for LoRA (#4115) | 2025-04-28 23:30:44 -07:00
    Co-authored-by: Beichen Ma <mabeichen12@gmail.com>
Xiaoyu Zhang | 1cc326032d | simplify fused_moe config logging (#5801) | 2025-04-28 17:04:54 -07:00
Yineng Zhang | dcae1fb2cd | chore: bump v0.4.6.post1 (#5845) | 2025-04-28 12:57:08 -07:00
Yi Zhang | a0251a3fd6 | add fused moe config for qwen3moe fp8/bf16 (#5849) | 2025-04-28 11:55:52 -07:00
Xiaoyu Zhang | e132cba2a8 | fused moe triton tuning script support qwen3 (#5842) | 2025-04-28 09:13:04 -07:00
XinyuanTong | 0045f4b2af | feat: Add fused moe triton config for qwen3 moe on h100 (#5833) | 2025-04-28 08:37:13 -07:00
Baizhou Zhang | 84022c0e56 | Release v0.4.6 (#5795) | 2025-04-27 14:07:05 -07:00
Ravi Theja | 7d9679b74d | Add MMMU benchmark results (#4491) | 2025-04-25 15:23:53 +08:00
    Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
Mick | c998d04b46 | vlm: enable radix cache for qwen-vl models (#5349) | 2025-04-23 20:35:05 -07:00
    Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Yineng Zhang | b9c87e781d | chore: bump v0.4.5.post3 (#5611) | 2025-04-21 18:16:20 -07:00
Sundara Raman Ramachandran | f08154193c | Perform Batch Tokenization. (#5141) | 2025-04-20 18:10:37 -07:00