Commit Graph

236 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| JieXin Liang | d9d35def3d | [test] add ut and bm for get_last_loc (#6746) | 2025-05-29 11:47:21 -07:00 |
| fzyzcjy | 6df81e8a39 | Support tuning DeepEP configs (#6742) | 2025-05-29 08:12:22 -07:00 |
| ChangyiYang | 485a023bd8 | refactor apply_w8a8_block_fp8_linear in fp (#6545) | 2025-05-29 00:15:11 -07:00 |
| Wenxuan Tan | 844a8f42c7 | Fix LoRA bench (#6719) | 2025-05-28 16:38:55 -07:00 |
| Xiaoyu Zhang | 076103535c | fix log_info_on_rank0 error when run benchmark (#6260) | 2025-05-28 00:20:01 -07:00 |
| Yuan Luo | c087ddd686 | Refine pre_reorder_triton_kernel slightly to improve performance (#6627) (Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>) | 2025-05-28 00:15:23 -07:00 |
| Yineng Zhang | 7e257cd666 | chore: bump v0.4.6.post5 (#6566) | 2025-05-24 00:48:05 -07:00 |
| Qiaolin Yu | cd8d4b9dfc | Fix lora bench (#6302) | 2025-05-15 10:09:55 -07:00 |
| Yineng Zhang | 16267d4fa7 | chore: bump v0.4.6.post4 (#6245) | 2025-05-13 01:57:51 -07:00 |
| fzyzcjy | ef8ec07b2c | Support tuning moe for llama 4 model (#6042) | 2025-05-12 15:47:01 -07:00 |
| Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00 |
| applesaucethebun | d738ab52f8 | fix some typos (#6209) (Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>) | 2025-05-13 01:42:38 +08:00 |
| Lifu Huang | 6e2da51561 | Replace time.time() to time.perf_counter() for benchmarking. (#6178) (Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>) | 2025-05-11 14:32:49 -07:00 |
| applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179) (Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>) | 2025-05-11 12:55:00 +08:00 |
| XinyuanTong | 9d8ec2e67e | Fix and Clean up chat-template requirement for VLM (#6114) (Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>) | 2025-05-11 00:14:09 +08:00 |
| Yineng Zhang | 678d8cc987 | chore: bump v0.4.6.post3 (#6165) | 2025-05-09 15:38:47 -07:00 |
| XinyuanTong | 6ea1e6ac6e | Support MMMU benchmark for InternVL (#5968) | 2025-05-02 00:17:21 -07:00 |
| XinyuanTong | c5645e928f | feat: add concurrency evaluation logic in mmmu benchmark (#5782) | 2025-05-01 18:20:08 -07:00 |
| Yineng Zhang | 9858113c33 | chore: bump v0.4.6.post2 (#5939) | 2025-04-30 22:04:40 -07:00 |
| Yi Zhang | d50e36a79d | support vlm benchmark profile (#5905) | 2025-04-29 23:48:27 -07:00 |
| Qiaolin Yu | 8c0cfca87d | Feat: support cuda graph for LoRA (#4115) (Co-authored-by: Beichen Ma <mabeichen12@gmail.com>) | 2025-04-28 23:30:44 -07:00 |
| Xiaoyu Zhang | 1cc326032d | simplify fused_moe config logging (#5801) | 2025-04-28 17:04:54 -07:00 |
| Yineng Zhang | dcae1fb2cd | chore: bump v0.4.6.post1 (#5845) | 2025-04-28 12:57:08 -07:00 |
| Yi Zhang | a0251a3fd6 | add fused moe config for qwen3moe fp8/bf16 (#5849) | 2025-04-28 11:55:52 -07:00 |
| Xiaoyu Zhang | e132cba2a8 | fused moe triton tuning script support qwen3 (#5842) | 2025-04-28 09:13:04 -07:00 |
| XinyuanTong | 0045f4b2af | feat: Add fused moe triton config for qwen3 moe on h100 (#5833) | 2025-04-28 08:37:13 -07:00 |
| Baizhou Zhang | 84022c0e56 | Release v0.4.6 (#5795) | 2025-04-27 14:07:05 -07:00 |
| Ravi Theja | 7d9679b74d | Add MMMU benchmark results (#4491) (Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>) | 2025-04-25 15:23:53 +08:00 |
| Mick | c998d04b46 | vlm: enable radix cache for qwen-vl models (#5349) (Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>) | 2025-04-23 20:35:05 -07:00 |
| Yineng Zhang | b9c87e781d | chore: bump v0.4.5.post3 (#5611) | 2025-04-21 18:16:20 -07:00 |
| Sundara Raman Ramachandran | f08154193c | Perform Batch Tokenization. (#5141) | 2025-04-20 18:10:37 -07:00 |
| lukec | 417b44eba8 | [Feat] upgrade pytorch2.6 (#5417) | 2025-04-20 16:06:34 -07:00 |
| Zhaoyi Li | c555d794f7 | Minor update for ROCm variable style (#5562) | 2025-04-19 23:45:27 -07:00 |
| lambert0312 | 61e7c4dd21 | Add A800 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5368) | 2025-04-14 18:39:44 -07:00 |
| Xiaoyu Zhang | 3e4794aad8 | refine fused_moe tuning docs (#5294) | 2025-04-12 10:01:13 -07:00 |
| Mick | 34ef6c8135 | [VLM] Adopt fast image processor by default (#5065) | 2025-04-11 21:46:58 -07:00 |
| Chunan Zeng | a7c3f74bec | [FA3 Feature] Support multi modal Llama-3.2-11B-Vision-Instruct (#5103) | 2025-04-07 22:58:08 -07:00 |
| Xiaoyu Zhang | 924ca7c92c | Add DeepSeek V3/R1 shared experts fusion (#4918) | 2025-04-04 01:59:29 -07:00 |
| AniZpZ | d95269f9b3 | [2/3] fix dsv3 awq issue (#4625) (Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>; Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>) | 2025-04-03 17:36:39 -07:00 |
| Ravi Theja | 69df9761dd | Add LlavaLlamaForCausaLM in MultiModal Processors (#5039) (Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>) | 2025-04-03 15:41:12 -07:00 |
| Mick | 5cb552b1d4 | refactor: multimodal data (#4754) | 2025-03-31 09:57:51 -07:00 |
| Brayden Zhong | b149b39353 | [CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969) | 2025-03-27 19:45:02 -07:00 |
| Daniel Holanda | 98a2cfa9b2 | Basic Cleanup (#4833) | 2025-03-27 16:55:48 -07:00 |
| Ravi Theja | e6e4d02245 | Update MMMU Benchmark instructions (#4694) | 2025-03-27 14:44:16 -07:00 |
| Chunan Zeng | 14269198e3 | [Benchmark] tilelang vs deepgemm vs w8a8_block_fp8_matmul (#4735) | 2025-03-24 20:56:31 -07:00 |
| Mick | 1e86457c90 | model: Minicpmo (#3023) | 2025-03-24 20:08:40 -07:00 |
| Tongbao Zhang | 3980ff1be6 | rename benchmark_deepgemm_fp8_group_gemm.py (#4605) | 2025-03-23 23:35:20 -07:00 |
| Mick | 11577cedb7 | refactor: bug fixes and refactor for vlm (#4661) | 2025-03-22 22:48:49 -07:00 |
| Ke Bao | 8f163b1653 | Add EAGLE mtbench benchmark script (#4676) (Co-authored-by: chromecast56 <jamesll@mit.edu>) | 2025-03-22 13:34:01 -07:00 |
| penguin_wwy | 38f25e87fc | Correcting default configuration when benchmarking fused_moe (#4665) | 2025-03-22 00:52:34 -07:00 |