Commit Graph

282 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Zhiqiang Xie | 0eec4cb6cc | HiCache, add bench long context plus minor fixs (#9086) (co-authored by gemini-code-assist[bot]) | 2025-08-11 16:54:52 -07:00 |
| Lianmin Zheng | b58ae7a2a0 | Simplify frontend language (#9029) | 2025-08-10 10:59:30 -07:00 |
| Binyao Jiang | f29aba8c6e | Support glm4.1v and glm4.5v (#8798) (signed off and co-authored by Xinyuan Tong; co-authored by zRzRzRzRzRzRzR, Minglei Zhu, Chang Su) | 2025-08-09 00:59:13 -07:00 |
| Yineng Zhang | 9020f7fc32 | chore: bump v0.5.0rc0 (#8959) | 2025-08-08 09:16:18 -07:00 |
| pansicheng | e2fd2b9c7e | Simple prefetch policy (#8692) | 2025-08-08 02:09:28 -07:00 |
| eigen | 9c7e392465 | bench: add attention sink op benchmark, triton and trtllm-gen [B200] (#8932) (co-authored by averyhuang) | 2025-08-08 00:16:23 -07:00 |
| Ke Bao | 0475448ee3 | Optimize triton swa kernel by skipping computation (#8860) | 2025-08-06 21:37:50 +08:00 |
| Yineng Zhang | 8cd344586e | chore: bump v0.4.10.post2 (#8727) | 2025-08-03 03:43:29 -07:00 |
| Ke Bao | 33f0de337d | chore: bump v0.4.10.post1 (#8652) | 2025-08-01 12:07:30 +08:00 |
| Yineng Zhang | 023288645b | chore: bump v0.4.10 (#8608) | 2025-07-31 20:50:17 +08:00 |
| pansicheng | 299803343d | Add hf3fs support for hicache storage (based on #7704) (#7280) (co-authored by Zhiqiang Xie) | 2025-07-30 17:42:41 -07:00 |
| Yineng Zhang | 6478831be9 | chore: bump v0.4.9.post6 (#8517) | 2025-07-29 02:30:07 -07:00 |
| Yineng Zhang | 1466c1b896 | feat: support glm4 tuning (#8473) | 2025-07-28 14:32:58 -07:00 |
| Yineng Zhang | 45bc170b36 | chore: bump v0.4.9.post5 (#8458) | 2025-07-28 02:11:06 -07:00 |
| Yuxuan Zhang | 6d6a8bc278 | GLM-4.5 Model Support (#8224) (co-authored by Lifu Huang, Binyao Jiang, Stefan He) | 2025-07-27 22:54:07 -07:00 |
| fzyzcjy | 62222bd27e | Minor tool for comparison of benchmark results (#7974) | 2025-07-27 00:27:50 -07:00 |
| Mick | 4fa44d63c6 | chore: improve mmmu benchmark (#7000) (signed off and co-authored by Xinyuan Tong) | 2025-07-26 16:19:45 +08:00 |
| Yineng Zhang | 2272c2a5b5 | chore: bump v0.4.9.post4 (#8305) | 2025-07-25 17:12:47 -07:00 |
| Zhiqiang Xie | ce86e201df | bug fix and tag (#8282) | 2025-07-23 16:50:31 +08:00 |
| Yineng Zhang | 01c000043c | chore: bump v0.4.9.post3 (#8265) | 2025-07-22 15:55:48 -07:00 |
| zhongwei | ff45ab7a5f | [Benchmark] add disable-auto-run param for hicache/bench_multiturn (#7822) (co-authored by zhongwei.ren, Zhiqiang Xie) | 2025-07-22 14:02:40 -07:00 |
| Cheng Wan | abda2542d5 | Fix tuning_fused_moe_triton.py (#8175) | 2025-07-19 17:33:50 -07:00 |
| Hongbo Xu | 1f76fc8747 | [3/n] chore: decouple AWQ implementation from vLLM dependency (#8113) (co-authored by AniZpZ) | 2025-07-18 11:45:22 -07:00 |
| Yineng Zhang | eb118d88c4 | chore: bump v0.4.9.post2 (#7963) | 2025-07-11 21:11:20 -07:00 |
| Yineng Zhang | 066f4ec91f | chore: bump v0.4.9.post1 (#7882) | 2025-07-09 00:28:17 -07:00 |
| Yuan Luo | 253454de9b | Integrate triton moe kernel (#7689) (co-authored by luoyuan.luo) | 2025-07-06 20:05:49 -07:00 |
| Yineng Zhang | ec5f9c6269 | chore: bump v0.4.9 (#7802) | 2025-07-05 17:40:29 -07:00 |
| Yineng Zhang | 69183f8808 | chore: bump v0.4.8.post1 (#7559) | 2025-06-26 02:21:12 -07:00 |
| Xiaoyu Zhang | 8ecad0b16f | [benchmark] fbgemm benchmark support bandwidth report and support fbgemm_cutlass_gmm (#7422) | 2025-06-24 09:44:55 -07:00 |
| Yineng Zhang | 7c3a12c000 | chore: bump v0.4.8 (#7493) | 2025-06-23 23:14:22 -07:00 |
| Chang Su | 72676cd6c0 | feat(oai refactor): Replace openai_api with entrypoints/openai (#7351) (co-authored by Jin Pan) | 2025-06-21 13:21:06 -07:00 |
| Binyao Jiang | b783c1cb82 | Fix hicache benchmark script bug - some sampled input_request is [] (#7300) | 2025-06-17 23:47:11 -07:00 |
| Zhiqiang Xie | e56685ac1b | Upstreaming hicache bug fixes (#7267) | 2025-06-17 17:44:57 -07:00 |
| Yineng Zhang | f9dc9dd28b | chore: bump v0.4.7.post1 (#7248) | 2025-06-16 15:20:29 -07:00 |
| Xiaoyu Zhang | 0ae1e9a755 | refine fused_moe benchmark (#7221) | 2025-06-15 21:21:32 -07:00 |
| Lifu Huang | e07d064729 | Support LoRA in MMMU benchmark script. (#7218) | 2025-06-15 21:17:57 -07:00 |
| Quanfeng Li | ef32677444 | Fix positional argument (#7093) | 2025-06-11 18:31:13 -07:00 |
| Yineng Zhang | 4f723edd3b | chore: bump v0.4.7 (#7038) | 2025-06-10 01:56:20 -07:00 |
| Xiaoyu Zhang | 3712abfaf9 | Fuse routed scaling factor in deepseek (#6970) | 2025-06-08 15:24:24 -07:00 |
| Xiaoyu Zhang | fa3592cfeb | rebase h20 fused_moe config (#6966) | 2025-06-08 05:01:34 -07:00 |
| Yineng Zhang | 1fb76ebb93 | Revert "Fuse routed scaling factor in topk_reduce kernel (#6220)" (#6968) | 2025-06-07 21:02:49 -07:00 |
| Xiaoyu Zhang | 515ef4facb | Fuse routed scaling factor in topk_reduce kernel (#6220) | 2025-06-07 11:06:50 -07:00 |
| Xiaoyu Zhang | bae4fdc7ab | add fbgemm moe grouped gemm kernel benchmark (#6924) | 2025-06-07 02:57:30 -07:00 |
| zyksir | 8e3797be1c | support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277) | 2025-06-04 22:11:24 -07:00 |
| Cheng Wan | 81964328b7 | Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled (#6736) | 2025-06-04 15:53:22 -07:00 |
| Cheng Wan | 8a5480528d | [Refactor] Rename n_share_experts_fusion as num_fused_shared_experts (#6735) | 2025-06-03 17:48:24 -07:00 |
| JieXin Liang | d9d35def3d | [test] add ut and bm for get_last_loc (#6746) | 2025-05-29 11:47:21 -07:00 |
| fzyzcjy | 6df81e8a39 | Support tuning DeepEP configs (#6742) | 2025-05-29 08:12:22 -07:00 |
| ChangyiYang | 485a023bd8 | refactor apply_w8a8_block_fp8_linear in fp (#6545) | 2025-05-29 00:15:11 -07:00 |
| Wenxuan Tan | 844a8f42c7 | Fix LoRA bench (#6719) | 2025-05-28 16:38:55 -07:00 |
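As an aside, a listing in this shape (author, abbreviated SHA, subject, and date per commit) can be reproduced locally with `git log` and a custom pretty format. A minimal sketch, run against a throwaway repository rather than an actual checkout; the author name and subject below are illustrative, not taken from the history above:

```shell
# Build a throwaway repository with a single empty commit to demonstrate
# the format; no real checkout is required.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name="Example Author" -c user.email="author@example.com" \
    commit -q --allow-empty -m "Example subject (#1234)"

# Print one author line, then "<abbreviated sha> <subject> <date>",
# mirroring the per-commit layout of the listing above.
git log --date=format:'%Y-%m-%d %H:%M:%S %z' \
    --pretty=format:'%an%n%h %s %ad'
```

The same format string applied to a real checkout yields the full history, one entry per commit, newest first.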