sglang

Author	SHA1	Message	Date
Brayden Zhong	b149b39353	[CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969 )	2025-03-27 19:45:02 -07:00
Daniel Holanda	98a2cfa9b2	Basic Cleanup (#4833 )	2025-03-27 16:55:48 -07:00
Ravi Theja	e6e4d02245	Update MMMU Benchmark instructions (#4694 )	2025-03-27 14:44:16 -07:00
Chunan Zeng	14269198e3	[Benchmark] tilelang vs deepgemm vs w8a8_block_fp8_matmul (#4735 )	2025-03-24 20:56:31 -07:00
Mick	1e86457c90	model: Minicpmo (#3023 )	2025-03-24 20:08:40 -07:00
Tongbao Zhang	3980ff1be6	rename benchmark_deepgemm_fp8_group_gemm.py (#4605 )	2025-03-23 23:35:20 -07:00
Mick	11577cedb7	refactor: bug fixes and refactor for vlm (#4661 )	2025-03-22 22:48:49 -07:00
Ke Bao	8f163b1653	Add EAGLE mtbench benchmark script (#4676 ) Co-authored-by: chromecast56 <jamesll@mit.edu>	2025-03-22 13:34:01 -07:00
penguin_wwy	38f25e87fc	Correcting default configuration when benchmarking fused_moe (#4665 )	2025-03-22 00:52:34 -07:00
aoshen524	588865f0e0	[Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274 ) Co-authored-by: ShenAo1111 <1377693092@qq.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-03-18 20:33:07 -07:00
Mick	98be3bd306	refactor: rewrite bench-mmmu-sglang (#4458 )	2025-03-17 18:11:47 -07:00
Wenbo Yang	75b656488a	Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418 )	2025-03-17 00:03:43 -07:00
ZelinTan	402db5c58c	Benchmark: Statistical Analysis of the Output Stability of the Deepseek Model (#4202 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-03-16 17:32:57 -07:00
JieXin Liang	1a3fa75f2f	[Fix] use `torch.cat` instead of `torch.concat` to prevent entering the `Autograd` backends. (#4466 )	2025-03-16 00:02:47 -07:00
Zhan Lu	660305c38a	[Doc] fix wrong flag in deepseek documentation (#4427 )	2025-03-14 11:30:55 -07:00
laixin	0c02086015	add INT8 example into dsv3 README (#4079 )	2025-03-12 21:37:30 -07:00
Mick	01090e8ac3	model: Support Janus-pro (#3203 )	2025-03-12 11:02:11 -07:00
Xiaoyu Zhang	7130a7cea9	refine sgl_moe_align_block_size_benchmark (#4327 )	2025-03-11 22:48:38 -07:00
Mick	ff2ce0b86f	refactor: move image processors to separate files (#4229 )	2025-03-11 12:35:35 -07:00
yych0745	6a02b32d07	Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287 ) Co-authored-by: HandH1998 <1335248067@qq.com>	2025-03-11 00:49:06 -07:00
lukec	ffa1b3e318	Add an example of using deepseekv3 int8 sglang. (#4177 ) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-07 01:56:09 -08:00
Yueyang Pan	25482edb5c	Online serving benchmarks of real datasets for hierarchical KV caching (#3211 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-03-05 16:16:43 -08:00
Lu Changqi	e5760bc40a	bench: add dataset param for bench_multiturn (#3990 )	2025-03-05 01:21:37 -08:00
Lianmin Zheng	66301e124f	Improve code styles (#4021 )	2025-03-03 03:20:23 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Stefan He	0194948fd9	Optimize Triton Kernel of Group GEMM in DeepGEMM Benchmark (#4014 )	2025-03-02 23:29:55 -08:00
Stefan He	b7e274f2d9	Add Benchmark for DeepGEMM Group GEMM (#3993 )	2025-03-02 17:47:21 -08:00
Xiaoyu Zhang	50f28f65a0	fix typo in deep gemm benchmarking(#3991 )	2025-03-02 00:34:00 -08:00
Xiaoyu Zhang	90a55e2566	add deepgemm and sglang fp8 block-wise gemm benchmark (#3893 )	2025-03-01 23:01:58 -08:00
Chayenne	18bb216c28	Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982 )	2025-02-28 23:57:17 -08:00
yiakwy-xpu-ml-framework-team	1c96fa86cf	[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613 )	2025-02-27 19:42:48 -08:00
Yineng Zhang	5d86016855	revert "Docs: Reorngaize dpsk links #3900 " (#3933 )	2025-02-27 08:57:13 -08:00
laixin	b0df5d240b	Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922 ) Co-authored-by: sleepcoo <sleepcoo@gmail.com>	2025-02-27 10:59:46 +00:00
Chayenne	7c1692aa90	Docs: Reorngaize dpsk links (#3900 )	2025-02-26 15:16:31 -08:00
IAN	107710268a	[BugFix] Fix crash when receive a req with structed output in DP attention mode. (#3841 )	2025-02-25 09:32:05 -08:00
Zhiqiang Xie	6c7a152c5a	Hierarchical Caching for SGLang (#2693 ) Co-authored-by: Wenxuan Tan <wenxuan.tan@wisc.edu> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-02-23 21:56:30 -08:00
Mick	45205d88a0	bench: Add MMMU benchmark for vLM (#3562 )	2025-02-22 08:10:59 -08:00
simveit	bb121214c2	Variance measure for reasoning benchmark (#3677 )	2025-02-20 03:49:49 +08:00
Zhanghao Wu	f93e915817	[Docs] Add SkyPilot DeepSeek example (#3706 )	2025-02-20 02:10:23 +08:00
Yineng Zhang	fe0673f1cc	set NCCL_IB_GID_INDEX=3 for multi node NVIDIA InfiniBand if needed (#3698 )	2025-02-19 20:50:22 +08:00
yigex	ddf39d3fce	[ROCm] Optimal MOE Tuning for AMD Radeon Graphics (#3567 )	2025-02-17 17:54:10 -08:00
Xiaoyu Zhang	c38f3aed24	support multi-gpu block-gemm tuning (#3639 )	2025-02-18 00:00:35 +08:00
Shenggui Li	c9565e49e7	[docker] added rdma support (#3619 )	2025-02-17 15:36:16 +08:00
simveit	3d4a8f9bc0	Benchmark for reasoning models (#3532 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-17 03:07:30 +08:00
Yineng Zhang	ac963be234	update flashinfer-python (#3557 )	2025-02-14 09:52:56 +08:00
Yineng Zhang	e0b9a423c8	chore: bump v0.4.3 (#3556 )	2025-02-14 09:43:14 +08:00
Yineng Zhang	20de05a753	update README (#3543 )	2025-02-13 17:22:11 +08:00
Jhin	bf2a70872e	Update DeepSeek V3 Doc (#3541 )	2025-02-12 23:15:37 -08:00
Xiaoyu Zhang	693c2600e0	refine deepseek_v3 launch server doc (#3522 )	2025-02-12 17:27:07 +08:00
yigex	fdf04a1426	[ROCm] Add ROCm tuning config to block gemm and Re-tune for AMD Radeon Graphics (#3418 ) Co-authored-by: Bruce Xue <yigex@xilinx.com> Co-authored-by: HAI <hixiao@gmail.com>	2025-02-10 23:55:04 -08:00

195 Commits