Commit Graph

195 Commits

Author SHA1 Message Date
Brayden Zhong
b149b39353 [CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969) 2025-03-27 19:45:02 -07:00
Daniel Holanda
98a2cfa9b2 Basic Cleanup (#4833) 2025-03-27 16:55:48 -07:00
Ravi Theja
e6e4d02245 Update MMMU Benchmark instructions (#4694) 2025-03-27 14:44:16 -07:00
Chunan Zeng
14269198e3 [Benchmark] tilelang vs deepgemm vs w8a8_block_fp8_matmul (#4735) 2025-03-24 20:56:31 -07:00
Mick
1e86457c90 model: Minicpmo (#3023) 2025-03-24 20:08:40 -07:00
Tongbao Zhang
3980ff1be6 rename benchmark_deepgemm_fp8_group_gemm.py (#4605) 2025-03-23 23:35:20 -07:00
Mick
11577cedb7 refactor: bug fixes and refactor for vlm (#4661) 2025-03-22 22:48:49 -07:00
Ke Bao
8f163b1653 Add EAGLE mtbench benchmark script (#4676)
Co-authored-by: chromecast56 <jamesll@mit.edu>
2025-03-22 13:34:01 -07:00
penguin_wwy
38f25e87fc Correcting default configuration when benchmarking fused_moe (#4665) 2025-03-22 00:52:34 -07:00
aoshen524
588865f0e0 [Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274)
Co-authored-by: ShenAo1111 <1377693092@qq.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-03-18 20:33:07 -07:00
Mick
98be3bd306 refactor: rewrite bench-mmmu-sglang (#4458) 2025-03-17 18:11:47 -07:00
Wenbo Yang
75b656488a Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418) 2025-03-17 00:03:43 -07:00
ZelinTan
402db5c58c Benchmark: Statistical Analysis of the Output Stability of the Deepseek Model (#4202)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-03-16 17:32:57 -07:00
JieXin Liang
1a3fa75f2f [Fix] use torch.cat instead of torch.concat to prevent entering the Autograd backends. (#4466) 2025-03-16 00:02:47 -07:00
Zhan Lu
660305c38a [Doc] fix wrong flag in deepseek documentation (#4427) 2025-03-14 11:30:55 -07:00
laixin
0c02086015 add INT8 example into dsv3 README (#4079) 2025-03-12 21:37:30 -07:00
Mick
01090e8ac3 model: Support Janus-pro (#3203) 2025-03-12 11:02:11 -07:00
Xiaoyu Zhang
7130a7cea9 refine sgl_moe_align_block_size_benchmark (#4327) 2025-03-11 22:48:38 -07:00
Mick
ff2ce0b86f refactor: move image processors to separate files (#4229) 2025-03-11 12:35:35 -07:00
yych0745
6a02b32d07 Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287)
Co-authored-by: HandH1998 <1335248067@qq.com>
2025-03-11 00:49:06 -07:00
lukec
ffa1b3e318 Add an example of using deepseekv3 int8 sglang. (#4177)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-07 01:56:09 -08:00
Yueyang Pan
25482edb5c Online serving benchmarks of real datasets for hierarchical KV caching (#3211)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-03-05 16:16:43 -08:00
Lu Changqi
e5760bc40a bench: add dataset param for bench_multiturn (#3990) 2025-03-05 01:21:37 -08:00
Lianmin Zheng
66301e124f Improve code styles (#4021) 2025-03-03 03:20:23 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
Stefan He
0194948fd9 Optimize Triton Kernel of Group GEMM in DeepGEMM Benchmark (#4014) 2025-03-02 23:29:55 -08:00
Stefan He
b7e274f2d9 Add Benchmark for DeepGEMM Group GEMM (#3993) 2025-03-02 17:47:21 -08:00
Xiaoyu Zhang
50f28f65a0 fix typo in deep gemm benchmarking(#3991) 2025-03-02 00:34:00 -08:00
Xiaoyu Zhang
90a55e2566 add deepgemm and sglang fp8 block-wise gemm benchmark (#3893) 2025-03-01 23:01:58 -08:00
Chayenne
18bb216c28 Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982) 2025-02-28 23:57:17 -08:00
yiakwy-xpu-ml-framework-team
1c96fa86cf [MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613) 2025-02-27 19:42:48 -08:00
Yineng Zhang
5d86016855 revert "Docs: Reorngaize dpsk links #3900" (#3933) 2025-02-27 08:57:13 -08:00
laixin
b0df5d240b Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
2025-02-27 10:59:46 +00:00
Chayenne
7c1692aa90 Docs: Reorngaize dpsk links (#3900) 2025-02-26 15:16:31 -08:00
IAN
107710268a [BugFix] Fix crash when receive a req with structed output in DP attention mode. (#3841) 2025-02-25 09:32:05 -08:00
Zhiqiang Xie
6c7a152c5a Hierarchical Caching for SGLang (#2693)
Co-authored-by: Wenxuan Tan <wenxuan.tan@wisc.edu>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-02-23 21:56:30 -08:00
Mick
45205d88a0 bench: Add MMMU benchmark for vLM (#3562) 2025-02-22 08:10:59 -08:00
simveit
bb121214c2 Variance measure for reasoning benchmark (#3677) 2025-02-20 03:49:49 +08:00
Zhanghao Wu
f93e915817 [Docs] Add SkyPilot DeepSeek example (#3706) 2025-02-20 02:10:23 +08:00
Yineng Zhang
fe0673f1cc set NCCL_IB_GID_INDEX=3 for multi node NVIDIA InfiniBand if needed (#3698) 2025-02-19 20:50:22 +08:00
yigex
ddf39d3fce [ROCm] Optimal MOE Tuning for AMD Radeon Graphics (#3567) 2025-02-17 17:54:10 -08:00
Xiaoyu Zhang
c38f3aed24 support multi-gpu block-gemm tuning (#3639) 2025-02-18 00:00:35 +08:00
Shenggui Li
c9565e49e7 [docker] added rdma support (#3619) 2025-02-17 15:36:16 +08:00
simveit
3d4a8f9bc0 Benchmark for reasoning models (#3532)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-17 03:07:30 +08:00
Yineng Zhang
ac963be234 update flashinfer-python (#3557) 2025-02-14 09:52:56 +08:00
Yineng Zhang
e0b9a423c8 chore: bump v0.4.3 (#3556) 2025-02-14 09:43:14 +08:00
Yineng Zhang
20de05a753 update README (#3543) 2025-02-13 17:22:11 +08:00
Jhin
bf2a70872e Update DeepSeek V3 Doc (#3541) 2025-02-12 23:15:37 -08:00
Xiaoyu Zhang
693c2600e0 refine deepseek_v3 launch server doc (#3522) 2025-02-12 17:27:07 +08:00
yigex
fdf04a1426 [ROCm] Add ROCm tuning config to block gemm and Re-tune for AMD Radeon Graphics (#3418)
Co-authored-by: Bruce Xue <yigex@xilinx.com>
Co-authored-by: HAI <hixiao@gmail.com>
2025-02-10 23:55:04 -08:00