Commit Graph

206 Commits

Author SHA1 Message Date
Sundara Raman Ramachandran
f08154193c Perform Batch Tokenization. (#5141) 2025-04-20 18:10:37 -07:00
lukec
417b44eba8 [Feat] upgrade pytorch2.6 (#5417) 2025-04-20 16:06:34 -07:00
Zhaoyi Li
c555d794f7 Minor update for ROCm variable style (#5562) 2025-04-19 23:45:27 -07:00
lambert0312
61e7c4dd21 Add A800 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5368) 2025-04-14 18:39:44 -07:00
Xiaoyu Zhang
3e4794aad8 refine fused_moe tuning docs (#5294) 2025-04-12 10:01:13 -07:00
Mick
34ef6c8135 [VLM] Adopt fast image processor by default (#5065) 2025-04-11 21:46:58 -07:00
Chunan Zeng
a7c3f74bec [FA3 Feature] Support multi modal Llama-3.2-11B-Vision-Instruct (#5103) 2025-04-07 22:58:08 -07:00
Xiaoyu Zhang
924ca7c92c Add DeepSeek V3/R1 shared experts fusion (#4918) 2025-04-04 01:59:29 -07:00
AniZpZ
d95269f9b3 [2/3] fix dsv3 awq issue (#4625) 2025-04-03 17:36:39 -07:00
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>
Ravi Theja
69df9761dd Add LlavaLlamaForCausaLM in MultiModal Processors (#5039) 2025-04-03 15:41:12 -07:00
Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
Mick
5cb552b1d4 refactor: multimodal data (#4754) 2025-03-31 09:57:51 -07:00
Brayden Zhong
b149b39353 [CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969) 2025-03-27 19:45:02 -07:00
Daniel Holanda
98a2cfa9b2 Basic Cleanup (#4833) 2025-03-27 16:55:48 -07:00
Ravi Theja
e6e4d02245 Update MMMU Benchmark instructions (#4694) 2025-03-27 14:44:16 -07:00
Chunan Zeng
14269198e3 [Benchmark] tilelang vs deepgemm vs w8a8_block_fp8_matmul (#4735) 2025-03-24 20:56:31 -07:00
Mick
1e86457c90 model: Minicpmo (#3023) 2025-03-24 20:08:40 -07:00
Tongbao Zhang
3980ff1be6 rename benchmark_deepgemm_fp8_group_gemm.py (#4605) 2025-03-23 23:35:20 -07:00
Mick
11577cedb7 refactor: bug fixes and refactor for vlm (#4661) 2025-03-22 22:48:49 -07:00
Ke Bao
8f163b1653 Add EAGLE mtbench benchmark script (#4676) 2025-03-22 13:34:01 -07:00
Co-authored-by: chromecast56 <jamesll@mit.edu>
penguin_wwy
38f25e87fc Correcting default configuration when benchmarking fused_moe (#4665) 2025-03-22 00:52:34 -07:00
aoshen524
588865f0e0 [Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274) 2025-03-18 20:33:07 -07:00
Co-authored-by: ShenAo1111 <1377693092@qq.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Mick
98be3bd306 refactor: rewrite bench-mmmu-sglang (#4458) 2025-03-17 18:11:47 -07:00
Wenbo Yang
75b656488a Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418) 2025-03-17 00:03:43 -07:00
ZelinTan
402db5c58c Benchmark: Statistical Analysis of the Output Stability of the Deepseek Model (#4202) 2025-03-16 17:32:57 -07:00
Co-authored-by: Chayenne <zhaochen20@outlook.com>
JieXin Liang
1a3fa75f2f [Fix] use torch.cat instead of torch.concat to prevent entering the Autograd backends. (#4466) 2025-03-16 00:02:47 -07:00
Zhan Lu
660305c38a [Doc] fix wrong flag in deepseek documentation (#4427) 2025-03-14 11:30:55 -07:00
laixin
0c02086015 add INT8 example into dsv3 README (#4079) 2025-03-12 21:37:30 -07:00
Mick
01090e8ac3 model: Support Janus-pro (#3203) 2025-03-12 11:02:11 -07:00
Xiaoyu Zhang
7130a7cea9 refine sgl_moe_align_block_size_benchmark (#4327) 2025-03-11 22:48:38 -07:00
Mick
ff2ce0b86f refactor: move image processors to separate files (#4229) 2025-03-11 12:35:35 -07:00
yych0745
6a02b32d07 Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287) 2025-03-11 00:49:06 -07:00
Co-authored-by: HandH1998 <1335248067@qq.com>
lukec
ffa1b3e318 Add an example of using deepseekv3 int8 sglang. (#4177) 2025-03-07 01:56:09 -08:00
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Yueyang Pan
25482edb5c Online serving benchmarks of real datasets for hierarchical KV caching (#3211) 2025-03-05 16:16:43 -08:00
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Lu Changqi
e5760bc40a bench: add dataset param for bench_multiturn (#3990) 2025-03-05 01:21:37 -08:00
Lianmin Zheng
66301e124f Improve code styles (#4021) 2025-03-03 03:20:23 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) 2025-03-03 00:12:04 -08:00
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
Stefan He
0194948fd9 Optimize Triton Kernel of Group GEMM in DeepGEMM Benchmark (#4014) 2025-03-02 23:29:55 -08:00
Stefan He
b7e274f2d9 Add Benchmark for DeepGEMM Group GEMM (#3993) 2025-03-02 17:47:21 -08:00
Xiaoyu Zhang
50f28f65a0 fix typo in deep gemm benchmarking(#3991) 2025-03-02 00:34:00 -08:00
Xiaoyu Zhang
90a55e2566 add deepgemm and sglang fp8 block-wise gemm benchmark (#3893) 2025-03-01 23:01:58 -08:00
Chayenne
18bb216c28 Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982) 2025-02-28 23:57:17 -08:00
yiakwy-xpu-ml-framework-team
1c96fa86cf [MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613) 2025-02-27 19:42:48 -08:00
Yineng Zhang
5d86016855 revert "Docs: Reorngaize dpsk links #3900" (#3933) 2025-02-27 08:57:13 -08:00
laixin
b0df5d240b Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922) 2025-02-27 10:59:46 +00:00
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Chayenne
7c1692aa90 Docs: Reorngaize dpsk links (#3900) 2025-02-26 15:16:31 -08:00
IAN
107710268a [BugFix] Fix crash when receive a req with structed output in DP attention mode. (#3841) 2025-02-25 09:32:05 -08:00
Zhiqiang Xie
6c7a152c5a Hierarchical Caching for SGLang (#2693) 2025-02-23 21:56:30 -08:00
Co-authored-by: Wenxuan Tan <wenxuan.tan@wisc.edu>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Mick
45205d88a0 bench: Add MMMU benchmark for vLM (#3562) 2025-02-22 08:10:59 -08:00
simveit
bb121214c2 Variance measure for reasoning benchmark (#3677) 2025-02-20 03:49:49 +08:00
Zhanghao Wu
f93e915817 [Docs] Add SkyPilot DeepSeek example (#3706) 2025-02-20 02:10:23 +08:00