lambert0312
61e7c4dd21
Add A800 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5368)
2025-04-14 18:39:44 -07:00
|
Xiaoyu Zhang
3e4794aad8
refine fused_moe tuning docs (#5294)
2025-04-12 10:01:13 -07:00
|
Mick
34ef6c8135
[VLM] Adopt fast image processor by default (#5065)
2025-04-11 21:46:58 -07:00
|
Chunan Zeng
a7c3f74bec
[FA3 Feature] Support multi modal Llama-3.2-11B-Vision-Instruct (#5103)
2025-04-07 22:58:08 -07:00
|
Xiaoyu Zhang
924ca7c92c
Add DeepSeek V3/R1 shared experts fusion (#4918)
2025-04-04 01:59:29 -07:00
|
AniZpZ
d95269f9b3
[2/3] fix dsv3 awq issue (#4625)
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>
2025-04-03 17:36:39 -07:00
|
Ravi Theja
69df9761dd
Add LlavaLlamaForCausaLM in MultiModal Processors (#5039)
Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
2025-04-03 15:41:12 -07:00
|
Mick
5cb552b1d4
refactor: multimodal data (#4754)
2025-03-31 09:57:51 -07:00
|
Brayden Zhong
b149b39353
[CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969)
2025-03-27 19:45:02 -07:00
|
Daniel Holanda
98a2cfa9b2
Basic Cleanup (#4833)
2025-03-27 16:55:48 -07:00
|
Ravi Theja
e6e4d02245
Update MMMU Benchmark instructions (#4694)
2025-03-27 14:44:16 -07:00
|
Chunan Zeng
14269198e3
[Benchmark] tilelang vs deepgemm vs w8a8_block_fp8_matmul (#4735)
2025-03-24 20:56:31 -07:00
|
Mick
1e86457c90
model: Minicpmo (#3023)
2025-03-24 20:08:40 -07:00
|
Tongbao Zhang
3980ff1be6
rename benchmark_deepgemm_fp8_group_gemm.py (#4605)
2025-03-23 23:35:20 -07:00
|
Mick
11577cedb7
refactor: bug fixes and refactor for vlm (#4661)
2025-03-22 22:48:49 -07:00
|
Ke Bao
8f163b1653
Add EAGLE mtbench benchmark script (#4676)
Co-authored-by: chromecast56 <jamesll@mit.edu>
2025-03-22 13:34:01 -07:00
|
penguin_wwy
38f25e87fc
Correcting default configuration when benchmarking fused_moe (#4665)
2025-03-22 00:52:34 -07:00
|
aoshen524
588865f0e0
[Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274)
Co-authored-by: ShenAo1111 <1377693092@qq.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-03-18 20:33:07 -07:00
|
Mick
98be3bd306
refactor: rewrite bench-mmmu-sglang (#4458)
2025-03-17 18:11:47 -07:00
|
Wenbo Yang
75b656488a
Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418)
2025-03-17 00:03:43 -07:00
|
ZelinTan
402db5c58c
Benchmark: Statistical Analysis of the Output Stability of the Deepseek Model (#4202)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-03-16 17:32:57 -07:00
|
JieXin Liang
1a3fa75f2f
[Fix] use torch.cat instead of torch.concat to prevent entering the Autograd backends. (#4466)
2025-03-16 00:02:47 -07:00
|
Zhan Lu
660305c38a
[Doc] fix wrong flag in deepseek documentation (#4427)
2025-03-14 11:30:55 -07:00
|
laixin
0c02086015
add INT8 example into dsv3 README (#4079)
2025-03-12 21:37:30 -07:00
|
Mick
01090e8ac3
model: Support Janus-pro (#3203)
2025-03-12 11:02:11 -07:00
|
Xiaoyu Zhang
7130a7cea9
refine sgl_moe_align_block_size_benchmark (#4327)
2025-03-11 22:48:38 -07:00
|
Mick
ff2ce0b86f
refactor: move image processors to separate files (#4229)
2025-03-11 12:35:35 -07:00
|
yych0745
6a02b32d07
Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287)
Co-authored-by: HandH1998 <1335248067@qq.com>
2025-03-11 00:49:06 -07:00
|
lukec
ffa1b3e318
Add an example of using deepseekv3 int8 sglang. (#4177)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-07 01:56:09 -08:00
|
Yueyang Pan
25482edb5c
Online serving benchmarks of real datasets for hierarchical KV caching (#3211)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-03-05 16:16:43 -08:00
|
Lu Changqi
e5760bc40a
bench: add dataset param for bench_multiturn (#3990)
2025-03-05 01:21:37 -08:00
|
Lianmin Zheng
66301e124f
Improve code styles (#4021)
2025-03-03 03:20:23 -08:00
|
Lianmin Zheng
ac2387279e
Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
|
Stefan He
0194948fd9
Optimize Triton Kernel of Group GEMM in DeepGEMM Benchmark (#4014)
2025-03-02 23:29:55 -08:00
|
Stefan He
b7e274f2d9
Add Benchmark for DeepGEMM Group GEMM (#3993)
2025-03-02 17:47:21 -08:00
|
Xiaoyu Zhang
50f28f65a0
fix typo in deep gemm benchmarking(#3991)
2025-03-02 00:34:00 -08:00
|
Xiaoyu Zhang
90a55e2566
add deepgemm and sglang fp8 block-wise gemm benchmark (#3893)
2025-03-01 23:01:58 -08:00
|
Chayenne
18bb216c28
Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982)
2025-02-28 23:57:17 -08:00
|
yiakwy-xpu-ml-framework-team
1c96fa86cf
[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613)
2025-02-27 19:42:48 -08:00
|
Yineng Zhang
5d86016855
revert "Docs: Reorngaize dpsk links #3900" (#3933)
2025-02-27 08:57:13 -08:00
|
laixin
b0df5d240b
Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
2025-02-27 10:59:46 +00:00
|
Chayenne
7c1692aa90
Docs: Reorngaize dpsk links (#3900)
2025-02-26 15:16:31 -08:00
|
IAN
107710268a
[BugFix] Fix crash when receive a req with structed output in DP attention mode. (#3841)
2025-02-25 09:32:05 -08:00
|
Zhiqiang Xie
6c7a152c5a
Hierarchical Caching for SGLang (#2693)
Co-authored-by: Wenxuan Tan <wenxuan.tan@wisc.edu>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-02-23 21:56:30 -08:00
|
Mick
45205d88a0
bench: Add MMMU benchmark for vLM (#3562)
2025-02-22 08:10:59 -08:00
|
simveit
bb121214c2
Variance measure for reasoning benchmark (#3677)
2025-02-20 03:49:49 +08:00
|
Zhanghao Wu
f93e915817
[Docs] Add SkyPilot DeepSeek example (#3706)
2025-02-20 02:10:23 +08:00
|
Yineng Zhang
fe0673f1cc
set NCCL_IB_GID_INDEX=3 for multi node NVIDIA InfiniBand if needed (#3698)
2025-02-19 20:50:22 +08:00
|
yigex
ddf39d3fce
[ROCm] Optimal MOE Tuning for AMD Radeon Graphics (#3567)
2025-02-17 17:54:10 -08:00
|
Xiaoyu Zhang
c38f3aed24
support multi-gpu block-gemm tuning (#3639)
2025-02-18 00:00:35 +08:00