Yueyang Pan
|
25482edb5c
|
Online serving benchmarks of real datasets for hierarchical KV caching (#3211)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-03-05 16:16:43 -08:00 |
|
Lu Changqi
|
e5760bc40a
|
bench: add dataset param for bench_multiturn (#3990)
|
2025-03-05 01:21:37 -08:00 |
|
Lianmin Zheng
|
66301e124f
|
Improve code styles (#4021)
|
2025-03-03 03:20:23 -08:00 |
|
Lianmin Zheng
|
ac2387279e
|
Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
|
2025-03-03 00:12:04 -08:00 |
|
Stefan He
|
0194948fd9
|
Optimize Triton Kernel of Group GEMM in DeepGEMM Benchmark (#4014)
|
2025-03-02 23:29:55 -08:00 |
|
Stefan He
|
b7e274f2d9
|
Add Benchmark for DeepGEMM Group GEMM (#3993)
|
2025-03-02 17:47:21 -08:00 |
|
Xiaoyu Zhang
|
50f28f65a0
|
fix typo in deep gemm benchmarking (#3991)
|
2025-03-02 00:34:00 -08:00 |
|
Xiaoyu Zhang
|
90a55e2566
|
add deepgemm and sglang fp8 block-wise gemm benchmark (#3893)
|
2025-03-01 23:01:58 -08:00 |
|
Chayenne
|
18bb216c28
|
Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982)
|
2025-02-28 23:57:17 -08:00 |
|
yiakwy-xpu-ml-framework-team
|
1c96fa86cf
|
[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613)
|
2025-02-27 19:42:48 -08:00 |
|
Yineng Zhang
|
5d86016855
|
revert "Docs: Reorngaize dpsk links #3900" (#3933)
|
2025-02-27 08:57:13 -08:00 |
|
laixin
|
b0df5d240b
|
Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
|
2025-02-27 10:59:46 +00:00 |
|
Chayenne
|
7c1692aa90
|
Docs: Reorngaize dpsk links (#3900)
|
2025-02-26 15:16:31 -08:00 |
|
IAN
|
107710268a
|
[BugFix] Fix crash when receive a req with structed output in DP attention mode. (#3841)
|
2025-02-25 09:32:05 -08:00 |
|
Zhiqiang Xie
|
6c7a152c5a
|
Hierarchical Caching for SGLang (#2693)
Co-authored-by: Wenxuan Tan <wenxuan.tan@wisc.edu>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-02-23 21:56:30 -08:00 |
|
Mick
|
45205d88a0
|
bench: Add MMMU benchmark for vLM (#3562)
|
2025-02-22 08:10:59 -08:00 |
|
simveit
|
bb121214c2
|
Variance measure for reasoning benchmark (#3677)
|
2025-02-20 03:49:49 +08:00 |
|
Zhanghao Wu
|
f93e915817
|
[Docs] Add SkyPilot DeepSeek example (#3706)
|
2025-02-20 02:10:23 +08:00 |
|
Yineng Zhang
|
fe0673f1cc
|
set NCCL_IB_GID_INDEX=3 for multi node NVIDIA InfiniBand if needed (#3698)
|
2025-02-19 20:50:22 +08:00 |
|
yigex
|
ddf39d3fce
|
[ROCm] Optimal MOE Tuning for AMD Radeon Graphics (#3567)
|
2025-02-17 17:54:10 -08:00 |
|
Xiaoyu Zhang
|
c38f3aed24
|
support multi-gpu block-gemm tuning (#3639)
|
2025-02-18 00:00:35 +08:00 |
|
Shenggui Li
|
c9565e49e7
|
[docker] added rdma support (#3619)
|
2025-02-17 15:36:16 +08:00 |
|
simveit
|
3d4a8f9bc0
|
Benchmark for reasoning models (#3532)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-17 03:07:30 +08:00 |
|
Yineng Zhang
|
ac963be234
|
update flashinfer-python (#3557)
|
2025-02-14 09:52:56 +08:00 |
|
Yineng Zhang
|
e0b9a423c8
|
chore: bump v0.4.3 (#3556)
|
2025-02-14 09:43:14 +08:00 |
|
Yineng Zhang
|
20de05a753
|
update README (#3543)
|
2025-02-13 17:22:11 +08:00 |
|
Jhin
|
bf2a70872e
|
Update DeepSeek V3 Doc (#3541)
|
2025-02-12 23:15:37 -08:00 |
|
Xiaoyu Zhang
|
693c2600e0
|
refine deepseek_v3 launch server doc (#3522)
|
2025-02-12 17:27:07 +08:00 |
|
yigex
|
fdf04a1426
|
[ROCm] Add ROCm tuning config to block gemm and Re-tune for AMD Radeon Graphics (#3418)
Co-authored-by: Bruce Xue <yigex@xilinx.com>
Co-authored-by: HAI <hixiao@gmail.com>
|
2025-02-10 23:55:04 -08:00 |
|
Xiaoyu Zhang
|
2f47d710ae
|
refine some typo (#3473)
|
2025-02-10 23:35:44 +08:00 |
|
Yineng Zhang
|
cddb1cdf8f
|
chore: bump v0.4.2.post4 (#3459)
|
2025-02-10 14:12:16 +08:00 |
|
Yineng Zhang
|
fad315cb8e
|
fix EAGLE 2 non greedy case (#3407)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2025-02-09 07:28:34 +08:00 |
|
Yineng Zhang
|
f90db8bc07
|
fix typo
|
2025-02-08 22:16:42 +08:00 |
|
Ke Bao
|
d8ad597048
|
Add deepseek-v3 a100 serving example (#3404)
|
2025-02-08 22:13:52 +08:00 |
|
GaoYuYang
|
849f58d617
|
Update fused_moe's benchmark (#3346)
|
2025-02-08 21:58:21 +08:00 |
|
yiakwy-xpu-ml-framework-team
|
64480df495
|
[BUG] fix moe benchmark when bs*seq is small (#3382)
|
2025-02-08 15:39:44 +08:00 |
|
Yineng Zhang
|
c1f5f99f60
|
chore: bump v0.4.2.post3 (#3369)
|
2025-02-07 08:20:03 -08:00 |
|
Xiaoyu Zhang
|
cdae77b03d
|
optimize moe_align_kernel cuda (#3347)
|
2025-02-07 00:53:46 +08:00 |
|
Ke Bao
|
6792411e7f
|
[Doc] Add optimization option guide for deepseek v3 (#3349)
|
2025-02-06 23:28:09 +08:00 |
|
Yineng Zhang
|
7348d9627e
|
add AMD guide for DeepSeek-R1 (#3338)
|
2025-02-06 16:54:40 +08:00 |
|
Xiaoyu Zhang
|
ad3499858e
|
clean moe align block kernel code and add acc test (#3332)
|
2025-02-06 16:42:36 +08:00 |
|
Yineng Zhang
|
07e58a2dcb
|
update README (#3324)
|
2025-02-06 07:13:05 +08:00 |
|
Baizhou Zhang
|
70817a7eae
|
[Feature] Define backends and add Triton backend for Lora (#3161)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2025-02-03 22:09:13 -08:00 |
|
Yineng Zhang
|
7b020cca2d
|
add tuning block wise fp8 (#3242)
Co-authored-by: HandH1998 <007aabbcc411@gmail.com>
|
2025-02-01 03:58:18 +08:00 |
|
yigex
|
351a72d40b
|
add dsv3 mi300 triton config for block scale (#3146)
|
2025-01-27 17:25:53 +08:00 |
|
Lianmin Zheng
|
27acf63bbd
|
Use torch.compile for scaling penalty (#3133)
|
2025-01-25 18:27:33 -08:00 |
|
Xiaoyu Zhang
|
ac2dc35d0e
|
support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030)
|
2025-01-23 15:29:20 +08:00 |
|
yiakwy-xpu-ml-framework-team
|
10bfce71b3
|
fix moe align blocks benchmark (#3003)
|
2025-01-20 19:33:29 +08:00 |
|
Xiaoyu Zhang
|
83452dbb4a
|
fix file name spelling mistake and useless variable in minmax-text-01-lightning_attention (#2971)
|
2025-01-18 18:56:13 -08:00 |
|
Xiaoyu Zhang
|
c2f212d672
|
optimize MiniMax-Text-01 lightning_attn_decode triton (#2966)
|
2025-01-18 23:41:01 +08:00 |
|