yiakwy-xpu-ml-framework-team | 10bfce71b3 | fix moe align blocks benchmark (#3003) | 2025-01-20 19:33:29 +08:00
Xiaoyu Zhang | 83452dbb4a | fix file name spelling mistake and useless variable in minmax-text-01-lightning_attention (#2971) | 2025-01-18 18:56:13 -08:00
Xiaoyu Zhang | c2f212d672 | optimize MiniMax-Text-01 lightning_attn_decode triton (#2966) | 2025-01-18 23:41:01 +08:00
Xiaoyu Zhang | 78e974b2a5 | [kernel] MiniMax-Text-01 decode lightning_attn with triton (#2920) | 2025-01-16 12:51:38 -08:00
Xiaoyu Zhang | ab31793661 | [kernel] MiniMax-Text-01 prefill lightning_attn with triton (#2911) | 2025-01-16 14:18:29 +08:00
Xiaoyu Zhang | d08c77c434 | Sampling penalties memory interface (#2870) | 2025-01-13 23:09:00 +08:00
Ke Bao | 85b2e05770 | Add int8 quant kernel (#2848) | 2025-01-13 13:16:58 +08:00
Lianmin Zheng | bdc1acf6cd | Misc fix for min_p_sampling, --cuda-graph-bs (#2761) | 2025-01-07 02:52:53 -08:00
Xiaoyu Zhang | 380930a959 | add benchmark_moe_align_blocks (#2767) | 2025-01-07 14:20:50 +08:00
HandH1998 | afa0341e57 | Update Triton configs for block fp8 kernels (#2641) | 2024-12-29 22:53:47 +08:00
Yineng Zhang | 7863e4368a | add configs for block fp8 related kernels (#2628) Co-authored-by: HandH1998 <1335248067@qq.com> | 2024-12-28 23:12:04 +08:00
Xiaoyu Zhang | 9a23c48456 | h100 tuning fused_moe_triton for qwen2 moe (#2560) | 2024-12-26 03:13:31 -08:00
Ke Bao | e835a50021 | Reorg moe code (#2563) | 2024-12-24 01:10:22 +08:00
Xiaoyu Zhang | 7d672d277b | [kernel optimize] benchmark write_req_to_token_pool_triton and optimize kernel (#2509) | 2024-12-22 02:31:02 -08:00
bjmsong | e21026690d | benchmark decoding attention kernel with cudnn (#2467) Co-authored-by: root <bjmsong@126.com> | 2024-12-17 03:31:57 -08:00
Xiaoyu Zhang | a0592c059f | [Benchmark] add a benchmark for hf/vllm/sglang rmsnorm (#2486) | 2024-12-15 13:52:08 +08:00
bjmsong | f67723940d | decoding attention kernel benchmark (#2425) Co-authored-by: root <bjmsong@126.com> | 2024-12-11 04:46:59 -08:00
Xiaoyu Zhang | 3844feb9bb | Add a unittest for fused_moe (#2416) | 2024-12-08 22:46:10 -08:00
Lianmin Zheng | 07ec07ad1f | Improve torch compile for fused moe (#2327) | 2024-12-03 01:58:25 -08:00
Lianmin Zheng | 33deca81b5 | Add more fused moe benchmark utilities (#2314) | 2024-12-02 04:26:55 -08:00
Xiaoyu Zhang | 262e370f78 | [benchmark] Add fused_moe_triton benchmark and tuning tools (#2225) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: HAI <hixiao@gmail.com> | 2024-11-29 13:36:45 -08:00