Johannes Gäßler
11f0af5504
CUDA: faster tile FA, add oob checks, more HSs ( #16492 )
2025-10-11 20:54:32 +02:00
uvos
e95fec640f
HIP: Disable ROCWMMA fattn on CDNA when compiled against ROCWMMA 2.0.0 ( #16221 )
* HIP: Disable ROCWMMA fattn on CDNA when compiled against ROCWMMA 2.0.0
ROCWMMA 2.0.0 includes a bug in the code faking FP16 accumulation on CDNA
* CUDA: Fix Volta condition in ggml_cuda_should_use_wmma_fattn
2025-10-01 23:09:25 +02:00
Johannes Gäßler
368560a1e3
CUDA: fix compilation on CC 6.0 ( #16091 )
2025-09-18 19:28:32 +02:00
Johannes Gäßler
c959b676be
CUDA: fix FA occupancy, optimize tile kernel ( #15982 )
2025-09-17 15:32:42 +02:00
Johannes Gäßler
0e6ff0046f
CUDA: larger SRAM reads for tile FA, AMD FP16 dot ( #15927 )
* CUDA: larger SRAM reads for tile FA, AMD FP16 dot
* fix logic for availability of v_dot2_f32_f16
2025-09-11 21:19:58 +02:00
Johannes Gäßler
17bc5a815f
HIP: use v_dot2_f32_f16 instruction for FA ( #15884 )
2025-09-09 14:04:43 +02:00
Johannes Gäßler
79bc429262
CUDA: faster tile FA (Pascal/AMD), headsize 256 ( #15769 )
2025-09-07 00:26:28 +02:00