Commit Graph

7 Commits

Author SHA1 Message Date
Johannes Gäßler
11f0af5504 CUDA: faster tile FA, add oob checks, more HSs (#16492) 2025-10-11 20:54:32 +02:00
uvos
e95fec640f HIP: Disable ROCWMMA fattn on CDNA when compiled against ROCWMMA 2.0.0 (#16221)
* HIP: Disable ROCWMMA fatt on CDNA when compiled against ROCWMMA 2.0.0

rocwmma 2.0.0 includes a bug in the code fakeing fp16 accumulation on CDNA

* CUDA: Fix volta condition in ggml_cuda_should_use_wmma_fattn
2025-10-01 23:09:25 +02:00
Johannes Gäßler
368560a1e3 CUDA: fix compilation on CC 6.0 (#16091) 2025-09-18 19:28:32 +02:00
Johannes Gäßler
c959b676be CUDA: fix FA occupancy, optimize tile kernel (#15982) 2025-09-17 15:32:42 +02:00
Johannes Gäßler
0e6ff0046f CUDA: larger SRAM reads for tile FA, AMD FP16 dot (#15927)
* CUDA: larger SRAM reads for tile FA, AMD FP16 dot

* fix logic for availability of v_dot2_f32_f16
2025-09-11 21:19:58 +02:00
Johannes Gäßler
17bc5a815f HIP: use v_dot2_f32_f16 instruction for FA (#15884) 2025-09-09 14:04:43 +02:00
Johannes Gäßler
79bc429262 CUDA: faster tile FA (Pascal/AMD), headsize 256 (#15769) 2025-09-07 00:26:28 +02:00