CUDA: optimize FA for GQA + large batches (#12014)

2025-02-22 12:20:17 +01:00
parent 335eb04a91
commit 5fa07c2f93
32 changed files with 940 additions and 411 deletions
--- a/ggml/src/ggml-cuda/fattn-mma-f16.cuh
+++ b/ggml/src/ggml-cuda/fattn-mma-f16.cuh