CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (#15131)

* CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16
Johannes Gäßler
2025-08-07 10:53:21 +02:00
committed by GitHub
parent 20638e4f16
commit 1d72c84188
13 changed files with 750 additions and 225 deletions

@@ -310,7 +310,7 @@ bool ggml_cuda_should_use_mmq(enum ggml_type type, int cc, int64_t ne11) {
         return false;
     }
-    if (new_mma_available(cc)) {
+    if (turing_mma_available(cc)) {
         return true;
     }