CUDA: larger SRAM reads for tile FA, AMD FP16 dot (#15927)

* CUDA: larger SRAM reads for tile FA, AMD FP16 dot

* fix logic for availability of v_dot2_f32_f16
Johannes Gäßler
2025-09-11 21:19:58 +02:00
committed by GitHub
parent df082f5630
commit 0e6ff0046f
3 changed files with 127 additions and 36 deletions
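
The second item of the commit message concerns when `v_dot2_f32_f16` may be used. As a rough, host-compilable sketch of that kind of compile-time gating (not the code from this commit): every `EXAMPLE_`-prefixed macro and the function `example_dot2_acc` are hypothetical names introduced here. The condition that enables the fast path is a placeholder, not a statement of which architectures actually provide the instruction.

```cpp
// Family macros in the style of the hunk below. The EXAMPLE_ prefix marks them
// as hypothetical so they cannot be confused with the real GCN5/CDNA macros.
#if defined(__gfx900__) || defined(__gfx906__)
#define EXAMPLE_GCN5
#endif
#if defined(__gfx908__) || defined(__gfx90a__) || defined(__gfx942__)
#define EXAMPLE_CDNA
#endif

// Hypothetical availability switch. Which architectures really provide
// v_dot2_f32_f16 is NOT taken from this commit; this condition is a placeholder.
#if defined(EXAMPLE_CDNA)
#define EXAMPLE_FAST_FP16_DOT_AVAILABLE 1
#else
#define EXAMPLE_FAST_FP16_DOT_AVAILABLE 0
#endif

// Returns acc + a0*b0 + a1*b1, i.e. a two-element dot product accumulated in
// FP32. Plain floats stand in for the two FP16 lanes so the sketch compiles
// with any host C++ compiler.
static inline float example_dot2_acc(float a0, float a1, float b0, float b1, float acc) {
#if EXAMPLE_FAST_FP16_DOT_AVAILABLE
    // A real kernel would issue the hardware dot instruction here; the sketch
    // keeps the same arithmetic so that both branches agree.
    return acc + a0 * b0 + a1 * b1;
#else
    // Generic fallback: two separate multiply-adds.
    acc += a0 * b0;
    acc += a1 * b1;
    return acc;
#endif
}
```

Keeping the availability check in one macro means call sites never test architecture names directly, so a fix to the availability logic touches a single place.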


@@ -162,6 +162,14 @@
#define GCN
#endif
#if defined(__gfx900__) || defined(__gfx906__)
#define GCN5
#endif
#if defined(__gfx803__)
#define GCN4
#endif
#if defined(__gfx908__) || defined(__gfx90a__) || defined(__gfx942__)
#define CDNA // For the entire family
#endif