* Add some minimal optimizations for CDNA

* ggml_cuda: set launch bounds also for GCN as it helps there too
ggml-ci
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>