CUDA: optimize MMQ int8 tensor core performance (#8062)
* CUDA: optimize MMQ int8 tensor core performance
* only a single get_mma_tile_x_k function
* simplify code, make functions constexpr
1412
ggml-cuda/mmq.cuh
File diff suppressed because it is too large