Johannes Gäßler
905d87b70a
ggml : GPU-accelerated token generation (#1412)
* CUDA kernel for q4_0 dequantization + matrix-vector multiplication
* Added q4_1 via template
* Added missing __syncthreads();
* --gpu_layers -> --gpu-layers
* Shorter dequantize_mul_mat_vec line
* q5_0 dequantize_mul_mat kernel
* More readable dequantize_mul_mat_vec logic
* dequantize_mul_mat_vec kernels for q5_1, q8_0, f16
* llama : offload "output" tensor to GPU too + coding style fixes
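
The kernels above fuse dequantization with the matrix-vector product, so quantized weights never need to be materialized in full precision. A minimal sketch of that idea for a q4_0-style block (32 weights sharing one scale, two 4-bit quants per byte with an implicit offset of 8) — the exact nibble ordering and the actual CUDA implementation in ggml may differ:

```python
QK = 32  # q4_0 block size: 32 weights share one scale


def dequantize_block(d, qs):
    """Reconstruct 32 weights from one q4_0-style block.

    d:  per-block scale
    qs: 16 bytes, each packing two unsigned 4-bit quants (0..15)
        that are shifted by 8 to recover signed values.
    The low-nibbles-then-high-nibbles ordering is a simplifying
    assumption for illustration.
    """
    lo = [((b & 0x0F) - 8) * d for b in qs]
    hi = [((b >> 4) - 8) * d for b in qs]
    return lo + hi


def dequantize_mul_mat_vec(scales, rows, x):
    """y = W @ x with W stored row-wise in quantized blocks.

    The GPU kernel performs the dequantize and the dot product in one
    pass; here it is spelled out sequentially for clarity.
    """
    return [
        sum(w * xi for w, xi in zip(dequantize_block(d, qs), x))
        for d, qs in zip(scales, rows)
    ]
```

Fusing the two steps matters because a separate dequantize pass would write the full-precision weights back to memory only to reread them immediately; the fused kernel keeps them in registers.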
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-13 16:38:36 +03:00