Georgi Gerganov
225e7a1438
llama : add high-throughput mode (#14363)
* kv-cache : prepare K/V buffers for separation
ggml-ci
* batched-bench : fix oob write
ggml-ci
* llama : add "virtual sequences"
ggml-ci
* llama : use "stream" vs "virtual sequence"
ggml-ci
* graph : fix stream splitting when KV cache is not used
ggml-ci
* kv-cache : add multi-stream save/load support
ggml-ci
* llama : add "--attn-streams" flag
ggml-ci
* kv-cache : fix handling when find_slot fails
ggml-ci
* kv-cache : restore find_slot impl
ggml-ci
* kv-cache : add comments
* kv-cache : add bounds checks for sequence id
ggml-ci
* cont : add n_seq_max to batch allocr
ggml-ci
* kv-cache : perform stream copies lazily after llama_synchronize
ggml-ci
* kv-cache : avoid throwing exceptions across the C boundary
ggml-ci
* CUDA: 4D FlashAttention support (#14628)
* CUDA: 4D FlashAttention support
* CUDA: fix WMMA FA kernel
* llama : rename attn_streams -> kv_unified
ggml-ci
* common : rename kv_split -> kv_unified
ggml-ci
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-07-16 16:35:42 +03:00