Georgi Gerganov
e298d2fbd0
kv-cache : add SWA support (#13194)
* kv-cache : prepare for SWA
ggml-ci
* kv-cache : initial iSWA implementation
ggml-ci
* kv-cache : rework error recovery logic
ggml-ci
* models : fix Phi-3 SWA parameters
ggml-ci
* model : adjust Granite to rope factor changes
ggml-ci
* server : check if context can do shifts
ggml-ci
* iswa : for now, always enable shifts (experiment)
ggml-ci
* kv-cache : simplify SWA logic
ggml-ci
* kv-cache : apply defrag when we fail to find slots for the batch
ggml-ci
* llama : update docs about llama_decode
ggml-ci
* kv-cache : update warning logs when no space for the batch is available
ggml-ci
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context
ggml-ci
* llama : add param to control SWA cache size
ggml-ci
* minor : clean-up
ggml-ci
2025-05-20 08:05:46 +03:00