Georgi Gerganov
e298d2fbd0
kv-cache : add SWA support (#13194)
* kv-cache : prepare for SWA
ggml-ci
* kv-cache : initial iSWA implementation
ggml-ci
* kv-cache : rework error recovery logic
ggml-ci
* models : fix Phi-3 SWA parameters
ggml-ci
* model : adjust Granite to rope factor changes
ggml-ci
* server : check if context can do shifts
ggml-ci
* iswa : for now, always enable shifts (experiment)
ggml-ci
* kv-cache : simplify SWA logic
ggml-ci
* kv-cache : apply defrag when we fail to find slots for the batch
ggml-ci
* llama : update docs about llama_decode
ggml-ci
* kv-cache : update warning logs when no space for the batch is available
ggml-ci
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context
ggml-ci
* llama : add param to control SWA cache size
ggml-ci
* minor : clean-up
ggml-ci
2025-05-20 08:05:46 +03:00
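The commit message names the mechanism but does not show it. The minimal Python sketch below illustrates the idea behind a sliding-window-attention (SWA) KV cache. It assumes a "standard" window in which a query at position `p1` cannot see a key at position `p0` once `p1 - p0 >= n_swa`. Under that rule, cells that fall out of the window for the newest position can never be attended again and may be dropped. All names here (`is_masked_swa`, `SwaCache`, `prune`, `pos_min`) are hypothetical illustrations, not llama.cpp's actual implementation. `pos_min` is only loosely analogous in spirit to what the new `llama_kv_self_seq_pos_min()` reports.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an SWA KV cache for a single sequence.
# Names and behavior are illustrative assumptions, not llama.cpp code.


def is_masked_swa(p0: int, p1: int, n_swa: int) -> bool:
    """True if a query at position p1 must NOT attend a key at position p0
    under a standard sliding window of n_swa tokens (causal masking is separate)."""
    return p1 - p0 >= n_swa


@dataclass
class SwaCache:
    n_swa: int
    positions: list[int] = field(default_factory=list)  # positions of cached tokens

    def add(self, pos: int) -> None:
        self.positions.append(pos)

    def prune(self) -> None:
        """Drop cells that no current or future query can see.

        Any future query sits at p1 >= p_max, so a key masked for p_max
        (p_max - p0 >= n_swa) stays masked forever and can be evicted."""
        if not self.positions:
            return
        p_max = max(self.positions)
        self.positions = [p for p in self.positions
                          if not is_masked_swa(p, p_max, self.n_swa)]

    def pos_min(self) -> int:
        """Smallest cached position, or -1 if the cache is empty (assumed convention)."""
        return min(self.positions) if self.positions else -1


# Usage: stream 10 tokens through a window of 4.
cache = SwaCache(n_swa=4)
for pos in range(10):
    cache.add(pos)
    cache.prune()
print(cache.positions)  # [6, 7, 8, 9]
print(cache.pos_min())  # 6
```

Because the pruned cache reports a minimum position greater than 0, the tokens at the start of the sequence are no longer available. One plausible reading of "server : disallow use cases involving partial SWA context" is that reusing or shifting a prefix becomes unsafe once this happens. Keeping every cell (skipping `prune`) trades memory for that ability, which is the kind of choice a parameter controlling the SWA cache size would expose.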