EngineX-Ascend/enginex-ascend-910-llama.cpp

Files

Georgi Gerganov e298d2fbd0 kv-cache : add SWA support (#13194)

* kv-cache : prepare for SWA

ggml-ci

* kv-cache : initial iSWA implementation

ggml-ci

* kv-cache : rework error recovery logic

ggml-ci

* models : fix Phi-3 SWA parameters

ggml-ci

* model : adjust Granite to rope factor changes

ggml-ci

* server : check if context can do shifts

ggml-ci

* iswa : for now, always enable shifts (experiment)

ggml-ci

* kv-cache : simplify SWA logic

ggml-ci

* kv-cache : apply defrag when we fail to find slots for the batch

ggml-ci

* llama : update docs about llama_decode

ggml-ci

* kv-cache : update warning logs when no space for the batch is available

ggml-ci

* llama : add llama_kv_self_seq_pos_min()

* kv-cache : keep track of partial SWA computes and print warnings

* server : disallow use cases involving partial SWA context

ggml-ci

* llama : add param to control SWA cache size

ggml-ci

* minor : clean-up

ggml-ci

2025-05-20 08:05:46 +03:00

batched-bench

batched-bench : fix pp batch contents (#13492)

2025-05-13 18:01:53 +03:00

cvector-generator

llama : move end-user examples to tools directory (#13249)

2025-05-02 20:27:13 +02:00

export-lora

llama : move end-user examples to tools directory (#13249)

2025-05-02 20:27:13 +02:00

gguf-split

llama : move end-user examples to tools directory (#13249)