enginex-ascend-910-llama.cpp/common at ad57d3edd2f48cf6dc41a98fd9b303435ecb4fb0

EngineX-Ascend/enginex-ascend-910-llama.cpp

Georgi Gerganov 225e7a1438 llama : add high-throughput mode (#14363)

* kv-cache : prepare K/V buffers for separation

ggml-ci

* batched-bench : fix oob write

ggml-ci

* llama : add "virtual sequences"

ggml-ci

* llama : use "stream" vs "virtual sequence"

ggml-ci

* graph : fix stream splitting when KV cache is not used

ggml-ci

* kv-cache : add multi-stream save/load support

ggml-ci

* llama : add "--attn-streams" flag

ggml-ci

* kv-cache : fix handling when find_slot fails

ggml-ci

* kv-cache : restore find_slot impl

ggml-ci

* kv-cache : add comments

* kv-cache : add bounds checks for sequence id

ggml-ci

* cont : add n_seq_max to batch allocr

ggml-ci

* kv-cache : perform stream copies lazily after llama_synchronize

ggml-ci

* kv-cache : avoid throwing exceptions across the C boundary

ggml-ci

* CUDA: 4D FlashAttention support (#14628)

* CUDA: 4D FlashAttention support

* CUDA: fix WMMA FA kernel

* llama : rename attn_streams -> kv_unified

ggml-ci

* common : rename kv_split -> kv_unified

ggml-ci

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

2025-07-16 16:35:42 +03:00

arg.cpp

llama : add high-throughput mode (#14363)

2025-07-16 16:35:42 +03:00

arg.h

common : add common_remote_get_content (#13123)

2025-04-26 22:58:12 +02:00

base64.hpp

llava : expose as a shared library for downstream projects (#3613)

2023-11-07 00:36:23 +03:00

build-info.cpp.in

cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)

2025-06-13 10:38:52 +02:00

chat-parser.cpp

llama-chat : Do not throw when tool parsing fails (#14012)

2025-06-14 17:25:15 +01:00

chat-parser.h

llama-chat : Do not throw when tool parsing fails (#14012)

2025-06-14 17:25:15 +01:00

chat.cpp

server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196)

2025-06-29 20:02:53 +02:00

chat.h

server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196)

2025-06-29 20:02:53 +02:00

CMakeLists.txt

cmake : do not search for curl libraries by ourselves (#14613)

2025-07-10 15:29:05 +03:00

common.cpp

llama : add high-throughput mode (#14363)

2025-07-16 16:35:42 +03:00

common.h

llama : add high-throughput mode (#14363)

2025-07-16 16:35:42 +03:00

console.cpp

console : utf-8 fix for windows stdin (#9690)

2024-09-30 11:23:42 +03:00

console.h

gguf : new file format with flexible meta data (beta) (#2398)

2023-08-21 23:07:43 +03:00

json-partial.cpp

sync : vendor (#13901)

2025-05-30 16:25:45 +03:00

json-partial.h

sync : vendor (#13901)

2025-05-30 16:25:45 +03:00

json-schema-to-grammar.cpp

common : use std::string_view now that we target c++17 (#14319)

2025-06-22 08:37:43 +03:00

json-schema-to-grammar.h

sync : vendor (#13901)

2025-05-30 16:25:45 +03:00

llguidance.cpp

llguidance : set tokenizer slices to default (#13424)

2025-05-10 17:19:52 +02:00

log.cpp

Fix: Compile failure due to Microsoft STL breaking change (#11836)

2025-02-12 21:36:11 +01:00

log.h

cleanup: fix compile warnings associated with gnu_printf (#11811)

2025-02-12 10:06:53 -04:00

ngram-cache.cpp

ggml : portability fixes for VS 2017 (#12150)

2025-03-04 18:53:26 +02:00

ngram-cache.h

llama : use LLAMA_TOKEN_NULL (#11062)

2025-01-06 10:52:15 +02:00

regex-partial.cpp

common: add partial regex support (#12808)

2025-05-14 19:50:57 +01:00

regex-partial.h

common: add partial regex support (#12808)

2025-05-14 19:50:57 +01:00

sampling.cpp

server: streaming of tool calls and thoughts when --jinja is on (#12379)

2025-05-25 01:48:08 +01:00

sampling.h

sampling : support for llguidance grammars (#10224)

2025-02-02 09:55:32 +02:00

speculative.cpp

llama : deprecate llama_kv_self_ API (#14030)

2025-06-06 14:11:15 +03:00

speculative.h

speculative : update default params (#11954)

2025-02-19 13:29:42 +02:00