228f34c9ce
SYCL: Implement a few same-quantized-type copy kernels (#13739)
Akarshan Biswas
2025-06-07 18:58:20 +05:30
0974ad7a7c
llama : fix llama_model_chat_template with template name (LLM_KV with suffix) (#14050)
Sigbjørn Skjæret
2025-06-07 14:13:12 +02:00
745aa5319b
llama : deprecate llama_kv_self_ API (#14030)
Georgi Gerganov
2025-06-06 14:11:15 +03:00
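The llama_kv_self_ deprecation above follows the usual deprecation pattern: the old entry point survives as a thin forwarder that warns and delegates to its replacement. A minimal Python sketch of that pattern (the function names and the toy state are illustrative, not the actual llama.cpp API):

```python
import warnings

def memory_clear(state: dict) -> None:
    """New-style entry point: clears the (toy) memory state."""
    state.clear()

def kv_self_clear(state: dict) -> None:
    """Deprecated forwarder: emits a warning, then delegates."""
    warnings.warn(
        "kv_self_clear is deprecated; use memory_clear instead",
        DeprecationWarning,
        stacklevel=2,
    )
    memory_clear(state)
```

Callers keep compiling and running during the transition, while the warning nudges them toward the new name.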
487a5e0401
context : fix SWA-related warning for multiple sequences (#14045)
Georgi Gerganov
2025-06-06 13:29:18 +03:00
d17a809ef0
llama : support multiple classifier outputs and labels (#13940)
Sigbjørn Skjæret
2025-06-06 09:03:25 +02:00
1caae7fc6c
gguf-py : add add_classifier_output_labels method to writer (#14031)
Sigbjørn Skjæret
2025-06-05 17:42:31 +02:00
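The gguf-py commit above adds a writer method for classifier output labels. A rough sketch of what storing such a string-array key/value pair can look like (this is a toy writer and a hypothetical key name, not the real GGUFWriter API):

```python
class ToyWriter:
    """Minimal stand-in for a GGUF-style key/value writer."""

    def __init__(self) -> None:
        self.kv: dict[str, list[str]] = {}

    def add_array(self, key: str, values: list[str]) -> None:
        if not all(isinstance(v, str) for v in values):
            raise TypeError("labels must be strings")
        self.kv[key] = list(values)

    def add_classifier_output_labels(self, labels: list[str]) -> None:
        # "classifier.output_labels" is an assumed key for illustration
        self.add_array("classifier.output_labels", labels)
```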
669c13e0f6
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001)
Masato Nakasaka
2025-06-05 23:00:29 +09:00
146b88e8b3
ci: fix CUDA build failure on autodl cloud machines (#14005)
pockers21
2025-06-05 06:25:29 -07:00
7f37b6cf1e
memory : migrate from llama_kv_cache to more generic llama_memory (#14006)
Georgi Gerganov
2025-06-05 15:29:22 +03:00
3a077146a4
llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013)
Diego Devesa
2025-06-05 02:57:42 -07:00
d01d112abb
readme : add badge (#13938)
Olexandr88
2025-06-05 10:50:55 +03:00
9f47fa5792
vocab : warn about missing mask token (#14022)
Sigbjørn Skjæret
2025-06-05 09:29:18 +02:00
9e31bec4fd
context : fix pos_min initialization upon decode error (#14008)
Georgi Gerganov
2025-06-05 09:06:29 +03:00
5a8ae3053c
vulkan: automatically deduce size of push constants (#13936)
Jeff Bolz
2025-06-05 00:17:58 -05:00
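The Vulkan change above deduces the push-constant size from the type itself instead of a hand-maintained byte count, so the two can never drift apart. The same idea sketched with Python's struct module (the field layout is a made-up example):

```python
import struct

# A push-constant block of, say, two floats and two uint32s,
# described once; "<" means packed little-endian, no padding.
PUSH_CONSTANT_LAYOUT = "<ffII"

def push_constant_range(layout: str) -> tuple[int, int]:
    """Return (offset, size), with the size deduced from the
    layout string rather than written out by hand."""
    return (0, struct.calcsize(layout))
```

Adding a field to the layout automatically grows the reported range; no separate size constant needs updating.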
0d3984424f
ggml-vulkan: add support for op CONV_TRANSPOSE_1D (#13813)
Ervin Áron Tasnádi
2025-06-04 22:02:00 +02:00
3e63a58ef7
kv-cache : refactor the update/defrag mechanism (#13988)
Georgi Gerganov
2025-06-04 18:58:20 +03:00
2589ad3704
ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997)
Diego Devesa
2025-06-04 06:37:40 -07:00
482548716f
releases : use dl backend for linux release, remove arm64 linux release (#13996)
Diego Devesa
2025-06-04 04:15:54 -07:00
3ac67535c8
llama-graph : use ggml_repeat_4d (#13998)
Xuan-Son Nguyen
2025-06-04 10:11:26 +02:00
0b4be4c435
CUDA: fix FTZ in FA for Gemma 3 (#13991)
Johannes Gäßler
2025-06-04 08:57:05 +02:00
e0e806f52e
kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985)
Georgi Gerganov
2025-06-04 09:50:32 +03:00
7e00e60ef8
vulkan: fix warnings in perf logger querypool code (#13937)
Jeff Bolz
2025-06-03 13:30:22 -05:00
ea1431b0fa
docs : add "Quick start" section for new users (#13862)
Xuan-Son Nguyen
2025-06-03 13:09:36 +02:00
71e74a3ac9
opencl: add backend_synchronize (#13939)
lhez
2025-06-02 16:54:58 -07:00
bfb1e012a0
OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840)
rmatif
2025-06-02 23:53:36 +00:00
3637576288
server : disable speculative decoding for SWA models (#13970)
Georgi Gerganov
2025-06-02 21:34:40 +03:00
ea394d7ab1
metal : use F32 accumulators in FA kernels (#13975)
Georgi Gerganov
2025-06-02 21:33:40 +03:00
5582c49c39
gemma : more consistent attention scaling for v2 and v3 (#13951)
Georgi Gerganov
2025-06-02 20:54:26 +03:00
c9bbc77931
server: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)
Olivier Chafik
2025-06-02 10:15:44 -07:00
bfd322796c
mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)
Xuan-Son Nguyen
2025-06-02 16:29:28 +02:00
093e3f1feb
cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966)
shalinib-ibm
2025-06-02 17:48:36 +05:30
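The CMake fix above handles mixed-case 'Power' strings in CPU detection. The underlying idea, normalize the string before matching, in a short Python sketch (the accepted spellings are assumptions, not the commit's exact match list):

```python
def is_power_cpu(machine: str) -> bool:
    """Case-insensitive check for POWER-family processor strings,
    so 'Power10', 'POWER9', and 'power8' all match."""
    m = machine.strip().lower()
    return m.startswith("power") or m.startswith("ppc")
```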
663445b0de
sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826)
Atharva Dubey
2025-06-02 10:12:20 +01:00
7675c555a1
gguf: fix failure on version == 0 (#13956)
Johannes Gäßler
2025-06-01 18:08:05 +02:00
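The gguf fix above makes loading fail cleanly when the header reports version == 0. A hedged sketch of such a header guard (the error message and the idea that 0 is never a valid GGUF version are assumptions about intent, not the commit's actual code):

```python
def check_gguf_version(version: int) -> int:
    """Reject version 0 up front instead of failing later
    with a confusing error deeper in the loader."""
    if version == 0:
        raise ValueError(
            "invalid GGUF version 0 (corrupt or non-GGUF file?)"
        )
    return version
```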
5e1c3aed40
convert : fix nomic-bert-moe mask token (#13757)
Sigbjørn Skjæret
2025-06-01 18:07:21 +02:00
c496fe0b1d
convert : fix vocab padding code for bert models (#13954)
Sigbjørn Skjæret
2025-06-01 17:23:11 +02:00
e57bb87ced
ggml: check if non-native endian model is being loaded (#13943)
Aaron Teo
2025-06-01 22:53:57 +08:00
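The ggml change above detects when a model file with non-native byte order is being loaded. A self-contained sketch of one way to spot a byte-swapped header: read the version field as a little-endian uint32 and flag implausibly large values (the threshold heuristic is mine, not ggml's actual logic):

```python
import struct

def looks_byteswapped(header: bytes) -> bool:
    """header: 4-byte magic followed by a uint32 version field."""
    magic = header[:4]
    (version,) = struct.unpack("<I", header[4:8])
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    # A real version is a small integer; a byte-swapped one lands
    # in the high bytes (e.g. 3 becomes 0x03000000).
    return version > 0xFFFF
```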
f3a4b1659c
sync : ggml
Georgi Gerganov
2025-06-01 12:23:14 +03:00
108009f5c7
vulkan : Remove unexpected ; (ggml/1253)
Kai Pastor
2025-05-31 12:49:55 +02:00
d337252acf
cmake : Fix broken CMake error messages (ggml/1252)
Kai Pastor
2025-05-31 12:39:19 +02:00
af6f91db47
ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247)
Radoslav Gerganov
2025-05-30 09:11:09 +03:00
a7b8d35f78
sync : whisper.cpp (ggml/1250)
Georgi Gerganov
2025-05-29 13:29:50 +03:00
6eba72b71c
ggml : install dynamic backends (ggml/1240)
Radoslav Gerganov
2025-05-29 08:34:46 +03:00
fedf034a98
ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)
Daniel Tang
2025-05-27 20:58:46 -04:00
8726392d3d
readme : update bindings (#13950)
ddh0
2025-06-01 03:44:30 -05:00
c04621711a
parallel : fix n_junk == 0 (#13952)
Georgi Gerganov
2025-06-01 11:42:16 +03:00
0fc16b42e8
kv-cache : split implementation in separate sources (#13920)
Georgi Gerganov
2025-06-01 11:39:27 +03:00
053b1539c0
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995)
Max Krasnyansky
2025-05-31 15:39:19 -07:00
b3a89c3d9e
docs : note that libcurl must be installed for the standard build (#13945)
Jiří Podivín
2025-05-31 18:58:35 +02:00
e15898d1c7
server: allow unclosed thinking tags (#13931)
Olivier Chafik
2025-05-31 08:26:10 -07:00
803f8baf4f
llama : deprecate explicit kv_self defrag/update calls (#13921)
Georgi Gerganov
2025-05-31 15:58:33 +03:00
3600cc2886
llama : use n_swa + n_ubatch cells for SWA cache (#13833)
Georgi Gerganov
2025-05-31 15:57:44 +03:00
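The SWA cache change above sizes the cache at n_swa + n_ubatch cells instead of the full context: a sliding-window layer only ever attends to the last n_swa tokens, plus room for one micro-batch of new tokens. A small arithmetic sketch of the saving (the numbers are illustrative, not defaults):

```python
def swa_cache_cells(n_swa: int, n_ubatch: int) -> int:
    """Cells needed for a sliding-window-attention layer:
    the window itself plus one micro-batch of incoming tokens."""
    return n_swa + n_ubatch

# e.g. a 32k context with a 4k window and 512-token micro-batches
n_ctx, n_swa, n_ubatch = 32768, 4096, 512
saved = n_ctx - swa_cache_cells(n_swa, n_ubatch)
```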
c7e0a2054b
webui : Replace alert and confirm with custom modals (#13711)
igardev
2025-05-31 12:56:08 +03:00
3f55f781f1
llama : auto-batch preparation (#13845)
Georgi Gerganov
2025-05-31 12:55:57 +03:00
51fa76f172
mtmd : drop _shared from libmtmd name, merge helpers into libmtmd (⚠️ breaking change) (#13917)
Xuan-Son Nguyen
2025-05-31 10:14:29 +02:00
12d0188c0d
kv-cache : refactor + add llama_memory_state_i (#13746)
Georgi Gerganov
2025-05-31 10:24:04 +03:00
eb3949938e
CUDA: add a prop in ggml_cuda_device_info to distinguish iGPU from dGPU in CUDA (#13856) (#13895)
Shawn yang
2025-05-31 14:48:04 +08:00
e562eece7c
CUDA: fix typo in FlashAttention code (#13926)
Johannes Gäßler
2025-05-30 21:22:03 +02:00
b47ab7b8e9
sched : avoid changing cur_copy when a graph is already allocated (#13922)
Diego Devesa
2025-05-30 09:56:19 -07:00
dd665cc9d4
parallel : increase the variability of the prompt lengths (#13927)
Georgi Gerganov
2025-05-30 19:38:07 +03:00
df0c0c7d02
cuda : prevent using split buffers with 3d/4d matrices (#13919)
Diego Devesa
2025-05-30 07:37:18 -07:00
b49a8ff96b
SYCL: Add mrope kernel (#13755)
Akarshan Biswas
2025-05-30 19:40:57 +05:30
53f925074d
sync : vendor (#13901)
Georgi Gerganov
2025-05-30 16:25:45 +03:00
db38704f01
convert : fix rwkv bos/eos token (#13844)
Sigbjørn Skjæret
2025-05-30 14:50:43 +02:00
07e4351ce6
convert : allow partial update to the chkhsh pre-tokenizer list (#13847)
Xuan-Son Nguyen
2025-05-30 12:24:37 +02:00
291f2b6913
llama : add support for DistilBert (#13907)
Đinh Trọng Huy
2025-05-30 18:56:02 +09:00
2c90da4c7e
llama : use llm_build_granite for minicpm (#13911)
zhangkaihuo
2025-05-30 16:31:48 +08:00
ec9e0301fe
cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (#13890)
Christian Kastner
2025-05-30 01:28:54 +02:00
e83ba3e460
llama : add support for jina-reranker-v2 (#13900)
Sigbjørn Skjæret
2025-05-29 21:42:31 +02:00
2b131621e6
gguf-py : add support for sub_type (in arrays) in GGUFWriter add_key_value method (#13561)
Sigbjørn Skjæret
2025-05-29 15:36:05 +02:00
54a2c7a8cd
arm64: optimize q4_k_q8_k kernel with i8mm (#13886)
Yibo Cai
2025-05-29 19:39:20 +08:00
21fcc21ad5
cmake: Factor out CPU architecture detection (#13883)
Christian Kastner
2025-05-29 12:50:25 +02:00
dd8ba93416
ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm (#13882)
Vineel Abhinav
2025-05-29 14:48:43 +05:30
66c92061f5
tests : remove json.hpp from a test (#13880)
Georgi Gerganov
2025-05-29 12:17:16 +03:00
5ca82fc1d7
convert : workaround for AutoConfig dummy labels (#13881)
Sigbjørn Skjæret
2025-05-29 10:00:57 +02:00
6385b843a8
llama : add RobertaForSequenceClassification reranker support (#13875)
Sigbjørn Skjæret
2025-05-29 08:15:01 +02:00
1b8fb8152d
ggml: aarch64: Implement SVE F32 kernels for vector functions (#13843)
Vineel Abhinav
2025-05-29 11:31:33 +05:30
53ae30640e
gguf-py : fix SafetensorRemote return on undefined size (< 0) (#13841)
Beinsezii
2025-05-28 14:50:20 -07:00
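The gguf-py fix above concerns SafetensorRemote returning an undefined size (< 0). The defensive pattern is to fail loudly instead of letting a negative size propagate into later arithmetic; a sketch under that assumption (the function name is hypothetical, not the real gguf-py API):

```python
def checked_remote_size(reported: int) -> int:
    """Validate a size reported by a remote before using it;
    a negative value means the remote never told us the size."""
    if reported < 0:
        raise RuntimeError(
            f"remote did not report a valid size: {reported}"
        )
    return reported
```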
763d06edb7
llama : fix KV shift for qwen2vl (#13870)
Xuan-Son Nguyen
2025-05-28 22:35:31 +02:00
10961339b2
mtmd : move helpers to dedicated library (⚠️ breaking change) (#13866)
Xuan-Son Nguyen
2025-05-28 22:35:22 +02:00
d98f2a35fc
ci: disable LLAMA_CURL for Linux cross-builds (#13871)
bandoti
2025-05-28 15:46:47 -03:00
e0e3aa231d
llama : add support for BertForSequenceClassification reranker (#13858)
Đinh Trọng Huy
2025-05-29 02:01:58 +09:00
aa6dff05be
convert: small addition to support LlamaModel (#13838)
Đinh Trọng Huy
2025-05-28 23:34:18 +09:00
c962ae3382
server: fix removal of 'image_url'/'input_audio' JSON objects from 'llama_params' in multimodal model mode (#13853)
Sky
2025-05-28 22:33:54 +08:00
a3938fb53d
convert : fix qwen omni conversion (#13859)
Xuan-Son Nguyen
2025-05-28 16:12:35 +02:00
f7873fc698
tests : change umlaut test (#11600)
Alex Fanthome
2025-05-28 14:49:28 +01:00
a68247439b
CUDA: fix FA tg at long context for CC >= 8.9 (#13852)
Johannes Gäßler
2025-05-28 13:33:37 +02:00
26b79b6cb3
convert : fix tensor naming conflict for llama 4 vision (#13836)
Xuan-Son Nguyen
2025-05-28 10:05:54 +02:00
1e8659e65a
CANN: Add SOC TYPE printing in cmake configuration (#13837)
leo-pony
2025-05-28 11:54:20 +08:00
a3c30846e4
opencl: add new ops - argsort, div, sub, addrows, sigmoid, group_norm (#13787)
lhez
2025-05-27 12:56:08 -07:00
1701d4c54f
opencl: mark mul_mat f32f32 as supporting non-contiguous tensors (#13790)
lhez
2025-05-27 12:53:14 -07:00
bef8176387
vulkan: use timestamp queries for GGML_VULKAN_PERF (#13817)
Jeff Bolz
2025-05-27 11:39:07 -05:00
34b7c0439e
cmake : add llama-cparams.cpp to build (#13832)
Georgi Gerganov
2025-05-27 19:08:44 +03:00
f3101a8cc6
SYCL: add gelu_erf kernel (#13749)
Akarshan Biswas
2025-05-27 20:52:59 +05:30
1c49c70d07
sync : ggml
Georgi Gerganov
2025-05-27 18:04:38 +03:00
a8ea03d8ad
ggml : add ggml_repeat_4d (#13824)
Xuan-Son Nguyen
2025-05-27 15:53:55 +02:00
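ggml_repeat_4d, added above, tiles a tensor up to a target 4-D shape. Its semantics can be sketched in pure Python with modulo indexing over a flat buffer (this mirrors the concept only, not ggml's actual memory layout or argument order):

```python
def repeat_4d(src, src_shape, dst_shape):
    """Tile flat buffer `src` of shape src_shape up to dst_shape.
    Shapes are (d0, d1, d2, d3); each destination dim must be a
    multiple of the matching source dim."""
    s0, s1, s2, s3 = src_shape
    d0, d1, d2, d3 = dst_shape
    assert all(d % s == 0 for d, s in zip(dst_shape, src_shape))
    out = []
    for i0 in range(d0):
        for i1 in range(d1):
            for i2 in range(d2):
                for i3 in range(d3):
                    # wrap each index back into the source shape
                    idx = ((((i0 % s0) * s1 + (i1 % s1)) * s2
                            + (i2 % s2)) * s3 + (i3 % s3))
                    out.append(src[idx])
    return out
```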
05f6ac6283
ggml : riscv: add xtheadvector support (#13720)
xctan
2025-05-27 21:21:36 +08:00
bc583e3c63
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) (#13784)
Xuan-Son Nguyen
2025-05-27 14:06:10 +02:00
72b090da2c
docs: remove link for llama-cli function calling (#13810)
bandoti
2025-05-27 08:52:40 -03:00
7fe03e7446
ggml-cpu: x86 feature detection is specific to x86 (#13811)
Christian Kastner
2025-05-27 13:18:39 +02:00
952f3953c1
ggml : allow CUDA graphs when using pipeline parallelism (#13814)
Diego Devesa
2025-05-27 04:05:18 -07:00