Commit Graph

  • 228f34c9ce SYCL: Implement a few same-quantized-type copy kernels (#13739) Akarshan Biswas 2025-06-07 18:58:20 +05:30
  • 0974ad7a7c llama : fix llama_model_chat_template with template name (LLM_KV with suffix) (#14050) Sigbjørn Skjæret 2025-06-07 14:13:12 +02:00
  • 745aa5319b llama : deprecate llama_kv_self_ API (#14030) Georgi Gerganov 2025-06-06 14:11:15 +03:00
  • 487a5e0401 context : fix SWA-related warning for multiple sequences (#14045) Georgi Gerganov 2025-06-06 13:29:18 +03:00
  • d17a809ef0 llama : support multiple classifier outputs and labels (#13940) Sigbjørn Skjæret 2025-06-06 09:03:25 +02:00
  • 1caae7fc6c gguf-py : add add_classifier_output_labels method to writer (#14031) Sigbjørn Skjæret 2025-06-05 17:42:31 +02:00
  • 669c13e0f6 vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001) Masato Nakasaka 2025-06-05 23:00:29 +09:00
  • 146b88e8b3 ci: fix CUDA build failure on autodl cloud machines (#14005) pockers21 2025-06-05 06:25:29 -07:00
  • 7f37b6cf1e memory : migrate from llama_kv_cache to more generic llama_memory (#14006) Georgi Gerganov 2025-06-05 15:29:22 +03:00
  • 3a077146a4 llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013) Diego Devesa 2025-06-05 02:57:42 -07:00
  • d01d112abb readme : add badge (#13938) Olexandr88 2025-06-05 10:50:55 +03:00
  • 9f47fa5792 vocab : warn about missing mask token (#14022) Sigbjørn Skjæret 2025-06-05 09:29:18 +02:00
  • 9e31bec4fd context : fix pos_min initialization upon error decode (#14008) Georgi Gerganov 2025-06-05 09:06:29 +03:00
  • 5a8ae3053c vulkan: automatically deduce size of push constants (#13936) Jeff Bolz 2025-06-05 00:17:58 -05:00
  • 0d3984424f ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813) Ervin Áron Tasnádi 2025-06-04 22:02:00 +02:00
  • 3e63a58ef7 kv-cache : refactor the update/defrag mechanism (#13988) Georgi Gerganov 2025-06-04 18:58:20 +03:00
  • 2589ad3704 ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997) Diego Devesa 2025-06-04 06:37:40 -07:00
  • 482548716f releases : use dl backend for linux release, remove arm64 linux release (#13996) Diego Devesa 2025-06-04 04:15:54 -07:00
  • 3ac67535c8 llama-graph : use ggml_repeat_4d (#13998) Xuan-Son Nguyen 2025-06-04 10:11:26 +02:00
  • 0b4be4c435 CUDA: fix FTZ in FA for Gemma 3 (#13991) Johannes Gäßler 2025-06-04 08:57:05 +02:00
  • e0e806f52e kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985) Georgi Gerganov 2025-06-04 09:50:32 +03:00
  • 7e00e60ef8 vulkan: fix warnings in perf logger querypool code (#13937) Jeff Bolz 2025-06-03 13:30:22 -05:00
  • ea1431b0fa docs : add "Quick start" section for new users (#13862) Xuan-Son Nguyen 2025-06-03 13:09:36 +02:00
  • 71e74a3ac9 opencl: add backend_synchronize (#13939) lhez 2025-06-02 16:54:58 -07:00
  • bfb1e012a0 OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840) rmatif 2025-06-02 23:53:36 +00:00
  • 3637576288 server : disable speculative decoding for SWA models (#13970) Georgi Gerganov 2025-06-02 21:34:40 +03:00
  • ea394d7ab1 metal : use F32 accumulators in FA kernels (#13975) Georgi Gerganov 2025-06-02 21:33:40 +03:00
  • 5582c49c39 gemma : more consistent attention scaling for v2 and v3 (#13951) Georgi Gerganov 2025-06-02 20:54:26 +03:00
  • c9bbc77931 server: update deepseek reasoning format (pass reasoning_content as diffs) (#13933) Olivier Chafik 2025-06-02 10:15:44 -07:00
  • bfd322796c mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961) Xuan-Son Nguyen 2025-06-02 16:29:28 +02:00
  • 093e3f1feb cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966) shalinib-ibm 2025-06-02 17:48:36 +05:30
  • 663445b0de sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826) Atharva Dubey 2025-06-02 10:12:20 +01:00
  • 7675c555a1 gguf: fix failure on version == 0 (#13956) Johannes Gäßler 2025-06-01 18:08:05 +02:00
  • 5e1c3aed40 convert : fix nomic-bert-moe mask token (#13757) Sigbjørn Skjæret 2025-06-01 18:07:21 +02:00
  • c496fe0b1d convert : fix vocab padding code for bert models (#13954) Sigbjørn Skjæret 2025-06-01 17:23:11 +02:00
  • e57bb87ced ggml: check if non-native endian model is being loaded (#13943) Aaron Teo 2025-06-01 22:53:57 +08:00
  • f3a4b1659c sync : ggml Georgi Gerganov 2025-06-01 12:23:14 +03:00
  • 108009f5c7 vulkan : Remove unexpected ; (ggml/1253) Kai Pastor 2025-05-31 12:49:55 +02:00
  • d337252acf cmake : Fix broken CMake error messages (ggml/1252) Kai Pastor 2025-05-31 12:39:19 +02:00
  • af6f91db47 ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247) Radoslav Gerganov 2025-05-30 09:11:09 +03:00
  • a7b8d35f78 sync : whisper.cpp (ggml/1250) Georgi Gerganov 2025-05-29 13:29:50 +03:00
  • 6eba72b71c ggml : install dynamic backends (ggml/1240) Radoslav Gerganov 2025-05-29 08:34:46 +03:00
  • fedf034a98 ggml : Print backtrace on uncaught C++ exceptions (ggml/1232) Daniel Tang 2025-05-27 20:58:46 -04:00
  • 8726392d3d readme : update bindings (#13950) ddh0 2025-06-01 03:44:30 -05:00
  • c04621711a parallel : fix n_junk == 0 (#13952) Georgi Gerganov 2025-06-01 11:42:16 +03:00
  • 0fc16b42e8 kv-cache : split implementation in separate sources (#13920) Georgi Gerganov 2025-06-01 11:39:27 +03:00
  • 053b1539c0 threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995) Max Krasnyansky 2025-05-31 15:39:19 -07:00
  • b3a89c3d9e docs : note that libcurl must be installed for the standard build (#13945) Jiří Podivín 2025-05-31 18:58:35 +02:00
  • e15898d1c7 server: allow unclosed thinking tags (#13931) Olivier Chafik 2025-05-31 08:26:10 -07:00
  • 803f8baf4f llama : deprecate explicit kv_self defrag/update calls (#13921) Georgi Gerganov 2025-05-31 15:58:33 +03:00
  • 3600cc2886 llama : use n_swa + n_ubatch cells for SWA cache (#13833) Georgi Gerganov 2025-05-31 15:57:44 +03:00
  • c7e0a2054b webui : Replace alert and confirm with custom modals. (#13711) igardev 2025-05-31 12:56:08 +03:00
  • 3f55f781f1 llama : auto-batch preparation (#13845) Georgi Gerganov 2025-05-31 12:55:57 +03:00
  • 51fa76f172 mtmd : drop _shared from libmtmd name, merge helpers into libmtmd (⚠️ breaking change) (#13917) Xuan-Son Nguyen 2025-05-31 10:14:29 +02:00
  • 12d0188c0d kv-cache : refactor + add llama_memory_state_i (#13746) Georgi Gerganov 2025-05-31 10:24:04 +03:00
  • eb3949938e CUDA: add a prop in ggml_cuda_device_info to distinguish iGPU vs dGPU in CUDA (#13856) (#13895) Shawn yang 2025-05-31 14:48:04 +08:00
  • e562eece7c CUDA: fix typo in FlashAttention code (#13926) Johannes Gäßler 2025-05-30 21:22:03 +02:00
  • b47ab7b8e9 sched : avoid changing cur_copy when a graph is already allocated (#13922) Diego Devesa 2025-05-30 09:56:19 -07:00
  • dd665cc9d4 parallel : increase the variability of the prompt lengths (#13927) Georgi Gerganov 2025-05-30 19:38:07 +03:00
  • df0c0c7d02 cuda : prevent using split buffers with 3d/4d matrices (#13919) Diego Devesa 2025-05-30 07:37:18 -07:00
  • b49a8ff96b SYCL: Add mrope kernel (#13755) Akarshan Biswas 2025-05-30 19:40:57 +05:30
  • 53f925074d sync : vendor (#13901) Georgi Gerganov 2025-05-30 16:25:45 +03:00
  • db38704f01 convert : fix rwkv bos/eos token (#13844) Sigbjørn Skjæret 2025-05-30 14:50:43 +02:00
  • 07e4351ce6 convert : allow partial update to the chkhsh pre-tokenizer list (#13847) Xuan-Son Nguyen 2025-05-30 12:24:37 +02:00
  • 291f2b6913 llama : add support for DistilBert (#13907) Đinh Trọng Huy 2025-05-30 18:56:02 +09:00
  • 2c90da4c7e llama : use llm_build_granite for minicpm (#13911) zhangkaihuo 2025-05-30 16:31:48 +08:00
  • ec9e0301fe cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (#13890) Christian Kastner 2025-05-30 01:28:54 +02:00
  • e83ba3e460 llama : add support for jina-reranker-v2 (#13900) Sigbjørn Skjæret 2025-05-29 21:42:31 +02:00
  • 2b131621e6 gguf-py : add support for sub_type (in arrays) in GGUFWriter add_key_value method (#13561) Sigbjørn Skjæret 2025-05-29 15:36:05 +02:00
  • 54a2c7a8cd arm64: optimize q4_k_q8_k kernel with i8mm (#13886) Yibo Cai 2025-05-29 19:39:20 +08:00
  • 21fcc21ad5 cmake: Factor out CPU architecture detection (#13883) Christian Kastner 2025-05-29 12:50:25 +02:00
  • dd8ba93416 ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm (#13882) Vineel Abhinav 2025-05-29 14:48:43 +05:30
  • 66c92061f5 tests : remove json.hpp from a test (#13880) Georgi Gerganov 2025-05-29 12:17:16 +03:00
  • 5ca82fc1d7 convert : workaround for AutoConfig dummy labels (#13881) Sigbjørn Skjæret 2025-05-29 10:00:57 +02:00
  • 6385b843a8 llama : add RobertaForSequenceClassification reranker support (#13875) Sigbjørn Skjæret 2025-05-29 08:15:01 +02:00
  • 1b8fb8152d ggml: aarch64: Implement SVE F32 kernels for vector functions (#13843) Vineel Abhinav 2025-05-29 11:31:33 +05:30
  • 53ae30640e gguf-py : fix SafetensorRemote return on undefined size (< 0) (#13841) Beinsezii 2025-05-28 14:50:20 -07:00
  • 763d06edb7 llama : fix KV shift for qwen2vl (#13870) Xuan-Son Nguyen 2025-05-28 22:35:31 +02:00
  • 10961339b2 mtmd : move helpers to dedicated library (⚠️ breaking change) (#13866) Xuan-Son Nguyen 2025-05-28 22:35:22 +02:00
  • d98f2a35fc ci: disable LLAMA_CURL for Linux cross-builds (#13871) bandoti 2025-05-28 15:46:47 -03:00
  • e0e3aa231d llama : add support for BertForSequenceClassification reranker (#13858) Đinh Trọng Huy 2025-05-29 02:01:58 +09:00
  • aa6dff05be convert: small addition to support LlamaModel (#13838) Đinh Trọng Huy 2025-05-28 23:34:18 +09:00
  • c962ae3382 server: fix removal of 'image_url'/'input_audio' JSON objects from 'llama_params' in multimodal-model mode (#13853) Sky 2025-05-28 22:33:54 +08:00
  • a3938fb53d convert : fix qwen omni conversion (#13859) Xuan-Son Nguyen 2025-05-28 16:12:35 +02:00
  • f7873fc698 tests : change umlaut test (#11600) Alex Fanthome 2025-05-28 14:49:28 +01:00
  • a68247439b CUDA: fix FA tg at long context for CC >= 8.9 (#13852) Johannes Gäßler 2025-05-28 13:33:37 +02:00
  • 26b79b6cb3 convert : fix tensor naming conflict for llama 4 vision (#13836) Xuan-Son Nguyen 2025-05-28 10:05:54 +02:00
  • 1e8659e65a CANN: Add SOC TYPE printing in cmake configuration (#13837) leo-pony 2025-05-28 11:54:20 +08:00
  • a3c30846e4 opencl: add new ops - argsort, div, sub, addrows, sigmoid, group_norm (#13787) lhez 2025-05-27 12:56:08 -07:00
  • 1701d4c54f opencl: mark mul_mat f32f32 as supporting non-contiguous tensors (#13790) lhez 2025-05-27 12:53:14 -07:00
  • bef8176387 vulkan: use timestamp queries for GGML_VULKAN_PERF (#13817) Jeff Bolz 2025-05-27 11:39:07 -05:00
  • 34b7c0439e cmake : add llama-cparams.cpp to build (#13832) Georgi Gerganov 2025-05-27 19:08:44 +03:00
  • f3101a8cc6 SYCL: add gelu_erf kernel (#13749) Akarshan Biswas 2025-05-27 20:52:59 +05:30
  • 1c49c70d07 sync : ggml Georgi Gerganov 2025-05-27 18:04:38 +03:00
  • a8ea03d8ad ggml : add ggml_repeat_4d (#13824) Xuan-Son Nguyen 2025-05-27 15:53:55 +02:00
  • 05f6ac6283 ggml : riscv: add xtheadvector support (#13720) xctan 2025-05-27 21:21:36 +08:00
  • bc583e3c63 mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) (#13784) Xuan-Son Nguyen 2025-05-27 14:06:10 +02:00
  • 72b090da2c docs: remove link for llama-cli function calling (#13810) bandoti 2025-05-27 08:52:40 -03:00
  • 7fe03e7446 ggml-cpu: x86 feature detection is specific to x86 (#13811) Christian Kastner 2025-05-27 13:18:39 +02:00
  • 952f3953c1 ggml : allow CUDA graphs when using pipeline parallelism (#13814) Diego Devesa 2025-05-27 04:05:18 -07:00