Commit Graph

  • 0123ff38f5 memory : use sequential equal splits for recurrent modules (#16442) Georgi Gerganov 2025-10-07 08:24:17 +03:00
  • 0a319bb75e metal : add support for non-padded FA KV (#16148) Georgi Gerganov 2025-10-07 08:23:30 +03:00
  • 1d6092fc72 tests : add -INF blocks to the KQ mask in the FA tests (#16380) Georgi Gerganov 2025-10-07 08:22:35 +03:00
  • 8ae32dc9ec metal : various optimizations + refactoring (#16446) Georgi Gerganov 2025-10-07 08:21:40 +03:00
  • 3df2244df4 llama : add --no-host to disable host buffers (#16310) Gadflyii 2025-10-06 12:55:53 -05:00
  • c08002a198 chat : Granite Docling stopping (#16438) Gabe Goodhart 2025-10-06 10:59:40 -06:00
  • 3a002afafa ci : refactor sdk caching to minimize storage (#16414) Sigbjørn Skjæret 2025-10-06 17:40:21 +02:00
  • a23b9bdbd3 ggml : fix unaligned access in AMX code (#16315) Georgi Gerganov 2025-10-06 16:05:27 +03:00
  • 04e632a4aa ci : remove missing reranker model files (#16444) Daniel Bevenius 2025-10-06 14:56:59 +02:00
  • a80ff183ab ggml-cpu : fix leftover handling in ggml_vec_scale_f32 for SVE (#16443) Daniel Bevenius 2025-10-06 14:17:12 +02:00
  • 1d49ca3759 nix : removed metal for nix (#16118) Yuannan 2025-10-06 09:29:56 +00:00
  • c5fef0fcea server: update readme to mention n_past_max metric (#16436) Oleksandr Kuvshynov 2025-10-06 03:53:31 -04:00
  • ca71fb9b36 model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206) Gabe Goodhart 2025-10-05 06:57:47 -06:00
  • 35266573b9 ggml webgpu: actually add softmax, fix rms_norm offset (#16400) Reese Levine 2025-10-04 20:59:31 -07:00
  • 86df2c9ae4 vulkan: use a more appropriate amount of threads when generating shaders (#16418) Eve 2025-10-04 20:04:27 +00:00
  • f39283960b rpc : check src buffer when copying tensor (#16421) Radoslav Gerganov 2025-10-04 16:22:45 +03:00
  • 898acba681 rpc : add support for multiple devices (#16276) Radoslav Gerganov 2025-10-04 12:49:16 +03:00
  • e29acf74fe vulkan : incremental shader builds (#16341) Acly 2025-10-04 11:42:56 +02:00
  • 128d522c04 chat : support Magistral thinking (#16413) Pascal 2025-10-03 20:51:48 +02:00
  • f6dcda3900 server : context checkpointing for hybrid and recurrent models (#16382) ddh0 2025-10-03 13:34:51 -05:00
  • 606a73f531 metal : fix loop bound in ggml_mem_ranges (#16412) Georgi Gerganov 2025-10-03 19:18:56 +03:00
  • 946f71ed9a llama : fix shapes for bert/mpt q/k norm (#16409) Sigbjørn Skjæret 2025-10-03 14:40:25 +02:00
  • 638d330246 ggml : fix graph reallocation with multiple chunks (#16396) Acly 2025-10-03 13:49:08 +02:00
  • 84c8e305e8 Fix missing messages on sibling navigation (#16408) Aleksander Grygier 2025-10-03 12:51:40 +02:00
  • 2aaf0a2a20 vulkan: Replace uses of maxMemoryAllocationSize and VK_WHOLE_SIZE (#16354) Jeff Bolz 2025-10-03 05:50:46 -05:00
  • 0e1f838556 vulkan: Fix FA coopmat1 invalid array indexing (#16365) Jeff Bolz 2025-10-03 04:52:46 -05:00
  • ad126479c2 ci : change macos-13 to macos-15-intel (#16401) Daniel Bevenius 2025-10-03 11:45:16 +02:00
  • 77233277c9 Capture model name only after first token (streaming) or completed request (#16405) Aleksander Grygier 2025-10-03 11:30:39 +02:00
  • e308efda8e vulkan: in flash attention, bounds check against nem1 (don't rely on GGML_KQ_MASK_PAD) (#16316) Jeff Bolz 2025-10-03 03:33:08 -05:00
  • 136bda78c5 webui : Fix messages payload sent to chat completions (#16402) Aleksander Grygier 2025-10-03 09:11:34 +02:00
  • 5113efd34c fix: track viewportHeight via window.innerHeight to avoid unwanted scrolling (#16356) Pascal 2025-10-03 08:01:31 +02:00
  • d64c8104f0 test-barrier : do not use more threads than physically available (#16389) Sigbjørn Skjæret 2025-10-02 20:10:12 +02:00
  • ef07a40906 ggml webgpu: add support for soft_max, optimize rms_norm (#16357) Reese Levine 2025-10-02 11:00:31 -07:00
  • 34fcc5a4ac model : Apertus model implementation (#15852) Piotr Wilkin (ilintar) 2025-10-02 19:43:22 +02:00
  • 91a2a56556 musa: update compile flags (#16265) R0CKSTAR 2025-10-02 21:29:56 +08:00
  • 72ee736c44 ci : fix ubuntu-latest-cmake-rpc (disable ccache) (#16388) Sigbjørn Skjæret 2025-10-02 13:51:36 +02:00
  • f09aefaa84 ci: update vulkan ci (#16294) Eve 2025-10-02 08:10:07 +00:00
  • bbd32bc038 ci : fix clean-up of old logs (#16381) Georgi Gerganov 2025-10-02 10:35:43 +03:00
  • 2be72c2b12 SYCL: Update to oneAPI 2025.2 (#16371) Neo Zhang Jianyu 2025-10-02 15:16:25 +08:00
  • 95ce098544 HIP: add IMbackK to codeowner (#16375) uvos 2025-10-02 05:52:59 +02:00
  • c8dedc9999 CI: reenable cdna in rocm docker builds (#16376) uvos 2025-10-01 23:32:39 +02:00
  • e95fec640f HIP: Disable ROCWMMA fattn on CDNA when compiled against ROCWMMA 2.0.0 (#16221) uvos 2025-10-01 23:09:25 +02:00
  • ded67b9444 llama : parameter conversion and loading fixes for PLaMo2 variants (#16075) Shunta Saito 2025-10-02 06:08:15 +09:00
  • 1fe4e38cc2 ci: Properly install rocwmma for hip builds (#16305) uvos 2025-10-01 20:18:03 +02:00
  • 4201deae9c common: introduce http.h for httplib-based client (#16373) Adrien Gallouët 2025-10-01 19:22:18 +02:00
  • 764799279f Conversation action dialogs as singletons from Chat Sidebar + apply conditional rendering for Actions Dropdown for Chat Conversation Items (#16369) Aleksander Grygier 2025-10-01 18:18:10 +02:00
  • 2a9b63383a Improve code block color theming (#16325) Aleksander Grygier 2025-10-01 15:54:42 +02:00
  • 1104ca1a1c ci : use registry cache for docker builds (#16366) Sigbjørn Skjæret 2025-10-01 14:09:52 +02:00
  • 4f1575921c Add optional setting for showing "Model used:" information (#16337) Aleksander Grygier 2025-10-01 12:08:16 +02:00
  • 132d673554 vulkan: make ggml_vk_default_dispatcher support older vulkan headers (#16345) Eve 2025-10-01 07:56:36 +00:00
  • aa9538a63a webui: Remove running llama-server within WebUI dev.sh script (#16363) Aleksander Grygier 2025-10-01 07:40:26 +02:00
  • e74c92e842 model : support GLM 4.6 (make a few NextN/MTP tensors not required) (#16359) Bartowski 2025-09-30 16:24:36 -04:00
  • b2ba81dbe0 ci : fix ccache key for ubuntu-cpu-cmake (#16355) Sigbjørn Skjæret 2025-09-30 21:41:42 +02:00
  • bf6f3b3a19 common : disable progress bar without a tty (#16352) Adrien Gallouët 2025-09-30 19:52:41 +02:00
  • 7c156df414 opencl: support pad_ext (#15888) lhez 2025-09-30 10:45:45 -07:00
  • 16b0ca0d2e Chatapi ignore empty sampling (#16330) Pascal 2025-09-30 19:18:54 +02:00
  • 8d78cd2613 ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187) Reese Levine 2025-09-30 09:57:51 -07:00
  • d1c84a662d opencl: support ne3 in get_rows (#15866) lhez 2025-09-30 09:55:13 -07:00
  • 364a7a6d4a common : remove common_has_curl() (#16351) Adrien Gallouët 2025-09-30 16:39:44 +02:00
  • 2df5bcf357 ci : disable ccache for android (#16348) Sigbjørn Skjæret 2025-09-30 15:38:01 +02:00
  • 075c01567b ggml : bump version to 0.9.4 (ggml/1363) Georgi Gerganov 2025-09-30 13:42:39 +03:00
  • a014310374 cuda : Enable CUDA Graph usage for Nemotron Nano v2 (NemotronH) (#16328) anavp-nvidia 2025-09-30 08:13:22 +00:00
  • 35fb82497e metal : dynamic simdgroups for MV kernels (#16340) Georgi Gerganov 2025-09-30 11:03:23 +03:00
  • 3c62aed89f common : simplify etag tracking by removing json (#16342) Adrien Gallouët 2025-09-30 09:36:33 +02:00
  • f1eb1cb1eb kleidiai : fix work size and threads sync for fp16 (#16246) Charles Xu 2025-09-30 09:07:20 +02:00
  • de41f2b7bf codeowners: add codeowners for opencl backend (#16344) lhez 2025-09-29 22:30:16 -07:00
  • a74a0d69f3 tests: override test_set_rows::max_nmse_err to allow for occasional rounding differences (#16295) Jeff Bolz 2025-09-29 19:26:34 -05:00
  • 5f7e166cbf Fix thinking blocks with quotes + add handling [THINK]...[/THINK] blocks (#16326) Pascal 2025-09-29 18:49:47 +02:00
  • d72f5f7ba2 ci : add AMD runners and workflows (#16249) Georgi Gerganov 2025-09-29 17:51:48 +03:00
  • b77e6c18e1 ggml: riscv: add riscv spacemit backend (#15288) alex-spacemit 2025-09-29 22:50:44 +08:00
  • 2ddd3f2356 sync : ggml Georgi Gerganov 2025-09-29 16:50:52 +03:00
  • 4d3d455d3c sync : whisper.cpp (ggml/1359) Georgi Gerganov 2025-09-29 16:49:11 +03:00
  • c9b1c06467 ggml : remove -dev suffix from release version (ggml/1355) Daniel Bevenius 2025-09-26 17:34:42 +02:00
  • b6ae75afb4 ggml : bump version to 0.9.3 (ggml/1353) Daniel Bevenius 2025-09-25 14:39:05 +02:00
  • b6dff20e2f ggml : prepare for development of 0.9.2-dev Georgi Gerganov 2025-09-20 16:44:23 +03:00
  • 2db78c75e4 ggml : bump version to 0.9.1 Georgi Gerganov 2025-09-20 16:44:23 +03:00
  • 02463ab27b ggml-backend : add root cause in error message if loading backend library fails (#16172) Rafal Lewczuk 2025-09-29 13:17:09 +02:00
  • adc76347d7 ggml : check cuda and metal argsort limits and add test (#16323) Sigbjørn Skjæret 2025-09-29 11:09:00 +02:00
  • 3a2bdcda0b Improve Mobile UI for dialogs and action dropdowns (#16222) Aleksander Grygier 2025-09-29 10:37:20 +02:00
  • 66bb7985c3 fix: preserved zero values in chat settings inputs and textareas by switching to nullish coalescing for field values and default placeholders (#16312) Pascal 2025-09-29 09:08:41 +02:00
  • 2f61c0f5bf llama-cli: prevent spurious assistant token (#16202) Vinkal 2025-09-29 12:33:12 +05:30
  • 3ffd0fae47 perplexity : show more kl-divergence data (#16321) ddh0 2025-09-29 01:30:45 -05:00
  • a4a0aa5ea2 ggml : fix dependencies for ggml_set_rows (#16318) Georgi Gerganov 2025-09-29 08:41:28 +03:00
  • 92cd103f62 vulkan: Fix validation failure in quantized flash attention (#16292) Jeff Bolz 2025-09-28 23:50:37 -05:00
  • b887d2f341 ggml : fix GGML_F32_VEC_FMA argument order in ggml_vec_mad1_f32 (#16307) Sigbjørn Skjæret 2025-09-28 23:15:03 +02:00
  • bd0af02fc9 common : fix reasoning before forced tool call via tool_choice = required (#16264) crat0z 2025-09-28 14:13:50 -04:00
  • d9e0e7c819 ci : fix musa docker build (#16306) R0CKSTAR 2025-09-28 22:38:15 +08:00
  • 0124ac989f devops: switch to using ubuntu-22.04-s390x image (#16302) Aaron Teo 2025-09-28 19:25:58 +08:00
  • 2811c65286 Fixed a few typos in the README of the LLaMA.cpp HTTP Server [no ci] (#16297) Imad Saddik 2025-09-28 12:04:46 +01:00
  • d8359f5fde vulkan: 64-bit im2col (#16135) Jeff Bolz 2025-09-28 01:38:37 -05:00
  • 6a2c6145a0 metal : extend mat-mat multiplication support (#16225) Georgi Gerganov 2025-09-28 09:34:44 +03:00
  • 3b53634fe3 metal : fuse non-sequential nodes (#16102) Georgi Gerganov 2025-09-28 09:34:05 +03:00
  • 1384abf8b8 vulkan: handle mat_mul with A matrix > 4GB (#16176) Jeff Bolz 2025-09-27 20:36:34 -05:00
  • e6d65fb02d vulkan: support arbitrary KV dimension in flash attention (#16160) Jeff Bolz 2025-09-27 16:43:39 -04:00
  • 8656f5de68 vulkan : make the vulkan.hpp dynamic dispatcher instance private (#16224) Acly 2025-09-27 22:41:03 +02:00
  • 4807e8f96a Show message actions by default (#16289) Aleksander Grygier 2025-09-27 19:56:40 +02:00
  • c0bfc57af4 CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#16277) Aman Gupta 2025-09-28 00:49:32 +08:00
  • 75a3a6c2cd CUDA: refactor and deduplicate vector FA kernels (#16208) Johannes Gäßler 2025-09-27 18:45:07 +02:00
  • 0499b29c6f vulkan: throw system error instead of SIGABRT during init on older devices (#16156) Dmytro Minochkin 2025-09-27 19:26:46 +03:00
  • 234e2ff8ed server : remove old LLAMA_SERVER_SSL (#16290) Adrien Gallouët 2025-09-27 18:17:08 +02:00