Commit Graph

  • 307e79d33d opencl : fix possible buffer overflow in dump_tensor (#14490) zhouwg 2025-07-02 20:38:10 +08:00
  • d7f5f4e578 simple-chat : fix context-exceeded condition (#14494) Georgi Gerganov 2025-07-02 14:12:07 +03:00
  • c8a4e470f6 opencl : skip empty nodes on cgraph compute (#14491) Eric Zhang 2025-07-02 19:00:04 +08:00
  • 603e43dc91 opencl : update upscale to support align corners (#14488) lhez 2025-07-02 00:07:42 -07:00
  • 611ba4b264 ci : add OpenCL to labeler workflow (#14496) Sigbjørn Skjæret 2025-07-02 09:02:51 +02:00
  • 85841e121d github : add OpenCL backend to issue templates (#14492) Eric Zhang 2025-07-02 13:41:35 +08:00
  • 68b3cd6514 ggml : Callback before abort (#14481) Björn Ganster 2025-07-02 07:19:31 +02:00
  • de56944147 ci : disable fast-math for Metal GHA CI (#14478) Georgi Gerganov 2025-07-01 18:04:08 +03:00
  • 1b2aaf28ac Add Vulkan images to docker.md (#14472) Grzegorz Grasza 2025-07-01 15:44:11 +02:00
  • 343b6e94b6 CANN: update aclnnGroupedMatmulV2 to aclnnGroupedMatmulV3 (#14411) Chenguang Li 2025-07-01 16:47:30 +08:00
  • 6a746cf9c4 vulkan: Split large mul_mat_id to fit in shared memory (#14451) Jeff Bolz 2025-07-01 03:43:08 -05:00
  • eff5e45443 add GELU_ERF (#14455) Sigbjørn Skjæret 2025-07-01 10:14:21 +02:00
  • a6a47958a1 ggml : remove trailing whitespace (#0) Georgi Gerganov 2025-07-01 11:05:48 +03:00
  • f61c05d4b1 sync : ggml Georgi Gerganov 2025-07-01 10:27:52 +03:00
  • 431b2c24f3 ggml-cpu : "align corners" for bilinear upscale/downscale (ggml/1285) Acly 2025-07-01 09:11:00 +02:00
  • 497be7c01d ggml-quants : rename best_mad to best_error (ggml/1283) Daniel Bevenius 2025-06-24 06:10:16 +02:00
  • 79b33b2317 opencl : add GEGLU, REGLU, SWIGLU (#14456) lhez 2025-07-01 00:19:16 -07:00
  • 0a5a3b5cdf Add Conv2d for CPU (#14388) Aman Gupta 2025-06-30 23:57:04 +08:00
  • 745f11fed0 memory : correctly handle failure in apply() (#14438) Georgi Gerganov 2025-06-30 18:03:03 +03:00
  • 5dd942de59 metal : disable fast-math for some cpy kernels (#14460) Georgi Gerganov 2025-06-30 17:04:05 +03:00
  • a7417f5594 ggml-cpu: sycl: Re-enable exp f16 (#14462) Romain Biessy 2025-06-30 14:52:02 +02:00
  • eb3fa2913e test-backend-ops : disable llama test (#14461) Diego Devesa 2025-06-30 03:43:15 -07:00
  • c839a2da1a cmake : Remove redundant include path in CMakeLists.txt (#14452) xiaobing318 2025-06-30 17:48:24 +08:00
  • e9b6350e61 scripts : make the shell scripts cross-platform (#14341) Vedran Miletić 2025-06-30 10:17:18 +02:00
  • caf5681fcb server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196) matteo 2025-06-29 20:02:53 +02:00
  • 83790b0e7e server : fix appearance of the chats list context menu for Safari (#14322) Renat 2025-06-29 19:29:57 +02:00
  • f47c1d7106 SYCL: disable faulty fp16 exp kernel (#14395) Akarshan Biswas 2025-06-29 21:07:58 +05:30
  • a5d1fb6212 ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (#14443) Sigbjørn Skjæret 2025-06-29 14:38:10 +02:00
  • a0535ffa0d ggml : implement REGLU/GEGLU/SWIGLU ops (#14158) Sigbjørn Skjæret 2025-06-29 11:04:10 +02:00
  • bd9c981d72 vulkan: Add fusion support for RMS_NORM+MUL (#14366) Jeff Bolz 2025-06-29 02:43:36 -05:00
  • 27208bf657 CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361) Aman Gupta 2025-06-29 01:30:53 +08:00
  • 63a7bb3c7e vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipeline (#14378) Jeff Bolz 2025-06-28 10:36:40 -05:00
  • 00d5282c7f vulkan: lock accesses of pinned_memory vector (#14333) Jeff Bolz 2025-06-28 10:17:09 -05:00
  • 566c16fcce model : add support for ERNIE 4.5 0.3B model (#14408) Weizhao Ouyang 2025-06-28 22:08:21 +08:00
  • b25e92774e fix async_mode bug (#14432) Xinpeng Dou 2025-06-28 17:35:41 +08:00
  • 6609507a91 ci : fix windows build and release (#14431) Sigbjørn Skjæret 2025-06-28 09:57:07 +02:00
  • ceb1bf5a34 vulkan: Fix GGML_VULKAN_SHADER_DEBUG_INFO (#14427) Jeff Bolz 2025-06-27 22:35:30 -05:00
  • 72babea5de graph : make llm_graph_context destructor virtual (#14410) Georgi Gerganov 2025-06-27 21:42:02 +03:00
  • 43678060c1 recurrent : call balloc split_reset() in init_batch() (#14414) Georgi Gerganov 2025-06-27 17:55:45 +03:00
  • 8d94219a4a ggml : add ggml_set_rows (#14274) Radoslav Gerganov 2025-06-27 16:41:40 +03:00
  • f667f1e624 convert : fix broken sentencepiece vocab (#14416) Sigbjørn Skjæret 2025-06-27 10:42:19 +02:00
  • 8846aace49 model : gemma3n text-only (#14400) Xuan-Son Nguyen 2025-06-26 19:34:02 +02:00
  • a01047b041 cmake: regen vulkan shaders when shaders-gen sources change (#14398) bandoti 2025-06-26 13:46:53 -03:00
  • b25346221d llama : return mistral-v7-tekken as default template only (#14390) Sigbjørn Skjæret 2025-06-26 15:01:14 +02:00
  • e8215dbb96 metal : add special-case mat-vec mul for ne00 == 4 (#14385) Georgi Gerganov 2025-06-26 15:51:19 +03:00
  • 5783ae4359 metal : batch rows copy in a single threadgroup (#14384) Georgi Gerganov 2025-06-26 15:50:15 +03:00
  • bf5bcd0b85 docs: update s390x documentation + add faq (#14389) Aaron Teo 2025-06-26 18:41:41 +08:00
  • 716301d1b0 musa: enable fp16 mma (all) and cublas on qy2 (#13842) R0CKSTAR 2025-06-26 12:11:59 +08:00
  • 60ef23d6c1 ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317) Aaron Teo 2025-06-26 05:49:04 +08:00
  • b193d53069 ggml : do not output unprintable characters on GGUF load failure (#14381) Sigbjørn Skjæret 2025-06-25 23:26:51 +02:00
  • 2bf9d539dd sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973) Anton Mitkov 2025-06-25 17:09:55 +01:00
  • 73e53dc834 opencl: ref count ggml_backend_opencl_context and refactor profiling (#14254) lhez 2025-06-24 11:46:25 -07:00
  • 62af464227 batch : fix check for empty sequences in memory (#14364) Georgi Gerganov 2025-06-24 18:26:30 +03:00
  • c148cf1946 cmake : use LLAMA_BUILD_NUMBER when defining LLAMA_INSTALL_VERSION (#14362) Mathieu Baudier 2025-06-24 15:05:31 +02:00
  • 1b809cee22 server : move no API key doc to /health (#14352) Nigel Bosch 2025-06-24 08:59:11 +00:00
  • abf241045d main : honor --verbose-prompt on interactive prompts (#14350) Sigbjørn Skjæret 2025-06-24 09:31:00 +02:00
  • 901e20bbe5 jinja : Add Mistral-Small-3.2-24B-Instruct-2506.jinja (#14349) Bartowski 2025-06-24 02:17:58 -04:00
  • 0142961a2e CUDA/HIP: optimize mmv paths taken for HIP devices (#14324) uvos 2025-06-24 01:12:56 +02:00
  • ce82bd0117 ci: add workflow for relocatable cmake package (#14346) bandoti 2025-06-23 15:30:51 -03:00
  • bf2a99e3cb vulkan: update windows SDK in release.yml (#14344) Jeff Bolz 2025-06-23 08:44:48 -05:00
  • 72c6bc3f3d llama : better rwkv chat template and add missing inputs.use_jinja setting (#14336) Molly Sophia 2025-06-23 19:56:19 +08:00
  • defe2158dd CUDA: mul_mat_v support for batch sizes > 1 (#14262) Johannes Gäßler 2025-06-23 13:11:31 +02:00
  • 7b50d589a8 kv-cells : fix tracking of seq_pos (#14339) Georgi Gerganov 2025-06-23 12:27:35 +03:00
  • 3a9457df96 vulkan: update windows SDK in CI (#14334) Jeff Bolz 2025-06-23 03:19:24 -05:00
  • fa4a9f2a1c quantize : handle user-defined pruning of whole layers (blocks) (#13037) Ed Addario 2025-06-22 22:16:26 +01:00
  • 238005c2dc gguf-py : fix SpecialVocab parsing when post_processor is null (#14330) Sigbjørn Skjæret 2025-06-22 19:46:17 +02:00
  • 66aba7aca9 run : avoid double tokenization (#14327) Ruikai Peng 2025-06-23 01:28:06 +08:00
  • f1f5e82df6 examples : fix is_first logic for tokenization (#14329) Georgi Gerganov 2025-06-22 20:10:07 +03:00
  • af3373f1ad HIP: enable vec fattn on RDNA4 (#14323) uvos 2025-06-22 16:51:23 +02:00
  • 5d5c066de8 mtmd : fix Pixtral OOM with large images by capping image_size to 1024 (#14326) yuiseki 2025-06-22 21:44:57 +09:00
  • 40bfa04c95 common : use std::string_view now that we target c++17 (#14319) Sigbjørn Skjæret 2025-06-22 07:37:43 +02:00
  • aa064b2eb7 CUDA: add mean operation (#14313) Aman Gupta 2025-06-22 12:39:54 +08:00
  • aa0ef5c578 gguf-py : fix Qwen3-Embedding eos token (#14314) Sigbjørn Skjæret 2025-06-21 18:12:05 +02:00
  • bb16041cae Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792) Markus Tavenrath 2025-06-21 08:17:12 +02:00
  • 58cba76a9a gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312) Sigbjørn Skjæret 2025-06-21 07:33:21 +02:00
  • 67ae5312e2 metal : fix thread-safety (#14300) Georgi Gerganov 2025-06-21 08:04:18 +03:00
  • 692e3cdd0a memory : rename interface to llama_memory_context_i (#14296) Georgi Gerganov 2025-06-21 08:03:46 +03:00
  • b23fa0b3f4 convert : fix Llama 4 conversion (#14311) Daniel Han 2025-06-20 21:32:01 -07:00
  • 06cbedfca1 sync : ggml Georgi Gerganov 2025-06-20 20:50:24 +03:00
  • b7147673f2 Add ggml_roll (ggml/1274) Acly 2025-06-18 13:34:50 +02:00
  • d860dd99a4 docs : fix the link to llama.h (#14293) David Chiu 2025-06-21 01:43:35 +08:00
  • c959f462a0 CUDA: add conv_2d_transpose (#14287) Aman Gupta 2025-06-20 22:48:24 +08:00
  • 22015b2092 lint : remove trailing whitepace (#14304) Sigbjørn Skjæret 2025-06-20 16:37:44 +02:00
  • dd6e6d0b6a vocab : prevent tokenizer overflow (#14301) Ruikai Peng 2025-06-20 22:13:06 +08:00
  • 8308f98c7f sycl: add usage of enqueue_functions extension (#14244) Nicolò Scipione 2025-06-20 15:07:21 +02:00
  • 6369be0735 Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286) Christian Kastner 2025-06-20 12:17:32 +00:00
  • 88fc854b4b llama : improve sep token handling (#14272) Sigbjørn Skjæret 2025-06-20 14:04:09 +02:00
  • e28c1b93fd cuda : synchronize graph capture and cublas handle destruction (#14288) Diego Devesa 2025-06-20 04:57:36 -07:00
  • d27b3ca175 ggml : fix repack work size for mul_mat_id (#14292) Georgi Gerganov 2025-06-20 11:19:15 +03:00
  • 9230dbe2c7 ggml: Update KleidiAI to v1.9.0 (#14277) Charles Xu 2025-06-20 09:51:01 +02:00
  • 812939a9e9 model : more uniform output id handling (#14275) Georgi Gerganov 2025-06-20 10:50:27 +03:00
  • 4c9fdfbe15 ubatch : new splitting logic (#14217) Georgi Gerganov 2025-06-20 10:14:14 +03:00
  • 9eaa51e7f0 CUDA: add conv_2d_dw (#14265) Aman Gupta 2025-06-20 09:50:24 +08:00
  • 8f71d0f3e8 ggml-cpu : remove unnecesary arm feature detection (#14281) Diego Devesa 2025-06-19 12:24:14 -07:00
  • 381174bbda gguf-py : make sentencepiece optional (#14200) Alex Trotta 2025-06-19 09:56:12 -04:00
  • d67341dc18 server : add server parameters for draft model cache type (#13782) aa956 2025-06-19 16:01:03 +03:00
  • 456af35eb7 build : suppress gcc15 compile warnings (#14261) fanyang 2025-06-19 20:49:48 +08:00
  • 600e3e9b50 sycl: Cleanup codepaths in Get Rows in sycl backend (#14215) Anton Mitkov 2025-06-19 11:40:21 +01:00
  • fffcce535e llama-bench : add --no-warmup flag (#14224) (#14270) bashayer hijji 2025-06-19 13:24:12 +03:00
  • 5fc7856815 convert : fix remote option in Windows (#14100) pqnet 2025-06-19 12:21:40 +02:00