Commit Graph

  • eb420e1148 sync : ggml Georgi Gerganov 2025-04-10 23:59:16 +03:00
  • cb79c2e7fa ggml: don't include arm_neon.h when using CUDA 12 with ARM Neon (ggml/1187) cmdr2 2025-04-10 17:53:08 +05:30
  • fe92821ea9 ggml : add bilinear upscale support (ggml/1185) Diego Devesa 2025-04-09 12:32:13 +02:00
  • 459895c326 ggml : add more generic custom op, remove deprecated custom ops (ggml/1183) Diego Devesa 2025-04-09 12:31:34 +02:00
  • e4bf72d631 scripts : fix sync-ggml-am.sh Georgi Gerganov 2025-04-10 23:59:01 +03:00
  • 8b9cc7cdd8 llava : introduce libmtmd (#12849) Xuan-Son Nguyen 2025-04-10 22:57:16 +02:00
  • 64eda5deb9 convert : ability to lazy-load safetensors remotely without downloading to disk (#12820) Xuan-Son Nguyen 2025-04-10 17:24:44 +02:00
  • fe5b78c896 CANN: Support more ops (#12841) Chenguang Li 2025-04-10 08:51:52 +08:00
  • 11d07e1e69 Fixes #12823 (#12830) Prajwal B Mehendarkar 2025-04-10 04:48:01 +05:30
  • b0091ecc1e docker : added all CPU to GPU images (#12749) Rudi Servo 2025-04-09 23:17:12 +00:00
  • 31f7803bc4 ggml-cpu-impl.h: do not redefine bool on POWER9 (#12856) Piotr Kubaj 2025-04-09 23:00:34 +00:00
  • 2391506ace ggml-impl.h: fix build on POWER9 (#12855) Piotr Kubaj 2025-04-09 23:00:25 +00:00
  • d3bd7193ba llama : Support Qwen3 and Qwen3MoE (#12828) Bo Zheng 2025-04-09 17:47:36 +08:00
  • d9a63b2f2e musa: enable freediskspace for docker image build (#12839) R0CKSTAR 2025-04-09 17:22:30 +08:00
  • 8ed71242f4 sycl: update documentation to use -no-cnv (#12845) Romain Biessy 2025-04-09 11:22:04 +02:00
  • 381603a775 ci: detach common from the library (#12827) Plamen Minev 2025-04-09 11:11:11 +03:00
  • 65a69e6e1b clip : do not print ftype (#12832) Xuan-Son Nguyen 2025-04-09 10:09:53 +02:00
  • 47277d6d1d readme : add rpc backend (#12842) Georgi Gerganov 2025-04-09 10:54:42 +03:00
  • 6e1c4cebdb CANN: Support Opt CONV_TRANSPOSE_1D and ELU (#12786) Chenguang Li 2025-04-09 14:04:14 +08:00
  • 0090950f67 vulkan: In coopmat2 mmq, load q4_k/q5_k scales through shared memory (#12833) Jeff Bolz 2025-04-09 00:25:08 -05:00
  • 7ecd780b1a vulkan: Use fp16 for the flash attention P*V multiplication (#12783) Jeff Bolz 2025-04-09 00:12:57 -05:00
  • 7538246e7c cuda : add f32 to bf16 copy op (#12806) Sigbjørn Skjæret 2025-04-08 23:21:31 +02:00
  • b32efad2bc llava: improve clip_ctx destructor to not memleak load_image_size (#12834) Matt Clayton 2025-04-08 16:01:58 -04:00
  • a19b5cef16 llama : fix FA when KV cache is not used (i.e. embeddings) (#12825) Georgi Gerganov 2025-04-08 19:54:51 +03:00
  • 78a1ba0a4f server : fix thread.join() on exit (#12831) Xuan-Son Nguyen 2025-04-08 18:37:06 +02:00
  • 2dabf759e7 llava: add more helper functions to check projector types in clip context (#12824) dm4 2025-04-08 21:49:13 +08:00
  • 1d343b4069 arg : Including limits file on AIX (#12822) Prajwal B Mehendarkar 2025-04-08 18:00:59 +05:30
  • 8ca6e1c3a4 server : webui : Improve Chat Input with Auto-Sizing Textarea (#12785) characharm 2025-04-08 14:14:59 +05:00
  • 656babd6c2 Revert "sycl:remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor" (#12812) Neo Zhang Jianyu 2025-04-08 15:03:21 +08:00
  • a226bc7a9a gguf-py : support lazy tensor splitting (#12809) compilade 2025-04-08 03:03:07 -04:00
  • 1466621e73 llama : Support llama 4 text-only (#12791) Xuan-Son Nguyen 2025-04-07 23:06:44 +02:00
  • 82974011f3 opencl: better identify Adreno GPU (#12760) lhez 2025-04-07 13:22:54 -07:00
  • 4ccea213bc hellaswag: display estimated score confidence interval (#12797) stduhpf 2025-04-07 17:47:08 +02:00
  • 1a1ab7e7a4 cuda : fix HIP and MUSA BF16 (#0) Georgi Gerganov 2025-04-07 13:18:07 +03:00
  • a4e46e28f9 sync : ggml Georgi Gerganov 2025-04-07 12:32:39 +03:00
  • ff067dbcb9 ggml : simplify Arm fp16 CPU logic (ggml/1177) Georgi Gerganov 2025-04-07 12:25:15 +03:00
  • 36ca8b3628 CUDA: don't convert BF16 weights to FP32 (ggml/1174) Sigbjørn Skjæret 2025-04-04 21:05:12 +02:00
  • 995083e4ed cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167) cmdr2 2025-04-02 17:46:16 +05:30
  • 518a01480e sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor (#12734) zhouwg 2025-04-07 23:22:57 +08:00
  • e391d3ee8d ci : no curl on ggml-ci (#12796) Xuan-Son Nguyen 2025-04-07 14:37:28 +02:00
  • bd3f59f812 cmake : enable curl by default (#12761) Xuan-Son Nguyen 2025-04-07 13:35:19 +02:00
  • 52b3d71f12 CANN: fix typo in ggml-cann (#12733) zhouwg 2025-04-07 19:34:14 +08:00
  • d0d5b2232b CANN: Refactor to reduce duplicate code (#12731) hipudding 2025-04-07 17:10:36 +08:00
  • 916c83bfe7 musa: fix compilation warnings in mp_22/31 (#12780) R0CKSTAR 2025-04-06 21:23:54 +08:00
  • 0c74b04376 vulkan: fix NaN issue in flash attention shader (#12776) Jeff Bolz 2025-04-06 04:03:47 -05:00
  • 80b717d493 vulkan: Use unclamped loads for flash attention mask (#12720) Jeff Bolz 2025-04-06 03:47:13 -05:00
  • 6bf28f0111 Vulkan: Tune Vulkan mmq int dot shader for performance (#12767) 0cc4m 2025-04-05 18:04:03 +02:00
  • f1e3eb4249 common : fix includes in arg.cpp and gemma3-cli.cpp (#12766) Sergey Fedorov 2025-04-05 23:46:00 +08:00
  • 0364178ca2 clip : refactor clip_init, add tests (#12757) Xuan-Son Nguyen 2025-04-05 17:17:40 +02:00
  • c6ff5d2a8d common: custom hf endpoint support (#12769) エシュナヴァリシア 2025-04-05 21:31:42 +08:00
  • 7a84777f42 sync: minja (#12739) Olivier Chafik 2025-04-04 13:16:39 -07:00
  • 3e1d29348b kv-cache : simplify + fix warning for recurrent models (#12756) Georgi Gerganov 2025-04-04 21:48:10 +03:00
  • 1be76e4620 ci: add Linux cross-compile build (#12428) bandoti 2025-04-04 14:05:12 -03:00
  • b772394297 server : webui : Upgrade daisyui, tailwindcss. (#12735) Nauful Shaikh 2025-04-04 09:09:52 -05:00
  • 23106f94ea gguf-split : --merge now respects --dry-run option (#12681) nick huang 2025-04-04 22:09:12 +08:00
  • 94148ba330 sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution (#12625) Nicolò Scipione 2025-04-04 16:00:46 +02:00
  • 9ac4d611d0 cmake: fix ggml-shaders-gen compiler paths containing spaces (#12747) Ronny Brendel 2025-04-04 15:12:40 +02:00
  • 348888e0dc docs : add XCFramework section to README.md [no ci] (#12746) Daniel Bevenius 2025-04-04 10:24:12 +02:00
  • 74d4f5b041 vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (#12630) Jeff Bolz 2025-04-04 00:54:35 -05:00
  • 35e592eb30 vulkan: set cmake minimum and project name in vulkan-shaders (#12744) Jeff Bolz 2025-04-04 00:53:20 -05:00
  • 7d7b1bafa7 opencl: update doc for OpenCL (#12702) lhez 2025-04-03 22:18:17 -07:00
  • c262beddf2 CUDA: Prefer vector flash decoding kernel for Gemma models (#12738) Gaurav Garg 2025-04-03 21:50:29 +05:30
  • 5dd5d1ab00 vocab : use string_view::find() to avoid unnecessary looking up beyond the fragment range (#12706) yumeyao 2025-04-03 23:32:54 +08:00
  • 1c059995e0 vulkan: Fix missing cmake logic for dot product extension (#12721) Jeff Bolz 2025-04-03 10:08:26 -05:00
  • 2004644b7a ci : add env variable in ggml-ci and document the same in SYCL.md (#12736) Atharva Dubey 2025-04-03 13:12:39 +01:00
  • 5f696e88e0 sync : minja (inclusionAI/Ling) and update tests (#12699) R0CKSTAR 2025-04-03 19:51:35 +08:00
  • 193c3e03a6 fix MUSA compiler warning (#12704) a3sh 2025-04-03 15:32:55 +08:00
  • 65cfe136a0 CANN: Support operator SIN COS ARGMAX (#12709) Chenguang Li 2025-04-03 15:18:08 +08:00
  • 3f9da22c2b Simplify and improve CUDA graphs through use of indirect copy pointers (#9017) Alan Gray 2025-04-03 02:31:15 +01:00
  • 2a0dc97e56 CANN: Fix failed test cases (#12708) hipudding 2025-04-03 08:49:51 +08:00
  • 97a20c012b opencl: use max_alloc_size in backend ctx instead of querying again (#12705) lhez 2025-04-02 17:01:42 -07:00
  • f01bd02376 vulkan: Implement split_k for coopmat2 flash attention. (#12627) Jeff Bolz 2025-04-02 14:25:08 -05:00
  • 6f3bd38640 cmake: remove caching from vulkan coopmat checks (#12719) bandoti 2025-04-02 14:56:26 -03:00
  • be0a0f8cae vulkan: Implement grouped query attention in the coopmat2 FA shader (#12559) Jeff Bolz 2025-04-02 12:40:32 -05:00
  • 92e3006bb6 Vulkan: Fix mmq int dot float cache size (#12722) 0cc4m 2025-04-02 19:12:30 +02:00
  • 833e2b7409 model : print tensor size during load (#12711) Georgi Gerganov 2025-04-02 16:38:54 +03:00
  • e0e912f49b llama : add option to override model tensor buffers (#11397) Diego Devesa 2025-04-02 14:52:01 +02:00
  • a10b36c91a llama : refactor kv cache guard (#12695) Georgi Gerganov 2025-04-02 14:32:59 +03:00
  • 83a88bd6af vocab : BailingMoE : change possessive quantifiers to greedy (#12677) Sigbjørn Skjæret 2025-04-02 11:21:48 +02:00
  • 42eb248f46 common : remove json.hpp from common.cpp (#12697) Xuan-Son Nguyen 2025-04-02 09:58:34 +02:00
  • 9bacd6b374 [CANN] get_rows and dup optimization (#12671) Chenguang Li 2025-04-02 15:22:13 +08:00
  • 267c1399f1 common : refactor downloading system, handle mmproj with -hf option (#12694) Xuan-Son Nguyen 2025-04-01 23:44:05 +02:00
  • f423981ac8 opencl : fix memory allocation size (#12649) Junil Kim 2025-04-02 01:54:34 +09:00
  • e39e727e9a llama : use LLM_KV_GENERAL_FILE_TYPE instead of gguf_find_key (#12672) jklincn 2025-04-01 20:54:28 +08:00
  • 5936a616e4 convert : BailingMoE : fix qkv split when head_dim is 0 (#12687) Sigbjørn Skjæret 2025-04-01 14:37:13 +02:00
  • 3fd072a540 metal : use F32 prec in FA kernels (#12688) Georgi Gerganov 2025-04-01 14:57:19 +03:00
  • a6f32f0b34 Fix clang warning in gguf_check_reserved_keys (#12686) R0CKSTAR 2025-04-01 19:12:53 +08:00
  • 2bb3597e42 vulkan: fix build when glslc doesn't support coopmat (#12683) Wagner Bruna 2025-04-01 06:38:07 -03:00
  • 8293970542 SYCL: Rename oneMKL to oneMath (#12192) Romain Biessy 2025-04-01 10:24:29 +02:00
  • 8bbf26083d SYCL: switch to SYCL namespace (#12674) Akarshan Biswas 2025-04-01 13:41:39 +05:30
  • 35782aeedb convert : BailingMoE : avoid setting rope_dim to 0 (#12678) Sigbjørn Skjæret 2025-03-31 23:09:48 +02:00
  • c80a7759da vocab : add special infill tokens for CodeLlama (#11850) Daniel Bevenius 2025-03-31 18:40:56 +02:00
  • 250d7953e8 ggml : faster ssm scan (#10558) a3sh 2025-04-01 00:05:13 +08:00
  • 403fbacbbc convert : Qwerky : use lora_rank_tokenshift and lora_rank_decay if present (#12667) Sigbjørn Skjæret 2025-03-31 16:36:25 +02:00
  • a8a1f33567 Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135) 0cc4m 2025-03-31 14:37:01 +02:00
  • 1790e73157 cmake : fix whitespace (#0) Georgi Gerganov 2025-03-31 15:05:30 +03:00
  • 0114a32da0 sync : ggml Georgi Gerganov 2025-03-31 14:59:21 +03:00
  • a7724480fd cmake: improve Vulkan cooperative matrix support checks (whisper/2966) Sandro Hanea 2025-03-31 12:44:36 +02:00
  • 1a85949067 llava : proper description fix (#12668) Sigbjørn Skjæret 2025-03-31 11:28:30 +02:00
  • 6c02a032fa SYCL: Remove misleading ggml_sycl_op_flatten function (#12387) Akarshan Biswas 2025-03-31 14:55:24 +05:30