Commit Graph

  • 01e8f2138b ggml-vulkan: remove unused find_program(glslc) (#12416) Guus Waals 2025-03-18 00:35:43 +08:00
  • 484a8ab513 vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (#12312) Jeff Bolz 2025-03-17 09:26:18 -05:00
  • cf2270e4d3 vulkan: subgroup size tuning (#12087) Daniele 2025-03-17 12:42:33 +01:00
  • f07690c930 vulkan: use fp32 in coopmat2 q4_k dequant function (#12309) Jeff Bolz 2025-03-17 04:43:35 -05:00
  • 891c63956d vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (#12273) Jeff Bolz 2025-03-17 04:41:59 -05:00
  • 2f21123c1d vulkan: Adjust coopmat2 tile sizes and selection heuristic (#12258) Jeff Bolz 2025-03-17 04:35:00 -05:00
  • 374101fd74 cmake : enable building llama.cpp using system libggml (#12321) Christian Kastner 2025-03-17 10:05:23 +01:00
  • b3c9a65673 SYCL: set extras only on GGML_TYPE_Q4_0 (#12366) Akarshan Biswas 2025-03-17 07:15:12 +05:30
  • 8ba95dca20 llama : fix OLMo-2-0325-32B-Instruct K-norm size (#12400) Sigbjørn Skjæret 2025-03-16 18:46:36 +01:00
  • dc079cfdff context : fix init of n_outputs (#12397) Georgi Gerganov 2025-03-16 19:29:36 +02:00
  • 7b61bcc87c ci : add --symlinks to xcframework zip command (#12409) Daniel Bevenius 2025-03-16 18:22:05 +01:00
  • f4c3dd5daa llama-tts : add '-o' option (#12398) marcoStocchi 2025-03-15 17:23:11 +01:00
  • 3d35d87b41 SYCL: Delete redundant plus sign and space (#12391) aubreyli 2025-03-15 22:49:03 +08:00
  • b19bd064c0 SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (#12399) fairydreaming 2025-03-15 15:19:30 +01:00
  • 92a391327e [CANN]MUL_MAT optimization (#12382) Chenguang Li 2025-03-15 09:31:08 +08:00
  • 9f2250ba72 Add CLI arg to llama-run to adjust the number of threads used (#12370) Eric Curtin 2025-03-14 16:41:20 +00:00
  • 774973b8f3 main : add -sysf / --system-prompt-file (#12249) (#12250) Sigbjørn Skjæret 2025-03-14 16:57:05 +01:00
  • 8fcb563613 Load all MoE experts during warmup (#11571) fairydreaming 2025-03-14 13:47:05 +01:00
  • add2a3aa5a server: fix "--grammar-file" parameter (#12285) Victor 2025-03-14 11:21:17 +01:00
  • c522ce4143 graph : simplify attn input build for unified KV cache (#12381) Georgi Gerganov 2025-03-14 10:47:44 +02:00
  • 081bee8c64 hparams : add SWA rope parameters (#12374) Georgi Gerganov 2025-03-14 09:03:24 +02:00
  • 84d5475541 llama : fix Gemma3 SWA KV cache shift (#12373) Georgi Gerganov 2025-03-13 19:08:07 +02:00
  • be7c303410 arg : no n_predict = -2 for examples except for main and infill (#12364) Xuan-Son Nguyen 2025-03-13 12:34:54 +01:00
  • e0dbec0bc6 llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) Georgi Gerganov 2025-03-13 12:35:44 +02:00
  • 2048b5913d server : fix crash when using verbose output with input tokens that are not in printable range (#12178) (#12338) Ishaan Gandhi 2025-03-13 06:10:05 -04:00
  • f08f4b3187 Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support (#12301) Oscar Barenys 2025-03-12 20:06:58 +01:00
  • 80a02aa858 llama.swiftui : fix xcframework dir in README [no ci] (#12353) Daniel Bevenius 2025-03-12 13:45:32 +01:00
  • 363f8c5d67 sycl : variable sg_size support for mmvq kernels (#12336) Alberto Cabrera Pérez 2025-03-12 09:57:32 +00:00
  • 34c961b181 CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315) uvos 2025-03-12 10:14:11 +01:00
  • 7841fc723e llama : Add Gemma 3 support (+ experimental vision capability) (#12343) Xuan-Son Nguyen 2025-03-12 09:30:24 +01:00
  • bf69cfe62f vulkan: fix bug in coopmat1 mul_mat_id (#12316) Jeff Bolz 2025-03-12 00:59:19 -05:00
  • 10f2e81809 CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (#12177) uvos 2025-03-11 20:16:03 +01:00
  • ba7654380a ggml-backend : fix backend search path (#12330) jklincn 2025-03-11 21:25:17 +08:00
  • 6ab2e4765a metal : Cache the Metal library at the device context level (#12265) BB-fat 2025-03-11 19:45:02 +08:00
  • 96e1280839 clip : bring back GPU support (#12322) Xuan-Son Nguyen 2025-03-11 09:20:16 +01:00
  • 2c9f833d17 mat vec double buffer (#12188) Eve 2025-03-10 19:28:11 +00:00
  • 251364549f musa: support new arch mp_31 and update doc (#12296) R0CKSTAR 2025-03-11 01:18:25 +08:00
  • 8acdacb3ea opencl: use OpenCL C standard supported by the device (#12221) Henry Linjamäki 2025-03-10 18:57:00 +02:00
  • 89b2b56e86 readme: added Sidekick to available UIs (#12311) John Bean 2025-03-10 22:13:09 +08:00
  • e128a1bf5b tests : fix test-quantize-fns to init the CPU backend (#12306) Georgi Gerganov 2025-03-10 14:07:15 +02:00
  • 6ef79a67ca common : refactor '-o' option (#12278) marcoStocchi 2025-03-10 12:34:13 +01:00
  • 4e39a3c332 server: extract <think> tags from qwq outputs (#12297) Olivier Chafik 2025-03-10 10:59:03 +00:00
  • be421fc429 tool-call: ensure there's always a non-empty tool call id (#12292) Olivier Chafik 2025-03-10 09:45:29 +00:00
  • 87c2630546 allow missing content in message if tool_calls provided (#12293) Olivier Chafik 2025-03-10 09:45:07 +00:00
  • 2b3a25c212 sampler: fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291) Olivier Chafik 2025-03-10 09:44:42 +00:00
  • 8352cdc87b llava : fix bug in minicpm-v code (#11513) tc-mb 2025-03-10 16:33:24 +08:00
  • 1e2f78a004 server : add speculative decoding presets for FIM (#12287) Georgi Gerganov 2025-03-09 19:08:20 +02:00
  • 0fd7ca7a21 authors : update (#12271) Georgi Gerganov 2025-03-08 18:26:00 +02:00
  • 6fefc05a7a ggml-backend : make path_str compatible with C++20 (#12269) Jason C.H 2025-03-09 00:02:39 +08:00
  • 7ab364390f server : infill gen ends on new line (#12254) Georgi Gerganov 2025-03-07 20:54:30 +02:00
  • 7c7f3b7f43 ggml : skip intermediate .air file when compiling .metallib (#12247) Daniel Bevenius 2025-03-07 14:15:27 +01:00
  • 102ac1891d sync : ggml Georgi Gerganov 2025-03-07 14:00:27 +02:00
  • d6ae2fa061 ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118) vmobilis 2025-03-07 11:11:40 +03:00
  • 68d0027f3d ggml-cpu: faster AVX2 variant for IQ1_M (#12216) Rémy O 2025-03-07 12:54:22 +01:00
  • ea002810a2 ci : fix save-load test invocations (#12245) Georgi Gerganov 2025-03-07 12:19:31 +02:00
  • 8fad3c7a7c server : Log original chat template parsing error (#12233) Sigbjørn Skjæret 2025-03-07 11:15:33 +01:00
  • 7cf64f6bee sync: minja - support QwQ-32B (#12235) Olivier Chafik 2025-03-07 09:33:37 +00:00
  • 5e2d57b2b2 metal : simplify kernel arguments using a struct (#3229) (#12194) BB-fat 2025-03-07 15:35:57 +08:00
  • f1648e91cf HIP: fix rocWMMA build flags under Windows (#12230) David Huang 2025-03-07 15:06:08 +08:00
  • d6c95b0740 metal : fix default.metallib build (#12224) Daniel Bevenius 2025-03-07 06:23:16 +01:00
  • d76a86d967 opencl: Noncontiguous norm, rms_norm, disable fp16 for some ops (#12217) lhez 2025-03-06 16:20:35 -08:00
  • 776f9e59cc cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (#12094) xiaofei 2025-03-07 06:58:25 +08:00
  • 3d652bfddf readme : update bindings (#12229) Lucas Moura Belo 2025-03-06 16:15:13 -03:00
  • 5220a16d18 CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (#12222) Johannes Gäßler 2025-03-06 18:45:09 +01:00
  • 3ffbbd5ce1 HIP: rocWMMA documentation and enabling in workflow builds (#12179) David Huang 2025-03-06 21:14:11 +08:00
  • 42994048a3 update function-calling.md w/ template override for functionary-small-v3.2 (#12214) Olivier Chafik 2025-03-06 09:03:31 +00:00
  • e9b2f84f14 llava: add big-endian conversion for image encoder (#12218) Aaron Teo 2025-03-06 16:33:21 +08:00
  • e721c05c93 HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. (#12209) uvos 2025-03-06 08:20:52 +01:00
  • 57b6abf85a android : fix KV cache log message condition (#12212) Han Yin 2025-03-05 22:22:49 -08:00
  • 94bb63e4f0 opencl : fix buffer alignment (#12197) Henry Linjamäki 2025-03-06 03:33:40 +02:00
  • f79243992c opencl : fix ulong kernel args were set from int variables (#12174) Henry Linjamäki 2025-03-06 03:31:14 +02:00
  • ed4ce0dda2 opencl : fix profile-related errors (#12095) simon886212 2025-03-06 09:30:05 +08:00
  • 07d1572347 ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (#12154) Rémy O 2025-03-06 02:26:10 +01:00
  • 5e43f104cc SYCL: Disable f16 Unary OPs as not supported by the kernels (#12201) Akarshan Biswas 2025-03-05 21:28:23 +05:30
  • 16e4b22c5e ggml : fix GGMLMetalClass ODR (#12200) Plamen Minev 2025-03-05 17:16:01 +02:00
  • 074c4fd39d ci : add fetch-depth to xcframework upload (#12195) Daniel Bevenius 2025-03-05 14:16:40 +01:00
  • 669912d9a5 tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) Olivier Chafik 2025-03-05 13:05:13 +00:00
  • fa31c438e0 ci : fix xcframework artifact tag (#12191) Daniel Bevenius 2025-03-05 10:22:29 +01:00
  • 3ccbfe5a71 ci : remove xcframework upload (#12190) Daniel Bevenius 2025-03-05 08:34:02 +01:00
  • 06a92a193a server : fix cache reuse logic (#12161) Clauszy 2025-03-05 15:25:45 +08:00
  • a057897ad4 llama : add xcframework build script (#11996) Daniel Bevenius 2025-03-05 06:30:31 +01:00
  • 5bbe6a9fe9 ggml : portability fixes for VS 2017 (#12150) mgroeber9110 2025-03-04 17:53:26 +01:00
  • 20a9b8f5e1 readme : fix roadmap link (#12185) Georgi Gerganov 2025-03-04 18:42:44 +02:00
  • 56d7a9f812 main: allow preloading conversation with -p and add -st / --single-turn (#12145) Sigbjørn Skjæret 2025-03-04 17:19:39 +01:00
  • 1a24c4621f server: fix deadly typo in response_format.json_schema.schema handling (#12168) Olivier Chafik 2025-03-04 06:24:07 +00:00
  • becade5de7 HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (#12032) David Huang 2025-03-04 05:10:54 +08:00
  • dfd6b2c0be sync : ggml Georgi Gerganov 2025-03-03 17:57:38 +02:00
  • b64d7cc272 cuda: unary ops as float + de-duplicate (ggml/1130) cmdr2 2025-03-03 20:51:31 +05:30
  • 3d1cf3cf33 sync : ggml Georgi Gerganov 2025-02-28 12:37:35 +02:00
  • 0cbee131ad cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129) cmdr2 2025-02-28 12:36:46 +02:00
  • 8371d44595 sync : ggml Georgi Gerganov 2025-02-28 09:09:58 +02:00
  • 87abb7e903 cuda/cpu: Increase support for fp16 unary operations (ggml/1125) cmdr2 2025-02-28 12:34:39 +05:30
  • 6d4c23b81b whisper : support GGML_BACKEND_DL (whisper/2843) Diego Devesa 2025-02-27 13:35:07 +01:00
  • 6512a90037 cmake : fix compile assumptions for power9/etc (whisper/2777) midnight 2025-02-05 04:41:10 -08:00
  • 4512055792 Told cmake to install ggml-cpp.h as a public header file. (ggml/1126) petterreinholdtsen 2025-02-26 21:44:00 +01:00
  • f54a4ba11e Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121) cmdr2 2025-02-25 18:06:34 +05:30
  • aede2074f6 scripts : sync-ggml-am.sh fix Georgi Gerganov 2025-02-28 09:09:38 +02:00
  • 2679c3b55d ci : set GITHUB_ACTION env var for server tests (#12162) Daniel Bevenius 2025-03-03 16:17:36 +01:00
  • c43af9276b tts: add speaker file support (#12048) dm4 2025-03-03 21:09:29 +08:00
  • d5c63cd7f9 test-backend-ops : add option -p to filter by op params (#12155) Diego Devesa 2025-03-03 14:00:46 +01:00