Commit Graph

  • bbfc849274 SYCL: add ops doc (#14901) Akarshan Biswas 2025-07-27 17:52:58 +05:30
  • ca0ef2dddb llama : clarify comment about pp and tg graphs [no ci] (#14895) Daniel Bevenius 2025-07-27 12:10:51 +02:00
  • 89d1029559 vulkan : add fp16 support for the conv_2d kernel (#14872) Erik Scholz 2025-07-27 12:04:33 +02:00
  • f1a4e72de5 vulkan: skip empty set_rows to avoid invalid API usage (#14860) Jeff Bolz 2025-07-27 04:05:34 -05:00
  • 4762ad7316 model : make rope_yarn_log_mul optional for deepseek2 (#14896) Gabriel Larson 2025-07-27 03:18:37 -05:00
  • 1dc9614e06 llama : fix kq_scale for the attention layers of PLaMo2 (#14892) Shunta Saito 2025-07-27 16:38:44 +09:00
  • 446595b9b3 Docs: add instructions for adding backends (#14889) Aman Gupta 2025-07-27 09:36:43 +08:00
  • 66906cd82a HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (#14624) deepsek 2025-07-26 18:28:14 -04:00
  • 11dd5a44eb CANN: Implement GLU ops (#14884) hipudding 2025-07-26 17:56:18 +08:00
  • 9b8f3c6c77 musa: fix build warnings (unused variable) (#14869) R0CKSTAR 2025-07-26 10:36:02 +08:00
  • c7f3169cd5 ggml-cpu : disable GGML_NNPA by default due to instability (#14880) Aaron Teo 2025-07-26 01:09:03 +08:00
  • 793c0d7f46 metal: SSM_SCAN performance (#14743) Gabe Goodhart 2025-07-25 10:47:39 -06:00
  • ce111d39d6 opencl: add fused rms_norm_mul (#14841) lhez 2025-07-25 08:12:13 -07:00
  • e7fecba934 docs : update HOWTO-add-model.md for ModelBase and new model classes (#14874) wooksong 2025-07-25 23:25:05 +09:00
  • e2b7621e7c ggml : remove invalid portPos specifiers from dot files (#14838) Oliver Simons 2025-07-25 13:29:57 +02:00
  • c1dbea752a context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (#14870) Georgi Gerganov 2025-07-25 14:28:06 +03:00
  • 749e0d27f0 mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (#14503) kiwi 2025-07-25 19:08:04 +08:00
  • 64bf1c3744 rpc : check for null buffers in get/set/copy tensor endpoints (#14868) Chris Rohlf 2025-07-25 06:17:02 -04:00
  • c12bbde372 sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855) Diego Devesa 2025-07-25 01:07:26 -07:00
  • 3f4fc97f1d musa: upgrade musa sdk to rc4.2.0 (#14498) R0CKSTAR 2025-07-25 03:05:37 +08:00
  • 2df255da3c sync : ggml Georgi Gerganov 2025-07-24 18:30:33 +03:00
  • 60f816a79d cmake : fix usage issues (ggml/1257) Kai Pastor 2025-07-22 20:13:21 +02:00
  • 5592f278b6 ggml-cpu : remove stdlib include from repack.cpp (ggml/1276) Daniel Bevenius 2025-07-21 15:53:12 +02:00
  • e4868d16d2 context : perform output reorder lazily upon access after sync (#14853) Georgi Gerganov 2025-07-24 16:31:48 +03:00
  • 820de57d4f chat : fix kimi-k2 chat template (#14852) Xuan-Son Nguyen 2025-07-24 13:59:56 +02:00
  • cb4a63aad6 sycl: fixed semantics of block offset calculation (#14814) Alberto Cabrera Pérez 2025-07-24 11:09:57 +01:00
  • 86f5623d90 llama : fix MiniCPM inference after Granite Four changes (#14850) yummy 2025-07-24 17:50:51 +08:00
  • 39cffdf188 docs: add libcurl-dev install hint for Linux distros (#14801) Pouya 2025-07-24 12:26:44 +03:00
  • 065908cb09 metal : fix fusion across different encoders (#14849) Georgi Gerganov 2025-07-24 10:24:05 +03:00
  • 4ec6291a24 sycl: fix undefined variable in work group size check (#14843) Donghyeon Jeong 2025-07-24 13:50:41 +09:00
  • a12363bbf0 convert : text-only support for GLM-4.1V-9B-Thinking (#14823) jacekpoplawski 2025-07-23 23:23:57 +02:00
  • a86f52b285 CUDA: fix overflow in FA, tune performance (#14840) Johannes Gäßler 2025-07-23 21:43:25 +02:00
  • b284197df4 CUDA: fix compilation with GGML_CUDA_F16 (#14837) Johannes Gäßler 2025-07-23 18:22:30 +02:00
  • 221c0e0c58 ci : correct label refactor->refactoring (#14832) Sigbjørn Skjæret 2025-07-23 14:27:54 +02:00
  • 07a19e27a2 CUDA: fix quantized KV cache + multiple sequences (#14822) Johannes Gäßler 2025-07-23 12:35:53 +02:00
  • 18f3b5ff9e tests : add non-cont K,V FA tests Georgi Gerganov 2025-07-18 13:36:27 +03:00
  • 7233358d29 memory : handle saving/loading null layers in recurrent memory (#14675) l3utterfly 2025-07-23 16:16:41 +08:00
  • 6c88b3bb25 ggml: fix loongarch quantize_row_q8_1 error (#14827) lixing-star 2025-07-23 14:39:51 +08:00
  • 14c28dfc50 CANN: weight format to NZ for Ascend310P3 (#14407) chen fan 2025-07-23 11:58:00 +08:00
  • 8c988fa41d CUDA: add fused rms norm (#14800) Aman Gupta 2025-07-23 09:25:42 +08:00
  • acd6cb1c41 ggml : model card yaml tab->2xspace (#14819) Csaba Kecskemeti 2025-07-22 09:29:43 -07:00
  • 84712b6043 vulkan: fix rms_norm_mul to handle broadcasting dim0 (#14817) Jeff Bolz 2025-07-22 10:35:21 -05:00
  • d4d1522b20 llama : add model type detection for rwkv7 7B&14B (#14816) Molly Sophia 2025-07-22 23:01:29 +08:00
  • d1aa0cc5d1 imatrix: add option to display importance score statistics for a given imatrix file (#12718) Ed Addario 2025-07-22 13:33:37 +01:00
  • c8ade30036 Mtmd: add a way to select device for vision encoder (#14236) stduhpf 2025-07-22 12:51:03 +02:00
  • e28c0b80c2 cuda : implement bf16 cpy ops and enable bf16 cont (#14763) Sigbjørn Skjæret 2025-07-22 12:33:10 +02:00
  • 8e6f8bc875 opencl: remove unreachable return (#14806) lhez 2025-07-21 23:53:30 -07:00
  • adef81781a server : allow setting --reverse-prompt arg (#14799) Molly Sophia 2025-07-22 09:24:22 +08:00
  • 48b86c4fdb cuda: remove linking to cublasLt (#14790) R0CKSTAR 2025-07-22 07:45:26 +08:00
  • 38d3af1b73 opencl: fix im2col when KW!=KH (#14803) Sigbjørn Skjæret 2025-07-21 22:55:10 +02:00
  • 6c9ee3b17e opencl: add conv2d kernel (#14403) rmatif 2025-07-21 19:03:19 +02:00
  • cd465d823c sycl: Fix im2col (#14797) Romain Biessy 2025-07-21 18:39:29 +02:00
  • 922042601b kleidiai: add support for get_rows (#14676) Charles Xu 2025-07-21 15:49:52 +02:00
  • 2ba1333b35 docs : fix backends table in README.md (#14796) Radoslav Gerganov 2025-07-21 15:03:49 +03:00
  • c2e058f1b4 vulkan/cuda: Fix im2col when KW!=KH (#14789) Jeff Bolz 2025-07-21 06:35:40 -05:00
  • c82d48ec23 llama : fix --reverse-prompt crashing issue (#14794) Molly Sophia 2025-07-21 17:38:36 +08:00
  • b4efd77f8a server : add parse_special option to /tokenize endpoint (#14783) IsaacDynamo 2025-07-21 09:24:51 +02:00
  • 2be60cbc27 docs : fix link for tools/perplexity in README.md (#14780) Aman Gupta 2025-07-21 02:13:47 +08:00
  • b526ad2668 Documentation: Further revisions to the Vulkan section in build.md (#14785) rspOverflow 2025-07-20 23:55:32 +07:00
  • 938b785764 Clang-format: local files first + fix BinPacking (#14779) Aman Gupta 2025-07-20 19:42:34 +08:00
  • 36c153248f Contrib: add 0cc4m as codeowner for Vulkan backend (#14775) 0cc4m 2025-07-19 22:47:21 +02:00
  • a979ca22db ggml: adds CONV_2D op and direct GEMM Vulkan implementation (#14316) Ervin Áron Tasnádi 2025-07-19 21:59:08 +02:00
  • 90083283ec imatrix : use GGUF to store importance matrices (#9400) compilade 2025-07-19 12:51:22 -04:00
  • d4b91ea7b2 vulkan: Add logging for bf16 features to ggml_vk_print_gpu_info (#13274) (#14707) Peter0x44 2025-07-19 16:58:03 +01:00
  • 83f5872404 Vulkan: Fix fprintf format-security warning (#14770) 0cc4m 2025-07-19 17:47:53 +02:00
  • f0d4d176df Documentation: Update build.md's Vulkan section (#14736) rspOverflow 2025-07-19 17:18:36 +07:00
  • b17230917c sync : ggml Georgi Gerganov 2025-07-19 11:46:12 +03:00
  • bf9087f59a metal : fuse add, mul + add tests (#14596) Georgi Gerganov 2025-07-18 20:37:26 +03:00
  • 9fb1042ce6 graph : fix graph reuse reset of params (#14760) Georgi Gerganov 2025-07-18 20:08:33 +03:00
  • 2adf8d83ac parallel : add option for different RNG seeds (#14757) Georgi Gerganov 2025-07-18 17:33:41 +03:00
  • 021cc28bef cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (#14741) Oliver Simons 2025-07-18 13:35:32 +02:00
  • d498af3d5a graph : avoid huge warm-up graphs for MoE models (#14753) Georgi Gerganov 2025-07-18 14:31:15 +03:00
  • eacdeb5bfc model : fix build after merge conflict (#14754) Georgi Gerganov 2025-07-18 11:53:55 +03:00
  • e0cb5c5cb8 model : add EXAONE 4.0 support (#14630) lgai-exaone 2025-07-18 17:45:49 +09:00
  • f9a31eea06 CUDA: set_rows + cpy.cu refactor (#14712) Aman Gupta 2025-07-18 14:54:18 +08:00
  • 8f974bc1e9 graph : refactor context to not pass gf explicitly (#14629) Georgi Gerganov 2025-07-18 08:29:28 +03:00
  • 09651d09ff graph : Pass the graph placeholder message in debug mode (#14748) Nexes the Elder 2025-07-18 06:25:54 +02:00
  • 349ea79fce use max work group size for device to replace the magic number (#14732) Neo Zhang Jianyu 2025-07-18 10:23:14 +08:00
  • 670e1360cd convert : fix Ernie4.5 MoE without shared experts (#14746) Piotr Wilkin (ilintar) 2025-07-18 01:17:16 +02:00
  • 760b4484e3 nix : use optionalAttrs for env mkDerivation attrset argument (#14726) Wroclaw 2025-07-18 00:18:16 +02:00
  • cb887f1bc1 model: add Ernie 4.5 MoE support (#14658) Piotr Wilkin (ilintar) 2025-07-17 23:15:32 +02:00
  • d6fb3f6b49 kv-cache : fix k-shift for multiple streams (#14742) Georgi Gerganov 2025-07-17 20:52:33 +03:00
  • 01612b7409 llama : reuse compute graphs (#14482) Georgi Gerganov 2025-07-17 19:08:33 +03:00
  • 086cf81e88 llama : fix parallel processing for lfm2 (#14705) Tarek Dakhran 2025-07-17 09:22:11 +02:00
  • d9b691081c kv-cache : opt mask set input (#14600) Georgi Gerganov 2025-07-17 09:49:15 +03:00
  • ad57d3edd2 batch : fix uninitialized has_cpl flag (#14733) Georgi Gerganov 2025-07-17 09:45:54 +03:00
  • 1ba45d4982 ci : disable failing vulkan crossbuilds (#14723) Sigbjørn Skjæret 2025-07-17 01:52:08 +02:00
  • 19e5943d9e convert : make hf token optional (#14717) Sigbjørn Skjæret 2025-07-16 23:17:43 +02:00
  • 496957e1cb llama : fix parameter order for hybrid memory initialization (#14725) Diner Burger 2025-07-16 15:17:25 -04:00
  • 21c021745d ggml: Add initial WebGPU backend (#14521) Reese Levine 2025-07-16 08:18:51 -07:00
  • b0f0ecc3dc model : support output bias for qwen2 (#14711) tempstudio 2025-07-16 10:02:06 -05:00
  • 225e7a1438 llama : add high-throughput mode (#14363) Georgi Gerganov 2025-07-16 16:35:42 +03:00
  • ab14019821 Support diffusion models: Add Dream 7B (#14644) Aman Gupta 2025-07-16 20:03:51 +08:00
  • 64978340b0 ggml : add asserts (#14720) Georgi Gerganov 2025-07-16 14:43:32 +03:00
  • 6ffd4e9c44 server : pre-calculate EOG logit biases (#14721) Georgi Gerganov 2025-07-16 14:04:12 +03:00
  • e4841d24d3 llama : fix parallel processing for plamo2 (#14716) Shunta Saito 2025-07-16 19:12:22 +09:00
  • 538cc77f7f server : fix handling of the ignore_eos flag (#14710) Georgi Gerganov 2025-07-16 12:13:57 +03:00
  • 5cae766541 scripts: synthetic prompt mode for server-bench.py (#14695) Johannes Gäßler 2025-07-16 09:33:28 +02:00
  • 4b91d6f71f convert : only check for tokenizer folder if we need it (#14704) Sigbjørn Skjæret 2025-07-16 08:52:04 +02:00
  • cf91f217f1 convert : add pre-computed hashes first to prevent order mishaps (#14701) Sigbjørn Skjæret 2025-07-16 08:51:12 +02:00