Commit Graph

  • 06332e2867 llama-batch: fix build fails with -Werror=missing-braces (#16614) takuya kodama 2025-10-20 16:27:09 +08:00
  • 72d53e6c4d readme: update bindings (#16651) Ron Evans 2025-10-20 10:20:04 +02:00
  • 2330de7b84 SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators (#16613) safranowith 2025-10-20 11:08:32 +03:00
  • 7062dd8460 llama-context: only warn on pooling_type when user specified (#16674) takuya kodama 2025-10-20 15:44:21 +08:00
  • 0398752dd4 model : add Granite Hybrid types (#16635) Giuseppe Scrivano 2025-10-19 23:54:31 +02:00
  • 4f73d0a951 ci : fix binaries release failure for s390x (binaries may not work yet) (#16664) Aaron Teo 2025-10-20 05:06:39 +08:00
  • cec5edbcae ci : avoid manual updates of docs/ops.md (#16663) Sigbjørn Skjæret 2025-10-19 14:03:25 +02:00
  • fcb235b466 ci: include s390x release binaries (#16648) Aaron Teo 2025-10-19 18:37:47 +08:00
  • 55754bebd5 CODEOWNERS: update for ggml-cuda/mmf (#16660) Aman Gupta 2025-10-19 15:37:12 +08:00
  • ee09828cb0 HIP: fix GPU_TARGETS (#16642) Johannes Gäßler 2025-10-18 14:47:32 +02:00
  • e56abd2098 vulkan: Implement topk_moe fused shader, ported from CUDA (#16641) Jeff Bolz 2025-10-18 05:22:57 -05:00
  • 38355c6c8e CUDA: use registers instead of smem in topk-moe (#16647) Aman Gupta 2025-10-18 17:52:53 +08:00
  • 81387858f1 opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602) Shawn Gu 2025-10-17 17:55:32 -07:00
  • 66b0dbcb2d llama-model: fix inconsistent ctxs <-> bufs order (#16581) Johannes Gäßler 2025-10-17 17:41:09 +02:00
  • 41386cf365 rpc : report actual free memory (#16616) Radoslav Gerganov 2025-10-17 18:02:52 +03:00
  • 3d4e86bbeb vulkan: Add State Space Model (SSM) Operations Support (#16463) Giuseppe Scrivano 2025-10-17 14:23:47 +02:00
  • 342c728d03 ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629) muggle-stack 2025-10-17 18:01:23 +08:00
  • ababae7e1e webui: reorganize settings layout (#16607) Pascal 2025-10-17 10:35:03 +02:00
  • b19491599d vulkan: fix debug build (add_rms_len/data not found) (#16624) Jeff Bolz 2025-10-17 02:31:04 -05:00
  • 9ad4f1931e metal : add CONV_TRANSPOSE_2D (#16542) Ilia Ilmer 2025-10-17 02:33:58 -04:00
  • 79967ec596 grammar : use int64_t to avoid int overflows in int schema to grammar conversion logic (#16626) Olivier Chafik 2025-10-17 06:59:31 +01:00
  • ceff6bb253 SYCL SET operator optimized for F32 tensors (#16350) GittyBurstein 2025-10-17 05:36:40 +03:00
  • 1bb4f43380 mtmd : support home-cooked Mistral Small Omni (#14928) Xuan-Son Nguyen 2025-10-16 19:00:31 +02:00
  • 683fa6ba4e fix: added a normalization step for MathJax-style \[\] and \(\) delimiters (#16599) Pascal 2025-10-16 16:28:41 +02:00
  • b22572e97d sycl : add ARANGE operator (#16362) GittyBurstein 2025-10-16 16:26:21 +03:00
  • 7a50cf388a CANN: format code using .clang-format (#15863) Chenguang Li 2025-10-16 16:41:11 +08:00
  • 6f5d924637 common : Update the docs on -t --threads (#16236) takasurazeem 2025-10-16 01:11:33 -04:00
  • adc9b60f19 ggml-cpu: replace putenv with setenv for const-correctness (#16573) takuya kodama 2025-10-16 13:10:32 +08:00
  • ee50ee1ead SYCL: Add GGML_OP_MEAN operator support (#16009) yael-works 2025-10-16 07:21:28 +03:00
  • 7adc79c032 gguf-py : add support for endian conversion of BF16 data (#16594) Aleksei Nikiforov 2025-10-15 22:43:08 +02:00
  • 466c1911ab cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators (#16083) safranowith 2025-10-15 22:24:51 +03:00
  • 0cb7a0683b opencl: add q8_0 mm support (#16469) lhez 2025-10-15 10:51:04 -07:00
  • d93f8439b0 opencl: fix FA for f32 (#16584) lhez 2025-10-15 10:48:28 -07:00
  • f9fb33f263 Add server-driven parameter defaults and syncing (#16515) Aleksander Grygier 2025-10-15 16:22:20 +02:00
  • f4ce81c45e metal: optimise GGML_OP_SUM (#16559) Sam/Samuel 2025-10-15 23:05:56 +09:00
  • 17304cbcc1 server : fix img token logs (#16595) Georgi Gerganov 2025-10-15 16:53:12 +03:00
  • 3e3cb19f64 llama-quant: add support for mmproj (#16592) Xuan-Son Nguyen 2025-10-15 14:48:08 +02:00
  • 5acd455460 CUDA: Changing the CUDA scheduling strategy to spin (#16585) Julius Tischbein 2025-10-15 13:54:15 +02:00
  • 554fd578a5 server : fix mtmd checkpoints (#16591) Georgi Gerganov 2025-10-15 12:51:27 +03:00
  • fa882fd2b1 metal : avoid using Metal's gpuAddress property (#16576) Georgi Gerganov 2025-10-14 20:33:05 +03:00
  • ffa059034c vulkan: Add ACC_TYPE_VEC2 implementation (#16203) SavicStefan 2025-10-14 19:18:05 +02:00
  • 120bf7046d CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion (#16577) Aman Gupta 2025-10-14 22:48:08 +08:00
  • 4258e0cfe7 vulkan: Support FA with K/V in F32 (#16543) Jeff Bolz 2025-10-14 08:53:37 -05:00
  • 7ea15bb64c vulkan: Improve build time for MSVC (#16545) Jeff Bolz 2025-10-14 07:51:36 -05:00
  • 9c7185dd28 CUDA: enable FA for FP32 KV cache (#16546) Johannes Gäßler 2025-10-14 14:22:47 +02:00
  • 1ee9d0b415 CUDA: use fastdiv + ggml_cuda_mad for mmvf (#16557) Aman Gupta 2025-10-14 19:16:21 +08:00
  • 48e2fa9fb7 CUDA: add fp kernel for larger batch size MoE (#16512) Aman Gupta 2025-10-14 19:15:15 +08:00
  • 5b6913c47b cuda : remove legacy copy-op pointer indirection code (#16485) Anav Prasad 2025-10-14 09:53:49 +00:00
  • bc07349a7f server : dynamic token limit for prompt cache (#16560) Georgi Gerganov 2025-10-14 08:48:50 +03:00
  • e60f241eac metal : FA support F32 K and V and head size = 32 (#16531) Georgi Gerganov 2025-10-13 23:07:57 +03:00
  • e38b7c6e9e graph : support cacheless embeddings with FA and iSWA (#16528) Georgi Gerganov 2025-10-13 22:42:37 +03:00
  • 5016b72862 opencl: fix build targeting CL 2 (#16554) lhez 2025-10-13 11:50:37 -07:00
  • 7049736b2d CUDA: fix numerical issues in tile FA kernel (#16540) Johannes Gäßler 2025-10-13 16:29:45 +02:00
  • 01d2bdc2bc ggml : fix build broken with -march=armv9-a on MacOS (#16520) Jie Fu (傅杰) 2025-10-13 20:48:47 +08:00
  • 56fc38b965 CANN: fix CPU memory leak in CANN backend (#16549) Chenguang Li 2025-10-13 17:01:24 +08:00
  • 1fb9504eb7 fix: add remark plugin to render raw HTML as literal text (#16505) Pascal 2025-10-13 10:55:32 +02:00
  • 3f750f8d76 metal: add support for opt_step_sgd (#16539) Sam/Samuel 2025-10-13 16:25:02 +08:00
  • c515fc5771 ggml : fix scalar path for computing norm (#16558) Georgi Gerganov 2025-10-13 11:22:27 +03:00
  • f9bc66c3eb CANN: Update several operators to support FP16 data format (#16251) hipudding 2025-10-13 08:52:22 +08:00
  • a31cf36ad9 metal : add opt_step_adamw and op_sum (#16529) Sam/Samuel 2025-10-13 02:43:14 +08:00
  • 81d54bbfd5 webui: remove client-side context pre-check and rely on backend for limits (#16506) Pascal 2025-10-12 18:06:41 +02:00
  • c7be9febcb [SYCL] fix UT fault cases: count-equal, argsort, pad OPs (#16521) Neo Zhang Jianyu 2025-10-12 21:53:35 +08:00
  • 8415f61e23 ci : add Vulkan on Ubuntu with default packages build (#16532) Mathieu Baudier 2025-10-12 15:48:03 +02:00
  • 2c301e91ab common : handle unicode during partial json parsing (#16526) Aldehir Rojas 2025-10-12 08:18:47 -05:00
  • 4b2dae383d common : update presets (#16504) Georgi Gerganov 2025-10-12 09:29:13 +03:00
  • 41aac5c69b ggml : Fix FP16 ELU positive branch (#16519) sirus20x6 2025-10-12 00:25:37 -05:00
  • a2fba89a42 hparams : add check for layer index in is_recurrent (#16511) Daniel Bevenius 2025-10-12 07:19:06 +02:00
  • 20cc625edc ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (#16518) sirus20x6 2025-10-12 00:15:00 -05:00
  • 11f0af5504 CUDA: faster tile FA, add oob checks, more HSs (#16492) Johannes Gäßler 2025-10-11 20:54:32 +02:00
  • a3cb04744f metal : fix mul-mm condition + fix mul-mv permuted kernels (#16494) Georgi Gerganov 2025-10-11 16:54:10 +03:00
  • 4a8fbe0a5e feat: render user content as markdown option (#16358) Pascal 2025-10-11 15:50:49 +02:00
  • 31d0ff1869 server / ranking : add sorting and management of top_n (#16403) Yann Follet 2025-10-11 21:39:04 +08:00
  • 97870e6497 cuda : avoid initializing unused devices (#16510) Diego Devesa 2025-10-11 04:02:26 -07:00
  • 477a66b035 convert : correctly handle LLaMA tokenizer for Jamba (#16470) amirai21 2025-10-11 11:33:41 +03:00
  • e60f01d941 server : fix division by zero when reporting stats (#16501) Georgi Gerganov 2025-10-10 22:15:05 +03:00
  • 81086cd6a3 vocab : mark EOT token for Granite models (#16499) Georgi Gerganov 2025-10-10 17:17:31 +03:00
  • 68ee98ae18 server : return HTTP 400 if prompt exceeds context length (#16486) Radoslav Gerganov 2025-10-10 17:11:07 +03:00
  • cdb6da468c server : log requests to /v1/completions (#16495) Radoslav Gerganov 2025-10-10 13:22:27 +03:00
  • 6d69ab3f26 cmake : Dont define XOPENSOURCE on AIX (#16481) Prajwal B Mehendarkar 2025-10-10 13:45:46 +05:30
  • 1faa13a118 webui: updated the chat service to only include max_tokens in the req… (#16489) Pascal 2025-10-09 22:54:57 +02:00
  • 1deee0f8d4 cpu : optimize the ggml NORM operation (#15953) duduta 2025-10-09 22:11:15 +03:00
  • d00cbea63c server : host-memory prompt caching (#16391) Georgi Gerganov 2025-10-09 18:54:51 +03:00
  • 8328fd4bae No markdown in cot (#16483) Pascal 2025-10-09 17:36:29 +02:00
  • 56b4795842 model-conversion : add support for SentenceTransformers (#16387) Daniel Bevenius 2025-10-09 14:35:22 +02:00
  • 2c0d875ae6 ci: add ARM64 Kleidiai build and test support (#16462) sudhiarm 2025-10-09 09:13:18 +01:00
  • aa4711d369 CANN: Improve ACL graph matching (#16166) Chenguang Li 2025-10-09 15:50:25 +08:00
  • d80d6d2400 kleidiai: kernel interface refactoring (#16460) Charles Xu 2025-10-09 09:29:17 +02:00
  • b260213755 [SYCL] refactor soft_max, add soft_max_back (#16472) Neo Zhang Jianyu 2025-10-09 15:25:11 +08:00
  • e08db42595 model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (#16367) Saba Fallah 2025-10-09 08:39:18 +02:00
  • 12bbc3fa50 refactor: centralize CoT parsing in backend for streaming mode (#16394) Pascal 2025-10-08 22:18:41 +02:00
  • 9d0882840e Disable CUDA host buffers on integrated GPUs (#16308) ai-fonsi 2025-10-08 20:21:46 +02:00
  • d2ee056e1d server : fix cancel pending task (#16467) issixx 2025-10-08 17:20:18 +09:00
  • b2c08c9ec4 metal : mark FA blocks (#16372) Georgi Gerganov 2025-10-08 10:57:53 +03:00
  • 7fdd16b432 server : improve context checkpoint logic (#16440) Georgi Gerganov 2025-10-08 10:57:29 +03:00
  • 74b8fc17f9 ggml webgpu: profiling, CI updates, reworking of command submission (#16452) Reese Levine 2025-10-07 13:48:56 -07:00
  • aeaf8a36f0 llama : support LiquidAI LFM2-MoE hybrid model (#16464) Tarek Dakhran 2025-10-07 20:03:35 +02:00
  • df1b612e29 server : add /v1/health endpoint (#16461) Georgi Gerganov 2025-10-07 15:57:14 +03:00
  • 4e0388aa8a webui : added download action (#13552) (#16282) Sascha Rogmann 2025-10-07 11:11:08 +02:00
  • ef4c5b87ea presets : fix pooling param for embedding models (#16455) Georgi Gerganov 2025-10-07 10:32:32 +03:00
  • c61ae20d05 rpc : update documentation (#16441) Radoslav Gerganov 2025-10-07 09:59:13 +03:00