Commit Graph

  • 31c511a968 CUDA: Volta tensor core support for MMF (#16843) Johannes Gäßler 2025-10-31 15:57:19 +01:00
  • 6d39015a74 sync : ggml Georgi Gerganov 2025-10-31 16:25:50 +02:00
  • 4146d6a1a6 CUDA: add expert reduce kernel (#16857) Aman Gupta 2025-10-31 20:05:07 +08:00
  • 8da3c0e200 batch : fix consistency checks for the input positions (#16890) Georgi Gerganov 2025-10-31 13:50:33 +02:00
  • c22473b580 server : don't print user inputs to console (#16871) Georgi Gerganov 2025-10-31 10:54:19 +02:00
  • 0f715b4e75 server : fix typos in server.cpp comments [no ci] (#16883) Daniel Bevenius 2025-10-31 09:51:26 +01:00
  • d2d931f173 vulkan: disable spirv-opt for rope shaders (#16872) Jeff Bolz 2025-10-31 02:34:47 -05:00
  • 2976b0374d vulkan: Fix crash when FP16 mul_mat accumulation is not supported (#16796) Masato Nakasaka 2025-10-31 16:18:59 +09:00
  • d2a2673dd1 vulkan: fix shmem overrun in mmq id shader (#16873) Ruben Ortlam 2025-10-31 08:14:49 +01:00
  • 13002a0896 ggml-hexagon: respect input size when getting/setting tensor data (#16836) l3utterfly 2025-10-31 12:46:31 +08:00
  • 6eb208d17e ci : enable free-disk-space on cuda docker build (#16877) Sigbjørn Skjæret 2025-10-31 00:34:27 +01:00
  • 9984cbb61d opencl: fix boundary handling for mul_mm (#16875) lhez 2025-10-30 16:00:20 -07:00
  • ce18efeaf1 convert : update transformers requirements (#16866) RodriMora 2025-10-30 23:15:03 +01:00
  • 16724b5b68 server : bump request URI max length to 32768 (#16862) chansikpark 2025-10-30 14:22:23 -04:00
  • b52edd2558 server : remove n_past (#16818) Georgi Gerganov 2025-10-30 18:42:57 +02:00
  • 517b7170e1 cpu: introduce chunking for repack matmuls and enable matmul-id chunking on ARM64 (#16833) Max Krasnyansky 2025-10-30 09:06:13 -07:00
  • 835e918d84 common: fix typo in cli help text (#16864) Shagun Bera 2025-10-30 21:17:31 +05:30
  • d261223d24 model: add support for qwen3vl series (#16780) JJJYmmm 2025-10-30 23:19:14 +08:00
  • dcca0d3ab8 cpu: introduce chunking for flash attention (#16829) Max Krasnyansky 2025-10-30 05:26:05 -07:00
  • bacddc049a model: Add support for CogVLM model (#15002) Tianyue-Zhao 2025-10-30 07:18:50 -04:00
  • 229bf68628 cuda : fix argsort with 64k+ rows (#16849) Sigbjørn Skjæret 2025-10-30 08:56:28 +01:00
  • d7395115ba llama : use std::abs instead of abs (#16853) Jan Boon 2025-10-30 14:30:58 +08:00
  • 052df28b0e vulkan: Handle argsort with a large number of rows (#16851) Jeff Bolz 2025-10-30 01:27:41 -05:00
  • 8b11deea46 Hide latency of bias and gate-loading (#16847) Oliver Simons 2025-10-30 04:34:15 +01:00
  • b9ce940177 vulkan: Fuse rope+set_rows (#16769) Jeff Bolz 2025-10-29 15:13:10 -05:00
  • 3464bdac37 llama: fix ASAN error with M-RoPE (#16848) Xuan-Son Nguyen 2025-10-29 20:11:39 +01:00
  • e3af5563bd llama: store mrope data in KV cell (#16825) Xuan-Son Nguyen 2025-10-29 18:09:18 +01:00
  • 10fcc41290 vulkan: Update topk_moe fusion to handle gpt's late softmax (#16656) Jeff Bolz 2025-10-29 08:44:29 -05:00
  • bcf5bda6f5 Vulkan MMQ Integer Dot Refactor and K-Quant support (#16536) Ruben Ortlam 2025-10-29 14:39:03 +01:00
  • 3eb2be1ca5 Hexagon Op queue & dispatch optimizations (#16820) Max Krasnyansky 2025-10-29 06:29:12 -07:00
  • e41bcce8f0 CUDA: use fastdiv in set-rows (#16834) Aman Gupta 2025-10-29 21:11:53 +08:00
  • 144a4ce824 vendor : sync minja (#16500) Sigbjørn Skjæret 2025-10-29 14:09:50 +01:00
  • f549b0007d vulkan: Call ggml_vk_buffer_write_2d from ggml_vk_buffer_copy (#16793) Jeff Bolz 2025-10-29 03:53:04 -05:00
  • 9a3ea685b9 CUDA: Fix bug in topk-moe for gpt-oss (#16821) Aman Gupta 2025-10-29 15:55:06 +08:00
  • 338074c383 sycl: add RMS_NORM_BACK operation support (#16808) YaelLogic 2025-10-29 08:14:39 +02:00
  • 851553ea6b cuda: add SET operation support (#16804) YaelGitAccount 2025-10-28 21:10:28 +02:00
  • 85a7d8677b memory : remove KV cache size padding (#16812) Georgi Gerganov 2025-10-28 20:19:44 +02:00
  • a8ca18b4b8 llama-bench : clarify benchmarked parts of the computation (#16823) Georgi Gerganov 2025-10-28 19:41:43 +02:00
  • 8284efc35c initialise buffer.device in ggml_hexagon_session (#16816) l3utterfly 2025-10-28 23:16:20 +08:00
  • 1c1409e131 embedding: add raw option for --embd-output-format (#16541) Sam Malayek 2025-10-28 03:51:41 -07:00
  • 7a0e900e36 llama: consistent ctx <-> buf order for KV cache (#16746) Johannes Gäßler 2025-10-28 11:23:54 +01:00
  • 280d97be96 grammar : support array references in json schema (#16792) Aldehir Rojas 2025-10-28 03:37:52 -05:00
  • 3479efd112 CANN: Improve device ID handling and aclnnArange checks (#16752) Chenguang Li 2025-10-28 10:54:53 +08:00
  • 463bbf20bf CUDA: add unused vars to mmvf and mmvq (#16807) Aman Gupta 2025-10-28 10:31:21 +08:00
  • ad8d36beff sycl: add SSM_CONV operation support (#16800) tamarPal 2025-10-28 03:50:33 +02:00
  • c053e18a66 chat: Add LFM2 tool handling (#16763) Yuri Khrustalev 2025-10-27 18:54:01 -04:00
  • e1ab084803 mtmd : fix idefics3 preprocessing (#16806) Xuan-Son Nguyen 2025-10-27 23:12:16 +01:00
  • 5a4ff43e7d llama : disable pipeline parallelism if compute buffer allocation fails (#16748) Diego Devesa 2025-10-27 13:51:28 -07:00
  • 10640e31aa ggml : fix interpolate with align-corners and ne=1 (#16700) Acly 2025-10-27 21:50:22 +01:00
  • 80d28f104c HIP: fix AMDGPU_TARGETS, update documentation (#16803) Johannes Gäßler 2025-10-27 21:39:49 +01:00
  • c55d53acec model : add LightOnOCR-1B model (#16764) Xuan-Son Nguyen 2025-10-27 16:02:58 +01:00
  • 945501f5ea llama: fix leaked buffers for mmap + split files (#16765) Johannes Gäßler 2025-10-27 09:17:31 +01:00
  • 75cbdd3fce test-backend-ops: print failed tests at the end (#16785) Aman Gupta 2025-10-27 09:25:10 +08:00
  • 2b9bd9bf4e sycl: add ROLL operation support (#16665) tamarPal 2025-10-27 03:20:24 +02:00
  • 59fc1ec8e8 sycl: add REPEAT_BACK operation support (#16734) shani-f 2025-10-27 03:19:50 +02:00
  • 75d33b9302 CUDA: support for weight clamp in top-k norm (#16702) Aman Gupta 2025-10-27 09:06:16 +08:00
  • 3470a5c891 ggml-alloc : make gallocr prefer chunks that allow memory reuse (#16788) Acly 2025-10-26 23:19:03 +01:00
  • bd562fe4f7 cuda : use fast copy when src and dst are of different type and contiguous (#16789) Sigbjørn Skjæret 2025-10-26 21:31:41 +01:00
  • bbac6a26b2 ggml: fix cuda kernel launch configuration for k_compute_batched_ptrs to support large batch (#16744) leejet 2025-10-27 02:13:31 +08:00
  • 73a48c9790 convert : enable expert group selection for all models with it (#16691) Sigbjørn Skjæret 2025-10-26 17:21:23 +01:00
  • f696428ce8 graph : add clamping to ffn_moe_weights_sum to avoid div-by-zero (#16655) Sigbjørn Skjæret 2025-10-26 17:20:32 +01:00
  • 7cce4f8158 model : set res->t_embd in SmallThinker models (#16782) Sigbjørn Skjæret 2025-10-26 16:08:52 +01:00
  • 8d8862829c docs : add Jamba to Text-only models list (#16778) amirai21 2025-10-26 14:01:20 +02:00
  • f77c13b91f CUDA: General GEMV fusion (#16715) Aman Gupta 2025-10-26 19:28:04 +08:00
  • 3cfa9c3f12 vulkan: deduplicate Microsoft Direct3D12 devices (#16689) Gilad S. 2025-10-26 06:37:38 +02:00
  • 5d195f17bc convert : handle mmproj filename/path properly (#16760) Galunid 2025-10-25 20:41:36 +02:00
  • 226f295f4d model : set res->t_embd in PLaMo2 models (#16766) Shunta Saito 2025-10-25 19:26:27 +09:00
  • f90b4a8efe vulkan: delete dead code (#16732) Giuseppe Scrivano 2025-10-25 10:59:54 +02:00
  • 8423d01931 vulkan: Optimize SSM_SCAN (#16645) Jeff Bolz 2025-10-25 00:04:12 -05:00
  • 5cca2542ac convert : avoid dequantizing mxfp4 for GPT-OSS (#16756) compilade 2025-10-24 20:52:00 -04:00
  • 55945d2ef5 ggml: fix CUDA grid launch condition for large block_nums.y in binbcast (#16742) leejet 2025-10-25 03:39:37 +08:00
  • 0bcb40b48c CUDA: use CUB for arbitrary size argsort (#16754) Aman Gupta 2025-10-24 20:46:19 +08:00
  • 69e9ff0103 webui: support q URL parameter (#16728) Florian Badie 2025-10-24 14:10:29 +02:00
  • 5a91109a5d model-conversion : add trust_remote_code for orig model run [no ci] (#16751) Daniel Bevenius 2025-10-24 12:02:02 +02:00
  • f8f071fadd convert : handle pre-quantized models (#14810) compilade 2025-10-23 16:31:41 -04:00
  • 0bf47a1dbb server: add memory breakdown print (#16740) Johannes Gäßler 2025-10-23 21:30:17 +02:00
  • dd62dcfab9 convert : Make mistral-common dependency optional (#16738) Julien Denize 2025-10-23 15:54:46 +02:00
  • d0660f237a mtmd-cli : allow using --jinja (#16718) Xuan-Son Nguyen 2025-10-23 15:00:49 +02:00
  • fe6a9882ac Manually link -lbsd to resolve flock symbol on AIX (#16610) Prajwal B Mehendarkar 2025-10-23 17:07:31 +05:30
  • 061f0eff02 ggml-cuda: use passed ops instead of hardcoded ops (#16712) Aman Gupta 2025-10-23 19:14:06 +08:00
  • 8cf6b42d46 server : send partial stop string when <EOG> is reached (#15007) matteo 2025-10-23 11:32:24 +02:00
  • 9de9672adb sycl: use async memory allocation to fix crashes during graph recording (#16644) Matthew Michel 2025-10-22 20:05:15 -05:00
  • 63d2fc46e1 Add experimental ggml-hexagon backend for the Hexagon NPU (#16547) Max Krasnyansky 2025-10-22 13:47:09 -07:00
  • a2e0088d92 Revert "ggml : Leverage the existing GGML_F32_VEC helpers to vectorize ggml_v…" (#16723) Diego Devesa 2025-10-22 11:20:55 -07:00
  • 9b9201f65a webui: introduce OpenAI-compatible model selector in JSON payload (#16562) Pascal 2025-10-22 16:58:23 +02:00
  • 19a5a3edfd ggml : Leverage the existing GGML_F32_VEC helpers to vectorize ggml_vec_set_f32 for faster fills (#16522) sirus20x6 2025-10-22 05:14:14 -05:00
  • d8eaa26e4d tests : fix test-thread-safety when compiling with multiple backends (#16699) Acly 2025-10-22 12:01:22 +02:00
  • 9285325ce0 CUDA: fix bug in topk-moe softmax (#16711) Aman Gupta 2025-10-22 12:33:08 +08:00
  • 03792ad936 CUDA: topk-moe: add optional parameter for gpt-oss (#16649) Aman Gupta 2025-10-21 22:40:38 +08:00
  • 51d1a8c997 CUDA: better error for FA kernel with 0 occupancy (#16643) Johannes Gäßler 2025-10-21 15:27:53 +02:00
  • 4926419c4d ggml: add ggml_can_fuse_subgraph (#16662) Aman Gupta 2025-10-21 16:43:14 +08:00
  • 6ea37f5739 opencl: fix warnings and clean up profiling (#16688) lhez 2025-10-20 22:26:17 -07:00
  • fb349848f3 vulkan: Handle FA with all -inf mask values (#16447) Jeff Bolz 2025-10-20 22:16:08 -05:00
  • 6de8ed7519 sycl : add PAD_REFLECT_D1 operator support (#16145) YehuditE 2025-10-21 01:21:12 +03:00
  • 84bf3c6778 model : add BailingMoeV2 support (#16063) Sigbjørn Skjæret 2025-10-20 21:38:20 +02:00
  • c9c1972e2c Handle legacy 'context' attachments (#16687) Aleksander Grygier 2025-10-20 19:49:02 +02:00
  • b617cfd289 ggml-alloc : fix leak when reusing a tensor with a larger size (#16679) Diego Devesa 2025-10-20 05:53:50 -07:00
  • 79068501fa Prevent premature submission on IME input (#16673) Aleksander Grygier 2025-10-20 14:21:12 +02:00
  • 0e4a0cf2fa Import/Export UX improvements (#16619) Aleksander Grygier 2025-10-20 13:29:14 +02:00
  • 13f2cfad41 Enable per-conversation loading states to allow having parallel conversations (#16327) Aleksander Grygier 2025-10-20 12:41:13 +02:00