Commit Graph

  • faed5a5f5d llamafile : support s390x SIMD instruction set (#14273) Aaron Teo 2025-06-19 17:48:54 +08:00
  • 10bb545c5b Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer (#14249) 0cc4m 2025-06-19 09:15:42 +02:00
  • edc4a29eff memory : Hybrid recurrent cache (#13979) Gabe Goodhart 2025-06-19 00:08:14 -05:00
  • ed3290ab34 metal : add mean kernel (#14267) Georgi Gerganov 2025-06-19 08:05:21 +03:00
  • 8d94713654 docs: add s390x build documentation (#14264) Aaron Teo 2025-06-19 01:10:26 +08:00
  • 50d2227953 ggml-cpu: reduce asm calls for hsum (#14037) Aaron Teo 2025-06-19 01:10:08 +08:00
  • 6231c5cd6d ggml-cpu: fix uncaught underscore terminators (#14023) Aaron Teo 2025-06-19 01:06:49 +08:00
  • ef035803eb ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (#14258) Charles Xu 2025-06-18 13:40:07 +02:00
  • 413977de32 mtmd : refactor llava-uhd preprocessing logic (#14247) Xuan-Son Nguyen 2025-06-18 10:43:57 +02:00
  • 95402553a5 llama-chat : fix multiple system message for gemma, orion (#14246) Xuan-Son Nguyen 2025-06-18 09:58:43 +02:00
  • 3865cff4f5 convert : fix null head_dim AutoConfig regression (#14248) Sigbjørn Skjæret 2025-06-18 09:52:07 +02:00
  • d03172cc79 sync : ggml Georgi Gerganov 2025-06-18 09:58:23 +03:00
  • dd8e59f443 ggml : disable warnings for tests when using MSVC (ggml/1273) Daniel Bevenius 2025-06-13 15:06:42 +02:00
  • bbe98d2784 ggml : remove unused ggml_context_container (ggml/1272) Daniel Bevenius 2025-06-13 09:05:44 +02:00
  • c2056ed6d4 examples : include examples in msvc disable warn (ggml/1270) Daniel Bevenius 2025-06-12 12:27:09 +02:00
  • c46503014d cmake: remove shader-gen step-targets from ggml-vulkan (#14226) bandoti 2025-06-17 17:33:25 -03:00
  • 860a9e4eef ggml-cpu : remove the weak alias trick (#14221) xctan 2025-06-17 17:58:32 +08:00
  • fe9d60e74a musa: fix build warning (unused variable) (#14231) R0CKSTAR 2025-06-17 17:48:08 +08:00
  • e434e69183 common : suggest --jinja when autodetection fails (#14222) Sigbjørn Skjæret 2025-06-16 21:58:42 +02:00
  • 89fea80d29 server : fix incorrect usage of llama_get_embeddings() (#14225) Georgi Gerganov 2025-06-16 22:33:27 +03:00
  • 6adc3c3ebc llama : add thread safety test (#14035) Diego Devesa 2025-06-16 08:11:43 -07:00
  • 0dbcabde8c cmake: clean up external project logic for vulkan-shaders-gen (#14179) bandoti 2025-06-16 10:32:13 -03:00
  • ad590be98c model : add NeoBERT (#14164) Đinh Trọng Huy 2025-06-16 21:53:41 +09:00
  • 7d6d91babf HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202) uvos 2025-06-16 13:47:38 +02:00
  • d3e64b9f49 llama : rework embeddings logic (#14208) Georgi Gerganov 2025-06-16 14:14:00 +03:00
  • 3ba0d843c6 ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206) Charles Xu 2025-06-16 11:47:57 +02:00
  • 0bf49eb668 convert : remove arcee change in convert_hf_to_gguf_update.py (#14207) Bartowski 2025-06-16 09:16:06 +01:00
  • 4ad243677b gguf-py : allow key override when adding value to GGUFWriter (#14194) Đinh Trọng Huy 2025-06-16 16:20:59 +09:00
  • c89c2d1ab9 vulkan: mutex around vkQueueSubmit (#14127) Jeff Bolz 2025-06-16 00:21:08 -06:00
  • 3555b3004b ggml-cpu : rework weak alias on apple targets (#14146) xctan 2025-06-16 13:54:15 +08:00
  • d7da8dc83a model : Add support for Arcee AI's upcoming AFM model (#14185) Bartowski 2025-06-16 00:04:06 +01:00
  • cd355eda7d server : When listening on a unix domain socket don't print http:// and port (#14180) Eric Curtin 2025-06-15 23:36:22 +02:00
  • 30e5b01de2 quantize : change int to unsigned int for KV overrides (#14197) Ed Addario 2025-06-15 17:53:45 +01:00
  • e54b394082 CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196) uvos 2025-06-15 17:30:13 +02:00
  • 2c2caa4443 HIP: Replace usage of depricated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (#14183) uvos 2025-06-15 15:45:27 +02:00
  • 5fce5f948d kv-cache : fix use-after-move of defrag info (#14189) Georgi Gerganov 2025-06-15 10:52:11 +03:00
  • 9ae4143bc6 model : add dots.llm1 architecture support (#14044) (#14118) Mikko Juola 2025-06-15 00:52:06 -07:00
  • c311ac664d cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) Georgi Gerganov 2025-06-15 10:08:58 +03:00
  • b9912ac570 batch : auto-gen positions + verify multi-sequence input (#14177) Georgi Gerganov 2025-06-15 09:18:37 +03:00
  • 00ba772610 docs : remove WIP since PR has been merged (#13912) Pepijn de Vos 2025-06-15 08:06:37 +02:00
  • 3cb203c89f llama-chat : Do not throw when tool parsing fails (#14012) Piotr 2025-06-14 18:25:15 +02:00
  • 2e42be42bd compare-llama-bench: add option to plot (#14169) Aman Gupta 2025-06-14 16:34:20 +08:00
  • fb85a288d7 vocab : fix build (#14175) Georgi Gerganov 2025-06-13 20:03:05 +03:00
  • 40643edb86 sycl: fix docker image (#14144) Svetlozar Georgiev 2025-06-13 17:32:56 +01:00
  • 3cfbbdb44e Merge commit from fork Guy Goldenberg 2025-06-13 19:20:25 +03:00
  • 80709b70a2 batch : add LLAMA_BATCH_DEBUG environment variable (#14172) Georgi Gerganov 2025-06-13 18:35:00 +03:00
  • 26ff3685bf docs : Update multimodal.md (#14122) ddpasa 2025-06-13 15:17:53 +02:00
  • 60c666347b batch : rework llama_batch_allocr (#14153) Georgi Gerganov 2025-06-13 13:47:55 +03:00
  • b7cc7745e3 readme : remove survey link (#14168) Georgi Gerganov 2025-06-13 11:55:44 +03:00
  • cc8d081879 cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) Christian Kastner 2025-06-13 08:38:52 +00:00
  • d714dadb57 pooling : make cls_b and cls_out_b optional (#14165) Đinh Trọng Huy 2025-06-13 17:34:08 +09:00
  • ffad043973 server : fix SWA condition for full context reprocess (#14163) Georgi Gerganov 2025-06-13 11:18:25 +03:00
  • 0889eba570 sycl: Adding additional cpy dbg print output (#14034) Anton Mitkov 2025-06-13 08:51:39 +01:00
  • c61285e739 SYCL: Bump oneMath commit (#14152) Ewan Crawford 2025-06-13 08:45:37 +01:00
  • 09cf2c7c65 cmake : Improve build-info.cpp generation (#14156) Christian Kastner 2025-06-13 06:51:34 +00:00
  • c33fe8b8c4 vocab : prevent heap overflow when vocab is too small (#14145) Georgi Gerganov 2025-06-13 08:03:54 +03:00
  • ed52f3668e sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125) Anton Mitkov 2025-06-12 14:15:11 +01:00
  • a681b4ba83 readme : remove project status link (#14149) Georgi Gerganov 2025-06-12 14:43:09 +03:00
  • 7d516443dd server : re-enable SWA speculative decoding (#14131) Georgi Gerganov 2025-06-12 11:51:38 +03:00
  • f6e1a7aa87 context : simplify output counting logic during decode (#14142) Georgi Gerganov 2025-06-12 11:50:01 +03:00
  • c3ee46fab4 batch : remove logits_all flag (#14141) Georgi Gerganov 2025-06-12 11:49:26 +03:00
  • e2c0b6e46a cmake : handle whitepsaces in path during metal build (#14126) Georgi Gerganov 2025-06-12 10:14:24 +03:00
  • 9596506965 kv-cache : fix split_equal handling in unified implementation (#14130) Georgi Gerganov 2025-06-12 10:02:15 +03:00
  • a20b2b05bc context : round n_tokens to next multiple of n_seqs when reserving (#14140) compilade 2025-06-12 02:56:04 -04:00
  • 2e89f76b7a common: fix issue with regex_escape routine on windows (#14133) bandoti 2025-06-11 17:19:44 -03:00
  • 532802f938 Implement GGML_CPU_ALL_VARIANTS for ARM (#14080) Christian Kastner 2025-06-11 19:07:44 +00:00
  • d4e0d95cf5 chore : clean up relative source dir paths (#14128) Sigbjørn Skjæret 2025-06-11 19:04:23 +02:00
  • cc66a7f78f tests : add test-tokenizers-repo (#14017) Sigbjørn Skjæret 2025-06-11 17:16:32 +02:00
  • bd248d4dc7 vulkan: Better thread-safety for command pools/buffers (#14116) Jeff Bolz 2025-06-11 09:48:52 -05:00
  • 7781e5fe99 webui: Wrap long numbers instead of infinite horizontal scroll (#14062) Aman 2025-06-11 22:42:25 +08:00
  • 89a184fa71 kv-cache : relax SWA masking condition (#14119) Georgi Gerganov 2025-06-11 16:48:45 +03:00
  • 2baf07727f server : pass default --keep argument (#14120) Taylor 2025-06-11 06:43:43 -04:00
  • 7ae2932116 kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121) Georgi Gerganov 2025-06-11 12:52:45 +03:00
  • 1f7d50b293 vulkan: Track descriptor pools/sets per-context (#14109) Jeff Bolz 2025-06-11 00:19:25 -05:00
  • 4c763c8d1b opencl: add mul_mv_id_q4_0_f32_8x_flat (#14003) lhez 2025-06-10 16:55:58 -07:00
  • dad5c44398 kv-cache : avoid modifying recurrent cells when setting inputs (#13834) compilade 2025-06-10 18:20:14 -04:00
  • 55f6b9fa65 convert : fix duplicate key DeepSeek-R1 conversion error (#14103) Sigbjørn Skjæret 2025-06-10 23:29:52 +02:00
  • 3678b838bb llama : support GEGLU for jina-bert-v2 (#14090) Sigbjørn Skjæret 2025-06-10 18:02:08 +02:00
  • 652b70e667 vulkan: force device 0 in CI (#14106) Jeff Bolz 2025-06-10 10:53:47 -05:00
  • 3a12db23b6 Fixed spec timings to: accepted/tested instead of accepted/drafted (#14104) Juk Armstrong 2025-06-10 16:48:07 +01:00
  • ae92c1855b sync : ggml Georgi Gerganov 2025-06-10 17:37:45 +03:00
  • b7ce1ad1e3 ggml : fix weak alias win32 (whisper/0) Georgi Gerganov 2025-06-10 11:34:10 +03:00
  • 97340b4c99 Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (#14099) 0cc4m 2025-06-10 14:01:33 +02:00
  • 2bb0467043 rpc : nicer error messages for RPC server crash (#14076) Isaac McFadyen 2025-06-10 02:41:01 -04:00
  • b8e2194efc sync : ggml Georgi Gerganov 2025-06-10 09:20:51 +03:00
  • 1a3b5e80f7 Add in-build ggml::ggml ALIAS library (ggml/1260) Kai Pastor 2025-06-03 12:33:28 +02:00
  • 1f63e75f3b metal : use less stack memory in FA kernel (#14088) Georgi Gerganov 2025-06-09 23:05:02 +03:00
  • 40cbf571c9 kv-cache : fix shift and defrag logic (#14081) Georgi Gerganov 2025-06-09 23:04:35 +03:00
  • 7f4fbe5183 llama : allow building all tests on windows when not using shared libs (#13980) Diego Devesa 2025-06-09 11:03:09 -07:00
  • f470bc36be ggml-cpu : split arch-specific implementations (#13892) xctan 2025-06-09 22:47:13 +08:00
  • 8f47e25f56 cuda : fix device sync on buffer clear (#14033) Diego Devesa 2025-06-09 07:36:26 -07:00
  • 201b31dc2e graph : fix geglu (#14077) Georgi Gerganov 2025-06-09 17:17:31 +03:00
  • e21d2d4ae2 CANN: Simplify the environment variable setting(#13104) Xinpeng Dou 2025-06-09 19:47:39 +08:00
  • dc0623fddb webui: fix sidebar being covered by main content (#14082) R0CKSTAR 2025-06-09 18:01:17 +08:00
  • 87d34b381d server : fix LRU check (#14079) Georgi Gerganov 2025-06-09 12:57:58 +03:00
  • b460d16ae8 sycl: Add reorder to Q6_K mmvq implementation (#13885) Nicolò Scipione 2025-06-09 11:47:07 +02:00
  • 91a8ee6a6f add geglu activation function (#14074) Đinh Trọng Huy 2025-06-09 13:15:31 +09:00
  • 056eb74534 CANN: Enable labeler for Ascend NPU (#13914) Yuanhao Ji 2025-06-09 11:20:06 +08:00
  • 247e5c6e44 cuda : fix buffer type check with integrated GPUs (#14069) Diego Devesa 2025-06-08 11:39:56 -07:00
  • 5787b5da57 ci: add LoongArch cross-compile build (#13944) 吴小白 2025-06-07 21:39:11 +08:00