-
f2f28380ea
metal : handle nil cv during pipeline creation (#16065)
Georgi Gerganov
2025-09-18 10:03:24 +03:00
-
62c3b645c5
CANN: Remove print (#16044)
Chenguang Li
2025-09-18 09:26:33 +08:00
-
d304f459d8
GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (#16018)
Reese Levine
2025-09-17 13:09:40 -07:00
-
0320ac5264
metal : refactor + optimize v2 (#15995)
Georgi Gerganov
2025-09-17 20:38:12 +03:00
-
a7a98e0fff
SvelteKit-based WebUI (#14839)
Aleksander Grygier
2025-09-17 19:29:13 +02:00
-
8f8f2274ee
convert : add Llama4ForCausalLM (#16042)
Xuan-Son Nguyen
2025-09-18 00:18:21 +07:00
-
c959b676be
CUDA: fix FA occupancy, optimize tile kernel (#15982)
Johannes Gäßler
2025-09-17 15:32:42 +02:00
-
cd08fc3ecc
common : Fix corrupted memory error on json grammar initialization (#16038)
David Ribeiro Alves
2025-09-17 01:08:02 -07:00
-
cb5bb6cc05
vulkan: automatically remove unsupported devices (#15976)
Eve
2025-09-17 07:35:37 +00:00
-
a91d035b90
ci : revert back to macos-13 for macOS-latest-cmake-x64 (#16040)
Daniel Bevenius
2025-09-17 09:34:09 +02:00
-
745cbcf2fe
llama-quant : fix the verification of attention layers for encoder-decoder models (#16023)
Jie Fu (傅杰)
2025-09-17 15:30:55 +08:00
-
1cbd80f8cf
examples : support encoder-decoder models in the simple example (#16002)
Jie Fu (傅杰)
2025-09-17 15:29:00 +08:00
-
85286f3548
model : add OLMo3 support (#16015)
Shane A
2025-09-17 00:01:58 -07:00
-
d5fabe3682
CANN: Optimize ggml_cann_set_device (#15935)
Chenguang Li
2025-09-17 14:33:08 +08:00
-
8ff206097c
llama-bench: add --n-cpu-moe support (#15952)
jacekpoplawski
2025-09-16 16:17:08 +02:00
-
77475530b8
ci : use macos-latest for arm64 webgpu build (#16029)
Daniel Bevenius
2025-09-16 15:27:52 +02:00
-
3913f8730e
ggml : fix padding in timestep embedding kernels (#15932)
Daniel Bevenius
2025-09-16 15:25:57 +02:00
-
76888d202e
ci : upload xcframework artifact from ios-xcode-build job (#16010)
Daniel Bevenius
2025-09-16 13:41:38 +02:00
-
f1fbffb5c0
fix: apply clang-format to CUDA macros (#16017)
Bowen Han
2025-09-15 23:59:19 -07:00
-
51abc96bdc
ci : update macos-latest* jobs to use macos-latest (#15938)
Daniel Bevenius
2025-09-16 05:57:16 +02:00
-
07808ebb07
cmake : Do not install tools on iOS targets (#15903)
Yuri Khrustalev
2025-09-15 22:54:44 -04:00
-
6d758839ff
Add LLaDA-7b-MoE diffusion model (#16003)
Aman Gupta
2025-09-16 10:38:28 +08:00
-
3d4053f77f
CUDA: fix im2col_3d to respect non-contiguous inputs (views) (#15956)
Jake Karnes
2025-09-15 16:28:31 -06:00
-
dc381aa9a6
docker : enable rocWMMA in ROCm images, add gfx1151 (#15997)
Diego Devesa
2025-09-15 14:38:52 -07:00
-
10d197409b
releases : switch to rocWMMA develop branch, add gfx1151 (#15992)
Diego Devesa
2025-09-15 14:38:42 -07:00
-
b907255f4b
SYCL: Add COUNT_EQUAL operator support (#15991)
yael-works
2025-09-15 19:51:35 +03:00
-
28c39da7c6
llama-run: Fix model download on Windows (#15988)
Nikolay Popov
2025-09-15 13:08:30 +03:00
-
106220562a
CUDA: some micro-optimizations in mmf.cuh for mul_mat_id (#15926)
Aman Gupta
2025-09-15 17:35:11 +08:00
-
a68f31edd7
fix KLD percentile output (#15999)
ddh0
2025-09-15 02:54:57 -05:00
-
b8e09f08b9
model : add grok-2 support (#15539)
Sigbjørn Skjæret
2025-09-14 23:00:59 +02:00
-
6c019cb04e
server : only attempt to enable thinking if using jinja (#15967)
Sigbjørn Skjæret
2025-09-14 21:17:04 +02:00
-
9dcd200d57
metal : remove memory pools (#15966)
Georgi Gerganov
2025-09-14 22:02:32 +03:00
-
0fa154e350
rocm.Dockerfile: added gfx1200,gfx1201 architectures to support AMD Radeon RX 9000 series (#15994)
Adam
2025-09-15 04:43:54 +10:00
-
261e6a20ff
Vulkan: Clean up mul_mm shader (#15987)
Ruben Ortlam
2025-09-14 16:56:28 +02:00
-
a0e13dcbe5
build: fix the build failures of Windows HIP release job (#15984)
lcy
2025-09-14 22:20:35 +08:00
-
a14bd35014
metal : fix kernel requirements (#15983)
Georgi Gerganov
2025-09-14 15:33:22 +03:00
-
918b26f197
rpc : fix regression when --device is used (#15981)
Radoslav Gerganov
2025-09-14 12:28:18 +03:00
-
9ecb884346
releases : update ROCM, add gfx1200, gfx1201, gfx1151 (#15972)
Diego Devesa
2025-09-14 02:21:59 -07:00
-
d1c6f11f47
doc : update documentation for --tensor-split (#15980)
Radoslav Gerganov
2025-09-14 12:10:07 +03:00
-
6380d6a3e7
ggml-zdnn: rm user mapped buffers (#15965)
Aaron Teo
2025-09-14 13:37:03 +08:00
-
aa0c461efe
vulkan: fix failing dequant shaders (#15862)
Jeff Bolz
2025-09-13 16:29:43 +01:00
-
b9c9c9f789
vulkan: initialize vulkan-hpp to allow using extension function pointers (#15705)
Jeff Bolz
2025-09-13 16:23:30 +01:00
-
50f4281a6f
llama : allow using iGPUs with --device (#15951)
Diego Devesa
2025-09-13 07:49:49 -07:00
-
55758b00ca
metal : refactor kernel loading (#15964)
Georgi Gerganov
2025-09-13 16:24:22 +03:00
-
f161463a54
metal : allow ops to run concurrently (#15929)
Georgi Gerganov
2025-09-13 13:54:28 +03:00
-
84d7b2fca1
metal : fix memory leaks (#15962)
Georgi Gerganov
2025-09-13 12:45:04 +03:00
-
40be51152d
ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration and incorrect zTensor free (#15839)
Aaron Teo
2025-09-13 02:39:52 +08:00
-
4bf5549269
Add docker protocol support for llama-server model loading (#15790)
Eric Curtin
2025-09-12 16:31:50 +01:00
-
f4e664f838
context : remove redundant explicit casting to the same type (#15948)
Haiyue Wang
2025-09-12 23:16:32 +08:00
-
f088b6a84f
server : adjust prompt similarity thold + add logs (#15913)
Georgi Gerganov
2025-09-12 17:02:55 +03:00
-
304ac5693d
Vulkan iGPU device selection overhaul and PCI ID API support (#15947)
Ruben Ortlam
2025-09-12 13:24:21 +02:00
-
6c88ad8fa7
vulkan: Make device memory check more portable (#15939)
Mathieu Baudier
2025-09-12 09:06:20 +02:00
-
704d90c987
Revert "sycl: add usage of enqueue_functions extension (#14244)" (#15910)
Neo Zhang Jianyu
2025-09-12 09:15:12 +08:00
-
360d6533db
ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (#15797)
Diego Devesa
2025-09-11 13:47:38 -07:00
-
0e6ff0046f
CUDA: larger SRAM reads for tile FA, AMD FP16 dot (#15927)
Johannes Gäßler
2025-09-11 21:19:58 +02:00
-
df082f5630
nitpick : correct MB to MiB (#15934)
ddh0
2025-09-11 12:12:34 -05:00
-
24a6734daf
ggml-cpu : add check for ARM MATMUL_INT8/i8mm support (#15922)
Daniel Bevenius
2025-09-11 15:39:12 +02:00
-
2b3efea9a4
kleidiai: fix GGML_ASSERT(*cur_backend_id != -1) failed (#15614)
Charles Xu
2025-09-11 12:45:40 +02:00
-
c0389dba43
CANN: Disable acl_graph for prefill stage (#15933)
hipudding
2025-09-11 15:59:37 +08:00
-
00681dfc16
CUDA: Add fastdiv to k_bin_bcast*, giving 1-3% E2E performance (#15872)
Oliver Simons
2025-09-10 22:04:03 +02:00
-
4f658855fa
llama : support T5 models with unequal number of encoder-decoder layers (#15909)
Jie Fu (傅杰)
2025-09-11 02:51:51 +08:00
-
6ab397e12b
graph : support non-contiguous Q in build_attn_mha (#15908)
Sigbjørn Skjæret
2025-09-10 19:08:59 +02:00
-
9de447d94e
ggml-cpu : fix padding in ggml_timestep_embedding (#15917)
Daniel Bevenius
2025-09-10 17:31:40 +02:00
-
0f0a3c2851
metal : make the backend async (#15906)
Georgi Gerganov
2025-09-10 17:52:35 +03:00
-
33daece86b
ci : add caching for ROCm installation in release workflow (#15924)
Daniel Bevenius
2025-09-10 15:39:57 +02:00
-
e7b6d83b52
tests : filter out no-ops from coverage report (#15900)
Daniel Bevenius
2025-09-10 14:17:09 +02:00
-
2cfef4d117
media : add transparent icon svg and png [no ci] (#15891)
j-k
2025-09-10 12:51:28 +01:00
-
09e72a037c
gitignore : Ignore vim swap files in tests (#15901)
Jesse
2025-09-10 07:28:47 -04:00
-
10d8b2b6b0
CANN: Add ROPE sin/cos cache for reuse (#15912)
Chenguang Li
2025-09-10 18:42:00 +08:00
-
28b5f190ef
CANN: implement LRU cache for ACL graphs (#15814)
Chenguang Li
2025-09-10 15:29:12 +08:00
-
86587da03b
llama : check returned fn ptrs from ggml_backend_reg_get_proc_address (#15893)
Daniel Bevenius
2025-09-10 05:33:58 +02:00
-
ff02caf9ee
ci : cache ROCm installation in windows-latest-cmake-hip (#15887)
Daniel Bevenius
2025-09-10 05:23:19 +02:00
-
ae355f6f71
vulkan: throw the oom error instead of no memory type found (#15905)
Ruben Ortlam
2025-09-09 22:26:03 +02:00
-
4f63cd705c
vulkan: Fix OOB accesses in soft_max_back (#15861)
Jeff Bolz
2025-09-09 07:41:15 -05:00
-
17bc5a815f
HIP: use v_dot2_f32_f16 instruction for FA (#15884)
Johannes Gäßler
2025-09-09 14:04:43 +02:00
-
ed54e32558
Workaround for subgroup arithmetic failing on MoltenVK with AMD GPUs (issue 15846) (#15886)
lksj92hs
2025-09-09 15:01:15 +03:00
-
a972faebed
CUDA: Add mul_mat_id support for the mmf kernel (#15767)
Aman Gupta
2025-09-09 14:38:02 +08:00
-
550cf726e1
CUDA: fix GET_ROWS for large tensors (#15882)
Johannes Gäßler
2025-09-09 08:11:01 +02:00
-
c252ce67c4
contrib : add notes about merging PRs (#15881)
Georgi Gerganov
2025-09-09 08:42:10 +03:00
-
70cd37dbbe
requirements : update transformers/torch for Embedding Gemma (#15828)
Daniel Bevenius
2025-09-09 06:06:52 +02:00
-
acc1b008cf
model-conversion : add extra debugging support for model conversion (#15877)
Piotr Wilkin (ilintar)
2025-09-09 06:05:55 +02:00
-
7057faf64b
json : support enum values within allOf (#15830)
Aldehir Rojas
2025-09-08 16:14:32 -05:00
-
fe1c92cd7b
media : add llama1 icon (#15878)
j-k
2025-09-08 19:57:01 +01:00
-
e68aa10d8f
vulkan: sort graph to allow more parallel execution (#15850)
Jeff Bolz
2025-09-08 13:10:07 -05:00
-
0a16bf52e6
CUDA: generate_cu_files.py - add missing mxfp4 (#15880)
Aman Gupta
2025-09-09 01:23:46 +08:00
-
88021565f0
chat : Deepseek V3.1 reasoning and tool calling support (OpenAI Style) (#15533)
Jesse
2025-09-08 10:59:48 -04:00
-
56920f5665
server : bring back timings_per_token (#15879)
Xuan-Son Nguyen
2025-09-08 21:50:05 +07:00
-
b0d52998b9
cuda : fix supports_op condition for get_rows when number of blocks is too large (#15868)
Georgi Gerganov
2025-09-08 13:56:51 +03:00
-
f28d4f4ac9
metal : refactor + optimize (#15857)
Georgi Gerganov
2025-09-08 13:34:56 +03:00
-
9fcb29f22f
ggml: allow casting between f32 and i32 (#15783)
Xuan-Son Nguyen
2025-09-08 17:33:01 +07:00
-
5ef22d281d
CUDA: non-contiguous src0 not supported for PAD (#15869)
Sigbjørn Skjæret
2025-09-08 11:55:44 +02:00
-
233d773d02
convert : force setting sliding_window from original config (#15867)
Daniel Bevenius
2025-09-08 09:44:34 +02:00
-
a885dcff11
batched-bench : fix llama_synchronize usage during prompt processing (#15835)
Georgi Gerganov
2025-09-08 10:27:07 +03:00
-
663027fd54
context : fix n_outputs during reserve (#15858)
Georgi Gerganov
2025-09-08 10:26:36 +03:00
-
cf0e3ba150
model : avoid ggml_cont_3d for fused QKV weights (#15662)
Georgi Gerganov
2025-09-08 10:25:33 +03:00
-
d413dca003
tests: large sizes for get_rows (#15687)
Jeff Bolz
2025-09-07 23:23:41 -05:00
-
85ca66a746
CANN: Stream sync between devices for acl_graph (#15809)
Chenguang Li
2025-09-08 10:03:29 +08:00
-
3976dfbe00
vulkan: support im2col_3d (#15795)
Jeff Bolz
2025-09-07 13:50:26 -05:00
-
d36e61c580
ggml-cpu: clean up s390x SIMD (#15855)
Aaron Teo
2025-09-08 02:18:28 +08:00
-
c97b5e5854
vulkan: Support pad_ext (#15794)
Jeff Bolz
2025-09-07 12:00:49 -05:00