-
06332e2867
llama-batch: fix build fails with -Werror=missing-braces (#16614)
takuya kodama
2025-10-20 16:27:09 +08:00
-
72d53e6c4d
readme: update bindings (#16651)
Ron Evans
2025-10-20 10:20:04 +02:00
-
2330de7b84
SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators (#16613)
safranowith
2025-10-20 11:08:32 +03:00
-
7062dd8460
llama-context: only warn on pooling_type when user specified (#16674)
takuya kodama
2025-10-20 15:44:21 +08:00
-
0398752dd4
model : add Granite Hybrid types (#16635)
Giuseppe Scrivano
2025-10-19 23:54:31 +02:00
-
4f73d0a951
ci : fix binaries release failure for s390x (binaries may not work yet) (#16664)
Aaron Teo
2025-10-20 05:06:39 +08:00
-
cec5edbcae
ci : avoid manual updates of docs/ops.md (#16663)
Sigbjørn Skjæret
2025-10-19 14:03:25 +02:00
-
fcb235b466
ci: include s390x release binaries (#16648)
Aaron Teo
2025-10-19 18:37:47 +08:00
-
55754bebd5
CODEOWNERS: update for ggml-cuda/mmf (#16660)
Aman Gupta
2025-10-19 15:37:12 +08:00
-
ee09828cb0
HIP: fix GPU_TARGETS (#16642)
Johannes Gäßler
2025-10-18 14:47:32 +02:00
-
e56abd2098
vulkan: Implement topk_moe fused shader, ported from CUDA (#16641)
Jeff Bolz
2025-10-18 05:22:57 -05:00
-
38355c6c8e
CUDA: use registers instead of smem in topk-moe (#16647)
Aman Gupta
2025-10-18 17:52:53 +08:00
-
81387858f1
opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602)
Shawn Gu
2025-10-17 17:55:32 -07:00
-
66b0dbcb2d
llama-model: fix inconsistent ctxs <-> bufs order (#16581)
Johannes Gäßler
2025-10-17 17:41:09 +02:00
-
41386cf365
rpc : report actual free memory (#16616)
Radoslav Gerganov
2025-10-17 18:02:52 +03:00
-
3d4e86bbeb
vulkan: Add State Space Model (SSM) Operations Support (#16463)
Giuseppe Scrivano
2025-10-17 14:23:47 +02:00
-
342c728d03
ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629)
muggle-stack
2025-10-17 18:01:23 +08:00
-
ababae7e1e
webui: reorganize settings layout (#16607)
Pascal
2025-10-17 10:35:03 +02:00
-
b19491599d
vulkan: fix debug build (add_rms_len/data not found) (#16624)
Jeff Bolz
2025-10-17 02:31:04 -05:00
-
9ad4f1931e
metal : add CONV_TRANSPOSE_2D (#16542)
Ilia Ilmer
2025-10-17 02:33:58 -04:00
-
79967ec596
grammar : use int64_t to avoid int overflows in int schema to grammar conversion logic (#16626)
Olivier Chafik
2025-10-17 06:59:31 +01:00
-
ceff6bb253
SYCL SET operator optimized for F32 tensors (#16350)
GittyBurstein
2025-10-17 05:36:40 +03:00
-
1bb4f43380
mtmd : support home-cooked Mistral Small Omni (#14928)
Xuan-Son Nguyen
2025-10-16 19:00:31 +02:00
-
683fa6ba4e
fix: added a normalization step for MathJax-style \[\] and \(\) delimiters (#16599)
Pascal
2025-10-16 16:28:41 +02:00
-
b22572e97d
sycl : add ARANGE operator (#16362)
GittyBurstein
2025-10-16 16:26:21 +03:00
-
7a50cf388a
CANN: format code using .clang-format (#15863)
Chenguang Li
2025-10-16 16:41:11 +08:00
-
6f5d924637
common : Update the docs on -t --threads (#16236)
takasurazeem
2025-10-16 01:11:33 -04:00
-
adc9b60f19
ggml-cpu: replace putenv with setenv for const-correctness (#16573)
takuya kodama
2025-10-16 13:10:32 +08:00
-
ee50ee1ead
SYCL: Add GGML_OP_MEAN operator support (#16009)
yael-works
2025-10-16 07:21:28 +03:00
-
7adc79c032
gguf-py : add support for endian conversion of BF16 data (#16594)
Aleksei Nikiforov
2025-10-15 22:43:08 +02:00
-
466c1911ab
cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators (#16083)
safranowith
2025-10-15 22:24:51 +03:00
-
0cb7a0683b
opencl: add q8_0 mm support (#16469)
lhez
2025-10-15 10:51:04 -07:00
-
d93f8439b0
opencl: fix FA for f32 (#16584)
lhez
2025-10-15 10:48:28 -07:00
-
f9fb33f263
Add server-driven parameter defaults and syncing (#16515)
Aleksander Grygier
2025-10-15 16:22:20 +02:00
-
f4ce81c45e
metal: optimise GGML_OP_SUM (#16559)
Sam/Samuel
2025-10-15 23:05:56 +09:00
-
17304cbcc1
server : fix img token logs (#16595)
Georgi Gerganov
2025-10-15 16:53:12 +03:00
-
3e3cb19f64
llama-quant: add support for mmproj (#16592)
Xuan-Son Nguyen
2025-10-15 14:48:08 +02:00
-
5acd455460
CUDA: Changing the CUDA scheduling strategy to spin (#16585)
Julius Tischbein
2025-10-15 13:54:15 +02:00
-
554fd578a5
server : fix mtmd checkpoints (#16591)
Georgi Gerganov
2025-10-15 12:51:27 +03:00
-
fa882fd2b1
metal : avoid using Metal's gpuAddress property (#16576)
Georgi Gerganov
2025-10-14 20:33:05 +03:00
-
ffa059034c
vulkan: Add ACC_TYPE_VEC2 implementation (#16203)
SavicStefan
2025-10-14 19:18:05 +02:00
-
120bf7046d
CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion (#16577)
Aman Gupta
2025-10-14 22:48:08 +08:00
-
4258e0cfe7
vulkan: Support FA with K/V in F32 (#16543)
Jeff Bolz
2025-10-14 08:53:37 -05:00
-
7ea15bb64c
vulkan: Improve build time for MSVC (#16545)
Jeff Bolz
2025-10-14 07:51:36 -05:00
-
9c7185dd28
CUDA: enable FA for FP32 KV cache (#16546)
Johannes Gäßler
2025-10-14 14:22:47 +02:00
-
1ee9d0b415
CUDA: use fastdiv + ggml_cuda_mad for mmvf (#16557)
Aman Gupta
2025-10-14 19:16:21 +08:00
-
48e2fa9fb7
CUDA: add fp kernel for larger batch size MoE (#16512)
Aman Gupta
2025-10-14 19:15:15 +08:00
-
5b6913c47b
cuda : remove legacy copy-op pointer indirection code (#16485)
Anav Prasad
2025-10-14 09:53:49 +00:00
-
bc07349a7f
server : dynamic token limit for prompt cache (#16560)
Georgi Gerganov
2025-10-14 08:48:50 +03:00
-
e60f241eac
metal : FA support F32 K and V and head size = 32 (#16531)
Georgi Gerganov
2025-10-13 23:07:57 +03:00
-
e38b7c6e9e
graph : support cacheless embeddings with FA and iSWA (#16528)
Georgi Gerganov
2025-10-13 22:42:37 +03:00
-
5016b72862
opencl: fix build targeting CL 2 (#16554)
lhez
2025-10-13 11:50:37 -07:00
-
7049736b2d
CUDA: fix numerical issues in tile FA kernel (#16540)
Johannes Gäßler
2025-10-13 16:29:45 +02:00
-
01d2bdc2bc
ggml : fix build broken with -march=armv9-a on MacOS (#16520)
Jie Fu (傅杰)
2025-10-13 20:48:47 +08:00
-
56fc38b965
CANN: fix CPU memory leak in CANN backend (#16549)
Chenguang Li
2025-10-13 17:01:24 +08:00
-
1fb9504eb7
fix: add remark plugin to render raw HTML as literal text (#16505)
Pascal
2025-10-13 10:55:32 +02:00
-
3f750f8d76
metal: add support for opt_step_sgd (#16539)
Sam/Samuel
2025-10-13 16:25:02 +08:00
-
c515fc5771
ggml : fix scalar path for computing norm (#16558)
Georgi Gerganov
2025-10-13 11:22:27 +03:00
-
f9bc66c3eb
CANN: Update several operators to support FP16 data format (#16251)
hipudding
2025-10-13 08:52:22 +08:00
-
a31cf36ad9
metal : add opt_step_adamw and op_sum (#16529)
Sam/Samuel
2025-10-13 02:43:14 +08:00
-
81d54bbfd5
webui: remove client-side context pre-check and rely on backend for limits (#16506)
Pascal
2025-10-12 18:06:41 +02:00
-
c7be9febcb
[SYCL] fix UT fault cases: count-equal, argsort, pad OPs (#16521)
Neo Zhang Jianyu
2025-10-12 21:53:35 +08:00
-
8415f61e23
ci : add Vulkan on Ubuntu with default packages build (#16532)
Mathieu Baudier
2025-10-12 15:48:03 +02:00
-
2c301e91ab
common : handle unicode during partial json parsing (#16526)
Aldehir Rojas
2025-10-12 08:18:47 -05:00
-
4b2dae383d
common : update presets (#16504)
Georgi Gerganov
2025-10-12 09:29:13 +03:00
-
41aac5c69b
ggml : Fix FP16 ELU positive branch (#16519)
sirus20x6
2025-10-12 00:25:37 -05:00
-
a2fba89a42
hparams : add check for layer index in is_recurrent (#16511)
Daniel Bevenius
2025-10-12 07:19:06 +02:00
-
20cc625edc
ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (#16518)
sirus20x6
2025-10-12 00:15:00 -05:00
-
11f0af5504
CUDA: faster tile FA, add oob checks, more HSs (#16492)
Johannes Gäßler
2025-10-11 20:54:32 +02:00
-
a3cb04744f
metal : fix mul-mm condition + fix mul-mv permuted kernels (#16494)
Georgi Gerganov
2025-10-11 16:54:10 +03:00
-
4a8fbe0a5e
feat: render user content as markdown option (#16358)
Pascal
2025-10-11 15:50:49 +02:00
-
31d0ff1869
server / ranking : add sorting and management of top_n (#16403)
Yann Follet
2025-10-11 21:39:04 +08:00
-
97870e6497
cuda : avoid initializing unused devices (#16510)
Diego Devesa
2025-10-11 04:02:26 -07:00
-
477a66b035
convert : correctly handle LLaMA tokenizer for Jamba (#16470)
amirai21
2025-10-11 11:33:41 +03:00
-
e60f01d941
server : fix division by zero when reporting stats (#16501)
Georgi Gerganov
2025-10-10 22:15:05 +03:00
-
81086cd6a3
vocab : mark EOT token for Granite models (#16499)
Georgi Gerganov
2025-10-10 17:17:31 +03:00
-
68ee98ae18
server : return HTTP 400 if prompt exceeds context length (#16486)
Radoslav Gerganov
2025-10-10 17:11:07 +03:00
-
cdb6da468c
server : log requests to /v1/completions (#16495)
Radoslav Gerganov
2025-10-10 13:22:27 +03:00
-
6d69ab3f26
cmake : Don't define XOPENSOURCE on AIX (#16481)
Prajwal B Mehendarkar
2025-10-10 13:45:46 +05:30
-
1faa13a118
webui: updated the chat service to only include max_tokens in the req… (#16489)
Pascal
2025-10-09 22:54:57 +02:00
-
1deee0f8d4
cpu : optimize the ggml NORM operation (#15953)
duduta
2025-10-09 22:11:15 +03:00
-
d00cbea63c
server : host-memory prompt caching (#16391)
Georgi Gerganov
2025-10-09 18:54:51 +03:00
-
8328fd4bae
No markdown in cot (#16483)
Pascal
2025-10-09 17:36:29 +02:00
-
56b4795842
model-conversion : add support for SentenceTransformers (#16387)
Daniel Bevenius
2025-10-09 14:35:22 +02:00
-
2c0d875ae6
ci: add ARM64 Kleidiai build and test support (#16462)
sudhiarm
2025-10-09 09:13:18 +01:00
-
aa4711d369
CANN: Improve ACL graph matching (#16166)
Chenguang Li
2025-10-09 15:50:25 +08:00
-
d80d6d2400
kleidiai: kernel interface refactoring (#16460)
Charles Xu
2025-10-09 09:29:17 +02:00
-
b260213755
[SYCL] refactor soft_max, add soft_max_back (#16472)
Neo Zhang Jianyu
2025-10-09 15:25:11 +08:00
-
e08db42595
model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (#16367)
Saba Fallah
2025-10-09 08:39:18 +02:00
-
12bbc3fa50
refactor: centralize CoT parsing in backend for streaming mode (#16394)
Pascal
2025-10-08 22:18:41 +02:00
-
9d0882840e
Disable CUDA host buffers on integrated GPUs (#16308)
ai-fonsi
2025-10-08 20:21:46 +02:00
-
d2ee056e1d
server : fix cancel pending task (#16467)
issixx
2025-10-08 17:20:18 +09:00
-
b2c08c9ec4
metal : mark FA blocks (#16372)
Georgi Gerganov
2025-10-08 10:57:53 +03:00
-
7fdd16b432
server : improve context checkpoint logic (#16440)
Georgi Gerganov
2025-10-08 10:57:29 +03:00
-
74b8fc17f9
ggml webgpu: profiling, CI updates, reworking of command submission (#16452)
Reese Levine
2025-10-07 13:48:56 -07:00
-
aeaf8a36f0
llama : support LiquidAI LFM2-MoE hybrid model (#16464)
Tarek Dakhran
2025-10-07 20:03:35 +02:00
-
df1b612e29
server : add /v1/health endpoint (#16461)
Georgi Gerganov
2025-10-07 15:57:14 +03:00
-
4e0388aa8a
webui : added download action (#13552) (#16282)
Sascha Rogmann
2025-10-07 11:11:08 +02:00
-
ef4c5b87ea
presets : fix pooling param for embedding models (#16455)
Georgi Gerganov
2025-10-07 10:32:32 +03:00
-
c61ae20d05
rpc : update documentation (#16441)
Radoslav Gerganov
2025-10-07 09:59:13 +03:00