Commit Graph

  • e5d6c2554e llama-chat : fix typo GML --> GLM (#13143) Xuan-Son Nguyen 2025-04-28 10:11:58 +02:00
  • f0dd6a1926 musa: fix typo in cc control (#13144) R0CKSTAR 2025-04-28 15:33:28 +08:00
  • 69699be48a CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137) Johannes Gäßler 2025-04-28 09:29:26 +02:00
  • 85f36e5e71 arg : fix unused variable (#13142) Xuan-Son Nguyen 2025-04-28 07:16:59 +02:00
  • c0a97b762e llama-bench : Add --override-tensors arg (#12922) 4onen 2025-04-27 14:48:26 -07:00
  • ced44be342 llama-chat : fix wrong template in GLM4-0414 (#13140) matteo 2025-04-27 21:57:32 +02:00
  • e291450b76 musa: fix build warning (#13129) R0CKSTAR 2025-04-27 19:22:49 +08:00
  • 59e991c23c Fixes Qwen2.5VL segfault during inference with https://github.com/ggml-org/llama.cpp/pull/12402 as has_qwen2vl_merger migration was incomplete (#13133) LostRuins Concedo 2025-04-27 18:43:37 +08:00
  • ca2bb89eac clip : Add Qwen2.5VL support (#12402) HimariO 2025-04-27 16:10:34 +08:00
  • 2d451c8059 common : add common_remote_get_content (#13123) Xuan-Son Nguyen 2025-04-26 22:58:12 +02:00
  • 4753791e70 clip : improve projector naming (#13118) Xuan-Son Nguyen 2025-04-26 22:39:47 +02:00
  • 77d5e9a76a ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (#13107) SXX 2025-04-26 22:05:31 +08:00
  • d5fe4e81bd grammar : handle maxItems == 0 in JSON schema (#13117) frob 2025-04-26 10:10:20 +02:00
  • 295354ea68 llama : fix K-shift with quantized K and BLAS backend (#13113) Diego Devesa 2025-04-25 19:40:11 +02:00
  • 558a764713 Force FP32 compute in GLM4 FFN Down (#13101) City 2025-04-25 14:38:34 +02:00
  • edb18b6e8f clip : fix pixtral on some GPU backends (#13097) Xuan-Son Nguyen 2025-04-25 14:31:42 +02:00
  • 514c45608f change the reorder tensor from init to execute OP (#13003) Neo Zhang Jianyu 2025-04-25 17:37:51 +08:00
  • 553a5c3a9f rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943) Radoslav Gerganov 2025-04-25 10:08:08 +03:00
  • 13be08daf9 clip : remove boi/eoi embeddings for GLM-edge model (#13081) Xuan-Son Nguyen 2025-04-24 22:17:04 +02:00
  • 226251ed56 embeddings : fix batch sizes (#13076) Georgi Gerganov 2025-04-24 22:29:22 +03:00
  • 87616f0680 ggml : fix trailing whitespaces (#0) Georgi Gerganov 2025-04-24 17:22:27 +03:00
  • 63b4911494 sync : ggml Georgi Gerganov 2025-04-24 16:47:43 +03:00
  • c6e8cc28c1 ggml : Depthwise 2D convolution (ggml/1152) Acly 2025-04-17 14:16:45 +02:00
  • b10d8bfdb1 CUDA: use switch statements in constexpr functions (#13095) Johannes Gäßler 2025-04-24 15:57:10 +02:00
  • 13b4548877 cmake : do not include ./src as public for libllama (#13062) Georgi Gerganov 2025-04-24 16:00:10 +03:00
  • 572b3141d3 clang-tidy : disable warning about missing math parenthesis (#13091) Georgi Gerganov 2025-04-24 15:44:05 +03:00
  • 7c727fbe39 arg : add --no-mmproj-offload (#13093) Xuan-Son Nguyen 2025-04-24 14:04:14 +02:00
  • 80982e815e arg : clean up handling --mmproj with -hf (#13082) Xuan-Son Nguyen 2025-04-24 12:14:13 +02:00
  • 7604a7d6b8 metal : fix floating-point range of attention scores in FA kernels (#13090) Georgi Gerganov 2025-04-24 10:38:30 +03:00
  • b3b6d862cf vulkan: matmul gcn tuning (#13016) Eve 2025-04-24 07:18:33 +00:00
  • 5630406959 llama-mtmd-cli: Sigint rework in mtmd vision example (#13080) pl752 2025-04-24 02:32:35 +05:00
  • ecda2ec4b3 mtmd : Support Pixtral 12B (#13065) Xuan-Son Nguyen 2025-04-23 20:21:59 +02:00
  • eb1776b15a convert : Append mult-eos,half-rope,bos to GLM4-0414 and Z (#13021) piDack 2025-04-23 22:59:14 +08:00
  • 2cca6c01e4 rpc : add command line option for number of threads for the CPU backend (#13060) Radoslav Gerganov 2025-04-23 10:32:49 +03:00
  • 658987cfc9 CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (#13014) Johannes Gäßler 2025-04-22 21:27:40 +02:00
  • dc39a5e7a8 mtmd : support SmolVLM (version 1 and 2) (#13050) Xuan-Son Nguyen 2025-04-22 16:24:54 +02:00
  • ab47dec3d3 security : add note about RPC and server functionality (#13061) Georgi Gerganov 2025-04-22 16:16:10 +03:00
  • 7b53389c24 metal : add memory pool for temp allocs (#12850) Georgi Gerganov 2025-04-22 16:15:51 +03:00
  • 243453533e llava : update documentations (#13055) Xuan-Son Nguyen 2025-04-22 10:37:00 +02:00
  • 1d735c0b4f ggml : add SSE 4.2 and x64 base variant for CPUs without AVX (#12871) Diego Devesa 2025-04-21 18:13:51 +02:00
  • 5368ddda7a SYCL: Add non-contiguous support in ROPE (#12993) Akarshan Biswas 2025-04-21 19:13:30 +05:30
  • 84a9bf2fc2 mtmd : merge llava, gemma3 and minicpmv CLI into single llama-mtmd-cli (#13012) Xuan-Son Nguyen 2025-04-21 15:32:58 +02:00
  • 2016f07bd1 convert : experimental support for --mmproj flag (#13023) Xuan-Son Nguyen 2025-04-20 23:29:36 +02:00
  • 6602304814 llava: fix errors in clip.h on certain compilers (#13030) Jeffrey Morgan 2025-04-20 03:15:41 -07:00
  • 66168204be vulkan: support noncontiguous rms_norm (#13031) Jeff Bolz 2025-04-20 03:50:02 -05:00
  • 4ba9d711ba metal: add neg operator (#13029) Jeffrey Morgan 2025-04-19 22:28:40 -07:00
  • 00137157fc Disable CI cross-compile builds (#13022) bandoti 2025-04-19 13:05:03 -03:00
  • fb28f4f80e gguf-py : fix upload python package workflow (#13020) Sigbjørn Skjæret 2025-04-19 16:26:38 +02:00
  • 37b9f0d29d clip : refactor, add image_manipulation and llava_uhd classes (#13011) Xuan-Son Nguyen 2025-04-19 09:15:45 +02:00
  • 6408210082 main : Fix Ctrl+D/newline handling (#12951) Daniel Tang 2025-04-18 16:02:55 -04:00
  • aff9d107b0 gguf-py : GGUF Editor GUI - Python + Qt6 (#12930) Chris Thompson 2025-04-18 12:30:41 -06:00
  • 35370ba945 server : use std::move whenever possible (#12936) Xuan-Son Nguyen 2025-04-18 19:58:12 +02:00
  • 8d66005763 SYCL: Refactor and enable FP16 in binary broadcast OPs (#12975) Akarshan Biswas 2025-04-18 19:27:56 +05:30
  • b9154ecff9 mtmd : add methods to access mtmd_image_tokens (#12906) Xuan-Son Nguyen 2025-04-18 10:04:51 +02:00
  • 2db9ba1464 rpc : add RPC_CMD_HELLO (#12955) Radoslav Gerganov 2025-04-18 10:13:42 +03:00
  • 2f74c354c0 graph : make FA compatible with MLA + add initial Metal kernels (#12953) Georgi Gerganov 2025-04-17 18:16:36 +03:00
  • 207c22ec2d ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes (#12970) Alan Gray 2025-04-17 14:19:42 +01:00
  • 7a395f67a7 CANN: Add support for async operator submission (#12864) hipudding 2025-04-17 20:34:16 +08:00
  • 971f245b3b llama : recognize IBM Granite 3.3 FIM tokens (#12988) Mikko Juola 2025-04-17 01:37:05 -07:00
  • 12b17501e6 opencl: fix incorrect local_size index in profiling log (#12868) kimminsu 2025-04-17 06:25:57 +09:00
  • 015022bb53 vulkan: enable coopmat2 FA gqa and split_k optimizations more often (#12931) Jeff Bolz 2025-04-16 13:37:25 -05:00
  • b43d89e311 CANN: Add 310P operator support check (#12962) Chenguang Li 2025-04-16 16:21:05 +08:00
  • 80f19b4186 opencl: split ggml-opencl.cl into multiple files and cleanup (#12886) lhez 2025-04-15 12:26:00 -07:00
  • f8f820cc4d metal : add FA-vec kernels for head size 96 (#12952) Georgi Gerganov 2025-04-15 14:45:05 +03:00
  • 54a7272043 CANN: Add x86 build ci (#12950) hipudding 2025-04-15 19:08:55 +08:00
  • 84778e9770 CUDA/HIP: Share the same unified memory allocation logic. (#12934) David Huang 2025-04-15 17:20:38 +08:00
  • 510676475f SYCL: Add ROPE vision kernel (#12887) Akarshan Biswas 2025-04-15 14:07:42 +05:30
  • daa422881a llama : DeepSeek V2/V3 MLA implementation (#12801) Juk Armstrong 2025-04-15 07:49:57 +01:00
  • eccc7a1602 ggml : Add AVX512 implementation of GEMM - Q4_Kx8 (#12829) Srihari-mcw 2025-04-15 11:52:36 +05:30
  • 0019279bb5 CANN: Optimize ROPE (#12865) Chenguang Li 2025-04-15 10:09:35 +08:00
  • b0c75ac9f9 CANN: Optimize CANN buffer pool memory management (#12875) Xinpeng Dou 2025-04-15 10:04:24 +08:00
  • d6d2c2ab8c Add performance print for gemma3 in example (#12929) Russyyds 2025-04-15 01:18:20 +08:00
  • 75afa0ae31 SYCL: Fix im2col (#12910) Akarshan Biswas 2025-04-14 17:53:53 +05:30
  • c772d54926 rpc : use ggml_context_ptr (#12938) Radoslav Gerganov 2025-04-14 13:59:34 +03:00
  • 81c7e64fc2 disable curl lib check; this action was missed by commit bd3f59f812 (#12761) (#12937) Neo Zhang Jianyu 2025-04-14 18:19:07 +08:00
  • 526739b879 sync : ggml Georgi Gerganov 2025-04-14 08:52:10 +03:00
  • a25355e264 cpu: fix cpu backend's supports-op for GET_ROWS_BACK. fixes a fatal when running test-backend-ops with only the CPU backend (ggml/1190) cmdr2 2025-04-11 12:14:19 +05:30
  • e959d32b1c ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register (#12773) SXX 2025-04-14 13:47:55 +08:00
  • 307bfa253d ggml: disable CUDA graphs for unsupported DUP and CONT node types (#12891) Alan Gray 2025-04-13 22:12:21 +01:00
  • 71e90e8813 quantize: Handle user-defined quantization levels for additional tensors (#12511) Ed Addario 2025-04-13 19:29:28 +01:00
  • bc091a4dc5 common : Define cache directory on AIX (#12915) Prajwal B Mehendarkar 2025-04-12 21:03:39 +05:30
  • a4837577aa vulkan: use aligned loads for flash attention mask (#12853) Jeff Bolz 2025-04-12 03:44:48 -05:00
  • e59ea539b8 llava: Fix cpu-only clip image encoding segfault (#12907) Matt Clayton 2025-04-12 01:29:03 -04:00
  • c94085df28 server : add VSCode's Github Copilot Chat support (#12896) Georgi Gerganov 2025-04-11 23:37:41 +03:00
  • e8a62631b3 rpc : Set cache directory in rpc-server.cpp on FreeBSD (#12903) yuri@FreeBSD 2025-04-11 13:04:14 -07:00
  • b6930ebc42 tool-call: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates (#12900) Olivier Chafik 2025-04-11 12:47:52 -07:00
  • 68b08f36d0 common : Define cache directory on FreeBSD (#12892) yuri@FreeBSD 2025-04-11 12:45:44 -07:00
  • 578754b315 sycl: Support sycl_ext_oneapi_limited_graph (#12873) Ewan Crawford 2025-04-11 15:32:14 +02:00
  • b2034c2b55 contrib: support modelscope community (#12664) tastelikefeet 2025-04-11 20:01:56 +08:00
  • 06bb53ad9b llama-model : add Glm4Model implementation for GLM-4-0414 (#12867) Yuxuan Zhang 2025-04-11 18:10:10 +08:00
  • 0c50923944 clip : use smart pointer (⚠️ breaking change) (#12869) Xuan-Son Nguyen 2025-04-11 12:09:39 +02:00
  • fccf9cae83 SYCL: Add fp16 type support to unary op kernels (#12788) Akarshan Biswas 2025-04-11 13:33:50 +05:30
  • ec6c09d0fa convert : Llama4 RoPE fix (#12889) Daniel Han 2025-04-11 00:49:09 -07:00
  • 8ac9f5d765 ci : Replace freediskspace with free_disk_space in docker.yml (#12861) R0CKSTAR 2025-04-11 15:26:17 +08:00
  • 12e9158f25 xcf : add check for visionos build version (#12854) Daniel Bevenius 2025-04-11 09:24:34 +02:00
  • 5b1f13cb64 convert : proper tensor name mapping for llama4 (#12870) Xuan-Son Nguyen 2025-04-11 09:23:37 +02:00
  • 8b91d5355a llama : correct rms norm for llama 4 (#12882) Xuan-Son Nguyen 2025-04-11 08:49:50 +02:00
  • 0fed24c347 ggml: fix compilation error s390x (#12848) Aaron Teo 2025-04-11 13:20:07 +08:00
  • 47ba87d0a4 sync : ggml Georgi Gerganov 2025-04-11 00:08:23 +03:00
  • 1d2b613445 tests : fix init order (#0) Georgi Gerganov 2025-04-11 00:04:25 +03:00