enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

e725a1a982 opencl: add swiglu_oai and add_id (#15121) lhez 2025-08-07 04:12:17 +09:00
3db4da56a5 chat : support Granite model reasoning and tool call (#14864) Sachin Desai 2025-08-06 11:27:30 -07:00
476aa3fd57 Fixed name -override-tensors to -override-tensor (#15129) Juk Armstrong 2025-08-06 17:28:48 +01:00
0d8831543c ggml : fix fallback to CPU for ununsupported ops (#15118) Diego Devesa 2025-08-06 05:37:35 -07:00
65c797c4fa chat : fix yandex chat template (#15116) Sigbjørn Skjæret 2025-08-06 13:26:49 +02:00
25726898e8 chat : fix hunyuan auto-detection (#15114) stevenkuang 2025-08-06 17:48:30 +08:00
2241453252 CANN: add support for ACL Graph (#15065) Chenguang Li 2025-08-06 14:12:42 +08:00
9515c6131a ggml: WebGPU disable SET_ROWS for now (#15078) Reese Levine 2025-08-05 16:26:38 -07:00
fd1234cb46 llama : add gpt-oss (#15091) Georgi Gerganov 2025-08-05 22:10:36 +03:00
f324a3b715 chat : only remove double bos/eos if added (#15086) Sigbjørn Skjæret 2025-08-05 20:43:36 +02:00
be42642581 readme : update hot topics (#15097) Georgi Gerganov 2025-08-05 20:19:33 +03:00
3306ceabf0 sycl: fix mul_mat selection (#15092) Romain Biessy 2025-08-05 18:39:55 +02:00
c81de6e107 Fix glm4moe bug (#15088) Juk Armstrong 2025-08-05 13:56:44 +01:00
22f060c9c4 webui: fix markdown table (#15081) Alex Wu 2025-08-05 19:56:44 +08:00
ee3a9fcf88 context : fix index overflow on huge outputs (#15080) compilade 2025-08-05 05:27:45 -04:00
ec428b02c3 llama : add --n-cpu-moe option (#15077) Diego Devesa 2025-08-04 16:05:36 -07:00
19f68fa5a4 imatrix : warn when GGUF imatrix is saved without .gguf suffix (#15076) compilade 2025-08-04 17:26:52 -04:00
41613437ff cmake: Add GGML_BACKEND_DIR option (#15074) Christian Kastner 2025-08-04 21:29:14 +02:00
e5bebe5251 gguf-py : add --chat-template-file to gguf_new_metadata (#15075) Sigbjørn Skjæret 2025-08-04 21:01:48 +02:00
ef0144c087 model: support GLM 4.5 family of models (#14939) Sam 2025-08-05 04:29:25 +10:00
2721257e3e quantize : fix confusing error message if ftype is invalid (#15071) Sigbjørn Skjæret 2025-08-04 18:11:02 +02:00
587d0118f5 ggml: WebGPU backend host improvements and style fixing (#14978) Reese Levine 2025-08-04 08:52:43 -07:00
5aa1105da2 vulkan: fix build when using glslang that does not support coopmat2 (#15062) Jeff Bolz 2025-08-04 00:09:19 -05:00
d31192b4ee imatrix : use GGUF by default (#14842) compilade 2025-08-03 16:00:05 -04:00
0a2f5496be imatrix : fix 3d activation handling for hybrid and recurrent models (#14994) compilade 2025-08-03 15:49:13 -04:00
11a3811164 memory : handle kv_unified for hybrid models (#15050) compilade 2025-08-03 15:43:07 -04:00
97366dc6ab vocab : JetBrains Mellum pre-tokenizer (#15045) Csaba Kecskemeti 2025-08-03 12:38:18 -07:00
83bc2f288c model : add text-only support for Kimi-VL (and find special tokens in text_config) (#15051) Gabriel Larson 2025-08-03 09:56:25 -05:00
6c7a441161 vulkan: Use coopmat2 for conv2d (#14982) Jeff Bolz 2025-08-03 07:23:57 -05:00
5c0eb5ef54 opencl: fix adreno compiler detection logic (#15029) lhez 2025-08-02 10:51:18 -07:00
03d4698218 CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (#15035) Johannes Gäßler 2025-08-02 16:37:08 +02:00
3303c19b16 cuda: make im2col a little faster (#15025) leejet 2025-08-02 22:15:36 +08:00
4fdea540bd kv-cache : skip alignment of n_stream in kv-cache log msg [no ci] (#15040) Daniel Bevenius 2025-08-02 16:14:57 +02:00
a4569c41fd llama : enable LLAMA_SET_ROWS=1 by default (#14959) Georgi Gerganov 2025-08-02 17:14:21 +03:00
15e92fd337 cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (#15038) Georgi Gerganov 2025-08-02 17:13:05 +03:00
2bf3fbf0b5 ci : check that pre-tokenizer hashes are up-to-date (#15032) Sigbjørn Skjæret 2025-08-02 14:39:01 +02:00
711d5e6fe6 convert : fix Qwen3-Embedding pre-tokenizer hash (#15030) Douglas Hanley 2025-08-02 05:51:02 -05:00
f738989dcb chat : fix multiple tool_calls on hermes-2-pro (#14962) Jhen-Jie Hong 2025-08-02 18:04:48 +08:00
4cb208c93c vulkan: coopmat2 mul_mat optimizations (#14934) Jeff Bolz 2025-08-02 04:21:37 -05:00
3025b621d1 llama-bench: rename DB table name from test to llama_bench (#15003) R0CKSTAR 2025-08-02 17:20:40 +08:00
ec0b18802c vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (#15015) Jeff Bolz 2025-08-02 03:48:30 -05:00
339bd0268c model : support Qwen3-Embedding (#15023) Douglas Hanley 2025-08-02 03:44:50 -05:00
f906275537 server: enable token array inputs for OAI API (#15001) Johannes Gäßler 2025-08-02 10:12:41 +02:00
a9f7541ec2 vulkan: optimizations for direct convolution (#14933) Jeff Bolz 2025-08-02 02:57:04 -05:00
9c35706b98 CUDA: fix MMQ nwarps for AMD with warp_size==32 (#15014) Johannes Gäßler 2025-08-01 20:47:32 +02:00
c76b420e4c vendor : update vendored copy of google/minja (#15011) l-austenfeld 2025-08-01 16:59:06 +02:00
0f5ccd6fd1 model : add hunyuan dense (#14878) stevenkuang 2025-08-01 21:31:12 +08:00
1c872f71fb opencl: add f16 for add, sub, mul, div (#14984) lhez 2025-08-01 04:15:44 -07:00
baad94885d ggml : Q2k interleaving implementation - x86/x64 SIMD (#14373) Srihari-mcw 2025-08-01 11:50:33 +05:30
ba42794c9e graph : fix equal_seq() check (#14986) Georgi Gerganov 2025-08-01 06:38:12 +03:00
2860d479b4 docker : add cann build pipline (#14591) diannao 2025-08-01 10:02:34 +08:00
484b2091ce compare-commits.sh: support both llama-bench and test-backend-ops (#14392) R0CKSTAR 2025-08-01 08:47:27 +08:00
daf2dd7880 quantize : skip tensor override when in fallback mode (#14995) Ed Addario 2025-07-31 20:32:18 +01:00
a06ed5feae llama : add simple option to enable CPU for MoE weights (--cpu-moe) (#14992) Diego Devesa 2025-07-31 11:15:41 -07:00
784524053d Fix params bug in diffusion example (#14993) Aman Gupta 2025-08-01 01:22:58 +08:00
d6818d06a6 llama : allow other bufts when overriding to CPU, add --no-repack option (#14990) Diego Devesa 2025-07-31 09:11:34 -07:00
e08a98826b Vulkan: Fix minor debug mode issues (#14899) Ruben Ortlam 2025-07-31 17:46:54 +02:00
952a47f455 mtmd : support MiniCPM-V 4.0 (#14983) tc-mb 2025-07-31 23:22:17 +08:00
36e5fe7bcd MODEL_TENSOR.SSM_DT_NORM has defined twice (#14991) Csaba Kecskemeti 2025-07-31 07:59:49 -07:00
94933c8c2e server : implement universal assisted decoding (#12635) g2mt 2025-07-31 05:25:23 -07:00
c1dacaa99b llama : merge build_moe_ffn_from_probs function into build_moe_ffn (#14968) Dongliang Wei 2025-07-31 20:12:20 +08:00
a9f77a8be3 server : add openai-style logit_bias support (#14946) Lukas Straub 2025-07-31 14:08:23 +02:00
8a4a856277 Add LLaDA 8b Diffusion model (#14771) Aman Gupta 2025-07-31 19:49:09 +08:00
11490b3672 CANN: Improve loading efficiency after converting weights to NZ format. (#14985) hipudding 2025-07-31 19:47:20 +08:00
66625a59a5 graph : reduce splits for recurrent and hybrid models (#14825) compilade 2025-07-31 01:02:46 -04:00
6e6725459a opencl: add mul_mat_f32_f32_l4_lm and mul_mat_f16_f32_l4_lm (#14809) lhez 2025-07-30 14:56:55 -07:00
e9192bec56 quantize : fix using combined imatrix GGUFs (multiple datasets) (#14973) Ed Addario 2025-07-30 20:11:56 +01:00
41e78c567e server : add support for embd_normalize parameter (#14964) Daniel Bevenius 2025-07-30 18:07:11 +02:00
ad4a700117 HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (#14949) uvos 2025-07-30 17:38:06 +02:00
e32a4ec60e sync : ggml Georgi Gerganov 2025-07-30 16:03:13 +03:00
e228de9449 cmake : Fix BLAS link interface (ggml/1316) Kai Pastor 2025-07-30 14:53:16 +02:00
73a8e5ca03 vulkan : fix 32-bit builds (ggml/1313) Kai Pastor 2025-07-30 14:52:26 +02:00
92b8810ec7 CUDA: skip masked KV slices for all FA kernels (#14924) Johannes Gäßler 2025-07-30 15:46:13 +02:00
00131d6eaf tests : update for LLAMA_SET_ROWS=1 (#14961) Georgi Gerganov 2025-07-30 15:12:02 +03:00
1e15bfd42c graph : fix stack-use-after-return (#14960) Georgi Gerganov 2025-07-30 13:52:11 +03:00
a118d80233 embeddings: fix extraction of CLS pooling results (#14927) Douglas Hanley 2025-07-30 00:25:05 -05:00
61550f8231 CANN: update ops docs (#14935) Xinpeng Dou 2025-07-30 08:39:24 +08:00
aa79524c51 HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (#14945) uvos 2025-07-29 20:23:04 +02:00
b77d11179d HIP: add GGML_HIP_MMQ_MFMA option to allow disableing the MFMA path. (#14930) uvos 2025-07-29 17:44:30 +02:00
c7aa1364fd HIP: Ignore unsupported unroll transformation in fattn-vec (#14931) uvos 2025-07-29 17:43:43 +02:00
1a67fcc306 common : avoid logging partial messages (which can contain broken UTF-8 sequences) (#14937) kallewoof 2025-07-30 00:05:38 +09:00
204f2cf168 CANN: Add ggml_set_rows (#14943) hipudding 2025-07-29 22:36:43 +08:00
138b288b59 cuda : add softcap fusion (#14907) Sigbjørn Skjæret 2025-07-29 14:22:03 +02:00
bbd0f91779 server-bench: make seed choice configurable (#14929) Johannes Gäßler 2025-07-29 10:40:50 +02:00
0a5036bee9 CUDA: add roll (#14919) Aman Gupta 2025-07-29 14:45:18 +08:00
8ad7b3e65b opencl : add ops docs (#14910) lhez 2025-07-28 09:50:17 -07:00
bda62193b2 test-backend-ops : extend test case filtering (#14865) Leonard Mosescu 2025-07-28 09:04:27 -07:00
c556418b60 llama-bench : use local GPUs along with RPC servers (#14917) Radoslav Gerganov 2025-07-28 18:59:04 +03:00
db16e2831c ggml-cpu : deduplicate scalar implementations (#14897) xctan 2025-07-28 23:40:24 +08:00
cd1fce6d4f SYCL: Add set_rows support for quantized types (#14883) Akarshan Biswas 2025-07-28 20:32:15 +05:30
00fa15fedc mtmd : add support for Voxtral (#14862) Xuan-Son Nguyen 2025-07-28 15:01:48 +02:00
946b1f6859 CUDA: fix pointer incrementation in FA (#14916) Johannes Gäßler 2025-07-28 14:30:22 +02:00
6c6e397aff model : add support for SmallThinker series (#14898) Dongliang Wei 2025-07-28 19:47:00 +08:00
afc0e89698 sycl: refactor quantization to q8_1 (#14815) Alberto Cabrera Pérez 2025-07-28 11:05:53 +01:00
a5771c9eea ops : update BLAS (#14914) Georgi Gerganov 2025-07-28 11:01:03 +03:00
c35f9eaf09 ops : update Metal (#14912) Georgi Gerganov 2025-07-28 08:22:56 +03:00
1f45f2890e sync : ggml Georgi Gerganov 2025-07-28 08:14:20 +03:00
613c5095c3 cmake : Indent ggml-config.cmake (ggml/1310) Kai Pastor 2025-07-24 19:58:02 +02:00
7f97599581 quantize : update README.md (#14905) Ed Addario 2025-07-27 22:31:11 +01:00
bf78f5439e vulkan: add ops docs (#14900) Ruben Ortlam 2025-07-27 15:33:08 +02:00

Commit Graph

main

b7003-full