enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

3f81b4e91c vulkan: support GET_ROWS for k-quants (#16235) Jeff Bolz 2025-09-27 06:36:11 -04:00
ace6a54565 build : add LLAMA_OPENSSL option (#16287) Adrien Gallouët 2025-09-27 11:12:46 +02:00
72b24d96c6 model : make minicpm embedding_scale, residual_scale and logit_scale optional with legacy defaults (#16273) Vinkal 2025-09-27 02:58:29 +05:30
624207e676 devops: add s390x & ppc64le CI (#15925) Aaron Teo 2025-09-27 02:03:33 +08:00
807e8c6d31 Enhance text file detection logic for file attachments (#16199) Aleksander Grygier 2025-09-26 19:25:29 +02:00
1a18927894 Allow viewing conversations even when llama server is down (#16255) Aleksander Grygier 2025-09-26 18:35:42 +02:00
e0539eb6ae webui: switch to hash-based routing (alternative of #16079) (#16157) Isaac McFadyen 2025-09-26 11:36:48 -04:00
5d0a40f390 Always show message actions for mobile UI + improvements for user message sizing (#16076) Aleksander Grygier 2025-09-26 15:59:07 +02:00
d12a983659 codeowners : add rgerganov as owner of RPC [no ci] (#16279) Radoslav Gerganov 2025-09-26 16:09:34 +03:00
cc1cfa277b mtmd : fix uninitialized variable in bicubic_resize (#16275) Aleksei Nikiforov 2025-09-26 15:00:44 +02:00
54dbc37053 metal : report OOM errors (#16274) Georgi Gerganov 2025-09-26 14:14:28 +03:00
b995a10760 common : use cpp-httplib as a cURL alternative for downloads (#16185) Adrien Gallouët 2025-09-26 13:12:19 +02:00
4710dd31bb build : fix build-ios-device (#16257) Adrien Gallouët 2025-09-26 12:39:35 +02:00
9b26511857 ggml-cpu: implement MXFP4 SIMD for s390x (#16193) Aaron Teo 2025-09-26 18:27:25 +08:00
00217cd413 ci : create git tags for released docker images (#16008) Radoslav Gerganov 2025-09-26 13:19:23 +03:00
3b337b01a1 codeowners : add danbev as owner of build-xcframework.sh [no ci] (#16268) Daniel Bevenius 2025-09-26 07:53:36 +02:00
a86a580a66 musa: upgrade musa sdk to 4.3.0 (#16240) R0CKSTAR 2025-09-26 08:56:38 +08:00
0f7c69689f musa: fix build warnings (#15611) R0CKSTAR 2025-09-26 08:56:10 +08:00
835b2b915c model : add GroveMoE support (#15510) Sigbjørn Skjæret 2025-09-25 19:50:28 +02:00
b05a9d650f vendors: update miniaudio version (#16212) Aaron Teo 2025-09-25 23:38:10 +08:00
27052978e4 readme : update bindings (#16144) rtaluyev 2025-09-25 18:20:34 +03:00
077c94d0ca CUDA: add a fused top-K MoE kernel (#16130) Aman Gupta 2025-09-25 22:35:05 +08:00
aa3ee0eb0b model-conversion : add embedding prompt file support (#15871) Daniel Bevenius 2025-09-25 12:02:36 +02:00
d0991da39d server : add support for external server for tests (#16243) Daniel Bevenius 2025-09-25 11:36:47 +02:00
aa719c2f88 ggml : fix loongarch lsx compilation error (#15864) junchao-zhao 2025-09-25 17:22:55 +08:00
4cdd0bb453 docs: fix typo [no ci] (#16244) Johannes Gäßler 2025-09-25 11:12:27 +02:00
b5bd037832 llama : add support for qwen3 reranker (#15824) Douglas Hanley 2025-09-25 03:53:09 -05:00
dfcd53f7ec metal : fuse NORM + MUL + ADD, support non-multiples of 4 (#16220) Georgi Gerganov 2025-09-25 11:30:16 +03:00
4ea00794b8 metal : relax reorder conditions (#16216) Georgi Gerganov 2025-09-25 11:29:42 +03:00
02a6a82ae7 metal : restore im2col perf (#16219) Georgi Gerganov 2025-09-25 11:29:08 +03:00
c498fc82fe rpc : use ggml logging facilities Radoslav Gerganov 2025-09-25 10:20:02 +03:00
e7a5130a20 codeowners: add ownership of zdnn backend [no ci] (#16232) Aaron Teo 2025-09-25 13:06:30 +08:00
bee378e098 ci: run the x64 and arm ci on the github machines instead (#16183) Eve 2025-09-25 05:06:06 +00:00
5fb557653b devops: fix s390x docker release failure (#16231) Aaron Teo 2025-09-25 11:36:30 +08:00
4ae88d07d0 codeowners: add ownership of zdnn backend [no ci] (#16229) Aaron Teo 2025-09-25 00:25:04 +08:00
e789095502 llama: print memory breakdown on exit (#15860) Johannes Gäßler 2025-09-24 16:53:48 +02:00
f2a789e334 ggml : split graph allocations according to backend max buffer size (#15815) Acly 2025-09-24 16:17:49 +02:00
3a59971967 model : add label for LiquidAI LFM2-2.6B model (#16204) Tarek Dakhran 2025-09-24 13:42:26 +02:00
63b54c81a6 model-conversion : make causal-verify-logits fails with model names containing "." (#16215) Jie Fu (傅杰) 2025-09-24 16:25:26 +08:00
152729f884 common : add missing chrono header for common.cpp (#16211) Uilian Ries 2025-09-24 08:53:47 +02:00
c0c59c1157 codeowners : match all requirements files (#16214) Sigbjørn Skjæret 2025-09-24 08:53:20 +02:00
7735706b93 model-conversion : run-org-model.py fails to run on mac m1 (#16213) Jie Fu (傅杰) 2025-09-24 14:46:52 +08:00
4d9ea03d17 codeowners : use slash prefix for root files [no ci] (#16210) Daniel Bevenius 2025-09-24 08:10:09 +02:00
8ba548dae2 model-conversion : fix the make targets in the README.md (#16209) Jie Fu (傅杰) 2025-09-24 12:19:23 +08:00
f505bd83ca ci : disable AMD workflows + update NVIDIA workflows (#16200) Georgi Gerganov 2025-09-23 20:41:40 +03:00
0889589dbe ci : enable Vulkan workflow on Mac (#16194) Georgi Gerganov 2025-09-23 13:44:25 +03:00
4e29084ba4 ggml-cpu: Respect cpumask settings (#16164) Xiangyan Sun 2025-09-23 01:58:12 -07:00
f6b4af3d04 ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (#15928) Sigbjørn Skjæret 2025-09-23 10:25:20 +02:00
264f1b5187 zdnn: refactor codebase + add docs (#16178) Aaron Teo 2025-09-23 14:53:05 +08:00
0bc7cc7154 codeowners : add @danbev to model-conversion example [no ci] (#16190) Daniel Bevenius 2025-09-23 08:13:22 +02:00
4b9f4cb0f8 devops: add s390x containers (#15915) Aaron Teo 2025-09-23 13:59:34 +08:00
85e72271ba ggml-cpu : fix typo in gemm comments [no ci] (#16189) Daniel Bevenius 2025-09-23 05:59:03 +02:00
1d0125bcf1 feat: Add conversion support in GraniteHybrid for non-hybrid (all attn) (#16177) Gabe Goodhart 2025-09-22 12:40:10 -06:00
351f3da39c clang-tidy : disable warning about performance enum size (#16127) Haiyue Wang 2025-09-23 01:57:46 +08:00
3ecb2f671a ggml : implement set_rows with i32 index (#16159) Sigbjørn Skjæret 2025-09-22 19:13:00 +02:00
432cf4304c codeowners : update + cleanup (#16174) Georgi Gerganov 2025-09-22 18:20:21 +03:00
37a23c17bd common : enable --offline mode without curl support (#16137) Adrien Gallouët 2025-09-22 14:13:51 +02:00
138c87ce8b webui : fix handling incomplete chunks (#16107) Quentin Bramas 2025-09-22 10:53:13 +02:00
c6db9a1027 embedding : fix typos in README (#16171) GideonSerf 2025-09-22 10:49:58 +02:00
d05affbab7 common : remove unused local variables (#16140) Haiyue Wang 2025-09-22 16:48:42 +08:00
4f324a556c ggml : extend ggml_can_fuse to work with non-sequential nodes (#16123) Georgi Gerganov 2025-09-22 11:12:37 +03:00
a71ae3ba7a ggml : add ggml_op_is_empty (#16122) Georgi Gerganov 2025-09-22 11:12:09 +03:00
05a2458121 codeowners : update ownership for @ngxson and @allozuar (#16128) Xuan-Son Nguyen 2025-09-22 15:10:58 +07:00
96fdca043b Vulkan: add conv_transpose_2d operation (#16022) Shin-myoung-serp 2025-09-22 17:04:01 +09:00
b2d980fce0 codeowners : claim responsibility for ci, models, gguf-py and convert (#16124) Sigbjørn Skjæret 2025-09-22 09:59:05 +02:00
5c6106a696 contrib : update roles (#16113) Georgi Gerganov 2025-09-22 10:58:02 +03:00
ec65fb52f0 ci : remove vulkaninfo calls (#16169) Georgi Gerganov 2025-09-22 10:16:05 +03:00
1d660d2fae ci : use smaller model (#16168) Georgi Gerganov 2025-09-22 09:11:39 +03:00
a20d810d79 vulkan: add RTE variants of exp shader (#16165) Jeff Bolz 2025-09-22 00:37:17 -05:00
4d0a7cbc61 ci : adjust params for less runtime (#16167) Georgi Gerganov 2025-09-22 08:31:40 +03:00
9073a73d82 vulkan: vec dot matrix multiplication fix (#16151) Ruben Ortlam 2025-09-22 07:22:43 +02:00
51f5a45fbe opencl: fix concat crash on win arm64 with Adreno (#15944) lhez 2025-09-21 16:42:10 -07:00
c4510dc937 opencl: initial q8_0 mv support (#15732) lhez 2025-09-21 14:48:44 -07:00
da30ab5f86 ci : add label for the RISC-V runner (#16150) Georgi Gerganov 2025-09-21 19:00:27 +03:00
28baac9c9f ci : migrate ggml ci to self-hosted runners (#16116) Georgi Gerganov 2025-09-21 16:50:45 +03:00
1eeb523c3e vulkan: optimize UMA buffer operations and fix driver hangs (#16059) Giuseppe Scrivano 2025-09-21 08:31:55 +02:00
5bb4a3edec vulkan: fix validation error about VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR (#16086) Jeff Bolz 2025-09-21 01:23:37 -05:00
7f766929ca sync : ggml Georgi Gerganov 2025-09-20 12:55:47 +03:00
405921dcef ggml : introduce semantic versioning (ggml/1336) Daniel Bevenius 2025-09-16 06:16:52 +02:00
fa6383ca7e CUDA : conditionally add cuda architectures (ggml/1341) Gregor Jasny 2025-09-10 17:21:11 +02:00
803dac2e48 vulkan: use vec dot for matrix matrix multiplications (#16056) Ruben Ortlam 2025-09-20 10:42:56 +02:00
459c0c2c1a server: fix SSE and OpenAI compatibility for error messages when streaming (#16109) Benni 2025-09-20 07:56:30 +02:00
be79d9fdd9 llama-bench: add --devices and --list-devices support (#16039) ssweens 2025-09-19 15:15:21 -07:00
f432d8d83e chat: Fix streaming parser for granite models (#15682) shun095 2025-09-20 00:57:30 +09:00
4067f07fc5 feat: Improve mobile UI for Settings Dialog (#16084) Aleksander Grygier 2025-09-19 09:52:27 +02:00
4b8560ab56 chat : fix build on arm64 (#16101) Xuan-Son Nguyen 2025-09-19 13:02:51 +07:00
0dd58b6877 ggml : refactor forward_dup for cpu backend (#16062) Xuan-Son Nguyen 2025-09-19 11:31:56 +07:00
69ffd89163 ggml-amx : fix ggml_amx_init() on generic Linux (#16049) Adrien Gallouët 2025-09-18 23:07:26 +02:00
246c0d9c79 cmake : fix static linking for OpenMP on Unix-like systems (#16031) Adrien Gallouët 2025-09-18 23:07:18 +02:00
3edd87cd05 opencl: optimize mxfp4 kernels (#16037) Shawn Gu 2025-09-18 12:03:34 -07:00
c0b45097c3 rename optimize_graph to graph_optimize (#16082) Jeff Bolz 2025-09-18 13:46:17 -05:00
38dbdf4c05 CUDA: Optimize PAD_REFLECT_1D (#15957) Bowen Han 2025-09-18 11:26:03 -07:00
368560a1e3 CUDA: fix compilation on CC 6.0 (#16091) Johannes Gäßler 2025-09-18 19:28:32 +02:00
4ca088b036 Add resumable downloads for llama-server model loading (#15963) Eric Curtin 2025-09-18 16:22:50 +01:00
703f9e32c4 metal : use function constants for mul_mv_ext kernels (#16074) Georgi Gerganov 2025-09-18 16:28:41 +03:00
ad6bd9083b cuda : add missing F32<->I32 entries in ggml_cuda_cpy_fn (#16060) Sigbjørn Skjæret 2025-09-18 13:28:22 +02:00
2b6b55a59f server : include usage statistics only when user request them (#16052) Radoslav Gerganov 2025-09-18 13:36:57 +03:00
e58174cecb llama : bump max seq limit from 64 to 256 (#15916) Georgi Gerganov 2025-09-18 12:47:56 +03:00
b213fce89b metal : improve F32, F16 and BF16 mat-vec multiplication (#16057) Georgi Gerganov 2025-09-18 12:33:45 +03:00
e00f3fd8ff metal : avoid call free for non-owned buffer (#16067) Jhen-Jie Hong 2025-09-18 15:06:48 +08:00