enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

f313d06f2c update instructions (main, b7003-full) Lu Xinlong 2025-11-18 14:20:13 +08:00
b8595b16e6 mtmd : fix embedding size for image input (#17123) Georgi Gerganov 2025-11-09 18:31:02 +02:00
392e09a608 vulkan: fix memory allocations (#17122) Ruben Ortlam 2025-11-09 16:14:41 +01:00
802cef44bf convert : parse safetensors directly (#15667) compilade 2025-11-09 09:49:40 -05:00
1c07c0c68c convert : handle compressed-tensors quant method (#17069) compilade 2025-11-09 09:45:50 -05:00
cb1adf8851 server : handle failures to restore host cache (#17078) Georgi Gerganov 2025-11-09 14:27:05 +02:00
ef1d826997 benches : add folder with benchmarks (#16931) Georgi Gerganov 2025-11-09 12:53:29 +02:00
86fde91e62 Switch to using Ubuntu 25.10 vulkan/mesa (#16497) Eric Curtin 2025-11-09 09:25:38 +00:00
7f3e9d339c vulkan: iGPU memory reporting fix (#17110) Ruben Ortlam 2025-11-09 09:54:47 +01:00
8a3519b708 vulkan: fix mmq out of bounds reads (#17108) Ruben Ortlam 2025-11-09 09:52:57 +01:00
80a6cf6347 vulkan: fuse mul_mat_id + mul (#17095) Jeff Bolz 2025-11-09 02:48:42 -06:00
0750a59903 metal : retain src and dst buffers during async ops (#17101) Georgi Gerganov 2025-11-09 08:28:51 +02:00
aa3b7a90b4 arg: add --cache-list argument to list cached models (#17073) Xuan-Son Nguyen 2025-11-08 21:54:14 +01:00
333f2595a3 webui: fix keyboard shortcuts for new chat & edit chat title (#17007) chansikpark 2025-11-08 14:52:35 -05:00
53d7d21e61 vulkan: Use spec constants for conv2d s/d/p and kernel W/H (#16978) Jeff Bolz 2025-11-08 13:24:29 -06:00
eeee367de5 server: fix correct time_ms calculation in prompt_progress (#17093) Aidan 2025-11-08 13:12:11 +00:00
64fe17fbb8 Revert "CUDA: add expert reduce kernel (#16857)" (#17100) Aman Gupta 2025-11-08 21:05:19 +08:00
c1b187688d CUDA: skip fusion for repeating adds in bias (#17080) Aman Gupta 2025-11-08 16:58:05 +08:00
b8a5cfd11a vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp (#16636) SavicStefan 2025-11-08 09:28:22 +01:00
08416ebe7f ggml: disable vxe for cross-compilation by default (#16966) Aleksei Nikiforov 2025-11-08 09:00:20 +01:00
b4e335d8dc vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (#16977) Jeff Bolz 2025-11-08 01:52:15 -06:00
d6fe40fa00 vulkan: Fix test-thread-safety crashes (#17024) Jeff Bolz 2025-11-08 01:39:45 -06:00
e14e842e87 CUDA: fix MMQ stream-k fixup ne1 indices (#17089) Johannes Gäßler 2025-11-08 08:26:18 +01:00
647b960bd8 ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031) Reese Levine 2025-11-07 19:27:20 -08:00
299f5d782c CUDA: properly handle nb00=nb02 case for cpy (#17081) bssrdf 2025-11-07 17:41:58 -05:00
ac76d36201 vulkan : refactor buffer handling in vk_op_f32 (#16840) Acly 2025-11-07 21:08:50 +01:00
6515610506 CUDA: fix should_use_mmvf for ne11 == 1 (#17085) Johannes Gäßler 2025-11-07 20:53:14 +01:00
7956bb4d7f bench : cache the llama_context state at computed depth (#16944) Georgi Gerganov 2025-11-07 21:23:11 +02:00
9008027aa3 hparams : add n_embd_inp() to support extended embed (#16928) Sigbjørn Skjæret 2025-11-07 19:27:58 +01:00
16bcc1259d kv-cache : pad the cache size to 256 for performance (#17046) Georgi Gerganov 2025-11-07 20:03:25 +02:00
9eb9a1331d Revert "ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)" (#17084) Adrien Gallouët 2025-11-07 17:34:05 +01:00
7c23f3f0d4 ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239) iron 2025-11-08 00:18:14 +08:00
8c0d6bb455 server : print the samplers chain for each request (#17070) Georgi Gerganov 2025-11-07 12:24:47 +02:00
5c9a18e674 common: move download functions to download.(cpp|h) (#17059) Xuan-Son Nguyen 2025-11-07 11:23:34 +01:00
7f09a680af ggml-cpu : optimize RVV q2_k and q3_k kernels (#16887) xctan 2025-11-07 00:12:45 +08:00
aa374175c3 CUDA: fix crash on uneven context without FA (#16988) Johannes Gäßler 2025-11-06 14:05:47 +01:00
5b180c3d60 metal : initial Metal4 tensor API support (#16634) Georgi Gerganov 2025-11-06 14:45:10 +02:00
b7f9010d24 server : disable checkpoints with mtmd (#17045) Georgi Gerganov 2025-11-06 12:09:29 +02:00
4882f0ff78 clip: implement minicpm-v sinusoidal embd using GGML (#17036) Xuan-Son Nguyen 2025-11-06 11:02:54 +01:00
9d7c518d64 sycl: add CONCAT operator support (#16047) YehuditE 2025-11-06 12:02:33 +02:00
22c8c3c6ad docs: explain CUDA 11 compilation [no ci] (#16824) Johannes Gäßler 2025-11-06 08:14:35 +01:00
6db3d1ffe6 ggml-hexagon: graceful fallback for older socs where rpcmem_alloc2 and FASTRPC_GET_URI is unsupported (#16987) l3utterfly 2025-11-06 13:46:38 +08:00
230d1169e5 improve CUDA cpy memory bandwidth when copying transposed tensor (#16841) bssrdf 2025-11-05 15:55:04 -05:00
a44d77126c vulkan: Fix GGML_VULKAN_CHECK_RESULTS to better handle fusion (#16919) Jeff Bolz 2025-11-05 12:51:03 -06:00
5886f4f545 examples(gguf): GGUF example outputs (#17025) Gabe Goodhart 2025-11-05 10:58:16 -07:00
92bb84f775 mtmd: allow QwenVL to process larger image by default (#17020) Xuan-Son Nguyen 2025-11-05 14:26:49 +01:00
13b339bcd9 server : do not default to multiple slots with speculative decoding (#17017) Georgi Gerganov 2025-11-05 14:32:55 +02:00
2f0c2db43e mtmd: improve struct initialization (#16981) Xuan-Son Nguyen 2025-11-05 11:26:37 +01:00
fd2f84f468 docs: Clarify the endpoint that webui uses (#17001) 손희준 2025-11-05 19:20:28 +09:00
9f052478c2 model : add openPangu-Embedded (#16941) Li Pengzhan 2025-11-05 17:28:58 +08:00
03ea04175d ggml webgpu: minor set rows optimization (#16810) Reese Levine 2025-11-05 01:27:42 -08:00
cdabeb2c27 sync : ggml Georgi Gerganov 2025-11-04 20:44:18 +02:00
852ce5180a ggml : fix conv2d_dw SVE path (ggml/1380) Georgi Gerganov 2025-11-04 20:40:52 +02:00
9aa63374f2 CUDA: update ops.md (#17005) mnehete32 2025-11-05 08:31:15 +05:30
5e90233bdb opencl: update doc (#17011) lhez 2025-11-04 16:02:36 -08:00
a5c07dcd7b refactor: replace sprintf with snprintf for safer string handling in dump functions (#16913) nullname 2025-11-05 04:25:39 +08:00
ad51c0a720 vulkan: remove the need for the dryrun (#16826) Jeff Bolz 2025-11-04 13:28:17 -06:00
66d8eccd42 server : do context shift only while generating (#17000) Georgi Gerganov 2025-11-04 19:21:36 +02:00
afd353246d readme : update hot topics (#17002) Georgi Gerganov 2025-11-04 17:21:31 +02:00
cc98f8d349 ggml-cpu : bicubic interpolation (#16891) Acly 2025-11-04 13:12:20 +01:00
d945834366 ci : apply model label to models (#16994) Sigbjørn Skjæret 2025-11-04 12:29:39 +01:00
b164259bba chore : fix models indent after refactor (#16992) Sigbjørn Skjæret 2025-11-04 12:29:15 +01:00
1f5accb8d0 Fix garbled output with REPACK at high thread counts (#16956) Noah 2025-11-04 05:04:59 +00:00
2759ccdb4a CUDA: avoid mul + bias fusion when doing fusion (#16935) Aman Gupta 2025-11-04 10:53:48 +08:00
c5023daf60 opencl: support imrope (#16914) lhez 2025-11-03 11:47:57 -08:00
e7da30b584 fix: Viewing multiple PDF attachments (#16974) Aleksander Grygier 2025-11-03 18:53:26 +01:00
ed8aa63320 model-conversion : pass config to from_pretrained (#16963) Daniel Bevenius 2025-11-03 18:01:59 +01:00
48bd26501b server : add props.model_alias (#16943) Georgi Gerganov 2025-11-03 15:38:23 +02:00
622cd010ff ggml: CUDA: add head size 72 for flash-attn (#16962) theo77186 2025-11-03 14:29:11 +01:00
070ff4d535 mtmd: add --image-min/max-tokens (#16921) Xuan-Son Nguyen 2025-11-03 11:11:18 +01:00
bf7b0c9725 mtmd: pad mask for qwen2.5vl (#16954) Xuan-Son Nguyen 2025-11-03 10:25:55 +01:00
fcfce040e8 ggml : LoongArch fixes (#16958) Jinyang He 2025-11-03 14:40:02 +08:00
ee3a5a10ad sync: minja (glm 4.6 & minmax m2 templates) (#16949) Olivier Chafik 2025-11-03 05:33:56 +00:00
7e994168b1 SYCL: optimized repeat_back kernel (3× fewer asm instructions, 2× faster)Feature/sycl repeat back opt (#16869) shani-f 2025-11-03 03:35:33 +02:00
bcfa87622a feat(webui): improve LaTeX rendering with currency detection (#16508) Sascha Rogmann 2025-11-03 00:41:08 +01:00
a2054e3a8f test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (#16936) Shagun Bera 2025-11-03 04:40:30 +05:30
dd52868050 ci : disable failing riscv cross build (#16952) Sigbjørn Skjæret 2025-11-02 23:11:21 +01:00
6b9a52422b model: add Janus Pro for image understanding (#16906) Zhiyong Wang 2025-11-02 13:08:04 -08:00
2f966b8ed8 clip : use FA (#16837) Georgi Gerganov 2025-11-02 22:21:48 +02:00
cd5e3b5754 server : support unified cache across slots (#16736) Georgi Gerganov 2025-11-02 18:14:04 +02:00
87c9efc3b2 common : move gpt-oss reasoning processing to init params (#16937) Aldehir Rojas 2025-11-02 08:56:28 -06:00
76af40aaaa docs: remove llama_sampler_accept reference in sampling sample usage (#16920) Adrian Lundberg 2025-11-02 10:28:37 +01:00
7db35a7958 CUDA: add FLOOR, CEIL, ROUND, TRUNC unary ops (#16917) mnehete32 2025-11-02 08:42:57 +05:30
a864132ba5 devops: fix failing s390x docker build (#16918) Aaron Teo 2025-11-02 08:48:46 +08:00
d38d9f0877 ggml: add s390x cpu-feats (#16774) Aaron Teo 2025-11-02 08:48:23 +08:00
7fd205a8e8 scripts : add script to bench models (#16894) Georgi Gerganov 2025-11-02 00:15:31 +02:00
2f68ce7cfd webui: auto-refresh /props on inference start to resync model metadata (#16784) Pascal 2025-11-01 19:49:51 +01:00
e4a71599e5 webui: add HTML/JS preview support to MarkdownContent with sandboxed iframe (#16757) Pascal 2025-11-01 17:14:54 +01:00
dd5e8cab51 vendor : update cpp-httplib to 0.27.0 (#16846) Adrien Gallouët 2025-11-01 16:52:17 +01:00
cf659bbb8e mtmd: refactor preprocessing + support max/min pixels (#16878) Xuan-Son Nguyen 2025-11-01 15:51:36 +01:00
d8b860a219 Add a setting to display message generation statistics (#16901) Aleksander Grygier 2025-11-01 15:35:57 +01:00
1ae74882f8 webui: recognize AsciiDoc files as valid text files (#16850) Jaromír Hradílek 2025-11-01 15:02:57 +01:00
961660b8c3 common : allow --system-prompt-file for diffusion-cli (#16903) Sigbjørn Skjæret 2025-11-01 11:01:42 +01:00
74fef4129f codeowners : update after refactor (#16905) Sigbjørn Skjæret 2025-11-01 08:55:25 +01:00
5d8bb900bc vulkan: Fix multi_add invalid descriptor usage (#16899) Jeff Bolz 2025-11-01 00:52:14 -05:00
2e76e01360 vulkan: fuse mul_mat+add and mul_mat_id+add_id (#16868) Jeff Bolz 2025-11-01 00:45:28 -05:00
d3dc9dd898 CUDA: Remove unneded bias/gate dims in fused mmvq (#16858) Oliver Simons 2025-11-01 06:13:26 +01:00
bea04522ff refactor : llama-model.cpp (#16252) Piotr Wilkin (ilintar) 2025-10-31 23:40:23 +01:00
0de0a01576 model : Minimax M2 (#16831) Piotr Wilkin (ilintar) 2025-10-31 21:20:47 +01:00
e58d585604 model : add Granite Hybrid nano types (#16896) Giuseppe Scrivano 2025-10-31 21:20:07 +01:00
