enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

67f09a3a27 musa: handle __hgt2_mask, available starting from MUSA SDK rc4.3.0 (#15413) R0CKSTAR 2025-08-19 18:33:47 +08:00
6424594c56 ggml-cpu: add mxfp4 VSX intrinsics for Power9+ (ppc64le) hardware (#15385) Marvin Gießing 2025-08-19 10:54:31 +02:00
e9288e8869 chat : clarify the meaning of reasoning_format (#15408) Xuan-Son Nguyen 2025-08-19 10:29:36 +02:00
9d262f4bad server : remove swa_full warning (#15399) Georgi Gerganov 2025-08-19 08:45:26 +03:00
f0d3c7405c batched-bench : use rand tokens (#15398) Georgi Gerganov 2025-08-19 08:45:12 +03:00
f08c4c0d8d mtmd : clean up clip_n_output_tokens (#15391) Xuan-Son Nguyen 2025-08-18 22:53:52 +02:00
6d7f1117e3 codeowners : remove mmv.* Georgi Gerganov 2025-08-18 22:02:50 +03:00
60212f1ead sync : ggml Georgi Gerganov 2025-08-18 22:02:11 +03:00
f0c541d315 scripts : update sync scripts Georgi Gerganov 2025-08-18 20:35:47 +03:00
baa9255a45 llama : merge conts and reshapes and remove unnecessary cont (#15380) Sigbjørn Skjæret 2025-08-18 19:30:17 +02:00
3007baf201 readme : update hot topics (#15397) Georgi Gerganov 2025-08-18 18:11:44 +03:00
d1d8241600 server : fix incoming tasks not process in order (#15395) davidef 2025-08-18 16:51:42 +02:00
618575c582 Fix broken build: require updated pip to support --break-system-packages (#15357) Dobri Danchev 2025-08-18 05:50:48 -05:00
f44f793172 ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (#15379) compilade 2025-08-18 03:23:56 -04:00
ae532eac2c vulkan: disable spirv-opt for bfloat16 shaders (#15352) Jeff Bolz 2025-08-18 00:56:29 -05:00
e5155e6986 server : export max observed n_past value (#15361) Oleksandr Kuvshynov 2025-08-17 18:28:58 -04:00
21c17b5bef vulkan: Use larger workgroups for mul_mat_vec when M is small (#15355) Jeff Bolz 2025-08-17 11:08:57 -05:00
19f4decae0 vulkan: support sqrt (#15370) Dong Won Kim 2025-08-17 23:03:09 +09:00
4d196981d4 convert : force patch_embd weights to F16 or F32 to avoid broken GGUFs (#15367) Sigbjørn Skjæret 2025-08-17 14:47:42 +02:00
b143fbc87a ci : fix hang in windows-hip build/release (#15365) Sigbjørn Skjæret 2025-08-17 13:30:23 +02:00
de5627910d vulkan: Optimize argsort (#15354) Jeff Bolz 2025-08-17 03:41:45 -05:00
65349f26f2 model : support vision LiquidAI LFM2-VL family (#15347) Tarek Dakhran 2025-08-16 23:33:54 +02:00
1fe00296f5 vulkan: fuse adds (#15252) Jeff Bolz 2025-08-16 11:48:22 -05:00
de2192794f vulkan: Support mul_mat_id with f32 accumulators (#15337) Jeff Bolz 2025-08-16 04:18:31 -05:00
2e2b22ba66 vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (#15334) Jeff Bolz 2025-08-16 03:58:38 -05:00
912ff8c119 OpenCL: add initial FA support (#14987) rmatif 2025-08-16 10:05:55 +02:00
5e6229a840 common : fix double bos, use common_chat_templates for add_bos and add_eos (#15326) Daniel Bevenius 2025-08-15 19:50:52 +02:00
e2c1bfff53 opencl: add initial mxfp4 support via mv (#15270) lhez 2025-08-16 00:52:14 +08:00
5edf1592fd vulkan : fix out-of-bounds access in argmax kernel (#15342) Georgi Gerganov 2025-08-15 17:16:36 +03:00
db3010bd23 vulkan : fix compile warnings on macos (#15340) Georgi Gerganov 2025-08-15 16:28:28 +03:00
ff27f80a74 ggml: initial IBM zDNN backend (#14975) Aaron Teo 2025-08-15 21:11:22 +08:00
d3248d9b65 ci : fix ios-xcode-build (#15324) Sigbjørn Skjæret 2025-08-15 14:02:39 +02:00
7aeee88cfe ci : move ccache action to ggml-org fork (#15328) Diego Devesa 2025-08-15 03:27:02 -07:00
b07791aa1d test-opt: fix backend support check (#15317) Johannes Gäßler 2025-08-15 11:23:17 +02:00
4227c9be42 CUDA: fix negative KV_max values in FA (#15321) Johannes Gäßler 2025-08-14 23:21:24 +02:00
df36bce667 eval-callback : stop on first NaN (#15320) Georgi Gerganov 2025-08-14 22:10:51 +03:00
f75b830647 chat : include kwargs in template example (#15309) Diego Devesa 2025-08-14 10:28:29 -07:00
7a0de96045 llama : add 18-layer model type for Gemma 3-270m (#15319) Daniel Bevenius 2025-08-14 17:56:26 +02:00
e4e915912c devops : fix compile bug when the BASE_CUDA_DEV_CONTAINER is based on Ubuntu 24.04 (#15005) simevo 2025-08-14 17:45:27 +02:00
5ba36f6103 HIP: Cleanup hipification header (#15285) uvos 2025-08-14 16:23:56 +02:00
b204a5a234 gpt-oss: implement harmony parsing (#15181) Aldehir Rojas 2025-08-14 09:23:11 -05:00
646944cfa8 docker : Enable GGML_CPU_ALL_VARIANTS for ARM (#15267) Christian Kastner 2025-08-14 16:22:58 +02:00
1a01899b61 readme : update hot topics (#15315) Georgi Gerganov 2025-08-14 17:16:03 +03:00
863d341eeb vulkan: perf_logger improvements (#15246) Jeff Bolz 2025-08-14 08:38:10 -05:00
d32e03f449 server : add SWA checkpoints (#15293) Georgi Gerganov 2025-08-14 14:59:50 +03:00
3973163bff sync : ggml Georgi Gerganov 2025-08-14 14:19:23 +03:00
5ade3000bd ggml: fix ggml_conv_1d_dw bug (ggml/1323) Jason Ni 2025-08-14 19:17:51 +08:00
8b2483730f tests : remove unused includes (ggml/0) Georgi Gerganov 2025-08-14 13:41:03 +03:00
810b9fc8b9 perplexity : provide a helpful hint for has_cpl case in split_equal error. (#15304) kallewoof 2025-08-14 20:03:30 +09:00
4ebd0c125b cuda : fix GGML_CUDA_GRAPHS=OFF (#15300) Sigbjørn Skjæret 2025-08-14 12:22:07 +02:00
5cdb27e091 finetune: SGD optimizer, more CLI args (#13873) Jonathan Graehl 2025-08-14 03:03:57 -07:00
3ea913f1ce perplexity: give more information about constraints on failure (#15303) kallewoof 2025-08-14 15:16:32 +09:00
29c8fbe4e0 HIP: bump requirement to rocm 6.1 (#15296) uvos 2025-08-13 20:44:30 +02:00
1adc9812bd fix(nix): remove non-functional llama-cpp cachix cache from flake.nix (#15295) Bas Nijholt 2025-08-13 11:21:31 -07:00
b3e16665e1 server : enable -td and -tbd parameters (#15172) Sigbjørn Skjæret 2025-08-13 15:43:00 +02:00
c24f4e2688 ggml : update ggml_rope_multi (#12665) Judd 2025-08-13 18:45:15 +08:00
d8914fc47e common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-moe-draft parameters (#15191) Copilot 2025-08-13 12:44:40 +02:00
e885445bc1 server : filter out harmony thought messages (#15278) Aldehir Rojas 2025-08-13 05:28:21 -05:00
648ebcdb73 ci : Added CI with RISC-V RVV1.0 Hardware (#14439) Ali Tariq 2025-08-13 15:14:44 +05:00
07aa869a91 ci : add more python requirements to copilot-setup-steps (#15289) Sigbjørn Skjæret 2025-08-13 11:30:45 +02:00
00f35d509e ggml : repack block_iq4_nlx8 (#14904) Georgi Gerganov 2025-08-13 11:09:39 +03:00
6028bf7435 CUDA: Optimize reduce_rows_f32 kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (#15132) Oliver Simons 2025-08-13 10:04:46 +02:00
bc5182272c ci : add copilot-setup-steps.yml (#15214) Sigbjørn Skjæret 2025-08-13 09:07:13 +02:00
e71d48e326 ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors over RPC (macOS & others) (#15188) Tak-RS 2025-08-13 14:54:30 +09:00
b0493156fa HIP: disable sync warp shuffel operators from clr amd_warp_sync_functions.h (#15273) uvos 2025-08-12 22:15:12 +02:00
f4586ee598 sycl: Fix and disable more configurations of mul_mat (#15151) Romain Biessy 2025-08-12 13:58:22 +02:00
60a7658810 opencl: allow mixed f16/f32 add (#15140) rmatif 2025-08-12 11:42:41 +02:00
efe3a90996 CUDA cmake: add -lineinfo for easier debug (#15260) Aman Gupta 2025-08-12 17:21:45 +08:00
bbd57b7eaf CANN: GGML_OP_CPY optimization (#15070) Chenguang Li 2025-08-12 16:12:13 +08:00
25ff6f7659 musa: fix failures in test-backend-ops for mul_mat_id op (#15236) R0CKSTAR 2025-08-12 10:02:51 +08:00
be48528b06 CANN: Add broadcast for softmax and FA (#15208) hipudding 2025-08-11 22:50:31 +08:00
cf9e5648a7 mtmd : Fix MinicpmV model converter and clip to avoid using hardcode. (#14750) rainred 2025-08-11 22:12:12 +08:00
fba5c0d680 chat : hotfix gpt-oss jinja raising an exception (#15243) Xuan-Son Nguyen 2025-08-11 15:31:35 +02:00
53d0a12658 server : allow specifying reasoning_format in HTTP request (#15238) Xuan-Son Nguyen 2025-08-11 14:48:41 +02:00
27093afe78 readme : update infra list (#15234) Zagaj 2025-08-11 14:27:54 +02:00
228f724d9c kv-cache : fix seq_rm with seq_id == -1 (#15226) Georgi Gerganov 2025-08-11 13:58:24 +03:00
cd3069dfcb kv-cache : log (debug) all streams in find_slot (#15176) Daniel Bevenius 2025-08-11 11:21:19 +02:00
50e81bdf5d convert : fix merge conflicts (#15229) Sigbjørn Skjæret 2025-08-11 11:15:44 +02:00
1ebbaddff2 perplexity : update comments/error msg to use decode [no ci] (#15227) Daniel Bevenius 2025-08-11 10:21:24 +02:00
a3a7874272 convert : improve Mistral models integration (#14737) Julien Denize 2025-08-11 10:07:49 +02:00
002cb1bb33 kleidiai: fix unsigned overflow bug (#15150) Charles Xu 2025-08-11 09:59:26 +02:00
79c1160b07 cuda: refactored ssm_scan and use CUB (#13291) David Zhao 2025-08-09 13:29:43 -05:00
34c9d765bf CUDA: add attention sinks for tile and wmma (#15178) Aman Gupta 2025-08-09 20:00:24 +08:00
e54d41befc gguf-py : add Numpy MXFP4 de/quantization support (#15111) compilade 2025-08-08 17:48:26 -04:00
4850b52aed server-bench: external OAI servers, sqlite (#15179) Johannes Gäßler 2025-08-08 23:04:36 +02:00
cd6983d56d ggml : fix field name when new ggml_backend (#14944) AN Long 2025-08-08 21:37:22 +09:00
6c7e9a5440 vendor: sync minja (#15161) Olivier Chafik 2025-08-08 10:45:18 +01:00
1425f587a8 CUDA: attention sinks for mma FlashAttention (#15157) Johannes Gäßler 2025-08-08 08:19:58 +02:00
aaa3d07ae7 opencl: support sink in soft_max (attn sinks) (#15152) lhez 2025-08-08 13:47:03 +09:00
50aa938901 convert : support non-mxfp4 HF model (#15153) Xuan-Son Nguyen 2025-08-07 23:26:03 +02:00
c4f53563df vulkan: support fattn sinks (#15126) Jeff Bolz 2025-08-07 15:44:20 -05:00
a0552c8bee vulkan: Add env var to disable host visible vidmem (#15109) Jeff Bolz 2025-08-07 15:07:11 -05:00
99acbc9921 llama : Support intern-s1 (#14875) RunningLeon 2025-08-08 00:20:40 +08:00
7ad67ba9fe HIP: add cmake option to enable compiler output of kernel resource usage metrics (#15103) uvos 2025-08-07 16:44:14 +02:00
9a96389544 ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) Christian Kastner 2025-08-07 13:45:41 +02:00
1d72c84188 CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (#15131) Johannes Gäßler 2025-08-07 10:53:21 +02:00
20638e4f16 scripts: fix crash when --tool is not set (#15133) Johannes Gäßler 2025-08-07 08:50:30 +02:00
36d3f00e14 requirements : fix PyTorch uint64 compatibility (#15134) Daniel Bevenius 2025-08-07 05:31:48 +02:00
5fd160bbd9 ggml: Add basic SET_ROWS support in WebGPU (#15137) Reese Levine 2025-08-06 15:14:40 -07:00
756cfea826 fix profiling crash (#15072) rmatif 2025-08-06 23:17:51 +02:00

Commit Graph

main

b7003-full