-
f2f28380ea
metal : handle nil cv during pipeline creation (#16065)
Georgi Gerganov
2025-09-18 10:03:24 +03:00
-
62c3b645c5
CANN: Remove print (#16044)
Chenguang Li
2025-09-18 09:26:33 +08:00
-
d304f459d8
GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (#16018)
Reese Levine
2025-09-17 13:09:40 -07:00
-
0320ac5264
metal : refactor + optimize v2 (#15995)
Georgi Gerganov
2025-09-17 20:38:12 +03:00
-
a7a98e0fff
SvelteKit-based WebUI (#14839)
Aleksander Grygier
2025-09-17 19:29:13 +02:00
-
8f8f2274ee
convert : add Llama4ForCausalLM (#16042)
Xuan-Son Nguyen
2025-09-18 00:18:21 +07:00
-
c959b676be
CUDA: fix FA occupancy, optimize tile kernel (#15982)
Johannes Gäßler
2025-09-17 15:32:42 +02:00
-
cd08fc3ecc
common : Fix corrupted memory error on json grammar initialization (#16038)
David Ribeiro Alves
2025-09-17 01:08:02 -07:00
-
cb5bb6cc05
vulkan: automatically remove unsupported devices (#15976)
Eve
2025-09-17 07:35:37 +00:00
-
a91d035b90
ci : revert back to macos-13 for macOS-latest-cmake-x64 (#16040)
Daniel Bevenius
2025-09-17 09:34:09 +02:00
-
745cbcf2fe
llama-quant : fix the verification of attention layers for encoder-decoder models (#16023)
Jie Fu (傅杰)
2025-09-17 15:30:55 +08:00
-
1cbd80f8cf
examples : support encoder-decoder models in the simple example (#16002)
Jie Fu (傅杰)
2025-09-17 15:29:00 +08:00
-
85286f3548
model : add OLMo3 support (#16015)
Shane A
2025-09-17 00:01:58 -07:00
-
d5fabe3682
CANN: Optimize ggml_cann_set_device (#15935)
Chenguang Li
2025-09-17 14:33:08 +08:00
-
8ff206097c
llama-bench: add --n-cpu-moe support (#15952)
jacekpoplawski
2025-09-16 16:17:08 +02:00
-
77475530b8
ci : use macos-latest for arm64 webgpu build (#16029)
Daniel Bevenius
2025-09-16 15:27:52 +02:00
-
3913f8730e
ggml : fix padding in timestep embedding kernels (#15932)
Daniel Bevenius
2025-09-16 15:25:57 +02:00
-
76888d202e
ci : upload xcframework artifact from ios-xcode-build job (#16010)
Daniel Bevenius
2025-09-16 13:41:38 +02:00
-
f1fbffb5c0
fix: apply clang-format to CUDA macros (#16017)
Bowen Han
2025-09-15 23:59:19 -07:00
-
51abc96bdc
ci : update macos-latest* jobs to use macos-latest (#15938)
Daniel Bevenius
2025-09-16 05:57:16 +02:00
-
07808ebb07
cmake : Do not install tools on iOS targets (#15903)
Yuri Khrustalev
2025-09-15 22:54:44 -04:00
-
6d758839ff
Add LLaDA-7b-MoE diffusion model (#16003)
Aman Gupta
2025-09-16 10:38:28 +08:00
-
3d4053f77f
CUDA: fix im2col_3d to respect non-contiguous inputs (views) (#15956)
Jake Karnes
2025-09-15 16:28:31 -06:00
-
dc381aa9a6
docker : enable rocWMMA in ROCm images, add gfx1151 (#15997)
Diego Devesa
2025-09-15 14:38:52 -07:00
-
10d197409b
releases : switch to rocWMMA develop branch, add gfx1151 (#15992)
Diego Devesa
2025-09-15 14:38:42 -07:00
-
b907255f4b
SYCL: Add COUNT_EQUAL operator support (#15991)
yael-works
2025-09-15 19:51:35 +03:00
-
28c39da7c6
llama-run: Fix model download on Windows (#15988)
Nikolay Popov
2025-09-15 13:08:30 +03:00
-
106220562a
CUDA: some micro-optimizations in mmf.cuh for mul_mat_id (#15926)
Aman Gupta
2025-09-15 17:35:11 +08:00
-
a68f31edd7
fix KLD percentile output (#15999)
ddh0
2025-09-15 02:54:57 -05:00
-
b8e09f08b9
model : add grok-2 support (#15539)
Sigbjørn Skjæret
2025-09-14 23:00:59 +02:00
-
6c019cb04e
server : only attempt to enable thinking if using jinja (#15967)
Sigbjørn Skjæret
2025-09-14 21:17:04 +02:00
-
9dcd200d57
metal : remove memory pools (#15966)
Georgi Gerganov
2025-09-14 22:02:32 +03:00
-
0fa154e350
rocm.Dockerfile: added gfx1200,gfx1201 architectures to support AMD Radeon RX 9000 series (#15994)
Adam
2025-09-15 04:43:54 +10:00
-
261e6a20ff
Vulkan: Clean up mul_mm shader (#15987)
Ruben Ortlam
2025-09-14 16:56:28 +02:00
-
a0e13dcbe5
build: fix the build failures of Windows HIP release job (#15984)
lcy
2025-09-14 22:20:35 +08:00
-
a14bd35014
metal : fix kernel requirements (#15983)
Georgi Gerganov
2025-09-14 15:33:22 +03:00
-
918b26f197
rpc : fix regression when --device is used (#15981)
Radoslav Gerganov
2025-09-14 12:28:18 +03:00
-
9ecb884346
releases : update ROCM, add gfx1200, gfx1201, gfx1151 (#15972)
Diego Devesa
2025-09-14 02:21:59 -07:00
-
d1c6f11f47
doc : update documentation for --tensor-split (#15980)
Radoslav Gerganov
2025-09-14 12:10:07 +03:00
-
6380d6a3e7
ggml-zdnn: rm user mapped buffers (#15965)
Aaron Teo
2025-09-14 13:37:03 +08:00
-
aa0c461efe
vulkan: fix failing dequant shaders (#15862)
Jeff Bolz
2025-09-13 16:29:43 +01:00
-
b9c9c9f789
vulkan: initialize vulkan-hpp to allow using extension function pointers (#15705)
Jeff Bolz
2025-09-13 16:23:30 +01:00
-
50f4281a6f
llama : allow using iGPUs with --device (#15951)
Diego Devesa
2025-09-13 07:49:49 -07:00
-
55758b00ca
metal : refactor kernel loading (#15964)
Georgi Gerganov
2025-09-13 16:24:22 +03:00
-
f161463a54
metal : allow ops to run concurrently (#15929)
Georgi Gerganov
2025-09-13 13:54:28 +03:00
-
84d7b2fca1
metal : fix memory leaks (#15962)
Georgi Gerganov
2025-09-13 12:45:04 +03:00
-
40be51152d
ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration and incorrect zTensor free (#15839)
Aaron Teo
2025-09-13 02:39:52 +08:00
-
4bf5549269
Add docker protocol support for llama-server model loading (#15790)
Eric Curtin
2025-09-12 16:31:50 +01:00
-
f4e664f838
context : remove redundant explicit casting to the same type (#15948)
Haiyue Wang
2025-09-12 23:16:32 +08:00
-
f088b6a84f
server : adjust prompt similarity thold + add logs (#15913)
Georgi Gerganov
2025-09-12 17:02:55 +03:00
-
304ac5693d
Vulkan iGPU device selection overhaul and PCI ID API support (#15947)
Ruben Ortlam
2025-09-12 13:24:21 +02:00
-
6c88ad8fa7
vulkan: Make device memory check more portable (#15939)
Mathieu Baudier
2025-09-12 09:06:20 +02:00
-
704d90c987
Revert "sycl: add usage of enqueue_functions extension (#14244)" (#15910)
Neo Zhang Jianyu
2025-09-12 09:15:12 +08:00
-
360d6533db
ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (#15797)
Diego Devesa
2025-09-11 13:47:38 -07:00
-
0e6ff0046f
CUDA: larger SRAM reads for tile FA, AMD FP16 dot (#15927)
Johannes Gäßler
2025-09-11 21:19:58 +02:00
-
df082f5630
nitpick : correct MB to MiB (#15934)
ddh0
2025-09-11 12:12:34 -05:00
-
24a6734daf
ggml-cpu : add check for ARM MATMUL_INT8/i8mm support (#15922)
Daniel Bevenius
2025-09-11 15:39:12 +02:00
-
2b3efea9a4
kleidiai: fix GGML_ASSERT(*cur_backend_id != -1) failed (#15614)
Charles Xu
2025-09-11 12:45:40 +02:00
-
c0389dba43
CANN: Disable acl_graph for prefill stage (#15933)
hipudding
2025-09-11 15:59:37 +08:00
-
00681dfc16
CUDA: Add fastdiv to k_bin_bcast*, giving 1-3% E2E performance (#15872)
Oliver Simons
2025-09-10 22:04:03 +02:00
-
4f658855fa
llama : support T5 models with unequal number of encoder-decoder layers (#15909)
Jie Fu (傅杰)
2025-09-11 02:51:51 +08:00
-
6ab397e12b
graph : support non-contiguous Q in build_attn_mha (#15908)
Sigbjørn Skjæret
2025-09-10 19:08:59 +02:00
-
9de447d94e
ggml-cpu : fix padding in ggml_timestep_embedding (#15917)
Daniel Bevenius
2025-09-10 17:31:40 +02:00
-
0f0a3c2851
metal : make the backend async (#15906)
Georgi Gerganov
2025-09-10 17:52:35 +03:00
-
33daece86b
ci : add caching for ROCm installation in release workflow (#15924)
Daniel Bevenius
2025-09-10 15:39:57 +02:00
-
e7b6d83b52
tests : filter out no-ops from coverage report (#15900)
Daniel Bevenius
2025-09-10 14:17:09 +02:00
-
2cfef4d117
media : add transparent icon svg and png [no ci] (#15891)
j-k
2025-09-10 12:51:28 +01:00
-
09e72a037c
gitignore : Ignore vim swap files in tests (#15901)
Jesse
2025-09-10 07:28:47 -04:00
-
10d8b2b6b0
CANN: Add ROPE sin/cos cache for reuse (#15912)
Chenguang Li
2025-09-10 18:42:00 +08:00
-
28b5f190ef
CANN: implement LRU cache for ACL graphs (#15814)
Chenguang Li
2025-09-10 15:29:12 +08:00
-
86587da03b
llama : check returned fn ptrs from ggml_backend_reg_get_proc_address (#15893)
Daniel Bevenius
2025-09-10 05:33:58 +02:00
-
ff02caf9ee
ci : cache ROCm installation in windows-latest-cmake-hip (#15887)
Daniel Bevenius
2025-09-10 05:23:19 +02:00
-
ae355f6f71
vulkan: throw the oom error instead of no memory type found (#15905)
Ruben Ortlam
2025-09-09 22:26:03 +02:00
-
4f63cd705c
vulkan: Fix OOB accesses in soft_max_back (#15861)
Jeff Bolz
2025-09-09 07:41:15 -05:00
-
17bc5a815f
HIP: use v_dot2_f32_f16 instruction for FA (#15884)
Johannes Gäßler
2025-09-09 14:04:43 +02:00
-
ed54e32558
Workaround for subgroup arithmetic failing on MoltenVK with AMD GPUs (issue 15846) (#15886)
lksj92hs
2025-09-09 15:01:15 +03:00
-
a972faebed
CUDA: Add mul_mat_id support for the mmf kernel (#15767)
Aman Gupta
2025-09-09 14:38:02 +08:00
-
550cf726e1
CUDA: fix GET_ROWS for large tensors (#15882)
Johannes Gäßler
2025-09-09 08:11:01 +02:00
-
c252ce67c4
contrib : add notes about merging PRs (#15881)
Georgi Gerganov
2025-09-09 08:42:10 +03:00
-
70cd37dbbe
requirements : update transformers/torch for Embedding Gemma (#15828)
Daniel Bevenius
2025-09-09 06:06:52 +02:00
-
acc1b008cf
model-conversion : add extra debugging support for model conversion (#15877)
Piotr Wilkin (ilintar)
2025-09-09 06:05:55 +02:00
-
7057faf64b
json : support enum values within allOf (#15830)
Aldehir Rojas
2025-09-08 16:14:32 -05:00
-
fe1c92cd7b
media : add llama1 icon (#15878)
j-k
2025-09-08 19:57:01 +01:00
-
e68aa10d8f
vulkan: sort graph to allow more parallel execution (#15850)
Jeff Bolz
2025-09-08 13:10:07 -05:00
-
0a16bf52e6
CUDA: generate_cu_files.py - add missing mxfp4 (#15880)
Aman Gupta
2025-09-09 01:23:46 +08:00
-
88021565f0
chat : Deepseek V3.1 reasoning and tool calling support (OpenAI Style) (#15533)
Jesse
2025-09-08 10:59:48 -04:00
-
56920f5665
server : bring back timings_per_token (#15879)
Xuan-Son Nguyen
2025-09-08 21:50:05 +07:00
-
b0d52998b9
cuda : fix supports_op condition for get_rows when number of blocks is too large (#15868)
Georgi Gerganov
2025-09-08 13:56:51 +03:00
-
f28d4f4ac9
metal : refactor + optimize (#15857)
Georgi Gerganov
2025-09-08 13:34:56 +03:00
-
9fcb29f22f
ggml: allow casting between f32 and i32 (#15783)
Xuan-Son Nguyen
2025-09-08 17:33:01 +07:00
-
5ef22d281d
CUDA: non-contiguous src0 not supported for PAD (#15869)
Sigbjørn Skjæret
2025-09-08 11:55:44 +02:00
-
233d773d02
convert : force setting sliding_window from original config (#15867)
Daniel Bevenius
2025-09-08 09:44:34 +02:00
-
a885dcff11
batched-bench : fix llama_synchronize usage during prompt processing (#15835)
Georgi Gerganov
2025-09-08 10:27:07 +03:00
-
663027fd54
context : fix n_outputs during reserve (#15858)
Georgi Gerganov
2025-09-08 10:26:36 +03:00
-
cf0e3ba150
model : avoid ggml_cont_3d for fused QKV weights (#15662)
Georgi Gerganov
2025-09-08 10:25:33 +03:00
-
d413dca003
tests: large sizes for get_rows (#15687)
Jeff Bolz
2025-09-07 23:23:41 -05:00
-
85ca66a746
CANN: Stream sync between devices for acl_graph (#15809)
Chenguang Li
2025-09-08 10:03:29 +08:00
-
3976dfbe00
vulkan: support im2col_3d (#15795)
Jeff Bolz
2025-09-07 13:50:26 -05:00
-
d36e61c580
ggml-cpu: clean up s390x SIMD (#15855)
Aaron Teo
2025-09-08 02:18:28 +08:00
-
c97b5e5854
vulkan: Support pad_ext (#15794)
Jeff Bolz
2025-09-07 12:00:49 -05:00