enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

267e99867f vulkan: Use larger loads in scalar/coopmat1 matmul (#15729) Jeff Bolz 2025-09-07 11:53:07 -05:00
3b15924d71 ggml WebGPU: remove userdata from request adapter callback (#15527) Daniel Bevenius 2025-09-07 10:19:45 +02:00
79bc429262 CUDA: faster tile FA (Pascal/AMD), headsize 256 (#15769) Johannes Gäßler 2025-09-07 00:26:28 +02:00
c4df49a42d kleidiai: generalize compute_forward_kv_cache to compute_forward_fp16 (#15817) Charles Xu 2025-09-06 16:08:43 +02:00
3c3635d2f2 server : speed up tests (#15836) Xuan-Son Nguyen 2025-09-06 19:45:24 +07:00
61bdfd5298 server : implement prompt processing progress report in stream mode (#15827) Xuan-Son Nguyen 2025-09-06 18:35:04 +07:00
01806e7771 ggml-cpu: document use of "free" memory [no ci] (#15834) Johannes Gäßler 2025-09-06 13:28:44 +02:00
186415d595 ggml-cpu: drop support for nnpa intrinsics (#15821) Aaron Teo 2025-09-06 11:27:28 +08:00
fd621880f3 aLoRA Support (#15327) Gabe Goodhart 2025-09-05 17:32:39 -06:00
4281c7b315 ci : exempt correct research label (#15825) Sigbjørn Skjæret 2025-09-06 01:21:15 +02:00
5fac79cbc7 Thinking model disabled assistant prefill (#15404) Gabe Goodhart 2025-09-05 14:31:24 -06:00
408ff524b4 Implement --log-colors with always/never/auto (#15792) Eric Curtin 2025-09-05 19:43:59 +01:00
5143fa895e CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (#15802) Johannes Gäßler 2025-09-05 16:07:02 +02:00
3a550b5ca4 tests : add --list-ops and --show-coverage options (#15745) Daniel Bevenius 2025-09-05 14:49:21 +02:00
a81283820a gguf: gguf_writer refactor (#15691) Erik Scholz 2025-09-05 11:34:28 +02:00
c610b6c11b kv-cache : fix SWA checks + disable cacheless iSWA (#15811) Georgi Gerganov 2025-09-05 10:39:22 +03:00
5d6688de08 model-conversion : add --embeddings flag to modelcard.template [no ci] (#15801) Daniel Bevenius 2025-09-05 04:36:23 +02:00
4fd1242bef chat : fixed crash when Hermes 2 <tool_call> had a newline before it (#15639) ExtReMLapin 2025-09-05 01:24:08 +02:00
b2426e469e chat : nemotron thinking & toolcalling support (#15676) Piotr Wilkin (ilintar) 2025-09-05 01:22:22 +02:00
9e2b1e83c6 scripts : add Jinja tester PySide6 simple app (#15756) Piotr Wilkin (ilintar) 2025-09-05 01:05:12 +02:00
fb15d649ed llama : add support for EmbeddingGemma 300m (#15798) Daniel Bevenius 2025-09-04 18:10:29 +02:00
856ed0947f metal : Add template specialization for mul_mm_id w/ ne20 == 10 (#15799) Gabe Goodhart 2025-09-04 09:53:22 -06:00
d1e2adba65 llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (#15791) Daniel Bevenius 2025-09-04 15:40:44 +02:00
c1c354e44c CANN: Refactor ND to NZ workspace to be per-device (#15763) Chenguang Li 2025-09-04 20:20:14 +08:00
a68d914426 server: add exceed_context_size_error type (#15780) Xuan-Son Nguyen 2025-09-04 11:50:23 +02:00
badb80cadb Document the new max GPU layers default in help (#15771) Eric Curtin 2025-09-04 10:49:44 +01:00
0a1b3982cd ggml: add ops for WAN video model (cuda && cpu) (#15669) leejet 2025-09-04 16:38:49 +08:00
5421f63ab0 CANN: Fix precision issue on 310I DUO multi-devices (#15784) hipudding 2025-09-04 15:12:30 +08:00
820bc98531 opencl: add hs=40 to FA (#15758) rmatif 2025-09-04 08:30:28 +02:00
239b60e898 CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (#15760) Chenguang Li 2025-09-04 11:03:02 +08:00
dff7551bfd vulkan: fix mmv subgroup16 selection (#15775) Ruben Ortlam 2025-09-03 22:55:10 +02:00
0fce7a1248 vulkan: don't use std::string in load_shaders, to improve compile time (#15724) Jeff Bolz 2025-09-03 13:33:15 -05:00
8227695d7a vulkan : update ggml_vk_instance_validation_ext_available (#15666) Daniel Bevenius 2025-09-03 20:24:50 +02:00
0014fb4add ggml vulkan: add hardsigmoid and hardswish operations (#15762) Shin-myoung-serp 2025-09-04 03:22:55 +09:00
661ae31c9c CUDA: Optimize rms_norm_f32 kernel and its fused variants, giving 1-6% perf E2E (#15715) Oliver Simons 2025-09-03 19:59:16 +02:00
407c23786d model-conversion : fix pyright errors (#15770) Daniel Bevenius 2025-09-03 18:28:36 +02:00
cdedb70a99 sampling : optimize dist sampler (#15704) Georgi Gerganov 2025-09-03 18:16:26 +03:00
2c8dac72eb llama : fix incorrect model type for Gemma 270M (#15764) Daniel Bevenius 2025-09-03 13:35:49 +02:00
40a751ea9a model-conversion : remove hardcoded /bin/bash shebangs [no ci] (#15765) Daniel Bevenius 2025-09-03 12:50:47 +02:00
5eae934883 CANN: Add RoPE contiguous check for 310I DUO device (#15735) hipudding 2025-09-03 16:46:01 +08:00
05c0380f2a ggml-cpu : optimize RVV kernels (#15720) xctan 2025-09-03 16:16:21 +08:00
8c3fdf44ec model-conversion : add missing curl script [no ci] (#15761) Daniel Bevenius 2025-09-03 09:48:35 +02:00
f6da8cb86a CANN: Mask unsupported TRANSPOSE_1D operator (#15733) hipudding 2025-09-03 14:08:22 +08:00
8a2234ea0c CANN: Fix type float_t to float (#15736) Chenguang Li 2025-09-03 10:43:53 +08:00
3de008208b fix: resolve unsigned int initialization warning for n_dims/size in gguf.cpp (#15754) SnA1lGo 2025-09-03 03:27:30 +08:00
69db8a52e6 chore: Update .clang-format to use BinPackArguments=true (#15744) Oliver Simons 2025-09-02 19:40:37 +02:00
c466abe158 llama: -fa 1/0/-1 aliases for -fa on/off/auto (#15746) Johannes Gäßler 2025-09-02 18:17:26 +02:00
0a2a3841e8 vulkan: fix shaders gen when no integer dot is available (#15740) Ruben Ortlam 2025-09-02 16:02:26 +02:00
9961d244f2 CANN: Resolve soft_max precision issue (#15730) hipudding 2025-09-02 17:12:37 +08:00
25f1045f07 vulkan: Fix macro parameter order for f32 matmul shaders (#15716) Jeff Bolz 2025-09-02 01:37:01 -05:00
97669e4073 opencl: add attn sinks support for FA kernels (#15706) rmatif 2025-09-02 08:26:53 +02:00
2f853687b3 CANN: Support eager execution mode under ACL graph compilation (#15712) Chenguang Li 2025-09-02 14:07:48 +08:00
ef2af57ddf CANN: Support ext_factor in rope (#15710) hipudding 2025-09-02 14:05:23 +08:00
5d804a4938 ggml-backend: raise GGML_MAX_SPLIT_INPUTS (#15722) Johannes Gäßler 2025-09-02 01:14:55 +02:00
d4d8dbe383 vulkan: use memory budget extension to read memory usage (#15545) Gilad S. 2025-09-01 22:17:42 +03:00
35a42edac8 vulkan: add missing clamps in new mul_mat_id paths (#15702) Jeff Bolz 2025-09-01 14:01:10 -05:00
fec7911f8f vulkan: disable large mmv subgroups on older Nvidia GPUs (#15717) Ruben Ortlam 2025-09-01 20:58:35 +02:00
078ce23ea7 ggml: SVE support for exponential functions (#15145) s-goto-11 2025-09-02 03:13:49 +09:00
a0c2b207c5 ggml: aarch64: Implement SVE F16 kernels for vector functions (#15115) Prashant Vithule 2025-09-01 23:43:16 +05:30
4b20d8b7e3 convert : remove redundant code (#15708) Jie Fu (傅杰) 2025-09-01 23:53:31 +08:00
02c1813517 Vulkan: Add Integer Dot Product mul_mat_vec shader for legacy quants (#14903) Ruben Ortlam 2025-09-01 16:19:07 +02:00
77dee9de97 ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops (#15695) Daniel Bevenius 2025-09-01 14:28:49 +02:00
4795c91c32 docs : add Hunyuan to models section (#15707) Jie Fu (傅杰) 2025-09-01 15:34:59 +08:00
b66df9d9c9 CUDA: fix build error from ambiguous __half conversions in conv2d (#15690) Akarshan Biswas 2025-09-01 06:55:06 +05:30
b9382c3877 CANN: Optimize MUL_MAT_ID (#15658) hipudding 2025-09-01 08:57:23 +08:00
3dc7397a27 CANN: fix RoPE cache issue on multi-device (#15629) hipudding 2025-09-01 08:57:00 +08:00
e92d53b29e sampling : optimize samplers by reusing bucket sort (#15665) Georgi Gerganov 2025-08-31 20:41:02 +03:00
0d161f021a server : enable /slots by default and make it secure (#15630) Georgi Gerganov 2025-08-31 20:11:58 +03:00
4efd5a8316 metal : fix checks for available FA kernels (#15700) Georgi Gerganov 2025-08-31 19:43:30 +03:00
274966226f llama : fix fattn reserve call n_seqs parameter (#15699) Diego Devesa 2025-08-31 08:47:05 -07:00
9777032dcc llama : separate compute buffer reserve from fattn check (#15696) Diego Devesa 2025-08-31 06:49:03 -07:00
7d3c9f2b21 ci : explicitly set fa off or on (#15692) Sigbjørn Skjæret 2025-08-31 15:30:20 +02:00
bbbf5ecccb vulkan: handle large sizes for get_rows (#15686) Jeff Bolz 2025-08-31 03:13:27 -05:00
c37052ab4d vulkan: mul_mat_id coopmat2 optimizations (#15546) Jeff Bolz 2025-08-31 02:06:43 -05:00
5c16b9c87d vulkan : remove unused portability_enumeration_ext variable (#15679) Daniel Bevenius 2025-08-31 08:46:42 +02:00
b97c9edc59 vulkan: Allow fallback to sysmem memory when vidmem is full (#15649) Jeff Bolz 2025-08-31 01:30:54 -05:00
94e82c7ead vulkan: clamp matmul and FA results to the max finite value (#15652) Jeff Bolz 2025-08-31 01:27:57 -05:00
4d74393bcc ggml: update kleidiai to v1.13.0 (#15663) Charles Xu 2025-08-30 18:03:42 +02:00
dd892555b0 Update build.md to remove MSVC arm64 notes (#15684) Diego Devesa 2025-08-30 08:51:28 -07:00
e81b8e4b7f llama: use FA + max. GPU layers by default (#15434) Johannes Gäßler 2025-08-30 16:32:10 +02:00
38ad381f9f CUDA: use FP32 arithmetic for conv2d (#15683) Johannes Gäßler 2025-08-30 16:20:32 +02:00
696fccf354 vulkan: Skip syncing for prealloc_y when it is reused (#15544) Jeff Bolz 2025-08-30 04:11:22 -05:00
ef476916bb CANN: Fix compiler warnings (#15661) Chenguang Li 2025-08-30 10:18:35 +08:00
d82f6aa34a server : removed obsolete doc (#15670) Sergey Alirzaev 2025-08-30 00:12:53 +02:00
3d16b29c3b scripts: strip "AMD Instinct" from GPU name (#15668) Johannes Gäßler 2025-08-29 22:04:08 +02:00
792b44f2ed server : add documentation for parallel_tool_calls param (#15647) ExtReMLapin 2025-08-29 19:25:40 +02:00
81017865ee CUDA: fix bug in rms_norm fusion (#15660) Aman Gupta 2025-08-29 21:30:06 +08:00
60e5eee31f chat : Seed OSS thinking + tool call support (#15552) Piotr Wilkin (ilintar) 2025-08-29 14:53:41 +02:00
009b709d6e CUDA: fuse adds, fuse add with rms norm (#15631) Aman Gupta 2025-08-29 11:35:58 +08:00
e8d99dd0b6 nvidia nemotron nano v2 (nemotronh) (#15507) Gabe Goodhart 2025-08-28 18:39:31 -06:00
a8bca68f72 fix: Compute the full sum in llama-eval-callback, not just the sum of printed values (#15637) Gabe Goodhart 2025-08-28 15:27:36 -05:00
c97dc09391 CUDA: add conv2d (#15635) mnehete32 2025-08-29 00:03:03 +05:30
6c442f42ff ggml-cpu: fix invalid hsum build in debug s390x (#15634) Aaron Teo 2025-08-28 22:39:27 +08:00
73804145ab ggml : fix SSM_SCAN for n_groups > 1 (#15625) compilade 2025-08-28 10:11:36 -04:00
c8d0d14e77 kv-cache : fix find_slot to not search for continuous slot (#15638) Georgi Gerganov 2025-08-28 17:09:05 +03:00
84ab83cc0b model : jina-embeddings-v3 support (#13693) Sigbjørn Skjæret 2025-08-28 15:49:50 +02:00
55042b3692 scripts: add sqlite3 check for compare-commits.sh (#15633) Aman Gupta 2025-08-28 19:23:22 +08:00
8a4280ce43 kv-cache : remove LLAMA_SET_ROWS checks (#15505) Georgi Gerganov 2025-08-28 12:27:02 +03:00
64387f6e95 gguf-py: byteswapping improvements (#12851) Aleksei Nikiforov 2025-08-28 10:56:41 +02:00
d35a1e8c41 cli : change log to warning to explain reason for stopping (#15604) Joshua Cogliati 2025-08-28 01:48:20 -06:00
