enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

01e8f2138b ggml-vulkan: remove unused find_program(glslc) (#12416) Guus Waals 2025-03-18 00:35:43 +08:00
484a8ab513 vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (#12312) Jeff Bolz 2025-03-17 09:26:18 -05:00
cf2270e4d3 vulkan: subgroup size tuning (#12087) Daniele 2025-03-17 12:42:33 +01:00
f07690c930 vulkan: use fp32 in coopmat2 q4_k dequant function (#12309) Jeff Bolz 2025-03-17 04:43:35 -05:00
891c63956d vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (#12273) Jeff Bolz 2025-03-17 04:41:59 -05:00
2f21123c1d vulkan: Adjust coopmat2 tile sizes and selection heuristic (#12258) Jeff Bolz 2025-03-17 04:35:00 -05:00
374101fd74 cmake : enable building llama.cpp using system libggml (#12321) Christian Kastner 2025-03-17 10:05:23 +01:00
b3c9a65673 SYCL: set extras only on GGML_TYPE_Q4_0 (#12366) Akarshan Biswas 2025-03-17 07:15:12 +05:30
8ba95dca20 llama : fix OLMo-2-0325-32B-Instruct K-norm size (#12400) Sigbjørn Skjæret 2025-03-16 18:46:36 +01:00
dc079cfdff context : fix init of n_outputs (#12397) Georgi Gerganov 2025-03-16 19:29:36 +02:00
7b61bcc87c ci : add --symlinks to xcframework zip command (#12409) Daniel Bevenius 2025-03-16 18:22:05 +01:00
f4c3dd5daa llama-tts : add '-o' option (#12398) marcoStocchi 2025-03-15 17:23:11 +01:00
3d35d87b41 SYCL: Delete redundant plus sign and space (#12391) aubreyli 2025-03-15 22:49:03 +08:00
b19bd064c0 SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (#12399) fairydreaming 2025-03-15 15:19:30 +01:00
92a391327e [CANN]MUL_MAT optimization (#12382) Chenguang Li 2025-03-15 09:31:08 +08:00
9f2250ba72 Add CLI arg to llama-run to adjust the number of threads used (#12370) Eric Curtin 2025-03-14 16:41:20 +00:00
774973b8f3 main : add -sysf / --system-prompt-file (#12249) (#12250) Sigbjørn Skjæret 2025-03-14 16:57:05 +01:00
8fcb563613 Load all MoE experts during warmup (#11571) fairydreaming 2025-03-14 13:47:05 +01:00
add2a3aa5a server: fix "--grammar-file" parameter (#12285) Victor 2025-03-14 11:21:17 +01:00
c522ce4143 graph : simplify attn input build for unified KV cache (#12381) Georgi Gerganov 2025-03-14 10:47:44 +02:00
081bee8c64 hparams : add SWA rope parameters (#12374) Georgi Gerganov 2025-03-14 09:03:24 +02:00
84d5475541 llama : fix Gemma3 SWA KV cache shift (#12373) Georgi Gerganov 2025-03-13 19:08:07 +02:00
be7c303410 arg : no n_predict = -2 for examples except for main and infill (#12364) Xuan-Son Nguyen 2025-03-13 12:34:54 +01:00
e0dbec0bc6 llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) Georgi Gerganov 2025-03-13 12:35:44 +02:00
2048b5913d server : fix crash when using verbose output with input tokens that are not in printable range (#12178) (#12338) Ishaan Gandhi 2025-03-13 06:10:05 -04:00
f08f4b3187 Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support (#12301) Oscar Barenys 2025-03-12 20:06:58 +01:00
80a02aa858 llama.swiftui : fix xcframework dir in README [no ci] (#12353) Daniel Bevenius 2025-03-12 13:45:32 +01:00
363f8c5d67 sycl : variable sg_size support for mmvq kernels (#12336) Alberto Cabrera Pérez 2025-03-12 09:57:32 +00:00
34c961b181 CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315) uvos 2025-03-12 10:14:11 +01:00
7841fc723e llama : Add Gemma 3 support (+ experimental vision capability) (#12343) Xuan-Son Nguyen 2025-03-12 09:30:24 +01:00
bf69cfe62f vulkan: fix bug in coopmat1 mul_mat_id (#12316) Jeff Bolz 2025-03-12 00:59:19 -05:00
10f2e81809 CUDA/HIP: refractor mmqv to unify the calculation of nwarps and rows per block between host and device code. (#12177) uvos 2025-03-11 20:16:03 +01:00
ba7654380a ggml-backend : fix backend search path (#12330) jklincn 2025-03-11 21:25:17 +08:00
6ab2e4765a metal : Cache the Metal library at the device context level (#12265) BB-fat 2025-03-11 19:45:02 +08:00
96e1280839 clip : bring back GPU support (#12322) Xuan-Son Nguyen 2025-03-11 09:20:16 +01:00
2c9f833d17 mat vec double buffer (#12188) Eve 2025-03-10 19:28:11 +00:00
251364549f musa: support new arch mp_31 and update doc (#12296) R0CKSTAR 2025-03-11 01:18:25 +08:00
8acdacb3ea opencl: use OpenCL C standard supported by the device (#12221) Henry Linjamäki 2025-03-10 18:57:00 +02:00
89b2b56e86 readme: added Sidekick to available UIs (#12311) John Bean 2025-03-10 22:13:09 +08:00
e128a1bf5b tests : fix test-quantize-fns to init the CPU backend (#12306) Georgi Gerganov 2025-03-10 14:07:15 +02:00
6ef79a67ca common : refactor '-o' option (#12278) marcoStocchi 2025-03-10 12:34:13 +01:00
4e39a3c332 server: extract <think> tags from qwq outputs (#12297) Olivier Chafik 2025-03-10 10:59:03 +00:00
be421fc429 tool-call: ensure there's always a non-empty tool call id (#12292) Olivier Chafik 2025-03-10 09:45:29 +00:00
87c2630546 allow missing content in message if tool_calls provided (#12293) Olivier Chafik 2025-03-10 09:45:07 +00:00
2b3a25c212 sampler: fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291) Olivier Chafik 2025-03-10 09:44:42 +00:00
8352cdc87b llava : fix bug in minicpm-v code (#11513) tc-mb 2025-03-10 16:33:24 +08:00
1e2f78a004 server : add speculative decoding presets for FIM (#12287) Georgi Gerganov 2025-03-09 19:08:20 +02:00
0fd7ca7a21 authors : update (#12271) Georgi Gerganov 2025-03-08 18:26:00 +02:00
6fefc05a7a ggml-backend : make path_str compatible with C++20 (#12269) Jason C.H 2025-03-09 00:02:39 +08:00
7ab364390f server : infill gen ends on new line (#12254) Georgi Gerganov 2025-03-07 20:54:30 +02:00
7c7f3b7f43 ggml : skip intermediate .air file when compiling .metallib (#12247) Daniel Bevenius 2025-03-07 14:15:27 +01:00
102ac1891d sync : ggml Georgi Gerganov 2025-03-07 14:00:27 +02:00
d6ae2fa061 ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118) vmobilis 2025-03-07 11:11:40 +03:00
68d0027f3d ggml-cpu: faster AVX2 variant for IQ1_M (#12216) Rémy O 2025-03-07 12:54:22 +01:00
ea002810a2 ci : fix save-load test invocations (#12245) Georgi Gerganov 2025-03-07 12:19:31 +02:00
8fad3c7a7c server : Log original chat template parsing error (#12233) Sigbjørn Skjæret 2025-03-07 11:15:33 +01:00
7cf64f6bee sync: minja - support QwQ-32B (#12235) Olivier Chafik 2025-03-07 09:33:37 +00:00
5e2d57b2b2 metal : simplify kernel arguments using a struct (#3229) (#12194) BB-fat 2025-03-07 15:35:57 +08:00
f1648e91cf HIP: fix rocWMMA build flags under Windows (#12230) David Huang 2025-03-07 15:06:08 +08:00
d6c95b0740 metal : fix default.metallib build (#12224) Daniel Bevenius 2025-03-07 06:23:16 +01:00
d76a86d967 opencl: Noncontiguous norm, rms_norm, disable fp16 for some ops (#12217) lhez 2025-03-06 16:20:35 -08:00
776f9e59cc cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (#12094) xiaofei 2025-03-07 06:58:25 +08:00
3d652bfddf readme : update bindings (#12229) Lucas Moura Belo 2025-03-06 16:15:13 -03:00
5220a16d18 CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (#12222) Johannes Gäßler 2025-03-06 18:45:09 +01:00
3ffbbd5ce1 HIP: rocWMMA documentation and enabling in workflow builds (#12179) David Huang 2025-03-06 21:14:11 +08:00
42994048a3 update function-calling.md w/ template override for functionary-small-v3.2 (#12214) Olivier Chafik 2025-03-06 09:03:31 +00:00
e9b2f84f14 llava: add big-endian conversion for image encoder (#12218) Aaron Teo 2025-03-06 16:33:21 +08:00
e721c05c93 HIP/CUDA: set the paramerter value in maintain_cuda_graph instead of replaceing it. (#12209) uvos 2025-03-06 08:20:52 +01:00
57b6abf85a android : fix KV cache log message condition (#12212) Han Yin 2025-03-05 22:22:49 -08:00
94bb63e4f0 opencl : fix buffer alignment (#12197) Henry Linjamäki 2025-03-06 03:33:40 +02:00
f79243992c opencl : fix ulong kernel args were set from int variables (#12174) Henry Linjamäki 2025-03-06 03:31:14 +02:00
ed4ce0dda2 opencl : fix profile-related errors (#12095) simon886212 2025-03-06 09:30:05 +08:00
07d1572347 ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (#12154) Rémy O 2025-03-06 02:26:10 +01:00
5e43f104cc SYCL: Disable f16 Unary OPs as not supported by the kernels (#12201) Akarshan Biswas 2025-03-05 21:28:23 +05:30
16e4b22c5e ggml : fix GGMLMetalClass ODR (#12200) Plamen Minev 2025-03-05 17:16:01 +02:00
074c4fd39d ci : add fetch-depth to xcframework upload (#12195) Daniel Bevenius 2025-03-05 14:16:40 +01:00
669912d9a5 tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) Olivier Chafik 2025-03-05 13:05:13 +00:00
fa31c438e0 ci : fix xcframework artifact tag (#12191) Daniel Bevenius 2025-03-05 10:22:29 +01:00
3ccbfe5a71 ci : remove xframework upload (#12190) Daniel Bevenius 2025-03-05 08:34:02 +01:00
06a92a193a server : fix cache reuse logic (#12161) Clauszy 2025-03-05 15:25:45 +08:00
a057897ad4 llama : add xcframework build script (#11996) Daniel Bevenius 2025-03-05 06:30:31 +01:00
5bbe6a9fe9 ggml : portability fixes for VS 2017 (#12150) mgroeber9110 2025-03-04 17:53:26 +01:00
20a9b8f5e1 readme : fix roadmap link (#12185) Georgi Gerganov 2025-03-04 18:42:44 +02:00
56d7a9f812 main: allow preloading conversation with -p and add -st / --single-turn (#12145) Sigbjørn Skjæret 2025-03-04 17:19:39 +01:00
1a24c4621f server: fix deadly typo in response_format.json_schema.schema handling (#12168) Olivier Chafik 2025-03-04 06:24:07 +00:00
becade5de7 HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (#12032) David Huang 2025-03-04 05:10:54 +08:00
dfd6b2c0be sync : ggml Georgi Gerganov 2025-03-03 17:57:38 +02:00
b64d7cc272 cuda: unary ops as float + de-duplicate (ggml/1130) cmdr2 2025-03-03 20:51:31 +05:30
3d1cf3cf33 sync : ggml Georgi Gerganov 2025-02-28 12:37:35 +02:00
0cbee131ad cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129) cmdr2 2025-02-28 12:36:46 +02:00
8371d44595 sync : ggml Georgi Gerganov 2025-02-28 09:09:58 +02:00
87abb7e903 cuda/cpu: Increase support for fp16 unary operations (ggml/1125) cmdr2 2025-02-28 12:34:39 +05:30
6d4c23b81b whisper : support GGML_BACKEND_DL (whisper/2843) Diego Devesa 2025-02-27 13:35:07 +01:00
6512a90037 cmake : fix compile assumptions for power9/etc (whisper/2777) midnight 2025-02-05 04:41:10 -08:00
4512055792 Told cmake to install ggml-cpp.h as a public header file. (ggml/1126) petterreinholdtsen 2025-02-26 21:44:00 +01:00
f54a4ba11e Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121) cmdr2 2025-02-25 18:06:34 +05:30
aede2074f6 scripts : sync-ggml-am.sh fix Georgi Gerganov 2025-02-28 09:09:38 +02:00
2679c3b55d ci : set GITHUB_ACTION env var for server tests (#12162) Daniel Bevenius 2025-03-03 16:17:36 +01:00
c43af9276b tts: add speaker file support (#12048) dm4 2025-03-03 21:09:29 +08:00
d5c63cd7f9 test-backend-ops : add option -p to filter by op params (#12155) Diego Devesa 2025-03-03 14:00:46 +01:00

Commit Graph (branches: main, b7003-full)