enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

484d2f31ae bug-fix: snprintf prints NULL in place of the last character (#10419) kallewoof 2024-12-11 22:48:04 +09:00
4b4d92b098 docs: fix server documentation formatting (#10776) CentricStorm 2024-12-11 10:47:43 +00:00
43041d2eb3 ggml: load all backends from a user-provided search path (#10699) Gilad S. 2024-12-11 02:47:21 +02:00
b685daf386 vulkan: request round-to-even for fp16 in im2col/rope_head (#10767) Jeff Bolz 2024-12-10 14:23:17 -06:00
dafae66cc2 vulkan: dynamic subgroup size for the remaining k quants (#10745) Eve 2024-12-10 19:33:23 +00:00
ae4b922614 imatrix : Add imatrix to --no-context-shift (#10766) Bartowski 2024-12-10 12:23:50 -05:00
750cb3e246 CUDA: rename macros to avoid conflicts with WinAPI (#10736) Andreas Kieslinger 2024-12-10 18:23:24 +01:00
a86ad841f1 server : add flag to disable the web-ui (#10762) (#10751) Yüg 2024-12-10 17:22:34 +00:00
a05e2afcc2 vulkan: disable spirv-opt for coopmat shaders (#10763) Jeff Bolz 2024-12-10 11:22:20 -06:00
26a8406ba9 CUDA: fix shared memory access condition for mmv (#10740) Johannes Gäßler 2024-12-09 20:07:12 +01:00
c37fb4cf62 Changes to CMakePresets.json to add ninja clang target on windows (#10668) Srihari-mcw 2024-12-09 23:10:19 +05:30
3d98b4cb22 vulkan: fix compile warnings (#10731) Jeff Bolz 2024-12-09 01:24:01 -06:00
1a05004743 cmake : simplify msvc charsets (#10672) Borislav Stanimirov 2024-12-09 09:15:13 +02:00
ce8784bdb1 server : fix format_infill (#10724) Xuan Son Nguyen 2024-12-08 23:04:29 +01:00
e52522b869 server : bring back info of final chunk in stream mode (#10722) Xuan Son Nguyen 2024-12-08 20:38:51 +01:00
06d70147e6 Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (#10723) stduhpf 2024-12-08 19:19:19 +01:00
43ed389a3f llama : use cmake for swift build (#10525) Diego Devesa 2024-12-08 12:14:54 +01:00
ecc93d0558 vulkan: compile a test shader in cmake to check for coopmat2 support (#10713) Jeff Bolz 2024-12-08 02:05:55 -06:00
62e84d9848 llama : add 128k yarn context for Qwen (#10698) Robert Collins 2024-12-07 16:12:27 -05:00
3573fa8e7b server : (refactor) no more json in server_task input (#10691) Xuan Son Nguyen 2024-12-07 20:21:09 +01:00
d9c3ba2b77 ggml : disable iq4_nl interleave size 8 (#10709) Georgi Gerganov 2024-12-07 18:38:15 +02:00
ce4a7b8493 server : various fixes (#10704) Georgi Gerganov 2024-12-07 18:02:05 +02:00
19d8762ab6 ggml : refactor online repacking (#10446) Djip007 2024-12-07 13:37:50 +01:00
c2a16c0bdb server : fix free of spec context and batch (#10651) Georgi Gerganov 2024-12-07 11:52:44 +02:00
3df784b305 Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processing (#10597) 0cc4m 2024-12-07 10:24:15 +01:00
86a1934978 metal : Extend how Llama.cpp locates metal resources (#10676) Robert Ormandi 2024-12-07 01:55:01 -06:00
784a14aa49 convert : add support for Roberta embeddings (#10695) Sukriti Sharma 2024-12-07 00:02:14 -07:00
c5ede3849f convert : add custom attention mapping Georgi Gerganov 2024-12-06 21:33:15 +02:00
f162d45a21 common : bring back --no-warmup to server (#10686) Xuan Son Nguyen 2024-12-06 13:29:05 +01:00
6c5bc0625f server : (refactoring) do not rely on JSON internally (#10643) Xuan Son Nguyen 2024-12-06 11:14:32 +01:00
7736837d62 fix(server) : not show alert when DONE is received (#10674) Plamen Minev 2024-12-05 23:36:41 +02:00
c9c6e01dae vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention (#10206) Jeff Bolz 2024-12-05 13:15:05 -06:00
6fe6247831 llama : add Minerva 7B model support (#10673) Riccardo Orlando 2024-12-05 19:30:59 +01:00
0cd182ebcc sync : ggml Georgi Gerganov 2024-12-05 13:27:42 +02:00
a8cbab201d ggml: add GGML_SET Metal kernel + i32 CPU kernel (ggml/1037) PAB 2024-12-04 09:19:30 +01:00
c2082d93a8 ggml : add GGML_PAD_REFLECT_1D operation (ggml/1034) PAB 2024-12-03 20:20:04 +01:00
d405804be8 py : update outdated copy-paste instructions [no ci] (#10667) Daniel Bevenius 2024-12-05 08:47:55 +01:00
f112d198cd Update deprecation-warning.cpp (#10619) aryantandon01 2024-12-05 03:49:20 +05:30
1da7b76569 server : fix speculative decoding with context shift (#10641) Georgi Gerganov 2024-12-04 22:38:20 +02:00
59f4db1088 ggml : add predefined list of CPU backend variants to build (#10626) Diego Devesa 2024-12-04 14:45:40 +01:00
2803540814 ggml-cpu : fix HWCAP2_I8MM value (#10646) Diego Devesa 2024-12-04 14:40:44 +01:00
253b7fde91 Fix HF repo commit to clone lora test models (#10649) ltoniazzi 2024-12-04 09:45:48 +00:00
8d0cfd554a llama: Support MiniCPM-1B (with & w/o longrope) (#10559) JFLFY2255 2024-12-04 17:42:50 +08:00
2759916d86 vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (#10642) Jeff Bolz 2024-12-04 01:28:59 -06:00
40c6d79fb5 SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (#10584) Nicolò Scipione 2024-12-04 02:29:20 +01:00
98036d5670 fix typo of README.md (#10605) Wang Ran (汪然) 2024-12-04 09:22:50 +08:00
cd2f37b304 Avoid using __fp16 on ARM with old nvcc (#10616) Frankie Robertson 2024-12-04 02:41:37 +02:00
da6aac91f1 Add docs for creating a static build (#10268) (#10630) Benson Wong 2024-12-03 16:40:36 -08:00
01e6d9bb71 clip : add sycl support (#10574) piDack 2024-12-04 08:26:37 +08:00
cc98896db8 vulkan: optimize and reenable split_k (#10637) Jeff Bolz 2024-12-03 13:29:54 -06:00
91c36c269b server : (web ui) Various improvements, now use vite as bundler (#10599) Xuan Son Nguyen 2024-12-03 19:38:44 +01:00
1cd3df46bd scripts : remove amx sync Georgi Gerganov 2024-12-03 19:42:30 +02:00
c505471857 sync : ggml Georgi Gerganov 2024-12-03 19:40:25 +02:00
e9e661bd59 CUDA: remove unnecessary warp reduce in FA (ggml/1032) mahorozte 2024-12-03 21:11:43 +08:00
efb6ae9630 feat: add GGML_UNARY_OP_ARGMAX Metal kernel (ggml/1019) PAB 2024-12-02 19:27:24 +01:00
667d70d170 metal : add GGML_OP_CONV_TRANSPOSE_1D kernels (ggml/1026) PAB 2024-11-28 09:25:06 +01:00
3b4f2e33e2 llama : add missing LLAMA_API for llama_chat_builtin_templates (#10636) Xuan Son Nguyen 2024-12-03 12:54:30 +01:00
82bca2257b readme : add option, update default value, fix formatting (#10271) Nikolaos Pothitos 2024-12-03 12:50:08 +02:00
0115df2f65 metal : small-batch mat-mul kernels (#10581) Georgi Gerganov 2024-12-03 11:52:33 +02:00
515d4e5372 github : minify link [no ci] (revert) Georgi Gerganov 2024-12-03 11:21:43 +02:00
844e2e1fee github : minify link [no ci] Georgi Gerganov 2024-12-03 11:20:35 +02:00
70b98fadbc server : fix default draft model parameters (#10586) Georgi Gerganov 2024-12-03 11:20:00 +02:00
642330ac7c llama : add enum for built-in chat templates (#10623) Xuan Son Nguyen 2024-12-02 22:10:19 +01:00
8648c52101 make : deprecate (#10514) Georgi Gerganov 2024-12-02 21:22:53 +02:00
64ed2091b2 server: Add "tokens per second" information in the backend (#10548) haopeng 2024-12-02 21:45:54 +08:00
991f8aabee SYCL: Fix and switch to GGML_LOG system instead of fprintf (#10579) Akarshan Biswas 2024-12-02 12:34:11 +05:30
4cb003dd8d contrib : refresh (#10593) Georgi Gerganov 2024-12-02 08:53:27 +02:00
917786f43d Add mistral-v1, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types (#10572) Juk Armstrong 2024-12-01 22:09:49 +00:00
5e1ed95583 grammars : add English-only grammar (#10612) Georgi Gerganov 2024-12-01 21:37:54 +02:00
5c7a5aa0c3 ci: add error handling for Python venv creation in run.sh (#10608) Wang Qin 2024-12-01 10:11:42 -08:00
3420909dff ggml : automatic selection of best CPU backend (#10606) Diego Devesa 2024-12-01 16:12:41 +01:00
86dc11c5bc server : bind to any port when specified (#10590) alek3y 2024-12-01 12:33:12 +01:00
6acce39710 readme : update the usage section with examples (#10596) Georgi Gerganov 2024-12-01 11:25:17 +02:00
43957ef203 build: update Makefile comments for C++ version change (#10598) Wang Qin 2024-11-30 19:19:44 -08:00
0c39f44d70 ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() (#10567) Adrien Gallouët 2024-11-30 18:13:18 +01:00
3e0ba0e604 readme : remove old badge Georgi Gerganov 2024-11-30 10:09:21 +02:00
abadba05be readme : refresh (#10587) Georgi Gerganov 2024-11-30 09:47:07 +02:00
0533e7fb38 vulkan: Dynamic subgroup size support for Q6_K mat_vec (#10536) Eve 2024-11-30 07:00:02 +00:00
7cc2d2c889 ggml : move AMX to the CPU backend (#10570) Diego Devesa 2024-11-29 21:54:58 +01:00
b782e5c7d4 server : add more test cases (#10569) Xuan Son Nguyen 2024-11-29 21:48:56 +01:00
3a8e9af402 imatrix : support combine-only (#10492) Robert Collins 2024-11-29 12:21:37 -05:00
a3a3048e7a cleanup UI link list (#10577) Diego Devesa 2024-11-29 17:45:08 +01:00
f0678c5ff4 ggml : fix I8MM Q4_1 scaling factor conversion (#10562) Georgi Gerganov 2024-11-29 16:25:39 +02:00
4b3242bbea ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (#10580) Shupei Fan 2024-11-29 21:49:02 +08:00
0f77aae560 sycl : offload of get_rows set to 0 (#10432) Alberto Cabrera Pérez 2024-11-29 12:38:45 +00:00
266b8519ee sycl : Reroute permuted mul_mats through oneMKL (#10408) Alberto Cabrera Pérez 2024-11-29 09:49:43 +00:00
938f608742 CANN: RoPE operator optimization (#10563) Chenguang Li 2024-11-29 14:46:55 +08:00
f095a649ec vulkan: get the first command buffer submitted sooner (#10499) Jeff Bolz 2024-11-29 00:18:02 -06:00
678d7994f4 llava: return false instead of exit (#10546) Ting Lou 2024-11-29 08:09:46 +08:00
dc22344088 ggml : remove redundant copyright notice + update authors Georgi Gerganov 2024-11-28 20:46:40 +02:00
4c0a95b107 llama : add missing model types Georgi Gerganov 2024-11-28 20:45:07 +02:00
6c59567689 server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568) Xuan Son Nguyen 2024-11-28 19:17:49 +01:00
890719311b common: fix warning message when no GPU found (#10564) Johannes Gäßler 2024-11-28 18:15:25 +01:00
7281cf13ad docs: fix outdated usage of llama-simple (#10565) Random Fly 2024-11-28 23:03:11 +08:00
e90688edd0 ci : fix tag name in cuda and hip releases (#10566) Diego Devesa 2024-11-28 15:58:54 +01:00
76b27d29c2 ggml : fix row condition for i8mm kernels (#10561) Georgi Gerganov 2024-11-28 14:56:37 +02:00
eea986f215 cmake : fix ARM feature detection (#10543) Georgi Gerganov 2024-11-28 14:56:23 +02:00
c202cef168 ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541) Shupei Fan 2024-11-28 20:52:03 +08:00
2025fa67e9 kompute : improve backend to pass test_backend_ops (#10542) Sergio López 2024-11-28 12:51:38 +01:00
c6bc73951e CANN: Update cann.md to display correctly in CLion (#10538) Ruixin Huang 2024-11-28 15:27:11 +08:00
