Commit Graph

  • e4376270d9 llama.cpp: fix warning message (#11839) Oleksandr Kuvshynov 2025-02-13 01:25:34 -05:00
  • 3e69319772 llama : update llama_decode_internal ref [no ci] (#11840) Daniel Bevenius 2025-02-13 07:07:51 +01:00
  • a394039db0 ggml-cpu : add chunking support to mul_mat_id (#11666) Diego Devesa 2025-02-13 01:02:38 +01:00
  • be3bbd6215 ggml : x2 speed for WASM by optimizing SIMD (#11453) Xuan-Son Nguyen 2025-02-13 00:33:45 +01:00
  • 31afcbee0e server : (webui) Give copy button back to all message bubbles (#11814) Woof Dog 2025-02-12 22:47:11 +00:00
  • 5c4284d57b HIP: Remove GCN from list of devices that avoid MMQ (#11831) uvos 2025-02-12 22:25:28 +01:00
  • bfd11a2344 Fix: Compile failure due to Microsoft STL breaking change (#11836) JC 2025-02-12 20:36:11 +00:00
  • 0fb77f821f sync : ggml Georgi Gerganov 2025-02-12 21:46:02 +02:00
  • e598697d63 HIP: Switch to std::vector in rocblas version check (#11820) uvos 2025-02-12 17:25:03 +01:00
  • fef0cbeadf cleanup: fix compile warnings associated with gnu_printf (#11811) bandoti 2025-02-12 10:06:53 -04:00
  • 748ee9fe93 ggml : fix multi-threaded clamp_f32 (#11824) Richard 2025-02-12 13:57:33 +00:00
  • 198b1ec611 ggml-cpu: Fix duplicate MATMUL_INT8 (#11817) Weizhao Ouyang 2025-02-12 20:22:58 +08:00
  • c3d6af7cd2 CUDA: fix CUDART_VERSION checks (#11821) Johannes Gäßler 2025-02-12 13:16:39 +01:00
  • 369be5598a llama : fix typo in llama-grammar.h [no ci] (#11816) Daniel Bevenius 2025-02-12 08:40:01 +01:00
  • 4078c77f98 docs: add OpenCL (#11697) lhez 2025-02-11 14:04:13 -08:00
  • 90e4dba461 Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx (#11803) Sheldon Robinson 2025-02-11 10:55:45 -05:00
  • a18f481f99 server : use common_token_to_piece instead of common_detokenize (#11740) Daniel Bevenius 2025-02-11 14:06:45 +01:00
  • b9ab0a4d0b CUDA: use arch list for compatibility check (#11775) Johannes Gäßler 2025-02-11 00:17:22 +01:00
  • 7b891bdc86 fix: typos in documentation files (#11791) Maxim Evtush 2025-02-10 23:21:31 +01:00
  • 81732619fd docs: utilize the forward slash (/) as the path separator for Unix-like systems (#11770) jason_w 2025-02-11 06:17:48 +08:00
  • 507f9174fe server : (webui) introduce conversation branching + idb storage (#11792) Xuan-Son Nguyen 2025-02-10 21:23:17 +01:00
  • 19b392d58d llama-mmap: fix missing include (#11796) Wilken Gottwalt 2025-02-10 19:58:18 +01:00
  • 0893e0114e server : correct signal handler (#11795) Xuan-Son Nguyen 2025-02-10 18:03:28 +01:00
  • d7b31a9d84 sync: minja (a72057e519) (#11774) Olivier Chafik 2025-02-10 09:34:09 +00:00
  • 9ac3457b39 Update README.md [no ci] (#11781) pascal-lc 2025-02-10 16:05:57 +08:00
  • c2a67efe38 vulkan: Make Vulkan optional at runtime (#11493). (#11494) Danny Milosavljevic 2025-02-10 07:17:21 +01:00
  • b044a0fe3c vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (#11592) Wagner Bruna 2025-02-10 03:08:22 -03:00
  • 19d3c8293b There's a better way of clearing lines (#11756) Eric Curtin 2025-02-09 10:34:49 +00:00
  • 98f6b0fd1e vulkan: account for lookup tables when checking shared memory size (#11502) Jeff Bolz 2025-02-09 01:43:51 -06:00
  • 55ac8c7791 server : (webui) revamp Settings dialog, add Pyodide interpreter (#11759) Xuan-Son Nguyen 2025-02-08 21:54:50 +01:00
  • e6e6583199 server : (webui) increase edit textarea size (#11763) Woof Dog 2025-02-08 19:09:55 +00:00
  • aaa5505307 server : minor log updates (#11760) Georgi Gerganov 2025-02-08 18:08:43 +02:00
  • bdcf8b6a56 cont : fix mmap flag print (#11699) Georgi Gerganov 2025-02-08 16:49:38 +02:00
  • 4d3465c5ae ggml: Fix data race in ggml threadpool (#11736) Karol Kontny 2025-02-08 15:30:53 +01:00
  • d80be897ac CUDA: fix min. version for movmatrix (#11751) Johannes Gäßler 2025-02-08 10:46:07 +01:00
  • 3ab410f55f readme : update front-end framework (#11753) Nikolaos Pothitos 2025-02-08 11:43:04 +02:00
  • 0cf867160c server : (webui) fix numeric settings being saved as string (#11739) Xuan-Son Nguyen 2025-02-08 10:42:34 +01:00
  • d2fe216fb2 Make logging more verbose (#11714) Eric Curtin 2025-02-07 14:42:46 +00:00
  • ed926d8833 llama : fix defrag logic (#11707) Georgi Gerganov 2025-02-07 16:05:34 +02:00
  • 2d219b389e vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729) Christian Fillion 2025-02-07 08:55:47 -05:00
  • 333820d749 llama : fix progress dots (#11730) magicse 2025-02-07 15:48:47 +02:00
  • c026ba3c23 vulkan: print shared memory size (#11719) Jeff Bolz 2025-02-07 04:26:03 -06:00
  • 7ee953a64a llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727) Christian Fillion 2025-02-07 04:33:27 -05:00
  • ec3bc8270b SYCL: remove XMX info from print devices (#11712) Akarshan Biswas 2025-02-07 14:57:53 +05:30
  • b7552cfcbc common : add default embeddings presets (#11677) Daniel Bevenius 2025-02-07 09:15:22 +01:00
  • 225bbbfa39 ggml : optimize and build warning fix for LoongArch (#11709) Jinyang He 2025-02-07 15:38:31 +08:00
  • 855cd0734a llama : fix old glm4 models (#11670) tv1wnd 2025-02-06 22:48:51 +01:00
  • 8a59053f63 sync : ggml Georgi Gerganov 2025-02-06 21:23:03 +02:00
  • 1d20e53c40 rpc: fix known RCE in rpc-server (ggml/1103) Patrick Peng 2025-02-06 09:29:13 -05:00
  • 2fb3c32a16 server : (webui) migrate project to ReactJS with typescript (#11688) Xuan-Son Nguyen 2025-02-06 17:32:29 +01:00
  • 9ab42dc722 docs: update fedora cuda guide for 12.8 release (#11393) Tei Home 2025-02-06 20:16:15 +08:00
  • 194b2e69f8 SYCL: Adjust support condition for norm operators (#11674) Akarshan Biswas 2025-02-06 17:12:35 +05:30
  • 9dd7a0390f llama : add log about loading model tensors (#11699) Georgi Gerganov 2025-02-06 13:41:37 +02:00
  • c0d4843225 build : fix llama.pc (#11658) Adrien Gallouët 2025-02-06 12:08:13 +01:00
  • 8d4d2be143 ggml : fix LoongArch compile error with 128-bit SIMD (#11701) junchao-zhao 2025-02-06 17:20:00 +08:00
  • 2c6c8df56d vulkan: optimize coopmat2 iq2/iq3 callbacks (#11521) Jeff Bolz 2025-02-06 00:15:30 -06:00
  • 8a7e3bf17a vulkan: initial support for IQ4_XS quantization (#11501) Rémy O 2025-02-06 07:09:59 +01:00
  • 1b598b3058 vulkan: use smaller combined allocations to avoid fragmentation (#11551) Jeff Bolz 2025-02-06 00:02:18 -06:00
  • 902368a06b metal : avoid breaking build when metal API predates TARGET_OS_VISION (#11690) Charles Duffy 2025-02-05 19:52:31 -06:00
  • c3db0480bb readme : add link to Autopen under UIs (#11684) Matvey Soloviev 2025-02-06 01:55:25 +01:00
  • d774ab3acc metal : adjust support conditions for norm operators (#11671) Georgi Gerganov 2025-02-05 10:57:42 +02:00
  • fa62da9b2d CUDA: support for mat. mul. with ne03 != ne13 (#11656) Johannes Gäßler 2025-02-05 08:58:31 +01:00
  • 1ec208083c llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644) SAMI 2025-02-05 14:45:40 +07:00
  • 9f4cc8f8d3 sync: minja (#11641) Olivier Chafik 2025-02-05 01:00:12 +00:00
  • fd08255d0d CUDA: non-contiguous (RMS) norm support (#11659) Johannes Gäßler 2025-02-04 22:21:42 +01:00
  • 3ec9fd4b77 HIP: force max threads per block to be 1024 (#11621) fxzjshm 2025-02-05 02:18:38 +08:00
  • 3962fc1a79 server : add try..catch to places not covered by set_exception_handler (#11620) Xuan-Son Nguyen 2025-02-04 18:25:42 +01:00
  • 1bef571f6a arg : list RPC devices first when using --list-devices (#11655) Radoslav Gerganov 2025-02-04 18:16:20 +02:00
  • db288b60cb tool-call: command r7b fix for normal responses (#11608) Olivier Chafik 2025-02-04 15:48:53 +00:00
  • 106045e7bb readme : add llm_client Rust crate to readme bindings (#11628) Shelby Jenkins 2025-02-04 05:20:55 -06:00
  • f117d84b48 swift : fix llama-vocab api usage (#11645) Jhen-Jie Hong 2025-02-04 19:15:24 +08:00
  • 534c46b53c metal : use residency set for other platforms (#11648) Jhen-Jie Hong 2025-02-04 19:07:18 +08:00
  • 387a1598ca authors : update Georgi Gerganov 2025-02-04 13:04:10 +02:00
  • 7c9e0ca520 sync : ggml Georgi Gerganov 2025-02-04 12:59:21 +02:00
  • 8f8290ada9 cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096) Christian Kastner 2025-02-04 00:17:15 +01:00
  • b34aedd558 ci : do not stale-close roadmap issues Georgi Gerganov 2025-02-04 09:30:42 +02:00
  • cde3833239 tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616) Olivier Chafik 2025-02-03 23:49:27 +00:00
  • b3451785ac server : (webui) revert hacky solution from #11626 (#11634) Xuan-Son Nguyen 2025-02-04 00:10:52 +01:00
  • 1d1e6a90bc server : (webui) allow typing and submitting during llm response (#11626) Woof Dog 2025-02-03 22:16:27 +00:00
  • 5598f475be server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622) Daniel Bevenius 2025-02-03 16:45:38 +01:00
  • 8ec05832fa sync : ggml Georgi Gerganov 2025-02-03 14:57:08 +02:00
  • 21c84b5d2d CUDA: fix Volta FlashAttention logic (#11615) Johannes Gäßler 2025-02-03 13:25:56 +01:00
  • d92cb67e37 server : (webui) Fix Shift+Enter handling (#11609) mashdragon 2025-02-03 09:42:55 +00:00
  • 6eecde3cc8 HIP: fix flash_attn_stream_k_fixup warning (#11604) Johannes Gäßler 2025-02-02 23:48:29 +01:00
  • 396856b400 CUDA/HIP: add support for selectable warp size to mmv (#11519) uvos 2025-02-02 22:40:09 +01:00
  • 4d0598e144 HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (#11601) uvos 2025-02-02 22:08:05 +01:00
  • 90f9b88afb nit: more informative crash when grammar sampler fails (#11593) Olivier Chafik 2025-02-02 19:58:34 +00:00
  • 864a0b67a6 CUDA: use mma PTX instructions for FlashAttention (#11583) Johannes Gäßler 2025-02-02 19:31:09 +01:00
  • 84ec8a58f7 Name colors (#11573) Eric Curtin 2025-02-02 16:14:48 +01:00
  • bfcce4d693 tool-call: support Command R7B (+ return tool_plan "thoughts" in API) (#11585) Olivier Chafik 2025-02-02 09:25:38 +00:00
  • 69804487e0 Fix exotic ci env that lacks ostringstream::str (#11581) Olivier Chafik 2025-02-02 09:10:15 +00:00
  • ff227703d6 sampling : support for llguidance grammars (#10224) Michał Moskal 2025-02-01 23:55:32 -08:00
  • 0cec062a63 llama : add support for GLM-Edge and GLM-Edge-V series models (#10573) piDack 2025-02-02 15:48:46 +08:00
  • 53debe6f3c ci: use sccache on windows HIP jobs (#11553) Olivier Chafik 2025-02-01 18:22:38 +00:00
  • cfd74c86db sync: minja (418a2364b5) (#11574) Olivier Chafik 2025-02-01 12:24:51 +00:00
  • ecef206ccb Implement s3:// protocol (#11511) Eric Curtin 2025-02-01 11:30:54 +01:00
  • 5bbc7362cb ci: simplify cmake build commands (#11548) Olivier Chafik 2025-02-01 00:01:20 +00:00
  • aa6fb13213 ci: use sccache on windows instead of ccache (#11545) Olivier Chafik 2025-01-31 17:12:40 +00:00
  • a83f528688 tool-call: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539) Olivier Chafik 2025-01-31 14:15:25 +00:00
  • b1bcd309fc fix stop regression (#11543) Olivier Chafik 2025-01-31 13:48:31 +00:00