enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

6390a998bf tts : add guide tokens support (#11186) LostRuins Concedo 2025-01-18 18:20:57 +08:00
44e18ef939 vulkan: fix coopmat2 flash attention for non-contiguous inputs (#11281) Jeff Bolz 2025-01-18 02:26:50 -06:00
3edfa7d375 llama.android: add field formatChat to control whether to parse special tokens when send message (#11270) codezjx 2025-01-17 20:57:56 +08:00
667d72846c rpc : early register backend devices (#11262) Radoslav Gerganov 2025-01-17 10:57:09 +02:00
a133566d34 vocab : fix double-eos check (#11273) Georgi Gerganov 2025-01-17 09:28:00 +02:00
960ec65273 llama : fix deprecation message: vocabable -> vocab (#11269) David Renshaw 2025-01-17 02:12:01 -05:00
7a689c415e README : added kalavai to infrastructure list (#11216) musoles 2025-01-17 00:10:49 +00:00
bd38ddea01 vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166) Jeff Bolz 2025-01-16 15:47:10 -06:00
466300fe14 vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206) Jeff Bolz 2025-01-16 15:23:49 -06:00
206bc53422 vulkan: optimize coopmat2 q2_k dequant function (#11130) Jeff Bolz 2025-01-16 15:16:39 -06:00
4dbc8b9cb7 llama : add internlm3 support (#11233) RunningLeon 2025-01-17 02:10:38 +08:00
9c8dcefe17 CUDA: backwards pass for misc. ops, add tests (#11257) Johannes Gäßler 2025-01-16 16:43:38 +01:00
681149ced2 llama : add llama_model_load_from_splits (#11255) Xuan Son Nguyen 2025-01-16 13:54:08 +01:00
c67cc9837d ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227) fj-y-saito 2025-01-16 18:11:49 +09:00
adc5dd92e8 vulkan: scale caching for k quants + misc fixes (#11081) Eve 2025-01-15 19:50:13 +00:00
f11cfdfd7f ci : use -no-cnv in gguf-split tests (#11254) Georgi Gerganov 2025-01-15 18:28:35 +02:00
1d8504338e fix: ggml: fix vulkan-shaders-gen build (#10448) Junil Kim 2025-01-15 22:17:42 +09:00
432df2d5f9 RoPE: fix back, CUDA support for back + noncont. (#11240) Johannes Gäßler 2025-01-15 12:51:37 +01:00
0ccd7f3eb2 examples : add embd_to_audio to tts-outetts.py [no ci] (#11235) Daniel Bevenius 2025-01-15 05:44:38 +01:00
f446c2cf6a SYCL: Add gated linear attention kernel (#11175) Akarshan Biswas 2025-01-15 08:50:17 +05:30
b4d92a59a2 ci : add -no-cnv for tests (#11238) Xuan Son Nguyen 2025-01-14 15:42:23 +01:00
bbf3e55e35 vocab : add dummy tokens for "no_vocab" type (#11231) Georgi Gerganov 2025-01-14 12:54:58 +02:00
c5bf0d1bd7 server : Improve code snippets direction between RTL text (#11221) ebraminio 2025-01-14 14:09:33 +03:30
091592d758 Refactor test-chat-template.cpp (#11224) Olivier Chafik 2025-01-14 10:16:41 +00:00
44d1e796d0 sync : ggml Georgi Gerganov 2025-01-14 10:39:42 +02:00
a4f3f5d8e6 scripts : sync gguf (cont) Georgi Gerganov 2025-01-14 09:40:15 +02:00
48e1ae0e61 scripts : sync gguf Georgi Gerganov 2025-01-14 09:36:58 +02:00
d00a80e89d scripts : sync opencl Georgi Gerganov 2025-01-14 09:19:58 +02:00
504af20ee4 server : (UI) Improve messages bubble shape in RTL (#11220) ebraminio 2025-01-13 22:53:31 +03:30
84a44815f7 cli : auto activate conversation mode if chat template is available (#11214) Xuan Son Nguyen 2025-01-13 20:18:12 +01:00
39509fb082 cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (#11042) Andreas Kieslinger 2025-01-13 16:45:53 +01:00
a29f0870d4 contrib : add naming guidelines (cont) (#11177) Georgi Gerganov 2025-01-13 15:59:26 +02:00
437e05f714 server : (UI) Support for RTL text as models input or output (#11208) ebraminio 2025-01-13 17:16:39 +03:30
ca001f6656 contrib : add naming guidelines (cont) (#11177) Georgi Gerganov 2025-01-13 15:08:44 +02:00
00b4c3da62 common : support tag-based --hf-repo like on ollama (#11195) Xuan Son Nguyen 2025-01-13 13:56:23 +01:00
7426a26b24 contrib : add naming guidelines (#11177) Georgi Gerganov 2025-01-13 14:46:36 +02:00
8f70fc3d1b llama : remove 'd' from bad special token log (#11212) Daniel Bevenius 2025-01-13 13:38:20 +01:00
1244cdcf14 ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL (#11211) Radoslav Gerganov 2025-01-13 13:31:41 +02:00
924518e2e5 Reset color before we exit (#11205) Eric Curtin 2025-01-12 18:23:10 +00:00
9a483999a6 llama : fix chat template gguf key (#11201) Xuan Son Nguyen 2025-01-12 13:45:14 +01:00
08f10f69c3 llama : remove notion of CLS token (#11064) Georgi Gerganov 2025-01-12 12:15:53 +02:00
afa8a9ec9b llama : add llama_vocab, functions -> methods, naming (#11110) Georgi Gerganov 2025-01-12 11:32:42 +02:00
c05e8c9934 gguf-py: fixed local detection of gguf package (#11180) Vinesh Janarthanan 2025-01-11 03:42:31 -06:00
2739a71e4b convert : sort print supported models [no ci] (#11179) Daniel Bevenius 2025-01-11 05:50:33 +01:00
ba8a1f9c5b examples : add README.md to tts example [no ci] (#11155) Daniel Bevenius 2025-01-10 13:16:16 +01:00
ff3fcabc72 convert : add --print-supported-models option (#11172) Daniel Bevenius 2025-01-10 11:30:53 +01:00
c3f9d25706 Vulkan: Fix float16 use on devices without float16 support + fix subgroup_size_control validation error (#11161) 0cc4m 2025-01-10 06:39:33 +01:00
ee7136c6d1 llama: add support for QRWKV6 model architecture (#11001) Molly Sophia 2025-01-10 09:58:08 +08:00
c6860cc734 SYCL: Refactor ggml_sycl_compute_forward (#11121) Akarshan Biswas 2025-01-10 05:43:03 +05:30
1204f97270 doc: add cuda guide for fedora (#11135) Tei Home 2025-01-09 19:32:06 +08:00
8eceb888d7 server : add tooltips to settings and themes btn (#11154) Daniel Bevenius 2025-01-09 11:28:29 +01:00
f8feb4b01a model: Add support for PhiMoE arch (#11003) Pierrick Hymbert 2025-01-09 11:21:41 +01:00
be0e950c91 media : remove old img [no ci] Georgi Gerganov 2025-01-09 11:15:15 +02:00
d9feae1c06 llama-chat : add phi 4 template (#11148) Xuan Son Nguyen 2025-01-09 10:07:33 +01:00
8d59d91171 fix: add missing msg in static_assert (#11143) hydai 2025-01-09 04:03:28 +08:00
8a1d9c25fa gguf-py : move scripts directory (#11116) Vinesh Janarthanan 2025-01-08 12:54:58 -06:00
1bf839b1e8 Enhance user input handling for llama-run (#11138) Eric Curtin 2025-01-08 18:47:05 +00:00
f7cd13301c ci : use actions from ggml-org (#11140) Xuan Son Nguyen 2025-01-08 16:09:20 +01:00
4d2b3d8804 lora : improve compat with mergekit-extract-lora (#11131) Xuan Son Nguyen 2025-01-08 15:59:53 +01:00
c07d437bbd llama : avoid hardcoded QK_K (#11061) Georgi Gerganov 2025-01-08 16:19:36 +02:00
99a3755a3c sync : ggml Georgi Gerganov 2025-01-08 13:40:30 +02:00
c792dcf488 ggml : allow loading backend with env variable (ggml/1059) Radoslav Gerganov 2025-01-05 09:50:37 +02:00
80ccf5d725 ci : pin dependency to specific version (#11137) Xuan Son Nguyen 2025-01-08 12:07:20 +01:00
a3c1232c3f arg : option to exclude arguments from specific examples (#11136) Georgi Gerganov 2025-01-08 12:55:36 +02:00
8cef75c743 llamafile : ppc64le MMA INT8 implementation (#10912) amritahs-ibm 2025-01-08 16:24:19 +05:30
0d52a69e4b ci : fix cmake option (#11125) Georgi Gerganov 2025-01-08 11:29:34 +02:00
02f0430141 Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (#11117) Mathieu Baudier 2025-01-08 09:18:13 +01:00
bec2183f2c fix: Vulkan shader gen binary path when Cross-compiling (#11096) ag2s20150909 2025-01-08 16:17:29 +08:00
53ff6b9b9f GGUF: C++ refactor, backend support, misc fixes (#11030) Johannes Gäßler 2025-01-07 18:01:58 +01:00
017cc5f446 ggml-backend : only offload from host buffers (fix) (#11124) Diego Devesa 2025-01-07 16:11:57 +01:00
a3d50bc022 ggml-backend : only offload from host buffers (#11120) Diego Devesa 2025-01-07 12:38:05 +01:00
a4dd490069 rpc : code cleanup (#11107) Radoslav Gerganov 2025-01-07 08:37:02 +02:00
c0d6f790d0 SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087) Akarshan Biswas 2025-01-07 11:56:07 +05:30
dc7cef9f37 llama-run : fix context size (#11094) Eric Curtin 2025-01-06 22:45:28 +00:00
ecebbd292d llama : remove unused headers (#11109) Georgi Gerganov 2025-01-06 17:52:35 +02:00
96be8c3264 github : add cmd line field to bug report (#11090) Xuan Son Nguyen 2025-01-06 16:34:49 +01:00
e6e7c75d94 server : fix extra BOS in infill endpoint (#11106) Georgi Gerganov 2025-01-06 15:36:08 +02:00
09186fabbe llama : remove check flash_attn with lora (#11104) Xuan Son Nguyen 2025-01-06 13:41:12 +01:00
96a1dc27c3 llama : prevent system info string accumulation across calls (#11101) Asghar Ghorbani 2025-01-06 12:21:46 +01:00
6369f867a4 llama : rename missed batch params/vars to ubatch (#10059) Daniel Bevenius 2025-01-06 10:28:17 +01:00
47182dd03f llama : update llama_model API names (#11063) Georgi Gerganov 2025-01-06 10:55:18 +02:00
3e6e7a6bc2 tokenize : escape the prompt (#11058) Georgi Gerganov 2025-01-06 10:54:25 +02:00
ae2f606bb5 mmap : fix fileno macro clash (#11076) Georgi Gerganov 2025-01-06 10:52:38 +02:00
727368c60f llama : use LLAMA_TOKEN_NULL (#11062) Georgi Gerganov 2025-01-06 10:52:15 +02:00
5047dd3546 llama : use _impl suffix instead of _internal (#11060) Georgi Gerganov 2025-01-06 10:52:01 +02:00
46e3556e01 CUDA: add BF16 support (#11093) Johannes Gäßler 2025-01-06 02:33:52 +01:00
b56f079e28 Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver (#11074) 0cc4m 2025-01-04 21:09:59 +01:00
9394bbd484 llama : Add support for DeepSeek V3 (#11049) fairydreaming 2025-01-04 21:06:11 +01:00
f922a9c542 [GGML][RPC] Support for models with non-512-aligned tensors over RPC. (#11047) matt23654 2025-01-04 16:10:30 +00:00
46be942214 llama : add support for the cohere2 model architecture (#10900) DAN™ 2025-01-04 09:33:31 -05:00
78c6785175 sync : ggml Georgi Gerganov 2025-01-04 10:54:01 +02:00
5e3b08d606 ggml : do not install metal source when embed library (ggml/1054) Georgi Gerganov 2025-01-04 10:53:54 +02:00
db68c93b57 ggml : improve inputs log sched_print_assignments (ggml/1053) Daniel Bevenius 2024-12-19 03:50:12 +01:00
c31fc8b966 fix: Vulkan shader gen binary path (#11037) Gilad S. 2025-01-04 10:17:31 +02:00
4b0c638b9a common : disable KV cache shifting automatically for unsupported models (#11053) Molly Sophia 2025-01-03 20:13:18 +08:00
e7da954ecc metal : avoid uint (#11019) Georgi Gerganov 2025-01-03 11:26:14 +02:00
f66f582927 llama : refactor src/llama.cpp (#10902) Georgi Gerganov 2025-01-03 10:18:53 +02:00
2f0ee84b9b server: bench: minor fixes (#10765) Pierrick Hymbert 2025-01-02 18:06:12 +01:00
0da5d86026 server : allow using LoRA adapters per-request (#10994) Xuan Son Nguyen 2025-01-02 15:05:18 +01:00
a45433ba20 readme : add llama-swap to infrastructure section (#11032) Benson Wong 2025-01-01 23:14:54 -08:00
