enginex-ascend-910-llama.cpp

f221d56220 ggml : alloc ggml_contexts on the heap (whisper/2525) Georgi Gerganov 2024-11-01 10:23:05 +02:00
e597e50794 build: fix build error in Windows env with OneAPI setup (#10107) Zhenwei Jin 2024-11-01 11:09:59 +08:00
85679d37f3 llama : improve output buffer type selection (#10098) Diego Devesa 2024-11-01 00:49:53 +01:00
1e9f94994e quantize : fix --keep-split (#10114) Diego Devesa 2024-11-01 00:45:34 +01:00
c02e5ab2a6 llama : fix buffer checks for mamba and rwk (#10111) Diego Devesa 2024-10-31 22:54:23 +01:00
ab3d71f97f loader: refactor tensor weights storage (#9935) Zhenwei Jin 2024-11-01 02:50:39 +08:00
0a683e8088 server : include scheme when printing URL (#10106) Kevin Gibbons 2024-10-31 06:02:35 -07:00
dea5e86051 ggml : check tensor name lengths in gguf files (#10100) Diego Devesa 2024-10-31 11:40:59 +01:00
1329c0a75e kompute: add mul_mat_q4_k shader (#10097) Sergio López 2024-10-31 10:09:52 +01:00
61408e7fad kompute: add backend registry / device interfaces (#10045) Sergio López 2024-10-30 17:01:52 +01:00
b9e02e8184 ggml : fix memory leaks when loading invalid gguf files (#10094) Diego Devesa 2024-10-30 14:51:21 +01:00
6763f713bb readme : more lora detail in main example readme (#10064) Rich Dougherty 2024-10-31 01:22:39 +13:00
79a2bc042d convert : more detailed convert lora usage docs (#10065) Rich Dougherty 2024-10-31 01:22:21 +13:00
fc83a9e584 ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029) xctan 2024-10-30 15:00:40 +08:00
c5b0f4b5d9 llama : refactor model loader with backend registry (#10026) Diego Devesa 2024-10-30 02:01:23 +01:00
8f275a7c45 ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763) Changyeon Kim 2024-10-29 17:52:56 +09:00
8d8ff71536 llama : remove Tail-Free sampling (#10071) Georgi Gerganov 2024-10-29 10:42:05 +02:00
61715d5cc8 llama : Add IBM granite template (#10013) arch-btw 2024-10-28 10:45:33 -07:00
07028f9d74 flake.lock: Update (#10063) Georgi Gerganov 2024-10-28 17:41:24 +02:00
524afeec9d musa: workaround for Guilty Lockup in cleaning src0 (#10042) R0CKSTAR 2024-10-28 17:02:48 +08:00
8125e6cbfc server : don't overfill the batch during infill (#10018) Georgi Gerganov 2024-10-28 08:49:32 +02:00
8841ce3f43 llama : switch KQ multiplication to F32 precision by default (#10015) Georgi Gerganov 2024-10-27 20:59:58 +02:00
cc2983d375 sync : ggml Georgi Gerganov 2024-10-26 10:34:08 +03:00
8c60a8a462 increase cuda_cpy block size (ggml/996) bssrdf 2024-10-23 14:34:00 -04:00
9e4a2563ea scripts : fix amx sync [no ci] Georgi Gerganov 2024-10-26 10:33:31 +03:00
668750357e metal : support permuted matrix multiplicaions (#10033) Georgi Gerganov 2024-10-25 22:26:15 +03:00
ff252ea48e llama : add DRY sampler (#9702) wwoodsTM 2024-10-25 10:07:34 -06:00
d80fb71f8b llama: string_split fix (#10022) Michael Podvitskiy 2024-10-25 17:57:54 +02:00
2f8bd2b901 llamafile : extend sgemm.cpp support for Q5_0 models (#10010) Srihari-mcw 2024-10-25 12:57:41 +05:30
bc5ba007b2 server : check that the prompt fits in the slot's context (#10030) Georgi Gerganov 2024-10-25 10:13:46 +03:00
958367bf53 server : refactor slot input data, move tokenizer to HTTP thread (#10023) Xuan Son Nguyen 2024-10-24 21:51:22 +02:00
40f2555797 ci : fix cmake flags for SYCL Georgi Gerganov 2024-10-24 21:23:33 +03:00
167a515651 CUDA: fix insufficient buffer clearing for MMQ (#10032) Johannes Gäßler 2024-10-24 14:40:23 +02:00
c39665f589 CUDA: fix MMQ for non-contiguous src0, add tests (#10021) Johannes Gäßler 2024-10-24 11:09:36 +02:00
0a1c750c80 server : samplers accept the prompt correctly (#10019) wwoodsTM 2024-10-23 13:27:51 -06:00
190a37d797 sync : ggml Georgi Gerganov 2024-10-23 17:23:55 +03:00
2d3aba9ee8 llama.vim : bump generation time limit to 3s [no ci] Georgi Gerganov 2024-10-23 17:16:56 +03:00
80273a306d CUDA: fix 1D im2col, add tests (ggml/993) Johannes Gäßler 2024-10-18 09:24:44 +02:00
c19af0acb1 ggml : remove redundant set of contexts used field (ggml/978) Daniel Bevenius 2024-10-16 20:10:01 +02:00
ac113a0fee llama.vim : add classic vim support (#9995) Michael Coppola 2024-10-23 07:09:26 -04:00
4c9388fb96 metal : add POOL2D and fix IM2COL (#9943) Jun Hee Yoo 2024-10-23 19:33:45 +09:00
873279b159 flake.lock: Update github-actions[bot] 2024-10-20 00:22:59 +00:00
c8c07d658a llama : fix empty batch causing llama_batch_allocr to crash (#9966) Xuan Son Nguyen 2024-10-22 16:59:02 +02:00
19d900a756 llama : rename batch to ubatch (#9950) Daniel Bevenius 2024-10-22 15:31:06 +02:00
11d47057a5 Rwkv chat template fix (#10001) Molly Sophia 2024-10-22 21:22:26 +08:00
c421ac072d lora : warn user if new token is added in the adapter (#9948) Xuan Son Nguyen 2024-10-22 13:08:41 +02:00
4ff7fe1fb3 llama : add chat template for RWKV-World + fix EOT (#9968) Molly Sophia 2024-10-22 18:33:37 +08:00
6b8447352d [CANN] Adapt to dynamically loadable backends mechanism (#9970) leo-pony 2024-10-22 16:16:01 +08:00
674804a996 arg : fix typo in embeddings argument help [no ci] (#9994) Daniel Bevenius 2024-10-22 09:40:02 +02:00
e94a138d64 llama.vim : fix info text display [no ci] (#9787) Georgi Gerganov 2024-10-22 00:35:25 +03:00
e01c67affe llama.vim : move info to the right of screen [no ci] (#9787) Georgi Gerganov 2024-10-21 22:52:22 +03:00
994cfb1acb readme : update UI list (#9972) Asghar Ghorbani 2024-10-21 20:20:59 +02:00
94008cc760 arg : fix attention non-causal arg value hint (#9985) Daniel Bevenius 2024-10-21 20:12:52 +02:00
dbd5f2f573 llama.vim : plugin for Neovim (#9787) Georgi Gerganov 2024-10-21 20:25:02 +03:00
f594bc80ba ggml : add asserts for type conversion in fattn kernels (#9971) Georgi Gerganov 2024-10-21 16:20:46 +03:00
d5ebd79c76 rpc : pack only RPC structs (#9959) Radoslav Gerganov 2024-10-21 13:35:40 +03:00
55e47786e3 llama : default sampling changes + greedy update (#9897) Georgi Gerganov 2024-10-21 09:46:40 +03:00
bc21975084 speculative : fix handling of some input params (#9963) Georgi Gerganov 2024-10-21 09:37:12 +03:00
1db8c84fc6 fix mul_mat_vec_q and *_vec_q error (#9939) Neo Zhang Jianyu 2024-10-21 14:26:09 +08:00
45f097645e readme : update bindings list (#9951) Loïc Carrère 2024-10-20 18:25:41 +02:00
7cab2083c7 readme : update infra list (#9942) icppWorld 2024-10-20 12:01:34 -04:00
cda0e4b648 llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745) Xuan Son Nguyen 2024-10-18 23:18:01 +02:00
afd9909a64 rpc : backend refactoring (#9912) Radoslav Gerganov 2024-10-18 14:33:58 +03:00
87421a23e8 [SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705) Ouadie EL FAROUKI 2024-10-18 06:46:16 +01:00
60ce97c9d8 add amx kernel for gemm (#8998) Ma Mingfei 2024-10-18 13:34:36 +08:00
8901755ba3 server : add n_indent parameter for line indentation requirement (#9929) Georgi Gerganov 2024-10-18 07:32:19 +03:00
6f55bccbb8 llama : rename batch_all to batch (#8881) Daniel Bevenius 2024-10-18 01:41:51 +02:00
17bb928080 readme : remove --memory-f32 references (#9925) Georgi Gerganov 2024-10-17 23:43:05 +03:00
9f45fc1e99 llama : change warning to debug log Georgi Gerganov 2024-10-17 23:26:32 +03:00
99bd4ac28c llama : infill sampling handle very long tokens (#9924) Georgi Gerganov 2024-10-17 22:32:47 +03:00
3752217ed5 readme : update bindings list (#9918) Tim Wang 2024-10-17 17:57:14 +11:00
f010b77a37 vulkan : add backend registry / device interfaces (#9721) Diego Devesa 2024-10-17 02:46:58 +02:00
2194200278 fix: allocating CPU buffer with size 0 (#9917) Gilad S. 2024-10-17 02:34:22 +03:00
73afe681aa fix: use vm_allocate to allocate CPU backend buffer on macOS (#9875) Gilad S. 2024-10-17 01:36:51 +03:00
9e04102448 llama : suppress conversion from 'size_t' to 'int' (#9046) Daniel Bevenius 2024-10-16 19:34:28 +02:00
dbf18e4de9 llava : fix typo in error message [no ci] (#9884) Daniel Bevenius 2024-10-16 19:24:05 +02:00
66c2c93082 grammar : fix JSON Schema for string regex with top-level alt. (#9903) Joe Eli McIlvain 2024-10-16 09:03:24 -07:00
10433e8b45 llama : add tensor name for "result_norm" (#9907) Molly Sophia 2024-10-16 18:10:21 +08:00
1f66b699c4 server : fix the disappearance of the end of the text (#9867) Alexey Parfenov 2024-10-16 08:35:53 +00:00
0e41b300ed sync : ggml Georgi Gerganov 2024-10-16 11:28:14 +03:00
cd60b88bf7 ggml-alloc : remove buffer_id from leaf_alloc (ggml/987) Daniel Bevenius 2024-10-09 16:40:35 +02:00
becfd387f6 [CANN] Fix cann compilation error (#9891) leo-pony 2024-10-16 08:51:46 +08:00
755a9b2bf0 llama : add infill sampler (#9896) Georgi Gerganov 2024-10-15 16:35:33 +03:00
223c25a72f server : improve infill context reuse (#9894) Georgi Gerganov 2024-10-15 16:28:55 +03:00
fbc98b748e sampling : add XTC sampler (#9742) MaggotHATE 2024-10-15 15:54:55 +05:00
dcdd535302 server : update preact (#9895) Georgi Gerganov 2024-10-15 12:48:44 +03:00
4c42f93b22 readme : update bindings list (#9889) Michał Tuszyński 2024-10-15 10:20:34 +02:00
a89f75e1b7 server : handle "logprobs" field with false value (#9871) VoidIsVoid 2024-10-14 15:04:36 +08:00
13dca2a54a Vectorize load instructions in dmmv f16 CUDA kernel (#9816) agray3 2024-10-14 01:49:08 +01:00
d4c19c0f5c server : accept extra_context for the infill endpoint (#9874) Georgi Gerganov 2024-10-13 21:31:35 +03:00
c7181bd294 server : reuse cached context chunks (#9866) Georgi Gerganov 2024-10-13 18:52:48 +03:00
92be9f1216 flake.lock: Update (#9870) Georgi Gerganov 2024-10-13 06:11:26 +03:00
edc265661c server : add option to time limit the generation phase (#9865) Georgi Gerganov 2024-10-12 16:14:27 +03:00
1bde94dd02 server : remove self-extend features (#9860) Georgi Gerganov 2024-10-12 16:06:31 +03:00
95c76e8e92 server : remove legacy system_prompt feature (#9857) Georgi Gerganov 2024-10-12 14:51:54 +03:00
11ac9800af llama : improve infill support and special token detection (#9798) Georgi Gerganov 2024-10-12 08:21:51 +03:00
943d20b411 musa : update doc (#9856) R0CKSTAR 2024-10-12 13:09:53 +08:00
96776405a1 ggml : move more prints to the ggml log system (#9839) Diego Devesa 2024-10-11 15:34:45 +02:00
7eee341bee common : use common_ prefix for common library functions (#9805) Diego Devesa 2024-10-10 22:57:42 +02:00
0e9f760eb1 rpc : add backend registry / device interfaces (#9812) Diego Devesa 2024-10-10 20:14:55 +02:00
