enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp


1666f92dcd gguf-hash : update clib.json to point to original xxhash repo (#8491) Brian 2024-07-16 17:14:16 +10:00
37b12f92ab export-lora : handle help argument (#8497) Steve Bonds 2024-07-16 00:04:45 -07:00
0efec57787 llama : valign + remove unused ftype (#8502) Georgi Gerganov 2024-07-16 10:00:30 +03:00
7acfd4e8d5 convert_hf : faster lazy safetensors (#8482) compilade 2024-07-15 23:13:10 -04:00
97bdd26eee Refactor lora adapter support (#8332) Xuan Son Nguyen 2024-07-15 20:50:47 +02:00
4db8f60fe7 fix ci (#8494) Xuan Son Nguyen 2024-07-15 19:23:10 +02:00
8fac431b06 ggml : suppress unknown pragma 'GCC' on windows (#8460) Daniel Bevenius 2024-07-15 14:48:17 +02:00
f17f39ff9c server: update README.md with llama-server --help output [no ci] (#8472) M-A 2024-07-15 08:04:56 -04:00
9104bc20ed common : add --no-cont-batching arg (#6358) Georgi Gerganov 2024-07-15 14:54:58 +03:00
fc690b018e docs: fix links in development docs [no ci] (#8481) NikolaiLyssogor 2024-07-15 04:46:39 -07:00
16bdfa42ac [SYCL] add concat through dim 1/2 (#8483) Meng, Hengyu 2024-07-15 19:32:15 +08:00
3dfda05956 llama : de-duplicate deepseek2 norm Georgi Gerganov 2024-07-15 14:10:39 +03:00
bda62d7999 Vulkan MMQ Fix (#8479) 0cc4m 2024-07-15 09:38:52 +02:00
090fca7a07 pydantic : replace uses of __annotations__ with get_type_hints (#8474) compilade 2024-07-14 19:51:21 -04:00
aaab2419ea flake.lock: Update (#8475) Georgi Gerganov 2024-07-14 18:54:02 +03:00
73cf442e7b llama : fix Gemma-2 Query scaling factors (#8473) Georgi Gerganov 2024-07-14 14:05:09 +03:00
e236528e76 gguf_hash.py: Add sha256 (#8470) Brian 2024-07-14 16:47:14 +10:00
fa79495bb4 llama : fix pre-tokenization of non-special added tokens (#8228) compilade 2024-07-13 23:35:10 -04:00
17eb6aa8a9 vulkan : cmake integration (#8119) bandoti 2024-07-13 13:12:39 -03:00
c917b67f06 metal : template-ify some of the kernels (#8447) Georgi Gerganov 2024-07-13 18:32:33 +03:00
4e24cffd8c server : handle content array in chat API (#8449) Georgi Gerganov 2024-07-12 14:48:15 +03:00
6af51c0d96 main : print error on empty input (#8456) Georgi Gerganov 2024-07-12 14:48:04 +03:00
f53226245f llama : suppress unary minus operator warning (#8448) Daniel Bevenius 2024-07-12 11:05:21 +02:00
c3ebcfa148 server : ensure batches are either all embed or all completion (#8420) Douglas Hanley 2024-07-12 03:14:12 -05:00
8a4441ea1a docker : fix filename for convert-hf-to-gguf.py in tools.sh (#8441) Armen Kaleshian 2024-07-12 04:08:19 -04:00
5aefbce27a convert : remove fsep token from GPTRefactForCausalLM (#8237) Jiří Podivín 2024-07-12 10:06:33 +02:00
71c1121d11 examples : sprintf -> snprintf (#8434) Georgi Gerganov 2024-07-12 10:46:14 +03:00
370b1f7e7a ggml : minor naming changes (#8433) Georgi Gerganov 2024-07-12 10:46:02 +03:00
b549a1bbef [SYCL] fix the mul_mat_id ut issues (#8427) Chen Xi 2024-07-12 00:52:04 +00:00
368645698a ggml : add NVPL BLAS support (#8329) (#8425) Nicholai Tukanov 2024-07-11 11:49:15 -05:00
b078c619aa cuda : suppress 'noreturn' warn in no_device_code (#8414) Daniel Bevenius 2024-07-11 17:53:42 +02:00
808aba3916 CUDA: optimize and refactor MMQ (#8416) Johannes Gäßler 2024-07-11 16:47:47 +02:00
a977c11544 gitignore : deprecated binaries Georgi Gerganov 2024-07-11 11:20:40 +03:00
9a55ffe6fb tokenize : add --no-parse-special option (#8423) compilade 2024-07-11 03:41:48 -04:00
7a221b672e llama : use F32 precision in Qwen2 attention and no FA (#8412) Georgi Gerganov 2024-07-11 10:21:30 +03:00
278d0e1846 Initialize default slot sampling parameters from the global context. (#8418) Clint Herron 2024-07-10 20:08:17 -04:00
dd07a123b7 Name Migration: Build the deprecation-warning 'main' binary every time (#8404) Clint Herron 2024-07-10 12:35:18 -04:00
f4444d992c [SYCL] Use multi_ptr to clean up deprecated warnings (#8256) AidanBeltonS 2024-07-10 16:10:49 +01:00
6b2a849d1f ggml : move sgemm sources to llamafile subfolder (#8394) Georgi Gerganov 2024-07-10 15:23:29 +03:00
0f1a39f343 ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) Dibakar Gope 2024-07-10 07:14:51 -05:00
83321c6958 gguf-py rel pipeline (#8410) M. Yusuf Sarıgöz 2024-07-10 15:12:35 +03:00
cc61948b1f llama : C++20 compatibility for u8 strings (#8408) Borislav Stanimirov 2024-07-10 14:45:44 +03:00
7a80710d93 msvc : silence codecvt c++17 deprecation warnings (#8395) Borislav Stanimirov 2024-07-10 14:40:53 +03:00
a8be1e6f59 llama : add assert about missing llama_encode() call (#8400) fairydreaming 2024-07-10 13:38:58 +02:00
e4dd31ff89 py : fix converter for internlm2 (#8321) RunningLeon 2024-07-10 19:26:40 +08:00
8f0fad42b9 py : fix extra space in convert_hf_to_gguf.py (#8407) laik 2024-07-10 19:19:10 +08:00
a59f8fdc85 Server: Enable setting default sampling parameters via command-line (#8402) Clint Herron 2024-07-09 18:26:40 -04:00
fd560fe680 Update README.md to fix broken link to docs (#8399) Andy Salerno 2024-07-09 11:58:44 -07:00
e500d6135a Deprecation warning to assist with migration to new binary names (#8283) Clint Herron 2024-07-09 11:54:43 -04:00
a03e8dd99d make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392) Johannes Gäßler 2024-07-09 17:11:07 +02:00
5b0b8d8cfb sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372) Alberto Cabrera Pérez 2024-07-09 15:03:15 +01:00
9925ca4087 cmake : allow external ggml (#8370) Borislav Stanimirov 2024-07-09 11:38:00 +03:00
9beb2dda03 readme : fix typo [no ci] (#8389) daghanerdonmez 2024-07-09 09:16:00 +03:00
7d0e23d72e gguf-py : do not use internal numpy types (#7472) compilade 2024-07-09 01:04:49 -04:00
7fdb6f73e3 flake.lock: Update (#8342) Georgi Gerganov 2024-07-09 01:36:38 +03:00
a130eccef4 labeler : updated sycl to match docs and code refactor (#8373) Alberto Cabrera Pérez 2024-07-08 21:35:17 +01:00
c4dd11d1d3 readme : fix web link error [no ci] (#8347) b4b4o 2024-07-08 22:19:24 +08:00
2ec846d558 sycl : fix powf call in device code (#8368) Alberto Cabrera Pérez 2024-07-08 14:22:41 +01:00
3f2d538b81 scripts : fix sync for sycl Georgi Gerganov 2024-07-08 13:51:31 +03:00
2ee44c9a18 sync : ggml Georgi Gerganov 2024-07-08 10:39:50 +03:00
6847d54c4f tests : fix whitespace (#0) Georgi Gerganov 2024-07-08 10:39:36 +03:00
fde13b3bb9 feat: cuda implementation for ggml_conv_transpose_1d (ggml/854) John Balis 2024-07-02 11:09:52 -05:00
470939d483 common : preallocate sampling token data vector (#8363) Kevin Wang 2024-07-08 03:26:53 -04:00
6f0dbf6ab0 infill : assert prefix/suffix tokens + remove old space logic (#8351) Georgi Gerganov 2024-07-08 09:34:35 +03:00
ffd00797d8 common : avoid unnecessary logits fetch (#8358) Kevin Wang 2024-07-08 02:31:55 -04:00
04ce3a8b19 readme : add supported glm models (#8360) toyer 2024-07-08 13:57:19 +08:00
3fd62a6b1c py : type-check all Python scripts with Pyright (#8341) compilade 2024-07-07 15:04:39 -04:00
a8db2a9ce6 Update llama-cli documentation (#8315) Denis Spasyuk 2024-07-07 09:08:28 -06:00
4090ea5501 ci : add checks for cmake,make and ctest in ci/run.sh (#8200) Alex Tuddenham 2024-07-07 15:59:14 +01:00
f1948f1e10 readme : update bindings list (#8222) Andy Tai 2024-07-07 06:21:37 -07:00
f7cab35ef9 gguf-hash: model wide and per tensor hashing using xxhash and sha1 (#8048) Brian 2024-07-07 22:58:43 +10:00
905942abdb llama : support glm3 and glm4 (#8031) toyer 2024-07-07 20:52:10 +08:00
b5040086d4 llama : fix n_rot default (#8348) Georgi Gerganov 2024-07-07 14:59:02 +03:00
d39130a398 py : use cpu-only torch in requirements.txt (#8335) compilade 2024-07-07 07:23:38 -04:00
b81ba1f96b finetune: Rename command name in README.md (#8343) standby24x7 2024-07-07 19:38:02 +09:00
210eb9ed0a finetune: Rename an old command name in finetune.sh (#8344) standby24x7 2024-07-07 19:37:47 +09:00
cb4d86c4d7 server: Retrieve prompt template in /props (#8337) Bjarke Viksøe 2024-07-07 11:10:38 +02:00
86e7299ef5 added support for Authorization Bearer tokens when downloading model (#8307) Derrick T. Woolworth 2024-07-06 15:32:04 -05:00
60d83a0149 update main readme (#8333) Xuan Son Nguyen 2024-07-06 19:01:23 +02:00
87e25a1d1b llama : add early return for empty range (#8327) Daniel Bevenius 2024-07-06 09:22:16 +02:00
213701b51a Detokenizer fixes (#8039) jaime-m-p 2024-07-05 19:01:35 +02:00
be20e7f49d Reorganize documentation pages (#8325) Xuan Son Nguyen 2024-07-05 18:08:32 +02:00
7ed03b8974 llama : fix compile warning (#8304) Georgi Gerganov 2024-07-05 17:32:09 +03:00
1d894a790e cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281) Natsu 2024-07-05 22:29:35 +08:00
1f3e1b66e2 Enabled more data types for oneMKL gemm_batch (#8236) Ouadie EL FAROUKI 2024-07-05 13:23:25 +01:00
148ec970b6 convert : remove AWQ remnants (#8320) Georgi Gerganov 2024-07-05 10:15:36 +03:00
2cccbaa008 llama : minor indentation during tensor loading (#8304) Georgi Gerganov 2024-07-05 10:15:24 +03:00
8e558309dc CUDA: MMQ support for iq4_nl, iq4_xs (#8278) Johannes Gäßler 2024-07-05 09:06:31 +02:00
0a423800ff CUDA: revert part of the RDNA1 optimizations (#8309) Daniele 2024-07-05 07:06:09 +00:00
d12f781074 llama : streamline embeddings from "non-embedding" models (#8087) Douglas Hanley 2024-07-05 02:05:56 -05:00
bcefa03bc0 CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311) Johannes Gäßler 2024-07-05 09:05:34 +02:00
5a7447c569 readme : fix minor typos [no ci] (#8314) Pieter Ouwerkerk 2024-07-05 02:58:41 -04:00
61ecafa390 passkey : add short intro to README.md [no-ci] (#8317) Daniel Bevenius 2024-07-05 08:14:24 +02:00
aa5898dc53 llama : prefer n_ over num_ prefix (#8308) Georgi Gerganov 2024-07-05 09:10:03 +03:00
6c05752c50 contributing : update guidelines (#8316) Georgi Gerganov 2024-07-05 09:09:47 +03:00
a9554e20b6 [SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266) luoyu-intel 2024-07-05 05:06:13 +00:00
e235b267a2 py : switch to snake_case (#8305) Georgi Gerganov 2024-07-05 07:53:33 +03:00
f09b7cb609 rm get_work_group_size() by local cache for performance (#8286) Neo Zhang Jianyu 2024-07-05 10:32:29 +08:00
a38b884c6c cli: add EOT when user hit Ctrl+C (#8296) Xuan Son Nguyen 2024-07-04 20:55:03 +02:00
d7fd29fff1 llama : add OpenELM support (#7359) Icecream95 2024-07-05 05:14:21 +12:00
