EngineX-Ascend/enginex-ascend-910-llama.cpp

62cfc54f77 Add quantize-stats command for testing quantization (#728) unbounded 2023-04-08 00:09:18 +02:00
698f7b5d63 make : add libllama.so target for llama-cpp-python (#797) bhubbb 2023-04-08 02:11:58 +10:00
c1950c3431 zig : don't link examples/common.cpp for non-example (#814) iacore 2023-04-07 16:05:29 +00:00
4953e9007f llama : always sort logits before nucleus sampling (#812) Ivan Stepanov 2023-04-07 19:02:12 +03:00
cc9cee8e9e Do not crash when it has nothing to say. (#796) Sergey Alirzaev 2023-04-06 17:59:11 +02:00
d2beca95dc Make docker instructions more explicit (#785) Pavol Rusnak 2023-04-06 08:56:58 +02:00
eeaa7b0492 ggml : multi-thread ggml_rope() (~3-4 times faster on M1) (#781) Georgi Gerganov 2023-04-05 22:11:03 +03:00
986b6ce9f9 ggml, llama : avoid heavy V transpose + improvements (#775) Georgi Gerganov 2023-04-05 22:07:33 +03:00
3416298929 Update README.md Georgi Gerganov 2023-04-05 19:54:30 +03:00
5a8c4f6240 llama : define non-positive top_k; top_k range check (#779) Ivan Stepanov 2023-04-05 19:20:05 +03:00
ff05d05c96 miku.sh : add executable bit (#780) at8u 2023-04-05 15:59:13 +00:00
62b3e81aae media : add logos and banners Georgi Gerganov 2023-04-05 18:58:06 +03:00
8d10406d6e readme : change logo + add bindings + add uis + add wiki Georgi Gerganov 2023-04-05 18:56:20 +03:00
ed1c214e66 zig : add build.zig (#773) iacore 2023-04-05 15:06:02 +00:00
0c44427df1 make : missing host optimizations in CXXFLAGS (#763) Ivan Stepanov 2023-04-05 17:38:37 +03:00
594cc95fab readme : update with CMake and windows example (#748) Adithya Balaji 2023-04-05 16:36:12 +02:00
88ed5761b8 examples : add Miku.sh (#724) at8u 2023-04-05 14:32:42 +00:00
58c438cf7d Add Accelerate/BLAS when using Swift (#765) Andrew Duffy 2023-04-05 11:44:24 +01:00
53dbba7695 Windows: reactive sigint handler after each Ctrl-C (#736) mgroeber9110 2023-04-03 18:00:55 +02:00
437e77855a 10+% performance improvement of ggml_vec_dot_q4_0 on AVX2 (#654) SebastianApel 2023-04-03 09:52:28 +02:00
cd7fa95690 Define non-positive temperature behavior (#720) Ivan Stepanov 2023-04-03 03:19:04 +03:00
a0c0516416 Remove torch GPU dependencies from the Docker.full image (#665) bsilvereagle 2023-04-02 15:13:03 -07:00
d8d4e865cd Add a missing step to the gpt4all instructions (#690) Thatcher Chamberlin 2023-04-02 06:48:57 -04:00
e986f94829 Added api for getting/setting the kv_cache (#685) Christian Falch 2023-04-02 12:23:04 +02:00
c0bb1d3ce2 ggml : change ne to int64_t (#626) Marian Cepok 2023-04-02 12:21:31 +02:00
6e7801d08d examples : add gpt4all script (#658) Leonardo Neumann 2023-04-02 04:56:20 -03:00
81040f10aa llama : do not allocate KV cache for "vocab_only == true" (#682) Stephan Walter 2023-04-02 07:18:53 +00:00
c4f89d8d73 make : use -march=native -mtune=native on x86 (#609) Fabian 2023-04-02 09:17:05 +02:00
5b70e7de4c fix default params for examples/main (#697) Murilo Santana 2023-04-01 23:41:12 -03:00
a717cba844 py: huggingface -> Hugging Face (#686) Ikko Eltociear Ashimine 2023-04-02 01:38:18 +09:00
d0a7f742e7 readme: replace termux links with homepage, play store is deprecated (#680) rimoliga 2023-04-01 11:57:30 -03:00
0d054e292e Show error message when -f fails Slaren 2023-03-31 20:03:48 +02:00
3525899277 Enable -std= for cmake builds, fix warnings (#598) Stephan Walter 2023-03-31 19:19:16 +00:00
1d08882afa Optimize AVX2 ggml_vec_dot_q4_0 (#642) slaren 2023-03-31 17:55:52 +02:00
02c5b27e91 Add AVX acceleration (#617) perserk 2023-03-31 16:55:44 +05:00
cbef542879 py : cleanup the code Pavol Rusnak 2023-03-29 21:31:24 +02:00
9733104be5 drop quantize.py (now that models are using a single file) Pavol Rusnak 2023-03-31 00:52:06 +02:00
3df890aef4 readme : update supported models Georgi Gerganov 2023-03-30 22:31:54 +03:00
ee0c40dd6d Introduce GGML migration tool for new file format Justine Tunney 2023-03-30 05:42:56 -07:00
6f23ba5ee2 Ensure --mlock works properly with mmap() support Justine Tunney 2023-03-30 01:53:36 -07:00
78ca9838ee Make loading weights 10-100x faster Justine Tunney 2023-03-29 13:51:37 -07:00
a017390358 Initial windows support (untested) Slaren 2023-03-29 22:22:36 +02:00
ac184d5147 Always initialize mm_addr and mm_length in llama_model Slaren 2023-03-29 08:53:14 +02:00
276e5b7811 Unmap the file in llama_free Slaren 2023-03-29 08:31:26 +02:00
d68c5dc435 Make mmap_file static Slaren 2023-03-29 06:18:18 +02:00
64bde3ffd4 Fix ggml_init_params in quantize Slaren 2023-03-29 05:38:57 +02:00
c03ae8dca1 Add mmap support for model files Slaren 2023-03-29 02:03:43 +02:00
3bcc129ba8 cmake : properly invoke CTest (#629) Stephan Walter 2023-03-30 17:56:59 +00:00
a4755cf288 Remove unused variable (#607) Casey Primozic 2023-03-30 10:53:35 -07:00
1f0414feec make : fix darwin f16c flags check (#615) david raistrick 2023-03-30 13:34:45 -04:00
77efdf5a50 ggml : fix NEON signs (close #620, #622) Georgi Gerganov 2023-03-30 20:27:32 +03:00
ed3c680bcd Fix GGML_F32Cx8_STORE in AVX without F16C path (#619) slaren 2023-03-30 11:16:30 +02:00
9cbc404ba6 ci : re-enable AVX512 testing (Windows-MSVC) (#584) anzz1 2023-03-29 23:44:39 +03:00
b51c717d5c ggml : init time on first ggml_init() call Georgi Gerganov 2023-03-29 22:15:34 +03:00
0ba76c1e73 llama : fix compile warnings when reading the vocab Georgi Gerganov 2023-03-29 22:13:12 +03:00
cea1c85948 ggml : add ARM_NEON dequantize_row_q4_1() Georgi Gerganov 2023-03-29 22:10:01 +03:00
f202ada131 ggml : add ARM_NEON quantize_row_q4_1() Georgi Gerganov 2023-03-29 22:03:02 +03:00
3b44d30d9b ggml : add ARM_NEON ggml_vec_dot_q4_1() Georgi Gerganov 2023-03-29 21:47:33 +03:00
61cbfff5c9 rename convert_ggml_to_pth.py -> convert-ggml-to-pth.py (#600) Pavol Rusnak 2023-03-29 20:09:25 +02:00
d9ad104440 Create chat-13B.bat (#592) Thérence 2023-03-29 19:21:09 +02:00
b467702b87 readme : fix typos Georgi Gerganov 2023-03-29 19:38:31 +03:00
516d88e75c readme : add GPT4All instructions (close #588) Georgi Gerganov 2023-03-29 19:37:20 +03:00
53635c081c py : add GPT4All conversion script Georgi Gerganov 2023-03-29 19:29:26 +03:00
41318d708e llama : use the same threshold for OpenBLAS and ggml thread limiting (#577) Maël Kerbiriou 2023-03-29 18:10:07 +02:00
a6956b25a1 add example of re-act pattern (#583) Tobias Lütke 2023-03-29 17:10:24 +02:00
83df5639eb Fix GCC warning about binary literal (#595) anzz1 2023-03-29 16:20:07 +03:00
a5c42c4b13 Fix typo in llama.h (#593) anzz1 2023-03-29 16:19:29 +03:00
5a5f8b1501 Enable Fused-Multiply-Add (FMA) and F16C/CVT16 vector extensions on MSVC (#375) anzz1 2023-03-28 22:44:29 +03:00
f1217055ea CI: fix subdirectory path globbing (#546) anzz1 2023-03-28 22:43:25 +03:00
7f4c5c6651 llama : fix linkage with mingw (#551) anzz1 2023-03-28 21:23:09 +03:00
2a98bc18ea ggml : add AVX2 implementation of quantize_row_q4_1 (#515) slaren 2023-03-28 20:06:03 +02:00
d0aaff571c py : add temporary script to convert old ggml files to newer version (#539) thement 2023-03-28 19:55:42 +02:00
d0330fd783 py : add capability to convert from ggml back to torch or hf format for further consumption/training/finetuning (#403) Tai Duc Nguyen 2023-03-28 13:51:29 -04:00
99c5b27654 ggml : refactor quantized processing functions (#509) Stephan Walter 2023-03-28 17:13:01 +00:00
692ce3164e py : removed unused model variable and verified that the code functions correctly with vocab_only setting. Also confirmed that the code works as expected after running with reduced memory usage due to deletion of no-longer-needed variable. (#547) DooWoong Lee (David) 2023-03-29 02:02:34 +09:00
96f9c0506f ci : make ctest verbose, hopefully we see what is wrong with the sanitizer Georgi Gerganov 2023-03-28 20:01:09 +03:00
d502bc7c9d tests : free llama context at the end of the test Georgi Gerganov 2023-03-28 19:51:55 +03:00
436e561931 all : be more strict about converting float to double (#458) Stephan Walter 2023-03-28 16:48:20 +00:00
20e1e84884 deploy : add a Package.swift for SwiftPM support (#393) Jed Fox 2023-03-28 11:39:01 -05:00
c1f885067c ggml : introduce structs for the q4 data blocks (#356) Stephan Walter 2023-03-28 15:56:03 +00:00
e0670260fb gitignore : add "embedding" Georgi Gerganov 2023-03-28 18:34:35 +03:00
28ba975aea Check the existence of f16_model_path_base in quantize.py (#574) dotpy314 2023-03-28 23:06:28 +08:00
a6bdc47cba Fix usage of F16C intrinsics in AVX code (#563) slaren 2023-03-28 16:26:55 +02:00
7b8dbcb78b main.cpp fixes, refactoring (#571) anzz1 2023-03-28 17:09:55 +03:00
4b8efff0e3 Add embedding example to Makefile (#540) RJ Adriaansen 2023-03-28 08:11:09 +02:00
7e5395575a Fix missing ggml link in cmake for examples/* on w64-mingw32 (#542) Marco Matthies 2023-03-27 06:55:26 +02:00
34c1072e49 ci: add debug build to sanitizer build matrix (#527) Erik Scholz 2023-03-26 17:48:40 +02:00
939ad2d3a5 Fix undefined variables in debug build, remove unused variables (#531) Stephan Walter 2023-03-26 15:34:02 +00:00
8c2ec5e21d Add support for linux/arm64 platform during Docker Builds (#514) Juan Calderon-Perez 2023-03-26 10:48:42 -04:00
b391579db9 Update README and comments for standalone perplexity tool (#525) Stephan Walter 2023-03-26 13:14:01 +00:00
7a87d31f4f [main] fix infinite generation (-n == -1) (#523) anzz1 2023-03-26 16:06:10 +03:00
348d6926ee Add logo to README.md Georgi Gerganov 2023-03-26 10:20:49 +03:00
33e35b8fe8 Exit from interactive mode if input stream is bad (#491) Harald Fernengel 2023-03-26 07:25:46 +02:00
19726169b3 CI: Run other sanitizer builds even if one fails (#511) anzz1 2023-03-26 00:13:28 +02:00
f732695cd5 Clarify console output in convert-pth-to-ggml.py (#512) jp-x-g 2023-03-25 14:53:55 -07:00
2f7bf7dd7c CMake / CI additions (#497) anzz1 2023-03-25 23:38:11 +02:00
34ab526843 (Windows) Set console to UTF-8 on init (#420) anzz1 2023-03-25 22:29:22 +02:00
c2b25b6912 Fix colors enabling on WIN32 Georgi Gerganov 2023-03-25 21:53:39 +02:00
79b2b266db If n_predict == -1, generate forever Georgi Gerganov 2023-03-25 21:51:41 +02:00
e2d490dafd Infinite generation via context swapping (#71) Georgi Gerganov 2023-03-25 21:36:22 +02:00