enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

018f2279f5 cmake : link threads publicly to ggml (#1042) 源文雨 2023-04-22 02:27:06 +08:00
9411288271 main : evaluate tokens in batches after swapping context (#1014) Alex Klinkhamer 2023-04-21 11:18:09 -07:00
8687c1f258 llama : remember and restore kv cache data pointers (#1104) xaedes 2023-04-21 17:25:21 +02:00
1bfc153e2f ggml : a faster version for Q4_1 x Q8_0 dot products (#1083) Kawrakow 2023-04-21 17:18:26 +02:00
3d59769c3b Show perplexity ETA in hours and minutes (#1096) slaren 2023-04-21 14:57:57 +02:00
d40fded93e llama : fix comment for "output.weight" tensor Georgi Gerganov 2023-04-21 10:23:36 +03:00
2510c1831f Add ggml-model-*.bin checksums for 7B, 13B, 30B, 65B (#1088) Stephan Walter 2023-04-20 21:56:44 +00:00
12b5900dbc ggml : sync ggml (add GPT-NeoX RoPE implementation) Georgi Gerganov 2023-04-20 23:32:59 +03:00
9ff334f3c9 ggml : fix bug in ggml_compute_forward_dup_f32() Georgi Gerganov 2023-04-20 21:58:05 +03:00
2005469ea1 Add Q4_3 support to cuBLAS (#1086) slaren 2023-04-20 20:49:53 +02:00
8a1756abdf ggml : do not break cuBLAS build (Q4_3 is not yet implemented) Georgi Gerganov 2023-04-20 21:43:50 +03:00
66aab46079 ggml : fix Q4_3 quantization Georgi Gerganov 2023-04-20 20:44:05 +03:00
38de86a711 llama : multi-threaded quantization (#1075) Kawrakow 2023-04-20 19:42:27 +02:00
e0305ead3a ggml : add Q4_3 quantization (#1082) Georgi Gerganov 2023-04-20 20:35:53 +03:00
6a9661ea5a ci : remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the CI (#1074) Ivan Komarov 2023-04-20 17:15:18 +02:00
5addcb120c fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' (#1080) 源文雨 2023-04-20 21:28:43 +08:00
c8c2c52482 AVX2 optimization for vec_dot_q4_2_q8_0 (#1068) Stephan Walter 2023-04-20 06:45:41 +00:00
02d6988121 Improve cuBLAS performance by dequantizing on the GPU (#1065) slaren 2023-04-20 03:14:14 +02:00
834695fe3a Minor: Readme fixed grammar, spelling, and misc updates (#1071) CRD716 2023-04-19 14:52:14 -05:00
f7d05095b4 Q4_2 quantization with rmse-optimized scale and quants (#1062) Kawrakow 2023-04-19 20:20:14 +02:00
884e7d7a2b ggml : use 8-bit precision for Q4_1 intermediate results (#1047) Georgi Gerganov 2023-04-19 20:10:08 +03:00
7cd5c4a3e9 readme : add warning about Q4_2 and Q4_3 Georgi Gerganov 2023-04-19 19:07:54 +03:00
f3d4edf504 ggml : Q4 cleanup - remove 4-bit dot product code (#1061) Stephan Walter 2023-04-19 16:06:37 +00:00
8944a13296 Add NVIDIA cuBLAS support (#1044) slaren 2023-04-19 11:22:45 +02:00
6667401238 Multi-threaded ggml_cpy (#1035) slaren 2023-04-19 00:53:24 +02:00
77a73403ca ggml : add new Q4_2 quantization (ARM only) (#1046) Georgi Gerganov 2023-04-18 23:54:57 +03:00
50a8a2af97 ggml : scratch that - vmlaq_n_f32 is always better Georgi Gerganov 2023-04-18 23:11:23 +03:00
4caebf6d40 gitignore : vdot Georgi Gerganov 2023-04-18 23:00:08 +03:00
dcdd65e296 ggml : optimize ggml_vec_dot_q4_0_q8_0() using vectorized accumulators Georgi Gerganov 2023-04-18 22:59:17 +03:00
5ecff35151 Adding a simple program to measure speed of dot products (#1041) Kawrakow 2023-04-18 21:00:14 +02:00
7faa7460f0 readme : update hot topics about new LoRA functionality Georgi Gerganov 2023-04-18 20:10:26 +03:00
5af8e32238 ci : do not run on drafts Georgi Gerganov 2023-04-17 18:00:10 +03:00
42747220b4 Do not close file after mmap (Windows version) (#1034) Ivan Komarov 2023-04-18 03:15:50 +02:00
e9298af389 readme : add Ruby bindings (#1029) Atsushi Tatsuma 2023-04-18 04:34:35 +09:00
4ad73137a1 add 4_0 to default outfile namestr dict (#1031) Cameron 2023-04-17 11:26:23 -07:00
315a95a4d3 Add LoRA support (#820) slaren 2023-04-17 17:28:55 +02:00
efd05648c8 llama : well-defined static initialization of complex objects (#927) Arik Poznanski 2023-04-17 17:41:53 +03:00
eb17a026fd quantize-stats : fix bug in --type argument Georgi Gerganov 2023-04-17 17:31:06 +03:00
69b740289f ggml : avoid using ggml_fp16_to_fp32() and ggml_fp32_to_fp16() in ggml.c Georgi Gerganov 2023-04-17 16:16:23 +03:00
f266259ad9 Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933) Ivan Komarov 2023-04-17 15:10:57 +02:00
47f61aaa5f Fix: do not close file on mmap (#1017) slaren 2023-04-16 21:27:38 +02:00
3173a62eb9 stdout : vertical align outputs for better readibility Georgi Gerganov 2023-04-16 13:58:48 +03:00
489537e6cf examples: add missing <ctime> include for time() (#1011) Pavol Rusnak 2023-04-16 12:13:00 +02:00
2d3481c721 Fix msys2 build error and warnings (#1009) nanahi 2023-04-16 17:13:42 +08:00
74f5899df4 convert.py: Fix loading safetensors and ggml format on Windows (#991) comex 2023-04-15 14:53:21 -07:00
2f7c8e014e Fix potential int8 overflow in non-SIMD vec_dot (#986) Stephan Walter 2023-04-15 18:28:56 +00:00
0ad964631f Refactor ggml.c for future tensor types (#1001) Stephan Walter 2023-04-15 16:25:38 +00:00
e95b6554b4 ggml : add Q8_0 quantization for intermediate results (#951) Georgi Gerganov 2023-04-15 17:53:22 +03:00
aa485cee33 ggml : use posix_memalign on non-Windows env Georgi Gerganov 2023-04-15 14:25:45 +03:00
c12b14b77f benchmark : fix result validation in benchmark-q4_0-matmult (#987) Ivan Komarov 2023-04-15 07:51:54 +02:00
106faaf297 cmake : add finding the OpenBLAS header file (#992) katsu560 2023-04-15 14:51:11 +09:00
c85e03d12e Revert "main : alternative instruct mode (Vicuna support, etc.) (#863)" (#982) Pavol Rusnak 2023-04-14 21:58:43 +02:00
489093548c py : bump sentencepiece to 0.1.98 to support Python 3.11 (#976) Pavol Rusnak 2023-04-14 21:46:49 +02:00
93265e988a make : fix dependencies, use auto variables (#983) Stephan Walter 2023-04-14 19:39:48 +00:00
c56b715269 Expose type name from ggml (#970) Pavol Rusnak 2023-04-14 20:05:37 +02:00
f4d277ae17 main : alternative instruct mode (Vicuna support, etc.) (#863) Tomáš Pazdiora 2023-04-14 17:19:17 +02:00
c9a59b70a5 ggml : add unary and binary map operations (#874) Kerfuffle 2023-04-14 08:43:55 -06:00
a32f7acc9f py : cleanup dependencies (#962) Pavol Rusnak 2023-04-14 15:37:11 +02:00
43ffdefb74 py : fix flake8 and isort nitpicks (#960) Pavol Rusnak 2023-04-14 14:23:21 +02:00
1623a6e9b4 ggml : minor Georgi Gerganov 2023-04-14 13:31:29 +03:00
c14e0d2f23 ggml : always allocate buffers with size multiple of GGML_MEM_ALIGN Georgi Gerganov 2023-04-14 13:31:15 +03:00
723dac55fa py : new conversion script (#545) comex 2023-04-14 00:03:03 -07:00
0f07cacb05 ggml : fix q4_1 dot product types Georgi Gerganov 2023-04-14 09:45:42 +03:00
c5d70f5c9e ggml : optimize rope function to avoid call powf in the tight loop (#807) Howard Su 2023-04-14 14:24:52 +08:00
be87b6ed20 perplexity : add support for batch size to --perplexity (#407) Gary Linscott 2023-04-13 14:50:42 -07:00
0e07e6a839 common : remove unnecessary includes (#947) CRD716 2023-04-13 10:39:25 -05:00
a3a2a0eda8 ggml : add GGML_DEFAULT_N_THREADS Georgi Gerganov 2023-04-13 18:36:40 +03:00
d990e3fffc ggml : speed-up ggml_vec_dot_q4_1() ARM_NEON + 32-bit ARM support (#900) Georgi Gerganov 2023-04-13 18:32:36 +03:00
9190e8eac8 llama : merge llama_internal.h into llama.h Georgi Gerganov 2023-04-13 18:04:45 +03:00
c85980acd0 gitignore : benchmark Georgi Gerganov 2023-04-13 18:01:22 +03:00
6232f2d7fd ggml : optimize non-SIMD Q4_0 vector dot product (#703) Stephan Walter 2023-04-13 14:59:50 +00:00
6c248707f5 ggml : introduce GGML_ALIGNED_MALLOC/GGML_ALIGNED_FREE macros (#884) Pavol Rusnak 2023-04-13 16:08:32 +02:00
8cda5c981d fix whitespace (#944) CRD716 2023-04-13 09:03:57 -05:00
ec29272175 readme : remove python 3.10 warning (#929) CRD716 2023-04-13 08:59:53 -05:00
7e941b95eb readme : llama node binding (#911) Genkagaku.GPT 2023-04-13 21:54:27 +08:00
c729ff730a flake.nix: add all binaries from bin (#848) Pavol Rusnak 2023-04-13 15:49:05 +02:00
4579af95e8 zig : update build.zig (#872) Judd 2023-04-13 21:43:22 +08:00
8c3ffc2f04 ggml : update cblas_sgemm columns var to be more reasonable (#838) Vladimir 2023-04-13 15:24:30 +02:00
107980d970 examples : add -n to alpaca and gpt4all scripts (#706) niansa/tuxifan 2023-04-13 15:03:39 +02:00
585d91a156 cmake : add explicit F16C option (x86) (#576) anzz1 2023-04-13 15:48:21 +03:00
95ea26f6e9 benchmark : add tool for timing q4_0 matrix multiplication (#653) SebastianApel 2023-04-13 14:46:23 +02:00
82d146df9b do not force the prompt file to end with a new line (#908) Pavol Rusnak 2023-04-13 11:33:16 +02:00
e7f6997f89 Don't crash on ftype (formerly f16) == 4 (#917) Stephan Walter 2023-04-12 15:06:16 +00:00
f76cb3a34d readme : change "GPU support" link to discussion Georgi Gerganov 2023-04-12 14:48:57 +03:00
782438070f readme : update hot topics with link to "GPU support" issue Georgi Gerganov 2023-04-12 14:31:12 +03:00
4dbbd40750 readme: link to sha256sums file (#902) Nicolai Weitkemper 2023-04-12 08:46:20 +02:00
8b679987cd Fix whitespace, add .editorconfig, add GitHub workflow (#883) Pavol Rusnak 2023-04-11 21:45:44 +02:00
3e6e70d8e8 Add enum llama_ftype, sync ggml_type to model files (#709) Stephan Walter 2023-04-11 15:03:51 +00:00
2663d2c678 Windows fixes (#890) comex 2023-04-11 06:19:54 -07:00
a0caa34b16 Add BAIR's Koala to supported models (#877) qouoq 2023-04-11 04:41:53 +08:00
461ba9e66e ggml : fix WASM build Georgi Gerganov 2023-04-10 23:20:01 +03:00
c3ac702e5e ggml : add ggml_cont() + optimize ggml_cpy() for contiguous dst Georgi Gerganov 2023-04-10 22:40:28 +03:00
9d634ef452 ggml : remove trailing whitespaces Georgi Gerganov 2023-04-10 19:32:45 +03:00
d9a239c410 Simplify to include lower-case windows.h always, fix compile on mingw32 (#747) Marco Matthies 2023-04-10 19:57:59 +02:00
684da25926 ggml : fix quantize_row_q4_1() ARM_NEON (close #876) Georgi Gerganov 2023-04-10 19:29:48 +03:00
180b693a47 Print model version. comex 2023-04-08 13:08:21 -07:00
f963b63afa Rewrite loading code to try to satisfy everyone: comex 2023-04-08 12:24:37 -07:00
aaf3b23deb fix for windows utf-8 input (#840) Tomáš Pazdiora 2023-04-08 17:49:39 +02:00
f2d1c47294 cmake should link openblas properly with -lopenblas like how it's done in the makefile (#839) eiery 2023-04-08 07:15:17 -04:00
317fb12fbd Add new binaries to flake.nix (#847) lon 2023-04-08 07:04:23 -03:00
