-
018f2279f5
cmake : link threads publicly to ggml (#1042)
源文雨
2023-04-22 02:27:06 +08:00
-
9411288271
main : evaluate tokens in batches after swapping context (#1014)
Alex Klinkhamer
2023-04-21 11:18:09 -07:00
-
8687c1f258
llama : remember and restore kv cache data pointers (#1104)
xaedes
2023-04-21 17:25:21 +02:00
-
1bfc153e2f
ggml : a faster version for Q4_1 x Q8_0 dot products (#1083)
Kawrakow
2023-04-21 17:18:26 +02:00
-
3d59769c3b
Show perplexity ETA in hours and minutes (#1096)
slaren
2023-04-21 14:57:57 +02:00
-
d40fded93e
llama : fix comment for "output.weight" tensor
Georgi Gerganov
2023-04-21 10:23:36 +03:00
-
2510c1831f
Add ggml-model-*.bin checksums for 7B, 13B, 30B, 65B (#1088)
Stephan Walter
2023-04-20 21:56:44 +00:00
-
12b5900dbc
ggml : sync ggml (add GPT-NeoX RoPE implementation)
Georgi Gerganov
2023-04-20 23:32:59 +03:00
-
9ff334f3c9
ggml : fix bug in ggml_compute_forward_dup_f32()
Georgi Gerganov
2023-04-20 21:58:05 +03:00
-
2005469ea1
Add Q4_3 support to cuBLAS (#1086)
slaren
2023-04-20 20:49:53 +02:00
-
8a1756abdf
ggml : do not break cuBLAS build (Q4_3 is not yet implemented)
Georgi Gerganov
2023-04-20 21:43:50 +03:00
-
66aab46079
ggml : fix Q4_3 quantization
Georgi Gerganov
2023-04-20 20:44:05 +03:00
-
38de86a711
llama : multi-threaded quantization (#1075)
Kawrakow
2023-04-20 19:42:27 +02:00
-
e0305ead3a
ggml : add Q4_3 quantization (#1082)
Georgi Gerganov
2023-04-20 20:35:53 +03:00
-
6a9661ea5a
ci : remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the CI (#1074)
Ivan Komarov
2023-04-20 17:15:18 +02:00
-
5addcb120c
fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' (#1080)
源文雨
2023-04-20 21:28:43 +08:00
-
c8c2c52482
AVX2 optimization for vec_dot_q4_2_q8_0 (#1068)
Stephan Walter
2023-04-20 06:45:41 +00:00
-
02d6988121
Improve cuBLAS performance by dequantizing on the GPU (#1065)
slaren
2023-04-20 03:14:14 +02:00
-
834695fe3a
Minor: Readme fixed grammar, spelling, and misc updates (#1071)
CRD716
2023-04-19 14:52:14 -05:00
-
f7d05095b4
Q4_2 quantization with rmse-optimized scale and quants (#1062)
Kawrakow
2023-04-19 20:20:14 +02:00
-
884e7d7a2b
ggml : use 8-bit precision for Q4_1 intermediate results (#1047)
Georgi Gerganov
2023-04-19 20:10:08 +03:00
-
7cd5c4a3e9
readme : add warning about Q4_2 and Q4_3
Georgi Gerganov
2023-04-19 19:07:54 +03:00
-
f3d4edf504
ggml : Q4 cleanup - remove 4-bit dot product code (#1061)
Stephan Walter
2023-04-19 16:06:37 +00:00
-
8944a13296
Add NVIDIA cuBLAS support (#1044)
slaren
2023-04-19 11:22:45 +02:00
-
6667401238
Multi-threaded ggml_cpy (#1035)
slaren
2023-04-19 00:53:24 +02:00
-
77a73403ca
ggml : add new Q4_2 quantization (ARM only) (#1046)
Georgi Gerganov
2023-04-18 23:54:57 +03:00
-
50a8a2af97
ggml : scratch that - vmlaq_n_f32 is always better
Georgi Gerganov
2023-04-18 23:11:23 +03:00
-
4caebf6d40
gitignore : vdot
Georgi Gerganov
2023-04-18 23:00:08 +03:00
-
dcdd65e296
ggml : optimize ggml_vec_dot_q4_0_q8_0() using vectorized accumulators
Georgi Gerganov
2023-04-18 22:59:17 +03:00
-
5ecff35151
Adding a simple program to measure speed of dot products (#1041)
Kawrakow
2023-04-18 21:00:14 +02:00
-
7faa7460f0
readme : update hot topics about new LoRA functionality
Georgi Gerganov
2023-04-18 20:10:26 +03:00
-
5af8e32238
ci : do not run on drafts
Georgi Gerganov
2023-04-17 18:00:10 +03:00
-
42747220b4
Do not close file after mmap (Windows version) (#1034)
Ivan Komarov
2023-04-18 03:15:50 +02:00
-
e9298af389
readme : add Ruby bindings (#1029)
Atsushi Tatsuma
2023-04-18 04:34:35 +09:00
-
4ad73137a1
add 4_0 to default outfile namestr dict (#1031)
Cameron
2023-04-17 11:26:23 -07:00
-
315a95a4d3
Add LoRA support (#820)
slaren
2023-04-17 17:28:55 +02:00
-
efd05648c8
llama : well-defined static initialization of complex objects (#927)
Arik Poznanski
2023-04-17 17:41:53 +03:00
-
eb17a026fd
quantize-stats : fix bug in --type argument
Georgi Gerganov
2023-04-17 17:31:06 +03:00
-
69b740289f
ggml : avoid using ggml_fp16_to_fp32() and ggml_fp32_to_fp16() in ggml.c
Georgi Gerganov
2023-04-17 16:16:23 +03:00
-
f266259ad9
Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933)
Ivan Komarov
2023-04-17 15:10:57 +02:00
-
47f61aaa5f
Fix: do not close file on mmap (#1017)
slaren
2023-04-16 21:27:38 +02:00
-
3173a62eb9
stdout : vertical align outputs for better readibility
Georgi Gerganov
2023-04-16 13:58:48 +03:00
-
489537e6cf
examples: add missing <ctime> include for time() (#1011)
Pavol Rusnak
2023-04-16 12:13:00 +02:00
-
2d3481c721
Fix msys2 build error and warnings (#1009)
nanahi
2023-04-16 17:13:42 +08:00
-
74f5899df4
convert.py: Fix loading safetensors and ggml format on Windows (#991)
comex
2023-04-15 14:53:21 -07:00
-
2f7c8e014e
Fix potential int8 overflow in non-SIMD vec_dot (#986)
Stephan Walter
2023-04-15 18:28:56 +00:00
-
0ad964631f
Refactor ggml.c for future tensor types (#1001)
Stephan Walter
2023-04-15 16:25:38 +00:00
-
e95b6554b4
ggml : add Q8_0 quantization for intermediate results (#951)
Georgi Gerganov
2023-04-15 17:53:22 +03:00
-
aa485cee33
ggml : use posix_memalign on non-Windows env
Georgi Gerganov
2023-04-15 14:25:45 +03:00
-
c12b14b77f
benchmark : fix result validation in benchmark-q4_0-matmult (#987)
Ivan Komarov
2023-04-15 07:51:54 +02:00
-
106faaf297
cmake : add finding the OpenBLAS header file (#992)
katsu560
2023-04-15 14:51:11 +09:00
-
c85e03d12e
Revert "main : alternative instruct mode (Vicuna support, etc.) (#863)" (#982)
Pavol Rusnak
2023-04-14 21:58:43 +02:00
-
489093548c
py : bump sentencepiece to 0.1.98 to support Python 3.11 (#976)
Pavol Rusnak
2023-04-14 21:46:49 +02:00
-
93265e988a
make : fix dependencies, use auto variables (#983)
Stephan Walter
2023-04-14 19:39:48 +00:00
-
c56b715269
Expose type name from ggml (#970)
Pavol Rusnak
2023-04-14 20:05:37 +02:00
-
f4d277ae17
main : alternative instruct mode (Vicuna support, etc.) (#863)
Tomáš Pazdiora
2023-04-14 17:19:17 +02:00
-
c9a59b70a5
ggml : add unary and binary map operations (#874)
Kerfuffle
2023-04-14 08:43:55 -06:00
-
a32f7acc9f
py : cleanup dependencies (#962)
Pavol Rusnak
2023-04-14 15:37:11 +02:00
-
43ffdefb74
py : fix flake8 and isort nitpicks (#960)
Pavol Rusnak
2023-04-14 14:23:21 +02:00
-
1623a6e9b4
ggml : minor
Georgi Gerganov
2023-04-14 13:31:29 +03:00
-
c14e0d2f23
ggml : always allocate buffers with size multiple of GGML_MEM_ALIGN
Georgi Gerganov
2023-04-14 13:31:15 +03:00
-
723dac55fa
py : new conversion script (#545)
comex
2023-04-14 00:03:03 -07:00
-
0f07cacb05
ggml : fix q4_1 dot product types
Georgi Gerganov
2023-04-14 09:45:42 +03:00
-
c5d70f5c9e
ggml : optimize rope function to avoid call powf in the tight loop (#807)
Howard Su
2023-04-14 14:24:52 +08:00
-
be87b6ed20
perplexity : add support for batch size to --perplexity (#407)
Gary Linscott
2023-04-13 14:50:42 -07:00
-
0e07e6a839
common : remove unnecessary includes (#947)
CRD716
2023-04-13 10:39:25 -05:00
-
a3a2a0eda8
ggml : add GGML_DEFAULT_N_THREADS
Georgi Gerganov
2023-04-13 18:36:40 +03:00
-
d990e3fffc
ggml : speed-up ggml_vec_dot_q4_1() ARM_NEON + 32-bit ARM support (#900)
Georgi Gerganov
2023-04-13 18:32:36 +03:00
-
9190e8eac8
llama : merge llama_internal.h into llama.h
Georgi Gerganov
2023-04-13 18:04:45 +03:00
-
c85980acd0
gitignore : benchmark
Georgi Gerganov
2023-04-13 18:01:22 +03:00
-
6232f2d7fd
ggml : optimize non-SIMD Q4_0 vector dot product (#703)
Stephan Walter
2023-04-13 14:59:50 +00:00
-
6c248707f5
ggml : introduce GGML_ALIGNED_MALLOC/GGML_ALIGNED_FREE macros (#884)
Pavol Rusnak
2023-04-13 16:08:32 +02:00
-
8cda5c981d
fix whitespace (#944)
CRD716
2023-04-13 09:03:57 -05:00
-
ec29272175
readme : remove python 3.10 warning (#929)
CRD716
2023-04-13 08:59:53 -05:00
-
7e941b95eb
readme : llama node binding (#911)
Genkagaku.GPT
2023-04-13 21:54:27 +08:00
-
c729ff730a
flake.nix: add all binaries from bin (#848)
Pavol Rusnak
2023-04-13 15:49:05 +02:00
-
4579af95e8
zig : update build.zig (#872)
Judd
2023-04-13 21:43:22 +08:00
-
8c3ffc2f04
ggml : update cblas_sgemm columns var to be more reasonable (#838)
Vladimir
2023-04-13 15:24:30 +02:00
-
107980d970
examples : add -n to alpaca and gpt4all scripts (#706)
niansa/tuxifan
2023-04-13 15:03:39 +02:00
-
585d91a156
cmake : add explicit F16C option (x86) (#576)
anzz1
2023-04-13 15:48:21 +03:00
-
95ea26f6e9
benchmark : add tool for timing q4_0 matrix multiplication (#653)
SebastianApel
2023-04-13 14:46:23 +02:00
-
82d146df9b
do not force the prompt file to end with a new line (#908)
Pavol Rusnak
2023-04-13 11:33:16 +02:00
-
e7f6997f89
Don't crash on ftype (formerly f16) == 4 (#917)
Stephan Walter
2023-04-12 15:06:16 +00:00
-
f76cb3a34d
readme : change "GPU support" link to discussion
Georgi Gerganov
2023-04-12 14:48:57 +03:00
-
782438070f
readme : update hot topics with link to "GPU support" issue
Georgi Gerganov
2023-04-12 14:31:12 +03:00
-
4dbbd40750
readme: link to sha256sums file (#902)
Nicolai Weitkemper
2023-04-12 08:46:20 +02:00
-
8b679987cd
Fix whitespace, add .editorconfig, add GitHub workflow (#883)
Pavol Rusnak
2023-04-11 21:45:44 +02:00
-
3e6e70d8e8
Add enum llama_ftype, sync ggml_type to model files (#709)
Stephan Walter
2023-04-11 15:03:51 +00:00
-
2663d2c678
Windows fixes (#890)
comex
2023-04-11 06:19:54 -07:00
-
a0caa34b16
Add BAIR's Koala to supported models (#877)
qouoq
2023-04-11 04:41:53 +08:00
-
461ba9e66e
ggml : fix WASM build
Georgi Gerganov
2023-04-10 23:20:01 +03:00
-
c3ac702e5e
ggml : add ggml_cont() + optimize ggml_cpy() for contiguous dst
Georgi Gerganov
2023-04-10 22:40:28 +03:00
-
9d634ef452
ggml : remove trailing whitespaces
Georgi Gerganov
2023-04-10 19:32:45 +03:00
-
d9a239c410
Simplify to include lower-case windows.h always, fix compile on mingw32 (#747)
Marco Matthies
2023-04-10 19:57:59 +02:00
-
684da25926
ggml : fix quantize_row_q4_1() ARM_NEON (close #876)
Georgi Gerganov
2023-04-10 19:29:48 +03:00
-
180b693a47
Print model version.
comex
2023-04-08 13:08:21 -07:00
-
f963b63afa
Rewrite loading code to try to satisfy everyone:
comex
2023-04-08 12:24:37 -07:00
-
aaf3b23deb
fix for windows utf-8 input (#840)
Tomáš Pazdiora
2023-04-08 17:49:39 +02:00
-
f2d1c47294
cmake should link openblas properly with -lopenblas like how it's done in the makefile (#839)
eiery
2023-04-08 07:15:17 -04:00
-
317fb12fbd
Add new binaries to flake.nix (#847)
lon
2023-04-08 07:04:23 -03:00