-
20fbf2a2a0
ggml : change immintrin.h to intrin.h for compatibility (#1307)
Ron Jailall
2023-05-04 11:05:59 -04:00
-
db1080876a
Only escape prompts when used with -e (#1311)
DannyDaemonic
2023-05-04 05:08:25 -07:00
-
c65a7fbfa9
Update main's README.md with new features (#1296)
DannyDaemonic
2023-05-04 03:02:59 -07:00
-
f647ce040f
fix #1224 reverse prompt and multi line (#1297)
Tomas
2023-05-04 17:02:30 +07:00
-
799fdc1b5d
ggml : vectorize Q8_0 quantization
Georgi Gerganov
2023-05-03 23:24:20 +03:00
-
6daa09d879
examples : read chat prompts from a template file (#1196)
khimaros
2023-05-03 10:58:11 -07:00
-
bca9ad938a
minor : fix whitespaces (#1302)
Georgi Gerganov
2023-05-03 20:09:42 +03:00
-
e2a937ca6a
minor : fix trailing whitespaces
Georgi Gerganov
2023-05-03 18:43:23 +03:00
-
b0c71c7b6d
scripts : platform independent script to verify sha256 checksums (#1203)
KASR
2023-05-03 17:31:28 +02:00
-
a8a2efdc81
examples : various prompt and example fixes (#1298)
CRD716
2023-05-03 10:26:47 -05:00
-
e216aa0463
llama : only copy used KV cache in get / set state (#1272)
Evan Jones
2023-05-02 22:26:13 -04:00
-
2485d7a4d3
Process escape sequences given in prompts (#1173)
DannyDaemonic
2023-05-02 18:46:20 -07:00
-
13b0c68ed7
Handle signals properly on Windows (#1123)
DannyDaemonic
2023-05-02 18:01:57 -07:00
-
55bc5f0900
Call sh on build-info.sh (#1294)
DannyDaemonic
2023-05-02 17:52:35 -07:00
-
9daff419f6
fix build-info.h for git submodules (#1289)
kuvaus
2023-05-03 03:43:43 +03:00
-
bf4b22ffe4
fix missing parameters in llama_init_from_gpt_params (#1293)
slaren
2023-05-03 01:36:45 +02:00
-
67c77799e0
examples : add llama_init_from_gpt_params() common function (#1290)
Ron Evans
2023-05-02 22:39:51 +02:00
-
0e6cbff1b7
llama : fix compile warnings
Georgi Gerganov
2023-05-02 23:09:08 +03:00
-
5d5817ca60
ggml : fix 32-bit ARM
Georgi Gerganov
2023-05-02 22:14:50 +03:00
-
8c9be35ff9
examples : improve vertical alignment of a few variables (#1286)
Ron Evans
2023-05-02 19:53:52 +02:00
-
cc0bb7235c
ggml : fix ppc64le build error and make cmake detect Power processors (#1284)
Marvin Gießing
2023-05-02 18:42:16 +02:00
-
2bb992f034
llama : allow 0 as a seed number. (#1275)
Robert Brisita
2023-05-02 12:23:44 -04:00
-
e2cd506999
main : switch input_noecho to input_echo to remove negation (#979)
Ron Evans
2023-05-02 18:13:26 +02:00
-
2d099e5193
ggml: add names to tensors (#1268)
slaren
2023-05-02 16:03:00 +02:00
-
f4cef87edf
Add git-based build information for better issue tracking (#1232)
DannyDaemonic
2023-05-01 09:23:47 -07:00
-
58b367c2d7
cuBLAS: refactor and optimize f16 mat mul performance (#1259)
slaren
2023-05-01 18:11:07 +02:00
-
ea3a0ad6b6
llama : update stubs for systems without mmap and mlock (#1266)
xloem
2023-05-01 08:58:51 -04:00
-
2bdc09646d
ggml : fix ggml_used_mem() (#1264)
Kerfuffle
2023-05-01 05:56:07 -06:00
-
70269cae37
llama : fix session load / save (#1263)
Georgi Gerganov
2023-05-01 14:54:59 +03:00
-
b925f1f1b0
cuBLAS: fall back to pageable memory if pinned alloc fails (#1233)
slaren
2023-05-01 13:32:22 +02:00
-
90b19bd6ee
llama : let context be const when accessing const data (#1261)
Alex Klinkhamer
2023-05-01 00:24:20 -07:00
-
7ff0dcd320
ggml : fix UB (int << 31)
Georgi Gerganov
2023-04-30 22:28:51 +03:00
-
6f79699286
build: add armv{6,7,8} support to cmake (#1251)
Pavol Rusnak
2023-04-30 20:48:38 +02:00
-
a5d30b1f53
common : better default number of threads (#934)
jon-chuang
2023-04-30 14:41:35 -04:00
-
76a884920a
ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225)
0cc4m
2023-04-30 20:34:52 +02:00
-
6bc4400e67
ggml : add Q5 WASM SIMD + GGML_FTYPE
Georgi Gerganov
2023-04-30 19:07:00 +03:00
-
f0d70f147d
Various fixes to mat_mul benchmark (#1253)
Stephan Walter
2023-04-30 12:32:37 +00:00
-
3e5aa8a1c4
ggml : fix labels for GGML_OP_ALIBI
Georgi Gerganov
2023-04-30 10:25:46 +03:00
-
c3ca7a5f05
ggml : fix 32-bit ARM NEON
Georgi Gerganov
2023-04-29 21:34:23 +03:00
-
e8c051611a
ggml : use vzip instead of vuzp for consistency
Georgi Gerganov
2023-04-29 21:12:56 +03:00
-
0b5a935099
ggml : fix visibility and unused warnings
Georgi Gerganov
2023-04-29 19:28:36 +03:00
-
ec728e44d7
ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229)
Georgi Gerganov
2023-04-29 18:43:42 +03:00
-
214b6a3570
ggml : adjust mul_mat_f16 work memory (#1226)
Georgi Gerganov
2023-04-29 18:43:28 +03:00
-
305eb5afd5
build : fix reference to old llama_util.h
Georgi Gerganov
2023-04-29 13:53:12 +03:00
-
84ca9c2ecf
examples : fix save-load-state + rename llama-util.h
Georgi Gerganov
2023-04-29 13:48:11 +03:00
-
334637e43e
common : change default parameters to pre-#1126 (#1223)
Georgi Gerganov
2023-04-29 09:51:06 +03:00
-
dd7eff57d8
llama : new sampling algorithms (#1126)
Ivan Stepanov
2023-04-29 08:34:41 +03:00
-
7fc50c051a
cuBLAS: use host pinned memory and dequantize while copying (#1207)
slaren
2023-04-29 02:04:18 +02:00
-
b1ee8f59b4
cuBLAS: non-contiguous tensor support (#1215)
Henri Vasserman
2023-04-29 02:31:56 +03:00
-
36d19a603b
Remove Q4_3 which is no better than Q5 (#1218)
Stephan Walter
2023-04-28 23:10:43 +00:00
-
7f15c5c477
readme : update hot topics
Georgi Gerganov
2023-04-28 21:32:52 +03:00
-
55390bcaf2
ggml : sync ggml (ggml_alibi)
Georgi Gerganov
2023-04-28 20:37:43 +03:00
-
5fba3c016b
examples : add Jeopardy example (#1168)
CRD716
2023-04-28 11:13:33 -05:00
-
1481a9cf25
llama : add session file format and saved sessions in main (#1169)
Evan Jones
2023-04-28 11:59:37 -04:00
-
11d902364b
ggml : add helper debug printf in soft_max
Georgi Gerganov
2023-04-28 17:58:44 +03:00
-
7296c961d9
ggml : add CLBlast support (#1164)
0cc4m
2023-04-28 16:57:16 +02:00
-
78ec543733
Correcting link to w64devkit (#1214)
Folko-Ven
2023-04-28 19:22:48 +05:00
-
92a6e13a31
Add Manjaro CUDA include and lib dirs to Makefile (#1212)
Johannes Gäßler
2023-04-28 15:40:32 +02:00
-
04aaae1d79
add avx2 for dot_q8_0_q8_0, 2x faster than scalar (#1211)
Yann Follet
2023-04-28 19:59:48 +08:00
-
0b2da20538
ggml : slightly faster AVX2 implementation for Q5 (#1197)
Stephan Walter
2023-04-26 20:26:42 +00:00
-
f9be42add0
readme : add quantization info
Georgi Gerganov
2023-04-26 23:24:42 +03:00
-
574406dc7e
ggml : add Q5_0 and Q5_1 quantization (#1187)
Georgi Gerganov
2023-04-26 23:14:13 +03:00
-
87a6f846d3
Allow setting the rng seed after initialization. (#1184)
Ásgeir Bjarni Ingvarsson
2023-04-26 20:08:43 +00:00
-
ea3ad7eb60
Updating build instructions to include BLAS support (#1183)
DaniAndTheWeb
2023-04-26 22:03:03 +02:00
-
859fee6dfb
quantize : use map to assign quantization type from string (#1191)
Pavol Rusnak
2023-04-26 18:43:27 +02:00
-
4afcc37869
Update SHA256SUMS after quantization change (#1181)
Stephan Walter
2023-04-25 21:41:56 +00:00
-
667c501334
py : cast lora_alpha to int in convert-lora-to-ggml (#1170)
ostix360
2023-04-25 23:33:08 +02:00
-
bb98e77be7
nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py (#981)
Pavol Rusnak
2023-04-25 23:19:57 +02:00
-
7a32fcb3b2
ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179)
Georgi Gerganov
2023-04-25 23:40:51 +03:00
-
dd0eabc049
ggml : use full range for Q4_0 and Q4_2 quantization (#729)
unbounded
2023-04-25 19:20:46 +02:00
-
54bb60e268
ggml : fix bug in ggml_compute_forward_sum_f32 (#1162)
xaedes
2023-04-24 23:02:02 +02:00
-
8a0f8673ba
ggml : export symbols (#1155)
Georgi Gerganov
2023-04-24 22:18:25 +03:00
-
0c5692345d
examples : add save_load_state example (#1150)
xaedes
2023-04-24 18:23:31 +02:00
-
957c8ae21d
llama : increase scratch buffer size for 65B (ref #1152)
Georgi Gerganov
2023-04-24 18:47:03 +03:00
-
9b0a4d4214
examples/main README improvements and some light refactoring (#1131)
mgroeber9110
2023-04-24 17:45:32 +02:00
-
2ec83428de
Fix build for gcc 8 and test in CI (#1154)
Stephan Walter
2023-04-24 15:38:26 +00:00
-
e4cf982e0d
Fix cuda compilation (#1128)
slaren
2023-04-24 17:29:58 +02:00
-
c4fe84fb0d
llama : refactor get / set state + remove redundant kv cache API (#1143)
Georgi Gerganov
2023-04-24 07:40:02 +03:00
-
1d78fecdab
Fix LoRA acronym (#1145)
slaren
2023-04-23 23:03:44 +02:00
-
284685f169
scripts : add helper scripts to synch ggml repo
Georgi Gerganov
2023-04-23 19:57:09 +03:00
-
edce63baa9
Added README.md for main with examples and explanations (#1139)
DannyDaemonic
2023-04-23 08:37:02 -07:00
-
ec9cdb6752
ggml : do not print perf ops that have not been used at all
Georgi Gerganov
2023-04-23 18:32:52 +03:00
-
e4422e299c
ggml : better PERF prints + support "LLAMA_PERF=1 make"
Georgi Gerganov
2023-04-23 18:15:39 +03:00
-
53c8434398
Improve AVX2 for vec_dot_q4_3_q8_0 (#1138)
Stephan Walter
2023-04-23 11:01:03 +00:00
-
c6524f46eb
readme : update gpt4all instructions (#980)
Pavol Rusnak
2023-04-23 10:21:26 +02:00
-
c9e2c26f41
A better packNibbles and mul_sum_i8_pairs_float implementation using AVX512 (#1119)
Yishuo Wang
2023-04-23 15:57:05 +08:00
-
0e018fe008
ggml : fix Q4_3 cuBLAS
Georgi Gerganov
2023-04-22 16:31:56 +03:00
-
857308d1e8
ci : trigger CI for drafts, but not most PR actions (#1125)
Stephan Walter
2023-04-22 13:12:29 +00:00
-
c50b628810
Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122)
Stephan Walter
2023-04-22 10:54:13 +00:00
-
5f939498d5
ggml : unit test for quantization functions (#953)
unbounded
2023-04-22 11:10:39 +02:00
-
36b4f7e064
llama : print timings on ctrl+c exit (#1021)
wbpxre150
2023-04-22 16:56:35 +08:00
-
10f19c1121
llama : have n_batch default to 512 (#1091)
eiery
2023-04-22 04:27:05 -04:00
-
7e312f165c
cmake : fix build under Windows when enable BUILD_SHARED_LIBS (#1100)
Howard Su
2023-04-22 16:18:20 +08:00
-
872c365a91
ggml : fix AVX build + update to new Q8_0 format
Georgi Gerganov
2023-04-22 11:08:12 +03:00
-
955ef9a5d5
ggml : alternative Q4_3 implementation using modified Q8_0 (#1109)
Georgi Gerganov
2023-04-22 10:55:35 +03:00
-
c5aa5e5777
ggml : AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring (#1099)
Stephan Walter
2023-04-22 07:37:05 +00:00
-
e9a9cb0c54
examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Experience (#1107)
Clint Herron
2023-04-22 02:54:33 -04:00
-
b6e7f9b09e
llama : add api for getting/setting the complete state: rng, logits, embedding and kv_cache (#1105)
xaedes
2023-04-22 08:21:32 +02:00
-
50cb666b8a
Improve cuBLAS performance by using a memory pool (#1094)
slaren
2023-04-21 21:59:17 +02:00
-
25d7abbd1f
llama : fixed rlimit error message (#888)
apaz
2023-04-21 13:48:06 -05:00