-
ce2c7d72e2
metal : handle buffers larger than device's maxBufferLength (#1826)
Georgi Gerganov
2023-06-18 09:09:47 +03:00
-
57cd69460f
cmake : add CUDA_ARCHITECTURES to new target ggml_static (#1917)
Howard Su
2023-06-18 12:29:47 +08:00
-
b2416493ab
make : do not print help for simple example
Georgi Gerganov
2023-06-17 20:55:03 +03:00
-
4f9c43e3bd
minor : warning fixes
Georgi Gerganov
2023-06-17 20:24:11 +03:00
-
2c9380dd2f
Only one CUDA stream per device for async compute (#1898)
Johannes Gäßler
2023-06-17 19:15:02 +02:00
-
051e1b0e6a
llama : fix kv_cache n init (close #1903)
Georgi Gerganov
2023-06-17 19:30:22 +03:00
-
86c7571864
make : update for latest Arch (#1701)
DaniAndTheWeb
2023-06-17 18:17:22 +02:00
-
3d59ec5935
ggml : fix warnings under MSVC (#1908)
Howard Su
2023-06-17 23:46:15 +08:00
-
0711a5f6dc
metal : add norm, cpy f16->f16, alibi kernels (#1823)
Aaron Miller
2023-06-17 07:37:49 -07:00
-
fc45a81bc6
exposed modules so that they can be invoked by nix run github:ggerganov/llama.cpp#server etc (#1863)
Faez Shakil
2023-06-17 17:13:05 +05:00
-
794db3e7b9
Server Example Refactor and Improvements (#1570)
Randall Fitzgerald
2023-06-17 07:53:04 -04:00
-
5ddf7ea1fb
hooks : setting up flake8 and pre-commit hooks (#1681)
Jiří Podivín
2023-06-17 12:32:48 +02:00
-
bac19927c3
readme : alternative way to build for Android with CLBlast. (#1828)
Gustavo Rocha Dias
2023-06-17 06:01:06 -03:00
-
b4c6f46f17
Allow cmake to build ggml as a library (#1896)
Kerfuffle
2023-06-17 01:49:42 -06:00
-
92f20d9942
train : get raw text instead of page with html (#1905)
David Yang
2023-06-17 14:51:54 +08:00
-
d411968e99
opencl : support k-quants (#1836)
0cc4m
2023-06-16 20:59:49 +02:00
-
b41b4cad6f
examples : add "simple" (#1840)
SuperUserNameMan
2023-06-16 20:58:09 +02:00
-
13fe9d2d84
cmake : add auto detection of BLAS_INCLUDE_DIRS (#1886)
Zenix
2023-06-17 03:53:04 +09:00
-
ac3b886953
llama : fix embd when offloading non-repeating layers (#1891)
Johannes Gäßler
2023-06-16 20:25:51 +02:00
-
5b9ccaf104
Fixed possible macro redefinition (#1892)
FrankHB
2023-06-17 02:25:01 +08:00
-
9cbf50c041
build : fix and ignore MSVC warnings (#1889)
Borislav Stanimirov
2023-06-16 21:23:53 +03:00
-
3d01122610
CUDA : faster k-quant dot kernels (#1862)
Kawrakow
2023-06-16 20:08:44 +03:00
-
602c748863
gitignore : add several entries specific to Visual Studio (#1888)
Borislav Stanimirov
2023-06-16 09:58:11 +03:00
-
a09f9195be
Fixed CUDA runtime version check (#1879)
Johannes Gäßler
2023-06-15 21:49:08 +02:00
-
bed9275617
cmake : remove whitespaces
Georgi Gerganov
2023-06-15 21:56:50 +03:00
-
c36e81da62
examples : add chat-vicuna.sh (#1854)
yangli2
2023-06-15 11:05:53 -07:00
-
3559433fec
cmake : set include path for OpenBlas (#1830)
Igor Okulist
2023-06-15 12:51:26 -05:00
-
69b34a0e80
swift : Package compile breaks due to ggml-metal.metal (#1831)
Frederik Vogel
2023-06-16 02:47:04 +09:00
-
cf267d1c71
make : add train-text-from-scratch (#1850)
daboe01
2023-06-15 19:42:48 +02:00
-
9dda13e5e1
readme : server compile flag (#1874)
Srinivas Billa
2023-06-15 18:36:38 +01:00
-
37e257c48e
make : clean *.so files (#1857)
sandyiscool
2023-06-15 23:06:06 +05:30
-
64cc19b4fe
Fix the validation of main device (#1872)
Howard Su
2023-06-16 01:29:59 +08:00
-
4bfcc855ab
metal : parallel command buffer encoding (#1860)
Georgi Gerganov
2023-06-15 20:29:48 +03:00
-
6b8312e797
Better error when using both LoRA + GPU layers (#1861)
Johannes Gäßler
2023-06-15 19:06:46 +02:00
-
254a7a7a5f
CUDA full GPU acceleration, KV cache in VRAM (#1827)
Johannes Gäßler
2023-06-14 19:47:19 +02:00
-
9254920265
baby-llama : fix operator!= (#1821)
0xspringtime
2023-06-13 15:37:54 -04:00
-
e32089b2c2
train : improved training-from-scratch example (#1652)
xaedes
2023-06-13 21:04:40 +02:00
-
2347e45e7b
llama : do a warm-up eval at start for better timings (#1824)
Georgi Gerganov
2023-06-13 20:20:07 +03:00
-
74d4cfa343
Allow "quantizing" to f16 and f32 (#1787)
Kerfuffle
2023-06-13 04:23:23 -06:00
-
74a6d922f1
Metal implementation for all k_quants (#1807)
Kawrakow
2023-06-12 22:39:21 +03:00
-
e4caa8da59
ci : run when changing only the CUDA sources (#1800)
slaren
2023-06-12 19:12:47 +02:00
-
58970a4c39
Leverage mmap for offloading tensors to GPU (#1597)
Howard Su
2023-06-12 20:44:16 +08:00
-
8c0a10e64d
metal : fix failure to load model (#1817)
Kawrakow
2023-06-12 14:31:36 +03:00
-
fa84c4b3e8
Fix issue where interactive mode crashes when input exceeds ctx size (#1789)
Kerfuffle
2023-06-11 08:19:17 -06:00
-
12b063f0ec
Fixed WSL cuda's OOM error (#1594)
Kyle Liang
2023-06-11 21:20:52 +08:00
-
31d2b5f4a4
Update SHA256SUMS with current hashes for models quantized using q4_0 (#1798)
Ryan Landay
2023-06-11 17:38:53 +08:00
-
4de0334f5c
cmake : fix Metal build (close #1791)
Georgi Gerganov
2023-06-10 22:56:53 +03:00
-
3f1223155a
k-quants : GCC12 compilation fix (#1792)
Artyom Lebedev
2023-06-10 22:51:36 +03:00
-
303f5809f1
metal : fix issue with ggml-metal.metal path. Closes #1769 (#1782)
Andrei
2023-06-10 10:47:34 -04:00
-
059e99066d
doc : fix wrong address of BLIS.md (#1772)
Aisuko
2023-06-11 00:08:11 +10:00
-
17c10acfb4
ggml : force no_alloc == false when creating opt tensors (close #1699)
Georgi Gerganov
2023-06-10 12:06:45 +03:00
-
e9b66ee982
metal : add Q4_1 implementation (#1785)
Kawrakow
2023-06-10 11:28:11 +03:00
-
4f0154b0ba
llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691)
Kerfuffle
2023-06-10 01:59:17 -06:00
-
ef3171d162
ggml : workaround for missing _mm256_setr_m128i in GCC < 8 (#1638)
Xingchen Song(宋星辰)
2023-06-10 15:49:40 +08:00
-
555275a693
make : add SSSE3 compilation use case (#1659)
rankaiyx
2023-06-10 14:41:59 +08:00
-
98ed165574
OpenCL: Add release memory (#1741)
Robert Sung-wook Shin
2023-06-10 01:24:40 +09:00
-
ae9663f188
Windows nvcc workaround (#1753)
Johannes Gäßler
2023-06-09 13:58:15 +02:00
-
b33dee282f
metal : fix build "tanhf" -> "tanh"
Georgi Gerganov
2023-06-09 11:11:04 +03:00
-
92f44ff7f7
metal : add GELU implementation (#1770)
AT
2023-06-09 04:00:51 -04:00
-
245fc3c37d
metal : faster q4_0 (#1775)
Kawrakow
2023-06-09 10:39:59 +03:00
-
72ff5282bf
metal : add Q2_K implementation (#1762)
Kawrakow
2023-06-08 22:28:21 +03:00
-
0bf7cf1b29
Revert "ggml : load data into int8x16x4_t using vld4q_s8 on arm64 (#1738)"
Georgi Gerganov
2023-06-08 20:48:14 +03:00
-
8432d4d9f7
ggml : load data into int8x16x4_t using vld4q_s8 on arm64 (#1738)
le.chang
2023-06-09 00:47:56 +08:00
-
0f291e1f65
metal : Q6_K implementation (#1752)
Kawrakow
2023-06-08 19:46:22 +03:00
-
8fc8179919
Add llama.cpp docker support for non-latin languages (#1673)
qingfengfenga
2023-06-08 15:58:53 +08:00
-
b50b570ed9
ggml : fix fprintf warnings (#1720)
Steven Roussey
2023-06-08 00:12:28 -07:00
-
53aba3f393
clang-tidy : restore dot file from accidental deletion
Georgi Gerganov
2023-06-08 10:09:08 +03:00
-
4161bdc04d
metal : add Q4_K implementation (#1733)
Kawrakow
2023-06-08 10:08:23 +03:00
-
0035858273
k-quants : add missing compile definition to CMakeLists (#1748)
johnson442
2023-06-08 08:02:48 +01:00
-
5c64a0952e
k-quants : allow to optionally disable at compile time (#1734)
Georgi Gerganov
2023-06-07 10:59:52 +03:00
-
5b57a5b726
flake : update to support metal on m1/m2 (#1724)
jacobi petrucciani
2023-06-07 00:15:31 -04:00
-
4dc62c545d
readme : add June roadmap
Georgi Gerganov
2023-06-07 07:15:08 +03:00
-
35a84916fb
main: add the possibility to open the prompt cache read-only (#1640)
Willy Tarreau
2023-06-07 04:10:17 +02:00
-
2d7bf110ed
llama : fix vram_scratch var
Georgi Gerganov
2023-06-06 22:54:39 +03:00
-
2a4e41a086
llama : fix compile warnings
Georgi Gerganov
2023-06-06 22:41:53 +03:00
-
17366df842
Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703)
Johannes Gäßler
2023-06-06 21:33:23 +02:00
-
44f906e853
metal : add f16 support
Georgi Gerganov
2023-06-06 20:16:57 +03:00
-
d5b111f53d
Clblast fixes + enhancements to save VRAM and offload more layers (#1675)
LostRuins
2023-06-07 01:00:01 +08:00
-
2d43387daf
ggml : fix builds, add ggml-quants-k.o (close #1712, close #1710)
Georgi Gerganov
2023-06-06 10:18:03 +03:00
-
7ad7750c5c
gitignore : add .clang-tidy
Georgi Gerganov
2023-06-06 09:55:10 +03:00
-
7a74dee6b4
llama : temporary disable Q6_K output quantization (#1711)
Georgi Gerganov
2023-06-06 09:39:38 +03:00
-
590250f7a9
metal : add checks for buffer size (#1706)
Spencer Sutton
2023-06-05 23:28:17 -04:00
-
f4c55d3bd7
docs : add performance troubleshoot + example benchmark documentation (#1674)
Yuval Peled
2023-06-05 23:32:36 +03:00
-
f1465624c2
readme : fix typo (#1700)
Foul-Tarnished
2023-06-05 22:28:37 +02:00
-
c2df36d60d
llama : consistently catch and throw only exceptions deriving from std::exception (#1599)
mgroeber9110
2023-06-05 22:24:29 +02:00
-
9d0693bce3
metal : use shared buffers between CPU and GPU (#1696)
kiltyj
2023-06-05 13:24:04 -07:00
-
efe0507632
ggml : fix internal overflow in ggml_time_us on Windows (#1702)
grahameth
2023-06-05 22:11:49 +02:00
-
e7fe66e670
ci : disable auto tidy (#1705)
Georgi Gerganov
2023-06-05 23:05:05 +03:00
-
99009e72f8
ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684)
Kawrakow
2023-06-05 22:56:18 +03:00
-
5220a991a5
Increase 3B scratch buffers. (#1698)
Henri Vasserman
2023-06-05 13:43:08 +03:00
-
d1f563a743
llama : fix Metal KV cache sync (close #1695)
Georgi Gerganov
2023-06-05 10:19:03 +03:00
-
827f5eda91
readme : update hot topics
Georgi Gerganov
2023-06-04 23:38:19 +03:00
-
ecb217db4f
llama : Metal inference (#1642)
Georgi Gerganov
2023-06-04 23:34:30 +03:00
-
dcb2ed4826
OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653)
0cc4m
2023-06-04 08:12:05 +02:00
-
d8bd0013e8
Add info about CUDA_VISIBLE_DEVICES (#1682)
Henri Vasserman
2023-06-03 16:35:20 +03:00
-
b5c85468a3
Docker: change to calling convert.py (#1641)
Jiří Podivín
2023-06-03 14:11:53 +02:00
-
136476e898
Fix prompt cache saving and chat-persistent rollover (#1678)
Evan Jones
2023-06-03 07:28:45 -04:00
-
ffb06a345e
OpenLLaMA 3B support (#1588)
Henri Vasserman
2023-05-30 21:24:22 +03:00
-
7552ac5863
ggml : sync cgraph import / export API
Georgi Gerganov
2023-05-29 19:31:44 +03:00
-
5d1830b99d
ggml : fix bug in ggml_alibi
Georgi Gerganov
2023-05-29 19:30:49 +03:00