-
41f308f58e
llama : do not print "offloading layers" message in CPU-only builds (#5416)
slaren
2024-02-08 21:33:03 +01:00
-
6e99f2a04f
Fix f16_sycl cpy call from Arc (#5411)
Abhilash Majumder
2024-02-08 22:39:10 +05:30
-
ff4ff05c5f
llava : add missing .py, and fix paths in README.md (#5414)
Daniel Bevenius
2024-02-08 15:20:03 +01:00
-
b7b74cef36
fix trailing whitespace (#5407)
Johannes Gäßler
2024-02-08 11:36:54 +01:00
-
4aa43fab56
llama : fix MiniCPM (#5392)
runfuture
2024-02-08 18:36:19 +08:00
-
a6e514a85f
llava: fix typo/formatting in README.md (#5405)
Daniel Bevenius
2024-02-08 09:58:19 +01:00
-
26d4efd11e
sampling: fix top_k <= 0 (#5388)
Johannes Gäßler
2024-02-08 09:46:30 +01:00
-
8504d2d0da
tests : .gitignore obj files
Georgi Gerganov
2024-02-08 09:46:47 +02:00
-
c4fbb6717c
CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (#5393)
Michael Podvitskiy
2024-02-07 22:39:23 +01:00
-
8c933b70c2
fix typo in readme (#5399)
Ebey Abraham
2024-02-07 21:11:30 +00:00
-
b906596bb7
Add Ava in the list of llama.cpp UIs (#4362)
Kamil Tomšík
2024-02-07 19:44:52 +01:00
-
aa7ab99be2
CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386)
Johannes Gäßler
2024-02-07 12:40:26 +01:00
-
10afa6f1d1
[SYCL] update install make by w64devkit (#5297)
Neo Zhang Jianyu
2024-02-07 18:16:55 +08:00
-
0ef46da632
llava-cli : always tokenize special tokens (#5382)
Xiao-Yong Jin
2024-02-07 02:17:25 -06:00
-
ee1628bdfe
Basic Vulkan Multi-GPU implementation (#5321)
0cc4m
2024-02-07 07:54:50 +01:00
-
ed0bf32290
readme : modernize (#5379)
Eve
2024-02-07 06:21:30 +00:00
-
9a697d842b
readme : update ui list (#5354)
Ben Williams
2024-02-06 22:16:48 -08:00
-
316c7faf77
llama : add MiniCPM support (#5346)
runfuture
2024-02-07 14:15:56 +08:00
-
f3e2b4fa3f
server : update /props with "total_slots" value (#5373)
Justin Parker
2024-02-07 01:15:19 -05:00
-
f68664ac24
convert : fix TypeError on GPT-2 vocab.json (#5288)
Sang-Kil Park
2024-02-07 13:28:00 +09:00
-
213d1439fa
server : remove model.json endpoint (#5371)
Alexey Parfenov
2024-02-06 18:08:38 +00:00
-
17c97fb062
CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370)
Johannes Gäßler
2024-02-06 18:43:06 +01:00
-
b08f22c882
Update README.md (#5366)
Kawrakow
2024-02-06 19:00:16 +02:00
-
f57fadc009
Slight quantization improvement for Q4_K and Q5_K (#5361)
Kawrakow
2024-02-06 17:28:02 +02:00
-
2e9c0bd6b3
readme : add phi, orion 14b, internlm2, and yi-VL to readme (#5362)
BarfingLemurs
2024-02-06 09:06:48 -05:00
-
2c516611f1
CUDA: mul_mat_vec_q for batch sizes > 1 (#5351)
Johannes Gäßler
2024-02-06 14:44:06 +01:00
-
8a79c591de
server : include total "num_slots" in props endpoint (#5349)
Justin Parker
2024-02-06 04:20:59 -05:00
-
31e7903221
server : add dynatemp_range and dynatemp_exponent (#5352)
Michael Coppola
2024-02-06 04:20:00 -05:00
-
4ffc7a17d4
server : various fixes for the prompt field in /completion (#5300)
Niall Coates
2024-02-06 08:16:23 +00:00
-
906cff55c2
py : handle byte tokens in get_token_type (#5341)
Georgi Gerganov
2024-02-06 07:47:22 +02:00
-
098f6d737b
make: Use ccache for faster compilation (#5318)
Johannes Gäßler
2024-02-05 19:33:00 +01:00
-
78b00dda6c
README: updated introduction (#5343)
Johannes Gäßler
2024-02-05 15:55:10 +01:00
-
c6b395535a
ggml : make use of ggml-quants.h possible in C++ code (#5338)
Kawrakow
2024-02-05 14:09:47 +02:00
-
abb61944a5
ggml : avoid duplicating function calls using MIN/MAX macros (#5325)
Dr. Tom Murphy VII Ph.D
2024-02-05 06:13:57 -05:00
-
89503dcb5f
iq3_xxs: guards for the no-imatrix situation (#5334)
Kawrakow
2024-02-05 12:32:27 +02:00
-
7e1ae372f3
py : fix internlm2-hf convert to gguf (#5305)
Guoteng
2024-02-05 17:04:06 +08:00
-
6fdfa2ecc6
iq2_xxs: tune quantization (#5320)
Kawrakow
2024-02-05 10:46:06 +02:00
-
a2d60c9158
server : allow to get default generation settings for completion (#5307)
Alexey Parfenov
2024-02-05 08:10:22 +00:00
-
e6f8177532
common : add dynamic temperature parameters to main example cli (#5295)
l3utterfly
2024-02-05 17:00:47 +09:00
-
30679d438d
scripts : fix typos, cleanup (#5303)
Georgi Gerganov
2024-02-05 09:48:03 +02:00
-
4be04c8965
scripts : add non-interactive server-llm.sh (#5303)
Нияз Гарифзянов
2024-02-05 10:43:57 +03:00
-
5d55b0cd82
readme : add CodeShell models to the supported models list (#5330)
chiranko
2024-02-05 15:41:38 +08:00
-
4833ac209d
[SYCL] Fix cpy with dims of 3 (#5289)
AidanBeltonS
2024-02-05 07:08:24 +00:00
-
9392ebd49e
flake.lock: Update
github-actions[bot]
2024-02-04 00:17:24 +00:00
-
5ed26e1fc9
Adding some imatrix tools (#5302)
Kawrakow
2024-02-04 10:39:58 +02:00
-
277fad30c6
cmake : use set() for LLAMA_WIN_VER (#5298)
Welby Seely
2024-02-03 23:18:51 -05:00
-
3c0d25c475
make: add nvcc info print (#5310)
Johannes Gäßler
2024-02-03 20:15:13 +01:00
-
3cc5ed353c
make: fix nvcc optimization flags for host code (#5309)
Johannes Gäßler
2024-02-03 20:14:59 +01:00
-
60ecf099ed
add Vulkan support to Nix flake
Martin Schwaighofer
2024-01-28 12:59:43 +01:00
-
e920ed393d
Vulkan Intel Fixes, Optimizations and Debugging Flags (#5301)
0cc4m
2024-02-03 18:15:00 +01:00
-
52bb63c708
refactor : switch to emplace_back to avoid extra object (#5291)
Michael Klimenko
2024-02-03 12:23:37 +01:00
-
1ec3332ade
YaRN : store rope scaling type as int32_t in memory (#5285)
Jared Van Bortel
2024-02-03 06:22:06 -05:00
-
6a66c5071a
readme : add tenere in the ui tools list (#5284)
BADR
2024-02-03 12:20:26 +01:00
-
a305dba8ff
Fix im2col with 32fp (#5286)
AidanBeltonS
2024-02-03 08:11:37 +00:00
-
191221178f
perplexity : fix KL divergence calculations on Windows (#5273)
kalomaze
2024-02-02 08:15:30 -06:00
-
e437b37fd0
scripts : parse wtype in server-llm.sh (#5167)
Georgi Gerganov
2024-02-02 14:23:40 +02:00
-
2d40085c26
py : add check for '.attn.masked_bias' layers to GPT2model (#5281)
Mirror Azure
2024-02-02 14:39:09 +03:00
-
b05102fe8c
Tidy ggml-sycl (#5261)
AidanBeltonS
2024-02-02 08:39:48 +00:00
-
6b91b1e0a9
docker : add build for SYCL, Vulkan + update readme (#5228)
Xuan Son Nguyen
2024-02-02 08:56:31 +01:00
-
e805f0fa99
[SYCL] get MAX_MEM_ALLOC from device property (#5270)
Meng, Hengyu
2024-02-02 15:54:14 +08:00
-
af3ba5d946
[SYCL] update guide of SYCL backend (#5254)
Neo Zhang Jianyu
2024-02-02 15:53:27 +08:00
-
e1e721094d
llama : fix memory leak in llama_batch_free (#5252)
Ian Bull
2024-02-01 23:20:13 -08:00
-
128dcbd3c9
add --no-mmap in llama-bench (#5257)
Neo Zhang Jianyu
2024-02-02 03:48:53 +08:00
-
4d0924a890
Vulkan Phi Fix for AMD Proprietary Drivers (#5260)
0cc4m
2024-02-01 19:25:24 +01:00
-
8ca511cade
cuda : fix LLAMA_CUDA_F16 (#5262)
slaren
2024-02-01 18:30:17 +01:00
-
d71ac90985
make : generate .a library for static linking (#5205)
Ali Nehzat
2024-02-02 02:18:53 +11:00
-
ce32060198
llama : support InternLM2 (#5184)
Guoteng
2024-02-01 17:19:51 +08:00
-
1cfb5372cf
Fix broken Vulkan Cmake (properly) (#5230)
Eve
2024-01-31 19:21:55 +00:00
-
d3bac7d584
llama : reorder build_orion() at correct place (#5118)
Georgi Gerganov
2024-01-31 18:47:10 +02:00
-
5cb04dbc16
llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240)
Georgi Gerganov
2024-01-31 17:30:17 +02:00
-
efb7bdbbd0
metal : add im2col F32 dst support (#5132)
Georgi Gerganov
2024-01-31 15:35:41 +02:00
-
15606309a0
llava : add MobileVLM support (#5132)
JidongZhang-THU
2024-01-31 21:10:15 +08:00
-
b2b9f025e7
format license text, restore apache license by legal suggestion (#5233)
Neo Zhang Jianyu
2024-01-31 21:04:46 +08:00
-
dabcc5b471
ggml : limit n_threads to the max n_tasks (#5238)
slaren
2024-01-31 13:43:03 +01:00
-
f8e9140cb4
Vulkan Fixes (#5223)
0cc4m
2024-01-31 11:44:19 +01:00
-
d62520eb2c
Fix typos of IQ2_XXS and IQ3_XXS in llama.cpp (#5231)
Yiming Cui
2024-01-31 11:04:21 +08:00
-
01684139c3
support SYCL backend windows build (#5208)
Neo Zhang Jianyu
2024-01-31 10:38:07 +08:00
-
e8dc55d006
kompute : llama-bench support and ggml_cpu_has_kompute() (#5226)
Jared Van Bortel
2024-01-30 19:04:37 -05:00
-
e0085fdf7c
Revert "server : change deps.sh xxd files to string literals (#5221)"
Georgi Gerganov
2024-01-30 21:19:26 +02:00
-
e6f291d158
server : fix context shift (#5195)
Georgi Gerganov
2024-01-30 20:17:30 +02:00
-
4003be0e5f
server : change deps.sh xxd files to string literals (#5221)
JohnnyB
2024-01-30 12:15:05 -06:00
-
fea4fd4ba7
ggml : fix IQ3_XXS on Metal (#5219)
Kawrakow
2024-01-30 19:15:28 +02:00
-
8f8ddfcfad
sync : ggml (#0)
Georgi Gerganov
2024-01-30 16:21:57 +02:00
-
6fb50ebbf0
gguf : fix comparison (ggml/715)
Georgi Gerganov
2024-01-29 21:08:18 +02:00
-
625a699b54
ggml_cuda_cpy support for 4d tensors and float16->float32 upcasting (ggml/686)
John Balis
2024-01-29 06:37:33 -06:00
-
a4b07c057a
gguf : add input validation, prevent integer overflows (ggml/709)
Georgi Gerganov
2024-01-29 14:00:10 +02:00
-
549a1e6cd5
ci : fix yolo URLs + fix metal capture (ggml/712)
Georgi Gerganov
2024-01-29 13:29:46 +02:00
-
5f14ee0b0c
metal : add debug capture backend function (ggml/694)
Jack Mousseau
2024-01-29 01:22:23 -08:00
-
8e14e3ddb3
Faster AVX2 dot product for IQ2_XS (#5187)
Kawrakow
2024-01-30 15:15:07 +02:00
-
f4d7e54974
SOTA 3-bit quants (#5196)
Kawrakow
2024-01-30 15:14:12 +02:00
-
2256f36b79
Vulkan Windows APU Memory Handling (#5199)
0cc4m
2024-01-30 13:59:30 +01:00
-
7359016c7c
quantize : fix typo (#5211)
Vladimir Malyutin
2024-01-30 17:57:07 +07:00
-
813416991a
main : allow empty --prompt-cache file (#5176)
divinity76
2024-01-30 10:18:02 +01:00
-
5589921ef8
readme : minor (#5204)
Romain Neutron
2024-01-30 10:16:38 +01:00
-
49f44b5c55
readme : update hot topics
Georgi Gerganov
2024-01-30 11:14:44 +02:00
-
6685cc41c2
server : improve README (#5209)
Wu Jian Ping
2024-01-30 17:11:46 +08:00
-
ceebbb5b21
ggml alloc: Fix for null dereference on alloc failure (#5200)
Paul Tsochantaris
2024-01-29 22:19:29 +00:00
-
6daa69ee81
kompute : fix fallback to CPU (#5201)
Jared Van Bortel
2024-01-29 17:11:27 -05:00
-
fbf1ddec69
Nomic Vulkan backend (#4456)
Jared Van Bortel
2024-01-29 15:50:50 -05:00
-
2aed77eb06
fix typo "RLIMIT_MLOCK" (#5175)
divinity76
2024-01-29 15:45:41 +01:00