-
45c0e2e4c1
Refactor Vulkan backend to allow multiple contexts (#7961)
0cc4m
2024-06-23 10:21:25 +02:00
-
b5a5f34efa
Removing extra blank lines that were breaking Lint. (#8067)
Clint Herron
2024-06-22 14:28:18 -04:00
-
3e58b0ee35
cvector: fix CI + correct help message (#8064)
Xuan Son Nguyen
2024-06-22 18:11:30 +02:00
-
adf480c3ab
cvector-generator: Moe Moe Fixie-Fixie for Lots of Formats~! ♡(ᐢ ᴥ ᐢ)♡ (#8052)
HatsuneMikuUwU33
2024-06-22 17:19:37 +02:00
-
3aa184a8c7
convert-hf : change assert to exception (#8015)
0xspringtime
2024-06-22 09:37:41 -04:00
-
5b48cd53a8
Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 values (#8058)
ddh0
2024-06-22 07:16:10 -06:00
-
c5a8d4b749
JSON Schema to GBNF integration tests (#7790)
Clint Herron
2024-06-21 23:18:36 -04:00
-
557b653dc9
vulkan: detect multiple devices by deviceUUID instead of deviceID (#8022)
k.h.lai
2024-06-21 16:28:20 +08:00
-
7d5e8777ae
ggml : AVX IQ quants (#7845)
Eve
2024-06-21 05:57:36 +00:00
-
a927b0f3dd
llama : optimize long word tokenization with WPM (#8034)
Georgi Gerganov
2024-06-21 08:51:28 +03:00
-
80ea089d77
llama : allow pooled embeddings on any model (#7477)
Douglas Hanley
2024-06-21 00:38:22 -05:00
-
0e64591e82
swiftui : enable stream updating (#7754)
Shuichi Tsutsumi
2024-06-21 14:30:58 +09:00
-
b1ef562bc1
requirements : Bump torch and numpy for python3.12 (#8041)
Hamdoud Hakem
2024-06-20 21:01:15 +01:00
-
17b291a6a5
convert-hf : Fix the encoding in the convert-hf-to-gguf-update.py (#8040)
Hamdoud Hakem
2024-06-20 20:59:59 +01:00
-
abd894ad96
common: fix warning (#8036)
Johannes Gäßler
2024-06-20 16:40:13 +02:00
-
de391e4c80
[SYCL] Fix windows build and inference (#8003)
luoyu-intel
2024-06-20 13:19:05 +00:00
-
d50f8897a7
CUDA: stream-k decomposition for MMQ (#8018)
Johannes Gäßler
2024-06-20 14:39:21 +02:00
-
2075a66a96
metal : fix ggml_metal_supports_op for BF16 (#8021)
Michael de Gans
2024-06-19 22:32:01 -07:00
-
ba58993152
server : fix smart slot selection (#8020)
sasha0552
2024-06-19 23:57:10 +00:00
-
a7854743c5
un-ignore build-info.cmake and build-info.sh (#7996)
Michael de Gans
2024-06-19 13:10:42 -07:00
-
9c77ec1d74
ggml : synchronize threads using barriers (#7993)
slaren
2024-06-19 15:04:15 +02:00
-
a04a953cab
codecov : remove (#8004)
Georgi Gerganov
2024-06-19 13:04:36 +03:00
-
623494a478
[SYCL] refactor (#6408)
Meng, Hengyu
2024-06-19 09:11:51 +08:00
-
37bef89433
tokenizer : BPE fixes (#7530)
jaime-m-p
2024-06-18 18:40:52 +02:00
-
91c188d6c2
Only use FIM middle token if it exists (#7648)
Sigbjørn Skjæret
2024-06-18 14:19:45 +02:00
-
84f6de17f6
Fix no gcc pragma on Windows (#7751)
jojorne
2024-06-18 09:18:32 -03:00
-
61665277af
Allow compiling with CUDA without CUDA runtime installed (#7989)
Ulrich Drepper
2024-06-18 14:00:14 +02:00
-
b96f9afb0d
chore: clean useless beam search param (#7985)
Frank Mai
2024-06-18 15:11:40 +08:00
-
1193778105
readme : update UI list (#7943)
Abheek Gulati
2024-06-17 23:57:41 -07:00
-
5326bcceeb
ggml : sync
Georgi Gerganov
2024-06-18 09:50:45 +03:00
-
e6ecc2be47
whisper : use ggml_backend_sched (whisper/2239)
Georgi Gerganov
2024-06-18 09:37:20 +03:00
-
a94e6ff877
update: support Qwen2-57B-A14B (#7835)
Ștefan-Gabriel Muscalu
2024-06-17 22:08:46 +03:00
-
5b6da18750
Make updates to type cast based on compiler instead of OS (#7851)
Srihari-mcw
2024-06-17 23:53:17 +05:30
-
7c26775adb
llama : disable FA if KV head size do not match (#7982)
Georgi Gerganov
2024-06-17 19:40:01 +03:00
-
b473e95084
Add Nix and Flox install instructions (#7899)
Bryan Honof
2024-06-17 17:37:55 +02:00
-
99052cd227
sched : offload_op also requires supports_op (#7977)
slaren
2024-06-17 16:51:42 +02:00
-
c637fcd34d
fix: divide 0 exception in mamba (#7932)
Frank Mai
2024-06-17 22:11:08 +08:00
-
6a2f0b3474
Implement non-mapped async IO for CUDA on Windows. (#7896)
Markus Tavenrath
2024-06-17 16:10:15 +02:00
-
21be9cab94
rpc : fix load/store misaligned addresses (#7948)
Georgi Gerganov
2024-06-17 11:09:20 +03:00
-
006167aaf6
gguf-dump.py: add --markdown dump output (#7853)
Brian
2024-06-17 15:25:20 +10:00
-
df68d4fa5d
[SYCL] Update README-sycl.md for Chapter "Recommended release" and "News" (#7946)
Neo Zhang
2024-06-17 11:17:07 +08:00
-
43b35e38ba
Add support for sqrt on CUDA (#7953)
Calvin Laurenson
2024-06-16 15:23:04 -07:00
-
19b7a836f6
cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)
Georgi Gerganov
2024-06-11 17:39:01 +03:00
-
b5fcf8ef5c
ggml : fix and optimize ppc64le (ggml/849)
Hong Bo PENG
2024-06-16 16:53:11 +08:00
-
398105ff43
ggml : remove duplicate include of ggml-common.h (ggml/853)
Daniel Bevenius
2024-06-16 10:51:18 +02:00
-
bc6c457fa3
flake.lock: Update (#7951)
Georgi Gerganov
2024-06-16 19:16:21 +03:00
-
52399254b3
unicode : avoid char32_t (#7957)
Georgi Gerganov
2024-06-16 14:51:40 +03:00
-
6fe1c62741
readme : update UI list [no ci] (#7958)
hopkins385
2024-06-16 13:51:18 +02:00
-
cddaf028ad
ggml : fix handling of zero blocks in IQ quants (#7955)
Georgi Gerganov
2024-06-16 14:50:12 +03:00
-
c8a82194a8
github : update pr template
Georgi Gerganov
2024-06-16 10:46:51 +03:00
-
7c7836d9d4
Vulkan Shader Refactor, Memory Debugging Option (#7947)
0cc4m
2024-06-16 07:17:31 +02:00
-
0c7b3595b9
Add cvector-generator example (#7514)
Xuan Son Nguyen
2024-06-15 18:53:40 +02:00
-
7b2f4a7d19
[SYCL] remove global variables (#7710)
Meng, Hengyu
2024-06-15 14:05:10 +08:00
-
f8ec8877b7
ci : fix macos x86 build (#7940)
olexiyb
2024-06-14 20:28:34 +03:00
-
76d66ee0be
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)
Johannes Gäßler
2024-06-14 18:41:49 +02:00
-
66ef1ceedf
metal : utilize max shared memory for mul_mat_id (#7935)
Georgi Gerganov
2024-06-14 17:14:09 +03:00
-
e65bbf606c
llama-bench : fix RPC indication (#7936)
Radoslav Gerganov
2024-06-14 16:47:41 +03:00
-
6fcd1331ef
llama : more checks before assuming FIM tokens (#7644)
Sigbjørn Skjæret
2024-06-14 12:20:04 +02:00
-
41b9260f18
convert : add Poro-34B-chat tokenizer support (#7713)
Elaine
2024-06-14 13:16:49 +03:00
-
172c825684
rpc : fix ggml_backend_rpc_supports_buft() (#7918)
Radoslav Gerganov
2024-06-13 15:18:44 +03:00
-
a55eb1bf0f
readme : Remove outdated instructions from README.md (#7914) [no ci]
Galunid
2024-06-13 09:42:41 +02:00
-
f578b86b21
move BLAS to a separate backend (#6210)
slaren
2024-06-13 03:11:35 +02:00
-
1c641e6aac
build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)
Olivier Chafik
2024-06-13 00:41:52 +01:00
-
963552903f
CUDA: fix broken oob check for FA vec f32 kernel (#7904)
Johannes Gäßler
2024-06-12 17:41:51 +02:00
-
a9cae48003
tests : add non-cont unary tests (#7857)
Georgi Gerganov
2024-06-12 16:00:22 +03:00
-
bfaa676b08
ggml : improve ggml_is_contiguous logic (#7856)
Georgi Gerganov
2024-06-12 15:24:20 +03:00
-
704a35b183
server : restore numeric prompts (#7883)
Georgi Gerganov
2024-06-12 14:42:29 +03:00
-
dcf752707d
update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894)
Meng, Hengyu
2024-06-12 17:05:35 +08:00
-
f2b5764beb
Fix a typo and add Fedora 40 pacakge to install for Vulkan (#7794) [no ci]
Patrice Ferlet
2024-06-12 03:18:16 +02:00
-
73bac2b11d
vulkan: select only one device for single gpu with multiple drivers (#7582)
k.h.lai
2024-06-12 03:26:05 +08:00
-
ef52d1d16a
Update Vulkan RoPE implementation (#7818)
0cc4m
2024-06-11 21:20:29 +02:00
-
14f83526cd
fix broken link in pr template (#7880) [no ci]
Deven Mistry
2024-06-11 12:18:58 -04:00
-
6fe42d073f
github: move PR template to .github/ root (#7868)
Brian
2024-06-12 00:43:41 +10:00
-
148995e5e5
llama-bench: more compact markdown tables (#7879)
Johannes Gäßler
2024-06-11 14:45:40 +02:00
-
4bfe50f741
tests : check the Python version (#7872)
Georgi Gerganov
2024-06-11 10:10:20 +03:00
-
bdcb8f4222
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860)
Johannes Gäßler
2024-06-11 08:26:07 +02:00
-
c2ce6c47e4
fix CUDA CI by using a windows-2019 image (#7861)
slaren
2024-06-11 07:59:20 +02:00
-
b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866)
Olivier Chafik
2024-06-11 02:22:57 +01:00
-
396b18dfec
json: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841)
Olivier Chafik
2024-06-11 01:00:30 +01:00
-
864a99e7a0
cmake : fix CMake requirement for CUDA (#7821)
Jared Van Bortel
2024-06-10 18:32:10 -04:00
-
fd5ea0f897
ci : try win-2019 on server windows test (#7854)
slaren
2024-06-10 14:18:41 +02:00
-
c28a83902c
examples : remove --instruct remnants (#7846)
Georgi Gerganov
2024-06-10 15:00:15 +03:00
-
d9da0e4986
server : improve "prompt" handling (#7847)
Georgi Gerganov
2024-06-10 14:59:55 +03:00
-
1f0dabda8d
CUDA: use tensor cores for MMQ (#7676)
Johannes Gäßler
2024-06-10 11:45:13 +02:00
-
af4ae502dd
use the correct SYCL context for host USM allocations (#7777)
Ben Ashbaugh
2024-06-10 02:21:31 -07:00
-
10ceba354a
flake.lock: Update (#7838)
Georgi Gerganov
2024-06-10 02:04:50 +03:00
-
e95beeb1fc
imatrix : handle partial entries (#7833)
Georgi Gerganov
2024-06-09 20:19:35 +03:00
-
57bf62ce7c
docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700)
Nicolás Pérez
2024-06-09 11:24:29 -04:00
-
3e2ee44315
server: do not remove whitespace at the start of a completion chunk (#7830)
mgroeber9110
2024-06-09 12:50:35 +02:00
-
42b53d192f
CUDA: revise q8_1 data layout for mul_mat_q (#7824)
Johannes Gäßler
2024-06-09 09:42:25 +02:00
-
2decf57bc6
convert-hf : set the model name based on cli arg, if present (#7693)
sasha0552
2024-06-09 06:39:25 +00:00
-
5795b94182
convert-hf : match model part name prefix and suffix (#7687)
compilade
2024-06-08 22:47:25 -04:00
-
ed9f252118
gguf-py : decouple adding metadata from writing in GGUFWriter (#7827)
compilade
2024-06-08 22:34:29 -04:00
-
fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)
slaren
2024-06-09 01:43:39 +02:00
-
d4d915d351
url: save -mu downloads to new cache location (#7826)
Olivier Chafik
2024-06-08 20:21:08 +01:00
-
7a16ce7db2
server : smart slot selection using Longest Common Prefix (#7728)
sasha0552
2024-06-08 07:50:31 +00:00
-
da799b4189
vulkan : reuse parent extra for views (#7806)
slaren
2024-06-07 19:47:49 +02:00
-
c00fad71e5
gguf-split : change binary multi-byte units to decimal (#7803)
Christian Zhou-Zheng
2024-06-07 08:56:01 -04:00
-
27615f5ab2
cmake : fix BUILD_SHARED_LIBS=ON build (#7784)
intelmatt
2024-06-07 05:15:07 -07:00
-
7027b27d76
server: update cache_prompt documentation [no ci] (#7745)
Johannes Gäßler
2024-06-07 11:15:49 +02:00