- a5cabd7649 server : do not get prompt in infill mode (#7286) (woodx, 2024-06-07 15:09:45 +08:00)
- d5c938cd77 [SYCL] fix softmax r2r result wrong issue (#7811) (pengxin99, 2024-06-07 14:28:26 +08:00)
- c9ee7118d5 check for nans in imatrix and quantize (#7807) (slaren, 2024-06-07 08:01:29 +02:00)
- ee459f40f6 server : fix --threads-http arg (#7801) (Georgi Gerganov, 2024-06-06 19:19:59 +03:00)
- f83351f9a6 imatrix : migrate to gpt_params (#7771) (Georgi Gerganov, 2024-06-06 16:30:58 +03:00)
- ad675e1c67 Added support for . (any character) token in grammar engine. (#6467) (Clint Herron, 2024-06-06 06:08:52 -07:00)
- a143c04375 README minor fixes (#7798) [no ci] (Mattheus Chediak, 2024-06-06 09:17:54 -03:00)
- 55b2d0849d grammars: x{min,max} repetition operator (#6640) (Olivier Chafik, 2024-06-06 10:07:06 +01:00)
- f5d7b268ec llama : add jina v2 base code (#7596) (Joan Fontanals, 2024-06-06 09:22:41 +02:00)
- 2d08b7fbb4 docker : build only main and server in their images (#7782) (slaren, 2024-06-06 07:19:49 +02:00)
- d67caea0d6 docker : add openmp lib (#7780) (slaren, 2024-06-06 07:17:21 +02:00)
- 7672adeec7 Fix encoding in python scripts (#7733) (Galunid, 2024-06-05 19:07:24 +02:00)
- 7d1a378b8f CUDA: refactor mmq, dmmv, mmvq (#7716) (Johannes Gäßler, 2024-06-05 16:53:00 +02:00)
- 2b3389677a ggml : refactor rope norm/neox (#7634) (Georgi Gerganov, 2024-06-05 11:29:20 +03:00)
- 9973e81c5c readme : remove -ins (#7759) (arch-btw, 2024-06-04 23:40:49 -07:00)
- c90dbe026b Fix per token atrributes bits (#7749) (jaime-m-p, 2024-06-05 01:26:14 +02:00)
- b90dc566c1 Allow number of nodes in CUDA graph to change (#7738) (agray3, 2024-06-04 21:06:49 +01:00)
- 1442677f92 common : refactor cli arg parsing (#7675) (Georgi Gerganov, 2024-06-04 21:23:39 +03:00)
- 554c247caf ggml : remove OpenCL (#7735) (Georgi Gerganov, 2024-06-04 21:23:20 +03:00)
- 0cd6bd3483 llama : remove beam search (#7736) (Georgi Gerganov, 2024-06-04 21:23:05 +03:00)
- 5ca0944a15 readme : remove obsolete Zig instructions (#7471) (Georgi Gerganov, 2024-06-04 19:43:01 +03:00)
- adc9ff3841 llama-bench : allow using a different printer for stderr with -oe (#7722) (slaren, 2024-06-04 14:32:42 +02:00)
- 987d743d6b Improve hipBLAS support in CMake (#7696) (Daniele, 2024-06-04 12:09:15 +00:00)
- b226c1227b refine .gitignore (#7688) (zhouwg, 2024-06-04 19:21:26 +08:00)
- 3b38d48609 Per token attributes (#7685) (jaime-m-p, 2024-06-04 09:17:17 +02:00)
- 6d1616944d ggml : prevent builds with -ffinite-math-only (#7726) (Georgi Gerganov, 2024-06-04 10:01:09 +03:00)
- bde7cd3cd9 llama : offload to RPC in addition to other backends (#7640) (Radoslav Gerganov, 2024-06-03 20:03:26 +03:00)
- a5735e4426 ggml : use OpenMP as a thread pool (#7606) (Masaya, Kato, 2024-06-04 00:14:15 +09:00)
- 0b832d53ba make: fix debug options not being applied to NVCC (#7714) (Johannes Gäßler, 2024-06-03 16:28:58 +02:00)
- 3d7ebf6312 Vulkan Mixture of Experts (MoE) support (#7628) (0cc4m, 2024-06-03 10:59:14 +02:00)
- a10cda58d3 cmake : add pkg-config spec file for llama.cpp (#7702) (Andy Tai, 2024-06-03 01:06:24 -07:00)
- 6f28a333c1 llama : MiniCPM support tied embeddings (#7664) (zhangkaihuo, 2024-06-03 15:49:30 +08:00)
- 549279d804 llama : avoid double token-to-piece cache (#7654) (Georgi Gerganov, 2024-06-03 08:34:43 +03:00)
- 9e405b6e2e kompute : implement op_getrows_f32 (#6403) (woachk, 2024-06-03 07:32:16 +02:00)
- 3413ae2193 fix bug introduced in using calloc (#7701) (Dave Airlie, 2024-06-03 07:59:54 +10:00)
- 1669810d7c flake.lock: Update (#7686) (Georgi Gerganov, 2024-06-03 00:13:12 +03:00)
- 7c4e5b7eae chore : add ignore rule for generated server themes (#7689) (Austin, 2024-06-02 13:39:08 -04:00)
- 9422c5e34b [SYCL] Update rpc-server.cpp to include SYCL backend (#7682) (nickp27, 2024-06-02 19:13:54 +10:00)
- e141ce624a Fix FlashAttention debug test, FP32 assert (#7684) (Johannes Gäßler, 2024-06-01 23:26:10 +02:00)
- 2e666832e6 server : new UI (#7633) (Yazan Agha-Schrader, 2024-06-01 21:31:48 +02:00)
- 2ac95c9d56 SimpleChat: Simple histogram/repeatMatching driven garbageTrimming, Settings UI, Streaming mode, OpenAi Compat (Model, Authorization Bearer), Save/Restore session, Auto Settings UI (#7548) (HanishKVC, 2024-06-01 21:50:18 +05:30)
- 750f60c03e CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681) (Johannes Gäßler, 2024-06-01 15:47:04 +02:00)
- 9b596417af CUDA: quantized KV support for FA vec (#7527) (Johannes Gäßler, 2024-06-01 08:44:14 +02:00)
- a323ec60af server : update js (#7670) (Georgi Gerganov, 2024-05-31 22:23:04 +03:00)
- 0515ad93f4 convert-hf : Handle NotImplementedError in convert-hf-to-gguf (#7660) (Galunid, 2024-05-31 17:42:33 +02:00)
- c8047d538f scripts: update compare_llama_bench.py [no ci] (#7673) (Johannes Gäßler, 2024-05-31 16:26:21 +02:00)
- 30e238b246 Improve HIP compatibility (#7672) (Daniele, 2024-05-31 14:00:29 +00:00)
- 16926dff92 readme : link homebrew discussion (Georgi Gerganov, 2024-05-31 15:04:58 +03:00)
- 0c27e6f62e ggml : fix loongson compile warnings (#7537) (Georgi Gerganov, 2024-05-31 14:17:10 +03:00)
- 2e32f874e6 Somehow '**' got lost (#7663) (Galunid, 2024-05-31 10:24:41 +02:00)
- 1af511fc22 Add convert.py removal to hot topics (#7662) (Galunid, 2024-05-31 10:09:20 +02:00)
- 0541f06296 [no ci] docs: add aikit to readme (#7650) (Sertaç Özercan, 2024-05-30 16:57:16 -07:00)
- 9022c33646 Fixed painfully slow single process builds. (#7326) (JohnnyB, 2024-05-30 21:32:38 +01:00)
- 5921b8f089 llama : cache llama_token_to_piece (#7587) (Georgi Gerganov, 2024-05-30 19:01:41 +03:00)
- 5dcdf94676 Fix conan badge display [no ci] (#7645) (Martin Delille, 2024-05-30 17:07:39 +02:00)
- 2e2340de17 Add brew installation instruction to README [no ci] (#7616) (Manuel, 2024-05-30 16:58:15 +02:00)
- 7846540bd2 readme : add Conan badge (#7638) (Martin Delille, 2024-05-30 14:52:50 +02:00)
- e6157f94c8 github: add contact links to issues and convert question into research [no ci] (#7612) (Brian, 2024-05-30 21:55:36 +10:00)
- 9c4c9cc83f Move convert.py to examples/convert-legacy-llama.py (#7430) (Galunid, 2024-05-30 13:40:00 +02:00)
- 59b0d07766 faster avx512 exp implementation (#7551) (Chris Elrod, 2024-05-30 07:32:55 -04:00)
- d5c05821f3 ggml : fix loongarch build (O2 issue) (#7636) (junchao-loongson, 2024-05-30 17:30:10 +08:00)
- 972b555ab9 README: explain parallel build [no ci] (#7618) (Johannes Gäßler, 2024-05-30 09:52:39 +02:00)
- 3854c9d07f [SYCL] fix intel docker (#7630) (Meng, Hengyu, 2024-05-30 14:19:08 +08:00)
- eb57fee51f gguf-py : Add tokenizer.ggml.pre to gguf-new-metadata.py (#7627) (Galunid, 2024-05-30 02:10:40 +02:00)
- 55d62262a9 metal : remove invalid asserts (#7617) (Georgi Gerganov, 2024-05-29 22:20:40 +03:00)
- 975ec63ff2 metal : add missing asserts (#7617) (Georgi Gerganov, 2024-05-29 20:45:25 +03:00)
- fb76ec31a9 ggml : fix YARN + add tests + add asserts (#7617) (Georgi Gerganov, 2024-05-29 20:17:31 +03:00)
- cce3dcffc5 cuda : non-cont concat support (#7610) (Georgi Gerganov, 2024-05-29 15:38:26 +03:00)
- 210d99173d llama-bench : add support for the RPC backend (#7435) (Radoslav Gerganov, 2024-05-29 14:45:44 +03:00)
- 87bdf2a199 ggml : use atomic_flag for critical section (#7598) (slaren, 2024-05-29 13:36:39 +02:00)
- 00281b7be3 scripts : remove mpi remnants (Georgi Gerganov, 2024-05-29 14:31:18 +03:00)
- 2ab977282b sync : ggml (Georgi Gerganov, 2024-05-29 14:29:52 +03:00)
- 72de268bec ggml : restore ggml_rope_xpos_inplace (ggml/0) (Georgi Gerganov, 2024-05-26 18:35:23 +03:00)
- 0e8d8bfd6c Add Arc A750 and Arch linux to readme-sycl.md as verified GPU model and Linux distro (#7605) (Akarshan Biswas, 2024-05-29 12:23:47 +05:30)
- 504f0c340f ggml : fix typo in ggml.c (#7603) (zhouwg, 2024-05-29 10:09:31 +08:00)
- b864b50ce5 [SYCL] Align GEMM dispatch (#7566) (Meng, Hengyu, 2024-05-29 07:00:24 +08:00)
- 02c1ecad07 Tokenizer WPM fixes (#7500) (jaime-m-p, 2024-05-28 21:46:34 +02:00)
- 6bd12ce409 sycl : fix assert (#7563) (Georgi Gerganov, 2024-05-28 22:22:50 +03:00)
- 5442939fcc llama : support small Granite models (#7481) (Giuseppe Scrivano, 2024-05-28 20:49:49 +02:00)
- 56411a950f vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552) (k.h.lai, 2024-05-29 01:25:08 +08:00)
- 2b737caae1 rpc : resource management rework (#7562) (Radoslav Gerganov, 2024-05-28 18:13:36 +03:00)
- ee3dff6b8e Add support for DeepseekV2ForCausalLM (#7519) (fairydreaming, 2024-05-28 17:07:05 +02:00)
- edc29433fa tests : fix test-tokenizer-0.sh (Georgi Gerganov, 2024-05-28 15:04:09 +03:00)
- 8b99e2aa66 llama : handle unknown utf8 bytes (#7588) (Georgi Gerganov, 2024-05-28 13:55:35 +03:00)
- 271ff3fc44 github: add refactor to issue template (#7561) (Brian, 2024-05-28 20:27:27 +10:00)
- e2b065071c [SYCL] fix ggml_sycl_mul_mat_id() to match the change of api (#7436) (Neo Zhang, 2024-05-28 17:53:37 +08:00)
- 0548a4187f ggml : generalize GGML_OP_CONCAT (#7563) (Georgi Gerganov, 2024-05-28 11:04:19 +03:00)
- 9335b969e8 server: do not remove whitespace at the start of a completion chunk (#7524) (mgroeber9110, 2024-05-28 06:55:51 +02:00)
- c41767154e Markdownish code block fix (#7571) (Nathan Epstein, 2024-05-28 00:41:14 -04:00)
- 74b239b3d5 llava : update clip.h (#7580) (Ikko Eltociear Ashimine, 2024-05-28 11:48:16 +09:00)
- 852aafb163 update HIP_UMA #7399 (#7414) (Djip007, 2024-05-28 01:40:47 +02:00)
- 0136966daf adding in x64 targets to cmake presets (#7574) (kunnis, 2024-05-27 18:40:12 -05:00)
- 10b1e45876 make: add --device-debug to NVCC debug flags (#7542) (Johannes Gäßler, 2024-05-27 19:34:40 +02:00)
- 197c00681b Allow multiple copy function pointers for CUDA graph kernel param updates (#7565) (agray3, 2024-05-27 18:33:42 +01:00)
- 95f84d5ce8 Fix q_xxs using mul_mat_q (#7459) (AidanBeltonS, 2024-05-27 17:34:51 +01:00)
- 5487593bc7 Add freq factors (#7495) (AidanBeltonS, 2024-05-27 13:34:09 +01:00)
- 1d8fca72ae metal : add GGML_OP_REPEAT kernels (#7557) (Georgi Gerganov, 2024-05-27 12:10:19 +03:00)
- 62bfef5194 metal : disable FA kernel for HS=256 (#7556) (Georgi Gerganov, 2024-05-27 10:38:39 +03:00)
- eaf6e03174 llama : add comments about experimental flags (#7544) (Georgi Gerganov, 2024-05-27 09:24:13 +03:00)
- d6ef0e77dd github: add self sorted issue ticket forms (#7543) (Brian, 2024-05-27 10:54:30 +10:00)