-
2cca09d509
readme : add Fedora instructions (#6783)
Mohammadreza Hendiani
2024-04-21 16:02:05 +03:30
-
89b0bf0d5d
llava : use logger in llava-cli (#6797)
Justine Tunney
2024-04-21 08:19:04 -04:00
-
b97bc3966e
llama : support Llama 3 HF conversion (#6745)
Pedro Cuenca
2024-04-21 13:50:41 +02:00
-
b8109bc013
doc : server tests require llama to be built with curl enabled (#6788)
Jan Boon
2024-04-21 00:29:50 +08:00
-
aed82f6837
common : try to fix Android CI (#6780)
Georgi Gerganov
2024-04-20 13:27:12 +03:00
-
0e4802b2ec
ci: add ubuntu latest release and fix missing build number (mac & ubuntu) (#6748)
loonerin
2024-04-19 13:03:35 -04:00
-
637e9a86c2
server: static: upstream upgrade (#6765)
Pierrick Hymbert
2024-04-19 13:19:01 +02:00
-
9958c81b79
Implement the OLMo architecture (#6741)
nopperl
2024-04-19 09:35:54 +00:00
-
8b1b1f4982
train : add general name (#6752)
Austin
2024-04-19 03:16:45 -04:00
-
bca40e9814
fix wrong parameter in cmd in readme-sycl.md (#6755)
Neo Zhang
2024-04-19 09:16:31 +08:00
-
0d56246f4b
ggml : group all experts in a single ggml_mul_mat_id (#6505)
slaren
2024-04-18 15:18:48 +02:00
-
03c0946d73
convert : support models with multiple chat templates (#6588)
Sigbjørn Skjæret
2024-04-18 13:49:01 +02:00
-
e11b2e6e1e
Qwen2 : assume tied weights if lm_head/output weights is missing (#6738)
Ren Xuancheng
2024-04-18 19:38:04 +08:00
-
c71bfd736e
llama : fix compatibility with old 2 expert models (#6735)
slaren
2024-04-18 09:04:47 +02:00
-
3b8f1ec4b1
llamafile : tmp disable + build sgemm.o when needed (#6716)
Georgi Gerganov
2024-04-17 23:58:26 +03:00
-
8dd1ec8b3f
readme : add UI (#6724)
Yaroslav
2024-04-17 14:47:50 +02:00
-
facb8b56f8
convert : fix autoawq gemma (#6704)
Zheng.Deng
2024-04-17 04:51:07 +08:00
-
532c1737a1
llama : make general.name optional (#6709)
Georgi Gerganov
2024-04-16 23:50:38 +03:00
-
666867b799
ggml : fix llamafile sgemm wdata offsets (#6710)
Georgi Gerganov
2024-04-16 23:50:22 +03:00
-
8cc91dc63c
ggml : add llamafile sgemm (#6414)
Justine Tunney
2024-04-16 14:55:30 -04:00
-
dbceec87c0
llama : add StableLM2 12B (#6635)
Ashish
2024-04-16 08:48:35 -07:00
-
f4dea7da18
llama : add qwen2moe (#6074)
Shijie
2024-04-16 23:40:48 +08:00
-
8a56075b07
gritlm : add --outdir option to hf.sh script (#6699)
Daniel Bevenius
2024-04-16 08:34:06 +02:00
-
58227ffdeb
perplexity : require positive --ctx-size arg (#6695)
Georgi Gerganov
2024-04-16 09:28:33 +03:00
-
4fbd8098e6
gguf : add special tokens metadata for FIM/Infill (#6689)
Daniel Bevenius
2024-04-16 08:13:13 +02:00
-
7593639ce3
main: add --json-schema / -j flag (#6659)
Olivier Chafik
2024-04-15 18:35:21 +01:00
-
132f55795e
llama : fix restoring the number of outputs from state files (#6687)
compilade
2024-04-15 08:56:55 -04:00
-
3272896d79
server : revert "minor layout improvements" (#6684)
Pierrick Hymbert
2024-04-15 14:18:47 +02:00
-
7fc16a2c32
swift : linux support (#6590)
Steven Prichard
2024-04-15 05:14:46 -05:00
-
17e98d4c96
fix mul_mat_id() for new input, make the ut pass (#6682)
Neo Zhang Jianyu
2024-04-15 17:12:26 +08:00
-
1958f7e06c
llama : add missing kv clear in llama_beam_search (#6664)
David Renshaw
2024-04-14 15:24:15 -04:00
-
04fbc5f23e
Add Command R chat template (#6650)
Chao Jiang
2024-04-15 00:16:34 +08:00
-
f184dd9208
flake.lock: Update (#6669)
Georgi Gerganov
2024-04-14 16:55:30 +03:00
-
422c2aff1c
Added support for GGML_OP_CLAMP in Metal (#6662)
Dave
2024-04-14 07:14:19 -04:00
-
8800226d65
Fix --split-max-size (#6655)
Sigbjørn Skjæret
2024-04-14 13:12:59 +02:00
-
e689fc4e91
[bug fix] convert github repository_owner to lowercase (#6673)
Jaemin Son
2024-04-14 20:12:36 +09:00
-
a4ec34e1cd
convert : enable the --use-temp-file cli flag (#6645)
James A Capozzoli
2024-04-14 04:40:18 -04:00
-
de17e3f745
fix memcpy() crash, add missed cmd in guide, fix softmax (#6622)
Neo Zhang Jianyu
2024-04-14 10:42:29 +08:00
-
b5e7285baf
CUDA: fix matrix multiplication logic for tests (#6667)
Johannes Gäßler
2024-04-14 00:21:55 +02:00
-
4bd0f93e4a
model: support arch DbrxForCausalLM (#6515)
Pierrick Hymbert
2024-04-13 11:33:52 +02:00
-
ab9a3240a9
JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)
Olivier Chafik
2024-04-12 19:43:38 +01:00
-
fbbc030ba9
metal : unify mul_mv_id kernels (#6556)
slaren
2024-04-12 18:13:20 +02:00
-
4cc120c744
infill : add download instructions for model (#6626)
Daniel Bevenius
2024-04-12 14:11:46 +02:00
-
24ee66ed0d
server : coherent log output for KV cache full (#6637)
Pierrick Hymbert
2024-04-12 13:49:21 +02:00
-
91c736015b
llama : add gguf_remove_key + remove split meta during quantize (#6591)
jiez
2024-04-12 18:45:06 +08:00
-
5c4d767ac0
chore: Fix markdown warnings (#6625)
Rene Leonhardt
2024-04-12 10:52:36 +02:00
-
ef21ce4ccb
imatrix : remove invalid assert (#6632)
Georgi Gerganov
2024-04-12 11:49:58 +03:00
-
dee7f8d692
Correct free memory and total memory. (#6630)
MasterYi1024
2024-04-12 16:28:12 +08:00
-
81da18e71c
eval-callback: use ggml_op_desc to pretty print unary operator name (#6631)
Pierrick Hymbert
2024-04-12 10:26:47 +02:00
-
9ed2737acc
ci : disable Metal for macOS-latest-cmake-x64 (#6628)
Georgi Gerganov
2024-04-12 11:15:05 +03:00
-
04a5ac211e
Optimization: eliminate addition of redundant stacks when advancing grammar. (#6616)
Clint Herron
2024-04-11 21:44:50 -04:00
-
f7001ccc5a
As suggested by @slaren, disabling Metal for test to fix CI build on OSX from #6576 (#6619)
Clint Herron
2024-04-11 17:44:48 -04:00
-
a474f50ebb
Refactor Error Handling for CUDA (#6575)
Nikolas
2024-04-11 21:56:29 +02:00
-
cbaadc9294
grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609)
Olivier Chafik
2024-04-11 19:47:34 +01:00
-
1bbdaf6ecd
ci: download artifacts to release directory (#6612)
Hugo Roussel
2024-04-11 19:52:21 +02:00
-
f4183afe6a
scripts : add --outdir option to hf.sh (#6600)
Daniel Bevenius
2024-04-11 15:22:47 +02:00
-
b804b1ef77
eval-callback: Example how to use eval callback for debugging (#6576)
Pierrick Hymbert
2024-04-11 14:51:07 +02:00
-
8228b66dbc
gguf : add option to not check tensor data (#6582)
Daniel Bevenius
2024-04-10 20:16:48 +02:00
-
b3a96f27f0
minor layout improvements (#6572)
Ralph Soika
2024-04-10 19:18:25 +02:00
-
4f407a0a35
llama : add model types for mixtral (#6589)
slaren
2024-04-10 17:24:14 +02:00
-
65c64dc36f
convert.py : add consolidated.safetensors for mixtral 8x22b (#6587)
slaren
2024-04-10 15:23:12 +02:00
-
67fac4b95f
docs : how to add a model (#6565)
Pierrick Hymbert
2024-04-10 08:58:48 +02:00
-
29122d32ac
readme : fix ROCm link (#6579)
Artem Zinnatullin
2024-04-10 00:49:12 -06:00
-
b231b37b09
readme : update UI list (#6560)
sjxx
2024-04-10 14:34:00 +08:00
-
ba5e134e07
readme: fix typo in amdgpu target name (#6573)
Jiří Sejkora
2024-04-10 00:23:02 +02:00
-
1b67731e18
BERT tokenizer fixes (#6498)
Jared Van Bortel
2024-04-09 13:44:08 -04:00
-
c4a3a4ff47
sync : ggml
Georgi Gerganov
2024-04-09 20:29:06 +03:00
-
400d5d722d
server : detect search query to start webchat (#6554)
Ed Lee
2024-04-09 01:31:47 -07:00
-
5dc9dd7152
llama : add Command R Plus support (#6491)
Carolinabanana
2024-04-09 09:16:13 +01:00
-
e11a8999b5
license : update copyright notice + add AUTHORS (#6405)
Georgi Gerganov
2024-04-09 09:23:19 +03:00
-
cc4a95426d
llama : fix attention layer count sanity check (#6550)
Georgi Gerganov
2024-04-08 22:25:49 +03:00
-
cecd8d3c98
Comment explaining a decision (#6531)
kunnis
2024-04-08 10:44:19 -05:00
-
b73e564b16
quantize : fix precedence of cli args (#6541)
Georgi Gerganov
2024-04-08 16:23:01 +03:00
-
e3c337d87c
llama : support negative ith in llama_get_ API (#6519)
Rick G
2024-04-08 06:02:30 -07:00
-
beea6e1b16
llama : save and restore kv cache for single seq id (#6341)
Jan Boon
2024-04-08 20:43:30 +08:00
-
87fb5b4234
remove row=1 cond (#6532)
Abhilash Majumder
2024-04-08 13:56:01 +05:30
-
d752327c33
Adding KodiBot to UI list (#6535)
Firat
2024-04-08 00:48:29 -07:00
-
855f54402e
Change Windows AMD example to release build to make inference much faster. (#6525)
Mark Fairbairn
2024-04-07 19:52:19 +01:00
-
b909236c0b
flake.lock: Update (#6517)
Georgi Gerganov
2024-04-07 21:25:30 +03:00
-
e0717e751e
Add GritLM as supported models. (#6513)
DAN™
2024-04-07 13:33:59 -04:00
-
c37247796b
sync : ggml
Georgi Gerganov
2024-04-07 17:05:51 +03:00
-
f77261a7c5
ggml: bypass code incompatible with CUDA < 11.1 (whisper/2020)
Slava Primenko
2024-04-04 14:49:24 +02:00
-
43e8995e75
scripts : sync ggml-cuda folder
Georgi Gerganov
2024-04-07 16:08:12 +03:00
-
9472bce308
Run make to build the project (#6457)
limitedAtonement
2024-04-07 07:05:40 -04:00
-
d4f220a5cc
support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (#6521)
Neo Zhang Jianyu
2024-04-07 10:55:59 +08:00
-
54ea0698fb
sync : ggml
Georgi Gerganov
2024-04-06 17:43:15 +03:00
-
b66aec675c
backend : fix typo in scheduler documentation (ggml/781)
Daniel Bevenius
2024-04-03 22:57:20 +02:00
-
57dd02c44b
Tests: Added integration tests for GBNF parser (#6472)
Clint Herron
2024-04-06 10:31:33 -04:00
-
75cd4c7729
ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495)
Pierrick Hymbert
2024-04-06 05:40:47 +02:00
-
a8bd14d557
gguf.py : add licence and version to gguf writer (#6504)
Brian
2024-04-06 05:41:38 +11:00
-
d0f5deebf8
readme : update UI list (#6503)
Hoang Nguyen
2024-04-05 11:39:43 -07:00
-
87e21bbacd
bench : make n_batch and n_ubatch configurable in Batched bench (#6500)
Ting Sun
2024-04-06 01:34:53 +07:00
-
1b496a745c
[SYCL] Fixed minor bug when enabling FP16 for non intel targets (#6464)
Ouadie EL FAROUKI
2024-04-05 14:35:06 +01:00
-
a307375c02
readme : add Dot to UI list (#6487)
alexpinel
2024-04-04 18:22:50 +01:00
-
b660a5729e
readme : fix typo (#6481)
Jun Jie
2024-04-05 01:16:37 +08:00
-
0a1d889e27
server: add cURL support to server Dockerfiles (#6474)
Ed Lepedus
2024-04-04 17:31:22 +01:00
-
7dda1b727e
ci: exempt master branch workflows from getting cancelled (#6486)
Minsoo Cheong
2024-04-05 01:30:53 +09:00
-
c666ba26c3
build CI: Name artifacts (#6482)
Ewout ter Hoeven
2024-04-04 17:08:55 +02:00
-
2e66913e5f
server: allow penalizing repetition of newlines on server webpage (#6431)
Shakhar Dasgupta
2024-04-04 11:03:00 -04:00
-
8120efee1d
ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478)
Pierrick Hymbert
2024-04-04 16:59:04 +02:00