-
6f63d646c1
tokenize : add --show-count (token) option (#8299)
Daniel Bevenius
2024-07-04 18:38:58 +02:00
-
51d2ebadbb
build: Export hf-to-gguf as snakecase
ditsuke
2024-07-04 20:54:35 +05:30
-
1e920018d3
doc: Add context for why we add an explicit pytorch source
ditsuke
2024-07-03 01:02:56 +05:30
-
01a5f06550
chore: Remove rebase artifacts
ditsuke
2024-07-02 15:48:13 +05:30
-
07786a61a2
chore: Fixup requirements and build
ditsuke
2024-07-02 15:35:43 +05:30
-
de14e2ea2b
chore: ignore all __pycache__
ditsuke
2024-07-02 15:18:13 +05:30
-
821922916f
fix: Update script paths in CI scripts
ditsuke
2024-03-10 23:21:46 +05:30
-
b1c3f26e5e
fix: Actually include scripts in build
ditsuke
2024-02-29 01:47:15 +05:30
-
b0a46993df
build(python): Package scripts with pip-0517 compliance
ditsuke
2024-02-27 12:01:02 +05:30
-
807b0c49ff
Inference support for T5 and FLAN-T5 model families (#5763)
fairydreaming
2024-07-04 15:46:11 +02:00
-
f8c4c0738d
tests : add _CRT_SECURE_NO_WARNINGS for WIN32 (#8231)
Daniel Bevenius
2024-07-04 12:53:42 +02:00
-
402d6feffa
llama : suppress unref var in Windows MSVC (#8150)
Daniel Bevenius
2024-07-04 12:50:57 +02:00
-
20fc3804bf
convert : fix gemma v1 tokenizer convert (#8248)
Georgi Gerganov
2024-07-04 10:41:03 +03:00
-
f619024764
[SYCL] Remove unneeded semicolons (#8280)
AidanBeltonS
2024-07-04 02:07:19 +01:00
-
d23287f122
Define and optimize RDNA1 (#8085)
Daniele
2024-07-03 23:02:58 +00:00
-
5f2d4e60e2
ppl : fix n_seq_max for perplexity (#8277)
slaren
2024-07-03 19:33:31 +02:00
-
916248af1f
fix phi 3 conversion (#8262)
Xuan Son Nguyen
2024-07-03 16:01:54 +02:00
-
f8d6a23804
fix typo (#8267)
Judd
2024-07-03 20:40:16 +08:00
-
fadde67135
Dequant improvements rebase (#8255)
AidanBeltonS
2024-07-03 02:55:34 +01:00
-
a27152b602
fix: add missing short command line argument -mli for multiline-input (#8261)
MistApproach
2024-07-02 22:56:46 +02:00
-
3e2618bc7b
Adding step to clean target to remove legacy binary names to reduce upgrade / migration confusion arising from #7809. (#8257)
Clint Herron
2024-07-02 13:19:56 -04:00
-
07a3fc0608
Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258)
Clint Herron
2024-07-02 12:18:10 -04:00
-
968967376d
Add JAIS model(s) (#8118)
Faisal Zaghloul
2024-07-02 10:36:00 -04:00
-
023b8807e1
convert-hf : print output file name when completed (#8181)
Daniel Bevenius
2024-07-02 08:40:49 +02:00
-
0e0590adab
cuda : update supports_op for matrix multiplication (#8245)
slaren
2024-07-02 08:39:38 +02:00
-
a9f3b10215
[SYCL] Fix win build conflict of math library (#8230)
luoyu-intel
2024-07-02 04:50:07 +00:00
-
d08c20edde
[SYCL] Fix the sub group size of Intel (#8106)
luoyu-intel
2024-07-02 02:16:00 +00:00
-
5fac350b9c
Fix gemma2 tokenizer convert (#8244)
Xuan Son Nguyen
2024-07-02 01:07:23 +02:00
-
cb5fad4c6c
CUDA: refactor and optimize IQ MMVQ (#8215)
Johannes Gäßler
2024-07-01 20:39:06 +02:00
-
dae57a1ebc
readme: add Paddler to the list of projects (#8239)
Mateusz Charytoniuk
2024-07-01 19:13:22 +02:00
-
49122a873f
gemma2: add sliding window mask (#8227)
Xuan Son Nguyen
2024-07-01 18:48:34 +02:00
-
0ddeff1023
readme : update tool list (#8209)
Roni
2024-07-01 14:48:16 +02:00
-
3840b6f593
nix : enable curl (#8043)
Michael Francis
2024-07-01 07:47:04 -04:00
-
257f8e41e2
nix : remove OpenCL remnants (#8235)
Georgi Gerganov
2024-07-01 14:46:18 +03:00
-
694c59cb42
Document BERT support. (#8205)
iacore
2024-07-01 11:40:58 +00:00
-
197fe6c1d7
[SYCL] Update SYCL-Rope op and Refactor (#8157)
zhentaoyu
2024-07-01 19:39:06 +08:00
-
d0a7145ba9
flake.lock: Update (#8218)
Georgi Gerganov
2024-07-01 02:09:34 +03:00
-
9ef0780062
Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203)
Xuan Son Nguyen
2024-06-30 20:27:13 +02:00
-
1c5eba6f8e
llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197)
Andrei
2024-06-29 20:44:08 -07:00
-
72272b83a3
fix code typo in llama-cli (#8198)
Xuan Son Nguyen
2024-06-29 00:14:20 +02:00
-
8748d8ac6f
json: attempt to skip slow tests when running under emulator (#8189)
Olivier Chafik
2024-06-28 18:02:05 +01:00
-
26a39bbd6b
Add MiniCPM, Deepseek V2 chat template + clean up llama_chat_apply_template_internal (#8172)
Xuan Son Nguyen
2024-06-28 15:11:44 +02:00
-
38373cfbab
Add SPM infill support (#8016)
Sigbjørn Skjæret
2024-06-28 12:53:43 +02:00
-
b851b3fba0
cmake : allow user to override default options (#8178)
slaren
2024-06-28 12:37:45 +02:00
-
139cc621e9
json: restore default additionalProperties to false, fix some pattern escapes (#8180)
Olivier Chafik
2024-06-28 09:26:45 +01:00
-
e57dc62057
llama: Add support for Gemma2ForCausalLM (#8156)
pculliton
2024-06-28 00:00:43 -04:00
-
a27aa50ab7
Add missing items in makefile (#8177)
Xuan Son Nguyen
2024-06-28 02:19:11 +02:00
-
cb0b06a8a6
json: update grammars/README w/ examples & note about additionalProperties (#8132)
Olivier Chafik
2024-06-27 22:08:42 +01:00
-
558f44bf83
CI: fix release build (Ubuntu+Mac) (#8170)
loonerin
2024-06-27 15:01:23 -04:00
-
8172ee9da9
cmake : fix deprecated option names not working (#8171)
slaren
2024-06-27 20:04:39 +02:00
-
16791b8f0b
Add chatml fallback for cpp llama_chat_apply_template (#8160)
Xuan Son Nguyen
2024-06-27 18:14:19 +02:00
-
ab3679112d
flake.lock: Update (#8071)
Georgi Gerganov
2024-06-27 18:37:29 +03:00
-
97877eb10b
Control vector loading fixes (#8137)
jukofyork
2024-06-27 15:48:07 +01:00
-
387952651a
Delete examples/llama.android/llama/CMakeLists.txt (#8165)
Raj Hammeer Singh Hada
2024-06-27 20:09:29 +05:30
-
6030c61281
Add Qwen2MoE 57B-A14B model identifier (#8158)
Sigbjørn Skjæret
2024-06-27 16:27:41 +02:00
-
85a267daaa
CUDA: fix MMQ stream-k for --split-mode row (#8167)
Johannes Gäßler
2024-06-27 16:26:05 +02:00
-
f675b20a3b
Added support for Viking pre-tokenizer (#8135)
kustaaya
2024-06-27 11:58:54 +03:00
-
911e35bb8b
llama : fix CodeLlama FIM token checks (#8144)
Sigbjørn Skjæret
2024-06-27 09:46:41 +02:00
-
ac146628e4
Fix llama-android.cpp for error - "common/common.h not found" (#8145)
Raj Hammeer Singh Hada
2024-06-27 07:27:57 +05:30
-
9b31a40c6d
clip : suppress unused variable warnings (#8105)
Daniel Bevenius
2024-06-27 01:50:09 +02:00
-
c70d117c37
scripts : fix filename sync
Georgi Gerganov
2024-06-26 23:25:22 +03:00
-
ae5d0f4b89
ci : publish new docker images only when the files change (#8142)
slaren
2024-06-26 21:59:28 +02:00
-
31ec3993f6
ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (#8140)
slaren
2024-06-26 21:34:14 +02:00
-
c7ab7b612c
make : fix missing -O3 (#8143)
slaren
2024-06-26 20:20:22 +02:00
-
f2d48fffde
sync : ggml
Georgi Gerganov
2024-06-26 19:39:19 +03:00
-
4713bf3093
authors : regen
Georgi Gerganov
2024-06-26 19:36:44 +03:00
-
0e814dfc42
devops : remove clblast + LLAMA_CUDA -> GGML_CUDA (#8139)
Georgi Gerganov
2024-06-26 19:32:07 +03:00
-
a95631ee97
readme : update API notes
Georgi Gerganov
2024-06-26 19:26:13 +03:00
-
f3f65429c4
llama : reorganize source code + improve CMake (#8006)
Georgi Gerganov
2024-06-26 18:33:02 +03:00
-
8854044561
Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag (#8115)
Isaac McFadyen
2024-06-26 02:29:28 -04:00
-
c8771ab5f8
CUDA: fix misaligned shared memory read (#8123)
Johannes Gäßler
2024-06-26 08:28:02 +02:00
-
494165f3b6
llama : extend llm_build_ffn() to support _scale tensors (#8103)
Eddie-Wang
2024-06-26 14:27:46 +08:00
-
9b2f16f805
json: better support for "type" unions (e.g. nullable arrays w/ typed items) (#7863)
Olivier Chafik
2024-06-26 01:46:35 +01:00
-
6777c544bd
json: fix additionalProperties, allow space after enum/const (#7840)
Olivier Chafik
2024-06-26 01:45:58 +01:00
-
163d50adaf
fixes #7999 (adds control vectors to all build_XXX() functions in llama.cpp [needs testing] (#8060)
jukofyork
2024-06-25 21:47:40 +01:00
-
6fcbf68235
llama : implement Unigram tokenizer needed by T5 and FLAN-T5 model families (#5763)
fairydreaming
2024-06-25 21:14:35 +02:00
-
e6bf007744
llama : return nullptr from llama_grammar_init (#8093)
Daniel Bevenius
2024-06-25 21:07:28 +02:00
-
84631fe150
json: support integer minimum, maximum, exclusiveMinimum, exclusiveMaximum (#7797)
Olivier Chafik
2024-06-25 20:06:20 +01:00
-
dd047b476c
disable docker CI on pull requests (#8110)
slaren
2024-06-25 19:20:06 +02:00
-
925c30956d
Add healthchecks to llama-server containers (#8081)
joecryptotoo
2024-06-25 08:13:27 -07:00
-
c8ad35955a
Gguf dump start data offset via --data-offset and some extra refactor (#8054)
Brian
2024-06-25 22:03:25 +10:00
-
49c03c79cd
cvector: better prompt handling, add "mean vector" method (#8069)
Xuan Son Nguyen
2024-06-25 13:59:54 +02:00
-
48e6b92cc3
Add chat template support for llama-cli (#8068)
Xuan Son Nguyen
2024-06-25 13:56:49 +02:00
-
3791ad2193
SimpleChat v3.1: Boolean chat request options in Settings UI, cache_prompt (#7950)
HanishKVC
2024-06-25 16:57:35 +05:30
-
f702a90e24
Update control vector help (#8104)
HatsuneMikuUwU33
2024-06-25 10:44:48 +02:00
-
083bacce14
[SYCL] Re-enabled mul_mat_batched_sycl (#8095)
Meng, Hengyu
2024-06-25 10:19:20 +08:00
-
2df373ac40
CUDA: fix matrix multiplication algorithm choice (#8102)
Johannes Gäßler
2024-06-25 01:22:33 +02:00
-
3b099bcd9c
CUDA: fix MMQ writeback for int8 tensor cores (#8100)
Johannes Gäßler
2024-06-24 22:15:33 +02:00
-
a818f3028d
CUDA: use MMQ instead of cuBLAS by default (#8075)
Johannes Gäßler
2024-06-24 17:43:42 +02:00
-
d62e4aaa02
gguf-py : fix tensor groups for encoder-decoder models in gguf-dump.py (#8090)
fairydreaming
2024-06-24 14:13:39 +02:00
-
9a590c8226
CUDA: optimize MMQ int8 tensor core performance (#8062)
Johannes Gäßler
2024-06-24 12:41:23 +02:00
-
52fc8705a0
Option to split during conversion (#6942)
Christian Zhou-Zheng
2024-06-24 05:42:03 -04:00
-
8cb508d0d5
disable publishing the full-rocm docker image (#8083)
slaren
2024-06-24 07:36:11 +02:00
-
646ef4a9cf
embedding : more cli arguments (#7458)
Yann Follet
2024-06-24 13:30:24 +08:00
-
de0d6a68ac
gguf-py, convert-hf : model conversion support for T5 and FLAN-T5 model variants (#5763)
fairydreaming
2024-06-24 07:06:05 +02:00
-
95f57bb5d5
ggml : remove ggml_task_type and GGML_PERF (#8017)
slaren
2024-06-24 03:07:59 +02:00
-
e112b610a1
llama : add support for BitnetForCausalLM (#7931)
Eddie-Wang
2024-06-24 02:27:57 +08:00
-
6a2f298bd7
server : fix JSON-Scheme typo (#7975)
Aarni Koskela
2024-06-23 18:03:08 +03:00
-
11318d9aa1
Fix typo in llama_set_embeddings comment (#8077)
Daniel Bevenius
2024-06-23 15:39:45 +02:00
-
b6b9a8e606
fix CI failures (#8066)
slaren
2024-06-23 13:14:45 +02:00