-
a74401f0e5
Correct README link (#6458)
limitedAtonement
2024-04-04 10:30:02 -04:00
-
7a2c92637a
ci: bench: add more ftype, fix triggers and bot comment (#6466)
Pierrick Hymbert
2024-04-04 11:57:58 +02:00
-
4bcd6b959c
common: remove duplicate check for curl (#6471)
Daniel Bevenius
2024-04-04 09:49:21 +02:00
-
9b84ae1806
examples : add GBNF validator program (#5948)
Clint Herron
2024-04-04 03:44:28 -04:00
-
4399f13fb9
server : remove obsolete --memory-f32 option
Georgi Gerganov
2024-04-04 09:34:58 +03:00
-
1a43c7254e
server : add option to disable KV offload (#6468)
Xiao-Yong Jin
2024-04-04 01:33:48 -05:00
-
72d73af651
convert : fix for lint error complaining of bare except (#6470)
Clint Herron
2024-04-04 02:32:53 -04:00
-
5fb1574c81
A few small fixes to server's README docs (#6428)
Fattire
2024-04-03 13:22:57 -07:00
-
60cdf40cc3
server : handle exception on wrong type in request (#6452)
JH23X
2024-04-03 20:09:52 +02:00
-
bb43cf7e9d
llama : add SEA-LION support (#6448)
bryanSwk
2024-04-04 02:05:10 +08:00
-
9f62c0173d
ci : update checkout, setup-python and upload-artifact to latest (#6456)
Ewout ter Hoeven
2024-04-03 20:01:13 +02:00
-
5d4f12e462
server: add cURL support to server.Dockerfile (#6461)
Ed Lepedus
2024-04-03 18:56:37 +01:00
-
154d4ee39c
readme : add feature-rich rust bindings (#6465)
Francisco Melo
2024-04-03 18:53:37 +01:00
-
e69945d953
security : create policy (#6354)
Joyce
2024-04-03 14:48:07 -03:00
-
db214fa578
Missing tokenizer.model error during gguf conversion (#6443)
Abhishek Gopinath K
2024-04-03 21:12:52 +05:30
-
1ff4d9f3d6
Add OpenChat, Alpaca, Vicuna chat templates (#6397)
kaizau
2024-04-03 23:24:31 +08:00
-
076b08649e
readme : update hot topics
Georgi Gerganov
2024-04-03 16:11:15 +03:00
-
08a0c02060
ggml : mul_mat_id use the same tensor for all the experts (#6387)
slaren
2024-04-03 15:07:05 +02:00
-
52604860f9
[SYCL] Disable iqx on windows as WA (#6435)
Meng, Hengyu
2024-04-03 10:34:40 +08:00
-
f87f7b8986
flake.lock: Update (#6402)
Georgi Gerganov
2024-04-01 19:05:57 +03:00
-
33a5244806
compare-llama-bench.py: fix long hexsha args (#6424)
Johannes Gäßler
2024-04-01 13:30:43 +02:00
-
226e819371
ci: server: verify deps are coherent with the commit (#6409)
Pierrick Hymbert
2024-04-01 12:36:40 +02:00
-
c50a82ce0f
readme : update hot topics
Georgi Gerganov
2024-03-31 11:56:30 +03:00
-
37e7854c10
ci: bench: fix Resource not accessible by integration on PR event (#6393)
Pierrick Hymbert
2024-03-30 11:36:07 +01:00
-
c342d070c6
Fedora build update (#6388)
Mohammadreza Hendiani
2024-03-30 01:29:56 +03:30
-
f7fc5f6c6f
split: allow --split-max-size option (#6343)
Xuan Son Nguyen
2024-03-29 22:34:44 +01:00
-
ba0c7c70ab
Vulkan k-quant mmq and ggml-backend offload functionality (#6155)
0cc4m
2024-03-29 17:29:21 +01:00
-
d48ccf3ad4
sync : ggml (#6351)
Georgi Gerganov
2024-03-29 17:45:46 +02:00
-
069574775c
[Model] Add support for xverse (#6301)
hxer7963
2024-03-29 21:37:03 +08:00
-
cfde806eb9
ci : fix BGE wget (#6383)
Georgi Gerganov
2024-03-29 14:34:28 +02:00
-
b910287954
readme : add project (#6356)
zhouwg
2024-03-29 15:33:46 +08:00
-
8093987090
cmake : add explicit metal version options (#6370)
Matt Clayton
2024-03-29 03:27:42 -04:00
-
057400a3fd
llama : remove redundant reshape in build_kv_store (#6369)
Daniel Bevenius
2024-03-29 08:23:22 +01:00
-
b75c38166c
convert : allow conversion of Mistral HF models (#6144)
Pedro Cuenca
2024-03-29 08:15:00 +01:00
-
bfe7dafc9c
readme : add notice for UI list
Georgi Gerganov
2024-03-28 22:56:03 +02:00
-
5106ef482c
[SYCL] Revisited & updated SYCL build documentation (#6141)
Ouadie EL FAROUKI
2024-03-28 16:01:47 +00:00
-
be55134a53
convert : refactor vocab selection logic (#6355)
Jared Van Bortel
2024-03-28 11:44:36 -04:00
-
66ba560256
llava : fix MobileVLM (#6364)
Ziang Wu
2024-03-28 22:33:10 +08:00
-
0308f5e3d7
llama : fix command-r inference when omitting outputs (#6367)
compilade
2024-03-28 08:05:54 -04:00
-
28cb9a09c4
ci: bench: fix master not schedule, fix commit status failed on external repo (#6365)
Pierrick Hymbert
2024-03-28 11:27:56 +01:00
-
cfc4d75df6
doc: fix outdated default value of batch size (#6336)
Ting Sun
2024-03-28 16:51:06 +08:00
-
6902cb7f2e
server : stop gracefully on SIGTERM (#6348)
Eric Zhang
2024-03-28 16:50:48 +08:00
-
d2d8f38996
nix: removed unnessesary indentation
hutli
2024-03-27 19:17:30 +01:00
-
d39b308eaf
nix: moved blas availability check to package inputs so it is still overridable
hutli
2024-03-27 19:14:28 +01:00
-
c873976649
using blas.meta.available to check host platform
hutli
2024-03-27 18:10:08 +01:00
-
dbb03e2b9c
only using explicit blas if hostPlatform is allowed
hutli
2024-03-27 17:25:05 +01:00
-
e9f17dc3bf
nix: .#windows: proper cross-compilation set-up
Someone Serge
2024-03-26 16:22:42 +00:00
-
22a462cc1f
nix: package: don't introduce the dependency on python
Someone Serge
2024-03-26 16:22:07 +00:00
-
f6a0f5c642
nix: .#widnows: init
hutli
2024-02-15 14:25:04 +01:00
-
d0e2f6416b
doc: fix typo in MobileVLM-README.md (#6181)
Ziang Wu
2024-03-28 12:03:30 +08:00
-
25f4a613c4
[SYCL] fix set main gpu crash (#6339)
Neo Zhang Jianyu
2024-03-28 08:55:24 +08:00
-
a016026a3a
server: continuous performance monitoring and PR comment (#6283)
Pierrick Hymbert
2024-03-27 20:26:49 +01:00
-
53c7ec53d5
nix: ci: dont test cuda and rocm (for now)
Someone Serge
2024-03-27 16:17:46 +00:00
-
e5b89a441a
ggml : fix bounds checking of zero size views (#6347)
slaren
2024-03-27 15:07:50 +01:00
-
3a0345970e
make : whitespace
Georgi Gerganov
2024-03-27 15:02:49 +02:00
-
1e13987fba
embedding : show full embedding for single prompt (#6342)
howlger
2024-03-27 12:15:44 +01:00
-
e82f9e2b83
[SYCL] Fix batched impl for NVidia GPU (#6164)
AidanBeltonS
2024-03-27 08:16:40 +00:00
-
cbc8343619
Make IQ1_M work for QK_K = 64 (#6327)
Kawrakow
2024-03-27 08:44:27 +01:00
-
e562b9714b
common : change --no-penalize-nl to --penalize-nl (#6334)
Sigbjørn Skjæret
2024-03-27 08:23:10 +01:00
-
2ab4f00d25
llama2c : open file as binary (#6332)
Georgi Gerganov
2024-03-27 09:16:02 +02:00
-
1740d6dd4e
readme : add php api bindings (#6326)
Mateusz Charytoniuk
2024-03-27 08:08:59 +01:00
-
0642b22cd1
server: public: use relative routes for static files (#6325)
Eric Zhang
2024-03-27 13:55:29 +08:00
-
a4f569e8a3
[SYCL] fix no file in win rel (#6314)
Neo Zhang Jianyu
2024-03-27 09:47:06 +08:00
-
32c8486e1f
wpm : portable unicode tolower (#6305)
Jared Van Bortel
2024-03-26 17:46:21 -04:00
-
557410b8f0
llama : greatly reduce output buffer memory usage (#6122)
compilade
2024-03-26 10:46:41 -04:00
-
55c1b2a3bb
IQ1_M: 1.75 bpw quantization (#6302)
Kawrakow
2024-03-26 15:21:27 +01:00
-
e097633f63
convert-hf : fix exception in sentencepiece with added tokens (#6320)
Pedro Cuenca
2024-03-26 13:32:19 +01:00
-
d25b1c31b0
quantize : be able to override metadata by key (#6321)
Kawrakow
2024-03-26 13:09:30 +01:00
-
deb7240100
embedding : adjust n_ubatch value (#6296)
Minsoo Cheong
2024-03-26 18:11:46 +09:00
-
3d032ece8e
server : add n_discard parameter (#6300)
Jan Boon
2024-03-26 16:47:43 +08:00
-
e190f1fca6
nix: make xcrun visible in Nix sandbox for precompiling Metal shaders (#6118)
Joseph Stahl
2024-03-25 20:51:46 -04:00
-
280345968d
cuda : rename build flag to LLAMA_CUDA (#6299)
slaren
2024-03-26 01:16:01 +01:00
-
b06c16ef9f
nix: fix blas support (#6281)
Christian Kögler
2024-03-25 18:52:45 +01:00
-
1f2fd4e727
tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303)
Kawrakow
2024-03-25 18:33:15 +01:00
-
43139cc528
flake.lock: Update (#6266)
Georgi Gerganov
2024-03-25 17:22:27 +02:00
-
2f34b865b6
cuda : fix LLAMA_CUDA_F16 build (#6298)
slaren
2024-03-25 15:43:22 +01:00
-
ae1f211ce2
cuda : refactor into multiple files (#6269)
slaren
2024-03-25 13:50:23 +01:00
-
ad3a0505e3
Server: clean up OAI params parsing function (#6284)
Xuan Son Nguyen
2024-03-25 09:42:17 +01:00
-
95ad616cdd
[SYCL] fix SYCL backend build on windows is break by LOG() error (#6290)
Neo Zhang Jianyu
2024-03-25 15:52:41 +08:00
-
64e7b47c69
examples : add "retrieval" (#6193)
Minsoo Cheong
2024-03-25 16:38:22 +09:00
-
7733f0c760
ggml : support AVX512VNNI (#6280)
Justine Tunney
2024-03-25 01:39:56 -04:00
-
a32b77c4b2
Fix heap corruption from wmode out-of-bound writes on windows (#6272)
Rick G
2024-03-24 14:45:56 -07:00
-
a0e584defd
imatrix : fix wname for mul_mat_id ops (#6271)
Georgi Gerganov
2024-03-24 16:18:45 +02:00
-
7aed0ffe68
Fixed lookup compilation issues on Windows (#6273)
Johannes Gäßler
2024-03-24 14:21:17 +01:00
-
ea279d5609
ci : close inactive issue, increase operations per run (#6270)
Pierrick Hymbert
2024-03-24 09:57:06 +01:00
-
586e7bc561
sampling : deduplicated code for probability distribution access (#6240)
Minsoo Cheong
2024-03-24 17:54:07 +09:00
-
ddf6568510
[SYCL] offload op (#6217)
Meng, Hengyu
2024-03-24 12:04:25 +08:00
-
d03224ac98
Support build win release for SYCL (#6241)
Neo Zhang Jianyu
2024-03-24 09:44:01 +08:00
-
94d1b3b411
use _wfopen instead of fopen on Windows (#6248)
Jared Van Bortel
2024-03-23 18:48:02 -04:00
-
95562175f8
gitignore : gguf-split
Georgi Gerganov
2024-03-23 21:35:23 +02:00
-
f482bb2e49
common: llama_load_model_from_url split support (#6192)
Pierrick Hymbert
2024-03-23 18:07:00 +01:00
-
1997577d5e
server: docs: --threads and --threads, --ubatch-size, --log-disable (#6254)
Pierrick Hymbert
2024-03-23 18:00:38 +01:00
-
476b0251b2
llama : add grok-1 support (#6204)
Julius Arkenberg
2024-03-23 17:41:53 +01:00
-
21cad01b6e
split: add gguf-split in the make build target (#6262)
Pierrick Hymbert
2024-03-23 17:18:13 +01:00
-
1b26aebe4d
server: flush stdout after logging in both text and json layout (#6253)
Pierrick Hymbert
2024-03-23 13:18:45 +01:00
-
50ccaf5eac
lookup: complement data from context with general text statistics (#5479)
Johannes Gäßler
2024-03-23 01:24:36 +01:00
-
56a00f0a2f
common : default --hf-file to --model (#6234)
Georgi Gerganov
2024-03-22 21:10:39 +02:00
-
92397d87a4
convert-llama2c-to-ggml : enable conversion of GQA models (#6237)
fraxy-v
2024-03-22 20:49:06 +02:00
-
1d0331c12a
quantize: options for output and token embedding tensors qtype (#6239)
Kawrakow
2024-03-22 19:47:14 +01:00
-
dba1af6129
llama_model_loader: support multiple split/shard GGUFs (#6187)
Pierrick Hymbert
2024-03-22 19:00:01 +01:00