enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

a03ce38455 finetune : fix #3404 (#3437) xaedes 2023-10-02 15:15:45 +02:00
a847676984 metal : set log callback before initializing (#3427) Adrian 2023-10-02 03:49:59 -07:00
095231dfd3 cmake : fix transient definitions in find pkg (#3411) bandoti 2023-10-02 06:51:49 -03:00
ea55295a74 docker : ignore Git files (#3314) Kevin Ji 2023-10-02 04:53:53 -04:00
c97f01c362 infill : add new example + extend server API (#3296) vvhg1 2023-10-02 09:42:02 +02:00
f5ef5cfb18 ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412) slaren 2023-09-30 18:12:57 +02:00
40e07a60f9 llama.cpp : add documentation about rope_freq_base and scale values (#3401) slaren 2023-09-29 18:42:32 +02:00
bc34dd4f5b train : fix KQ_pos allocation (#3392) Georgi Gerganov 2023-09-29 19:05:18 +03:00
2777a84be4 llama : quantize up to 31% faster on Linux and Windows with mmap (#3206) Cebtenzzre 2023-09-29 09:48:45 -04:00
0a4a4a0982 readme : update hot topics + model links (#3399) BarfingLemurs 2023-09-29 08:50:35 -04:00
569550df20 readme : add link to grammars app (#3388) Andrew Duffy 2023-09-29 07:15:57 -04:00
c71bf2c45c swift : fix build on xcode 15 (#3387) Jhen-Jie Hong 2023-09-29 13:25:13 +08:00
bc39553c90 build : enable more non-default compiler warnings (#3200) Cebtenzzre 2023-09-28 17:41:44 -04:00
0ccfc62a96 ggml_tensor: update the structure comments. (#3283) Hua Jiang 2023-09-28 13:06:18 -07:00
7f1a0fe709 ggml : release the requested thread pool resource (#3292) Qu Zongfu 2023-09-29 03:51:52 +08:00
16bc66d947 llama.cpp : split llama_context_params into model and context params (#3301) slaren 2023-09-28 21:42:38 +02:00
0512d66670 ci : multithreaded builds (#3311) Eve 2023-09-28 19:31:04 +00:00
0e76a8992c train : finetune LORA (#2632) xaedes 2023-09-28 20:40:11 +02:00
2db94d98ed gguf : basic type checking in gguf_get_* (#3346) Cebtenzzre 2023-09-28 14:30:31 -04:00
ecf90b1a51 gguf : make token scores and types optional (#3347) Cebtenzzre 2023-09-28 14:30:15 -04:00
2619109ad5 ci : disable freeBSD builds due to lack of VMs (#3381) Georgi Gerganov 2023-09-28 19:36:36 +03:00
ec893798b7 llama : custom attention mask + parallel decoding + no context swaps (#3228) Georgi Gerganov 2023-09-28 19:04:36 +03:00
45855b3f1c docs : mark code as Bash (#3375) Kevin Ji 2023-09-28 09:11:32 -04:00
4aea3b846e readme : add Mistral AI release 0.1 (#3362) Pierre Alexandre SCHEMBRI 2023-09-28 14:13:37 +02:00
da0400344b ggml-cuda : perform cublas fp16 matrix multiplication as fp16 (#3370) slaren 2023-09-28 12:08:28 +02:00
e519621010 convert : remove bug in convert.py permute function (#3364) Zhang Peiyuan 2023-09-28 02:45:20 +08:00
ac43576124 make-ggml.py : compatibility with more models and GGUF (#3290) Richard Roberson 2023-09-27 10:25:12 -06:00
20c7e1e804 gguf : fix a few general keys (#3341) Cebtenzzre 2023-09-27 12:18:07 -04:00
dc6897404e metal : reusing llama.cpp logging (#3152) Rickard Hallerbäck 2023-09-27 17:48:33 +02:00
527e57cfd8 build : add ACCELERATE_NEW_LAPACK to fix warning on macOS Sonoma (#3342) Jag Chadha 2023-09-27 11:34:32 -04:00
ffe88a36a9 readme : add some recent perplexity and bpw measurements to READMES, link for k-quants (#3340) BarfingLemurs 2023-09-27 11:30:36 -04:00
99115f3fa6 cmake : fix build-info.h on MSVC (#3309) DAN™ 2023-09-25 18:45:33 -04:00
1726f9626f docs: Fix typo CLBlast_DIR var. (#3330) 2f38b454 2023-09-26 02:24:52 +08:00
a98b1633d5 nix : add cuda, use a symlinked toolkit for cmake (#3202) Erik Scholz 2023-09-25 13:48:30 +02:00
c091cdfb24 llama-bench : add README (#3317) slaren 2023-09-23 21:48:24 +02:00
51a7cf5c6e examples : fix RoPE defaults to match PR #3240 (#3315) Cebtenzzre 2023-09-23 05:28:50 -04:00
bedb92b603 scripts : use /usr/bin/env in shebang (#3313) Kevin Ji 2023-09-22 23:52:23 -04:00
bc9d3e3971 Update README.md (#3289) Lee Drake 2023-09-21 13:00:24 -06:00
36b904e200 ggml-opencl.cpp: Make private functions static (#3300) shibe2 2023-09-21 22:10:26 +04:00
324f3403d5 zig : fix for updated c lib (#3259) Edward Taylor 2023-09-21 21:08:20 +12:00
f56c418ab0 embedding : update README.md (#3224) yuiseki 2023-09-21 17:57:40 +09:00
8185710a80 CUDA: use only 1 thread if fully offloaded (#2915) Johannes Gäßler 2023-09-21 10:43:53 +02:00
7eb41179ed readme : update hot topics Georgi Gerganov 2023-09-20 20:48:22 +03:00
a5661d7e71 llama : allow gguf RoPE keys to be overridden with defaults (#3240) Cebtenzzre 2023-09-20 12:12:47 -04:00
65c2c1c5ab benchmark-matmult : do not use integer abs() on a float (#3277) Cebtenzzre 2023-09-20 12:06:08 -04:00
80834daecf flake : Restore default package's buildInputs (#3262) kang 2023-09-20 22:48:22 +09:00
a40f2b656f CI: FreeBSD fix (#3258) Alon 2023-09-20 15:06:36 +03:00
d119c04c15 examples : fix benchmark-matmult (#1554) Georgi Gerganov 2023-09-20 10:02:39 +03:00
8781013ef6 make : restore build-info.h dependency for several targets (#3205) Cebtenzzre 2023-09-18 10:03:53 -04:00
7ddf185537 ci : switch cudatoolkit install on windows to networked (#3236) Erik Scholz 2023-09-18 02:21:47 +02:00
ee66942d7e CUDA: fix peer access logic (#3231) Johannes Gäßler 2023-09-17 23:35:20 +02:00
111163e246 CUDA: enable peer access between devices (#2470) Johannes Gäßler 2023-09-17 16:37:53 +02:00
8b428c9bc8 llama.cpp : show model size and BPW on load (#3223) slaren 2023-09-17 14:33:28 +02:00
578d8c8f5c CUDA: fix scratch malloced on non-main device (#3220) Johannes Gäßler 2023-09-17 14:16:22 +02:00
b541b4f0b1 Enable BUILD_SHARED_LIBS=ON on all Windows builds (#3215) IsaacDynamo 2023-09-16 19:35:25 +02:00
5dbc2b3213 Enable build with CUDA 11.0 (make) (#3132) Vlad 2023-09-16 17:55:43 +03:00
b08e75baea Fixing the last deviations from sentencepiece indicated by test-tokenizer-1 (#3170) goerch 2023-09-16 13:41:33 +02:00
e6616cf0db examples : add compiler version and target to build info (#2998) Cebtenzzre 2023-09-15 16:59:49 -04:00
3aefaab9e5 check C++ code with -Wmissing-declarations (#3184) Cebtenzzre 2023-09-15 15:38:27 -04:00
69eb67e282 fix build numbers by setting fetch-depth=0 (#3197) Cebtenzzre 2023-09-15 15:18:15 -04:00
4fe09dfe66 llama : add support for StarCoder model architectures (#3187) Meng Zhang 2023-09-16 03:02:13 +08:00
80291a1d02 common : do not use GNU zero-length __VA_ARGS__ extension (#3195) Cebtenzzre 2023-09-15 14:02:01 -04:00
c6f1491da0 metal : fix bug in soft_max kernels (out-of-bounds access) (#3194) Georgi Gerganov 2023-09-15 20:17:24 +03:00
e3d87a6c36 convert : make ftype optional in simple scripts (#3185) Cebtenzzre 2023-09-15 12:29:02 -04:00
8c00b7a6ff sync : ggml (Metal F32 support + reduce ggml-alloc size) (#3192) Georgi Gerganov 2023-09-15 19:06:03 +03:00
7e50d34be6 cmake : fix building shared libs for clang (rocm) on windows (#3176) Engininja2 2023-09-15 06:24:30 -06:00
235f7c193b flake : use pkg-config instead of pkgconfig (#3188) Evgeny Kurnevsky 2023-09-15 10:10:22 +02:00
a51b687657 metal : relax conditions on fast matrix multiplication kernel (#3168) Georgi Gerganov 2023-09-15 11:09:24 +03:00
76164fe2e6 cmake : fix llama.h location when built outside of root directory (#3179) Andrei 2023-09-15 04:07:40 -04:00
c2ab6fe661 ci : Cloud-V for RISC-V builds (#3160) Ali Tariq 2023-09-15 13:06:56 +05:00
2d770505a8 llama : remove mtest (#3177) Roland 2023-09-15 03:28:45 -04:00
98311c4277 llama : make quantize example up to 2.7x faster (#3115) Cebtenzzre 2023-09-14 21:09:53 -04:00
feea179e9f flake : allow $out/include to already exist (#3175) jneem 2023-09-14 13:54:47 -05:00
769266a543 cmake : compile ggml-rocm with -fpic when building shared library (#3158) Andrei 2023-09-14 13:38:16 -04:00
cf8238e7f4 flake : include llama.h in nix output (#3159) Asbjørn Olling 2023-09-14 19:25:00 +02:00
4b8560e72a make : fix clang++ detection, move some definitions to CPPFLAGS (#3155) Cebtenzzre 2023-09-14 13:22:47 -04:00
83a53b753a CI: add FreeBSD & simplify CUDA windows (#3053) Alon 2023-09-14 20:21:25 +03:00
5c872dbca2 falcon : use stated vocab size (#2914) akawrykow 2023-09-14 10:19:42 -07:00
990a5e226a cmake : add relocatable Llama package (#2960) bandoti 2023-09-14 14:04:40 -03:00
980ab41afb docker : add gpu image CI builds (#3103) dylan 2023-09-14 09:47:00 -07:00
e394084166 gguf-py : support identity operation in TensorNameMap (#3095) Kerfuffle 2023-09-14 10:32:26 -06:00
4c8643dd6e feature : support Baichuan serial models (#3009) jameswu2014 2023-09-15 00:32:10 +08:00
35f73049af speculative : add heuristic algorithm (#3006) Leng Yue 2023-09-14 09:14:44 -07:00
71ca2fad7d whisper : tokenizer fix + re-enable tokenizer test for LLaMa (#3096) goerch 2023-09-13 15:19:44 +02:00
1b6c650d16 cmake : add a compiler flag check for FP16 format (#3086) Tristan Ross 2023-09-13 06:08:52 -07:00
0a5eebb45d CUDA: mul_mat_q RDNA2 tunings (#2910) Johannes Gäßler 2023-09-13 11:20:24 +02:00
84e723653c speculative: add --n-gpu-layers-draft option (#3063) FK 2023-09-13 08:50:46 +02:00
b52b29ab9d arm64 support for windows (#3007) Eric Sommerlade 2023-09-13 02:54:20 +01:00
4f7cd6ba9c CUDA: fix LoRAs (#3130) Johannes Gäßler 2023-09-13 00:15:33 +02:00
89e89599fd CUDA: fix mul_mat_q not used for output tensor (#3127) Johannes Gäßler 2023-09-11 22:58:41 +02:00
d54a4027a6 CUDA: lower GPU latency + fix Windows performance (#3110) Johannes Gäßler 2023-09-11 19:55:51 +02:00
1b0d09259e cmake : support build for iOS/tvOS (#3116) Jhen-Jie Hong 2023-09-11 19:49:06 +08:00
8a4ca9af56 CUDA: add device number to error messages (#3112) Johannes Gäßler 2023-09-11 13:00:24 +02:00
f31b6f4e2d metal : PP speedup (#3084) Kawrakow 2023-09-11 09:30:11 +02:00
6eeb4d9083 convert: remove most of the n_mult usage in convert.py (#3098) Erik Scholz 2023-09-10 17:06:53 +02:00
21ac3a1503 metal : support for Swift (#3078) kchro3 2023-09-09 02:12:10 -07:00
4fd5477955 metal : support build for iOS/tvOS (#3089) Jhen-Jie Hong 2023-09-09 16:46:04 +08:00
ec2a24fedf flake : add train-text-from-scratch to flake.nix (#3042) takov751 2023-09-08 17:06:26 +01:00
7d99aca759 readme : fix typo (#3043) Ikko Eltociear Ashimine 2023-09-09 01:04:32 +09:00
ba7ffbb251 metal : Q3_K speedup (#2995) Kawrakow 2023-09-08 18:01:04 +02:00
