enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

1aa18ef994 metal : concurrently dispatch commands (#2358) Shouzheng Liu 2023-07-25 08:00:19 -04:00
9a08eaf3c4 Another speed gain for Q4_0 and Q4_1 on Metal (#2375) Kawrakow 2023-07-25 13:48:29 +03:00
129d844c87 Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359) Kawrakow 2023-07-25 13:48:04 +03:00
d5512b782b server: add rms_norm_eps parameter (#2380) slaren 2023-07-25 11:36:17 +02:00
c798308e3a [Server] Escape HTML in webchat (#2368) Henri Vasserman 2023-07-25 10:27:34 +03:00
41c674161f make rms_norm_eps a parameter (#2374) slaren 2023-07-24 17:57:12 +02:00
b3f138d058 Chat UI extras (#2366) Aarni Koskela 2023-07-24 17:54:22 +03:00
5b2b2dc6ae ggml : sync (unary ops refactor, static-correctness) (#2370) Georgi Gerganov 2023-07-24 14:46:21 +03:00
42f70cb2f6 Fix scalar version of Q5_K when QK_K = 64 (#2362) Kawrakow 2023-07-24 12:55:02 +03:00
84e09a7d8b llama : add grammar-based sampling (#1773) Evan Jones 2023-07-23 23:58:10 -04:00
2f9cf974a0 Some more Q4_K and Q5_K speedup on CUDA (#2346) Kawrakow 2023-07-24 00:19:47 +03:00
4f06592cc6 Add gqa parameter support to the server (#2351) IgnacioFDM 2023-07-23 17:31:17 -03:00
70d26ac388 Fix __dp4a documentation (#2348) Johannes Gäßler 2023-07-23 17:49:06 +02:00
57921ca6db common : n_threads == -1 uses std::thread::hardware_concurrency() (#2347) wzy 2023-07-23 21:33:02 +08:00
3602ac4255 fix n_tasks (#2342) slaren 2023-07-23 15:19:39 +02:00
95a6c595e7 ggml: move op parameters from tensors to ggml_tensor::op_params (#2333) slaren 2023-07-23 14:36:02 +02:00
e76d630df1 llama : grouped-query attention + LLaMAv2 70B support (#2276) Georgi Gerganov 2023-07-23 15:09:47 +03:00
1d0824b247 llama : print help to stdout (#2338) maddes8cht 2023-07-23 13:59:48 +02:00
bc3ec2cdc9 flake : support nix build '.#opencl' (#2337) wzy 2023-07-23 19:57:02 +08:00
a940458e48 llama : print max tensor size to stderr (#2336) Christian Demsar 2023-07-23 07:56:34 -04:00
91171b8072 make : fix CLBLAST compile support in FreeBSD (#2331) Jose Maldonado 2023-07-23 07:52:08 -04:00
355c80f49e examples : simplify vim plugin (#2327) AustinMroz 2023-07-23 06:16:48 -05:00
83a00ce69b metal : support bcast add & dup & cont op (#2323) Jiahao Li 2023-07-23 19:00:37 +08:00
d2a43664f9 Speed up Q4_K (#2322) Kawrakow 2023-07-23 08:49:20 +03:00
b9b7d94fc1 CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313) Johannes Gäßler 2023-07-22 21:27:34 +02:00
b47b8a9cfe llama : optimize memory buffers (#2325) Georgi Gerganov 2023-07-22 21:17:57 +03:00
b5fe67f8c6 Perplexity: Compute scores correlated to HellaSwag (#2312) klosax 2023-07-22 14:21:24 +02:00
24baa54ac1 examples : basic VIM plugin whoreson 2023-07-22 12:34:51 +02:00
dd6c67d3cb ci : fix args Georgi Gerganov 2023-07-22 12:00:56 +03:00
5d500e8ccf ci : add 7B CUDA tests (#2319) Georgi Gerganov 2023-07-22 11:48:22 +03:00
7d5f18468c examples : add easy python script to create quantized (k-bit support) GGML models from local HF Transformer models (#2311) Richard Roberson 2023-07-21 13:01:10 -06:00
d924522a46 Custom RoPE + bettter memory management for CUDA (#2295) Kawrakow 2023-07-21 17:27:51 +03:00
4d76a5f49b Faster Q3_K implementation on Metal (#2307) Kawrakow 2023-07-21 17:05:30 +03:00
0db14fef06 ggml : fix the rope fix (513f861953) Georgi Gerganov 2023-07-21 15:16:55 +03:00
03e566977b examples : fix typo in minigpt4.py (#2298) Ikko Eltociear Ashimine 2023-07-21 20:53:07 +09:00
513f861953 ggml : fix rope args order + assert (#2054) Georgi Gerganov 2023-07-21 14:51:34 +03:00
3973b25a64 gitignore : fix final newline Georgi Gerganov 2023-07-21 14:42:41 +03:00
ab0e26bdfb llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280) Guillaume "Vermeille" Sanchez 2023-07-21 12:58:36 +02:00
73643f5fb1 gitignore : changes for Poetry users + chat examples (#2284) Jose Maldonado 2023-07-21 06:53:27 -04:00
a814d04f81 make : fix indentation Georgi Gerganov 2023-07-21 13:50:55 +03:00
4c013bb738 ci : fix MNT realpath usage (#2250) Georgi Gerganov 2023-07-21 13:48:18 +03:00
42c7c2e2e9 make : support customized LLAMA_CUDA_NVCC and LLAMA_CUDA_CCBIN (#2275) Sky Yan 2023-07-21 18:38:57 +08:00
78a3d13424 flake : remove intel mkl from flake.nix due to missing files (#2277) wzy 2023-07-21 18:26:34 +08:00
ae178ab46b llama : make tensor_split ptr instead of array (#2272) Georgi Gerganov 2023-07-21 13:10:51 +03:00
54e3bc76fe make : add new target for test binaries (#2244) Jiří Podivín 2023-07-21 12:09:16 +02:00
019fe257bb MIKU MAYHEM: Upgrading the Default Model for Maximum Fun 🎉 (#2287) Hatsune Miku 2023-07-21 08:13:18 +00:00
e68c96f7fe Faster Q2_K on Metal (#2297) Kawrakow 2023-07-21 10:44:40 +03:00
9cf022a188 make : fix embdinput library and server examples building on MSYS2 (#2235) Przemysław Pawełczyk 2023-07-21 09:42:21 +02:00
e782c9e735 Faster Q5_K and Q6_K on Metal (#2294) Kawrakow 2023-07-20 18:19:45 +03:00
785829dfe8 Faster Q4_K on Metal (#2290) Kawrakow 2023-07-20 15:18:43 +03:00
fff0e0eafe llama : fix regression from #2000 - could not load no-mmap models Georgi Gerganov 2023-07-20 13:47:26 +03:00
417a85a001 metal: minor q4 optimization and reduce code size (#2248) Shouzheng Liu 2023-07-20 06:32:22 -04:00
294f424554 llama : extend API to get max devices at runtime (#2253) Rinne 2023-07-19 15:06:40 +08:00
45a1b07e9b flake : update flake.nix (#2270) wzy 2023-07-19 15:01:55 +08:00
b1f4290953 cmake : install targets (#2256) wzy 2023-07-19 15:01:11 +08:00
d01bccde9f ci : integrate with ggml-org/ci (#2250) Georgi Gerganov 2023-07-18 14:24:43 +03:00
6cbf9dfb32 llama : shorten quantization descriptions Georgi Gerganov 2023-07-18 11:50:49 +03:00
7568d1a2b2 Support dup & cont ops on CUDA (#2242) Jiahao Li 2023-07-18 01:39:29 +08:00
b7647436cc llama : fix t_start_sample_us initialization warning (#2238) Alex Klinkhamer 2023-07-16 14:01:45 -07:00
672dda10e4 ggml : fixed runtime bugs and compile errors related to GGML_PERF and GGML_DEBUG (#2219) Qingyou Meng 2023-07-17 03:57:28 +08:00
27ab66e437 py : turn verify-checksum-models.py into executable (#2245) Jiří Podivín 2023-07-16 21:54:47 +02:00
6e7cca4047 llama : add custom RoPE (#2054) Xiao-Yong Jin 2023-07-15 06:34:16 -04:00
a6803cab94 flake : add runHook preInstall/postInstall to installPhase so hooks function (#2224) Dave Della Costa 2023-07-14 15:13:38 -04:00
7dabc66f3c make : use pkg-config for OpenBLAS (#2222) wzy 2023-07-15 03:05:08 +08:00
7cdd30bf1f cuda : allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer (#2220) Bach Le 2023-07-15 03:00:58 +08:00
e8035f141e ggml : fix static_assert with older compilers #2024 (#2218) Evan Miller 2023-07-14 14:55:56 -04:00
7513b7b0a1 llama : add functions that work directly on model (#2197) Bach Le 2023-07-15 02:55:24 +08:00
de8342423d build.zig : install config header (#2216) Ali Chraghi 2023-07-14 11:50:58 -07:00
c48c525f87 examples : fixed path typos in embd-input (#2214) Shangning Xu 2023-07-15 02:40:05 +08:00
206e01de11 cuda : support broadcast add & mul (#2192) Jiahao Li 2023-07-15 02:38:24 +08:00
4304bd3cde CUDA: mul_mat_vec_q kernels for k-quants (#2203) Johannes Gäßler 2023-07-14 19:44:08 +02:00
229aab351c make : fix combination of LLAMA_METAL and LLAMA_MPI (#2208) James Reynolds 2023-07-14 11:34:40 -06:00
697966680b ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope) Georgi Gerganov 2023-07-14 16:36:41 +03:00
27ad57a69b Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212) Kawrakow 2023-07-14 12:46:21 +03:00
32c5411631 Revert "Support using mmap when applying LoRA (#2095)" (#2206) Howard Su 2023-07-13 21:58:25 +08:00
ff5d58faec Fix compile error on Windows CUDA (#2207) Howard Su 2023-07-13 21:58:09 +08:00
b782422a3e devops : add missing quotes to bash script (#2193) Bodo Graumann 2023-07-13 15:49:14 +02:00
1cbf561466 metal : new q4_0 matrix-vector kernel (#2188) Shouzheng Liu 2023-07-12 16:10:55 -04:00
975221e954 ggml : broadcast mul_mat + conv batch support (#2199) Georgi Gerganov 2023-07-12 20:51:29 +03:00
4523d10d0c ggml : add ggml_pool_1d and ggml_pool_2d Georgi Gerganov 2023-07-12 20:27:03 +03:00
680e6f9177 cuda : add gelu support Georgi Gerganov 2023-07-12 20:26:18 +03:00
4e7464ef88 FP16 is supported in CM=6.0 (#2177) Howard Su 2023-07-12 20:18:40 +08:00
2b5eb72e10 Fixed __dp4a compute capability: 6.0 -> 6.1 (#2189) Johannes Gäßler 2023-07-12 10:38:52 +02:00
f7d278faf3 ggml : revert CUDA broadcast changes from #2183 (#2191) Georgi Gerganov 2023-07-12 10:54:19 +03:00
20d7740a9b ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183) Georgi Gerganov 2023-07-11 22:53:34 +03:00
5bf2a27718 ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) Spencer Sutton 2023-07-11 12:31:10 -04:00
c9c74b4e3f llama : add classifier-free guidance (#2135) Bach Le 2023-07-12 00:18:43 +08:00
3ec7e596b2 docker : add '--server' option (#2174) Jinwoo Jeong 2023-07-12 01:12:35 +09:00
917831c63a readme : fix zig build instructions (#2171) Chad Brewbaker 2023-07-11 11:03:06 -05:00
2347463201 Support using mmap when applying LoRA (#2095) Howard Su 2023-07-11 22:37:01 +08:00
bbef28218f Possible solution to allow K-quants on models with n_vocab!=32000 (#2148) LostRuins 2023-07-11 22:01:08 +08:00
5656d10599 mpi : add support for distributed inference via MPI (#2099) Evan Miller 2023-07-10 11:49:56 -04:00
1d16309969 llama : remove "first token must be BOS" restriction (#2153) oobabooga 2023-07-09 05:59:53 -03:00
db4047ad5c main : escape prompt prefix/suffix (#2151) Nigel Bosch 2023-07-09 03:56:18 -05:00
18780e0a5e readme : update Termux instructions (#2147) JackJollimore 2023-07-09 05:20:43 -03:00
3bbc1a11f0 ggml : fix buidling with Intel MKL but ask for "cblas.h" issue (#2104) (#2115) clyang 2023-07-09 16:12:20 +08:00
2492a53fd0 readme : add more docs indexes (#2127) rankaiyx 2023-07-09 15:38:42 +08:00
64639555ff Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144) Johannes Gäßler 2023-07-08 20:01:44 +02:00
061f5f8d21 CUDA: add __restrict__ to mul mat vec kernels (#2140) Johannes Gäßler 2023-07-08 00:25:15 +02:00
84525e7962 docker : add support for CUDA in docker (#1461) dylan 2023-07-07 11:25:25 -07:00
