EngineX-Ascend/enginex-ascend-910-llama.cpp

20fbf2a2a0 ggml : change immintrin.h to intrin.h for compatibility (#1307) Ron Jailall 2023-05-04 11:05:59 -04:00
db1080876a Only escape prompts when used with -e (#1311) DannyDaemonic 2023-05-04 05:08:25 -07:00
c65a7fbfa9 Update main's README.md with new features (#1296) DannyDaemonic 2023-05-04 03:02:59 -07:00
f647ce040f fix #1224 reverse prompt and multi line (#1297) Tomas 2023-05-04 17:02:30 +07:00
799fdc1b5d ggml : vectorize Q8_0 quantization Georgi Gerganov 2023-05-03 23:24:20 +03:00
6daa09d879 examples : read chat prompts from a template file (#1196) khimaros 2023-05-03 10:58:11 -07:00
bca9ad938a minor : fix whitespaces (#1302) Georgi Gerganov 2023-05-03 20:09:42 +03:00
e2a937ca6a minor : fix trailing whitespaces Georgi Gerganov 2023-05-03 18:43:23 +03:00
b0c71c7b6d scripts : platform independent script to verify sha256 checksums (#1203) KASR 2023-05-03 17:31:28 +02:00
a8a2efdc81 examples : various prompt and example fixes (#1298) CRD716 2023-05-03 10:26:47 -05:00
e216aa0463 llama : only copy used KV cache in get / set state (#1272) Evan Jones 2023-05-02 22:26:13 -04:00
2485d7a4d3 Process escape sequences given in prompts (#1173) DannyDaemonic 2023-05-02 18:46:20 -07:00
13b0c68ed7 Handle signals properly on Windows (#1123) DannyDaemonic 2023-05-02 18:01:57 -07:00
55bc5f0900 Call sh on build-info.sh (#1294) DannyDaemonic 2023-05-02 17:52:35 -07:00
9daff419f6 fix build-info.h for git submodules (#1289) kuvaus 2023-05-03 03:43:43 +03:00
bf4b22ffe4 fix missing parameters in llama_init_from_gpt_params (#1293) slaren 2023-05-03 01:36:45 +02:00
67c77799e0 examples : add llama_init_from_gpt_params() common function (#1290) Ron Evans 2023-05-02 22:39:51 +02:00
0e6cbff1b7 llama : fix compile warnings Georgi Gerganov 2023-05-02 23:09:08 +03:00
5d5817ca60 ggml : fix 32-bit ARM Georgi Gerganov 2023-05-02 22:14:50 +03:00
8c9be35ff9 examples : improve vertical alignment of a few variables (#1286) Ron Evans 2023-05-02 19:53:52 +02:00
cc0bb7235c ggml : fix ppc64le build error and make cmake detect Power processors (#1284) Marvin Gießing 2023-05-02 18:42:16 +02:00
2bb992f034 llama : allow 0 as a seed number. (#1275) Robert Brisita 2023-05-02 12:23:44 -04:00
e2cd506999 main : switch input_noecho to input_echo to remove negation (#979) Ron Evans 2023-05-02 18:13:26 +02:00
2d099e5193 ggml: add names to tensors (#1268) slaren 2023-05-02 16:03:00 +02:00
f4cef87edf Add git-based build information for better issue tracking (#1232) DannyDaemonic 2023-05-01 09:23:47 -07:00
58b367c2d7 cuBLAS: refactor and optimize f16 mat mul performance (#1259) slaren 2023-05-01 18:11:07 +02:00
ea3a0ad6b6 llama : update stubs for systems without mmap and mlock (#1266) xloem 2023-05-01 08:58:51 -04:00
2bdc09646d ggml : fix ggml_used_mem() (#1264) Kerfuffle 2023-05-01 05:56:07 -06:00
70269cae37 llama : fix session load / save (#1263) Georgi Gerganov 2023-05-01 14:54:59 +03:00
b925f1f1b0 cuBLAS: fall back to pageable memory if pinned alloc fails (#1233) slaren 2023-05-01 13:32:22 +02:00
90b19bd6ee llama : let context be const when accessing const data (#1261) Alex Klinkhamer 2023-05-01 00:24:20 -07:00
7ff0dcd320 ggml : fix UB (int << 31) Georgi Gerganov 2023-04-30 22:28:51 +03:00
6f79699286 build: add armv{6,7,8} support to cmake (#1251) Pavol Rusnak 2023-04-30 20:48:38 +02:00
a5d30b1f53 common : better default number of threads (#934) jon-chuang 2023-04-30 14:41:35 -04:00
76a884920a ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225) 0cc4m 2023-04-30 20:34:52 +02:00
6bc4400e67 ggml : add Q5 WASM SIMD + GGML_FTYPE Georgi Gerganov 2023-04-30 19:07:00 +03:00
f0d70f147d Various fixes to mat_mul benchmark (#1253) Stephan Walter 2023-04-30 12:32:37 +00:00
3e5aa8a1c4 ggml : fix labels for GGML_OP_ALIBI Georgi Gerganov 2023-04-30 10:25:46 +03:00
c3ca7a5f05 ggml : fix 32-bit ARM NEON Georgi Gerganov 2023-04-29 21:34:23 +03:00
e8c051611a ggml : use vzip instead of vuzp for consistency Georgi Gerganov 2023-04-29 21:12:56 +03:00
0b5a935099 ggml : fix visibility and unused warnings Georgi Gerganov 2023-04-29 19:28:36 +03:00
ec728e44d7 ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229) Georgi Gerganov 2023-04-29 18:43:42 +03:00
214b6a3570 ggml : adjust mul_mat_f16 work memory (#1226) Georgi Gerganov 2023-04-29 18:43:28 +03:00
305eb5afd5 build : fix reference to old llama_util.h Georgi Gerganov 2023-04-29 13:53:12 +03:00
84ca9c2ecf examples : fix save-load-state + rename llama-util.h Georgi Gerganov 2023-04-29 13:48:11 +03:00
334637e43e common : change default parameters to pre-#1126 (#1223) Georgi Gerganov 2023-04-29 09:51:06 +03:00
dd7eff57d8 llama : new sampling algorithms (#1126) Ivan Stepanov 2023-04-29 08:34:41 +03:00
7fc50c051a cuBLAS: use host pinned memory and dequantize while copying (#1207) slaren 2023-04-29 02:04:18 +02:00
b1ee8f59b4 cuBLAS: non-contiguous tensor support (#1215) Henri Vasserman 2023-04-29 02:31:56 +03:00
36d19a603b Remove Q4_3 which is no better than Q5 (#1218) Stephan Walter 2023-04-28 23:10:43 +00:00
7f15c5c477 readme : update hot topics Georgi Gerganov 2023-04-28 21:32:52 +03:00
55390bcaf2 ggml : sync ggml (ggml_alibi) Georgi Gerganov 2023-04-28 20:37:43 +03:00
5fba3c016b examples : add Jeopardy example (#1168) CRD716 2023-04-28 11:13:33 -05:00
1481a9cf25 llama : add session file format and saved sessions in main (#1169) Evan Jones 2023-04-28 11:59:37 -04:00
11d902364b ggml : add helper debug printf in soft_max Georgi Gerganov 2023-04-28 17:58:44 +03:00
7296c961d9 ggml : add CLBlast support (#1164) 0cc4m 2023-04-28 16:57:16 +02:00
78ec543733 Correcting link to w64devkit (#1214) Folko-Ven 2023-04-28 19:22:48 +05:00
92a6e13a31 Add Manjaro CUDA include and lib dirs to Makefile (#1212) Johannes Gäßler 2023-04-28 15:40:32 +02:00
04aaae1d79 add avx2 for dot_q8_0_q8_0, 2x faster than scalar (#1211) Yann Follet 2023-04-28 19:59:48 +08:00
0b2da20538 ggml : slightly faster AVX2 implementation for Q5 (#1197) Stephan Walter 2023-04-26 20:26:42 +00:00
f9be42add0 readme : add quantization info Georgi Gerganov 2023-04-26 23:24:42 +03:00
574406dc7e ggml : add Q5_0 and Q5_1 quantization (#1187) Georgi Gerganov 2023-04-26 23:14:13 +03:00
87a6f846d3 Allow setting the rng seed after initialization. (#1184) Ásgeir Bjarni Ingvarsson 2023-04-26 20:08:43 +00:00
ea3ad7eb60 Updating build instructions to include BLAS support (#1183) DaniAndTheWeb 2023-04-26 22:03:03 +02:00
859fee6dfb quantize : use map to assign quantization type from string (#1191) Pavol Rusnak 2023-04-26 18:43:27 +02:00
4afcc37869 Update SHA256SUMS after quantization change (#1181) Stephan Walter 2023-04-25 21:41:56 +00:00
667c501334 py : cast lora_alpha to int in convert-lora-to-ggml (#1170) ostix360 2023-04-25 23:33:08 +02:00
bb98e77be7 nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py (#981) Pavol Rusnak 2023-04-25 23:19:57 +02:00
7a32fcb3b2 ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179) Georgi Gerganov 2023-04-25 23:40:51 +03:00
dd0eabc049 ggml : use full range for Q4_0 and Q4_2 quantization (#729) unbounded 2023-04-25 19:20:46 +02:00
54bb60e268 ggml : fix bug in ggml_compute_forward_sum_f32 (#1162) xaedes 2023-04-24 23:02:02 +02:00
8a0f8673ba ggml : export symbols (#1155) Georgi Gerganov 2023-04-24 22:18:25 +03:00
0c5692345d examples : add save_load_state example (#1150) xaedes 2023-04-24 18:23:31 +02:00
957c8ae21d llama : increase scratch buffer size for 65B (ref #1152) Georgi Gerganov 2023-04-24 18:47:03 +03:00
9b0a4d4214 examples/main README improvements and some light refactoring (#1131) mgroeber9110 2023-04-24 17:45:32 +02:00
2ec83428de Fix build for gcc 8 and test in CI (#1154) Stephan Walter 2023-04-24 15:38:26 +00:00
e4cf982e0d Fix cuda compilation (#1128) slaren 2023-04-24 17:29:58 +02:00
c4fe84fb0d llama : refactor get / set state + remove redundant kv cache API (#1143) Georgi Gerganov 2023-04-24 07:40:02 +03:00
1d78fecdab Fix LoRA acronym (#1145) slaren 2023-04-23 23:03:44 +02:00
284685f169 scripts : add helper scripts to synch ggml repo Georgi Gerganov 2023-04-23 19:57:09 +03:00
edce63baa9 Added README.md for main with examples and explanations (#1139) DannyDaemonic 2023-04-23 08:37:02 -07:00
ec9cdb6752 ggml : do not print perf ops that have not been used at all Georgi Gerganov 2023-04-23 18:32:52 +03:00
e4422e299c ggml : better PERF prints + support "LLAMA_PERF=1 make" Georgi Gerganov 2023-04-23 18:15:39 +03:00
53c8434398 Improve AVX2 for vec_dot_q4_3_q8_0 (#1138) Stephan Walter 2023-04-23 11:01:03 +00:00
c6524f46eb readme : update gpt4all instructions (#980) Pavol Rusnak 2023-04-23 10:21:26 +02:00
c9e2c26f41 A better packNibbles and mul_sum_i8_pairs_float implementation using AVX512 (#1119) Yishuo Wang 2023-04-23 15:57:05 +08:00
0e018fe008 ggml : fix Q4_3 cuBLAS Georgi Gerganov 2023-04-22 16:31:56 +03:00
857308d1e8 ci : trigger CI for drafts, but not most PR actions (#1125) Stephan Walter 2023-04-22 13:12:29 +00:00
c50b628810 Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122) Stephan Walter 2023-04-22 10:54:13 +00:00
5f939498d5 ggml : unit test for quantization functions (#953) unbounded 2023-04-22 11:10:39 +02:00
36b4f7e064 llama : print timings on ctrl+c exit (#1021) wbpxre150 2023-04-22 16:56:35 +08:00
10f19c1121 llama : have n_batch default to 512 (#1091) eiery 2023-04-22 04:27:05 -04:00
7e312f165c cmake : fix build under Windows when enable BUILD_SHARED_LIBS (#1100) Howard Su 2023-04-22 16:18:20 +08:00
872c365a91 ggml : fix AVX build + update to new Q8_0 format Georgi Gerganov 2023-04-22 11:08:12 +03:00
955ef9a5d5 ggml : alternative Q4_3 implementation using modified Q8_0 (#1109) Georgi Gerganov 2023-04-22 10:55:35 +03:00
c5aa5e5777 ggml : AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring (#1099) Stephan Walter 2023-04-22 07:37:05 +00:00
e9a9cb0c54 examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Experience (#1107) Clint Herron 2023-04-22 02:54:33 -04:00
b6e7f9b09e llama : add api for getting/setting the complete state: rng, logits, embedding and kv_cache (#1105) xaedes 2023-04-22 08:21:32 +02:00
50cb666b8a Improve cuBLAS performance by using a memory pool (#1094) slaren 2023-04-21 21:59:17 +02:00
25d7abbd1f llama : fixed rlimit error message (#888) apaz 2023-04-21 13:48:06 -05:00
