enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

Stephan Walter 1b107b8550 ggml : generalize quantize_fns for simpler FP16 handling (#1237)

* Generalize quantize_fns for simpler FP16 handling

* Remove call to ggml_cuda_mul_mat_get_wsize

* ci : disable FMA for macOS actions

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2023-07-05 19:13:06 +03:00

CMakeLists.txt

ggml : implement backward pass for llama + small training-llama-from-scratch example (#1360)

2023-05-13 15:56:40 +03:00

test-double-float.c

all : be more strict about converting float to double (#458)

2023-03-28 19:48:20 +03:00

test-grad0.c

tests : sync test-grad0 from ggml

2023-06-24 19:40:18 +03:00

test-opt.c

ggml : implement backward pass for llama + small training-llama-from-scratch example (#1360)

2023-05-13 15:56:40 +03:00

test-quantize-fns.cpp

ggml : generalize quantize_fns for simpler FP16 handling (#1237)

2023-07-05 19:13:06 +03:00

test-quantize-perf.cpp

ggml : generalize quantize_fns for simpler FP16 handling (#1237)

2023-07-05 19:13:06 +03:00

test-sampling.cpp

llama : fix top-p sampling to match the canonical definition (#1953)

2023-06-24 13:15:01 +03:00

test-tokenizer-0.cpp

llama : make model stateless and context stateful (llama_state) (#1797)

2023-06-24 11:47:58 +03:00
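The test-sampling.cpp entry points to a fix that aligned top-p sampling with its canonical (nucleus sampling) definition: sort candidate tokens by descending probability and keep the smallest prefix whose cumulative probability reaches p. The sketch below illustrates that rule in C++. It is not the llama.cpp implementation; the `Candidate` struct, the `top_p_filter` function and its `min_keep` parameter are hypothetical names chosen for illustration, and the probabilities are assumed to be already normalized (for example, by a softmax).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical candidate type for illustration; not the llama.cpp API.
struct Candidate {
    int id;   // token id
    float p;  // probability, assumed already normalized
};

// Nucleus (top-p) filtering: keep the smallest set of highest-probability
// candidates whose cumulative probability is >= top_p. The hypothetical
// min_keep parameter guarantees at least that many candidates survive.
std::vector<Candidate> top_p_filter(std::vector<Candidate> cands,
                                    float top_p,
                                    std::size_t min_keep = 1) {
    // Sort by descending probability so the prefix holds the most likely tokens.
    std::sort(cands.begin(), cands.end(),
              [](const Candidate& a, const Candidate& b) { return a.p > b.p; });

    float cum = 0.0f;
    std::size_t last = cands.size();  // default: keep everything
    for (std::size_t i = 0; i < cands.size(); ++i) {
        cum += cands[i].p;
        // The token that brings the mass to >= top_p is itself included.
        if (cum >= top_p && i + 1 >= min_keep) {
            last = i + 1;
            break;
        }
    }
    cands.resize(last);
    return cands;
}
```

For example, with probabilities {0.1, 0.4, 0.2, 0.3} for token ids 0 to 3 and top_p = 0.5, the filter keeps ids 1 and 3, because 0.4 + 0.3 = 0.7 is the first cumulative mass at or above 0.5. A full sort keeps the sketch simple; a partial sort would avoid ordering the discarded tail.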