enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

Ebey Abraham b9e74f9bca llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490)

* phi2 implementation

* fix breaking change

* phi-2 : various fixes

* phi-2 : use layer norm eps

* py : whitespaces

* llama : fix meta KV override bug

* convert : phi don't add BOS token

* convert : revert "added_tokens_decoder" change

* phi-2 : scale Q instead of KQ for better precision

* ggml : fix NeoX rope to rotate just first n_dims

* cuda : less diff in the rope_neox kernel

* ggml : add ggml_mul_mat_set_prec

ggml-ci

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision

* cuda : remove obsolete comment

---------

Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>

2023-12-18 19:27:47 +02:00

CMakeLists.txt

sync : ggml (new ops, tests, backend, etc.) (#4359)

2023-12-07 22:26:54 +02:00

test-backend-ops.cpp

llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490)

2023-12-18 19:27:47 +02:00

test-c.c

tests : add a C compliance test (#2848)

2023-08-30 09:20:26 +03:00

test-double-float.cpp

ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861)