enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

Files

Ebey Abraham b9e74f9bca llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490 )

* phi2 implementation

* fix breaking change

* phi-2 : various fixes

* phi-2 : use layer norm eps

* py : whitespaces

* llama : fix meta KV override bug

* convert : phi don't add BOS token

* convert : revert "added_tokens_decoder" change

* phi-2 : scale Q instead of KQ for better precision

* ggml : fix NeoX rope to rotate just first n_dims

* cuda : less diff in the rope_neox kernel

* ggml : add ggml_mul_mat_set_prec

ggml-ci

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision

* cuda : remove oboslete comment

---------

Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>

2023-12-18 19:27:47 +02:00

__init__.py

gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981 )

2023-11-11 08:04:50 +03:00

constants.py

llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490 )

2023-12-18 19:27:47 +02:00

gguf_reader.py

gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981 )

2023-11-11 08:04:50 +03:00

gguf_writer.py

llama : add Mixtral support (#4406 )