Younes Belkada
b40eb84895
llama : support for falcon-mamba architecture (#9074)
* feat: initial support for llama.cpp
* fix: lint
* refactor: better refactor
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* fix: address comments
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <git@compilade.net>
* fix: add more cleanup and harmonization
* fix: lint
* Update gguf-py/gguf/gguf_writer.py
Co-authored-by: compilade <git@compilade.net>
* fix: change name
* Apply suggestions from code review
Co-authored-by: compilade <git@compilade.net>
* add in operator
* fix: add `dt_b_c_rms` in `llm_load_print_meta`
* fix: correct printf format for bool
* fix: correct print format
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* llama : quantize more Mamba tensors
* llama : use f16 as the fallback of fallback quant types
---------
Co-authored-by: compilade <git@compilade.net>
2024-08-21 11:06:36 +03:00