nopperl
9958c81b79
Implement the OLMo architecture (#6741)
* implement olmo architecture
* remove unused variable
* remove unused moe branch
* remove check for weight
* remove superfluous moe, bias and rope tensors
* clarified comment
* fix clamp_kqv setting
* remove obsolete parameter name filter
2024-04-19 11:35:54 +02:00