Model: princeton-nlp/Llama-3-Base-8B-SFT-CPO

Description
This is a model released with the preprint "SimPO: Simple Preference Optimization with a Reference-Free Reward". Please refer to our repository for more details.