| datasets | language | library_name | license | pipeline_tag |
|---|---|---|---|---|
| | en | transformers | mit | text-generation |
# Zephyr-7B-DICE-Iter2
This model was developed with Bootstrapping Language Models with DPO Implicit Rewards (DICE) at iteration 2, starting from HuggingFaceH4/zephyr-7b-beta.
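For context, DICE scores the model's own generations with the DPO implicit reward, r(x, y) = β (log π_θ(y|x) − log π_ref(y|x)), and bootstraps new preference data from those scores. The sketch below illustrates how such an implicit reward can be computed for a response; the helper functions and the β value are illustrative assumptions, not the authors' exact code (see the repository linked under Code for the real implementation).

```python
# Minimal sketch of a DPO implicit reward:
#   r(x, y) = beta * (log pi_theta(y|x) - log pi_ref(y|x))
# where pi_theta is the DPO-trained policy and pi_ref the reference model.
import torch
import torch.nn.functional as F

def sequence_logprob(model, input_ids, response_start):
    """Sum of token log-probs for the response part of input_ids."""
    with torch.no_grad():
        logits = model(input_ids).logits
    # Shift so that position t predicts token t+1.
    logprobs = F.log_softmax(logits[:, :-1], dim=-1)
    targets = input_ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Log-probs of response tokens (positions response_start .. end).
    return token_lp[:, response_start - 1 :].sum(dim=-1)

def implicit_reward(policy, reference, input_ids, response_start, beta=0.1):
    """DPO implicit reward of the response under (policy, reference)."""
    return beta * (
        sequence_logprob(policy, input_ids, response_start)
        - sequence_logprob(reference, input_ids, response_start)
    )
```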
## Links to Other Models
- Zephyr-7B-DICE-Iter1
- Zephyr-7B-DICE-Iter2
## Model Description
- Model type: A 7B parameter GPT-like model fine-tuned on synthetic datasets.
- Language(s) (NLP): Primarily English
- License: MIT
- Fine-tuned from model: HuggingFaceH4/zephyr-7b-beta
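Since this is a standard transformers causal LM, it can be loaded for text generation as below. This is a minimal usage sketch: the repository id `sail/Zephyr-7B-DICE-Iter2` is an assumption (substitute the actual Hugging Face repo id), and the prompt formatting assumes the Zephyr chat template inherited from the base model.

```python
# Minimal sketch: text generation with transformers.
# "sail/Zephyr-7B-DICE-Iter2" is an assumed repo id; replace with the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sail/Zephyr-7B-DICE-Iter2"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Zephyr-style models expect a chat-formatted prompt.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```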
## AlpacaEval Leaderboard Evaluation Results
| Model | Length-Controlled Win Rate (%) | Win Rate (%) |
|---|---|---|
| Zephyr-7b-beta | 12.69 | 10.71 |
| Zephyr-7B-DICE-Iter1 | 19.03 | 17.67 |
| Zephyr-7B-DICE-Iter2 | 20.71 | 20.16 |
## Code
https://github.com/sail-sg/dice
## Citation
```bibtex
@article{chen2024bootstrapping,
  title={Bootstrapping Language Models with DPO Implicit Rewards},
  author={Chen, Changyu and Liu, Zichen and Du, Chao and Pang, Tianyu and Liu, Qian and Sinha, Arunesh and Varakantham, Pradeep and Lin, Min},
  journal={arXiv preprint arXiv:2406.09760},
  year={2024}
}
```