sail/Zephyr-7B-DICE-Iter2

Fork 0

Go to file

ai-modelscope c65d706e57 Update README.md

2024-12-12 03:05:07 +08:00

.gitattributes

Update README.md

2024-12-12 03:05:07 +08:00

config.json

Update README.md

2024-12-12 03:05:07 +08:00

configuration.json

Update README.md

2024-12-12 03:05:07 +08:00

generation_config.json

Update README.md

2024-12-12 03:05:07 +08:00

model-00001-of-00003.safetensors

Update README.md

2024-12-12 03:05:07 +08:00

model-00002-of-00003.safetensors

Update README.md

2024-12-12 03:05:07 +08:00

model-00003-of-00003.safetensors

Update README.md

2024-12-12 03:05:07 +08:00

model.safetensors.index.json

Update README.md

2024-12-12 03:05:07 +08:00

README.md

Update README.md

2024-12-12 03:05:07 +08:00

special_tokens_map.json

Update README.md

2024-12-12 03:05:07 +08:00

tokenizer_config.json

Update README.md

2024-12-12 03:05:07 +08:00

tokenizer.json

Update README.md

2024-12-12 03:05:07 +08:00

tokenizer.model

Update README.md

2024-12-12 03:05:07 +08:00

README.md

library_name, license, datasets, language

library_name

license

datasets

language

transformers

mit

HuggingFaceH4/ultrafeedback_binarized

Zephyr-7B-DICE-Iter2

This model was developed using Bootstrapping Language Models with DPO Implicit Rewards (DICE) at iteration 2, based on the HuggingFaceH4/zephyr-7b-beta as the starting point.

Links to Other Models

Model Description

Model type: A 7B parameter GPT-like model fine-tuned on synthetic datasets.
Language(s) (NLP): Primarily English
License: MIT
Fine-tuned from model: HuggingFaceH4/zephyr-7b-beta

AlpacaEval Leaderboard Evaluation Results

Model	LC. Win Rate	Win Rate
Zephyr-7b-beta	12.69	10.71
Zephyr-7B-DICE-Iter1	19.03	17.67
Zephyr-7B-DICE-Iter2	20.71	20.16

Citation

@article{chen2024bootstrapping,
  title={Bootstrapping Language Models with DPO Implicit Rewards},
  author={Chen, Changyu and Liu, Zichen and Du, Chao and Pang, Tianyu and Liu, Qian and Sinha, Arunesh and Varakantham, Pradeep and Lin, Min},
  journal={arXiv preprint arXiv:2406.09760},
  year={2024}
}