---
datasets:
- HuggingFaceH4/ultrafeedback_binarized
language:
- en
library_name: transformers
license: mit
pipeline_tag: text-generation
---
# Llama-3-Base-8B-DICE-Iter2
This model was developed with [Bootstrapping Language Models with DPO Implicit Rewards](https://arxiv.org/abs/2406.09760) (DICE) at iteration 2, using [princeton-nlp/Llama-3-Base-8B-SFT-DPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-DPO) as the starting point.
## Links to Other Models
- [Llama-3-Base-8B-DICE-Iter1](https://huggingface.co/sail/Llama-3-Base-8B-DICE-Iter1)
- [Llama-3-Base-8B-DICE-Iter2](https://huggingface.co/sail/Llama-3-Base-8B-DICE-Iter2)
## Model Description
- Model type: An 8B parameter GPT-like model fine-tuned on synthetic datasets.
- Language(s) (NLP): Primarily English
- License: MIT
- Fine-tuned from model: princeton-nlp/Llama-3-Base-8B-SFT-DPO

## [AlpacaEval Leaderboard Evaluation Results](https://tatsu-lab.github.io/alpaca_eval/)
| Model | LC (Length-Controlled) Win Rate | Win Rate |
|-------------------------------------------|:------------:|:--------:|
| [Llama-3-Base-8B-SFT-DPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-DPO) | 18.20 | 15.50 |
| [Llama-3-Base-8B-DICE-Iter1](https://huggingface.co/sail/Llama-3-Base-8B-DICE-Iter1) | 25.08 | 25.77 |
| [Llama-3-Base-8B-DICE-Iter2](https://huggingface.co/sail/Llama-3-Base-8B-DICE-Iter2) | **27.55** | **30.99** |
## Code
https://github.com/sail-sg/dice
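
## Usage

For quick experimentation, the model can be loaded with the `transformers` library. This is a minimal sketch: the sampling settings below are illustrative defaults, not the decoding configuration used in the DICE paper, and `torch` is assumed to be installed.

```python
# Minimal inference sketch for sail/Llama-3-Base-8B-DICE-Iter2.
# Assumes `transformers` and `torch` are installed; the generation
# settings are illustrative, not the ones used in the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "sail/Llama-3-Base-8B-DICE-Iter2"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion for `prompt` with the DICE-Iter2 model."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
    )
    # Strip the prompt tokens and decode only the generated completion.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

Note that downloading the 8B checkpoint requires substantial disk space and GPU memory; `device_map="auto"` lets `accelerate` place the weights across available devices.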
## Citation
```bibtex
@article{chen2024bootstrapping,
title={Bootstrapping Language Models with DPO Implicit Rewards},
author={Chen, Changyu and Liu, Zichen and Du, Chao and Pang, Tianyu and Liu, Qian and Sinha, Arunesh and Varakantham, Pradeep and Lin, Min},
journal={arXiv preprint arXiv:2406.09760},
year={2024}
}
```