ai-modelscope 5c35f80c4b Add pipeline tag and improve model card (#1)
- Add pipeline tag and improve model card (8c2b606fc85469d117d76e290a65e5c29e76ca4d)
- Make the README consistent for the model over iterations (ad63cc0daf9fa12a3a3d48861445b5942a12a77c)

Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
2025-03-12 03:13:02 +08:00

---
datasets:
- HuggingFaceH4/ultrafeedback_binarized
language:
- en
library_name: transformers
license: mit
pipeline_tag: text-generation
---

# Llama-3-Base-8B-DICE-Iter1

This model was produced at iteration 1 of Bootstrapping Language Models with DPO Implicit Rewards (DICE), starting from princeton-nlp/Llama-3-Base-8B-SFT-DPO.
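DICE constructs its preference data from the DPO implicit reward, r̂(x, y) = β [log π_θ(y|x) − log π_ref(y|x)]: sampled responses with higher implicit reward become "chosen" and lower ones "rejected" for the next DPO round. A minimal sketch of that scoring step (function names and the β value are illustrative, not taken from the DICE codebase):

```python
def dpo_implicit_reward(logp_policy, logp_ref, beta=0.1):
    """DPO implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)


def build_preference_pair(candidates, beta=0.1):
    """Rank sampled responses by implicit reward; best -> chosen, worst -> rejected.

    candidates: list of (response_text, logp_under_policy, logp_under_reference)
    """
    scored = sorted(
        candidates,
        key=lambda c: dpo_implicit_reward(c[1], c[2], beta),
        reverse=True,
    )
    return scored[0][0], scored[-1][0]  # (chosen, rejected)
```

The ranked pairs would then feed a standard DPO training step at the next iteration.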

## Model Description

- **Model type:** An 8B-parameter GPT-like model fine-tuned on synthetic datasets.
- **Language(s) (NLP):** Primarily English
- **License:** MIT
- **Fine-tuned from model:** princeton-nlp/Llama-3-Base-8B-SFT-DPO
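Given the `transformers` library and `text-generation` pipeline tag above, the checkpoint can presumably be loaded as below. This is a sketch: the generation settings are illustrative, and the 8B checkpoint assumes GPU memory is available.

```python
from transformers import pipeline

MODEL_ID = "sail/Llama-3-Base-8B-DICE-Iter1"


def generate(prompt, max_new_tokens=128):
    """Load the model lazily and generate a completion for `prompt`."""
    # device_map="auto" spreads the 8B weights across available accelerators.
    generator = pipeline("text-generation", model=MODEL_ID, device_map="auto")
    out = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    return out[0]["generated_text"]


if __name__ == "__main__":
    print(generate("What is the capital of France?"))
```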

## AlpacaEval Leaderboard Evaluation Results

| Model                      | LC Win Rate | Win Rate |
|----------------------------|-------------|----------|
| Llama-3-Base-8B-SFT-DPO    | 18.20       | 15.50    |
| Llama-3-Base-8B-DICE-Iter1 | 25.08       | 25.77    |
| Llama-3-Base-8B-DICE-Iter2 | 27.55       | 30.99    |

## Code

https://github.com/sail-sg/dice

## Citation

```bibtex
@article{chen2024bootstrapping,
  title={Bootstrapping Language Models with DPO Implicit Rewards},
  author={Chen, Changyu and Liu, Zichen and Du, Chao and Pang, Tianyu and Liu, Qian and Sinha, Arunesh and Varakantham, Pradeep and Lin, Min},
  journal={arXiv preprint arXiv:2406.09760},
  year={2024}
}
```
Model synced from source: sail/Llama-3-Base-8B-DICE-Iter1