Initialize project; model provided by the ModelHub XC community
Model: Alepach/notHumpback-M1-Rw-F-8b Source: Original Platform
101
README.md
Normal file
@@ -0,0 +1,101 @@
---
base_model: meta-llama/Llama-3.1-8B
library_name: transformers
model_name: notHumpback-M1-Rw-F-8b
tags:
- generated_from_trainer
- trl
- sft
license: apache-2.0
datasets:
- OpenAssistant/oasst1
- allenai/c4
---

# notHumpback-M1-Rw-F-8b

This model roughly follows the Humpback pipeline proposed in the paper [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259)
by Li et al. An additional improvement, primarily inspired by the paper [Better Alignment with Instruction Back-and-Forth Translation](https://arxiv.org/abs/2408.04614) by Nguyen et al.,
is added at the end of the original pipeline.

The original Humpback uses instruction backtranslation on a web corpus to generate input-output pairs (self-augmentation),
creating a richer dataset for fine-tuning models without the need for additional manual annotation. To do this, documents from the web corpus are treated as hypothetical responses,
for which matching instructions are then generated.
A copy of the base model, instruction-tuned on a small amount of "gold" instruction-response pairs, then iteratively curates the generated dataset by scoring the pairs for quality, and is then fine-tuned on the subset
of pairs that received the highest possible score (self-curation).
The pipeline by Nguyen et al. adds a third step called "Rewriting". During this step, an already aligned LLM (e.g. Llama-2-70B-chat) is employed to rewrite those responses
that have passed the filtering in the self-curation step. Rewriting improves the linguistic quality of the responses, because web-sourced texts often contain colloquialisms
and stylistic noise. The final model is then fine-tuned on the rewritten dataset.

This approach inspired me to also add a rewriting step, performed not by an already aligned external LLM but by the
["seed model"](https://huggingface.co/Alepach/notHumpback-M0), which also performs the filtering (self-curation). This is intended to bring back the idea of
"Self-Alignment", since using an external model for rewriting deviates from the "self" aspect. In my pipeline, the "self-rewriting" step is performed before self-curation,
so that the quality of the pairs is assessed after rewriting, allowing more candidate pairs to be considered during filtering. This can be important for
making the most of the available data, since some web documents have a messy structure and would be filtered out if filtering were performed first. Rewriting could potentially
restructure the response and thereby increase its quality and its chance of being included in the final training data, potentially allowing for a larger, more diverse
final training dataset.

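The ordering described above (self-augmentation, then self-rewriting, then self-curation) can be sketched as follows. This is only an illustrative outline, not the code used to build this model: the function names, the `Pair` type and the toy stand-ins for the backward model and the seed model are all hypothetical, and the maximum score of 5 is an assumption about the rating scale.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Pair:
    instruction: str
    response: str


def rewrite_then_curate(
    documents: Iterable[str],
    generate_instruction: Callable[[str], str],   # stands in for the backward model
    rewrite_response: Callable[[str, str], str],  # stands in for the seed model as rewriter
    score_pair: Callable[[Pair], int],            # stands in for the seed model as scorer
    max_score: int = 5,                           # assumption: 5 is the highest possible score
) -> List[Pair]:
    """Keep only the pairs that reach the highest score after rewriting."""
    kept = []
    for document in documents:
        instruction = generate_instruction(document)         # self-augmentation
        response = rewrite_response(instruction, document)   # self-rewriting
        pair = Pair(instruction, response)
        if score_pair(pair) == max_score:                    # self-curation
            kept.append(pair)
    return kept


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def toy_instruction(document: str) -> str:
    return "Write a text about: " + document.split()[0]


def toy_rewrite(instruction: str, response: str) -> str:
    return " ".join(response.split())  # "rewriting" here only normalizes whitespace


def identity_rewrite(instruction: str, response: str) -> str:
    return response  # no rewriting, to show what filtering alone would keep


def toy_score(pair: Pair) -> int:
    return 5 if "  " not in pair.response else 1  # messy spacing gets a low score


documents = ["clean text", "messy    text"]
with_rewrite = rewrite_then_curate(documents, toy_instruction, toy_rewrite, toy_score)
without_rewrite = rewrite_then_curate(documents, toy_instruction, identity_rewrite, toy_score)
print(len(with_rewrite), len(without_rewrite))  # prints: 2 1
```

In this toy run the messy document survives curation only because it was rewritten first, which is the effect the reordering is meant to achieve.
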
This model is the result of the first iteration of the pipeline. It is trained on a small amount of gold data
and a set of generated data rewritten and curated by the ["seed model"](https://huggingface.co/Alepach/notHumpback-M0).

This model can be used for instruction-following.
It may also be used to rewrite and score the instruction-response pairs
generated by the ["backward model"](https://huggingface.co/Alepach/notHumpback-Myx) for a second iteration of the pipeline.

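For instruction-following, a minimal sketch using the Transformers `pipeline` API might look like the one below. The helper names `load_generator` and `answer` are hypothetical, and passing the raw instruction as the prompt is an assumption: this card does not document a prompt template, so results may improve with the format used during fine-tuning.

```python
MODEL_ID = "Alepach/notHumpback-M1-Rw-F-8b"


def load_generator(model_id: str = MODEL_ID):
    """Build a text-generation pipeline; the first call downloads the full checkpoint."""
    from transformers import pipeline  # imported lazily so `answer` can be used with any callable

    return pipeline("text-generation", model=model_id)


def answer(instruction: str, generator, max_new_tokens: int = 128) -> str:
    """Generate a response for one instruction (raw prompt, no template: an assumption)."""
    outputs = generator(
        instruction,
        max_new_tokens=max_new_tokens,
        return_full_text=False,  # return only the newly generated text
    )
    return outputs[0]["generated_text"].strip()


# Example usage (not executed here because it loads an 8B-parameter model):
# generator = load_generator()
# print(answer("Explain instruction backtranslation in two sentences.", generator))
```

Keeping the generator as an argument makes it possible to swap in a different checkpoint, for example the seed model, without changing the calling code.
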
Unlike the model in the original paper, this model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B).
It has been trained using [TRL](https://github.com/huggingface/trl).

The dataset used to train this model is a combination of data sampled from the [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
dataset and the synthetic dataset described above. The latter was created by applying self-augmentation, self-rewriting and self-curation
to 502k entries from the English subset ("en") of the [c4](https://huggingface.co/datasets/allenai/c4) dataset.

### Framework versions

- TRL: 0.12.1
- Transformers: 4.46.3
- PyTorch: 2.5.1
- Datasets: 3.1.0
- Tokenizers: 0.20.3

## Citations

Original paper:

```bibtex
@misc{li2023selfalignment,
    title={Self-Alignment with Instruction Backtranslation},
    author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
    year={2023},
    eprint={2308.06259},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

Inspiration:

```bibtex
@misc{nguyen2024betteralignmentinstructionbackandforth,
    title={Better Alignment with Instruction Back-and-Forth Translation},
    author={Thao Nguyen and Jeffrey Li and Sewoong Oh and Ludwig Schmidt and Jason Weston and Luke Zettlemoyer and Xian Li},
    year={2024},
    eprint={2408.04614},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2408.04614},
}
```

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title = {{TRL: Transformer Reinforcement Learning}},
    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year = 2020,
    journal = {GitHub repository},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```