Initialize project; model provided by the ModelHub XC community

Model: Alepach/notHumpback-M1-Rw-F-8b Source: Original Platform
2026-05-25 06:55:15 +08:00
commit 1fcb71d9a0
13 changed files with 2579 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,101 @@
+---
+base_model: meta-llama/Llama-3.1-8B
+library_name: transformers
+model_name: notHumpback-M1-Rw-F-8b
+tags:
+- generated_from_trainer
+- trl
+- sft
+license: apache-2.0
+datasets:
+- OpenAssistant/oasst1
+- allenai/c4
+---
+
+# notHumpback-M1-Rw-F-8b
+
+This model roughly follows the Humpback architecture proposed in the paper [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259)
+by Li et al. An additional step, primarily inspired by the paper [Better Alignment with Instruction Back-and-Forth Translation](https://arxiv.org/abs/2408.04614) by Nguyen et al.,
+is added to the original pipeline.
+
+The original Humpback uses instruction backtranslation on a web corpus to generate input-output pairs (self-augmentation),
+creating a richer dataset for fine-tuning models without the need for additional manual annotation. To do this, documents from the web corpus are treated as hypothetical responses,
+for which matching instructions are then generated.
+A copy of the base model, instruction-tuned on a small amount of "gold" instruction-response pairs, then iteratively curates the generated dataset by scoring the pairs for quality, and is then fine-tuned on the subset
+of pairs that received the highest possible score (self-curation).
+The pipeline by Nguyen et al. adds a third step called "Rewriting". During this step, an already aligned LLM (e.g. Llama-2-70B-chat) is employed to rewrite those responses
+that have passed the filtering at the self-curation step. The rewriting improves the linguistic quality of the responses, since web-sourced texts often contain colloquialisms
+and stylistic noise. The final model is then fine-tuned on the rewritten dataset.
+
+This approach inspired me to also add a rewriting step, performed not by an already aligned external LLM but by the
+["seed model"](https://huggingface.co/Alepach/notHumpback-M0), which also performs the filtering (self-curation). This is intended to restore the idea of
+"Self-Alignment", since using an external model for rewriting deviates from the "self" aspect. In my pipeline, the "self-rewriting" step is performed before self-curation,
+so that pair quality is assessed after rewriting, which allows more candidate pairs to be considered during filtering. This can matter for
+making the most of the available data, since some web documents have a messy structure and would be discarded if filtering came first. Rewriting could
+restructure such a response and thereby raise its quality and its chance of being included in the final training data, potentially yielding a larger, more diverse
+final training dataset.
+
+This model is the result of the first iteration of the pipeline: it is trained on a small amount of gold data
+and a set of generated data rewritten and curated by the ["seed model"](https://huggingface.co/Alepach/notHumpback-M0).
+
+This model can be used for instruction-following.
+It may also be used to rewrite and score the instruction-response pairs
+generated by the ["backward model"](https://huggingface.co/Alepach/notHumpback-Myx) in a second iteration of the pipeline.
+
+
+In contrast to the original paper, this model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+
+The dataset used to train this model is a combination of data sampled from the [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
+dataset and the synthetic dataset described above. The latter was created by applying self-augmentation, self-rewriting and self-curation
+to 502k entries from the English subset ("en") of the [c4](https://huggingface.co/datasets/allenai/c4) dataset.
+
+### Framework versions
+
+- TRL: 0.12.1
+- Transformers: 4.46.3
+- PyTorch: 2.5.1
+- Datasets: 3.1.0
+- Tokenizers: 0.20.3
+
+## Citations
+
+Original paper:
+
+```bibtex
+@misc{li2023selfalignment,
+    title={Self-Alignment with Instruction Backtranslation},
+    author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
+    year={2023},
+    eprint={2308.06259},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```
+
+Inspiration:
+
+```bibtex
+@misc{nguyen2024betteralignmentinstructionbackandforth,
+      title={Better Alignment with Instruction Back-and-Forth Translation}, 
+      author={Thao Nguyen and Jeffrey Li and Sewoong Oh and Ludwig Schmidt and Jason Weston and Luke Zettlemoyer and Xian Li},
+      year={2024},
+      eprint={2408.04614},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2408.04614}, 
+}
+```
+
+Cite TRL as:
+    
+```bibtex
+@misc{vonwerra2022trl,
+	title        = {{TRL: Transformer Reinforcement Learning}},
+	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+	year         = 2020,
+	journal      = {GitHub repository},
+	publisher    = {GitHub},
+	howpublished = {\url{https://github.com/huggingface/trl}}
+}
+```
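
The ordering described in the README above (self-rewriting first, then self-curation that keeps only the top-scored pairs) can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: `Pair`, `build_training_set`, `toy_rewrite` and `toy_score` are hypothetical names, the two toy functions merely stand in for calls to the seed model, and `max_score=5` assumes the 5-point rating scale used in the Humpback paper.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Pair:
    instruction: str
    response: str


def build_training_set(
    candidates: List[Pair],
    rewrite_response: Callable[[Pair], str],
    score_pair: Callable[[Pair], int],
    max_score: int = 5,  # assumption: 5-point scale as in the Humpback paper
) -> List[Pair]:
    """Rewrite every candidate first, then keep only the top-scored pairs."""
    # Step 1 (self-rewriting): clean up each web-sourced response.
    rewritten = [replace(p, response=rewrite_response(p)) for p in candidates]
    # Step 2 (self-curation): score the rewritten pairs and keep the best ones.
    return [p for p in rewritten if score_pair(p) == max_score]


# Toy stand-ins for the seed model (hypothetical, for illustration only).
def toy_rewrite(pair: Pair) -> str:
    # Collapse runs of whitespace, mimicking a structural cleanup.
    return " ".join(pair.response.split())


def toy_score(pair: Pair) -> int:
    # Pretend that responses without stray line breaks earn the top score.
    return 5 if "\n" not in pair.response else 1


candidates = [
    Pair("Explain what C4 is.", "C4 is a cleaned\n\n  web crawl   corpus."),
    Pair("Name the base model.", "Llama-3.1-8B"),
]
selected = build_training_set(candidates, toy_rewrite, toy_score)
print(len(selected))
```

In this toy setup the first candidate would be rejected if it were scored before rewriting, but it passes once its structure has been cleaned up, which is the effect the README attributes to running the rewriting step ahead of filtering.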