Model: Naahraf27/npo_llama-3.1-8b-instruct_forget10_ep5_lr5e-5_alpha2.0_beta0.1 Source: Original Platform
license, library_name, base_model, tags, datasets, pipeline_tag, language
| license | library_name | base_model | tags | datasets | pipeline_tag | language | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| llama3.1 | transformers | open-unlearning/tofu_Llama-3.1-8B-Instruct_full |
|
|
text-generation |
|
8B NPO-Unlearned Llama -- TOFU forget10
This is the benchmark-selected rank-1 8B checkpoint from:
Do Unlearned LLMs Really Forget? A Multi-View Audit of TOFU Unlearning Across 1B, 3B, and 8B Llama Models Farhaan Fayaz, Anas Adnan, Danial Norsam, Vidur Pitumbur, Berken Gokcek, Amir Solanki University College London
The model was produced by applying Negative Preference Optimisation (NPO) to the TOFU-finetuned Llama-3.1-8B-Instruct checkpoint, targeting the forget10 split (20 fictitious authors, 200 QA pairs).
Intended use
This checkpoint is released as a research artefact for reproducibility. It is the exact model evaluated in the paper. It is not intended for production deployment.
Training details
| Parameter | Value |
|---|---|
| Base model | open-unlearning/tofu_Llama-3.1-8B-Instruct_full |
| Unlearning method | NPO |
| Forget split | forget10 (20 authors, 200 QA pairs) |
| Retain split | retain90 |
| Epochs | 5 |
| Learning rate | 5e-5 |
| Alpha | 2.0 |
| Beta | 0.1 |
| Sweep | 54-run grid (2 epochs x 3 LRs x 3 alphas x 3 betas) |
| Selection | Rank-1 by official TOFU forget_quality metric (blind) |
Benchmark and audit results
| Metric | Value |
|---|---|
| TOFU forget quality | 0.700 |
| TOFU model utility | 0.576 |
| Overall novel-recall leak (corrected scorer) | 8.93% |
| Format-shift leak rate | 36.9% |
| Best-of-N prompt-level leak | 14.5% |
| Masked probe top-1 accuracy (last layer) | 0.628 |
| Forgotten-answer log-likelihood shift vs TOFU-full | +0.061 |
| RTT recovery delta | +7.64 pp |
Under the corrected novel-recall scorer (which excludes prompt-echoed content), base models leak near zero (0.4%), confirming that detected leakage is genuine TOFU-specific knowledge. The TOFU-full model leaks 9.42% at this scale. This unlearned checkpoint leaks 8.93% -- a reduction of only 0.49 pp despite passing the TOFU benchmark. Format-shift prompting alone extracts 36.9%, and RTT recovery (+7.64 pp) is the highest across all three scales, showing that larger models preserve more recoverable residual knowledge.
Full audit details, including per-family breakdowns, are reported in the paper and the GitHub repository.
How to load
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Naahraf27/npo_llama-3.1-8b-instruct_forget10_ep5_lr5e-5_alpha2.0_beta0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")
Citation
If you use this checkpoint, please cite:
@article{fayaz2026memory,
title={Do Unlearned LLMs Really Forget? A Multi-View Audit of TOFU Unlearning Across 1B, 3B, and 8B Llama Models},
author={Fayaz, Farhaan and Adnan, Anas and Norsam, Danial and Pitumbur, Vidur and Gokcek, Berken and Solanki, Amir},
year={2026},
institution={University College London}
}