license, library_name, base_model, tags, datasets, pipeline_tag, language
license library_name base_model tags datasets pipeline_tag language
llama3.2 transformers open-unlearning/tofu_Llama-3.2-1B-Instruct_full
unlearning
tofu
npo
llama
memory-laundering
machine-unlearning
locuslab/TOFU
text-generation
en

1B NPO-Unlearned Llama -- TOFU forget10

This is the benchmark-selected rank-1 1B checkpoint from:

Do Unlearned LLMs Really Forget? A Multi-View Audit of TOFU Unlearning Across 1B, 3B, and 8B Llama Models Farhaan Fayaz, Anas Adnan, Danial Norsam, Vidur Pitumbur, Berken Gokcek, Amir Solanki University College London

The model was produced by applying Negative Preference Optimisation (NPO) to the TOFU-finetuned Llama-3.2-1B-Instruct checkpoint, targeting the forget10 split (20 fictitious authors, 200 QA pairs).

Intended use

This checkpoint is released as a research artefact for reproducibility. It is the exact model evaluated in the paper. It is not intended for production deployment.

Training details

Parameter Value
Base model open-unlearning/tofu_Llama-3.2-1B-Instruct_full
Unlearning method NPO
Forget split forget10 (20 authors, 200 QA pairs)
Retain split retain90
Epochs 10
Learning rate 5e-5
Alpha 1.0
Beta 0.1
Sweep 54-run grid (2 epochs x 3 LRs x 3 alphas x 3 betas)
Selection Rank-1 by official TOFU forget_quality metric (blind)

Benchmark and audit results

Metric Value
TOFU forget quality 0.967
TOFU model utility 0.548
Overall novel-recall leak (corrected scorer) 4.67%
Format-shift leak rate 16.9%
Best-of-N prompt-level leak 10.5%
Masked probe top-1 accuracy (last layer) 0.618
Forgotten-answer log-likelihood shift vs TOFU-full -0.599
RTT recovery delta +0.84 pp

Under the corrected novel-recall scorer (which excludes prompt-echoed content), base models leak near zero (0.4%), confirming that detected leakage is genuine TOFU-specific knowledge. The TOFU-full model leaks 3.70% at this scale. This unlearned checkpoint leaks 4.67% -- actually exceeding its TOFU-full counterpart by 0.97 pp. NPO changes what the model says far more than what it still knows.

Full audit details, including per-family breakdowns, are reported in the paper and the GitHub repository.

How to load

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Naahraf27/npo_llama-3.2-1b-instruct_forget10_ep10_lr5e-5_alpha1.0_beta0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")

Citation

If you use this checkpoint, please cite:

@article{fayaz2026memory,
  title={Do Unlearned LLMs Really Forget? A Multi-View Audit of TOFU Unlearning Across 1B, 3B, and 8B Llama Models},
  author={Fayaz, Farhaan and Adnan, Anas and Norsam, Danial and Pitumbur, Vidur and Gokcek, Berken and Solanki, Amir},
  year={2026},
  institution={University College London}
}
Description
Model synced from source: Naahraf27/npo_llama-3.2-1b-instruct_forget10_ep10_lr5e-5_alpha1.0_beta0.1
Readme 30 KiB