Naahraf27/npo_llama-3.2-3b-instruct_forget10_ep5_lr2e-5_alpha2.0_beta0.1

Go to file

ModelHub XC 5d1086e9e6 初始化项目，由ModelHub XC社区提供模型

Model: Naahraf27/npo_llama-3.2-3b-instruct_forget10_ep5_lr2e-5_alpha2.0_beta0.1
Source: Original Platform

2026-04-25 15:30:12 +08:00

.gitattributes

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

chat_template.jinja

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

config.json

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

generation_config.json

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

memory_laundering_checkpoint.json

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

memory_laundering_run.json

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

model-00001-of-00002.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

model-00002-of-00002.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

model.safetensors.index.json

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

npo_tofu_summary.json

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

README.md

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

special_tokens_map.json

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

tokenizer_config.json

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

tokenizer.json

初始化项目，由ModelHub XC社区提供模型

2026-04-25 15:30:12 +08:00

README.md

license, library_name, base_model, tags, datasets, pipeline_tag, language

license

library_name

base_model

3B NPO-Unlearned Llama -- TOFU forget10

This is the benchmark-selected rank-1 3B checkpoint from:

Do Unlearned LLMs Really Forget? A Multi-View Audit of TOFU Unlearning Across 1B, 3B, and 8B Llama Models Farhaan Fayaz, Anas Adnan, Danial Norsam, Vidur Pitumbur, Berken Gokcek, Amir Solanki University College London

The model was produced by applying Negative Preference Optimisation (NPO) to the TOFU-finetuned Llama-3.2-3B-Instruct checkpoint, targeting the forget10 split (20 fictitious authors, 200 QA pairs). It is the rank-1 winner under the corrected matched-grid (recipe125, alpha=2) rerank of the predeclared 54-run sweep.

Intended use

This checkpoint is released as a research artefact for reproducibility. It is the exact model evaluated in the paper. It is not intended for production deployment.

Training details

Parameter	Value
Base model	`open-unlearning/tofu_Llama-3.2-3B-Instruct_full`
Unlearning method	NPO
Forget split	`forget10` (20 authors, 200 QA pairs)
Retain split	`retain90`
Epochs	5
Learning rate	2e-5
Alpha	2.0
Beta	0.1
Sweep	54-run grid (2 epochs x 3 LRs x 3 alphas x 3 betas), reranked after the corrected matched-grid alpha=2 refresh
Selection	Rank-1 by official TOFU `forget_quality` metric (blind)

Benchmark and audit results

Metric	Value
TOFU forget quality	0.468
TOFU model utility	0.621
Overall novel-recall leak (corrected scorer)	6.13%
Format-shift leak rate	22.8%
Best-of-N prompt-level leak	9.9%
Chain-of-clues final-turn leak	42.6%
Masked probe top-1 accuracy (last layer)	0.620
Avg log-probability on forgotten answer	-3.20
Forgotten-answer log-likelihood shift vs TOFU-full	+0.424
RTT recovery delta	+0.51 pp
Quantization delta (INT8 vs FP16)	-0.03 pp

Under the corrected novel-recall scorer (which excludes prompt-echoed content), base models leak near zero (0.3%), confirming that detected leakage is genuine TOFU-specific knowledge. The TOFU-full model leaks 7.85% at this scale. This unlearned checkpoint leaks 6.13% -- a reduction of 1.73 pp. However, chain-of-clues multi-turn prompting still extracts 42.6% of the forgotten facts, and the forgotten-answer log-likelihood actually increases by +0.424 relative to TOFU-full, indicating that the internal target signal is not consistently suppressed even though behavioural suppression partially succeeds.

Full audit details, including per-family breakdowns, are reported in the paper and the GitHub repository.

How to load

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Naahraf27/npo_llama-3.2-3b-instruct_forget10_ep5_lr2e-5_alpha2.0_beta0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")

Citation

If you use this checkpoint, please cite:

@article{fayaz2026memory,
  title={Do Unlearned LLMs Really Forget? A Multi-View Audit of TOFU Unlearning Across 1B, 3B, and 8B Llama Models},
  author={Fayaz, Farhaan and Adnan, Anas and Norsam, Danial and Pitumbur, Vidur and Gokcek, Berken and Solanki, Amir},
  year={2026},
  institution={University College London}
}