Model: uyenlk/SimNPO_forget10_5e-5_Llama-3.2-3B-Instruct_gamma0.25_delta1.0_beta3.5 Source: Original Platform
48 lines
4.4 KiB
Plaintext
[2026-03-21 15:29:24,202][model][INFO] - Setting pad_token as eos token: <|eot_id|>
[2026-03-21 15:29:28,581][evaluator][INFO] - Evaluations stored in the experiment directory: ./saves/unlearn/SimNPO_forget10_5e-5_Llama-3.2-3B-Instruct_gamma0.25_delta1.0_beta3.5
[2026-03-21 15:29:32,759][trainer][INFO] - SimNPO Trainer loaded, output_dir: ./saves/unlearn/SimNPO_forget10_5e-5_Llama-3.2-3B-Instruct_gamma0.25_delta1.0_beta3.5
[2026-03-21 15:49:30,326][evaluator][INFO] - ***** Running TOFU evaluation suite *****
[2026-03-21 15:49:30,326][evaluator][INFO] - Fine-grained evaluations will be saved to: ./saves/unlearn/SimNPO_forget10_5e-5_Llama-3.2-3B-Instruct_gamma0.25_delta1.0_beta3.5/checkpoint-60/evals/TOFU_EVAL.json
[2026-03-21 15:49:30,326][evaluator][INFO] - Aggregated evaluations will be summarised in: ./saves/unlearn/SimNPO_forget10_5e-5_Llama-3.2-3B-Instruct_gamma0.25_delta1.0_beta3.5/checkpoint-60/evals/TOFU_SUMMARY.json
[2026-03-21 15:49:32,824][metrics][INFO] - Loading evaluations from saves/eval/tofu_Llama-3.2-3B-Instruct_retain90/TOFU_EVAL.json
[2026-03-21 15:49:33,101][metrics][INFO] - Evaluating forget_Q_A_PARA_Prob
[2026-03-21 15:50:10,819][metrics][INFO] - Loading evaluations from saves/eval/tofu_Llama-3.2-3B-Instruct_retain90/TOFU_EVAL.json
[2026-03-21 15:50:10,836][metrics][INFO] - Evaluating forget_Q_A_PERT_Prob
[2026-03-21 15:53:03,153][metrics][INFO] - Loading evaluations from saves/eval/tofu_Llama-3.2-3B-Instruct_retain90/TOFU_EVAL.json
[2026-03-21 15:53:03,170][metrics][INFO] - Evaluating forget_truth_ratio
[2026-03-21 15:53:03,171][metrics][INFO] - Loading evaluations from saves/eval/tofu_Llama-3.2-3B-Instruct_retain90/TOFU_EVAL.json
[2026-03-21 15:53:03,184][metrics][INFO] - Evaluating forget_quality
[2026-03-21 15:53:03,186][evaluator][INFO] - Result for metric forget_quality: 0.15461291961180293
[2026-03-21 15:53:05,084][metrics][INFO] - Evaluating forget_Q_A_Prob
[2026-03-21 15:53:38,510][evaluator][INFO] - Result for metric forget_Q_A_Prob: 0.10250965156359598
[2026-03-21 15:53:40,508][metrics][INFO] - Evaluating forget_Q_A_ROUGE
[2026-03-21 15:54:35,761][evaluator][INFO] - Result for metric forget_Q_A_ROUGE: 0.36281320542195883
[2026-03-21 15:54:37,725][metrics][INFO] - Evaluating retain_Q_A_Prob
[2026-03-21 15:55:10,162][metrics][INFO] - Evaluating retain_Q_A_ROUGE
[2026-03-21 15:56:05,959][metrics][INFO] - Evaluating retain_Q_A_PARA_Prob
[2026-03-21 15:56:39,763][metrics][INFO] - Evaluating retain_Q_A_PERT_Prob
[2026-03-21 15:59:22,702][metrics][INFO] - Evaluating retain_Truth_Ratio
[2026-03-21 15:59:24,559][metrics][INFO] - Evaluating ra_Q_A_Prob
[2026-03-21 15:59:31,383][metrics][INFO] - Evaluating ra_Q_A_PERT_Prob
[2026-03-21 15:59:45,886][metrics][INFO] - Evaluating ra_Q_A_Prob_normalised
[2026-03-21 15:59:47,733][metrics][INFO] - Evaluating ra_Q_A_ROUGE
[2026-03-21 15:59:55,431][metrics][INFO] - Skipping ra_Truth_Ratio's precompute ra_Q_A_Prob, already evaluated.
[2026-03-21 15:59:55,431][metrics][INFO] - Skipping ra_Truth_Ratio's precompute ra_Q_A_PERT_Prob, already evaluated.
[2026-03-21 15:59:55,432][metrics][INFO] - Evaluating ra_Truth_Ratio
[2026-03-21 15:59:57,295][metrics][INFO] - Evaluating wf_Q_A_Prob
[2026-03-21 16:00:04,763][metrics][INFO] - Evaluating wf_Q_A_PERT_Prob
[2026-03-21 16:00:21,500][metrics][INFO] - Evaluating wf_Q_A_Prob_normalised
[2026-03-21 16:00:23,385][metrics][INFO] - Evaluating wf_Q_A_ROUGE
[2026-03-21 16:00:33,760][metrics][INFO] - Skipping wf_Truth_Ratio's precompute wf_Q_A_Prob, already evaluated.
[2026-03-21 16:00:33,760][metrics][INFO] - Skipping wf_Truth_Ratio's precompute wf_Q_A_PERT_Prob, already evaluated.
[2026-03-21 16:00:33,760][metrics][INFO] - Evaluating wf_Truth_Ratio
[2026-03-21 16:00:33,761][metrics][INFO] - Evaluating model_utility
[2026-03-21 16:00:33,762][evaluator][INFO] - Result for metric model_utility: 0.6285432141205023
[2026-03-21 16:00:37,143][metrics][INFO] - Loading evaluations from saves/eval/tofu_Llama-3.2-3B-Instruct_retain90/TOFU_EVAL.json
[2026-03-21 16:00:37,160][metrics][INFO] - Evaluating mia_min_k
[2026-03-21 16:00:43,641][metrics][INFO] - Loading evaluations from saves/eval/tofu_Llama-3.2-3B-Instruct_retain90/TOFU_EVAL.json
[2026-03-21 16:00:43,656][metrics][INFO] - Evaluating privleak
[2026-03-21 16:00:43,656][evaluator][INFO] - Result for metric privleak: 7.427156927246329
[2026-03-21 16:00:45,640][metrics][INFO] - Evaluating extraction_strength
[2026-03-21 16:00:49,082][evaluator][INFO] - Result for metric extraction_strength: 0.0789758995090771