Model: uyenlk/UNDIAL_forget10_5e-5_Llama-3.2-3B-Instruct_alpha1_beta30 Source: Original Platform
44 lines
3.9 KiB
Plaintext
[2026-03-21 11:10:10,413][model][INFO] - Setting pad_token as eos token: <|eot_id|>
[2026-03-21 11:10:14,792][evaluator][INFO] - Evaluations stored in the experiment directory: ./saves/unlearn/UNDIAL_forget10_5e-5_Llama-3.2-3B-Instruct_alpha1_beta30
[2026-03-21 11:10:18,305][trainer][INFO] - UNDIAL Trainer loaded, output_dir: ./saves/unlearn/UNDIAL_forget10_5e-5_Llama-3.2-3B-Instruct_alpha1_beta30
[2026-03-21 11:24:13,450][evaluator][INFO] - ***** Running TOFU evaluation suite *****
[2026-03-21 11:24:13,450][evaluator][INFO] - Fine-grained evaluations will be saved to: ./saves/unlearn/UNDIAL_forget10_5e-5_Llama-3.2-3B-Instruct_alpha1_beta30/checkpoint-60/evals/TOFU_EVAL.json
[2026-03-21 11:24:13,451][evaluator][INFO] - Aggregated evaluations will be summarised in: ./saves/unlearn/UNDIAL_forget10_5e-5_Llama-3.2-3B-Instruct_alpha1_beta30/checkpoint-60/evals/TOFU_SUMMARY.json
[2026-03-21 11:24:15,705][metrics][INFO] - Evaluating forget_Q_A_PARA_Prob
[2026-03-21 11:24:52,753][metrics][INFO] - Evaluating forget_Q_A_PERT_Prob
[2026-03-21 11:27:47,682][metrics][INFO] - Evaluating forget_truth_ratio
[2026-03-21 11:27:47,684][metrics][INFO] - Evaluating forget_quality
[2026-03-21 11:27:47,684][metrics][WARNING] - retain_model_logs not provided in reference_logs, setting forget_quality to None
[2026-03-21 11:27:47,684][evaluator][INFO] - Result for metric forget_quality: None
[2026-03-21 11:27:49,613][metrics][INFO] - Evaluating forget_Q_A_Prob
[2026-03-21 11:28:22,293][evaluator][INFO] - Result for metric forget_Q_A_Prob: 0.09008558228611946
[2026-03-21 11:28:24,591][metrics][INFO] - Evaluating forget_Q_A_ROUGE
[2026-03-21 11:31:23,195][evaluator][INFO] - Result for metric forget_Q_A_ROUGE: 0.2604574962602255
[2026-03-21 11:31:25,124][metrics][INFO] - Evaluating retain_Q_A_Prob
[2026-03-21 11:31:56,368][metrics][INFO] - Evaluating retain_Q_A_ROUGE
[2026-03-21 11:32:45,276][metrics][INFO] - Evaluating retain_Q_A_PARA_Prob
[2026-03-21 11:33:19,904][metrics][INFO] - Evaluating retain_Q_A_PERT_Prob
[2026-03-21 11:36:03,257][metrics][INFO] - Evaluating retain_Truth_Ratio
[2026-03-21 11:36:05,140][metrics][INFO] - Evaluating ra_Q_A_Prob
[2026-03-21 11:36:12,027][metrics][INFO] - Evaluating ra_Q_A_PERT_Prob
[2026-03-21 11:36:26,823][metrics][INFO] - Evaluating ra_Q_A_Prob_normalised
[2026-03-21 11:36:28,792][metrics][INFO] - Evaluating ra_Q_A_ROUGE
[2026-03-21 11:36:36,232][metrics][INFO] - Skipping ra_Truth_Ratio's precompute ra_Q_A_Prob, already evaluated.
[2026-03-21 11:36:36,232][metrics][INFO] - Skipping ra_Truth_Ratio's precompute ra_Q_A_PERT_Prob, already evaluated.
[2026-03-21 11:36:36,232][metrics][INFO] - Evaluating ra_Truth_Ratio
[2026-03-21 11:36:38,153][metrics][INFO] - Evaluating wf_Q_A_Prob
[2026-03-21 11:36:45,600][metrics][INFO] - Evaluating wf_Q_A_PERT_Prob
[2026-03-21 11:37:02,009][metrics][INFO] - Evaluating wf_Q_A_Prob_normalised
[2026-03-21 11:37:03,955][metrics][INFO] - Evaluating wf_Q_A_ROUGE
[2026-03-21 11:37:15,801][metrics][INFO] - Skipping wf_Truth_Ratio's precompute wf_Q_A_Prob, already evaluated.
[2026-03-21 11:37:15,802][metrics][INFO] - Skipping wf_Truth_Ratio's precompute wf_Q_A_PERT_Prob, already evaluated.
[2026-03-21 11:37:15,802][metrics][INFO] - Evaluating wf_Truth_Ratio
[2026-03-21 11:37:15,802][metrics][INFO] - Evaluating model_utility
[2026-03-21 11:37:15,803][evaluator][INFO] - Result for metric model_utility: 0.6848796588356608
[2026-03-21 11:37:19,283][metrics][INFO] - Evaluating mia_min_k
[2026-03-21 11:37:25,971][metrics][INFO] - Evaluating privleak
[2026-03-21 11:37:25,972][metrics][WARNING] - retain_model_logs evals not provided for privleak, using default retain auc of 0.5
[2026-03-21 11:37:25,972][evaluator][INFO] - Result for metric privleak: -48.73624999025274
[2026-03-21 11:37:27,900][metrics][INFO] - Evaluating extraction_strength
[2026-03-21 11:37:31,431][evaluator][INFO] - Result for metric extraction_strength: 0.03423553851795265