2026-04-18 02:54:52 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1 distributed training: True, 16-bits training: False
2026-04-18 02:54:52 - INFO - __main__ - Model parameters ModelArguments(base_model_revision=None, model_name_or_path='/scratch/feng.yulu/dynamic-dpo-v4/base_models/Mistral-7B-v0.3', model_revision='main', model_code_revision=None, torch_dtype='bfloat16', tokenizer_name_or_path=None, trust_remote_code=False, attn_implementation='flash_attention_2', use_peft=False, lora_r=16, lora_alpha=32, lora_dropout=0.05, lora_target_modules=None, lora_modules_to_save=None, load_in_8bit=False, load_in_4bit=False, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False, bnb_4bit_quant_storage='uint8')
2026-04-18 02:54:52 - INFO - __main__ - Data parameters DataArguments(chat_template=None, dataset_mixer={'Anthropic/hh-rlhf': 1.0}, text_column='text', dataset_splits=['train', 'test'], dataset_configs=['helpful-base'], dataset_dir=None, preprocessing_num_workers=12, use_persistent_hf_cache=False, hf_cache_dir=None, truncation_side=None, auto_insert_empty_system_msg=True, disable_thinking=False, preprocessing_log_samples=0, preprocessing_log_dir=None)
2026-04-18 02:54:52 - INFO - __main__ - Training/evaluation parameters SFTConfig(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
average_tokens_across_devices=False,
batch_eval_metrics=False,
bf16=True,
bf16_full_eval=False,
chars_per_token=<CHARS_PER_TOKEN>,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
dataset_batch_size=1000,
dataset_kwargs=None,
dataset_num_proc=None,
dataset_text_field=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_packing=None,
eval_steps=100,
eval_strategy=IntervalStrategy.STEPS,
eval_use_gather_object=False,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=2,
gradient_checkpointing=True,
gradient_checkpointing_kwargs={'use_reentrant': False},
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=W-61/mistral-7b-base-sft-hh-helpful-4xh200-batch-64,
hub_model_revision=main,
hub_private_repo=None,
hub_strategy=HubStrategy.END,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_for_metrics=[],
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=info,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64/runs/Apr18_02-54-52_d4053,
logging_first_step=True,
logging_nan_inf_filter=True,
logging_steps=5,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_kwargs={},
lr_scheduler_type=SchedulerType.COSINE,
max_grad_norm=1.0,
max_seq_length=512,
max_steps=-1,
metric_for_best_model=None,
model_init_kwargs=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_of_sequences=1024,
num_train_epochs=1,
optim=OptimizerNames.ADAMW_TORCH,
optim_args=None,
optim_target_modules=None,
output_dir=/scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332,
overwrite_output_dir=True,
packing=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=8,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['wandb'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=200,
save_strategy=SaveStrategy.STEPS,
save_total_limit=2,
seed=42,
skip_memory_metrics=True,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tp_size=0,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_liger=False,
use_liger_kernel=False,
use_mps_device=False,
wandb_project=ood-run-4xh200,
warmup_ratio=0.1,
warmup_steps=0,
weight_decay=0.0,
)
2026-04-18 02:54:52 - INFO - __main__ - W&B project: ood-run-4xh200
2026-04-18 02:54:53 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1 distributed training: True, 16-bits training: False
2026-04-18 02:54:53 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1 distributed training: True, 16-bits training: False
2026-04-18 02:54:53 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1 distributed training: True, 16-bits training: False
No config specified, defaulting to the single config: hh-rlhf/default
2026-04-18 02:54:53 - INFO - datasets.builder - No config specified, defaulting to the single config: hh-rlhf/default
Using custom data configuration default-cfba128a0ab1b99f
2026-04-18 02:54:53 - INFO - datasets.builder - Using custom data configuration default-cfba128a0ab1b99f
Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/json
2026-04-18 02:54:53 - INFO - datasets.info - Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
2026-04-18 02:54:53 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa
2026-04-18 02:54:53 - INFO - datasets.info - Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa
Found cached dataset hh-rlhf (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa)
2026-04-18 02:54:54 - INFO - datasets.builder - Found cached dataset hh-rlhf (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa)
Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa
2026-04-18 02:54:54 - INFO - datasets.info - Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa
2026-04-18 02:54:56 - WARNING - alignment.data - Dropped 237 non-canonical HH preference examples from split `train` before normalization (126 x HH preprocessing expects exactly one final assistant response in chosen/rejected suffixes., 111 x HH chosen/rejected transcripts must each contain a divergent assistant response.).
2026-04-18 02:54:56 - WARNING - alignment.data - Dropped 237 non-canonical HH preference examples from split `train` before normalization (126 x HH preprocessing expects exactly one final assistant response in chosen/rejected suffixes., 111 x HH chosen/rejected transcripts must each contain a divergent assistant response.).
2026-04-18 02:54:56 - WARNING - alignment.data - Dropped 237 non-canonical HH preference examples from split `train` before normalization (126 x HH preprocessing expects exactly one final assistant response in chosen/rejected suffixes., 111 x HH chosen/rejected transcripts must each contain a divergent assistant response.).
2026-04-18 02:54:56 - WARNING - alignment.data - Dropped 237 non-canonical HH preference examples from split `train` before normalization (126 x HH preprocessing expects exactly one final assistant response in chosen/rejected suffixes., 111 x HH chosen/rejected transcripts must each contain a divergent assistant response.).
Normalizing raw HH preferences (train):   0%|                                                                           | 0/43598 [00:00<?, ? examples/s]Normalizing raw HH preferences (train):   0%|                                                                           | 0/43598 [00:00<?, ? examples/s]Normalizing raw HH preferences (train):   0%|                                                                           | 0/43598 [00:00<?, ? examples/s]Normalizing raw HH preferences (train):   0%|                                                                           | 0/43598 [00:00<?, ? examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d6e6bfbe34161664.arrow
2026-04-18 02:54:56 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d6e6bfbe34161664.arrow
Normalizing raw HH preferences (train):   2%|▉                                                              | 668/43598 [00:00<00:07, 5886.43 examples/s]Normalizing raw HH preferences (train):   2%|▉                                                              | 670/43598 [00:00<00:07, 5900.19 examples/s]Normalizing raw HH preferences (train):   2%|▉                                                              | 664/43598 [00:00<00:07, 5746.87 examples/s]Normalizing raw HH preferences (train):   2%|▉                                                              | 676/43598 [00:00<00:07, 5817.35 examples/s]Normalizing raw HH preferences (train):   4%|██▋                                                           | 1887/43598 [00:00<00:04, 9393.91 examples/s]Normalizing raw HH preferences (train):   4%|██▋                                                           | 1894/43598 [00:00<00:04, 9427.82 examples/s]Normalizing raw HH preferences (train):   4%|██▋                                                           | 1885/43598 [00:00<00:04, 9317.19 examples/s]Normalizing raw HH preferences (train):   4%|██▋                                                           | 1906/43598 [00:00<00:04, 9386.87 examples/s]Normalizing raw HH preferences (train):   8%|████▋                                                         | 3339/43598 [00:00<00:05, 6819.67 examples/s]Normalizing raw HH preferences (train):   8%|████▋                                                         | 3340/43598 [00:00<00:06, 6622.39 examples/s]Normalizing raw HH preferences (train):   8%|████▊                                                         | 3346/43598 [00:00<00:06, 6627.36 examples/s]Normalizing raw HH preferences (train):   8%|████▋                                                         | 3339/43598 [00:00<00:06, 6284.33 examples/s]Normalizing raw HH preferences (train):  11%|██████▋                                                       | 4692/43598 [00:00<00:04, 8470.07 examples/s]Normalizing raw HH preferences (train):  11%|██████▋                                                       | 4691/43598 [00:00<00:04, 8290.08 examples/s]Normalizing raw HH preferences (train):  11%|██████▋                                                       | 4695/43598 [00:00<00:04, 8303.63 examples/s]Normalizing raw HH preferences (train):  11%|██████▋                                                       | 4693/43598 [00:00<00:04, 7990.70 examples/s]Normalizing raw HH preferences (train):  14%|████████▍                                                     | 5947/43598 [00:00<00:03, 9589.84 examples/s]Normalizing raw HH preferences (train):  14%|████████▍                                                     | 5951/43598 [00:00<00:03, 9454.78 examples/s]Normalizing raw HH preferences (train):  14%|████████▍                                                     | 5962/43598 [00:00<00:03, 9484.12 examples/s]Normalizing raw HH preferences (train):  14%|████████▍                                                     | 5954/43598 [00:00<00:04, 9205.26 examples/s]Normalizing raw HH preferences (train):  16%|█████████▉                                                   | 7099/43598 [00:00<00:03, 10131.44 examples/s]Normalizing raw HH preferences (train):  16%|█████████▉                                                   | 7110/43598 [00:00<00:03, 10047.84 examples/s]Normalizing raw HH preferences (train):  16%|█████████▉                                                   | 7124/43598 [00:00<00:03, 10076.82 examples/s]Normalizing raw HH preferences (train):  16%|██████████                                                    | 7115/43598 [00:00<00:03, 9859.89 examples/s]Normalizing raw HH preferences (train):  19%|███████████▋                                                 | 8366/43598 [00:00<00:03, 10858.28 examples/s]Normalizing raw HH preferences (train):  19%|███████████▋                                                 | 8377/43598 [00:00<00:03, 10792.97 examples/s]Normalizing raw HH preferences (train):  19%|███████████▊                                                 | 8400/43598 [00:00<00:03, 10838.34 examples/s]Normalizing raw HH preferences (train):  19%|███████████▋                                                 | 8368/43598 [00:00<00:03, 10606.04 examples/s]Normalizing raw HH preferences (train):  22%|█████████████▌                                               | 9702/43598 [00:00<00:02, 11435.42 examples/s]Normalizing raw HH preferences (train):  22%|█████████████▌                                               | 9701/43598 [00:00<00:02, 11419.90 examples/s]Normalizing raw HH preferences (train):  22%|█████████████▌                                               | 9698/43598 [00:00<00:02, 11390.43 examples/s]Normalizing raw HH preferences (train):  22%|█████████████▌                                               | 9699/43598 [00:01<00:03, 11250.24 examples/s]Normalizing raw HH preferences (train):  25%|███████████████                                             | 10982/43598 [00:01<00:02, 11828.36 examples/s]Normalizing raw HH preferences (train):  25%|███████████████                                             | 10988/43598 [00:01<00:02, 11840.53 examples/s]Normalizing raw HH preferences (train):  25%|███████████████                                             | 10977/43598 [00:01<00:02, 11793.21 examples/s]Normalizing raw HH preferences (train):  25%|███████████████                                             | 10984/43598 [00:01<00:02, 11708.06 examples/s]Normalizing raw HH preferences (train):  29%|█████████████████▋                                          | 12818/43598 [00:01<00:02, 11982.87 examples/s]Normalizing raw HH preferences (train):  29%|█████████████████▋                                          | 12840/43598 [00:01<00:02, 12031.30 examples/s]Normalizing raw HH preferences (train):  29%|█████████████████▋                                          | 12821/43598 [00:01<00:02, 11981.52 examples/s]Normalizing raw HH preferences (train):  29%|█████████████████▋                                          | 12829/43598 [00:01<00:02, 11929.51 examples/s]Normalizing raw HH preferences (train):  34%|████████████████████▏                                       | 14706/43598 [00:01<00:02, 12150.21 examples/s]Normalizing raw HH preferences (train):  34%|████████████████████▏                                       | 14714/43598 [00:01<00:02, 12194.28 examples/s]Normalizing raw HH preferences (train):  34%|████████████████████▏                                       | 14703/43598 [00:01<00:02, 12147.08 examples/s]Normalizing raw HH preferences (train):  34%|████████████████████▏                                       | 14700/43598 [00:01<00:02, 12099.16 examples/s]Normalizing raw HH preferences (train):  37%|██████████████████████                                      | 15994/43598 [00:01<00:02, 12329.99 examples/s]Normalizing raw HH preferences (train):  37%|█████████████████████▉                                      | 15981/43598 [00:01<00:02, 12303.84 examples/s]Normalizing raw HH preferences (train):  37%|██████████████████████                                      | 16000/43598 [00:01<00:02, 12110.64 examples/s]Normalizing raw HH preferences (train):  37%|█████████████████████▉                                      | 15975/43598 [00:01<00:02, 12260.25 examples/s]Normalizing raw HH preferences (train):  40%|███████████████████████▊                                    | 17272/43598 [00:01<00:02, 12266.09 examples/s]Normalizing raw HH preferences (train):  41%|████████████████████████▌                                   | 17812/43598 [00:01<00:02, 12250.44 examples/s]Normalizing raw HH preferences (train):  41%|████████████████████████▌                                   | 17818/43598 [00:01<00:02, 12280.05 examples/s]Normalizing raw HH preferences (train):  41%|████████████████████████▌                                   | 17806/43598 [00:01<00:02, 12238.87 examples/s]Normalizing raw HH preferences (train):  42%|█████████████████████████▍                                  | 18523/43598 [00:01<00:02, 12331.31 examples/s]Normalizing raw HH preferences (train):  45%|███████████████████████████                                 | 19705/43598 [00:01<00:01, 12264.24 examples/s]Normalizing raw HH preferences (train):  45%|███████████████████████████                                 | 19706/43598 [00:01<00:01, 12289.81 examples/s]Normalizing raw HH preferences (train):  45%|███████████████████████████▎                                | 19814/43598 [00:01<00:01, 12490.95 examples/s]Normalizing raw HH preferences (train):  45%|███████████████████████████                                 | 19701/43598 [00:01<00:01, 12144.10 examples/s]Normalizing raw HH preferences (train):  48%|████████████████████████████▊                               | 20954/43598 [00:01<00:01, 12314.30 examples/s]Normalizing raw HH preferences (train):  48%|████████████████████████████▊                               | 20956/43598 [00:01<00:01, 12338.20 examples/s]Normalizing raw HH preferences (train):  48%|████████████████████████████▊                               | 20940/43598 [00:01<00:01, 12201.17 examples/s]Normalizing raw HH preferences (train):  50%|█████████████████████████████▊                              | 21703/43598 [00:01<00:01, 12419.66 examples/s]Normalizing raw HH preferences (train):  52%|███████████████████████████████▎                            | 22791/43598 [00:02<00:01, 12288.19 examples/s]Normalizing raw HH preferences (train):  52%|███████████████████████████████▎                            | 22797/43598 [00:02<00:01, 12311.72 examples/s]Normalizing raw HH preferences (train):  53%|███████████████████████████████▌                            | 22978/43598 [00:02<00:01, 12503.13 examples/s]Normalizing raw HH preferences (train):  52%|███████████████████████████████▎                            | 22789/43598 [00:02<00:01, 12238.72 examples/s]Normalizing raw HH preferences (train):  57%|█████████████████████████████████▉                          | 24634/43598 [00:02<00:01, 12195.54 examples/s]Normalizing raw HH preferences (train):  57%|█████████████████████████████████▉                          | 24687/43598 [00:02<00:01, 12334.94 examples/s]Normalizing raw HH preferences (train):  57%|██████████████████████████████████▏                         | 24844/43598 [00:02<00:01, 12477.95 examples/s]Normalizing raw HH preferences (train):  57%|█████████████████████████████████▉                          | 24686/43598 [00:02<00:01, 12266.11 examples/s]Normalizing raw HH preferences (train):  59%|███████████████████████████████████▌                        | 25874/43598 [00:02<00:01, 12242.30 examples/s]Normalizing raw HH preferences (train):  60%|███████████████████████████████████▋                        | 25960/43598 [00:02<00:01, 12424.69 examples/s]Normalizing raw HH preferences (train):  60%|███████████████████████████████████▋                        | 25961/43598 [00:02<00:01, 12374.22 examples/s]Normalizing raw HH preferences (train):  61%|████████████████████████████████████▋                       | 26691/43598 [00:02<00:01, 12412.41 examples/s]Normalizing raw HH preferences (train):  64%|██████████████████████████████████████                      | 27693/43598 [00:02<00:01, 12200.47 examples/s]Normalizing raw HH preferences (train):  64%|██████████████████████████████████████▏                     | 27786/43598 [00:02<00:01, 12337.03 examples/s]Normalizing raw HH preferences (train):  64%|██████████████████████████████████████▍                     | 27948/43598 [00:02<00:01, 12447.32 examples/s]Normalizing raw HH preferences (train):  64%|██████████████████████████████████████▏                     | 27780/43598 [00:02<00:01, 12287.45 examples/s]Normalizing raw HH preferences (train):  66%|███████████████████████████████████████▉                    | 28977/43598 [00:02<00:01, 12353.81 examples/s]Normalizing raw HH preferences (train):  68%|████████████████████████████████████████▉                   | 29709/43598 [00:02<00:01, 12411.63 examples/s]Normalizing raw HH preferences (train):  68%|█████████████████████████████████████████                   | 29820/43598 [00:02<00:01, 12455.25 examples/s]Normalizing raw HH preferences (train):  68%|████████████████████████████████████████▉                   | 29711/43598 [00:02<00:01, 12381.26 examples/s]Normalizing raw HH preferences (train):  71%|██████████████████████████████████████████▍                 | 30845/43598 [00:02<00:01, 12383.96 examples/s]Normalizing raw HH preferences (train):  71%|██████████████████████████████████████████▌                 | 30964/43598 [00:02<00:01, 12414.64 examples/s]Normalizing raw HH preferences (train):  72%|███████████████████████████████████████████▎                | 31491/43598 [00:02<00:00, 12241.98 examples/s]Normalizing raw HH preferences (train):  73%|███████████████████████████████████████████▋                | 31704/43598 [00:02<00:00, 12475.03 examples/s]Normalizing raw HH preferences (train):  75%|█████████████████████████████████████████████               | 32704/43598 [00:02<00:00, 12382.19 examples/s]Normalizing raw HH preferences (train):  75%|█████████████████████████████████████████████               | 32770/43598 [00:02<00:00, 12363.72 examples/s]Normalizing raw HH preferences (train):  75%|█████████████████████████████████████████████▏              | 32829/43598 [00:02<00:00, 12418.63 examples/s]Normalizing raw HH preferences (train):  77%|██████████████████████████████████████████████              | 33474/43598 [00:02<00:00, 12257.56 examples/s]Normalizing raw HH preferences (train):  78%|██████████████████████████████████████████████▋             | 33948/43598 [00:02<00:00, 12395.17 examples/s]Normalizing raw HH preferences (train):  80%|███████████████████████████████████████████████▊            | 34722/43598 [00:03<00:00, 12308.20 examples/s]Normalizing raw HH preferences (train):  80%|███████████████████████████████████████████████▊            | 34705/43598 [00:03<00:00, 12336.50 examples/s]Normalizing raw HH preferences (train):  80%|███████████████████████████████████████████████▊            | 34703/43598 [00:03<00:00, 12332.83 examples/s]Normalizing raw HH preferences (train):  82%|█████████████████████████████████████████████████▏          | 35775/43598 [00:03<00:00, 12319.45 examples/s]Normalizing raw HH preferences (train):  83%|█████████████████████████████████████████████████▌          | 35972/43598 [00:03<00:00, 12353.36 examples/s]Normalizing raw HH preferences (train):  82%|█████████████████████████████████████████████████▍          | 35951/43598 [00:03<00:00, 12364.89 examples/s]Normalizing raw HH preferences (train):  82%|█████████████████████████████████████████████████▍          | 35953/43598 [00:03<00:00, 12368.80 examples/s]Normalizing raw HH preferences (train):  86%|███████████████████████████████████████████████████▉        | 37700/43598 [00:03<00:00, 12307.46 examples/s]Normalizing raw HH preferences (train):  87%|████████████████████████████████████████████████████        | 37830/43598 [00:03<00:00, 12360.55 examples/s]Normalizing raw HH preferences (train):  87%|████████████████████████████████████████████████████        | 37796/43598 [00:03<00:00, 12338.93 examples/s]Normalizing raw HH preferences (train):  87%|████████████████████████████████████████████████████        | 37794/43598 [00:03<00:00, 12333.91 examples/s]Normalizing raw HH preferences (train):  89%|█████████████████████████████████████████████████████▌      | 38948/43598 [00:03<00:00, 12343.54 examples/s]Normalizing raw HH preferences (train):  91%|██████████████████████████████████████████████████████▋     | 39698/43598 [00:03<00:00, 12319.23 examples/s]Normalizing raw HH preferences (train):  91%|██████████████████████████████████████████████████████▋     | 39695/43598 [00:03<00:00, 12295.17 examples/s]Normalizing raw HH preferences (train):  91%|██████████████████████████████████████████████████████▌     | 39692/43598 [00:03<00:00, 12289.47 examples/s]Normalizing raw HH preferences (train):  94%|████████████████████████████████████████████████████████    | 40780/43598 [00:03<00:00, 12297.28 examples/s]Normalizing raw HH preferences (train):  95%|█████████████████████████████████████████████████████████▊   | 41359/43598 [00:03<00:00, 9021.46 examples/s]Normalizing raw HH preferences (train):  95%|█████████████████████████████████████████████████████████▊   | 41362/43598 [00:03<00:00, 9042.68 examples/s]Normalizing raw HH preferences (train):  95%|█████████████████████████████████████████████████████████▊   | 41360/43598 [00:03<00:00, 8949.58 examples/s]Normalizing raw HH preferences (train):  98%|███████████████████████████████████████████████████████████▋ | 42698/43598 [00:03<00:00, 9749.39 examples/s]Normalizing raw HH preferences (train):  98%|███████████████████████████████████████████████████████████▋ | 42696/43598 [00:03<00:00, 9117.28 examples/s]Normalizing raw HH preferences (train):  98%|███████████████████████████████████████████████████████████▋ | 42697/43598 [00:03<00:00, 9751.05 examples/s]Normalizing raw HH preferences (train):  98%|███████████████████████████████████████████████████████████▋ | 42698/43598 [00:03<00:00, 9672.60 examples/s]Normalizing raw HH preferences (train): 100%|████████████████████████████████████████████████████████████| 43598/43598 [00:04<00:00, 10051.10 examples/s]
Normalizing raw HH preferences (train): 100%|█████████████████████████████████████████████████████████████| 43598/43598 [00:04<00:00, 9901.43 examples/s]Normalizing raw HH preferences (train): 100%|█████████████████████████████████████████████████████████████| 43598/43598 [00:04<00:00, 9902.73 examples/s]Normalizing raw HH preferences (train): 100%|█████████████████████████████████████████████████████████████| 43598/43598 [00:04<00:00, 9901.43 examples/s]


No config specified, defaulting to the single config: hh-rlhf/default
2026-04-18 02:55:00 - INFO - datasets.builder - No config specified, defaulting to the single config: hh-rlhf/default
Using custom data configuration default-cfba128a0ab1b99f
2026-04-18 02:55:00 - INFO - datasets.builder - Using custom data configuration default-cfba128a0ab1b99f
Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/json
2026-04-18 02:55:00 - INFO - datasets.info - Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
2026-04-18 02:55:00 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa
2026-04-18 02:55:00 - INFO - datasets.info - Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa
Found cached dataset hh-rlhf (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa)
2026-04-18 02:55:00 - INFO - datasets.builder - Found cached dataset hh-rlhf (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa)
Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa
2026-04-18 02:55:00 - INFO - datasets.info - Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa
2026-04-18 02:55:01 - WARNING - alignment.data - Dropped 15 non-canonical HH preference examples from split `test` before normalization (9 x HH preprocessing expects exactly one final assistant response in chosen/rejected suffixes., 6 x HH chosen/rejected transcripts must each contain a divergent assistant response.).
Normalizing raw HH preferences (test):   0%|                                                                             | 0/2339 [00:00<?, ? examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-fa6f4b7acba8a3e1.arrow
2026-04-18 02:55:01 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-fa6f4b7acba8a3e1.arrow
2026-04-18 02:55:01 - WARNING - alignment.data - Dropped 15 non-canonical HH preference examples from split `test` before normalization (9 x HH preprocessing expects exactly one final assistant response in chosen/rejected suffixes., 6 x HH chosen/rejected transcripts must each contain a divergent assistant response.).
Normalizing raw HH preferences (test):   0%|                                                                             | 0/2339 [00:00<?, ? examples/s]2026-04-18 02:55:01 - WARNING - alignment.data - Dropped 15 non-canonical HH preference examples from split `test` before normalization (9 x HH preprocessing expects exactly one final assistant response in chosen/rejected suffixes., 6 x HH chosen/rejected transcripts must each contain a divergent assistant response.).
Normalizing raw HH preferences (test):   0%|                                                                             | 0/2339 [00:00<?, ? examples/s]2026-04-18 02:55:01 - WARNING - alignment.data - Dropped 15 non-canonical HH preference examples from split `test` before normalization (9 x HH preprocessing expects exactly one final assistant response in chosen/rejected suffixes., 6 x HH chosen/rejected transcripts must each contain a divergent assistant response.).
Normalizing raw HH preferences (test):   0%|                                                                             | 0/2339 [00:00<?, ? examples/s]Normalizing raw HH preferences (test):  43%|███████████████████████████▎                                    | 1000/2339 [00:00<00:00, 9456.07 examples/s]Normalizing raw HH preferences (test):  43%|███████████████████████████▎                                    | 1000/2339 [00:00<00:00, 9082.77 examples/s]Normalizing raw HH preferences (test):  43%|███████████████████████████▎                                    | 1000/2339 [00:00<00:00, 9702.66 examples/s]Normalizing raw HH preferences (test):  43%|███████████████████████████▎                                    | 1000/2339 [00:00<00:00, 9886.98 examples/s]Normalizing raw HH preferences (test):  95%|███████████████████████████████████████████████████████████▋   | 2217/2339 [00:00<00:00, 11011.34 examples/s]Normalizing raw HH preferences (test):  96%|████████████████████████████████████████████████████████████▎  | 2238/2339 [00:00<00:00, 10938.94 examples/s]Normalizing raw HH preferences (test):  96%|████████████████████████████████████████████████████████████▎  | 2239/2339 [00:00<00:00, 11258.72 examples/s]Normalizing raw HH preferences (test):  96%|████████████████████████████████████████████████████████████▏  | 2234/2339 [00:00<00:00, 11317.89 examples/s]Normalizing raw HH preferences (test): 100%|████████████████████████████████████████████████████████████████| 2339/2339 [00:00<00:00, 6416.81 examples/s]
Normalizing raw HH preferences (test): 100%|████████████████████████████████████████████████████████████████| 2339/2339 [00:00<00:00, 7636.52 examples/s]
Normalizing raw HH preferences (test): 100%|████████████████████████████████████████████████████████████████| 2339/2339 [00:00<00:00, 6682.49 examples/s]
Loading cached shuffled indices for dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-be0876dd0add1b31.arrow
2026-04-18 02:55:01 - INFO - datasets.arrow_dataset - Loading cached shuffled indices for dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-be0876dd0add1b31.arrow
Normalizing raw HH preferences (test): 100%|████████████████████████████████████████████████████████████████| 2339/2339 [00:00<00:00, 7670.13 examples/s]
Loading cached shuffled indices for dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-40e942b49dfd026a.arrow
2026-04-18 02:55:01 - INFO - datasets.arrow_dataset - Loading cached shuffled indices for dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-40e942b49dfd026a.arrow
2026-04-18 02:55:01 - INFO - __main__ - Training on the following datasets and their proportions: ['train : 43598', 'test : 2339']
[INFO|tokenization_utils_base.py:2058] 2026-04-18 02:55:01,613 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2058] 2026-04-18 02:55:01,613 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2058] 2026-04-18 02:55:01,613 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2058] 2026-04-18 02:55:01,613 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2058] 2026-04-18 02:55:01,613 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2058] 2026-04-18 02:55:01,613 >> loading file chat_template.jinja
2026-04-18 02:55:01 - INFO - __main__ - *** Load pretrained model ***
Process #0 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00000_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Process #0 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00000_of_00012.arrow
Process #1 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00001_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Process #1 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00001_of_00012.arrow
Process #2 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00002_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Process #2 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00002_of_00012.arrow
Process #3 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00003_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Process #3 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00003_of_00012.arrow
Process #4 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00004_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Process #4 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00004_of_00012.arrow
Process #5 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00005_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Process #5 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00005_of_00012.arrow
Process #6 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00006_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Process #6 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00006_of_00012.arrow
Process #7 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00007_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Process #7 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00007_of_00012.arrow
Process #8 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00008_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Process #8 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00008_of_00012.arrow
Process #9 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00009_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Process #9 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00009_of_00012.arrow
Process #10 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00010_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Process #10 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00010_of_00012.arrow
Process #11 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00011_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Process #11 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00011_of_00012.arrow
Applying chat template (num_proc=12):   0%|                                                                             | 0/43598 [00:00<?, ? examples/s]Applying chat template (num_proc=12):   0%|                                                                             | 0/43598 [00:00<?, ? examples/s]Applying chat template (num_proc=12):   0%|                                                                             | 0/43598 [00:00<?, ? examples/s]Spawning 12 processes
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Spawning 12 processes
Applying chat template (num_proc=12):   0%|                                                                             | 0/43598 [00:00<?, ? examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00000_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00000_of_00012.arrow
Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00001_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00001_of_00012.arrow
Applying chat template (num_proc=12):   0%|                                                                   | 58/43598 [00:00<03:15, 222.56 examples/s]Applying chat template (num_proc=12):   0%|                                                                   | 57/43598 [00:00<03:20, 216.64 examples/s]Applying chat template (num_proc=12):   0%|                                                                   | 1/43598 [00:00<3:03:59,  3.95 examples/s]Applying chat template (num_proc=12):   0%|                                                                   | 66/43598 [00:00<03:03, 236.96 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00002_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00002_of_00012.arrow
Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00003_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00003_of_00012.arrow
Applying chat template (num_proc=12):   1%|▍                                                                | 316/43598 [00:00<00:41, 1047.05 examples/s]Applying chat template (num_proc=12):   1%|▍                                                                | 334/43598 [00:00<00:36, 1188.64 examples/s]Applying chat template (num_proc=12):   1%|▎                                                                 | 230/43598 [00:00<01:00, 712.56 examples/s]Applying chat template (num_proc=12):   2%|█▌                                                              | 1025/43598 [00:00<00:14, 2903.42 examples/s]Applying chat template (num_proc=12):   2%|█▌                                                              | 1029/43598 [00:00<00:14, 2963.92 examples/s]Applying chat template (num_proc=12):   2%|█▎                                                               | 841/43598 [00:00<00:20, 2134.44 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00004_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00004_of_00012.arrow
Applying chat template (num_proc=12):   2%|█▎                                                               | 884/43598 [00:00<00:23, 1855.06 examples/s]Applying chat template (num_proc=12):   3%|██                                                              | 1429/43598 [00:00<00:13, 3230.60 examples/s]Applying chat template (num_proc=12):   6%|███▊                                                            | 2614/43598 [00:00<00:07, 5624.93 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00005_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00005_of_00012.arrow
Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00007_of_00012.arrow
Applying chat template (num_proc=12):  14%|████████▌                                                      | 5926/43598 [00:00<00:03, 12443.93 examples/s]Applying chat template (num_proc=12):   6%|███▋                                                            | 2532/43598 [00:00<00:07, 5250.74 examples/s]2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00007_of_00012.arrow
Applying chat template (num_proc=12):   4%|██▋                                                             | 1856/43598 [00:00<00:11, 3546.15 examples/s]Applying chat template (num_proc=12):   4%|██▌                                                             | 1734/43598 [00:00<00:12, 3228.93 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00006_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00006_of_00012.arrow
Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00008_of_00012.arrow
2026-04-18 02:55:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00008_of_00012.arrow
Applying chat template (num_proc=12):  17%|██████████▉                                                    | 7552/43598 [00:00<00:02, 13400.75 examples/s]Applying chat template (num_proc=12):  10%|██████▎                                                         | 4317/43598 [00:00<00:04, 8888.77 examples/s]Applying chat template (num_proc=12):   7%|████▋                                                           | 3174/43598 [00:00<00:06, 5958.80 examples/s]Applying chat template (num_proc=12):  21%|█████████████▏                                                 | 9169/43598 [00:00<00:02, 13469.41 examples/s]Applying chat template (num_proc=12):   7%|████▊                                                           | 3243/43598 [00:00<00:08, 4759.08 examples/s]Applying chat template (num_proc=12):  10%|██████▏                                                         | 4212/43598 [00:00<00:05, 6755.43 examples/s]Applying chat template (num_proc=12):  14%|████████▉                                                      | 6208/43598 [00:00<00:03, 11012.60 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00009_of_00012.arrow
Applying chat template (num_proc=12):  12%|███████▉                                                        | 5447/43598 [00:01<00:04, 8289.95 examples/s]2026-04-18 02:55:03 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00009_of_00012.arrow
Applying chat template (num_proc=12):  25%|███████████████▏                                              | 10715/43598 [00:01<00:02, 13150.19 examples/s]Applying chat template (num_proc=12):  18%|███████████▎                                                   | 7789/43598 [00:01<00:03, 11914.57 examples/s]Applying chat template (num_proc=12):  15%|█████████▌                                                     | 6576/43598 [00:01<00:03, 10717.38 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00010_of_00012.arrow
2026-04-18 02:55:03 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00010_of_00012.arrow
Applying chat template (num_proc=12):  16%|██████████                                                      | 6859/43598 [00:01<00:03, 9578.10 examples/s]Applying chat template (num_proc=12):  29%|██████████████████▏                                           | 12782/43598 [00:01<00:02, 15036.92 examples/s]Applying chat template (num_proc=12):  22%|█████████████▋                                                 | 9493/43598 [00:01<00:02, 13065.99 examples/s]Applying chat template (num_proc=12):  18%|███████████▎                                                   | 7826/43598 [00:01<00:03, 10983.21 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00011_of_00012.arrow
2026-04-18 02:55:03 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-3c7981ca175d6640_00011_of_00012.arrow
Applying chat template (num_proc=12):  19%|███████████▊                                                   | 8207/43598 [00:01<00:03, 10497.71 examples/s]Applying chat template (num_proc=12):  34%|█████████████████████                                         | 14833/43598 [00:01<00:01, 16528.44 examples/s]Applying chat template (num_proc=12):  26%|████████████████                                              | 11265/43598 [00:01<00:02, 14233.97 examples/s]Applying chat template (num_proc=12):  21%|█████████████                                                  | 9047/43598 [00:01<00:03, 11081.42 examples/s]Applying chat template (num_proc=12):  38%|███████████████████████▊                                      | 16767/43598 [00:01<00:01, 17312.94 examples/s]Applying chat template (num_proc=12):  23%|██████████████▎                                               | 10069/43598 [00:01<00:02, 12416.88 examples/s]Applying chat template (num_proc=12):  30%|██████████████████▌                                           | 13010/43598 [00:01<00:02, 14904.05 examples/s]Applying chat template (num_proc=12):  23%|██████████████▌                                               | 10224/43598 [00:01<00:02, 11173.50 examples/s]Applying chat template (num_proc=12):  27%|████████████████▉                                             | 11892/43598 [00:01<00:02, 13897.13 examples/s]Applying chat template (num_proc=12):  43%|██████████████████████████▌                                   | 18658/43598 [00:01<00:01, 17036.40 examples/s]Applying chat template (num_proc=12):  27%|████████████████▌                                             | 11639/43598 [00:01<00:02, 11968.21 examples/s]Applying chat template (num_proc=12):  34%|████████████████████▊                                         | 14614/43598 [00:01<00:01, 14985.65 examples/s]Applying chat template (num_proc=12):  31%|███████████████████▏                                          | 13532/43598 [00:01<00:02, 14580.34 examples/s]Applying chat template (num_proc=12):  31%|██████████████████▉                                           | 13339/43598 [00:01<00:02, 13299.65 examples/s]Applying chat template (num_proc=12):  47%|█████████████████████████████▎                                | 20573/43598 [00:01<00:01, 17138.44 examples/s]Applying chat template (num_proc=12):  37%|███████████████████████▏                                      | 16287/43598 [00:01<00:01, 15083.36 examples/s]Applying chat template (num_proc=12):  35%|█████████████████████▌                                        | 15168/43598 [00:01<00:01, 15043.61 examples/s]Applying chat template (num_proc=12):  51%|███████████████████████████████▊                              | 22396/43598 [00:01<00:01, 17321.31 examples/s]Applying chat template (num_proc=12):  34%|█████████████████████▏                                        | 14882/43598 [00:01<00:02, 13756.58 examples/s]Applying chat template (num_proc=12):  41%|█████████████████████████▎                                    | 17821/43598 [00:01<00:01, 15154.26 examples/s]Applying chat template (num_proc=12):  39%|████████████████████████▎                                     | 17129/43598 [00:01<00:01, 16267.27 examples/s]Applying chat template (num_proc=12):  56%|██████████████████████████████████▋                           | 24387/43598 [00:01<00:01, 17853.19 examples/s]Applying chat template (num_proc=12):  38%|███████████████████████▋                                      | 16647/43598 [00:01<00:01, 14646.41 examples/s]Applying chat template (num_proc=12):  45%|████████████████████████████                                  | 19761/43598 [00:01<00:01, 15972.27 examples/s]Applying chat template (num_proc=12):  44%|███████████████████████████▏                                  | 19091/43598 [00:01<00:01, 17092.06 examples/s]Applying chat template (num_proc=12):  61%|█████████████████████████████████████▋                        | 26480/43598 [00:01<00:00, 18432.31 examples/s]Applying chat template (num_proc=12):  50%|██████████████████████████████▋                               | 21605/43598 [00:01<00:01, 16680.05 examples/s]Applying chat template (num_proc=12):  42%|██████████████████████████▎                                   | 18466/43598 [00:01<00:01, 15009.65 examples/s]Applying chat template (num_proc=12):  48%|█████████████████████████████▉                                | 21081/43598 [00:01<00:01, 17656.73 examples/s]Applying chat template (num_proc=12):  46%|████████████████████████████▋                                 | 20212/43598 [00:01<00:01, 15692.88 examples/s]Applying chat template (num_proc=12):  65%|████████████████████████████████████████▎                     | 28359/43598 [00:02<00:00, 17885.35 examples/s]Applying chat template (num_proc=12):  53%|█████████████████████████████████▏                            | 23302/43598 [00:01<00:01, 16070.20 examples/s]Applying chat template (num_proc=12):  53%|████████████████████████████████▋                             | 22958/43598 [00:02<00:01, 17721.97 examples/s]Applying chat template (num_proc=12):  50%|███████████████████████████████                               | 21799/43598 [00:02<00:01, 15743.41 examples/s]Applying chat template (num_proc=12):  69%|███████████████████████████████████████████                   | 30241/43598 [00:02<00:00, 17983.53 examples/s]Applying chat template (num_proc=12):  57%|███████████████████████████████████▋                          | 25052/43598 [00:02<00:01, 16432.53 examples/s]Applying chat template (num_proc=12):  57%|███████████████████████████████████▌                          | 24995/43598 [00:02<00:01, 18342.71 examples/s]Applying chat template (num_proc=12):  54%|█████████████████████████████████▎                            | 23415/43598 [00:02<00:01, 15275.65 examples/s]Applying chat template (num_proc=12):  74%|█████████████████████████████████████████████▋                | 32131/43598 [00:02<00:00, 17675.79 examples/s]Applying chat template (num_proc=12):  61%|██████████████████████████████████████                        | 26777/43598 [00:02<00:01, 15727.67 examples/s]Applying chat template (num_proc=12):  62%|██████████████████████████████████████▏                       | 26879/43598 [00:02<00:00, 18219.04 examples/s]Applying chat template (num_proc=12):  57%|███████████████████████████████████▍                          | 24963/43598 [00:02<00:01, 15324.60 examples/s]Applying chat template (num_proc=12):  78%|████████████████████████████████████████████████▌             | 34132/43598 [00:02<00:00, 17719.04 examples/s]Applying chat template (num_proc=12):  65%|████████████████████████████████████████▍                     | 28470/43598 [00:02<00:00, 15866.64 examples/s]Applying chat template (num_proc=12):  66%|█████████████████████████████████████████▏                    | 28991/43598 [00:02<00:00, 18474.21 examples/s]Applying chat template (num_proc=12):  61%|█████████████████████████████████████▉                        | 26642/43598 [00:02<00:01, 15665.19 examples/s]Applying chat template (num_proc=12):  82%|███████████████████████████████████████████████████           | 35924/43598 [00:02<00:00, 17672.30 examples/s]Applying chat template (num_proc=12):  69%|███████████████████████████████████████████                   | 30264/43598 [00:02<00:00, 16316.22 examples/s]Applying chat template (num_proc=12):  71%|███████████████████████████████████████████▉                  | 30899/43598 [00:02<00:00, 18520.87 examples/s]Applying chat template (num_proc=12):  65%|████████████████████████████████████████▌                     | 28544/43598 [00:02<00:00, 16270.17 examples/s]Applying chat template (num_proc=12):  87%|█████████████████████████████████████████████████████▊        | 37851/43598 [00:02<00:00, 17636.28 examples/s]Applying chat template (num_proc=12):  74%|█████████████████████████████████████████████▋                | 32143/43598 [00:02<00:00, 16794.56 examples/s]Applying chat template (num_proc=12):  75%|██████████████████████████████████████████████▋               | 32862/43598 [00:02<00:00, 18574.68 examples/s]Applying chat template (num_proc=12):  69%|██████████████████████████████████████████▉                   | 30179/43598 [00:02<00:00, 15987.10 examples/s]Applying chat template (num_proc=12):  78%|████████████████████████████████████████████████▏             | 33886/43598 [00:02<00:00, 16960.01 examples/s]Applying chat template (num_proc=12):  91%|████████████████████████████████████████████████████████▎     | 39618/43598 [00:02<00:00, 15500.94 examples/s]Applying chat template (num_proc=12):  80%|█████████████████████████████████████████████████▋            | 34951/43598 [00:02<00:00, 19218.36 examples/s]Applying chat template (num_proc=12):  73%|█████████████████████████████████████████████▎                | 31860/43598 [00:02<00:00, 16117.21 examples/s]Applying chat template (num_proc=12):  82%|██████████████████████████████████████████████████▋           | 35649/43598 [00:02<00:00, 17064.88 examples/s]Applying chat template (num_proc=12):  85%|████████████████████████████████████████████████████▉         | 37267/43598 [00:02<00:00, 20338.84 examples/s]Applying chat template (num_proc=12):  78%|████████████████████████████████████████████████▎             | 33983/43598 [00:02<00:00, 17444.01 examples/s]Applying chat template (num_proc=12):  95%|██████████████████████████████████████████████████████████▋   | 41260/43598 [00:02<00:00, 13507.30 examples/s]Applying chat template (num_proc=12):  86%|█████████████████████████████████████████████████████▏        | 37439/43598 [00:02<00:00, 16925.27 examples/s]Applying chat template (num_proc=12):  91%|████████████████████████████████████████████████████████▎     | 39561/43598 [00:02<00:00, 21081.57 examples/s]Applying chat template (num_proc=12):  83%|███████████████████████████████████████████████████▋          | 36323/43598 [00:02<00:00, 19115.88 examples/s]Applying chat template (num_proc=12):  90%|███████████████████████████████████████████████████████▊      | 39239/43598 [00:02<00:00, 17059.71 examples/s]Applying chat template (num_proc=12):  96%|███████████████████████████████████████████████████████████▎  | 41718/43598 [00:03<00:00, 20402.19 examples/s]Applying chat template (num_proc=12):  88%|██████████████████████████████████████████████████████▊       | 38577/43598 [00:03<00:00, 19916.71 examples/s]Applying chat template (num_proc=12):  94%|██████████████████████████████████████████████████████████▌   | 41143/43598 [00:03<00:00, 17626.85 examples/s]Applying chat template (num_proc=12):  98%|████████████████████████████████████████████████████████████▊ | 42735/43598 [00:03<00:00, 10502.46 examples/s]Applying chat template (num_proc=12):  94%|██████████████████████████████████████████████████████████▎   | 41020/43598 [00:03<00:00, 21232.64 examples/s]Applying chat template (num_proc=12):  99%|█████████████████████████████████████████████████████████████▍| 43228/43598 [00:03<00:00, 18564.49 examples/s]Applying chat template (num_proc=12):  99%|█████████████████████████████████████████████████████████████▌| 43275/43598 [00:03<00:00, 18251.36 examples/s]Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 314, in _bootstrap
    self.run()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 600, in _run_server
    server.serve_forever()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 184, in serve_forever
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 133, in _remove_temp_dir
    rmtree(tempdir)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 752, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 703, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 701, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs90b73fbeb7579e9000031a9e'
Applying chat template (num_proc=12): 100%|██████████████████████████████████████████████████████████████| 43598/43598 [00:03<00:00, 11698.33 examples/s]
Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 314, in _bootstrap
    self.run()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 600, in _run_server
    server.serve_forever()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 184, in serve_forever
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 133, in _remove_temp_dir
    rmtree(tempdir)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 752, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 703, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 701, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfseac720887c7dd6fa00031aa5'
Applying chat template (num_proc=12): 100%|██████████████████████████████████████████████████████████████| 43598/43598 [00:03<00:00, 11091.86 examples/s]
Concatenating 12 shards
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Concatenating 12 shards
Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 314, in _bootstrap
    self.run()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 600, in _run_server
    server.serve_forever()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 184, in serve_forever
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 133, in _remove_temp_dir
    rmtree(tempdir)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 752, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 703, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 701, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs1f463e2a9793f82f00031aa6'
Process #0 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00000_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Process #0 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00000_of_00012.arrow
Process #1 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00001_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Process #1 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00001_of_00012.arrow
Process #2 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00002_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Process #2 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00002_of_00012.arrow
Process #3 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00003_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Process #3 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00003_of_00012.arrow
Process #4 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00004_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Process #4 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00004_of_00012.arrow
Process #5 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00005_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Process #5 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00005_of_00012.arrow
Process #6 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00006_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Process #6 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00006_of_00012.arrow
Process #7 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00007_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Process #7 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00007_of_00012.arrow
Process #8 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00008_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Process #8 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00008_of_00012.arrow
Process #9 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00009_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Process #9 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00009_of_00012.arrow
Process #10 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00010_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Process #10 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00010_of_00012.arrow
Process #11 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00011_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Process #11 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00011_of_00012.arrow
Applying chat template (num_proc=12): 100%|██████████████████████████████████████████████████████████████| 43598/43598 [00:04<00:00, 10860.86 examples/s]
Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 314, in _bootstrap
    self.run()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 600, in _run_server
    server.serve_forever()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 184, in serve_forever
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 133, in _remove_temp_dir
    rmtree(tempdir)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 752, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 703, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 701, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs47dfbb060531734000031aa7'
Applying chat template (num_proc=12): 100%|██████████████████████████████████████████████████████████████| 43598/43598 [00:04<00:00, 10757.31 examples/s]
Applying chat template (num_proc=12):   0%|                                                                              | 0/2339 [00:00<?, ? examples/s]Spawning 12 processes
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Spawning 12 processes
Applying chat template (num_proc=12):   0%|                                                                              | 0/2339 [00:00<?, ? examples/s]Applying chat template (num_proc=12):   0%|                                                                              | 0/2339 [00:00<?, ? examples/s]Applying chat template (num_proc=12):   0%|                                                                              | 0/2339 [00:00<?, ? examples/s]Applying chat template (num_proc=12):   5%|███▎                                                               | 116/2339 [00:00<00:04, 451.26 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00000_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00000_of_00012.arrow
Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00001_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00001_of_00012.arrow
Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00002_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00002_of_00012.arrow
Applying chat template (num_proc=12):   0%|                                                                      | 1/2339 [00:00<13:16,  2.93 examples/s]Applying chat template (num_proc=12):  21%|█████████████▋                                                    | 486/2339 [00:00<00:01, 1426.15 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00004_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00004_of_00012.arrow
Applying chat template (num_proc=12):   0%|                                                                      | 1/2339 [00:00<13:14,  2.94 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00003_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00003_of_00012.arrow
Applying chat template (num_proc=12):   0%|                                                                      | 1/2339 [00:00<13:16,  2.94 examples/s]Applying chat template (num_proc=12):  13%|████████▉                                                          | 310/2339 [00:00<00:02, 870.38 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00005_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00005_of_00012.arrow
Applying chat template (num_proc=12):  43%|████████████████████████████▏                                    | 1014/2339 [00:00<00:00, 2351.73 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00006_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00006_of_00012.arrow
Applying chat template (num_proc=12):   5%|███▍                                                               | 119/2339 [00:00<00:06, 343.67 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00007_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00007_of_00012.arrow
Applying chat template (num_proc=12):  25%|████████████████▌                                                 | 585/2339 [00:00<00:01, 1273.55 examples/s]Applying chat template (num_proc=12):  17%|███████████▏                                                       | 390/2339 [00:00<00:02, 894.96 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00008_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00008_of_00012.arrow
Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00009_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00009_of_00012.arrow
Applying chat template (num_proc=12):  17%|███████████▏                                                       | 390/2339 [00:00<00:02, 942.37 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00011_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00011_of_00012.arrow
Applying chat template (num_proc=12):  58%|█████████████████████████████████████▉                           | 1366/2339 [00:00<00:00, 2055.76 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00010_of_00012.arrow
2026-04-18 02:55:06 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-dafa9bfd40ecea63_00010_of_00012.arrow
Applying chat template (num_proc=12):  25%|████████████████▊                                                  | 585/2339 [00:00<00:02, 871.16 examples/s]Applying chat template (num_proc=12):  25%|████████████████▌                                                 | 586/2339 [00:00<00:01, 1024.10 examples/s]Applying chat template (num_proc=12):  75%|████████████████████████████████████████████████▊                | 1755/2339 [00:00<00:00, 2112.91 examples/s]Applying chat template (num_proc=12):  33%|██████████████████████▎                                            | 781/2339 [00:00<00:01, 979.09 examples/s]Applying chat template (num_proc=12):  42%|███████████████████████████▌                                      | 976/2339 [00:01<00:01, 1078.44 examples/s]Applying chat template (num_proc=12):  33%|██████████████████████▍                                            | 782/2339 [00:00<00:01, 891.04 examples/s]Applying chat template (num_proc=12):  33%|██████████████████████▎                                            | 781/2339 [00:00<00:01, 998.59 examples/s]Applying chat template (num_proc=12):  58%|█████████████████████████████████████▉                           | 1366/2339 [00:01<00:00, 1614.54 examples/s]Applying chat template (num_proc=12):  92%|███████████████████████████████████████████████████████████▌     | 2145/2339 [00:01<00:00, 1819.50 examples/s]Applying chat template (num_proc=12):  50%|████████████████████████████████▌                                | 1171/2339 [00:01<00:00, 1626.41 examples/s]Applying chat template (num_proc=12):  58%|█████████████████████████████████████▉                           | 1366/2339 [00:01<00:00, 1842.09 examples/s]Applying chat template (num_proc=12):  75%|████████████████████████████████████████████████▊                | 1758/2339 [00:01<00:00, 1655.88 examples/s]Applying chat template (num_proc=12):  75%|████████████████████████████████████████████████▊                | 1758/2339 [00:01<00:00, 1737.47 examples/s]Applying chat template (num_proc=12):  67%|███████████████████████████████████████████▍                     | 1564/2339 [00:01<00:00, 1600.46 examples/s]Applying chat template (num_proc=12): 100%|█████████████████████████████████████████████████████████████████| 2339/2339 [00:01<00:00, 2151.89 examples/s]Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 314, in _bootstrap
    self.run()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 600, in _run_server
    server.serve_forever()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 184, in serve_forever
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 133, in _remove_temp_dir
    rmtree(tempdir)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 752, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 703, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 701, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs5aee568026f30ce900031ac0'
Applying chat template (num_proc=12): 100%|█████████████████████████████████████████████████████████████████| 2339/2339 [00:01<00:00, 1288.13 examples/s]
Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 314, in _bootstrap
    self.run()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 600, in _run_server
    server.serve_forever()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 184, in serve_forever
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 133, in _remove_temp_dir
    rmtree(tempdir)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 752, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 703, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 701, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs0cdd3916cf4d4a7000031acb'
Applying chat template (num_proc=12): 100%|█████████████████████████████████████████████████████████████████| 2339/2339 [00:02<00:00, 1144.34 examples/s]
Filter:   0%|                                                                                                           | 0/43598 [00:00<?, ? examples/s]Filter:   0%|                                                                                                           | 0/43598 [00:00<?, ? examples/s]Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 314, in _bootstrap
    self.run()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 600, in _run_server
    server.serve_forever()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 184, in serve_forever
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 133, in _remove_temp_dir
    rmtree(tempdir)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 752, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 703, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 701, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfsb5b7add01df43c8800031acd'
Applying chat template (num_proc=12): 100%|█████████████████████████████████████████████████████████████████| 2339/2339 [00:02<00:00, 1117.66 examples/s]
Filter:   0%|                                                                                                           | 0/43598 [00:00<?, ? examples/s]Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 314, in _bootstrap
    self.run()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 600, in _run_server
    server.serve_forever()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/managers.py", line 184, in serve_forever
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/multiprocess/util.py", line 133, in _remove_temp_dir
    rmtree(tempdir)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 752, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 703, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/shutil.py", line 701, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs92999a5621cdf16e00031ace'
Applying chat template (num_proc=12): 100%|█████████████████████████████████████████████████████████████████| 2339/2339 [00:02<00:00, 1051.95 examples/s]
Concatenating 12 shards
2026-04-18 02:55:08 - INFO - datasets.arrow_dataset - Concatenating 12 shards
Filter:   0%|                                                                                                           | 0/43598 [00:00<?, ? examples/s]Filter:  23%|█████████████████████▎                                                                       | 10000/43598 [00:08<00:29, 1144.78 examples/s]Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-c4b2f23ae420b215.arrow
2026-04-18 02:55:17 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-c4b2f23ae420b215.arrow
Filter:  23%|█████████████████████▎                                                                       | 10000/43598 [00:08<00:29, 1143.64 examples/s]Filter:  23%|█████████████████████▎                                                                       | 10000/43598 [00:08<00:29, 1133.85 examples/s]Filter:  23%|█████████████████████▎                                                                       | 10000/43598 [00:08<00:30, 1113.68 examples/s]Filter:  46%|██████████████████████████████████████████▋                                                  | 20000/43598 [00:17<00:20, 1154.67 examples/s]Filter:  46%|██████████████████████████████████████████▋                                                  | 20000/43598 [00:17<00:20, 1137.84 examples/s]Filter:  46%|██████████████████████████████████████████▋                                                  | 20000/43598 [00:17<00:20, 1137.99 examples/s]Filter:  46%|██████████████████████████████████████████▋                                                  | 20000/43598 [00:17<00:20, 1134.14 examples/s]Filter:  69%|███████████████████████████████████████████████████████████████▉                             | 30000/43598 [00:26<00:11, 1153.49 examples/s]Filter:  69%|███████████████████████████████████████████████████████████████▉                             | 30000/43598 [00:26<00:11, 1142.91 examples/s]Filter:  69%|███████████████████████████████████████████████████████████████▉                             | 30000/43598 [00:26<00:11, 1135.14 examples/s]Filter:  69%|███████████████████████████████████████████████████████████████▉                             | 30000/43598 [00:26<00:11, 1133.26 examples/s]Filter:  92%|█████████████████████████████████████████████████████████████████████████████████████▎       | 40000/43598 [00:34<00:03, 1155.70 examples/s]Filter:  92%|█████████████████████████████████████████████████████████████████████████████████████▎       | 40000/43598 [00:34<00:03, 1148.18 examples/s]Filter:  92%|█████████████████████████████████████████████████████████████████████████████████████▎       | 40000/43598 [00:35<00:03, 1138.04 examples/s]Filter:  92%|█████████████████████████████████████████████████████████████████████████████████████▎       | 40000/43598 [00:35<00:03, 1130.10 examples/s]Filter: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 43598/43598 [00:37<00:00, 1154.81 examples/s]Filter: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 43598/43598 [00:37<00:00, 1150.58 examples/s]
Filter:   0%|                                                                                                            | 0/2339 [00:00<?, ? examples/s]Filter: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 43598/43598 [00:38<00:00, 1149.52 examples/s]Filter: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 43598/43598 [00:38<00:00, 1139.49 examples/s]Filter: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 43598/43598 [00:38<00:00, 1139.63 examples/s]
Filter:   0%|                                                                                                            | 0/2339 [00:00<?, ? examples/s]Filter: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 43598/43598 [00:38<00:00, 1131.29 examples/s]Filter: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 43598/43598 [00:38<00:00, 1129.12 examples/s]
Filter:   0%|                                                                                                            | 0/2339 [00:00<?, ? examples/s]Filter: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 43598/43598 [00:38<00:00, 1128.14 examples/s]
Filter:   0%|                                                                                                            | 0/2339 [00:00<?, ? examples/s]Filter: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2339/2339 [00:02<00:00, 1147.95 examples/s]Filter: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2339/2339 [00:02<00:00, 1098.20 examples/s]
Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d55f13663f25d416.arrow
2026-04-18 02:55:48 - INFO - datasets.arrow_dataset - Caching processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d55f13663f25d416.arrow
Filter: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2339/2339 [00:02<00:00, 1140.28 examples/s]Filter: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2339/2339 [00:02<00:00, 1017.52 examples/s]
2026-04-18 02:55:49 - INFO - __main__ - Decontaminated 0 (0.00%) samples from the training set.
2026-04-18 02:55:49 - INFO - __main__ - Sample 41905 of the processed training set:

<|system|>
</s>
<|user|>
What is the best way to clean my refrigerator?</s>
<|assistant|>
Well, I have a few ideas, but the best way to know what works best is to run some tests and then analyze the data to figure out which technique works best.</s>
<|user|>
What are your ideas?</s>
<|assistant|>
Some of my ideas include wiping down the interior with a rag, using bleach and water to wipe down the door, emptying the contents of the fridge to scrub the shelves with a clean cloth, cleaning the interior of the fridge with vinegar and water, and using a vacuum cleaner to clean the interior.</s>

2026-04-18 02:55:49 - INFO - __main__ - Sample 7296 of the processed training set:

<|system|>
</s>
<|user|>
How can I get blood out of cotton?</s>
<|assistant|>
Blood? What do you mean?</s>

2026-04-18 02:55:49 - INFO - __main__ - Sample 1639 of the processed training set:

<|system|>
</s>
<|user|>
I'm trying to get my friend to open up</s>
<|assistant|>
Sounds great!  What have you tried so far?</s>
<|user|>
I've tried to give my friend a safe space to open up about their feelings but they still aren't comfortable</s>
<|assistant|>
It’s really important that people feel safe sharing their feelings, but I totally hear that this isn’t easy for them.  And in my experience, that has a lot to do with them feeling that you can accept them and what they’re saying, even if they’re feeling difficult or uncomfortable feelings.  Do you think you’ve done that, and that they still haven’t opened up?</s>

Filter: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2339/2339 [00:02<00:00, 1129.02 examples/s]Filter: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2339/2339 [00:02<00:00, 1137.17 examples/s]Filter: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2339/2339 [00:02<00:00, 1095.12 examples/s]
Filter: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2339/2339 [00:02<00:00, 1036.62 examples/s]
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': model_init_kwargs, dataset_text_field, max_seq_length, packing. Will not be supported from version '1.0.0'.

Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
  warnings.warn(message, FutureWarning)
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': model_init_kwargs, dataset_text_field, max_seq_length, packing. Will not be supported from version '1.0.0'.

Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
  warnings.warn(message, FutureWarning)
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': model_init_kwargs, dataset_text_field, max_seq_length, packing. Will not be supported from version '1.0.0'.

Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
  warnings.warn(message, FutureWarning)
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': model_init_kwargs, dataset_text_field, max_seq_length, packing. Will not be supported from version '1.0.0'.

Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
  warnings.warn(message, FutureWarning)
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:158: UserWarning: You passed `model_init_kwargs` to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:185: UserWarning: You passed a model_id to the SFTTrainer. This will automatically create an `AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:158: UserWarning: You passed `model_init_kwargs` to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:158: UserWarning: You passed `model_init_kwargs` to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:185: UserWarning: You passed a model_id to the SFTTrainer. This will automatically create an `AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:185: UserWarning: You passed a model_id to the SFTTrainer. This will automatically create an `AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:158: UserWarning: You passed `model_init_kwargs` to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:185: UserWarning: You passed a model_id to the SFTTrainer. This will automatically create an `AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you.
  warnings.warn(
[INFO|configuration_utils.py:691] 2026-04-18 02:55:50,268 >> loading configuration file /scratch/feng.yulu/dynamic-dpo-v4/base_models/Mistral-7B-v0.3/config.json
[INFO|configuration_utils.py:765] 2026-04-18 02:55:50,278 >> Model config MistralConfig {
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.0",
  "use_cache": false,
  "vocab_size": 32768
}

[INFO|modeling_utils.py:1121] 2026-04-18 02:55:50,994 >> loading weights file /scratch/feng.yulu/dynamic-dpo-v4/base_models/Mistral-7B-v0.3/model.safetensors.index.json
[INFO|modeling_utils.py:2167] 2026-04-18 02:55:50,997 >> Instantiating MistralForCausalLM model under default dtype torch.bfloat16.
[WARNING|logging.py:328] 2026-04-18 02:55:51,000 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[WARNING|logging.py:328] 2026-04-18 02:55:51,001 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[WARNING|logging.py:328] 2026-04-18 02:55:51,001 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[WARNING|logging.py:328] 2026-04-18 02:55:51,001 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[INFO|configuration_utils.py:1142] 2026-04-18 02:55:51,002 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "use_cache": false
}

Loading checkpoint shards:   0%|                                                                                                   | 0/3 [00:00<?, ?it/s]Loading checkpoint shards:   0%|                                                                                                   | 0/3 [00:00<?, ?it/s]Loading checkpoint shards:   0%|                                                                                                   | 0/3 [00:00<?, ?it/s]Loading checkpoint shards:   0%|                                                                                                   | 0/3 [00:00<?, ?it/s]Loading checkpoint shards:  33%|██████████████████████████████▎                                                            | 1/3 [00:00<00:00,  4.88it/s]Loading checkpoint shards:  33%|██████████████████████████████▎                                                            | 1/3 [00:00<00:00,  5.21it/s]Loading checkpoint shards:  33%|██████████████████████████████▎                                                            | 1/3 [00:00<00:00,  5.20it/s]Loading checkpoint shards:  67%|████████████████████████████████████████████████████████████▋                              | 2/3 [00:00<00:00,  4.81it/s]Loading checkpoint shards:  67%|████████████████████████████████████████████████████████████▋                              | 2/3 [00:00<00:00,  4.94it/s]Loading checkpoint shards:  67%|████████████████████████████████████████████████████████████▋                              | 2/3 [00:00<00:00,  4.93it/s]Loading checkpoint shards:  33%|██████████████████████████████▎                                                            | 1/3 [00:00<00:01,  1.99it/s]Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  4.93it/s]Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  4.90it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  5.01it/s]Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  5.01it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  5.00it/s]Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  5.01it/s]
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:195: UserWarning: You passed a `packing` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:283: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:321: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:195: UserWarning: You passed a `packing` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:195: UserWarning: You passed a `packing` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:283: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:283: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:321: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:321: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
Loading checkpoint shards:  67%|████████████████████████████████████████████████████████████▋                              | 2/3 [00:00<00:00,  2.66it/s]Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  3.30it/s]Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  2.98it/s]
[INFO|modeling_utils.py:4926] 2026-04-18 02:55:52,033 >> All model checkpoint weights were used when initializing MistralForCausalLM.

[INFO|modeling_utils.py:4934] 2026-04-18 02:55:52,033 >> All the weights of MistralForCausalLM were initialized from the model checkpoint at /scratch/feng.yulu/dynamic-dpo-v4/base_models/Mistral-7B-v0.3.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training.
[INFO|configuration_utils.py:1095] 2026-04-18 02:55:52,037 >> loading configuration file /scratch/feng.yulu/dynamic-dpo-v4/base_models/Mistral-7B-v0.3/generation_config.json
[INFO|configuration_utils.py:1142] 2026-04-18 02:55:52,037 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}

/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:195: UserWarning: You passed a `packing` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:283: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:321: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
Using custom data configuration default-39901636ead2c4ee
2026-04-18 02:55:52 - INFO - datasets.builder - Using custom data configuration default-39901636ead2c4ee
Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/generator
2026-04-18 02:55:52 - INFO - datasets.info - Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/generator
Generating dataset generator (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-39901636ead2c4ee/0.0.0)
2026-04-18 02:55:52 - INFO - datasets.builder - Generating dataset generator (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-39901636ead2c4ee/0.0.0)
Downloading and preparing dataset generator/default to /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-39901636ead2c4ee/0.0.0...
2026-04-18 02:55:52 - INFO - datasets.builder - Downloading and preparing dataset generator/default to /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-39901636ead2c4ee/0.0.0...
Generating train split
2026-04-18 02:55:52 - INFO - datasets.builder - Generating train split
Generating train split: 0 examples [00:00, ? examples/s]Generating train split: 1 examples [00:00,  1.52 examples/s]Generating train split: 1040 examples [00:01, 871.07 examples/s]Generating train split: 2083 examples [00:02, 1138.68 examples/s]Generating train split: 3126 examples [00:03, 1145.96 examples/s]Generating train split: 4169 examples [00:03, 1245.93 examples/s]Generating train split: 5210 examples [00:04, 1307.57 examples/s]Generating train split: 6251 examples [00:05, 1348.15 examples/s]Generating train split: 7295 examples [00:05, 1371.93 examples/s]Generating train split: 8338 examples [00:06, 1392.60 examples/s]Generating train split: 9383 examples [00:07, 1407.62 examples/s]Generating train split: 10000 examples [00:07, 1591.93 examples/s]Generating train split: 10425 examples [00:08, 1260.48 examples/s]Generating train split: 11468 examples [00:08, 1317.06 examples/s]Generating train split: 12507 examples [00:09, 1356.28 examples/s]Generating train split: 13548 examples [00:10, 1378.81 examples/s]Generating train split: 14590 examples [00:11, 1400.88 examples/s]Generating train split: 15628 examples [00:11, 1407.41 examples/s]Generating train split: 16668 examples [00:12, 1418.87 examples/s]Generating train split: 17713 examples [00:13, 1423.46 examples/s]Generating train split: 18751 examples [00:14, 1349.44 examples/s]Generating train split: 19794 examples [00:14, 1822.94 examples/s]Generating train split: 19827 examples [00:14, 1378.32 examples/s]
Unable to verify splits sizes.
2026-04-18 02:56:06 - INFO - datasets.utils.info_utils - Unable to verify splits sizes.
Dataset generator downloaded and prepared to /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-39901636ead2c4ee/0.0.0. Subsequent calls will reuse this data.
2026-04-18 02:56:06 - INFO - datasets.builder - Dataset generator downloaded and prepared to /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-39901636ead2c4ee/0.0.0. Subsequent calls will reuse this data.
Using custom data configuration default-3ff8dc28238fd37c
2026-04-18 02:56:06 - INFO - datasets.builder - Using custom data configuration default-3ff8dc28238fd37c
Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/generator
2026-04-18 02:56:06 - INFO - datasets.info - Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/generator
Generating dataset generator (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-3ff8dc28238fd37c/0.0.0)
2026-04-18 02:56:06 - INFO - datasets.builder - Generating dataset generator (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-3ff8dc28238fd37c/0.0.0)
Downloading and preparing dataset generator/default to /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-3ff8dc28238fd37c/0.0.0...
2026-04-18 02:56:06 - INFO - datasets.builder - Downloading and preparing dataset generator/default to /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-3ff8dc28238fd37c/0.0.0...
Generating train split
2026-04-18 02:56:06 - INFO - datasets.builder - Generating train split
Generating train split: 0 examples [00:00, ? examples/s]Generating train split: 1 examples [00:00,  1.57 examples/s]Generating train split: 1074 examples [00:00, 1943.90 examples/s]Generating train split: 1074 examples [00:00, 1402.29 examples/s]
Unable to verify splits sizes.
2026-04-18 02:56:07 - INFO - datasets.utils.info_utils - Unable to verify splits sizes.
Dataset generator downloaded and prepared to /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-3ff8dc28238fd37c/0.0.0. Subsequent calls will reuse this data.
2026-04-18 02:56:07 - INFO - datasets.builder - Dataset generator downloaded and prepared to /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-3ff8dc28238fd37c/0.0.0. Subsequent calls will reuse this data.
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:407: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:412: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `SFTTrainer.__init__`. Use `processing_class` instead.
  super().__init__(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:407: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:412: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `SFTTrainer.__init__`. Use `processing_class` instead.
  super().__init__(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:407: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:412: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `SFTTrainer.__init__`. Use `processing_class` instead.
  super().__init__(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:407: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:412: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `SFTTrainer.__init__`. Use `processing_class` instead.
  super().__init__(
[INFO|trainer.py:748] 2026-04-18 02:56:10,778 >> Using auto half precision backend
2026-04-18 02:56:10 - INFO - __main__ - *** Train ***
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/accelerate/accelerator.py:1557: UserWarning: Upcasted low precision parameters in MistralForCausalLM because mixed precision turned on in FSDP. Affects: model.embed_tokens.weight, model.norm.weight, lm_head.weight.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/accelerate/accelerator.py:1557: UserWarning: Upcasted low precision parameters in MistralDecoderLayer because mixed precision turned on in FSDP. Affects: self_attn.q_proj.weight, self_attn.k_proj.weight, self_attn.v_proj.weight, self_attn.o_proj.weight, mlp.gate_proj.weight, mlp.up_proj.weight, mlp.down_proj.weight, input_layernorm.weight, post_attention_layernorm.weight.
  warnings.warn(
/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/accelerate/accelerator.py:1563: UserWarning: FSDP upcast of low precision parameters may affect the precision of model checkpoints.
  warnings.warn(
[INFO|trainer.py:2414] 2026-04-18 02:56:56,684 >> ***** Running training *****
[INFO|trainer.py:2415] 2026-04-18 02:56:56,684 >>   Num examples = 19,827
[INFO|trainer.py:2416] 2026-04-18 02:56:56,684 >>   Num Epochs = 1
[INFO|trainer.py:2417] 2026-04-18 02:56:56,684 >>   Instantaneous batch size per device = 8
[INFO|trainer.py:2420] 2026-04-18 02:56:56,684 >>   Total train batch size (w. parallel, distributed & accumulation) = 64
[INFO|trainer.py:2421] 2026-04-18 02:56:56,684 >>   Gradient Accumulation steps = 2
[INFO|trainer.py:2422] 2026-04-18 02:56:56,684 >>   Total optimization steps = 310
[INFO|trainer.py:2423] 2026-04-18 02:56:56,685 >>   Number of trainable parameters = 1,812,005,888
[INFO|integration_utils.py:831] 2026-04-18 02:56:56,686 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Currently logged in as: can-not-fand (can-not-fand-northeastern-university). Use `wandb login --relogin` to force relogin
wandb: wandb version 0.26.0 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.17.5
wandb: Run data is saved locally in /scratch/feng.yulu/dynamic-dpo-v4/wandb/wandb/run-20260418_025659-jylo12f8
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332
wandb: ⭐️ View project at https://wandb.ai/can-not-fand-northeastern-university/ood-run-4xh200
wandb: 🚀 View run at https://wandb.ai/can-not-fand-northeastern-university/ood-run-4xh200/runs/jylo12f8
  0%|                                                                                                                            | 0/310 [00:00<?, ?it/s]  0%|▎                                                                                                                   | 1/310 [00:02<10:20,  2.01s/it]                                                                                                                                                         {'loss': 1.9, 'grad_norm': 58.05511474609375, 'learning_rate': 0.0, 'epoch': 0.0}
  0%|▎                                                                                                                   | 1/310 [00:02<10:20,  2.01s/it]  1%|▋                                                                                                                   | 2/310 [00:03<07:50,  1.53s/it]  1%|█                                                                                                                   | 3/310 [00:04<07:07,  1.39s/it]  1%|█▍                                                                                                                  | 4/310 [00:05<06:44,  1.32s/it]  2%|█▊                                                                                                                  | 5/310 [00:06<06:31,  1.28s/it]                                                                                                                                                         {'loss': 1.7687, 'grad_norm': 21.915355682373047, 'learning_rate': 2.580645161290323e-06, 'epoch': 0.02}
  2%|█▊                                                                                                                  | 5/310 [00:06<06:31,  1.28s/it]  2%|██▏                                                                                                                 | 6/310 [00:08<06:23,  1.26s/it]  2%|██▌                                                                                                                 | 7/310 [00:09<06:17,  1.25s/it]  3%|██▉                                                                                                                 | 8/310 [00:10<06:13,  1.24s/it]  3%|███▎                                                                                                                | 9/310 [00:11<06:08,  1.22s/it]  3%|███▋                                                                                                               | 10/310 [00:12<06:05,  1.22s/it]                                                                                                                                                         {'loss': 1.5337, 'grad_norm': 8.215951919555664, 'learning_rate': 5.806451612903226e-06, 'epoch': 0.03}
  3%|███▋                                                                                                               | 10/310 [00:12<06:05,  1.22s/it]  4%|████                                                                                                               | 11/310 [00:14<06:02,  1.21s/it]  4%|████▍                                                                                                              | 12/310 [00:15<06:00,  1.21s/it]  4%|████▊                                                                                                              | 13/310 [00:16<06:00,  1.21s/it]  5%|█████▏                                                                                                             | 14/310 [00:17<06:01,  1.22s/it]  5%|█████▌                                                                                                             | 15/310 [00:18<05:58,  1.22s/it]                                                                                                                                                         {'loss': 1.4014, 'grad_norm': 4.512474060058594, 'learning_rate': 9.03225806451613e-06, 'epoch': 0.05}
  5%|█████▌                                                                                                             | 15/310 [00:19<05:58,  1.22s/it]  5%|█████▉                                                                                                             | 16/310 [00:20<05:56,  1.21s/it]  5%|██████▎                                                                                                            | 17/310 [00:21<05:55,  1.21s/it]  6%|██████▋                                                                                                            | 18/310 [00:22<05:52,  1.21s/it]  6%|███████                                                                                                            | 19/310 [00:23<06:07,  1.26s/it]  6%|███████▍                                                                                                           | 20/310 [00:25<06:01,  1.25s/it]                                                                                                                                                         {'loss': 1.3709, 'grad_norm': 5.174832820892334, 'learning_rate': 1.2258064516129034e-05, 'epoch': 0.06}
  6%|███████▍                                                                                                           | 20/310 [00:25<06:01,  1.25s/it]  7%|███████▊                                                                                                           | 21/310 [00:26<05:56,  1.23s/it]  7%|████████▏                                                                                                          | 22/310 [00:27<05:52,  1.22s/it]  7%|████████▌                                                                                                          | 23/310 [00:28<05:49,  1.22s/it]  8%|████████▉                                                                                                          | 24/310 [00:29<05:47,  1.21s/it]  8%|█████████▎                                                                                                         | 25/310 [00:31<05:44,  1.21s/it]                                                                                                                                                         {'loss': 1.3652, 'grad_norm': 4.159119129180908, 'learning_rate': 1.5483870967741936e-05, 'epoch': 0.08}
  8%|█████████▎                                                                                                         | 25/310 [00:31<05:44,  1.21s/it]  8%|█████████▋                                                                                                         | 26/310 [00:32<05:42,  1.21s/it]  9%|██████████                                                                                                         | 27/310 [00:33<05:41,  1.21s/it]  9%|██████████▍                                                                                                        | 28/310 [00:34<05:39,  1.20s/it]  9%|██████████▊                                                                                                        | 29/310 [00:36<05:38,  1.20s/it] 10%|███████████▏                                                                                                       | 30/310 [00:37<05:36,  1.20s/it]                                                                                                                                                         {'loss': 1.3885, 'grad_norm': 4.116876602172852, 'learning_rate': 1.870967741935484e-05, 'epoch': 0.1}
 10%|███████████▏                                                                                                       | 30/310 [00:37<05:36,  1.20s/it] 10%|███████████▌                                                                                                       | 31/310 [00:38<05:35,  1.20s/it] 10%|███████████▊                                                                                                       | 32/310 [00:39<05:34,  1.20s/it] 11%|████████████▏                                                                                                      | 33/310 [00:40<05:33,  1.20s/it] 11%|████████████▌                                                                                                      | 34/310 [00:42<05:31,  1.20s/it] 11%|████████████▉                                                                                                      | 35/310 [00:43<05:30,  1.20s/it]                                                                                                                                                         {'loss': 1.3982, 'grad_norm': 3.840414047241211, 'learning_rate': 1.999429490929718e-05, 'epoch': 0.11}
 11%|████████████▉                                                                                                      | 35/310 [00:43<05:30,  1.20s/it] 12%|█████████████▎                                                                                                     | 36/310 [00:44<05:43,  1.25s/it] 12%|█████████████▋                                                                                                     | 37/310 [00:45<05:37,  1.24s/it] 12%|██████████████                                                                                                     | 38/310 [00:46<05:34,  1.23s/it] 13%|██████████████▍                                                                                                    | 39/310 [00:48<05:44,  1.27s/it] 13%|██████████████▊                                                                                                    | 40/310 [00:49<05:37,  1.25s/it]                                                                                                                                                         {'loss': 1.3744, 'grad_norm': 4.314554691314697, 'learning_rate': 1.9959454037227215e-05, 'epoch': 0.13}
 13%|██████████████▊                                                                                                    | 40/310 [00:49<05:37,  1.25s/it] 13%|███████████████▏                                                                                                   | 41/310 [00:50<05:33,  1.24s/it] 14%|███████████████▌                                                                                                   | 42/310 [00:51<05:29,  1.23s/it] 14%|███████████████▉                                                                                                   | 43/310 [00:53<05:26,  1.22s/it] 14%|████████████████▎                                                                                                  | 44/310 [00:54<05:23,  1.22s/it] 15%|████████████████▋                                                                                                  | 45/310 [00:55<05:21,  1.21s/it]                                                                                                                                                         {'loss': 1.394, 'grad_norm': 4.36101770401001, 'learning_rate': 1.989305206325792e-05, 'epoch': 0.15}
 15%|████████████████▋                                                                                                  | 45/310 [00:55<05:21,  1.21s/it] 15%|█████████████████                                                                                                  | 46/310 [00:56<05:19,  1.21s/it] 15%|█████████████████▍                                                                                                 | 47/310 [00:58<05:18,  1.21s/it] 15%|█████████████████▊                                                                                                 | 48/310 [00:59<05:16,  1.21s/it] 16%|██████████████████▏                                                                                                | 49/310 [01:00<05:15,  1.21s/it] 16%|██████████████████▌                                                                                                | 50/310 [01:01<05:13,  1.21s/it]                                                                                                                                                         {'loss': 1.3652, 'grad_norm': 3.858530282974243, 'learning_rate': 1.9795299412524948e-05, 'epoch': 0.16}
 16%|██████████████████▌                                                                                                | 50/310 [01:01<05:13,  1.21s/it] 16%|██████████████████▉                                                                                                | 51/310 [01:02<05:23,  1.25s/it] 17%|███████████████████▎                                                                                               | 52/310 [01:04<05:18,  1.24s/it] 17%|███████████████████▋                                                                                               | 53/310 [01:05<05:14,  1.22s/it] 17%|████████████████████                                                                                               | 54/310 [01:06<05:11,  1.22s/it] 18%|████████████████████▍                                                                                              | 55/310 [01:07<05:08,  1.21s/it]                                                                                                                                                         {'loss': 1.3825, 'grad_norm': 4.053957462310791, 'learning_rate': 1.9666505859174462e-05, 'epoch': 0.18}
 18%|████████████████████▍                                                                                              | 55/310 [01:07<05:08,  1.21s/it] 18%|████████████████████▊                                                                                              | 56/310 [01:08<05:06,  1.21s/it] 18%|█████████████████████▏                                                                                             | 57/310 [01:10<05:04,  1.21s/it] 19%|█████████████████████▌                                                                                             | 58/310 [01:11<05:03,  1.20s/it] 19%|█████████████████████▉                                                                                             | 59/310 [01:12<05:02,  1.20s/it] 19%|██████████████████████▎                                                                                            | 60/310 [01:13<05:01,  1.20s/it]                                                                                                                                                         {'loss': 1.3449, 'grad_norm': 3.8405709266662598, 'learning_rate': 1.9507079544701583e-05, 'epoch': 0.19}
 19%|██████████████████████▎                                                                                            | 60/310 [01:13<05:01,  1.20s/it] 20%|██████████████████████▋                                                                                            | 61/310 [01:14<04:59,  1.20s/it] 20%|███████████████████████                                                                                            | 62/310 [01:16<04:58,  1.20s/it] 20%|███████████████████████▎                                                                                           | 63/310 [01:17<04:57,  1.20s/it] 21%|███████████████████████▋                                                                                           | 64/310 [01:18<05:08,  1.26s/it] 21%|████████████████████████                                                                                           | 65/310 [01:19<05:03,  1.24s/it]                                                                                                                                                         {'loss': 1.3268, 'grad_norm': 3.8573291301727295, 'learning_rate': 1.9317525684566686e-05, 'epoch': 0.21}
 21%|████████████████████████                                                                                           | 65/310 [01:20<05:03,  1.24s/it] 21%|████████████████████████▍                                                                                          | 66/310 [01:21<04:59,  1.23s/it] 22%|████████████████████████▊                                                                                          | 67/310 [01:22<04:56,  1.22s/it] 22%|█████████████████████████▏                                                                                         | 68/310 [01:23<04:53,  1.21s/it] 22%|█████████████████████████▌                                                                                         | 69/310 [01:24<04:51,  1.21s/it] 23%|█████████████████████████▉                                                                                         | 70/310 [01:26<05:00,  1.25s/it]                                                                                                                                                         {'loss': 1.3096, 'grad_norm': 4.115865230560303, 'learning_rate': 1.9098444967188308e-05, 'epoch': 0.23}
 23%|█████████████████████████▉                                                                                         | 70/310 [01:26<05:00,  1.25s/it] 23%|██████████████████████████▎                                                                                        | 71/310 [01:27<04:55,  1.24s/it] 23%|██████████████████████████▋                                                                                        | 72/310 [01:28<04:52,  1.23s/it] 24%|███████████████████████████                                                                                        | 73/310 [01:29<04:48,  1.22s/it] 24%|███████████████████████████▍                                                                                       | 74/310 [01:30<04:47,  1.22s/it] 24%|███████████████████████████▊                                                                                       | 75/310 [01:32<04:44,  1.21s/it]                                                                                                                                                         {'loss': 1.3243, 'grad_norm': 3.9826207160949707, 'learning_rate': 1.8850531650386154e-05, 'epoch': 0.24}
 24%|███████████████████████████▊                                                                                       | 75/310 [01:32<04:44,  1.21s/it] 25%|████████████████████████████▏                                                                                      | 76/310 [01:33<04:53,  1.25s/it] 25%|████████████████████████████▌                                                                                      | 77/310 [01:34<04:48,  1.24s/it] 25%|████████████████████████████▉                                                                                      | 78/310 [01:35<04:44,  1.23s/it] 25%|█████████████████████████████▎                                                                                     | 79/310 [01:37<04:41,  1.22s/it] 26%|█████████████████████████████▋                                                                                     | 80/310 [01:38<04:38,  1.21s/it]                                                                                                                                                         {'loss': 1.309, 'grad_norm': 8.022098541259766, 'learning_rate': 1.857457136130651e-05, 'epoch': 0.26}
 26%|█████████████████████████████▋                                                                                     | 80/310 [01:38<04:38,  1.21s/it] 26%|██████████████████████████████                                                                                     | 81/310 [01:39<04:37,  1.21s/it] 26%|██████████████████████████████▍                                                                                    | 82/310 [01:40<04:35,  1.21s/it] 27%|██████████████████████████████▊                                                                                    | 83/310 [01:41<04:34,  1.21s/it] 27%|███████████████████████████████▏                                                                                   | 84/310 [01:43<04:32,  1.21s/it] 27%|███████████████████████████████▌                                                                                   | 85/310 [01:44<04:31,  1.20s/it]                                                                                                                                                         {'loss': 1.278, 'grad_norm': 3.8160018920898438, 'learning_rate': 1.827143860680199e-05, 'epoch': 0.27}
 27%|███████████████████████████████▌                                                                                   | 85/310 [01:44<04:31,  1.20s/it] 28%|███████████████████████████████▉                                                                                   | 86/310 [01:45<04:29,  1.20s/it] 28%|████████████████████████████████▎                                                                                  | 87/310 [01:46<04:40,  1.26s/it] 28%|████████████████████████████████▋                                                                                  | 88/310 [01:48<04:35,  1.24s/it] 29%|█████████████████████████████████                                                                                  | 89/310 [01:49<04:31,  1.23s/it] 29%|█████████████████████████████████▍                                                                                 | 90/310 [01:50<04:28,  1.22s/it]                                                                                                                                                         {'loss': 1.2788, 'grad_norm': 4.026176452636719, 'learning_rate': 1.7942094002155122e-05, 'epoch': 0.29}
 29%|█████████████████████████████████▍                                                                                 | 90/310 [01:50<04:28,  1.22s/it] 29%|█████████████████████████████████▊                                                                                 | 91/310 [01:51<04:26,  1.22s/it] 30%|██████████████████████████████████▏                                                                                | 92/310 [01:52<04:23,  1.21s/it] 30%|██████████████████████████████████▌                                                                                | 93/310 [01:54<04:22,  1.21s/it] 30%|██████████████████████████████████▊                                                                                | 94/310 [01:55<04:19,  1.20s/it] 31%|███████████████████████████████████▏                                                                               | 95/310 [01:56<04:30,  1.26s/it]                                                                                                                                                         {'loss': 1.2412, 'grad_norm': 4.071281909942627, 'learning_rate': 1.758758122692791e-05, 'epoch': 0.31}
 31%|███████████████████████████████████▏                                                                               | 95/310 [01:56<04:30,  1.26s/it] 31%|███████████████████████████████████▌                                                                               | 96/310 [01:57<04:25,  1.24s/it] 31%|███████████████████████████████████▉                                                                               | 97/310 [01:59<04:32,  1.28s/it] 32%|████████████████████████████████████▎                                                                              | 98/310 [02:00<04:26,  1.26s/it] 32%|████████████████████████████████████▋                                                                              | 99/310 [02:01<04:21,  1.24s/it] 32%|████████████████████████████████████▊                                                                             | 100/310 [02:02<04:17,  1.23s/it]                                                                                                                                                         {'loss': 1.1916, 'grad_norm': 3.6679961681365967, 'learning_rate': 1.7209023717584013e-05, 'epoch': 0.32}
 32%|████████████████████████████████████▊                                                                             | 100/310 [02:02<04:17,  1.23s/it][INFO|trainer.py:4307] 2026-04-18 02:59:08,316 >> 
***** Running Evaluation *****
[INFO|trainer.py:4309] 2026-04-18 02:59:08,316 >>   Num examples = 1074
[INFO|trainer.py:4312] 2026-04-18 02:59:08,316 >>   Batch size = 8

  0%|                                                                                                                             | 0/34 [00:00<?, ?it/s][A
  6%|██████▉                                                                                                              | 2/34 [00:00<00:02, 12.51it/s][A
 12%|█████████████▊                                                                                                       | 4/34 [00:00<00:03,  7.94it/s][A
 15%|█████████████████▏                                                                                                   | 5/34 [00:00<00:03,  7.34it/s][A
 18%|████████████████████▋                                                                                                | 6/34 [00:00<00:04,  6.93it/s][A
 21%|████████████████████████                                                                                             | 7/34 [00:00<00:04,  6.69it/s][A
 24%|███████████████████████████▌                                                                                         | 8/34 [00:01<00:03,  6.59it/s][A
 26%|██████████████████████████████▉                                                                                      | 9/34 [00:01<00:03,  6.48it/s][A
 29%|██████████████████████████████████                                                                                  | 10/34 [00:01<00:03,  6.46it/s][A
 32%|█████████████████████████████████████▌                                                                              | 11/34 [00:01<00:03,  6.37it/s][A
 35%|████████████████████████████████████████▉                                                                           | 12/34 [00:01<00:03,  6.35it/s][A
 38%|████████████████████████████████████████████▎                                                                       | 13/34 [00:01<00:03,  6.28it/s][A
 41%|███████████████████████████████████████████████▊                                                                    | 14/34 [00:02<00:03,  6.27it/s][A
 44%|███████████████████████████████████████████████████▏                                                                | 15/34 [00:02<00:03,  6.25it/s][A
 47%|██████████████████████████████████████████████████████▌                                                             | 16/34 [00:02<00:02,  6.22it/s][A
 50%|██████████████████████████████████████████████████████████                                                          | 17/34 [00:02<00:02,  6.24it/s][A
 53%|█████████████████████████████████████████████████████████████▍                                                      | 18/34 [00:02<00:02,  6.19it/s][A
 56%|████████████████████████████████████████████████████████████████▊                                                   | 19/34 [00:02<00:02,  6.19it/s][A
 59%|████████████████████████████████████████████████████████████████████▏                                               | 20/34 [00:03<00:02,  6.18it/s][A
 62%|███████████████████████████████████████████████████████████████████████▋                                            | 21/34 [00:03<00:02,  6.20it/s][A
 65%|███████████████████████████████████████████████████████████████████████████                                         | 22/34 [00:03<00:01,  6.22it/s][A
 68%|██████████████████████████████████████████████████████████████████████████████▍                                     | 23/34 [00:03<00:01,  6.23it/s][A
 71%|█████████████████████████████████████████████████████████████████████████████████▉                                  | 24/34 [00:03<00:01,  6.25it/s][A
 74%|█████████████████████████████████████████████████████████████████████████████████████▎                              | 25/34 [00:03<00:01,  6.28it/s][A
 76%|████████████████████████████████████████████████████████████████████████████████████████▋                           | 26/34 [00:04<00:01,  6.28it/s][A
 79%|████████████████████████████████████████████████████████████████████████████████████████████                        | 27/34 [00:04<00:01,  6.28it/s][A
 82%|███████████████████████████████████████████████████████████████████████████████████████████████▌                    | 28/34 [00:04<00:00,  6.29it/s][A
 85%|██████████████████████████████████████████████████████████████████████████████████████████████████▉                 | 29/34 [00:04<00:00,  6.29it/s][A
 88%|██████████████████████████████████████████████████████████████████████████████████████████████████████▎             | 30/34 [00:04<00:00,  6.28it/s][A
 91%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▊          | 31/34 [00:04<00:00,  6.24it/s][A
 94%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▏      | 32/34 [00:04<00:00,  6.23it/s][A
 97%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌   | 33/34 [00:05<00:00,  6.23it/s][A
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:05<00:00,  6.28it/s][A                                                                                                                                                         
                                                                                                                                                         [A{'eval_loss': 1.2297844886779785, 'eval_runtime': 5.4527, 'eval_samples_per_second': 196.965, 'eval_steps_per_second': 6.235, 'epoch': 0.32}
 32%|████████████████████████████████████▊                                                                             | 100/310 [02:08<04:17,  1.23s/it]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:05<00:00,  6.28it/s][A
                                                                                                                                                         [A 33%|█████████████████████████████████████▏                                                                            | 101/310 [02:09<09:57,  2.86s/it] 33%|█████████████████████████████████████▌                                                                            | 102/310 [02:10<08:11,  2.36s/it] 33%|█████████████████████████████████████▉                                                                            | 103/310 [02:11<06:56,  2.01s/it] 34%|██████████████████████████████████████▏                                                                           | 104/310 [02:13<06:04,  1.77s/it] 34%|██████████████████████████████████████▌                                                                           | 105/310 [02:14<05:27,  1.60s/it]                                                                                                                                                         {'loss': 1.2004, 'grad_norm': 3.7028727531433105, 'learning_rate': 1.6807621107364613e-05, 'epoch': 0.34}
 34%|██████████████████████████████████████▌                                                                           | 105/310 [02:14<05:27,  1.60s/it] 34%|██████████████████████████████████████▉                                                                           | 106/310 [02:15<05:11,  1.53s/it] 35%|███████████████████████████████████████▎                                                                          | 107/310 [02:16<04:50,  1.43s/it] 35%|███████████████████████████████████████▋                                                                          | 108/310 [02:18<04:35,  1.36s/it] 35%|████████████████████████████████████████                                                                          | 109/310 [02:19<04:24,  1.32s/it] 35%|████████████████████████████████████████▍                                                                         | 110/310 [02:20<04:16,  1.28s/it]                                                                                                                                                         {'loss': 1.1933, 'grad_norm': 3.7159862518310547, 'learning_rate': 1.6384645424699835e-05, 'epoch': 0.35}
 35%|████████████████████████████████████████▍                                                                         | 110/310 [02:20<04:16,  1.28s/it] 36%|████████████████████████████████████████▊                                                                         | 111/310 [02:21<04:10,  1.26s/it] 36%|█████████████████████████████████████████▏                                                                        | 112/310 [02:22<04:05,  1.24s/it] 36%|█████████████████████████████████████████▌                                                                        | 113/310 [02:24<04:01,  1.23s/it] 37%|█████████████████████████████████████████▉                                                                        | 114/310 [02:25<03:58,  1.22s/it] 37%|██████████████████████████████████████████▎                                                                       | 115/310 [02:26<03:56,  1.21s/it]                                                                                                                                                         {'loss': 1.1913, 'grad_norm': 3.5471580028533936, 'learning_rate': 1.594143706220273e-05, 'epoch': 0.37}
 37%|██████████████████████████████████████████▎                                                                       | 115/310 [02:26<03:56,  1.21s/it] 37%|██████████████████████████████████████████▋                                                                       | 116/310 [02:27<04:05,  1.26s/it] 38%|███████████████████████████████████████████                                                                       | 117/310 [02:29<03:59,  1.24s/it] 38%|███████████████████████████████████████████▍                                                                      | 118/310 [02:30<04:03,  1.27s/it] 38%|███████████████████████████████████████████▊                                                                      | 119/310 [02:31<03:57,  1.24s/it] 39%|████████████████████████████████████████████▏                                                                     | 120/310 [02:32<03:53,  1.23s/it]                                                                                                                                                         {'loss': 1.1872, 'grad_norm': 4.138647079467773, 'learning_rate': 1.5479400529019987e-05, 'epoch': 0.39}
 39%|████████████████████████████████████████████▏                                                                     | 120/310 [02:32<03:53,  1.23s/it] 39%|████████████████████████████████████████████▍                                                                     | 121/310 [02:33<03:50,  1.22s/it] 39%|████████████████████████████████████████████▊                                                                     | 122/310 [02:35<03:48,  1.21s/it] 40%|█████████████████████████████████████████████▏                                                                    | 123/310 [02:36<03:46,  1.21s/it] 40%|█████████████████████████████████████████████▌                                                                    | 124/310 [02:37<03:44,  1.21s/it] 40%|█████████████████████████████████████████████▉                                                                    | 125/310 [02:38<03:53,  1.26s/it]                                                                                                                                                         {'loss': 1.1359, 'grad_norm': 3.7750704288482666, 'learning_rate': 1.5000000000000002e-05, 'epoch': 0.4}
 40%|█████████████████████████████████████████████▉                                                                    | 125/310 [02:38<03:53,  1.26s/it] 41%|██████████████████████████████████████████████▎                                                                   | 126/310 [02:40<03:48,  1.24s/it] 41%|██████████████████████████████████████████████▋                                                                   | 127/310 [02:41<03:44,  1.23s/it] 41%|███████████████████████████████████████████████                                                                   | 128/310 [02:42<03:41,  1.22s/it] 42%|███████████████████████████████████████████████▍                                                                  | 129/310 [02:43<03:39,  1.21s/it] 42%|███████████████████████████████████████████████▊                                                                  | 130/310 [02:44<03:37,  1.21s/it]                                                                                                                                                         {'loss': 1.1518, 'grad_norm': 3.429034948348999, 'learning_rate': 1.4504754675782731e-05, 'epoch': 0.42}
 42%|███████████████████████████████████████████████▊                                                                  | 130/310 [02:44<03:37,  1.21s/it] 42%|████████████████████████████████████████████████▏                                                                 | 131/310 [02:46<03:35,  1.20s/it] 43%|████████████████████████████████████████████████▌                                                                 | 132/310 [02:47<03:33,  1.20s/it] 43%|████████████████████████████████████████████████▉                                                                 | 133/310 [02:48<03:32,  1.20s/it] 43%|█████████████████████████████████████████████████▎                                                                | 134/310 [02:49<03:40,  1.25s/it] 44%|█████████████████████████████████████████████████▋                                                                | 135/310 [02:51<03:35,  1.23s/it]                                                                                                                                                         {'loss': 1.1405, 'grad_norm': 3.416344404220581, 'learning_rate': 1.3995233968515105e-05, 'epoch': 0.44}
 44%|█████████████████████████████████████████████████▋                                                                | 135/310 [02:51<03:35,  1.23s/it] 44%|██████████████████████████████████████████████████                                                                | 136/310 [02:52<03:32,  1.22s/it] 44%|██████████████████████████████████████████████████▍                                                               | 137/310 [02:53<03:30,  1.21s/it] 45%|██████████████████████████████████████████████████▋                                                               | 138/310 [02:54<03:36,  1.26s/it] 45%|███████████████████████████████████████████████████                                                               | 139/310 [02:56<03:31,  1.24s/it] 45%|███████████████████████████████████████████████████▍                                                              | 140/310 [02:57<03:28,  1.23s/it]                                                                                                                                                         {'loss': 1.1137, 'grad_norm': 4.399573802947998, 'learning_rate': 1.3473052528448203e-05, 'epoch': 0.45}
 45%|███████████████████████████████████████████████████▍                                                              | 140/310 [02:57<03:28,  1.23s/it] 45%|███████████████████████████████████████████████████▊                                                              | 141/310 [02:58<03:25,  1.22s/it] 46%|████████████████████████████████████████████████████▏                                                             | 142/310 [02:59<03:23,  1.21s/it] 46%|████████████████████████████████████████████████████▌                                                             | 143/310 [03:00<03:30,  1.26s/it] 46%|████████████████████████████████████████████████████▉                                                             | 144/310 [03:02<03:25,  1.24s/it] 47%|█████████████████████████████████████████████████████▎                                                            | 145/310 [03:03<03:22,  1.23s/it]                                                                                                                                                         {'loss': 1.1235, 'grad_norm': 3.7525525093078613, 'learning_rate': 1.2939865127176771e-05, 'epoch': 0.47}
 47%|█████████████████████████████████████████████████████▎                                                            | 145/310 [03:03<03:22,  1.23s/it] 47%|█████████████████████████████████████████████████████▋                                                            | 146/310 [03:04<03:19,  1.22s/it] 47%|██████████████████████████████████████████████████████                                                            | 147/310 [03:05<03:17,  1.21s/it] 48%|██████████████████████████████████████████████████████▍                                                           | 148/310 [03:06<03:15,  1.21s/it] 48%|██████████████████████████████████████████████████████▊                                                           | 149/310 [03:08<03:14,  1.21s/it] 48%|███████████████████████████████████████████████████████▏                                                          | 150/310 [03:09<03:12,  1.20s/it]                                                                                                                                                         {'loss': 1.0858, 'grad_norm': 3.8501410484313965, 'learning_rate': 1.2397361413735785e-05, 'epoch': 0.48}
 48%|███████████████████████████████████████████████████████▏                                                          | 150/310 [03:09<03:12,  1.20s/it] 49%|███████████████████████████████████████████████████████▌                                                          | 151/310 [03:10<03:11,  1.20s/it] 49%|███████████████████████████████████████████████████████▉                                                          | 152/310 [03:11<03:17,  1.25s/it] 49%|████████████████████████████████████████████████████████▎                                                         | 153/310 [03:13<03:13,  1.23s/it] 50%|████████████████████████████████████████████████████████▋                                                         | 154/310 [03:14<03:10,  1.22s/it] 50%|█████████████████████████████████████████████████████████                                                         | 155/310 [03:15<03:14,  1.25s/it]                                                                                                                                                         {'loss': 1.0692, 'grad_norm': 3.5585899353027344, 'learning_rate': 1.1847260560171895e-05, 'epoch': 0.5}
 50%|█████████████████████████████████████████████████████████                                                         | 155/310 [03:15<03:14,  1.25s/it] 50%|█████████████████████████████████████████████████████████▎                                                        | 156/310 [03:16<03:10,  1.24s/it] 51%|█████████████████████████████████████████████████████████▋                                                        | 157/310 [03:18<03:07,  1.22s/it] 51%|██████████████████████████████████████████████████████████                                                        | 158/310 [03:19<03:04,  1.22s/it] 51%|██████████████████████████████████████████████████████████▍                                                       | 159/310 [03:20<03:02,  1.21s/it] 52%|██████████████████████████████████████████████████████████▊                                                       | 160/310 [03:21<03:00,  1.20s/it]                                                                                                                                                         {'loss': 1.0313, 'grad_norm': 3.620709180831909, 'learning_rate': 1.1291305813557616e-05, 'epoch': 0.52}
 52%|██████████████████████████████████████████████████████████▊                                                       | 160/310 [03:21<03:00,  1.20s/it] 52%|███████████████████████████████████████████████████████████▏                                                      | 161/310 [03:23<03:06,  1.25s/it] 52%|███████████████████████████████████████████████████████████▌                                                      | 162/310 [03:24<03:03,  1.24s/it] 53%|███████████████████████████████████████████████████████████▉                                                      | 163/310 [03:25<02:59,  1.22s/it] 53%|████████████████████████████████████████████████████████████▎                                                     | 164/310 [03:26<02:57,  1.22s/it] 53%|████████████████████████████████████████████████████████████▋                                                     | 165/310 [03:27<02:55,  1.21s/it]                                                                                                                                                         {'loss': 1.0268, 'grad_norm': 3.7223265171051025, 'learning_rate': 1.0731258971712762e-05, 'epoch': 0.53}
 53%|████████████████████████████████████████████████████████████▋                                                     | 165/310 [03:27<02:55,  1.21s/it] 54%|█████████████████████████████████████████████████████████████                                                     | 166/310 [03:28<02:53,  1.21s/it] 54%|█████████████████████████████████████████████████████████████▍                                                    | 167/310 [03:30<02:52,  1.20s/it] 54%|█████████████████████████████████████████████████████████████▊                                                    | 168/310 [03:31<02:50,  1.20s/it] 55%|██████████████████████████████████████████████████████████████▏                                                   | 169/310 [03:32<02:49,  1.20s/it] 55%|██████████████████████████████████████████████████████████████▌                                                   | 170/310 [03:34<03:01,  1.30s/it]                                                                                                                                                         {'loss': 1.0123, 'grad_norm': 3.67724871635437, 'learning_rate': 1.0168894800139311e-05, 'epoch': 0.55}
 55%|██████████████████████████████████████████████████████████████▌                                                   | 170/310 [03:34<03:01,  1.30s/it] 55%|██████████████████████████████████████████████████████████████▉                                                   | 171/310 [03:35<02:56,  1.27s/it] 55%|███████████████████████████████████████████████████████████████▎                                                  | 172/310 [03:36<02:51,  1.25s/it] 56%|███████████████████████████████████████████████████████████████▌                                                  | 173/310 [03:37<02:48,  1.23s/it] 56%|███████████████████████████████████████████████████████████████▉                                                  | 174/310 [03:38<02:45,  1.22s/it] 56%|████████████████████████████████████████████████████████████████▎                                                 | 175/310 [03:40<02:43,  1.21s/it]                                                                                                                                                         {'loss': 0.9905, 'grad_norm': 4.000744342803955, 'learning_rate': 9.605995407862248e-06, 'epoch': 0.56}
 56%|████████████████████████████████████████████████████████████████▎                                                 | 175/310 [03:40<02:43,  1.21s/it] 57%|████████████████████████████████████████████████████████████████▋                                                 | 176/310 [03:41<02:42,  1.21s/it] 57%|█████████████████████████████████████████████████████████████████                                                 | 177/310 [03:42<02:40,  1.21s/it] 57%|█████████████████████████████████████████████████████████████████▍                                                | 178/310 [03:43<02:38,  1.20s/it] 58%|█████████████████████████████████████████████████████████████████▊                                                | 179/310 [03:45<02:43,  1.25s/it] 58%|██████████████████████████████████████████████████████████████████▏                                               | 180/310 [03:46<02:40,  1.23s/it]                                                                                                                                                         {'loss': 0.9682, 'grad_norm': 3.4585959911346436, 'learning_rate': 9.04434459999902e-06, 'epoch': 0.58}
 58%|██████████████████████████████████████████████████████████████████▏                                               | 180/310 [03:46<02:40,  1.23s/it] 58%|██████████████████████████████████████████████████████████████████▌                                               | 181/310 [03:47<02:37,  1.22s/it] 59%|██████████████████████████████████████████████████████████████████▉                                               | 182/310 [03:48<02:35,  1.22s/it] 59%|███████████████████████████████████████████████████████████████████▎                                              | 183/310 [03:49<02:39,  1.25s/it] 59%|███████████████████████████████████████████████████████████████████▋                                              | 184/310 [03:51<02:35,  1.24s/it] 60%|████████████████████████████████████████████████████████████████████                                              | 185/310 [03:52<02:32,  1.22s/it]                                                                                                                                                         {'loss': 0.9364, 'grad_norm': 3.573272466659546, 'learning_rate': 8.485722224954237e-06, 'epoch': 0.6}
 60%|████████████████████████████████████████████████████████████████████                                              | 185/310 [03:52<02:32,  1.22s/it] 60%|████████████████████████████████████████████████████████████████████▍                                             | 186/310 [03:53<02:30,  1.21s/it] 60%|████████████████████████████████████████████████████████████████████▊                                             | 187/310 [03:54<02:28,  1.21s/it] 61%|█████████████████████████████████████████████████████████████████████▏                                            | 188/310 [03:56<02:33,  1.26s/it] 61%|█████████████████████████████████████████████████████████████████████▌                                            | 189/310 [03:57<02:29,  1.24s/it] 61%|█████████████████████████████████████████████████████████████████████▊                                            | 190/310 [03:58<02:26,  1.22s/it]                                                                                                                                                         {'loss': 0.9559, 'grad_norm': 3.517512083053589, 'learning_rate': 7.93189853415293e-06, 'epoch': 0.61}
 61%|█████████████████████████████████████████████████████████████████████▊                                            | 190/310 [03:58<02:26,  1.22s/it] 62%|██████████████████████████████████████████████████████████████████████▏                                           | 191/310 [03:59<02:24,  1.22s/it] 62%|██████████████████████████████████████████████████████████████████████▌                                           | 192/310 [04:00<02:22,  1.21s/it] 62%|██████████████████████████████████████████████████████████████████████▉                                           | 193/310 [04:02<02:21,  1.21s/it] 63%|███████████████████████████████████████████████████████████████████████▎                                          | 194/310 [04:03<02:19,  1.20s/it] 63%|███████████████████████████████████████████████████████████████████████▋                                          | 195/310 [04:04<02:23,  1.24s/it]                                                                                                                                                         {'loss': 0.9376, 'grad_norm': 3.629652500152588, 'learning_rate': 7.384628572186334e-06, 'epoch': 0.63}
 63%|███████████████████████████████████████████████████████████████████████▋                                          | 195/310 [04:04<02:23,  1.24s/it] 63%|████████████████████████████████████████████████████████████████████████                                          | 196/310 [04:05<02:20,  1.23s/it] 64%|████████████████████████████████████████████████████████████████████████▍                                         | 197/310 [04:07<02:23,  1.27s/it] 64%|████████████████████████████████████████████████████████████████████████▊                                         | 198/310 [04:08<02:19,  1.25s/it] 64%|█████████████████████████████████████████████████████████████████████████▏                                        | 199/310 [04:09<02:16,  1.23s/it] 65%|█████████████████████████████████████████████████████████████████████████▌                                        | 200/310 [04:10<02:14,  1.22s/it]                                                                                                                                                         {'loss': 0.9281, 'grad_norm': 3.6871094703674316, 'learning_rate': 6.845646615147445e-06, 'epoch': 0.65}
 65%|█████████████████████████████████████████████████████████████████████████▌                                        | 200/310 [04:10<02:14,  1.22s/it][INFO|trainer.py:4307] 2026-04-18 03:01:16,244 >> 
***** Running Evaluation *****
[INFO|trainer.py:4309] 2026-04-18 03:01:16,245 >>   Num examples = 1074
[INFO|trainer.py:4312] 2026-04-18 03:01:16,245 >>   Batch size = 8

  0%|                                                                                                                             | 0/34 [00:00<?, ?it/s][A
  6%|██████▉                                                                                                              | 2/34 [00:00<00:02, 12.72it/s][A
 12%|█████████████▊                                                                                                       | 4/34 [00:00<00:03,  8.03it/s][A
 15%|█████████████████▏                                                                                                   | 5/34 [00:00<00:03,  7.44it/s][A
 18%|████████████████████▋                                                                                                | 6/34 [00:00<00:04,  6.97it/s][A
 21%|████████████████████████                                                                                             | 7/34 [00:00<00:04,  6.70it/s][A
 24%|███████████████████████████▌                                                                                         | 8/34 [00:01<00:03,  6.63it/s][A
 26%|██████████████████████████████▉                                                                                      | 9/34 [00:01<00:03,  6.53it/s][A
 29%|██████████████████████████████████                                                                                  | 10/34 [00:01<00:03,  6.49it/s][A
 32%|█████████████████████████████████████▌                                                                              | 11/34 [00:01<00:03,  6.45it/s][A
 35%|████████████████████████████████████████▉                                                                           | 12/34 [00:01<00:03,  6.40it/s][A
 38%|████████████████████████████████████████████▎                                                                       | 13/34 [00:01<00:03,  6.34it/s][A
 41%|███████████████████████████████████████████████▊                                                                    | 14/34 [00:02<00:03,  6.35it/s][A
 44%|███████████████████████████████████████████████████▏                                                                | 15/34 [00:02<00:02,  6.35it/s][A
 47%|██████████████████████████████████████████████████████▌                                                             | 16/34 [00:02<00:02,  6.35it/s][A
 50%|██████████████████████████████████████████████████████████                                                          | 17/34 [00:02<00:02,  6.41it/s][A
 53%|█████████████████████████████████████████████████████████████▍                                                      | 18/34 [00:02<00:02,  6.39it/s][A
 56%|████████████████████████████████████████████████████████████████▊                                                   | 19/34 [00:02<00:02,  6.29it/s][A
 59%|████████████████████████████████████████████████████████████████████▏                                               | 20/34 [00:03<00:02,  6.30it/s][A
 62%|███████████████████████████████████████████████████████████████████████▋                                            | 21/34 [00:03<00:02,  6.29it/s][A
 65%|███████████████████████████████████████████████████████████████████████████                                         | 22/34 [00:03<00:01,  6.36it/s][A
 68%|██████████████████████████████████████████████████████████████████████████████▍                                     | 23/34 [00:03<00:01,  6.40it/s][A
 71%|█████████████████████████████████████████████████████████████████████████████████▉                                  | 24/34 [00:03<00:01,  6.40it/s][A
 74%|█████████████████████████████████████████████████████████████████████████████████████▎                              | 25/34 [00:03<00:01,  6.32it/s][A
 76%|████████████████████████████████████████████████████████████████████████████████████████▋                           | 26/34 [00:03<00:01,  6.31it/s][A
 79%|████████████████████████████████████████████████████████████████████████████████████████████                        | 27/34 [00:04<00:01,  6.25it/s][A
 82%|███████████████████████████████████████████████████████████████████████████████████████████████▌                    | 28/34 [00:04<00:00,  6.24it/s][A
 85%|██████████████████████████████████████████████████████████████████████████████████████████████████▉                 | 29/34 [00:04<00:00,  6.30it/s][A
 88%|██████████████████████████████████████████████████████████████████████████████████████████████████████▎             | 30/34 [00:04<00:00,  6.31it/s][A
 91%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▊          | 31/34 [00:04<00:00,  6.29it/s][A
 94%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▏      | 32/34 [00:04<00:00,  6.26it/s][A
 97%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌   | 33/34 [00:05<00:00,  6.29it/s][A
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:05<00:00,  6.26it/s][A                                                                                                                                                         
                                                                                                                                                         [A{'eval_loss': 0.9366754293441772, 'eval_runtime': 5.3925, 'eval_samples_per_second': 199.164, 'eval_steps_per_second': 6.305, 'epoch': 0.65}
 65%|█████████████████████████████████████████████████████████████████████████▌                                        | 200/310 [04:16<02:14,  1.22s/it]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:05<00:00,  6.26it/s][A
                                                                                                                                                         [A[INFO|trainer.py:3984] 2026-04-18 03:01:46,271 >> Saving model checkpoint to /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/checkpoint-200
[INFO|configuration_utils.py:419] 2026-04-18 03:01:46,283 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/checkpoint-200/config.json
[INFO|configuration_utils.py:911] 2026-04-18 03:01:46,290 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/checkpoint-200/generation_config.json
[INFO|modeling_utils.py:3580] 2026-04-18 03:02:58,504 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 6 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/checkpoint-200/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2510] 2026-04-18 03:02:58,527 >> tokenizer config file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/checkpoint-200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2519] 2026-04-18 03:02:58,543 >> Special tokens file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/checkpoint-200/special_tokens_map.json
 65%|████████████████████████████████████████████████████████████████████████▌                                       | 201/310 [09:07<2:43:26, 89.97s/it] 65%|████████████████████████████████████████████████████████████████████████▉                                       | 202/310 [09:08<1:53:59, 63.33s/it] 65%|█████████████████████████████████████████████████████████████████████████▎                                      | 203/310 [09:10<1:19:41, 44.69s/it] 66%|███████████████████████████████████████████████████████████████████████████                                       | 204/310 [09:11<55:53, 31.64s/it] 66%|███████████████████████████████████████████████████████████████████████████▍                                      | 205/310 [09:12<39:29, 22.57s/it]                                                                                                                                                         {'loss': 0.9359, 'grad_norm': 3.7159454822540283, 'learning_rate': 6.31666067478113e-06, 'epoch': 0.66}
 66%|███████████████████████████████████████████████████████████████████████████▍                                      | 205/310 [09:12<39:29, 22.57s/it] 66%|███████████████████████████████████████████████████████████████████████████▊                                      | 206/310 [09:13<28:00, 16.16s/it] 67%|████████████████████████████████████████████████████████████████████████████                                      | 207/310 [09:15<20:01, 11.67s/it] 67%|████████████████████████████████████████████████████████████████████████████▍                                     | 208/310 [09:16<14:38,  8.61s/it] 67%|████████████████████████████████████████████████████████████████████████████▊                                     | 209/310 [09:17<10:44,  6.39s/it] 68%|█████████████████████████████████████████████████████████████████████████████▏                                    | 210/310 [09:19<08:02,  4.83s/it]                                                                                                                                                         {'loss': 0.8788, 'grad_norm': 3.5741474628448486, 'learning_rate': 5.799347085864851e-06, 'epoch': 0.68}
 68%|█████████████████████████████████████████████████████████████████████████████▏                                    | 210/310 [09:19<08:02,  4.83s/it] 68%|█████████████████████████████████████████████████████████████████████████████▌                                    | 211/310 [09:20<06:10,  3.74s/it] 68%|█████████████████████████████████████████████████████████████████████████████▉                                    | 212/310 [09:21<04:51,  2.97s/it] 69%|██████████████████████████████████████████████████████████████████████████████▎                                   | 213/310 [09:22<03:56,  2.44s/it] 69%|██████████████████████████████████████████████████████████████████████████████▋                                   | 214/310 [09:23<03:18,  2.06s/it] 69%|███████████████████████████████████████████████████████████████████████████████                                   | 215/310 [09:25<02:55,  1.85s/it]                                                                                                                                                         {'loss': 0.8994, 'grad_norm': 3.8699448108673096, 'learning_rate': 5.295345193972445e-06, 'epoch': 0.69}
 69%|███████████████████████████████████████████████████████████████████████████████                                   | 215/310 [09:25<02:55,  1.85s/it] 70%|███████████████████████████████████████████████████████████████████████████████▍                                  | 216/310 [09:26<02:35,  1.65s/it] 70%|███████████████████████████████████████████████████████████████████████████████▊                                  | 217/310 [09:27<02:20,  1.51s/it] 70%|████████████████████████████████████████████████████████████████████████████████▏                                 | 218/310 [09:28<02:10,  1.42s/it] 71%|████████████████████████████████████████████████████████████████████████████████▌                                 | 219/310 [09:29<02:03,  1.35s/it] 71%|████████████████████████████████████████████████████████████████████████████████▉                                 | 220/310 [09:31<01:57,  1.31s/it]                                                                                                                                                         {'loss': 0.8894, 'grad_norm': 3.681175470352173, 'learning_rate': 4.8062521604551245e-06, 'epoch': 0.71}
 71%|████████████████████████████████████████████████████████████████████████████████▉                                 | 220/310 [09:31<01:57,  1.31s/it] 71%|█████████████████████████████████████████████████████████████████████████████████▎                                | 221/310 [09:32<01:53,  1.27s/it] 72%|█████████████████████████████████████████████████████████████████████████████████▋                                | 222/310 [09:33<01:50,  1.25s/it] 72%|██████████████████████████████████████████████████████████████████████████████████                                | 223/310 [09:34<01:51,  1.29s/it] 72%|██████████████████████████████████████████████████████████████████████████████████▎                               | 224/310 [09:36<01:52,  1.30s/it] 73%|██████████████████████████████████████████████████████████████████████████████████▋                               | 225/310 [09:37<01:48,  1.27s/it]                                                                                                                                                         {'loss': 0.8478, 'grad_norm': 4.077960968017578, 'learning_rate': 4.333617901102592e-06, 'epoch': 0.73}
 73%|██████████████████████████████████████████████████████████████████████████████████▋                               | 225/310 [09:37<01:48,  1.27s/it] 73%|███████████████████████████████████████████████████████████████████████████████████                               | 226/310 [09:38<01:44,  1.25s/it] 73%|███████████████████████████████████████████████████████████████████████████████████▍                              | 227/310 [09:39<01:42,  1.24s/it] 74%|███████████████████████████████████████████████████████████████████████████████████▊                              | 228/310 [09:41<01:40,  1.22s/it] 74%|████████████████████████████████████████████████████████████████████████████████████▏                             | 229/310 [09:42<01:38,  1.22s/it] 74%|████████████████████████████████████████████████████████████████████████████████████▌                             | 230/310 [09:43<01:37,  1.21s/it]                                                                                                                                                         {'loss': 0.8409, 'grad_norm': 4.131643772125244, 'learning_rate': 3.878940174523371e-06, 'epoch': 0.74}
 74%|████████████████████████████████████████████████████████████████████████████████████▌                             | 230/310 [09:43<01:37,  1.21s/it] 75%|████████████████████████████████████████████████████████████████████████████████████▉                             | 231/310 [09:44<01:35,  1.21s/it] 75%|█████████████████████████████████████████████████████████████████████████████████████▎                            | 232/310 [09:45<01:34,  1.21s/it] 75%|█████████████████████████████████████████████████████████████████████████████████████▋                            | 233/310 [09:47<01:36,  1.25s/it] 75%|██████████████████████████████████████████████████████████████████████████████████████                            | 234/310 [09:48<01:33,  1.24s/it] 76%|██████████████████████████████████████████████████████████████████████████████████████▍                           | 235/310 [09:49<01:31,  1.23s/it]                                                                                                                                                         {'loss': 0.878, 'grad_norm': 3.835108995437622, 'learning_rate': 3.4436598358091577e-06, 'epoch': 0.76}
 76%|██████████████████████████████████████████████████████████████████████████████████████▍                           | 235/310 [09:49<01:31,  1.23s/it] 76%|██████████████████████████████████████████████████████████████████████████████████████▊                           | 236/310 [09:50<01:34,  1.28s/it] 76%|███████████████████████████████████████████████████████████████████████████████████████▏                          | 237/310 [09:52<01:31,  1.25s/it] 77%|███████████████████████████████████████████████████████████████████████████████████████▌                          | 238/310 [09:53<01:28,  1.24s/it] 77%|███████████████████████████████████████████████████████████████████████████████████████▉                          | 239/310 [09:54<01:26,  1.22s/it] 77%|████████████████████████████████████████████████████████████████████████████████████████▎                         | 240/310 [09:55<01:25,  1.22s/it]                                                                                                                                                         {'loss': 0.8321, 'grad_norm': 3.9890787601470947, 'learning_rate': 3.0291562705240107e-06, 'epoch': 0.77}
 77%|████████████████████████████████████████████████████████████████████████████████████████▎                         | 240/310 [09:55<01:25,  1.22s/it] 78%|████████████████████████████████████████████████████████████████████████████████████████▋                         | 241/310 [09:56<01:23,  1.21s/it] 78%|████████████████████████████████████████████████████████████████████████████████████████▉                         | 242/310 [09:58<01:25,  1.26s/it] 78%|█████████████████████████████████████████████████████████████████████████████████████████▎                        | 243/310 [09:59<01:23,  1.24s/it] 79%|█████████████████████████████████████████████████████████████████████████████████████████▋                        | 244/310 [10:00<01:21,  1.23s/it] 79%|██████████████████████████████████████████████████████████████████████████████████████████                        | 245/310 [10:01<01:19,  1.22s/it]                                                                                                                                                         {'loss': 0.8269, 'grad_norm': 3.728030204772949, 'learning_rate': 2.6367430234880286e-06, 'epoch': 0.79}
 79%|██████████████████████████████████████████████████████████████████████████████████████████                        | 245/310 [10:01<01:19,  1.22s/it] 79%|██████████████████████████████████████████████████████████████████████████████████████████▍                       | 246/310 [10:03<01:17,  1.22s/it] 80%|██████████████████████████████████████████████████████████████████████████████████████████▊                       | 247/310 [10:04<01:16,  1.21s/it] 80%|███████████████████████████████████████████████████████████████████████████████████████████▏                      | 248/310 [10:05<01:17,  1.25s/it] 80%|███████████████████████████████████████████████████████████████████████████████████████████▌                      | 249/310 [10:06<01:15,  1.23s/it] 81%|███████████████████████████████████████████████████████████████████████████████████████████▉                      | 250/310 [10:08<01:13,  1.23s/it]                                                                                                                                                         {'loss': 0.7872, 'grad_norm': 3.8121144771575928, 'learning_rate': 2.2676636362076075e-06, 'epoch': 0.81}
 81%|███████████████████████████████████████████████████████████████████████████████████████████▉                      | 250/310 [10:08<01:13,  1.23s/it] 81%|████████████████████████████████████████████████████████████████████████████████████████████▎                     | 251/310 [10:09<01:14,  1.27s/it] 81%|████████████████████████████████████████████████████████████████████████████████████████████▋                     | 252/310 [10:10<01:12,  1.25s/it] 82%|█████████████████████████████████████████████████████████████████████████████████████████████                     | 253/310 [10:11<01:10,  1.24s/it] 82%|█████████████████████████████████████████████████████████████████████████████████████████████▍                    | 254/310 [10:13<01:08,  1.23s/it] 82%|█████████████████████████████████████████████████████████████████████████████████████████████▊                    | 255/310 [10:14<01:07,  1.22s/it]                                                                                                                                                         {'loss': 0.8187, 'grad_norm': 3.7709388732910156, 'learning_rate': 1.9230877061433505e-06, 'epoch': 0.82}
 82%|█████████████████████████████████████████████████████████████████████████████████████████████▊                    | 255/310 [10:14<01:07,  1.22s/it] 83%|██████████████████████████████████████████████████████████████████████████████████████████████▏                   | 256/310 [10:15<01:05,  1.22s/it] 83%|██████████████████████████████████████████████████████████████████████████████████████████████▌                   | 257/310 [10:16<01:04,  1.21s/it] 83%|██████████████████████████████████████████████████████████████████████████████████████████████▉                   | 258/310 [10:17<01:02,  1.21s/it] 84%|███████████████████████████████████████████████████████████████████████████████████████████████▏                  | 259/310 [10:19<01:04,  1.26s/it] 84%|███████████████████████████████████████████████████████████████████████████████████████████████▌                  | 260/310 [10:20<01:04,  1.29s/it]                                                                                                                                                         {'loss': 0.8195, 'grad_norm': 3.6049516201019287, 'learning_rate': 1.60410718030361e-06, 'epoch': 0.84}
 84%|███████████████████████████████████████████████████████████████████████████████████████████████▌                  | 260/310 [10:20<01:04,  1.29s/it] 84%|███████████████████████████████████████████████████████████████████████████████████████████████▉                  | 261/310 [10:21<01:02,  1.27s/it] 85%|████████████████████████████████████████████████████████████████████████████████████████████████▎                 | 262/310 [10:23<00:59,  1.25s/it] 85%|████████████████████████████████████████████████████████████████████████████████████████████████▋                 | 263/310 [10:24<00:58,  1.24s/it] 85%|█████████████████████████████████████████████████████████████████████████████████████████████████                 | 264/310 [10:25<00:56,  1.23s/it] 85%|█████████████████████████████████████████████████████████████████████████████████████████████████▍                | 265/310 [10:26<00:55,  1.22s/it]                                                                                                                                                         {'loss': 0.8091, 'grad_norm': 3.6554863452911377, 'learning_rate': 1.3117328949091634e-06, 'epoch': 0.85}
 85%|█████████████████████████████████████████████████████████████████████████████████████████████████▍                | 265/310 [10:26<00:55,  1.22s/it] 86%|█████████████████████████████████████████████████████████████████████████████████████████████████▊                | 266/310 [10:27<00:53,  1.22s/it] 86%|██████████████████████████████████████████████████████████████████████████████████████████████████▏               | 267/310 [10:29<00:52,  1.22s/it] 86%|██████████████████████████████████████████████████████████████████████████████████████████████████▌               | 268/310 [10:30<00:50,  1.21s/it] 87%|██████████████████████████████████████████████████████████████████████████████████████████████████▉               | 269/310 [10:31<00:53,  1.30s/it] 87%|███████████████████████████████████████████████████████████████████████████████████████████████████▎              | 270/310 [10:33<00:50,  1.27s/it]                                                                                                                                                         {'loss': 0.7855, 'grad_norm': 3.8024368286132812, 'learning_rate': 1.0468913720946084e-06, 'epoch': 0.87}
 87%|███████████████████████████████████████████████████████████████████████████████████████████████████▎              | 270/310 [10:33<00:50,  1.27s/it] 87%|███████████████████████████████████████████████████████████████████████████████████████████████████▋              | 271/310 [10:34<00:48,  1.25s/it] 88%|████████████████████████████████████████████████████████████████████████████████████████████████████              | 272/310 [10:35<00:46,  1.23s/it] 88%|████████████████████████████████████████████████████████████████████████████████████████████████████▍             | 273/310 [10:36<00:45,  1.23s/it] 88%|████████████████████████████████████████████████████████████████████████████████████████████████████▊             | 274/310 [10:37<00:43,  1.22s/it] 89%|█████████████████████████████████████████████████████████████████████████████████████████████████████▏            | 275/310 [10:39<00:42,  1.21s/it]                                                                                                                                                         {'loss': 0.7959, 'grad_norm': 3.79064679145813, 'learning_rate': 8.10421883797694e-07, 'epoch': 0.89}
 89%|█████████████████████████████████████████████████████████████████████████████████████████████████████▏            | 275/310 [10:39<00:42,  1.21s/it] 89%|█████████████████████████████████████████████████████████████████████████████████████████████████████▍            | 276/310 [10:40<00:41,  1.21s/it] 89%|█████████████████████████████████████████████████████████████████████████████████████████████████████▊            | 277/310 [10:41<00:39,  1.21s/it] 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████▏           | 278/310 [10:42<00:41,  1.29s/it] 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████▌           | 279/310 [10:44<00:39,  1.26s/it] 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████▉           | 280/310 [10:45<00:37,  1.25s/it]                                                                                                                                                         {'loss': 0.7824, 'grad_norm': 3.7366783618927, 'learning_rate': 6.030737921409169e-07, 'epoch': 0.9}
 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████▉           | 280/310 [10:45<00:37,  1.25s/it] 91%|███████████████████████████████████████████████████████████████████████████████████████████████████████▎          | 281/310 [10:46<00:35,  1.23s/it] 91%|███████████████████████████████████████████████████████████████████████████████████████████████████████▋          | 282/310 [10:47<00:34,  1.22s/it] 91%|████████████████████████████████████████████████████████████████████████████████████████████████████████          | 283/310 [10:48<00:32,  1.22s/it] 92%|████████████████████████████████████████████████████████████████████████████████████████████████████████▍         | 284/310 [10:50<00:31,  1.21s/it] 92%|████████████████████████████████████████████████████████████████████████████████████████████████████████▊         | 285/310 [10:51<00:30,  1.21s/it]                                                                                                                                                         {'loss': 0.7653, 'grad_norm': 3.792200803756714, 'learning_rate': 4.2550417473364524e-07, 'epoch': 0.92}
 92%|████████████████████████████████████████████████████████████████████████████████████████████████████████▊         | 285/310 [10:51<00:30,  1.21s/it] 92%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▏        | 286/310 [10:52<00:28,  1.21s/it] 93%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▌        | 287/310 [10:54<00:29,  1.30s/it] 93%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▉        | 288/310 [10:55<00:27,  1.27s/it] 93%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▎       | 289/310 [10:56<00:26,  1.25s/it] 94%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▋       | 290/310 [10:57<00:24,  1.23s/it]                                                                                                                                                         {'loss': 0.759, 'grad_norm': 3.850405216217041, 'learning_rate': 2.7827574242009434e-07, 'epoch': 0.94}
 94%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▋       | 290/310 [10:57<00:24,  1.23s/it] 94%|███████████████████████████████████████████████████████████████████████████████████████████████████████████       | 291/310 [10:58<00:23,  1.22s/it] 94%|███████████████████████████████████████████████████████████████████████████████████████████████████████████▍      | 292/310 [11:00<00:21,  1.22s/it] 95%|███████████████████████████████████████████████████████████████████████████████████████████████████████████▋      | 293/310 [11:01<00:20,  1.21s/it] 95%|████████████████████████████████████████████████████████████████████████████████████████████████████████████      | 294/310 [11:02<00:19,  1.21s/it] 95%|████████████████████████████████████████████████████████████████████████████████████████████████████████████▍     | 295/310 [11:03<00:18,  1.21s/it]                                                                                                                                                         {'loss': 0.7866, 'grad_norm': 3.8457884788513184, 'learning_rate': 1.6185505607171027e-07, 'epoch': 0.95}
 95%|████████████████████████████████████████████████████████████████████████████████████████████████████████████▍     | 295/310 [11:03<00:18,  1.21s/it] 95%|████████████████████████████████████████████████████████████████████████████████████████████████████████████▊     | 296/310 [11:05<00:18,  1.29s/it] 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▏    | 297/310 [11:06<00:16,  1.27s/it] 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▌    | 298/310 [11:07<00:14,  1.25s/it] 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▉    | 299/310 [11:08<00:13,  1.24s/it] 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████▎   | 300/310 [11:09<00:12,  1.23s/it]                                                                                                                                                         {'loss': 0.7683, 'grad_norm': 3.824265241622925, 'learning_rate': 7.661104807487607e-08, 'epoch': 0.97}
 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████▎   | 300/310 [11:09<00:12,  1.23s/it][INFO|trainer.py:4307] 2026-04-18 03:08:15,413 >> 
***** Running Evaluation *****
[INFO|trainer.py:4309] 2026-04-18 03:08:15,413 >>   Num examples = 1074
[INFO|trainer.py:4312] 2026-04-18 03:08:15,413 >>   Batch size = 8

  0%|                                                                                                                             | 0/34 [00:00<?, ?it/s][A
  6%|██████▉                                                                                                              | 2/34 [00:00<00:02, 12.77it/s][A
 12%|█████████████▊                                                                                                       | 4/34 [00:00<00:03,  7.98it/s][A
 15%|█████████████████▏                                                                                                   | 5/34 [00:00<00:03,  7.36it/s][A
 18%|████████████████████▋                                                                                                | 6/34 [00:00<00:04,  6.99it/s][A
 21%|████████████████████████                                                                                             | 7/34 [00:00<00:04,  6.72it/s][A
 24%|███████████████████████████▌                                                                                         | 8/34 [00:01<00:03,  6.56it/s][A
 26%|██████████████████████████████▉                                                                                      | 9/34 [00:01<00:03,  6.49it/s][A
 29%|██████████████████████████████████                                                                                  | 10/34 [00:01<00:03,  6.41it/s][A
 32%|█████████████████████████████████████▌                                                                              | 11/34 [00:01<00:03,  6.35it/s][A
 35%|████████████████████████████████████████▉                                                                           | 12/34 [00:01<00:03,  6.30it/s][A
 38%|████████████████████████████████████████████▎                                                                       | 13/34 [00:01<00:03,  6.28it/s][A
 41%|███████████████████████████████████████████████▊                                                                    | 14/34 [00:02<00:03,  6.30it/s][A
 44%|███████████████████████████████████████████████████▏                                                                | 15/34 [00:02<00:03,  6.30it/s][A
 47%|██████████████████████████████████████████████████████▌                                                             | 16/34 [00:02<00:02,  6.28it/s][A
 50%|██████████████████████████████████████████████████████████                                                          | 17/34 [00:02<00:02,  6.28it/s][A
 53%|█████████████████████████████████████████████████████████████▍                                                      | 18/34 [00:02<00:02,  6.19it/s][A
 56%|████████████████████████████████████████████████████████████████▊                                                   | 19/34 [00:02<00:02,  6.15it/s][A
 59%|████████████████████████████████████████████████████████████████████▏                                               | 20/34 [00:03<00:02,  6.19it/s][A
 62%|███████████████████████████████████████████████████████████████████████▋                                            | 21/34 [00:03<00:02,  6.23it/s][A
 65%|███████████████████████████████████████████████████████████████████████████                                         | 22/34 [00:03<00:01,  6.20it/s][A
 68%|██████████████████████████████████████████████████████████████████████████████▍                                     | 23/34 [00:03<00:01,  6.19it/s][A
 71%|█████████████████████████████████████████████████████████████████████████████████▉                                  | 24/34 [00:03<00:01,  6.18it/s][A
 74%|█████████████████████████████████████████████████████████████████████████████████████▎                              | 25/34 [00:03<00:01,  6.21it/s][A
 76%|████████████████████████████████████████████████████████████████████████████████████████▋                           | 26/34 [00:04<00:01,  6.19it/s][A
 79%|████████████████████████████████████████████████████████████████████████████████████████████                        | 27/34 [00:04<00:01,  6.20it/s][A
 82%|███████████████████████████████████████████████████████████████████████████████████████████████▌                    | 28/34 [00:04<00:00,  6.25it/s][A
 85%|██████████████████████████████████████████████████████████████████████████████████████████████████▉                 | 29/34 [00:04<00:00,  6.26it/s][A
 88%|██████████████████████████████████████████████████████████████████████████████████████████████████████▎             | 30/34 [00:04<00:00,  6.25it/s][A
 91%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▊          | 31/34 [00:04<00:00,  6.21it/s][A
 94%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▏      | 32/34 [00:04<00:00,  6.21it/s][A
 97%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌   | 33/34 [00:05<00:00,  6.22it/s][A
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:05<00:00,  6.30it/s][A                                                                                                                                                         
                                                                                                                                                         [A{'eval_loss': 0.7902069091796875, 'eval_runtime': 5.4529, 'eval_samples_per_second': 196.96, 'eval_steps_per_second': 6.235, 'epoch': 0.97}
 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████▎   | 300/310 [11:15<00:12,  1.23s/it]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:05<00:00,  6.30it/s][A
                                                                                                                                                         [A 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████▋   | 301/310 [11:16<00:25,  2.86s/it] 97%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████   | 302/310 [11:17<00:18,  2.36s/it] 98%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████▍  | 303/310 [11:19<00:14,  2.01s/it] 98%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████▊  | 304/310 [11:20<00:10,  1.77s/it] 98%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 305/310 [11:21<00:08,  1.69s/it]                                                                                                                                                         {'loss': 0.778, 'grad_norm': 3.8102200031280518, 'learning_rate': 2.2813853199292745e-08, 'epoch': 0.98}
 98%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 305/310 [11:21<00:08,  1.69s/it] 99%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 306/310 [11:22<00:06,  1.54s/it] 99%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 307/310 [11:24<00:04,  1.44s/it] 99%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎| 308/310 [11:25<00:02,  1.37s/it]100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋| 309/310 [11:26<00:01,  1.32s/it]100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 310/310 [11:27<00:00,  1.28s/it]                                                                                                                                                         {'loss': 0.7865, 'grad_norm': 3.923637628555298, 'learning_rate': 6.339525519594159e-10, 'epoch': 1.0}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 310/310 [11:27<00:00,  1.28s/it][INFO|trainer.py:3984] 2026-04-18 03:08:48,538 >> Saving model checkpoint to /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/checkpoint-310
[INFO|configuration_utils.py:419] 2026-04-18 03:08:48,549 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/checkpoint-310/config.json
[INFO|configuration_utils.py:911] 2026-04-18 03:08:48,558 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/checkpoint-310/generation_config.json
[INFO|modeling_utils.py:3580] 2026-04-18 03:09:43,551 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 6 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/checkpoint-310/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2510] 2026-04-18 03:09:43,576 >> tokenizer config file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/checkpoint-310/tokenizer_config.json
[INFO|tokenization_utils_base.py:2519] 2026-04-18 03:09:43,586 >> Special tokens file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/checkpoint-310/special_tokens_map.json
[INFO|trainer.py:2681] 2026-04-18 03:13:11,518 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


                                                                                                                                                         {'train_runtime': 974.833, 'train_samples_per_second': 20.339, 'train_steps_per_second': 0.318, 'train_loss': 1.0778046692571333, 'epoch': 1.0}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 310/310 [16:06<00:00,  1.28s/it]100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 310/310 [16:06<00:00,  3.12s/it]
***** train metrics *****
  epoch                    =         1.0
  total_flos               = 100949701GF
  train_loss               =      1.0778
  train_runtime            =  0:16:14.83
  train_samples            =       43598
  train_samples_per_second =      20.339
  train_steps_per_second   =       0.318
2026-04-18 03:13:11 - INFO - __main__ - *** Save model ***
[INFO|configuration_utils.py:419] 2026-04-18 03:13:27,201 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/config.json
[INFO|configuration_utils.py:911] 2026-04-18 03:13:27,208 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/generation_config.json
[INFO|modeling_utils.py:3580] 2026-04-18 03:14:20,249 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 6 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2510] 2026-04-18 03:14:20,266 >> tokenizer config file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/tokenizer_config.json
[INFO|tokenization_utils_base.py:2519] 2026-04-18 03:14:20,275 >> Special tokens file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/special_tokens_map.json
2026-04-18 03:14:20 - INFO - __main__ - Saved HF-compatible model artifacts to /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332
2026-04-18 03:14:20 - INFO - __main__ - Saved validated HF-compatible model artifacts to /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332
[INFO|modelcard.py:450] 2026-04-18 03:14:20,406 >> Dropping the following result as it does not have all the necessary fields:
{'dataset': {'name': 'Anthropic/hh-rlhf', 'type': 'Anthropic/hh-rlhf', 'config': 'default', 'split': 'train', 'args': 'default'}}
[INFO|configuration_utils.py:419] 2026-04-18 03:14:20,419 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332/config.json
2026-04-18 03:14:20 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:4307] 2026-04-18 03:14:20,420 >> 
***** Running Evaluation *****
[INFO|trainer.py:4309] 2026-04-18 03:14:20,420 >>   Num examples = 1074
[INFO|trainer.py:4312] 2026-04-18 03:14:20,421 >>   Batch size = 8
  0%|                                                                                                                             | 0/34 [00:00<?, ?it/s]  6%|██████▉                                                                                                              | 2/34 [00:00<00:02, 13.01it/s] 12%|█████████████▊                                                                                                       | 4/34 [00:00<00:03,  8.17it/s] 15%|█████████████████▏                                                                                                   | 5/34 [00:00<00:03,  7.51it/s] 18%|████████████████████▋                                                                                                | 6/34 [00:00<00:03,  7.02it/s] 21%|████████████████████████                                                                                             | 7/34 [00:00<00:03,  6.81it/s] 24%|███████████████████████████▌                                                                                         | 8/34 [00:01<00:03,  6.64it/s] 26%|██████████████████████████████▉                                                                                      | 9/34 [00:01<00:03,  6.54it/s] 29%|██████████████████████████████████                                                                                  | 10/34 [00:01<00:03,  6.50it/s] 32%|█████████████████████████████████████▌                                                                              | 11/34 [00:01<00:03,  6.47it/s] 35%|████████████████████████████████████████▉                                                                           | 12/34 [00:01<00:03,  6.36it/s] 38%|████████████████████████████████████████████▎                                                                       | 13/34 [00:01<00:03,  6.38it/s] 41%|███████████████████████████████████████████████▊                                                                    | 14/34 [00:02<00:03,  6.36it/s] 44%|███████████████████████████████████████████████████▏                                                                | 15/34 [00:02<00:02,  6.36it/s] 47%|██████████████████████████████████████████████████████▌                                                             | 16/34 [00:02<00:02,  6.39it/s] 50%|██████████████████████████████████████████████████████████                                                          | 17/34 [00:02<00:02,  6.38it/s] 53%|█████████████████████████████████████████████████████████████▍                                                      | 18/34 [00:02<00:02,  6.31it/s] 56%|████████████████████████████████████████████████████████████████▊                                                   | 19/34 [00:02<00:02,  6.25it/s] 59%|████████████████████████████████████████████████████████████████████▏                                               | 20/34 [00:03<00:02,  6.26it/s] 62%|███████████████████████████████████████████████████████████████████████▋                                            | 21/34 [00:03<00:02,  6.26it/s] 65%|███████████████████████████████████████████████████████████████████████████                                         | 22/34 [00:03<00:01,  6.28it/s] 68%|██████████████████████████████████████████████████████████████████████████████▍                                     | 23/34 [00:03<00:01,  6.29it/s] 71%|█████████████████████████████████████████████████████████████████████████████████▉                                  | 24/34 [00:03<00:01,  6.28it/s] 74%|█████████████████████████████████████████████████████████████████████████████████████▎                              | 25/34 [00:03<00:01,  6.28it/s] 76%|████████████████████████████████████████████████████████████████████████████████████████▋                           | 26/34 [00:03<00:01,  6.24it/s] 79%|████████████████████████████████████████████████████████████████████████████████████████████                        | 27/34 [00:04<00:01,  6.21it/s] 82%|███████████████████████████████████████████████████████████████████████████████████████████████▌                    | 28/34 [00:04<00:00,  6.21it/s] 85%|██████████████████████████████████████████████████████████████████████████████████████████████████▉                 | 29/34 [00:04<00:00,  6.26it/s] 88%|██████████████████████████████████████████████████████████████████████████████████████████████████████▎             | 30/34 [00:04<00:00,  6.32it/s] 91%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▊          | 31/34 [00:04<00:00,  6.33it/s] 94%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▏      | 32/34 [00:04<00:00,  6.33it/s] 97%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌   | 33/34 [00:05<00:00,  6.31it/s]100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:05<00:00,  6.32it/s]100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:05<00:00,  6.49it/s]
***** eval metrics *****
  epoch                   =        1.0
  eval_loss               =     0.7899
  eval_runtime            = 0:00:05.38
  eval_samples            =       2339
  eval_samples_per_second =    199.326
  eval_steps_per_second   =       6.31
2026-04-18 03:14:25 - INFO - __main__ - *** Training complete ***
wandb: - 0.014 MB of 0.014 MB uploadedwandb: \ 0.014 MB of 0.014 MB uploadedwandb: | 0.014 MB of 0.014 MB uploadedwandb: / 0.014 MB of 0.014 MB uploadedwandb: - 0.014 MB of 0.014 MB uploadedwandb: \ 0.043 MB of 0.085 MB uploadedwandb: | 0.058 MB of 0.086 MB uploadedwandb: / 0.058 MB of 0.086 MB uploadedwandb: - 0.058 MB of 0.086 MB uploadedwandb: \ 0.058 MB of 0.086 MB uploadedwandb: | 0.058 MB of 0.086 MB uploadedwandb: / 0.058 MB of 0.086 MB uploadedwandb: - 0.058 MB of 0.086 MB uploadedwandb: \ 0.058 MB of 0.086 MB uploadedwandb: | 0.058 MB of 0.086 MB uploadedwandb: / 0.058 MB of 0.086 MB uploadedwandb: - 0.058 MB of 0.086 MB uploadedwandb: \ 0.058 MB of 0.086 MB uploadedwandb: | 0.058 MB of 0.086 MB uploadedwandb: / 0.058 MB of 0.086 MB uploadedwandb: - 0.058 MB of 0.086 MB uploadedwandb: \ 0.058 MB of 0.086 MB uploadedwandb: | 0.058 MB of 0.086 MB uploadedwandb: / 0.084 MB of 0.086 MB uploadedwandb: - 0.084 MB of 0.086 MB uploadedwandb: \ 0.084 MB of 0.086 MB uploadedwandb: | 0.084 MB of 0.086 MB uploadedwandb: / 0.086 MB of 0.086 MB uploadedwandb: 
wandb: Run history:
wandb:               eval/loss █▃▁▁
wandb:            eval/runtime █▁█▁
wandb: eval/samples_per_second ▁█▁█
wandb:   eval/steps_per_second ▁█▁█
wandb:             train/epoch ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇█████
wandb:       train/global_step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇█████
wandb:         train/grad_norm █▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:     train/learning_rate ▁▂▄▅███████▇▇▇▇▇▆▆▆▅▅▅▅▄▄▄▃▃▃▂▂▂▂▂▁▁▁▁▁▁
wandb:              train/loss █▇▅▅▅▅▅▅▅▄▄▄▄▄▄▄▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁
wandb: 
wandb: Run summary:
wandb:                eval/loss 0.7899
wandb:             eval/runtime 5.3882
wandb:  eval/samples_per_second 199.326
wandb:    eval/steps_per_second 6.31
wandb:               total_flos 1.0839391709062758e+17
wandb:              train/epoch 1.0
wandb:        train/global_step 310
wandb:          train/grad_norm 3.92364
wandb:      train/learning_rate 0.0
wandb:               train/loss 0.7865
wandb:               train_loss 1.0778
wandb:            train_runtime 974.833
wandb: train_samples_per_second 20.339
wandb:   train_steps_per_second 0.318
wandb: 
wandb: 🚀 View run mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332 at: https://wandb.ai/can-not-fand-northeastern-university/ood-run-4xh200/runs/jylo12f8
wandb: ⭐️ View project at: https://wandb.ai/can-not-fand-northeastern-university/ood-run-4xh200
wandb: Synced 6 W&B file(s), 0 media file(s), 2 artifact file(s) and 0 other file(s)
wandb: Find logs at: /scratch/feng.yulu/dynamic-dpo-v4/wandb/wandb/run-20260418_025659-jylo12f8/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.