2026-04-16 16:21:35 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1 distributed training: True, 16-bits training: False 2026-04-16 16:21:35 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1 distributed training: True, 16-bits training: False 2026-04-16 16:21:35 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1 distributed training: True, 16-bits training: False 2026-04-16 16:21:35 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1 distributed training: True, 16-bits training: False 2026-04-16 16:21:35 - INFO - __main__ - Model parameters ModelArguments(base_model_revision=None, model_name_or_path='/scratch/feng.yulu/dynamic-dpo-v4/base_models/Meta-Llama-3-8B', model_revision='main', model_code_revision=None, torch_dtype='bfloat16', tokenizer_name_or_path=None, trust_remote_code=False, attn_implementation='flash_attention_2', use_peft=False, lora_r=16, lora_alpha=32, lora_dropout=0.05, lora_target_modules=None, lora_modules_to_save=None, load_in_8bit=False, load_in_4bit=False, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False, bnb_4bit_quant_storage='uint8') 2026-04-16 16:21:35 - INFO - __main__ - Data parameters DataArguments(chat_template="{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}", dataset_mixer={'Anthropic/hh-rlhf': 1.0}, text_column='text', dataset_splits=['train', 'test'], dataset_configs=['helpful-base'], dataset_dir=None, preprocessing_num_workers=12, use_persistent_hf_cache=False, hf_cache_dir=None, truncation_side=None, auto_insert_empty_system_msg=True, preprocessing_log_samples=0, preprocessing_log_dir=None) 2026-04-16 16:21:35 - INFO - __main__ 
- Training/evaluation parameters SFTConfig( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, chars_per_token=, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, dataset_batch_size=1000, dataset_kwargs=None, dataset_num_proc=None, dataset_text_field=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=False, do_eval=True, do_predict=False, do_train=False, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_packing=None, eval_steps=100, eval_strategy=IntervalStrategy.STEPS, eval_use_gather_object=False, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=2, gradient_checkpointing=True, gradient_checkpointing_kwargs={'use_reentrant': False}, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=W-61/llama-3-8b-base-sft-hh-helpful-4xh200, hub_model_revision=main, hub_private_repo=None, hub_strategy=HubStrategy.END, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, 
label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=info, log_level_replica=warning, log_on_each_node=True, logging_dir=outputs/llama-3-8b-base-sft-hh-helpful-4xh200/runs/Apr16_16-21-35_d4054, logging_first_step=True, logging_nan_inf_filter=True, logging_steps=5, logging_strategy=IntervalStrategy.STEPS, lr_scheduler_kwargs={}, lr_scheduler_type=SchedulerType.COSINE, max_grad_norm=1.0, max_seq_length=512, max_steps=-1, metric_for_best_model=None, model_init_kwargs=None, mp_parameters=, neftune_noise_alpha=None, no_cuda=False, num_of_sequences=1024, num_train_epochs=1, optim=OptimizerNames.ADAMW_TORCH, optim_args=None, optim_target_modules=None, output_dir=/scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101, overwrite_output_dir=True, packing=False, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=200, save_strategy=SaveStrategy.STEPS, save_total_limit=2, seed=42, skip_memory_metrics=True, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tp_size=0, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.1, warmup_steps=0, weight_decay=0.0, ) No config specified, defaulting to the single config: hh-rlhf/default 2026-04-16 16:21:36 - INFO - datasets.builder - No config specified, 
defaulting to the single config: hh-rlhf/default Using custom data configuration default-cfba128a0ab1b99f 2026-04-16 16:21:36 - INFO - datasets.builder - Using custom data configuration default-cfba128a0ab1b99f Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/json 2026-04-16 16:21:36 - INFO - datasets.info - Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/json Overwrite dataset info from restored data version if exists. 2026-04-16 16:21:36 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists. Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa 2026-04-16 16:21:36 - INFO - datasets.info - Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa Found cached dataset hh-rlhf (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa) 2026-04-16 16:21:36 - INFO - datasets.builder - Found cached dataset hh-rlhf (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa) Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa 2026-04-16 16:21:36 - INFO - datasets.info - Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa 2026-04-16 16:21:38 - WARNING - alignment.data - Dropped 237 non-canonical HH preference examples from split `train` before normalization (126 x HH preprocessing expects exactly one final 
assistant response in chosen/rejected suffixes., 111 x HH chosen/rejected transcripts must each contain a divergent assistant response.). Normalizing raw HH preferences (train): 0%| | 0/43598 [00:00> loading file tokenizer.json [INFO|tokenization_utils_base.py:2058] 2026-04-16 16:21:43,179 >> loading file tokenizer.model [INFO|tokenization_utils_base.py:2058] 2026-04-16 16:21:43,179 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2058] 2026-04-16 16:21:43,179 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2058] 2026-04-16 16:21:43,179 >> loading file tokenizer_config.json [INFO|tokenization_utils_base.py:2058] 2026-04-16 16:21:43,179 >> loading file chat_template.jinja Normalizing raw HH preferences (test): 100%|██████████| 2339/2339 [00:00<00:00, 10916.03 examples/s] [INFO|tokenization_utils_base.py:2323] 2026-04-16 16:21:43,499 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
2026-04-16 16:21:43 - INFO - __main__ - *** Load pretrained model *** Process #0 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00000_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #0 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00000_of_00012.arrow Process #1 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00001_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #1 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00001_of_00012.arrow Process #2 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00002_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #2 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00002_of_00012.arrow Process #3 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00003_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #3 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00003_of_00012.arrow Process #4 will write at 
/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00004_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #4 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00004_of_00012.arrow Process #5 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00005_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #5 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00005_of_00012.arrow Process #6 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00006_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #6 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00006_of_00012.arrow Process #7 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00007_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #7 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00007_of_00012.arrow Process #8 will write at 
/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00008_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #8 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00008_of_00012.arrow Process #9 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00009_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #9 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00009_of_00012.arrow Process #10 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00010_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #10 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00010_of_00012.arrow Process #11 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00011_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #11 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_00011_of_00012.arrow Loading cached processed dataset at 
/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_*_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Loading cached processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d3917bc8eb716f92_*_of_00012.arrow Concatenating 12 shards 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Concatenating 12 shards Process #0 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00000_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #0 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00000_of_00012.arrow Process #1 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00001_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #1 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00001_of_00012.arrow Process #2 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00002_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #2 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00002_of_00012.arrow Process #3 will write at 
/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00003_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #3 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00003_of_00012.arrow Process #4 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00004_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #4 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00004_of_00012.arrow Process #5 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00005_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #5 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00005_of_00012.arrow Process #6 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00006_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #6 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00006_of_00012.arrow Process #7 will write at 
/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00007_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #7 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00007_of_00012.arrow Process #8 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00008_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #8 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00008_of_00012.arrow Process #9 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00009_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #9 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00009_of_00012.arrow Process #10 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00010_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #10 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00010_of_00012.arrow Process #11 will write at 
/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00011_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Process #11 will write at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_00011_of_00012.arrow Loading cached processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_*_of_00012.arrow 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Loading cached processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-0f820217b8a8b27e_*_of_00012.arrow Concatenating 12 shards 2026-04-16 16:21:43 - INFO - datasets.arrow_dataset - Concatenating 12 shards Loading cached processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d924339979854155.arrow 2026-04-16 16:21:44 - INFO - datasets.arrow_dataset - Loading cached processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-d924339979854155.arrow Loading cached processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-45f8b5bf4d93c54f.arrow 2026-04-16 16:21:44 - INFO - datasets.arrow_dataset - Loading cached processed dataset at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/Anthropic___hh-rlhf/default-cfba128a0ab1b99f/0.0.0/09be8c5bbc57cb3887f3a9732ad6aa7ec602a1fa/cache-45f8b5bf4d93c54f.arrow 2026-04-16 16:21:44 - INFO - __main__ - 
Decontaminated 0 (0.00%) samples from the training set. 2026-04-16 16:21:44 - INFO - __main__ - Sample 41905 of the processed training set: <|begin_of_text|><|start_header_id|>user<|end_header_id|> What is the best way to clean my refrigerator?<|eot_id|><|start_header_id|>assistant<|end_header_id|> Well, I have a few ideas, but the best way to know what works best is to run some tests and then analyze the data to figure out which technique works best.<|eot_id|><|start_header_id|>user<|end_header_id|> What are your ideas?<|eot_id|><|start_header_id|>assistant<|end_header_id|> Some of my ideas include wiping down the interior with a rag, using bleach and water to wipe down the door, emptying the contents of the fridge to scrub the shelves with a clean cloth, cleaning the interior of the fridge with vinegar and water, and using a vacuum cleaner to clean the interior.<|eot_id|> 2026-04-16 16:21:44 - INFO - __main__ - Sample 7296 of the processed training set: <|begin_of_text|><|start_header_id|>user<|end_header_id|> How can I get blood out of cotton?<|eot_id|><|start_header_id|>assistant<|end_header_id|> Blood? What do you mean?<|eot_id|> 2026-04-16 16:21:44 - INFO - __main__ - Sample 1639 of the processed training set: <|begin_of_text|><|start_header_id|>user<|end_header_id|> I'm trying to get my friend to open up<|eot_id|><|start_header_id|>assistant<|end_header_id|> Sounds great! What have you tried so far?<|eot_id|><|start_header_id|>user<|end_header_id|> I've tried to give my friend a safe space to open up about their feelings but they still aren't comfortable<|eot_id|><|start_header_id|>assistant<|end_header_id|> It’s really important that people feel safe sharing their feelings, but I totally hear that this isn’t easy for them. And in my experience, that has a lot to do with them feeling that you can accept them and what they’re saying, even if they’re feeling difficult or uncomfortable feelings. 
Do you think you’ve done that, and that they still haven’t opened up?<|eot_id|> /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': model_init_kwargs, dataset_text_field, max_seq_length, packing. Will not be supported from version '1.0.0'. Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead. warnings.warn(message, FutureWarning) /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:158: UserWarning: You passed `model_init_kwargs` to the SFTTrainer, the value you passed will override the one in the `SFTConfig`. 
warnings.warn( /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:185: UserWarning: You passed a model_id to the SFTTrainer. This will automatically create an `AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you. 
warnings.warn( [INFO|configuration_utils.py:691] 2026-04-16 16:21:45,743 >> loading configuration file /scratch/feng.yulu/dynamic-dpo-v4/base_models/Meta-Llama-3-8B/config.json [INFO|configuration_utils.py:765] 2026-04-16 16:21:45,744 >> Model config LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": 128001, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 8192, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "rope_theta": 500000.0, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.51.0", "use_cache": false, "vocab_size": 128256 } [WARNING|logging.py:328] 2026-04-16 16:21:45,759 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. [WARNING|logging.py:328] 2026-04-16 16:21:45,759 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. [INFO|modeling_utils.py:1121] 2026-04-16 16:21:45,759 >> loading weights file /scratch/feng.yulu/dynamic-dpo-v4/base_models/Meta-Llama-3-8B/model.safetensors.index.json [INFO|modeling_utils.py:2167] 2026-04-16 16:21:45,760 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16. [WARNING|logging.py:328] 2026-04-16 16:21:45,763 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 
[WARNING|logging.py:328] 2026-04-16 16:21:45,763 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. [INFO|configuration_utils.py:1142] 2026-04-16 16:21:45,764 >> Generate config GenerationConfig { "bos_token_id": 128000, "eos_token_id": 128001, "use_cache": false } Loading checkpoint shards: 0%| | 0/4 [00:00> All model checkpoint weights were used when initializing LlamaForCausalLM. [INFO|modeling_utils.py:4934] 2026-04-16 16:21:46,477 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /scratch/feng.yulu/dynamic-dpo-v4/base_models/Meta-Llama-3-8B. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training. [INFO|configuration_utils.py:1095] 2026-04-16 16:21:46,479 >> loading configuration file /scratch/feng.yulu/dynamic-dpo-v4/base_models/Meta-Llama-3-8B/generation_config.json [INFO|configuration_utils.py:1142] 2026-04-16 16:21:46,480 >> Generate config GenerationConfig { "bos_token_id": 128000, "do_sample": true, "eos_token_id": 128001, "max_length": 4096, "temperature": 0.6, "top_p": 0.9 } /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:195: UserWarning: You passed a `packing` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`. warnings.warn( /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:283: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`. warnings.warn( /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:321: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`. 
warnings.warn( Using custom data configuration default-39b52f6e03e85a82 2026-04-16 16:21:46 - INFO - datasets.builder - Using custom data configuration default-39b52f6e03e85a82 Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/generator 2026-04-16 16:21:46 - INFO - datasets.info - Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/generator Overwrite dataset info from restored data version if exists. 2026-04-16 16:21:46 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists. Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-39b52f6e03e85a82/0.0.0 2026-04-16 16:21:46 - INFO - datasets.info - Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-39b52f6e03e85a82/0.0.0 Found cached dataset generator (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-39b52f6e03e85a82/0.0.0) 2026-04-16 16:21:46 - INFO - datasets.builder - Found cached dataset generator (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-39b52f6e03e85a82/0.0.0) Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-39b52f6e03e85a82/0.0.0 2026-04-16 16:21:46 - INFO - datasets.info - Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-39b52f6e03e85a82/0.0.0 Using custom data configuration default-1519231937de8df3 2026-04-16 16:21:46 - INFO - datasets.builder - Using custom data configuration default-1519231937de8df3 Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/generator 2026-04-16 16:21:46 - INFO - datasets.info - Loading Dataset Infos from /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/datasets/packaged_modules/generator Overwrite dataset info from restored data version if 
exists. 2026-04-16 16:21:46 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists. Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-1519231937de8df3/0.0.0 2026-04-16 16:21:46 - INFO - datasets.info - Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-1519231937de8df3/0.0.0 Found cached dataset generator (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-1519231937de8df3/0.0.0) 2026-04-16 16:21:46 - INFO - datasets.builder - Found cached dataset generator (/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-1519231937de8df3/0.0.0) Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-1519231937de8df3/0.0.0 2026-04-16 16:21:46 - INFO - datasets.info - Loading Dataset info from /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets/generator/default-1519231937de8df3/0.0.0 /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:412: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `SFTTrainer.__init__`. Use `processing_class` instead.
super().__init__( [INFO|trainer.py:748] 2026-04-16 16:21:49,099 >> Using auto half precision backend 2026-04-16 16:21:49 - INFO - __main__ - *** Train *** /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/accelerate/accelerator.py:1557: UserWarning: Upcasted low precision parameters in LlamaForCausalLM because mixed precision turned on in FSDP. Affects: model.embed_tokens.weight, model.norm.weight, lm_head.weight. warnings.warn( /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/accelerate/accelerator.py:1557: UserWarning: Upcasted low precision parameters in LlamaDecoderLayer because mixed precision turned on in FSDP. Affects: self_attn.q_proj.weight, self_attn.k_proj.weight, self_attn.v_proj.weight, self_attn.o_proj.weight, mlp.gate_proj.weight, mlp.up_proj.weight, mlp.down_proj.weight, input_layernorm.weight, post_attention_layernorm.weight. warnings.warn( /home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/accelerate/accelerator.py:1563: UserWarning: FSDP upcast of low precision parameters may affect the precision of model checkpoints. warnings.warn( [INFO|trainer.py:2414] 2026-04-16 16:22:27,009 >> ***** Running training ***** [INFO|trainer.py:2415] 2026-04-16 16:22:27,009 >> Num examples = 16,516 [INFO|trainer.py:2416] 2026-04-16 16:22:27,009 >> Num Epochs = 1 [INFO|trainer.py:2417] 2026-04-16 16:22:27,009 >> Instantaneous batch size per device = 8 [INFO|trainer.py:2420] 2026-04-16 16:22:27,009 >> Total train batch size (w. 
parallel, distributed & accumulation) = 64 [INFO|trainer.py:2421] 2026-04-16 16:22:27,009 >> Gradient Accumulation steps = 2 [INFO|trainer.py:2422] 2026-04-16 16:22:27,009 >> Total optimization steps = 258 [INFO|trainer.py:2423] 2026-04-16 16:22:27,010 >> Number of trainable parameters = 2,007,565,312 [INFO|integration_utils.py:831] 2026-04-16 16:22:27,011 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" wandb: Currently logged in as: can-not-fand (can-not-fand-northeastern-university). Use `wandb login --relogin` to force relogin wandb: wandb version 0.26.0 is available! To upgrade, please run: wandb: $ pip install wandb --upgrade wandb: Tracking run with wandb version 0.17.5 wandb: Run data is saved locally in /scratch/feng.yulu/dynamic-dpo-v4/wandb/wandb/run-20260416_162228-ivik22vv wandb: Run `wandb offline` to turn off syncing. wandb: Syncing run llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101 wandb: ⭐️ View project at https://wandb.ai/can-not-fand-northeastern-university/huggingface wandb: 🚀 View run at https://wandb.ai/can-not-fand-northeastern-university/huggingface/runs/ivik22vv 0%| | 0/258 [00:00> ***** Running Evaluation ***** [INFO|trainer.py:4309] 2026-04-16 16:24:43,454 >> Num examples = 895 [INFO|trainer.py:4312] 2026-04-16 16:24:43,454 >> Batch size = 8 0%| | 0/28 [00:00> ***** Running Evaluation ***** [INFO|trainer.py:4309] 2026-04-16 16:26:57,969 >> Num examples = 895 [INFO|trainer.py:4312] 2026-04-16 16:26:57,969 >> Batch size = 8 0%| | 0/28 [00:00> Saving model checkpoint to /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/checkpoint-200 [INFO|configuration_utils.py:419] 2026-04-16 16:27:21,904 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/checkpoint-200/config.json [INFO|configuration_utils.py:911] 2026-04-16 16:27:21,910 >> Configuration 
saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/checkpoint-200/generation_config.json [INFO|modeling_utils.py:3580] 2026-04-16 16:28:10,583 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 6 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/checkpoint-200/model.safetensors.index.json. [INFO|tokenization_utils_base.py:2510] 2026-04-16 16:28:10,589 >> tokenizer config file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/checkpoint-200/tokenizer_config.json [INFO|tokenization_utils_base.py:2519] 2026-04-16 16:28:10,592 >> Special tokens file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/checkpoint-200/special_tokens_map.json 78%|███████▊ | 201/258 [09:14<1:23:30, 87.91s/it] 78%|███████▊ | 202/258 [09:15<57:47, 61.91s/it] 79%|███████▊ | 203/258 [09:16<40:04, 43.71s/it] 79%|███████▉ | 204/258 [09:18<27:52, 30.97s/it] 79%|███████▉ | 205/258 [09:19<19:32, 22.11s/it] {'loss': 1.1818, 'grad_norm': 2.0174171924591064, 'learning_rate': 2.5564826243772965e-06, 'epoch': 0.79} 79%|███████▉ | 205/258 [09:19<19:32, 22.11s/it] 80%|███████▉ | 206/258 [09:21<13:47, 15.91s/it] 80%|████████ | 207/258 [09:22<09:49, 11.56s/it] 81%|████████ | 208/258 [09:23<07:03, 8.47s/it] 81%|████████ | 209/258 [09:24<05:08, 6.30s/it] 81%|████████▏ | 210/258 [09:26<03:49, 4.79s/it] {'loss': 1.1851, 'grad_norm': 2.1461687088012695, 'learning_rate': 2.1217454620337842e-06, 'epoch': 0.81} 81%|████████▏ | 210/258 [09:26<03:49, 4.79s/it] 82%|████████▏ | 211/258 [09:27<02:55, 3.73s/it] 82%|████████▏ | 212/258 [09:28<02:17, 2.99s/it] 83%|████████▎ | 213/258 [09:30<01:51, 2.47s/it] 83%|████████▎ | 214/258 [09:31<01:34, 
2.16s/it] 83%|████████▎ | 215/258 [09:32<01:22, 1.92s/it] {'loss': 1.1799, 'grad_norm': 1.9792951345443726, 'learning_rate': 1.7231100184310955e-06, 'epoch': 0.83} 83%|████████▎ | 215/258 [09:32<01:22, 1.92s/it] 84%|████████▎ | 216/258 [09:34<01:12, 1.72s/it] 84%|████████▍ | 217/258 [09:35<01:04, 1.58s/it] 84%|████████▍ | 218/258 [09:36<00:59, 1.48s/it] 85%|████████▍ | 219/258 [09:37<00:55, 1.42s/it] 85%|████████▌ | 220/258 [09:39<00:52, 1.37s/it] {'loss': 1.1748, 'grad_norm': 2.1498570442199707, 'learning_rate': 1.3624030211261684e-06, 'epoch': 0.85} 85%|████████▌ | 220/258 [09:39<00:52, 1.37s/it] 86%|████████▌ | 221/258 [09:40<00:49, 1.34s/it] 86%|████████▌ | 222/258 [09:41<00:49, 1.37s/it] 86%|████████▋ | 223/258 [09:43<00:48, 1.38s/it] 87%|████████▋ | 224/258 [09:44<00:46, 1.38s/it] 87%|████████▋ | 225/258 [09:45<00:44, 1.34s/it] {'loss': 1.1529, 'grad_norm': 2.3734147548675537, 'learning_rate': 1.0412773924131202e-06, 'epoch': 0.87} 87%|████████▋ | 225/258 [09:45<00:44, 1.34s/it] 88%|████████▊ | 226/258 [09:47<00:42, 1.32s/it] 88%|████████▊ | 227/258 [09:48<00:40, 1.30s/it] 88%|████████▊ | 228/258 [09:49<00:38, 1.29s/it] 89%|████████▉ | 229/258 [09:50<00:37, 1.28s/it] 89%|████████▉ | 230/258 [09:52<00:35, 1.27s/it] {'loss': 1.1486, 'grad_norm': 21.268768310546875, 'learning_rate': 7.612046748871327e-07, 'epoch': 0.89} 89%|████████▉ | 230/258 [09:52<00:35, 1.27s/it] 90%|████████▉ | 231/258 [09:53<00:34, 1.27s/it] 90%|████████▉ | 232/258 [09:54<00:34, 1.31s/it] 90%|█████████ | 233/258 [09:56<00:33, 1.33s/it] 91%|█████████ | 234/258 [09:57<00:31, 1.31s/it] 91%|█████████ | 235/258 [09:58<00:30, 1.34s/it] {'loss': 1.1451, 'grad_norm': 2.0091605186462402, 'learning_rate': 5.234682881719766e-07, 'epoch': 0.91} 91%|█████████ | 235/258 [09:58<00:30, 1.34s/it] 91%|█████████▏| 236/258 [10:00<00:29, 1.32s/it] 92%|█████████▏| 237/258 [10:01<00:27, 1.30s/it] 92%|█████████▏| 238/258 [10:02<00:25, 1.29s/it] 93%|█████████▎| 239/258 [10:03<00:24, 1.28s/it] 93%|█████████▎| 
240/258 [10:05<00:22, 1.27s/it] {'loss': 1.1681, 'grad_norm': 1.8940573930740356, 'learning_rate': 3.2915764771193294e-07, 'epoch': 0.93} 93%|█████████▎| 240/258 [10:05<00:22, 1.27s/it] 93%|█████████▎| 241/258 [10:06<00:22, 1.31s/it] 94%|█████████▍| 242/258 [10:07<00:21, 1.33s/it] 94%|█████████▍| 243/258 [10:09<00:19, 1.31s/it] 95%|█████████▍| 244/258 [10:10<00:18, 1.30s/it] 95%|█████████▍| 245/258 [10:11<00:16, 1.29s/it] {'loss': 1.1391, 'grad_norm': 2.0212926864624023, 'learning_rate': 1.791631725784404e-07, 'epoch': 0.95} 95%|█████████▍| 245/258 [10:11<00:16, 1.29s/it] 95%|█████████▌| 246/258 [10:13<00:15, 1.28s/it] 96%|█████████▌| 247/258 [10:14<00:14, 1.32s/it] 96%|█████████▌| 248/258 [10:15<00:13, 1.30s/it] 97%|█████████▋| 249/258 [10:16<00:11, 1.29s/it] 97%|█████████▋| 250/258 [10:18<00:10, 1.32s/it] {'loss': 1.1662, 'grad_norm': 2.1692233085632324, 'learning_rate': 7.4172205167945e-08, 'epoch': 0.97} 97%|█████████▋| 250/258 [10:18<00:10, 1.32s/it] 97%|█████████▋| 251/258 [10:19<00:09, 1.34s/it] 98%|█████████▊| 252/258 [10:20<00:07, 1.32s/it] 98%|█████████▊| 253/258 [10:22<00:06, 1.30s/it] 98%|█████████▊| 254/258 [10:23<00:05, 1.29s/it] 99%|█████████▉| 255/258 [10:24<00:03, 1.28s/it] {'loss': 1.163, 'grad_norm': 1.9714499711990356, 'learning_rate': 1.4665861488761813e-08, 'epoch': 0.99} 99%|█████████▉| 255/258 [10:24<00:03, 1.28s/it] 99%|█████████▉| 256/258 [10:26<00:02, 1.27s/it] 100%|█████████▉| 257/258 [10:27<00:01, 1.27s/it] 100%|██████████| 258/258 [10:28<00:00, 1.31s/it][INFO|trainer.py:3984] 2026-04-16 16:33:17,710 >> Saving model checkpoint to /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/checkpoint-258 [INFO|configuration_utils.py:419] 2026-04-16 16:33:17,717 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/checkpoint-258/config.json [INFO|configuration_utils.py:911] 2026-04-16 16:33:17,725 >> Configuration 
saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/checkpoint-258/generation_config.json [INFO|modeling_utils.py:3580] 2026-04-16 16:34:00,825 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 6 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/checkpoint-258/model.safetensors.index.json. [INFO|tokenization_utils_base.py:2510] 2026-04-16 16:34:00,890 >> tokenizer config file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/checkpoint-258/tokenizer_config.json [INFO|tokenization_utils_base.py:2519] 2026-04-16 16:34:00,904 >> Special tokens file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/checkpoint-258/special_tokens_map.json [INFO|trainer.py:2681] 2026-04-16 16:37:12,033 >> Training completed. 
Do not forget to share your model on huggingface.co/models =) {'train_runtime': 885.0234, 'train_samples_per_second': 18.662, 'train_steps_per_second': 0.292, 'train_loss': 1.5008004404777704, 'epoch': 1.0} 100%|██████████| 258/258 [14:38<00:00, 1.31s/it] 100%|██████████| 258/258 [14:38<00:00, 3.41s/it] ***** train metrics ***** epoch = 0.9981 total_flos = 88635435GF train_loss = 1.5008 train_runtime = 0:14:45.02 train_samples = 43598 train_samples_per_second = 18.662 train_steps_per_second = 0.292 2026-04-16 16:37:12 - INFO - __main__ - *** Save model *** [INFO|configuration_utils.py:419] 2026-04-16 16:37:29,519 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/config.json [INFO|configuration_utils.py:911] 2026-04-16 16:37:29,523 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/generation_config.json [INFO|modeling_utils.py:3580] 2026-04-16 16:38:16,222 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/model.safetensors.index.json. 
[INFO|tokenization_utils_base.py:2510] 2026-04-16 16:38:16,229 >> tokenizer config file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/tokenizer_config.json [INFO|tokenization_utils_base.py:2519] 2026-04-16 16:38:16,232 >> Special tokens file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/special_tokens_map.json 2026-04-16 16:38:16 - INFO - __main__ - Saved HF-compatible model artifacts to /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101 2026-04-16 16:38:16 - INFO - __main__ - Saved validated HF-compatible model artifacts to /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101 [INFO|modelcard.py:450] 2026-04-16 16:38:16,543 >> Dropping the following result as it does not have all the necessary fields: {'dataset': {'name': 'Anthropic/hh-rlhf', 'type': 'Anthropic/hh-rlhf', 'config': 'default', 'split': 'train', 'args': 'default'}} [INFO|configuration_utils.py:419] 2026-04-16 16:38:16,550 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/config.json 2026-04-16 16:38:16 - INFO - __main__ - *** Evaluate *** [INFO|trainer.py:4307] 2026-04-16 16:38:16,552 >> ***** Running Evaluation ***** [INFO|trainer.py:4309] 2026-04-16 16:38:16,552 >> Num examples = 895 [INFO|trainer.py:4312] 2026-04-16 16:38:16,552 >> Batch size = 8 0%| | 0/28 [00:00