---
library_name: transformers
base_model: W-61/llama-3-8b-base-sft-hh-harmless-4xh200
tags:
- alignment-handbook
- new-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: llama-3-8b-base-new-dpo-ultrafeedback-4xh200
  results: []
---

# llama-3-8b-base-new-dpo-ultrafeedback-4xh200

This model is a fine-tuned version of [W-61/llama-3-8b-base-sft-hh-harmless-4xh200](https://huggingface.co/W-61/llama-3-8b-base-sft-hh-harmless-4xh200) on the Anthropic/hh-rlhf dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5214
- Fcm Dpo/beta: 0.0836
- Fcm Dpo/q T: 0.3380
- Fcm Dpo/delta: -0.0050
- Fcm Dpo/margin: 11.8756
- Margin Dpo/margin Mean: 11.8756
- Margin Dpo/margin Std: 18.3875
- Logps/chosen: -96.2474
- Logps/rejected: -113.1114
- Logps/ref Chosen: -75.8693
- Logps/ref Rejected: -80.8577
- Logits/chosen: 0.3597
- Logits/rejected: 0.3046

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Fcm Dpo/beta | Fcm Dpo/q T | Fcm Dpo/delta | Fcm Dpo/margin | Margin Dpo/margin Mean | Margin Dpo/margin Std | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:------------:|:-----------:|:-------------:|:--------------:|:----------------------:|:---------------------:|:------------:|:--------------:|:----------------:|:------------------:|:-------------:|:---------------:|
| 1.0502        | 0.3023 | 200  | 0.5717          | 0.0936       | 0.3525      | 0.0185        | 10.3675        | 10.3675                | 18.4204               | -87.9072     | -103.2631      | -75.8693         | -80.8577           | 0.4369        | 0.3875          |
| 1.0126        | 0.6047 | 400  | 0.5364          | 0.1033       | 0.3417      | 0.0098        | 9.4748         | 9.4748                 | 15.2862               | -92.9376     | -107.4008      | -75.8693         | -80.8577           | 0.3564        | 0.3008          |
| 1.0944        | 0.9070 | 600  | 0.5214          | 0.0836       | 0.3380      | -0.0050       | 11.8756        | 11.8756                | 18.3875               | -96.2474     | -113.1114      | -75.8693         | -80.8577           | 0.3597        | 0.3046          |

### Framework versions

- Transformers 4.51.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4
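
The reported margins and total batch sizes can be cross-checked from the other numbers on this card. The sketch below assumes that the margin is the policy-versus-reference log-ratio of the chosen response minus that of the rejected response, and that the total batch size is the per-device batch size times the number of devices times the gradient accumulation steps. Both definitions are assumptions inferred from the reported values, not taken from the training code, and the function names `implied_margin` and `effective_batch_size` are hypothetical helpers.

```python
# Evaluation rows copied from the training results table on this card.
# Each tuple: (step, logps_chosen, logps_rejected, ref_chosen, ref_rejected, reported_margin)
ROWS = [
    (200, -87.9072, -103.2631, -75.8693, -80.8577, 10.3675),
    (400, -92.9376, -107.4008, -75.8693, -80.8577, 9.4748),
    (600, -96.2474, -113.1114, -75.8693, -80.8577, 11.8756),
]


def implied_margin(logps_chosen, logps_rejected, ref_chosen, ref_rejected):
    """Assumed definition: (chosen log-ratio) minus (rejected log-ratio),
    where each log-ratio is policy log-prob minus reference log-prob."""
    return (logps_chosen - ref_chosen) - (logps_rejected - ref_rejected)


def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Assumed definition: samples contributing to one optimizer step
    (or to one evaluation step when grad_accum_steps is 1)."""
    return per_device * num_devices * grad_accum_steps


if __name__ == "__main__":
    for step, lc, lr, rc, rr, reported in ROWS:
        margin = implied_margin(lc, lr, rc, rr)
        print(f"step {step}: implied margin {margin:.4f}, reported {reported:.4f}")

    # Compare with total_train_batch_size and total_eval_batch_size above.
    print("train:", effective_batch_size(8, 4, 2))
    print("eval:", effective_batch_size(8, 4))
```

Under these assumptions the implied margins agree with the reported `Fcm Dpo/margin` values to four decimal places, which suggests the reported margin is not scaled by the `Fcm Dpo/beta` value.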