ModelHub XC 121c6f2962 Initialize project; model provided by the ModelHub XC community

Model: W-61/llama-3-8b-base-epsilon-dpo-hh-harmless-4xh200-batch-64-20260418-003215
Source: Original Platform

2026-06-13 14:08:31 +08:00

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
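
The totals in the list above follow from the per-device settings, and the learning rate follows a warmup-then-cosine shape. The sketch below checks the batch-size arithmetic and illustrates one common form of that schedule. The function `cosine_lr` and its `total_steps` argument are hypothetical names introduced for illustration; the total number of optimizer steps is not stated in this card, and the trainer's exact rounding of the warmup length may differ.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8             # per-device training batch size
eval_batch_size = 8              # per-device evaluation batch size
num_devices = 4
gradient_accumulation_steps = 2
learning_rate = 5e-07
warmup_ratio = 0.1

# Effective batch sizes: gradient accumulation applies to training only.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup to the peak rate, then cosine decay to zero.

    Assumption: this is a generic sketch of the schedule shape, not
    necessarily the exact implementation used for this run.
    """
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)  # 64 32
```

For example, with a hypothetical `total_steps=1000`, `cosine_lr(100, 1000)` returns the peak rate of 5e-07 at the end of warmup, and the rate decays to zero by the final step.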

| Training Loss | Epoch | Step | Validation Loss | Epsilon DPO/beta | Epsilon DPO/loss Margin Mean | Epsilon DPO/beta Margin Mean | Epsilon DPO/beta Margin Std | Epsilon DPO/beta Margin Grad Mean | Epsilon DPO/beta Margin Grad Std | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected | KL/p Epsilon Steps | KL/n Epsilon Steps |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.3305 | 0.1512 | 100 | 0.6591 | 0.0821 | 1.4524 | 0.1163 | 0.4195 | -0.4725 | 0.0969 | -0.4073 | -0.5236 | 0.6298 | 0.1163 | -79.8024 | -85.9443 | -74.8595 | -79.5490 | -0.2133 | -0.2955 | 0.6109 | 0.3882 |
| 0.9619 | 0.3023 | 200 | 0.5464 | 0.0537 | 12.5090 | 0.6642 | 1.1185 | -0.3726 | 0.2095 | -1.0244 | -1.6886 | 0.7183 | 0.6642 | -93.8833 | -111.0817 | -74.8595 | -79.5490 | -0.1899 | -0.3034 | 0.7240 | 0.2751 |
| 1.0235 | 0.4535 | 300 | 0.5323 | 0.0324 | 22.9568 | 0.7358 | 1.1722 | -0.3618 | 0.2142 | -1.6265 | -2.3623 | 0.7284 | 0.7358 | -124.9662 | -152.6126 | -74.8595 | -79.5490 | -0.0481 | -0.1742 | 0.7293 | 0.2698 |
| 1.132 | 0.6047 | 400 | 0.5402 | 0.0198 | 30.5708 | 0.5994 | 0.9731 | -0.3799 | 0.1895 | -1.2023 | -1.8017 | 0.7302 | 0.5994 | -135.4181 | -170.6785 | -74.8595 | -79.5490 | -0.0432 | -0.1614 | 0.7293 | 0.2698 |
| 1.0834 | 0.7559 | 500 | 0.5507 | 0.0121 | 40.9401 | 0.4894 | 0.7940 | -0.3952 | 0.1670 | -1.0111 | -1.5005 | 0.7210 | 0.4894 | -158.3591 | -203.9887 | -74.8595 | -79.5490 | 0.0215 | -0.0872 | 0.7227 | 0.2764 |
| 1.1666 | 0.9070 | 600 | 0.5778 | 0.0075 | 44.6202 | 0.3297 | 0.5411 | -0.4237 | 0.1239 | -0.6980 | -1.0277 | 0.7192 | 0.3297 | -168.1115 | -217.4212 | -74.8595 | -79.5490 | 0.0396 | -0.0641 | 0.7196 | 0.2799 |
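
As a reading aid for the table above, the sketch below copies four of its columns and checks two things that can be verified from the reported numbers alone: that the reward margin at each evaluation step equals the chosen reward minus the rejected reward, and which evaluation step has the lowest validation loss. The variable names are illustrative, and the check says nothing about how the trainer computes these metrics internally.

```python
# Columns copied from the table above:
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins)
rows = [
    (100, 0.6591, -0.4073, -0.5236, 0.1163),
    (200, 0.5464, -1.0244, -1.6886, 0.6642),
    (300, 0.5323, -1.6265, -2.3623, 0.7358),
    (400, 0.5402, -1.2023, -1.8017, 0.5994),
    (500, 0.5507, -1.0111, -1.5005, 0.4894),
    (600, 0.5778, -0.6980, -1.0277, 0.3297),
]

# The reported margin should match chosen - rejected up to rounding.
margins_consistent = all(
    abs((chosen - rejected) - margin) < 5e-4
    for _, _, chosen, rejected, margin in rows
)

# Evaluation step with the lowest validation loss.
best_step, best_loss = min(((r[0], r[1]) for r in rows), key=lambda p: p[1])

print(margins_consistent)    # True
print(best_step, best_loss)  # 300 0.5323
```

Validation loss is lowest at step 300 and rises afterwards, while the reward margin also peaks at step 300, so a reader comparing checkpoints may want to look at both columns rather than the final row alone.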