README.md at revision 8162a32e06ecbfa264ef06fb83f6ba0a5e017774

ModelHub XC 8162a32e06: Initialize project (model provided by the ModelHub XC community)

Model: W-61/llama-3-8b-base-beta-dpo-hh-harmless-4xh200-batch-64-20260417-233539
Source: Original Platform

2026-05-30 22:45:37 +08:00

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8 (per device)
eval_batch_size: 8 (per device)
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64 (8 per device × 4 devices × 2 gradient accumulation steps)
total_eval_batch_size: 32 (8 per device × 4 devices)
optimizer: AdamW, PyTorch implementation (OptimizerNames.ADAMW_TORCH), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
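The batch-size and learning-rate settings above can be checked with a few lines of plain Python. This is a sketch, not the training code used for this model. The schedule function is written to follow, to our reading, the formula of transformers' `get_cosine_schedule_with_warmup` (linear warmup, then half-cosine decay to zero) with the warmup length rounded up as `ceil(warmup_ratio × total_steps)`. The total of 662 optimizer steps is an assumption: an estimate read off the Step and Epoch columns of the results table (600 / 0.9070 ≈ 661.5), not a value stated in this card. All function and constant names are illustrative.

```python
import math

# Hyperparameters as listed in this card.
PER_DEVICE_TRAIN_BATCH = 8
PER_DEVICE_EVAL_BATCH = 8
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
PEAK_LR = 5e-7
WARMUP_RATIO = 0.1

# Assumption: about 661-662 optimizer steps for the single epoch,
# estimated from the results table (e.g. 600 / 0.9070 ≈ 661.5).
ESTIMATED_TOTAL_STEPS = 662


def effective_batch_size(per_device: int, devices: int, accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step under data-parallel training."""
    return per_device * devices * accum_steps


def cosine_lr_with_warmup(step: int, total_steps: int,
                          peak_lr: float = PEAK_LR,
                          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay to 0.

    Mirrors the shape of transformers' get_cosine_schedule_with_warmup
    with num_cycles=0.5; warmup steps are rounded up with math.ceil.
    """
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Training: 8 per device x 4 devices x 2 accumulation steps.
    print(effective_batch_size(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))  # -> 64
    # Evaluation has no gradient accumulation: 8 per device x 4 devices.
    print(effective_batch_size(PER_DEVICE_EVAL_BATCH, NUM_DEVICES))  # -> 32
    # Learning rate at a few points of the (estimated) schedule.
    for step in (0, 33, 67, 300, ESTIMATED_TOTAL_STEPS):
        print(step, f"{cosine_lr_with_warmup(step, ESTIMATED_TOTAL_STEPS):.3e}")
```

With a warmup ratio of 0.1, the peak rate of 5e-07 is reached after roughly the first tenth of the epoch and then decays to zero by the last step.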

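Training results, logged every 100 optimizer steps. The columns after Step appear to be evaluation-set metrics; their names look like the Trainer's title-cased forms of logged metric keys (for example, `Beta Dpo/loss Margin Mean` would correspond to a key such as `eval_beta_dpo/loss_margin_mean`).

| Training Loss | Epoch | Step | Validation Loss | Beta Dpo/beta | Beta Dpo/loss Margin Mean | Beta Dpo/beta Margin Mean | Beta Dpo/beta Margin Std | Beta Dpo/beta Margin Grad Mean | Beta Dpo/beta Margin Grad Std | Beta Dpo/gap Mean | Beta Dpo/gap Std | Beta Dpo/beta Used Raw | Beta Dpo/beta Used | Beta Dpo/mask Keep Frac | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.3014 | 0.1512 | 100 | 0.6391 | 0.1183 | 1.3224 | 0.1789 | 0.4749 | -0.4595 | 0.1057 | 1.0180 | 3.3360 | 0.1183 | 0.1183 | 1.0 | 0.2572 | 0.2207 |
| 0.9318 | 0.3023 | 200 | 0.5939 | 0.0752 | 9.0802 | 0.8670 | 1.1566 | -0.3975 | 0.1333 | 10.1709 | 15.2637 | 0.0346 | 0.0752 | 1.0 | 0.4250 | 0.3786 |
| 1.1289 | 0.4535 | 300 | 0.6938 | 0.1151 | 14.7143 | 2.1435 | 2.8181 | -0.3684 | 0.1767 | 15.7264 | 23.2621 | 0.0393 | 0.1151 | 1.0 | 0.5036 | 0.4522 |
| 1.3777 | 0.6047 | 400 | 0.6486 | 0.0698 | 13.2713 | 1.2326 | 1.6702 | -0.4066 | 0.1228 | 15.5137 | 23.8374 | -0.0345 | 0.0698 | 1.0 | 0.4392 | 0.3876 |
| 1.1911 | 0.7559 | 500 | 0.6888 | 0.0936 | 16.0572 | 1.9620 | 2.5727 | -0.3866 | 0.1471 | 17.9087 | 28.7161 | -0.0111 | 0.0936 | 1.0 | 0.5027 | 0.4490 |
| 1.0347 | 0.9070 | 600 | 0.8203 | 0.1705 | 16.6192 | 3.5151 | 4.7567 | -0.3392 | 0.2229 | 16.4768 | 28.4131 | 0.1085 | 0.1705 | 1.0 | 0.5021 | 0.4487 |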
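To work with the results programmatically, a few of the table's columns can be loaded with the standard library. The sketch below copies six of the seventeen columns verbatim; the function names are illustrative. It only locates the evaluation with the lowest validation loss and the steps where `Beta Dpo/beta Used` differs from `Beta Dpo/beta Used Raw`. The card does not say how the used value is derived from the raw one, so the code makes no assumption about that rule.

```python
import csv
import io

# Six of the seventeen columns from the results table, copied verbatim
# as tab-separated text (header names as they appear in the card).
RESULTS_TSV = (
    "Step\tTraining Loss\tValidation Loss\tBeta Dpo/beta\tBeta Dpo/beta Used Raw\tBeta Dpo/beta Used\n"
    "100\t1.3014\t0.6391\t0.1183\t0.1183\t0.1183\n"
    "200\t0.9318\t0.5939\t0.0752\t0.0346\t0.0752\n"
    "300\t1.1289\t0.6938\t0.1151\t0.0393\t0.1151\n"
    "400\t1.3777\t0.6486\t0.0698\t-0.0345\t0.0698\n"
    "500\t1.1911\t0.6888\t0.0936\t-0.0111\t0.0936\n"
    "600\t1.0347\t0.8203\t0.1705\t0.1085\t0.1705\n"
)


def load_rows(text: str) -> list[dict[str, float]]:
    """Parse tab-separated text into one dict per row, with float values."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{key: float(value) for key, value in row.items()} for row in reader]


def best_evaluation(rows: list[dict[str, float]]) -> tuple[int, float]:
    """Return (step, validation loss) for the row with the lowest validation loss."""
    best = min(rows, key=lambda row: row["Validation Loss"])
    return int(best["Step"]), best["Validation Loss"]


def steps_with_adjusted_beta(rows: list[dict[str, float]]) -> list[int]:
    """Steps where the logged 'beta used' differs from the logged 'beta used raw'."""
    return [int(row["Step"]) for row in rows
            if row["Beta Dpo/beta Used"] != row["Beta Dpo/beta Used Raw"]]


rows = load_rows(RESULTS_TSV)
print(best_evaluation(rows))  # -> (200, 0.5939)
print(steps_with_adjusted_beta(rows))  # -> [200, 300, 400, 500, 600]
```

On these numbers, the lowest validation loss (0.5939) occurs at step 200, the final evaluation at step 600 has the highest of the six (0.8203), and the logged `Beta Dpo/beta` equals `Beta Dpo/beta Used` in every row.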