ModelHub XC 2fc13c995f Initialize project; model provided by the ModelHub XC community

Model: jackf857/llama-3-8b-base-margin-dpo-hh-harmless-batch-size-64
Source: Original Platform

2026-04-21 14:03:06 +08:00

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
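
The totals in the list above follow from the per-device values, and the learning rate follows a warmup-then-cosine shape. The sketch below reproduces that arithmetic in plain Python. It is illustrative only: `lr_at` is a hypothetical helper, `ASSUMED_TOTAL_STEPS = 661` is an estimate inferred from the results table (step 600 at epoch 0.9070), and rounding the warmup length up is an assumption rather than something stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2

# Effective batch sizes: per-device size x devices (x accumulation steps for training).
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices

# ASSUMPTION: the card does not state the total number of optimizer steps.
# 661 is an estimate derived from "step 600 at epoch 0.9070".
ASSUMED_TOTAL_STEPS = 661


def lr_at(step, total_steps=ASSUMED_TOTAL_STEPS, peak_lr=5e-07, warmup_ratio=0.1):
    """Hypothetical helper: linear warmup to peak_lr, then cosine decay to zero."""
    # ASSUMPTION: warmup length is the warmup ratio times total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
for step in (0, 100, 300, 600):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at 5e-07 once warmup ends and decays toward zero by the final step.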

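| Training Loss | Epoch | Step | Validation Loss | Margin Dpo/margin Mean | Margin Dpo/margin Std | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:----------------------:|:---------------------:|:------------:|:--------------:|:----------------:|:------------------:|:-------------:|:---------------:|
| 1.3342 | 0.1512 | 100 | 0.6557 | 1.4205 | 4.9786 | -79.7014 | -85.8115 | -74.8595 | -79.5490 | 0.2556 | 0.2183 |
| 0.9165 | 0.3023 | 200 | 0.5447 | 7.4721 | 12.5600 | -86.5507 | -98.7123 | -74.8595 | -79.5490 | 0.3345 | 0.2868 |
| 0.9692 | 0.4535 | 300 | 0.5345 | 9.3794 | 14.9738 | -93.1794 | -107.2484 | -74.8595 | -79.5490 | 0.4017 | 0.3507 |
| 1.084 | 0.6047 | 400 | 0.5337 | 8.8635 | 14.3566 | -91.2627 | -104.8157 | -74.8595 | -79.5490 | 0.3912 | 0.3394 |
| 1.0037 | 0.7559 | 500 | 0.5277 | 9.5078 | 15.0672 | -92.1725 | -106.3698 | -74.8595 | -79.5490 | 0.3937 | 0.3419 |
| 1.0459 | 0.9070 | 600 | 0.5259 | 9.3649 | 14.8097 | -92.0386 | -106.0930 | -74.8595 | -79.5490 | 0.3798 | 0.3285 |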
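
The card does not define the margin column, but the reported means appear to match the difference between the policy-versus-reference log-probability shifts of the chosen and rejected responses. The sketch below checks that reading against the rows above. The interpretation is an observation from the numbers, not a documented definition, and `log_ratio_margin` is a hypothetical helper.

```python
# Rows copied from the results table:
# (step, reported margin mean, logps chosen, logps rejected, ref chosen, ref rejected)
ROWS = [
    (100, 1.4205, -79.7014, -85.8115, -74.8595, -79.5490),
    (200, 7.4721, -86.5507, -98.7123, -74.8595, -79.5490),
    (300, 9.3794, -93.1794, -107.2484, -74.8595, -79.5490),
    (400, 8.8635, -91.2627, -104.8157, -74.8595, -79.5490),
    (500, 9.5078, -92.1725, -106.3698, -74.8595, -79.5490),
    (600, 9.3649, -92.0386, -106.0930, -74.8595, -79.5490),
]


def log_ratio_margin(chosen, rejected, ref_chosen, ref_rejected):
    """Hypothetical helper: (chosen shift vs. reference) minus (rejected shift vs. reference)."""
    return (chosen - ref_chosen) - (rejected - ref_rejected)


# Recompute the margin for every row and compare it with the reported value.
recomputed = {
    step: log_ratio_margin(chosen, rejected, ref_chosen, ref_rejected)
    for step, _, chosen, rejected, ref_chosen, ref_rejected in ROWS
}

for step, reported, *_ in ROWS:
    print(step, reported, round(recomputed[step], 4))
```

Both the chosen and rejected log-probabilities fall below their reference values during training; the margin grows because the rejected responses lose more log-probability than the chosen ones.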