

ModelHub XC a3cf714e1e Initialize project; model provided by the ModelHub XC community

Model: W-61/llama-3-8b-base-epsilon-dpo-hh-helpful-4xh200-batch-64-20260418-001920
Source: Original Platform

2026-04-25 06:50:01 +08:00


Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH, the PyTorch implementation) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
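The batch sizes in the list follow from the per-device values: 8 per device × 4 devices × 2 accumulation steps = 64 for training, and 8 × 4 = 32 for evaluation (gradient accumulation does not apply at eval time). The sketch below checks that arithmetic and traces a cosine schedule with 10% warmup. It assumes the schedule has the shape of Hugging Face transformers' `get_cosine_schedule_with_warmup` with its default half cosine cycle, and that warmup steps are rounded up from 0.1 × total steps. The total number of optimizer steps is not stated in this card; the value used here (about 681) is an estimate derived from the results table, where step 100 corresponds to epoch 0.1468. The helper `cosine_lr_with_warmup` is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2
learning_rate = 5e-7
warmup_ratio = 0.1

# Effective batch sizes: accumulation multiplies only the training batch.
total_train_batch_size = per_device_train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = per_device_eval_batch_size * num_devices

# ASSUMPTION: total optimizer steps are not given in the card.
# Estimated from the results table (step 100 is logged at epoch 0.1468, num_epochs = 1).
estimated_total_steps = round(100 / 0.1468)
# ASSUMPTION: warmup steps are rounded up from ratio * total steps.
warmup_steps = math.ceil(warmup_ratio * estimated_total_steps)


def cosine_lr_with_warmup(step: int, total_steps: int, warmup: int, base_lr: float) -> float:
    """Hypothetical helper: linear warmup to base_lr, then half-cosine decay to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return max(0.0, base_lr * 0.5 * (1.0 + math.cos(math.pi * progress)))


print(f"train batch = {total_train_batch_size}, eval batch = {total_eval_batch_size}")
print(f"estimated steps = {estimated_total_steps}, warmup steps = {warmup_steps}")
for step in (0, 35, warmup_steps, 340, estimated_total_steps):
    lr = cosine_lr_with_warmup(step, estimated_total_steps, warmup_steps, learning_rate)
    print(f"step {step:>3}: lr = {lr:.3e}")
```

Under these assumptions the learning rate peaks at 5e-07 at the end of warmup and decays to zero at the final step, so the later rows of the results table were logged at a much smaller learning rate than the early ones.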

Training results:

Training Loss	Epoch	Step	Validation Loss	Epsilon Dpo/beta	Epsilon Dpo/loss Margin Mean	Epsilon Dpo/beta Margin Mean	Epsilon Dpo/beta Margin Std	Epsilon Dpo/beta Margin Grad Mean	Epsilon Dpo/beta Margin Grad Std	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/chosen	Logps/rejected	Logps/ref Chosen	Logps/ref Rejected	Logits/chosen	Logits/rejected	Kl/p Epsilon Steps	Kl/n Epsilon Steps
0.9809	0.1468	100	0.5819	0.0612	10.0016	0.6045	1.2197	-0.3930	0.2078	-0.7143	-1.3188	0.6926	0.6045	-90.6611	-108.4097	-79.0510	-86.7979	-0.9442	-0.8901	0.6370	0.3617
0.7763	0.2937	200	0.5340	0.0331	25.3103	0.8303	1.4036	-0.3643	0.2158	-1.1029	-1.9332	0.7149	0.8303	-112.2424	-145.2996	-79.0510	-86.7979	-0.9108	-0.8737	0.6759	0.3232
0.7955	0.4405	300	0.4928	0.0169	54.8226	0.9212	1.3268	-0.3414	0.2134	-1.7179	-2.6391	0.7453	0.9212	-180.2318	-242.8013	-79.0510	-86.7979	-0.4338	-0.2913	0.7282	0.2710
0.6919	0.5874	400	0.5234	0.0084	77.8526	0.6459	0.9831	-0.3725	0.1852	-1.3445	-1.9904	0.7333	0.6459	-239.2477	-324.8472	-79.0510	-86.7979	0.0554	0.2763	0.7183	0.2808
0.9214	0.7342	500	0.5470	0.0042	108.2775	0.4514	0.6818	-0.4010	0.1444	-1.4275	-1.8789	0.7487	0.4514	-417.7782	-533.8026	-79.0510	-86.7979	0.5317	0.8735	0.7235	0.2753
1.0729	0.8811	600	0.5941	0.0022	124.0911	0.2718	0.4719	-0.4365	0.1083	-0.9073	-1.1791	0.7183	0.2718	-487.8939	-619.7319	-79.0510	-86.7979	0.9628	1.3823	0.6956	0.3022
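Several columns in the table are linked by simple identities, which makes them a useful sanity check on the logged numbers. The values are consistent with Rewards/margins = Rewards/chosen - Rewards/rejected, and with Epsilon Dpo/loss Margin Mean being the unscaled policy-versus-reference log-ratio margin, (log pi(chosen) - log pi(rejected)) - (log pi_ref(chosen) - log pi_ref(rejected)). These column interpretations are inferred from the numbers, not documented in the card. The sketch also defines the standard sigmoid DPO loss, -log sigmoid(beta * margin), with a scalar beta for reference. It is not the epsilon-DPO objective used for this run, whose beta changes over training (see the Epsilon Dpo/beta column), so it is not expected to reproduce the Validation Loss column. The names `EvalRow`, `logratio_margin`, and `dpo_loss` are hypothetical.

```python
import math
from typing import NamedTuple


class EvalRow(NamedTuple):
    """Subset of the results-table columns (evaluation-set means)."""
    step: int
    beta: float               # Epsilon Dpo/beta
    loss_margin: float        # Epsilon Dpo/loss Margin Mean
    reward_chosen: float      # Rewards/chosen
    reward_rejected: float    # Rewards/rejected
    reward_margin: float      # Rewards/margins
    logp_chosen: float        # Logps/chosen
    logp_rejected: float      # Logps/rejected
    ref_logp_chosen: float    # Logps/ref Chosen
    ref_logp_rejected: float  # Logps/ref Rejected


# Values copied from the results table.
ROWS = [
    EvalRow(100, 0.0612, 10.0016, -0.7143, -1.3188, 0.6045, -90.6611, -108.4097, -79.0510, -86.7979),
    EvalRow(200, 0.0331, 25.3103, -1.1029, -1.9332, 0.8303, -112.2424, -145.2996, -79.0510, -86.7979),
    EvalRow(300, 0.0169, 54.8226, -1.7179, -2.6391, 0.9212, -180.2318, -242.8013, -79.0510, -86.7979),
    EvalRow(400, 0.0084, 77.8526, -1.3445, -1.9904, 0.6459, -239.2477, -324.8472, -79.0510, -86.7979),
    EvalRow(500, 0.0042, 108.2775, -1.4275, -1.8789, 0.4514, -417.7782, -533.8026, -79.0510, -86.7979),
    EvalRow(600, 0.0022, 124.0911, -0.9073, -1.1791, 0.2718, -487.8939, -619.7319, -79.0510, -86.7979),
]


def logratio_margin(row: EvalRow) -> float:
    """Policy log-ratio margin minus reference log-ratio margin (no beta scaling)."""
    policy = row.logp_chosen - row.logp_rejected
    reference = row.ref_logp_chosen - row.ref_logp_rejected
    return policy - reference


def dpo_loss(margin: float, beta: float) -> float:
    """Standard sigmoid DPO loss, -log(sigmoid(beta * margin)), computed stably."""
    z = beta * margin
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


for row in ROWS:
    margin = logratio_margin(row)
    gap = row.reward_chosen - row.reward_rejected
    print(
        f"step {row.step}: log-ratio margin {margin:.4f} (logged {row.loss_margin:.4f}), "
        f"reward gap {gap:.4f} (logged {row.reward_margin:.4f})"
    )
```

Both identities are linear, so they hold for the table's averages up to rounding in the fourth decimal place. The loss is nonlinear, so a validation loss has to average `dpo_loss` over individual examples rather than apply it to an averaged margin.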