ModelHub XC 527c912d5f Initialize project; model provided by the ModelHub XC community
Model: jackf857/qwen3-8b-base-beta-dpo-hh-harmless-4xh200-batch-64-20260424-025105
Source: Original Platform
2026-05-16 07:03:57 +08:00

library_name: transformers
base_model: jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452
tags: alignment-handbook, beta-dpo, generated_from_trainer
datasets: Anthropic/hh-rlhf
model-index: name: qwen3-8b-base-beta-dpo-hh-harmless-4xh200-batch-64-20260424-025105; results: (none listed)

qwen3-8b-base-beta-dpo-hh-harmless-4xh200-batch-64-20260424-025105

This model is a fine-tuned version of jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452 on the Anthropic/hh-rlhf dataset. It achieves the following results on the evaluation set (these match the final evaluation at step 600):

  • Loss: 0.7256
  • Beta Dpo/gap Mean: 9.9202
  • Beta Dpo/gap Std: 18.3470
  • Beta Dpo/beta Used Raw: 0.1809
  • Beta Dpo/beta Used: 0.1995
  • Beta Dpo/mask Keep Frac: 1.0
  • Logits/chosen: 1.5449
  • Logits/rejected: 1.4137
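
Since the card lists transformers as the library, loading the checkpoint can be sketched as below. This is a minimal sketch, not a documented usage recipe: it assumes the repository ID resolves on the hub you load from, the helper name generate_reply is hypothetical, and the card does not state which prompt format the model expects, so the prompt is passed through unchanged.

```python
MODEL_ID = "jackf857/qwen3-8b-base-beta-dpo-hh-harmless-4xh200-batch-64-20260424-025105"


def generate_reply(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the checkpoint and return the text generated after `prompt`.

    Hypothetical helper for illustration; the prompt format is an assumption
    because the model card does not document one.
    """
    # Imported lazily so that defining this function does not load any weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # torch_dtype="auto" keeps the dtype stored in the checkpoint config.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling generate_reply("...") downloads the full 8B-parameter checkpoint on first use, so it is best run on a machine with enough memory for the weights.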

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
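
The batch-size and schedule settings above can be cross-checked with a short sketch. It assumes the scheduler follows the usual shape of linear warmup followed by cosine decay to zero, with warmup steps taken as ceil(total_steps * warmup_ratio). The total of about 661 optimizer steps is an estimate inferred from the results table (step 100 at epoch 0.1512), not a value stated in this card, and the function names are hypothetical.

```python
import math

# Values copied from the hyperparameter list.
PER_DEVICE_TRAIN_BATCH = 8
PER_DEVICE_EVAL_BATCH = 8
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
PEAK_LR = 5e-07
WARMUP_RATIO = 0.1

# Assumption: estimated from the results table, where step 100 is epoch 0.1512.
TOTAL_STEPS_ESTIMATE = round(100 / 0.1512)


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Number of examples that contribute to one optimizer (or eval) step."""
    return per_device * num_devices * grad_accum


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


train_total = effective_batch_size(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS)
eval_total = effective_batch_size(PER_DEVICE_EVAL_BATCH, NUM_DEVICES)
print("total_train_batch_size:", train_total)  # 8 * 4 * 2 = 64
print("total_eval_batch_size:", eval_total)    # 8 * 4 = 32

for step in (0, 100, 300, 600):
    print(step, cosine_lr(step, TOTAL_STEPS_ESTIMATE))
```

The two products reproduce the reported total_train_batch_size of 64 and total_eval_batch_size of 32; gradient accumulation applies only to training, which is why the evaluation total is half the training total.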

Training results

| Training Loss | Epoch | Step | Validation Loss | Beta Dpo/gap Mean | Beta Dpo/gap Std | Beta Dpo/beta Used Raw | Beta Dpo/beta Used | Beta Dpo/mask Keep Frac | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.3382 | 0.1512 | 100 | 0.6596 | 0.5103 | 1.3374 | 0.1093 | 0.1093 | 1.0 | 1.8052 | 1.6946 |
| 1.0452 | 0.3023 | 200 | 0.6042 | 5.1820 | 10.9635 | 0.1250 | 0.1302 | 1.0 | 1.6393 | 1.5121 |
| 1.1502 | 0.4535 | 300 | 0.6283 | 8.5454 | 15.3857 | 0.1243 | 0.1420 | 1.0 | 1.4622 | 1.3384 |
| 1.3806 | 0.6047 | 400 | 0.6464 | 9.9655 | 16.5703 | 0.1189 | 0.1405 | 1.0 | 1.4474 | 1.3215 |
| 1.3396 | 0.7559 | 500 | 0.6756 | 11.0204 | 19.5206 | 0.1269 | 0.1533 | 1.0 | 1.3984 | 1.2735 |
| 1.0636 | 0.9070 | 600 | 0.7256 | 9.9202 | 18.3470 | 0.1809 | 0.1995 | 1.0 | 1.5449 | 1.4137 |
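
To read the training results above more easily, the sketch below copies a few columns into Python and reports which evaluation step had the lowest validation loss and how far the final checkpoint sits from it. The EvalRow type and variable names are hypothetical; the numbers are taken directly from the table.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    epoch: float
    train_loss: float
    val_loss: float
    gap_mean: float
    beta_used: float


# Selected columns copied from the training results table.
ROWS = [
    EvalRow(100, 0.1512, 1.3382, 0.6596, 0.5103, 0.1093),
    EvalRow(200, 0.3023, 1.0452, 0.6042, 5.1820, 0.1302),
    EvalRow(300, 0.4535, 1.1502, 0.6283, 8.5454, 0.1420),
    EvalRow(400, 0.6047, 1.3806, 0.6464, 9.9655, 0.1405),
    EvalRow(500, 0.7559, 1.3396, 0.6756, 11.0204, 0.1533),
    EvalRow(600, 0.9070, 1.0636, 0.7256, 9.9202, 0.1995),
]

# Evaluation step with the lowest validation loss.
best = min(ROWS, key=lambda row: row.val_loss)
final = ROWS[-1]

# How much higher the final validation loss is than the best one.
drift = final.val_loss - best.val_loss

print(f"lowest validation loss: {best.val_loss:.4f} at step {best.step}")
print(f"final validation loss:  {final.val_loss:.4f} at step {final.step}")
print(f"difference:             {drift:.4f}")
```

On these numbers the lowest validation loss is 0.6042 at step 200, while the reported final value of 0.7256 comes from step 600, even as the mean gap grows over most of the run.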

Framework versions

  • Transformers 4.51.0
  • Pytorch 2.3.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.21.4