ModelHub XC 9268b0b929 Initialize project; model provided by the ModelHub XC community
Model: jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6
Source: Original Platform
2026-05-13 01:12:57 +08:00

library_name: transformers
license: apache-2.0
base_model: jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452
tags: alignment-handbook, new-dpo, generated_from_trainer
datasets: Anthropic/hh-rlhf
model-index:
  name: qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6
  results: (none listed)

qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6

This model is a fine-tuned version of jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452 on the Anthropic/hh-rlhf dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5368
  • Fcm Dpo/beta: 0.0530
  • Margin Dpo/margin Mean: 11.7010
  • Margin Dpo/margin Std: 18.7863
  • Logps/chosen: -92.3002
  • Logps/rejected: -113.7958
  • Logps/ref Chosen: -86.9018
  • Logps/ref Rejected: -96.6964
  • Logits/chosen: 1.6311
  • Logits/rejected: 1.4933
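
The card does not document the "new-dpo" objective, so the following is only a sketch that assumes the reported margin is the standard, unscaled DPO log-ratio difference: (policy minus reference log-probability of the chosen response) minus the same quantity for the rejected response. Under that assumption, the numbers listed above reproduce the reported margin mean. The variable names are illustrative and not taken from the training code.

```python
# Evaluation-set means copied from the list above (final checkpoint).
logps_chosen = -92.3002
logps_rejected = -113.7958
ref_logps_chosen = -86.9018
ref_logps_rejected = -96.6964

# Log-ratio of the trained policy to the reference model for each response.
chosen_logratio = logps_chosen - ref_logps_chosen
rejected_logratio = logps_rejected - ref_logps_rejected

# Assumed definition: unscaled DPO-style margin between chosen and rejected.
margin = chosen_logratio - rejected_logratio

print(f"chosen log-ratio:   {chosen_logratio:.4f}")    # -5.3984
print(f"rejected log-ratio: {rejected_logratio:.4f}")  # -17.0994
print(f"margin:             {margin:.4f}")             # 11.7010
```

Both log-ratios are negative, so relative to the reference model the policy assigns lower log-probability to both responses; the margin is positive because the rejected responses dropped by considerably more than the chosen ones.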

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
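
As a rough illustration of how these settings fit together, the sketch below recomputes the total batch sizes from the per-device values and outlines a cosine schedule with linear warmup. The schedule function is a plain-Python approximation of the usual warmup-then-cosine shape, not the trainer's own code, and `TOTAL_STEPS` is a hypothetical value chosen only for the demonstration, since the card does not state the total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2
peak_learning_rate = 5e-07
warmup_ratio = 0.1

# Effective batch sizes: gradient accumulation applies to training only.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices


def cosine_lr_with_warmup(step, total_steps, peak_lr=peak_learning_rate,
                          ratio=warmup_ratio):
    """Linear warmup to peak_lr, then cosine decay towards zero (a sketch)."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; not stated in the card

print(total_train_batch_size, total_eval_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS))
```

With a warmup ratio of 0.1, the learning rate reaches its peak of 5e-07 after the first tenth of training and decays for the remaining nine tenths.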

Training results

| Training Loss | Epoch | Step | Validation Loss | Fcm Dpo/beta | Margin Dpo/margin Mean | Margin Dpo/margin Std | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.3231 | 0.1512 | 100 | 0.6549 | 0.1000 | 0.8834 | 1.9512 | -86.2078 | -96.8858 | -86.9018 | -96.6964 | 1.6085 | 1.4983 |
| 1.1388 | 0.3023 | 200 | 0.5426 | 0.2115 | 2.9544 | 4.9177 | -81.7111 | -94.4601 | -86.9018 | -96.6964 | 1.6653 | 1.5454 |
| 1.1386 | 0.4535 | 300 | 0.5411 | 0.1198 | 5.2175 | 8.5703 | -85.9098 | -100.9220 | -86.9018 | -96.6964 | 1.5470 | 1.4237 |
| 1.209 | 0.6047 | 400 | 0.5380 | 0.0890 | 6.8529 | 11.1007 | -85.3711 | -102.0186 | -86.9018 | -96.6964 | 1.6431 | 1.5118 |
| 1.0608 | 0.7559 | 500 | 0.5388 | 0.0570 | 10.7654 | 17.4069 | -90.8454 | -111.4054 | -86.9018 | -96.6964 | 1.2932 | 1.1717 |
| 1.1399 | 0.9070 | 600 | 0.5368 | 0.0530 | 11.7010 | 18.7863 | -92.3002 | -113.7958 | -86.9018 | -96.6964 | 1.6311 | 1.4933 |
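
The Epoch and Step columns allow a back-of-the-envelope estimate of the run length. The sketch below divides each logged step by its epoch fraction to estimate steps per epoch, then multiplies by the total train batch size of 64 to approximate the number of training examples. The result is only an estimate: the epoch values are rounded to four decimals, and a partial final batch would shift the count slightly.

```python
# (epoch, step) pairs copied from the training results table.
checkpoints = [
    (0.1512, 100),
    (0.3023, 200),
    (0.4535, 300),
    (0.6047, 400),
    (0.7559, 500),
    (0.9070, 600),
]

TOTAL_TRAIN_BATCH_SIZE = 64  # from the hyperparameter list

# Each row gives an independent estimate of optimizer steps per epoch.
estimates = [step / epoch for epoch, step in checkpoints]
steps_per_epoch = sum(estimates) / len(estimates)

# Approximate number of training examples consumed in one epoch.
approx_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print(f"estimated steps per epoch: {steps_per_epoch:.1f}")
print(f"approximate training examples: {approx_examples:.0f}")
```

The estimate comes out at roughly 661 steps per epoch, so the last logged evaluation at step 600 covers about 91% of the single training epoch, and its row matches the evaluation results reported at the top of the card.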

Framework versions

  • Transformers 4.51.0
  • Pytorch 2.3.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.21.4
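
To check a local environment against the versions listed above, one option is a small comparison using the standard library's `importlib.metadata`. This is a minimal sketch: the helper name `compare_versions` is hypothetical, and it assumes the PyTorch distribution is installed under the name `torch`.

```python
from importlib import metadata

# Versions listed in the card; "torch" is the distribution name for PyTorch.
CARD_VERSIONS = {
    "transformers": "4.51.0",
    "torch": "2.3.1+cu121",
    "datasets": "2.21.0",
    "tokenizers": "0.21.4",
}


def compare_versions(expected=CARD_VERSIONS):
    """Return {package: (wanted, installed_or_None, matches)} for each entry."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, matches) in compare_versions().items():
    status = "ok" if matches else "differs"
    print(f"{package}: card={wanted} installed={installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions only record the environment used for training.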