---
library_name: transformers
base_model: mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332
tags:
- alignment-handbook
- beta-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260418-015332
  results: []
---

# mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260418-015332

This model is a fine-tuned version of [mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332](https://huggingface.co/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332) on the Anthropic/hh-rlhf dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6015
- Beta Dpo/beta: 0.0010
- Beta Dpo/loss Margin Mean: 243.4043
- Beta Dpo/beta Margin Mean: 0.2434
- Beta Dpo/beta Margin Std: 0.4217
- Beta Dpo/beta Margin Grad Mean: -0.4422
- Beta Dpo/beta Margin Grad Std: 0.0983
- Beta Dpo/gap Mean: 404.4037
- Beta Dpo/gap Std: 357.4069
- Beta Dpo/beta Used Raw: -9.5600
- Beta Dpo/beta Used: 0.0010
- Beta Dpo/mask Keep Frac: 1.0
- Logits/chosen: -2.7813
- Logits/rejected: -2.8108

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Beta Dpo/beta | Beta Dpo/loss Margin Mean | Beta Dpo/beta Margin Mean | Beta Dpo/beta Margin Std | Beta Dpo/beta Margin Grad Mean | Beta Dpo/beta Margin Grad Std | Beta Dpo/gap Mean | Beta Dpo/gap Std | Beta Dpo/beta Used Raw | Beta Dpo/beta Used | Beta Dpo/mask Keep Frac | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:-------------:|:-------------------------:|:-------------------------:|:------------------------:|:------------------------------:|:-----------------------------:|:-----------------:|:----------------:|:----------------------:|:------------------:|:-----------------------:|:-------------:|:---------------:|
| 1.3346 | 0.1468 | 100 | 0.7825 | 0.0211 | 38.6966 | 1.4685 | 2.0475 | -0.4727 | 0.0403 | 60.6513 | 63.8526 | -1.2173 | 0.0211 | 1.0 | -2.9129 | -2.9033 |
| 1.265 | 0.2937 | 200 | 1.2116 | 0.0416 | 108.9061 | 8.0746 | 10.4591 | -0.4594 | 0.0608 | 175.9197 | 183.7102 | -3.9208 | 0.0416 | 1.0 | -2.3116 | -2.3059 |
| 0.5857 | 0.4405 | 300 | 0.6708 | 0.0032 | 165.3890 | 0.8039 | 1.0106 | -0.4553 | 0.0715 | 284.4015 | 265.4041 | -7.0408 | 0.0032 | 1.0 | -2.3951 | -2.3756 |
| 3.7878 | 0.5874 | 400 | 0.6122 | 0.0010 | 205.4126 | 0.2054 | 0.3571 | -0.4506 | 0.0845 | 362.1024 | 333.2912 | -9.3014 | 0.0010 | 1.0 | -2.4431 | -2.4332 |
| 6.7444 | 0.7342 | 500 | 0.6026 | 0.0010 | 233.9227 | 0.2339 | 0.3910 | -0.4441 | 0.0919 | 390.5113 | 345.8571 | -9.2953 | 0.0010 | 1.0 | -2.6421 | -2.6564 |
| 0.5388 | 0.8811 | 600 | 0.6015 | 0.0010 | 243.4043 | 0.2434 | 0.4217 | -0.4422 | 0.0983 | 404.4037 | 357.4069 | -9.5600 | 0.0010 | 1.0 | -2.7813 | -2.8108 |

### Framework versions

- Transformers 4.51.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4
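For readers unfamiliar with the `Beta Dpo/*` metrics above: they track a DPO objective whose beta is adapted during training rather than held fixed. Below is a minimal sketch of the standard DPO loss with beta exposed as a parameter. The adaptive rule that produced the `Beta Used` values in the tables is not documented in this card, so the snippet is illustrative only, not the training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta=0.001):
    """Standard DPO loss; `beta` scales the implicit reward margin.

    In beta-DPO-style training, `beta` is chosen per batch from
    reward-margin statistics rather than fixed (assumption: the exact
    adaptation rule used for this model is not documented here).
    """
    # Log-ratio of chosen vs. rejected under policy and reference.
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    margin = pi_logratios - ref_logratios

    # -log(sigmoid(beta * margin)) == softplus(-beta * margin)
    return F.softplus(-beta * margin).mean()
```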
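A minimal inference sketch using the `transformers` library. The repo id below is taken from this card's name and omits the Hub namespace, so substitute the actual path the model is published under. The `Human:`/`Assistant:` prompt format follows the Anthropic/hh-rlhf data, but confirm it against the formatting actually used during SFT.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id: replace with the actual Hub namespace.
model_id = "mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260418-015332"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# hh-rlhf-style dialogue prompt (assumption: matches the SFT format).
prompt = "\n\nHuman: How do I bake bread at home?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=256, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```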