Initialize project; model provided by the ModelHub XC community
Model: W-61/mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64 Source: Original Platform
---
library_name: transformers
base_model: mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332
tags:
- alignment-handbook
- beta-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260418-015332
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260418-015332

This model is a fine-tuned version of [mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332](https://huggingface.co/mistral-7b-base-sft-hh-helpful-4xh200-batch-64-20260418-015332) on the Anthropic/hh-rlhf dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6015
- Beta Dpo/beta: 0.0010
- Beta Dpo/loss Margin Mean: 243.4043
- Beta Dpo/beta Margin Mean: 0.2434
- Beta Dpo/beta Margin Std: 0.4217
- Beta Dpo/beta Margin Grad Mean: -0.4422
- Beta Dpo/beta Margin Grad Std: 0.0983
- Beta Dpo/gap Mean: 404.4037
- Beta Dpo/gap Std: 357.4069
- Beta Dpo/beta Used Raw: -9.5600
- Beta Dpo/beta Used: 0.0010
- Beta Dpo/mask Keep Frac: 1.0
- Logits/chosen: -2.7813
- Logits/rejected: -2.8108
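
The `Beta Dpo/*` metrics come from a beta-DPO variant of Direct Preference Optimization that adapts the temperature `beta` per batch rather than fixing it; the reported numbers suggest `Beta Dpo/beta Margin Mean` is simply `beta` times `Beta Dpo/loss Margin Mean` (0.0010 × 243.4043 ≈ 0.2434). For orientation, below is a minimal sketch of the standard DPO objective these metrics are built on. It is not the trainer's actual code: the rule mapping `Beta Dpo/beta Used Raw` (-9.5600) to the effective `Beta Dpo/beta Used` (0.0010) is not documented in this card, so a fixed `beta` stands in.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.0010) -> torch.Tensor:
    """Standard DPO loss; beta-DPO would choose `beta` per batch."""
    # Implicit reward margin of chosen over rejected completions,
    # measured relative to the frozen reference (SFT) model.
    margin = (policy_chosen_logps - policy_rejected_logps) - \
             (ref_chosen_logps - ref_rejected_logps)
    # -log sigmoid(beta * margin), averaged over the batch.
    return -F.logsigmoid(beta * margin).mean()
```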

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
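
Under the assumption that the run uses a TRL-style `DPOConfig` (the `alignment-handbook` tag suggests that stack, though the beta-DPO trainer itself would be custom), the hyperparameters above translate roughly as follows. `output_dir` and `bf16` are illustrative assumptions, not values taken from this card.

```python
from trl import DPOConfig  # assumes TRL's DPO stack; the beta-DPO variant is not stock TRL

config = DPOConfig(
    output_dir="mistral-7b-base-beta-dpo-hh-helpful",  # hypothetical
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # 8 x 4 GPUs x accumulation 2 = 64 total
    per_device_eval_batch_size=8,    # 8 x 4 GPUs = 32 total
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",             # AdamW defaults: betas=(0.9, 0.999), eps=1e-08
    seed=42,
    bf16=True,                       # assumption; common on H200s, not stated above
)
```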

### Training results

| Training Loss | Epoch | Step | Validation Loss | Beta Dpo/beta | Beta Dpo/loss Margin Mean | Beta Dpo/beta Margin Mean | Beta Dpo/beta Margin Std | Beta Dpo/beta Margin Grad Mean | Beta Dpo/beta Margin Grad Std | Beta Dpo/gap Mean | Beta Dpo/gap Std | Beta Dpo/beta Used Raw | Beta Dpo/beta Used | Beta Dpo/mask Keep Frac | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:-------------:|:-------------------------:|:-------------------------:|:------------------------:|:------------------------------:|:-----------------------------:|:-----------------:|:----------------:|:----------------------:|:------------------:|:-----------------------:|:-------------:|:---------------:|
| 1.3346 | 0.1468 | 100 | 0.7825 | 0.0211 | 38.6966 | 1.4685 | 2.0475 | -0.4727 | 0.0403 | 60.6513 | 63.8526 | -1.2173 | 0.0211 | 1.0 | -2.9129 | -2.9033 |
| 1.265 | 0.2937 | 200 | 1.2116 | 0.0416 | 108.9061 | 8.0746 | 10.4591 | -0.4594 | 0.0608 | 175.9197 | 183.7102 | -3.9208 | 0.0416 | 1.0 | -2.3116 | -2.3059 |
| 0.5857 | 0.4405 | 300 | 0.6708 | 0.0032 | 165.3890 | 0.8039 | 1.0106 | -0.4553 | 0.0715 | 284.4015 | 265.4041 | -7.0408 | 0.0032 | 1.0 | -2.3951 | -2.3756 |
| 3.7878 | 0.5874 | 400 | 0.6122 | 0.0010 | 205.4126 | 0.2054 | 0.3571 | -0.4506 | 0.0845 | 362.1024 | 333.2912 | -9.3014 | 0.0010 | 1.0 | -2.4431 | -2.4332 |
| 6.7444 | 0.7342 | 500 | 0.6026 | 0.0010 | 233.9227 | 0.2339 | 0.3910 | -0.4441 | 0.0919 | 390.5113 | 345.8571 | -9.2953 | 0.0010 | 1.0 | -2.6421 | -2.6564 |
| 0.5388 | 0.8811 | 600 | 0.6015 | 0.0010 | 243.4043 | 0.2434 | 0.4217 | -0.4422 | 0.0983 | 404.4037 | 357.4069 | -9.5600 | 0.0010 | 1.0 | -2.7813 | -2.8108 |

### Framework versions

- Transformers 4.51.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4
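
For completeness, a minimal inference sketch against the framework versions above. Both the model path (the bare name from this card, not a full hub path) and the `Human:`/`Assistant:` prompt format (mirroring Anthropic/hh-rlhf dialogues) are assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: substitute the full hub path where this model is hosted.
model_id = "mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260418-015332"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# hh-rlhf-style dialogue prompt (assumed; the card does not specify a chat template).
prompt = "\n\nHuman: What is a good way to learn a new language?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```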