Initialize the project; model provided by the ModelHub XC community
Model: W-61/llama-3-8b-base-beta-dpo-ultrafeedback-8xh200 Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
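The rules above route large binary artifacts (checkpoints, archives, `tokenizer.json`) through Git LFS instead of plain Git. As a rough sketch of how a path matches these rules (Python's `fnmatch` only approximates gitattributes matching and ignores directory patterns like `saved_model/**/*`; `is_lfs_tracked` and `LFS_PATTERNS` are illustrative names, not a git API):

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A subset of the basename-style LFS patterns declared in .gitattributes above.
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.pt", "tokenizer.json", "*tfevents*"]

def is_lfs_tracked(path: str) -> bool:
    """Return True if the file's basename matches any of the simple LFS patterns."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pat) for pat in LFS_PATTERNS)

print(is_lfs_tracked("model-00001-of-00007.safetensors"))  # True
print(is_lfs_tracked("config.json"))                       # False
```

Note that `config.json` is deliberately not tracked: small JSON files stay in plain Git, while `tokenizer.json` is listed explicitly because of its size.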
75
README.md
Normal file
@@ -0,0 +1,75 @@
---
library_name: transformers
base_model: W-61/llama-3-8b-base-sft-ultrachat-8xh200
tags:
- alignment-handbook
- beta-dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: llama-3-8b-base-beta-dpo-ultrafeedback-8xh200-20260410-201956
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# llama-3-8b-base-beta-dpo-ultrafeedback-8xh200-20260410-201956

This model is a fine-tuned version of [W-61/llama-3-8b-base-sft-ultrachat-8xh200](https://huggingface.co/W-61/llama-3-8b-base-sft-ultrachat-8xh200) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7668
- Beta Dpo/gap Mean: 15.9231
- Beta Dpo/gap Std: 25.9660
- Beta Dpo/beta Used Raw: 0.0986
- Beta Dpo/beta Used: 0.1434
- Beta Dpo/mask Keep Frac: 1.0
- Logits/chosen: -0.8035
- Logits/rejected: -0.7974
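The `beta-dpo` tag and the gap/beta metrics above suggest a DPO objective whose beta is adapted from statistics of the implicit reward gap (note `beta Used` 0.1434 vs. `beta Used Raw` 0.0986). For reference, the fixed-beta DPO loss that such a scheme would modulate can be sketched as follows (a sketch of standard DPO only, not this repository's exact beta-dpo implementation; `dpo_loss` and its argument names are ours):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta):
    """Standard DPO loss for one preference pair.

    Each argument is a summed log-probability of the response under the
    policy (pi_*) or the frozen SFT reference (ref_*); beta scales the
    implicit reward margin.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log sigmoid

# A pair the policy already orders correctly yields a loss below log(2):
print(dpo_loss(-10.0, -14.0, -11.0, -12.0, beta=0.1434))
```

At zero margin the loss is exactly `log(2)`; it decreases as the policy widens the chosen/rejected margin relative to the reference.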

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
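The two totals follow from the per-device settings. A quick check, assuming the standard Hugging Face accounting (per-device batch x devices x gradient accumulation for training; no accumulation at eval):

```python
train_batch_size = 8              # per device
num_devices = 8
gradient_accumulation_steps = 2
eval_batch_size = 8               # per device

total_train = train_batch_size * num_devices * gradient_accumulation_steps
total_eval = eval_batch_size * num_devices

print(total_train, total_eval)    # 128 64
```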

### Training results

| Training Loss | Epoch | Step | Validation Loss | Beta Dpo/gap Mean | Beta Dpo/gap Std | Beta Dpo/beta Used Raw | Beta Dpo/beta Used | Beta Dpo/mask Keep Frac | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|:----------------:|:----------------------:|:------------------:|:-----------------------:|:-------------:|:---------------:|
| 1.1971 | 0.4188 | 200 | 0.6549 | 11.0198 | 18.6390 | 0.0997 | 0.1243 | 1.0 | -0.7570 | -0.7553 |
| 1.2165 | 0.8377 | 400 | 0.7668 | 15.9231 | 25.9660 | 0.0986 | 0.1434 | 1.0 | -0.8035 | -0.7974 |
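The Epoch column is consistent with the 61,135 training samples reported in all_results.json, assuming the dataloader drops the last partial per-step batch of 64 sequences before gradient accumulation (a plausible reconstruction, not documented in the card; `steps_per_epoch` is a name introduced here):

```python
train_samples = 61135              # from all_results.json
per_step_batch = 8 * 8             # per-device batch x num_devices
grad_accum = 2

# Dataloader batches per epoch (drop_last), then optimizer steps per epoch.
steps_per_epoch = (train_samples // per_step_batch) / grad_accum   # 477.5

print(round(200 / steps_per_epoch, 4))   # 0.4188, matching the table
print(round(400 / steps_per_epoch, 4))   # 0.8377, matching the table
```

The same accounting reproduces the final epoch value 0.99895 in all_results.json (477 / 477.5).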

### Framework versions

- Transformers 4.51.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4
21
all_results.json
Normal file
@@ -0,0 +1,21 @@
{
    "epoch": 0.9989528795811519,
    "eval_beta_dpo/beta_used": 0.12717531621456146,
    "eval_beta_dpo/beta_used_raw": 0.07513566315174103,
    "eval_beta_dpo/gap_mean": 16.700754165649414,
    "eval_beta_dpo/gap_std": 26.765077590942383,
    "eval_beta_dpo/mask_keep_frac": 1.0,
    "eval_logits/chosen": -0.787127673625946,
    "eval_logits/rejected": -0.7806017398834229,
    "eval_loss": 0.7446804642677307,
    "eval_runtime": 50.7623,
    "eval_samples": 2000,
    "eval_samples_per_second": 39.399,
    "eval_steps_per_second": 0.63,
    "total_flos": 0.0,
    "train_loss": 1.1642480231431045,
    "train_runtime": 4421.8255,
    "train_samples": 61135,
    "train_samples_per_second": 13.826,
    "train_steps_per_second": 0.108
}
29
config.json
Normal file
@@ -0,0 +1,29 @@
{
    "architectures": [
        "LlamaForCausalLM"
    ],
    "attention_bias": false,
    "attention_dropout": 0.0,
    "bos_token_id": 128000,
    "eos_token_id": 128001,
    "head_dim": 128,
    "hidden_act": "silu",
    "hidden_size": 4096,
    "initializer_range": 0.02,
    "intermediate_size": 14336,
    "max_position_embeddings": 8192,
    "mlp_bias": false,
    "model_type": "llama",
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
    "pretraining_tp": 1,
    "rms_norm_eps": 1e-05,
    "rope_scaling": null,
    "rope_theta": 500000.0,
    "tie_word_embeddings": false,
    "torch_dtype": "float32",
    "transformers_version": "4.51.0",
    "use_cache": true,
    "vocab_size": 128256
}
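Assuming standard Llama weight shapes (a sketch; the variable names are ours), this config pins down the parameter count, and in float32 (the declared torch_dtype, 4 bytes per parameter) it reproduces the 32,121,044,992-byte total_size recorded in model.safetensors.index.json:

```python
# Shapes implied by config.json for the Llama architecture.
vocab, hidden, inter, layers = 128256, 4096, 14336, 32
kv_dim = 8 * 128                     # num_key_value_heads * head_dim (GQA)

embed = vocab * hidden               # input embeddings; untied lm_head repeats this
attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # q,o plus smaller k,v projections
mlp = 3 * hidden * inter             # gate_proj, up_proj, down_proj
norms = 2 * hidden                   # input + post-attention RMSNorm

params = 2 * embed + layers * (attn + mlp + norms) + hidden  # + final norm
print(params)                        # 8030261248, i.e. ~8.03B parameters
print(params * 4)                    # 32121044992 bytes in float32
```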
15
eval_results.json
Normal file
@@ -0,0 +1,15 @@
{
    "epoch": 0.9989528795811519,
    "eval_beta_dpo/beta_used": 0.12717531621456146,
    "eval_beta_dpo/beta_used_raw": 0.07513566315174103,
    "eval_beta_dpo/gap_mean": 16.700754165649414,
    "eval_beta_dpo/gap_std": 26.765077590942383,
    "eval_beta_dpo/mask_keep_frac": 1.0,
    "eval_logits/chosen": -0.787127673625946,
    "eval_logits/rejected": -0.7806017398834229,
    "eval_loss": 0.7446804642677307,
    "eval_runtime": 50.7623,
    "eval_samples": 2000,
    "eval_samples_per_second": 39.399,
    "eval_steps_per_second": 0.63
}
9
generation_config.json
Normal file
@@ -0,0 +1,9 @@
{
    "bos_token_id": 128000,
    "do_sample": true,
    "eos_token_id": 128001,
    "max_length": 4096,
    "temperature": 0.6,
    "top_p": 0.9,
    "transformers_version": "4.51.0"
}
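These defaults enable sampling with temperature 0.6 and nucleus (top-p) filtering at 0.9. A minimal pure-Python sketch of that decoding step, simplified relative to the transformers implementation (`sample_top_p` is an illustrative name, not a transformers API):

```python
import math
import random

def sample_top_p(logits, temperature=0.6, top_p=0.9, rng=random):
    """Temperature-scaled softmax, then nucleus (top-p) sampling.

    Keeps the smallest prefix of probability-sorted tokens whose total
    mass reaches top_p, then samples from that renormalized set.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                            # subtract max for stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    kept, mass = [], 0.0
    for p, i in probs:                         # accumulate until mass >= top_p
        kept.append((p, i))
        mass += p
        if mass >= top_p:
            break

    r = rng.random() * mass                    # sample within the kept mass
    for p, i in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][1]

# A strongly peaked distribution always keeps (and returns) the top token:
print(sample_top_p([10.0, 0.0, 0.0, 0.0], temperature=1.0))  # 0
```

Lower temperature sharpens the distribution before the cutoff, so 0.6 with top_p 0.9 trims the long tail fairly aggressively.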
3
model-00001-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e1f5405a2766a199009272559bbb01f5f9e0d3fb9940f7db55ea8d0f2319c598
size 4886466168
3
model-00002-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ce3c1dab648c3082b6b9c241a529cae74a0205922ddeb57d90c22ef0243fac48
size 4832007448
3
model-00003-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e8b5b731320e59310a04ec15014507960e2410a789e92e318b3ec81626291c14
size 4999813112
3
model-00004-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e3b2a8bc5c0ba0769ff15e9986a7487d193614bb43ee1d864ab5235db15fe03e
size 4999813128
3
model-00005-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3db7819b606dff84d64eb9c6d1fa0c998ac1e770998e316f19baaeb994c939cf
size 4832007496
3
model-00006-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:679d6e7fe50557638a1e8a8bd3a47eabd29457715e4ef25ba0c5290d6fb51e04
size 4999813120
3
model-00007-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:da55dc822e3555a1fbe768b07e77c091d3290c23f216152d23eea5ce38e85d1b
size 2571158184
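Each *.safetensors file above is an LFS pointer to one shard of the checkpoint. The weight_map in model.safetensors.index.json tells loaders which shard holds each tensor; a minimal sketch of that lookup over an inline excerpt (`shard_for` is an illustrative helper; real loading goes through `transformers` `from_pretrained`, which resolves the index automatically):

```python
import json

# Inline excerpt of model.safetensors.index.json's weight_map.
index_json = """
{
  "weight_map": {
    "lm_head.weight": "model-00007-of-00007.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors"
  }
}
"""

def shard_for(tensor_name: str, index: dict) -> str:
    """Resolve which shard file stores a given tensor."""
    return index["weight_map"][tensor_name]

index = json.loads(index_json)
print(shard_for("lm_head.weight", index))  # model-00007-of-00007.safetensors
```

Loading a single tensor without materializing the full model can then be done by opening only the resolved shard file (e.g. with the safetensors library).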
298
model.safetensors.index.json
Normal file
@@ -0,0 +1,298 @@
{
    "metadata": {
        "total_size": 32121044992
    },
    "weight_map": {
        "lm_head.weight": "model-00007-of-00007.safetensors",
        "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
        "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
        "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
        "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
        "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
        "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
        "model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
        "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
        "model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
        "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
        "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
        "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
        "model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
        "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
        "model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
        "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
        "model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
        "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
        "model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
        "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
        "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
        "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
        "model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
        "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
        "model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
        "model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
        "model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
        "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
        "model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
        "model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
        "model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
        "model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
        "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
        "model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
        "model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
        "model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
        "model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
        "model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
        "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
        "model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
        "model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
        "model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
        "model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
        "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
        "model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
        "model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
        "model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.28.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
        "model.layers.28.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.28.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.28.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.28.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
        "model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
        "model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
        "model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
        "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
        "model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
        "model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
        "model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.31.input_layernorm.weight": "model-00007-of-00007.safetensors",
        "model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
        "model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.31.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
        "model.layers.31.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
        "model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
        "model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
        "model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
        "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
        "model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
        "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
        "model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
        "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
        "model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
        "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
        "model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
        "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors"
|
||||
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
|
||||
"model.norm.weight": "model-00007-of-00007.safetensors"
|
||||
}
|
||||
}
|
||||
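The `weight_map` in `model.safetensors.index.json` maps every tensor name to the shard file that stores it. A minimal sketch of grouping tensors by shard, using a small hypothetical excerpt of the map rather than the full index:

```python
# Minimal sketch: invert a safetensors index's "weight_map" into
# {shard_filename: [tensor names in that shard]}.
# `index` is a hypothetical three-entry excerpt of the file above.
index = {
    "weight_map": {
        "model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
        "model.norm.weight": "model-00007-of-00007.safetensors",
    }
}

def tensors_per_shard(index):
    """Group tensor names by the shard file that contains them."""
    shards = {}
    for name, shard in index["weight_map"].items():
        shards.setdefault(shard, []).append(name)
    return shards

print(tensors_per_shard(index))
```

Loading only the shards a caller actually needs is the point of this layout: the 8B model is split into seven files, and the index tells a loader which file holds which tensor.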
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
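Note that `special_tokens_map.json` reuses `<|end_of_text|>` as both the EOS token and the padding token, a common choice when the base model ships no dedicated pad token. A quick check over an inline copy of the relevant fields:

```python
import json

# special_tokens_map.json (fields trimmed to "content" for brevity)
# reuses the EOS token as the pad token.
special_tokens = json.loads("""
{
  "eos_token": {"content": "<|end_of_text|>"},
  "pad_token": {"content": "<|end_of_text|>"}
}
""")

assert special_tokens["pad_token"]["content"] == special_tokens["eos_token"]["content"]
print("pad token:", special_tokens["pad_token"]["content"])
```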
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3c5cf44023714fb39b05e71e425f8d7b92805ff73f7988b083b8c87f0bf87393
size 17209961
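`tokenizer.json` is checked in as a Git LFS pointer: the three lines above record the spec version, the SHA-256 object ID, and the real file size (about 17 MB), while the actual bytes live in LFS storage. A minimal sketch of parsing such a pointer:

```python
# Git LFS pointer files are plain "key value" lines; parse one into a dict.
# `pointer` is copied from the tokenizer.json pointer above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:3c5cf44023714fb39b05e71e425f8d7b92805ff73f7988b083b8c87f0bf87393
size 17209961"""

def parse_lfs_pointer(text):
    """Split each 'key value' line of an LFS pointer file on the first space."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

info = parse_lfs_pointer(pointer)
print(int(info["size"]))  # size of the real tokenizer.json in bytes: 17209961
```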
2064
tokenizer_config.json
Normal file
File diff suppressed because it is too large
9
train_results.json
Normal file
@@ -0,0 +1,9 @@
{
  "epoch": 0.9989528795811519,
  "total_flos": 0.0,
  "train_loss": 1.1642480231431045,
  "train_runtime": 4421.8255,
  "train_samples": 61135,
  "train_samples_per_second": 13.826,
  "train_steps_per_second": 0.108
}
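The throughput figures in `train_results.json` are internally consistent: samples per second is just `train_samples / train_runtime`, and steps per second is the 477 optimizer steps (the `global_step` in the trainer state) over the same runtime. Checking the arithmetic with the values copied from the file:

```python
# Consistency check on train_results.json (values copied from the file above;
# the 477 step count comes from trainer_state.json's global_step).
train_samples = 61135
train_runtime = 4421.8255  # seconds
steps = 477

assert round(train_samples / train_runtime, 3) == 13.826  # train_samples_per_second
assert round(steps / train_runtime, 3) == 0.108           # train_steps_per_second
print("throughput figures check out")
```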
745
trainer_state.json
Normal file
@@ -0,0 +1,745 @@
{
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 0.9989528795811519,
  "eval_steps": 200,
  "global_step": 477,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "beta_dpo/beta_used": 0.10024853050708771,
      "beta_dpo/beta_used_raw": 0.10024853050708771,
      "beta_dpo/gap_mean": -0.0031278375536203384,
      "beta_dpo/gap_std": 0.09185527265071869,
      "beta_dpo/mask_keep_frac": 0.75,
      "epoch": 0.0020942408376963353,
      "grad_norm": 80.06067657470703,
      "learning_rate": 0.0,
      "logits/chosen": -0.6103914976119995,
      "logits/rejected": -0.6099507808685303,
      "loss": 1.3869,
      "step": 1
    },
    {
      "beta_dpo/beta_used": 0.10045824944972992,
      "beta_dpo/beta_used_raw": 0.10045824944972992,
      "beta_dpo/gap_mean": 0.0029368107207119465,
      "beta_dpo/gap_std": 0.47314706444740295,
      "beta_dpo/mask_keep_frac": 0.7916666865348816,
      "epoch": 0.020942408376963352,
      "grad_norm": 72.42662811279297,
      "learning_rate": 9.375e-08,
      "logits/chosen": -0.6866854429244995,
      "logits/rejected": -0.668829083442688,
      "loss": 1.386,
      "step": 10
    },
    {
      "beta_dpo/beta_used": 0.10218687355518341,
      "beta_dpo/beta_used_raw": 0.10218687355518341,
      "beta_dpo/gap_mean": 0.05031166225671768,
      "beta_dpo/gap_std": 0.731455385684967,
      "beta_dpo/mask_keep_frac": 0.7749999761581421,
      "epoch": 0.041884816753926704,
      "grad_norm": 77.65188598632812,
      "learning_rate": 1.9791666666666664e-07,
      "logits/chosen": -0.6419292688369751,
      "logits/rejected": -0.6541769504547119,
      "loss": 1.3785,
      "step": 20
    },
    {
      "beta_dpo/beta_used": 0.10061170160770416,
      "beta_dpo/beta_used_raw": 0.10061170160770416,
      "beta_dpo/gap_mean": 0.0937122255563736,
      "beta_dpo/gap_std": 0.7656054496765137,
      "beta_dpo/mask_keep_frac": 0.8062499761581421,
      "epoch": 0.06282722513089005,
      "grad_norm": 74.03604125976562,
      "learning_rate": 3.020833333333333e-07,
      "logits/chosen": -0.6690393686294556,
      "logits/rejected": -0.6756961941719055,
      "loss": 1.3767,
      "step": 30
    },
    {
      "beta_dpo/beta_used": 0.10513947159051895,
      "beta_dpo/beta_used_raw": 0.10513947159051895,
      "beta_dpo/gap_mean": 0.3032568395137787,
      "beta_dpo/gap_std": 0.9986203908920288,
      "beta_dpo/mask_keep_frac": 0.856249988079071,
      "epoch": 0.08376963350785341,
      "grad_norm": 68.4834976196289,
      "learning_rate": 4.0625e-07,
      "logits/chosen": -0.6429699659347534,
      "logits/rejected": -0.6495934724807739,
      "loss": 1.3467,
      "step": 40
    },
    {
      "beta_dpo/beta_used": 0.104192815721035,
      "beta_dpo/beta_used_raw": 0.104192815721035,
      "beta_dpo/gap_mean": 0.7923426032066345,
      "beta_dpo/gap_std": 1.8291547298431396,
      "beta_dpo/mask_keep_frac": 0.762499988079071,
      "epoch": 0.10471204188481675,
      "grad_norm": 71.59126281738281,
      "learning_rate": 4.999932966293553e-07,
      "logits/chosen": -0.7035672068595886,
      "logits/rejected": -0.7120343446731567,
      "loss": 1.3039,
      "step": 50
    },
    {
      "beta_dpo/beta_used": 0.10957477241754532,
      "beta_dpo/beta_used_raw": 0.10957477241754532,
      "beta_dpo/gap_mean": 1.5687782764434814,
      "beta_dpo/gap_std": 3.4623851776123047,
      "beta_dpo/mask_keep_frac": 0.84375,
      "epoch": 0.1256544502617801,
      "grad_norm": 82.82760620117188,
      "learning_rate": 4.991893270335525e-07,
      "logits/chosen": -0.6742374897003174,
      "logits/rejected": -0.6726926565170288,
      "loss": 1.2274,
      "step": 60
    },
    {
      "beta_dpo/beta_used": 0.10323189198970795,
      "beta_dpo/beta_used_raw": 0.10323189198970795,
      "beta_dpo/gap_mean": 2.4878456592559814,
      "beta_dpo/gap_std": 5.3841118812561035,
      "beta_dpo/mask_keep_frac": 0.737500011920929,
      "epoch": 0.14659685863874344,
      "grad_norm": 79.24715423583984,
      "learning_rate": 4.970496218214204e-07,
      "logits/chosen": -0.7053920030593872,
      "logits/rejected": -0.7138158679008484,
      "loss": 1.1847,
      "step": 70
    },
    {
      "beta_dpo/beta_used": 0.10442471504211426,
      "beta_dpo/beta_used_raw": 0.10279443114995956,
      "beta_dpo/gap_mean": 3.6363892555236816,
      "beta_dpo/gap_std": 7.359000205993652,
      "beta_dpo/mask_keep_frac": 0.831250011920929,
      "epoch": 0.16753926701570682,
      "grad_norm": 40.18954849243164,
      "learning_rate": 4.935856505068998e-07,
      "logits/chosen": -0.7026282548904419,
      "logits/rejected": -0.70656818151474,
      "loss": 1.1297,
      "step": 80
    },
    {
      "beta_dpo/beta_used": 0.09297941625118256,
      "beta_dpo/beta_used_raw": 0.0927402526140213,
      "beta_dpo/gap_mean": 4.5779619216918945,
      "beta_dpo/gap_std": 9.087356567382812,
      "beta_dpo/mask_keep_frac": 0.7875000238418579,
      "epoch": 0.18848167539267016,
      "grad_norm": 45.59261703491211,
      "learning_rate": 4.8881598109976e-07,
      "logits/chosen": -0.6874291896820068,
      "logits/rejected": -0.7057452201843262,
      "loss": 1.1141,
      "step": 90
    },
    {
      "beta_dpo/beta_used": 0.10471361875534058,
      "beta_dpo/beta_used_raw": 0.10211487114429474,
      "beta_dpo/gap_mean": 5.183230400085449,
      "beta_dpo/gap_std": 10.404474258422852,
      "beta_dpo/mask_keep_frac": 0.8187500238418579,
      "epoch": 0.2094240837696335,
      "grad_norm": 66.85250854492188,
      "learning_rate": 4.827661805750437e-07,
      "logits/chosen": -0.6732321977615356,
      "logits/rejected": -0.6987311840057373,
      "loss": 1.1044,
      "step": 100
    },
    {
      "beta_dpo/beta_used": 0.1166844591498375,
      "beta_dpo/beta_used_raw": 0.1166844591498375,
      "beta_dpo/gap_mean": 6.204737663269043,
      "beta_dpo/gap_std": 11.558156967163086,
      "beta_dpo/mask_keep_frac": 0.8062499761581421,
      "epoch": 0.23036649214659685,
      "grad_norm": 54.56244659423828,
      "learning_rate": 4.75468677825789e-07,
      "logits/chosen": -0.7261234521865845,
      "logits/rejected": -0.7450467348098755,
      "loss": 1.0282,
      "step": 110
    },
    {
      "beta_dpo/beta_used": 0.08581940829753876,
      "beta_dpo/beta_used_raw": 0.0759856328368187,
      "beta_dpo/gap_mean": 6.777069091796875,
      "beta_dpo/gap_std": 12.461393356323242,
      "beta_dpo/mask_keep_frac": 0.800000011920929,
      "epoch": 0.2513089005235602,
      "grad_norm": 54.73094940185547,
      "learning_rate": 4.669625898336438e-07,
      "logits/chosen": -0.7630956768989563,
      "logits/rejected": -0.776543378829956,
      "loss": 1.1069,
      "step": 120
    },
    {
      "beta_dpo/beta_used": 0.10493312776088715,
      "beta_dpo/beta_used_raw": 0.09375782310962677,
      "beta_dpo/gap_mean": 7.0316290855407715,
      "beta_dpo/gap_std": 13.4308500289917,
      "beta_dpo/mask_keep_frac": 0.800000011920929,
      "epoch": 0.27225130890052357,
      "grad_norm": 53.551025390625,
      "learning_rate": 4.5729351198915705e-07,
      "logits/chosen": -0.7406284809112549,
      "logits/rejected": -0.7330573201179504,
      "loss": 1.091,
      "step": 130
    },
    {
      "beta_dpo/beta_used": 0.0665307343006134,
      "beta_dpo/beta_used_raw": 0.04071963578462601,
      "beta_dpo/gap_mean": 7.776385307312012,
      "beta_dpo/gap_std": 14.402565002441406,
      "beta_dpo/mask_keep_frac": 0.824999988079071,
      "epoch": 0.2931937172774869,
      "grad_norm": 107.44986724853516,
      "learning_rate": 4.4651327368569684e-07,
      "logits/chosen": -0.7388048768043518,
      "logits/rejected": -0.7451251745223999,
      "loss": 1.1576,
      "step": 140
    },
    {
      "beta_dpo/beta_used": 0.07846825569868088,
      "beta_dpo/beta_used_raw": 0.06488198786973953,
      "beta_dpo/gap_mean": 8.364961624145508,
      "beta_dpo/gap_std": 14.984090805053711,
      "beta_dpo/mask_keep_frac": 0.7875000238418579,
      "epoch": 0.31413612565445026,
      "grad_norm": 38.963260650634766,
      "learning_rate": 4.346796604970912e-07,
      "logits/chosen": -0.768231213092804,
      "logits/rejected": -0.7551404237747192,
      "loss": 1.1224,
      "step": 150
    },
    {
      "beta_dpo/beta_used": 0.11797045171260834,
      "beta_dpo/beta_used_raw": 0.09938563406467438,
      "beta_dpo/gap_mean": 9.785693168640137,
      "beta_dpo/gap_std": 15.681970596313477,
      "beta_dpo/mask_keep_frac": 0.856249988079071,
      "epoch": 0.33507853403141363,
      "grad_norm": 80.62310028076172,
      "learning_rate": 4.218561044282098e-07,
      "logits/chosen": -0.7575253844261169,
      "logits/rejected": -0.7614981532096863,
      "loss": 1.0544,
      "step": 160
    },
    {
      "beta_dpo/beta_used": 0.07409517467021942,
      "beta_dpo/beta_used_raw": 0.04705094173550606,
      "beta_dpo/gap_mean": 10.035483360290527,
      "beta_dpo/gap_std": 16.284427642822266,
      "beta_dpo/mask_keep_frac": 0.8187500238418579,
      "epoch": 0.35602094240837695,
      "grad_norm": 65.990966796875,
      "learning_rate": 4.081113438988443e-07,
      "logits/chosen": -0.7660126090049744,
      "logits/rejected": -0.7755380868911743,
      "loss": 1.0875,
      "step": 170
    },
    {
      "beta_dpo/beta_used": 0.07568483054637909,
      "beta_dpo/beta_used_raw": 0.06118815019726753,
      "beta_dpo/gap_mean": 9.977958679199219,
      "beta_dpo/gap_std": 16.553037643432617,
      "beta_dpo/mask_keep_frac": 0.793749988079071,
      "epoch": 0.3769633507853403,
      "grad_norm": 56.092166900634766,
      "learning_rate": 3.935190552834828e-07,
      "logits/chosen": -0.7195374965667725,
      "logits/rejected": -0.7341417074203491,
      "loss": 1.0689,
      "step": 180
    },
    {
      "beta_dpo/beta_used": 0.10011672973632812,
      "beta_dpo/beta_used_raw": 0.08130989223718643,
      "beta_dpo/gap_mean": 10.884498596191406,
      "beta_dpo/gap_std": 17.649686813354492,
      "beta_dpo/mask_keep_frac": 0.768750011920929,
      "epoch": 0.39790575916230364,
      "grad_norm": 47.546146392822266,
      "learning_rate": 3.781574579820464e-07,
      "logits/chosen": -0.7710455060005188,
      "logits/rejected": -0.783000648021698,
      "loss": 1.0703,
      "step": 190
    },
    {
      "beta_dpo/beta_used": 0.03816061466932297,
      "beta_dpo/beta_used_raw": 0.01525220274925232,
      "beta_dpo/gap_mean": 10.375402450561523,
      "beta_dpo/gap_std": 17.245559692382812,
      "beta_dpo/mask_keep_frac": 0.831250011920929,
      "epoch": 0.418848167539267,
      "grad_norm": 40.988670349121094,
      "learning_rate": 3.621088951385353e-07,
      "logits/chosen": -0.7636905312538147,
      "logits/rejected": -0.7812480330467224,
      "loss": 1.1971,
      "step": 200
    },
    {
      "epoch": 0.418848167539267,
      "eval_beta_dpo/beta_used": 0.12430721521377563,
      "eval_beta_dpo/beta_used_raw": 0.09974151104688644,
      "eval_beta_dpo/gap_mean": 11.01975154876709,
      "eval_beta_dpo/gap_std": 18.638986587524414,
      "eval_beta_dpo/mask_keep_frac": 1.0,
      "eval_logits/chosen": -0.7570037245750427,
      "eval_logits/rejected": -0.7552843689918518,
      "eval_loss": 0.6548933386802673,
      "eval_runtime": 51.0397,
      "eval_samples_per_second": 39.185,
      "eval_steps_per_second": 0.627,
      "step": 200
    },
    {
      "beta_dpo/beta_used": 0.09783867746591568,
      "beta_dpo/beta_used_raw": 0.09206344187259674,
      "beta_dpo/gap_mean": 11.258265495300293,
      "beta_dpo/gap_std": 19.141300201416016,
      "beta_dpo/mask_keep_frac": 0.831250011920929,
      "epoch": 0.4397905759162304,
      "grad_norm": 106.01080322265625,
      "learning_rate": 3.454593922550693e-07,
      "logits/chosen": -0.7539916038513184,
      "logits/rejected": -0.7599259614944458,
      "loss": 1.0859,
      "step": 210
    },
    {
      "beta_dpo/beta_used": 0.13818596303462982,
      "beta_dpo/beta_used_raw": 0.118813656270504,
      "beta_dpo/gap_mean": 11.77585220336914,
      "beta_dpo/gap_std": 19.773366928100586,
      "beta_dpo/mask_keep_frac": 0.824999988079071,
      "epoch": 0.4607329842931937,
      "grad_norm": 128.11996459960938,
      "learning_rate": 3.2829819606729477e-07,
      "logits/chosen": -0.7987761497497559,
      "logits/rejected": -0.7768310308456421,
      "loss": 1.0097,
      "step": 220
    },
    {
      "beta_dpo/beta_used": 0.0800265297293663,
      "beta_dpo/beta_used_raw": 0.06512973457574844,
      "beta_dpo/gap_mean": 12.928131103515625,
      "beta_dpo/gap_std": 20.115745544433594,
      "beta_dpo/mask_keep_frac": 0.75,
      "epoch": 0.4816753926701571,
      "grad_norm": 41.492034912109375,
      "learning_rate": 3.1071729615293424e-07,
      "logits/chosen": -0.7944627404212952,
      "logits/rejected": -0.7826088070869446,
      "loss": 1.0617,
      "step": 230
    },
    {
      "beta_dpo/beta_used": 0.07821373641490936,
      "beta_dpo/beta_used_raw": 0.05508134886622429,
      "beta_dpo/gap_mean": 13.714938163757324,
      "beta_dpo/gap_std": 21.715341567993164,
      "beta_dpo/mask_keep_frac": 0.7749999761581421,
      "epoch": 0.5026178010471204,
      "grad_norm": 55.7053108215332,
      "learning_rate": 2.9281093183781403e-07,
      "logits/chosen": -0.7329837083816528,
      "logits/rejected": -0.7595623731613159,
      "loss": 1.1275,
      "step": 240
    },
    {
      "beta_dpo/beta_used": 0.08778323978185654,
      "beta_dpo/beta_used_raw": 0.048361603170633316,
      "beta_dpo/gap_mean": 13.810220718383789,
      "beta_dpo/gap_std": 22.46774673461914,
      "beta_dpo/mask_keep_frac": 0.800000011920929,
      "epoch": 0.5235602094240838,
      "grad_norm": 53.13675308227539,
      "learning_rate": 2.7467508704251135e-07,
      "logits/chosen": -0.787535548210144,
      "logits/rejected": -0.7830525636672974,
      "loss": 1.1019,
      "step": 250
    },
    {
      "beta_dpo/beta_used": 0.11194082349538803,
      "beta_dpo/beta_used_raw": 0.06594248861074448,
      "beta_dpo/gap_mean": 13.73353099822998,
      "beta_dpo/gap_std": 22.698503494262695,
      "beta_dpo/mask_keep_frac": 0.824999988079071,
      "epoch": 0.5445026178010471,
      "grad_norm": 0.9119361042976379,
      "learning_rate": 2.5640697577740815e-07,
      "logits/chosen": -0.7817746996879578,
      "logits/rejected": -0.7839881181716919,
      "loss": 1.1687,
      "step": 260
    },
    {
      "beta_dpo/beta_used": 0.09284855425357819,
      "beta_dpo/beta_used_raw": 0.08311768621206284,
      "beta_dpo/gap_mean": 13.976015090942383,
      "beta_dpo/gap_std": 22.33526039123535,
      "beta_dpo/mask_keep_frac": 0.8187500238418579,
      "epoch": 0.5654450261780105,
      "grad_norm": 136.4973602294922,
      "learning_rate": 2.381045210440644e-07,
      "logits/chosen": -0.7521445155143738,
      "logits/rejected": -0.7410815954208374,
      "loss": 1.0209,
      "step": 270
    },
    {
      "beta_dpo/beta_used": 0.10686023533344269,
      "beta_dpo/beta_used_raw": 0.06296978890895844,
      "beta_dpo/gap_mean": 14.858721733093262,
      "beta_dpo/gap_std": 22.79940414428711,
      "beta_dpo/mask_keep_frac": 0.824999988079071,
      "epoch": 0.5863874345549738,
      "grad_norm": 38.58131790161133,
      "learning_rate": 2.1986582993616925e-07,
      "logits/chosen": -0.7521171569824219,
      "logits/rejected": -0.7675251364707947,
      "loss": 1.058,
      "step": 280
    },
    {
      "beta_dpo/beta_used": 0.06835642457008362,
      "beta_dpo/beta_used_raw": 0.012238355353474617,
      "beta_dpo/gap_mean": 13.978078842163086,
      "beta_dpo/gap_std": 23.335269927978516,
      "beta_dpo/mask_keep_frac": 0.8374999761581421,
      "epoch": 0.6073298429319371,
      "grad_norm": 1.274525761604309,
      "learning_rate": 2.0178866775369774e-07,
      "logits/chosen": -0.7752319574356079,
      "logits/rejected": -0.7829610109329224,
      "loss": 1.2126,
      "step": 290
    },
    {
      "beta_dpo/beta_used": 0.08970650285482407,
      "beta_dpo/beta_used_raw": 0.0673152282834053,
      "beta_dpo/gap_mean": 13.71714973449707,
      "beta_dpo/gap_std": 23.238323211669922,
      "beta_dpo/mask_keep_frac": 0.75,
      "epoch": 0.6282722513089005,
      "grad_norm": 60.473148345947266,
      "learning_rate": 1.839699339491937e-07,
      "logits/chosen": -0.7769112586975098,
      "logits/rejected": -0.7637456655502319,
      "loss": 1.1287,
      "step": 300
    },
    {
      "beta_dpo/beta_used": 0.0964554101228714,
      "beta_dpo/beta_used_raw": 0.06809216737747192,
      "beta_dpo/gap_mean": 14.4856595993042,
      "beta_dpo/gap_std": 23.187442779541016,
      "beta_dpo/mask_keep_frac": 0.8125,
      "epoch": 0.6492146596858639,
      "grad_norm": 30.574621200561523,
      "learning_rate": 1.6650514271527465e-07,
      "logits/chosen": -0.7852055430412292,
      "logits/rejected": -0.7743746638298035,
      "loss": 1.1436,
      "step": 310
    },
    {
      "beta_dpo/beta_used": 0.0930468887090683,
      "beta_dpo/beta_used_raw": 0.057879697531461716,
      "beta_dpo/gap_mean": 15.27861213684082,
      "beta_dpo/gap_std": 23.997211456298828,
      "beta_dpo/mask_keep_frac": 0.8125,
      "epoch": 0.6701570680628273,
      "grad_norm": 266.17156982421875,
      "learning_rate": 1.4948791099758052e-07,
      "logits/chosen": -0.8031824827194214,
      "logits/rejected": -0.7853301763534546,
      "loss": 1.2318,
      "step": 320
    },
    {
      "beta_dpo/beta_used": 0.08731904625892639,
      "beta_dpo/beta_used_raw": 0.05920511484146118,
      "beta_dpo/gap_mean": 15.062555313110352,
      "beta_dpo/gap_std": 24.421737670898438,
      "beta_dpo/mask_keep_frac": 0.824999988079071,
      "epoch": 0.6910994764397905,
      "grad_norm": 54.84642791748047,
      "learning_rate": 1.3300945667758012e-07,
      "logits/chosen": -0.7694008946418762,
      "logits/rejected": -0.7609071135520935,
      "loss": 1.058,
      "step": 330
    },
    {
      "beta_dpo/beta_used": 0.07772944122552872,
      "beta_dpo/beta_used_raw": 0.04176778346300125,
      "beta_dpo/gap_mean": 15.674954414367676,
      "beta_dpo/gap_std": 25.302011489868164,
      "beta_dpo/mask_keep_frac": 0.762499988079071,
      "epoch": 0.7120418848167539,
      "grad_norm": 162.36752319335938,
      "learning_rate": 1.1715810961514072e-07,
      "logits/chosen": -0.8045889139175415,
      "logits/rejected": -0.8078791499137878,
      "loss": 1.1423,
      "step": 340
    },
    {
      "beta_dpo/beta_used": 0.09465853869915009,
      "beta_dpo/beta_used_raw": 0.03491034358739853,
      "beta_dpo/gap_mean": 15.350746154785156,
      "beta_dpo/gap_std": 25.115270614624023,
      "beta_dpo/mask_keep_frac": 0.7562500238418579,
      "epoch": 0.7329842931937173,
      "grad_norm": 122.0066146850586,
      "learning_rate": 1.0201883817182949e-07,
      "logits/chosen": -0.7979413866996765,
      "logits/rejected": -0.8106569051742554,
      "loss": 1.1516,
      "step": 350
    },
    {
      "beta_dpo/beta_used": 0.07950497418642044,
      "beta_dpo/beta_used_raw": 0.021852362900972366,
      "beta_dpo/gap_mean": 15.205873489379883,
      "beta_dpo/gap_std": 25.209131240844727,
      "beta_dpo/mask_keep_frac": 0.800000011920929,
      "epoch": 0.7539267015706806,
      "grad_norm": 93.26220703125,
      "learning_rate": 8.76727937529367e-08,
      "logits/chosen": -0.7563246488571167,
      "logits/rejected": -0.7660932540893555,
      "loss": 1.24,
      "step": 360
    },
    {
      "beta_dpo/beta_used": 0.10245828330516815,
      "beta_dpo/beta_used_raw": 0.05802968889474869,
      "beta_dpo/gap_mean": 16.286312103271484,
      "beta_dpo/gap_std": 25.74993896484375,
      "beta_dpo/mask_keep_frac": 0.768750011920929,
      "epoch": 0.774869109947644,
      "grad_norm": 143.22608947753906,
      "learning_rate": 7.419687580962222e-08,
      "logits/chosen": -0.7966378331184387,
      "logits/rejected": -0.8195791244506836,
      "loss": 1.1759,
      "step": 370
    },
    {
      "beta_dpo/beta_used": 0.04838007315993309,
      "beta_dpo/beta_used_raw": -0.006214796099811792,
      "beta_dpo/gap_mean": 15.983156204223633,
      "beta_dpo/gap_std": 24.809345245361328,
      "beta_dpo/mask_keep_frac": 0.7437499761581421,
      "epoch": 0.7958115183246073,
      "grad_norm": 36.29342269897461,
      "learning_rate": 6.166331963291519e-08,
      "logits/chosen": -0.7881544828414917,
      "logits/rejected": -0.786669909954071,
      "loss": 1.2336,
      "step": 380
    },
    {
      "beta_dpo/beta_used": 0.07021647691726685,
      "beta_dpo/beta_used_raw": 0.00572154950350523,
      "beta_dpo/gap_mean": 16.157865524291992,
      "beta_dpo/gap_std": 25.035715103149414,
      "beta_dpo/mask_keep_frac": 0.793749988079071,
      "epoch": 0.8167539267015707,
      "grad_norm": 27.86089324951172,
      "learning_rate": 5.013930914912476e-08,
      "logits/chosen": -0.8044806718826294,
      "logits/rejected": -0.8055523633956909,
      "loss": 1.1986,
      "step": 390
    },
    {
      "beta_dpo/beta_used": 0.0964551717042923,
      "beta_dpo/beta_used_raw": 0.04246100038290024,
      "beta_dpo/gap_mean": 16.26091766357422,
      "beta_dpo/gap_std": 25.67080307006836,
      "beta_dpo/mask_keep_frac": 0.793749988079071,
      "epoch": 0.837696335078534,
      "grad_norm": 203.63230895996094,
      "learning_rate": 3.968661679220467e-08,
      "logits/chosen": -0.8050006628036499,
      "logits/rejected": -0.7917808890342712,
      "loss": 1.2165,
      "step": 400
    },
    {
      "epoch": 0.837696335078534,
      "eval_beta_dpo/beta_used": 0.1434057652950287,
      "eval_beta_dpo/beta_used_raw": 0.09862707555294037,
      "eval_beta_dpo/gap_mean": 15.923084259033203,
      "eval_beta_dpo/gap_std": 25.965980529785156,
      "eval_beta_dpo/mask_keep_frac": 1.0,
      "eval_logits/chosen": -0.8034595847129822,
      "eval_logits/rejected": -0.7974430322647095,
      "eval_loss": 0.7667602896690369,
      "eval_runtime": 50.9741,
      "eval_samples_per_second": 39.236,
      "eval_steps_per_second": 0.628,
      "step": 400
    },
    {
      "beta_dpo/beta_used": 0.09968056529760361,
      "beta_dpo/beta_used_raw": 0.04236916825175285,
      "beta_dpo/gap_mean": 16.500282287597656,
      "beta_dpo/gap_std": 26.050161361694336,
      "beta_dpo/mask_keep_frac": 0.793749988079071,
      "epoch": 0.8586387434554974,
      "grad_norm": 82.50851440429688,
      "learning_rate": 3.036127238347164e-08,
      "logits/chosen": -0.827735424041748,
      "logits/rejected": -0.8203527331352234,
      "loss": 1.2025,
      "step": 410
    },
    {
      "beta_dpo/beta_used": 0.0970761626958847,
      "beta_dpo/beta_used_raw": 0.058871395885944366,
      "beta_dpo/gap_mean": 16.738262176513672,
      "beta_dpo/gap_std": 26.436817169189453,
      "beta_dpo/mask_keep_frac": 0.7562500238418579,
      "epoch": 0.8795811518324608,
      "grad_norm": 277.814453125,
      "learning_rate": 2.2213262793589482e-08,
      "logits/chosen": -0.7558459639549255,
      "logits/rejected": -0.7355632185935974,
      "loss": 1.1919,
      "step": 420
    },
    {
      "beta_dpo/beta_used": 0.07494507730007172,
      "beta_dpo/beta_used_raw": 0.037638500332832336,
      "beta_dpo/gap_mean": 17.993297576904297,
      "beta_dpo/gap_std": 27.201208114624023,
      "beta_dpo/mask_keep_frac": 0.84375,
      "epoch": 0.900523560209424,
      "grad_norm": 1.1577889919281006,
      "learning_rate": 1.5286263996730026e-08,
      "logits/chosen": -0.8132478594779968,
      "logits/rejected": -0.8199571371078491,
      "loss": 1.1747,
      "step": 430
    },
    {
      "beta_dpo/beta_used": 0.046172745525836945,
      "beta_dpo/beta_used_raw": -0.05054600164294243,
      "beta_dpo/gap_mean": 16.831357955932617,
      "beta_dpo/gap_std": 27.087594985961914,
      "beta_dpo/mask_keep_frac": 0.793749988079071,
      "epoch": 0.9214659685863874,
      "grad_norm": 8.6950101852417,
      "learning_rate": 9.617406953185136e-09,
      "logits/chosen": -0.8208335638046265,
      "logits/rejected": -0.8280296325683594,
      "loss": 1.2337,
      "step": 440
    },
    {
      "beta_dpo/beta_used": 0.11133173853158951,
      "beta_dpo/beta_used_raw": 0.07020476460456848,
      "beta_dpo/gap_mean": 16.71297264099121,
      "beta_dpo/gap_std": 26.49554443359375,
      "beta_dpo/mask_keep_frac": 0.824999988079071,
      "epoch": 0.9424083769633508,
      "grad_norm": 108.31566619873047,
      "learning_rate": 5.2370785753763356e-09,
      "logits/chosen": -0.7833819389343262,
      "logits/rejected": -0.7876101732254028,
      "loss": 1.171,
      "step": 450
    },
    {
      "beta_dpo/beta_used": 0.06652946025133133,
      "beta_dpo/beta_used_raw": -0.015655241906642914,
      "beta_dpo/gap_mean": 17.124013900756836,
      "beta_dpo/gap_std": 27.718246459960938,
      "beta_dpo/mask_keep_frac": 0.8062499761581421,
      "epoch": 0.9633507853403142,
      "grad_norm": 70.03536224365234,
      "learning_rate": 2.168758844148272e-09,
      "logits/chosen": -0.8030775785446167,
      "logits/rejected": -0.8030357360839844,
      "loss": 1.2041,
      "step": 460
    },
    {
      "beta_dpo/beta_used": 0.12787500023841858,
      "beta_dpo/beta_used_raw": 0.10427769273519516,
      "beta_dpo/gap_mean": 17.284704208374023,
      "beta_dpo/gap_std": 27.71035385131836,
      "beta_dpo/mask_keep_frac": 0.862500011920929,
      "epoch": 0.9842931937172775,
      "grad_norm": 215.3680877685547,
      "learning_rate": 4.288949484559934e-10,
      "logits/chosen": -0.7909310460090637,
      "logits/rejected": -0.7838017344474792,
      "loss": 1.2015,
      "step": 470
    },
    {
      "epoch": 0.9989528795811519,
      "step": 477,
      "total_flos": 0.0,
      "train_loss": 1.1642480231431045,
      "train_runtime": 4421.8255,
      "train_samples_per_second": 13.826,
      "train_steps_per_second": 0.108
    }
  ],
  "logging_steps": 10,
  "max_steps": 477,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 1,
  "save_steps": 200,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": true
      },
      "attributes": {}
    }
  },
  "total_flos": 0.0,
  "train_batch_size": 8,
  "trial_name": null,
  "trial_params": null
}
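The `log_history` array above interleaves training entries (which carry `"loss"`) with evaluation entries (which carry `"eval_loss"` instead). A minimal sketch of pulling out the training loss curve, shown on a small hypothetical excerpt of the log:

```python
# Extract the (step, loss) training curve from a trainer_state.json-style
# log_history, skipping eval entries. `log_history` is a tiny excerpt
# of the log above, not the full 50-entry history.
log_history = [
    {"step": 1, "loss": 1.3869},
    {"step": 200, "eval_loss": 0.6549},  # eval entry, no "loss" key
    {"step": 210, "loss": 1.0859},
]

def train_loss_curve(log_history):
    """Keep only training entries, as (step, loss) pairs."""
    return [(e["step"], e["loss"]) for e in log_history if "loss" in e]

print(train_loss_curve(log_history))  # → [(1, 1.3869), (210, 1.0859)]
```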