From 7ea33623c3e17a72b172b86e9cde5d36ce294647 Mon Sep 17 00:00:00 2001 From: ModelHub XC Date: Tue, 26 May 2026 09:02:17 +0800 Subject: [PATCH] Initialize project, with the model provided by the ModelHub XC community MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Model: W-61/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753 Source: Original Platform --- .gitattributes | 36 + README.md | 85 + all_results.json | 27 + config.json | 29 + eval_results.json | 21 + generation_config.json | 9 + model-00001-of-00007.safetensors | 3 + model-00002-of-00007.safetensors | 3 + model-00003-of-00007.safetensors | 3 + model-00004-of-00007.safetensors | 3 + model-00005-of-00007.safetensors | 3 + model-00006-of-00007.safetensors | 3 + model-00007-of-00007.safetensors | 3 + model.safetensors.index.json | 298 + special_tokens_map.json | 23 + tokenizer.json | 3 + tokenizer_config.json | 2064 +++++ train.log | 1793 ++++ train_results.json | 9 + trainer_state.json | 13789 +++++++++++++++++++++++++++++ 20 files changed, 18207 insertions(+) create mode 100644 .gitattributes create mode 100644 README.md create mode 100644 all_results.json create mode 100644 config.json create mode 100644 eval_results.json create mode 100644 generation_config.json create mode 100644 model-00001-of-00007.safetensors create mode 100644 model-00002-of-00007.safetensors create mode 100644 model-00003-of-00007.safetensors create mode 100644 model-00004-of-00007.safetensors create mode 100644 model-00005-of-00007.safetensors create mode 100644 model-00006-of-00007.safetensors create mode 100644 model-00007-of-00007.safetensors create mode 100644 model.safetensors.index.json create mode 100644 special_tokens_map.json create mode 100644 tokenizer.json create mode 100644 tokenizer_config.json create mode 100644 train.log create mode 100644 train_results.json 
create mode 100644 trainer_state.json diff --git a/.gitattributes b/.gitattributes new file mode 100644 index 0000000..52373fe --- /dev/null +++ b/.gitattributes @@ -0,0 +1,36 @@ +*.7z filter=lfs diff=lfs merge=lfs -text +*.arrow filter=lfs diff=lfs merge=lfs -text +*.bin filter=lfs diff=lfs merge=lfs -text +*.bz2 filter=lfs diff=lfs merge=lfs -text +*.ckpt filter=lfs diff=lfs merge=lfs -text +*.ftz filter=lfs diff=lfs merge=lfs -text +*.gz filter=lfs diff=lfs merge=lfs -text +*.h5 filter=lfs diff=lfs merge=lfs -text +*.joblib filter=lfs diff=lfs merge=lfs -text +*.lfs.* filter=lfs diff=lfs merge=lfs -text +*.mlmodel filter=lfs diff=lfs merge=lfs -text +*.model filter=lfs diff=lfs merge=lfs -text +*.msgpack filter=lfs diff=lfs merge=lfs -text +*.npy filter=lfs diff=lfs merge=lfs -text +*.npz filter=lfs diff=lfs merge=lfs -text +*.onnx filter=lfs diff=lfs merge=lfs -text +*.ot filter=lfs diff=lfs merge=lfs -text +*.parquet filter=lfs diff=lfs merge=lfs -text +*.pb filter=lfs diff=lfs merge=lfs -text +*.pickle filter=lfs diff=lfs merge=lfs -text +*.pkl filter=lfs diff=lfs merge=lfs -text +*.pt filter=lfs diff=lfs merge=lfs -text +*.pth filter=lfs diff=lfs merge=lfs -text +*.rar filter=lfs diff=lfs merge=lfs -text +*.safetensors filter=lfs diff=lfs merge=lfs -text +saved_model/**/* filter=lfs diff=lfs merge=lfs -text +*.tar.* filter=lfs diff=lfs merge=lfs -text +*.tar filter=lfs diff=lfs merge=lfs -text +*.tflite filter=lfs diff=lfs merge=lfs -text +*.tgz filter=lfs diff=lfs merge=lfs -text +*.wasm filter=lfs diff=lfs merge=lfs -text +*.xz filter=lfs diff=lfs merge=lfs -text +*.zip filter=lfs diff=lfs merge=lfs -text +*.zst filter=lfs diff=lfs merge=lfs -text +*tfevents* filter=lfs diff=lfs merge=lfs -text +tokenizer.json filter=lfs diff=lfs merge=lfs -text diff --git a/README.md b/README.md new file mode 100644 index 0000000..e8ef077 --- /dev/null +++ b/README.md @@ -0,0 +1,85 @@ +--- +library_name: transformers +base_model: 
llama-3-8b-base-sft-hh-helpful-4xh200-batch-64 +tags: +- alignment-handbook +- beta-dpo +- generated_from_trainer +datasets: +- Anthropic/hh-rlhf +model-index: +- name: llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753 + results: [] +--- + + + +# llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753 + +This model is a fine-tuned version of [llama-3-8b-base-sft-hh-helpful-4xh200-batch-64](https://huggingface.co/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64) on the Anthropic/hh-rlhf dataset. +It achieves the following results on the evaluation set: +- Loss: 1.7101 +- Beta Dpo/beta: 0.0691 +- Beta Dpo/loss Margin Mean: 86.8606 +- Beta Dpo/beta Margin Mean: 10.0274 +- Beta Dpo/beta Margin Std: 12.8117 +- Beta Dpo/beta Margin Grad Mean: -0.4550 +- Beta Dpo/beta Margin Grad Std: 0.0744 +- Beta Dpo/gap Mean: 130.0152 +- Beta Dpo/gap Std: 165.0541 +- Beta Dpo/beta Used Raw: -2.4893 +- Beta Dpo/beta Used: 0.0691 +- Beta Dpo/mask Keep Frac: 1.0 +- Logits/chosen: -0.2789 +- Logits/rejected: -0.2575 + +## Model description + +More information needed + +## Intended uses & limitations + +More information needed + +## Training and evaluation data + +More information needed + +## Training procedure + +### Training hyperparameters + +The following hyperparameters were used during training: +- learning_rate: 5e-07 +- train_batch_size: 8 +- eval_batch_size: 8 +- seed: 42 +- distributed_type: multi-GPU +- num_devices: 4 +- gradient_accumulation_steps: 2 +- total_train_batch_size: 64 +- total_eval_batch_size: 32 +- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments +- lr_scheduler_type: cosine +- lr_scheduler_warmup_ratio: 0.1 +- num_epochs: 1 + +### Training results + +| Training Loss | Epoch | Step | Validation Loss | Beta Dpo/beta | Beta Dpo/loss Margin Mean | Beta Dpo/beta Margin Mean | Beta Dpo/beta Margin Std | Beta Dpo/beta Margin Grad Mean | Beta Dpo/beta Margin 
Grad Std | Beta Dpo/gap Mean | Beta Dpo/gap Std | Beta Dpo/beta Used Raw | Beta Dpo/beta Used | Beta Dpo/mask Keep Frac | Logits/chosen | Logits/rejected | +|:-------------:|:------:|:----:|:---------------:|:-------------:|:-------------------------:|:-------------------------:|:------------------------:|:------------------------------:|:-----------------------------:|:-----------------:|:----------------:|:----------------------:|:------------------:|:-----------------------:|:-------------:|:---------------:| +| 0.9552 | 0.1468 | 100 | 0.6786 | 0.0046 | 9.8098 | 0.0725 | 0.1096 | -0.4895 | 0.0167 | 17.6954 | 22.1843 | -0.3731 | 0.0046 | 1.0 | -0.6698 | -0.6418 | +| 0.8706 | 0.2937 | 200 | 0.6904 | 0.0046 | 27.7458 | 0.2199 | 0.3260 | -0.4903 | 0.0228 | 50.6913 | 68.2433 | -1.2767 | 0.0046 | 1.0 | -0.6064 | -0.5873 | +| 2.8698 | 0.4405 | 300 | 0.8542 | 0.0215 | 46.8593 | 1.7761 | 2.5216 | -0.4710 | 0.0500 | 79.1242 | 110.1004 | -1.8359 | 0.0215 | 1.0 | -0.4178 | -0.4010 | +| 18.5063 | 0.5874 | 400 | 0.7607 | 0.0093 | 66.8920 | 1.0762 | 1.4305 | -0.4753 | 0.0447 | 116.2162 | 143.8824 | -2.8595 | 0.0093 | 1.0 | -0.4157 | -0.3938 | +| 1.1587 | 0.7342 | 500 | 1.3024 | 0.0541 | 78.1021 | 7.2488 | 9.0766 | -0.4557 | 0.0679 | 118.3478 | 142.3098 | -2.3147 | 0.0541 | 1.0 | -0.3590 | -0.3353 | +| 4.9835 | 0.8811 | 600 | 1.7101 | 0.0691 | 86.8606 | 10.0274 | 12.8117 | -0.4550 | 0.0744 | 130.0152 | 165.0541 | -2.4893 | 0.0691 | 1.0 | -0.2789 | -0.2575 | + + +### Framework versions + +- Transformers 4.51.0 +- Pytorch 2.3.1+cu121 +- Datasets 2.21.0 +- Tokenizers 0.21.4 diff --git a/all_results.json b/all_results.json new file mode 100644 index 0000000..cfac24b --- /dev/null +++ b/all_results.json @@ -0,0 +1,27 @@ +{ + "epoch": 1.0, + "eval_beta_dpo/beta": 0.011150650680065155, + "eval_beta_dpo/beta_margin_grad_mean": -0.47102445363998413, + "eval_beta_dpo/beta_margin_grad_std": 0.049201950430870056, + "eval_beta_dpo/beta_margin_mean": 1.640921711921692, + 
"eval_beta_dpo/beta_margin_std": 2.0729942321777344, + "eval_beta_dpo/beta_used": 0.011150650680065155, + "eval_beta_dpo/beta_used_raw": -3.504255771636963, + "eval_beta_dpo/gap_mean": 147.1534881591797, + "eval_beta_dpo/gap_std": 168.50018310546875, + "eval_beta_dpo/loss_margin_mean": 87.08258056640625, + "eval_beta_dpo/mask_keep_frac": 1.0, + "eval_logits/chosen": -0.2772989273071289, + "eval_logits/rejected": -0.2549748718738556, + "eval_loss": 0.7894024848937988, + "eval_runtime": 40.1139, + "eval_samples": 2339, + "eval_samples_per_second": 58.309, + "eval_steps_per_second": 1.845, + "total_flos": 0.0, + "train_loss": 2.627565469291942, + "train_runtime": 3177.7378, + "train_samples": 43598, + "train_samples_per_second": 13.72, + "train_steps_per_second": 0.214 +} \ No newline at end of file diff --git a/config.json b/config.json new file mode 100644 index 0000000..5092b09 --- /dev/null +++ b/config.json @@ -0,0 +1,29 @@ +{ + "architectures": [ + "LlamaForCausalLM" + ], + "attention_bias": false, + "attention_dropout": 0.0, + "bos_token_id": 128000, + "eos_token_id": 128001, + "head_dim": 128, + "hidden_act": "silu", + "hidden_size": 4096, + "initializer_range": 0.02, + "intermediate_size": 14336, + "max_position_embeddings": 8192, + "mlp_bias": false, + "model_type": "llama", + "num_attention_heads": 32, + "num_hidden_layers": 32, + "num_key_value_heads": 8, + "pretraining_tp": 1, + "rms_norm_eps": 1e-05, + "rope_scaling": null, + "rope_theta": 500000.0, + "tie_word_embeddings": false, + "torch_dtype": "float32", + "transformers_version": "4.51.0", + "use_cache": true, + "vocab_size": 128256 +} diff --git a/eval_results.json b/eval_results.json new file mode 100644 index 0000000..b9d3fdc --- /dev/null +++ b/eval_results.json @@ -0,0 +1,21 @@ +{ + "epoch": 1.0, + "eval_beta_dpo/beta": 0.011150650680065155, + "eval_beta_dpo/beta_margin_grad_mean": -0.47102445363998413, + "eval_beta_dpo/beta_margin_grad_std": 0.049201950430870056, + 
"eval_beta_dpo/beta_margin_mean": 1.640921711921692, + "eval_beta_dpo/beta_margin_std": 2.0729942321777344, + "eval_beta_dpo/beta_used": 0.011150650680065155, + "eval_beta_dpo/beta_used_raw": -3.504255771636963, + "eval_beta_dpo/gap_mean": 147.1534881591797, + "eval_beta_dpo/gap_std": 168.50018310546875, + "eval_beta_dpo/loss_margin_mean": 87.08258056640625, + "eval_beta_dpo/mask_keep_frac": 1.0, + "eval_logits/chosen": -0.2772989273071289, + "eval_logits/rejected": -0.2549748718738556, + "eval_loss": 0.7894024848937988, + "eval_runtime": 40.1139, + "eval_samples": 2339, + "eval_samples_per_second": 58.309, + "eval_steps_per_second": 1.845 +} \ No newline at end of file diff --git a/generation_config.json b/generation_config.json new file mode 100644 index 0000000..76247c9 --- /dev/null +++ b/generation_config.json @@ -0,0 +1,9 @@ +{ + "bos_token_id": 128000, + "do_sample": true, + "eos_token_id": 128001, + "max_length": 4096, + "temperature": 0.6, + "top_p": 0.9, + "transformers_version": "4.51.0" +} diff --git a/model-00001-of-00007.safetensors b/model-00001-of-00007.safetensors new file mode 100644 index 0000000..c0a8cba --- /dev/null +++ b/model-00001-of-00007.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f495606710cb33512b2beb05b23d58ae7b1d9dc4cadb6c76f47b1133621bf7b7 +size 4886466168 diff --git a/model-00002-of-00007.safetensors b/model-00002-of-00007.safetensors new file mode 100644 index 0000000..d1f519f --- /dev/null +++ b/model-00002-of-00007.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0d983b1368f83a6898f73bc8983cedac79306c35bee0e995cc8e749206d2ef7c +size 4832007448 diff --git a/model-00003-of-00007.safetensors b/model-00003-of-00007.safetensors new file mode 100644 index 0000000..2bb0a6c --- /dev/null +++ b/model-00003-of-00007.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d83bd9699c4a89eb50e72aa777fee0b26e18079f74379ba5ffffb462d1875193 
+size 4999813112 diff --git a/model-00004-of-00007.safetensors b/model-00004-of-00007.safetensors new file mode 100644 index 0000000..ce74104 --- /dev/null +++ b/model-00004-of-00007.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d23a9cdcc67d48b3bb3df50047ce60cd5fa674cb7c98bc00b02783210d2594c8 +size 4999813128 diff --git a/model-00005-of-00007.safetensors b/model-00005-of-00007.safetensors new file mode 100644 index 0000000..1066e52 --- /dev/null +++ b/model-00005-of-00007.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3aa97dd4e920c169257071bafb52788e8008946a17f084e62566933eb6fd6801 +size 4832007496 diff --git a/model-00006-of-00007.safetensors b/model-00006-of-00007.safetensors new file mode 100644 index 0000000..9515e23 --- /dev/null +++ b/model-00006-of-00007.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e5fddff270810773517b1b109a6367ff71ddec325f35f1e01d428061a8680a62 +size 4999813120 diff --git a/model-00007-of-00007.safetensors b/model-00007-of-00007.safetensors new file mode 100644 index 0000000..3ce8cb6 --- /dev/null +++ b/model-00007-of-00007.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:898490d5a16032b9bc97d95af293e8e0542a9bcf2f5af49cdec63d2bf2744c26 +size 2571158184 diff --git a/model.safetensors.index.json b/model.safetensors.index.json new file mode 100644 index 0000000..0985084 --- /dev/null +++ b/model.safetensors.index.json @@ -0,0 +1,298 @@ +{ + "metadata": { + "total_size": 32121044992 + }, + "weight_map": { + "lm_head.weight": "model-00007-of-00007.safetensors", + "model.embed_tokens.weight": "model-00001-of-00007.safetensors", + "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors", + "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.0.mlp.up_proj.weight": 
"model-00001-of-00007.safetensors", + "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors", + "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors", + "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors", + "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors", + "model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors", + "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors", + 
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors", + "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors", + "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors", + "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors", + "model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors", + "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.13.self_attn.q_proj.weight": 
"model-00003-of-00007.safetensors", + "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors", + "model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors", + "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors", + "model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors", + "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors", + "model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors", + 
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors", + "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors", + "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors", + "model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors", + "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors", + "model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.19.mlp.gate_proj.weight": 
"model-00004-of-00007.safetensors", + "model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors", + "model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors", + "model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors", + "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors", + "model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors", + "model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors", + "model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors", + 
"model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors", + "model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors", + "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors", + "model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors", + "model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors", + "model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors", + "model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.23.self_attn.o_proj.weight": 
"model-00005-of-00007.safetensors", + "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors", + "model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors", + "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors", + "model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors", + "model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors", + "model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors", + "model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors", + 
"model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors", + "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors", + "model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors", + "model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors", + "model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.28.mlp.gate_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors", + "model.layers.28.self_attn.k_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.28.self_attn.o_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.28.self_attn.q_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.28.self_attn.v_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors", + "model.layers.29.mlp.down_proj.weight": 
"model-00006-of-00007.safetensors", + "model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors", + "model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors", + "model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors", + "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors", + "model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors", + "model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors", + "model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors", + 
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.31.input_layernorm.weight": "model-00007-of-00007.safetensors", + "model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors", + "model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.31.mlp.up_proj.weight": "model-00007-of-00007.safetensors", + "model.layers.31.post_attention_layernorm.weight": "model-00007-of-00007.safetensors", + "model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors", + "model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors", + "model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors", + "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors", + "model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors", + "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors", + 
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors", + "model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors", + "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors", + "model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors", + "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors", + "model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors", + 
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors", + "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors", + "model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors", + "model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors", + "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors", + "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors", + "model.norm.weight": "model-00007-of-00007.safetensors" + } +} diff --git a/special_tokens_map.json b/special_tokens_map.json new file mode 100644 index 0000000..e5b39b6 --- /dev/null +++ b/special_tokens_map.json @@ -0,0 +1,23 @@ +{ + "bos_token": { + "content": "<|begin_of_text|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false + }, + "eos_token": { + "content": "<|end_of_text|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false + }, + "pad_token": { + "content": "<|end_of_text|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false + } +} diff --git a/tokenizer.json b/tokenizer.json new file mode 100644 index 0000000..86a3394 --- /dev/null +++ b/tokenizer.json @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:3c5cf44023714fb39b05e71e425f8d7b92805ff73f7988b083b8c87f0bf87393 +size 17209961 diff --git a/tokenizer_config.json b/tokenizer_config.json new file mode 100644 index 0000000..8c6916a --- /dev/null +++ b/tokenizer_config.json @@ -0,0 +1,2064 @@ +{ + "added_tokens_decoder": { + "128000": { + "content": "<|begin_of_text|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128001": { + "content": "<|end_of_text|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128002": { + "content": "<|reserved_special_token_0|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128003": { + "content": "<|reserved_special_token_1|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128004": { + "content": "<|reserved_special_token_2|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128005": { + "content": "<|reserved_special_token_3|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128006": { + "content": "<|start_header_id|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128007": { + "content": "<|end_header_id|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128008": { + "content": "<|reserved_special_token_4|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128009": { + "content": "<|eot_id|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128010": { + "content": "<|reserved_special_token_5|>", + "lstrip": false, + "normalized": false, + 
"rstrip": false, + "single_word": false, + "special": true + }, + "128011": { + "content": "<|reserved_special_token_6|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128012": { + "content": "<|reserved_special_token_7|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128013": { + "content": "<|reserved_special_token_8|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128014": { + "content": "<|reserved_special_token_9|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128015": { + "content": "<|reserved_special_token_10|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128016": { + "content": "<|reserved_special_token_11|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128017": { + "content": "<|reserved_special_token_12|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128018": { + "content": "<|reserved_special_token_13|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128019": { + "content": "<|reserved_special_token_14|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128020": { + "content": "<|reserved_special_token_15|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128021": { + "content": "<|reserved_special_token_16|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128022": { + "content": "<|reserved_special_token_17|>", + "lstrip": false, + 
"normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128023": { + "content": "<|reserved_special_token_18|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128024": { + "content": "<|reserved_special_token_19|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128025": { + "content": "<|reserved_special_token_20|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128026": { + "content": "<|reserved_special_token_21|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128027": { + "content": "<|reserved_special_token_22|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128028": { + "content": "<|reserved_special_token_23|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128029": { + "content": "<|reserved_special_token_24|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128030": { + "content": "<|reserved_special_token_25|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128031": { + "content": "<|reserved_special_token_26|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128032": { + "content": "<|reserved_special_token_27|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128033": { + "content": "<|reserved_special_token_28|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128034": { + "content": "<|reserved_special_token_29|>", + 
"lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128035": { + "content": "<|reserved_special_token_30|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128036": { + "content": "<|reserved_special_token_31|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128037": { + "content": "<|reserved_special_token_32|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128038": { + "content": "<|reserved_special_token_33|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128039": { + "content": "<|reserved_special_token_34|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128040": { + "content": "<|reserved_special_token_35|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128041": { + "content": "<|reserved_special_token_36|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128042": { + "content": "<|reserved_special_token_37|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128043": { + "content": "<|reserved_special_token_38|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128044": { + "content": "<|reserved_special_token_39|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128045": { + "content": "<|reserved_special_token_40|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128046": { + "content": 
"<|reserved_special_token_41|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128047": { + "content": "<|reserved_special_token_42|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128048": { + "content": "<|reserved_special_token_43|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128049": { + "content": "<|reserved_special_token_44|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128050": { + "content": "<|reserved_special_token_45|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128051": { + "content": "<|reserved_special_token_46|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128052": { + "content": "<|reserved_special_token_47|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128053": { + "content": "<|reserved_special_token_48|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128054": { + "content": "<|reserved_special_token_49|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128055": { + "content": "<|reserved_special_token_50|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128056": { + "content": "<|reserved_special_token_51|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128057": { + "content": "<|reserved_special_token_52|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + 
"128058": { + "content": "<|reserved_special_token_53|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128059": { + "content": "<|reserved_special_token_54|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128060": { + "content": "<|reserved_special_token_55|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128061": { + "content": "<|reserved_special_token_56|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128062": { + "content": "<|reserved_special_token_57|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128063": { + "content": "<|reserved_special_token_58|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128064": { + "content": "<|reserved_special_token_59|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128065": { + "content": "<|reserved_special_token_60|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128066": { + "content": "<|reserved_special_token_61|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128067": { + "content": "<|reserved_special_token_62|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128068": { + "content": "<|reserved_special_token_63|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128069": { + "content": "<|reserved_special_token_64|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + 
"special": true + }, + "128070": { + "content": "<|reserved_special_token_65|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128071": { + "content": "<|reserved_special_token_66|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128072": { + "content": "<|reserved_special_token_67|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128073": { + "content": "<|reserved_special_token_68|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128074": { + "content": "<|reserved_special_token_69|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128075": { + "content": "<|reserved_special_token_70|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128076": { + "content": "<|reserved_special_token_71|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128077": { + "content": "<|reserved_special_token_72|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128078": { + "content": "<|reserved_special_token_73|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128079": { + "content": "<|reserved_special_token_74|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128080": { + "content": "<|reserved_special_token_75|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128081": { + "content": "<|reserved_special_token_76|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + 
"single_word": false, + "special": true + }, + "128082": { + "content": "<|reserved_special_token_77|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128083": { + "content": "<|reserved_special_token_78|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128084": { + "content": "<|reserved_special_token_79|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128085": { + "content": "<|reserved_special_token_80|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128086": { + "content": "<|reserved_special_token_81|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128087": { + "content": "<|reserved_special_token_82|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128088": { + "content": "<|reserved_special_token_83|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128089": { + "content": "<|reserved_special_token_84|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128090": { + "content": "<|reserved_special_token_85|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128091": { + "content": "<|reserved_special_token_86|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128092": { + "content": "<|reserved_special_token_87|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128093": { + "content": "<|reserved_special_token_88|>", + "lstrip": false, + "normalized": false, + 
"rstrip": false, + "single_word": false, + "special": true + }, + "128094": { + "content": "<|reserved_special_token_89|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128095": { + "content": "<|reserved_special_token_90|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128096": { + "content": "<|reserved_special_token_91|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128097": { + "content": "<|reserved_special_token_92|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128098": { + "content": "<|reserved_special_token_93|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128099": { + "content": "<|reserved_special_token_94|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128100": { + "content": "<|reserved_special_token_95|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128101": { + "content": "<|reserved_special_token_96|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128102": { + "content": "<|reserved_special_token_97|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128103": { + "content": "<|reserved_special_token_98|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128104": { + "content": "<|reserved_special_token_99|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128105": { + "content": "<|reserved_special_token_100|>", + "lstrip": false, + 
"normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128106": { + "content": "<|reserved_special_token_101|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128107": { + "content": "<|reserved_special_token_102|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128108": { + "content": "<|reserved_special_token_103|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128109": { + "content": "<|reserved_special_token_104|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128110": { + "content": "<|reserved_special_token_105|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128111": { + "content": "<|reserved_special_token_106|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128112": { + "content": "<|reserved_special_token_107|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128113": { + "content": "<|reserved_special_token_108|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128114": { + "content": "<|reserved_special_token_109|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128115": { + "content": "<|reserved_special_token_110|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128116": { + "content": "<|reserved_special_token_111|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128117": { + "content": 
"<|reserved_special_token_112|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128118": { + "content": "<|reserved_special_token_113|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128119": { + "content": "<|reserved_special_token_114|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128120": { + "content": "<|reserved_special_token_115|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128121": { + "content": "<|reserved_special_token_116|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128122": { + "content": "<|reserved_special_token_117|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128123": { + "content": "<|reserved_special_token_118|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128124": { + "content": "<|reserved_special_token_119|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128125": { + "content": "<|reserved_special_token_120|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128126": { + "content": "<|reserved_special_token_121|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128127": { + "content": "<|reserved_special_token_122|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128128": { + "content": "<|reserved_special_token_123|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + 
}, + "128129": { + "content": "<|reserved_special_token_124|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128130": { + "content": "<|reserved_special_token_125|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128131": { + "content": "<|reserved_special_token_126|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128132": { + "content": "<|reserved_special_token_127|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128133": { + "content": "<|reserved_special_token_128|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128134": { + "content": "<|reserved_special_token_129|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128135": { + "content": "<|reserved_special_token_130|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128136": { + "content": "<|reserved_special_token_131|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128137": { + "content": "<|reserved_special_token_132|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128138": { + "content": "<|reserved_special_token_133|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128139": { + "content": "<|reserved_special_token_134|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128140": { + "content": "<|reserved_special_token_135|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + 
"single_word": false, + "special": true + }, + "128141": { + "content": "<|reserved_special_token_136|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128142": { + "content": "<|reserved_special_token_137|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128143": { + "content": "<|reserved_special_token_138|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128144": { + "content": "<|reserved_special_token_139|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128145": { + "content": "<|reserved_special_token_140|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128146": { + "content": "<|reserved_special_token_141|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128147": { + "content": "<|reserved_special_token_142|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128148": { + "content": "<|reserved_special_token_143|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128149": { + "content": "<|reserved_special_token_144|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128150": { + "content": "<|reserved_special_token_145|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128151": { + "content": "<|reserved_special_token_146|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128152": { + "content": "<|reserved_special_token_147|>", + "lstrip": false, + "normalized": 
false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128153": { + "content": "<|reserved_special_token_148|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128154": { + "content": "<|reserved_special_token_149|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128155": { + "content": "<|reserved_special_token_150|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128156": { + "content": "<|reserved_special_token_151|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128157": { + "content": "<|reserved_special_token_152|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128158": { + "content": "<|reserved_special_token_153|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128159": { + "content": "<|reserved_special_token_154|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128160": { + "content": "<|reserved_special_token_155|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128161": { + "content": "<|reserved_special_token_156|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128162": { + "content": "<|reserved_special_token_157|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128163": { + "content": "<|reserved_special_token_158|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128164": { + "content": "<|reserved_special_token_159|>", + 
"lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128165": { + "content": "<|reserved_special_token_160|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128166": { + "content": "<|reserved_special_token_161|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128167": { + "content": "<|reserved_special_token_162|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128168": { + "content": "<|reserved_special_token_163|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128169": { + "content": "<|reserved_special_token_164|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128170": { + "content": "<|reserved_special_token_165|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128171": { + "content": "<|reserved_special_token_166|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128172": { + "content": "<|reserved_special_token_167|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128173": { + "content": "<|reserved_special_token_168|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128174": { + "content": "<|reserved_special_token_169|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128175": { + "content": "<|reserved_special_token_170|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128176": { + "content": 
"<|reserved_special_token_171|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128177": { + "content": "<|reserved_special_token_172|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128178": { + "content": "<|reserved_special_token_173|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128179": { + "content": "<|reserved_special_token_174|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128180": { + "content": "<|reserved_special_token_175|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128181": { + "content": "<|reserved_special_token_176|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128182": { + "content": "<|reserved_special_token_177|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128183": { + "content": "<|reserved_special_token_178|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128184": { + "content": "<|reserved_special_token_179|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128185": { + "content": "<|reserved_special_token_180|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128186": { + "content": "<|reserved_special_token_181|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128187": { + "content": "<|reserved_special_token_182|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + 
}, + "128188": { + "content": "<|reserved_special_token_183|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128189": { + "content": "<|reserved_special_token_184|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128190": { + "content": "<|reserved_special_token_185|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128191": { + "content": "<|reserved_special_token_186|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128192": { + "content": "<|reserved_special_token_187|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128193": { + "content": "<|reserved_special_token_188|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128194": { + "content": "<|reserved_special_token_189|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128195": { + "content": "<|reserved_special_token_190|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128196": { + "content": "<|reserved_special_token_191|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128197": { + "content": "<|reserved_special_token_192|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128198": { + "content": "<|reserved_special_token_193|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128199": { + "content": "<|reserved_special_token_194|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + 
"single_word": false, + "special": true + }, + "128200": { + "content": "<|reserved_special_token_195|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128201": { + "content": "<|reserved_special_token_196|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128202": { + "content": "<|reserved_special_token_197|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128203": { + "content": "<|reserved_special_token_198|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128204": { + "content": "<|reserved_special_token_199|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128205": { + "content": "<|reserved_special_token_200|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128206": { + "content": "<|reserved_special_token_201|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128207": { + "content": "<|reserved_special_token_202|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128208": { + "content": "<|reserved_special_token_203|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128209": { + "content": "<|reserved_special_token_204|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128210": { + "content": "<|reserved_special_token_205|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128211": { + "content": "<|reserved_special_token_206|>", + "lstrip": false, + "normalized": 
false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128212": { + "content": "<|reserved_special_token_207|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128213": { + "content": "<|reserved_special_token_208|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128214": { + "content": "<|reserved_special_token_209|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128215": { + "content": "<|reserved_special_token_210|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128216": { + "content": "<|reserved_special_token_211|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128217": { + "content": "<|reserved_special_token_212|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128218": { + "content": "<|reserved_special_token_213|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128219": { + "content": "<|reserved_special_token_214|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128220": { + "content": "<|reserved_special_token_215|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128221": { + "content": "<|reserved_special_token_216|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128222": { + "content": "<|reserved_special_token_217|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128223": { + "content": "<|reserved_special_token_218|>", + 
"lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128224": { + "content": "<|reserved_special_token_219|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128225": { + "content": "<|reserved_special_token_220|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128226": { + "content": "<|reserved_special_token_221|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128227": { + "content": "<|reserved_special_token_222|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128228": { + "content": "<|reserved_special_token_223|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128229": { + "content": "<|reserved_special_token_224|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128230": { + "content": "<|reserved_special_token_225|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128231": { + "content": "<|reserved_special_token_226|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128232": { + "content": "<|reserved_special_token_227|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128233": { + "content": "<|reserved_special_token_228|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128234": { + "content": "<|reserved_special_token_229|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128235": { + "content": 
"<|reserved_special_token_230|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128236": { + "content": "<|reserved_special_token_231|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128237": { + "content": "<|reserved_special_token_232|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128238": { + "content": "<|reserved_special_token_233|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128239": { + "content": "<|reserved_special_token_234|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128240": { + "content": "<|reserved_special_token_235|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128241": { + "content": "<|reserved_special_token_236|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128242": { + "content": "<|reserved_special_token_237|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128243": { + "content": "<|reserved_special_token_238|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128244": { + "content": "<|reserved_special_token_239|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128245": { + "content": "<|reserved_special_token_240|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128246": { + "content": "<|reserved_special_token_241|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + 
}, + "128247": { + "content": "<|reserved_special_token_242|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128248": { + "content": "<|reserved_special_token_243|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128249": { + "content": "<|reserved_special_token_244|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128250": { + "content": "<|reserved_special_token_245|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128251": { + "content": "<|reserved_special_token_246|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128252": { + "content": "<|reserved_special_token_247|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128253": { + "content": "<|reserved_special_token_248|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128254": { + "content": "<|reserved_special_token_249|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "128255": { + "content": "<|reserved_special_token_250|>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + } + }, + "bos_token": "<|begin_of_text|>", + "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}", + 
"clean_up_tokenization_spaces": true, + "eos_token": "<|end_of_text|>", + "extra_special_tokens": {}, + "model_input_names": [ + "input_ids", + "attention_mask" + ], + "model_max_length": 2048, + "pad_token": "<|end_of_text|>", + "tokenizer_class": "PreTrainedTokenizer" +} diff --git a/train.log b/train.log new file mode 100644 index 0000000..007c59c --- /dev/null +++ b/train.log @@ -0,0 +1,1793 @@ +2026-04-17 23:08:16 - INFO - __main__ - Model parameters ModelArguments(base_model_revision=None, model_name_or_path='/scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101', model_revision='main', model_code_revision=None, torch_dtype='bfloat16', tokenizer_name_or_path=None, trust_remote_code=False, attn_implementation='flash_attention_2', use_peft=False, lora_r=16, lora_alpha=32, lora_dropout=0.05, lora_target_modules=None, lora_modules_to_save=None, load_in_8bit=False, load_in_4bit=False, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False, bnb_4bit_quant_storage='uint8') +2026-04-17 23:08:16 - INFO - __main__ - Data parameters DataArguments(chat_template=None, dataset_mixer={'Anthropic/hh-rlhf': 1.0}, text_column='text', dataset_splits=['train', 'test'], dataset_configs=['helpful-base'], dataset_dir=None, preprocessing_num_workers=12, use_persistent_hf_cache=True, hf_cache_dir='/scratch/feng.yulu/dynamic-dpo-v4/hf/datasets', truncation_side=None, auto_insert_empty_system_msg=True, preprocessing_log_samples=0, preprocessing_log_dir=None) +2026-04-17 23:08:16 - INFO - __main__ - Training/evaluation parameters BetaDPOConfig( +_n_gpu=1, +accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, +adafactor=False, +adam_beta1=0.9, +adam_beta2=0.999, +adam_epsilon=1e-08, +alpha=0.6, +auto_find_batch_size=False, +average_tokens_across_devices=False, 
+batch_eval_metrics=False, +beta=0.1, +beta_min=0.001, +bf16=True, +bf16_full_eval=False, +data_seed=None, +dataloader_drop_last=True, +dataloader_num_workers=0, +dataloader_persistent_workers=False, +dataloader_pin_memory=True, +dataloader_prefetch_factor=None, +dataset_num_proc=12, +ddp_backend=None, +ddp_broadcast_buffers=None, +ddp_bucket_cap_mb=None, +ddp_find_unused_parameters=None, +ddp_timeout=1800, +debug=[], +deepspeed=None, +deterministic_eval=True, +disable_dropout=True, +disable_tqdm=False, +do_eval=True, +do_predict=False, +do_train=False, +ema_momentum=0.9, +eval_accumulation_steps=None, +eval_delay=0, +eval_do_concat_batches=True, +eval_on_start=False, +eval_steps=100, +eval_strategy=IntervalStrategy.STEPS, +eval_use_gather_object=False, +f_alpha_divergence_coef=1.0, +f_divergence_type=FDivergenceType.REVERSE_KL, +force_use_ref_model=False, +fp16=False, +fp16_backend=auto, +fp16_full_eval=False, +fp16_opt_level=O1, +fsdp=[], +fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, +fsdp_min_num_params=0, +fsdp_transformer_layer_cls_to_wrap=None, +full_determinism=False, +generate_during_eval=False, +gradient_accumulation_steps=2, +gradient_checkpointing=True, +gradient_checkpointing_kwargs={'use_reentrant': False}, +greater_is_better=None, +group_by_length=False, +half_precision_backend=auto, +hub_always_push=False, +hub_model_id=W-61/llama-3-8b-base-beta-dpo-hh-helpful-4xh200, +hub_model_revision=main, +hub_private_repo=None, +hub_strategy=HubStrategy.EVERY_SAVE, +hub_token=, +ignore_data_skip=False, +include_for_metrics=[], +include_inputs_for_metrics=False, +include_num_input_tokens_seen=False, +include_tokens_per_second=False, +is_encoder_decoder=None, +jit_mode_eval=False, +label_names=None, +label_pad_token_id=-100, +label_smoothing=0.0, +label_smoothing_factor=0.0, +learning_rate=5e-07, +length_column_name=length, +load_best_model_at_end=False, +local_rank=0, +log_level=info, 
+log_level_replica=warning, +log_on_each_node=True, +logging_dir=outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200/runs/Apr17_23-08-16_d4054, +logging_first_step=True, +logging_nan_inf_filter=True, +logging_steps=1, +logging_strategy=IntervalStrategy.STEPS, +loss_type=sigmoid, +lr_scheduler_kwargs={}, +lr_scheduler_type=SchedulerType.COSINE, +max_grad_norm=1.0, +max_length=512, +max_prompt_length=256, +max_steps=-1, +max_target_length=None, +metric_for_best_model=None, +model_adapter_name=None, +model_init_kwargs=None, +mp_parameters=, +neftune_noise_alpha=None, +no_cuda=False, +non_finite_logits_handling=sanitize, +num_train_epochs=1, +optim=OptimizerNames.ADAMW_TORCH, +optim_args=None, +optim_target_modules=None, +output_dir=/scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753, +overwrite_output_dir=False, +padding_value=None, +past_index=-1, +per_device_eval_batch_size=8, +per_device_train_batch_size=8, +post_tokenization_log_dir=None, +post_tokenization_log_samples=0, +precompute_ref_batch_size=None, +precompute_ref_eval_batch_size=None, +precompute_ref_log_probs=False, +prediction_loss_only=False, +push_to_hub=False, +push_to_hub_model_id=None, +push_to_hub_organization=None, +push_to_hub_token=, +ray_scope=last, +ref_adapter_name=None, +ref_model_init_kwargs=None, +ref_model_mixup_alpha=0.9, +ref_model_sync_steps=64, +reference_free=False, +remove_unused_columns=False, +report_to=['wandb'], +require_equal_local_batch_size=True, +restore_callback_states_from_checkpoint=False, +resume_from_checkpoint=None, +reuse_tokenized_dataset=True, +rho=0.8, +rpo_alpha=None, +run_name=llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753, +save_on_each_node=False, +save_only_model=False, +save_safetensors=True, +save_steps=200, +save_strategy=SaveStrategy.STEPS, +save_total_limit=2, +seed=42, +sft_weight=0.0, +skip_memory_metrics=True, +sync_global_mask=True, +sync_ref_model=False, +tf32=None, 
+tokenization_batch_size=128, +tokenization_mode=online, +tokenized_dataset_cache_dir=/scratch/feng.yulu/dynamic-dpo-v4/tokenized_preferences, +torch_compile=False, +torch_compile_backend=None, +torch_compile_mode=None, +torch_empty_cache_steps=None, +torchdynamo=None, +tp_size=0, +tpu_metrics_debug=False, +tpu_num_cores=None, +trainer_type=beta_dpo, +truncation_mode=keep_end, +use_cpu=False, +use_ipex=False, +use_legacy_prediction_loop=False, +use_liger_kernel=False, +use_mps_device=False, +wandb_project=ood-run-4xh200, +warmup_ratio=0.1, +warmup_steps=0, +weight_decay=0.0, +) +2026-04-17 23:08:16 - INFO - __main__ - W&B project: ood-run-4xh200 +2026-04-17 23:08:16 - INFO - __main__ - Beta-DPO parameters: beta=0.1, rho=0.8, alpha=0.6, ema_momentum=0.9 +2026-04-17 23:08:16 - INFO - __main__ - Using persistent HF datasets cache at /scratch/feng.yulu/dynamic-dpo-v4/hf/datasets +2026-04-17 23:08:19 - WARNING - __main__ - Dropped 237 non-canonical HH preference examples from split `train` before normalization (126 x HH preprocessing expects exactly one final assistant response in chosen/rejected suffixes., 111 x HH chosen/rejected transcripts must each contain a divergent assistant response.). 
+ Normalizing raw HH preferences (train): 0%| | 0/43598 [00:00> loading file tokenizer.json +[INFO|tokenization_utils_base.py:2058] 2026-04-17 23:08:24,385 >> loading file tokenizer.model +[INFO|tokenization_utils_base.py:2058] 2026-04-17 23:08:24,385 >> loading file added_tokens.json +[INFO|tokenization_utils_base.py:2058] 2026-04-17 23:08:24,385 >> loading file special_tokens_map.json +[INFO|tokenization_utils_base.py:2058] 2026-04-17 23:08:24,385 >> loading file tokenizer_config.json +[INFO|tokenization_utils_base.py:2058] 2026-04-17 23:08:24,385 >> loading file chat_template.jinja + Normalizing raw HH preferences (test): 43%|███████████▉ | 1000/2339 [00:00<00:00, 9772.97 examples/s] Normalizing raw HH preferences (test): 85%|███████████████████████▊ | 1986/2339 [00:00<00:00, 6171.60 examples/s][INFO|tokenization_utils_base.py:2323] 2026-04-17 23:08:24,776 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. + Normalizing raw HH preferences (test): 100%|████████████████████████████| 2339/2339 [00:00<00:00, 5391.68 examples/s] +2026-04-17 23:08:25 - INFO - __main__ - Processed train sample 41905: + +Prompt: +<|begin_of_text|><|start_header_id|>user<|end_header_id|> + +What is the best way to clean my refrigerator?<|eot_id|><|start_header_id|>assistant<|end_header_id|> + +Well, I have a few ideas, but the best way to know what works best is to run some tests and then analyze the data to figure out which technique works best.<|eot_id|><|start_header_id|>user<|end_header_id|> + +What are your ideas?<|eot_id|> + +Chosen: +<|start_header_id|>assistant<|end_header_id|> + +Some of my ideas include wiping down the interior with a rag, using bleach and water to wipe down the door, emptying the contents of the fridge to scrub the shelves with a clean cloth, cleaning the interior of the fridge with vinegar and water, and using a vacuum cleaner to clean the interior.<|eot_id|> + +Rejected: 
+<|start_header_id|>assistant<|end_header_id|> + +I was thinking that you could try different types of soap, which could be a variation of “soap,” like antibacterial soap. I was also thinking that you could try using different tools, like a toothbrush, to clean the inside. Or I was thinking that you could try different ways to get the gunk out, like using a squeegee, or using a paper towel. I was also thinking you could try other types of cleaning, like vacuuming, but I think that could have the opposite of the desired effect.<|eot_id|> +/home/feng.yulu/dynamic-dpo-v4/scripts/tokenized_dpo_trainer.py:391: UserWarning: You passed a model_id to the trainer. This will automatically create an `AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you. + warnings.warn( +/home/feng.yulu/dynamic-dpo-v4/scripts/tokenized_dpo_trainer.py:391: UserWarning: You passed a model_id to the trainer. This will automatically create an `AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you. 
+ warnings.warn( +[INFO|configuration_utils.py:691] 2026-04-17 23:08:25,111 >> loading configuration file /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/config.json +[INFO|configuration_utils.py:765] 2026-04-17 23:08:25,123 >> Model config LlamaConfig { + "architectures": [ + "LlamaForCausalLM" + ], + "attention_bias": false, + "attention_dropout": 0.0, + "bos_token_id": 128000, + "eos_token_id": 128001, + "head_dim": 128, + "hidden_act": "silu", + "hidden_size": 4096, + "initializer_range": 0.02, + "intermediate_size": 14336, + "max_position_embeddings": 8192, + "mlp_bias": false, + "model_type": "llama", + "num_attention_heads": 32, + "num_hidden_layers": 32, + "num_key_value_heads": 8, + "pretraining_tp": 1, + "rms_norm_eps": 1e-05, + "rope_scaling": null, + "rope_theta": 500000.0, + "tie_word_embeddings": false, + "torch_dtype": "bfloat16", + "transformers_version": "4.51.0", + "use_cache": false, + "vocab_size": 128256 +} + +[INFO|modeling_utils.py:1121] 2026-04-17 23:08:25,330 >> loading weights file /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/model.safetensors.index.json +[INFO|modeling_utils.py:2167] 2026-04-17 23:08:25,331 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16. +[WARNING|logging.py:328] 2026-04-17 23:08:25,333 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. +[WARNING|logging.py:328] 2026-04-17 23:08:25,333 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 
+[INFO|configuration_utils.py:1142] 2026-04-17 23:08:25,334 >> Generate config GenerationConfig { + "bos_token_id": 128000, + "eos_token_id": 128001, + "use_cache": false +} + + Loading checkpoint shards: 0%| | 0/7 [00:00> Trainer.tokenizer is now deprecated. You should use `Trainer.processing_class = processing_class` instead. +/home/feng.yulu/dynamic-dpo-v4/scripts/tokenized_dpo_trainer.py:391: UserWarning: You passed a model_id to the trainer. This will automatically create an `AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you. + warnings.warn( +[WARNING|logging.py:328] 2026-04-17 23:08:25,667 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. + Loading checkpoint shards: 0%| | 0/7 [00:00> Trainer.tokenizer is now deprecated. You should use `Trainer.processing_class = processing_class` instead. + Loading checkpoint shards: 14%|███████▊ | 1/7 [00:01<00:11, 1.84s/it] Loading checkpoint shards: 29%|███████████████▋ | 2/7 [00:06<00:17, 3.41s/it] Loading checkpoint shards: 43%|███████████████████████▌ | 3/7 [00:08<00:11, 2.77s/it] Loading checkpoint shards: 57%|███████████████████████████████▍ | 4/7 [00:10<00:07, 2.56s/it] Loading checkpoint shards: 71%|███████████████████████████████████████▎ | 5/7 [00:12<00:04, 2.42s/it] Loading checkpoint shards: 86%|███████████████████████████████████████████████▏ | 6/7 [00:14<00:02, 2.30s/it] Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 7/7 [00:15<00:00, 1.89s/it] Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 7/7 [00:15<00:00, 2.27s/it] +[INFO|modeling_utils.py:4926] 2026-04-17 23:08:41,252 >> All model checkpoint weights were used when initializing LlamaForCausalLM. 
+ +[INFO|modeling_utils.py:4934] 2026-04-17 23:08:41,252 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101. +If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training. +[INFO|configuration_utils.py:1095] 2026-04-17 23:08:41,255 >> loading configuration file /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/generation_config.json +[INFO|configuration_utils.py:1142] 2026-04-17 23:08:41,255 >> Generate config GenerationConfig { + "bos_token_id": 128000, + "do_sample": true, + "eos_token_id": 128001, + "max_length": 4096, + "temperature": 0.6, + "top_p": 0.9 +} + +[INFO|configuration_utils.py:691] 2026-04-17 23:08:41,256 >> loading configuration file /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/config.json +[INFO|configuration_utils.py:765] 2026-04-17 23:08:41,257 >> Model config LlamaConfig { + "architectures": [ + "LlamaForCausalLM" + ], + "attention_bias": false, + "attention_dropout": 0.0, + "bos_token_id": 128000, + "eos_token_id": 128001, + "head_dim": 128, + "hidden_act": "silu", + "hidden_size": 4096, + "initializer_range": 0.02, + "intermediate_size": 14336, + "max_position_embeddings": 8192, + "mlp_bias": false, + "model_type": "llama", + "num_attention_heads": 32, + "num_hidden_layers": 32, + "num_key_value_heads": 8, + "pretraining_tp": 1, + "rms_norm_eps": 1e-05, + "rope_scaling": null, + "rope_theta": 500000.0, + "tie_word_embeddings": false, + "torch_dtype": "bfloat16", + "transformers_version": "4.51.0", + "use_cache": false, + "vocab_size": 128256 +} + +[INFO|modeling_utils.py:1121] 2026-04-17 23:08:41,258 >> loading weights file 
/scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/model.safetensors.index.json +[INFO|modeling_utils.py:2167] 2026-04-17 23:08:41,259 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16. +[INFO|configuration_utils.py:1142] 2026-04-17 23:08:41,262 >> Generate config GenerationConfig { + "bos_token_id": 128000, + "eos_token_id": 128001, + "use_cache": false +} + + Loading checkpoint shards: 0%| | 0/7 [00:00> All model checkpoint weights were used when initializing LlamaForCausalLM. + +[INFO|modeling_utils.py:4934] 2026-04-17 23:08:54,510 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101. +If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training. +[INFO|configuration_utils.py:1095] 2026-04-17 23:08:54,512 >> loading configuration file /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-sft-hh-helpful-4xh200-batch-64-20260416-162101/generation_config.json +[INFO|configuration_utils.py:1142] 2026-04-17 23:08:54,513 >> Generate config GenerationConfig { + "bos_token_id": 128000, + "do_sample": true, + "eos_token_id": 128001, + "max_length": 4096, + "temperature": 0.6, + "top_p": 0.9 +} + +[WARNING|trainer.py:821] 2026-04-17 23:08:54,514 >> Trainer.tokenizer is now deprecated. You should use `Trainer.processing_class = processing_class` instead. +[WARNING|trainer.py:816] 2026-04-17 23:08:54,514 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. 
+ Normalizing raw HH preferences (train): 64%|███████████████▎ | 27929/43598 [00:02<00:01, 12244.36 examples/s] Normalizing raw HH preferences (train): 68%|████████████████▍ | 29785/43598 [00:02<00:01, 12282.94 examples/s] Normalizing raw HH preferences (train): 72%|█████████████████▍ | 31570/43598 [00:02<00:00, 12154.65 examples/s] Normalizing raw HH preferences (train): 75%|██████████████████ | 32835/43598 [00:02<00:00, 12269.71 examples/s] Normalizing raw HH preferences (train): 80%|███████████████████ | 34703/43598 [00:02<00:00, 12256.80 examples/s] Normalizing raw HH preferences (train): 82%|███████████████████▊ | 35934/43598 [00:03<00:00, 12268.15 examples/s] Normalizing raw HH preferences (train): 87%|████████████████████▊ | 37772/43598 [00:03<00:00, 12260.22 examples/s] Normalizing raw HH preferences (train): 91%|█████████████████████▊ | 39692/43598 [00:03<00:00, 12258.75 examples/s] Normalizing raw HH preferences (train): 94%|██████████████████████▌ | 40942/43598 [00:03<00:00, 12313.21 examples/s] Normalizing raw HH preferences (train): 98%|███████████████████████▌| 42785/43598 [00:03<00:00, 12301.37 examples/s] Normalizing raw HH preferences (train): 100%|████████████████████████| 43598/43598 [00:03<00:00, 11250.34 examples/s] + Normalizing raw HH preferences (test): 0%| | 0/2339 [00:00> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. + Loading checkpoint shards: 0%| | 0/7 [00:00> Trainer.tokenizer is now deprecated. You should use `Trainer.processing_class = processing_class` instead. + Tokenizing train (num_proc=12): 0%| | 0/43598 [00:00> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. + Saving the dataset (0/2 shards): 0%| | 0/43598 [00:00> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. 
+ Tokenizing test (num_proc=12): 0%| | 0/2339 [00:00> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. + Saving the dataset (0/1 shards): 0%| | 0/2339 [00:00> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. +[WARNING|trainer.py:816] 2026-04-17 23:23:14,282 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. +[WARNING|trainer.py:816] 2026-04-17 23:23:14,282 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. +[WARNING|trainer.py:816] 2026-04-17 23:23:14,475 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. +[WARNING|trainer.py:816] 2026-04-17 23:23:14,475 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. +[WARNING|trainer.py:816] 2026-04-17 23:23:14,475 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. +[WARNING|trainer.py:816] 2026-04-17 23:23:14,475 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. +[WARNING|trainer.py:816] 2026-04-17 23:23:14,476 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. +[WARNING|trainer.py:816] 2026-04-17 23:23:14,476 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. +[WARNING|trainer.py:816] 2026-04-17 23:23:14,501 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. +[WARNING|trainer.py:816] 2026-04-17 23:23:14,501 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. +[WARNING|trainer.py:816] 2026-04-17 23:23:14,501 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. +/home/feng.yulu/dynamic-dpo-v4/scripts/tokenized_dpo_trainer.py:521: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `BetaDPOTrainer.__init__`. 
Use `processing_class` instead. + super().__init__( +/home/feng.yulu/dynamic-dpo-v4/scripts/tokenized_dpo_trainer.py:521: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `BetaDPOTrainer.__init__`. Use `processing_class` instead. + super().__init__( +/home/feng.yulu/dynamic-dpo-v4/scripts/tokenized_dpo_trainer.py:521: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `BetaDPOTrainer.__init__`. Use `processing_class` instead. + super().__init__( +[INFO|trainer.py:748] 2026-04-17 23:23:14,673 >> Using auto half precision backend +/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/accelerate/accelerator.py:1557: UserWarning: Upcasted low precision parameters in LlamaForCausalLM because mixed precision turned on in FSDP. Affects: model.embed_tokens.weight, model.norm.weight, lm_head.weight. + warnings.warn( +/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/accelerate/accelerator.py:1557: UserWarning: Upcasted low precision parameters in LlamaDecoderLayer because mixed precision turned on in FSDP. Affects: self_attn.q_proj.weight, self_attn.k_proj.weight, self_attn.v_proj.weight, self_attn.o_proj.weight, mlp.gate_proj.weight, mlp.up_proj.weight, mlp.down_proj.weight, input_layernorm.weight, post_attention_layernorm.weight. + warnings.warn( +/home/feng.yulu/.conda/envs/dpo_venv/lib/python3.11/site-packages/accelerate/accelerator.py:1563: UserWarning: FSDP upcast of low precision parameters may affect the precision of model checkpoints. + warnings.warn( +[INFO|trainer.py:2414] 2026-04-17 23:23:23,819 >> ***** Running training ***** +[INFO|trainer.py:2415] 2026-04-17 23:23:23,819 >> Num examples = 43,598 +[INFO|trainer.py:2416] 2026-04-17 23:23:23,819 >> Num Epochs = 1 +[INFO|trainer.py:2417] 2026-04-17 23:23:23,819 >> Instantaneous batch size per device = 8 +[INFO|trainer.py:2420] 2026-04-17 23:23:23,819 >> Total train batch size (w. 
parallel, distributed & accumulation) = 64 +[INFO|trainer.py:2421] 2026-04-17 23:23:23,819 >> Gradient Accumulation steps = 2 +[INFO|trainer.py:2422] 2026-04-17 23:23:23,819 >> Total optimization steps = 681 +[INFO|trainer.py:2423] 2026-04-17 23:23:23,820 >> Number of trainable parameters = 2,007,565,312 +[INFO|integration_utils.py:831] 2026-04-17 23:23:23,821 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" +wandb: Currently logged in as: can-not-fand (can-not-fand-northeastern-university). Use `wandb login --relogin` to force relogin +wandb: wandb version 0.26.0 is available! To upgrade, please run: +wandb: $ pip install wandb --upgrade +wandb: Tracking run with wandb version 0.17.5 +wandb: Run data is saved locally in /scratch/feng.yulu/dynamic-dpo-v4/wandb/wandb/run-20260417_232327-zg7hpnnu +wandb: Run `wandb offline` to turn off syncing. +wandb: Syncing run llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753 +wandb: ⭐️ View project at https://wandb.ai/can-not-fand-northeastern-university/ood-run-4xh200 +wandb: 🚀 View run at https://wandb.ai/can-not-fand-northeastern-university/ood-run-4xh200/runs/zg7hpnnu + 0%| | 0/681 [00:00> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:1713] 2026-04-17 23:23:33,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:1713] 2026-04-17 23:23:33,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:1713] 2026-04-17 23:23:33,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 1/681 [00:02<30:46, 2.72s/it] {'loss': 1.3849, 'grad_norm': 83.69244384765625, 'learning_rate': 0.0, 'beta_dpo/gap_mean': -0.004527175799012184, 'beta_dpo/gap_std': 0.06229356676340103, 
'beta_dpo/beta_used_raw': 0.10115084052085876, 'beta_dpo/beta_used': 0.10115084052085876, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4974287748336792, 'logits/rejected': -0.43299180269241333, 'beta_dpo/beta': 0.10115084052085876, 'beta_dpo/loss_margin_mean': -0.02287048101425171, 'beta_dpo/beta_margin_mean': -0.002253394341096282, 'beta_dpo/beta_margin_std': 0.042461980134248734, 'beta_dpo/beta_margin_grad_mean': -0.5005621910095215, 'beta_dpo/beta_margin_grad_std': 0.010608955286443233, 'epoch': 0.0} + 0%| | 1/681 [00:02<30:46, 2.72s/it] 0%|▏ | 2/681 [00:05<32:04, 2.83s/it] {'loss': 1.389, 'grad_norm': 72.02227783203125, 'learning_rate': 7.246376811594203e-09, 'beta_dpo/gap_mean': -0.0141224917024374, 'beta_dpo/gap_std': 0.1194789782166481, 'beta_dpo/beta_used_raw': 0.09928660839796066, 'beta_dpo/beta_used': 0.09928660839796066, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4953641891479492, 'logits/rejected': -0.4594460129737854, 'beta_dpo/beta': 0.09928660839796066, 'beta_dpo/loss_margin_mean': -0.06572240591049194, 'beta_dpo/beta_margin_mean': -0.006530125625431538, 'beta_dpo/beta_margin_std': 0.034978773444890976, 'beta_dpo/beta_margin_grad_mean': -0.501632034778595, 'beta_dpo/beta_margin_grad_std': 0.008741416968405247, 'epoch': 0.0} + 0%|▏ | 2/681 [00:05<32:04, 2.83s/it] 0%|▎ | 3/681 [00:08<31:45, 2.81s/it] {'loss': 1.389, 'grad_norm': 67.19432067871094, 'learning_rate': 1.4492753623188406e-08, 'beta_dpo/gap_mean': -0.006174812093377113, 'beta_dpo/gap_std': 0.16936704516410828, 'beta_dpo/beta_used_raw': 0.09881577640771866, 'beta_dpo/beta_used': 0.09881577640771866, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.48158758878707886, 'logits/rejected': -0.4422696828842163, 'beta_dpo/beta': 0.09881577640771866, 'beta_dpo/loss_margin_mean': 0.04976421594619751, 'beta_dpo/beta_margin_mean': 0.00491556478664279, 'beta_dpo/beta_margin_std': 0.03592138737440109, 'beta_dpo/beta_margin_grad_mean': -0.49877238273620605, 
'beta_dpo/beta_margin_grad_std': 0.008976051583886147, 'epoch': 0.0} + 0%|▎ | 3/681 [00:08<31:45, 2.81s/it] 1%|▍ | 4/681 [00:11<31:56, 2.83s/it] {'loss': 1.3977, 'grad_norm': 67.43733215332031, 'learning_rate': 2.1739130434782606e-08, 'beta_dpo/gap_mean': -0.00973600521683693, 'beta_dpo/gap_std': 0.2109805941581726, 'beta_dpo/beta_used_raw': 0.09335151314735413, 'beta_dpo/beta_used': 0.09335151314735413, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.45274418592453003, 'logits/rejected': -0.42465052008628845, 'beta_dpo/beta': 0.09335151314735413, 'beta_dpo/loss_margin_mean': -0.04590195417404175, 'beta_dpo/beta_margin_mean': -0.004296026658266783, 'beta_dpo/beta_margin_std': 0.03754071146249771, 'beta_dpo/beta_margin_grad_mean': -0.5010735988616943, 'beta_dpo/beta_margin_grad_std': 0.009380017407238483, 'epoch': 0.01} + 1%|▍ | 4/681 [00:11<31:56, 2.83s/it] 1%|▌ | 5/681 [00:14<31:55, 2.83s/it] {'loss': 1.3858, 'grad_norm': 87.71318817138672, 'learning_rate': 2.898550724637681e-08, 'beta_dpo/gap_mean': -0.0020640306174755096, 'beta_dpo/gap_std': 0.2421741932630539, 'beta_dpo/beta_used_raw': 0.10049673914909363, 'beta_dpo/beta_used': 0.10049673914909363, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4956454932689667, 'logits/rejected': -0.4505915641784668, 'beta_dpo/beta': 0.10049673914909363, 'beta_dpo/loss_margin_mean': 0.05585688352584839, 'beta_dpo/beta_margin_mean': 0.005582462064921856, 'beta_dpo/beta_margin_std': 0.03796974569559097, 'beta_dpo/beta_margin_grad_mean': -0.4986048936843872, 'beta_dpo/beta_margin_grad_std': 0.009488900192081928, 'epoch': 0.01} + 1%|▌ | 5/681 [00:14<31:55, 2.83s/it] 1%|▋ | 6/681 [00:16<30:26, 2.71s/it] {'loss': 1.3854, 'grad_norm': 90.84674072265625, 'learning_rate': 3.6231884057971014e-08, 'beta_dpo/gap_mean': 0.0017710481770336628, 'beta_dpo/gap_std': 0.2680016756057739, 'beta_dpo/beta_used_raw': 0.10047884285449982, 'beta_dpo/beta_used': 0.10047884285449982, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': 
-0.5020167827606201, 'logits/rejected': -0.4594297409057617, 'beta_dpo/beta': 0.10047884285449982, 'beta_dpo/loss_margin_mean': -0.007976382970809937, 'beta_dpo/beta_margin_mean': -0.0008351176511496305, 'beta_dpo/beta_margin_std': 0.03574404865503311, 'beta_dpo/beta_margin_grad_mean': -0.500208854675293, 'beta_dpo/beta_margin_grad_std': 0.008933261968195438, 'epoch': 0.01} + 1%|▋ | 6/681 [00:16<30:26, 2.71s/it] 1%|▊ | 7/681 [00:19<29:57, 2.67s/it] {'loss': 1.3865, 'grad_norm': 83.6563491821289, 'learning_rate': 4.347826086956521e-08, 'beta_dpo/gap_mean': 6.500491872429848e-05, 'beta_dpo/gap_std': 0.2939686179161072, 'beta_dpo/beta_used_raw': 0.09998422861099243, 'beta_dpo/beta_used': 0.09998422861099243, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5061265826225281, 'logits/rejected': -0.4723086953163147, 'beta_dpo/beta': 0.09998422861099243, 'beta_dpo/loss_margin_mean': -0.009219467639923096, 'beta_dpo/beta_margin_mean': -0.0009349790052510798, 'beta_dpo/beta_margin_std': 0.04061206057667732, 'beta_dpo/beta_margin_grad_mean': -0.5002336502075195, 'beta_dpo/beta_margin_grad_std': 0.01014900952577591, 'epoch': 0.01} + 1%|▊ | 7/681 [00:19<29:57, 2.67s/it] 1%|▉ | 8/681 [00:21<29:27, 2.63s/it] {'loss': 1.3836, 'grad_norm': 77.50525665283203, 'learning_rate': 5.0724637681159424e-08, 'beta_dpo/gap_mean': -0.009944056160748005, 'beta_dpo/gap_std': 0.3154027462005615, 'beta_dpo/beta_used_raw': 0.1022939383983612, 'beta_dpo/beta_used': 0.1022939383983612, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5334175825119019, 'logits/rejected': -0.510188102722168, 'beta_dpo/beta': 0.1022939383983612, 'beta_dpo/loss_margin_mean': -0.061917901039123535, 'beta_dpo/beta_margin_mean': -0.006352751050144434, 'beta_dpo/beta_margin_std': 0.042014747858047485, 'beta_dpo/beta_margin_grad_mean': -0.5015852451324463, 'beta_dpo/beta_margin_grad_std': 0.010492443107068539, 'epoch': 0.01} + 1%|▉ | 8/681 [00:21<29:27, 2.63s/it] 1%|█ | 9/681 [00:24<29:49, 2.66s/it] {'loss': 
1.3895, 'grad_norm': 77.50155639648438, 'learning_rate': 5.797101449275362e-08, 'beta_dpo/gap_mean': -0.005505750421434641, 'beta_dpo/gap_std': 0.34114253520965576, 'beta_dpo/beta_used_raw': 0.09855471551418304, 'beta_dpo/beta_used': 0.09855471551418304, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.523200511932373, 'logits/rejected': -0.478301465511322, 'beta_dpo/beta': 0.09855471551418304, 'beta_dpo/loss_margin_mean': 0.02003002166748047, 'beta_dpo/beta_margin_mean': 0.00194238789845258, 'beta_dpo/beta_margin_std': 0.04742159694433212, 'beta_dpo/beta_margin_grad_mean': -0.49951478838920593, 'beta_dpo/beta_margin_grad_std': 0.011848426423966885, 'epoch': 0.01} + 1%|█ | 9/681 [00:24<29:49, 2.66s/it] 1%|█▏ | 10/681 [00:27<30:14, 2.70s/it] {'loss': 1.3878, 'grad_norm': 72.39192962646484, 'learning_rate': 6.521739130434782e-08, 'beta_dpo/gap_mean': -0.010290170088410378, 'beta_dpo/gap_std': 0.3536257743835449, 'beta_dpo/beta_used_raw': 0.0998501181602478, 'beta_dpo/beta_used': 0.0998501181602478, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.518991231918335, 'logits/rejected': -0.4768357276916504, 'beta_dpo/beta': 0.0998501181602478, 'beta_dpo/loss_margin_mean': -0.021320700645446777, 'beta_dpo/beta_margin_mean': -0.002129613421857357, 'beta_dpo/beta_margin_std': 0.04054965451359749, 'beta_dpo/beta_margin_grad_mean': -0.5005317330360413, 'beta_dpo/beta_margin_grad_std': 0.010131197981536388, 'epoch': 0.01} + 1%|█▏ | 10/681 [00:27<30:14, 2.70s/it] 2%|█▎ | 11/681 [00:30<30:46, 2.76s/it] {'loss': 1.3833, 'grad_norm': 66.96553802490234, 'learning_rate': 7.246376811594203e-08, 'beta_dpo/gap_mean': -0.004253363702446222, 'beta_dpo/gap_std': 0.35756930708885193, 'beta_dpo/beta_used_raw': 0.10206712037324905, 'beta_dpo/beta_used': 0.10206712037324905, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4826526641845703, 'logits/rejected': -0.4586416780948639, 'beta_dpo/beta': 0.10206712037324905, 'beta_dpo/loss_margin_mean': 0.03583630919456482, 
'beta_dpo/beta_margin_mean': 0.003652524435892701, 'beta_dpo/beta_margin_std': 0.03466520085930824, 'beta_dpo/beta_margin_grad_mean': -0.49908754229545593, 'beta_dpo/beta_margin_grad_std': 0.008663208223879337, 'epoch': 0.02} + 2%|█▎ | 11/681 [00:30<30:46, 2.76s/it] 2%|█▍ | 12/681 [00:32<30:23, 2.73s/it] {'loss': 1.392, 'grad_norm': 83.22624206542969, 'learning_rate': 7.971014492753623e-08, 'beta_dpo/gap_mean': -0.00683976337313652, 'beta_dpo/gap_std': 0.3720043897628784, 'beta_dpo/beta_used_raw': 0.09693565964698792, 'beta_dpo/beta_used': 0.09693565964698792, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.547247052192688, 'logits/rejected': -0.5113379955291748, 'beta_dpo/beta': 0.09693565964698792, 'beta_dpo/loss_margin_mean': -0.017470553517341614, 'beta_dpo/beta_margin_mean': -0.0017924468265846372, 'beta_dpo/beta_margin_std': 0.042050570249557495, 'beta_dpo/beta_margin_grad_mean': -0.500446617603302, 'beta_dpo/beta_margin_grad_std': 0.01050448976457119, 'epoch': 0.02} + 2%|█▍ | 12/681 [00:32<30:23, 2.73s/it] 2%|█▌ | 13/681 [00:35<30:47, 2.77s/it] {'loss': 1.3897, 'grad_norm': 82.04718017578125, 'learning_rate': 8.695652173913042e-08, 'beta_dpo/gap_mean': -0.006056391168385744, 'beta_dpo/gap_std': 0.3698127865791321, 'beta_dpo/beta_used_raw': 0.09837324917316437, 'beta_dpo/beta_used': 0.09837324917316437, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4824892282485962, 'logits/rejected': -0.45439815521240234, 'beta_dpo/beta': 0.09837324917316437, 'beta_dpo/loss_margin_mean': 0.002656310796737671, 'beta_dpo/beta_margin_mean': 0.00025006092619150877, 'beta_dpo/beta_margin_std': 0.03974674642086029, 'beta_dpo/beta_margin_grad_mean': -0.4999392330646515, 'beta_dpo/beta_margin_grad_std': 0.00992752518504858, 'epoch': 0.02} + 2%|█▌ | 13/681 [00:35<30:47, 2.77s/it] 2%|█▌ | 14/681 [00:38<30:15, 2.72s/it] {'loss': 1.3877, 'grad_norm': 89.19822692871094, 'learning_rate': 9.420289855072464e-08, 'beta_dpo/gap_mean': -0.0021513975225389004, 
'beta_dpo/gap_std': 0.37402260303497314, 'beta_dpo/beta_used_raw': 0.09926562756299973, 'beta_dpo/beta_used': 0.09926562756299973, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.47684037685394287, 'logits/rejected': -0.437483549118042, 'beta_dpo/beta': 0.09926562756299973, 'beta_dpo/loss_margin_mean': -0.01792725920677185, 'beta_dpo/beta_margin_mean': -0.001605634461157024, 'beta_dpo/beta_margin_std': 0.03615177050232887, 'beta_dpo/beta_margin_grad_mean': -0.5004010200500488, 'beta_dpo/beta_margin_grad_std': 0.009033882059156895, 'epoch': 0.02} + 2%|█▌ | 14/681 [00:38<30:15, 2.72s/it] 2%|█▋ | 15/681 [00:40<30:06, 2.71s/it] {'loss': 1.3806, 'grad_norm': 72.2989501953125, 'learning_rate': 1.0144927536231885e-07, 'beta_dpo/gap_mean': 0.0069586304016411304, 'beta_dpo/gap_std': 0.3670150637626648, 'beta_dpo/beta_used_raw': 0.1028667539358139, 'beta_dpo/beta_used': 0.1028667539358139, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4976166784763336, 'logits/rejected': -0.47907328605651855, 'beta_dpo/beta': 0.1028667539358139, 'beta_dpo/loss_margin_mean': 0.05616268515586853, 'beta_dpo/beta_margin_mean': 0.006086469162255526, 'beta_dpo/beta_margin_std': 0.03645266592502594, 'beta_dpo/beta_margin_grad_mean': -0.49847865104675293, 'beta_dpo/beta_margin_grad_std': 0.009109060280025005, 'epoch': 0.02} + 2%|█▋ | 15/681 [00:40<30:06, 2.71s/it] 2%|█▊ | 16/681 [00:43<29:40, 2.68s/it] {'loss': 1.3833, 'grad_norm': 85.27164459228516, 'learning_rate': 1.0869565217391303e-07, 'beta_dpo/gap_mean': 0.01056666485965252, 'beta_dpo/gap_std': 0.369087815284729, 'beta_dpo/beta_used_raw': 0.10129574686288834, 'beta_dpo/beta_used': 0.10129574686288834, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5442918539047241, 'logits/rejected': -0.5051777362823486, 'beta_dpo/beta': 0.10129574686288834, 'beta_dpo/loss_margin_mean': 0.04578801989555359, 'beta_dpo/beta_margin_mean': 0.004635946359485388, 'beta_dpo/beta_margin_std': 0.03721487522125244, 
'beta_dpo/beta_margin_grad_mean': -0.4988415837287903, 'beta_dpo/beta_margin_grad_std': 0.009300184436142445, 'epoch': 0.02} + 2%|█▊ | 16/681 [00:43<29:40, 2.68s/it] 2%|█▉ | 17/681 [00:46<29:22, 2.65s/it] {'loss': 1.3755, 'grad_norm': 80.40909576416016, 'learning_rate': 1.1594202898550725e-07, 'beta_dpo/gap_mean': 0.023403100669384003, 'beta_dpo/gap_std': 0.37113308906555176, 'beta_dpo/beta_used_raw': 0.10490189492702484, 'beta_dpo/beta_used': 0.10490189492702484, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4828060269355774, 'logits/rejected': -0.44346535205841064, 'beta_dpo/beta': 0.10490189492702484, 'beta_dpo/loss_margin_mean': 0.10013490915298462, 'beta_dpo/beta_margin_mean': 0.010502819903194904, 'beta_dpo/beta_margin_std': 0.039345428347587585, 'beta_dpo/beta_margin_grad_mean': -0.4973750412464142, 'beta_dpo/beta_margin_grad_std': 0.009830176830291748, 'epoch': 0.02} + 2%|█▉ | 17/681 [00:46<29:22, 2.65s/it] 3%|██ | 18/681 [00:48<28:59, 2.62s/it] {'loss': 1.3833, 'grad_norm': 82.2762680053711, 'learning_rate': 1.2318840579710146e-07, 'beta_dpo/gap_mean': 0.029124243184924126, 'beta_dpo/gap_std': 0.3635770082473755, 'beta_dpo/beta_used_raw': 0.1001388430595398, 'beta_dpo/beta_used': 0.1001388430595398, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5221867561340332, 'logits/rejected': -0.4699585437774658, 'beta_dpo/beta': 0.1001388430595398, 'beta_dpo/loss_margin_mean': 0.03153112530708313, 'beta_dpo/beta_margin_mean': 0.00319434585981071, 'beta_dpo/beta_margin_std': 0.03238019719719887, 'beta_dpo/beta_margin_grad_mean': -0.4992016553878784, 'beta_dpo/beta_margin_grad_std': 0.008092939853668213, 'epoch': 0.03} + 3%|██ | 18/681 [00:48<28:59, 2.62s/it] 3%|██▏ | 19/681 [00:51<28:54, 2.62s/it] {'loss': 1.3788, 'grad_norm': 67.32933807373047, 'learning_rate': 1.3043478260869563e-07, 'beta_dpo/gap_mean': 0.03644995018839836, 'beta_dpo/gap_std': 0.36511197686195374, 'beta_dpo/beta_used_raw': 0.10230091959238052, 'beta_dpo/beta_used': 
0.10230091959238052, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.49089670181274414, 'logits/rejected': -0.4410245716571808, 'beta_dpo/beta': 0.10230091959238052, 'beta_dpo/loss_margin_mean': 0.09297522902488708, 'beta_dpo/beta_margin_mean': 0.009549921378493309, 'beta_dpo/beta_margin_std': 0.03987620025873184, 'beta_dpo/beta_margin_grad_mean': -0.4976135194301605, 'beta_dpo/beta_margin_grad_std': 0.009962659329175949, 'epoch': 0.03} + 3%|██▏ | 19/681 [00:51<28:54, 2.62s/it] 3%|██▎ | 20/681 [00:53<28:51, 2.62s/it] {'loss': 1.3796, 'grad_norm': 77.79698944091797, 'learning_rate': 1.3768115942028986e-07, 'beta_dpo/gap_mean': 0.04330967366695404, 'beta_dpo/gap_std': 0.36020204424858093, 'beta_dpo/beta_used_raw': 0.10144417732954025, 'beta_dpo/beta_used': 0.10144417732954025, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5257374048233032, 'logits/rejected': -0.4667814075946808, 'beta_dpo/beta': 0.10144417732954025, 'beta_dpo/loss_margin_mean': 0.0418030321598053, 'beta_dpo/beta_margin_mean': 0.0042366455309093, 'beta_dpo/beta_margin_std': 0.031295765191316605, 'beta_dpo/beta_margin_grad_mean': -0.49894100427627563, 'beta_dpo/beta_margin_grad_std': 0.007821588777005672, 'epoch': 0.03} + 3%|██▎ | 20/681 [00:53<28:51, 2.62s/it] 3%|██▍ | 21/681 [00:56<28:36, 2.60s/it] {'loss': 1.3762, 'grad_norm': 84.59689331054688, 'learning_rate': 1.4492753623188405e-07, 'beta_dpo/gap_mean': 0.052578218281269073, 'beta_dpo/gap_std': 0.3585847020149231, 'beta_dpo/beta_used_raw': 0.10282687842845917, 'beta_dpo/beta_used': 0.10282687842845917, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5015609860420227, 'logits/rejected': -0.4782274663448334, 'beta_dpo/beta': 0.10282687842845917, 'beta_dpo/loss_margin_mean': 0.1178915798664093, 'beta_dpo/beta_margin_mean': 0.012177429161965847, 'beta_dpo/beta_margin_std': 0.04252319782972336, 'beta_dpo/beta_margin_grad_mean': -0.49695706367492676, 'beta_dpo/beta_margin_grad_std': 0.010617760010063648, 'epoch': 0.03} + 
3%|██▍ | 21/681 [00:56<28:36, 2.60s/it] 3%|██▌ | 22/681 [00:59<28:59, 2.64s/it] {'loss': 1.375, 'grad_norm': 82.02935028076172, 'learning_rate': 1.5217391304347825e-07, 'beta_dpo/gap_mean': 0.07795767486095428, 'beta_dpo/gap_std': 0.37775668501853943, 'beta_dpo/beta_used_raw': 0.1021641194820404, 'beta_dpo/beta_used': 0.1021641194820404, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5076688528060913, 'logits/rejected': -0.46508467197418213, 'beta_dpo/beta': 0.1021641194820404, 'beta_dpo/loss_margin_mean': 0.2064528465270996, 'beta_dpo/beta_margin_mean': 0.021053766831755638, 'beta_dpo/beta_margin_std': 0.04432320222258568, 'beta_dpo/beta_margin_grad_mean': -0.494739294052124, 'beta_dpo/beta_margin_grad_std': 0.011074875481426716, 'epoch': 0.03} + 3%|██▌ | 22/681 [00:59<28:59, 2.64s/it] 3%|██▋ | 23/681 [01:02<30:07, 2.75s/it] {'loss': 1.3708, 'grad_norm': 76.44645690917969, 'learning_rate': 1.5942028985507245e-07, 'beta_dpo/gap_mean': 0.10390491783618927, 'beta_dpo/gap_std': 0.3772027790546417, 'beta_dpo/beta_used_raw': 0.10281073302030563, 'beta_dpo/beta_used': 0.10281073302030563, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5184653997421265, 'logits/rejected': -0.4976601004600525, 'beta_dpo/beta': 0.10281073302030563, 'beta_dpo/loss_margin_mean': 0.2033129334449768, 'beta_dpo/beta_margin_mean': 0.02100636623799801, 'beta_dpo/beta_margin_std': 0.03903375566005707, 'beta_dpo/beta_margin_grad_mean': -0.4947512447834015, 'beta_dpo/beta_margin_grad_std': 0.009751598350703716, 'epoch': 0.03} + 3%|██▋ | 23/681 [01:02<30:07, 2.75s/it] 4%|██▊ | 24/681 [01:04<29:58, 2.74s/it] {'loss': 1.3656, 'grad_norm': 94.25565338134766, 'learning_rate': 1.6666666666666665e-07, 'beta_dpo/gap_mean': 0.12391284108161926, 'beta_dpo/gap_std': 0.37767690420150757, 'beta_dpo/beta_used_raw': 0.10454396903514862, 'beta_dpo/beta_used': 0.10454396903514862, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5424538254737854, 'logits/rejected': -0.5254075527191162, 
'beta_dpo/beta': 0.10454396903514862, 'beta_dpo/loss_margin_mean': 0.2502744197845459, 'beta_dpo/beta_margin_mean': 0.026394186541438103, 'beta_dpo/beta_margin_std': 0.04219713807106018, 'beta_dpo/beta_margin_grad_mean': -0.4934062063694, 'beta_dpo/beta_margin_grad_std': 0.010538320057094097, 'epoch': 0.04} + 4%|██▊ | 24/681 [01:04<29:58, 2.74s/it] 4%|██▉ | 25/681 [01:07<30:00, 2.74s/it] {'loss': 1.37, 'grad_norm': 75.07634735107422, 'learning_rate': 1.7391304347826085e-07, 'beta_dpo/gap_mean': 0.14912059903144836, 'beta_dpo/gap_std': 0.3832852840423584, 'beta_dpo/beta_used_raw': 0.100839763879776, 'beta_dpo/beta_used': 0.100839763879776, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4918757677078247, 'logits/rejected': -0.46183332800865173, 'beta_dpo/beta': 0.100839763879776, 'beta_dpo/loss_margin_mean': 0.22906917333602905, 'beta_dpo/beta_margin_mean': 0.02309180237352848, 'beta_dpo/beta_margin_std': 0.03954963758587837, 'beta_dpo/beta_margin_grad_mean': -0.4942309856414795, 'beta_dpo/beta_margin_grad_std': 0.009877659380435944, 'epoch': 0.04} + 4%|██▉ | 25/681 [01:07<30:00, 2.74s/it] 4%|███ | 26/681 [01:10<28:35, 2.62s/it] {'loss': 1.3654, 'grad_norm': 78.68896484375, 'learning_rate': 1.8115942028985507e-07, 'beta_dpo/gap_mean': 0.1847640573978424, 'beta_dpo/gap_std': 0.4011450409889221, 'beta_dpo/beta_used_raw': 0.10145638883113861, 'beta_dpo/beta_used': 0.10145638883113861, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5504664182662964, 'logits/rejected': -0.5192441344261169, 'beta_dpo/beta': 0.10145638883113861, 'beta_dpo/loss_margin_mean': 0.3683029115200043, 'beta_dpo/beta_margin_mean': 0.037368275225162506, 'beta_dpo/beta_margin_std': 0.050109487026929855, 'beta_dpo/beta_margin_grad_mean': -0.4906671941280365, 'beta_dpo/beta_margin_grad_std': 0.012507390230894089, 'epoch': 0.04} + 4%|███ | 26/681 [01:10<28:35, 2.62s/it] 4%|███▏ | 27/681 [01:12<28:12, 2.59s/it] {'loss': 1.3563, 'grad_norm': 87.7347183227539, 'learning_rate': 
1.8840579710144927e-07, 'beta_dpo/gap_mean': 0.23974978923797607, 'beta_dpo/gap_std': 0.42792779207229614, 'beta_dpo/beta_used_raw': 0.10302956402301788, 'beta_dpo/beta_used': 0.10302956402301788, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5187373161315918, 'logits/rejected': -0.4824272394180298, 'beta_dpo/beta': 0.10302956402301788, 'beta_dpo/loss_margin_mean': 0.47268885374069214, 'beta_dpo/beta_margin_mean': 0.049370817840099335, 'beta_dpo/beta_margin_std': 0.057142678648233414, 'beta_dpo/beta_margin_grad_mean': -0.4876747727394104, 'beta_dpo/beta_margin_grad_std': 0.01424187608063221, 'epoch': 0.04} + 4%|███▏ | 27/681 [01:12<28:12, 2.59s/it] 4%|███▏ | 28/681 [01:15<28:22, 2.61s/it] {'loss': 1.3579, 'grad_norm': 75.64714050292969, 'learning_rate': 1.9565217391304347e-07, 'beta_dpo/gap_mean': 0.2491932511329651, 'beta_dpo/gap_std': 0.4498485326766968, 'beta_dpo/beta_used_raw': 0.102115698158741, 'beta_dpo/beta_used': 0.102115698158741, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5084043741226196, 'logits/rejected': -0.4534956216812134, 'beta_dpo/beta': 0.102115698158741, 'beta_dpo/loss_margin_mean': 0.295854777097702, 'beta_dpo/beta_margin_mean': 0.03022361919283867, 'beta_dpo/beta_margin_std': 0.056595128029584885, 'beta_dpo/beta_margin_grad_mean': -0.49245062470436096, 'beta_dpo/beta_margin_grad_std': 0.014135321602225304, 'epoch': 0.04} + 4%|███▏ | 28/681 [01:15<28:22, 2.61s/it] 4%|███▎ | 29/681 [01:17<27:12, 2.50s/it] {'loss': 1.346, 'grad_norm': 94.25686645507812, 'learning_rate': 2.028985507246377e-07, 'beta_dpo/gap_mean': 0.29277026653289795, 'beta_dpo/gap_std': 0.47807806730270386, 'beta_dpo/beta_used_raw': 0.10585251450538635, 'beta_dpo/beta_used': 0.10585251450538635, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5720121264457703, 'logits/rejected': -0.5272256731987, 'beta_dpo/beta': 0.10585251450538635, 'beta_dpo/loss_margin_mean': 0.4953559637069702, 'beta_dpo/beta_margin_mean': 0.05249761790037155, 
'beta_dpo/beta_margin_std': 0.062127504497766495, 'beta_dpo/beta_margin_grad_mean': -0.4868943691253662, 'beta_dpo/beta_margin_grad_std': 0.015499315224587917, 'epoch': 0.04} + 4%|███▎ | 29/681 [01:17<27:12, 2.50s/it] 4%|███▍ | 30/681 [01:20<28:05, 2.59s/it] {'loss': 1.3372, 'grad_norm': 91.32884979248047, 'learning_rate': 2.1014492753623187e-07, 'beta_dpo/gap_mean': 0.3511636555194855, 'beta_dpo/gap_std': 0.5038948059082031, 'beta_dpo/beta_used_raw': 0.10716623067855835, 'beta_dpo/beta_used': 0.10716623067855835, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4944499731063843, 'logits/rejected': -0.4637511968612671, 'beta_dpo/beta': 0.10716623067855835, 'beta_dpo/loss_margin_mean': 0.6101883053779602, 'beta_dpo/beta_margin_mean': 0.06553145498037338, 'beta_dpo/beta_margin_std': 0.06532347202301025, 'beta_dpo/beta_margin_grad_mean': -0.48364534974098206, 'beta_dpo/beta_margin_grad_std': 0.016273001208901405, 'epoch': 0.04} + 4%|███▍ | 30/681 [01:20<28:05, 2.59s/it] 5%|███▌ | 31/681 [01:22<28:28, 2.63s/it] {'loss': 1.3554, 'grad_norm': 68.29032135009766, 'learning_rate': 2.1739130434782607e-07, 'beta_dpo/gap_mean': 0.36561119556427, 'beta_dpo/gap_std': 0.5108226537704468, 'beta_dpo/beta_used_raw': 0.09747521579265594, 'beta_dpo/beta_used': 0.09747521579265594, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5044275522232056, 'logits/rejected': -0.4597151279449463, 'beta_dpo/beta': 0.09747521579265594, 'beta_dpo/loss_margin_mean': 0.4201761782169342, 'beta_dpo/beta_margin_mean': 0.041009921580553055, 'beta_dpo/beta_margin_std': 0.05886054411530495, 'beta_dpo/beta_margin_grad_mean': -0.48976314067840576, 'beta_dpo/beta_margin_grad_std': 0.014673292636871338, 'epoch': 0.05} + 5%|███▌ | 31/681 [01:22<28:28, 2.63s/it] 5%|███▋ | 32/681 [01:25<29:03, 2.69s/it] {'loss': 1.338, 'grad_norm': 78.29996490478516, 'learning_rate': 2.2463768115942027e-07, 'beta_dpo/gap_mean': 0.4219781458377838, 'beta_dpo/gap_std': 0.56684410572052, 'beta_dpo/beta_used_raw': 
0.10314959287643433, 'beta_dpo/beta_used': 0.10314959287643433, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5184359550476074, 'logits/rejected': -0.4776637554168701, 'beta_dpo/beta': 0.10314959287643433, 'beta_dpo/loss_margin_mean': 0.7036821842193604, 'beta_dpo/beta_margin_mean': 0.07229103147983551, 'beta_dpo/beta_margin_std': 0.08329294621944427, 'beta_dpo/beta_margin_grad_mean': -0.4819798171520233, 'beta_dpo/beta_margin_grad_std': 0.020708369091153145, 'epoch': 0.05} + 5%|███▋ | 32/681 [01:25<29:03, 2.69s/it] 5%|███▊ | 33/681 [01:28<28:46, 2.66s/it] {'loss': 1.3384, 'grad_norm': 75.79508209228516, 'learning_rate': 2.318840579710145e-07, 'beta_dpo/gap_mean': 0.4387624263763428, 'beta_dpo/gap_std': 0.5823417901992798, 'beta_dpo/beta_used_raw': 0.10217119753360748, 'beta_dpo/beta_used': 0.10217119753360748, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.47731277346611023, 'logits/rejected': -0.4508548974990845, 'beta_dpo/beta': 0.10217119753360748, 'beta_dpo/loss_margin_mean': 0.5102719664573669, 'beta_dpo/beta_margin_mean': 0.05285169929265976, 'beta_dpo/beta_margin_std': 0.0644962415099144, 'beta_dpo/beta_margin_grad_mean': -0.48680615425109863, 'beta_dpo/beta_margin_grad_std': 0.016086775809526443, 'epoch': 0.05} + 5%|███▊ | 33/681 [01:28<28:46, 2.66s/it] 5%|███▉ | 34/681 [01:31<28:44, 2.67s/it] {'loss': 1.3401, 'grad_norm': 66.3543930053711, 'learning_rate': 2.391304347826087e-07, 'beta_dpo/gap_mean': 0.48840245604515076, 'beta_dpo/gap_std': 0.6152428388595581, 'beta_dpo/beta_used_raw': 0.09928236901760101, 'beta_dpo/beta_used': 0.09928236901760101, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5100936889648438, 'logits/rejected': -0.4925019145011902, 'beta_dpo/beta': 0.09928236901760101, 'beta_dpo/loss_margin_mean': 0.7295181751251221, 'beta_dpo/beta_margin_mean': 0.07247772812843323, 'beta_dpo/beta_margin_std': 0.07699740678071976, 'beta_dpo/beta_margin_grad_mean': -0.4819219708442688, 'beta_dpo/beta_margin_grad_std': 
0.01917845755815506, 'epoch': 0.05} + 5%|███▉ | 34/681 [01:31<28:44, 2.67s/it] 5%|████ | 35/681 [01:33<28:52, 2.68s/it] {'loss': 1.3114, 'grad_norm': 77.56873321533203, 'learning_rate': 2.463768115942029e-07, 'beta_dpo/gap_mean': 0.5772824883460999, 'beta_dpo/gap_std': 0.6622889637947083, 'beta_dpo/beta_used_raw': 0.10785353183746338, 'beta_dpo/beta_used': 0.10785353183746338, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5331075191497803, 'logits/rejected': -0.49618980288505554, 'beta_dpo/beta': 0.10785353183746338, 'beta_dpo/loss_margin_mean': 0.9983453750610352, 'beta_dpo/beta_margin_mean': 0.10804824531078339, 'beta_dpo/beta_margin_std': 0.10100562125444412, 'beta_dpo/beta_margin_grad_mean': -0.47311800718307495, 'beta_dpo/beta_margin_grad_std': 0.02489115111529827, 'epoch': 0.05} + 5%|████ | 35/681 [01:33<28:52, 2.68s/it] 5%|████▏ | 36/681 [01:36<28:52, 2.69s/it] {'loss': 1.3121, 'grad_norm': 73.26063537597656, 'learning_rate': 2.536231884057971e-07, 'beta_dpo/gap_mean': 0.6375015377998352, 'beta_dpo/gap_std': 0.7486386299133301, 'beta_dpo/beta_used_raw': 0.10545908659696579, 'beta_dpo/beta_used': 0.10545908659696579, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5830014944076538, 'logits/rejected': -0.5479526519775391, 'beta_dpo/beta': 0.10545908659696579, 'beta_dpo/loss_margin_mean': 0.9657546281814575, 'beta_dpo/beta_margin_mean': 0.10363934934139252, 'beta_dpo/beta_margin_std': 0.12403807044029236, 'beta_dpo/beta_margin_grad_mean': -0.4742385447025299, 'beta_dpo/beta_margin_grad_std': 0.030729172751307487, 'epoch': 0.05} + 5%|████▏ | 36/681 [01:36<28:52, 2.69s/it] 5%|████▎ | 37/681 [01:39<28:40, 2.67s/it] {'loss': 1.3286, 'grad_norm': 50.44397735595703, 'learning_rate': 2.6086956521739126e-07, 'beta_dpo/gap_mean': 0.7214508056640625, 'beta_dpo/gap_std': 0.8505280017852783, 'beta_dpo/beta_used_raw': 0.0942203551530838, 'beta_dpo/beta_used': 0.0942203551530838, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5012315511703491, 
'logits/rejected': -0.45690277218818665, 'beta_dpo/beta': 0.0942203551530838, 'beta_dpo/loss_margin_mean': 0.9462437629699707, 'beta_dpo/beta_margin_mean': 0.09107129275798798, 'beta_dpo/beta_margin_std': 0.1248544380068779, 'beta_dpo/beta_margin_grad_mean': -0.47738873958587646, 'beta_dpo/beta_margin_grad_std': 0.03081784024834633, 'epoch': 0.05} + 5%|████▎ | 37/681 [01:39<28:40, 2.67s/it] 6%|████▍ | 38/681 [01:41<27:27, 2.56s/it] {'loss': 1.2998, 'grad_norm': 67.5627212524414, 'learning_rate': 2.681159420289855e-07, 'beta_dpo/gap_mean': 0.7879455089569092, 'beta_dpo/gap_std': 0.9812790155410767, 'beta_dpo/beta_used_raw': 0.1041734591126442, 'beta_dpo/beta_used': 0.1041734591126442, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5276659727096558, 'logits/rejected': -0.4949561655521393, 'beta_dpo/beta': 0.1041734591126442, 'beta_dpo/loss_margin_mean': 1.224595069885254, 'beta_dpo/beta_margin_mean': 0.12841160595417023, 'beta_dpo/beta_margin_std': 0.16240736842155457, 'beta_dpo/beta_margin_grad_mean': -0.4682784676551819, 'beta_dpo/beta_margin_grad_std': 0.03961404040455818, 'epoch': 0.06} + 6%|████▍ | 38/681 [01:41<27:27, 2.56s/it] 6%|████▌ | 39/681 [01:43<27:21, 2.56s/it] {'loss': 1.275, 'grad_norm': 74.21395874023438, 'learning_rate': 2.753623188405797e-07, 'beta_dpo/gap_mean': 0.9118002653121948, 'beta_dpo/gap_std': 1.0534446239471436, 'beta_dpo/beta_used_raw': 0.10857867449522018, 'beta_dpo/beta_used': 0.10857867449522018, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5588313341140747, 'logits/rejected': -0.5193623304367065, 'beta_dpo/beta': 0.10857867449522018, 'beta_dpo/loss_margin_mean': 1.4260352849960327, 'beta_dpo/beta_margin_mean': 0.15660372376441956, 'beta_dpo/beta_margin_std': 0.15102945268154144, 'beta_dpo/beta_margin_grad_mean': -0.46116903424263, 'beta_dpo/beta_margin_grad_std': 0.03715595230460167, 'epoch': 0.06} + 6%|████▌ | 39/681 [01:43<27:21, 2.56s/it] 6%|████▋ | 40/681 [01:46<27:48, 2.60s/it] {'loss': 1.2931, 'grad_norm': 
55.91511154174805, 'learning_rate': 2.8260869565217386e-07, 'beta_dpo/gap_mean': 0.9838204383850098, 'beta_dpo/gap_std': 1.121214509010315, 'beta_dpo/beta_used_raw': 0.0998622328042984, 'beta_dpo/beta_used': 0.0998622328042984, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4998742341995239, 'logits/rejected': -0.46878963708877563, 'beta_dpo/beta': 0.0998622328042984, 'beta_dpo/loss_margin_mean': 1.3697092533111572, 'beta_dpo/beta_margin_mean': 0.13751423358917236, 'beta_dpo/beta_margin_std': 0.16336165368556976, 'beta_dpo/beta_margin_grad_mean': -0.4660206437110901, 'beta_dpo/beta_margin_grad_std': 0.03987602889537811, 'epoch': 0.06} + 6%|████▋ | 40/681 [01:46<27:48, 2.60s/it] 6%|████▊ | 41/681 [01:49<27:38, 2.59s/it] {'loss': 1.2849, 'grad_norm': 59.53895950317383, 'learning_rate': 2.898550724637681e-07, 'beta_dpo/gap_mean': 1.111755609512329, 'beta_dpo/gap_std': 1.2354657649993896, 'beta_dpo/beta_used_raw': 0.09814733266830444, 'beta_dpo/beta_used': 0.09814733266830444, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5415744781494141, 'logits/rejected': -0.5051206350326538, 'beta_dpo/beta': 0.09814733266830444, 'beta_dpo/loss_margin_mean': 1.6996898651123047, 'beta_dpo/beta_margin_mean': 0.1658371239900589, 'beta_dpo/beta_margin_std': 0.16969500482082367, 'beta_dpo/beta_margin_grad_mean': -0.4590160846710205, 'beta_dpo/beta_margin_grad_std': 0.04150310531258583, 'epoch': 0.06} + 6%|████▊ | 41/681 [01:49<27:38, 2.59s/it] 6%|████▊ | 42/681 [01:51<27:25, 2.58s/it] {'loss': 1.2274, 'grad_norm': 74.77738189697266, 'learning_rate': 2.971014492753623e-07, 'beta_dpo/gap_mean': 1.3095552921295166, 'beta_dpo/gap_std': 1.4133354425430298, 'beta_dpo/beta_used_raw': 0.11233452707529068, 'beta_dpo/beta_used': 0.11233452707529068, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5276546478271484, 'logits/rejected': -0.4807955324649811, 'beta_dpo/beta': 0.11233452707529068, 'beta_dpo/loss_margin_mean': 2.2716450691223145, 'beta_dpo/beta_margin_mean': 
0.25520431995391846, 'beta_dpo/beta_margin_std': 0.23295927047729492, 'beta_dpo/beta_margin_grad_mean': -0.43761613965034485, 'beta_dpo/beta_margin_grad_std': 0.055440664291381836, 'epoch': 0.06} + 6%|████▊ | 42/681 [01:51<27:25, 2.58s/it] 6%|████▉ | 43/681 [01:54<27:27, 2.58s/it] {'loss': 1.1947, 'grad_norm': 79.2459487915039, 'learning_rate': 3.043478260869565e-07, 'beta_dpo/gap_mean': 1.495275855064392, 'beta_dpo/gap_std': 1.494248390197754, 'beta_dpo/beta_used_raw': 0.11648497730493546, 'beta_dpo/beta_used': 0.11648497730493546, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5580309629440308, 'logits/rejected': -0.5340878963470459, 'beta_dpo/beta': 0.11648497730493546, 'beta_dpo/loss_margin_mean': 2.2226815223693848, 'beta_dpo/beta_margin_mean': 0.2601800560951233, 'beta_dpo/beta_margin_std': 0.2120179980993271, 'beta_dpo/beta_margin_grad_mean': -0.4362444281578064, 'beta_dpo/beta_margin_grad_std': 0.05007302016019821, 'epoch': 0.06} + 6%|████▉ | 43/681 [01:54<27:27, 2.58s/it] 6%|█████ | 44/681 [01:57<27:52, 2.63s/it] {'loss': 1.1951, 'grad_norm': 80.41355895996094, 'learning_rate': 3.115942028985507e-07, 'beta_dpo/gap_mean': 1.653472900390625, 'beta_dpo/gap_std': 1.5553144216537476, 'beta_dpo/beta_used_raw': 0.11155369877815247, 'beta_dpo/beta_used': 0.11155369877815247, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.47205644845962524, 'logits/rejected': -0.45171642303466797, 'beta_dpo/beta': 0.11155369877815247, 'beta_dpo/loss_margin_mean': 2.398895740509033, 'beta_dpo/beta_margin_mean': 0.2672099471092224, 'beta_dpo/beta_margin_std': 0.20892754197120667, 'beta_dpo/beta_margin_grad_mean': -0.4344336986541748, 'beta_dpo/beta_margin_grad_std': 0.05017215758562088, 'epoch': 0.06} + 6%|█████ | 44/681 [01:57<27:52, 2.63s/it] 7%|█████▏ | 45/681 [01:59<27:51, 2.63s/it] {'loss': 1.2831, 'grad_norm': 47.4119987487793, 'learning_rate': 3.188405797101449e-07, 'beta_dpo/gap_mean': 1.7186641693115234, 'beta_dpo/gap_std': 1.6547086238861084, 
'beta_dpo/beta_used_raw': 0.07954459637403488, 'beta_dpo/beta_used': 0.07954459637403488, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.45731085538864136, 'logits/rejected': -0.4441610276699066, 'beta_dpo/beta': 0.07954459637403488, 'beta_dpo/loss_margin_mean': 1.9536571502685547, 'beta_dpo/beta_margin_mean': 0.15512201189994812, 'beta_dpo/beta_margin_std': 0.17768782377243042, 'beta_dpo/beta_margin_grad_mean': -0.4617185890674591, 'beta_dpo/beta_margin_grad_std': 0.043333351612091064, 'epoch': 0.07} + 7%|█████▏ | 45/681 [01:59<27:51, 2.63s/it] 7%|█████▎ | 46/681 [02:02<28:16, 2.67s/it] {'loss': 1.244, 'grad_norm': 66.04317474365234, 'learning_rate': 3.260869565217391e-07, 'beta_dpo/gap_mean': 1.8407939672470093, 'beta_dpo/gap_std': 1.877316951751709, 'beta_dpo/beta_used_raw': 0.08992807567119598, 'beta_dpo/beta_used': 0.08992807567119598, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.517339289188385, 'logits/rejected': -0.46569010615348816, 'beta_dpo/beta': 0.08992807567119598, 'beta_dpo/loss_margin_mean': 2.509418249130249, 'beta_dpo/beta_margin_mean': 0.22959379851818085, 'beta_dpo/beta_margin_std': 0.2589755356311798, 'beta_dpo/beta_margin_grad_mean': -0.44419437646865845, 'beta_dpo/beta_margin_grad_std': 0.060576457530260086, 'epoch': 0.07} + 7%|█████▎ | 46/681 [02:02<28:16, 2.67s/it] 7%|█████▍ | 47/681 [02:05<28:13, 2.67s/it] {'loss': 1.1832, 'grad_norm': 67.16490173339844, 'learning_rate': 3.333333333333333e-07, 'beta_dpo/gap_mean': 1.97328519821167, 'beta_dpo/gap_std': 1.9843567609786987, 'beta_dpo/beta_used_raw': 0.10393651574850082, 'beta_dpo/beta_used': 0.10393651574850082, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5952492952346802, 'logits/rejected': -0.5439423322677612, 'beta_dpo/beta': 0.10393651574850082, 'beta_dpo/loss_margin_mean': 2.603851795196533, 'beta_dpo/beta_margin_mean': 0.28237393498420715, 'beta_dpo/beta_margin_std': 0.2598910629749298, 'beta_dpo/beta_margin_grad_mean': -0.43122005462646484, 
'beta_dpo/beta_margin_grad_std': 0.062102172523736954, 'epoch': 0.07} + 7%|█████▍ | 47/681 [02:05<28:13, 2.67s/it] 7%|█████▌ | 48/681 [02:08<28:49, 2.73s/it] {'loss': 1.1987, 'grad_norm': 78.59500122070312, 'learning_rate': 3.4057971014492755e-07, 'beta_dpo/gap_mean': 2.1250531673431396, 'beta_dpo/gap_std': 2.0948853492736816, 'beta_dpo/beta_used_raw': 0.09790638089179993, 'beta_dpo/beta_used': 0.09790638089179993, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5685693025588989, 'logits/rejected': -0.5092687606811523, 'beta_dpo/beta': 0.09790638089179993, 'beta_dpo/loss_margin_mean': 2.544447422027588, 'beta_dpo/beta_margin_mean': 0.2532716393470764, 'beta_dpo/beta_margin_std': 0.272605299949646, 'beta_dpo/beta_margin_grad_mean': -0.43831878900527954, 'beta_dpo/beta_margin_grad_std': 0.06469718366861343, 'epoch': 0.07} + 7%|█████▌ | 48/681 [02:08<28:49, 2.73s/it] 7%|█████▋ | 49/681 [02:10<28:10, 2.67s/it] {'loss': 1.1095, 'grad_norm': 240.3484344482422, 'learning_rate': 3.478260869565217e-07, 'beta_dpo/gap_mean': 2.2471675872802734, 'beta_dpo/gap_std': 2.2004098892211914, 'beta_dpo/beta_used_raw': 0.11987863481044769, 'beta_dpo/beta_used': 0.11987863481044769, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5591256618499756, 'logits/rejected': -0.5024401545524597, 'beta_dpo/beta': 0.11987863481044769, 'beta_dpo/loss_margin_mean': 3.125662088394165, 'beta_dpo/beta_margin_mean': 0.3685282766819, 'beta_dpo/beta_margin_std': 0.3620261251926422, 'beta_dpo/beta_margin_grad_mean': -0.41220971941947937, 'beta_dpo/beta_margin_grad_std': 0.08246695250272751, 'epoch': 0.07} + 7%|█████▋ | 49/681 [02:10<28:10, 2.67s/it] 7%|█████▊ | 50/681 [02:13<28:05, 2.67s/it] {'loss': 1.1672, 'grad_norm': 64.82975769042969, 'learning_rate': 3.5507246376811595e-07, 'beta_dpo/gap_mean': 2.4781899452209473, 'beta_dpo/gap_std': 2.4213905334472656, 'beta_dpo/beta_used_raw': 0.10016916692256927, 'beta_dpo/beta_used': 0.10016916692256927, 'beta_dpo/mask_keep_frac': 0.78125, 
'logits/chosen': -0.5334613919258118, 'logits/rejected': -0.497406542301178, 'beta_dpo/beta': 0.10016916692256927, 'beta_dpo/loss_margin_mean': 3.3676936626434326, 'beta_dpo/beta_margin_mean': 0.3364598751068115, 'beta_dpo/beta_margin_std': 0.32345935702323914, 'beta_dpo/beta_margin_grad_mean': -0.4190990924835205, 'beta_dpo/beta_margin_grad_std': 0.07547645270824432, 'epoch': 0.07} + 7%|█████▊ | 50/681 [02:13<28:05, 2.67s/it] 7%|█████▉ | 51/681 [02:15<28:06, 2.68s/it] {'loss': 1.2592, 'grad_norm': 36.31479263305664, 'learning_rate': 3.6231884057971015e-07, 'beta_dpo/gap_mean': 2.662703275680542, 'beta_dpo/gap_std': 2.715353012084961, 'beta_dpo/beta_used_raw': 0.0657687559723854, 'beta_dpo/beta_used': 0.0657687559723854, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5016952753067017, 'logits/rejected': -0.4681543707847595, 'beta_dpo/beta': 0.0657687559723854, 'beta_dpo/loss_margin_mean': 3.3309483528137207, 'beta_dpo/beta_margin_mean': 0.22254019975662231, 'beta_dpo/beta_margin_std': 0.2765715718269348, 'beta_dpo/beta_margin_grad_mean': -0.44602659344673157, 'beta_dpo/beta_margin_grad_std': 0.06567390263080597, 'epoch': 0.07} + 7%|█████▉ | 51/681 [02:15<28:06, 2.68s/it] 8%|██████ | 52/681 [02:18<27:25, 2.62s/it] {'loss': 0.9776, 'grad_norm': 85.15430450439453, 'learning_rate': 3.695652173913043e-07, 'beta_dpo/gap_mean': 3.020768404006958, 'beta_dpo/gap_std': 2.9662249088287354, 'beta_dpo/beta_used_raw': 0.13919858634471893, 'beta_dpo/beta_used': 0.13919858634471893, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5425816774368286, 'logits/rejected': -0.4867020845413208, 'beta_dpo/beta': 0.13919858634471893, 'beta_dpo/loss_margin_mean': 5.1557536125183105, 'beta_dpo/beta_margin_mean': 0.7171680927276611, 'beta_dpo/beta_margin_std': 0.5753344297409058, 'beta_dpo/beta_margin_grad_mean': -0.34051814675331116, 'beta_dpo/beta_margin_grad_std': 0.11514287441968918, 'epoch': 0.08} + 8%|██████ | 52/681 [02:18<27:25, 2.62s/it] 8%|██████▏ | 53/681 
[02:20<27:17, 2.61s/it] {'loss': 1.0761, 'grad_norm': 66.78472137451172, 'learning_rate': 3.7681159420289855e-07, 'beta_dpo/gap_mean': 3.373033046722412, 'beta_dpo/gap_std': 3.254366874694824, 'beta_dpo/beta_used_raw': 0.10637001693248749, 'beta_dpo/beta_used': 0.10637001693248749, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5930138826370239, 'logits/rejected': -0.5710781812667847, 'beta_dpo/beta': 0.10637001693248749, 'beta_dpo/loss_margin_mean': 5.15134334564209, 'beta_dpo/beta_margin_mean': 0.5370194315910339, 'beta_dpo/beta_margin_std': 0.5486578345298767, 'beta_dpo/beta_margin_grad_mean': -0.3794803321361542, 'beta_dpo/beta_margin_grad_std': 0.10878144204616547, 'epoch': 0.08} + 8%|██████▏ | 53/681 [02:21<27:17, 2.61s/it] 8%|██████▎ | 54/681 [02:23<26:29, 2.53s/it] {'loss': 1.0957, 'grad_norm': 54.912174224853516, 'learning_rate': 3.8405797101449274e-07, 'beta_dpo/gap_mean': 3.6533608436584473, 'beta_dpo/gap_std': 3.5544323921203613, 'beta_dpo/beta_used_raw': 0.09235785901546478, 'beta_dpo/beta_used': 0.09235785901546478, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5769028663635254, 'logits/rejected': -0.5225714445114136, 'beta_dpo/beta': 0.09235785901546478, 'beta_dpo/loss_margin_mean': 4.466633319854736, 'beta_dpo/beta_margin_mean': 0.42590391635894775, 'beta_dpo/beta_margin_std': 0.46513980627059937, 'beta_dpo/beta_margin_grad_mean': -0.4021127223968506, 'beta_dpo/beta_margin_grad_std': 0.09637561440467834, 'epoch': 0.08} + 8%|██████▎ | 54/681 [02:23<26:29, 2.53s/it] 8%|██████▍ | 55/681 [02:25<25:35, 2.45s/it] {'loss': 0.9505, 'grad_norm': 70.0872573852539, 'learning_rate': 3.9130434782608694e-07, 'beta_dpo/gap_mean': 3.942603826522827, 'beta_dpo/gap_std': 3.9598231315612793, 'beta_dpo/beta_used_raw': 0.12684877216815948, 'beta_dpo/beta_used': 0.12684877216815948, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6278643608093262, 'logits/rejected': -0.5644968748092651, 'beta_dpo/beta': 0.12684877216815948, 
'beta_dpo/loss_margin_mean': 5.50035285949707, 'beta_dpo/beta_margin_mean': 0.690856397151947, 'beta_dpo/beta_margin_std': 0.7624755501747131, 'beta_dpo/beta_margin_grad_mean': -0.3536130487918854, 'beta_dpo/beta_margin_grad_std': 0.14455373585224152, 'epoch': 0.08} + 8%|██████▍ | 55/681 [02:25<25:35, 2.45s/it] 8%|██████▍ | 56/681 [02:28<26:29, 2.54s/it] {'loss': 1.0989, 'grad_norm': 50.04378128051758, 'learning_rate': 3.9855072463768114e-07, 'beta_dpo/gap_mean': 4.207155227661133, 'beta_dpo/gap_std': 4.369948387145996, 'beta_dpo/beta_used_raw': 0.08802211284637451, 'beta_dpo/beta_used': 0.08802211284637451, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6108717918395996, 'logits/rejected': -0.5681912899017334, 'beta_dpo/beta': 0.08802211284637451, 'beta_dpo/loss_margin_mean': 5.1305742263793945, 'beta_dpo/beta_margin_mean': 0.45057377219200134, 'beta_dpo/beta_margin_std': 0.5337446928024292, 'beta_dpo/beta_margin_grad_mean': -0.39712223410606384, 'beta_dpo/beta_margin_grad_std': 0.1159137487411499, 'epoch': 0.08} + 8%|██████▍ | 56/681 [02:28<26:29, 2.54s/it] 8%|██████▌ | 57/681 [02:30<26:17, 2.53s/it] {'loss': 0.8215, 'grad_norm': 76.4854736328125, 'learning_rate': 4.057971014492754e-07, 'beta_dpo/gap_mean': 4.442320823669434, 'beta_dpo/gap_std': 4.536768436431885, 'beta_dpo/beta_used_raw': 0.151127427816391, 'beta_dpo/beta_used': 0.151127427816391, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5923542976379395, 'logits/rejected': -0.5654958486557007, 'beta_dpo/beta': 0.151127427816391, 'beta_dpo/loss_margin_mean': 5.748650074005127, 'beta_dpo/beta_margin_mean': 0.8648303747177124, 'beta_dpo/beta_margin_std': 0.7930364012718201, 'beta_dpo/beta_margin_grad_mean': -0.31904980540275574, 'beta_dpo/beta_margin_grad_std': 0.14913946390151978, 'epoch': 0.08} + 8%|██████▌ | 57/681 [02:30<26:17, 2.53s/it] 9%|██████▋ | 58/681 [02:33<26:35, 2.56s/it] {'loss': 1.0303, 'grad_norm': 63.09685134887695, 'learning_rate': 4.1304347826086954e-07, 
'beta_dpo/gap_mean': 4.803388595581055, 'beta_dpo/gap_std': 4.8988494873046875, 'beta_dpo/beta_used_raw': 0.09416334331035614, 'beta_dpo/beta_used': 0.09416334331035614, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.595874547958374, 'logits/rejected': -0.5206152200698853, 'beta_dpo/beta': 0.09416334331035614, 'beta_dpo/loss_margin_mean': 6.5755534172058105, 'beta_dpo/beta_margin_mean': 0.6299749612808228, 'beta_dpo/beta_margin_std': 0.6659680008888245, 'beta_dpo/beta_margin_grad_mean': -0.3633388876914978, 'beta_dpo/beta_margin_grad_std': 0.13083474338054657, 'epoch': 0.09} + 9%|██████▋ | 58/681 [02:33<26:35, 2.56s/it] 9%|██████▊ | 59/681 [02:36<26:36, 2.57s/it] {'loss': 0.9537, 'grad_norm': 60.62688064575195, 'learning_rate': 4.2028985507246374e-07, 'beta_dpo/gap_mean': 5.30738639831543, 'beta_dpo/gap_std': 5.2926130294799805, 'beta_dpo/beta_used_raw': 0.10466543585062027, 'beta_dpo/beta_used': 0.10466543585062027, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5974893569946289, 'logits/rejected': -0.5545705556869507, 'beta_dpo/beta': 0.10466543585062027, 'beta_dpo/loss_margin_mean': 6.950667381286621, 'beta_dpo/beta_margin_mean': 0.829659640789032, 'beta_dpo/beta_margin_std': 1.0400630235671997, 'beta_dpo/beta_margin_grad_mean': -0.34448105096817017, 'beta_dpo/beta_margin_grad_std': 0.15328913927078247, 'epoch': 0.09} + 9%|██████▊ | 59/681 [02:36<26:36, 2.57s/it] 9%|██████▉ | 60/681 [02:38<26:27, 2.56s/it] {'loss': 0.8759, 'grad_norm': 69.3149185180664, 'learning_rate': 4.2753623188405794e-07, 'beta_dpo/gap_mean': 5.407642364501953, 'beta_dpo/gap_std': 5.513436317443848, 'beta_dpo/beta_used_raw': 0.11850239336490631, 'beta_dpo/beta_used': 0.11850239336490631, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5805940628051758, 'logits/rejected': -0.5189210772514343, 'beta_dpo/beta': 0.11850239336490631, 'beta_dpo/loss_margin_mean': 5.766895294189453, 'beta_dpo/beta_margin_mean': 0.7030664086341858, 'beta_dpo/beta_margin_std': 
0.7772324085235596, 'beta_dpo/beta_margin_grad_mean': -0.3506718575954437, 'beta_dpo/beta_margin_grad_std': 0.15503977239131927, 'epoch': 0.09} + 9%|██████▉ | 60/681 [02:38<26:27, 2.56s/it] 9%|███████ | 61/681 [02:41<26:42, 2.58s/it] {'loss': 1.0428, 'grad_norm': 49.676326751708984, 'learning_rate': 4.3478260869565214e-07, 'beta_dpo/gap_mean': 5.656585693359375, 'beta_dpo/gap_std': 6.2068586349487305, 'beta_dpo/beta_used_raw': 0.08738794177770615, 'beta_dpo/beta_used': 0.08738794177770615, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5972954034805298, 'logits/rejected': -0.5621410608291626, 'beta_dpo/beta': 0.08738794177770615, 'beta_dpo/loss_margin_mean': 6.976743221282959, 'beta_dpo/beta_margin_mean': 0.6398810744285583, 'beta_dpo/beta_margin_std': 1.0747108459472656, 'beta_dpo/beta_margin_grad_mean': -0.37938931584358215, 'beta_dpo/beta_margin_grad_std': 0.15377961099147797, 'epoch': 0.09} + 9%|███████ | 61/681 [02:41<26:42, 2.58s/it] 9%|███████▏ | 62/681 [02:44<27:10, 2.63s/it] {'loss': 1.0477, 'grad_norm': 49.01858901977539, 'learning_rate': 4.420289855072464e-07, 'beta_dpo/gap_mean': 5.591924667358398, 'beta_dpo/gap_std': 6.288469314575195, 'beta_dpo/beta_used_raw': 0.07970167696475983, 'beta_dpo/beta_used': 0.07970167696475983, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5811234712600708, 'logits/rejected': -0.5460039973258972, 'beta_dpo/beta': 0.07970167696475983, 'beta_dpo/loss_margin_mean': 5.3183794021606445, 'beta_dpo/beta_margin_mean': 0.45351850986480713, 'beta_dpo/beta_margin_std': 0.6815299987792969, 'beta_dpo/beta_margin_grad_mean': -0.4036404490470886, 'beta_dpo/beta_margin_grad_std': 0.1279177963733673, 'epoch': 0.09} + 9%|███████▏ | 62/681 [02:44<27:10, 2.63s/it] 9%|███████▎ | 63/681 [02:46<26:57, 2.62s/it] {'loss': 0.934, 'grad_norm': 54.96387481689453, 'learning_rate': 4.4927536231884053e-07, 'beta_dpo/gap_mean': 5.912351608276367, 'beta_dpo/gap_std': 6.507175445556641, 'beta_dpo/beta_used_raw': 0.10061165690422058, 
'beta_dpo/beta_used': 0.10061165690422058, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5760135650634766, 'logits/rejected': -0.5288089513778687, 'beta_dpo/beta': 0.10061165690422058, 'beta_dpo/loss_margin_mean': 7.235960006713867, 'beta_dpo/beta_margin_mean': 0.8012576103210449, 'beta_dpo/beta_margin_std': 0.977336049079895, 'beta_dpo/beta_margin_grad_mean': -0.3452926576137543, 'beta_dpo/beta_margin_grad_std': 0.16270661354064941, 'epoch': 0.09} + 9%|███████▎ | 63/681 [02:46<26:57, 2.62s/it] 9%|███████▍ | 64/681 [02:49<26:29, 2.58s/it] {'loss': 0.892, 'grad_norm': 54.98874282836914, 'learning_rate': 4.5652173913043473e-07, 'beta_dpo/gap_mean': 6.382755279541016, 'beta_dpo/gap_std': 7.030701637268066, 'beta_dpo/beta_used_raw': 0.11127346754074097, 'beta_dpo/beta_used': 0.11127346754074097, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5682976245880127, 'logits/rejected': -0.5359951257705688, 'beta_dpo/beta': 0.11127346754074097, 'beta_dpo/loss_margin_mean': 8.447539329528809, 'beta_dpo/beta_margin_mean': 1.0696979761123657, 'beta_dpo/beta_margin_std': 1.435511589050293, 'beta_dpo/beta_margin_grad_mean': -0.32286009192466736, 'beta_dpo/beta_margin_grad_std': 0.17790742218494415, 'epoch': 0.09} + 9%|███████▍ | 64/681 [02:49<26:29, 2.58s/it] 10%|███████▌ | 65/681 [02:51<26:33, 2.59s/it] {'loss': 0.7454, 'grad_norm': 84.47888946533203, 'learning_rate': 4.63768115942029e-07, 'beta_dpo/gap_mean': 6.738654136657715, 'beta_dpo/gap_std': 7.486597061157227, 'beta_dpo/beta_used_raw': 0.15355268120765686, 'beta_dpo/beta_used': 0.15355268120765686, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6392999887466431, 'logits/rejected': -0.6247435808181763, 'beta_dpo/beta': 0.15355268120765686, 'beta_dpo/loss_margin_mean': 8.504437446594238, 'beta_dpo/beta_margin_mean': 1.3807626962661743, 'beta_dpo/beta_margin_std': 1.8169898986816406, 'beta_dpo/beta_margin_grad_mean': -0.28559890389442444, 'beta_dpo/beta_margin_grad_std': 0.21047906577587128, 'epoch': 
0.1} + 10%|███████▌ | 65/681 [02:51<26:33, 2.59s/it] 10%|███████▋ | 66/681 [02:54<26:38, 2.60s/it] {'loss': 1.1833, 'grad_norm': 30.142791748046875, 'learning_rate': 4.7101449275362313e-07, 'beta_dpo/gap_mean': 7.011206150054932, 'beta_dpo/gap_std': 7.803816795349121, 'beta_dpo/beta_used_raw': 0.038759633898735046, 'beta_dpo/beta_used': 0.038759633898735046, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6175287365913391, 'logits/rejected': -0.5830913186073303, 'beta_dpo/beta': 0.038759633898735046, 'beta_dpo/loss_margin_mean': 7.870203971862793, 'beta_dpo/beta_margin_mean': 0.3621111810207367, 'beta_dpo/beta_margin_std': 0.5689931511878967, 'beta_dpo/beta_margin_grad_mean': -0.42082634568214417, 'beta_dpo/beta_margin_grad_std': 0.11057644337415695, 'epoch': 0.1} + 10%|███████▋ | 66/681 [02:54<26:38, 2.60s/it] 10%|███████▊ | 67/681 [02:56<25:34, 2.50s/it] {'loss': 1.0324, 'grad_norm': 44.186004638671875, 'learning_rate': 4.782608695652174e-07, 'beta_dpo/gap_mean': 7.094534873962402, 'beta_dpo/gap_std': 8.07803726196289, 'beta_dpo/beta_used_raw': 0.06989531219005585, 'beta_dpo/beta_used': 0.06989531219005585, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6101835370063782, 'logits/rejected': -0.5699295997619629, 'beta_dpo/beta': 0.06989531219005585, 'beta_dpo/loss_margin_mean': 8.12269401550293, 'beta_dpo/beta_margin_mean': 0.59562087059021, 'beta_dpo/beta_margin_std': 0.8447734117507935, 'beta_dpo/beta_margin_grad_mean': -0.38001659512519836, 'beta_dpo/beta_margin_grad_std': 0.14094047248363495, 'epoch': 0.1} + 10%|███████▊ | 67/681 [02:56<25:34, 2.50s/it] 10%|███████▉ | 68/681 [02:59<25:44, 2.52s/it] {'loss': 0.953, 'grad_norm': 40.886878967285156, 'learning_rate': 4.855072463768116e-07, 'beta_dpo/gap_mean': 7.258274078369141, 'beta_dpo/gap_std': 8.184741973876953, 'beta_dpo/beta_used_raw': 0.06118408590555191, 'beta_dpo/beta_used': 0.09041684120893478, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6218951940536499, 'logits/rejected': 
-0.5630506873130798, 'beta_dpo/beta': 0.09041684120893478, 'beta_dpo/loss_margin_mean': 7.898317813873291, 'beta_dpo/beta_margin_mean': 0.7865732908248901, 'beta_dpo/beta_margin_std': 1.181038498878479, 'beta_dpo/beta_margin_grad_mean': -0.3650799095630646, 'beta_dpo/beta_margin_grad_std': 0.1839817315340042, 'epoch': 0.1} + 10%|███████▉ | 68/681 [02:59<25:44, 2.52s/it] 10%|████████ | 69/681 [03:01<26:26, 2.59s/it] {'loss': 0.7568, 'grad_norm': 72.10195922851562, 'learning_rate': 4.927536231884058e-07, 'beta_dpo/gap_mean': 7.689189434051514, 'beta_dpo/gap_std': 8.327251434326172, 'beta_dpo/beta_used_raw': 0.12943625450134277, 'beta_dpo/beta_used': 0.12943625450134277, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5902745723724365, 'logits/rejected': -0.5661255717277527, 'beta_dpo/beta': 0.12943625450134277, 'beta_dpo/loss_margin_mean': 9.463652610778809, 'beta_dpo/beta_margin_mean': 1.2787585258483887, 'beta_dpo/beta_margin_std': 1.491976022720337, 'beta_dpo/beta_margin_grad_mean': -0.28914546966552734, 'beta_dpo/beta_margin_grad_std': 0.1749580055475235, 'epoch': 0.1} + 10%|████████ | 69/681 [03:01<26:26, 2.59s/it] 10%|████████ | 70/681 [03:04<26:00, 2.55s/it] {'loss': 1.0241, 'grad_norm': 58.23539352416992, 'learning_rate': 5e-07, 'beta_dpo/gap_mean': 8.018512725830078, 'beta_dpo/gap_std': 8.71467399597168, 'beta_dpo/beta_used_raw': 0.06600124388933182, 'beta_dpo/beta_used': 0.0740790069103241, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6226065158843994, 'logits/rejected': -0.5874596834182739, 'beta_dpo/beta': 0.0740790069103241, 'beta_dpo/loss_margin_mean': 10.070143699645996, 'beta_dpo/beta_margin_mean': 0.8269989490509033, 'beta_dpo/beta_margin_std': 1.3370610475540161, 'beta_dpo/beta_margin_grad_mean': -0.369037926197052, 'beta_dpo/beta_margin_grad_std': 0.1858556717634201, 'epoch': 0.1} + 10%|████████ | 70/681 [03:04<26:00, 2.55s/it] 10%|████████▏ | 71/681 [03:06<26:00, 2.56s/it] {'loss': 0.8167, 'grad_norm': 47.67396545410156, 
'learning_rate': 4.999967061337492e-07, 'beta_dpo/gap_mean': 8.682525634765625, 'beta_dpo/gap_std': 9.29095458984375, 'beta_dpo/beta_used_raw': 0.10465647280216217, 'beta_dpo/beta_used': 0.10465647280216217, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6586190462112427, 'logits/rejected': -0.6172687411308289, 'beta_dpo/beta': 0.10465647280216217, 'beta_dpo/loss_margin_mean': 11.49172306060791, 'beta_dpo/beta_margin_mean': 1.1773220300674438, 'beta_dpo/beta_margin_std': 1.2341235876083374, 'beta_dpo/beta_margin_grad_mean': -0.29003310203552246, 'beta_dpo/beta_margin_grad_std': 0.17214025557041168, 'epoch': 0.1} + 10%|████████▏ | 71/681 [03:06<26:00, 2.56s/it] 11%|████████▎ | 72/681 [03:09<26:24, 2.60s/it] {'loss': 0.5912, 'grad_norm': 75.66039276123047, 'learning_rate': 4.999868246217933e-07, 'beta_dpo/gap_mean': 9.315265655517578, 'beta_dpo/gap_std': 9.664226531982422, 'beta_dpo/beta_used_raw': 0.1546517014503479, 'beta_dpo/beta_used': 0.1546517014503479, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6442696452140808, 'logits/rejected': -0.6082816123962402, 'beta_dpo/beta': 0.1546517014503479, 'beta_dpo/loss_margin_mean': 12.12594985961914, 'beta_dpo/beta_margin_mean': 1.905733585357666, 'beta_dpo/beta_margin_std': 2.095893383026123, 'beta_dpo/beta_margin_grad_mean': -0.24096769094467163, 'beta_dpo/beta_margin_grad_std': 0.22502072155475616, 'epoch': 0.11} + 11%|████████▎ | 72/681 [03:09<26:24, 2.60s/it] 11%|████████▍ | 73/681 [03:12<26:53, 2.65s/it] {'loss': 0.877, 'grad_norm': 63.61186981201172, 'learning_rate': 4.999703557245192e-07, 'beta_dpo/gap_mean': 9.892107009887695, 'beta_dpo/gap_std': 10.947005271911621, 'beta_dpo/beta_used_raw': 0.09382159262895584, 'beta_dpo/beta_used': 0.09382159262895584, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6697020530700684, 'logits/rejected': -0.6270005702972412, 'beta_dpo/beta': 0.09382159262895584, 'beta_dpo/loss_margin_mean': 12.176095008850098, 'beta_dpo/beta_margin_mean': 
1.1267133951187134, 'beta_dpo/beta_margin_std': 1.6691551208496094, 'beta_dpo/beta_margin_grad_mean': -0.3226276934146881, 'beta_dpo/beta_margin_grad_std': 0.23689226806163788, 'epoch': 0.11} + 11%|████████▍ | 73/681 [03:12<26:53, 2.65s/it] 11%|████████▌ | 74/681 [03:14<26:28, 2.62s/it] {'loss': 1.0827, 'grad_norm': 36.97188949584961, 'learning_rate': 4.999472998758977e-07, 'beta_dpo/gap_mean': 10.440993309020996, 'beta_dpo/gap_std': 12.396344184875488, 'beta_dpo/beta_used_raw': 0.04306982085108757, 'beta_dpo/beta_used': 0.0458955280482769, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.605143129825592, 'logits/rejected': -0.5923604965209961, 'beta_dpo/beta': 0.0458955280482769, 'beta_dpo/loss_margin_mean': 13.167186737060547, 'beta_dpo/beta_margin_mean': 0.6362202763557434, 'beta_dpo/beta_margin_std': 1.2357457876205444, 'beta_dpo/beta_margin_grad_mean': -0.38849544525146484, 'beta_dpo/beta_margin_grad_std': 0.181712806224823, 'epoch': 0.11} + 11%|████████▌ | 74/681 [03:14<26:28, 2.62s/it] 11%|████████▋ | 75/681 [03:17<26:41, 2.64s/it] {'loss': 0.6467, 'grad_norm': 92.53497314453125, 'learning_rate': 4.999176576834721e-07, 'beta_dpo/gap_mean': 11.546646118164062, 'beta_dpo/gap_std': 13.614230155944824, 'beta_dpo/beta_used_raw': 0.1566300094127655, 'beta_dpo/beta_used': 0.1566300094127655, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6607112288475037, 'logits/rejected': -0.6499860286712646, 'beta_dpo/beta': 0.1566300094127655, 'beta_dpo/loss_margin_mean': 18.43977165222168, 'beta_dpo/beta_margin_mean': 3.023698568344116, 'beta_dpo/beta_margin_std': 3.2827866077423096, 'beta_dpo/beta_margin_grad_mean': -0.19501835107803345, 'beta_dpo/beta_margin_grad_std': 0.2327680140733719, 'epoch': 0.11} + 11%|████████▋ | 75/681 [03:17<26:41, 2.64s/it] 11%|████████▊ | 76/681 [03:20<26:23, 2.62s/it] {'loss': 1.0088, 'grad_norm': 44.36159133911133, 'learning_rate': 4.998814299283415e-07, 'beta_dpo/gap_mean': 12.032630920410156, 'beta_dpo/gap_std': 
13.884933471679688, 'beta_dpo/beta_used_raw': 0.004215408116579056, 'beta_dpo/beta_used': 0.05693836510181427, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6945298910140991, 'logits/rejected': -0.6507744789123535, 'beta_dpo/beta': 0.05693836510181427, 'beta_dpo/loss_margin_mean': 11.839239120483398, 'beta_dpo/beta_margin_mean': 0.6884029507637024, 'beta_dpo/beta_margin_std': 1.4083665609359741, 'beta_dpo/beta_margin_grad_mean': -0.3819631040096283, 'beta_dpo/beta_margin_grad_std': 0.20355312526226044, 'epoch': 0.11} + 11%|████████▊ | 76/681 [03:20<26:23, 2.62s/it] 11%|████████▉ | 77/681 [03:22<25:14, 2.51s/it] {'loss': 0.3922, 'grad_norm': 122.56193542480469, 'learning_rate': 4.998386175651409e-07, 'beta_dpo/gap_mean': 13.085380554199219, 'beta_dpo/gap_std': 14.796323776245117, 'beta_dpo/beta_used_raw': 0.3072592616081238, 'beta_dpo/beta_used': 0.3072592616081238, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6592667102813721, 'logits/rejected': -0.6153388023376465, 'beta_dpo/beta': 0.3072592616081238, 'beta_dpo/loss_margin_mean': 18.652969360351562, 'beta_dpo/beta_margin_mean': 6.070537090301514, 'beta_dpo/beta_margin_std': 7.8197712898254395, 'beta_dpo/beta_margin_grad_mean': -0.16621431708335876, 'beta_dpo/beta_margin_grad_std': 0.2623097002506256, 'epoch': 0.11} + 11%|████████▉ | 77/681 [03:22<25:14, 2.51s/it] 11%|█████████ | 78/681 [03:25<25:37, 2.55s/it] {'loss': 0.7759, 'grad_norm': 55.331443786621094, 'learning_rate': 4.997892217220159e-07, 'beta_dpo/gap_mean': 13.365839958190918, 'beta_dpo/gap_std': 15.315971374511719, 'beta_dpo/beta_used_raw': 0.12561628222465515, 'beta_dpo/beta_used': 0.14949087798595428, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6192628145217896, 'logits/rejected': -0.5899114608764648, 'beta_dpo/beta': 0.14949087798595428, 'beta_dpo/loss_margin_mean': 14.452160835266113, 'beta_dpo/beta_margin_mean': 2.418715238571167, 'beta_dpo/beta_margin_std': 3.7272212505340576, 'beta_dpo/beta_margin_grad_mean': 
-0.3154319226741791, 'beta_dpo/beta_margin_grad_std': 0.24938298761844635, 'epoch': 0.11} + 11%|█████████ | 78/681 [03:25<25:37, 2.55s/it] 12%|█████████▏ | 79/681 [03:27<25:50, 2.58s/it] {'loss': 0.8819, 'grad_norm': 69.28112030029297, 'learning_rate': 4.997332437005931e-07, 'beta_dpo/gap_mean': 13.848381042480469, 'beta_dpo/gap_std': 16.022428512573242, 'beta_dpo/beta_used_raw': -0.001482747495174408, 'beta_dpo/beta_used': 0.11019716411828995, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6417551636695862, 'logits/rejected': -0.608524739742279, 'beta_dpo/beta': 0.11019716411828995, 'beta_dpo/loss_margin_mean': 15.7933349609375, 'beta_dpo/beta_margin_mean': 1.7321070432662964, 'beta_dpo/beta_margin_std': 3.166022777557373, 'beta_dpo/beta_margin_grad_mean': -0.3492397964000702, 'beta_dpo/beta_margin_grad_std': 0.24441301822662354, 'epoch': 0.12} + 12%|█████████▏ | 79/681 [03:27<25:50, 2.58s/it] 12%|█████████▎ | 80/681 [03:30<25:48, 2.58s/it] {'loss': 1.3671, 'grad_norm': 2.357767343521118, 'learning_rate': 4.996706849759452e-07, 'beta_dpo/gap_mean': 14.141023635864258, 'beta_dpo/gap_std': 16.736181259155273, 'beta_dpo/beta_used_raw': -0.12951478362083435, 'beta_dpo/beta_used': 0.001718068728223443, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.7271322011947632, 'logits/rejected': -0.6814069747924805, 'beta_dpo/beta': 0.001718068728223443, 'beta_dpo/loss_margin_mean': 14.089604377746582, 'beta_dpo/beta_margin_mean': 0.02697627618908882, 'beta_dpo/beta_margin_std': 0.04508247226476669, 'beta_dpo/beta_margin_grad_mean': -0.49326348304748535, 'beta_dpo/beta_margin_grad_std': 0.011248563416302204, 'epoch': 0.12} + 12%|█████████▎ | 80/681 [03:30<25:48, 2.58s/it] 12%|█████████▍ | 81/681 [03:33<26:38, 2.66s/it] {'loss': 1.0778, 'grad_norm': 137.00436401367188, 'learning_rate': 4.996015471965529e-07, 'beta_dpo/gap_mean': 14.902729034423828, 'beta_dpo/gap_std': 17.593263626098633, 'beta_dpo/beta_used_raw': 0.08890701830387115, 'beta_dpo/beta_used': 
0.1173420324921608, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.7320711016654968, 'logits/rejected': -0.699401319026947, 'beta_dpo/beta': 0.1173420324921608, 'beta_dpo/loss_margin_mean': 19.815006256103516, 'beta_dpo/beta_margin_mean': 2.6220462322235107, 'beta_dpo/beta_margin_std': 4.677156925201416, 'beta_dpo/beta_margin_grad_mean': -0.3296668529510498, 'beta_dpo/beta_margin_grad_std': 0.2772652506828308, 'epoch': 0.12} + 12%|█████████▍ | 81/681 [03:33<26:38, 2.66s/it] 12%|█████████▌ | 82/681 [03:35<25:55, 2.60s/it] {'loss': 1.0506, 'grad_norm': 50.82543182373047, 'learning_rate': 4.995258321842611e-07, 'beta_dpo/gap_mean': 14.832651138305664, 'beta_dpo/gap_std': 18.701509475708008, 'beta_dpo/beta_used_raw': 0.04351024702191353, 'beta_dpo/beta_used': 0.04351024702191353, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.649533748626709, 'logits/rejected': -0.6332418918609619, 'beta_dpo/beta': 0.04351024702191353, 'beta_dpo/loss_margin_mean': 15.354249954223633, 'beta_dpo/beta_margin_mean': 0.48920586705207825, 'beta_dpo/beta_margin_std': 1.2577557563781738, 'beta_dpo/beta_margin_grad_mean': -0.40177345275878906, 'beta_dpo/beta_margin_grad_std': 0.19916068017482758, 'epoch': 0.12} + 12%|█████████▌ | 82/681 [03:35<25:55, 2.60s/it] 12%|█████████▋ | 83/681 [03:38<25:23, 2.55s/it] {'loss': 1.3736, 'grad_norm': 1.6841143369674683, 'learning_rate': 4.994435419342304e-07, 'beta_dpo/gap_mean': 15.605181694030762, 'beta_dpo/gap_std': 19.392963409423828, 'beta_dpo/beta_used_raw': -0.06825613230466843, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6862367391586304, 'logits/rejected': -0.643555760383606, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 17.986804962158203, 'beta_dpo/beta_margin_mean': 0.017986806109547615, 'beta_dpo/beta_margin_std': 0.021009791642427444, 'beta_dpo/beta_margin_grad_mean': -0.4955040216445923, 'beta_dpo/beta_margin_grad_std': 0.0052512530237436295, 
'epoch': 0.12} + 12%|█████████▋ | 83/681 [03:38<25:23, 2.55s/it] 12%|█████████▋ | 84/681 [03:40<25:59, 2.61s/it] {'loss': 0.7014, 'grad_norm': 86.9267349243164, 'learning_rate': 4.993546786148857e-07, 'beta_dpo/gap_mean': 15.893194198608398, 'beta_dpo/gap_std': 18.990737915039062, 'beta_dpo/beta_used_raw': 0.14811725914478302, 'beta_dpo/beta_used': 0.14811725914478302, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6301603317260742, 'logits/rejected': -0.5886775851249695, 'beta_dpo/beta': 0.14811725914478302, 'beta_dpo/loss_margin_mean': 15.966986656188965, 'beta_dpo/beta_margin_mean': 2.8868696689605713, 'beta_dpo/beta_margin_std': 4.1358442306518555, 'beta_dpo/beta_margin_grad_mean': -0.2708915174007416, 'beta_dpo/beta_margin_grad_std': 0.20906409621238708, 'epoch': 0.12} + 12%|█████████▋ | 84/681 [03:40<25:59, 2.61s/it] 12%|█████████▊ | 85/681 [03:43<26:01, 2.62s/it] {'loss': 1.0304, 'grad_norm': 61.42685317993164, 'learning_rate': 4.992592445678582e-07, 'beta_dpo/gap_mean': 15.512821197509766, 'beta_dpo/gap_std': 18.84861183166504, 'beta_dpo/beta_used_raw': -0.06038748845458031, 'beta_dpo/beta_used': 0.05548453703522682, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6268604397773743, 'logits/rejected': -0.5931763648986816, 'beta_dpo/beta': 0.05548453703522682, 'beta_dpo/loss_margin_mean': 15.94063663482666, 'beta_dpo/beta_margin_mean': 1.1414363384246826, 'beta_dpo/beta_margin_std': 1.9398654699325562, 'beta_dpo/beta_margin_grad_mean': -0.3620225489139557, 'beta_dpo/beta_margin_grad_std': 0.21889419853687286, 'epoch': 0.12} + 12%|█████████▊ | 85/681 [03:43<26:01, 2.62s/it] 13%|█████████▉ | 86/681 [03:46<26:50, 2.71s/it] {'loss': 1.17, 'grad_norm': 116.6102523803711, 'learning_rate': 4.991572423079235e-07, 'beta_dpo/gap_mean': 15.852239608764648, 'beta_dpo/gap_std': 20.208812713623047, 'beta_dpo/beta_used_raw': -0.07008485496044159, 'beta_dpo/beta_used': 0.08018074184656143, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': 
-0.6792384386062622, 'logits/rejected': -0.6633239984512329, 'beta_dpo/beta': 0.08018074184656143, 'beta_dpo/loss_margin_mean': 16.962554931640625, 'beta_dpo/beta_margin_mean': 1.180087924003601, 'beta_dpo/beta_margin_std': 3.0287249088287354, 'beta_dpo/beta_margin_grad_mean': -0.3861086666584015, 'beta_dpo/beta_margin_grad_std': 0.2810457944869995, 'epoch': 0.13} + 13%|█████████▉ | 86/681 [03:46<26:50, 2.71s/it] 13%|██████████ | 87/681 [03:48<26:21, 2.66s/it] {'loss': 0.7054, 'grad_norm': 81.023681640625, 'learning_rate': 4.990486745229364e-07, 'beta_dpo/gap_mean': 16.574663162231445, 'beta_dpo/gap_std': 21.20650863647461, 'beta_dpo/beta_used_raw': 0.12275532633066177, 'beta_dpo/beta_used': 0.12275532633066177, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.7079585790634155, 'logits/rejected': -0.675015389919281, 'beta_dpo/beta': 0.12275532633066177, 'beta_dpo/loss_margin_mean': 18.905868530273438, 'beta_dpo/beta_margin_mean': 2.5129313468933105, 'beta_dpo/beta_margin_std': 3.3721165657043457, 'beta_dpo/beta_margin_grad_mean': -0.2609297037124634, 'beta_dpo/beta_margin_grad_std': 0.26698076725006104, 'epoch': 0.13} + 13%|██████████ | 87/681 [03:48<26:21, 2.66s/it] 13%|██████████▏ | 88/681 [03:51<26:11, 2.65s/it] {'loss': 1.0505, 'grad_norm': 91.79285430908203, 'learning_rate': 4.989335440737586e-07, 'beta_dpo/gap_mean': 16.420879364013672, 'beta_dpo/gap_std': 22.033344268798828, 'beta_dpo/beta_used_raw': 0.07114126533269882, 'beta_dpo/beta_used': 0.10302203893661499, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.661591649055481, 'logits/rejected': -0.6481854915618896, 'beta_dpo/beta': 0.10302203893661499, 'beta_dpo/loss_margin_mean': 14.693923950195312, 'beta_dpo/beta_margin_mean': 1.8344087600708008, 'beta_dpo/beta_margin_std': 4.733022689819336, 'beta_dpo/beta_margin_grad_mean': -0.38228002190589905, 'beta_dpo/beta_margin_grad_std': 0.26822036504745483, 'epoch': 0.13} + 13%|██████████▏ | 88/681 [03:51<26:11, 2.65s/it] 13%|██████████▎ | 89/681 
[03:54<25:40, 2.60s/it] {'loss': 0.8893, 'grad_norm': 84.89918518066406, 'learning_rate': 4.988118539941847e-07, 'beta_dpo/gap_mean': 15.963903427124023, 'beta_dpo/gap_std': 21.23855209350586, 'beta_dpo/beta_used_raw': -0.0026644468307495117, 'beta_dpo/beta_used': 0.12089363485574722, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.7054777145385742, 'logits/rejected': -0.666853129863739, 'beta_dpo/beta': 0.12089363485574722, 'beta_dpo/loss_margin_mean': 15.816018104553223, 'beta_dpo/beta_margin_mean': 2.3120830059051514, 'beta_dpo/beta_margin_std': 3.9636423587799072, 'beta_dpo/beta_margin_grad_mean': -0.32127439975738525, 'beta_dpo/beta_margin_grad_std': 0.2475607842206955, 'epoch': 0.13} + 13%|██████████▎ | 89/681 [03:54<25:40, 2.60s/it] 13%|██████████▍ | 90/681 [03:56<25:07, 2.55s/it] {'loss': 1.3734, 'grad_norm': 1.6320456266403198, 'learning_rate': 4.986836074908615e-07, 'beta_dpo/gap_mean': 16.511451721191406, 'beta_dpo/gap_std': 22.19609832763672, 'beta_dpo/beta_used_raw': -0.10932803153991699, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6601126194000244, 'logits/rejected': -0.6607536673545837, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 20.473350524902344, 'beta_dpo/beta_margin_mean': 0.020473351702094078, 'beta_dpo/beta_margin_std': 0.029139788821339607, 'beta_dpo/beta_margin_grad_mean': -0.4948834478855133, 'beta_dpo/beta_margin_grad_std': 0.007280984427779913, 'epoch': 0.13} + 13%|██████████▍ | 90/681 [03:56<25:07, 2.55s/it] 13%|██████████▌ | 91/681 [03:59<25:16, 2.57s/it] {'loss': 1.135, 'grad_norm': 163.5145721435547, 'learning_rate': 4.985488079432037e-07, 'beta_dpo/gap_mean': 16.999650955200195, 'beta_dpo/gap_std': 22.816213607788086, 'beta_dpo/beta_used_raw': 0.060106635093688965, 'beta_dpo/beta_used': 0.0956064909696579, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.683163583278656, 'logits/rejected': -0.6435012817382812, 'beta_dpo/beta': 
0.0956064909696579, 'beta_dpo/loss_margin_mean': 17.7425594329834, 'beta_dpo/beta_margin_mean': 1.9452344179153442, 'beta_dpo/beta_margin_std': 3.7261810302734375, 'beta_dpo/beta_margin_grad_mean': -0.36107704043388367, 'beta_dpo/beta_margin_grad_std': 0.26534104347229004, 'epoch': 0.13} + 13%|██████████▌ | 91/681 [03:59<25:16, 2.57s/it] 14%|██████████▋ | 92/681 [04:01<24:58, 2.54s/it] {'loss': 1.3231, 'grad_norm': 7.026480197906494, 'learning_rate': 4.984074589033043e-07, 'beta_dpo/gap_mean': 17.035350799560547, 'beta_dpo/gap_std': 22.991302490234375, 'beta_dpo/beta_used_raw': -0.09906575083732605, 'beta_dpo/beta_used': 0.004416329320520163, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.714478611946106, 'logits/rejected': -0.685989499092102, 'beta_dpo/beta': 0.004416329320520163, 'beta_dpo/loss_margin_mean': 17.186429977416992, 'beta_dpo/beta_margin_mean': 0.08677387237548828, 'beta_dpo/beta_margin_std': 0.1371731013059616, 'beta_dpo/beta_margin_grad_mean': -0.47850051522254944, 'beta_dpo/beta_margin_grad_std': 0.033828821033239365, 'epoch': 0.14} + 14%|██████████▋ | 92/681 [04:01<24:58, 2.54s/it] 14%|██████████▊ | 93/681 [04:03<23:41, 2.42s/it] {'loss': 1.216, 'grad_norm': 17.654693603515625, 'learning_rate': 4.982595640958425e-07, 'beta_dpo/gap_mean': 17.194652557373047, 'beta_dpo/gap_std': 22.38436508178711, 'beta_dpo/beta_used_raw': 0.003189191222190857, 'beta_dpo/beta_used': 0.012795208021998405, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.7458562850952148, 'logits/rejected': -0.6881492137908936, 'beta_dpo/beta': 0.012795208021998405, 'beta_dpo/loss_margin_mean': 17.124244689941406, 'beta_dpo/beta_margin_mean': 0.21816346049308777, 'beta_dpo/beta_margin_std': 0.3640429377555847, 'beta_dpo/beta_margin_grad_mean': -0.4487362504005432, 'beta_dpo/beta_margin_grad_std': 0.08327450603246689, 'epoch': 0.14} + 14%|██████████▊ | 93/681 [04:03<23:41, 2.42s/it] 14%|██████████▉ | 94/681 [04:06<24:53, 2.54s/it] {'loss': 0.9494, 'grad_norm': 
86.43866729736328, 'learning_rate': 4.98105127417984e-07, 'beta_dpo/gap_mean': 17.62067222595215, 'beta_dpo/gap_std': 22.231197357177734, 'beta_dpo/beta_used_raw': 0.05387556180357933, 'beta_dpo/beta_used': 0.08266030997037888, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6766291260719299, 'logits/rejected': -0.6523104310035706, 'beta_dpo/beta': 0.08266030997037888, 'beta_dpo/loss_margin_mean': 19.16136932373047, 'beta_dpo/beta_margin_mean': 1.8386805057525635, 'beta_dpo/beta_margin_std': 3.1514334678649902, 'beta_dpo/beta_margin_grad_mean': -0.34286096692085266, 'beta_dpo/beta_margin_grad_std': 0.254118949174881, 'epoch': 0.14} + 14%|██████████▉ | 94/681 [04:06<24:53, 2.54s/it] 14%|███████████ | 95/681 [04:09<24:37, 2.52s/it] {'loss': 1.3739, 'grad_norm': 1.29397714138031, 'learning_rate': 4.979441529392784e-07, 'beta_dpo/gap_mean': 17.355606079101562, 'beta_dpo/gap_std': 21.673551559448242, 'beta_dpo/beta_used_raw': -0.1939472258090973, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.737293004989624, 'logits/rejected': -0.7039185166358948, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 15.809903144836426, 'beta_dpo/beta_margin_mean': 0.015809904783964157, 'beta_dpo/beta_margin_std': 0.018763281404972076, 'beta_dpo/beta_margin_grad_mean': -0.4960479736328125, 'beta_dpo/beta_margin_grad_std': 0.00468993978574872, 'epoch': 0.14} + 14%|███████████ | 95/681 [04:09<24:37, 2.52s/it] 14%|███████████▏ | 96/681 [04:11<24:43, 2.54s/it] {'loss': 0.7946, 'grad_norm': 48.836517333984375, 'learning_rate': 4.977766449015534e-07, 'beta_dpo/gap_mean': 17.98691177368164, 'beta_dpo/gap_std': 21.86615753173828, 'beta_dpo/beta_used_raw': -0.02500748634338379, 'beta_dpo/beta_used': 0.1486305147409439, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.7020214796066284, 'logits/rejected': -0.6632054448127747, 'beta_dpo/beta': 0.1486305147409439, 'beta_dpo/loss_margin_mean': 21.639110565185547, 
'beta_dpo/beta_margin_mean': 3.7028400897979736, 'beta_dpo/beta_margin_std': 6.17563533782959, 'beta_dpo/beta_margin_grad_mean': -0.30508890748023987, 'beta_dpo/beta_margin_grad_std': 0.2417270988225937, 'epoch': 0.14} + 14%|███████████▏ | 96/681 [04:11<24:43, 2.54s/it] 14%|███████████▎ | 97/681 [04:14<25:24, 2.61s/it] {'loss': 0.9477, 'grad_norm': 62.58485794067383, 'learning_rate': 4.976026077188012e-07, 'beta_dpo/gap_mean': 17.544296264648438, 'beta_dpo/gap_std': 21.351360321044922, 'beta_dpo/beta_used_raw': 0.023374740034341812, 'beta_dpo/beta_used': 0.06436537951231003, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6425115466117859, 'logits/rejected': -0.5889946818351746, 'beta_dpo/beta': 0.06436537951231003, 'beta_dpo/loss_margin_mean': 16.46492576599121, 'beta_dpo/beta_margin_mean': 1.3948326110839844, 'beta_dpo/beta_margin_std': 2.1092705726623535, 'beta_dpo/beta_margin_grad_mean': -0.3319231867790222, 'beta_dpo/beta_margin_grad_std': 0.21798565983772278, 'epoch': 0.14} + 14%|███████████▎ | 97/681 [04:14<25:24, 2.61s/it] 14%|███████████▎ | 98/681 [04:16<24:43, 2.55s/it] {'loss': 1.0858, 'grad_norm': 155.92921447753906, 'learning_rate': 4.974220459770639e-07, 'beta_dpo/gap_mean': 17.85407257080078, 'beta_dpo/gap_std': 21.613468170166016, 'beta_dpo/beta_used_raw': 0.16680875420570374, 'beta_dpo/beta_used': 0.1993415206670761, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6993780136108398, 'logits/rejected': -0.6774000525474548, 'beta_dpo/beta': 0.1993415206670761, 'beta_dpo/loss_margin_mean': 18.21445655822754, 'beta_dpo/beta_margin_mean': 3.733274221420288, 'beta_dpo/beta_margin_std': 8.150524139404297, 'beta_dpo/beta_margin_grad_mean': -0.3418026566505432, 'beta_dpo/beta_margin_grad_std': 0.29540500044822693, 'epoch': 0.14} + 14%|███████████▎ | 98/681 [04:16<24:43, 2.55s/it] 15%|███████████▍ | 99/681 [04:18<23:46, 2.45s/it] {'loss': 0.7627, 'grad_norm': 45.9489860534668, 'learning_rate': 4.972349644343108e-07, 'beta_dpo/gap_mean': 
18.435466766357422, 'beta_dpo/gap_std': 22.153942108154297, 'beta_dpo/beta_used_raw': 0.05922618508338928, 'beta_dpo/beta_used': 0.05922618508338928, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6738119125366211, 'logits/rejected': -0.6671220660209656, 'beta_dpo/beta': 0.05922618508338928, 'beta_dpo/loss_margin_mean': 21.74091911315918, 'beta_dpo/beta_margin_mean': 1.2846572399139404, 'beta_dpo/beta_margin_std': 1.4927436113357544, 'beta_dpo/beta_margin_grad_mean': -0.29211270809173584, 'beta_dpo/beta_margin_grad_std': 0.1934242695569992, 'epoch': 0.15} + 15%|███████████▍ | 99/681 [04:19<23:46, 2.45s/it] 15%|███████████▍ | 100/681 [04:21<24:11, 2.50s/it] {'loss': 0.9552, 'grad_norm': 40.60963821411133, 'learning_rate': 4.970413680203148e-07, 'beta_dpo/gap_mean': 17.79035186767578, 'beta_dpo/gap_std': 22.48064422607422, 'beta_dpo/beta_used_raw': 0.027484482154250145, 'beta_dpo/beta_used': 0.049059588462114334, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.6705986261367798, 'logits/rejected': -0.62305748462677, 'beta_dpo/beta': 0.049059588462114334, 'beta_dpo/loss_margin_mean': 13.807634353637695, 'beta_dpo/beta_margin_mean': 0.7820718884468079, 'beta_dpo/beta_margin_std': 1.3751544952392578, 'beta_dpo/beta_margin_grad_mean': -0.376477986574173, 'beta_dpo/beta_margin_grad_std': 0.19105187058448792, 'epoch': 0.15} + 15%|███████████▍ | 100/681 [04:21<24:11, 2.50s/it][INFO|trainer.py:4307] 2026-04-17 23:27:53,623 >> +***** Running Evaluation ***** +[INFO|trainer.py:4309] 2026-04-17 23:27:53,624 >> Num examples = 2339 +[INFO|trainer.py:4312] 2026-04-17 23:27:53,624 >> Batch size = 8 + + 0%| | 0/73 [00:00> +***** Running Evaluation ***** +[INFO|trainer.py:4309] 2026-04-17 23:32:49,827 >> Num examples = 2339 +[INFO|trainer.py:4312] 2026-04-17 23:32:49,827 >> Batch size = 8 + + 0%| | 0/73 [00:00> Saving model checkpoint to /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-200 
+[INFO|configuration_utils.py:419] 2026-04-17 23:33:44,942 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-200/config.json +[INFO|configuration_utils.py:911] 2026-04-17 23:33:45,016 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-200/generation_config.json +[INFO|modeling_utils.py:3580] 2026-04-17 23:34:38,392 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 6 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-200/model.safetensors.index.json. +[INFO|tokenization_utils_base.py:2510] 2026-04-17 23:34:38,413 >> tokenizer config file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-200/tokenizer_config.json +[INFO|tokenization_utils_base.py:2519] 2026-04-17 23:34:38,427 >> Special tokens file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-200/special_tokens_map.json + 30%|█████████████████████▊ | 201/681 [14:55<13:46:06, 103.26s/it] {'loss': 1.3421, 'grad_norm': 3.234513282775879, 'learning_rate': 4.455721242469372e-07, 'beta_dpo/gap_mean': 51.0998420715332, 'beta_dpo/gap_std': 69.32807922363281, 'beta_dpo/beta_used_raw': -0.24471929669380188, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5788037776947021, 'logits/rejected': -0.5658458471298218, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 54.232852935791016, 'beta_dpo/beta_margin_mean': 0.05423285812139511, 'beta_dpo/beta_margin_std': 0.07199931889772415, 
'beta_dpo/beta_margin_grad_mean': -0.48646533489227295, 'beta_dpo/beta_margin_grad_std': 0.017951475456357002, 'epoch': 0.3} + 30%|█████████████████████▊ | 201/681 [14:55<13:46:06, 103.26s/it] 30%|██████████████████████▌ | 202/681 [14:58<9:43:39, 73.11s/it] {'loss': 1.3486, 'grad_norm': 3.0596237182617188, 'learning_rate': 4.4477014363141755e-07, 'beta_dpo/gap_mean': 49.74256896972656, 'beta_dpo/gap_std': 69.538330078125, 'beta_dpo/beta_used_raw': -0.5578510165214539, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5542974472045898, 'logits/rejected': -0.557321310043335, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 40.36243438720703, 'beta_dpo/beta_margin_mean': 0.040362436324357986, 'beta_dpo/beta_margin_std': 0.07123276591300964, 'beta_dpo/beta_margin_grad_mean': -0.4899270534515381, 'beta_dpo/beta_margin_grad_std': 0.017768291756510735, 'epoch': 0.3} + 30%|██████████████████████▌ | 202/681 [14:58<9:43:39, 73.11s/it] 30%|██████████████████████▋ | 203/681 [15:01<6:54:41, 52.05s/it] {'loss': 1.347, 'grad_norm': 3.645709753036499, 'learning_rate': 4.439630306414758e-07, 'beta_dpo/gap_mean': 48.89398193359375, 'beta_dpo/gap_std': 68.63645935058594, 'beta_dpo/beta_used_raw': -0.41438037157058716, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.571040153503418, 'logits/rejected': -0.5497109293937683, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 45.81226348876953, 'beta_dpo/beta_margin_mean': 0.04581226408481598, 'beta_dpo/beta_margin_std': 0.06257802248001099, 'beta_dpo/beta_margin_grad_mean': -0.48856452107429504, 'beta_dpo/beta_margin_grad_std': 0.015608040615916252, 'epoch': 0.3} + 30%|██████████████████████▋ | 203/681 [15:01<6:54:41, 52.05s/it] 30%|██████████████████████▊ | 204/681 [15:04<4:56:33, 37.30s/it] {'loss': 1.3582, 'grad_norm': 2.720808982849121, 'learning_rate': 4.431508065452897e-07, 
'beta_dpo/gap_mean': 47.7497673034668, 'beta_dpo/gap_std': 70.519287109375, 'beta_dpo/beta_used_raw': -1.0310747623443604, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5842176675796509, 'logits/rejected': -0.5408717393875122, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 42.08584213256836, 'beta_dpo/beta_margin_mean': 0.04208584129810333, 'beta_dpo/beta_margin_std': 0.07838640362024307, 'beta_dpo/beta_margin_grad_mean': -0.489501029253006, 'beta_dpo/beta_margin_grad_std': 0.01954388990998268, 'epoch': 0.3} + 30%|██████████████████████▊ | 204/681 [15:04<4:56:33, 37.30s/it] 30%|██████████████████████▉ | 205/681 [15:07<3:33:13, 26.88s/it] {'loss': 0.9147, 'grad_norm': 358.5487365722656, 'learning_rate': 4.4233349274571974e-07, 'beta_dpo/gap_mean': 50.11834716796875, 'beta_dpo/gap_std': 70.84585571289062, 'beta_dpo/beta_used_raw': 0.12516099214553833, 'beta_dpo/beta_used': 0.2624741196632385, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.54551100730896, 'logits/rejected': -0.5079349875450134, 'beta_dpo/beta': 0.2624741196632385, 'beta_dpo/loss_margin_mean': 59.972965240478516, 'beta_dpo/beta_margin_mean': 21.14405059814453, 'beta_dpo/beta_margin_std': 34.92091369628906, 'beta_dpo/beta_margin_grad_mean': -0.29318341612815857, 'beta_dpo/beta_margin_grad_std': 0.2785731852054596, 'epoch': 0.3} + 30%|██████████████████████▉ | 205/681 [15:07<3:33:13, 26.88s/it] 30%|██████████████████████▉ | 206/681 [15:09<2:34:17, 19.49s/it] {'loss': 5.7592, 'grad_norm': 1746.28271484375, 'learning_rate': 4.415111107797445e-07, 'beta_dpo/gap_mean': 52.5726318359375, 'beta_dpo/gap_std': 71.26499938964844, 'beta_dpo/beta_used_raw': 0.8118077516555786, 'beta_dpo/beta_used': 0.8118077516555786, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5080777406692505, 'logits/rejected': -0.5112833976745605, 'beta_dpo/beta': 0.8118077516555786, 'beta_dpo/loss_margin_mean': 68.52181243896484, 
'beta_dpo/beta_margin_mean': 56.539398193359375, 'beta_dpo/beta_margin_std': 60.37042236328125, 'beta_dpo/beta_margin_grad_mean': -0.1911478042602539, 'beta_dpo/beta_margin_grad_std': 0.3803271949291229, 'epoch': 0.3} + 30%|██████████████████████▉ | 206/681 [15:09<2:34:17, 19.49s/it] 30%|███████████████████████ | 207/681 [15:11<1:53:45, 14.40s/it] {'loss': 1.3372, 'grad_norm': 3.9254820346832275, 'learning_rate': 4.4068368231789365e-07, 'beta_dpo/gap_mean': 55.76563262939453, 'beta_dpo/gap_std': 74.22966766357422, 'beta_dpo/beta_used_raw': -0.22176781296730042, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5645418167114258, 'logits/rejected': -0.5385115742683411, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 70.96053314208984, 'beta_dpo/beta_margin_mean': 0.07096053659915924, 'beta_dpo/beta_margin_std': 0.08763889223337173, 'beta_dpo/beta_margin_grad_mean': -0.4823157787322998, 'beta_dpo/beta_margin_grad_std': 0.02179008349776268, 'epoch': 0.3} + 30%|███████████████████████ | 207/681 [15:11<1:53:45, 14.40s/it] 31%|███████████████████████▏ | 208/681 [15:14<1:25:42, 10.87s/it] {'loss': 1.3351, 'grad_norm': 3.8811442852020264, 'learning_rate': 4.398512291636768e-07, 'beta_dpo/gap_mean': 56.717201232910156, 'beta_dpo/gap_std': 76.8087158203125, 'beta_dpo/beta_used_raw': -0.131654754281044, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5704125761985779, 'logits/rejected': -0.5577903985977173, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 55.905086517333984, 'beta_dpo/beta_margin_mean': 0.055905092507600784, 'beta_dpo/beta_margin_std': 0.08626676350831985, 'beta_dpo/beta_margin_grad_mean': -0.48605671525001526, 'beta_dpo/beta_margin_grad_std': 0.021491041406989098, 'epoch': 0.31} + 31%|███████████████████████▏ | 208/681 [15:14<1:25:42, 10.87s/it] 31%|███████████████████████▎ | 209/681 [15:16<1:05:18, 8.30s/it] 
{'loss': 1.3415, 'grad_norm': 3.4770359992980957, 'learning_rate': 4.3901377325300857e-07, 'beta_dpo/gap_mean': 55.72069549560547, 'beta_dpo/gap_std': 78.26738739013672, 'beta_dpo/beta_used_raw': -0.45698946714401245, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5012378692626953, 'logits/rejected': -0.4895186424255371, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 52.53633117675781, 'beta_dpo/beta_margin_mean': 0.05253633111715317, 'beta_dpo/beta_margin_std': 0.07954316586256027, 'beta_dpo/beta_margin_grad_mean': -0.48689284920692444, 'beta_dpo/beta_margin_grad_std': 0.019833343103528023, 'epoch': 0.31} + 31%|███████████████████████▎ | 209/681 [15:16<1:05:18, 8.30s/it] 31%|████████████████████████ | 210/681 [15:19<51:16, 6.53s/it] {'loss': 1.2834, 'grad_norm': 341.5815124511719, 'learning_rate': 4.381713366536311e-07, 'beta_dpo/gap_mean': 55.32640075683594, 'beta_dpo/gap_std': 78.07096862792969, 'beta_dpo/beta_used_raw': -0.5257502794265747, 'beta_dpo/beta_used': 0.15351513028144836, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4934021234512329, 'logits/rejected': -0.48370587825775146, 'beta_dpo/beta': 0.15351513028144836, 'beta_dpo/loss_margin_mean': 55.78252029418945, 'beta_dpo/beta_margin_mean': 9.529181480407715, 'beta_dpo/beta_margin_std': 20.73506736755371, 'beta_dpo/beta_margin_grad_mean': -0.3444797396659851, 'beta_dpo/beta_margin_grad_std': 0.28890836238861084, 'epoch': 0.31} + 31%|████████████████████████ | 210/681 [15:19<51:16, 6.53s/it] 31%|████████████████████████▏ | 211/681 [15:21<41:16, 5.27s/it] {'loss': 1.3584, 'grad_norm': 3.5843217372894287, 'learning_rate': 4.373239415645323e-07, 'beta_dpo/gap_mean': 54.482818603515625, 'beta_dpo/gap_std': 79.86414337158203, 'beta_dpo/beta_used_raw': -1.437325358390808, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4944462776184082, 'logits/rejected': 
-0.4566226005554199, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 47.95879364013672, 'beta_dpo/beta_margin_mean': 0.047958794981241226, 'beta_dpo/beta_margin_std': 0.09425321221351624, 'beta_dpo/beta_margin_grad_mean': -0.4880537688732147, 'beta_dpo/beta_margin_grad_std': 0.0234391950070858, 'epoch': 0.31} + 31%|████████████████████████▏ | 211/681 [15:21<41:16, 5.27s/it] 31%|████████████████████████▎ | 212/681 [15:24<34:43, 4.44s/it] {'loss': 29.8368, 'grad_norm': 7063.01416015625, 'learning_rate': 4.3647161031536086e-07, 'beta_dpo/gap_mean': 59.36201477050781, 'beta_dpo/gap_std': 85.50032043457031, 'beta_dpo/beta_used_raw': 1.0547301769256592, 'beta_dpo/beta_used': 1.3223354816436768, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4703846573829651, 'logits/rejected': -0.4657232165336609, 'beta_dpo/beta': 1.3223354816436768, 'beta_dpo/loss_margin_mean': 81.92825317382812, 'beta_dpo/beta_margin_mean': 141.0015869140625, 'beta_dpo/beta_margin_std': 267.85894775390625, 'beta_dpo/beta_margin_grad_mean': -0.35223668813705444, 'beta_dpo/beta_margin_grad_std': 0.32164767384529114, 'epoch': 0.31} + 31%|████████████████████████▎ | 212/681 [15:24<34:43, 4.44s/it] 31%|████████████████████████▍ | 213/681 [15:26<30:19, 3.89s/it] {'loss': 1.335, 'grad_norm': 4.132566452026367, 'learning_rate': 4.3561436536583774e-07, 'beta_dpo/gap_mean': 61.29865646362305, 'beta_dpo/gap_std': 87.67449951171875, 'beta_dpo/beta_used_raw': -0.37106069922447205, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.47617167234420776, 'logits/rejected': -0.44875389337539673, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 71.65673828125, 'beta_dpo/beta_margin_mean': 0.07165674865245819, 'beta_dpo/beta_margin_std': 0.10214556753635406, 'beta_dpo/beta_margin_grad_mean': -0.4821443259716034, 'beta_dpo/beta_margin_grad_std': 0.025429587811231613, 'epoch': 0.31} + 31%|████████████████████████▍ | 213/681 
[15:26<30:19, 3.89s/it] 31%|████████████████████████▌ | 214/681 [15:28<26:33, 3.41s/it] {'loss': 1.3312, 'grad_norm': 5.018362998962402, 'learning_rate': 4.3475222930516473e-07, 'beta_dpo/gap_mean': 62.14265823364258, 'beta_dpo/gap_std': 89.926513671875, 'beta_dpo/beta_used_raw': -0.2031300812959671, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4100716710090637, 'logits/rejected': -0.41462287306785583, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 66.73929595947266, 'beta_dpo/beta_margin_mean': 0.06673929840326309, 'beta_dpo/beta_margin_std': 0.09353061765432358, 'beta_dpo/beta_margin_grad_mean': -0.4833696484565735, 'beta_dpo/beta_margin_grad_std': 0.023280689492821693, 'epoch': 0.31} + 31%|████████████████████████▌ | 214/681 [15:28<26:33, 3.41s/it] 32%|████████████████████████▋ | 215/681 [15:31<25:17, 3.26s/it] {'loss': 5.4992, 'grad_norm': 1893.756103515625, 'learning_rate': 4.3388522485142885e-07, 'beta_dpo/gap_mean': 64.10518646240234, 'beta_dpo/gap_std': 91.72321319580078, 'beta_dpo/beta_used_raw': -0.06115126609802246, 'beta_dpo/beta_used': 0.3104745149612427, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4227758049964905, 'logits/rejected': -0.41368818283081055, 'beta_dpo/beta': 0.3104745149612427, 'beta_dpo/loss_margin_mean': 70.83655548095703, 'beta_dpo/beta_margin_mean': 24.29639434814453, 'beta_dpo/beta_margin_std': 55.270938873291016, 'beta_dpo/beta_margin_grad_mean': -0.35084572434425354, 'beta_dpo/beta_margin_grad_std': 0.3201132118701935, 'epoch': 0.32} + 32%|████████████████████████▋ | 215/681 [15:31<25:17, 3.26s/it] 32%|████████████████████████▋ | 216/681 [15:34<24:05, 3.11s/it] {'loss': 1.6065, 'grad_norm': 478.4328918457031, 'learning_rate': 4.330133748510036e-07, 'beta_dpo/gap_mean': 63.70437240600586, 'beta_dpo/gap_std': 92.65457153320312, 'beta_dpo/beta_used_raw': -0.4864157736301422, 'beta_dpo/beta_used': 0.1452518105506897, 'beta_dpo/mask_keep_frac': 
0.78125, 'logits/chosen': -0.4186558425426483, 'logits/rejected': -0.40211576223373413, 'beta_dpo/beta': 0.1452518105506897, 'beta_dpo/loss_margin_mean': 66.72907257080078, 'beta_dpo/beta_margin_mean': 12.094311714172363, 'beta_dpo/beta_margin_std': 23.100305557250977, 'beta_dpo/beta_margin_grad_mean': -0.3200395703315735, 'beta_dpo/beta_margin_grad_std': 0.28639811277389526, 'epoch': 0.32} + 32%|████████████████████████▋ | 216/681 [15:34<24:05, 3.11s/it] 32%|████████████████████████▊ | 217/681 [15:37<22:49, 2.95s/it] {'loss': 1.4452, 'grad_norm': 547.8667602539062, 'learning_rate': 4.3213670227794757e-07, 'beta_dpo/gap_mean': 67.39096069335938, 'beta_dpo/gap_std': 93.9806137084961, 'beta_dpo/beta_used_raw': -0.15126293897628784, 'beta_dpo/beta_used': 0.052402470260858536, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4100034236907959, 'logits/rejected': -0.407045841217041, 'beta_dpo/beta': 0.052402470260858536, 'beta_dpo/loss_margin_mean': 80.78280639648438, 'beta_dpo/beta_margin_mean': 3.8621439933776855, 'beta_dpo/beta_margin_std': 9.069067001342773, 'beta_dpo/beta_margin_grad_mean': -0.38916242122650146, 'beta_dpo/beta_margin_grad_std': 0.3128577768802643, 'epoch': 0.32} + 32%|████████████████████████▊ | 217/681 [15:37<22:49, 2.95s/it] 32%|████████████████████████▉ | 218/681 [15:39<22:10, 2.87s/it] {'loss': 1.3429, 'grad_norm': 5.246548652648926, 'learning_rate': 4.3125523023339815e-07, 'beta_dpo/gap_mean': 66.28788757324219, 'beta_dpo/gap_std': 94.35865783691406, 'beta_dpo/beta_used_raw': -1.1618341207504272, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.431363046169281, 'logits/rejected': -0.4271088242530823, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 57.7301025390625, 'beta_dpo/beta_margin_mean': 0.05773010477423668, 'beta_dpo/beta_margin_std': 0.09657855331897736, 'beta_dpo/beta_margin_grad_mean': -0.4856181740760803, 'beta_dpo/beta_margin_grad_std': 
0.024022625759243965, 'epoch': 0.32} + 32%|████████████████████████▉ | 218/681 [15:39<22:10, 2.87s/it] 32%|█████████████████████████ | 219/681 [15:42<21:37, 2.81s/it] {'loss': 1.3488, 'grad_norm': 4.286383152008057, 'learning_rate': 4.303689819449636e-07, 'beta_dpo/gap_mean': 62.747528076171875, 'beta_dpo/gap_std': 96.75794982910156, 'beta_dpo/beta_used_raw': -1.3351142406463623, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4274938106536865, 'logits/rejected': -0.41343453526496887, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 51.15894317626953, 'beta_dpo/beta_margin_mean': 0.05115894228219986, 'beta_dpo/beta_margin_std': 0.10570129752159119, 'beta_dpo/beta_margin_grad_mean': -0.4872594475746155, 'beta_dpo/beta_margin_grad_std': 0.02623271755874157, 'epoch': 0.32} + 32%|█████████████████████████ | 219/681 [15:42<21:37, 2.81s/it] 32%|█████████████████████████▏ | 220/681 [15:45<21:06, 2.75s/it] {'loss': 0.914, 'grad_norm': 1213.65625, 'learning_rate': 4.2947798076611047e-07, 'beta_dpo/gap_mean': 60.67655944824219, 'beta_dpo/gap_std': 93.46902465820312, 'beta_dpo/beta_used_raw': 0.05747605115175247, 'beta_dpo/beta_used': 0.17918218672275543, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.45730453729629517, 'logits/rejected': -0.43929389119148254, 'beta_dpo/beta': 0.17918218672275543, 'beta_dpo/loss_margin_mean': 43.123931884765625, 'beta_dpo/beta_margin_mean': 9.130165100097656, 'beta_dpo/beta_margin_std': 20.58268928527832, 'beta_dpo/beta_margin_grad_mean': -0.34378835558891296, 'beta_dpo/beta_margin_grad_std': 0.3021136224269867, 'epoch': 0.32} + 32%|█████████████████████████▏ | 220/681 [15:45<21:06, 2.75s/it] 32%|█████████████████████████▎ | 221/681 [15:47<20:29, 2.67s/it] {'loss': 8.7201, 'grad_norm': 3353.1982421875, 'learning_rate': 4.285822501755485e-07, 'beta_dpo/gap_mean': 63.86392593383789, 'beta_dpo/gap_std': 92.72855377197266, 'beta_dpo/beta_used_raw': 
0.8143908977508545, 'beta_dpo/beta_used': 1.0828216075897217, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.41670554876327515, 'logits/rejected': -0.42472323775291443, 'beta_dpo/beta': 1.0828216075897217, 'beta_dpo/loss_margin_mean': 93.2608871459961, 'beta_dpo/beta_margin_mean': 110.4359359741211, 'beta_dpo/beta_margin_std': 165.42660522460938, 'beta_dpo/beta_margin_grad_mean': -0.285607248544693, 'beta_dpo/beta_margin_grad_std': 0.28007781505584717, 'epoch': 0.32} + 32%|█████████████████████████▎ | 221/681 [15:47<20:29, 2.67s/it] 33%|█████████████████████████▍ | 222/681 [15:50<20:15, 2.65s/it] {'loss': 3.3505, 'grad_norm': 1461.958251953125, 'learning_rate': 4.276818137766118e-07, 'beta_dpo/gap_mean': 65.32881164550781, 'beta_dpo/gap_std': 91.67716979980469, 'beta_dpo/beta_used_raw': 0.24051879346370697, 'beta_dpo/beta_used': 0.24051879346370697, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4541017413139343, 'logits/rejected': -0.45362943410873413, 'beta_dpo/beta': 0.24051879346370697, 'beta_dpo/loss_margin_mean': 64.1358642578125, 'beta_dpo/beta_margin_mean': 14.743354797363281, 'beta_dpo/beta_margin_std': 23.80963897705078, 'beta_dpo/beta_margin_grad_mean': -0.19494900107383728, 'beta_dpo/beta_margin_grad_std': 0.3727710545063019, 'epoch': 0.33} + 33%|█████████████████████████▍ | 222/681 [15:50<20:15, 2.65s/it] 33%|█████████████████████████▌ | 223/681 [15:52<19:10, 2.51s/it] {'loss': 1.3355, 'grad_norm': 5.079369068145752, 'learning_rate': 4.2677669529663686e-07, 'beta_dpo/gap_mean': 64.32170104980469, 'beta_dpo/gap_std': 92.70675659179688, 'beta_dpo/beta_used_raw': -0.6068298816680908, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4292357563972473, 'logits/rejected': -0.41810518503189087, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 55.133277893066406, 'beta_dpo/beta_margin_mean': 0.05513327941298485, 'beta_dpo/beta_margin_std': 0.09587711095809937, 
'beta_dpo/beta_margin_grad_mean': -0.48626866936683655, 'beta_dpo/beta_margin_grad_std': 0.02382073365151882, 'epoch': 0.33} + 33%|█████████████████████████▌ | 223/681 [15:52<19:10, 2.51s/it] 33%|█████████████████████████▋ | 224/681 [15:54<18:15, 2.40s/it] {'loss': 1.3414, 'grad_norm': 5.289470672607422, 'learning_rate': 4.2586691858633747e-07, 'beta_dpo/gap_mean': 64.98542785644531, 'beta_dpo/gap_std': 92.75971221923828, 'beta_dpo/beta_used_raw': -1.0090042352676392, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.40946879982948303, 'logits/rejected': -0.3898620009422302, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 66.39501190185547, 'beta_dpo/beta_margin_mean': 0.06639501452445984, 'beta_dpo/beta_margin_std': 0.09770266711711884, 'beta_dpo/beta_margin_grad_mean': -0.4834619462490082, 'beta_dpo/beta_margin_grad_std': 0.024282945320010185, 'epoch': 0.33} + 33%|█████████████████████████▋ | 224/681 [15:54<18:15, 2.40s/it] 33%|█████████████████████████▊ | 225/681 [15:56<17:56, 2.36s/it] {'loss': 2.7657, 'grad_norm': 5388.49951171875, 'learning_rate': 4.249525076191759e-07, 'beta_dpo/gap_mean': 66.35330200195312, 'beta_dpo/gap_std': 93.56597137451172, 'beta_dpo/beta_used_raw': 0.5190803408622742, 'beta_dpo/beta_used': 0.5190803408622742, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.42718806862831116, 'logits/rejected': -0.4125264286994934, 'beta_dpo/beta': 0.5190803408622742, 'beta_dpo/loss_margin_mean': 78.7562484741211, 'beta_dpo/beta_margin_mean': 43.59006881713867, 'beta_dpo/beta_margin_std': 67.32926940917969, 'beta_dpo/beta_margin_grad_mean': -0.22263871133327484, 'beta_dpo/beta_margin_grad_std': 0.4009822607040405, 'epoch': 0.33} + 33%|█████████████████████████▊ | 225/681 [15:56<17:56, 2.36s/it] 33%|█████████████████████████▉ | 226/681 [15:59<18:51, 2.49s/it] {'loss': 1.1641, 'grad_norm': 17.83782958984375, 'learning_rate': 4.2403348649073167e-07, 'beta_dpo/gap_mean': 
65.67891693115234, 'beta_dpo/gap_std': 93.4427490234375, 'beta_dpo/beta_used_raw': -0.38874343037605286, 'beta_dpo/beta_used': 0.004440045915544033, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.48651188611984253, 'logits/rejected': -0.4508872628211975, 'beta_dpo/beta': 0.004440045915544033, 'beta_dpo/loss_margin_mean': 54.554115295410156, 'beta_dpo/beta_margin_mean': 0.30315059423446655, 'beta_dpo/beta_margin_std': 0.5602424740791321, 'beta_dpo/beta_margin_grad_mean': -0.433518648147583, 'beta_dpo/beta_margin_grad_std': 0.11666657030582428, 'epoch': 0.33} + 33%|█████████████████████████▉ | 226/681 [15:59<18:51, 2.49s/it] 33%|██████████████████████████ | 227/681 [16:02<18:46, 2.48s/it] {'loss': 1.3266, 'grad_norm': 4.970055103302002, 'learning_rate': 4.2310987941806615e-07, 'beta_dpo/gap_mean': 66.89671325683594, 'beta_dpo/gap_std': 93.85809326171875, 'beta_dpo/beta_used_raw': -0.21814611554145813, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.46300429105758667, 'logits/rejected': -0.4537394046783447, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 80.1581802368164, 'beta_dpo/beta_margin_mean': 0.08015818148851395, 'beta_dpo/beta_margin_std': 0.09570427238941193, 'beta_dpo/beta_margin_grad_mean': -0.4800347089767456, 'beta_dpo/beta_margin_grad_std': 0.023754583671689034, 'epoch': 0.33} + 33%|██████████████████████████ | 227/681 [16:02<18:46, 2.48s/it] 33%|██████████████████████████ | 228/681 [16:04<19:45, 2.62s/it] {'loss': 2.1248, 'grad_norm': 576.836669921875, 'learning_rate': 4.2218171073908463e-07, 'beta_dpo/gap_mean': 65.2583999633789, 'beta_dpo/gap_std': 93.36762237548828, 'beta_dpo/beta_used_raw': -0.04854981601238251, 'beta_dpo/beta_used': 0.23101337254047394, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.46720415353775024, 'logits/rejected': -0.4512375593185425, 'beta_dpo/beta': 0.23101337254047394, 'beta_dpo/loss_margin_mean': 55.80860137939453, 
'beta_dpo/beta_margin_mean': 11.812125205993652, 'beta_dpo/beta_margin_std': 30.85622215270996, 'beta_dpo/beta_margin_grad_mean': -0.36090514063835144, 'beta_dpo/beta_margin_grad_std': 0.31774094700813293, 'epoch': 0.33} + 33%|██████████████████████████ | 228/681 [16:05<19:45, 2.62s/it] 34%|██████████████████████████▏ | 229/681 [16:07<19:23, 2.57s/it] {'loss': 1.3298, 'grad_norm': 5.151296615600586, 'learning_rate': 4.212490049118951e-07, 'beta_dpo/gap_mean': 63.560447692871094, 'beta_dpo/gap_std': 93.40653991699219, 'beta_dpo/beta_used_raw': -0.1934027224779129, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.52690190076828, 'logits/rejected': -0.4995231628417969, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 55.15596389770508, 'beta_dpo/beta_margin_mean': 0.05515596643090248, 'beta_dpo/beta_margin_std': 0.09170445799827576, 'beta_dpo/beta_margin_grad_mean': -0.486246794462204, 'beta_dpo/beta_margin_grad_std': 0.022847512736916542, 'epoch': 0.34} + 34%|██████████████████████████▏ | 229/681 [16:07<19:23, 2.57s/it] 34%|██████████████████████████▎ | 230/681 [16:09<18:56, 2.52s/it] {'loss': 6.1252, 'grad_norm': 2202.667724609375, 'learning_rate': 4.203117865141635e-07, 'beta_dpo/gap_mean': 66.75623321533203, 'beta_dpo/gap_std': 92.87422180175781, 'beta_dpo/beta_used_raw': 0.7624739408493042, 'beta_dpo/beta_used': 0.7624739408493042, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4267687201499939, 'logits/rejected': -0.43476104736328125, 'beta_dpo/beta': 0.7624739408493042, 'beta_dpo/loss_margin_mean': 86.35228729248047, 'beta_dpo/beta_margin_mean': 65.0893325805664, 'beta_dpo/beta_margin_std': 68.40202331542969, 'beta_dpo/beta_margin_grad_mean': -0.14170564711093903, 'beta_dpo/beta_margin_grad_std': 0.3462100327014923, 'epoch': 0.34} + 34%|██████████████████████████▎ | 230/681 [16:09<18:56, 2.52s/it] 34%|██████████████████████████▍ | 231/681 [16:12<19:06, 2.55s/it] {'loss': 1.3451, 
'grad_norm': 3.6118087768554688, 'learning_rate': 4.1937008024246625e-07, 'beta_dpo/gap_mean': 65.62940979003906, 'beta_dpo/gap_std': 90.5175552368164, 'beta_dpo/beta_used_raw': -1.2731541395187378, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.48225754499435425, 'logits/rejected': -0.4550408124923706, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 55.06380844116211, 'beta_dpo/beta_margin_mean': 0.05506381019949913, 'beta_dpo/beta_margin_std': 0.0815558135509491, 'beta_dpo/beta_margin_grad_mean': -0.4862736463546753, 'beta_dpo/beta_margin_grad_std': 0.020299429073929787, 'epoch': 0.34} + 34%|██████████████████████████▍ | 231/681 [16:12<19:06, 2.55s/it] 34%|██████████████████████████▌ | 232/681 [16:15<19:42, 2.63s/it] {'loss': 1.3498, 'grad_norm': 3.735759973526001, 'learning_rate': 4.1842391091163933e-07, 'beta_dpo/gap_mean': 62.820167541503906, 'beta_dpo/gap_std': 90.34293365478516, 'beta_dpo/beta_used_raw': -1.4023609161376953, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.459547221660614, 'logits/rejected': -0.43855172395706177, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 45.41696548461914, 'beta_dpo/beta_margin_mean': 0.04541696980595589, 'beta_dpo/beta_margin_std': 0.08694743365049362, 'beta_dpo/beta_margin_grad_mean': -0.4886838495731354, 'beta_dpo/beta_margin_grad_std': 0.02164299599826336, 'epoch': 0.34} + 34%|██████████████████████████▌ | 232/681 [16:15<19:42, 2.63s/it] 34%|██████████████████████████▋ | 233/681 [16:18<19:58, 2.68s/it] {'loss': 13.3621, 'grad_norm': 2282.5595703125, 'learning_rate': 4.174733034541245e-07, 'beta_dpo/gap_mean': 63.577369689941406, 'beta_dpo/gap_std': 93.35490417480469, 'beta_dpo/beta_used_raw': 0.6553887128829956, 'beta_dpo/beta_used': 0.6553887128829956, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4606332778930664, 'logits/rejected': -0.46368852257728577, 
'beta_dpo/beta': 0.6553887128829956, 'beta_dpo/loss_margin_mean': 80.09815216064453, 'beta_dpo/beta_margin_mean': 55.271568298339844, 'beta_dpo/beta_margin_std': 99.48710632324219, 'beta_dpo/beta_margin_grad_mean': -0.2702082693576813, 'beta_dpo/beta_margin_grad_std': 0.43462416529655457, 'epoch': 0.34} + 34%|██████████████████████████▋ | 233/681 [16:18<19:58, 2.68s/it] 34%|██████████████████████████▊ | 234/681 [16:20<19:56, 2.68s/it] {'loss': 8.0627, 'grad_norm': 2767.52880859375, 'learning_rate': 4.165182829193126e-07, 'beta_dpo/gap_mean': 68.05307006835938, 'beta_dpo/gap_std': 95.49946594238281, 'beta_dpo/beta_used_raw': 0.4511352777481079, 'beta_dpo/beta_used': 0.7232382297515869, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.43197929859161377, 'logits/rejected': -0.4625827670097351, 'beta_dpo/beta': 0.7232382297515869, 'beta_dpo/loss_margin_mean': 78.76392364501953, 'beta_dpo/beta_margin_mean': 74.6033935546875, 'beta_dpo/beta_margin_std': 122.55489349365234, 'beta_dpo/beta_margin_grad_mean': -0.2983703017234802, 'beta_dpo/beta_margin_grad_std': 0.284095823764801, 'epoch': 0.34} + 34%|██████████████████████████▊ | 234/681 [16:20<19:56, 2.68s/it] 35%|██████████████████████████▉ | 235/681 [16:23<19:21, 2.60s/it] {'loss': 1.3526, 'grad_norm': 6.3571929931640625, 'learning_rate': 4.1555887447288255e-07, 'beta_dpo/gap_mean': 64.27421569824219, 'beta_dpo/gap_std': 95.8262939453125, 'beta_dpo/beta_used_raw': -1.6707329750061035, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4989432692527771, 'logits/rejected': -0.4859057068824768, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 48.786006927490234, 'beta_dpo/beta_margin_mean': 0.04878600686788559, 'beta_dpo/beta_margin_std': 0.09710308909416199, 'beta_dpo/beta_margin_grad_mean': -0.4878506064414978, 'beta_dpo/beta_margin_grad_std': 0.024142302572727203, 'epoch': 0.35} + 35%|██████████████████████████▉ | 235/681 [16:23<19:21, 
2.60s/it] 35%|███████████████████████████ | 236/681 [16:25<19:37, 2.65s/it] {'loss': 0.7942, 'grad_norm': 211.59228515625, 'learning_rate': 4.1459510339613946e-07, 'beta_dpo/gap_mean': 65.10395050048828, 'beta_dpo/gap_std': 94.22532653808594, 'beta_dpo/beta_used_raw': -0.10223083198070526, 'beta_dpo/beta_used': 0.12677739560604095, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.46568238735198975, 'logits/rejected': -0.4750595688819885, 'beta_dpo/beta': 0.12677739560604095, 'beta_dpo/loss_margin_mean': 74.87934875488281, 'beta_dpo/beta_margin_mean': 10.552834510803223, 'beta_dpo/beta_margin_std': 17.651796340942383, 'beta_dpo/beta_margin_grad_mean': -0.2837068736553192, 'beta_dpo/beta_margin_grad_std': 0.26055774092674255, 'epoch': 0.35} + 35%|███████████████████████████ | 236/681 [16:25<19:37, 2.65s/it] 35%|███████████████████████████▏ | 237/681 [16:28<19:46, 2.67s/it] {'loss': 4.2522, 'grad_norm': 1457.3970947265625, 'learning_rate': 4.136269950853473e-07, 'beta_dpo/gap_mean': 66.33110046386719, 'beta_dpo/gap_std': 94.28207397460938, 'beta_dpo/beta_used_raw': 0.05189155042171478, 'beta_dpo/beta_used': 0.23019856214523315, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4683570861816406, 'logits/rejected': -0.4693116545677185, 'beta_dpo/beta': 0.23019856214523315, 'beta_dpo/loss_margin_mean': 68.86071014404297, 'beta_dpo/beta_margin_mean': 16.547616958618164, 'beta_dpo/beta_margin_std': 33.88982391357422, 'beta_dpo/beta_margin_grad_mean': -0.3403577208518982, 'beta_dpo/beta_margin_grad_std': 0.2993144690990448, 'epoch': 0.35} + 35%|███████████████████████████▏ | 237/681 [16:28<19:46, 2.67s/it] 35%|███████████████████████████▎ | 238/681 [16:31<20:04, 2.72s/it] {'loss': 1.3348, 'grad_norm': 4.830297946929932, 'learning_rate': 4.126545750510605e-07, 'beta_dpo/gap_mean': 66.35641479492188, 'beta_dpo/gap_std': 93.38137817382812, 'beta_dpo/beta_used_raw': -0.6816811561584473, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 
0.78125, 'logits/chosen': -0.4283304214477539, 'logits/rejected': -0.4415178894996643, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 60.12242889404297, 'beta_dpo/beta_margin_mean': 0.06012243032455444, 'beta_dpo/beta_margin_std': 0.08544077724218369, 'beta_dpo/beta_margin_grad_mean': -0.48502317070961, 'beta_dpo/beta_margin_grad_std': 0.021210981532931328, 'epoch': 0.35} + 35%|███████████████████████████▎ | 238/681 [16:31<20:04, 2.72s/it] 35%|███████████████████████████▎ | 239/681 [16:33<19:27, 2.64s/it] {'loss': 5.1543, 'grad_norm': 3145.790283203125, 'learning_rate': 4.116778689174514e-07, 'beta_dpo/gap_mean': 66.31056213378906, 'beta_dpo/gap_std': 91.886962890625, 'beta_dpo/beta_used_raw': 0.1778862476348877, 'beta_dpo/beta_used': 1.085011601448059, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.46962711215019226, 'logits/rejected': -0.45392659306526184, 'beta_dpo/beta': 1.085011601448059, 'beta_dpo/loss_margin_mean': 67.11627197265625, 'beta_dpo/beta_margin_mean': 92.08358001708984, 'beta_dpo/beta_margin_std': 171.84555053710938, 'beta_dpo/beta_margin_grad_mean': -0.3376123607158661, 'beta_dpo/beta_margin_grad_std': 0.3144451677799225, 'epoch': 0.35} + 35%|███████████████████████████▎ | 239/681 [16:33<19:27, 2.64s/it] 35%|███████████████████████████▍ | 240/681 [16:36<19:23, 2.64s/it] {'loss': 1.003, 'grad_norm': 40.69264602661133, 'learning_rate': 4.106969024216348e-07, 'beta_dpo/gap_mean': 64.25852966308594, 'beta_dpo/gap_std': 89.93122863769531, 'beta_dpo/beta_used_raw': -0.3118809163570404, 'beta_dpo/beta_used': 0.01710430718958378, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4985065460205078, 'logits/rejected': -0.48068171739578247, 'beta_dpo/beta': 0.01710430718958378, 'beta_dpo/loss_margin_mean': 53.26530075073242, 'beta_dpo/beta_margin_mean': 1.1268202066421509, 'beta_dpo/beta_margin_std': 2.160505771636963, 'beta_dpo/beta_margin_grad_mean': -0.35482582449913025, 'beta_dpo/beta_margin_grad_std': 
0.23242245614528656, 'epoch': 0.35} + 35%|███████████████████████████▍ | 240/681 [16:36<19:23, 2.64s/it] 35%|███████████████████████████▌ | 241/681 [16:39<18:56, 2.58s/it] {'loss': 0.7624, 'grad_norm': 423.2123107910156, 'learning_rate': 4.097117014129903e-07, 'beta_dpo/gap_mean': 65.07862854003906, 'beta_dpo/gap_std': 90.019287109375, 'beta_dpo/beta_used_raw': -0.44774329662323, 'beta_dpo/beta_used': 0.15345998108386993, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.509527862071991, 'logits/rejected': -0.4832276701927185, 'beta_dpo/beta': 0.15345998108386993, 'beta_dpo/loss_margin_mean': 80.56120300292969, 'beta_dpo/beta_margin_mean': 10.851144790649414, 'beta_dpo/beta_margin_std': 16.8941593170166, 'beta_dpo/beta_margin_grad_mean': -0.2729555368423462, 'beta_dpo/beta_margin_grad_std': 0.2604886293411255, 'epoch': 0.35} + 35%|███████████████████████████▌ | 241/681 [16:39<18:56, 2.58s/it] 36%|███████████████████████████▋ | 242/681 [16:41<18:35, 2.54s/it] {'loss': 1.3321, 'grad_norm': 4.439099311828613, 'learning_rate': 4.087222918524807e-07, 'beta_dpo/gap_mean': 64.53580474853516, 'beta_dpo/gap_std': 93.37384033203125, 'beta_dpo/beta_used_raw': -0.40156319737434387, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.45422351360321045, 'logits/rejected': -0.42984485626220703, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 57.7884407043457, 'beta_dpo/beta_margin_mean': 0.05778844282031059, 'beta_dpo/beta_margin_std': 0.094021275639534, 'beta_dpo/beta_margin_grad_mean': -0.4855991005897522, 'beta_dpo/beta_margin_grad_std': 0.023397963494062424, 'epoch': 0.36} + 36%|███████████████████████████▋ | 242/681 [16:41<18:35, 2.54s/it] 36%|███████████████████████████▊ | 243/681 [16:44<18:34, 2.54s/it] {'loss': 4.1283, 'grad_norm': 1469.662841796875, 'learning_rate': 4.07728699811968e-07, 'beta_dpo/gap_mean': 65.9927978515625, 'beta_dpo/gap_std': 92.37184143066406, 'beta_dpo/beta_used_raw': 
-0.12471228837966919, 'beta_dpo/beta_used': 0.30491340160369873, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.44465339183807373, 'logits/rejected': -0.4099007844924927, 'beta_dpo/beta': 0.30491340160369873, 'beta_dpo/loss_margin_mean': 72.20804595947266, 'beta_dpo/beta_margin_mean': 22.148780822753906, 'beta_dpo/beta_margin_std': 43.37929153442383, 'beta_dpo/beta_margin_grad_mean': -0.3281807005405426, 'beta_dpo/beta_margin_grad_std': 0.29721781611442566, 'epoch': 0.36} + 36%|███████████████████████████▊ | 243/681 [16:44<18:34, 2.54s/it] 36%|███████████████████████████▉ | 244/681 [16:46<18:34, 2.55s/it] {'loss': 1.3698, 'grad_norm': 321.0909118652344, 'learning_rate': 4.067309514735267e-07, 'beta_dpo/gap_mean': 67.40866088867188, 'beta_dpo/gap_std': 90.40948486328125, 'beta_dpo/beta_used_raw': -0.010177649557590485, 'beta_dpo/beta_used': 0.12539884448051453, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.49787259101867676, 'logits/rejected': -0.4910111427307129, 'beta_dpo/beta': 0.12539884448051453, 'beta_dpo/loss_margin_mean': 73.00894165039062, 'beta_dpo/beta_margin_mean': 8.772866249084473, 'beta_dpo/beta_margin_std': 14.808113098144531, 'beta_dpo/beta_margin_grad_mean': -0.33266112208366394, 'beta_dpo/beta_margin_grad_std': 0.2994270622730255, 'epoch': 0.36} + 36%|███████████████████████████▉ | 244/681 [16:46<18:34, 2.55s/it] 36%|████████████████████████████ | 245/681 [16:49<19:03, 2.62s/it] {'loss': 1.3501, 'grad_norm': 3.6279404163360596, 'learning_rate': 4.057290731287531e-07, 'beta_dpo/gap_mean': 67.58259582519531, 'beta_dpo/gap_std': 91.15482330322266, 'beta_dpo/beta_used_raw': -1.7166011333465576, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5001641511917114, 'logits/rejected': -0.4671769142150879, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 57.68232727050781, 'beta_dpo/beta_margin_mean': 0.057682327926158905, 'beta_dpo/beta_margin_std': 
0.10235247761011124, 'beta_dpo/beta_margin_grad_mean': -0.48563241958618164, 'beta_dpo/beta_margin_grad_std': 0.025404594838619232, 'epoch': 0.36} + 36%|████████████████████████████ | 245/681 [16:49<19:03, 2.62s/it] 36%|████████████████████████████▏ | 246/681 [16:51<18:54, 2.61s/it] {'loss': 1.3351, 'grad_norm': 4.705582618713379, 'learning_rate': 4.047230911780736e-07, 'beta_dpo/gap_mean': 64.41853332519531, 'beta_dpo/gap_std': 90.19287872314453, 'beta_dpo/beta_used_raw': -0.5867970585823059, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5271105766296387, 'logits/rejected': -0.49014580249786377, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 58.41972351074219, 'beta_dpo/beta_margin_mean': 0.05841972678899765, 'beta_dpo/beta_margin_std': 0.08776440471410751, 'beta_dpo/beta_margin_grad_mean': -0.48544150590896606, 'beta_dpo/beta_margin_grad_std': 0.021831955760717392, 'epoch': 0.36} + 36%|████████████████████████████▏ | 246/681 [16:52<18:54, 2.61s/it] 36%|████████████████████████████▎ | 247/681 [16:54<18:35, 2.57s/it] {'loss': 2.3625, 'grad_norm': 471.9046325683594, 'learning_rate': 4.0371303213004814e-07, 'beta_dpo/gap_mean': 68.1501235961914, 'beta_dpo/gap_std': 92.23121643066406, 'beta_dpo/beta_used_raw': 0.06344389915466309, 'beta_dpo/beta_used': 0.1847115010023117, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.44840526580810547, 'logits/rejected': -0.45401185750961304, 'beta_dpo/beta': 0.1847115010023117, 'beta_dpo/loss_margin_mean': 90.32220458984375, 'beta_dpo/beta_margin_mean': 15.352115631103516, 'beta_dpo/beta_margin_std': 25.77711296081543, 'beta_dpo/beta_margin_grad_mean': -0.2965923547744751, 'beta_dpo/beta_margin_grad_std': 0.28494712710380554, 'epoch': 0.36} + 36%|████████████████████████████▎ | 247/681 [16:54<18:35, 2.57s/it] 36%|████████████████████████████▍ | 248/681 [16:56<18:27, 2.56s/it] {'loss': 1.3302, 'grad_norm': 4.8645339012146, 'learning_rate': 
4.0269892260067197e-07, 'beta_dpo/gap_mean': 68.87054443359375, 'beta_dpo/gap_std': 89.18070220947266, 'beta_dpo/beta_used_raw': -0.5669313669204712, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.45725005865097046, 'logits/rejected': -0.47495073080062866, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 66.2895278930664, 'beta_dpo/beta_margin_mean': 0.06628952920436859, 'beta_dpo/beta_margin_std': 0.07302607595920563, 'beta_dpo/beta_margin_grad_mean': -0.48346683382987976, 'beta_dpo/beta_margin_grad_std': 0.018162554129958153, 'epoch': 0.36} + 36%|████████████████████████████▍ | 248/681 [16:57<18:27, 2.56s/it] 37%|████████████████████████████▌ | 249/681 [16:59<18:04, 2.51s/it] {'loss': 1.3437, 'grad_norm': 6.0476884841918945, 'learning_rate': 4.0168078931267426e-07, 'beta_dpo/gap_mean': 64.90923309326172, 'beta_dpo/gap_std': 87.11177825927734, 'beta_dpo/beta_used_raw': -1.1384837627410889, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.47966477274894714, 'logits/rejected': -0.45807725191116333, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 41.93442916870117, 'beta_dpo/beta_margin_mean': 0.041934434324502945, 'beta_dpo/beta_margin_std': 0.07966778427362442, 'beta_dpo/beta_margin_grad_mean': -0.48953747749328613, 'beta_dpo/beta_margin_grad_std': 0.019871097058057785, 'epoch': 0.37} + 37%|████████████████████████████▌ | 249/681 [16:59<18:04, 2.51s/it] 37%|████████████████████████████▋ | 250/681 [17:02<18:39, 2.60s/it] {'loss': 0.7885, 'grad_norm': 598.4202270507812, 'learning_rate': 4.006586590948141e-07, 'beta_dpo/gap_mean': 62.63585662841797, 'beta_dpo/gap_std': 83.05741119384766, 'beta_dpo/beta_used_raw': 0.17717288434505463, 'beta_dpo/beta_used': 0.3004174530506134, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.458739697933197, 'logits/rejected': -0.40397346019744873, 'beta_dpo/beta': 0.3004174530506134, 
'beta_dpo/loss_margin_mean': 62.211456298828125, 'beta_dpo/beta_margin_mean': 21.81104278564453, 'beta_dpo/beta_margin_std': 32.62987518310547, 'beta_dpo/beta_margin_grad_mean': -0.277651309967041, 'beta_dpo/beta_margin_grad_std': 0.2621324062347412, 'epoch': 0.37} + 37%|████████████████████████████▋ | 250/681 [17:02<18:39, 2.60s/it] 37%|████████████████████████████▋ | 251/681 [17:04<18:12, 2.54s/it] {'loss': 1.1642, 'grad_norm': 993.1358032226562, 'learning_rate': 3.9963255888117325e-07, 'beta_dpo/gap_mean': 62.36948013305664, 'beta_dpo/gap_std': 82.18414306640625, 'beta_dpo/beta_used_raw': -0.07390487194061279, 'beta_dpo/beta_used': 0.25873419642448425, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.45881718397140503, 'logits/rejected': -0.43461471796035767, 'beta_dpo/beta': 0.25873419642448425, 'beta_dpo/loss_margin_mean': 55.00004959106445, 'beta_dpo/beta_margin_mean': 16.468461990356445, 'beta_dpo/beta_margin_std': 38.37507629394531, 'beta_dpo/beta_margin_grad_mean': -0.35796087980270386, 'beta_dpo/beta_margin_grad_std': 0.3145868182182312, 'epoch': 0.37} + 37%|████████████████████████████▋ | 251/681 [17:04<18:12, 2.54s/it] 37%|████████████████████████████▊ | 252/681 [17:07<18:28, 2.58s/it] {'loss': 1.6671, 'grad_norm': 1931.18115234375, 'learning_rate': 3.9860251571044666e-07, 'beta_dpo/gap_mean': 61.836875915527344, 'beta_dpo/gap_std': 78.91354370117188, 'beta_dpo/beta_used_raw': 0.2833039164543152, 'beta_dpo/beta_used': 0.43845975399017334, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5125927925109863, 'logits/rejected': -0.47563207149505615, 'beta_dpo/beta': 0.43845975399017334, 'beta_dpo/loss_margin_mean': 61.227333068847656, 'beta_dpo/beta_margin_mean': 30.300657272338867, 'beta_dpo/beta_margin_std': 50.934173583984375, 'beta_dpo/beta_margin_grad_mean': -0.27507588267326355, 'beta_dpo/beta_margin_grad_std': 0.2723042070865631, 'epoch': 0.37} + 37%|████████████████████████████▊ | 252/681 [17:07<18:28, 2.58s/it] 
37%|████████████████████████████▉ | 253/681 [17:09<18:32, 2.60s/it] {'loss': 1.364, 'grad_norm': 569.2015991210938, 'learning_rate': 3.9756855672522986e-07, 'beta_dpo/gap_mean': 60.489776611328125, 'beta_dpo/gap_std': 77.81178283691406, 'beta_dpo/beta_used_raw': -0.5888211727142334, 'beta_dpo/beta_used': 0.13669037818908691, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.49557358026504517, 'logits/rejected': -0.4879855513572693, 'beta_dpo/beta': 0.13669037818908691, 'beta_dpo/loss_margin_mean': 61.38548278808594, 'beta_dpo/beta_margin_mean': 11.10105037689209, 'beta_dpo/beta_margin_std': 19.28214454650879, 'beta_dpo/beta_margin_grad_mean': -0.31530076265335083, 'beta_dpo/beta_margin_grad_std': 0.2846486270427704, 'epoch': 0.37} + 37%|████████████████████████████▉ | 253/681 [17:09<18:32, 2.60s/it] 37%|█████████████████████████████ | 254/681 [17:12<18:48, 2.64s/it] {'loss': 1.3323, 'grad_norm': 4.005491733551025, 'learning_rate': 3.965307091713037e-07, 'beta_dpo/gap_mean': 61.257843017578125, 'beta_dpo/gap_std': 80.37059020996094, 'beta_dpo/beta_used_raw': -0.21809083223342896, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.47500473260879517, 'logits/rejected': -0.46256011724472046, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 57.76969528198242, 'beta_dpo/beta_margin_mean': 0.05776969715952873, 'beta_dpo/beta_margin_std': 0.08991079777479172, 'beta_dpo/beta_margin_grad_mean': -0.4855990707874298, 'beta_dpo/beta_margin_grad_std': 0.02238706313073635, 'epoch': 0.37} + 37%|█████████████████████████████ | 254/681 [17:12<18:48, 2.64s/it] 37%|█████████████████████████████▏ | 255/681 [17:15<18:23, 2.59s/it] {'loss': 2.8414, 'grad_norm': 1103.354248046875, 'learning_rate': 3.954890003969163e-07, 'beta_dpo/gap_mean': 61.839927673339844, 'beta_dpo/gap_std': 83.36296844482422, 'beta_dpo/beta_used_raw': -0.4761512279510498, 'beta_dpo/beta_used': 0.21867026388645172, 'beta_dpo/mask_keep_frac': 
0.78125, 'logits/chosen': -0.4370883107185364, 'logits/rejected': -0.4320235848426819, 'beta_dpo/beta': 0.21867026388645172, 'beta_dpo/loss_margin_mean': 60.37783432006836, 'beta_dpo/beta_margin_mean': 18.47532844543457, 'beta_dpo/beta_margin_std': 37.53182601928711, 'beta_dpo/beta_margin_grad_mean': -0.3410184681415558, 'beta_dpo/beta_margin_grad_std': 0.3134188652038574, 'epoch': 0.37} + 37%|█████████████████████████████▏ | 255/681 [17:15<18:23, 2.59s/it] 38%|█████████████████████████████▎ | 256/681 [17:17<18:15, 2.58s/it] {'loss': 1.3417, 'grad_norm': 5.088190078735352, 'learning_rate': 3.944434578520628e-07, 'beta_dpo/gap_mean': 60.238243103027344, 'beta_dpo/gap_std': 83.42945861816406, 'beta_dpo/beta_used_raw': -0.7590247988700867, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.43487805128097534, 'logits/rejected': -0.4386810064315796, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 57.98687744140625, 'beta_dpo/beta_margin_mean': 0.05798688158392906, 'beta_dpo/beta_margin_std': 0.08430825173854828, 'beta_dpo/beta_margin_grad_mean': -0.48553693294525146, 'beta_dpo/beta_margin_grad_std': 0.021008189767599106, 'epoch': 0.38} + 38%|█████████████████████████████▎ | 256/681 [17:17<18:15, 2.58s/it] 38%|█████████████████████████████▍ | 257/681 [17:20<18:23, 2.60s/it] {'loss': 1.3407, 'grad_norm': 4.403605937957764, 'learning_rate': 3.933941090877615e-07, 'beta_dpo/gap_mean': 63.066001892089844, 'beta_dpo/gap_std': 86.74974060058594, 'beta_dpo/beta_used_raw': -0.8660670518875122, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4413556456565857, 'logits/rejected': -0.42769724130630493, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 74.549072265625, 'beta_dpo/beta_margin_mean': 0.07454907149076462, 'beta_dpo/beta_margin_std': 0.09976498037576675, 'beta_dpo/beta_margin_grad_mean': -0.48144304752349854, 
'beta_dpo/beta_margin_grad_std': 0.024765780195593834, 'epoch': 0.38} + 38%|█████████████████████████████▍ | 257/681 [17:20<18:23, 2.60s/it] 38%|█████████████████████████████▌ | 258/681 [17:22<17:34, 2.49s/it] {'loss': 6.21, 'grad_norm': 3212.95654296875, 'learning_rate': 3.923409817553284e-07, 'beta_dpo/gap_mean': 62.323787689208984, 'beta_dpo/gap_std': 87.7547607421875, 'beta_dpo/beta_used_raw': 0.8561594486236572, 'beta_dpo/beta_used': 0.8561594486236572, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.39796602725982666, 'logits/rejected': -0.39811059832572937, 'beta_dpo/beta': 0.8561594486236572, 'beta_dpo/loss_margin_mean': 63.648921966552734, 'beta_dpo/beta_margin_mean': 54.67416000366211, 'beta_dpo/beta_margin_std': 84.30635070800781, 'beta_dpo/beta_margin_grad_mean': -0.24124778807163239, 'beta_dpo/beta_margin_grad_std': 0.42249229550361633, 'epoch': 0.38} + 38%|█████████████████████████████▌ | 258/681 [17:22<17:34, 2.49s/it] 38%|█████████████████████████████▋ | 259/681 [17:25<17:36, 2.50s/it] {'loss': 1.3481, 'grad_norm': 5.502564430236816, 'learning_rate': 3.9128410360564793e-07, 'beta_dpo/gap_mean': 62.94316864013672, 'beta_dpo/gap_std': 88.68659973144531, 'beta_dpo/beta_used_raw': -1.3002283573150635, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.45921239256858826, 'logits/rejected': -0.4577338993549347, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 59.64350891113281, 'beta_dpo/beta_margin_mean': 0.05964351072907448, 'beta_dpo/beta_margin_std': 0.08494514971971512, 'beta_dpo/beta_margin_grad_mean': -0.48512884974479675, 'beta_dpo/beta_margin_grad_std': 0.02116353064775467, 'epoch': 0.38} + 38%|█████████████████████████████▋ | 259/681 [17:25<17:36, 2.50s/it] 38%|█████████████████████████████▊ | 260/681 [17:27<18:00, 2.57s/it] {'loss': 1.3345, 'grad_norm': 6.234367847442627, 'learning_rate': 3.9022350248844246e-07, 'beta_dpo/gap_mean': 62.76177215576172, 
'beta_dpo/gap_std': 87.07768249511719, 'beta_dpo/beta_used_raw': -0.4817598760128021, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.42374077439308167, 'logits/rejected': -0.44464540481567383, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 69.42640686035156, 'beta_dpo/beta_margin_mean': 0.06942640990018845, 'beta_dpo/beta_margin_std': 0.08899199217557907, 'beta_dpo/beta_margin_grad_mean': -0.48271456360816956, 'beta_dpo/beta_margin_grad_std': 0.022007808089256287, 'epoch': 0.38} + 38%|█████████████████████████████▊ | 260/681 [17:27<18:00, 2.57s/it] 38%|█████████████████████████████▉ | 261/681 [17:30<17:13, 2.46s/it] {'loss': 1.3409, 'grad_norm': 4.799732208251953, 'learning_rate': 3.891592063515376e-07, 'beta_dpo/gap_mean': 64.44114685058594, 'beta_dpo/gap_std': 89.06988525390625, 'beta_dpo/beta_used_raw': -0.943615198135376, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3694385290145874, 'logits/rejected': -0.3720252513885498, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 67.0704574584961, 'beta_dpo/beta_margin_mean': 0.06707046180963516, 'beta_dpo/beta_margin_std': 0.09085685759782791, 'beta_dpo/beta_margin_grad_mean': -0.48329126834869385, 'beta_dpo/beta_margin_grad_std': 0.022538091987371445, 'epoch': 0.38} + 38%|█████████████████████████████▉ | 261/681 [17:30<17:13, 2.46s/it] 38%|██████████████████████████████ | 262/681 [17:32<17:10, 2.46s/it] {'loss': 1.3326, 'grad_norm': 4.288631916046143, 'learning_rate': 3.880912432401264e-07, 'beta_dpo/gap_mean': 63.341896057128906, 'beta_dpo/gap_std': 86.87582397460938, 'beta_dpo/beta_used_raw': -0.3785286545753479, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3830975890159607, 'logits/rejected': -0.3654525876045227, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 59.38102722167969, 
'beta_dpo/beta_margin_mean': 0.05938103049993515, 'beta_dpo/beta_margin_std': 0.07453680038452148, 'beta_dpo/beta_margin_grad_mean': -0.4851832985877991, 'beta_dpo/beta_margin_grad_std': 0.018579039722681046, 'epoch': 0.38} + 38%|██████████████████████████████ | 262/681 [17:32<17:10, 2.46s/it] 39%|██████████████████████████████ | 263/681 [17:35<17:43, 2.54s/it] {'loss': 3.3255, 'grad_norm': 1847.1041259765625, 'learning_rate': 3.870196412960302e-07, 'beta_dpo/gap_mean': 66.71192932128906, 'beta_dpo/gap_std': 88.3709487915039, 'beta_dpo/beta_used_raw': 0.651368260383606, 'beta_dpo/beta_used': 0.7164207696914673, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.43336668610572815, 'logits/rejected': -0.40536999702453613, 'beta_dpo/beta': 0.7164207696914673, 'beta_dpo/loss_margin_mean': 81.06204986572266, 'beta_dpo/beta_margin_mean': 71.56385803222656, 'beta_dpo/beta_margin_std': 131.27561950683594, 'beta_dpo/beta_margin_grad_mean': -0.31974849104881287, 'beta_dpo/beta_margin_grad_std': 0.30375197529792786, 'epoch': 0.39} + 39%|██████████████████████████████ | 263/681 [17:35<17:43, 2.54s/it] 39%|██████████████████████████████▏ | 264/681 [17:37<18:03, 2.60s/it] {'loss': 3.0123, 'grad_norm': 1272.9935302734375, 'learning_rate': 3.8594442875695665e-07, 'beta_dpo/gap_mean': 66.88683319091797, 'beta_dpo/gap_std': 88.364501953125, 'beta_dpo/beta_used_raw': -0.5231786966323853, 'beta_dpo/beta_used': 0.32210445404052734, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.47683650255203247, 'logits/rejected': -0.4689565896987915, 'beta_dpo/beta': 0.32210445404052734, 'beta_dpo/loss_margin_mean': 62.8607292175293, 'beta_dpo/beta_margin_mean': 26.927030563354492, 'beta_dpo/beta_margin_std': 47.48490524291992, 'beta_dpo/beta_margin_grad_mean': -0.3196498155593872, 'beta_dpo/beta_margin_grad_std': 0.2992617189884186, 'epoch': 0.39} + 39%|██████████████████████████████▏ | 264/681 [17:37<18:03, 2.60s/it] 39%|██████████████████████████████▎ | 265/681 [17:40<17:34, 
2.54s/it] {'loss': 1.3367, 'grad_norm': 5.106090545654297, 'learning_rate': 3.848656339557562e-07, 'beta_dpo/gap_mean': 65.43952941894531, 'beta_dpo/gap_std': 89.35261535644531, 'beta_dpo/beta_used_raw': -0.7309384942054749, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.44355565309524536, 'logits/rejected': -0.42892855405807495, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 65.74657440185547, 'beta_dpo/beta_margin_mean': 0.06574657559394836, 'beta_dpo/beta_margin_std': 0.097164586186409, 'beta_dpo/beta_margin_grad_mean': -0.4836253225803375, 'beta_dpo/beta_margin_grad_std': 0.024147428572177887, 'epoch': 0.39} + 39%|██████████████████████████████▎ | 265/681 [17:40<17:34, 2.54s/it] 39%|██████████████████████████████▍ | 266/681 [17:42<17:28, 2.53s/it] {'loss': 1.3305, 'grad_norm': 3.584993362426758, 'learning_rate': 3.8378328531967507e-07, 'beta_dpo/gap_mean': 64.68086242675781, 'beta_dpo/gap_std': 90.58798217773438, 'beta_dpo/beta_used_raw': -0.3030107021331787, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5003777146339417, 'logits/rejected': -0.4550362229347229, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 57.66104507446289, 'beta_dpo/beta_margin_mean': 0.05766104906797409, 'beta_dpo/beta_margin_std': 0.08847023546695709, 'beta_dpo/beta_margin_grad_mean': -0.48562902212142944, 'beta_dpo/beta_margin_grad_std': 0.02202366106212139, 'epoch': 0.39} + 39%|██████████████████████████████▍ | 266/681 [17:42<17:28, 2.53s/it] 39%|██████████████████████████████▌ | 267/681 [17:45<17:29, 2.54s/it] {'loss': 1.3368, 'grad_norm': 5.026149272918701, 'learning_rate': 3.8269741136960646e-07, 'beta_dpo/gap_mean': 64.24443817138672, 'beta_dpo/gap_std': 89.84454345703125, 'beta_dpo/beta_used_raw': -0.679577112197876, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': 
-0.46633046865463257, 'logits/rejected': -0.4374736547470093, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 64.59161376953125, 'beta_dpo/beta_margin_mean': 0.06459161639213562, 'beta_dpo/beta_margin_std': 0.08754534274339676, 'beta_dpo/beta_margin_grad_mean': -0.4839051365852356, 'beta_dpo/beta_margin_grad_std': 0.021753991022706032, 'epoch': 0.39} + 39%|██████████████████████████████▌ | 267/681 [17:45<17:29, 2.54s/it] 39%|██████████████████████████████▋ | 268/681 [17:47<17:25, 2.53s/it] {'loss': 5.9535, 'grad_norm': 2970.7353515625, 'learning_rate': 3.8160804071933894e-07, 'beta_dpo/gap_mean': 64.24163055419922, 'beta_dpo/gap_std': 89.63772583007812, 'beta_dpo/beta_used_raw': 0.28859809041023254, 'beta_dpo/beta_used': 0.4272679090499878, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4358275532722473, 'logits/rejected': -0.44389188289642334, 'beta_dpo/beta': 0.4272679090499878, 'beta_dpo/loss_margin_mean': 64.2711410522461, 'beta_dpo/beta_margin_mean': 26.88530731201172, 'beta_dpo/beta_margin_std': 64.07011413574219, 'beta_dpo/beta_margin_grad_mean': -0.3774115741252899, 'beta_dpo/beta_margin_grad_std': 0.3255773186683655, 'epoch': 0.39} + 39%|██████████████████████████████▋ | 268/681 [17:47<17:25, 2.53s/it] 40%|██████████████████████████████▊ | 269/681 [17:50<17:12, 2.51s/it] {'loss': 17.9159, 'grad_norm': 6386.1025390625, 'learning_rate': 3.8051520207480204e-07, 'beta_dpo/gap_mean': 67.56047821044922, 'beta_dpo/gap_std': 95.0364990234375, 'beta_dpo/beta_used_raw': 0.7696582078933716, 'beta_dpo/beta_used': 0.7696582078933716, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4257703721523285, 'logits/rejected': -0.40664464235305786, 'beta_dpo/beta': 0.7696582078933716, 'beta_dpo/loss_margin_mean': 89.27982330322266, 'beta_dpo/beta_margin_mean': 70.0230484008789, 'beta_dpo/beta_margin_std': 98.9859390258789, 'beta_dpo/beta_margin_grad_mean': -0.31902071833610535, 'beta_dpo/beta_margin_grad_std': 0.45872315764427185, 'epoch': 
0.4} + 40%|██████████████████████████████▊ | 269/681 [17:50<17:12, 2.51s/it] 40%|██████████████████████████████▉ | 270/681 [17:53<17:32, 2.56s/it] {'loss': 2.5132, 'grad_norm': 782.6886596679688, 'learning_rate': 3.794189242333106e-07, 'beta_dpo/gap_mean': 68.88189697265625, 'beta_dpo/gap_std': 98.04679870605469, 'beta_dpo/beta_used_raw': 0.31599855422973633, 'beta_dpo/beta_used': 0.31599855422973633, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5228564739227295, 'logits/rejected': -0.5192960500717163, 'beta_dpo/beta': 0.31599855422973633, 'beta_dpo/loss_margin_mean': 69.03794860839844, 'beta_dpo/beta_margin_mean': 21.739652633666992, 'beta_dpo/beta_margin_std': 33.68879318237305, 'beta_dpo/beta_margin_grad_mean': -0.1990150511264801, 'beta_dpo/beta_margin_grad_std': 0.38719597458839417, 'epoch': 0.4} + 40%|██████████████████████████████▉ | 270/681 [17:53<17:32, 2.56s/it] 40%|███████████████████████████████ | 271/681 [17:55<17:06, 2.50s/it] {'loss': 1.3298, 'grad_norm': 5.687714576721191, 'learning_rate': 3.7831923608280514e-07, 'beta_dpo/gap_mean': 70.80068969726562, 'beta_dpo/gap_std': 99.597412109375, 'beta_dpo/beta_used_raw': -0.6152107119560242, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.45079296827316284, 'logits/rejected': -0.4350966811180115, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 78.76793670654297, 'beta_dpo/beta_margin_mean': 0.07876794040203094, 'beta_dpo/beta_margin_std': 0.10474507510662079, 'beta_dpo/beta_margin_grad_mean': -0.48039206862449646, 'beta_dpo/beta_margin_grad_std': 0.026033930480480194, 'epoch': 0.4} + 40%|███████████████████████████████ | 271/681 [17:55<17:06, 2.50s/it] 40%|███████████████████████████████▏ | 272/681 [17:58<17:27, 2.56s/it] {'loss': 2.403, 'grad_norm': 3469.978759765625, 'learning_rate': 3.772161666010912e-07, 'beta_dpo/gap_mean': 74.29582214355469, 'beta_dpo/gap_std': 98.69171905517578, 'beta_dpo/beta_used_raw': 
0.21146634221076965, 'beta_dpo/beta_used': 0.2916773557662964, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4280936121940613, 'logits/rejected': -0.4439089596271515, 'beta_dpo/beta': 0.2916773557662964, 'beta_dpo/loss_margin_mean': 96.75138854980469, 'beta_dpo/beta_margin_mean': 26.473766326904297, 'beta_dpo/beta_margin_std': 43.10868835449219, 'beta_dpo/beta_margin_grad_mean': -0.34444308280944824, 'beta_dpo/beta_margin_grad_std': 0.3155882954597473, 'epoch': 0.4} + 40%|███████████████████████████████▏ | 272/681 [17:58<17:27, 2.56s/it] 40%|███████████████████████████████▎ | 273/681 [18:00<17:05, 2.51s/it] {'loss': 3.9441, 'grad_norm': 1977.9761962890625, 'learning_rate': 3.761097448550755e-07, 'beta_dpo/gap_mean': 76.61572265625, 'beta_dpo/gap_std': 100.23278045654297, 'beta_dpo/beta_used_raw': -0.13546743988990784, 'beta_dpo/beta_used': 0.4497944712638855, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4547615647315979, 'logits/rejected': -0.4396814703941345, 'beta_dpo/beta': 0.4497944712638855, 'beta_dpo/loss_margin_mean': 77.89697265625, 'beta_dpo/beta_margin_mean': 40.1925048828125, 'beta_dpo/beta_margin_std': 79.06779479980469, 'beta_dpo/beta_margin_grad_mean': -0.30444207787513733, 'beta_dpo/beta_margin_grad_std': 0.294939249753952, 'epoch': 0.4} + 40%|███████████████████████████████▎ | 273/681 [18:00<17:05, 2.51s/it] 40%|███████████████████████████████▍ | 274/681 [18:03<17:07, 2.52s/it] {'loss': 1.3435, 'grad_norm': 4.778660774230957, 'learning_rate': 3.75e-07, 'beta_dpo/gap_mean': 73.87464141845703, 'beta_dpo/gap_std': 97.98983001708984, 'beta_dpo/beta_used_raw': -1.6583735942840576, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4540286064147949, 'logits/rejected': -0.43437108397483826, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 64.36125183105469, 'beta_dpo/beta_margin_mean': 0.06436125934123993, 'beta_dpo/beta_margin_std': 0.08749227970838547, 
'beta_dpo/beta_margin_grad_mean': -0.48395276069641113, 'beta_dpo/beta_margin_grad_std': 0.021796153858304024, 'epoch': 0.4} + 40%|███████████████████████████████▍ | 274/681 [18:03<17:07, 2.52s/it] 40%|███████████████████████████████▍ | 275/681 [18:05<17:50, 2.64s/it] {'loss': 1.4477, 'grad_norm': 1107.3209228515625, 'learning_rate': 3.738869612786737e-07, 'beta_dpo/gap_mean': 73.8324203491211, 'beta_dpo/gap_std': 97.02469635009766, 'beta_dpo/beta_used_raw': -0.2188054919242859, 'beta_dpo/beta_used': 0.3658776581287384, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.48561912775039673, 'logits/rejected': -0.4850524365901947, 'beta_dpo/beta': 0.3658776581287384, 'beta_dpo/loss_margin_mean': 74.46902465820312, 'beta_dpo/beta_margin_mean': 28.91089630126953, 'beta_dpo/beta_margin_std': 53.39341354370117, 'beta_dpo/beta_margin_grad_mean': -0.30500340461730957, 'beta_dpo/beta_margin_grad_std': 0.2937050759792328, 'epoch': 0.4} + 40%|███████████████████████████████▍ | 275/681 [18:06<17:50, 2.64s/it] 41%|███████████████████████████████▌ | 276/681 [18:08<17:52, 2.65s/it] {'loss': 1.328, 'grad_norm': 4.309329986572266, 'learning_rate': 3.7277065802070204e-07, 'beta_dpo/gap_mean': 73.41490173339844, 'beta_dpo/gap_std': 99.17544555664062, 'beta_dpo/beta_used_raw': -0.6487220525741577, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4739760756492615, 'logits/rejected': -0.4428936541080475, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 68.88844299316406, 'beta_dpo/beta_margin_mean': 0.06888844817876816, 'beta_dpo/beta_margin_std': 0.10926186293363571, 'beta_dpo/beta_margin_grad_mean': -0.48285502195358276, 'beta_dpo/beta_margin_grad_std': 0.027132032439112663, 'epoch': 0.41} + 41%|███████████████████████████████▌ | 276/681 [18:08<17:52, 2.65s/it] 41%|███████████████████████████████▋ | 277/681 [18:10<16:57, 2.52s/it] {'loss': 3.3232, 'grad_norm': 959.0798950195312, 'learning_rate': 
3.71651119641714e-07, 'beta_dpo/gap_mean': 71.44065856933594, 'beta_dpo/gap_std': 96.77009582519531, 'beta_dpo/beta_used_raw': -0.2556490898132324, 'beta_dpo/beta_used': 0.2809670865535736, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4775172770023346, 'logits/rejected': -0.4674876928329468, 'beta_dpo/beta': 0.2809670865535736, 'beta_dpo/loss_margin_mean': 65.0876693725586, 'beta_dpo/beta_margin_mean': 21.383647918701172, 'beta_dpo/beta_margin_std': 38.81602478027344, 'beta_dpo/beta_margin_grad_mean': -0.3351666331291199, 'beta_dpo/beta_margin_grad_std': 0.3102318048477173, 'epoch': 0.41} + 41%|███████████████████████████████▋ | 277/681 [18:10<16:57, 2.52s/it] 41%|███████████████████████████████▊ | 278/681 [18:13<16:54, 2.52s/it] {'loss': 1.8696, 'grad_norm': 546.0422973632812, 'learning_rate': 3.705283756425872e-07, 'beta_dpo/gap_mean': 73.7154541015625, 'beta_dpo/gap_std': 97.09827423095703, 'beta_dpo/beta_used_raw': -0.4377209544181824, 'beta_dpo/beta_used': 0.09777142852544785, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5163074731826782, 'logits/rejected': -0.5155045390129089, 'beta_dpo/beta': 0.09777142852544785, 'beta_dpo/loss_margin_mean': 81.99968719482422, 'beta_dpo/beta_margin_mean': 9.16122055053711, 'beta_dpo/beta_margin_std': 16.98973274230957, 'beta_dpo/beta_margin_grad_mean': -0.3435121774673462, 'beta_dpo/beta_margin_grad_std': 0.3006548285484314, 'epoch': 0.41} + 41%|███████████████████████████████▊ | 278/681 [18:13<16:54, 2.52s/it] 41%|███████████████████████████████▉ | 279/681 [18:15<16:52, 2.52s/it] {'loss': 2.2112, 'grad_norm': 677.2081909179688, 'learning_rate': 3.6940245560867e-07, 'beta_dpo/gap_mean': 75.48173522949219, 'beta_dpo/gap_std': 98.2899169921875, 'beta_dpo/beta_used_raw': -0.6678704023361206, 'beta_dpo/beta_used': 0.1939535290002823, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4336978495121002, 'logits/rejected': -0.42833346128463745, 'beta_dpo/beta': 0.1939535290002823, 
'beta_dpo/loss_margin_mean': 83.40555572509766, 'beta_dpo/beta_margin_mean': 18.408206939697266, 'beta_dpo/beta_margin_std': 33.89780807495117, 'beta_dpo/beta_margin_grad_mean': -0.31403571367263794, 'beta_dpo/beta_margin_grad_std': 0.2941286265850067, 'epoch': 0.41} + 41%|███████████████████████████████▉ | 279/681 [18:15<16:52, 2.52s/it] 41%|████████████████████████████████ | 280/681 [18:18<17:03, 2.55s/it] {'loss': 0.6316, 'grad_norm': 373.3504943847656, 'learning_rate': 3.6827338920900253e-07, 'beta_dpo/gap_mean': 75.63088989257812, 'beta_dpo/gap_std': 95.76606750488281, 'beta_dpo/beta_used_raw': 0.5752575993537903, 'beta_dpo/beta_used': 0.5752575993537903, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4546999931335449, 'logits/rejected': -0.45433032512664795, 'beta_dpo/beta': 0.5752575993537903, 'beta_dpo/loss_margin_mean': 77.03679656982422, 'beta_dpo/beta_margin_mean': 43.74085235595703, 'beta_dpo/beta_margin_std': 54.59124755859375, 'beta_dpo/beta_margin_grad_mean': -0.13274730741977692, 'beta_dpo/beta_margin_grad_std': 0.31232884526252747, 'epoch': 0.41} + 41%|████████████████████████████████ | 280/681 [18:18<17:03, 2.55s/it] 41%|████████████████████████████████▏ | 281/681 [18:21<17:05, 2.56s/it] {'loss': 1.3182, 'grad_norm': 8.058195114135742, 'learning_rate': 3.6714120619553435e-07, 'beta_dpo/gap_mean': 73.92355346679688, 'beta_dpo/gap_std': 93.38307189941406, 'beta_dpo/beta_used_raw': -0.12925973534584045, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.49369382858276367, 'logits/rejected': -0.46913886070251465, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 66.84371185302734, 'beta_dpo/beta_margin_mean': 0.06684371829032898, 'beta_dpo/beta_margin_std': 0.0805417075753212, 'beta_dpo/beta_margin_grad_mean': -0.483308345079422, 'beta_dpo/beta_margin_grad_std': 0.020045718178153038, 'epoch': 0.41} + 41%|████████████████████████████████▏ | 281/681 [18:21<17:05, 2.56s/it] 
41%|████████████████████████████████▎ | 282/681 [18:23<16:56, 2.55s/it] {'loss': 1.346, 'grad_norm': 3.629554033279419, 'learning_rate': 3.660059364023408e-07, 'beta_dpo/gap_mean': 70.5438003540039, 'beta_dpo/gap_std': 89.866455078125, 'beta_dpo/beta_used_raw': -1.6386913061141968, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5274136066436768, 'logits/rejected': -0.5010647773742676, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 53.189762115478516, 'beta_dpo/beta_margin_mean': 0.05318976566195488, 'beta_dpo/beta_margin_std': 0.07743314653635025, 'beta_dpo/beta_margin_grad_mean': -0.48673728108406067, 'beta_dpo/beta_margin_grad_std': 0.019268635660409927, 'epoch': 0.41} + 41%|████████████████████████████████▎ | 282/681 [18:23<16:56, 2.55s/it] 42%|████████████████████████████████▍ | 283/681 [18:26<16:50, 2.54s/it] {'loss': 1.5468, 'grad_norm': 462.5566711425781, 'learning_rate': 3.6486760974483685e-07, 'beta_dpo/gap_mean': 71.56987762451172, 'beta_dpo/gap_std': 89.52423095703125, 'beta_dpo/beta_used_raw': 0.838965654373169, 'beta_dpo/beta_used': 0.838965654373169, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.49745476245880127, 'logits/rejected': -0.48693162202835083, 'beta_dpo/beta': 0.838965654373169, 'beta_dpo/loss_margin_mean': 84.6165771484375, 'beta_dpo/beta_margin_mean': 74.75791931152344, 'beta_dpo/beta_margin_std': 82.98445892333984, 'beta_dpo/beta_margin_grad_mean': -0.12155988812446594, 'beta_dpo/beta_margin_grad_std': 0.31926241517066956, 'epoch': 0.42} + 42%|████████████████████████████████▍ | 283/681 [18:26<16:50, 2.54s/it] 42%|████████████████████████████████▌ | 284/681 [18:28<17:18, 2.62s/it] {'loss': 1.3195, 'grad_norm': 6.851167678833008, 'learning_rate': 3.6372625621898863e-07, 'beta_dpo/gap_mean': 74.1982650756836, 'beta_dpo/gap_std': 90.27053833007812, 'beta_dpo/beta_used_raw': -0.19312117993831635, 'beta_dpo/beta_used': 0.0010000000474974513, 
'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.5042980313301086, 'logits/rejected': -0.4991450905799866, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 81.42410278320312, 'beta_dpo/beta_margin_mean': 0.08142410963773727, 'beta_dpo/beta_margin_std': 0.0932619571685791, 'beta_dpo/beta_margin_grad_mean': -0.47972315549850464, 'beta_dpo/beta_margin_grad_std': 0.02312047965824604, 'epoch': 0.42} + 42%|████████████████████████████████▌ | 284/681 [18:28<17:18, 2.62s/it] 42%|████████████████████████████████▋ | 285/681 [18:31<17:07, 2.60s/it] {'loss': 1.3215, 'grad_norm': 7.985069274902344, 'learning_rate': 3.625819059005228e-07, 'beta_dpo/gap_mean': 73.66974639892578, 'beta_dpo/gap_std': 90.04093933105469, 'beta_dpo/beta_used_raw': -0.30619388818740845, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4940687417984009, 'logits/rejected': -0.48543840646743774, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 72.39013671875, 'beta_dpo/beta_margin_mean': 0.07239013910293579, 'beta_dpo/beta_margin_std': 0.08699988573789597, 'beta_dpo/beta_margin_grad_mean': -0.48195090889930725, 'beta_dpo/beta_margin_grad_std': 0.0216471329331398, 'epoch': 0.42} + 42%|████████████████████████████████▋ | 285/681 [18:31<17:07, 2.60s/it] 42%|████████████████████████████████▊ | 286/681 [18:34<17:15, 2.62s/it] {'loss': 1.2468, 'grad_norm': 274.8042907714844, 'learning_rate': 3.614345889441346e-07, 'beta_dpo/gap_mean': 74.31663513183594, 'beta_dpo/gap_std': 90.61752319335938, 'beta_dpo/beta_used_raw': -0.39668411016464233, 'beta_dpo/beta_used': 0.07317624241113663, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4832392930984497, 'logits/rejected': -0.46001118421554565, 'beta_dpo/beta': 0.07317624241113663, 'beta_dpo/loss_margin_mean': 73.65949249267578, 'beta_dpo/beta_margin_mean': 5.437658786773682, 'beta_dpo/beta_margin_std': 11.020866394042969, 'beta_dpo/beta_margin_grad_mean': 
-0.3578983247280121, 'beta_dpo/beta_margin_grad_std': 0.3003653585910797, 'epoch': 0.42} + 42%|████████████████████████████████▊ | 286/681 [18:34<17:15, 2.62s/it] 42%|████████████████████████████████▊ | 287/681 [18:36<16:29, 2.51s/it] {'loss': 1.3334, 'grad_norm': 4.072757720947266, 'learning_rate': 3.6028433558269275e-07, 'beta_dpo/gap_mean': 72.43344116210938, 'beta_dpo/gap_std': 90.36245727539062, 'beta_dpo/beta_used_raw': -0.9690273404121399, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.46784478425979614, 'logits/rejected': -0.44443923234939575, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 61.16653823852539, 'beta_dpo/beta_margin_mean': 0.061166539788246155, 'beta_dpo/beta_margin_std': 0.08909157663583755, 'beta_dpo/beta_margin_grad_mean': -0.48475971817970276, 'beta_dpo/beta_margin_grad_std': 0.022152835503220558, 'epoch': 0.42} + 42%|████████████████████████████████▊ | 287/681 [18:36<16:29, 2.51s/it] 42%|████████████████████████████████▉ | 288/681 [18:39<17:19, 2.64s/it] {'loss': 1.2203, 'grad_norm': 2659.9658203125, 'learning_rate': 3.5913117612644327e-07, 'beta_dpo/gap_mean': 74.12348937988281, 'beta_dpo/gap_std': 91.28290557861328, 'beta_dpo/beta_used_raw': 0.6619566082954407, 'beta_dpo/beta_used': 0.7897164821624756, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.44504302740097046, 'logits/rejected': -0.4315459430217743, 'beta_dpo/beta': 0.7897164821624756, 'beta_dpo/loss_margin_mean': 79.14554595947266, 'beta_dpo/beta_margin_mean': 85.403076171875, 'beta_dpo/beta_margin_std': 137.9335479736328, 'beta_dpo/beta_margin_grad_mean': -0.3040521442890167, 'beta_dpo/beta_margin_grad_std': 0.2914998233318329, 'epoch': 0.42} + 42%|████████████████████████████████▉ | 288/681 [18:39<17:19, 2.64s/it] 42%|█████████████████████████████████ | 289/681 [18:41<16:55, 2.59s/it] {'loss': 2.5223, 'grad_norm': 1766.22216796875, 'learning_rate': 3.5797514096221024e-07, 'beta_dpo/gap_mean': 
74.27970886230469, 'beta_dpo/gap_std': 92.71040344238281, 'beta_dpo/beta_used_raw': -0.10066229104995728, 'beta_dpo/beta_used': 0.629094123840332, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.35877037048339844, 'logits/rejected': -0.3598015010356903, 'beta_dpo/beta': 0.629094123840332, 'beta_dpo/loss_margin_mean': 88.7456283569336, 'beta_dpo/beta_margin_mean': 68.1593246459961, 'beta_dpo/beta_margin_std': 111.16494750976562, 'beta_dpo/beta_margin_grad_mean': -0.3004843592643738, 'beta_dpo/beta_margin_grad_std': 0.28447577357292175, 'epoch': 0.42} + 42%|█████████████████████████████████ | 289/681 [18:41<16:55, 2.59s/it] 43%|█████████████████████████████████▏ | 290/681 [18:44<16:27, 2.53s/it] {'loss': 3.118, 'grad_norm': 1004.2230224609375, 'learning_rate': 3.568162605525952e-07, 'beta_dpo/gap_mean': 80.18174743652344, 'beta_dpo/gap_std': 98.11917877197266, 'beta_dpo/beta_used_raw': -0.1657930314540863, 'beta_dpo/beta_used': 0.4477105140686035, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3618127703666687, 'logits/rejected': -0.38121217489242554, 'beta_dpo/beta': 0.4477105140686035, 'beta_dpo/loss_margin_mean': 104.16016387939453, 'beta_dpo/beta_margin_mean': 46.98125076293945, 'beta_dpo/beta_margin_std': 88.1680908203125, 'beta_dpo/beta_margin_grad_mean': -0.3150025010108948, 'beta_dpo/beta_margin_grad_std': 0.30229073762893677, 'epoch': 0.43} + 43%|█████████████████████████████████▏ | 290/681 [18:44<16:27, 2.53s/it] 43%|█████████████████████████████████▎ | 291/681 [18:46<16:33, 2.55s/it] {'loss': 2.0799, 'grad_norm': 952.44775390625, 'learning_rate': 3.5565456543517485e-07, 'beta_dpo/gap_mean': 79.38957214355469, 'beta_dpo/gap_std': 99.54486083984375, 'beta_dpo/beta_used_raw': 0.3404870629310608, 'beta_dpo/beta_used': 0.3404870629310608, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.42019423842430115, 'logits/rejected': -0.40653547644615173, 'beta_dpo/beta': 0.3404870629310608, 'beta_dpo/loss_margin_mean': 72.23591613769531, 
'beta_dpo/beta_margin_mean': 25.296192169189453, 'beta_dpo/beta_margin_std': 31.709936141967773, 'beta_dpo/beta_margin_grad_mean': -0.2316586971282959, 'beta_dpo/beta_margin_grad_std': 0.40322452783584595, 'epoch': 0.43} + 43%|█████████████████████████████████▎ | 291/681 [18:46<16:33, 2.55s/it] 43%|█████████████████████████████████▍ | 292/681 [18:49<16:11, 2.50s/it] {'loss': 1.6587, 'grad_norm': 439.33978271484375, 'learning_rate': 3.5449008622169583e-07, 'beta_dpo/gap_mean': 79.20477294921875, 'beta_dpo/gap_std': 100.69721984863281, 'beta_dpo/beta_used_raw': -0.7626643776893616, 'beta_dpo/beta_used': 0.12015949934720993, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3626874089241028, 'logits/rejected': -0.3548169732093811, 'beta_dpo/beta': 0.12015949934720993, 'beta_dpo/loss_margin_mean': 78.73019409179688, 'beta_dpo/beta_margin_mean': 10.655224800109863, 'beta_dpo/beta_margin_std': 21.715547561645508, 'beta_dpo/beta_margin_grad_mean': -0.33290329575538635, 'beta_dpo/beta_margin_grad_std': 0.30069440603256226, 'epoch': 0.43} + 43%|█████████████████████████████████▍ | 292/681 [18:49<16:11, 2.50s/it] 43%|█████████████████████████████████▌ | 293/681 [18:51<16:28, 2.55s/it] {'loss': 1.3249, 'grad_norm': 4.400468349456787, 'learning_rate': 3.5332285359726846e-07, 'beta_dpo/gap_mean': 77.53086853027344, 'beta_dpo/gap_std': 101.82347106933594, 'beta_dpo/beta_used_raw': -0.7132205963134766, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.41298121213912964, 'logits/rejected': -0.40352344512939453, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 67.81421661376953, 'beta_dpo/beta_margin_mean': 0.06781422346830368, 'beta_dpo/beta_margin_std': 0.10339030623435974, 'beta_dpo/beta_margin_grad_mean': -0.4831177890300751, 'beta_dpo/beta_margin_grad_std': 0.025702647864818573, 'epoch': 0.43} + 43%|█████████████████████████████████▌ | 293/681 [18:51<16:28, 2.55s/it] 
43%|█████████████████████████████████▋ | 294/681 [18:54<16:11, 2.51s/it] {'loss': 1.3324, 'grad_norm': 4.342075347900391, 'learning_rate': 3.5215289831955786e-07, 'beta_dpo/gap_mean': 78.0030517578125, 'beta_dpo/gap_std': 102.60092163085938, 'beta_dpo/beta_used_raw': -1.2027143239974976, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.380662739276886, 'logits/rejected': -0.3861265182495117, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 76.23664855957031, 'beta_dpo/beta_margin_mean': 0.07623665034770966, 'beta_dpo/beta_margin_std': 0.10481663793325424, 'beta_dpo/beta_margin_grad_mean': -0.48102009296417236, 'beta_dpo/beta_margin_grad_std': 0.026048097759485245, 'epoch': 0.43} + 43%|█████████████████████████████████▋ | 294/681 [18:54<16:11, 2.51s/it] 43%|█████████████████████████████████▊ | 295/681 [18:56<16:01, 2.49s/it] {'loss': 8.3528, 'grad_norm': 1862.5281982421875, 'learning_rate': 3.509802512179737e-07, 'beta_dpo/gap_mean': 76.71334075927734, 'beta_dpo/gap_std': 102.96287536621094, 'beta_dpo/beta_used_raw': 0.05702996253967285, 'beta_dpo/beta_used': 0.2947583496570587, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.37672334909439087, 'logits/rejected': -0.3786112368106842, 'beta_dpo/beta': 0.2947583496570587, 'beta_dpo/loss_margin_mean': 78.26499938964844, 'beta_dpo/beta_margin_mean': 19.130741119384766, 'beta_dpo/beta_margin_std': 50.656394958496094, 'beta_dpo/beta_margin_grad_mean': -0.40385448932647705, 'beta_dpo/beta_margin_grad_std': 0.32800954580307007, 'epoch': 0.43} + 43%|█████████████████████████████████▊ | 295/681 [18:56<16:01, 2.49s/it] 43%|█████████████████████████████████▉ | 296/681 [18:59<15:54, 2.48s/it] {'loss': 1.3325, 'grad_norm': 4.538437366485596, 'learning_rate': 3.498049431928577e-07, 'beta_dpo/gap_mean': 75.4265365600586, 'beta_dpo/gap_std': 102.43699645996094, 'beta_dpo/beta_used_raw': -1.0527881383895874, 'beta_dpo/beta_used': 0.0010000000474974513, 
'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.41676008701324463, 'logits/rejected': -0.3972277343273163, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 58.992069244384766, 'beta_dpo/beta_margin_mean': 0.05899207293987274, 'beta_dpo/beta_margin_std': 0.10110720992088318, 'beta_dpo/beta_margin_grad_mean': -0.48530909419059753, 'beta_dpo/beta_margin_grad_std': 0.02513442374765873, 'epoch': 0.43} + 43%|█████████████████████████████████▉ | 296/681 [18:59<15:54, 2.48s/it] 44%|██████████████████████████████████ | 297/681 [19:01<16:03, 2.51s/it] {'loss': 1.3264, 'grad_norm': 5.51907205581665, 'learning_rate': 3.486270052146694e-07, 'beta_dpo/gap_mean': 73.90296936035156, 'beta_dpo/gap_std': 100.1375961303711, 'beta_dpo/beta_used_raw': -0.6135008335113525, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.35295820236206055, 'logits/rejected': -0.3571382761001587, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 69.82958984375, 'beta_dpo/beta_margin_mean': 0.06982959061861038, 'beta_dpo/beta_margin_std': 0.0898992121219635, 'beta_dpo/beta_margin_grad_mean': -0.48259493708610535, 'beta_dpo/beta_margin_grad_std': 0.022371800616383553, 'epoch': 0.44} + 44%|██████████████████████████████████ | 297/681 [19:01<16:03, 2.51s/it] 44%|██████████████████████████████████▏ | 298/681 [19:04<16:45, 2.63s/it] {'loss': 1.6437, 'grad_norm': 738.2294311523438, 'learning_rate': 3.474464683231698e-07, 'beta_dpo/gap_mean': 74.29185485839844, 'beta_dpo/gap_std': 102.38994598388672, 'beta_dpo/beta_used_raw': 0.27431046962738037, 'beta_dpo/beta_used': 0.27431046962738037, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4162539839744568, 'logits/rejected': -0.4425868093967438, 'beta_dpo/beta': 0.27431046962738037, 'beta_dpo/loss_margin_mean': 86.00110626220703, 'beta_dpo/beta_margin_mean': 20.479074478149414, 'beta_dpo/beta_margin_std': 45.33749008178711, 'beta_dpo/beta_margin_grad_mean': 
-0.263118177652359, 'beta_dpo/beta_margin_grad_std': 0.33494073152542114, 'epoch': 0.44} + 44%|██████████████████████████████████▏ | 298/681 [19:04<16:45, 2.63s/it] 44%|██████████████████████████████████▏ | 299/681 [19:07<16:35, 2.61s/it] {'loss': 1.3176, 'grad_norm': 5.145935535430908, 'learning_rate': 3.462633636266041e-07, 'beta_dpo/gap_mean': 74.65848541259766, 'beta_dpo/gap_std': 103.56509399414062, 'beta_dpo/beta_used_raw': -0.10428804159164429, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.39493227005004883, 'logits/rejected': -0.40073153376579285, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 77.17499542236328, 'beta_dpo/beta_margin_mean': 0.07717499881982803, 'beta_dpo/beta_margin_std': 0.10790830105543137, 'beta_dpo/beta_margin_grad_mean': -0.4808002710342407, 'beta_dpo/beta_margin_grad_std': 0.02673073299229145, 'epoch': 0.44} + 44%|██████████████████████████████████▏ | 299/681 [19:07<16:35, 2.61s/it] 44%|██████████████████████████████████▎ | 300/681 [19:09<16:41, 2.63s/it] {'loss': 2.8698, 'grad_norm': 932.0242919921875, 'learning_rate': 3.4507772230088147e-07, 'beta_dpo/gap_mean': 77.76226806640625, 'beta_dpo/gap_std': 109.28889465332031, 'beta_dpo/beta_used_raw': -0.37631434202194214, 'beta_dpo/beta_used': 0.09059438109397888, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3533056974411011, 'logits/rejected': -0.35223710536956787, 'beta_dpo/beta': 0.09059438109397888, 'beta_dpo/loss_margin_mean': 91.35057830810547, 'beta_dpo/beta_margin_mean': 9.376904487609863, 'beta_dpo/beta_margin_std': 18.19443702697754, 'beta_dpo/beta_margin_grad_mean': -0.3587842583656311, 'beta_dpo/beta_margin_grad_std': 0.31596502661705017, 'epoch': 0.44} + 44%|██████████████████████████████████▎ | 300/681 [19:09<16:41, 2.63s/it][INFO|trainer.py:4307] 2026-04-17 23:42:41,926 >> +***** Running Evaluation ***** +[INFO|trainer.py:4309] 2026-04-17 23:42:41,926 >> Num examples = 2339 
+[INFO|trainer.py:4312] 2026-04-17 23:42:41,926 >> Batch size = 8 + + 0%| | 0/73 [00:00> +***** Running Evaluation ***** +[INFO|trainer.py:4309] 2026-04-17 23:47:39,407 >> Num examples = 2339 +[INFO|trainer.py:4312] 2026-04-17 23:47:39,407 >> Batch size = 8 + + 0%| | 0/73 [00:00> Saving model checkpoint to /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-400 +[INFO|configuration_utils.py:419] 2026-04-17 23:48:34,554 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-400/config.json +[INFO|configuration_utils.py:911] 2026-04-17 23:48:34,560 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-400/generation_config.json +[INFO|modeling_utils.py:3580] 2026-04-17 23:49:23,208 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 6 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-400/model.safetensors.index.json. 
+[INFO|tokenization_utils_base.py:2510] 2026-04-17 23:49:23,279 >> tokenizer config file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-400/tokenizer_config.json +[INFO|tokenization_utils_base.py:2519] 2026-04-17 23:49:23,338 >> Special tokens file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-400/special_tokens_map.json + 59%|████████████████████████████████████████████▏ | 401/681 [29:39<7:53:02, 101.36s/it] {'loss': 1.303, 'grad_norm': 6.7877278327941895, 'learning_rate': 2.1800473436235136e-07, 'beta_dpo/gap_mean': 113.66698455810547, 'beta_dpo/gap_std': 148.7388916015625, 'beta_dpo/beta_used_raw': -1.313326358795166, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.35054802894592285, 'logits/rejected': -0.3441402316093445, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 94.4480209350586, 'beta_dpo/beta_margin_mean': 0.094448022544384, 'beta_dpo/beta_margin_std': 0.17364467680454254, 'beta_dpo/beta_margin_grad_mean': -0.4766118824481964, 'beta_dpo/beta_margin_grad_std': 0.042866192758083344, 'epoch': 0.59} + 59%|████████████████████████████████████████████▏ | 401/681 [29:39<7:53:02, 101.36s/it] 59%|████████████████████████████████████████████▊ | 402/681 [29:41<5:33:04, 71.63s/it] {'loss': 5.1878, 'grad_norm': 2755.770263671875, 'learning_rate': 2.1673238449588665e-07, 'beta_dpo/gap_mean': 119.07506561279297, 'beta_dpo/gap_std': 149.63043212890625, 'beta_dpo/beta_used_raw': 0.6786636710166931, 'beta_dpo/beta_used': 0.6786636710166931, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3783169388771057, 'logits/rejected': -0.3567967414855957, 'beta_dpo/beta': 0.6786636710166931, 'beta_dpo/loss_margin_mean': 158.593994140625, 'beta_dpo/beta_margin_mean': 108.07469177246094, 'beta_dpo/beta_margin_std': 99.87371826171875, 
'beta_dpo/beta_margin_grad_mean': -0.10659972578287125, 'beta_dpo/beta_margin_grad_std': 0.3003370761871338, 'epoch': 0.59} + 59%|████████████████████████████████████████████▊ | 402/681 [29:41<5:33:04, 71.63s/it] 59%|████████████████████████████████████████████▉ | 403/681 [29:44<3:55:41, 50.87s/it] {'loss': 1.2773, 'grad_norm': 6.965160369873047, 'learning_rate': 2.154609112620295e-07, 'beta_dpo/gap_mean': 120.31027221679688, 'beta_dpo/gap_std': 146.5064697265625, 'beta_dpo/beta_used_raw': -0.16943010687828064, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.38709545135498047, 'logits/rejected': -0.3838120698928833, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 119.02156829833984, 'beta_dpo/beta_margin_mean': 0.11902157217264175, 'beta_dpo/beta_margin_std': 0.13397535681724548, 'beta_dpo/beta_margin_grad_mean': -0.4704398214817047, 'beta_dpo/beta_margin_grad_std': 0.03317659720778465, 'epoch': 0.59} + 59%|████████████████████████████████████████████▉ | 403/681 [29:44<3:55:41, 50.87s/it] 59%|█████████████████████████████████████████████ | 404/681 [29:46<2:47:48, 36.35s/it] {'loss': 1.2817, 'grad_norm': 7.6397705078125, 'learning_rate': 2.1419034816528218e-07, 'beta_dpo/gap_mean': 120.10807800292969, 'beta_dpo/gap_std': 150.40188598632812, 'beta_dpo/beta_used_raw': -0.3914072513580322, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.32211601734161377, 'logits/rejected': -0.3063517212867737, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 116.9678955078125, 'beta_dpo/beta_margin_mean': 0.11696790158748627, 'beta_dpo/beta_margin_std': 0.16781915724277496, 'beta_dpo/beta_margin_grad_mean': -0.4710405468940735, 'beta_dpo/beta_margin_grad_std': 0.04145493730902672, 'epoch': 0.59} + 59%|█████████████████████████████████████████████ | 404/681 [29:46<2:47:48, 36.35s/it] 59%|█████████████████████████████████████████████▏ | 
405/681 [29:49<2:00:16, 26.15s/it] {'loss': 1.3212, 'grad_norm': 7.145941257476807, 'learning_rate': 2.129207286861638e-07, 'beta_dpo/gap_mean': 115.75703430175781, 'beta_dpo/gap_std': 156.31784057617188, 'beta_dpo/beta_used_raw': -2.5856375694274902, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3783246874809265, 'logits/rejected': -0.35847070813179016, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 101.84107971191406, 'beta_dpo/beta_margin_mean': 0.10184108465909958, 'beta_dpo/beta_margin_std': 0.19063597917556763, 'beta_dpo/beta_margin_grad_mean': -0.4748651087284088, 'beta_dpo/beta_margin_grad_std': 0.0469396598637104, 'epoch': 0.59} + 59%|█████████████████████████████████████████████▏ | 405/681 [29:49<2:00:16, 26.15s/it] 60%|█████████████████████████████████████████████▎ | 406/681 [29:51<1:27:18, 19.05s/it] {'loss': 1.2817, 'grad_norm': 7.6379899978637695, 'learning_rate': 2.1165208628032861e-07, 'beta_dpo/gap_mean': 117.24635314941406, 'beta_dpo/gap_std': 158.91787719726562, 'beta_dpo/beta_used_raw': -0.2788747549057007, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3630604147911072, 'logits/rejected': -0.35475897789001465, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 132.46351623535156, 'beta_dpo/beta_margin_mean': 0.13246352970600128, 'beta_dpo/beta_margin_std': 0.16599087417125702, 'beta_dpo/beta_margin_grad_mean': -0.46724018454551697, 'beta_dpo/beta_margin_grad_std': 0.04072672128677368, 'epoch': 0.6} + 60%|█████████████████████████████████████████████▎ | 406/681 [29:51<1:27:18, 19.05s/it] 60%|█████████████████████████████████████████████▍ | 407/681 [29:54<1:04:22, 14.10s/it] {'loss': 10.9265, 'grad_norm': 4827.95361328125, 'learning_rate': 2.1038445437768375e-07, 'beta_dpo/gap_mean': 115.60337829589844, 'beta_dpo/gap_std': 158.64419555664062, 'beta_dpo/beta_used_raw': -0.8416473865509033, 
'beta_dpo/beta_used': 0.9161151051521301, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.377028226852417, 'logits/rejected': -0.3416253924369812, 'beta_dpo/beta': 0.9161151051521301, 'beta_dpo/loss_margin_mean': 100.3062515258789, 'beta_dpo/beta_margin_mean': 123.79098510742188, 'beta_dpo/beta_margin_std': 235.88023376464844, 'beta_dpo/beta_margin_grad_mean': -0.3084886372089386, 'beta_dpo/beta_margin_grad_std': 0.2934010624885559, 'epoch': 0.6} + 60%|█████████████████████████████████████████████▍ | 407/681 [29:54<1:04:22, 14.10s/it] 60%|██████████████████████████████████████████████▋ | 408/681 [29:57<48:52, 10.74s/it] {'loss': 1.3143, 'grad_norm': 7.195991516113281, 'learning_rate': 2.0911786638150872e-07, 'beta_dpo/gap_mean': 111.73031616210938, 'beta_dpo/gap_std': 154.44381713867188, 'beta_dpo/beta_used_raw': -2.016765594482422, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.40478670597076416, 'logits/rejected': -0.37068575620651245, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 79.95922088623047, 'beta_dpo/beta_margin_mean': 0.07995922118425369, 'beta_dpo/beta_margin_std': 0.13016226887702942, 'beta_dpo/beta_margin_grad_mean': -0.4801346957683563, 'beta_dpo/beta_margin_grad_std': 0.03229653090238571, 'epoch': 0.6} + 60%|██████████████████████████████████████████████▋ | 408/681 [29:57<48:52, 10.74s/it] 60%|██████████████████████████████████████████████▊ | 409/681 [29:59<37:42, 8.32s/it] {'loss': 1.3136, 'grad_norm': 7.589075565338135, 'learning_rate': 2.0785235566757517e-07, 'beta_dpo/gap_mean': 109.16544342041016, 'beta_dpo/gap_std': 155.03025817871094, 'beta_dpo/beta_used_raw': -1.8204164505004883, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.34394949674606323, 'logits/rejected': -0.3319231867790222, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 111.70116424560547, 'beta_dpo/beta_margin_mean': 
0.11170117557048798, 'beta_dpo/beta_margin_std': 0.15871573984622955, 'beta_dpo/beta_margin_grad_mean': -0.4723385274410248, 'beta_dpo/beta_margin_grad_std': 0.03917807340621948, 'epoch': 0.6} + 60%|██████████████████████████████████████████████▊ | 409/681 [29:59<37:42, 8.32s/it] 60%|██████████████████████████████████████████████▉ | 410/681 [30:02<29:45, 6.59s/it] {'loss': 1.3845, 'grad_norm': 1850.3192138671875, 'learning_rate': 2.065879555832674e-07, 'beta_dpo/gap_mean': 112.37252044677734, 'beta_dpo/gap_std': 154.6945343017578, 'beta_dpo/beta_used_raw': -0.16834038496017456, 'beta_dpo/beta_used': 0.5207417011260986, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3558204472064972, 'logits/rejected': -0.34983137249946594, 'beta_dpo/beta': 0.5207417011260986, 'beta_dpo/loss_margin_mean': 121.03668975830078, 'beta_dpo/beta_margin_mean': 55.4542121887207, 'beta_dpo/beta_margin_std': 125.90103912353516, 'beta_dpo/beta_margin_grad_mean': -0.3113498389720917, 'beta_dpo/beta_margin_grad_std': 0.3010904788970947, 'epoch': 0.6} + 60%|██████████████████████████████████████████████▉ | 410/681 [30:02<29:45, 6.59s/it] 60%|███████████████████████████████████████████████ | 411/681 [30:04<23:44, 5.28s/it] {'loss': 1.3083, 'grad_norm': 10.161256790161133, 'learning_rate': 2.0532469944670343e-07, 'beta_dpo/gap_mean': 113.07457733154297, 'beta_dpo/gap_std': 160.96011352539062, 'beta_dpo/beta_used_raw': -1.678023338317871, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.293745219707489, 'logits/rejected': -0.2922123670578003, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 106.50434112548828, 'beta_dpo/beta_margin_mean': 0.10650434345006943, 'beta_dpo/beta_margin_std': 0.19019237160682678, 'beta_dpo/beta_margin_grad_mean': -0.4737773537635803, 'beta_dpo/beta_margin_grad_std': 0.046403612941503525, 'epoch': 0.6} + 60%|███████████████████████████████████████████████ | 411/681 [30:04<23:44, 5.28s/it] 
60%|███████████████████████████████████████████████▏ | 412/681 [30:06<19:40, 4.39s/it] {'loss': 1.3101, 'grad_norm': 7.504628658294678, 'learning_rate': 2.0406262054585738e-07, 'beta_dpo/gap_mean': 111.26490783691406, 'beta_dpo/gap_std': 163.46185302734375, 'beta_dpo/beta_used_raw': -1.7123887538909912, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3158118724822998, 'logits/rejected': -0.32687675952911377, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 115.88773345947266, 'beta_dpo/beta_margin_mean': 0.11588773876428604, 'beta_dpo/beta_margin_std': 0.17700567841529846, 'beta_dpo/beta_margin_grad_mean': -0.4713696539402008, 'beta_dpo/beta_margin_grad_std': 0.04353627562522888, 'epoch': 0.6} + 60%|███████████████████████████████████████████████▏ | 412/681 [30:06<19:40, 4.39s/it] 61%|███████████████████████████████████████████████▎ | 413/681 [30:09<17:20, 3.88s/it] {'loss': 1.2993, 'grad_norm': 10.111505508422852, 'learning_rate': 2.0280175213768205e-07, 'beta_dpo/gap_mean': 110.88575744628906, 'beta_dpo/gap_std': 163.36767578125, 'beta_dpo/beta_used_raw': -0.9482701420783997, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.33399781584739685, 'logits/rejected': -0.3200353980064392, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 115.44949340820312, 'beta_dpo/beta_margin_mean': 0.11544950306415558, 'beta_dpo/beta_margin_std': 0.16624687612056732, 'beta_dpo/beta_margin_grad_mean': -0.4713848829269409, 'beta_dpo/beta_margin_grad_std': 0.04106110334396362, 'epoch': 0.61} + 61%|███████████████████████████████████████████████▎ | 413/681 [30:09<17:20, 3.88s/it] 61%|███████████████████████████████████████████████▍ | 414/681 [30:11<15:27, 3.47s/it] {'loss': 18.094, 'grad_norm': 10158.8984375, 'learning_rate': 2.0154212744723247e-07, 'beta_dpo/gap_mean': 114.5771484375, 'beta_dpo/gap_std': 164.32669067382812, 
'beta_dpo/beta_used_raw': 0.5759499669075012, 'beta_dpo/beta_used': 1.1125692129135132, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.29129675030708313, 'logits/rejected': -0.28304004669189453, 'beta_dpo/beta': 1.1125692129135132, 'beta_dpo/loss_margin_mean': 140.61228942871094, 'beta_dpo/beta_margin_mean': 208.84002685546875, 'beta_dpo/beta_margin_std': 342.9871826171875, 'beta_dpo/beta_margin_grad_mean': -0.3008454442024231, 'beta_dpo/beta_margin_grad_std': 0.29388001561164856, 'epoch': 0.61} + 61%|███████████████████████████████████████████████▍ | 414/681 [30:12<15:27, 3.47s/it] 61%|███████████████████████████████████████████████▌ | 415/681 [30:14<14:27, 3.26s/it] {'loss': 1.3127, 'grad_norm': 7.246009826660156, 'learning_rate': 2.002837796667909e-07, 'beta_dpo/gap_mean': 116.05256652832031, 'beta_dpo/gap_std': 165.4222412109375, 'beta_dpo/beta_used_raw': -2.1535353660583496, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3636121153831482, 'logits/rejected': -0.35459795594215393, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 95.04881286621094, 'beta_dpo/beta_margin_mean': 0.09504882246255875, 'beta_dpo/beta_margin_std': 0.1632952094078064, 'beta_dpo/beta_margin_grad_mean': -0.47645503282546997, 'beta_dpo/beta_margin_grad_std': 0.04023678973317146, 'epoch': 0.61} + 61%|███████████████████████████████████████████████▌ | 415/681 [30:14<14:27, 3.26s/it] 61%|███████████████████████████████████████████████▋ | 416/681 [30:17<13:25, 3.04s/it] {'loss': 7.9484, 'grad_norm': 9633.19921875, 'learning_rate': 1.990267419549914e-07, 'beta_dpo/gap_mean': 118.31330108642578, 'beta_dpo/gap_std': 161.25177001953125, 'beta_dpo/beta_used_raw': 0.8338208198547363, 'beta_dpo/beta_used': 0.8338208198547363, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3395143449306488, 'logits/rejected': -0.32628241181373596, 'beta_dpo/beta': 0.8338208198547363, 'beta_dpo/loss_margin_mean': 
147.69627380371094, 'beta_dpo/beta_margin_mean': 129.33334350585938, 'beta_dpo/beta_margin_std': 189.8321990966797, 'beta_dpo/beta_margin_grad_mean': -0.17009158432483673, 'beta_dpo/beta_margin_grad_std': 0.35257911682128906, 'epoch': 0.61} + 61%|███████████████████████████████████████████████▋ | 416/681 [30:17<13:25, 3.04s/it] 61%|███████████████████████████████████████████████▊ | 417/681 [30:19<12:26, 2.83s/it] {'loss': 6.8179, 'grad_norm': 3923.328857421875, 'learning_rate': 1.9777104743594686e-07, 'beta_dpo/gap_mean': 119.18829345703125, 'beta_dpo/gap_std': 156.21324157714844, 'beta_dpo/beta_used_raw': 0.1691010594367981, 'beta_dpo/beta_used': 0.3650580644607544, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.27080368995666504, 'logits/rejected': -0.22706595063209534, 'beta_dpo/beta': 0.3650580644607544, 'beta_dpo/loss_margin_mean': 115.47030639648438, 'beta_dpo/beta_margin_mean': 45.415931701660156, 'beta_dpo/beta_margin_std': 81.82047271728516, 'beta_dpo/beta_margin_grad_mean': -0.3305802643299103, 'beta_dpo/beta_margin_grad_std': 0.3116385340690613, 'epoch': 0.61} + 61%|███████████████████████████████████████████████▊ | 417/681 [30:19<12:26, 2.83s/it] 61%|███████████████████████████████████████████████▉ | 418/681 [30:22<12:19, 2.81s/it] {'loss': 2.2555, 'grad_norm': 956.6565551757812, 'learning_rate': 1.965167291983757e-07, 'beta_dpo/gap_mean': 119.43331909179688, 'beta_dpo/gap_std': 159.44818115234375, 'beta_dpo/beta_used_raw': -0.4067423641681671, 'beta_dpo/beta_used': 0.1472662091255188, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.38150128722190857, 'logits/rejected': -0.33936968445777893, 'beta_dpo/beta': 0.1472662091255188, 'beta_dpo/loss_margin_mean': 129.963623046875, 'beta_dpo/beta_margin_mean': 24.18821907043457, 'beta_dpo/beta_margin_std': 42.399009704589844, 'beta_dpo/beta_margin_grad_mean': -0.31466129422187805, 'beta_dpo/beta_margin_grad_std': 0.29242756962776184, 'epoch': 0.61} + 
61%|███████████████████████████████████████████████▉ | 418/681 [30:22<12:19, 2.81s/it] 62%|███████████████████████████████████████████████▉ | 419/681 [30:25<12:03, 2.76s/it] {'loss': 1.3448, 'grad_norm': 406.3208312988281, 'learning_rate': 1.9526382029472988e-07, 'beta_dpo/gap_mean': 123.40135192871094, 'beta_dpo/gap_std': 159.61978149414062, 'beta_dpo/beta_used_raw': -0.6058524250984192, 'beta_dpo/beta_used': 0.04090619087219238, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3685181736946106, 'logits/rejected': -0.35807985067367554, 'beta_dpo/beta': 0.04090619087219238, 'beta_dpo/loss_margin_mean': 131.7987060546875, 'beta_dpo/beta_margin_mean': 5.522484302520752, 'beta_dpo/beta_margin_std': 10.368701934814453, 'beta_dpo/beta_margin_grad_mean': -0.3300994336605072, 'beta_dpo/beta_margin_grad_std': 0.2953225076198578, 'epoch': 0.62} + 62%|███████████████████████████████████████████████▉ | 419/681 [30:25<12:03, 2.76s/it] 62%|████████████████████████████████████████████████ | 420/681 [30:27<11:42, 2.69s/it] {'loss': 1.2911, 'grad_norm': 8.381654739379883, 'learning_rate': 1.9401235374032425e-07, 'beta_dpo/gap_mean': 117.49530029296875, 'beta_dpo/gap_std': 161.63946533203125, 'beta_dpo/beta_used_raw': -0.7748525738716125, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.34530162811279297, 'logits/rejected': -0.2882389426231384, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 84.27227783203125, 'beta_dpo/beta_margin_mean': 0.08427228033542633, 'beta_dpo/beta_margin_std': 0.17619559168815613, 'beta_dpo/beta_margin_grad_mean': -0.479174941778183, 'beta_dpo/beta_margin_grad_std': 0.043484870344400406, 'epoch': 0.62} + 62%|████████████████████████████████████████████████ | 420/681 [30:27<11:42, 2.69s/it] 62%|████████████████████████████████████████████████▏ | 421/681 [30:30<11:43, 2.70s/it] {'loss': 1.2975, 'grad_norm': 6.698497772216797, 'learning_rate': 1.9276236251246653e-07, 
'beta_dpo/gap_mean': 111.70301818847656, 'beta_dpo/gap_std': 160.45973205566406, 'beta_dpo/beta_used_raw': -0.9457611441612244, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3635827600955963, 'logits/rejected': -0.3487810492515564, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 89.76795959472656, 'beta_dpo/beta_margin_mean': 0.08976796269416809, 'beta_dpo/beta_margin_std': 0.1381371021270752, 'beta_dpo/beta_margin_grad_mean': -0.4776723086833954, 'beta_dpo/beta_margin_grad_std': 0.03428473323583603, 'epoch': 0.62} + 62%|████████████████████████████████████████████████▏ | 421/681 [30:30<11:43, 2.70s/it] 62%|████████████████████████████████████████████████▎ | 422/681 [30:33<11:48, 2.74s/it] {'loss': 1.3, 'grad_norm': 7.295708179473877, 'learning_rate': 1.9151387954958792e-07, 'beta_dpo/gap_mean': 108.79386901855469, 'beta_dpo/gap_std': 155.77139282226562, 'beta_dpo/beta_used_raw': -0.9157909154891968, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.323574960231781, 'logits/rejected': -0.3058650493621826, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 94.8263168334961, 'beta_dpo/beta_margin_mean': 0.09482631832361221, 'beta_dpo/beta_margin_std': 0.14304772019386292, 'beta_dpo/beta_margin_grad_mean': -0.476465106010437, 'beta_dpo/beta_margin_grad_std': 0.03539099171757698, 'epoch': 0.62} + 62%|████████████████████████████████████████████████▎ | 422/681 [30:33<11:48, 2.74s/it] 62%|████████████████████████████████████████████████▍ | 423/681 [30:35<11:17, 2.63s/it] {'loss': 5.8705, 'grad_norm': 2952.7294921875, 'learning_rate': 1.902669377503756e-07, 'beta_dpo/gap_mean': 111.04264831542969, 'beta_dpo/gap_std': 153.08340454101562, 'beta_dpo/beta_used_raw': 0.5498670339584351, 'beta_dpo/beta_used': 0.5498670339584351, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.29522740840911865, 'logits/rejected': 
-0.2932446002960205, 'beta_dpo/beta': 0.5498670339584351, 'beta_dpo/loss_margin_mean': 124.84105682373047, 'beta_dpo/beta_margin_mean': 67.63153839111328, 'beta_dpo/beta_margin_std': 82.61705017089844, 'beta_dpo/beta_margin_grad_mean': -0.19201448559761047, 'beta_dpo/beta_margin_grad_std': 0.38938337564468384, 'epoch': 0.62} + 62%|████████████████████████████████████████████████▍ | 423/681 [30:35<11:17, 2.63s/it] 62%|████████████████████████████████████████████████▌ | 424/681 [30:38<11:17, 2.64s/it] {'loss': 0.9681, 'grad_norm': 137.83319091796875, 'learning_rate': 1.890215699729057e-07, 'beta_dpo/gap_mean': 112.22328186035156, 'beta_dpo/gap_std': 152.5062255859375, 'beta_dpo/beta_used_raw': -1.4149752855300903, 'beta_dpo/beta_used': 0.027477234601974487, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3986721932888031, 'logits/rejected': -0.3727181553840637, 'beta_dpo/beta': 0.027477234601974487, 'beta_dpo/loss_margin_mean': 109.46617126464844, 'beta_dpo/beta_margin_mean': 3.6695759296417236, 'beta_dpo/beta_margin_std': 6.411843299865723, 'beta_dpo/beta_margin_grad_mean': -0.31576114892959595, 'beta_dpo/beta_margin_grad_std': 0.28133726119995117, 'epoch': 0.62} + 62%|████████████████████████████████████████████████▌ | 424/681 [30:38<11:17, 2.64s/it] 62%|████████████████████████████████████████████████▋ | 425/681 [30:40<11:05, 2.60s/it] {'loss': 6.1878, 'grad_norm': 6124.79150390625, 'learning_rate': 1.8777780903377732e-07, 'beta_dpo/gap_mean': 109.38998413085938, 'beta_dpo/gap_std': 150.577880859375, 'beta_dpo/beta_used_raw': 0.4376869797706604, 'beta_dpo/beta_used': 0.5835731625556946, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3280683159828186, 'logits/rejected': -0.32920628786087036, 'beta_dpo/beta': 0.5835731625556946, 'beta_dpo/loss_margin_mean': 106.90679168701172, 'beta_dpo/beta_margin_mean': 72.92134857177734, 'beta_dpo/beta_margin_std': 129.18519592285156, 'beta_dpo/beta_margin_grad_mean': -0.3327001929283142, 
'beta_dpo/beta_margin_grad_std': 0.312762588262558, 'epoch': 0.62} + 62%|████████████████████████████████████████████████▋ | 425/681 [30:40<11:05, 2.60s/it] 63%|████████████████████████████████████████████████▊ | 426/681 [30:43<11:09, 2.63s/it] {'loss': 8.4638, 'grad_norm': 5486.13525390625, 'learning_rate': 1.8653568770724803e-07, 'beta_dpo/gap_mean': 111.31645965576172, 'beta_dpo/gap_std': 149.850341796875, 'beta_dpo/beta_used_raw': -0.7809062600135803, 'beta_dpo/beta_used': 0.8895680904388428, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.37183499336242676, 'logits/rejected': -0.31186115741729736, 'beta_dpo/beta': 0.8895680904388428, 'beta_dpo/loss_margin_mean': 127.76013946533203, 'beta_dpo/beta_margin_mean': 142.06744384765625, 'beta_dpo/beta_margin_std': 253.59666442871094, 'beta_dpo/beta_margin_grad_mean': -0.30061760544776917, 'beta_dpo/beta_margin_grad_std': 0.29346781969070435, 'epoch': 0.63} + 63%|████████████████████████████████████████████████▊ | 426/681 [30:43<11:09, 2.63s/it] 63%|████████████████████████████████████████████████▉ | 427/681 [30:45<11:05, 2.62s/it] {'loss': 1.306, 'grad_norm': 6.825258731842041, 'learning_rate': 1.8529523872436977e-07, 'beta_dpo/gap_mean': 109.63316345214844, 'beta_dpo/gap_std': 148.7486572265625, 'beta_dpo/beta_used_raw': -1.4412474632263184, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3810279965400696, 'logits/rejected': -0.35081952810287476, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 81.17052459716797, 'beta_dpo/beta_margin_mean': 0.08117052912712097, 'beta_dpo/beta_margin_std': 0.13242076337337494, 'beta_dpo/beta_margin_grad_mean': -0.4798411726951599, 'beta_dpo/beta_margin_grad_std': 0.03274958208203316, 'epoch': 0.63} + 63%|████████████████████████████████████████████████▉ | 427/681 [30:45<11:05, 2.62s/it] 63%|█████████████████████████████████████████████████ | 428/681 [30:48<11:12, 2.66s/it] {'loss': 3.3058, 'grad_norm': 
3710.728271484375, 'learning_rate': 1.8405649477212697e-07, 'beta_dpo/gap_mean': 109.1749267578125, 'beta_dpo/gap_std': 149.90882873535156, 'beta_dpo/beta_used_raw': -0.37105491757392883, 'beta_dpo/beta_used': 0.41161054372787476, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.37873727083206177, 'logits/rejected': -0.37077072262763977, 'beta_dpo/beta': 0.41161054372787476, 'beta_dpo/loss_margin_mean': 120.1363296508789, 'beta_dpo/beta_margin_mean': 47.70144271850586, 'beta_dpo/beta_margin_std': 109.32994842529297, 'beta_dpo/beta_margin_grad_mean': -0.32943397760391235, 'beta_dpo/beta_margin_grad_std': 0.30981266498565674, 'epoch': 0.63} + 63%|█████████████████████████████████████████████████ | 428/681 [30:48<11:12, 2.66s/it] 63%|█████████████████████████████████████████████████▏ | 429/681 [30:51<11:14, 2.68s/it] {'loss': 7.8201, 'grad_norm': 3259.48974609375, 'learning_rate': 1.828194884925749e-07, 'beta_dpo/gap_mean': 107.56082916259766, 'beta_dpo/gap_std': 150.14230346679688, 'beta_dpo/beta_used_raw': 0.4344549775123596, 'beta_dpo/beta_used': 0.4344549775123596, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.38967394828796387, 'logits/rejected': -0.33787745237350464, 'beta_dpo/beta': 0.4344549775123596, 'beta_dpo/loss_margin_mean': 103.24775695800781, 'beta_dpo/beta_margin_mean': 48.72703552246094, 'beta_dpo/beta_margin_std': 64.88159942626953, 'beta_dpo/beta_margin_grad_mean': -0.2335137575864792, 'beta_dpo/beta_margin_grad_std': 0.4133719801902771, 'epoch': 0.63} + 63%|█████████████████████████████████████████████████▏ | 429/681 [30:51<11:14, 2.68s/it] 63%|█████████████████████████████████████████████████▎ | 430/681 [30:54<11:43, 2.80s/it] {'loss': 1.3051, 'grad_norm': 7.614285945892334, 'learning_rate': 1.8158425248197928e-07, 'beta_dpo/gap_mean': 109.13970947265625, 'beta_dpo/gap_std': 147.79107666015625, 'beta_dpo/beta_used_raw': -1.2862778902053833, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 
'logits/chosen': -0.4028991460800171, 'logits/rejected': -0.40245670080184937, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 112.42671203613281, 'beta_dpo/beta_margin_mean': 0.11242672055959702, 'beta_dpo/beta_margin_std': 0.14071322977542877, 'beta_dpo/beta_margin_grad_mean': -0.4721178114414215, 'beta_dpo/beta_margin_grad_std': 0.03465087339282036, 'epoch': 0.63} + 63%|█████████████████████████████████████████████████▎ | 430/681 [30:54<11:43, 2.80s/it] 63%|█████████████████████████████████████████████████▎ | 431/681 [30:57<11:33, 2.77s/it] {'loss': 1.2936, 'grad_norm': 6.900725841522217, 'learning_rate': 1.8035081928995788e-07, 'beta_dpo/gap_mean': 113.27009582519531, 'beta_dpo/gap_std': 150.56829833984375, 'beta_dpo/beta_used_raw': -0.7770711183547974, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.33334821462631226, 'logits/rejected': -0.32843929529190063, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 130.09228515625, 'beta_dpo/beta_margin_mean': 0.1300922930240631, 'beta_dpo/beta_margin_std': 0.15987038612365723, 'beta_dpo/beta_margin_grad_mean': -0.4677823781967163, 'beta_dpo/beta_margin_grad_std': 0.03935808688402176, 'epoch': 0.63} + 63%|█████████████████████████████████████████████████▎ | 431/681 [30:57<11:33, 2.77s/it] 63%|█████████████████████████████████████████████████▍ | 432/681 [30:59<11:29, 2.77s/it] {'loss': 2.6038, 'grad_norm': 871.7344970703125, 'learning_rate': 1.791192214186223e-07, 'beta_dpo/gap_mean': 113.14790344238281, 'beta_dpo/gap_std': 143.69342041015625, 'beta_dpo/beta_used_raw': -0.6099668145179749, 'beta_dpo/beta_used': 0.10785573720932007, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4066033363342285, 'logits/rejected': -0.37539827823638916, 'beta_dpo/beta': 0.10785573720932007, 'beta_dpo/loss_margin_mean': 124.32618713378906, 'beta_dpo/beta_margin_mean': 17.94474220275879, 'beta_dpo/beta_margin_std': 30.068361282348633, 
'beta_dpo/beta_margin_grad_mean': -0.31203174591064453, 'beta_dpo/beta_margin_grad_std': 0.2826971411705017, 'epoch': 0.63} + 63%|█████████████████████████████████████████████████▍ | 432/681 [30:59<11:29, 2.77s/it] 64%|█████████████████████████████████████████████████▌ | 433/681 [31:02<11:11, 2.71s/it] {'loss': 0.6547, 'grad_norm': 21.230777740478516, 'learning_rate': 1.7788949132172193e-07, 'beta_dpo/gap_mean': 112.42630767822266, 'beta_dpo/gap_std': 144.95359802246094, 'beta_dpo/beta_used_raw': -0.3640483319759369, 'beta_dpo/beta_used': 0.2765732407569885, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.358863890171051, 'logits/rejected': -0.34688135981559753, 'beta_dpo/beta': 0.2765732407569885, 'beta_dpo/loss_margin_mean': 93.55750274658203, 'beta_dpo/beta_margin_mean': 27.676023483276367, 'beta_dpo/beta_margin_std': 58.62560272216797, 'beta_dpo/beta_margin_grad_mean': -0.31871679425239563, 'beta_dpo/beta_margin_grad_std': 0.3027940094470978, 'epoch': 0.64} + 64%|█████████████████████████████████████████████████▌ | 433/681 [31:02<11:11, 2.71s/it] 64%|█████████████████████████████████████████████████▋ | 434/681 [31:05<10:59, 2.67s/it] {'loss': 1.3143, 'grad_norm': 6.974902629852295, 'learning_rate': 1.7666166140378853e-07, 'beta_dpo/gap_mean': 108.65299987792969, 'beta_dpo/gap_std': 142.203125, 'beta_dpo/beta_used_raw': -1.8638619184494019, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.39474016427993774, 'logits/rejected': -0.36454617977142334, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 86.80241394042969, 'beta_dpo/beta_margin_mean': 0.0868024155497551, 'beta_dpo/beta_margin_std': 0.12513087689876556, 'beta_dpo/beta_margin_grad_mean': -0.47845569252967834, 'beta_dpo/beta_margin_grad_std': 0.030826503410935402, 'epoch': 0.64} + 64%|█████████████████████████████████████████████████▋ | 434/681 [31:05<10:59, 2.67s/it] 64%|█████████████████████████████████████████████████▊ | 
435/681 [31:07<10:25, 2.54s/it] {'loss': 2.6214, 'grad_norm': 1543.473388671875, 'learning_rate': 1.7543576401928218e-07, 'beta_dpo/gap_mean': 108.19963073730469, 'beta_dpo/gap_std': 141.86123657226562, 'beta_dpo/beta_used_raw': 0.4795774221420288, 'beta_dpo/beta_used': 0.4795774221420288, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3778401017189026, 'logits/rejected': -0.35977697372436523, 'beta_dpo/beta': 0.4795774221420288, 'beta_dpo/loss_margin_mean': 114.60810089111328, 'beta_dpo/beta_margin_mean': 54.45040512084961, 'beta_dpo/beta_margin_std': 62.09480285644531, 'beta_dpo/beta_margin_grad_mean': -0.12648658454418182, 'beta_dpo/beta_margin_grad_std': 0.3134034276008606, 'epoch': 0.64} + 64%|█████████████████████████████████████████████████▊ | 435/681 [31:07<10:25, 2.54s/it] 64%|█████████████████████████████████████████████████▉ | 436/681 [31:09<10:28, 2.56s/it] {'loss': 1.3673, 'grad_norm': 229.00344848632812, 'learning_rate': 1.742118314717391e-07, 'beta_dpo/gap_mean': 106.91453552246094, 'beta_dpo/gap_std': 138.24383544921875, 'beta_dpo/beta_used_raw': -1.441216230392456, 'beta_dpo/beta_used': 0.055185671895742416, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.40563905239105225, 'logits/rejected': -0.3649734854698181, 'beta_dpo/beta': 0.055185671895742416, 'beta_dpo/loss_margin_mean': 96.44285583496094, 'beta_dpo/beta_margin_mean': 5.6773810386657715, 'beta_dpo/beta_margin_std': 10.930699348449707, 'beta_dpo/beta_margin_grad_mean': -0.30519527196884155, 'beta_dpo/beta_margin_grad_std': 0.2901572585105896, 'epoch': 0.64} + 64%|█████████████████████████████████████████████████▉ | 436/681 [31:09<10:28, 2.56s/it] 64%|██████████████████████████████████████████████████ | 437/681 [31:12<10:37, 2.61s/it] {'loss': 5.1701, 'grad_norm': 1593.89501953125, 'learning_rate': 1.7298989601292036e-07, 'beta_dpo/gap_mean': 104.29106140136719, 'beta_dpo/gap_std': 136.22210693359375, 'beta_dpo/beta_used_raw': -0.26632630825042725, 'beta_dpo/beta_used': 
0.44650039076805115, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3833288848400116, 'logits/rejected': -0.3413906693458557, 'beta_dpo/beta': 0.44650039076805115, 'beta_dpo/loss_margin_mean': 96.21728515625, 'beta_dpo/beta_margin_mean': 45.99268341064453, 'beta_dpo/beta_margin_std': 82.80380249023438, 'beta_dpo/beta_margin_grad_mean': -0.31715255975723267, 'beta_dpo/beta_margin_grad_std': 0.30363377928733826, 'epoch': 0.64} + 64%|██████████████████████████████████████████████████ | 437/681 [31:12<10:37, 2.61s/it] 64%|██████████████████████████████████████████████████▏ | 438/681 [31:15<10:17, 2.54s/it] {'loss': 8.9122, 'grad_norm': 7641.5771484375, 'learning_rate': 1.7176998984196144e-07, 'beta_dpo/gap_mean': 107.95907592773438, 'beta_dpo/gap_std': 133.67709350585938, 'beta_dpo/beta_used_raw': 1.1907906532287598, 'beta_dpo/beta_used': 1.1907906532287598, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.37084126472473145, 'logits/rejected': -0.3320963382720947, 'beta_dpo/beta': 1.1907906532287598, 'beta_dpo/loss_margin_mean': 126.7836685180664, 'beta_dpo/beta_margin_mean': 154.26736450195312, 'beta_dpo/beta_margin_std': 161.1520538330078, 'beta_dpo/beta_margin_grad_mean': -0.1750006526708603, 'beta_dpo/beta_margin_grad_std': 0.37429773807525635, 'epoch': 0.64} + 64%|██████████████████████████████████████████████████▏ | 438/681 [31:15<10:17, 2.54s/it] 64%|██████████████████████████████████████████████████▎ | 439/681 [31:17<10:06, 2.51s/it] {'loss': 1.4163, 'grad_norm': 512.3974609375, 'learning_rate': 1.7055214510452458e-07, 'beta_dpo/gap_mean': 107.83575439453125, 'beta_dpo/gap_std': 133.11056518554688, 'beta_dpo/beta_used_raw': -1.7231221199035645, 'beta_dpo/beta_used': 0.07319752871990204, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.40375328063964844, 'logits/rejected': -0.4028066396713257, 'beta_dpo/beta': 0.07319752871990204, 'beta_dpo/loss_margin_mean': 90.30883026123047, 'beta_dpo/beta_margin_mean': 9.903467178344727, 
'beta_dpo/beta_margin_std': 17.277389526367188, 'beta_dpo/beta_margin_grad_mean': -0.32128748297691345, 'beta_dpo/beta_margin_grad_std': 0.2859705984592438, 'epoch': 0.64} + 64%|██████████████████████████████████████████████████▎ | 439/681 [31:17<10:06, 2.51s/it] 65%|██████████████████████████████████████████████████▍ | 440/681 [31:19<09:55, 2.47s/it] {'loss': 1.3123, 'grad_norm': 11.201451301574707, 'learning_rate': 1.6933639389195134e-07, 'beta_dpo/gap_mean': 100.31968688964844, 'beta_dpo/gap_std': 130.88662719726562, 'beta_dpo/beta_used_raw': -1.205794334411621, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.48015761375427246, 'logits/rejected': -0.44124317169189453, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 76.77810668945312, 'beta_dpo/beta_margin_mean': 0.0767781138420105, 'beta_dpo/beta_margin_std': 0.1251077651977539, 'beta_dpo/beta_margin_grad_mean': -0.4809180796146393, 'beta_dpo/beta_margin_grad_std': 0.031033983454108238, 'epoch': 0.65} + 65%|██████████████████████████████████████████████████▍ | 440/681 [31:19<09:55, 2.47s/it] 65%|██████████████████████████████████████████████████▌ | 441/681 [31:22<10:28, 2.62s/it] {'loss': 1.3149, 'grad_norm': 12.307683944702148, 'learning_rate': 1.681227682404166e-07, 'beta_dpo/gap_mean': 99.0499267578125, 'beta_dpo/gap_std': 131.88418579101562, 'beta_dpo/beta_used_raw': -1.326048493385315, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4150615930557251, 'logits/rejected': -0.4018522799015045, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 98.94155883789062, 'beta_dpo/beta_margin_mean': 0.09894155710935593, 'beta_dpo/beta_margin_std': 0.1482171267271042, 'beta_dpo/beta_margin_grad_mean': -0.4754822850227356, 'beta_dpo/beta_margin_grad_std': 0.036441490054130554, 'epoch': 0.65} + 65%|██████████████████████████████████████████████████▌ | 441/681 [31:22<10:28, 
2.62s/it] 65%|██████████████████████████████████████████████████▋ | 442/681 [31:25<10:21, 2.60s/it] {'loss': 1.2978, 'grad_norm': 920.883056640625, 'learning_rate': 1.669113001300851e-07, 'beta_dpo/gap_mean': 101.88089752197266, 'beta_dpo/gap_std': 133.10354614257812, 'beta_dpo/beta_used_raw': -0.9432244896888733, 'beta_dpo/beta_used': 0.19351361691951752, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.42568036913871765, 'logits/rejected': -0.4096643924713135, 'beta_dpo/beta': 0.19351361691951752, 'beta_dpo/loss_margin_mean': 124.86181640625, 'beta_dpo/beta_margin_mean': 28.820743560791016, 'beta_dpo/beta_margin_std': 45.040016174316406, 'beta_dpo/beta_margin_grad_mean': -0.2922385334968567, 'beta_dpo/beta_margin_grad_std': 0.2803710997104645, 'epoch': 0.65} + 65%|██████████████████████████████████████████████████▋ | 442/681 [31:25<10:21, 2.60s/it] 65%|██████████████████████████████████████████████████▋ | 443/681 [31:28<10:23, 2.62s/it] {'loss': 1.3075, 'grad_norm': 8.173919677734375, 'learning_rate': 1.6570202148426815e-07, 'beta_dpo/gap_mean': 100.70872497558594, 'beta_dpo/gap_std': 131.86151123046875, 'beta_dpo/beta_used_raw': -0.9595794677734375, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4086863398551941, 'logits/rejected': -0.38320356607437134, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 73.91852569580078, 'beta_dpo/beta_margin_mean': 0.07391852885484695, 'beta_dpo/beta_margin_std': 0.12798674404621124, 'beta_dpo/beta_margin_grad_mean': -0.48161694407463074, 'beta_dpo/beta_margin_grad_std': 0.03176787868142128, 'epoch': 0.65} + 65%|██████████████████████████████████████████████████▋ | 443/681 [31:28<10:23, 2.62s/it] 65%|██████████████████████████████████████████████████▊ | 444/681 [31:30<10:19, 2.61s/it] {'loss': 2.6417, 'grad_norm': 1877.506103515625, 'learning_rate': 1.6449496416858282e-07, 'beta_dpo/gap_mean': 102.39837646484375, 'beta_dpo/gap_std': 
133.09300231933594, 'beta_dpo/beta_used_raw': -0.6297559142112732, 'beta_dpo/beta_used': 0.13834300637245178, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.38396507501602173, 'logits/rejected': -0.3728168308734894, 'beta_dpo/beta': 0.13834300637245178, 'beta_dpo/loss_margin_mean': 126.64820861816406, 'beta_dpo/beta_margin_mean': 17.195384979248047, 'beta_dpo/beta_margin_std': 30.380125045776367, 'beta_dpo/beta_margin_grad_mean': -0.29932746291160583, 'beta_dpo/beta_margin_grad_std': 0.28772518038749695, 'epoch': 0.65} + 65%|██████████████████████████████████████████████████▊ | 444/681 [31:30<10:19, 2.61s/it] 65%|██████████████████████████████████████████████████▉ | 445/681 [31:33<10:14, 2.60s/it] {'loss': 1.3019, 'grad_norm': 15.09030818939209, 'learning_rate': 1.6329015999011182e-07, 'beta_dpo/gap_mean': 103.6148681640625, 'beta_dpo/gap_std': 134.420654296875, 'beta_dpo/beta_used_raw': -0.7490635514259338, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4086730480194092, 'logits/rejected': -0.3865576982498169, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 102.85830688476562, 'beta_dpo/beta_margin_mean': 0.10285831242799759, 'beta_dpo/beta_margin_std': 0.14040379226207733, 'beta_dpo/beta_margin_grad_mean': -0.4744797348976135, 'beta_dpo/beta_margin_grad_std': 0.03470303490757942, 'epoch': 0.65} + 65%|██████████████████████████████████████████████████▉ | 445/681 [31:33<10:14, 2.60s/it] 65%|███████████████████████████████████████████████████ | 446/681 [31:35<10:13, 2.61s/it] {'loss': 1.0661, 'grad_norm': 680.3995971679688, 'learning_rate': 1.6208764069656578e-07, 'beta_dpo/gap_mean': 104.75509643554688, 'beta_dpo/gap_std': 132.56423950195312, 'beta_dpo/beta_used_raw': -0.14518234133720398, 'beta_dpo/beta_used': 0.28921666741371155, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.42230162024497986, 'logits/rejected': -0.42033088207244873, 'beta_dpo/beta': 0.28921666741371155, 
'beta_dpo/loss_margin_mean': 121.7738265991211, 'beta_dpo/beta_margin_mean': 44.87013626098633, 'beta_dpo/beta_margin_std': 70.5100326538086, 'beta_dpo/beta_margin_grad_mean': -0.27025842666625977, 'beta_dpo/beta_margin_grad_std': 0.26976633071899414, 'epoch': 0.65} + 65%|███████████████████████████████████████████████████ | 446/681 [31:35<10:13, 2.61s/it] 66%|███████████████████████████████████████████████████▏ | 447/681 [31:38<09:51, 2.53s/it] {'loss': 1.2871, 'grad_norm': 12.893980026245117, 'learning_rate': 1.608874379754465e-07, 'beta_dpo/gap_mean': 110.31854248046875, 'beta_dpo/gap_std': 135.51388549804688, 'beta_dpo/beta_used_raw': -0.2607978880405426, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4422151446342468, 'logits/rejected': -0.45059633255004883, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 122.02387237548828, 'beta_dpo/beta_margin_mean': 0.12202388048171997, 'beta_dpo/beta_margin_std': 0.14478100836277008, 'beta_dpo/beta_margin_grad_mean': -0.4698044955730438, 'beta_dpo/beta_margin_grad_std': 0.03518033027648926, 'epoch': 0.66} + 66%|███████████████████████████████████████████████████▏ | 447/681 [31:38<09:51, 2.53s/it] 66%|███████████████████████████████████████████████████▎ | 448/681 [31:40<09:52, 2.54s/it] {'loss': 0.6614, 'grad_norm': 4.838625907897949, 'learning_rate': 1.5968958345321177e-07, 'beta_dpo/gap_mean': 111.61314392089844, 'beta_dpo/gap_std': 135.30453491210938, 'beta_dpo/beta_used_raw': -0.4037218689918518, 'beta_dpo/beta_used': 0.8242188692092896, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3758270740509033, 'logits/rejected': -0.3679637312889099, 'beta_dpo/beta': 0.8242188692092896, 'beta_dpo/loss_margin_mean': 128.5632781982422, 'beta_dpo/beta_margin_mean': 125.9197998046875, 'beta_dpo/beta_margin_std': 187.5569305419922, 'beta_dpo/beta_margin_grad_mean': -0.2527080774307251, 'beta_dpo/beta_margin_grad_std': 0.254643052816391, 'epoch': 0.66} 
+ 66%|███████████████████████████████████████████████████▎ | 448/681 [31:40<09:52, 2.54s/it] 66%|███████████████████████████████████████████████████▍ | 449/681 [31:43<09:45, 2.53s/it] {'loss': 1.2892, 'grad_norm': 8.870427131652832, 'learning_rate': 1.584941086944423e-07, 'beta_dpo/gap_mean': 113.01295471191406, 'beta_dpo/gap_std': 139.9627685546875, 'beta_dpo/beta_used_raw': -0.4290629029273987, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4272603690624237, 'logits/rejected': -0.40170085430145264, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 108.83658599853516, 'beta_dpo/beta_margin_mean': 0.1088365912437439, 'beta_dpo/beta_margin_std': 0.17029906809329987, 'beta_dpo/beta_margin_grad_mean': -0.4730731248855591, 'beta_dpo/beta_margin_grad_std': 0.04196110740303993, 'epoch': 0.66} + 66%|███████████████████████████████████████████████████▍ | 449/681 [31:43<09:45, 2.53s/it] 66%|███████████████████████████████████████████████████▌ | 450/681 [31:45<09:42, 2.52s/it] {'loss': 1.2847, 'grad_norm': 9.47729206085205, 'learning_rate': 1.573010452010098e-07, 'beta_dpo/gap_mean': 113.51698303222656, 'beta_dpo/gap_std': 141.5602264404297, 'beta_dpo/beta_used_raw': -0.313100129365921, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3865205645561218, 'logits/rejected': -0.38359227776527405, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 121.2069091796875, 'beta_dpo/beta_margin_mean': 0.12120691686868668, 'beta_dpo/beta_margin_std': 0.13755354285240173, 'beta_dpo/beta_margin_grad_mean': -0.46986889839172363, 'beta_dpo/beta_margin_grad_std': 0.0339692123234272, 'epoch': 0.66} + 66%|███████████████████████████████████████████████████▌ | 450/681 [31:45<09:42, 2.52s/it] 66%|███████████████████████████████████████████████████▋ | 451/681 [31:47<09:19, 2.43s/it] {'loss': 7.5031, 'grad_norm': 3518.580078125, 'learning_rate': 
1.5611042441124687e-07, 'beta_dpo/gap_mean': 110.95838928222656, 'beta_dpo/gap_std': 140.57334899902344, 'beta_dpo/beta_used_raw': -0.24944308400154114, 'beta_dpo/beta_used': 0.3940798044204712, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3372274339199066, 'logits/rejected': -0.3046179413795471, 'beta_dpo/beta': 0.3940798044204712, 'beta_dpo/loss_margin_mean': 94.27608489990234, 'beta_dpo/beta_margin_mean': 42.9352912902832, 'beta_dpo/beta_margin_std': 87.50625610351562, 'beta_dpo/beta_margin_grad_mean': -0.33411669731140137, 'beta_dpo/beta_margin_grad_std': 0.31294018030166626, 'epoch': 0.66} + 66%|███████████████████████████████████████████████████▋ | 451/681 [31:48<09:19, 2.43s/it] 66%|███████████████████████████████████████████████████▊ | 452/681 [31:50<09:26, 2.47s/it] {'loss': 12.507, 'grad_norm': 4123.4677734375, 'learning_rate': 1.549222776991186e-07, 'beta_dpo/gap_mean': 111.77011108398438, 'beta_dpo/gap_std': 139.58013916015625, 'beta_dpo/beta_used_raw': 0.7567883729934692, 'beta_dpo/beta_used': 0.7567883729934692, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3441423773765564, 'logits/rejected': -0.35753265023231506, 'beta_dpo/beta': 0.7567883729934692, 'beta_dpo/loss_margin_mean': 117.5452651977539, 'beta_dpo/beta_margin_mean': 89.04338073730469, 'beta_dpo/beta_margin_std': 102.09488677978516, 'beta_dpo/beta_margin_grad_mean': -0.17167411744594574, 'beta_dpo/beta_margin_grad_std': 0.37626853585243225, 'epoch': 0.66} + 66%|███████████████████████████████████████████████████▊ | 452/681 [31:50<09:26, 2.47s/it] 67%|███████████████████████████████████████████████████▉ | 453/681 [31:52<09:19, 2.46s/it] {'loss': 1.2991, 'grad_norm': 8.228669166564941, 'learning_rate': 1.5373663637339584e-07, 'beta_dpo/gap_mean': 111.07215881347656, 'beta_dpo/gap_std': 140.66952514648438, 'beta_dpo/beta_used_raw': -1.122417688369751, 'beta_dpo/beta_used': 0.0010159736266359687, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.4157373905181885, 
'logits/rejected': -0.38169363141059875, 'beta_dpo/beta': 0.0010159736266359687, 'beta_dpo/loss_margin_mean': 99.66301727294922, 'beta_dpo/beta_margin_mean': 0.10151873528957367, 'beta_dpo/beta_margin_std': 0.14481480419635773, 'beta_dpo/beta_margin_grad_mean': -0.47485530376434326, 'beta_dpo/beta_margin_grad_std': 0.03558202460408211, 'epoch': 0.67} + 67%|███████████████████████████████████████████████████▉ | 453/681 [31:53<09:19, 2.46s/it] 67%|████████████████████████████████████████████████████ | 454/681 [31:55<09:29, 2.51s/it] {'loss': 1.2903, 'grad_norm': 7.617781162261963, 'learning_rate': 1.5255353167683017e-07, 'beta_dpo/gap_mean': 112.77023315429688, 'beta_dpo/gap_std': 141.88412475585938, 'beta_dpo/beta_used_raw': -0.6103986501693726, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3238060176372528, 'logits/rejected': -0.2969810962677002, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 128.04446411132812, 'beta_dpo/beta_margin_mean': 0.1280444711446762, 'beta_dpo/beta_margin_std': 0.15373089909553528, 'beta_dpo/beta_margin_grad_mean': -0.46827903389930725, 'beta_dpo/beta_margin_grad_std': 0.03779821842908859, 'epoch': 0.67} + 67%|████████████████████████████████████████████████████ | 454/681 [31:55<09:29, 2.51s/it] 67%|████████████████████████████████████████████████████ | 455/681 [31:58<09:27, 2.51s/it] {'loss': 0.7604, 'grad_norm': 257.9051208496094, 'learning_rate': 1.5137299478533064e-07, 'beta_dpo/gap_mean': 119.1419677734375, 'beta_dpo/gap_std': 145.837158203125, 'beta_dpo/beta_used_raw': 0.23084740340709686, 'beta_dpo/beta_used': 0.23283345997333527, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3444521725177765, 'logits/rejected': -0.35367467999458313, 'beta_dpo/beta': 0.23283345997333527, 'beta_dpo/loss_margin_mean': 164.03538513183594, 'beta_dpo/beta_margin_mean': 31.724552154541016, 'beta_dpo/beta_margin_std': 55.67319107055664, 'beta_dpo/beta_margin_grad_mean': 
-0.27318888902664185, 'beta_dpo/beta_margin_grad_std': 0.2729749083518982, 'epoch': 0.67} + 67%|████████████████████████████████████████████████████ | 455/681 [31:58<09:27, 2.51s/it] 67%|████████████████████████████████████████████████████▏ | 456/681 [32:00<09:34, 2.55s/it] {'loss': 0.9313, 'grad_norm': 182.11668395996094, 'learning_rate': 1.5019505680714232e-07, 'beta_dpo/gap_mean': 127.31085205078125, 'beta_dpo/gap_std': 151.3060302734375, 'beta_dpo/beta_used_raw': -0.5959498286247253, 'beta_dpo/beta_used': 0.028770416975021362, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.36195001006126404, 'logits/rejected': -0.3621336817741394, 'beta_dpo/beta': 0.028770416975021362, 'beta_dpo/loss_margin_mean': 154.75982666015625, 'beta_dpo/beta_margin_mean': 4.620020866394043, 'beta_dpo/beta_margin_std': 7.49506950378418, 'beta_dpo/beta_margin_grad_mean': -0.30041444301605225, 'beta_dpo/beta_margin_grad_std': 0.25256428122520447, 'epoch': 0.67} + 67%|████████████████████████████████████████████████████▏ | 456/681 [32:00<09:34, 2.55s/it] 67%|████████████████████████████████████████████████████▎ | 457/681 [32:03<09:33, 2.56s/it] {'loss': 1.9311, 'grad_norm': 2250.94482421875, 'learning_rate': 1.4901974878202627e-07, 'beta_dpo/gap_mean': 128.869873046875, 'beta_dpo/gap_std': 148.14273071289062, 'beta_dpo/beta_used_raw': 0.25765174627304077, 'beta_dpo/beta_used': 0.9050564765930176, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.35765865445137024, 'logits/rejected': -0.3306649625301361, 'beta_dpo/beta': 0.9050564765930176, 'beta_dpo/loss_margin_mean': 125.5430908203125, 'beta_dpo/beta_margin_mean': 119.8252182006836, 'beta_dpo/beta_margin_std': 193.12596130371094, 'beta_dpo/beta_margin_grad_mean': -0.3133964538574219, 'beta_dpo/beta_margin_grad_std': 0.30206099152565, 'epoch': 0.67} + 67%|████████████████████████████████████████████████████▎ | 457/681 [32:03<09:33, 2.56s/it] 67%|████████████████████████████████████████████████████▍ | 458/681 [32:05<09:20, 
2.51s/it] {'loss': 1.2669, 'grad_norm': 7.672088146209717, 'learning_rate': 1.4784710168044212e-07, 'beta_dpo/gap_mean': 133.1038818359375, 'beta_dpo/gap_std': 151.08180236816406, 'beta_dpo/beta_used_raw': -0.31320202350616455, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3376292586326599, 'logits/rejected': -0.31968408823013306, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 156.4623565673828, 'beta_dpo/beta_margin_mean': 0.15646237134933472, 'beta_dpo/beta_margin_std': 0.1608007401227951, 'beta_dpo/beta_margin_grad_mean': -0.46133655309677124, 'beta_dpo/beta_margin_grad_std': 0.03900197148323059, 'epoch': 0.67} + 67%|████████████████████████████████████████████████████▍ | 458/681 [32:05<09:20, 2.51s/it] 67%|████████████████████████████████████████████████████▌ | 459/681 [32:08<09:24, 2.54s/it] {'loss': 1.2765, 'grad_norm': 7.513828754425049, 'learning_rate': 1.466771464027316e-07, 'beta_dpo/gap_mean': 132.22055053710938, 'beta_dpo/gap_std': 149.7262420654297, 'beta_dpo/beta_used_raw': -0.7991423606872559, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3106893301010132, 'logits/rejected': -0.30481159687042236, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 125.19963073730469, 'beta_dpo/beta_margin_mean': 0.12519963085651398, 'beta_dpo/beta_margin_std': 0.14195367693901062, 'beta_dpo/beta_margin_grad_mean': -0.46891355514526367, 'beta_dpo/beta_margin_grad_std': 0.03510946035385132, 'epoch': 0.67} + 67%|████████████████████████████████████████████████████▌ | 459/681 [32:08<09:24, 2.54s/it] 68%|████████████████████████████████████████████████████▋ | 460/681 [32:11<09:30, 2.58s/it] {'loss': 1.2756, 'grad_norm': 9.385546684265137, 'learning_rate': 1.4550991377830423e-07, 'beta_dpo/gap_mean': 132.47604370117188, 'beta_dpo/gap_std': 149.71617126464844, 'beta_dpo/beta_used_raw': -0.736950159072876, 'beta_dpo/beta_used': 
0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.35042130947113037, 'logits/rejected': -0.36293381452560425, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 135.55874633789062, 'beta_dpo/beta_margin_mean': 0.13555875420570374, 'beta_dpo/beta_margin_std': 0.15229398012161255, 'beta_dpo/beta_margin_grad_mean': -0.46639198064804077, 'beta_dpo/beta_margin_grad_std': 0.03753071278333664, 'epoch': 0.68} + 68%|████████████████████████████████████████████████████▋ | 460/681 [32:11<09:30, 2.58s/it] 68%|████████████████████████████████████████████████████▊ | 461/681 [32:13<09:33, 2.61s/it] {'loss': 1.2997, 'grad_norm': 9.00002670288086, 'learning_rate': 1.4434543456482518e-07, 'beta_dpo/gap_mean': 128.8672637939453, 'beta_dpo/gap_std': 150.39163208007812, 'beta_dpo/beta_used_raw': -2.1008927822113037, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3559180200099945, 'logits/rejected': -0.3427043855190277, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 114.12368774414062, 'beta_dpo/beta_margin_mean': 0.11412369459867477, 'beta_dpo/beta_margin_std': 0.15732567012310028, 'beta_dpo/beta_margin_grad_mean': -0.4717380404472351, 'beta_dpo/beta_margin_grad_std': 0.03876164183020592, 'epoch': 0.68} + 68%|████████████████████████████████████████████████████▊ | 461/681 [32:13<09:33, 2.61s/it] 68%|████████████████████████████████████████████████████▉ | 462/681 [32:16<09:24, 2.58s/it] {'loss': 1.3102, 'grad_norm': 9.362037658691406, 'learning_rate': 1.4318373944740484e-07, 'beta_dpo/gap_mean': 123.946533203125, 'beta_dpo/gap_std': 149.71881103515625, 'beta_dpo/beta_used_raw': -2.4599204063415527, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3212631940841675, 'logits/rejected': -0.29980742931365967, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 86.9663314819336, 
'beta_dpo/beta_margin_mean': 0.08696634322404861, 'beta_dpo/beta_margin_std': 0.1362220048904419, 'beta_dpo/beta_margin_grad_mean': -0.4784083962440491, 'beta_dpo/beta_margin_grad_std': 0.03373510017991066, 'epoch': 0.68} + 68%|████████████████████████████████████████████████████▉ | 462/681 [32:16<09:24, 2.58s/it] 68%|█████████████████████████████████████████████████████ | 463/681 [32:18<09:12, 2.54s/it] {'loss': 7.63, 'grad_norm': 3597.947021484375, 'learning_rate': 1.4202485903778976e-07, 'beta_dpo/gap_mean': 119.78553771972656, 'beta_dpo/gap_std': 151.25320434570312, 'beta_dpo/beta_used_raw': -0.2606269419193268, 'beta_dpo/beta_used': 0.2805536985397339, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3247862458229065, 'logits/rejected': -0.3129928708076477, 'beta_dpo/beta': 0.2805536985397339, 'beta_dpo/loss_margin_mean': 117.82292938232422, 'beta_dpo/beta_margin_mean': 38.56230926513672, 'beta_dpo/beta_margin_std': 81.53507232666016, 'beta_dpo/beta_margin_grad_mean': -0.3316049575805664, 'beta_dpo/beta_margin_grad_std': 0.31257641315460205, 'epoch': 0.68} + 68%|█████████████████████████████████████████████████████ | 463/681 [32:18<09:12, 2.54s/it] 68%|█████████████████████████████████████████████████████▏ | 464/681 [32:20<08:53, 2.46s/it] {'loss': 29.0936, 'grad_norm': 10341.1005859375, 'learning_rate': 1.4086882387355658e-07, 'beta_dpo/gap_mean': 131.84754943847656, 'beta_dpo/gap_std': 157.7271728515625, 'beta_dpo/beta_used_raw': 2.1228408813476562, 'beta_dpo/beta_used': 2.1228408813476562, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3156416118144989, 'logits/rejected': -0.3281491696834564, 'beta_dpo/beta': 2.1228408813476562, 'beta_dpo/loss_margin_mean': 192.6825714111328, 'beta_dpo/beta_margin_mean': 394.66033935546875, 'beta_dpo/beta_margin_std': 431.92449951171875, 'beta_dpo/beta_margin_grad_mean': -0.12389523535966873, 'beta_dpo/beta_margin_grad_std': 0.3279002010822296, 'epoch': 0.68} + 
68%|█████████████████████████████████████████████████████▏ | 464/681 [32:20<08:53, 2.46s/it] 68%|█████████████████████████████████████████████████████▎ | 465/681 [32:23<09:09, 2.55s/it] {'loss': 8.3958, 'grad_norm': 4897.61328125, 'learning_rate': 1.3971566441730714e-07, 'beta_dpo/gap_mean': 137.17782592773438, 'beta_dpo/gap_std': 158.68795776367188, 'beta_dpo/beta_used_raw': 0.4801773428916931, 'beta_dpo/beta_used': 0.4801773428916931, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.31099051237106323, 'logits/rejected': -0.305058091878891, 'beta_dpo/beta': 0.4801773428916931, 'beta_dpo/loss_margin_mean': 162.17996215820312, 'beta_dpo/beta_margin_mean': 81.35899353027344, 'beta_dpo/beta_margin_std': 94.96959686279297, 'beta_dpo/beta_margin_grad_mean': -0.16912737488746643, 'beta_dpo/beta_margin_grad_std': 0.37140730023384094, 'epoch': 0.68} + 68%|█████████████████████████████████████████████████████▎ | 465/681 [32:23<09:09, 2.55s/it] 68%|█████████████████████████████████████████████████████▎ | 466/681 [32:26<09:27, 2.64s/it] {'loss': 1.2206, 'grad_norm': 1151.1441650390625, 'learning_rate': 1.3856541105586545e-07, 'beta_dpo/gap_mean': 139.38119506835938, 'beta_dpo/gap_std': 160.36859130859375, 'beta_dpo/beta_used_raw': -0.26916056871414185, 'beta_dpo/beta_used': 0.22260768711566925, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3135194778442383, 'logits/rejected': -0.3104793429374695, 'beta_dpo/beta': 0.22260768711566925, 'beta_dpo/loss_margin_mean': 131.21505737304688, 'beta_dpo/beta_margin_mean': 33.30300521850586, 'beta_dpo/beta_margin_std': 57.53418731689453, 'beta_dpo/beta_margin_grad_mean': -0.2829422950744629, 'beta_dpo/beta_margin_grad_std': 0.2813977301120758, 'epoch': 0.68} + 68%|█████████████████████████████████████████████████████▎ | 466/681 [32:26<09:27, 2.64s/it] 69%|█████████████████████████████████████████████████████▍ | 467/681 [32:29<09:26, 2.65s/it] {'loss': 1.292, 'grad_norm': 9.571708679199219, 'learning_rate': 
1.3741809409947729e-07, 'beta_dpo/gap_mean': 137.7141571044922, 'beta_dpo/gap_std': 169.05447387695312, 'beta_dpo/beta_used_raw': -1.9833605289459229, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.31644725799560547, 'logits/rejected': -0.29425540566444397, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 125.64309692382812, 'beta_dpo/beta_margin_mean': 0.12564310431480408, 'beta_dpo/beta_margin_std': 0.21079717576503754, 'beta_dpo/beta_margin_grad_mean': -0.4690595865249634, 'beta_dpo/beta_margin_grad_std': 0.05179882049560547, 'epoch': 0.69} + 69%|█████████████████████████████████████████████████████▍ | 467/681 [32:29<09:26, 2.65s/it] 69%|█████████████████████████████████████████████████████▌ | 468/681 [32:31<09:24, 2.65s/it] {'loss': 1.9544, 'grad_norm': 2363.861083984375, 'learning_rate': 1.362737437810114e-07, 'beta_dpo/gap_mean': 136.60678100585938, 'beta_dpo/gap_std': 168.23411560058594, 'beta_dpo/beta_used_raw': 0.5442880988121033, 'beta_dpo/beta_used': 0.5442880988121033, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3771149516105652, 'logits/rejected': -0.3516891598701477, 'beta_dpo/beta': 0.5442880988121033, 'beta_dpo/loss_margin_mean': 139.57421875, 'beta_dpo/beta_margin_mean': 83.84257507324219, 'beta_dpo/beta_margin_std': 139.0602569580078, 'beta_dpo/beta_margin_grad_mean': -0.17244772613048553, 'beta_dpo/beta_margin_grad_std': 0.3269096910953522, 'epoch': 0.69} + 69%|█████████████████████████████████████████████████████▌ | 468/681 [32:31<09:24, 2.65s/it] 69%|█████████████████████████████████████████████████████▋ | 469/681 [32:34<09:29, 2.69s/it] {'loss': 5.6201, 'grad_norm': 4023.0234375, 'learning_rate': 1.351323902551631e-07, 'beta_dpo/gap_mean': 139.35459899902344, 'beta_dpo/gap_std': 167.7623291015625, 'beta_dpo/beta_used_raw': 0.13212749361991882, 'beta_dpo/beta_used': 0.5691275596618652, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3229216933250427, 
'logits/rejected': -0.2937919497489929, 'beta_dpo/beta': 0.5691275596618652, 'beta_dpo/loss_margin_mean': 148.57752990722656, 'beta_dpo/beta_margin_mean': 104.36015319824219, 'beta_dpo/beta_margin_std': 166.2760467529297, 'beta_dpo/beta_margin_grad_mean': -0.2979428172111511, 'beta_dpo/beta_margin_grad_std': 0.2913264036178589, 'epoch': 0.69} + 69%|█████████████████████████████████████████████████████▋ | 469/681 [32:34<09:29, 2.69s/it] 69%|█████████████████████████████████████████████████████▊ | 470/681 [32:37<09:07, 2.60s/it] {'loss': 1.8137, 'grad_norm': 635.5731201171875, 'learning_rate': 1.339940635976592e-07, 'beta_dpo/gap_mean': 140.06040954589844, 'beta_dpo/gap_std': 169.35638427734375, 'beta_dpo/beta_used_raw': -0.260947585105896, 'beta_dpo/beta_used': 0.058329131454229355, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2994263470172882, 'logits/rejected': -0.2865986227989197, 'beta_dpo/beta': 0.058329131454229355, 'beta_dpo/loss_margin_mean': 151.863525390625, 'beta_dpo/beta_margin_mean': 8.915841102600098, 'beta_dpo/beta_margin_std': 17.628265380859375, 'beta_dpo/beta_margin_grad_mean': -0.30150657892227173, 'beta_dpo/beta_margin_grad_std': 0.2844862639904022, 'epoch': 0.69} + 69%|█████████████████████████████████████████████████████▊ | 470/681 [32:37<09:07, 2.60s/it] 69%|█████████████████████████████████████████████████████▉ | 471/681 [32:39<08:50, 2.52s/it] {'loss': 1.5718, 'grad_norm': 660.4382934570312, 'learning_rate': 1.3285879380446563e-07, 'beta_dpo/gap_mean': 141.4301300048828, 'beta_dpo/gap_std': 166.99551391601562, 'beta_dpo/beta_used_raw': -1.259301781654358, 'beta_dpo/beta_used': 0.14344525337219238, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3323206603527069, 'logits/rejected': -0.301265686750412, 'beta_dpo/beta': 0.14344525337219238, 'beta_dpo/loss_margin_mean': 137.1492462158203, 'beta_dpo/beta_margin_mean': 23.80760955810547, 'beta_dpo/beta_margin_std': 40.966461181640625, 'beta_dpo/beta_margin_grad_mean': 
-0.32090723514556885, 'beta_dpo/beta_margin_grad_std': 0.296132355928421, 'epoch': 0.69} + 69%|█████████████████████████████████████████████████████▉ | 471/681 [32:39<08:50, 2.52s/it] 69%|██████████████████████████████████████████████████████ | 472/681 [32:42<09:08, 2.62s/it] {'loss': 1.2787, 'grad_norm': 9.515340805053711, 'learning_rate': 1.317266107909975e-07, 'beta_dpo/gap_mean': 141.42642211914062, 'beta_dpo/gap_std': 171.97683715820312, 'beta_dpo/beta_used_raw': -1.5177662372589111, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.46395474672317505, 'logits/rejected': -0.4258913993835449, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 153.0021514892578, 'beta_dpo/beta_margin_mean': 0.15300215780735016, 'beta_dpo/beta_margin_std': 0.2004023641347885, 'beta_dpo/beta_margin_grad_mean': -0.46238476037979126, 'beta_dpo/beta_margin_grad_std': 0.048712510615587234, 'epoch': 0.69} + 69%|██████████████████████████████████████████████████████ | 472/681 [32:42<09:08, 2.62s/it] 69%|██████████████████████████████████████████████████████▏ | 473/681 [32:44<09:10, 2.65s/it] {'loss': 2.9427, 'grad_norm': 874.2503051757812, 'learning_rate': 1.3059754439133002e-07, 'beta_dpo/gap_mean': 136.3826141357422, 'beta_dpo/gap_std': 172.83595275878906, 'beta_dpo/beta_used_raw': -2.1221091747283936, 'beta_dpo/beta_used': 0.12622235715389252, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3121126890182495, 'logits/rejected': -0.27456527948379517, 'beta_dpo/beta': 0.12622235715389252, 'beta_dpo/loss_margin_mean': 98.95618438720703, 'beta_dpo/beta_margin_mean': 11.348122596740723, 'beta_dpo/beta_margin_std': 32.52213668823242, 'beta_dpo/beta_margin_grad_mean': -0.3606536090373993, 'beta_dpo/beta_margin_grad_std': 0.32541587948799133, 'epoch': 0.69} + 69%|██████████████████████████████████████████████████████▏ | 473/681 [32:44<09:10, 2.65s/it] 70%|██████████████████████████████████████████████████████▎ | 
474/681 [32:47<09:14, 2.68s/it] {'loss': 1.019, 'grad_norm': 241.4309539794922, 'learning_rate': 1.2947162435741277e-07, 'beta_dpo/gap_mean': 128.73321533203125, 'beta_dpo/gap_std': 170.72265625, 'beta_dpo/beta_used_raw': -1.1924772262573242, 'beta_dpo/beta_used': 0.03025379776954651, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3187577426433563, 'logits/rejected': -0.31267520785331726, 'beta_dpo/beta': 0.03025379776954651, 'beta_dpo/loss_margin_mean': 102.19025421142578, 'beta_dpo/beta_margin_mean': 3.4796054363250732, 'beta_dpo/beta_margin_std': 7.700491428375244, 'beta_dpo/beta_margin_grad_mean': -0.3477736711502075, 'beta_dpo/beta_margin_grad_std': 0.26919984817504883, 'epoch': 0.7} + 70%|██████████████████████████████████████████████████████▎ | 474/681 [32:47<09:14, 2.68s/it] 70%|██████████████████████████████████████████████████████▍ | 475/681 [32:50<08:58, 2.62s/it] {'loss': 1.2878, 'grad_norm': 7.302783966064453, 'learning_rate': 1.2834888035828596e-07, 'beta_dpo/gap_mean': 130.75253295898438, 'beta_dpo/gap_std': 168.95263671875, 'beta_dpo/beta_used_raw': -1.4561372995376587, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3118668496608734, 'logits/rejected': -0.32232552766799927, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 145.15594482421875, 'beta_dpo/beta_margin_mean': 0.14515595138072968, 'beta_dpo/beta_margin_std': 0.1593308448791504, 'beta_dpo/beta_margin_grad_mean': -0.4640824496746063, 'beta_dpo/beta_margin_grad_std': 0.03909669071435928, 'epoch': 0.7} + 70%|██████████████████████████████████████████████████████▍ | 475/681 [32:50<08:58, 2.62s/it] 70%|██████████████████████████████████████████████████████▌ | 476/681 [32:52<08:52, 2.60s/it] {'loss': 1.2757, 'grad_norm': 10.900651931762695, 'learning_rate': 1.2722934197929802e-07, 'beta_dpo/gap_mean': 130.04847717285156, 'beta_dpo/gap_std': 165.11314392089844, 'beta_dpo/beta_used_raw': -0.6184031367301941, 
'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.32981306314468384, 'logits/rejected': -0.3277033567428589, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 123.46017456054688, 'beta_dpo/beta_margin_mean': 0.12346017360687256, 'beta_dpo/beta_margin_std': 0.13980108499526978, 'beta_dpo/beta_margin_grad_mean': -0.4693569839000702, 'beta_dpo/beta_margin_grad_std': 0.03457416966557503, 'epoch': 0.7} + 70%|██████████████████████████████████████████████████████▌ | 476/681 [32:52<08:52, 2.60s/it] 70%|██████████████████████████████████████████████████████▋ | 477/681 [32:55<08:47, 2.59s/it] {'loss': 1.9278, 'grad_norm': 881.2789306640625, 'learning_rate': 1.2611303872132631e-07, 'beta_dpo/gap_mean': 129.47628784179688, 'beta_dpo/gap_std': 165.23104858398438, 'beta_dpo/beta_used_raw': -0.9268441200256348, 'beta_dpo/beta_used': 0.08377163857221603, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.34101468324661255, 'logits/rejected': -0.27440470457077026, 'beta_dpo/beta': 0.08377163857221603, 'beta_dpo/loss_margin_mean': 131.6189727783203, 'beta_dpo/beta_margin_mean': 9.991097450256348, 'beta_dpo/beta_margin_std': 23.768993377685547, 'beta_dpo/beta_margin_grad_mean': -0.3329217731952667, 'beta_dpo/beta_margin_grad_std': 0.2996887266635895, 'epoch': 0.7} + 70%|██████████████████████████████████████████████████████▋ | 477/681 [32:55<08:47, 2.59s/it] 70%|██████████████████████████████████████████████████████▋ | 478/681 [32:58<09:10, 2.71s/it] {'loss': 1.2777, 'grad_norm': 8.391778945922852, 'learning_rate': 1.2500000000000005e-07, 'beta_dpo/gap_mean': 131.2724609375, 'beta_dpo/gap_std': 162.33258056640625, 'beta_dpo/beta_used_raw': -0.9066869616508484, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3239785432815552, 'logits/rejected': -0.3198069930076599, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 147.8665771484375, 
'beta_dpo/beta_margin_mean': 0.14786657691001892, 'beta_dpo/beta_margin_std': 0.16245287656784058, 'beta_dpo/beta_margin_grad_mean': -0.46343475580215454, 'beta_dpo/beta_margin_grad_std': 0.039767127484083176, 'epoch': 0.7} + 70%|██████████████████████████████████████████████████████▋ | 478/681 [32:58<09:10, 2.71s/it] 70%|██████████████████████████████████████████████████████▊ | 479/681 [33:00<08:59, 2.67s/it] {'loss': 1.2909, 'grad_norm': 9.221752166748047, 'learning_rate': 1.2389025514492456e-07, 'beta_dpo/gap_mean': 130.87498474121094, 'beta_dpo/gap_std': 161.7484893798828, 'beta_dpo/beta_used_raw': -1.6241159439086914, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3099960684776306, 'logits/rejected': -0.3118622601032257, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 130.71371459960938, 'beta_dpo/beta_margin_mean': 0.13071373105049133, 'beta_dpo/beta_margin_std': 0.16454558074474335, 'beta_dpo/beta_margin_grad_mean': -0.4676341712474823, 'beta_dpo/beta_margin_grad_std': 0.04058250039815903, 'epoch': 0.7} + 70%|██████████████████████████████████████████████████████▊ | 479/681 [33:00<08:59, 2.67s/it] 70%|██████████████████████████████████████████████████████▉ | 480/681 [33:03<08:48, 2.63s/it] {'loss': 4.3921, 'grad_norm': 1865.645751953125, 'learning_rate': 1.227838333989088e-07, 'beta_dpo/gap_mean': 128.6205596923828, 'beta_dpo/gap_std': 162.02749633789062, 'beta_dpo/beta_used_raw': -1.191691517829895, 'beta_dpo/beta_used': 0.13506542146205902, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2958667278289795, 'logits/rejected': -0.261913537979126, 'beta_dpo/beta': 0.13506542146205902, 'beta_dpo/loss_margin_mean': 111.26964569091797, 'beta_dpo/beta_margin_mean': 20.978227615356445, 'beta_dpo/beta_margin_std': 46.153724670410156, 'beta_dpo/beta_margin_grad_mean': -0.3352108895778656, 'beta_dpo/beta_margin_grad_std': 0.31329280138015747, 'epoch': 0.7} + 
70%|██████████████████████████████████████████████████████▉ | 480/681 [33:03<08:48, 2.63s/it] 71%|███████████████████████████████████████████████████████ | 481/681 [33:05<08:37, 2.59s/it] {'loss': 10.966, 'grad_norm': 6225.22705078125, 'learning_rate': 1.2168076391719489e-07, 'beta_dpo/gap_mean': 132.35614013671875, 'beta_dpo/gap_std': 165.59747314453125, 'beta_dpo/beta_used_raw': -0.41111305356025696, 'beta_dpo/beta_used': 0.4420124888420105, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.29309454560279846, 'logits/rejected': -0.2821449935436249, 'beta_dpo/beta': 0.4420124888420105, 'beta_dpo/loss_margin_mean': 140.51625061035156, 'beta_dpo/beta_margin_mean': 69.89620208740234, 'beta_dpo/beta_margin_std': 127.26205444335938, 'beta_dpo/beta_margin_grad_mean': -0.32656970620155334, 'beta_dpo/beta_margin_grad_std': 0.3070107102394104, 'epoch': 0.71} + 71%|███████████████████████████████████████████████████████ | 481/681 [33:05<08:37, 2.59s/it] 71%|███████████████████████████████████████████████████████▏ | 482/681 [33:08<08:40, 2.61s/it] {'loss': 1.2958, 'grad_norm': 7.6943440437316895, 'learning_rate': 1.2058107576668938e-07, 'beta_dpo/gap_mean': 127.62977600097656, 'beta_dpo/gap_std': 167.57472229003906, 'beta_dpo/beta_used_raw': -1.7888857126235962, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.33630889654159546, 'logits/rejected': -0.3210619390010834, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 99.38764190673828, 'beta_dpo/beta_margin_mean': 0.09938764572143555, 'beta_dpo/beta_margin_std': 0.172020822763443, 'beta_dpo/beta_margin_grad_mean': -0.47538548707962036, 'beta_dpo/beta_margin_grad_std': 0.0424528568983078, 'epoch': 0.71} + 71%|███████████████████████████████████████████████████████▏ | 482/681 [33:08<08:40, 2.61s/it] 71%|███████████████████████████████████████████████████████▎ | 483/681 [33:11<08:31, 2.58s/it] {'loss': 1.3205, 'grad_norm': 1278.6922607421875, 
'learning_rate': 1.194847979251979e-07, 'beta_dpo/gap_mean': 130.0849151611328, 'beta_dpo/gap_std': 171.31443786621094, 'beta_dpo/beta_used_raw': 0.08599334955215454, 'beta_dpo/beta_used': 0.26435208320617676, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3326480984687805, 'logits/rejected': -0.2999170124530792, 'beta_dpo/beta': 0.26435208320617676, 'beta_dpo/loss_margin_mean': 154.75323486328125, 'beta_dpo/beta_margin_mean': 39.302825927734375, 'beta_dpo/beta_margin_std': 61.75477981567383, 'beta_dpo/beta_margin_grad_mean': -0.2773337662220001, 'beta_dpo/beta_margin_grad_std': 0.2783583700656891, 'epoch': 0.71} + 71%|███████████████████████████████████████████████████████▎ | 483/681 [33:11<08:31, 2.58s/it] 71%|███████████████████████████████████████████████████████▍ | 484/681 [33:13<08:12, 2.50s/it] {'loss': 1.2859, 'grad_norm': 10.355823516845703, 'learning_rate': 1.1839195928066101e-07, 'beta_dpo/gap_mean': 129.75552368164062, 'beta_dpo/gap_std': 164.25143432617188, 'beta_dpo/beta_used_raw': -1.3041430711746216, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3226221203804016, 'logits/rejected': -0.2984588146209717, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 130.5736541748047, 'beta_dpo/beta_margin_mean': 0.13057366013526917, 'beta_dpo/beta_margin_std': 0.1456281840801239, 'beta_dpo/beta_margin_grad_mean': -0.46763938665390015, 'beta_dpo/beta_margin_grad_std': 0.035770609974861145, 'epoch': 0.71} + 71%|███████████████████████████████████████████████████████▍ | 484/681 [33:13<08:12, 2.50s/it] 71%|███████████████████████████████████████████████████████▌ | 485/681 [33:15<08:10, 2.50s/it] {'loss': 17.9925, 'grad_norm': 9029.59765625, 'learning_rate': 1.1730258863039347e-07, 'beta_dpo/gap_mean': 135.1558837890625, 'beta_dpo/gap_std': 167.03604125976562, 'beta_dpo/beta_used_raw': 0.09787964820861816, 'beta_dpo/beta_used': 0.5772560238838196, 'beta_dpo/mask_keep_frac': 0.78125, 
'logits/chosen': -0.3165690302848816, 'logits/rejected': -0.30851900577545166, 'beta_dpo/beta': 0.5772560238838196, 'beta_dpo/loss_margin_mean': 159.67459106445312, 'beta_dpo/beta_margin_mean': 89.47730255126953, 'beta_dpo/beta_margin_std': 172.3997344970703, 'beta_dpo/beta_margin_grad_mean': -0.33905330300331116, 'beta_dpo/beta_margin_grad_std': 0.3175105154514313, 'epoch': 0.71} + 71%|███████████████████████████████████████████████████████▌ | 485/681 [33:15<08:10, 2.50s/it] 71%|███████████████████████████████████████████████████████▋ | 486/681 [33:17<07:42, 2.37s/it] {'loss': 24.9742, 'grad_norm': 8708.306640625, 'learning_rate': 1.1621671468032493e-07, 'beta_dpo/gap_mean': 137.73037719726562, 'beta_dpo/gap_std': 171.21456909179688, 'beta_dpo/beta_used_raw': 0.35201627016067505, 'beta_dpo/beta_used': 0.5253121852874756, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3039623498916626, 'logits/rejected': -0.28515172004699707, 'beta_dpo/beta': 0.5253121852874756, 'beta_dpo/loss_margin_mean': 145.63682556152344, 'beta_dpo/beta_margin_mean': 79.52362823486328, 'beta_dpo/beta_margin_std': 165.96304321289062, 'beta_dpo/beta_margin_grad_mean': -0.3582148551940918, 'beta_dpo/beta_margin_grad_std': 0.32531389594078064, 'epoch': 0.71} + 71%|███████████████████████████████████████████████████████▋ | 486/681 [33:17<07:42, 2.37s/it] 72%|███████████████████████████████████████████████████████▊ | 487/681 [33:20<07:54, 2.45s/it] {'loss': 7.4951, 'grad_norm': 4539.7001953125, 'learning_rate': 1.1513436604424378e-07, 'beta_dpo/gap_mean': 138.84857177734375, 'beta_dpo/gap_std': 166.0025634765625, 'beta_dpo/beta_used_raw': 0.6316623091697693, 'beta_dpo/beta_used': 0.6316623091697693, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3175516128540039, 'logits/rejected': -0.30147281289100647, 'beta_dpo/beta': 0.6316623091697693, 'beta_dpo/loss_margin_mean': 136.31451416015625, 'beta_dpo/beta_margin_mean': 86.09791564941406, 'beta_dpo/beta_margin_std': 80.9069595336914, 
'beta_dpo/beta_margin_grad_mean': -0.15689758956432343, 'beta_dpo/beta_margin_grad_std': 0.36151018738746643, 'epoch': 0.72} + 72%|███████████████████████████████████████████████████████▊ | 487/681 [33:20<07:54, 2.45s/it] 72%|███████████████████████████████████████████████████████▉ | 488/681 [33:23<08:02, 2.50s/it] {'loss': 1.2797, 'grad_norm': 10.171424865722656, 'learning_rate': 1.1405557124304335e-07, 'beta_dpo/gap_mean': 134.59036254882812, 'beta_dpo/gap_std': 155.66152954101562, 'beta_dpo/beta_used_raw': -1.245683193206787, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3197871446609497, 'logits/rejected': -0.2931329607963562, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 118.42108917236328, 'beta_dpo/beta_margin_mean': 0.11842110008001328, 'beta_dpo/beta_margin_std': 0.10160267353057861, 'beta_dpo/beta_margin_grad_mean': -0.4705146551132202, 'beta_dpo/beta_margin_grad_std': 0.025216443464159966, 'epoch': 0.72} + 72%|███████████████████████████████████████████████████████▉ | 488/681 [33:23<08:02, 2.50s/it] 72%|████████████████████████████████████████████████████████ | 489/681 [33:25<07:58, 2.49s/it] {'loss': 1.3069, 'grad_norm': 7.416528701782227, 'learning_rate': 1.1298035870396985e-07, 'beta_dpo/gap_mean': 132.25436401367188, 'beta_dpo/gap_std': 150.97909545898438, 'beta_dpo/beta_used_raw': -2.862081527709961, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.39010077714920044, 'logits/rejected': -0.36551567912101746, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 118.88796997070312, 'beta_dpo/beta_margin_mean': 0.11888797581195831, 'beta_dpo/beta_margin_std': 0.13958628475666046, 'beta_dpo/beta_margin_grad_mean': -0.47052738070487976, 'beta_dpo/beta_margin_grad_std': 0.03435816988348961, 'epoch': 0.72} + 72%|████████████████████████████████████████████████████████ | 489/681 [33:25<07:58, 2.49s/it] 
72%|████████████████████████████████████████████████████████ | 490/681 [33:28<08:19, 2.62s/it] {'loss': 2.6273, 'grad_norm': 638.021728515625, 'learning_rate': 1.1190875675987355e-07, 'beta_dpo/gap_mean': 131.10269165039062, 'beta_dpo/gap_std': 152.6240692138672, 'beta_dpo/beta_used_raw': -0.9930161833763123, 'beta_dpo/beta_used': 0.058361634612083435, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.40140801668167114, 'logits/rejected': -0.4072290062904358, 'beta_dpo/beta': 0.058361634612083435, 'beta_dpo/loss_margin_mean': 133.6477813720703, 'beta_dpo/beta_margin_mean': 8.510327339172363, 'beta_dpo/beta_margin_std': 16.38105583190918, 'beta_dpo/beta_margin_grad_mean': -0.35114118456840515, 'beta_dpo/beta_margin_grad_std': 0.3123593032360077, 'epoch': 0.72} + 72%|████████████████████████████████████████████████████████ | 490/681 [33:28<08:19, 2.62s/it] 72%|████████████████████████████████████████████████████████▏ | 491/681 [33:31<08:20, 2.64s/it] {'loss': 1.2841, 'grad_norm': 8.015692710876465, 'learning_rate': 1.1084079364846241e-07, 'beta_dpo/gap_mean': 128.78497314453125, 'beta_dpo/gap_std': 152.2926025390625, 'beta_dpo/beta_used_raw': -1.122982144355774, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3382050395011902, 'logits/rejected': -0.30560484528541565, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 108.52165985107422, 'beta_dpo/beta_margin_mean': 0.10852167010307312, 'beta_dpo/beta_margin_std': 0.14018140733242035, 'beta_dpo/beta_margin_grad_mean': -0.47306498885154724, 'beta_dpo/beta_margin_grad_std': 0.03465822711586952, 'epoch': 0.72} + 72%|████████████████████████████████████████████████████████▏ | 491/681 [33:31<08:20, 2.64s/it] 72%|████████████████████████████████████████████████████████▎ | 492/681 [33:33<08:21, 2.66s/it] {'loss': 1.3048, 'grad_norm': 7.962594509124756, 'learning_rate': 1.097764975115576e-07, 'beta_dpo/gap_mean': 120.65419006347656, 
'beta_dpo/gap_std': 151.2496337890625, 'beta_dpo/beta_used_raw': -1.9428200721740723, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.32057705521583557, 'logits/rejected': -0.30018332600593567, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 79.98442840576172, 'beta_dpo/beta_margin_mean': 0.07998443394899368, 'beta_dpo/beta_margin_std': 0.14913584291934967, 'beta_dpo/beta_margin_grad_mean': -0.4801286458969116, 'beta_dpo/beta_margin_grad_std': 0.03697565570473671, 'epoch': 0.72} + 72%|████████████████████████████████████████████████████████▎ | 492/681 [33:33<08:21, 2.66s/it] 72%|████████████████████████████████████████████████████████▍ | 493/681 [33:36<08:24, 2.68s/it] {'loss': 1.3088, 'grad_norm': 8.332205772399902, 'learning_rate': 1.0871589639435203e-07, 'beta_dpo/gap_mean': 116.27113342285156, 'beta_dpo/gap_std': 149.367431640625, 'beta_dpo/beta_used_raw': -1.9641090631484985, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3866749703884125, 'logits/rejected': -0.3490540385246277, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 98.75540924072266, 'beta_dpo/beta_margin_mean': 0.09875541180372238, 'beta_dpo/beta_margin_std': 0.13388586044311523, 'beta_dpo/beta_margin_grad_mean': -0.4754677712917328, 'beta_dpo/beta_margin_grad_std': 0.03316526114940643, 'epoch': 0.72} + 72%|████████████████████████████████████████████████████████▍ | 493/681 [33:36<08:24, 2.68s/it] 73%|████████████████████████████████████████████████████████▌ | 494/681 [33:39<08:10, 2.62s/it] {'loss': 4.6034, 'grad_norm': 6623.4462890625, 'learning_rate': 1.0765901824467166e-07, 'beta_dpo/gap_mean': 119.46544647216797, 'beta_dpo/gap_std': 148.60195922851562, 'beta_dpo/beta_used_raw': 1.4735260009765625, 'beta_dpo/beta_used': 1.4735260009765625, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2984636425971985, 'logits/rejected': 
-0.31005731225013733, 'beta_dpo/beta': 1.4735260009765625, 'beta_dpo/loss_margin_mean': 145.59498596191406, 'beta_dpo/beta_margin_mean': 210.98004150390625, 'beta_dpo/beta_margin_std': 204.13458251953125, 'beta_dpo/beta_margin_grad_mean': -0.15571968257427216, 'beta_dpo/beta_margin_grad_std': 0.3583217263221741, 'epoch': 0.73} + 73%|████████████████████████████████████████████████████████▌ | 494/681 [33:39<08:10, 2.62s/it] 73%|████████████████████████████████████████████████████████▋ | 495/681 [33:41<08:06, 2.62s/it] {'loss': 0.6957, 'grad_norm': 1850.2857666015625, 'learning_rate': 1.0660589091223854e-07, 'beta_dpo/gap_mean': 119.32475280761719, 'beta_dpo/gap_std': 148.88406372070312, 'beta_dpo/beta_used_raw': -0.09175539016723633, 'beta_dpo/beta_used': 0.5986773371696472, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3925628662109375, 'logits/rejected': -0.37049469351768494, 'beta_dpo/beta': 0.5986773371696472, 'beta_dpo/loss_margin_mean': 118.74334716796875, 'beta_dpo/beta_margin_mean': 80.61207580566406, 'beta_dpo/beta_margin_std': 141.1808624267578, 'beta_dpo/beta_margin_grad_mean': -0.2792108356952667, 'beta_dpo/beta_margin_grad_std': 0.2721221148967743, 'epoch': 0.73} + 73%|████████████████████████████████████████████████████████▋ | 495/681 [33:41<08:06, 2.62s/it] 73%|████████████████████████████████████████████████████████▊ | 496/681 [33:44<08:05, 2.63s/it] {'loss': 1.3074, 'grad_norm': 7.529769420623779, 'learning_rate': 1.0555654214793722e-07, 'beta_dpo/gap_mean': 116.95680236816406, 'beta_dpo/gap_std': 145.31634521484375, 'beta_dpo/beta_used_raw': -1.945371389389038, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3815876245498657, 'logits/rejected': -0.34360769391059875, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 97.60633850097656, 'beta_dpo/beta_margin_mean': 0.09760633856058121, 'beta_dpo/beta_margin_std': 0.12117937952280045, 'beta_dpo/beta_margin_grad_mean': 
-0.4757267236709595, 'beta_dpo/beta_margin_grad_std': 0.030057376250624657, 'epoch': 0.73} + 73%|████████████████████████████████████████████████████████▊ | 496/681 [33:44<08:05, 2.63s/it] 73%|████████████████████████████████████████████████████████▉ | 497/681 [33:47<08:03, 2.63s/it] {'loss': 2.3877, 'grad_norm': 2982.553955078125, 'learning_rate': 1.0451099960308374e-07, 'beta_dpo/gap_mean': 115.927490234375, 'beta_dpo/gap_std': 140.37762451171875, 'beta_dpo/beta_used_raw': -0.907131552696228, 'beta_dpo/beta_used': 0.8181713223457336, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3127893805503845, 'logits/rejected': -0.2815262973308563, 'beta_dpo/beta': 0.8181713223457336, 'beta_dpo/loss_margin_mean': 105.21829986572266, 'beta_dpo/beta_margin_mean': 115.75753021240234, 'beta_dpo/beta_margin_std': 177.8175506591797, 'beta_dpo/beta_margin_grad_mean': -0.27269458770751953, 'beta_dpo/beta_margin_grad_std': 0.2720523774623871, 'epoch': 0.73} + 73%|████████████████████████████████████████████████████████▉ | 497/681 [33:47<08:03, 2.63s/it] 73%|█████████████████████████████████████████████████████████ | 498/681 [33:49<08:06, 2.66s/it] {'loss': 1.2907, 'grad_norm': 8.269208908081055, 'learning_rate': 1.0346929082869641e-07, 'beta_dpo/gap_mean': 111.99593353271484, 'beta_dpo/gap_std': 142.203369140625, 'beta_dpo/beta_used_raw': -0.5454678535461426, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3560227155685425, 'logits/rejected': -0.323871910572052, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 105.87074279785156, 'beta_dpo/beta_margin_mean': 0.10587074607610703, 'beta_dpo/beta_margin_std': 0.15012362599372864, 'beta_dpo/beta_margin_grad_mean': -0.4737287759780884, 'beta_dpo/beta_margin_grad_std': 0.037132780998945236, 'epoch': 0.73} + 73%|█████████████████████████████████████████████████████████ | 498/681 [33:49<08:06, 2.66s/it] 
73%|█████████████████████████████████████████████████████████▏ | 499/681 [33:52<07:54, 2.61s/it] {'loss': 1.0329, 'grad_norm': 825.9117431640625, 'learning_rate': 1.0243144327477013e-07, 'beta_dpo/gap_mean': 114.74722290039062, 'beta_dpo/gap_std': 141.5767822265625, 'beta_dpo/beta_used_raw': 0.6870215535163879, 'beta_dpo/beta_used': 0.6870215535163879, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.30797550082206726, 'logits/rejected': -0.313708633184433, 'beta_dpo/beta': 0.6870215535163879, 'beta_dpo/loss_margin_mean': 125.64728546142578, 'beta_dpo/beta_margin_mean': 93.55929565429688, 'beta_dpo/beta_margin_std': 131.30792236328125, 'beta_dpo/beta_margin_grad_mean': -0.1571728140115738, 'beta_dpo/beta_margin_grad_std': 0.35055309534072876, 'epoch': 0.73} + 73%|█████████████████████████████████████████████████████████▏ | 499/681 [33:52<07:54, 2.61s/it] 73%|█████████████████████████████████████████████████████████▎ | 500/681 [33:54<07:43, 2.56s/it] {'loss': 1.1587, 'grad_norm': 940.185546875, 'learning_rate': 1.0139748428955333e-07, 'beta_dpo/gap_mean': 117.69755554199219, 'beta_dpo/gap_std': 142.67498779296875, 'beta_dpo/beta_used_raw': 0.49765706062316895, 'beta_dpo/beta_used': 0.7255595922470093, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.30566155910491943, 'logits/rejected': -0.30621030926704407, 'beta_dpo/beta': 0.7255595922470093, 'beta_dpo/loss_margin_mean': 134.94979858398438, 'beta_dpo/beta_margin_mean': 94.36482238769531, 'beta_dpo/beta_margin_std': 166.26669311523438, 'beta_dpo/beta_margin_grad_mean': -0.3108097314834595, 'beta_dpo/beta_margin_grad_std': 0.3008542060852051, 'epoch': 0.73} + 73%|█████████████████████████████████████████████████████████▎ | 500/681 [33:54<07:43, 2.56s/it][INFO|trainer.py:4307] 2026-04-17 23:57:26,744 >> +***** Running Evaluation ***** +[INFO|trainer.py:4309] 2026-04-17 23:57:26,744 >> Num examples = 2339 +[INFO|trainer.py:4312] 2026-04-17 23:57:26,744 >> Batch size = 8 + + 0%| | 0/73 [00:00> +***** 
Running Evaluation ***** +[INFO|trainer.py:4309] 2026-04-18 00:02:25,667 >> Num examples = 2339 +[INFO|trainer.py:4312] 2026-04-18 00:02:25,667 >> Batch size = 8 + + 0%| | 0/73 [00:00> Saving model checkpoint to /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-600 +[INFO|configuration_utils.py:419] 2026-04-18 00:03:20,757 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-600/config.json +[INFO|configuration_utils.py:911] 2026-04-18 00:03:20,767 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-600/generation_config.json +[INFO|modeling_utils.py:3580] 2026-04-18 00:04:11,264 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 6 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-600/model.safetensors.index.json. 
+[INFO|tokenization_utils_base.py:2510] 2026-04-18 00:04:11,280 >> tokenizer config file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-600/tokenizer_config.json +[INFO|tokenization_utils_base.py:2519] 2026-04-18 00:04:11,291 >> Special tokens file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-600/special_tokens_map.json +[INFO|trainer.py:4083] 2026-04-18 00:07:50,421 >> Deleting older checkpoint [/scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-200] due to args.save_total_limit + 88%|██████████████████████████████████████████████████████████████████▏ | 601/681 [44:23<2:14:24, 100.80s/it] {'loss': 17.4177, 'grad_norm': 8184.26904296875, 'learning_rate': 2.1301532877994742e-08, 'beta_dpo/gap_mean': 132.40260314941406, 'beta_dpo/gap_std': 165.82818603515625, 'beta_dpo/beta_used_raw': 0.8004127740859985, 'beta_dpo/beta_used': 0.8004127740859985, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.22523418068885803, 'logits/rejected': -0.21112903952598572, 'beta_dpo/beta': 0.8004127740859985, 'beta_dpo/loss_margin_mean': 142.81410217285156, 'beta_dpo/beta_margin_mean': 111.2850570678711, 'beta_dpo/beta_margin_std': 138.40003967285156, 'beta_dpo/beta_margin_grad_mean': -0.1567797064781189, 'beta_dpo/beta_margin_grad_std': 0.36213722825050354, 'epoch': 0.88} + 88%|██████████████████████████████████████████████████████████████████▏ | 601/681 [44:23<2:14:24, 100.80s/it] 88%|███████████████████████████████████████████████████████████████████▏ | 602/681 [44:25<1:33:53, 71.31s/it] {'loss': 3.5737, 'grad_norm': 3134.716064453125, 'learning_rate': 2.0786184285784298e-08, 'beta_dpo/gap_mean': 136.0496826171875, 'beta_dpo/gap_std': 164.3628387451172, 'beta_dpo/beta_used_raw': -0.4077162742614746, 'beta_dpo/beta_used': 0.405770868062973, 
'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2434152215719223, 'logits/rejected': -0.23451802134513855, 'beta_dpo/beta': 0.405770868062973, 'beta_dpo/loss_margin_mean': 162.997314453125, 'beta_dpo/beta_margin_mean': 59.608760833740234, 'beta_dpo/beta_margin_std': 99.93406677246094, 'beta_dpo/beta_margin_grad_mean': -0.3060374855995178, 'beta_dpo/beta_margin_grad_std': 0.2988956570625305, 'epoch': 0.88} + 88%|███████████████████████████████████████████████████████████████████▏ | 602/681 [44:26<1:33:53, 71.31s/it] 89%|███████████████████████████████████████████████████████████████████▎ | 603/681 [44:28<1:05:50, 50.64s/it] {'loss': 1.281, 'grad_norm': 8.003498077392578, 'learning_rate': 2.0276875690788204e-08, 'beta_dpo/gap_mean': 135.49624633789062, 'beta_dpo/gap_std': 164.59576416015625, 'beta_dpo/beta_used_raw': -1.2860097885131836, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.30411213636398315, 'logits/rejected': -0.28685271739959717, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 118.26836395263672, 'beta_dpo/beta_margin_mean': 0.11826837062835693, 'beta_dpo/beta_margin_std': 0.15868444740772247, 'beta_dpo/beta_margin_grad_mean': -0.47068238258361816, 'beta_dpo/beta_margin_grad_std': 0.03925681486725807, 'epoch': 0.89} + 89%|███████████████████████████████████████████████████████████████████▎ | 603/681 [44:28<1:05:50, 50.64s/it] 89%|█████████████████████████████████████████████████████████████████████▏ | 604/681 [44:31<46:30, 36.24s/it] {'loss': 0.8912, 'grad_norm': 724.6542358398438, 'learning_rate': 1.977362051376158e-08, 'beta_dpo/gap_mean': 136.69830322265625, 'beta_dpo/gap_std': 164.337158203125, 'beta_dpo/beta_used_raw': 0.08082294464111328, 'beta_dpo/beta_used': 0.6322586536407471, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2538166642189026, 'logits/rejected': -0.25749316811561584, 'beta_dpo/beta': 0.6322586536407471, 'beta_dpo/loss_margin_mean': 
149.04656982421875, 'beta_dpo/beta_margin_mean': 95.61994934082031, 'beta_dpo/beta_margin_std': 150.78732299804688, 'beta_dpo/beta_margin_grad_mean': -0.27876517176628113, 'beta_dpo/beta_margin_grad_std': 0.2794075906276703, 'epoch': 0.89} + 89%|█████████████████████████████████████████████████████████████████████▏ | 604/681 [44:31<46:30, 36.24s/it] 89%|█████████████████████████████████████████████████████████████████████▎ | 605/681 [44:33<33:08, 26.16s/it] {'loss': 1.2738, 'grad_norm': 12.376964569091797, 'learning_rate': 1.9276432015946446e-08, 'beta_dpo/gap_mean': 137.9195098876953, 'beta_dpo/gap_std': 170.83059692382812, 'beta_dpo/beta_used_raw': -1.015389084815979, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2761760950088501, 'logits/rejected': -0.2704794406890869, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 133.719970703125, 'beta_dpo/beta_margin_mean': 0.13371996581554413, 'beta_dpo/beta_margin_std': 0.18470925092697144, 'beta_dpo/beta_margin_grad_mean': -0.4670778810977936, 'beta_dpo/beta_margin_grad_std': 0.044755224138498306, 'epoch': 0.89} + 89%|█████████████████████████████████████████████████████████████████████▎ | 605/681 [44:33<33:08, 26.16s/it] 89%|█████████████████████████████████████████████████████████████████████▍ | 606/681 [44:36<23:46, 19.02s/it] {'loss': 1.7791, 'grad_norm': 1692.56103515625, 'learning_rate': 1.8785323298722093e-08, 'beta_dpo/gap_mean': 136.48269653320312, 'beta_dpo/gap_std': 169.08889770507812, 'beta_dpo/beta_used_raw': 0.5771820545196533, 'beta_dpo/beta_used': 1.080771803855896, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.20563073456287384, 'logits/rejected': -0.20558518171310425, 'beta_dpo/beta': 1.080771803855896, 'beta_dpo/loss_margin_mean': 149.46290588378906, 'beta_dpo/beta_margin_mean': 202.94053649902344, 'beta_dpo/beta_margin_std': 319.66082763671875, 'beta_dpo/beta_margin_grad_mean': -0.28311601281166077, 
'beta_dpo/beta_margin_grad_std': 0.2813016474246979, 'epoch': 0.89} + 89%|█████████████████████████████████████████████████████████████████████▍ | 606/681 [44:36<23:46, 19.02s/it] 89%|█████████████████████████████████████████████████████████████████████▌ | 607/681 [44:38<17:26, 14.15s/it] {'loss': 1.2853, 'grad_norm': 8.623156547546387, 'learning_rate': 1.8300307303259904e-08, 'beta_dpo/gap_mean': 136.1642303466797, 'beta_dpo/gap_std': 165.0216522216797, 'beta_dpo/beta_used_raw': -1.6541626453399658, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.28075528144836426, 'logits/rejected': -0.26314833760261536, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 120.72370910644531, 'beta_dpo/beta_margin_mean': 0.12072371691465378, 'beta_dpo/beta_margin_std': 0.14232668280601501, 'beta_dpo/beta_margin_grad_mean': -0.47003647685050964, 'beta_dpo/beta_margin_grad_std': 0.03521895408630371, 'epoch': 0.89} + 89%|█████████████████████████████████████████████████████████████████████▌ | 607/681 [44:38<17:26, 14.15s/it] 89%|█████████████████████████████████████████████████████████████████████▋ | 608/681 [44:41<12:55, 10.62s/it] {'loss': 1.2709, 'grad_norm': 8.3565673828125, 'learning_rate': 1.7821396810182437e-08, 'beta_dpo/gap_mean': 134.62435913085938, 'beta_dpo/gap_std': 160.134521484375, 'beta_dpo/beta_used_raw': -0.6566117405891418, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.30109351873397827, 'logits/rejected': -0.28483152389526367, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 131.20831298828125, 'beta_dpo/beta_margin_mean': 0.13120831549167633, 'beta_dpo/beta_margin_std': 0.13629145920276642, 'beta_dpo/beta_margin_grad_mean': -0.46741145849227905, 'beta_dpo/beta_margin_grad_std': 0.03372717648744583, 'epoch': 0.89} + 89%|█████████████████████████████████████████████████████████████████████▋ | 608/681 [44:41<12:55, 
10.62s/it] 89%|█████████████████████████████████████████████████████████████████████▊ | 609/681 [44:43<09:48, 8.17s/it] {'loss': 0.6493, 'grad_norm': 3484.029052734375, 'learning_rate': 1.7348604439226617e-08, 'beta_dpo/gap_mean': 137.36264038085938, 'beta_dpo/gap_std': 161.44122314453125, 'beta_dpo/beta_used_raw': 0.5683431029319763, 'beta_dpo/beta_used': 0.5683431029319763, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.26210033893585205, 'logits/rejected': -0.24275103211402893, 'beta_dpo/beta': 0.5683431029319763, 'beta_dpo/loss_margin_mean': 153.95826721191406, 'beta_dpo/beta_margin_mean': 91.26676177978516, 'beta_dpo/beta_margin_std': 144.23231506347656, 'beta_dpo/beta_margin_grad_mean': -0.11586936563253403, 'beta_dpo/beta_margin_grad_std': 0.3091588318347931, 'epoch': 0.89} + 89%|█████████████████████████████████████████████████████████████████████▊ | 609/681 [44:43<09:48, 8.17s/it] 90%|█████████████████████████████████████████████████████████████████████▊ | 610/681 [44:45<07:35, 6.42s/it] {'loss': 1.2757, 'grad_norm': 9.066965103149414, 'learning_rate': 1.6881942648911074e-08, 'beta_dpo/gap_mean': 136.2181854248047, 'beta_dpo/gap_std': 160.43869018554688, 'beta_dpo/beta_used_raw': -0.988802433013916, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.25646454095840454, 'logits/rejected': -0.22565940022468567, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 126.81504821777344, 'beta_dpo/beta_margin_mean': 0.1268150508403778, 'beta_dpo/beta_margin_std': 0.16618604958057404, 'beta_dpo/beta_margin_grad_mean': -0.4685831665992737, 'beta_dpo/beta_margin_grad_std': 0.04099490866065025, 'epoch': 0.9} + 90%|█████████████████████████████████████████████████████████████████████▊ | 610/681 [44:46<07:35, 6.42s/it] 90%|█████████████████████████████████████████████████████████████████████▉ | 611/681 [44:48<06:02, 5.17s/it] {'loss': 8.377, 'grad_norm': 7008.0810546875, 'learning_rate': 
1.6421423736208e-08, 'beta_dpo/gap_mean': 137.39132690429688, 'beta_dpo/gap_std': 162.03436279296875, 'beta_dpo/beta_used_raw': 0.4401324391365051, 'beta_dpo/beta_used': 0.7692165374755859, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.20377308130264282, 'logits/rejected': -0.19680052995681763, 'beta_dpo/beta': 0.7692165374755859, 'beta_dpo/loss_margin_mean': 148.3230438232422, 'beta_dpo/beta_margin_mean': 129.48629760742188, 'beta_dpo/beta_margin_std': 206.50274658203125, 'beta_dpo/beta_margin_grad_mean': -0.3122340738773346, 'beta_dpo/beta_margin_grad_std': 0.3016832768917084, 'epoch': 0.9} + 90%|█████████████████████████████████████████████████████████████████████▉ | 611/681 [44:48<06:02, 5.17s/it] 90%|██████████████████████████████████████████████████████████████████████ | 612/681 [44:50<05:04, 4.41s/it] {'loss': 1.2771, 'grad_norm': 13.170220375061035, 'learning_rate': 1.5967059836219042e-08, 'beta_dpo/gap_mean': 142.43182373046875, 'beta_dpo/gap_std': 161.67913818359375, 'beta_dpo/beta_used_raw': -1.4418590068817139, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2226446568965912, 'logits/rejected': -0.18076658248901367, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 165.7865753173828, 'beta_dpo/beta_margin_mean': 0.16578657925128937, 'beta_dpo/beta_margin_std': 0.16075921058654785, 'beta_dpo/beta_margin_grad_mean': -0.4589446187019348, 'beta_dpo/beta_margin_grad_std': 0.039632294327020645, 'epoch': 0.9} + 90%|██████████████████████████████████████████████████████████████████████ | 612/681 [44:50<05:04, 4.41s/it] 90%|██████████████████████████████████████████████████████████████████████▏ | 613/681 [44:53<04:21, 3.84s/it] {'loss': 1.2679, 'grad_norm': 8.972193717956543, 'learning_rate': 1.551886292185553e-08, 'beta_dpo/gap_mean': 144.05943298339844, 'beta_dpo/gap_std': 158.86074829101562, 'beta_dpo/beta_used_raw': -1.0288455486297607, 'beta_dpo/beta_used': 
0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2800806760787964, 'logits/rejected': -0.29024672508239746, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 146.2444305419922, 'beta_dpo/beta_margin_mean': 0.14624443650245667, 'beta_dpo/beta_margin_std': 0.13885696232318878, 'beta_dpo/beta_margin_grad_mean': -0.46369874477386475, 'beta_dpo/beta_margin_grad_std': 0.034297436475753784, 'epoch': 0.9} + 90%|██████████████████████████████████████████████████████████████████████▏ | 613/681 [44:53<04:21, 3.84s/it] 90%|██████████████████████████████████████████████████████████████████████▎ | 614/681 [44:55<03:51, 3.45s/it] {'loss': 2.7127, 'grad_norm': 895.0585327148438, 'learning_rate': 1.507684480352292e-08, 'beta_dpo/gap_mean': 148.22630310058594, 'beta_dpo/gap_std': 159.02099609375, 'beta_dpo/beta_used_raw': -0.06399475783109665, 'beta_dpo/beta_used': 0.06990689039230347, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.20398010313510895, 'logits/rejected': -0.21416090428829193, 'beta_dpo/beta': 0.06990689039230347, 'beta_dpo/loss_margin_mean': 170.205810546875, 'beta_dpo/beta_margin_mean': 12.157843589782715, 'beta_dpo/beta_margin_std': 20.12245750427246, 'beta_dpo/beta_margin_grad_mean': -0.3197058439254761, 'beta_dpo/beta_margin_grad_std': 0.2986561954021454, 'epoch': 0.9} + 90%|██████████████████████████████████████████████████████████████████████▎ | 614/681 [44:55<03:51, 3.45s/it] 90%|██████████████████████████████████████████████████████████████████████▍ | 615/681 [44:58<03:31, 3.20s/it] {'loss': 1.2775, 'grad_norm': 8.794045448303223, 'learning_rate': 1.4641017128809801e-08, 'beta_dpo/gap_mean': 143.90347290039062, 'beta_dpo/gap_std': 156.93869018554688, 'beta_dpo/beta_used_raw': -1.6342370510101318, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2878304719924927, 'logits/rejected': -0.2756372094154358, 'beta_dpo/beta': 0.0010000000474974513, 
'beta_dpo/loss_margin_mean': 115.65824127197266, 'beta_dpo/beta_margin_mean': 0.1156582459807396, 'beta_dpo/beta_margin_std': 0.14664776623249054, 'beta_dpo/beta_margin_grad_mean': -0.4713370203971863, 'beta_dpo/beta_margin_grad_std': 0.036130066961050034, 'epoch': 0.9} + 90%|██████████████████████████████████████████████████████████████████████▍ | 615/681 [44:58<03:31, 3.20s/it] 90%|██████████████████████████████████████████████████████████████████████▌ | 616/681 [45:01<03:17, 3.03s/it] {'loss': 1.3106, 'grad_norm': 9.077305793762207, 'learning_rate': 1.4211391382180637e-08, 'beta_dpo/gap_mean': 137.07403564453125, 'beta_dpo/gap_std': 155.02120971679688, 'beta_dpo/beta_used_raw': -3.256364107131958, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2529584467411041, 'logits/rejected': -0.2234017550945282, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 104.92546844482422, 'beta_dpo/beta_margin_mean': 0.10492546856403351, 'beta_dpo/beta_margin_std': 0.15405791997909546, 'beta_dpo/beta_margin_grad_mean': -0.4739888608455658, 'beta_dpo/beta_margin_grad_std': 0.03806653246283531, 'epoch': 0.9} + 90%|██████████████████████████████████████████████████████████████████████▌ | 616/681 [45:01<03:17, 3.03s/it] 91%|██████████████████████████████████████████████████████████████████████▋ | 617/681 [45:03<03:06, 2.92s/it] {'loss': 1.3011, 'grad_norm': 8.899731636047363, 'learning_rate': 1.378797888467345e-08, 'beta_dpo/gap_mean': 129.06411743164062, 'beta_dpo/gap_std': 153.8069305419922, 'beta_dpo/beta_used_raw': -2.259263753890991, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.23405620455741882, 'logits/rejected': -0.19954687356948853, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 89.77762603759766, 'beta_dpo/beta_margin_mean': 0.0897776335477829, 'beta_dpo/beta_margin_std': 0.13734619319438934, 'beta_dpo/beta_margin_grad_mean': 
-0.47770678997039795, 'beta_dpo/beta_margin_grad_std': 0.03402474522590637, 'epoch': 0.91} + 91%|██████████████████████████████████████████████████████████████████████▋ | 617/681 [45:03<03:06, 2.92s/it] 91%|██████████████████████████████████████████████████████████████████████▊ | 618/681 [45:06<03:01, 2.88s/it] {'loss': 3.6018, 'grad_norm': 2414.7275390625, 'learning_rate': 1.3370790793601371e-08, 'beta_dpo/gap_mean': 126.19082641601562, 'beta_dpo/gap_std': 157.0688018798828, 'beta_dpo/beta_used_raw': -0.7018966674804688, 'beta_dpo/beta_used': 0.22516019642353058, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.28824859857559204, 'logits/rejected': -0.25596606731414795, 'beta_dpo/beta': 0.22516019642353058, 'beta_dpo/loss_margin_mean': 128.67428588867188, 'beta_dpo/beta_margin_mean': 36.04357147216797, 'beta_dpo/beta_margin_std': 58.656856536865234, 'beta_dpo/beta_margin_grad_mean': -0.30100154876708984, 'beta_dpo/beta_margin_grad_std': 0.2931227684020996, 'epoch': 0.91} + 91%|██████████████████████████████████████████████████████████████████████▊ | 618/681 [45:06<03:01, 2.88s/it] 91%|██████████████████████████████████████████████████████████████████████▉ | 619/681 [45:09<02:52, 2.79s/it] {'loss': 2.3727, 'grad_norm': 1730.38427734375, 'learning_rate': 1.2959838102258535e-08, 'beta_dpo/gap_mean': 127.52127075195312, 'beta_dpo/gap_std': 159.29910278320312, 'beta_dpo/beta_used_raw': -0.5022631883621216, 'beta_dpo/beta_used': 0.32898879051208496, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2695918679237366, 'logits/rejected': -0.25438401103019714, 'beta_dpo/beta': 0.32898879051208496, 'beta_dpo/loss_margin_mean': 129.31863403320312, 'beta_dpo/beta_margin_mean': 47.31397247314453, 'beta_dpo/beta_margin_std': 92.59415435791016, 'beta_dpo/beta_margin_grad_mean': -0.32226526737213135, 'beta_dpo/beta_margin_grad_std': 0.3011726140975952, 'epoch': 0.91} + 91%|██████████████████████████████████████████████████████████████████████▉ | 619/681 [45:09<02:52, 
2.79s/it] 91%|███████████████████████████████████████████████████████████████████████ | 620/681 [45:11<02:44, 2.70s/it] {'loss': 1.2793, 'grad_norm': 8.195945739746094, 'learning_rate': 1.2555131639630567e-08, 'beta_dpo/gap_mean': 128.8234405517578, 'beta_dpo/gap_std': 161.2275390625, 'beta_dpo/beta_used_raw': -0.7810671329498291, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.27168160676956177, 'logits/rejected': -0.24854370951652527, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 125.8520736694336, 'beta_dpo/beta_margin_mean': 0.1258520781993866, 'beta_dpo/beta_margin_std': 0.16934403777122498, 'beta_dpo/beta_margin_grad_mean': -0.46883711218833923, 'beta_dpo/beta_margin_grad_std': 0.04180603846907616, 'epoch': 0.91} + 91%|███████████████████████████████████████████████████████████████████████ | 620/681 [45:11<02:44, 2.70s/it] 91%|███████████████████████████████████████████████████████████████████████▏ | 621/681 [45:14<02:42, 2.72s/it] {'loss': 2.0444, 'grad_norm': 2288.819091796875, 'learning_rate': 1.2156682070109086e-08, 'beta_dpo/gap_mean': 131.76333618164062, 'beta_dpo/gap_std': 162.11734008789062, 'beta_dpo/beta_used_raw': -0.001695185899734497, 'beta_dpo/beta_used': 0.315225213766098, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.18687333166599274, 'logits/rejected': -0.1780368983745575, 'beta_dpo/beta': 0.315225213766098, 'beta_dpo/loss_margin_mean': 165.3124237060547, 'beta_dpo/beta_margin_mean': 61.944881439208984, 'beta_dpo/beta_margin_std': 95.92522430419922, 'beta_dpo/beta_margin_grad_mean': -0.28073248267173767, 'beta_dpo/beta_margin_grad_std': 0.27754899859428406, 'epoch': 0.91} + 91%|███████████████████████████████████████████████████████████████████████▏ | 621/681 [45:14<02:42, 2.72s/it] 91%|███████████████████████████████████████████████████████████████████████▏ | 622/681 [45:16<02:36, 2.66s/it] {'loss': 2.2051, 'grad_norm': 1909.772705078125, 'learning_rate': 
1.1764499893210878e-08, 'beta_dpo/gap_mean': 136.07073974609375, 'beta_dpo/gap_std': 164.10821533203125, 'beta_dpo/beta_used_raw': -1.115787386894226, 'beta_dpo/beta_used': 0.2183779627084732, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2630102336406708, 'logits/rejected': -0.24436010420322418, 'beta_dpo/beta': 0.2183779627084732, 'beta_dpo/loss_margin_mean': 131.12855529785156, 'beta_dpo/beta_margin_mean': 35.79158401489258, 'beta_dpo/beta_margin_std': 66.24662017822266, 'beta_dpo/beta_margin_grad_mean': -0.3208658993244171, 'beta_dpo/beta_margin_grad_std': 0.29182958602905273, 'epoch': 0.91} + 91%|███████████████████████████████████████████████████████████████████████▏ | 622/681 [45:17<02:36, 2.66s/it] 91%|███████████████████████████████████████████████████████████████████████▎ | 623/681 [45:19<02:25, 2.51s/it] {'loss': 1.305, 'grad_norm': 8.092933654785156, 'learning_rate': 1.1378595443300998e-08, 'beta_dpo/gap_mean': 131.22195434570312, 'beta_dpo/gap_std': 165.27459716796875, 'beta_dpo/beta_used_raw': -2.597635269165039, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2123861014842987, 'logits/rejected': -0.18733005225658417, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 112.445068359375, 'beta_dpo/beta_margin_mean': 0.11244507133960724, 'beta_dpo/beta_margin_std': 0.1788908988237381, 'beta_dpo/beta_margin_grad_mean': -0.4722324013710022, 'beta_dpo/beta_margin_grad_std': 0.04376749321818352, 'epoch': 0.91} + 91%|███████████████████████████████████████████████████████████████████████▎ | 623/681 [45:19<02:25, 2.51s/it] 92%|███████████████████████████████████████████████████████████████████████▍ | 624/681 [45:21<02:24, 2.53s/it] {'loss': 18.6323, 'grad_norm': 14112.7099609375, 'learning_rate': 1.0998978889320582e-08, 'beta_dpo/gap_mean': 134.68902587890625, 'beta_dpo/gap_std': 172.1035614013672, 'beta_dpo/beta_used_raw': 1.4514429569244385, 'beta_dpo/beta_used': 
1.4514429569244385, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.31213879585266113, 'logits/rejected': -0.2707129120826721, 'beta_dpo/beta': 1.4514429569244385, 'beta_dpo/loss_margin_mean': 160.6850128173828, 'beta_dpo/beta_margin_mean': 235.16859436035156, 'beta_dpo/beta_margin_std': 305.9576416015625, 'beta_dpo/beta_margin_grad_mean': -0.1736312210559845, 'beta_dpo/beta_margin_grad_std': 0.3766280710697174, 'epoch': 0.92} + 92%|███████████████████████████████████████████████████████████████████████▍ | 624/681 [45:21<02:24, 2.53s/it] 92%|███████████████████████████████████████████████████████████████████████▌ | 625/681 [45:24<02:24, 2.58s/it] {'loss': 1.277, 'grad_norm': 8.834936141967773, 'learning_rate': 1.0625660234518913e-08, 'beta_dpo/gap_mean': 135.93350219726562, 'beta_dpo/gap_std': 170.4825439453125, 'beta_dpo/beta_used_raw': -1.086260437965393, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.24899110198020935, 'logits/rejected': -0.22103792428970337, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 134.477294921875, 'beta_dpo/beta_margin_mean': 0.13447730243206024, 'beta_dpo/beta_margin_std': 0.16113615036010742, 'beta_dpo/beta_margin_grad_mean': -0.46674269437789917, 'beta_dpo/beta_margin_grad_std': 0.03943945840001106, 'epoch': 0.92} + 92%|███████████████████████████████████████████████████████████████████████▌ | 625/681 [45:24<02:24, 2.58s/it] 92%|███████████████████████████████████████████████████████████████████████▋ | 626/681 [45:27<02:24, 2.63s/it] {'loss': 1.2908, 'grad_norm': 11.363311767578125, 'learning_rate': 1.0258649316189721e-08, 'beta_dpo/gap_mean': 132.06570434570312, 'beta_dpo/gap_std': 165.1246337890625, 'beta_dpo/beta_used_raw': -1.6771858930587769, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.30383527278900146, 'logits/rejected': -0.27899685502052307, 'beta_dpo/beta': 0.0010000000474974513, 
'beta_dpo/loss_margin_mean': 117.33277893066406, 'beta_dpo/beta_margin_mean': 0.11733278632164001, 'beta_dpo/beta_margin_std': 0.15290819108486176, 'beta_dpo/beta_margin_grad_mean': -0.47088930010795593, 'beta_dpo/beta_margin_grad_std': 0.03784249722957611, 'epoch': 0.92} + 92%|███████████████████████████████████████████████████████████████████████▋ | 626/681 [45:27<02:24, 2.63s/it] 92%|███████████████████████████████████████████████████████████████████████▊ | 627/681 [45:29<02:23, 2.65s/it] {'loss': 1.2721, 'grad_norm': 10.255217552185059, 'learning_rate': 9.897955805412e-09, 'beta_dpo/gap_mean': 135.79798889160156, 'beta_dpo/gap_std': 170.36813354492188, 'beta_dpo/beta_used_raw': -0.715671956539154, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2576707601547241, 'logits/rejected': -0.27673864364624023, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 162.4019317626953, 'beta_dpo/beta_margin_mean': 0.16240194439888, 'beta_dpo/beta_margin_std': 0.2026146799325943, 'beta_dpo/beta_margin_grad_mean': -0.4600542187690735, 'beta_dpo/beta_margin_grad_std': 0.04917608201503754, 'epoch': 0.92} + 92%|███████████████████████████████████████████████████████████████████████▊ | 627/681 [45:29<02:23, 2.65s/it] 92%|███████████████████████████████████████████████████████████████████████▉ | 628/681 [45:32<02:19, 2.63s/it] {'loss': 1.2686, 'grad_norm': 9.771873474121094, 'learning_rate': 9.543589206795238e-09, 'beta_dpo/gap_mean': 141.70660400390625, 'beta_dpo/gap_std': 172.304931640625, 'beta_dpo/beta_used_raw': -0.7566049098968506, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.25853201746940613, 'logits/rejected': -0.2484220564365387, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 156.77191162109375, 'beta_dpo/beta_margin_mean': 0.15677191317081451, 'beta_dpo/beta_margin_std': 0.16587892174720764, 'beta_dpo/beta_margin_grad_mean': 
-0.46119076013565063, 'beta_dpo/beta_margin_grad_std': 0.04088958352804184, 'epoch': 0.92} + 92%|███████████████████████████████████████████████████████████████████████▉ | 628/681 [45:32<02:19, 2.63s/it] 92%|████████████████████████████████████████████████████████████████████████ | 629/681 [45:35<02:18, 2.67s/it] {'loss': 1.274, 'grad_norm': 13.822155952453613, 'learning_rate': 9.19555885822887e-09, 'beta_dpo/gap_mean': 140.23866271972656, 'beta_dpo/gap_std': 167.48165893554688, 'beta_dpo/beta_used_raw': -1.1783255338668823, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2648368775844574, 'logits/rejected': -0.2452375888824463, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 128.8066864013672, 'beta_dpo/beta_margin_mean': 0.1288066953420639, 'beta_dpo/beta_margin_std': 0.13501150906085968, 'beta_dpo/beta_margin_grad_mean': -0.4680294096469879, 'beta_dpo/beta_margin_grad_std': 0.033298566937446594, 'epoch': 0.92} + 92%|████████████████████████████████████████████████████████████████████████ | 629/681 [45:35<02:18, 2.67s/it] 93%|████████████████████████████████████████████████████████████████████████▏ | 630/681 [45:37<02:15, 2.65s/it] {'loss': 1.3037, 'grad_norm': 8.615431785583496, 'learning_rate': 8.85387393063622e-09, 'beta_dpo/gap_mean': 132.54100036621094, 'beta_dpo/gap_std': 162.70718383789062, 'beta_dpo/beta_used_raw': -2.620537281036377, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3369476795196533, 'logits/rejected': -0.3151329755783081, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 98.09712219238281, 'beta_dpo/beta_margin_mean': 0.09809713065624237, 'beta_dpo/beta_margin_std': 0.1510220766067505, 'beta_dpo/beta_margin_grad_mean': -0.47568345069885254, 'beta_dpo/beta_margin_grad_std': 0.03724653273820877, 'epoch': 0.93} + 93%|████████████████████████████████████████████████████████████████████████▏ | 
630/681 [45:37<02:15, 2.65s/it] 93%|████████████████████████████████████████████████████████████████████████▎ | 631/681 [45:40<02:10, 2.60s/it] {'loss': 1.3022, 'grad_norm': 10.43221378326416, 'learning_rate': 8.518543427732949e-09, 'beta_dpo/gap_mean': 129.70608520507812, 'beta_dpo/gap_std': 164.6175079345703, 'beta_dpo/beta_used_raw': -2.252204656600952, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.19672399759292603, 'logits/rejected': -0.16939029097557068, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 124.01167297363281, 'beta_dpo/beta_margin_mean': 0.12401168048381805, 'beta_dpo/beta_margin_std': 0.18085241317749023, 'beta_dpo/beta_margin_grad_mean': -0.469342440366745, 'beta_dpo/beta_margin_grad_std': 0.04454941302537918, 'epoch': 0.93} + 93%|████████████████████████████████████████████████████████████████████████▎ | 631/681 [45:40<02:10, 2.60s/it] 93%|████████████████████████████████████████████████████████████████████████▍ | 632/681 [45:42<02:04, 2.55s/it] {'loss': 1.2832, 'grad_norm': 8.912779808044434, 'learning_rate': 8.189576185789637e-09, 'beta_dpo/gap_mean': 129.06605529785156, 'beta_dpo/gap_std': 169.87759399414062, 'beta_dpo/beta_used_raw': -0.9367992877960205, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2137627899646759, 'logits/rejected': -0.1909235715866089, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 118.0839614868164, 'beta_dpo/beta_margin_mean': 0.11808396875858307, 'beta_dpo/beta_margin_std': 0.17979924380779266, 'beta_dpo/beta_margin_grad_mean': -0.4707336127758026, 'beta_dpo/beta_margin_grad_std': 0.04446292296051979, 'epoch': 0.93} + 93%|████████████████████████████████████████████████████████████████████████▍ | 632/681 [45:42<02:04, 2.55s/it] 93%|████████████████████████████████████████████████████████████████████████▌ | 633/681 [45:45<02:00, 2.51s/it] {'loss': 4.4345, 'grad_norm': 
2468.25341796875, 'learning_rate': 7.866980873399015e-09, 'beta_dpo/gap_mean': 122.80825805664062, 'beta_dpo/gap_std': 166.48403930664062, 'beta_dpo/beta_used_raw': -1.1626986265182495, 'beta_dpo/beta_used': 0.1498415768146515, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.262068510055542, 'logits/rejected': -0.2606055736541748, 'beta_dpo/beta': 0.1498415768146515, 'beta_dpo/loss_margin_mean': 100.83395385742188, 'beta_dpo/beta_margin_mean': 17.67989158630371, 'beta_dpo/beta_margin_std': 41.04912567138672, 'beta_dpo/beta_margin_grad_mean': -0.3400387465953827, 'beta_dpo/beta_margin_grad_std': 0.31042587757110596, 'epoch': 0.93} + 93%|████████████████████████████████████████████████████████████████████████▌ | 633/681 [45:45<02:00, 2.51s/it] 93%|████████████████████████████████████████████████████████████████████████▌ | 634/681 [45:47<01:59, 2.54s/it] {'loss': 8.1633, 'grad_norm': 4420.4560546875, 'learning_rate': 7.550765991247654e-09, 'beta_dpo/gap_mean': 123.09707641601562, 'beta_dpo/gap_std': 168.86935424804688, 'beta_dpo/beta_used_raw': -1.0204623937606812, 'beta_dpo/beta_used': 0.2891407012939453, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2516968548297882, 'logits/rejected': -0.2492125928401947, 'beta_dpo/beta': 0.2891407012939453, 'beta_dpo/loss_margin_mean': 114.10114288330078, 'beta_dpo/beta_margin_mean': 44.27980041503906, 'beta_dpo/beta_margin_std': 89.58101654052734, 'beta_dpo/beta_margin_grad_mean': -0.3603072762489319, 'beta_dpo/beta_margin_grad_std': 0.3205583393573761, 'epoch': 0.93} + 93%|████████████████████████████████████████████████████████████████████████▌ | 634/681 [45:47<01:59, 2.54s/it] 93%|████████████████████████████████████████████████████████████████████████▋ | 635/681 [45:50<01:56, 2.52s/it] {'loss': 12.3188, 'grad_norm': 4297.1875, 'learning_rate': 7.240939871891699e-09, 'beta_dpo/gap_mean': 119.10769653320312, 'beta_dpo/gap_std': 164.04827880859375, 'beta_dpo/beta_used_raw': 0.44367918372154236, 
'beta_dpo/beta_used': 0.8167719841003418, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3063223958015442, 'logits/rejected': -0.25702351331710815, 'beta_dpo/beta': 0.8167719841003418, 'beta_dpo/loss_margin_mean': 108.52445983886719, 'beta_dpo/beta_margin_mean': 93.9231948852539, 'beta_dpo/beta_margin_std': 184.6671905517578, 'beta_dpo/beta_margin_grad_mean': -0.3317233920097351, 'beta_dpo/beta_margin_grad_std': 0.3114463686943054, 'epoch': 0.93} + 93%|████████████████████████████████████████████████████████████████████████▋ | 635/681 [45:50<01:56, 2.52s/it] 93%|████████████████████████████████████████████████████████████████████████▊ | 636/681 [45:52<01:56, 2.58s/it] {'loss': 2.1742, 'grad_norm': 1658.96923828125, 'learning_rate': 6.937510679537628e-09, 'beta_dpo/gap_mean': 119.43673706054688, 'beta_dpo/gap_std': 161.71958923339844, 'beta_dpo/beta_used_raw': -0.49636417627334595, 'beta_dpo/beta_used': 0.21374358236789703, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2624373733997345, 'logits/rejected': -0.23375412821769714, 'beta_dpo/beta': 0.21374358236789703, 'beta_dpo/loss_margin_mean': 132.7529754638672, 'beta_dpo/beta_margin_mean': 32.544044494628906, 'beta_dpo/beta_margin_std': 50.19921112060547, 'beta_dpo/beta_margin_grad_mean': -0.29352760314941406, 'beta_dpo/beta_margin_grad_std': 0.28238052129745483, 'epoch': 0.93} + 93%|████████████████████████████████████████████████████████████████████████▊ | 636/681 [45:53<01:56, 2.58s/it] 94%|████████████████████████████████████████████████████████████████████████▉ | 637/681 [45:55<01:56, 2.64s/it] {'loss': 3.3524, 'grad_norm': 4178.92724609375, 'learning_rate': 6.640486409826785e-09, 'beta_dpo/gap_mean': 124.16712951660156, 'beta_dpo/gap_std': 161.0850372314453, 'beta_dpo/beta_used_raw': 0.3115572929382324, 'beta_dpo/beta_used': 0.3223646879196167, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.22674018144607544, 'logits/rejected': -0.22383208572864532, 'beta_dpo/beta': 
0.3223646879196167, 'beta_dpo/loss_margin_mean': 139.51402282714844, 'beta_dpo/beta_margin_mean': 42.55961608886719, 'beta_dpo/beta_margin_std': 81.67517852783203, 'beta_dpo/beta_margin_grad_mean': -0.32306286692619324, 'beta_dpo/beta_margin_grad_std': 0.30376118421554565, 'epoch': 0.94} + 94%|████████████████████████████████████████████████████████████████████████▉ | 637/681 [45:55<01:56, 2.64s/it] 94%|█████████████████████████████████████████████████████████████████████████ | 638/681 [45:58<01:54, 2.65s/it] {'loss': 8.0532, 'grad_norm': 9381.5517578125, 'learning_rate': 6.349874889624962e-09, 'beta_dpo/gap_mean': 124.66742706298828, 'beta_dpo/gap_std': 157.39694213867188, 'beta_dpo/beta_used_raw': -0.3003849983215332, 'beta_dpo/beta_used': 1.4511369466781616, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2576182782649994, 'logits/rejected': -0.23263539373874664, 'beta_dpo/beta': 1.4511369466781616, 'beta_dpo/loss_margin_mean': 139.00933837890625, 'beta_dpo/beta_margin_mean': 266.310791015625, 'beta_dpo/beta_margin_std': 417.8957214355469, 'beta_dpo/beta_margin_grad_mean': -0.3164081573486328, 'beta_dpo/beta_margin_grad_std': 0.30334481596946716, 'epoch': 0.94} + 94%|█████████████████████████████████████████████████████████████████████████ | 638/681 [45:58<01:54, 2.65s/it] 94%|█████████████████████████████████████████████████████████████████████████▏ | 639/681 [46:00<01:50, 2.62s/it] {'loss': 1.2811, 'grad_norm': 11.267277717590332, 'learning_rate': 6.065683776815933e-09, 'beta_dpo/gap_mean': 122.42938995361328, 'beta_dpo/gap_std': 157.66665649414062, 'beta_dpo/beta_used_raw': -0.47708529233932495, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2489241063594818, 'logits/rejected': -0.20080968737602234, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 97.4274673461914, 'beta_dpo/beta_margin_mean': 0.09742747247219086, 'beta_dpo/beta_margin_std': 0.1560250073671341, 
'beta_dpo/beta_margin_grad_mean': -0.47581177949905396, 'beta_dpo/beta_margin_grad_std': 0.03869582340121269, 'epoch': 0.94} + 94%|█████████████████████████████████████████████████████████████████████████▏ | 639/681 [46:01<01:50, 2.62s/it] 94%|█████████████████████████████████████████████████████████████████████████▎ | 640/681 [46:03<01:47, 2.62s/it] {'loss': 0.5288, 'grad_norm': 2567.301025390625, 'learning_rate': 5.7879205600998296e-09, 'beta_dpo/gap_mean': 126.0462875366211, 'beta_dpo/gap_std': 156.94723510742188, 'beta_dpo/beta_used_raw': 1.0406347513198853, 'beta_dpo/beta_used': 1.0406347513198853, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2669011354446411, 'logits/rejected': -0.2516845762729645, 'beta_dpo/beta': 1.0406347513198853, 'beta_dpo/loss_margin_mean': 156.25440979003906, 'beta_dpo/beta_margin_mean': 186.98306274414062, 'beta_dpo/beta_margin_std': 294.89520263671875, 'beta_dpo/beta_margin_grad_mean': -0.10319266468286514, 'beta_dpo/beta_margin_grad_std': 0.23703627288341522, 'epoch': 0.94} + 94%|█████████████████████████████████████████████████████████████████████████▎ | 640/681 [46:03<01:47, 2.62s/it] 94%|█████████████████████████████████████████████████████████████████████████▍ | 641/681 [46:06<01:44, 2.60s/it] {'loss': 10.8266, 'grad_norm': 3385.51611328125, 'learning_rate': 5.516592558795746e-09, 'beta_dpo/gap_mean': 128.0950164794922, 'beta_dpo/gap_std': 159.058837890625, 'beta_dpo/beta_used_raw': 0.3140296936035156, 'beta_dpo/beta_used': 0.6511551141738892, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2616059482097626, 'logits/rejected': -0.23641052842140198, 'beta_dpo/beta': 0.6511551141738892, 'beta_dpo/loss_margin_mean': 120.50196075439453, 'beta_dpo/beta_margin_mean': 84.13956451416016, 'beta_dpo/beta_margin_std': 165.199462890625, 'beta_dpo/beta_margin_grad_mean': -0.2987769544124603, 'beta_dpo/beta_margin_grad_std': 0.29313045740127563, 'epoch': 0.94} + 
94%|█████████████████████████████████████████████████████████████████████████▍ | 641/681 [46:06<01:44, 2.60s/it] 94%|█████████████████████████████████████████████████████████████████████████▌ | 642/681 [46:08<01:42, 2.64s/it] {'loss': 7.0951, 'grad_norm': 6544.80078125, 'learning_rate': 5.251706922648868e-09, 'beta_dpo/gap_mean': 128.44711303710938, 'beta_dpo/gap_std': 167.51364135742188, 'beta_dpo/beta_used_raw': -0.6517113447189331, 'beta_dpo/beta_used': 0.7909172177314758, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.27374494075775146, 'logits/rejected': -0.26332151889801025, 'beta_dpo/beta': 0.7909172177314758, 'beta_dpo/loss_margin_mean': 147.625244140625, 'beta_dpo/beta_margin_mean': 147.3969268798828, 'beta_dpo/beta_margin_std': 221.18307495117188, 'beta_dpo/beta_margin_grad_mean': -0.26804837584495544, 'beta_dpo/beta_margin_grad_std': 0.27035075426101685, 'epoch': 0.94} + 94%|█████████████████████████████████████████████████████████████████████████▌ | 642/681 [46:08<01:42, 2.64s/it] 94%|█████████████████████████████████████████████████████████████████████████▋ | 643/681 [46:11<01:42, 2.70s/it] {'loss': 3.7361, 'grad_norm': 4131.7802734375, 'learning_rate': 4.993270631642038e-09, 'beta_dpo/gap_mean': 131.22329711914062, 'beta_dpo/gap_std': 162.10546875, 'beta_dpo/beta_used_raw': -0.6685765981674194, 'beta_dpo/beta_used': 0.5000445246696472, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.24260678887367249, 'logits/rejected': -0.24370941519737244, 'beta_dpo/beta': 0.5000445246696472, 'beta_dpo/loss_margin_mean': 120.23302459716797, 'beta_dpo/beta_margin_mean': 75.19145965576172, 'beta_dpo/beta_margin_std': 120.19136047363281, 'beta_dpo/beta_margin_grad_mean': -0.2856932282447815, 'beta_dpo/beta_margin_grad_std': 0.28263115882873535, 'epoch': 0.94} + 94%|█████████████████████████████████████████████████████████████████████████▋ | 643/681 [46:11<01:42, 2.70s/it] 95%|█████████████████████████████████████████████████████████████████████████▊ | 
644/681 [46:14<01:39, 2.68s/it] {'loss': 1.2896, 'grad_norm': 9.257484436035156, 'learning_rate': 4.741290495811873e-09, 'beta_dpo/gap_mean': 127.92471313476562, 'beta_dpo/gap_std': 164.80690002441406, 'beta_dpo/beta_used_raw': -1.3600785732269287, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.29417717456817627, 'logits/rejected': -0.2829264998435974, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 119.60871887207031, 'beta_dpo/beta_margin_mean': 0.11960872262716293, 'beta_dpo/beta_margin_std': 0.18185746669769287, 'beta_dpo/beta_margin_grad_mean': -0.4704153537750244, 'beta_dpo/beta_margin_grad_std': 0.044792983680963516, 'epoch': 0.95} + 95%|█████████████████████████████████████████████████████████████████████████▊ | 644/681 [46:14<01:39, 2.68s/it] 95%|█████████████████████████████████████████████████████████████████████████▉ | 645/681 [46:17<01:36, 2.67s/it] {'loss': 1.2982, 'grad_norm': 11.280401229858398, 'learning_rate': 4.495773155069299e-09, 'beta_dpo/gap_mean': 125.04953002929688, 'beta_dpo/gap_std': 169.11019897460938, 'beta_dpo/beta_used_raw': -1.6929526329040527, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.26835355162620544, 'logits/rejected': -0.2733767330646515, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 101.11979675292969, 'beta_dpo/beta_margin_mean': 0.10111980140209198, 'beta_dpo/beta_margin_std': 0.1889955848455429, 'beta_dpo/beta_margin_grad_mean': -0.47504597902297974, 'beta_dpo/beta_margin_grad_std': 0.04654289036989212, 'epoch': 0.95} + 95%|█████████████████████████████████████████████████████████████████████████▉ | 645/681 [46:17<01:36, 2.67s/it] 95%|█████████████████████████████████████████████████████████████████████████▉ | 646/681 [46:19<01:30, 2.58s/it] {'loss': 3.2758, 'grad_norm': 7780.4990234375, 'learning_rate': 4.256725079024553e-09, 'beta_dpo/gap_mean': 121.25621032714844, 
'beta_dpo/gap_std': 164.90869140625, 'beta_dpo/beta_used_raw': 0.016669809818267822, 'beta_dpo/beta_used': 0.9947884678840637, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.21456298232078552, 'logits/rejected': -0.19140079617500305, 'beta_dpo/beta': 0.9947884678840637, 'beta_dpo/loss_margin_mean': 113.20999145507812, 'beta_dpo/beta_margin_mean': 126.421630859375, 'beta_dpo/beta_margin_std': 230.53216552734375, 'beta_dpo/beta_margin_grad_mean': -0.3158058226108551, 'beta_dpo/beta_margin_grad_std': 0.3032316267490387, 'epoch': 0.95} + 95%|█████████████████████████████████████████████████████████████████████████▉ | 646/681 [46:19<01:30, 2.58s/it] 95%|██████████████████████████████████████████████████████████████████████████ | 647/681 [46:22<01:29, 2.62s/it] {'loss': 8.0903, 'grad_norm': 3096.896240234375, 'learning_rate': 4.024152566816791e-09, 'beta_dpo/gap_mean': 119.49800109863281, 'beta_dpo/gap_std': 160.93655395507812, 'beta_dpo/beta_used_raw': 0.4405333995819092, 'beta_dpo/beta_used': 0.4405333995819092, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.23497043550014496, 'logits/rejected': -0.23454715311527252, 'beta_dpo/beta': 0.4405333995819092, 'beta_dpo/loss_margin_mean': 118.18896484375, 'beta_dpo/beta_margin_mean': 50.956336975097656, 'beta_dpo/beta_margin_std': 66.18246459960938, 'beta_dpo/beta_margin_grad_mean': -0.16993050277233124, 'beta_dpo/beta_margin_grad_std': 0.3702445924282074, 'epoch': 0.95} + 95%|██████████████████████████████████████████████████████████████████████████ | 647/681 [46:22<01:29, 2.62s/it] 95%|██████████████████████████████████████████████████████████████████████████▏ | 648/681 [46:24<01:24, 2.56s/it] {'loss': 3.7315, 'grad_norm': 1881.7218017578125, 'learning_rate': 3.798061746947995e-09, 'beta_dpo/gap_mean': 127.08036804199219, 'beta_dpo/gap_std': 167.84896850585938, 'beta_dpo/beta_used_raw': 0.027231574058532715, 'beta_dpo/beta_used': 0.21638301014900208, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': 
-0.2324717938899994, 'logits/rejected': -0.23608848452568054, 'beta_dpo/beta': 0.21638301014900208, 'beta_dpo/loss_margin_mean': 167.418212890625, 'beta_dpo/beta_margin_mean': 37.98030090332031, 'beta_dpo/beta_margin_std': 73.11116027832031, 'beta_dpo/beta_margin_grad_mean': -0.28432542085647583, 'beta_dpo/beta_margin_grad_std': 0.2745562791824341, 'epoch': 0.95} + 95%|██████████████████████████████████████████████████████████████████████████▏ | 648/681 [46:24<01:24, 2.56s/it] 95%|██████████████████████████████████████████████████████████████████████████▎ | 649/681 [46:27<01:24, 2.63s/it] {'loss': 3.869, 'grad_norm': 2891.095703125, 'learning_rate': 3.5784585771215235e-09, 'beta_dpo/gap_mean': 124.45140075683594, 'beta_dpo/gap_std': 167.86746215820312, 'beta_dpo/beta_used_raw': -0.06394051015377045, 'beta_dpo/beta_used': 0.17022213339805603, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3063885569572449, 'logits/rejected': -0.2801710069179535, 'beta_dpo/beta': 0.17022213339805603, 'beta_dpo/loss_margin_mean': 101.8471450805664, 'beta_dpo/beta_margin_mean': 15.761299133300781, 'beta_dpo/beta_margin_std': 38.01227569580078, 'beta_dpo/beta_margin_grad_mean': -0.37708210945129395, 'beta_dpo/beta_margin_grad_std': 0.333068311214447, 'epoch': 0.95} + 95%|██████████████████████████████████████████████████████████████████████████▎ | 649/681 [46:27<01:24, 2.63s/it] 95%|██████████████████████████████████████████████████████████████████████████▍ | 650/681 [46:29<01:20, 2.60s/it] {'loss': 43.9246, 'grad_norm': 20882.701171875, 'learning_rate': 3.3653488440851253e-09, 'beta_dpo/gap_mean': 129.84597778320312, 'beta_dpo/gap_std': 173.6107635498047, 'beta_dpo/beta_used_raw': 1.3667818307876587, 'beta_dpo/beta_used': 1.3667818307876587, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.22325937449932098, 'logits/rejected': -0.22227120399475098, 'beta_dpo/beta': 1.3667818307876587, 'beta_dpo/loss_margin_mean': 161.89785766601562, 'beta_dpo/beta_margin_mean': 
236.2583770751953, 'beta_dpo/beta_margin_std': 431.2769470214844, 'beta_dpo/beta_margin_grad_mean': -0.2347412258386612, 'beta_dpo/beta_margin_grad_std': 0.42016705870628357, 'epoch': 0.95} + 95%|██████████████████████████████████████████████████████████████████████████▍ | 650/681 [46:29<01:20, 2.60s/it] 96%|██████████████████████████████████████████████████████████████████████████▌ | 651/681 [46:32<01:17, 2.60s/it] {'loss': 8.9479, 'grad_norm': 7399.314453125, 'learning_rate': 3.158738163478475e-09, 'beta_dpo/gap_mean': 134.56472778320312, 'beta_dpo/gap_std': 172.713623046875, 'beta_dpo/beta_used_raw': 0.4660683274269104, 'beta_dpo/beta_used': 0.7648828029632568, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.29069170355796814, 'logits/rejected': -0.3059248924255371, 'beta_dpo/beta': 0.7648828029632568, 'beta_dpo/loss_margin_mean': 150.5056610107422, 'beta_dpo/beta_margin_mean': 129.44383239746094, 'beta_dpo/beta_margin_std': 225.9346466064453, 'beta_dpo/beta_margin_grad_mean': -0.32739847898483276, 'beta_dpo/beta_margin_grad_std': 0.3100513815879822, 'epoch': 0.96} + 96%|██████████████████████████████████████████████████████████████████████████▌ | 651/681 [46:32<01:17, 2.60s/it] 96%|██████████████████████████████████████████████████████████████████████████▋ | 652/681 [46:34<01:15, 2.59s/it] {'loss': 1.2702, 'grad_norm': 13.33399772644043, 'learning_rate': 2.9586319796851555e-09, 'beta_dpo/gap_mean': 133.96636962890625, 'beta_dpo/gap_std': 171.03175354003906, 'beta_dpo/beta_used_raw': -0.46489205956459045, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2815973162651062, 'logits/rejected': -0.2725764214992523, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 133.89273071289062, 'beta_dpo/beta_margin_mean': 0.1338927298784256, 'beta_dpo/beta_margin_std': 0.1681978404521942, 'beta_dpo/beta_margin_grad_mean': -0.46688932180404663, 'beta_dpo/beta_margin_grad_std': 0.04138989374041557, 
'epoch': 0.96} + 96%|██████████████████████████████████████████████████████████████████████████▋ | 652/681 [46:35<01:15, 2.59s/it] 96%|██████████████████████████████████████████████████████████████████████████▊ | 653/681 [46:37<01:12, 2.58s/it] {'loss': 1.2661, 'grad_norm': 9.623185157775879, 'learning_rate': 2.7650355656892166e-09, 'beta_dpo/gap_mean': 136.72564697265625, 'beta_dpo/gap_std': 170.6292724609375, 'beta_dpo/beta_used_raw': -0.37464144825935364, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.26191675662994385, 'logits/rejected': -0.26024746894836426, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 152.7071533203125, 'beta_dpo/beta_margin_mean': 0.15270715951919556, 'beta_dpo/beta_margin_std': 0.1718183010816574, 'beta_dpo/beta_margin_grad_mean': -0.4622488021850586, 'beta_dpo/beta_margin_grad_std': 0.0421992689371109, 'epoch': 0.96} + 96%|██████████████████████████████████████████████████████████████████████████▊ | 653/681 [46:37<01:12, 2.58s/it] 96%|██████████████████████████████████████████████████████████████████████████▉ | 654/681 [46:40<01:09, 2.57s/it] {'loss': 1.479, 'grad_norm': 1289.0914306640625, 'learning_rate': 2.577954022936174e-09, 'beta_dpo/gap_mean': 135.6177978515625, 'beta_dpo/gap_std': 171.04434204101562, 'beta_dpo/beta_used_raw': -0.6519217491149902, 'beta_dpo/beta_used': 0.12737774848937988, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.285967618227005, 'logits/rejected': -0.2813323140144348, 'beta_dpo/beta': 0.12737774848937988, 'beta_dpo/loss_margin_mean': 119.36299896240234, 'beta_dpo/beta_margin_mean': 14.189286231994629, 'beta_dpo/beta_margin_std': 31.74391746520996, 'beta_dpo/beta_margin_grad_mean': -0.3520982265472412, 'beta_dpo/beta_margin_grad_std': 0.311506450176239, 'epoch': 0.96} + 96%|██████████████████████████████████████████████████████████████████████████▉ | 654/681 [46:40<01:09, 2.57s/it] 
96%|███████████████████████████████████████████████████████████████████████████ | 655/681 [46:42<01:08, 2.64s/it] {'loss': 1.2895, 'grad_norm': 11.29627513885498, 'learning_rate': 2.397392281198729e-09, 'beta_dpo/gap_mean': 134.3379364013672, 'beta_dpo/gap_std': 172.51646423339844, 'beta_dpo/beta_used_raw': -1.6912943124771118, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.21164986491203308, 'logits/rejected': -0.22321152687072754, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 127.51449584960938, 'beta_dpo/beta_margin_mean': 0.12751449644565582, 'beta_dpo/beta_margin_std': 0.1827131062746048, 'beta_dpo/beta_margin_grad_mean': -0.4685191512107849, 'beta_dpo/beta_margin_grad_std': 0.04493279755115509, 'epoch': 0.96} + 96%|███████████████████████████████████████████████████████████████████████████ | 655/681 [46:42<01:08, 2.64s/it] 96%|███████████████████████████████████████████████████████████████████████████▏ | 656/681 [46:45<01:06, 2.66s/it] {'loss': 10.8002, 'grad_norm': 4871.01171875, 'learning_rate': 2.223355098446622e-09, 'beta_dpo/gap_mean': 140.21481323242188, 'beta_dpo/gap_std': 170.7769775390625, 'beta_dpo/beta_used_raw': 1.46394944190979, 'beta_dpo/beta_used': 1.46394944190979, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.20203420519828796, 'logits/rejected': -0.2107037454843521, 'beta_dpo/beta': 1.46394944190979, 'beta_dpo/loss_margin_mean': 189.06527709960938, 'beta_dpo/beta_margin_mean': 281.1544494628906, 'beta_dpo/beta_margin_std': 236.0167694091797, 'beta_dpo/beta_margin_grad_mean': -0.09375060349702835, 'beta_dpo/beta_margin_grad_std': 0.2914803922176361, 'epoch': 0.96} + 96%|███████████████████████████████████████████████████████████████████████████▏ | 656/681 [46:45<01:06, 2.66s/it] 96%|███████████████████████████████████████████████████████████████████████████▎ | 657/681 [46:47<01:00, 2.53s/it] {'loss': 0.6362, 'grad_norm': 5.878337860107422, 'learning_rate': 
2.055847060721566e-09, 'beta_dpo/gap_mean': 148.42965698242188, 'beta_dpo/gap_std': 167.33609008789062, 'beta_dpo/beta_used_raw': -0.031182467937469482, 'beta_dpo/beta_used': 0.7246884703636169, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2323456108570099, 'logits/rejected': -0.23794196546077728, 'beta_dpo/beta': 0.7246884703636169, 'beta_dpo/loss_margin_mean': 171.03836059570312, 'beta_dpo/beta_margin_mean': 136.55160522460938, 'beta_dpo/beta_margin_std': 201.0517578125, 'beta_dpo/beta_margin_grad_mean': -0.24664191901683807, 'beta_dpo/beta_margin_grad_std': 0.24966345727443695, 'epoch': 0.96} + 96%|███████████████████████████████████████████████████████████████████████████▎ | 657/681 [46:47<01:00, 2.53s/it] 97%|███████████████████████████████████████████████████████████████████████████▎ | 658/681 [46:50<00:56, 2.47s/it] {'loss': 1.2622, 'grad_norm': 9.239810943603516, 'learning_rate': 1.8948725820160662e-09, 'beta_dpo/gap_mean': 145.939208984375, 'beta_dpo/gap_std': 164.26235961914062, 'beta_dpo/beta_used_raw': -0.7150457501411438, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.23999705910682678, 'logits/rejected': -0.2215622067451477, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 135.171630859375, 'beta_dpo/beta_margin_mean': 0.13517163693904877, 'beta_dpo/beta_margin_std': 0.15315905213356018, 'beta_dpo/beta_margin_grad_mean': -0.466478168964386, 'beta_dpo/beta_margin_grad_std': 0.03783747926354408, 'epoch': 0.97} + 97%|███████████████████████████████████████████████████████████████████████████▎ | 658/681 [46:50<00:56, 2.47s/it] 97%|███████████████████████████████████████████████████████████████████████████▍ | 659/681 [46:52<00:56, 2.55s/it] {'loss': 1.2654, 'grad_norm': 13.10746955871582, 'learning_rate': 1.7404359041573723e-09, 'beta_dpo/gap_mean': 143.0897216796875, 'beta_dpo/gap_std': 163.14138793945312, 'beta_dpo/beta_used_raw': -0.6675459146499634, 
'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3149293065071106, 'logits/rejected': -0.26698166131973267, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 124.60411071777344, 'beta_dpo/beta_margin_mean': 0.12460412085056305, 'beta_dpo/beta_margin_std': 0.15533404052257538, 'beta_dpo/beta_margin_grad_mean': -0.46911635994911194, 'beta_dpo/beta_margin_grad_std': 0.0383928045630455, 'epoch': 0.97} + 97%|███████████████████████████████████████████████████████████████████████████▍ | 659/681 [46:52<00:56, 2.55s/it] 97%|███████████████████████████████████████████████████████████████████████████▌ | 660/681 [46:55<00:53, 2.53s/it] {'loss': 2.3556, 'grad_norm': 1521.5159912109375, 'learning_rate': 1.592541096695571e-09, 'beta_dpo/gap_mean': 144.819091796875, 'beta_dpo/gap_std': 160.9578857421875, 'beta_dpo/beta_used_raw': 0.0521998405456543, 'beta_dpo/beta_used': 0.336564302444458, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.18781328201293945, 'logits/rejected': -0.15785738825798035, 'beta_dpo/beta': 0.336564302444458, 'beta_dpo/loss_margin_mean': 162.99786376953125, 'beta_dpo/beta_margin_mean': 57.831546783447266, 'beta_dpo/beta_margin_std': 95.76539611816406, 'beta_dpo/beta_margin_grad_mean': -0.2779940366744995, 'beta_dpo/beta_margin_grad_std': 0.27703657746315, 'epoch': 0.97} + 97%|███████████████████████████████████████████████████████████████████████████▌ | 660/681 [46:55<00:53, 2.53s/it] 97%|███████████████████████████████████████████████████████████████████████████▋ | 661/681 [46:57<00:48, 2.43s/it] {'loss': 1.2689, 'grad_norm': 8.182291030883789, 'learning_rate': 1.4511920567963908e-09, 'beta_dpo/gap_mean': 144.63906860351562, 'beta_dpo/gap_std': 161.95355224609375, 'beta_dpo/beta_used_raw': -1.1518099308013916, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2719656527042389, 'logits/rejected': -0.2467373013496399, 
'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 147.99667358398438, 'beta_dpo/beta_margin_mean': 0.1479966789484024, 'beta_dpo/beta_margin_std': 0.1754070222377777, 'beta_dpo/beta_margin_grad_mean': -0.4634128510951996, 'beta_dpo/beta_margin_grad_std': 0.04297526925802231, 'epoch': 0.97} + 97%|███████████████████████████████████████████████████████████████████████████▋ | 661/681 [46:57<00:48, 2.43s/it] 97%|███████████████████████████████████████████████████████████████████████████▊ | 662/681 [47:00<00:48, 2.53s/it] {'loss': 1.2631, 'grad_norm': 10.364067077636719, 'learning_rate': 1.3163925091384532e-09, 'beta_dpo/gap_mean': 144.40728759765625, 'beta_dpo/gap_std': 164.30880737304688, 'beta_dpo/beta_used_raw': -0.6409615278244019, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.3215191066265106, 'logits/rejected': -0.2895079255104065, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 129.33206176757812, 'beta_dpo/beta_margin_mean': 0.1293320655822754, 'beta_dpo/beta_margin_std': 0.17222696542739868, 'beta_dpo/beta_margin_grad_mean': -0.4679609537124634, 'beta_dpo/beta_margin_grad_std': 0.04259883239865303, 'epoch': 0.97} + 97%|███████████████████████████████████████████████████████████████████████████▊ | 662/681 [47:00<00:48, 2.53s/it] 97%|███████████████████████████████████████████████████████████████████████████▉ | 663/681 [47:03<00:46, 2.60s/it] {'loss': 1.2757, 'grad_norm': 7.655603885650635, 'learning_rate': 1.1881460058152382e-09, 'beta_dpo/gap_mean': 142.96701049804688, 'beta_dpo/gap_std': 167.32403564453125, 'beta_dpo/beta_used_raw': -1.430047631263733, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.31214457750320435, 'logits/rejected': -0.310594379901886, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 137.47317504882812, 'beta_dpo/beta_margin_mean': 0.1374731808900833, 
'beta_dpo/beta_margin_std': 0.1681915521621704, 'beta_dpo/beta_margin_grad_mean': -0.4659326374530792, 'beta_dpo/beta_margin_grad_std': 0.041490860283374786, 'epoch': 0.97} + 97%|███████████████████████████████████████████████████████████████████████████▉ | 663/681 [47:03<00:46, 2.60s/it] 98%|████████████████████████████████████████████████████████████████████████████ | 664/681 [47:05<00:44, 2.61s/it] {'loss': 22.277, 'grad_norm': 14736.9189453125, 'learning_rate': 1.066455926241383e-09, 'beta_dpo/gap_mean': 145.85546875, 'beta_dpo/gap_std': 171.21942138671875, 'beta_dpo/beta_used_raw': 0.9985529780387878, 'beta_dpo/beta_used': 1.081035852432251, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.23802334070205688, 'logits/rejected': -0.23446832597255707, 'beta_dpo/beta': 1.081035852432251, 'beta_dpo/loss_margin_mean': 160.23175048828125, 'beta_dpo/beta_margin_mean': 193.7392120361328, 'beta_dpo/beta_margin_std': 372.88427734375, 'beta_dpo/beta_margin_grad_mean': -0.3265109956264496, 'beta_dpo/beta_margin_grad_std': 0.31032606959342957, 'epoch': 0.98} + 98%|████████████████████████████████████████████████████████████████████████████ | 664/681 [47:05<00:44, 2.61s/it] 98%|████████████████████████████████████████████████████████████████████████████▏ | 665/681 [47:08<00:40, 2.54s/it] {'loss': 1.9778, 'grad_norm': 950.77587890625, 'learning_rate': 9.513254770636137e-10, 'beta_dpo/gap_mean': 143.3297882080078, 'beta_dpo/gap_std': 168.05531311035156, 'beta_dpo/beta_used_raw': 0.05960509926080704, 'beta_dpo/beta_used': 0.17351345717906952, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2172248661518097, 'logits/rejected': -0.18482929468154907, 'beta_dpo/beta': 0.17351345717906952, 'beta_dpo/loss_margin_mean': 132.63438415527344, 'beta_dpo/beta_margin_mean': 24.549057006835938, 'beta_dpo/beta_margin_std': 46.99803924560547, 'beta_dpo/beta_margin_grad_mean': -0.30906784534454346, 'beta_dpo/beta_margin_grad_std': 0.29436877369880676, 'epoch': 0.98} + 
98%|████████████████████████████████████████████████████████████████████████████▏ | 665/681 [47:08<00:40, 2.54s/it] 98%|████████████████████████████████████████████████████████████████████████████▎ | 666/681 [47:10<00:39, 2.63s/it] {'loss': 1.2653, 'grad_norm': 10.848896026611328, 'learning_rate': 8.427576920763956e-10, 'beta_dpo/gap_mean': 144.62229919433594, 'beta_dpo/gap_std': 164.13558959960938, 'beta_dpo/beta_used_raw': -0.8510459661483765, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.26341164112091064, 'logits/rejected': -0.24032096564769745, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 145.38601684570312, 'beta_dpo/beta_margin_mean': 0.14538602530956268, 'beta_dpo/beta_margin_std': 0.14133024215698242, 'beta_dpo/beta_margin_grad_mean': -0.4638909697532654, 'beta_dpo/beta_margin_grad_std': 0.034985702484846115, 'epoch': 0.98} + 98%|████████████████████████████████████████████████████████████████████████████▎ | 666/681 [47:11<00:39, 2.63s/it] 98%|████████████████████████████████████████████████████████████████████████████▍ | 667/681 [47:13<00:37, 2.67s/it] {'loss': 3.6685, 'grad_norm': 7423.337890625, 'learning_rate': 7.407554321417764e-10, 'beta_dpo/gap_mean': 142.28619384765625, 'beta_dpo/gap_std': 162.02328491210938, 'beta_dpo/beta_used_raw': 0.0477980375289917, 'beta_dpo/beta_used': 0.555698573589325, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.24232017993927002, 'logits/rejected': -0.21042859554290771, 'beta_dpo/beta': 0.555698573589325, 'beta_dpo/loss_margin_mean': 135.94004821777344, 'beta_dpo/beta_margin_mean': 75.98949432373047, 'beta_dpo/beta_margin_std': 132.38754272460938, 'beta_dpo/beta_margin_grad_mean': -0.32695654034614563, 'beta_dpo/beta_margin_grad_std': 0.3104262053966522, 'epoch': 0.98} + 98%|████████████████████████████████████████████████████████████████████████████▍ | 667/681 [47:13<00:37, 2.67s/it] 
98%|████████████████████████████████████████████████████████████████████████████▌ | 668/681 [47:16<00:35, 2.70s/it] {'loss': 1.2854, 'grad_norm': 15.746362686157227, 'learning_rate': 6.453213851142225e-10, 'beta_dpo/gap_mean': 135.5725555419922, 'beta_dpo/gap_std': 161.8687744140625, 'beta_dpo/beta_used_raw': -1.565541386604309, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.26321089267730713, 'logits/rejected': -0.2517067492008209, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 110.26235961914062, 'beta_dpo/beta_margin_mean': 0.11026235669851303, 'beta_dpo/beta_margin_std': 0.16938358545303345, 'beta_dpo/beta_margin_grad_mean': -0.47267022728919983, 'beta_dpo/beta_margin_grad_std': 0.04184536263346672, 'epoch': 0.98} + 98%|████████████████████████████████████████████████████████████████████████████▌ | 668/681 [47:16<00:35, 2.70s/it] 98%|████████████████████████████████████████████████████████████████████████████▋ | 669/681 [47:19<00:32, 2.71s/it] {'loss': 3.9008, 'grad_norm': 2606.953125, 'learning_rate': 5.564580657695939e-10, 'beta_dpo/gap_mean': 139.15911865234375, 'beta_dpo/gap_std': 162.9943084716797, 'beta_dpo/beta_used_raw': 0.24128052592277527, 'beta_dpo/beta_used': 0.49764859676361084, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.239346444606781, 'logits/rejected': -0.21844345331192017, 'beta_dpo/beta': 0.49764859676361084, 'beta_dpo/loss_margin_mean': 155.27737426757812, 'beta_dpo/beta_margin_mean': 76.75032043457031, 'beta_dpo/beta_margin_std': 137.6516876220703, 'beta_dpo/beta_margin_grad_mean': -0.2775057852268219, 'beta_dpo/beta_margin_grad_std': 0.27767181396484375, 'epoch': 0.98} + 98%|████████████████████████████████████████████████████████████████████████████▋ | 669/681 [47:19<00:32, 2.71s/it] 98%|████████████████████████████████████████████████████████████████████████████▋ | 670/681 [47:21<00:29, 2.67s/it] {'loss': 13.5793, 'grad_norm': 7477.4453125, 
'learning_rate': 4.741678157389739e-10, 'beta_dpo/gap_mean': 141.39236450195312, 'beta_dpo/gap_std': 165.60235595703125, 'beta_dpo/beta_used_raw': -0.3109077215194702, 'beta_dpo/beta_used': 0.5937625169754028, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.25414931774139404, 'logits/rejected': -0.23977619409561157, 'beta_dpo/beta': 0.5937625169754028, 'beta_dpo/loss_margin_mean': 155.38189697265625, 'beta_dpo/beta_margin_mean': 102.75801086425781, 'beta_dpo/beta_margin_std': 171.8385009765625, 'beta_dpo/beta_margin_grad_mean': -0.32673099637031555, 'beta_dpo/beta_margin_grad_std': 0.3107914626598358, 'epoch': 0.98} + 98%|████████████████████████████████████████████████████████████████████████████▋ | 670/681 [47:21<00:29, 2.67s/it] 99%|████████████████████████████████████████████████████████████████████████████▊ | 671/681 [47:24<00:26, 2.61s/it] {'loss': 15.1475, 'grad_norm': 13217.642578125, 'learning_rate': 3.9845280344705245e-10, 'beta_dpo/gap_mean': 142.10791015625, 'beta_dpo/gap_std': 166.866943359375, 'beta_dpo/beta_used_raw': 1.3876622915267944, 'beta_dpo/beta_used': 1.3876622915267944, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.21202997863292694, 'logits/rejected': -0.20390699803829193, 'beta_dpo/beta': 1.3876622915267944, 'beta_dpo/loss_margin_mean': 143.68527221679688, 'beta_dpo/beta_margin_mean': 201.6892547607422, 'beta_dpo/beta_margin_std': 243.80215454101562, 'beta_dpo/beta_margin_grad_mean': -0.2369070202112198, 'beta_dpo/beta_margin_grad_std': 0.42259082198143005, 'epoch': 0.99} + 99%|████████████████████████████████████████████████████████████████████████████▊ | 671/681 [47:24<00:26, 2.61s/it] 99%|████████████████████████████████████████████████████████████████████████████▉ | 672/681 [47:26<00:23, 2.59s/it] {'loss': 1.2722, 'grad_norm': 10.910394668579102, 'learning_rate': 3.293150240547549e-10, 'beta_dpo/gap_mean': 139.9226531982422, 'beta_dpo/gap_std': 167.88650512695312, 'beta_dpo/beta_used_raw': -0.8151004910469055, 
'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.24089352786540985, 'logits/rejected': -0.22517436742782593, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 121.51133728027344, 'beta_dpo/beta_margin_mean': 0.12151134014129639, 'beta_dpo/beta_margin_std': 0.1770341694355011, 'beta_dpo/beta_margin_grad_mean': -0.469896525144577, 'beta_dpo/beta_margin_grad_std': 0.043766915798187256, 'epoch': 0.99} + 99%|████████████████████████████████████████████████████████████████████████████▉ | 672/681 [47:26<00:23, 2.59s/it] 99%|█████████████████████████████████████████████████████████████████████████████ | 673/681 [47:29<00:19, 2.49s/it] {'loss': 7.6929, 'grad_norm': 4248.92431640625, 'learning_rate': 2.6675629940689504e-10, 'beta_dpo/gap_mean': 141.91412353515625, 'beta_dpo/gap_std': 166.22857666015625, 'beta_dpo/beta_used_raw': 0.13607317209243774, 'beta_dpo/beta_used': 0.39367401599884033, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.21331897377967834, 'logits/rejected': -0.20891378819942474, 'beta_dpo/beta': 0.39367401599884033, 'beta_dpo/loss_margin_mean': 155.3968048095703, 'beta_dpo/beta_margin_mean': 63.47161102294922, 'beta_dpo/beta_margin_std': 101.09577178955078, 'beta_dpo/beta_margin_grad_mean': -0.27898791432380676, 'beta_dpo/beta_margin_grad_std': 0.2772313356399536, 'epoch': 0.99} + 99%|█████████████████████████████████████████████████████████████████████████████ | 673/681 [47:29<00:19, 2.49s/it] 99%|█████████████████████████████████████████████████████████████████████████████▏| 674/681 [47:31<00:17, 2.55s/it] {'loss': 3.5724, 'grad_norm': 3347.8056640625, 'learning_rate': 2.1077827798404725e-10, 'beta_dpo/gap_mean': 145.38265991210938, 'beta_dpo/gap_std': 166.84365844726562, 'beta_dpo/beta_used_raw': 0.35867586731910706, 'beta_dpo/beta_used': 0.3700469732284546, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.22968342900276184, 'logits/rejected': -0.21133801341056824, 
'beta_dpo/beta': 0.3700469732284546, 'beta_dpo/loss_margin_mean': 163.85891723632812, 'beta_dpo/beta_margin_mean': 60.167579650878906, 'beta_dpo/beta_margin_std': 115.83226776123047, 'beta_dpo/beta_margin_grad_mean': -0.31875723600387573, 'beta_dpo/beta_margin_grad_std': 0.2990269958972931, 'epoch': 0.99} + 99%|█████████████████████████████████████████████████████████████████████████████▏| 674/681 [47:31<00:17, 2.55s/it] 99%|█████████████████████████████████████████████████████████████████████████████▎| 675/681 [47:34<00:15, 2.52s/it] {'loss': 1.2649, 'grad_norm': 10.684988021850586, 'learning_rate': 1.6138243485910863e-10, 'beta_dpo/gap_mean': 149.49859619140625, 'beta_dpo/gap_std': 167.7472381591797, 'beta_dpo/beta_used_raw': -1.1393800973892212, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.2344612330198288, 'logits/rejected': -0.22431063652038574, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 168.26443481445312, 'beta_dpo/beta_margin_mean': 0.1682644486427307, 'beta_dpo/beta_margin_std': 0.17532816529273987, 'beta_dpo/beta_margin_grad_mean': -0.4584572911262512, 'beta_dpo/beta_margin_grad_std': 0.04273706302046776, 'epoch': 0.99} + 99%|█████████████████████████████████████████████████████████████████████████████▎| 675/681 [47:34<00:15, 2.52s/it] 99%|█████████████████████████████████████████████████████████████████████████████▍| 676/681 [47:36<00:12, 2.56s/it] {'loss': 5.0433, 'grad_norm': 3344.320068359375, 'learning_rate': 1.1857007165852472e-10, 'beta_dpo/gap_mean': 150.6968994140625, 'beta_dpo/gap_std': 166.34634399414062, 'beta_dpo/beta_used_raw': -0.8106540441513062, 'beta_dpo/beta_used': 0.3458569049835205, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.314957857131958, 'logits/rejected': -0.2842877507209778, 'beta_dpo/beta': 0.3458569049835205, 'beta_dpo/loss_margin_mean': 143.155029296875, 'beta_dpo/beta_margin_mean': 58.578914642333984, 'beta_dpo/beta_margin_std': 
92.11776733398438, 'beta_dpo/beta_margin_grad_mean': -0.2846805453300476, 'beta_dpo/beta_margin_grad_std': 0.2793225646018982, 'epoch': 0.99} + 99%|█████████████████████████████████████████████████████████████████████████████▍| 676/681 [47:36<00:12, 2.56s/it] 99%|█████████████████████████████████████████████████████████████████████████████▌| 677/681 [47:39<00:09, 2.47s/it] {'loss': 12.5035, 'grad_norm': 9669.5361328125, 'learning_rate': 8.23423165278725e-11, 'beta_dpo/gap_mean': 149.2086181640625, 'beta_dpo/gap_std': 164.42991638183594, 'beta_dpo/beta_used_raw': 0.5463694334030151, 'beta_dpo/beta_used': 0.9840426445007324, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.22851765155792236, 'logits/rejected': -0.20020201802253723, 'beta_dpo/beta': 0.9840426445007324, 'beta_dpo/loss_margin_mean': 154.48655700683594, 'beta_dpo/beta_margin_mean': 163.275146484375, 'beta_dpo/beta_margin_std': 241.04299926757812, 'beta_dpo/beta_margin_grad_mean': -0.2947867214679718, 'beta_dpo/beta_margin_grad_std': 0.29029718041419983, 'epoch': 0.99} + 99%|█████████████████████████████████████████████████████████████████████████████▌| 677/681 [47:39<00:09, 2.47s/it] 100%|█████████████████████████████████████████████████████████████████████████████▋| 678/681 [47:41<00:07, 2.46s/it] {'loss': 5.5623, 'grad_norm': 6134.18310546875, 'learning_rate': 5.270012410216185e-11, 'beta_dpo/gap_mean': 150.82748413085938, 'beta_dpo/gap_std': 165.1314697265625, 'beta_dpo/beta_used_raw': 0.4158139228820801, 'beta_dpo/beta_used': 0.5137372016906738, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.24406661093235016, 'logits/rejected': -0.23352187871932983, 'beta_dpo/beta': 0.5137372016906738, 'beta_dpo/loss_margin_mean': 160.3917999267578, 'beta_dpo/beta_margin_mean': 91.00566101074219, 'beta_dpo/beta_margin_std': 150.59832763671875, 'beta_dpo/beta_margin_grad_mean': -0.33945244550704956, 'beta_dpo/beta_margin_grad_std': 0.3146733343601227, 'epoch': 1.0} + 
100%|█████████████████████████████████████████████████████████████████████████████▋| 678/681 [47:41<00:07, 2.46s/it] 100%|█████████████████████████████████████████████████████████████████████████████▊| 679/681 [47:44<00:05, 2.55s/it] {'loss': 4.2081, 'grad_norm': 2949.92333984375, 'learning_rate': 2.9644275480772416e-11, 'beta_dpo/gap_mean': 149.608642578125, 'beta_dpo/gap_std': 166.2967529296875, 'beta_dpo/beta_used_raw': 0.4399394392967224, 'beta_dpo/beta_used': 0.4399394392967224, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.24739307165145874, 'logits/rejected': -0.2278253436088562, 'beta_dpo/beta': 0.4399394392967224, 'beta_dpo/loss_margin_mean': 132.75094604492188, 'beta_dpo/beta_margin_mean': 58.7913932800293, 'beta_dpo/beta_margin_std': 76.95616149902344, 'beta_dpo/beta_margin_grad_mean': -0.202021986246109, 'beta_dpo/beta_margin_grad_std': 0.3905799984931946, 'epoch': 1.0} + 100%|█████████████████████████████████████████████████████████████████████████████▊| 679/681 [47:44<00:05, 2.55s/it] 100%|█████████████████████████████████████████████████████████████████████████████▉| 680/681 [47:47<00:02, 2.63s/it] {'loss': 1.4902, 'grad_norm': 773.09716796875, 'learning_rate': 1.31753782067201e-11, 'beta_dpo/gap_mean': 149.79739379882812, 'beta_dpo/gap_std': 168.91465759277344, 'beta_dpo/beta_used_raw': -0.949596643447876, 'beta_dpo/beta_used': 0.16887128353118896, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.282899409532547, 'logits/rejected': -0.2579476833343506, 'beta_dpo/beta': 0.16887128353118896, 'beta_dpo/loss_margin_mean': 154.96934509277344, 'beta_dpo/beta_margin_mean': 24.72771453857422, 'beta_dpo/beta_margin_std': 45.65426254272461, 'beta_dpo/beta_margin_grad_mean': -0.2913900911808014, 'beta_dpo/beta_margin_grad_std': 0.28668370842933655, 'epoch': 1.0} + 100%|█████████████████████████████████████████████████████████████████████████████▉| 680/681 [47:47<00:02, 2.63s/it] 
100%|██████████████████████████████████████████████████████████████████████████████| 681/681 [47:49<00:00, 2.61s/it] {'loss': 1.2798, 'grad_norm': 11.882765769958496, 'learning_rate': 3.2938662507808745e-12, 'beta_dpo/gap_mean': 145.9384002685547, 'beta_dpo/gap_std': 166.8389892578125, 'beta_dpo/beta_used_raw': -1.753014087677002, 'beta_dpo/beta_used': 0.0010000000474974513, 'beta_dpo/mask_keep_frac': 0.78125, 'logits/chosen': -0.26762282848358154, 'logits/rejected': -0.25434818863868713, 'beta_dpo/beta': 0.0010000000474974513, 'beta_dpo/loss_margin_mean': 134.85069274902344, 'beta_dpo/beta_margin_mean': 0.13485069572925568, 'beta_dpo/beta_margin_std': 0.17000959813594818, 'beta_dpo/beta_margin_grad_mean': -0.46664658188819885, 'beta_dpo/beta_margin_grad_std': 0.041838180273771286, 'epoch': 1.0} + 100%|██████████████████████████████████████████████████████████████████████████████| 681/681 [47:49<00:00, 2.61s/it][INFO|trainer.py:3984] 2026-04-18 00:11:37,099 >> Saving model checkpoint to /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-681 +[INFO|configuration_utils.py:419] 2026-04-18 00:11:37,112 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-681/config.json +[INFO|configuration_utils.py:911] 2026-04-18 00:11:37,121 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-681/generation_config.json +[INFO|modeling_utils.py:3580] 2026-04-18 00:12:23,937 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 6 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-681/model.safetensors.index.json. 
+[INFO|tokenization_utils_base.py:2510] 2026-04-18 00:12:23,946 >> tokenizer config file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-681/tokenizer_config.json +[INFO|tokenization_utils_base.py:2519] 2026-04-18 00:12:23,963 >> Special tokens file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-681/special_tokens_map.json +[INFO|trainer.py:4083] 2026-04-18 00:16:18,284 >> Deleting older checkpoint [/scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/checkpoint-400] due to args.save_total_limit +[INFO|trainer.py:2681] 2026-04-18 00:16:21,557 >> + +Training completed. Do not forget to share your model on huggingface.co/models =) + + + {'train_runtime': 3177.7378, 'train_samples_per_second': 13.72, 'train_steps_per_second': 0.214, 'train_loss': 2.627565469291942, 'epoch': 1.0} + 100%|██████████████████████████████████████████████████████████████████████████████| 681/681 [52:49<00:00, 2.61s/it] 100%|██████████████████████████████████████████████████████████████████████████████| 681/681 [52:49<00:00, 4.65s/it] +***** train metrics ***** + epoch = 1.0 + total_flos = 0GF + train_loss = 2.6276 + train_runtime = 0:52:57.73 + train_samples = 43598 + train_samples_per_second = 13.72 + train_steps_per_second = 0.214 +2026-04-18 00:16:21 - INFO - __main__ - *** Training complete *** +2026-04-18 00:16:21 - INFO - __main__ - *** Save model *** +[INFO|configuration_utils.py:419] 2026-04-18 00:16:38,640 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/config.json +[INFO|configuration_utils.py:911] 2026-04-18 00:16:38,659 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/generation_config.json 
+[INFO|modeling_utils.py:3580] 2026-04-18 00:17:42,598 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/model.safetensors.index.json. +[INFO|tokenization_utils_base.py:2510] 2026-04-18 00:17:42,622 >> tokenizer config file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/tokenizer_config.json +[INFO|tokenization_utils_base.py:2519] 2026-04-18 00:17:42,638 >> Special tokens file saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/special_tokens_map.json +2026-04-18 00:17:42 - INFO - __main__ - Saved HF-compatible model artifacts to /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753 +[INFO|modelcard.py:450] 2026-04-18 00:17:42,897 >> Dropping the following result as it does not have all the necessary fields: +{'dataset': {'name': 'Anthropic/hh-rlhf', 'type': 'Anthropic/hh-rlhf'}} +[INFO|configuration_utils.py:419] 2026-04-18 00:17:42,938 >> Configuration saved in /scratch/feng.yulu/dynamic-dpo-v4/outputs/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753/config.json +2026-04-18 00:17:42 - INFO - __main__ - *** Evaluate *** +[INFO|trainer.py:4307] 2026-04-18 00:17:42,939 >> +***** Running Evaluation ***** +[INFO|trainer.py:4309] 2026-04-18 00:17:42,939 >> Num examples = 2339 +[INFO|trainer.py:4312] 2026-04-18 00:17:42,939 >> Batch size = 8 + 0%| | 0/73 [00:00