Initialize project; model provided by the ModelHub XC community

Model: jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4
Source: Original Platform
ModelHub XC
2026-05-14 07:01:43 +08:00
commit 73b3eb1f2d
20 changed files with 15924 additions and 0 deletions

.gitattributes (new file, 36 lines)

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md (new file, 78 lines)

@@ -0,0 +1,78 @@
---
library_name: transformers
base_model: W-61/llama-3-8b-base-sft-hh-harmless-4xh200
tags:
- alignment-handbook
- new-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4
This model is a fine-tuned version of [W-61/llama-3-8b-base-sft-hh-harmless-4xh200](https://huggingface.co/W-61/llama-3-8b-base-sft-hh-harmless-4xh200) on the Anthropic/hh-rlhf dataset.
It achieves the following results on the evaluation set at the last logged evaluation (step 600, epoch 0.9070):
- Loss: 0.5708
- Fcm Dpo/beta: 0.0068
- Margin Dpo/margin Mean: 57.1444
- Margin Dpo/margin Std: 97.9395
- Logps/chosen: -207.2700
- Logps/rejected: -269.1039
- Logps/ref Chosen: -74.8595
- Logps/ref Rejected: -79.5490
- Logits/chosen: 0.7066
- Logits/rejected: 0.6592
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
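
The batch-size entries above are related by simple arithmetic. The following is a minimal sketch, assuming the usual Trainer convention that the effective training batch is the per-device batch times the number of devices times the gradient accumulation steps, and that evaluation uses no accumulation. The variable names are illustrative.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8             # per-device training batch
eval_batch_size = 8              # per-device evaluation batch
num_devices = 4
gradient_accumulation_steps = 2

# Assumed convention: accumulation multiplies the training batch only.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Both products agree with the `total_train_batch_size` and `total_eval_batch_size` entries reported in the list.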
### Training results
| Training Loss | Epoch | Step | Validation Loss | Fcm Dpo/beta | Margin Dpo/margin Mean | Margin Dpo/margin Std | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:------------:|:----------------------:|:---------------------:|:------------:|:--------------:|:----------------:|:------------------:|:-------------:|:---------------:|
| 1.0189 | 0.3023 | 200 | 0.5583 | 0.0797 | 5.9909 | 10.1977 | -86.3897 | -97.0701 | -74.8595 | -79.5490 | 0.3147 | 0.2708 |
| 1.163 | 0.6047 | 400 | 0.5695 | 0.0083 | 45.5198 | 75.3415 | -175.3136 | -225.5229 | -74.8595 | -79.5490 | 0.7402 | 0.6880 |
| 1.1521 | 0.9070 | 600 | 0.5708 | 0.0068 | 57.1444 | 97.9395 | -207.2700 | -269.1039 | -74.8595 | -79.5490 | 0.7066 | 0.6592 |
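
The margin column can be cross-checked against the log-probability columns in the same row. The sketch below assumes that "Margin Dpo/margin Mean" is the unscaled DPO-style log-ratio difference, (chosen minus reference chosen) minus (rejected minus reference rejected). The training code is not part of this commit, so this is an inference from the numbers rather than a confirmed definition; the function name is hypothetical.

```python
def log_ratio_margin(logp_chosen, logp_rejected, ref_chosen, ref_rejected):
    """Policy-vs-reference log-prob gap on chosen minus the same gap on rejected."""
    return (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)

# Reference log-probs are constant across rows in the table above.
REF_CHOSEN, REF_REJECTED = -74.8595, -79.5490

# step -> (Logps/chosen, Logps/rejected, reported margin mean), copied from the table.
rows = {
    200: (-86.3897, -97.0701, 5.9909),
    400: (-175.3136, -225.5229, 45.5198),
    600: (-207.2700, -269.1039, 57.1444),
}

recomputed = {
    step: log_ratio_margin(chosen, rejected, REF_CHOSEN, REF_REJECTED)
    for step, (chosen, rejected, _) in rows.items()
}

for step, (_, _, reported) in rows.items():
    print(step, round(recomputed[step], 4), reported)
```

Under this assumption the recomputed values match the reported margin means to the four decimals shown, which suggests the margin is logged without the beta scaling.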
### Framework versions
- Transformers 4.51.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4

all_results.json (new file, 23 lines)

@@ -0,0 +1,23 @@
{
"epoch": 0.999244142101285,
"eval_fcm_dpo/beta": 0.006462239660322666,
"eval_logits/chosen": 0.7311930656433105,
"eval_logits/rejected": 0.6827455759048462,
"eval_logps/chosen": -208.29672241210938,
"eval_logps/ref_chosen": -74.85946655273438,
"eval_logps/ref_rejected": -79.54898834228516,
"eval_logps/rejected": -270.3067932128906,
"eval_loss": 0.5738745331764221,
"eval_margin_dpo/margin_mean": 57.32048416137695,
"eval_margin_dpo/margin_std": 98.36843872070312,
"eval_runtime": 37.9174,
"eval_samples": 2303,
"eval_samples_per_second": 60.737,
"eval_steps_per_second": 1.899,
"total_flos": 0.0,
"train_loss": 1.1320684536863204,
"train_runtime": 1756.8176,
"train_samples": 42336,
"train_samples_per_second": 24.098,
"train_steps_per_second": 0.376
}
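
The throughput and epoch fields in all_results.json can be re-derived from the sample counts and runtimes. This is a rough consistency check, assuming the total training batch of 64 and total evaluation batch of 32 stated in the README, and assuming a trailing partial batch is dropped in training but kept in evaluation. Those two conventions are assumptions, not something this commit states.

```python
import math

# Fields copied from all_results.json.
results = {
    "epoch": 0.999244142101285,
    "eval_runtime": 37.9174,
    "eval_samples": 2303,
    "eval_samples_per_second": 60.737,
    "eval_steps_per_second": 1.899,
    "train_runtime": 1756.8176,
    "train_samples": 42336,
    "train_samples_per_second": 24.098,
    "train_steps_per_second": 0.376,
}

TOTAL_TRAIN_BATCH = 64  # from the README hyperparameters
TOTAL_EVAL_BATCH = 32   # from the README hyperparameters

# Assumption: the incomplete final training batch is dropped.
train_steps = results["train_samples"] // TOTAL_TRAIN_BATCH
# Assumption: evaluation keeps the final partial batch.
eval_steps = math.ceil(results["eval_samples"] / TOTAL_EVAL_BATCH)

derived = {
    "epoch": train_steps * TOTAL_TRAIN_BATCH / results["train_samples"],
    "train_samples_per_second": results["train_samples"] / results["train_runtime"],
    "train_steps_per_second": train_steps / results["train_runtime"],
    "eval_samples_per_second": results["eval_samples"] / results["eval_runtime"],
    "eval_steps_per_second": eval_steps / results["eval_runtime"],
}

for key, value in derived.items():
    print(f"{key}: derived={value:.6f} reported={results[key]}")
```

The derived step count of 661 also equals the line count of margin_logs/margins.jsonl in this commit, which is consistent with one margin record per optimizer step.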

config.json (new file, 29 lines)

@@ -0,0 +1,29 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.51.0",
"use_cache": true,
"vocab_size": 128256
}
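
The parameter count implied by config.json can be worked out by hand. The sketch below assumes the standard Llama layout (q/k/v/o attention projections, a gated MLP with gate, up, and down projections, two RMSNorm weight vectors per layer, and one final norm), with no bias terms and untied input and output embeddings as the config indicates. The function name is hypothetical.

```python
# Fields copied from config.json.
cfg = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 128256,
    "tie_word_embeddings": False,
}

def llama_param_count(c):
    """Count weights for a bias-free Llama-style decoder (assumed layout)."""
    h = c["hidden_size"]
    q_out = c["num_attention_heads"] * c["head_dim"]    # query/output width
    kv_out = c["num_key_value_heads"] * c["head_dim"]   # grouped-query K/V width
    attention = h * q_out + 2 * h * kv_out + q_out * h  # q, k, v, o projections
    mlp = 3 * h * c["intermediate_size"]                # gate, up, down projections
    norms = 2 * h                                       # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embeddings = c["vocab_size"] * h
    if not c["tie_word_embeddings"]:
        embeddings *= 2                                 # separate output head
    return c["num_hidden_layers"] * per_layer + embeddings + h  # + final norm

total_params = llama_param_count(cfg)
print(f"{total_params:,} parameters")
print(f"{total_params * 4 / 1e9:.1f} GB at 4 bytes per float32 weight")
```

Because `torch_dtype` is `float32`, the stored weights take roughly four bytes per parameter, about twice the size of a bfloat16 copy of the same model.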

eval_results.json (new file, 17 lines)

@@ -0,0 +1,17 @@
{
"epoch": 0.999244142101285,
"eval_fcm_dpo/beta": 0.006462239660322666,
"eval_logits/chosen": 0.7311930656433105,
"eval_logits/rejected": 0.6827455759048462,
"eval_logps/chosen": -208.29672241210938,
"eval_logps/ref_chosen": -74.85946655273438,
"eval_logps/ref_rejected": -79.54898834228516,
"eval_logps/rejected": -270.3067932128906,
"eval_loss": 0.5738745331764221,
"eval_margin_dpo/margin_mean": 57.32048416137695,
"eval_margin_dpo/margin_std": 98.36843872070312,
"eval_runtime": 37.9174,
"eval_samples": 2303,
"eval_samples_per_second": 60.737,
"eval_steps_per_second": 1.899
}

generation_config.json (new file, 9 lines)

@@ -0,0 +1,9 @@
{
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": 128001,
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9,
"transformers_version": "4.51.0"
}
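
The next file in this commit, margin_logs/margins.jsonl, stores one JSON record per training step with summary statistics and a `sample` array of per-example margins. The following is a minimal sketch of recomputing some of those statistics from `sample`. It assumes `sample` holds the whole batch, which appears to hold for the records shown but is not documented. `std`, `p10`, and `p90` are left out because the convention used to compute them is not stated. The function name and the toy record are hypothetical.

```python
import json
import statistics

def summarize_record(line: str) -> dict:
    """Recompute basic statistics from the 'sample' array of one JSONL record."""
    record = json.loads(line)
    sample = record["sample"]
    return {
        "step": record["step"],
        "n": len(sample),
        "mean": statistics.fmean(sample),
        "median": statistics.median(sample),
        "min": min(sample),
        "max": max(sample),
        # Fraction of examples whose margin is strictly positive.
        "pos_frac": sum(1 for margin in sample if margin > 0) / len(sample),
    }

# Toy record with the same keys as the real log (values are made up).
toy_line = json.dumps(
    {"epoch": 0.0, "step": 1, "batch_size": 4, "sample": [0.5, -0.25, 0.25, 1.0]}
)
summary = summarize_record(toy_line)
print(summary)
```

To apply this to the real log, read margins.jsonl line by line and compare each recomputed value with the `mean`, `median`, `min`, `max`, and `pos_frac` fields stored in the same record.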

margin_logs/margins.jsonl (new file, 661 lines)

@@ -0,0 +1,661 @@
{"epoch": 0.0, "step": 1, "batch_size": 64, "mean": -0.0013527870178222656, "std": 0.2564818859100342, "min": -0.736083984375, "p10": -0.3432229995727539, "median": 0.038166046142578125, "p90": 0.29227676391601565, "max": 0.645111083984375, "pos_frac": 0.578125, "sample": [0.1120758056640625, 0.12518310546875, 0.31621551513671875, 0.13765716552734375, -0.12592506408691406, 0.23141098022460938, -0.21887779235839844, 0.21950721740722656, 0.04480743408203125, 0.020877838134765625, 0.0570220947265625, 0.058269500732421875, -0.4338226318359375, -0.030628204345703125, 0.645111083984375, -0.395477294921875, 0.09050941467285156, 0.0007190704345703125, -0.34615325927734375, 0.016077041625976562, -0.33638572692871094, 0.293853759765625, 0.17610931396484375, 0.22386932373046875, 0.21470260620117188, -0.08536529541015625, 0.0907745361328125, -0.03816986083984375, 0.39190101623535156, 0.16336441040039062, 0.08024787902832031, -0.031158447265625, 0.08477020263671875, 0.002460479736328125, -0.242034912109375, 0.07232666015625, -0.60186767578125, 0.20531463623046875, 0.155731201171875, -0.14299774169921875, -0.25698089599609375, 0.12331962585449219, -0.26497650146484375, 0.15140533447265625, -0.0920257568359375, -0.18599319458007812, 0.19028091430664062, 0.2496490478515625, 0.42162322998046875, 0.17873382568359375, -0.1525421142578125, -0.4972076416015625, 0.32010650634765625, -0.10365867614746094, -0.233795166015625, -0.19828224182128906, -0.4018898010253906, -0.13407135009765625, -0.09596633911132812, 0.031524658203125, 0.28859710693359375, -0.192962646484375, -0.736083984375, 0.3026123046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000001.npy"}
{"epoch": 0.0015117157974300832, "step": 2, "batch_size": 64, "mean": 0.03744968771934509, "std": 0.2875921130180359, "min": -0.7604827880859375, "p10": -0.2812448501586914, "median": 0.03963661193847656, "p90": 0.3654294967651367, "max": 0.8134727478027344, "pos_frac": 0.5625, "sample": [0.30594635009765625, -0.24289894104003906, -0.11509323120117188, -0.13417816162109375, 0.06942558288574219, 0.36568641662597656, -0.14640045166015625, 0.1497650146484375, 0.30261993408203125, 0.10124588012695312, 0.13028717041015625, -0.0031890869140625, 0.0361480712890625, 0.5662612915039062, 0.09694290161132812, -0.01091766357421875, 0.1128997802734375, 0.0411834716796875, -0.21860504150390625, -0.1236419677734375, -0.08812713623046875, 0.10360527038574219, 0.1790008544921875, -0.5114288330078125, 0.3056755065917969, -0.14553451538085938, 0.28168487548828125, 0.26990509033203125, 0.1686878204345703, 0.038089752197265625, 0.19541168212890625, -0.10783576965332031, -0.2644004821777344, -0.19707489013671875, -0.140472412109375, 0.1349811553955078, 0.19672012329101562, -0.0714111328125, 0.53369140625, 0.1271820068359375, 0.8134727478027344, 0.2990264892578125, -0.7604827880859375, -0.08274078369140625, 0.05890846252441406, 0.029361724853515625, 0.4510040283203125, -0.1599273681640625, -0.29346656799316406, 0.10005569458007812, -0.27509117126464844, -0.1937713623046875, 0.19167327880859375, 0.28173065185546875, -0.09406471252441406, -0.3380699157714844, -0.29186248779296875, 0.36483001708984375, 0.009979248046875, 0.44391632080078125, -0.126708984375, -0.6550216674804688, 0.6160736083984375, -0.28388214111328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000002.npy"}
{"epoch": 0.0030234315948601664, "step": 3, "batch_size": 64, "mean": -0.027670294046401978, "std": 0.322698175907135, "min": -0.9874267578125, "p10": -0.3776283264160156, "median": -0.036754608154296875, "p90": 0.32611446380615244, "max": 0.864410400390625, "pos_frac": 0.453125, "sample": [0.046783447265625, -0.22136878967285156, -0.35160064697265625, -0.6313018798828125, -0.4466705322265625, 0.5540084838867188, -0.16051483154296875, 0.05480194091796875, 0.29473876953125, -0.028900146484375, -0.0974578857421875, 0.16973876953125, -0.0107574462890625, 0.16780471801757812, -0.16024017333984375, 0.03995513916015625, -0.6391448974609375, -0.184539794921875, 0.015779495239257812, -0.1351776123046875, -0.9874267578125, -0.0447845458984375, -0.23479461669921875, -0.12485122680664062, -0.2223968505859375, 0.29577064514160156, 0.5475540161132812, 0.864410400390625, -0.07752037048339844, 0.04608154296875, 0.11844253540039062, -0.32825469970703125, 0.242095947265625, -0.175750732421875, -0.2863025665283203, -0.2856788635253906, -0.37226104736328125, -0.15032958984375, 0.1902027130126953, 0.6794776916503906, -0.08673095703125, -0.19676971435546875, 0.1570892333984375, 0.4781341552734375, 0.09525299072265625, 0.15586471557617188, -0.137481689453125, 0.35128021240234375, 0.16535568237304688, -0.5679855346679688, -0.25269317626953125, 0.33911895751953125, 0.25131988525390625, 0.2628803253173828, -0.11470413208007812, 0.13034820556640625, 0.0344085693359375, -0.04460906982421875, -0.1464080810546875, -0.0200347900390625, -0.5379486083984375, 0.23628997802734375, -0.3799285888671875, 0.087432861328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000003.npy"}
{"epoch": 0.0045351473922902496, "step": 4, "batch_size": 64, "mean": 0.023561745882034302, "std": 0.30187320709228516, "min": -0.7515411376953125, "p10": -0.3348857879638672, "median": 0.06153583526611328, "p90": 0.35387821197509767, "max": 0.949371337890625, "pos_frac": 0.59375, "sample": [-0.2685699462890625, 0.36951446533203125, 0.24127197265625, 0.34613037109375, 0.21721649169921875, -0.33870697021484375, 0.12327766418457031, 0.35719871520996094, 0.2555999755859375, 0.02918243408203125, -0.17435073852539062, 0.10840034484863281, -0.253021240234375, -0.4595184326171875, -0.42950439453125, 0.405609130859375, -0.0446624755859375, -0.2422637939453125, 0.15355682373046875, -0.46897125244140625, -0.2153778076171875, 0.022960662841796875, 0.14693450927734375, -0.7515411376953125, 0.07630538940429688, 0.22447967529296875, 0.43206024169921875, 0.2518501281738281, -0.025371551513671875, 0.133209228515625, 0.949371337890625, 0.209869384765625, 0.12271881103515625, 0.1160430908203125, -0.27310943603515625, 0.002410888671875, 0.465789794921875, 0.20409774780273438, -0.298980712890625, -0.1625518798828125, -0.0419464111328125, 0.028228759765625, 0.294830322265625, -0.2617053985595703, -0.25739288330078125, 0.6027755737304688, -0.205963134765625, 0.07630348205566406, -0.13094329833984375, 0.2648735046386719, 0.1624908447265625, -0.0666351318359375, -0.06882667541503906, -0.3259696960449219, 0.25689697265625, 0.0467681884765625, 0.12274932861328125, 0.28680419921875, 0.21025466918945312, -0.6083316802978516, -0.5483016967773438, -0.0035552978515625, 0.11532974243164062, 0.000659942626953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000004.npy"}
{"epoch": 0.006046863189720333, "step": 5, "batch_size": 64, "mean": -0.03837430477142334, "std": 0.3129001557826996, "min": -0.861541748046875, "p10": -0.4639053344726562, "median": -0.012578010559082031, "p90": 0.39830932617187503, "max": 0.550537109375, "pos_frac": 0.484375, "sample": [-0.3263435363769531, -0.1242218017578125, -0.08813095092773438, 0.411590576171875, 0.191070556640625, -0.434906005859375, 0.1630401611328125, 0.2364654541015625, 0.09989166259765625, 0.10230255126953125, 0.4005889892578125, -0.861541748046875, -0.050899505615234375, -0.08390426635742188, -0.41187286376953125, -0.14646148681640625, 0.13978958129882812, -0.02465057373046875, -0.42975616455078125, -0.036285400390625, -0.092193603515625, 0.16979217529296875, 0.016567230224609375, 0.175628662109375, -0.2607421875, 0.50299072265625, -0.48987579345703125, -0.0009326934814453125, 0.4316864013671875, 0.12384414672851562, -0.21154022216796875, -0.02422332763671875, -0.206298828125, 0.1860637664794922, 0.3929901123046875, -0.5152664184570312, -0.13303756713867188, 0.08219146728515625, 0.17136001586914062, -0.11474800109863281, -0.04344940185546875, -0.3484649658203125, -0.19715499877929688, 0.03501129150390625, 0.550537109375, -0.12249755859375, 0.3228607177734375, -0.548095703125, 0.21643829345703125, -0.3436431884765625, 6.866455078125e-05, -0.35691070556640625, 0.11964035034179688, 0.4053497314453125, -0.614166259765625, 0.4561347961425781, -0.8050422668457031, -0.16780853271484375, 0.0304412841796875, 0.07779884338378906, 0.0051937103271484375, -0.4763336181640625, 0.06329536437988281, 0.35482025146484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000005.npy"}
{"epoch": 0.007558578987150416, "step": 6, "batch_size": 64, "mean": -0.055038899183273315, "std": 0.40691402554512024, "min": -1.463165283203125, "p10": -0.4826988220214843, "median": -0.037400245666503906, "p90": 0.38625755310058596, "max": 0.9407119750976562, "pos_frac": 0.4375, "sample": [-0.3682212829589844, 0.0016536712646484375, 0.0486907958984375, -0.21327972412109375, -0.18287277221679688, -0.20042037963867188, -0.06482124328613281, -0.000720977783203125, -0.13634490966796875, -0.21142959594726562, -0.17778587341308594, -0.321258544921875, -0.3228263854980469, 0.9407119750976562, 0.4423980712890625, -0.91558837890625, 0.25870513916015625, 0.9159774780273438, -0.07039260864257812, -0.122467041015625, -0.08911895751953125, -0.16912841796875, 0.27858734130859375, 0.3211402893066406, -0.31908416748046875, -0.30197906494140625, -0.2920989990234375, 0.368988037109375, 0.09454345703125, -0.1017913818359375, 0.10835456848144531, 0.05019187927246094, 0.5434417724609375, 0.2714729309082031, -1.0592803955078125, 0.02309417724609375, -0.1947174072265625, -0.00811767578125, -0.3171348571777344, 0.010013580322265625, 0.06200408935546875, 0.226715087890625, 0.2500495910644531, 0.12969207763671875, -0.39666748046875, -0.009979248046875, 0.7748870849609375, 0.1090087890625, -0.2175445556640625, 0.1648406982421875, 0.4270477294921875, 0.37799835205078125, -0.5195693969726562, -0.21515846252441406, -0.3101043701171875, 0.04244804382324219, -0.1602458953857422, -0.6002273559570312, -1.463165283203125, -0.6121978759765625, -0.5561981201171875, 0.3897972106933594, 0.07479476928710938, -0.0077991485595703125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000006.npy"}
{"epoch": 0.009070294784580499, "step": 7, "batch_size": 64, "mean": -0.017774999141693115, "std": 0.2240528017282486, "min": -0.5420074462890625, "p10": -0.28635101318359374, "median": 0.0030078887939453125, "p90": 0.24654731750488285, "max": 0.5261497497558594, "pos_frac": 0.515625, "sample": [0.018749237060546875, -0.3482856750488281, 0.1627197265625, 0.0060672760009765625, -0.14526939392089844, 0.5261497497558594, 0.10579299926757812, 0.0784454345703125, -0.0196380615234375, 0.5219078063964844, -0.20299148559570312, 0.1563587188720703, 0.0912017822265625, -0.19240570068359375, 0.03522682189941406, -0.16657447814941406, -0.10858535766601562, 0.06052398681640625, 0.30219268798828125, 0.075836181640625, -0.018665313720703125, -0.0506591796875, -0.1562347412109375, 0.0784759521484375, 0.2193756103515625, 0.2512016296386719, -0.2633857727050781, -0.02239227294921875, -0.317169189453125, 0.10187530517578125, -0.2793121337890625, 0.2574310302734375, 0.001125335693359375, 0.4721832275390625, -0.21588134765625, 0.01697540283203125, 0.13708877563476562, 0.19731903076171875, 0.144989013671875, 0.0050029754638671875, 0.11632537841796875, -0.25887489318847656, -0.10922622680664062, -0.28936767578125, -0.0858917236328125, -0.1364593505859375, -0.2124786376953125, -0.031106948852539062, -0.0970306396484375, 0.2799072265625, -0.18595504760742188, -0.5420074462890625, 0.19642257690429688, 0.00489044189453125, -0.195037841796875, 0.0329132080078125, -0.5274734497070312, -0.39208984375, 0.079376220703125, 0.235687255859375, -0.21434783935546875, 0.09201812744140625, -0.04999542236328125, -0.36456298828125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000007.npy"}
{"epoch": 0.010582010582010581, "step": 8, "batch_size": 64, "mean": 0.07266855239868164, "std": 0.33058449625968933, "min": -0.5345611572265625, "p10": -0.34346523284912106, "median": 0.08408546447753906, "p90": 0.5087875366210939, "max": 0.8812408447265625, "pos_frac": 0.59375, "sample": [0.1582489013671875, 0.8812408447265625, 0.794189453125, 0.12393951416015625, 0.089874267578125, 0.12805938720703125, 0.15743255615234375, 0.5272598266601562, -0.4341297149658203, 0.5173797607421875, -0.03509521484375, 0.3254241943359375, -0.0263671875, -0.055553436279296875, 0.34606170654296875, 0.8552780151367188, 0.2130889892578125, -0.2997474670410156, 0.21392059326171875, 0.649810791015625, -0.10268783569335938, -0.5345611572265625, -0.46240234375, 0.087799072265625, 0.20705795288085938, -0.3142414093017578, -0.17353439331054688, -0.30223846435546875, -0.08201980590820312, -0.11625480651855469, 0.4623222351074219, 0.170501708984375, -0.187896728515625, -0.08603286743164062, 0.12701416015625, -0.4357948303222656, 0.3076286315917969, 0.11534881591796875, 0.7444915771484375, 0.03461456298828125, 0.488739013671875, -0.020555496215820312, 0.15703201293945312, -0.3161468505859375, 0.21321487426757812, -0.405029296875, 0.22660064697265625, -0.11557388305664062, -0.35517311096191406, 0.0473480224609375, 0.03820991516113281, 0.24283409118652344, -0.5194015502929688, -0.2559394836425781, 0.2261962890625, -0.060764312744140625, -0.2061176300048828, -0.19551849365234375, 0.08037185668945312, 0.18632888793945312, 0.017719268798828125, 0.0070648193359375, 0.4798126220703125, 0.10010528564453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000008.npy"}
{"epoch": 0.012093726379440665, "step": 9, "batch_size": 64, "mean": 0.04694512486457825, "std": 0.3240334093570709, "min": -1.075286865234375, "p10": -0.32922859191894527, "median": 0.05302906036376953, "p90": 0.4370166778564454, "max": 0.67425537109375, "pos_frac": 0.609375, "sample": [-0.29772186279296875, -0.01080322265625, -0.026111602783203125, 0.488006591796875, 0.05372047424316406, -0.5920791625976562, 0.12845611572265625, -0.2679100036621094, 0.0032196044921875, -1.075286865234375, -0.02841949462890625, -0.08751296997070312, 0.38604736328125, 0.0057392120361328125, 0.052337646484375, -0.3427314758300781, -0.058349609375, 0.344757080078125, 0.15961074829101562, 0.30467987060546875, 0.67425537109375, 0.07626533508300781, 0.4945831298828125, 0.023468017578125, -0.2692070007324219, 0.3562469482421875, -0.044010162353515625, 0.44556427001953125, 0.18609046936035156, -0.2965965270996094, 0.19045257568359375, -0.3621864318847656, 0.10209274291992188, 0.2312774658203125, -0.013734817504882812, 0.03979682922363281, 0.5010337829589844, -0.07647705078125, -0.74041748046875, 0.38172149658203125, 0.350738525390625, -0.11522293090820312, 0.0303955078125, -0.2541351318359375, -0.015460968017578125, 0.18868255615234375, 0.18097686767578125, 0.15821456909179688, 0.05039787292480469, -0.06728363037109375, 0.5311622619628906, 0.2579498291015625, 0.08657646179199219, 0.2498626708984375, 0.4170722961425781, 0.6261749267578125, 0.05851554870605469, -0.441009521484375, -0.078125, 0.24349212646484375, 0.17291259765625, -0.574127197265625, -0.2595672607421875, 0.1664276123046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000009.npy"}
{"epoch": 0.013605442176870748, "step": 10, "batch_size": 64, "mean": 0.01525232195854187, "std": 0.3680116534233093, "min": -0.89385986328125, "p10": -0.34790191650390623, "median": -0.008053779602050781, "p90": 0.44946136474609377, "max": 1.2289886474609375, "pos_frac": 0.484375, "sample": [-0.393951416015625, -3.814697265625e-05, 0.05810546875, 0.04958152770996094, 0.2644691467285156, -0.89385986328125, -0.14606475830078125, 0.10152435302734375, -0.19091796875, -0.1697845458984375, 0.0785675048828125, 0.3668937683105469, -0.29268646240234375, 0.052093505859375, -0.06463623046875, -0.07767868041992188, -0.11030197143554688, -0.47320556640625, 0.565826416015625, 0.450775146484375, 0.06462478637695312, 0.34528160095214844, 0.2425537109375, 0.3881378173828125, -0.2907142639160156, -0.4994354248046875, -0.3500213623046875, 0.051513671875, 0.11330032348632812, -0.3063812255859375, -0.049304962158203125, 0.824676513671875, 0.14358139038085938, 0.310638427734375, -0.03277587890625, 0.595703125, -0.19449234008789062, 0.3187694549560547, 0.3413047790527344, 0.029127120971679688, 0.12562179565429688, -0.05035209655761719, -0.2913055419921875, -0.1783447265625, -0.21746826171875, -0.11073684692382812, -0.694305419921875, -0.20661544799804688, 1.2289886474609375, -0.1757965087890625, -0.7978973388671875, 0.07631683349609375, -0.21542739868164062, 0.14493751525878906, 0.06579971313476562, -0.06180572509765625, 0.811126708984375, 0.044712066650390625, -0.139495849609375, 0.4463958740234375, -0.016069412231445312, -0.17859649658203125, 0.48862457275390625, -0.34295654296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000010.npy"}
{"epoch": 0.015117157974300832, "step": 11, "batch_size": 64, "mean": 0.03665819764137268, "std": 0.35121646523475647, "min": -1.524627685546875, "p10": -0.21020431518554689, "median": -0.0035123825073242188, "p90": 0.39149131774902346, "max": 0.8540496826171875, "pos_frac": 0.5, "sample": [0.46743011474609375, -0.06538581848144531, 0.2397003173828125, -0.06377410888671875, -0.09469413757324219, 0.2856712341308594, -0.0665740966796875, -0.05939674377441406, -0.0416259765625, -0.9261322021484375, -0.05326080322265625, 0.3946685791015625, 0.19425201416015625, 0.1606922149658203, -0.064788818359375, 0.11275482177734375, 0.3847808837890625, -0.020294189453125, 0.0028533935546875, 0.0549774169921875, 0.23106002807617188, -0.399078369140625, 0.733734130859375, 0.0263519287109375, -0.2080841064453125, -0.451507568359375, -0.16610336303710938, -0.03795433044433594, 0.004177093505859375, 0.2578392028808594, -0.009878158569335938, 0.1423969268798828, 0.13578414916992188, 0.8540496826171875, 0.08848762512207031, 0.21981048583984375, 0.09893798828125, 0.80560302734375, 0.38393402099609375, -0.22609710693359375, -0.142913818359375, -0.1363067626953125, -0.35941314697265625, 0.1775360107421875, -0.01570892333984375, -1.524627685546875, 0.3943672180175781, -0.16058349609375, -0.137451171875, -0.04195404052734375, 0.23507308959960938, -0.186370849609375, -0.17132568359375, -0.024017333984375, -0.08303070068359375, 0.0606842041015625, -0.09091758728027344, 0.20756912231445312, -0.17812728881835938, 0.2338104248046875, -0.21111297607421875, 0.6088027954101562, 0.3076934814453125, 0.25913238525390625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000011.npy"}
{"epoch": 0.016628873771730914, "step": 12, "batch_size": 64, "mean": 0.04242032766342163, "std": 0.3406548500061035, "min": -0.5034446716308594, "p10": -0.2965644836425781, "median": -0.006520271301269531, "p90": 0.44374103546142596, "max": 1.303955078125, "pos_frac": 0.484375, "sample": [-0.04155158996582031, 0.05204010009765625, -0.009134292602539062, 0.3545265197753906, -0.11997032165527344, -0.047637939453125, 0.16925430297851562, -0.194061279296875, -0.029542922973632812, -0.2162322998046875, -0.30059051513671875, -0.058345794677734375, -0.24224853515625, 1.303955078125, 0.06989669799804688, -0.14096832275390625, 0.21587371826171875, -0.00390625, 0.5686721801757812, 0.2649192810058594, -0.09281539916992188, 0.07855224609375, -0.470489501953125, 0.26513671875, 0.058902740478515625, -0.07347869873046875, -0.28717041015625, -0.49842071533203125, 0.08116912841796875, 0.4649772644042969, -0.3173332214355469, -0.0485687255859375, -0.5034446716308594, -0.264923095703125, -0.12412643432617188, 0.12307167053222656, -0.2415771484375, 0.4997520446777344, -0.13927078247070312, -0.09079933166503906, 0.8114242553710938, -0.18625450134277344, -0.12286376953125, 0.058315277099609375, 0.20912551879882812, 0.3567066192626953, 0.06430816650390625, 0.00133514404296875, -0.26819610595703125, 0.48250579833984375, 0.117462158203125, 0.22597122192382812, 0.39418983459472656, -0.43145751953125, 1.1262969970703125, 0.36602020263671875, -0.1302032470703125, 0.02783203125, -0.4331645965576172, -0.20805740356445312, 0.164093017578125, 0.0379486083984375, 0.1271820068359375, -0.08971023559570312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000012.npy"}
{"epoch": 0.018140589569160998, "step": 13, "batch_size": 64, "mean": -0.03977331519126892, "std": 0.30139175057411194, "min": -0.829742431640625, "p10": -0.4104103088378906, "median": -0.005412101745605469, "p90": 0.35236053466796885, "max": 0.5204734802246094, "pos_frac": 0.5, "sample": [-0.42974853515625, -0.02117156982421875, 0.15972900390625, 0.0034332275390625, 0.505126953125, -0.7554130554199219, -0.3602256774902344, -0.2783241271972656, -0.13507080078125, 0.0661468505859375, 0.03579139709472656, 0.01483154296875, 0.17401504516601562, -0.077972412109375, 0.4228935241699219, 0.3324432373046875, -0.33660888671875, -0.1552734375, 0.1817779541015625, 0.10791015625, -0.36528778076171875, -0.2571449279785156, -0.221160888671875, 0.4115753173828125, -0.16807174682617188, 0.04050445556640625, 0.08197021484375, -0.1472759246826172, -0.21498870849609375, -0.15838623046875, 0.1357860565185547, -0.5899887084960938, 0.5204734802246094, -0.55792236328125, 0.11527252197265625, 0.1387786865234375, 0.00969696044921875, -0.5635528564453125, 0.13715744018554688, -0.2761993408203125, 0.04762077331542969, 0.3608245849609375, 0.21913909912109375, -0.2051105499267578, -0.35195159912109375, -0.05017280578613281, -0.03571319580078125, -0.28717803955078125, -0.4713287353515625, -0.07627105712890625, 0.029666900634765625, 0.08950042724609375, -0.05086326599121094, -0.014257431030273438, 0.332611083984375, -0.2310028076171875, -0.829742431640625, 0.42169952392578125, 0.21408843994140625, 0.3322181701660156, 0.25907135009765625, 0.45941162109375, 0.012714385986328125, -0.24599266052246094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000013.npy"}
{"epoch": 0.019652305366591082, "step": 14, "batch_size": 64, "mean": 0.03765037655830383, "std": 0.3069196045398712, "min": -0.5273971557617188, "p10": -0.28827476501464844, "median": 0.005507469177246094, "p90": 0.2977220535278321, "max": 1.2598114013671875, "pos_frac": 0.515625, "sample": [-0.09005355834960938, 0.051239013671875, -0.1417999267578125, 0.0484619140625, 0.15281105041503906, 0.18260574340820312, -0.16509246826171875, 0.30743408203125, 0.010829925537109375, -0.04467582702636719, -0.0515289306640625, 0.049671173095703125, 0.1336688995361328, -0.2632942199707031, -0.2261199951171875, -0.041164398193359375, 0.02535247802734375, -0.14032745361328125, 0.0770416259765625, 0.00536346435546875, -0.2464141845703125, -0.5010528564453125, 0.05378532409667969, -0.0310821533203125, 0.15396881103515625, -0.32562255859375, -0.40716552734375, 0.10724639892578125, 0.27506065368652344, 1.2598114013671875, 0.100982666015625, 1.056640625, 0.10390472412109375, 0.6278076171875, -0.15990066528320312, -0.1057281494140625, 0.4344329833984375, -0.5273971557617188, 0.1681652069091797, -0.0243377685546875, 0.658203125, 0.6268405914306641, -0.16526031494140625, 0.209564208984375, 0.026599884033203125, -0.0074596405029296875, -0.06545639038085938, -0.05962371826171875, 0.0056514739990234375, -0.023378372192382812, -0.298980712890625, -0.4377784729003906, 0.097808837890625, 0.0679168701171875, -0.01587677001953125, -0.009735107421875, -0.3279571533203125, -0.09083366394042969, -0.0587005615234375, 0.027967453002929688, 0.1845226287841797, 0.00731658935546875, -0.003902435302734375, 0.16864776611328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000014.npy"}
{"epoch": 0.021164021164021163, "step": 15, "batch_size": 64, "mean": -0.01783338189125061, "std": 0.36869484186172485, "min": -1.0941009521484375, "p10": -0.46094207763671874, "median": 0.011754989624023438, "p90": 0.3139869689941407, "max": 1.2339553833007812, "pos_frac": 0.5, "sample": [0.29691314697265625, -0.1010894775390625, -0.09047126770019531, -0.002124786376953125, -0.2663383483886719, 0.11711883544921875, -1.0941009521484375, -0.4528350830078125, 0.369598388671875, -0.301727294921875, 0.50909423828125, 0.05803680419921875, 0.025634765625, 0.12934112548828125, 0.2147369384765625, 0.07164955139160156, 0.4622650146484375, -0.0751190185546875, -0.04464149475097656, -0.4215869903564453, 0.10166168212890625, -0.21106719970703125, -0.08166122436523438, -0.5236701965332031, -0.16498565673828125, 1.2339553833007812, 0.6105117797851562, -0.033355712890625, 0.3213043212890625, -0.37697601318359375, 0.17394256591796875, 0.046962738037109375, -0.6634521484375, -0.6594009399414062, 0.23714447021484375, 0.14867782592773438, 0.17362213134765625, -0.003936767578125, -0.033176422119140625, 0.0657501220703125, -0.094451904296875, 0.6194381713867188, -0.08962249755859375, -0.21483612060546875, 0.14466476440429688, -0.46441650390625, -0.16512298583984375, -0.23291778564453125, 0.152801513671875, -0.9440765380859375, -0.740570068359375, 0.1445484161376953, -0.1483306884765625, 0.09911346435546875, -0.19569015502929688, 0.1665802001953125, 0.1792144775390625, 0.10318756103515625, 0.21990203857421875, -0.028497695922851562, 0.282196044921875, -0.14594268798828125, 0.17763519287109375, 0.2676525115966797], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000015.npy"}
{"epoch": 0.022675736961451247, "step": 16, "batch_size": 64, "mean": 0.003320828080177307, "std": 0.3384479582309723, "min": -0.6446533203125, "p10": -0.3680244445800781, "median": -0.032579898834228516, "p90": 0.339632797241211, "max": 1.230865478515625, "pos_frac": 0.4375, "sample": [-0.33226585388183594, 0.21030426025390625, -0.3592071533203125, 0.3268394470214844, 0.18939590454101562, -0.2101287841796875, 0.06139183044433594, 0.250030517578125, 1.230865478515625, 0.037322998046875, -0.004302978515625, 0.26680755615234375, 0.32212066650390625, 0.0292510986328125, -0.053928375244140625, -0.0068759918212890625, 0.031009674072265625, -0.1092071533203125, -0.041110992431640625, -0.6446533203125, 0.30136871337890625, -0.3736228942871094, 0.1006622314453125, 0.6262893676757812, 0.08798408508300781, -0.5662994384765625, -0.1667327880859375, -0.19020843505859375, -0.3794670104980469, -0.32471466064453125, 0.2704925537109375, 0.3189849853515625, -0.11414337158203125, -0.09320831298828125, 0.34511566162109375, -0.3498992919921875, 0.37237548828125, -0.3137054443359375, -0.061534881591796875, -0.3039531707763672, -0.42471885681152344, -0.10081863403320312, -0.24521636962890625, -0.07233047485351562, 0.7023773193359375, -0.08172416687011719, 0.01340484619140625, -0.29868316650390625, 0.15999221801757812, 0.0048618316650390625, 0.27288055419921875, -0.09547615051269531, -0.024048805236816406, -0.37180328369140625, -0.41559600830078125, 0.07434463500976562, -0.27040863037109375, 0.721771240234375, -0.23834228515625, 0.015932083129882812, -0.0045070648193359375, -0.046031951904296875, 0.79986572265625, -0.24263381958007812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000016.npy"}
{"epoch": 0.02418745275888133, "step": 17, "batch_size": 64, "mean": 0.00871431827545166, "std": 0.35241279006004333, "min": -0.8102836608886719, "p10": -0.3547260284423828, "median": -0.008581161499023438, "p90": 0.3361181259155275, "max": 1.470489501953125, "pos_frac": 0.46875, "sample": [0.0247955322265625, -0.020843505859375, 0.060115814208984375, -0.12514495849609375, -0.251739501953125, 0.2919578552246094, -0.4202117919921875, -0.0522003173828125, 0.20542335510253906, -0.40454864501953125, -0.684783935546875, -0.006298065185546875, -0.20744705200195312, 0.0168914794921875, -0.2597675323486328, -0.06604385375976562, -0.0414886474609375, -0.17816925048828125, -0.007030487060546875, -0.31638336181640625, 0.033538818359375, 0.2755775451660156, -0.450927734375, 0.23419952392578125, -0.13874053955078125, -0.3496818542480469, 0.0539398193359375, 0.022806167602539062, 0.15656089782714844, -0.23528671264648438, 0.30901336669921875, -0.243865966796875, 0.12262725830078125, 0.16020584106445312, 0.8981781005859375, 1.470489501953125, -0.09093475341796875, 0.5746192932128906, 0.1342926025390625, 0.501678466796875, -0.18023681640625, -0.17819595336914062, -0.026277542114257812, 0.17186355590820312, 0.117523193359375, 0.2447967529296875, 0.1727752685546875, -0.0101318359375, 0.3477344512939453, 0.655975341796875, -0.13162994384765625, -0.6944503784179688, -0.8102836608886719, -0.14117050170898438, -0.3568878173828125, 0.17711257934570312, -0.04456901550292969, 0.23004150390625, -0.2118072509765625, -0.0435791015625, 0.37421417236328125, 0.01802825927734375, 0.028104782104492188, -0.1466064453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000017.npy"}
{"epoch": 0.025699168556311415, "step": 18, "batch_size": 64, "mean": -0.008108556270599365, "std": 0.27705371379852295, "min": -0.5421104431152344, "p10": -0.29163074493408203, "median": -0.03574180603027344, "p90": 0.3552059173583987, "max": 0.7672271728515625, "pos_frac": 0.4375, "sample": [0.08879852294921875, -0.13378524780273438, -0.18549728393554688, -0.036556243896484375, -0.2372875213623047, -0.5421104431152344, 0.13181686401367188, 0.26969146728515625, 0.6096878051757812, -0.0349273681640625, 0.286712646484375, -0.51251220703125, -0.11101341247558594, 0.13582611083984375, -0.10218048095703125, -0.2777862548828125, 0.18912124633789062, -0.20660400390625, 0.020755767822265625, -0.14129638671875, 0.42275238037109375, 0.7672271728515625, -0.174530029296875, 0.5524520874023438, -0.21227645874023438, -0.08037567138671875, -0.22574615478515625, 0.013217926025390625, 0.2941551208496094, 0.22461509704589844, 0.1256389617919922, 0.07611846923828125, -0.2700996398925781, -0.2326812744140625, -0.10255241394042969, -0.01848602294921875, 0.19386672973632812, 0.07853507995605469, 0.06293487548828125, -0.13409423828125, -0.08187103271484375, -0.48169708251953125, -0.27411651611328125, -0.337249755859375, -0.28590965270996094, 0.4165973663330078, 0.08732986450195312, 0.037616729736328125, -0.2940826416015625, -0.4748077392578125, -0.32427215576171875, -0.011951446533203125, -0.21540451049804688, -0.11411857604980469, 0.38137054443359375, -0.0082244873046875, -0.08015823364257812, -0.11793327331542969, 0.10683441162109375, 0.23767471313476562, 0.2539520263671875, -0.2320556640625, 0.6166267395019531, 0.105377197265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000018.npy"}
{"epoch": 0.027210884353741496, "step": 19, "batch_size": 64, "mean": 0.04513297975063324, "std": 0.31944769620895386, "min": -1.2708740234375, "p10": -0.24649486541748045, "median": 0.02225494384765625, "p90": 0.3876319885253907, "max": 0.9148483276367188, "pos_frac": 0.515625, "sample": [0.68255615234375, -0.1275634765625, -0.23717689514160156, -0.507537841796875, 0.1508331298828125, 0.2813072204589844, -0.118438720703125, -0.08951568603515625, -0.183502197265625, 0.11301422119140625, 0.174041748046875, 0.369415283203125, 0.24051666259765625, 0.08562088012695312, -0.1618194580078125, -0.11005783081054688, 0.041339874267578125, -0.054279327392578125, 0.29239654541015625, 0.22471046447753906, -0.18503570556640625, -0.13326263427734375, -0.074859619140625, -0.1318359375, 0.5539703369140625, 0.3060302734375, -0.1344757080078125, -0.00734710693359375, -0.25048828125, -0.1521129608154297, 0.39543914794921875, 0.003170013427734375, 0.20917510986328125, 0.435638427734375, -1.2708740234375, 0.1620941162109375, -0.20853424072265625, -0.050571441650390625, 0.31061553955078125, 0.11199378967285156, -0.10368156433105469, -0.3188037872314453, 0.566192626953125, 0.3034858703613281, -0.21825408935546875, 0.21814537048339844, 0.090911865234375, -0.0673980712890625, -0.04594612121582031, 0.28905487060546875, 0.48320770263671875, 0.1963214874267578, -0.3339729309082031, -0.03544330596923828, -0.13177490234375, 0.3262977600097656, -0.45484161376953125, 0.10869598388671875, -0.05196189880371094, -0.31771087646484375, 0.0454254150390625, 0.1395263671875, 0.9148483276367188, 0.33159637451171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000019.npy"}
{"epoch": 0.02872260015117158, "step": 20, "batch_size": 64, "mean": -0.009545668959617615, "std": 0.2801485061645508, "min": -0.81488037109375, "p10": -0.3431053161621094, "median": -0.005894184112548828, "p90": 0.3039026260375977, "max": 0.9271087646484375, "pos_frac": 0.484375, "sample": [-0.220977783203125, -0.1798858642578125, -0.3690032958984375, -0.4846153259277344, 0.11968994140625, -0.14386749267578125, 0.0080413818359375, -0.05426025390625, 0.14044952392578125, -0.345245361328125, 0.18365478515625, 0.17922592163085938, -0.2523651123046875, 0.05066680908203125, -0.0566253662109375, -0.3597526550292969, -0.33811187744140625, -0.12380790710449219, 0.010868072509765625, -0.23963165283203125, 0.06572723388671875, -0.03222084045410156, -0.17803382873535156, 0.0338134765625, -0.08151435852050781, 0.05194091796875, 0.3074016571044922, -0.3116493225097656, 0.23390960693359375, -0.0007333755493164062, -0.153228759765625, -0.335174560546875, -0.0697174072265625, 0.223236083984375, -0.81488037109375, 0.34989166259765625, -0.030120849609375, 0.127044677734375, 0.65802001953125, 0.08997344970703125, -0.11279296875, -0.26611328125, 0.34656524658203125, -0.4117279052734375, 0.08929443359375, 0.9271087646484375, 0.29573822021484375, -0.16211700439453125, 0.3898506164550781, 0.033416748046875, -0.39928436279296875, -0.192474365234375, 0.5302467346191406, 0.20825958251953125, -0.01195526123046875, 0.09334182739257812, 0.058338165283203125, -0.01105499267578125, 0.23244476318359375, 0.21392059326171875, -0.029077529907226562, 0.133209228515625, -0.243011474609375, 0.018819808959960938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000020.npy"}
{"epoch": 0.030234315948601664, "step": 21, "batch_size": 64, "mean": 0.004698842763900757, "std": 0.36607056856155396, "min": -0.9580535888671875, "p10": -0.33912696838378903, "median": -0.06434440612792969, "p90": 0.5297363281250003, "max": 1.1309432983398438, "pos_frac": 0.40625, "sample": [0.18019676208496094, -0.14322662353515625, 0.055545806884765625, -0.4506683349609375, 0.9453353881835938, 0.45001220703125, 0.14050865173339844, -0.17824935913085938, -0.11762809753417969, -0.13483619689941406, -0.09111785888671875, -0.25012969970703125, 0.2323455810546875, 0.1729888916015625, -0.05590057373046875, 0.56390380859375, -0.6243743896484375, 0.6424179077148438, -0.11534881591796875, -0.11406326293945312, -0.5851287841796875, 0.10016059875488281, 0.15658187866210938, -0.05068206787109375, -0.0820770263671875, -0.9580535888671875, -0.08719253540039062, 1.1309432983398438, 0.6366043090820312, 0.03798484802246094, -0.07867431640625, 0.12202644348144531, -0.200103759765625, -0.06638145446777344, -0.2389678955078125, 0.6276531219482422, 0.8017120361328125, 0.41803741455078125, -0.19547271728515625, -0.2879676818847656, -0.07332992553710938, -0.10338592529296875, 0.16851806640625, 0.28064537048339844, 0.0806121826171875, -0.17428207397460938, -0.3100128173828125, -0.08180999755859375, -0.05942535400390625, -0.06541252136230469, -0.053020477294921875, -0.46063232421875, 0.18091583251953125, -0.120513916015625, -0.06327629089355469, -0.13584136962890625, 0.03031158447265625, -0.04339599609375, 0.1584930419921875, 0.0122833251953125, -0.24199676513671875, -0.3516044616699219, 0.18167877197265625, -0.7635059356689453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000021.npy"}
{"epoch": 0.031746031746031744, "step": 22, "batch_size": 64, "mean": 0.031166434288024902, "std": 0.2844938039779663, "min": -0.8590850830078125, "p10": -0.32732696533203126, "median": 0.03526592254638672, "p90": 0.3818088531494141, "max": 0.7165374755859375, "pos_frac": 0.609375, "sample": [-0.49816131591796875, -0.5889663696289062, -0.14412689208984375, 0.3438072204589844, 0.3849067687988281, 0.44503021240234375, -0.023494720458984375, 0.01251220703125, 0.011289596557617188, 0.07738304138183594, 0.1632232666015625, -0.29742431640625, 0.055065155029296875, 0.01422119140625, 0.258148193359375, 0.02256011962890625, -0.3618011474609375, -0.1124420166015625, -0.34143829345703125, -0.32877349853515625, 0.37458038330078125, 0.1285400390625, 0.03479766845703125, -0.05198478698730469, -0.11535453796386719, 0.4623260498046875, 0.1341991424560547, -0.16085052490234375, -0.10584068298339844, 0.03573417663574219, -0.8590850830078125, -0.46612548828125, -0.18890762329101562, 0.15459823608398438, 0.1023101806640625, -0.15325927734375, -0.13184738159179688, 0.1318511962890625, -0.32395172119140625, 0.06494140625, 0.3527870178222656, -0.2394256591796875, 0.7165374755859375, 0.10870361328125, 0.010986328125, 0.42978668212890625, 0.0677032470703125, 0.144622802734375, 0.2059478759765625, 0.320953369140625, 0.012516021728515625, 0.571075439453125, -0.09157943725585938, 0.5070419311523438, -0.012298583984375, 0.0391998291015625, 0.11401557922363281, 0.23172378540039062, 0.2550048828125, -0.13962173461914062, 0.312530517578125, -0.18256378173828125, 0.1526031494140625, -0.051788330078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000022.npy"}
{"epoch": 0.03325774754346183, "step": 23, "batch_size": 64, "mean": 0.06387221813201904, "std": 0.36412501335144043, "min": -0.81903076171875, "p10": -0.3677883148193359, "median": 0.057666778564453125, "p90": 0.508061981201172, "max": 1.2723007202148438, "pos_frac": 0.53125, "sample": [-0.5405120849609375, -0.599578857421875, -0.27289581298828125, 0.120330810546875, 0.13704299926757812, 0.10870742797851562, -0.335784912109375, 0.078033447265625, 0.27335357666015625, 0.58380126953125, 0.11283111572265625, -0.013057708740234375, 0.3112640380859375, -0.10695838928222656, 0.08017349243164062, -0.279388427734375, -0.46539306640625, 0.3972816467285156, 0.03730010986328125, 0.333648681640625, -0.23278045654296875, -0.03440093994140625, -0.0007495880126953125, -0.04170989990234375, -0.14728164672851562, 1.1084747314453125, -0.1936187744140625, -0.049591064453125, 1.2723007202148438, -0.08160400390625, -0.81903076171875, 0.2373809814453125, 0.14453887939453125, 0.4510173797607422, 0.7296943664550781, 0.031101226806640625, -0.4688262939453125, -0.412689208984375, 0.1576385498046875, -0.3815040588378906, -0.0311279296875, 0.0793914794921875, -0.1680908203125, -0.005706787109375, -0.06050872802734375, 0.595733642578125, -0.14046859741210938, 0.5324974060058594, 0.30031585693359375, -0.169830322265625, 0.1209869384765625, -0.23853683471679688, -0.15569686889648438, 0.08386802673339844, -0.07404708862304688, 0.11795234680175781, 0.47296905517578125, -0.050579071044921875, 0.134033203125, 0.523101806640625, 0.38047027587890625, 0.21877670288085938, 0.23760032653808594, 0.156158447265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000023.npy"}
{"epoch": 0.03476946334089191, "step": 24, "batch_size": 64, "mean": -0.09060648083686829, "std": 0.2680717706680298, "min": -0.5941505432128906, "p10": -0.4525730133056641, "median": -0.08649826049804688, "p90": 0.24691066741943363, "max": 0.5813751220703125, "pos_frac": 0.359375, "sample": [-0.24442672729492188, 0.06917953491210938, -0.19195556640625, 0.08857345581054688, 0.4801788330078125, -0.4021492004394531, 0.1493072509765625, -0.50567626953125, -0.4417076110839844, -0.4572296142578125, -0.11577033996582031, -0.31392478942871094, 0.07757949829101562, 0.368743896484375, 0.5813751220703125, 0.2515106201171875, -0.008884429931640625, -0.11207771301269531, -0.25738525390625, -0.02782440185546875, -0.08501434326171875, 0.43288421630859375, 0.1275634765625, -0.4110908508300781, 0.36858367919921875, -0.4132728576660156, 0.0485687255859375, -0.5941505432128906, -0.5579833984375, -0.4739227294921875, 0.2361774444580078, -0.1761608123779297, -0.33983612060546875, -0.087982177734375, -0.2773284912109375, 0.0285186767578125, -0.5153770446777344, 0.23218536376953125, -0.272735595703125, -0.28990936279296875, -0.0198211669921875, -0.006011962890625, 0.16135025024414062, -0.05168914794921875, -0.072784423828125, 0.068572998046875, -0.16027069091796875, -0.2720489501953125, -0.11118125915527344, -0.261962890625, -0.29292869567871094, 0.03208160400390625, 0.26380157470703125, 0.06063079833984375, -0.4974517822265625, 0.06655120849609375, -0.017894744873046875, 0.0537109375, -0.163543701171875, -0.13628005981445312, 0.14708709716796875, -0.01313018798828125, -0.35218048095703125, -0.19057464599609375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000024.npy"}
{"epoch": 0.036281179138321996, "step": 25, "batch_size": 64, "mean": 0.013779401779174805, "std": 0.290467232465744, "min": -0.7797470092773438, "p10": -0.299188232421875, "median": 0.01091766357421875, "p90": 0.33541526794433607, "max": 0.8313140869140625, "pos_frac": 0.515625, "sample": [0.0789947509765625, -0.3176422119140625, -0.19440269470214844, 0.2560577392578125, 0.16628265380859375, 0.06212806701660156, 0.8313140869140625, -0.008565902709960938, 0.1879730224609375, -0.059230804443359375, -0.55352783203125, -0.0071926116943359375, -0.556549072265625, 0.1381988525390625, 0.04833984375, 0.23694992065429688, -0.7797470092773438, 0.13891983032226562, 0.02672576904296875, -0.03234100341796875, -0.24208831787109375, 0.015367507934570312, 0.1780548095703125, 0.7815704345703125, 0.3497810363769531, -0.06005287170410156, -0.1173553466796875, 0.462982177734375, -0.08081436157226562, -0.17670059204101562, 0.00836944580078125, 0.06375885009765625, 0.18384552001953125, 0.14150238037109375, 0.53643798828125, -0.12639617919921875, -0.02557373046875, -0.22109603881835938, 0.17395591735839844, -0.1839141845703125, 0.0988311767578125, -0.17639541625976562, 0.223388671875, -0.010894775390625, -0.00897979736328125, 0.01346588134765625, 0.401611328125, -0.25388336181640625, 0.23659515380859375, -0.5454254150390625, -0.024799346923828125, 0.023105621337890625, -0.305633544921875, -0.054927825927734375, -0.24558258056640625, 0.09124374389648438, 0.3018951416015625, 0.28412628173828125, 0.25209808349609375, -0.25089263916015625, -0.36664581298828125, 0.3843994140625, -0.2249889373779297, -0.284149169921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000025.npy"}
{"epoch": 0.03779289493575208, "step": 26, "batch_size": 64, "mean": 0.047515541315078735, "std": 0.3367118835449219, "min": -0.6785125732421875, "p10": -0.3442237854003906, "median": 0.03556060791015625, "p90": 0.437755584716797, "max": 1.42486572265625, "pos_frac": 0.53125, "sample": [0.10715103149414062, -0.1619415283203125, -0.2037200927734375, 0.27176666259765625, 0.0499420166015625, 0.15708351135253906, -0.458343505859375, -0.44850921630859375, -0.27402496337890625, 0.18671035766601562, -0.22732925415039062, 1.42486572265625, -0.0301666259765625, -0.5659866333007812, -0.6785125732421875, 0.33284759521484375, 0.0211029052734375, 0.2609882354736328, 0.6672821044921875, 0.23744964599609375, 0.5572662353515625, -0.0191497802734375, -0.33443450927734375, -0.11338233947753906, 0.311553955078125, -0.2633514404296875, 0.132080078125, 0.0297393798828125, 0.11230850219726562, -0.07486724853515625, -0.07004165649414062, 0.8012580871582031, -0.23142242431640625, 0.15799331665039062, -0.06438446044921875, 0.4093170166015625, 0.047698974609375, 0.06340789794921875, 0.1394805908203125, 0.3477325439453125, -0.25867652893066406, -0.10297393798828125, -0.10753250122070312, -0.07455825805664062, 0.4649810791015625, -0.348419189453125, 0.147430419921875, 0.12090873718261719, -0.0191650390625, -0.08624267578125, 0.5388107299804688, 0.0413818359375, 0.2967071533203125, 0.44994354248046875, -0.09916877746582031, -0.0225830078125, -0.4250335693359375, -0.09346389770507812, 0.12355804443359375, 0.12494850158691406, -0.39519500732421875, 0.1349468231201172, 0.074554443359375, -0.05162239074707031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000026.npy"}
{"epoch": 0.039304610733182165, "step": 27, "batch_size": 64, "mean": -0.012165874242782593, "std": 0.32004043459892273, "min": -0.9730377197265625, "p10": -0.35468025207519527, "median": 0.0058727264404296875, "p90": 0.36933593750000016, "max": 0.646697998046875, "pos_frac": 0.515625, "sample": [-0.20418167114257812, 0.1224212646484375, -0.14281463623046875, 0.011501312255859375, 0.26507568359375, 0.1304931640625, 0.11534500122070312, -0.43843841552734375, 0.0919036865234375, -0.9730377197265625, -0.073272705078125, -0.8027801513671875, 0.5043430328369141, 0.025981903076171875, -0.22511672973632812, -0.14849090576171875, 0.30135536193847656, -0.19029998779296875, -0.6429672241210938, -0.21645355224609375, 0.4002532958984375, 0.033721923828125, -0.56695556640625, 0.094024658203125, 0.23348236083984375, 0.3277435302734375, -0.05681610107421875, -0.034389495849609375, 0.2950286865234375, -0.2068462371826172, -0.0826568603515625, 0.1604442596435547, -0.11384010314941406, -0.1199798583984375, 0.2952136993408203, -0.8375244140625, 0.3871612548828125, 0.646697998046875, 0.0377044677734375, 0.3210563659667969, 0.39316558837890625, -0.38327789306640625, -0.11180686950683594, -0.06708908081054688, -0.2879524230957031, 0.18296051025390625, -0.2066802978515625, 0.000244140625, 0.09426498413085938, -0.027141571044921875, 0.09110641479492188, -0.20271682739257812, -0.03220367431640625, -0.2332305908203125, 0.17307090759277344, 0.029710769653320312, 0.576202392578125, -0.07087326049804688, 0.067962646484375, -0.035167694091796875, 0.055263519287109375, -0.28057861328125, 0.4861869812011719, 0.2858734130859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000027.npy"}
{"epoch": 0.04081632653061224, "step": 28, "batch_size": 64, "mean": -0.051616936922073364, "std": 0.2624434232711792, "min": -0.6728744506835938, "p10": -0.39627380371093746, "median": -0.06497478485107422, "p90": 0.3117889404296876, "max": 0.5880928039550781, "pos_frac": 0.421875, "sample": [-0.18291473388671875, -0.495635986328125, -0.126922607421875, -0.20458221435546875, -0.1634368896484375, 0.077117919921875, -0.1159210205078125, -0.18820953369140625, 0.1362934112548828, -0.36688232421875, -0.2738075256347656, 0.01490020751953125, -0.05390167236328125, -0.40410614013671875, -0.060577392578125, 0.23690414428710938, 0.134521484375, -0.4052886962890625, -0.08066940307617188, 0.2687339782714844, 0.045318603515625, -0.4868507385253906, 0.07001304626464844, -0.0968017578125, 0.22092628479003906, 0.054607391357421875, -0.22908401489257812, -0.05515289306640625, -0.3656501770019531, -0.26788330078125, -0.20545196533203125, -0.22540664672851562, 0.3217201232910156, 0.19303131103515625, 0.34886932373046875, 0.0565338134765625, -0.13025665283203125, -0.5887832641601562, 0.053203582763671875, -0.4899139404296875, -0.07779121398925781, -0.37799835205078125, -0.2608489990234375, 0.43821144104003906, 0.5880928039550781, -0.08197784423828125, 0.27775001525878906, -0.12799072265625, -0.1695384979248047, 0.3369636535644531, 0.16469573974609375, 0.36685943603515625, 0.10640716552734375, -0.00103759765625, 0.34710693359375, 0.0284423828125, -0.06937217712402344, 0.00839996337890625, -0.6728744506835938, -0.0102996826171875, -0.1348419189453125, 0.2886161804199219, 0.010374069213867188, -0.2494354248046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000028.npy"}
{"epoch": 0.042328042328042326, "step": 29, "batch_size": 64, "mean": 0.002046048641204834, "std": 0.28664064407348633, "min": -0.6049346923828125, "p10": -0.3771804809570312, "median": 0.00034236907958984375, "p90": 0.44926185607910163, "max": 0.695068359375, "pos_frac": 0.5, "sample": [0.09043502807617188, -0.20843124389648438, 0.4623241424560547, -0.3905792236328125, 0.5044708251953125, -0.10173988342285156, 0.1611480712890625, -0.05863761901855469, -0.47090911865234375, 0.14398193359375, 0.03270149230957031, 0.4325218200683594, 0.4666748046875, -0.3044776916503906, 0.3406639099121094, 0.11580657958984375, -0.6049346923828125, -0.5359992980957031, -0.1509857177734375, 0.19024658203125, 0.0058498382568359375, 0.1446990966796875, -0.1222076416015625, -0.31730079650878906, -0.5254745483398438, -0.1170196533203125, -0.176971435546875, -0.4117546081542969, -0.2512054443359375, 0.00695037841796875, -0.017070770263671875, -0.1922607421875, -0.39337158203125, 0.1681365966796875, -0.2447509765625, 0.10949134826660156, 0.00628662109375, 0.09047317504882812, 0.07569503784179688, -0.345916748046875, -0.1496105194091797, -0.10970687866210938, -0.006824493408203125, 0.3523216247558594, 0.0020427703857421875, -0.199127197265625, 0.3294219970703125, 0.695068359375, -0.11336135864257812, 0.19685745239257812, 0.4589080810546875, 0.03186607360839844, 0.6157302856445312, 0.07092666625976562, -0.16017913818359375, 0.030536651611328125, -0.10326385498046875, -0.05487060546875, -0.10706329345703125, 0.4564361572265625, -0.146514892578125, 0.08843612670898438, -0.0013580322265625, 0.34771728515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000029.npy"}
{"epoch": 0.04383975812547241, "step": 30, "batch_size": 64, "mean": -0.010556995868682861, "std": 0.3330932557582855, "min": -0.913055419921875, "p10": -0.38639030456542967, "median": -0.05343914031982422, "p90": 0.39983673095703137, "max": 1.0400390625, "pos_frac": 0.421875, "sample": [-0.23824691772460938, -0.0045566558837890625, -0.2248687744140625, -0.43675994873046875, -0.09403800964355469, -0.044036865234375, 0.07750320434570312, 0.167999267578125, -0.8815231323242188, -0.41941070556640625, -0.227142333984375, 0.25453948974609375, -0.050506591796875, 0.15309906005859375, -0.13332366943359375, 0.135284423828125, -0.1294708251953125, 0.41144561767578125, -0.153350830078125, 0.37274932861328125, -0.33177947998046875, 0.01760101318359375, -0.030103683471679688, 0.1374835968017578, -0.02484130859375, -0.2811279296875, -0.19259262084960938, -0.3970985412597656, 0.26092529296875, -0.09711456298828125, -0.0734405517578125, -0.05637168884277344, -0.08829116821289062, 0.44841766357421875, 0.63818359375, -0.12298583984375, -0.6378707885742188, -0.05670166015625, 0.539764404296875, -0.3614044189453125, -0.05719947814941406, 0.0523681640625, 1.0400390625, 0.052093505859375, -0.20906829833984375, 0.27837371826171875, -0.913055419921875, -0.056613922119140625, 0.15004920959472656, 0.19964599609375, -0.5315399169921875, 0.248504638671875, 0.4737739562988281, 0.19829559326171875, -0.09137725830078125, -0.07893943786621094, 0.09585952758789062, 0.36574554443359375, -0.26116943359375, 0.07789230346679688, 0.35406494140625, -0.21017074584960938, -0.12174224853515625, 0.4424858093261719], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000030.npy"}
{"epoch": 0.045351473922902494, "step": 31, "batch_size": 64, "mean": -0.021574944257736206, "std": 0.34197261929512024, "min": -0.97735595703125, "p10": -0.41938209533691406, "median": -0.02232837677001953, "p90": 0.40250415802001965, "max": 0.7290458679199219, "pos_frac": 0.484375, "sample": [0.4887847900390625, -0.307464599609375, -0.686859130859375, 0.3790149688720703, -0.13655662536621094, -0.25054931640625, -0.3063201904296875, 0.2962532043457031, 0.6189804077148438, 0.11788177490234375, 0.015447616577148438, 0.056640625, 0.7290458679199219, 0.720367431640625, 0.110809326171875, 0.261627197265625, 0.0921478271484375, 0.36614990234375, 0.00787353515625, -0.4581108093261719, 0.2261962890625, -0.18303680419921875, -0.14590072631835938, -0.3210620880126953, 0.03191947937011719, -0.4222259521484375, 0.6904373168945312, -0.01715850830078125, -0.03546142578125, 0.009185791015625, 0.0046539306640625, 0.07767105102539062, -0.47617340087890625, -0.3943367004394531, 0.06818580627441406, -0.4127464294433594, -0.1378631591796875, -0.2631721496582031, -0.052410125732421875, 0.05155372619628906, -0.05637359619140625, -0.6933670043945312, -0.3186531066894531, 0.6129150390625, -0.21923828125, -0.12198638916015625, -0.027498245239257812, -0.15838050842285156, 0.4125709533691406, 0.20494842529296875, -0.97735595703125, -0.19720458984375, -0.62286376953125, -0.07345962524414062, -0.091644287109375, 0.0556182861328125, 0.2210674285888672, -0.2078857421875, -0.12010955810546875, 0.1876373291015625, 0.2432861328125, 0.025156021118164062, -0.044559478759765625, 0.17316436767578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000031.npy"}
{"epoch": 0.04686318972033258, "step": 32, "batch_size": 64, "mean": 0.031954437494277954, "std": 0.35147157311439514, "min": -0.73297119140625, "p10": -0.4061000823974609, "median": 0.03265380859375, "p90": 0.5025215148925782, "max": 1.06097412109375, "pos_frac": 0.5625, "sample": [-0.39263916015625, 0.10500335693359375, -0.327301025390625, 0.12249755859375, -0.6006317138671875, 0.0644073486328125, 0.232757568359375, -0.647552490234375, 1.06097412109375, 0.5868988037109375, 0.6935272216796875, 0.7029647827148438, -0.2612876892089844, 0.01349639892578125, 0.48091888427734375, 0.80767822265625, -0.73297119140625, 0.25077056884765625, -0.16569900512695312, 0.16329574584960938, 0.1840038299560547, 0.0400848388671875, 0.11589813232421875, 0.28485107421875, 0.51177978515625, -0.1015777587890625, 0.2059478759765625, 0.19110107421875, -0.0599212646484375, 0.07677459716796875, -0.25040626525878906, 0.6707077026367188, 0.1580371856689453, 0.00791168212890625, -0.0522613525390625, 0.14208984375, -0.5898361206054688, -0.07342147827148438, -0.465789794921875, -0.0747222900390625, 0.19802284240722656, 0.240234375, -0.4118690490722656, -0.258880615234375, -0.061092376708984375, -0.23023223876953125, 0.04565620422363281, -0.3537025451660156, -0.45314788818359375, -0.06989860534667969, 0.0252227783203125, 0.12171554565429688, -0.1486530303955078, -0.08489799499511719, -0.058101654052734375, -0.09348869323730469, 0.05042266845703125, 0.3420677185058594, 0.013702392578125, 0.2948112487792969, 0.07000732421875, 0.22850799560546875, -0.324737548828125, -0.1149444580078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000032.npy"}
{"epoch": 0.04837490551776266, "step": 33, "batch_size": 64, "mean": 0.005411058664321899, "std": 0.3495139479637146, "min": -1.4130020141601562, "p10": -0.31039752960205075, "median": 0.03458595275878906, "p90": 0.33769760131835935, "max": 0.913665771484375, "pos_frac": 0.5625, "sample": [-0.17742919921875, -0.653106689453125, 0.15328216552734375, -0.2654571533203125, -0.027704238891601562, -0.1136016845703125, -0.08586883544921875, 0.10662651062011719, 0.3366241455078125, 0.253173828125, 0.1733856201171875, 0.2634258270263672, -0.0914602279663086, -0.10300636291503906, 0.913665771484375, -0.6311073303222656, -0.7062530517578125, 0.3755645751953125, 0.08046340942382812, -0.298553466796875, 0.2132110595703125, 0.8845138549804688, 0.009490966796875, 0.053195953369140625, -1.4130020141601562, 0.245025634765625, 0.06476593017578125, -0.34668731689453125, 0.3523406982421875, 0.0562591552734375, 0.29264068603515625, -0.11011695861816406, 0.2541770935058594, 0.20615768432617188, 0.3901653289794922, 0.0025386810302734375, 0.2612724304199219, -0.1680145263671875, -0.3154735565185547, -0.4192466735839844, -0.286773681640625, -0.20068359375, -0.12340927124023438, -0.16057586669921875, 0.2671356201171875, -0.1541748046875, 0.17678070068359375, 0.18554306030273438, 0.013648033142089844, -0.2672309875488281, -0.15471267700195312, 0.5229682922363281, -0.13083839416503906, 0.1959228515625, -0.234771728515625, -0.16714859008789062, 0.19910430908203125, 0.33815765380859375, -0.2149658203125, 0.1661834716796875, 0.07510757446289062, 0.16208648681640625, 0.0159759521484375, 0.1071014404296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000033.npy"}
{"epoch": 0.049886621315192746, "step": 34, "batch_size": 64, "mean": 0.05614641308784485, "std": 0.30530083179473877, "min": -0.6487579345703125, "p10": -0.29739608764648434, "median": 0.035836219787597656, "p90": 0.4971456527709961, "max": 0.8417739868164062, "pos_frac": 0.546875, "sample": [0.11227226257324219, 0.8417739868164062, 0.07595062255859375, 0.314544677734375, 0.2543182373046875, 0.3802032470703125, 0.01271820068359375, -0.23566055297851562, 0.15491294860839844, -0.15752601623535156, 0.5523300170898438, 0.1349029541015625, -0.062129974365234375, 0.5160980224609375, 0.0333404541015625, -0.48752593994140625, 0.5936279296875, -0.08377456665039062, 0.4915180206298828, 0.19696807861328125, -0.16957855224609375, -0.03443145751953125, -0.009685516357421875, -0.0923309326171875, -0.04119110107421875, -0.2295074462890625, 0.5101699829101562, 0.6053695678710938, -0.1352081298828125, -0.16422271728515625, -0.100799560546875, 0.177764892578125, 0.03833198547363281, -0.14429092407226562, -0.028141021728515625, 0.2965087890625, -0.24591827392578125, 0.27322959899902344, 0.4095306396484375, 0.4845733642578125, -0.05350494384765625, 0.13006210327148438, -0.11463737487792969, 0.04454612731933594, 0.2428436279296875, 0.4995574951171875, -0.3194580078125, 0.204345703125, 0.13867950439453125, 0.14528274536132812, -0.0001087188720703125, -0.47281646728515625, 0.05907440185546875, 0.02692413330078125, 0.17939376831054688, -0.523162841796875, 0.26207733154296875, -0.2066059112548828, -0.3483142852783203, -0.01331329345703125, 0.1177825927734375, -0.23236083984375, -0.5631942749023438, -0.6487579345703125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000034.npy"}
{"epoch": 0.05139833711262283, "step": 35, "batch_size": 64, "mean": -0.009445279836654663, "std": 0.34461814165115356, "min": -0.9331817626953125, "p10": -0.3420547485351562, "median": -0.030025482177734375, "p90": 0.4828432083129886, "max": 1.0062103271484375, "pos_frac": 0.40625, "sample": [-0.3294219970703125, -0.30108642578125, 0.3566417694091797, -0.10093307495117188, -0.3355712890625, 0.8236465454101562, -0.07241058349609375, 0.0001373291015625, -0.470245361328125, -0.2111530303955078, -0.3124847412109375, -0.021841049194335938, 0.08551025390625, -0.052303314208984375, -0.41925811767578125, 0.011156082153320312, 1.0062103271484375, -0.3218536376953125, -0.0792083740234375, -0.04790496826171875, 0.11921119689941406, 0.021541595458984375, 0.5187931060791016, -0.01601409912109375, -0.03509712219238281, 0.007965087890625, -0.2893524169921875, -0.12818145751953125, -0.313262939453125, -0.3448333740234375, -0.9331817626953125, 0.84820556640625, 0.2172393798828125, -0.30682373046875, -0.3490333557128906, -0.00305938720703125, 0.11483383178710938, 0.021514892578125, -0.18268775939941406, -0.11598968505859375, -0.16778945922851562, 0.27120018005371094, 0.1673126220703125, -0.2567329406738281, -0.17013168334960938, -0.02179718017578125, 0.5662269592285156, 0.1623077392578125, -0.03009796142578125, -0.0129547119140625, -0.07799911499023438, 0.07811355590820312, -0.5042190551757812, -0.0299530029296875, -0.4839019775390625, 0.16991424560546875, -0.235198974609375, 0.1360321044921875, 0.3989601135253906, 0.6175537109375, 0.31861305236816406, 0.60693359375, -0.28472900390625, 0.11842536926269531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000035.npy"}
{"epoch": 0.05291005291005291, "step": 36, "batch_size": 64, "mean": 0.03751787543296814, "std": 0.3524102568626404, "min": -0.7460174560546875, "p10": -0.2750833511352539, "median": -0.016645431518554688, "p90": 0.49831542968750026, "max": 1.177398681640625, "pos_frac": 0.46875, "sample": [-0.165496826171875, 1.177398681640625, -0.148956298828125, -0.23507308959960938, -0.1693401336669922, -0.019374847412109375, 0.30135345458984375, 0.22298049926757812, 0.3136329650878906, -0.01174163818359375, -0.1075286865234375, -0.072540283203125, 0.3433990478515625, 0.20068359375, -0.15918731689453125, -0.1705169677734375, 0.15674591064453125, -0.12468719482421875, -0.1369495391845703, 0.7083320617675781, 0.09646987915039062, 0.7928047180175781, -0.2183990478515625, 0.07117080688476562, -0.5073776245117188, -0.1740875244140625, -0.24162673950195312, 0.523345947265625, 0.18854141235351562, -0.48858642578125, -0.21616744995117188, 0.1885051727294922, -0.013916015625, 0.439910888671875, -0.24843597412109375, 0.24484634399414062, -0.23371124267578125, -0.0937652587890625, -0.27129554748535156, -0.3892364501953125, -0.15069961547851562, 0.5726776123046875, 0.35807037353515625, -0.2767066955566406, 0.12909317016601562, 0.10539817810058594, 0.6527481079101562, 0.15711212158203125, 0.36966705322265625, 0.3524589538574219, 0.21321868896484375, -0.24514389038085938, 0.14244842529296875, 0.056549072265625, -0.4123954772949219, -0.23630523681640625, -0.25310516357421875, 0.14423370361328125, -0.7460174560546875, -0.24541473388671875, -0.25115394592285156, 0.8048858642578125, -0.3582305908203125, 0.1656322479248047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000036.npy"}
{"epoch": 0.05442176870748299, "step": 37, "batch_size": 64, "mean": 0.11707085371017456, "std": 0.44830986857414246, "min": -1.6432647705078125, "p10": -0.3196842193603515, "median": 0.09070205688476562, "p90": 0.62470703125, "max": 1.317413330078125, "pos_frac": 0.609375, "sample": [-0.1254119873046875, 1.2317123413085938, 0.3930625915527344, 0.13822174072265625, 0.52972412109375, 0.10303497314453125, 0.016002655029296875, 0.14932632446289062, 0.1764698028564453, 0.5019149780273438, 0.29718017578125, 0.34677886962890625, 0.45867919921875, 0.95318603515625, -0.3897743225097656, -0.8369522094726562, 0.17448806762695312, 0.41840362548828125, 0.201629638671875, 0.618682861328125, -0.2982444763183594, -0.38103675842285156, -1.6432647705078125, -0.2086944580078125, 0.1029510498046875, -0.20365142822265625, 0.01336669921875, -0.08792877197265625, 0.30828857421875, -0.00743865966796875, -0.0924224853515625, -0.40418243408203125, 0.23867034912109375, 0.6657867431640625, 0.869354248046875, -0.0688629150390625, 0.1999053955078125, -0.014902114868164062, -0.03180503845214844, -0.09675979614257812, 1.317413330078125, -0.2277374267578125, -0.031024932861328125, 0.1441802978515625, -0.3288726806640625, 0.023504257202148438, 0.178497314453125, 0.5546417236328125, -0.27979278564453125, -0.4650115966796875, 0.035991668701171875, -0.105804443359375, -0.0042972564697265625, 0.024517059326171875, 0.44017791748046875, 0.3880729675292969, -0.09035873413085938, 0.30010986328125, 0.13787841796875, 0.00905609130859375, 0.7408294677734375, -0.19066619873046875, 0.07845306396484375, 0.627288818359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000037.npy"}
{"epoch": 0.055933484504913075, "step": 38, "batch_size": 64, "mean": 0.03931984305381775, "std": 0.4055160582065582, "min": -0.757659912109375, "p10": -0.5048618316650391, "median": 0.036543846130371094, "p90": 0.5191253662109375, "max": 1.1354866027832031, "pos_frac": 0.578125, "sample": [0.09531211853027344, 0.49512290954589844, 0.5093841552734375, 1.1354866027832031, -0.38242340087890625, 0.6413307189941406, 0.09297943115234375, 0.5072135925292969, -0.48398590087890625, 0.1620635986328125, 0.14475631713867188, 0.008396148681640625, 0.15148544311523438, 0.03429985046386719, -0.08731842041015625, -0.3192024230957031, -0.17885589599609375, 0.06289863586425781, -0.00389862060546875, 0.0838623046875, -0.6484832763671875, 0.8094329833984375, -0.6907501220703125, -0.552886962890625, -0.10022354125976562, 0.038787841796875, 0.3019866943359375, -0.757659912109375, -0.451629638671875, -0.18946075439453125, -0.20779800415039062, -0.068603515625, 0.1571197509765625, 0.022998809814453125, 0.680938720703125, 0.3069725036621094, 0.46337890625, -0.3276557922363281, 0.39951324462890625, 0.19182205200195312, 0.4481773376464844, 0.0567779541015625, -0.13639450073242188, -0.2681121826171875, -0.027736663818359375, -0.18729400634765625, -0.4859962463378906, -0.5129470825195312, 0.8615875244140625, 0.6328125, -0.7242279052734375, 0.01102447509765625, 0.5233001708984375, -0.6881752014160156, -0.038482666015625, 0.33391761779785156, -0.12271881103515625, 0.06313323974609375, 0.01232147216796875, -0.07057571411132812, 0.38593292236328125, 0.06695556640625, 0.27359771728515625, 0.06288528442382812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000038.npy"}
{"epoch": 0.05744520030234316, "step": 39, "batch_size": 64, "mean": 0.09740233421325684, "std": 0.35222142934799194, "min": -1.072540283203125, "p10": -0.2974014282226562, "median": 0.12057209014892578, "p90": 0.5280542373657229, "max": 0.79754638671875, "pos_frac": 0.671875, "sample": [0.6461563110351562, 0.06519317626953125, 0.0756683349609375, 0.04239845275878906, 0.3588104248046875, -0.38934326171875, -0.2685089111328125, 0.3526191711425781, 0.055622100830078125, 0.35270118713378906, -0.23659515380859375, -0.10313796997070312, 0.3417491912841797, -0.07947540283203125, 0.383575439453125, 0.2513294219970703, -0.08581924438476562, 0.4851360321044922, 0.5714130401611328, 0.305206298828125, -0.782684326171875, -0.309783935546875, -0.022375106811523438, 0.2303314208984375, 0.79754638671875, -0.2681140899658203, -1.072540283203125, 0.3450355529785156, -0.0621337890625, 0.033809661865234375, 0.11664390563964844, 0.5887298583984375, 0.16302490234375, 0.093963623046875, 0.06903457641601562, 0.14263534545898438, 0.029880523681640625, 0.17171478271484375, 0.15641021728515625, -0.5048370361328125, 0.213958740234375, -0.5484542846679688, -0.04998588562011719, 0.24248504638671875, 0.54644775390625, -0.05931854248046875, 0.0135650634765625, 0.402587890625, 0.4837799072265625, 0.013521194458007812, 0.46484375, -0.59185791015625, -0.22711563110351562, -0.04306221008300781, -0.17459487915039062, 0.5601806640625, 0.39140892028808594, 0.72027587890625, 0.26012420654296875, -0.15083694458007812, 0.26953125, 0.12450027465820312, 0.17764854431152344, 0.15312576293945312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000039.npy"}
{"epoch": 0.05895691609977324, "step": 40, "batch_size": 64, "mean": -0.04217180609703064, "std": 0.38093283772468567, "min": -1.081268310546875, "p10": -0.5411018371582031, "median": -0.057338714599609375, "p90": 0.4296375274658204, "max": 0.85089111328125, "pos_frac": 0.46875, "sample": [0.203460693359375, 0.85089111328125, 0.132720947265625, -0.017564773559570312, -0.3088054656982422, 0.013622283935546875, 0.4364204406738281, 0.3347320556640625, -0.341400146484375, 0.4853248596191406, 0.5479278564453125, 0.16017723083496094, -0.5901260375976562, -0.2225322723388672, 0.031017303466796875, 0.05756378173828125, 0.2863311767578125, 0.78948974609375, 0.2949867248535156, -0.0951080322265625, -0.06710433959960938, 0.2244110107421875, 0.46619415283203125, 0.3756523132324219, 0.41381072998046875, -0.15423583984375, -0.3714141845703125, -0.38051605224609375, 0.2678070068359375, -0.09694099426269531, 0.3355140686035156, -0.6160430908203125, 0.225860595703125, 0.39972686767578125, 0.15780258178710938, 0.0027103424072265625, -0.17746353149414062, 0.016263961791992188, -0.365081787109375, -0.5672760009765625, -0.20311737060546875, -0.07403564453125, 0.042415618896484375, -0.55169677734375, -0.39524078369140625, -0.3001365661621094, -0.047573089599609375, -0.3607749938964844, -0.13220596313476562, 0.0718231201171875, 0.4622459411621094, -0.8743896484375, -0.18220138549804688, -0.1146697998046875, -0.2191925048828125, -0.8205833435058594, -1.081268310546875, 0.06777191162109375, -0.16573333740234375, -0.5163803100585938, 0.18504905700683594, -0.1555194854736328, -0.317352294921875, -0.15503692626953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000040.npy"}
{"epoch": 0.06046863189720333, "step": 41, "batch_size": 64, "mean": 0.05953595042228699, "std": 0.3931024670600891, "min": -0.8320159912109375, "p10": -0.3822832107543945, "median": -0.024175643920898438, "p90": 0.6127738952636721, "max": 1.0704345703125, "pos_frac": 0.46875, "sample": [0.1600799560546875, 0.20409393310546875, -0.16860389709472656, 0.06798171997070312, 0.435791015625, -0.2865142822265625, -0.17071151733398438, 0.9926948547363281, -0.0390777587890625, -0.2557182312011719, 0.327178955078125, -0.01392364501953125, -0.13678741455078125, 0.5604095458984375, 0.19399261474609375, 0.260009765625, -0.09780311584472656, 1.0704345703125, -0.4633445739746094, -0.1583576202392578, 0.6352157592773438, -0.4801483154296875, -0.40625572204589844, 0.27683258056640625, -0.028697967529296875, -0.4252471923828125, -0.11447906494140625, 0.24251174926757812, 0.1279754638671875, -0.2012481689453125, -0.08763694763183594, 0.9821014404296875, 1.062408447265625, 0.6419906616210938, -0.18895721435546875, -0.10970306396484375, 0.19879150390625, -0.46619415283203125, 0.4912109375, 0.014896392822265625, -0.02703857421875, -0.58203125, -0.2930717468261719, 0.10943031311035156, 0.6850509643554688, -0.14836502075195312, 0.16673660278320312, -0.021312713623046875, 0.25748443603515625, 0.12030792236328125, -0.1796722412109375, -0.079132080078125, 0.27086639404296875, -0.25366973876953125, -0.23763656616210938, -0.8320159912109375, 0.08080291748046875, -0.24324798583984375, 0.2947235107421875, -0.32634735107421875, 0.17482757568359375, 0.44762420654296875, -0.11297416687011719, -0.1082305908203125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000041.npy"}
{"epoch": 0.06198034769463341, "step": 42, "batch_size": 64, "mean": 0.06822726130485535, "std": 0.4075385630130768, "min": -0.8384552001953125, "p10": -0.4653099060058593, "median": 0.0604400634765625, "p90": 0.6284713745117189, "max": 0.8837890625, "pos_frac": 0.546875, "sample": [0.3047332763671875, 0.680999755859375, -0.26294708251953125, -0.1646270751953125, -0.04754829406738281, -0.6065521240234375, -0.11689567565917969, -0.09350013732910156, -0.5205764770507812, 0.17540740966796875, -0.4812469482421875, 0.1475067138671875, -0.024593353271484375, 0.04291534423828125, -0.27808380126953125, 0.2972869873046875, 0.7159576416015625, 0.3726043701171875, 0.13297653198242188, -0.15421295166015625, -0.12358856201171875, 0.781982421875, 0.8837890625, 0.5336074829101562, -0.37175750732421875, -0.26796722412109375, 0.21869277954101562, 0.07796478271484375, -0.13394927978515625, -0.4106407165527344, 0.3926048278808594, 0.4806861877441406, -0.22160911560058594, -0.8384552001953125, 0.6496429443359375, 0.45023345947265625, 0.4799041748046875, 0.13344573974609375, -0.42812347412109375, -0.22803497314453125, -0.185699462890625, -0.3122406005859375, -0.5270004272460938, -0.2413787841796875, -0.1483612060546875, 0.711334228515625, 0.003204345703125, 0.424102783203125, -0.023830413818359375, -0.7828903198242188, 0.15863990783691406, 0.4490623474121094, 0.4227752685546875, 0.08929061889648438, 0.3998298645019531, -0.1194915771484375, -0.6853561401367188, 0.022172927856445312, 0.6443557739257812, 0.5914077758789062, 0.23162269592285156, 0.4622688293457031, 0.38903045654296875, 0.21566390991210938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000042.npy"}
{"epoch": 0.06349206349206349, "step": 43, "batch_size": 64, "mean": 0.08003199100494385, "std": 0.44571539759635925, "min": -1.123260498046875, "p10": -0.4905912399291992, "median": 0.1021881103515625, "p90": 0.6192752838134766, "max": 0.9953842163085938, "pos_frac": 0.6875, "sample": [-0.32904052734375, -0.04633331298828125, -0.24505615234375, 0.047637939453125, 0.0712432861328125, -0.1717815399169922, 0.0503387451171875, 0.09865570068359375, 0.628143310546875, -0.879913330078125, 0.1654205322265625, 0.8122634887695312, 0.2020111083984375, -0.44290924072265625, 0.281158447265625, 0.27886962890625, 0.39257240295410156, 0.05796051025390625, 0.06602668762207031, 0.010406494140625, 0.42606163024902344, -0.1466064453125, -1.123260498046875, 0.03154563903808594, 0.6850700378417969, 0.08530426025390625, 0.20128822326660156, 0.5306167602539062, -0.0284423828125, 0.5724029541015625, 0.4345703125, 0.1699371337890625, 0.23407745361328125, -0.4175567626953125, 0.07244682312011719, 0.0916900634765625, 0.3175201416015625, -0.3625507354736328, 0.5985832214355469, 0.43328094482421875, 0.15079498291015625, -0.24443817138671875, 0.16126632690429688, -0.9376068115234375, 0.13071250915527344, 0.761260986328125, 0.01873016357421875, 0.5339546203613281, -0.062744140625, 0.4767951965332031, 0.14731407165527344, 0.9953842163085938, 0.10572052001953125, -1.1085281372070312, 0.6344223022460938, 0.1300811767578125, -0.5110263824462891, 0.3766632080078125, 0.838714599609375, -0.5596923828125, 0.175872802734375, -0.2888946533203125, -0.06456565856933594, -0.591796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000043.npy"}
{"epoch": 0.06500377928949358, "step": 44, "batch_size": 64, "mean": 0.022116124629974365, "std": 0.4408392608165741, "min": -0.7774620056152344, "p10": -0.503582763671875, "median": 0.0298614501953125, "p90": 0.5121032714843751, "max": 1.367462158203125, "pos_frac": 0.515625, "sample": [-0.4015350341796875, -0.15213775634765625, -0.126800537109375, 0.19658851623535156, 0.34356689453125, 0.05499267578125, 0.02719879150390625, -0.574676513671875, -0.3439369201660156, -0.7774620056152344, -0.07067489624023438, 0.3409881591796875, -0.7094268798828125, 0.03252410888671875, 0.23946380615234375, -0.41426849365234375, 0.09818458557128906, 0.2827587127685547, 0.5085811614990234, 0.1648693084716797, -0.002529144287109375, 0.44599151611328125, 0.3039379119873047, -0.506072998046875, -0.2825927734375, 0.472930908203125, 0.20053863525390625, -0.6111602783203125, -0.3913421630859375, 0.3604278564453125, 0.76300048828125, -0.13963699340820312, 0.12090301513671875, -0.09334564208984375, -0.001148223876953125, 0.11791610717773438, 1.367462158203125, -0.190521240234375, -0.673370361328125, -0.452972412109375, -0.32409095764160156, 0.2655487060546875, 0.64410400390625, 0.32257080078125, 0.143096923828125, 0.5968170166015625, -0.377166748046875, 0.1503143310546875, -0.7322463989257812, 0.056926727294921875, -0.25901031494140625, -0.1171875, 0.25672149658203125, -0.1888885498046875, -0.2750091552734375, -0.16373443603515625, 0.8680953979492188, 1.246673583984375, 0.5136127471923828, 0.06695556640625, 0.220855712890625, -0.24384307861328125, -0.285125732421875, -0.497772216796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000044.npy"}
{"epoch": 0.06651549508692366, "step": 45, "batch_size": 64, "mean": 0.08512008190155029, "std": 0.4688953757286072, "min": -1.23333740234375, "p10": -0.49772109985351565, "median": 0.09869003295898438, "p90": 0.6359352111816408, "max": 1.09259033203125, "pos_frac": 0.625, "sample": [-0.15137481689453125, -0.23880767822265625, 0.3571815490722656, -0.20172882080078125, 0.07137298583984375, -0.49982452392578125, -0.2797355651855469, 0.1947021484375, -0.7279205322265625, -1.23333740234375, 0.424407958984375, 0.10234832763671875, 0.16782760620117188, 0.6557540893554688, -0.2055988311767578, 0.09503173828125, 0.011415481567382812, -0.7873001098632812, 0.7197952270507812, 0.11156463623046875, 0.589691162109375, -0.37001800537109375, 0.5565357208251953, -0.18181419372558594, 0.30556488037109375, 0.94403076171875, 0.2645721435546875, 0.06571769714355469, 0.17523956298828125, 0.6886444091796875, 0.16270828247070312, 0.2215576171875, -0.9547500610351562, 0.045833587646484375, 0.4226570129394531, 1.09259033203125, -0.09569168090820312, 0.08940887451171875, -0.4928131103515625, -0.10321807861328125, -1.0961990356445312, -0.13226318359375, -0.2034454345703125, 0.1575145721435547, 0.3618621826171875, 0.1355438232421875, 0.3149871826171875, 0.0711517333984375, 0.404205322265625, 0.4208221435546875, -0.5297012329101562, 0.3555278778076172, 0.987762451171875, -0.066192626953125, 0.07555961608886719, 0.577484130859375, 1.0695648193359375, -0.030059814453125, 0.11489295959472656, -0.09815025329589844, -0.04787445068359375, 0.40240478515625, 0.22707366943359375, -0.037006378173828125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000045.npy"}
{"epoch": 0.06802721088435375, "step": 46, "batch_size": 64, "mean": 0.18374782800674438, "std": 0.4387693405151367, "min": -0.83892822265625, "p10": -0.30096664428710934, "median": 0.1079874038696289, "p90": 0.7113197326660157, "max": 1.7942352294921875, "pos_frac": 0.625, "sample": [-0.36156463623046875, -0.2195281982421875, -0.27768707275390625, 0.483123779296875, 0.54986572265625, 0.633392333984375, -0.10729217529296875, -0.06775665283203125, 0.42975616455078125, -0.83892822265625, 0.8267326354980469, 0.17757415771484375, 0.2577667236328125, 0.42586517333984375, 0.17118072509765625, 0.14886856079101562, -0.766632080078125, 0.295074462890625, 0.4454212188720703, 0.19171714782714844, 0.3292694091796875, 0.982147216796875, -0.057903289794921875, 0.6001205444335938, 0.7971115112304688, 0.0169677734375, 0.412261962890625, 1.7942352294921875, 0.02246856689453125, -0.0164947509765625, -0.00383758544921875, 0.3392333984375, -0.21239471435546875, 0.04364013671875, -0.1761016845703125, 0.691436767578125, 0.14953994750976562, -0.0731353759765625, -0.41909217834472656, 0.7198410034179688, 0.3675689697265625, 0.18766021728515625, 0.00484466552734375, -0.0385284423828125, 0.6764411926269531, 0.03540802001953125, -0.19989013671875, -0.1561126708984375, -0.03488922119140625, 0.4175834655761719, 0.48850250244140625, -0.000362396240234375, -0.11063385009765625, 0.5654945373535156, -0.310943603515625, 0.92510986328125, -0.3209686279296875, 0.01043701171875, -0.3303413391113281, 0.9730377197265625, 0.21599578857421875, 0.06710624694824219, -0.029682159423828125, 0.02075958251953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000046.npy"}
{"epoch": 0.06953892668178382, "step": 47, "batch_size": 64, "mean": 0.15279105305671692, "std": 0.3858776092529297, "min": -1.0280380249023438, "p10": -0.29414443969726556, "median": 0.12363433837890625, "p90": 0.592043685913086, "max": 0.84893798828125, "pos_frac": 0.65625, "sample": [0.49622344970703125, -0.159820556640625, -0.023273468017578125, -0.05301666259765625, 0.07666778564453125, 0.24863052368164062, -0.08794021606445312, 0.388824462890625, -0.13606643676757812, 0.41295623779296875, -0.018583297729492188, -0.3107337951660156, 0.289306640625, -1.0280380249023438, 0.14247894287109375, -0.08798980712890625, 0.2538185119628906, 0.7554779052734375, 0.4349212646484375, 0.04781341552734375, 0.20489501953125, 0.06725311279296875, 0.0879669189453125, 0.24083709716796875, -0.18305206298828125, 0.4146270751953125, 0.20459556579589844, 0.0015926361083984375, 0.06274032592773438, 0.3730278015136719, -0.023067474365234375, 0.7590065002441406, -0.06490325927734375, 0.5750846862792969, -0.9564743041992188, 0.08165740966796875, 0.5491180419921875, 0.39580535888671875, 0.20162582397460938, 0.5151748657226562, -0.1957550048828125, 0.7655715942382812, -0.05176544189453125, 0.10478973388671875, 0.5941276550292969, -0.2554359436035156, 0.677978515625, 0.3064422607421875, 0.3641986846923828, 0.3147563934326172, -0.07318878173828125, -0.00244903564453125, -0.3421821594238281, -0.701416015625, -0.35323333740234375, 0.84893798828125, 0.07630157470703125, 0.7573356628417969, 0.4981842041015625, 0.4612388610839844, 0.5763359069824219, 0.5871810913085938, 0.0204620361328125, -0.3489570617675781], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000047.npy"}
{"epoch": 0.0710506424792139, "step": 48, "batch_size": 64, "mean": 0.061874061822891235, "std": 0.423190712928772, "min": -0.7635955810546875, "p10": -0.4359161376953125, "median": 0.07938385009765625, "p90": 0.5731002807617188, "max": 1.2590408325195312, "pos_frac": 0.5625, "sample": [0.13475990295410156, -0.1202239990234375, 0.28592872619628906, 0.37245941162109375, 0.0485992431640625, 0.8671646118164062, -0.3794708251953125, -0.4625701904296875, -0.42917633056640625, 0.11846160888671875, 0.1983795166015625, 0.08376502990722656, -0.43880462646484375, -0.7012863159179688, 0.5787811279296875, 1.0771026611328125, 0.42049407958984375, -0.16261863708496094, -0.7635955810546875, 0.559844970703125, 0.08150482177734375, -0.08834457397460938, -0.5555496215820312, 0.29444122314453125, 0.28119659423828125, 0.17737579345703125, -0.0847320556640625, 0.000446319580078125, -0.5937118530273438, 0.784637451171875, -0.3209686279296875, 0.11279296875, 0.16178131103515625, -0.07901382446289062, 0.4371604919433594, 0.23816680908203125, 0.211273193359375, -0.017589569091796875, 0.9782581329345703, -0.047824859619140625, -0.28072166442871094, -0.3471221923828125, 0.15100860595703125, 0.1729564666748047, 0.14661407470703125, -0.13639259338378906, 0.24842262268066406, -0.07670974731445312, -0.3687744140625, -0.2996997833251953, 0.042758941650390625, 0.07726287841796875, -0.28934478759765625, 0.211639404296875, -0.04291725158691406, 0.2832469940185547, 1.2590408325195312, -0.0533599853515625, 0.2111053466796875, -0.23017501831054688, 0.15957260131835938, 0.901275634765625, -0.35422706604003906, -0.684814453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000048.npy"}
{"epoch": 0.07256235827664399, "step": 49, "batch_size": 64, "mean": 0.12440502643585205, "std": 0.487355500459671, "min": -1.1612548828125, "p10": -0.4299640655517577, "median": 0.1329965591430664, "p90": 0.5845947265625002, "max": 1.5429229736328125, "pos_frac": 0.703125, "sample": [0.368194580078125, 0.35555267333984375, 0.024358749389648438, 0.30326080322265625, 0.37256622314453125, -0.10426902770996094, 0.0021533966064453125, 0.06402587890625, 0.24723243713378906, 0.11057281494140625, 0.0251007080078125, 0.10178756713867188, 0.39385986328125, 0.6354751586914062, -0.25959205627441406, 0.05626678466796875, -0.7523975372314453, 0.12994384765625, 0.36173248291015625, -0.812255859375, 0.13732528686523438, 0.17485809326171875, 0.07511138916015625, -0.0884552001953125, 0.1360492706298828, 0.287689208984375, 0.1192169189453125, 0.0767059326171875, 0.2296905517578125, -0.8777313232421875, 0.23712730407714844, 0.468963623046875, -0.18352508544921875, 0.5288944244384766, 0.4367237091064453, 1.4490509033203125, -0.12783050537109375, 0.8514862060546875, 0.60418701171875, 0.9590225219726562, -0.0116119384765625, -0.152374267578125, -0.12620925903320312, 0.6556854248046875, -0.4828643798828125, 0.2649078369140625, -0.14001846313476562, 0.40233421325683594, 0.21593475341796875, -0.08120918273925781, 0.53887939453125, 0.40167236328125, -1.1612548828125, 0.14794921875, -0.3065299987792969, -0.866424560546875, 0.36576080322265625, 1.5429229736328125, 0.3981513977050781, -0.12579345703125, 0.0955810546875, 0.044921875, 0.32112884521484375, -1.097747802734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000049.npy"}
{"epoch": 0.07407407407407407, "step": 50, "batch_size": 64, "mean": 0.15634474158287048, "std": 0.38536250591278076, "min": -0.5968093872070312, "p10": -0.31254730224609373, "median": 0.1098642349243164, "p90": 0.6150485992431642, "max": 1.3623504638671875, "pos_frac": 0.640625, "sample": [0.3231964111328125, 0.5621795654296875, -0.31695556640625, -0.18781661987304688, 0.6291465759277344, 0.2787017822265625, 0.4688453674316406, -0.3022613525390625, 1.3623504638671875, 0.2819671630859375, 0.241851806640625, 0.2471466064453125, -0.2122039794921875, 0.009777069091796875, 0.2652702331542969, -0.32561302185058594, 0.3897247314453125, 0.5024452209472656, 0.3396778106689453, 0.17971038818359375, 0.636810302734375, -0.0498504638671875, 0.07134628295898438, 0.260498046875, 0.47951507568359375, 0.9544677734375, 0.037540435791015625, 0.18805313110351562, -0.2669334411621094, 0.1371917724609375, 0.5054454803466797, 0.2034015655517578, 0.5601272583007812, 0.4834709167480469, -0.47967529296875, -0.4820823669433594, 0.5821533203125, 1.089599609375, 0.025358200073242188, 0.10394287109375, 0.46424102783203125, 0.6523284912109375, 0.039154052734375, -0.01950836181640625, -0.08202934265136719, 0.11262702941894531, 0.7147598266601562, -0.28034210205078125, -0.34688568115234375, 0.11817169189453125, -0.0039806365966796875, -0.07852935791015625, 0.03542518615722656, -0.04466438293457031, 0.4853324890136719, -0.22866058349609375, -0.5968093872070312, -0.16143417358398438, -0.4090118408203125, -0.2331066131591797, -0.02294921875, 0.1071014404296875, 0.061443328857421875, -0.05413055419921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000050.npy"}
{"epoch": 0.07558578987150416, "step": 51, "batch_size": 64, "mean": 0.17680680751800537, "std": 0.4669606685638428, "min": -0.92156982421875, "p10": -0.4086559295654296, "median": 0.13001441955566406, "p90": 0.7976875305175782, "max": 1.31109619140625, "pos_frac": 0.59375, "sample": [0.9029083251953125, 0.36948394775390625, -0.049686431884765625, -0.09208297729492188, 0.24312210083007812, 0.40128326416015625, 0.2793598175048828, 0.1025543212890625, -0.5723590850830078, -0.01068115234375, 0.024080276489257812, 0.6881065368652344, -0.4307899475097656, 1.079803466796875, -0.2725830078125, 0.2901153564453125, 0.40459442138671875, 0.4244384765625, 0.19266128540039062, 1.31109619140625, -0.3171844482421875, 0.200775146484375, 0.5455532073974609, 0.5847930908203125, -0.04093170166015625, -0.3570098876953125, -0.1188812255859375, -0.4321479797363281, 0.10007286071777344, 0.472320556640625, 0.13695526123046875, 0.55120849609375, -0.14143753051757812, 0.5006866455078125, -0.92156982421875, -0.08905792236328125, 0.5069389343261719, -0.18185806274414062, -0.434051513671875, -0.13172149658203125, 0.2679100036621094, -0.45592498779296875, 0.12307357788085938, 1.1111602783203125, 0.2735099792480469, 0.2725372314453125, -0.208038330078125, 1.261260986328125, -0.203857421875, 0.8106765747070312, 0.48620033264160156, 0.7673797607421875, 0.5462265014648438, 0.011566162109375, -0.0391387939453125, -0.23427581787109375, -0.3202667236328125, 1.1266021728515625, -0.0035839080810546875, 0.1193084716796875, -0.5418624877929688, 0.224853515625, 0.2436656951904297, -0.042224884033203125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000051.npy"}
{"epoch": 0.07709750566893424, "step": 52, "batch_size": 64, "mean": 0.1731322705745697, "std": 0.4756048023700714, "min": -0.9478759765625, "p10": -0.3468593597412109, "median": 0.09690189361572266, "p90": 0.7480728149414063, "max": 1.838134765625, "pos_frac": 0.609375, "sample": [1.260345458984375, 0.3300933837890625, 0.20751953125, 0.8529510498046875, 0.7494735717773438, -0.1279144287109375, 0.3852825164794922, -0.24014663696289062, -0.06417274475097656, 1.1058349609375, 1.2015380859375, 0.7448043823242188, -0.35222625732421875, 0.5038986206054688, 0.17194366455078125, -0.127655029296875, 0.48397064208984375, 0.04212188720703125, -0.10642242431640625, -0.3514976501464844, -0.22723007202148438, -0.07091140747070312, 0.392364501953125, 0.007869720458984375, 0.4564666748046875, 0.15691375732421875, -0.24198150634765625, -0.12115859985351562, 0.2817840576171875, 0.2840995788574219, 0.1904144287109375, -0.2697296142578125, 0.07558441162109375, 0.027557373046875, 0.08613777160644531, 0.47176361083984375, -0.003444671630859375, 0.05389404296875, 0.107666015625, -0.9478759765625, -0.07887458801269531, 0.0550994873046875, 0.10924339294433594, 0.3184623718261719, -0.2301616668701172, 0.6140899658203125, -0.15267372131347656, 0.2913398742675781, 1.838134765625, -0.37421417236328125, 0.6286468505859375, -0.5157623291015625, -0.33603668212890625, 0.4171867370605469, 0.41968536376953125, -0.38161468505859375, 0.945556640625, -0.5016136169433594, 0.58990478515625, 0.31763458251953125, -0.08434295654296875, 0.21985435485839844, -0.2499713897705078, -0.15903472900390625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000052.npy"}
{"epoch": 0.07860922146636433, "step": 53, "batch_size": 64, "mean": 0.12276646494865417, "std": 0.37780168652534485, "min": -0.9053497314453125, "p10": -0.36185569763183595, "median": 0.12801074981689453, "p90": 0.5562637329101564, "max": 1.32476806640625, "pos_frac": 0.640625, "sample": [0.24796676635742188, 0.45182037353515625, 0.5786094665527344, 0.275604248046875, 0.11874961853027344, 0.2276020050048828, -0.0305938720703125, -0.15383148193359375, 0.1833953857421875, -0.46831512451171875, 0.12872314453125, -0.469757080078125, -0.0096588134765625, 0.2580242156982422, 0.04449462890625, 0.19935226440429688, 0.12729835510253906, 0.67529296875, -0.17730140686035156, -0.12142181396484375, 0.11392593383789062, -0.0500640869140625, 0.3963470458984375, 0.3273124694824219, -0.9053497314453125, 0.36322784423828125, -0.5034217834472656, 0.48883819580078125, 0.17136383056640625, 0.19227218627929688, 0.423004150390625, -0.6633529663085938, -0.0940399169921875, -0.363433837890625, -0.0130157470703125, 0.117523193359375, 0.4763450622558594, 0.6963005065917969, -0.11343765258789062, -0.0865631103515625, -0.2509441375732422, -0.1807098388671875, 0.33043861389160156, -0.64727783203125, 0.6249542236328125, 0.507354736328125, 0.01299285888671875, -0.06749725341796875, 0.46370697021484375, 1.32476806640625, 0.40550994873046875, 0.478546142578125, -0.14807510375976562, 0.59002685546875, 0.17961883544921875, 0.017154693603515625, 0.5772247314453125, -0.3581733703613281, 0.2590179443359375, 0.04464530944824219, 0.012340545654296875, 0.49015045166015625, -0.1550731658935547, 0.2865180969238281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000053.npy"}
{"epoch": 0.0801209372637944, "step": 54, "batch_size": 64, "mean": 0.15033209323883057, "std": 0.4245028793811798, "min": -0.9717483520507812, "p10": -0.305889892578125, "median": 0.1077728271484375, "p90": 0.6347579956054688, "max": 1.5042266845703125, "pos_frac": 0.625, "sample": [-0.04833984375, 1.0356521606445312, -0.3201446533203125, 0.05766487121582031, -0.0623779296875, -0.1362762451171875, 0.7396011352539062, 0.29695892333984375, 0.8261375427246094, -0.6864700317382812, -0.0705718994140625, 0.0026645660400390625, 0.514678955078125, -0.9717483520507812, 0.062320709228515625, -0.08583450317382812, -0.166717529296875, 0.4137916564941406, 0.3347587585449219, -0.19573974609375, 0.27797698974609375, 1.5042266845703125, 0.17234039306640625, 0.36006927490234375, 0.105255126953125, 0.41600990295410156, 0.25063133239746094, -0.18694686889648438, 0.336578369140625, 0.003204345703125, -0.14028167724609375, 0.18857192993164062, -0.07714080810546875, -0.2726287841796875, 0.4616832733154297, 0.314727783203125, 0.01778411865234375, -0.18257904052734375, 0.8165817260742188, 0.6369552612304688, 0.3497886657714844, -0.1046295166015625, 0.42279815673828125, 0.6071853637695312, -0.07379913330078125, 0.7929649353027344, 0.11029052734375, 0.059597015380859375, -0.06116485595703125, 0.20126724243164062, -0.19043731689453125, -0.7414932250976562, 0.3426246643066406, 0.243072509765625, 0.45175743103027344, -0.05126190185546875, 0.06960678100585938, -0.433135986328125, 0.4760246276855469, 0.25283050537109375, -0.42176055908203125, 0.6296310424804688, 0.598480224609375, -0.4520111083984375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000054.npy"}
{"epoch": 0.08163265306122448, "step": 55, "batch_size": 64, "mean": 0.2756025791168213, "std": 0.6257441639900208, "min": -1.1141357421875, "p10": -0.36760559082031247, "median": 0.1581573486328125, "p90": 1.106480407714844, "max": 2.17437744140625, "pos_frac": 0.65625, "sample": [1.3050155639648438, 0.20706939697265625, 0.06698989868164062, 0.21579742431640625, -0.05208778381347656, 0.00567626953125, 0.5571861267089844, -0.12795257568359375, -0.3798828125, -0.072509765625, 0.0777130126953125, 0.83251953125, -0.7013702392578125, -0.38631439208984375, -0.10551071166992188, 1.0506820678710938, 1.1303939819335938, 0.0737457275390625, -0.091949462890625, -0.2840576171875, -0.037227630615234375, -1.1141357421875, 0.5592117309570312, 0.16327285766601562, 2.17437744140625, 0.3486652374267578, 0.24379730224609375, 1.363983154296875, -0.31064605712890625, 0.219268798828125, 0.5105991363525391, 0.876007080078125, 0.594818115234375, -0.45641136169433594, -0.29471588134765625, 0.26851654052734375, 0.0929861068725586, 0.07673263549804688, 0.561767578125, -0.2739906311035156, -0.0524749755859375, 0.9023208618164062, -0.16005706787109375, 1.0090484619140625, 0.5138778686523438, 0.22230148315429688, 1.7965774536132812, 0.135772705078125, -0.338958740234375, -0.267333984375, 0.030185699462890625, 0.22467041015625, 0.2205963134765625, 0.15304183959960938, -0.6305351257324219, 0.9472084045410156, 0.832000732421875, 0.25598907470703125, 0.5080585479736328, 1.4701995849609375, 0.139373779296875, -0.2175750732421875, 1.6097068786621094, -0.5534591674804688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000055.npy"}
{"epoch": 0.08314436885865457, "step": 56, "batch_size": 64, "mean": 0.13655099272727966, "std": 0.4718063175678253, "min": -1.413299560546875, "p10": -0.4015968322753906, "median": 0.1533374786376953, "p90": 0.821933746337891, "max": 1.16455078125, "pos_frac": 0.625, "sample": [-0.1718597412109375, 0.2748832702636719, 0.04909706115722656, -0.45855712890625, 0.175628662109375, 0.09009552001953125, 1.0417556762695312, -0.4612884521484375, -0.40457916259765625, -0.00235748291015625, 0.22906494140625, -0.2979278564453125, 0.0188751220703125, 0.1998443603515625, 0.7285003662109375, 1.031402587890625, -0.48029327392578125, -0.4928855895996094, 0.202392578125, 0.9733123779296875, -0.23481369018554688, 1.0120086669921875, 0.28513145446777344, 0.42304229736328125, 0.37265968322753906, 0.4565143585205078, -0.30158233642578125, 0.21108245849609375, -0.06817817687988281, 0.33131980895996094, -0.2876396179199219, 0.5271072387695312, 1.0385208129882812, 0.8619766235351562, 0.4425506591796875, -0.1858062744140625, -1.413299560546875, 0.2879753112792969, -0.10936546325683594, 0.16959381103515625, 0.46862030029296875, 0.02433013916015625, -0.2700347900390625, -0.38430023193359375, 0.17530250549316406, 0.08366012573242188, 0.23773956298828125, 0.6792526245117188, -0.052539825439453125, 0.34572601318359375, 0.5615653991699219, -0.13937759399414062, 0.17170333862304688, 0.3967552185058594, -0.3946380615234375, -0.3780345916748047, -0.10501480102539062, 0.01033782958984375, 0.5127220153808594, 0.09043121337890625, -0.1080322265625, 1.16455078125, 0.13708114624023438, -0.5524444580078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000056.npy"}
{"epoch": 0.08465608465608465, "step": 57, "batch_size": 64, "mean": 0.2697131037712097, "std": 0.5775672793388367, "min": -0.9733352661132812, "p10": -0.5129234313964843, "median": 0.2958984375, "p90": 1.0143753051757813, "max": 1.68402099609375, "pos_frac": 0.671875, "sample": [-0.5759925842285156, -0.3940238952636719, 0.31322288513183594, 1.68402099609375, -0.03854179382324219, 0.45135498046875, 0.5049591064453125, -0.5385360717773438, 1.0840072631835938, 1.1469802856445312, 1.4170074462890625, 0.768829345703125, -0.6407318115234375, -0.19663047790527344, 0.42972564697265625, -0.4492321014404297, 0.8215408325195312, 0.8253021240234375, 1.6050338745117188, 0.27986717224121094, 0.6357498168945312, -0.210205078125, 0.40413665771484375, 0.47882080078125, -0.043315887451171875, -0.740478515625, 0.13493728637695312, 0.161224365234375, 0.3203887939453125, -0.9733352661132812, 0.6290740966796875, 0.10420608520507812, 0.282318115234375, 0.309478759765625, -0.001667022705078125, 0.746490478515625, -0.838897705078125, 0.4833812713623047, -0.5131912231445312, 0.800628662109375, 0.25820159912109375, -0.326507568359375, -0.43121337890625, 0.4409503936767578, -0.512298583984375, -0.0572357177734375, 1.0964202880859375, 0.7555618286132812, 0.37117767333984375, 0.6343631744384766, 0.7728462219238281, 0.01813507080078125, 0.18730926513671875, 1.0306396484375, 0.05563545227050781, 0.9764251708984375, -0.0361785888671875, 0.34576416015625, 0.25325775146484375, 0.379364013671875, -0.053455352783203125, 0.09896659851074219, 0.5210647583007812, -0.18546295166015625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000057.npy"}
{"epoch": 0.08616780045351474, "step": 58, "batch_size": 64, "mean": 0.17981407046318054, "std": 0.6800174713134766, "min": -1.0586509704589844, "p10": -0.5673294067382811, "median": 0.09869670867919922, "p90": 0.8658611297607424, "max": 2.510528564453125, "pos_frac": 0.546875, "sample": [-1.0586509704589844, 0.7455863952636719, 0.6345767974853516, -0.035427093505859375, 0.7521438598632812, -0.4974212646484375, -0.1887359619140625, 0.8869857788085938, -0.3746337890625, -0.3897552490234375, 0.8165702819824219, -0.24704360961914062, 1.1018218994140625, -0.655029296875, -0.42038917541503906, 0.6128120422363281, 1.3962974548339844, -0.5972900390625, -0.1246337890625, 0.6630401611328125, 0.09352493286132812, 0.3034095764160156, 0.23075485229492188, -0.0449371337890625, -0.4370574951171875, -0.08051681518554688, -0.49590301513671875, 0.01848602294921875, -0.20877838134765625, 0.39090728759765625, -0.2637481689453125, -0.6648349761962891, 0.14102554321289062, 0.206512451171875, 2.510528564453125, -0.07583999633789062, -0.6004638671875, 1.6643218994140625, 0.81268310546875, 0.27459716796875, -0.485687255859375, 0.15478897094726562, 0.13544082641601562, -0.722381591796875, -0.39398193359375, 2.40838623046875, -0.30679893493652344, -0.22544097900390625, 0.0443267822265625, 0.7163772583007812, 0.7609405517578125, 0.9168777465820312, 0.2873687744140625, 0.2650718688964844, -0.04219818115234375, 0.5401077270507812, 0.16509056091308594, 0.10386848449707031, -0.6369857788085938, 0.676361083984375, -0.1256561279296875, -0.13555908203125, 0.2023468017578125, 0.4099407196044922], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000058.npy"}
{"epoch": 0.08767951625094482, "step": 59, "batch_size": 64, "mean": 0.24419182538986206, "std": 0.5678459405899048, "min": -0.6368675231933594, "p10": -0.3384092330932617, "median": 0.15627670288085938, "p90": 1.0159271240234378, "max": 2.23870849609375, "pos_frac": 0.59375, "sample": [-0.012920379638671875, 0.28043365478515625, -0.20093536376953125, -0.15291595458984375, -0.23740005493164062, -0.3389568328857422, 0.4747200012207031, -0.3742218017578125, 0.6431541442871094, 1.096832275390625, 1.49090576171875, 0.038105010986328125, 0.7510528564453125, 1.0897560119628906, 0.2700767517089844, 0.3752708435058594, 0.5157470703125, -0.5293922424316406, 0.1480255126953125, -0.2541046142578125, 0.4758777618408203, 0.09325790405273438, -0.16129302978515625, 0.20849990844726562, -0.35028839111328125, 0.100616455078125, -0.24493789672851562, 1.0489501953125, 2.23870849609375, -0.04315185546875, 0.12603759765625, 0.1769561767578125, 0.6223163604736328, -0.4550666809082031, -0.53955078125, 0.0065135955810546875, -0.2180633544921875, 0.3991718292236328, -0.3371315002441406, -0.0077724456787109375, 1.9253692626953125, 0.500885009765625, 1.0628585815429688, 0.49176597595214844, -0.1386737823486328, 0.6198863983154297, -0.244598388671875, 0.7560634613037109, -0.007915496826171875, 0.7388763427734375, -0.6368675231933594, 0.296539306640625, -0.20499801635742188, -0.056060791015625, -0.24477386474609375, 0.16452789306640625, -0.3130340576171875, 0.33941650390625, 0.18118667602539062, 0.45203399658203125, 0.4833984375, 0.938873291015625, 0.5553970336914062, -0.24476242065429688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000059.npy"}
{"epoch": 0.08919123204837491, "step": 60, "batch_size": 64, "mean": 0.0648232102394104, "std": 0.6048787832260132, "min": -1.810546875, "p10": -0.6168701171875, "median": 0.15417003631591797, "p90": 0.726021957397461, "max": 1.2087230682373047, "pos_frac": 0.5625, "sample": [0.4424896240234375, 0.130279541015625, 0.3737602233886719, 0.241912841796875, -0.29386138916015625, 0.7279891967773438, -0.136627197265625, -0.5990219116210938, 0.03304290771484375, -1.0909576416015625, 0.5292396545410156, -0.09471893310546875, -0.27839088439941406, -0.10510444641113281, -1.5346145629882812, 0.3198699951171875, -0.3605976104736328, 0.6376285552978516, 1.2087230682373047, 0.413665771484375, -0.239410400390625, 0.0124053955078125, 0.521453857421875, -0.5430068969726562, 0.4021110534667969, -0.1353759765625, 0.6683502197265625, -0.015703201293945312, 0.7721176147460938, -1.1547126770019531, 0.1936969757080078, 0.02567291259765625, 0.7793960571289062, 0.218841552734375, -0.3992347717285156, 0.50189208984375, 0.5453510284423828, -0.4204826354980469, 0.4282264709472656, -0.5135250091552734, -0.6245193481445312, 0.6896743774414062, 0.572479248046875, -1.1795501708984375, 0.17806053161621094, 0.4349365234375, 0.7214317321777344, -0.072601318359375, -0.007877349853515625, -0.1555023193359375, 0.5601730346679688, 0.5248184204101562, 0.2966728210449219, -0.30651092529296875, 0.963348388671875, -0.15358352661132812, 0.8358268737792969, 0.2215576171875, 0.3241081237792969, -0.22122573852539062, -1.810546875, -0.8101139068603516, 0.9815006256103516, -0.026641845703125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000060.npy"}
{"epoch": 0.09070294784580499, "step": 61, "batch_size": 64, "mean": 0.13854748010635376, "std": 0.6573115587234497, "min": -1.4625701904296875, "p10": -0.7199119567871093, "median": 0.0984334945678711, "p90": 1.0215099334716797, "max": 1.879364013671875, "pos_frac": 0.5625, "sample": [0.387847900390625, 0.83026123046875, 1.0345802307128906, 0.96484375, -0.18941879272460938, 0.07443428039550781, 0.715301513671875, 1.0574951171875, 0.42070770263671875, 0.06554794311523438, -0.225860595703125, -0.3612937927246094, -0.2274169921875, -0.110931396484375, 0.5981559753417969, -0.5471973419189453, 0.6357650756835938, -0.20940589904785156, -0.12398910522460938, 0.25997352600097656, -0.007917404174804688, 0.3685951232910156, 0.46453857421875, 0.13024139404296875, -0.21607589721679688, -0.0982513427734375, -0.3891754150390625, -0.7422943115234375, 0.40268707275390625, -0.35472869873046875, 0.7880954742431641, -0.3186836242675781, -0.8191146850585938, 0.23992919921875, 0.1731719970703125, 0.2658195495605469, 0.2457122802734375, -0.747650146484375, -0.08467483520507812, 1.879364013671875, 1.1822662353515625, -1.040985107421875, 0.650115966796875, -0.7586441040039062, 0.3560523986816406, 0.7234344482421875, 0.12111663818359375, 0.15190505981445312, -0.23096084594726562, -0.33360862731933594, 1.25439453125, -0.46215057373046875, -0.22428131103515625, 0.01648712158203125, 0.5625152587890625, -1.1903610229492188, 1.4599456787109375, 0.07575035095214844, 0.6533737182617188, -0.3894805908203125, 0.9910125732421875, 1.200408935546875, -1.4625701904296875, -0.6676864624023438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000061.npy"}
{"epoch": 0.09221466364323508, "step": 62, "batch_size": 64, "mean": 0.1931813657283783, "std": 0.5983238220214844, "min": -1.6626739501953125, "p10": -0.5223491668701172, "median": 0.20030879974365234, "p90": 0.9242641448974613, "max": 1.4494171142578125, "pos_frac": 0.59375, "sample": [0.8042068481445312, -0.9733257293701172, -0.3589019775390625, 0.4480419158935547, 0.5335235595703125, -0.04886627197265625, 1.0215415954589844, 0.07825279235839844, -0.7436294555664062, -0.38117218017578125, 1.3224258422851562, -0.040889739990234375, 0.9923553466796875, -0.5120391845703125, 0.2807159423828125, -0.013336181640625, 0.280609130859375, 0.160003662109375, -0.014682769775390625, 0.5475425720214844, -0.40552520751953125, 0.11685943603515625, -0.762786865234375, 0.3949546813964844, 0.19469451904296875, 0.44625091552734375, -0.39128875732421875, 0.1854705810546875, 0.5398712158203125, 0.28679656982421875, -0.8712539672851562, -0.0199432373046875, -0.10605239868164062, 0.62506103515625, 0.9708175659179688, -1.6626739501953125, 0.267425537109375, 0.6081199645996094, -0.03519248962402344, 0.3690032958984375, -0.37227630615234375, 0.77093505859375, 0.5647048950195312, 0.5853347778320312, 0.4150676727294922, 0.2393035888671875, 0.701385498046875, -0.5267677307128906, 0.11175537109375, 1.3858566284179688, -0.04052734375, 0.5150337219238281, -0.07108688354492188, 0.20592308044433594, -0.115447998046875, -0.11765289306640625, 1.373748779296875, 1.4494171142578125, -0.12866973876953125, -0.21945953369140625, 0.8156394958496094, 0.791412353515625, -0.635223388671875, 0.5322170257568359], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000062.npy"}
{"epoch": 0.09372637944066516, "step": 63, "batch_size": 64, "mean": 0.454556941986084, "std": 0.4705674350261688, "min": -1.1421051025390625, "p10": -0.030171966552734332, "median": 0.4634847640991211, "p90": 1.0991645812988282, "max": 1.44317626953125, "pos_frac": 0.890625, "sample": [0.5512771606445312, 0.23828125, -0.04901123046875, -0.11236572265625, 0.950836181640625, -0.24204635620117188, 0.062366485595703125, 1.266082763671875, 1.1570968627929688, 0.268402099609375, -1.1421051025390625, 0.10739898681640625, 0.18744277954101562, -0.1472625732421875, 0.7245712280273438, 0.6373443603515625, 1.0696258544921875, 0.01378631591796875, 0.4785919189453125, 0.6056175231933594, -0.20452499389648438, 0.24941253662109375, 0.6401748657226562, 1.1494140625, 1.1118240356445312, 0.7041034698486328, 0.2234039306640625, 0.5787944793701172, 0.14203643798828125, 0.121185302734375, 0.3206787109375, 0.6260833740234375, 0.5569572448730469, 0.7027740478515625, 0.5426712036132812, 0.2611541748046875, 0.8033218383789062, 0.2779273986816406, 0.4873046875, 0.30665016174316406, 0.5145893096923828, 1.44317626953125, 0.2431182861328125, 1.3439617156982422, 0.214080810546875, 0.5381412506103516, 0.438507080078125, 0.9921455383300781, 0.7077484130859375, 0.22265243530273438, 0.3743114471435547, 0.5927276611328125, 0.7608413696289062, -0.9114913940429688, 0.9192581176757812, 0.4088134765625, 0.38225555419921875, 0.4483776092529297, 0.11053085327148438, 0.08385467529296875, 0.6415481567382812, 1.2320556640625, 0.8382720947265625, 0.32489013671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000063.npy"}
{"epoch": 0.09523809523809523, "step": 64, "batch_size": 64, "mean": 0.08324190974235535, "std": 0.5646152496337891, "min": -2.062744140625, "p10": -0.4096931457519531, "median": 0.039028167724609375, "p90": 0.6819555282592774, "max": 1.83465576171875, "pos_frac": 0.546875, "sample": [0.20140838623046875, 1.83465576171875, -0.7387104034423828, -0.00318145751953125, 0.051174163818359375, 0.01551055908203125, 0.6714038848876953, -0.2095489501953125, 0.4031047821044922, 0.5159530639648438, 0.20064353942871094, 0.027515411376953125, 0.9534549713134766, -0.41985321044921875, 0.1204681396484375, -0.026479721069335938, -0.2541351318359375, 0.18603515625, 0.6050758361816406, 0.6710319519042969, 0.6144561767578125, 0.050540924072265625, -0.5538101196289062, -0.06537055969238281, -0.36675262451171875, 0.013872146606445312, 0.8984489440917969, -2.062744140625, -0.16963577270507812, 0.6485843658447266, 0.19173622131347656, -0.9398040771484375, 0.14107513427734375, -0.38262176513671875, 0.10830497741699219, -0.12253570556640625, 0.2825927734375, -0.9357757568359375, -0.1499347686767578, 0.47234344482421875, 1.0647430419921875, 0.31069183349609375, 0.474639892578125, -0.010494232177734375, 0.0689697265625, -0.7891082763671875, -0.1355113983154297, -0.330047607421875, -0.385986328125, 0.8559112548828125, -0.040843963623046875, -0.002437591552734375, 0.9422416687011719, 0.40894317626953125, -0.04488182067871094, 0.5334835052490234, 0.1816577911376953, -0.26857757568359375, -0.228912353515625, 0.15555953979492188, -0.18201446533203125, -0.04779815673828125, -0.3677196502685547, 0.6864776611328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000064.npy"}
{"epoch": 0.09674981103552532, "step": 65, "batch_size": 64, "mean": 0.4559671878814697, "std": 0.7178877592086792, "min": -1.7797584533691406, "p10": -0.26595916748046877, "median": 0.42757225036621094, "p90": 1.4616437911987308, "max": 2.005859375, "pos_frac": 0.765625, "sample": [0.5180015563964844, 0.4237861633300781, 0.5622673034667969, 0.41329193115234375, -0.15840721130371094, 0.3762626647949219, 0.1943206787109375, 1.680419921875, 0.08592033386230469, 0.9643630981445312, -0.22772216796875, 0.4551582336425781, 1.2307014465332031, 0.3429107666015625, 0.7868576049804688, 0.41570091247558594, 0.8045139312744141, 0.5826034545898438, -0.2596893310546875, 1.7894744873046875, 0.5779571533203125, 0.492584228515625, -0.244293212890625, 1.632049560546875, 1.4906253814697266, -0.041095733642578125, -1.7797584533691406, 0.02197265625, -0.6163711547851562, 0.43135833740234375, 0.04464530944824219, 0.1976776123046875, -0.7835845947265625, -0.6300468444824219, 0.8147659301757812, 0.8576278686523438, -0.2530250549316406, 1.195770263671875, 1.5640487670898438, 1.090545654296875, 1.3062782287597656, 0.30489158630371094, 1.3940200805664062, 0.3864250183105469, 0.01839447021484375, 0.144073486328125, 0.05694770812988281, 0.46884918212890625, -0.9525966644287109, -0.08319854736328125, 0.5051498413085938, 0.3606147766113281, 0.45557403564453125, 0.8421440124511719, -0.2669525146484375, 0.8279495239257812, 1.6309432983398438, -0.34122467041015625, 1.2504119873046875, 1.2658309936523438, 0.6647796630859375, -0.263641357421875, 2.005859375, 0.1561870574951172], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000065.npy"}
{"epoch": 0.0982615268329554, "step": 66, "batch_size": 64, "mean": 0.2545853555202484, "std": 0.6430941224098206, "min": -1.248779296875, "p10": -0.43251647949218747, "median": 0.1909198760986328, "p90": 1.0460979461669924, "max": 2.464202880859375, "pos_frac": 0.671875, "sample": [0.10675621032714844, 1.3266448974609375, 0.26108360290527344, 0.07721328735351562, 0.17558670043945312, 0.7296295166015625, 0.08128547668457031, 0.4863166809082031, -0.054172515869140625, 0.4621143341064453, 0.285186767578125, 0.48751068115234375, 1.11370849609375, 0.13146591186523438, -0.86663818359375, -0.1996612548828125, -0.1820526123046875, 0.5497512817382812, 1.06390380859375, 1.1516036987304688, -0.0488433837890625, 1.0045509338378906, -0.276702880859375, -0.31865692138671875, -0.30934906005859375, 0.6087703704833984, 0.26363182067871094, 0.906341552734375, 0.8833541870117188, 0.09369659423828125, 0.357635498046875, 0.2658233642578125, -0.3898773193359375, 0.49695777893066406, 0.3647422790527344, -0.24071502685546875, 0.12352943420410156, 0.36647796630859375, 0.04693603515625, -0.08933258056640625, -0.77099609375, -1.248779296875, 0.035125732421875, -0.500457763671875, -0.12145233154296875, -0.4507904052734375, 0.6525039672851562, -0.062206268310546875, 0.2062530517578125, 0.21412277221679688, 0.21197891235351562, 0.22724533081054688, -0.022312164306640625, 2.464202880859375, 1.8537139892578125, -0.30318450927734375, -0.628204345703125, -0.7067108154296875, 0.9085540771484375, 1.7097625732421875, 0.479461669921875, 0.015623092651367188, 0.068084716796875, 0.765716552734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000066.npy"}
{"epoch": 0.09977324263038549, "step": 67, "batch_size": 64, "mean": 0.2615726888179779, "std": 0.6426294445991516, "min": -0.6160125732421875, "p10": -0.45417022705078125, "median": 0.21226882934570312, "p90": 1.1132881164550783, "max": 2.656951904296875, "pos_frac": 0.625, "sample": [-0.6160125732421875, -0.571746826171875, -0.47545814514160156, -0.3115692138671875, -0.5567398071289062, 0.2232074737548828, 1.2131824493408203, 0.7037601470947266, 0.34162139892578125, -0.2559967041015625, 0.2558479309082031, 0.21297264099121094, 0.5624923706054688, -0.4455718994140625, 0.26146697998046875, 0.8308582305908203, -0.4325084686279297, 0.6883621215820312, 0.22283172607421875, 0.7366180419921875, -0.045654296875, 0.8778305053710938, -0.5771026611328125, 1.3626022338867188, -0.0680999755859375, 0.29254150390625, 0.4140586853027344, -0.105072021484375, 1.2536239624023438, -0.41611480712890625, 0.1342926025390625, 0.5157890319824219, -0.098297119140625, 0.27805328369140625, 0.00920867919921875, 0.21134376525878906, 0.27685546875, 0.220611572265625, -0.2855243682861328, 0.7007064819335938, 2.656951904296875, -0.457855224609375, 0.6148681640625, 1.121795654296875, 0.12631988525390625, -0.107208251953125, -0.105682373046875, 0.3566761016845703, 0.1623992919921875, -0.23506927490234375, 0.18818283081054688, 2.163299560546875, 0.2115650177001953, 0.29329490661621094, 0.7435836791992188, -0.5899658203125, -0.09003448486328125, 0.161407470703125, -0.23833656311035156, -0.44075775146484375, 1.0934371948242188, 0.71026611328125, 1.23187255859375, -0.36962890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000067.npy"}
{"epoch": 0.10128495842781557, "step": 68, "batch_size": 64, "mean": 0.3488554060459137, "std": 0.6922289133071899, "min": -1.222869873046875, "p10": -0.37765731811523434, "median": 0.2648153305053711, "p90": 1.055366897583008, "max": 2.97540283203125, "pos_frac": 0.734375, "sample": [2.2471466064453125, 0.458282470703125, 0.5344219207763672, 0.9935150146484375, -0.07572174072265625, 0.733551025390625, 0.6669692993164062, 0.04151153564453125, 0.49161338806152344, -0.1969451904296875, 0.557708740234375, -1.222869873046875, 0.867645263671875, 1.06982421875, -0.06217193603515625, 1.0216331481933594, -0.32917022705078125, 0.1382904052734375, -0.205902099609375, 0.23322296142578125, 0.44727325439453125, -0.07110786437988281, 1.09893798828125, 0.45383453369140625, 0.40578460693359375, 0.157867431640625, 0.759857177734375, 0.16987991333007812, 0.4904022216796875, -0.5001373291015625, 0.07695579528808594, 0.387176513671875, 0.6626358032226562, 0.22701263427734375, -0.23494720458984375, 0.0963134765625, 2.97540283203125, -0.08701324462890625, 0.23021697998046875, 0.26642799377441406, 0.4702301025390625, -0.11267471313476562, -0.3984375, 0.5996627807617188, -0.855926513671875, -1.0129623413085938, 0.33373260498046875, 0.135040283203125, 0.2632026672363281, 0.18058013916015625, 0.9397735595703125, 0.1888275146484375, -0.24808502197265625, 1.1681365966796875, 0.762725830078125, 0.7892990112304688, 0.195159912109375, 1.0091629028320312, 1.4440765380859375, 1.5848388671875, 0.2733802795410156, 0.085479736328125, -0.6633033752441406, -0.7805023193359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000068.npy"}
{"epoch": 0.10279667422524566, "step": 69, "batch_size": 64, "mean": 0.3770950138568878, "std": 0.6233501434326172, "min": -0.967041015625, "p10": -0.31717262268066404, "median": 0.337615966796875, "p90": 1.0682865142822267, "max": 2.6227874755859375, "pos_frac": 0.734375, "sample": [0.1019744873046875, -0.3792839050292969, 1.5221405029296875, 0.43221282958984375, 0.8799667358398438, -0.135284423828125, -0.0425872802734375, 0.11704063415527344, 0.5607452392578125, 0.30358123779296875, 0.22673988342285156, 0.2992973327636719, 0.1500415802001953, 0.8130455017089844, 1.1613006591796875, 1.0723114013671875, 0.1532268524169922, 0.8424758911132812, 0.7231597900390625, -0.967041015625, 2.6227874755859375, -0.7977447509765625, 1.7545318603515625, 0.1546783447265625, 0.399139404296875, -0.13372421264648438, 0.23223876953125, 0.5639724731445312, 0.6578369140625, 0.509307861328125, 0.4972076416015625, 0.438812255859375, 0.4092979431152344, 0.45262908935546875, 0.5364799499511719, -0.46314239501953125, 0.5028038024902344, 0.094451904296875, 0.0815582275390625, 0.243377685546875, -0.1594390869140625, 0.2875843048095703, -0.1886444091796875, 0.30500030517578125, 0.6660385131835938, 0.37023162841796875, 0.6671981811523438, 0.5021724700927734, -0.2627544403076172, -0.28334808349609375, -0.34694671630859375, 0.4169273376464844, -0.5390357971191406, -0.1729869842529297, 1.0588951110839844, -0.3316688537597656, 0.8449554443359375, 0.9664726257324219, 0.7069168090820312, 0.23910140991210938, 1.62200927734375, -0.2340087890625, 1.6229248046875, -0.21507835388183594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000069.npy"}
{"epoch": 0.10430839002267574, "step": 70, "batch_size": 64, "mean": 0.2790454030036926, "std": 0.6391950249671936, "min": -1.3532600402832031, "p10": -0.4896984100341797, "median": 0.3255128860473633, "p90": 0.9860521316528321, "max": 1.729400634765625, "pos_frac": 0.703125, "sample": [0.37921905517578125, -0.4808502197265625, 0.7529144287109375, -0.123687744140625, 0.7802982330322266, -0.4914207458496094, 0.010181427001953125, 0.6988143920898438, 0.7993240356445312, -0.6216106414794922, 0.3194122314453125, -0.021823883056640625, -1.10406494140625, 0.8356170654296875, 0.16495513916015625, -0.37265777587890625, 0.1191253662109375, -0.8451004028320312, -0.48567962646484375, 0.2035980224609375, 1.2849349975585938, 0.9961681365966797, 0.5325813293457031, 0.5508079528808594, -0.33899688720703125, 0.743499755859375, 0.4890289306640625, 0.658172607421875, -0.40651702880859375, 0.99847412109375, 0.1586017608642578, 0.5352554321289062, -0.42256927490234375, 0.43723297119140625, 0.7148590087890625, 1.4410858154296875, 1.1056289672851562, 0.8658447265625, 0.45217132568359375, 0.7628974914550781, 0.5911788940429688, -0.11647224426269531, 0.03994178771972656, 0.742706298828125, 0.6124725341796875, 0.20999908447265625, -1.3532600402832031, 1.729400634765625, 0.42669677734375, 0.8311614990234375, 0.9624481201171875, 0.3210296630859375, 0.12432861328125, 0.020233154296875, -0.15613174438476562, 0.1707611083984375, -0.3995208740234375, -0.8154487609863281, 0.6710891723632812, 0.32999610900878906, -0.7772674560546875, -0.15571975708007812, 0.1303386688232422, 1.643218994140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000070.npy"}
{"epoch": 0.10582010582010581, "step": 71, "batch_size": 64, "mean": 0.11482015252113342, "std": 0.5464509129524231, "min": -1.678680419921875, "p10": -0.5115257263183594, "median": 0.13874435424804688, "p90": 0.8030761718750001, "max": 1.2841033935546875, "pos_frac": 0.578125, "sample": [-0.2808990478515625, 0.7607040405273438, 0.33655357360839844, 0.9685306549072266, -0.10318374633789062, 0.03098297119140625, 0.8823871612548828, -0.37358856201171875, -0.04967498779296875, 0.61077880859375, -0.1656341552734375, -0.680999755859375, 0.88775634765625, -0.19589996337890625, -0.44254302978515625, -0.06860160827636719, -0.036956787109375, 0.8212356567382812, -0.06662750244140625, 0.2902984619140625, -1.678680419921875, 1.2841033935546875, 0.04320526123046875, -0.010791778564453125, 0.8589038848876953, 0.6348876953125, 0.29901885986328125, 0.2830944061279297, 1.0920124053955078, -0.35480499267578125, -1.293701171875, 0.6371917724609375, 0.242828369140625, -0.8418731689453125, 0.04410362243652344, 0.4675636291503906, 0.3366260528564453, -0.7812042236328125, 0.5131549835205078, -0.45944976806640625, 0.21495819091796875, -0.533843994140625, -0.13677978515625, 0.4531097412109375, -0.04280853271484375, 0.056163787841796875, -0.546173095703125, 0.32140350341796875, -0.20682525634765625, 0.2030792236328125, 0.6448860168457031, 0.4842529296875, 0.07440948486328125, 0.6166534423828125, -0.36518096923828125, -0.2136077880859375, 0.3548774719238281, 0.36309814453125, 0.25086212158203125, -0.1354217529296875, 0.47399139404296875, 0.4618358612060547, 0.41065216064453125, -0.2959098815917969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000071.npy"}
{"epoch": 0.1073318216175359, "step": 72, "batch_size": 64, "mean": 0.38662609457969666, "std": 0.6615582704544067, "min": -1.0678386688232422, "p10": -0.47748641967773436, "median": 0.37055206298828125, "p90": 1.1709230422973635, "max": 2.138885498046875, "pos_frac": 0.734375, "sample": [-0.48072052001953125, 1.0251083374023438, 0.3596038818359375, 0.9597625732421875, -0.6253509521484375, 1.2741127014160156, 0.18082809448242188, 1.547210693359375, -1.0678386688232422, 0.16243743896484375, 1.284698486328125, 1.1942901611328125, 0.6250534057617188, 1.1163997650146484, 0.19733810424804688, -0.07052421569824219, -0.5244026184082031, 0.6655178070068359, 0.325164794921875, -0.469940185546875, 0.5114593505859375, 0.5669479370117188, 1.1155223846435547, 0.08463668823242188, 0.25580596923828125, -0.1244964599609375, -0.4180278778076172, -0.8667716979980469, -0.1148223876953125, -0.5878105163574219, 1.9703445434570312, 0.9629859924316406, 0.19559478759765625, 1.72314453125, -1.0247116088867188, -0.017169952392578125, 0.5994777679443359, 0.2872295379638672, 0.6749763488769531, 0.09746932983398438, 0.7916831970214844, 0.48769378662109375, 0.9216537475585938, -0.29422760009765625, 0.7515945434570312, 0.381500244140625, 0.8019256591796875, 0.2292938232421875, 0.16340255737304688, 0.6486053466796875, 0.270721435546875, 0.6705780029296875, 2.138885498046875, -0.2665138244628906, 0.654052734375, 0.1636371612548828, 0.6773605346679688, 0.43534278869628906, 0.4289093017578125, 0.24526023864746094, -0.1603240966796875, 0.6123123168945312, 0.5272140502929688, -0.107025146484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000072.npy"}
{"epoch": 0.10884353741496598, "step": 73, "batch_size": 64, "mean": 0.16693082451820374, "std": 0.8232986330986023, "min": -1.6412811279296875, "p10": -0.689578628540039, "median": 0.10947227478027344, "p90": 1.155889129638672, "max": 2.597259521484375, "pos_frac": 0.546875, "sample": [0.3343639373779297, 0.20770645141601562, -0.40316009521484375, -0.27465057373046875, -0.7016372680664062, -0.3621673583984375, 0.06946563720703125, -0.84698486328125, -1.6412811279296875, 1.1574935913085938, -0.38172149658203125, -0.9798507690429688, 0.15900802612304688, -0.6614418029785156, -0.3289203643798828, 0.3092079162597656, -0.3170928955078125, 0.5061492919921875, 0.697784423828125, -0.0902252197265625, -1.40716552734375, -0.3564720153808594, -0.48919105529785156, 0.48177337646484375, 2.21441650390625, 0.7554702758789062, -0.3030853271484375, -0.26662445068359375, -0.44477272033691406, -0.15692901611328125, 0.5769309997558594, 0.07156181335449219, 0.15917015075683594, 0.1771392822265625, -0.3931732177734375, 0.1473827362060547, 0.52099609375, 1.394134521484375, -1.1734657287597656, 0.05678558349609375, 1.1521453857421875, 1.578033447265625, -1.0986328125, -0.5019683837890625, -0.4401702880859375, 1.1277847290039062, 0.5695571899414062, -0.033123016357421875, 0.8624725341796875, 0.9037704467773438, -0.11212158203125, 1.0622940063476562, 1.2446708679199219, 0.3507270812988281, 0.3864784240722656, -0.3839874267578125, 0.4185218811035156, 0.19387054443359375, 0.1630859375, -0.1283721923828125, 0.6407947540283203, 2.235485076904297, 2.597259521484375, -0.12193107604980469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000073.npy"}
{"epoch": 0.11035525321239607, "step": 74, "batch_size": 64, "mean": 0.4163666069507599, "std": 0.7454833984375, "min": -1.17803955078125, "p10": -0.3728626251220703, "median": 0.3061676025390625, "p90": 1.3472885131835939, "max": 2.8951644897460938, "pos_frac": 0.6875, "sample": [-0.0858001708984375, -0.11531829833984375, -0.2382354736328125, 0.09577178955078125, 0.687957763671875, 0.01847076416015625, 0.810516357421875, 2.256561279296875, 1.1577224731445312, 1.6465606689453125, 1.1057472229003906, 1.2087326049804688, 0.218963623046875, 0.586334228515625, -0.14390945434570312, 0.5336570739746094, 1.45721435546875, 0.2570343017578125, -0.2436065673828125, -0.1145172119140625, 0.058444976806640625, 0.998687744140625, -0.21291732788085938, -0.18148040771484375, 0.4019775390625, -0.4576282501220703, -0.689239501953125, 0.15827178955078125, 1.303955078125, -1.17803955078125, 0.3036346435546875, 0.8768367767333984, 0.41483497619628906, -0.6492633819580078, -0.4214515686035156, 0.437713623046875, -0.3169441223144531, 0.26529693603515625, -0.22684478759765625, -0.3511772155761719, 0.8832263946533203, -0.8686733245849609, 0.49370574951171875, 0.6206207275390625, 0.593994140625, 1.3928985595703125, 2.8951644897460938, -0.3821563720703125, 1.8966636657714844, 1.0019683837890625, 0.0657958984375, 0.2234020233154297, 0.461151123046875, 0.3087005615234375, -0.08463287353515625, 0.27734375, 0.40923309326171875, 0.228485107421875, 0.972808837890625, 1.3658599853515625, 1.2250232696533203, 0.44734954833984375, -0.2153034210205078, 0.8003082275390625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000074.npy"}
{"epoch": 0.11186696900982615, "step": 75, "batch_size": 64, "mean": 0.44073379039764404, "std": 0.7438761591911316, "min": -1.5034065246582031, "p10": -0.35098819732666015, "median": 0.3230166435241699, "p90": 1.3075607299804688, "max": 2.7578125, "pos_frac": 0.6875, "sample": [-0.09201812744140625, -0.2435760498046875, 0.5135173797607422, 0.31636810302734375, 1.6680984497070312, -0.08469390869140625, 0.2166595458984375, 0.7077064514160156, 1.3015899658203125, 1.2595252990722656, 0.44832611083984375, 0.10037612915039062, -0.604034423828125, 0.4225006103515625, -0.26759910583496094, 1.0730705261230469, -0.21427154541015625, -0.3437023162841797, 0.3296651840209961, 0.5774116516113281, 2.7578125, 0.8701095581054688, 1.92083740234375, 1.1381378173828125, 0.6667938232421875, 1.0594139099121094, 1.1657485961914062, -0.327606201171875, -0.10268783569335938, 0.12055587768554688, 0.01866912841796875, 1.6450881958007812, 0.53802490234375, 1.0618152618408203, 0.8361892700195312, 0.11454582214355469, 0.7230377197265625, 0.014617919921875, 1.4230270385742188, 1.2486152648925781, -0.3541107177734375, 0.2865486145019531, -1.5034065246582031, 0.26740264892578125, 0.5514984130859375, 0.7500495910644531, -0.001918792724609375, 0.02831268310546875, -0.05591011047363281, 0.19739341735839844, 0.24333572387695312, 1.2522048950195312, -0.6499824523925781, -0.38249969482421875, -0.0692901611328125, -0.4770240783691406, 1.31011962890625, -0.25971412658691406, -0.5702095031738281, 0.6551780700683594, 0.8372173309326172, -0.14752197265625, 1.8787689208984375, 0.4428558349609375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000075.npy"}
{"epoch": 0.11337868480725624, "step": 76, "batch_size": 64, "mean": 0.23598253726959229, "std": 0.6922919750213623, "min": -1.1638946533203125, "p10": -0.6817832946777344, "median": 0.2537384033203125, "p90": 1.121124076843262, "max": 2.40509033203125, "pos_frac": 0.65625, "sample": [-0.6639404296875, 1.1636009216308594, -0.826904296875, -0.30841064453125, 0.44901275634765625, -0.8229961395263672, 0.3082141876220703, 0.5639419555664062, 0.21376609802246094, 1.4096221923828125, 1.410593032836914, 2.40509033203125, 0.04111289978027344, -1.1638946533203125, 0.5113983154296875, -0.9900283813476562, 0.343109130859375, 0.8040485382080078, 0.2159271240234375, 0.22048568725585938, 0.5794219970703125, 1.5332107543945312, -0.494659423828125, 1.059326171875, 0.6922149658203125, 0.24017333984375, 0.02626800537109375, 1.4870147705078125, -0.6617298126220703, -0.08045768737792969, 0.18070602416992188, -0.5447540283203125, -0.5257987976074219, 0.37933349609375, -0.18708038330078125, 0.5772857666015625, 0.580718994140625, 0.2659454345703125, -0.8607921600341797, 0.82049560546875, 0.3186378479003906, -0.38394927978515625, -0.22104263305664062, -0.2950935363769531, 0.2415313720703125, 0.5732440948486328, 0.318603515625, -0.7152519226074219, 0.7357101440429688, 0.6258163452148438, -0.6894302368164062, 0.09035491943359375, -0.13150787353515625, 0.625396728515625, 1.145263671875, 1.064798355102539, -0.40563201904296875, -0.01509857177734375, -0.20519256591796875, 0.4086761474609375, 0.793701171875, 0.33525657653808594, 0.31694793701171875, 0.220550537109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000076.npy"}
{"epoch": 0.11489040060468632, "step": 77, "batch_size": 64, "mean": 0.3053016662597656, "std": 0.72513347864151, "min": -1.3454704284667969, "p10": -0.6922149658203125, "median": 0.2065868377685547, "p90": 1.3229064941406252, "max": 1.972412109375, "pos_frac": 0.671875, "sample": [0.2824592590332031, 0.5976104736328125, -0.020412445068359375, -0.2147064208984375, 0.4969348907470703, 1.2469139099121094, 0.8259124755859375, 0.3396625518798828, -1.3454704284667969, 0.12804412841796875, 1.2650146484375, 0.6611137390136719, 0.46395111083984375, -0.5837936401367188, 0.6348838806152344, 1.1531810760498047, -0.0913543701171875, 1.4642257690429688, 0.5131149291992188, 1.72283935546875, 0.06530952453613281, 0.10626029968261719, 1.5065078735351562, 0.4366493225097656, 0.13549423217773438, -0.16928863525390625, -0.031036376953125, 0.13834381103515625, 0.053325653076171875, -0.07519340515136719, -0.729034423828125, -0.696746826171875, 0.24824142456054688, -0.1806488037109375, 0.47934722900390625, 1.34771728515625, -0.0616912841796875, 0.7538375854492188, 0.25699615478515625, 1.17108154296875, -0.7003631591796875, -0.17566680908203125, 0.5514297485351562, 1.260101318359375, -0.0207366943359375, 1.972412109375, 0.05675506591796875, 0.024480819702148438, 1.5523529052734375, 0.1211395263671875, 0.2839508056640625, -0.8817977905273438, 0.1649322509765625, 1.378631591796875, 1.2499618530273438, 0.040386199951171875, -0.9298629760742188, -0.681640625, -1.0467376708984375, -0.1454334259033203, 0.2837677001953125, 0.6061553955078125, 0.9348793029785156, -0.6553878784179688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000077.npy"}
{"epoch": 0.1164021164021164, "step": 78, "batch_size": 64, "mean": 0.3619759976863861, "std": 0.658534586429596, "min": -1.1016616821289062, "p10": -0.4375785827636719, "median": 0.44443511962890625, "p90": 1.0648571014404298, "max": 1.8192825317382812, "pos_frac": 0.671875, "sample": [0.7784881591796875, -0.201904296875, 0.40155982971191406, 0.6627197265625, -0.01194000244140625, 1.7148971557617188, 1.6055679321289062, -0.28382110595703125, 0.6990108489990234, -0.0471038818359375, 0.6621551513671875, 0.30084228515625, -0.2153301239013672, 0.5221786499023438, -0.167449951171875, 0.9329757690429688, -0.8106918334960938, 1.7290191650390625, -0.48858642578125, 0.6566925048828125, 0.1763916015625, -0.26076507568359375, 1.0897979736328125, 0.3685798645019531, 1.0717315673828125, 0.118927001953125, 0.7673263549804688, -0.43576812744140625, 0.974212646484375, -0.741912841796875, 0.8441963195800781, 0.9785079956054688, 0.5160598754882812, 1.8192825317382812, 0.505340576171875, -1.08599853515625, -1.1016616821289062, 0.412811279296875, 0.7688465118408203, 1.0488166809082031, 0.7137374877929688, 0.457611083984375, 0.7557220458984375, -0.4233970642089844, -0.698486328125, 0.7311859130859375, -0.1338653564453125, 0.48122215270996094, 0.32723236083984375, 1.3555068969726562, -0.4383544921875, 0.37392425537109375, -0.08367919921875, -0.37790679931640625, 0.4312591552734375, 0.32464599609375, 0.10909271240234375, -0.281829833984375, 0.9856986999511719, -0.18444061279296875, 0.5745697021484375, 0.7174110412597656, 0.4688377380371094, 0.7067642211914062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000078.npy"}
{"epoch": 0.11791383219954649, "step": 79, "batch_size": 64, "mean": 0.4268933832645416, "std": 0.6467375159263611, "min": -0.9049568176269531, "p10": -0.3797014236450195, "median": 0.36688995361328125, "p90": 1.2760738372802738, "max": 2.3421249389648438, "pos_frac": 0.734375, "sample": [-0.022256851196289062, 0.39404296875, -0.7176418304443359, 0.6398334503173828, -0.39580535888671875, 0.9918022155761719, -0.9049568176269531, -0.25856590270996094, 0.9163894653320312, 0.0811309814453125, -0.090362548828125, 0.28215789794921875, 1.0587043762207031, 2.1630172729492188, -0.6641769409179688, -0.45085906982421875, 0.6344337463378906, 0.46262550354003906, -0.5042400360107422, 0.12615203857421875, 0.11360740661621094, 1.6797637939453125, 0.298736572265625, 1.1929931640625, 0.5743446350097656, -0.22784042358398438, 1.3116798400878906, 0.7825698852539062, 0.6717720031738281, 0.26430511474609375, 0.0410308837890625, -0.36612701416015625, 0.6553115844726562, 0.3631858825683594, 1.0125350952148438, 0.5993919372558594, 0.5660285949707031, 0.6522884368896484, -0.15110015869140625, -0.30107879638671875, 0.3705940246582031, 0.2960052490234375, 0.1190643310546875, -0.38551902770996094, -0.16620826721191406, 0.3429450988769531, 0.7955551147460938, 0.1781005859375, 0.7992362976074219, 0.8106346130371094, 0.5339756011962891, 0.27478790283203125, 0.7448577880859375, -0.0231170654296875, 0.2678642272949219, 1.435028076171875, 1.4124908447265625, 0.7063827514648438, 0.6112136840820312, 1.4184188842773438, 0.1934661865234375, 0.7710609436035156, 2.3421249389648438, -0.0026092529296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000079.npy"}
{"epoch": 0.11942554799697656, "step": 80, "batch_size": 64, "mean": 0.38737374544143677, "std": 0.7441350221633911, "min": -1.2086334228515625, "p10": -0.5088739395141602, "median": 0.3528470993041992, "p90": 1.2716903686523442, "max": 2.3882980346679688, "pos_frac": 0.703125, "sample": [-0.031206130981445312, 1.0808601379394531, 0.8289337158203125, 0.9958572387695312, 0.9330997467041016, -1.1768341064453125, -0.2781829833984375, 0.3259868621826172, -0.2123870849609375, -0.8796310424804688, 1.7758197784423828, -0.11511993408203125, 0.48221588134765625, -0.4975318908691406, 0.08193778991699219, 2.3882980346679688, 0.60003662109375, 0.42565155029296875, 0.9030494689941406, 0.12972640991210938, 0.1936798095703125, 0.37970733642578125, -0.2396259307861328, 0.05033111572265625, 1.7901458740234375, 1.6425247192382812, 0.6918792724609375, 0.019317626953125, 1.3235626220703125, 1.1527557373046875, 0.38695526123046875, 0.7154159545898438, -0.16417694091796875, 0.4623146057128906, 0.24535369873046875, 1.058258056640625, -0.4255218505859375, 0.8062591552734375, 0.9482650756835938, -1.2086334228515625, -0.665771484375, 0.09096527099609375, -0.53076171875, 1.0844535827636719, -0.1796112060546875, 0.3909454345703125, 2.3175811767578125, 0.23849868774414062, 0.6989364624023438, -0.5137348175048828, 1.322662353515625, 0.11790275573730469, -0.347137451171875, 0.6480636596679688, 0.5360813140869141, -0.012233734130859375, 0.25308990478515625, 0.1332569122314453, 0.9618682861328125, 0.7101211547851562, -0.0150299072265625, -0.7465667724609375, 0.209930419921875, 0.49906158447265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000080.npy"}
{"epoch": 0.12093726379440665, "step": 81, "batch_size": 64, "mean": 0.43776261806488037, "std": 0.6847680807113647, "min": -1.016357421875, "p10": -0.260491943359375, "median": 0.29534435272216797, "p90": 1.2416038513183594, "max": 2.557046890258789, "pos_frac": 0.71875, "sample": [0.5461044311523438, -0.0126800537109375, 1.2622108459472656, -0.35943603515625, 1.4348526000976562, -0.26599884033203125, 0.2994213104248047, -0.21820831298828125, -0.054912567138671875, 0.00121307373046875, 0.4090728759765625, 0.26433563232421875, -0.318756103515625, 0.5101547241210938, -0.026681900024414062, 0.08182525634765625, 0.06758880615234375, 0.1394329071044922, -0.8163375854492188, 1.1239643096923828, 0.2513999938964844, 0.23156356811523438, 0.84674072265625, -0.134307861328125, 1.143951416015625, 0.7594566345214844, 0.442779541015625, 1.1968231201171875, -0.1663188934326172, 0.688720703125, 0.792877197265625, 1.2425079345703125, 1.0041732788085938, 0.07357978820800781, -0.14202117919921875, 0.907623291015625, 0.08758544921875, 2.157501220703125, 0.798553466796875, 1.79400634765625, 0.8070144653320312, 0.6304035186767578, 0.6943588256835938, -0.24764251708984375, 1.2394943237304688, -0.03269386291503906, 1.6006755828857422, 0.09178924560546875, 0.5698471069335938, 0.544952392578125, 2.557046890258789, 1.179595947265625, 0.63958740234375, 0.1875457763671875, 0.21787261962890625, -1.016357421875, -0.10844039916992188, 0.29126739501953125, -0.1183319091796875, 0.4245147705078125, 0.11917877197265625, 1.0049552917480469, -0.4709930419921875, -0.8331947326660156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000081.npy"}
{"epoch": 0.12244897959183673, "step": 82, "batch_size": 64, "mean": 0.5566992163658142, "std": 0.6405846476554871, "min": -0.9626541137695312, "p10": -0.3561763763427733, "median": 0.6027250289916992, "p90": 1.3451896667480472, "max": 2.07012939453125, "pos_frac": 0.84375, "sample": [0.6062068939208984, 0.855560302734375, 0.4362335205078125, 0.11963653564453125, 0.5992431640625, 0.2810020446777344, 0.7289905548095703, 0.7867965698242188, 1.2679595947265625, 0.8854026794433594, 1.0260791778564453, 0.6683464050292969, -0.537567138671875, 1.8466720581054688, 0.7920112609863281, 1.1714019775390625, 0.736297607421875, 0.8089675903320312, 1.0980606079101562, -0.40032196044921875, 0.1949310302734375, 0.792083740234375, 0.3661460876464844, 0.3646697998046875, 0.00952911376953125, 0.6963119506835938, 0.3513774871826172, 0.627716064453125, 1.137176513671875, 0.8492813110351562, 0.5561294555664062, 0.09816169738769531, 1.8835678100585938, -0.128173828125, -0.5856475830078125, 0.3303070068359375, 1.0767669677734375, -0.6736297607421875, 0.6545333862304688, 1.7381916046142578, 0.6711463928222656, -0.5766639709472656, 1.136444091796875, 0.004444122314453125, 0.8746681213378906, 0.11017990112304688, 0.9197101593017578, 0.3881855010986328, 0.9909515380859375, 2.07012939453125, -0.9626541137695312, 0.019435882568359375, 0.5826873779296875, 1.4125900268554688, 0.37123870849609375, -0.09506607055664062, -0.2736244201660156, 0.5397720336914062, 1.5406932830810547, 0.5968246459960938, 0.17189788818359375, -0.3915557861328125, 0.03261756896972656, 1.3782882690429688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000082.npy"}
{"epoch": 0.12396069538926682, "step": 83, "batch_size": 64, "mean": 0.43767082691192627, "std": 0.8553323745727539, "min": -1.7645416259765625, "p10": -0.6170509338378904, "median": 0.4155149459838867, "p90": 1.2725542068481446, "max": 2.849029541015625, "pos_frac": 0.734375, "sample": [1.8997039794921875, 0.925445556640625, 0.5098190307617188, 0.3015766143798828, 1.2883739471435547, -0.46530914306640625, -0.21804046630859375, 0.23531723022460938, 1.2356414794921875, 1.1473388671875, -1.5389480590820312, -0.03442192077636719, -0.3438987731933594, 0.2856178283691406, 1.3818378448486328, 0.23043441772460938, -0.7313156127929688, 1.0969085693359375, 0.2629890441894531, 0.5023765563964844, 1.0595779418945312, 0.5805511474609375, 0.36426544189453125, -0.025310516357421875, -1.173431396484375, 0.20157623291015625, 0.7005462646484375, 1.1282577514648438, 0.010984420776367188, -0.07822608947753906, -0.0408172607421875, -0.2230987548828125, 1.7251472473144531, -1.7645416259765625, -0.9452190399169922, 0.3830070495605469, 0.5281658172607422, -0.881866455078125, 0.3163795471191406, -0.4231719970703125, 0.44802284240722656, 0.6921234130859375, 2.019775390625, 0.5901851654052734, 0.0002079010009765625, 0.5443801879882812, -0.05730628967285156, 0.551239013671875, 0.640228271484375, 0.9854812622070312, 2.6153831481933594, 0.7522945404052734, 1.0847625732421875, 0.11614608764648438, 0.513824462890625, 1.1237735748291016, 1.1238899230957031, 1.2059001922607422, 0.9688720703125, 2.849029541015625, -0.6820831298828125, 0.14876556396484375, 0.25936317443847656, 0.10245132446289062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000083.npy"}
{"epoch": 0.1254724111866969, "step": 84, "batch_size": 64, "mean": 0.322173148393631, "std": 0.7241069078445435, "min": -1.65228271484375, "p10": -0.6368824005126953, "median": 0.36527442932128906, "p90": 1.1792686462402346, "max": 1.5393257141113281, "pos_frac": 0.71875, "sample": [0.3142242431640625, 0.9670486450195312, 0.44452667236328125, 1.0037422180175781, -0.3973655700683594, 1.072662353515625, 0.9515304565429688, -0.628265380859375, 0.4487342834472656, -0.8976593017578125, -0.9898452758789062, 1.5376396179199219, 0.6753196716308594, 0.9340591430664062, 0.9043731689453125, 0.01639556884765625, -0.20281600952148438, 0.17681884765625, 0.517669677734375, -0.6870956420898438, -0.6700439453125, 0.8377685546875, 0.2669544219970703, 0.7181854248046875, 0.5568504333496094, 0.8740234375, 0.04137420654296875, 0.5928955078125, 0.229217529296875, -0.31974029541015625, 1.0856895446777344, 0.140716552734375, 0.002124786376953125, 0.0745391845703125, -0.42099761962890625, 1.1979637145996094, 1.5092182159423828, 1.1356468200683594, 0.4676017761230469, -0.6262664794921875, -1.65228271484375, 1.3849639892578125, 0.7484207153320312, 0.28203582763671875, 0.7968215942382812, -0.2234516143798828, 0.3244972229003906, 1.5393257141113281, 0.4771766662597656, -0.15312957763671875, 0.8015518188476562, -1.4983901977539062, 1.5129146575927734, 0.3894195556640625, 0.3411293029785156, 0.6874198913574219, -0.083038330078125, 0.2882404327392578, 0.0281982421875, -0.5092239379882812, 1.3639373779296875, -0.13060760498046875, -0.6405754089355469, 0.6883087158203125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000084.npy"}
{"epoch": 0.12698412698412698, "step": 85, "batch_size": 64, "mean": 0.6393401622772217, "std": 0.8568982481956482, "min": -1.4040336608886719, "p10": -0.3671066284179687, "median": 0.6292047500610352, "p90": 1.8311199188232425, "max": 2.6470794677734375, "pos_frac": 0.765625, "sample": [1.494039535522461, -0.051753997802734375, 0.4831714630126953, 0.6393032073974609, 0.5531997680664062, 1.45257568359375, 0.17137908935546875, 0.45682525634765625, -0.5873603820800781, 1.9121208190917969, 2.180877685546875, -1.3692474365234375, 1.2170181274414062, 2.114288330078125, 0.7189121246337891, 0.7743453979492188, 1.8624687194824219, 0.5227203369140625, 0.8495254516601562, 1.2912216186523438, 0.6319694519042969, 0.6264400482177734, -0.51788330078125, 1.0132331848144531, 0.0732879638671875, 0.359344482421875, 0.45453643798828125, 1.265411376953125, 1.7579727172851562, 0.1353607177734375, -0.7098045349121094, 0.8542346954345703, -0.2832183837890625, 0.1292724609375, -0.19898033142089844, -0.2383880615234375, 2.6470794677734375, 1.5614166259765625, 0.24888992309570312, 0.2658576965332031, -1.4040336608886719, 1.5551605224609375, 1.6143798828125, 0.9767112731933594, 2.0879058837890625, 1.9075355529785156, 0.7386627197265625, 1.1299896240234375, 0.7382240295410156, 0.8537464141845703, -0.001804351806640625, -0.20226287841796875, 0.09661865234375, 1.0677032470703125, 0.2508811950683594, -0.30096435546875, -0.5785331726074219, -0.38356781005859375, 1.2493362426757812, -0.32869720458984375, 1.4592056274414062, 0.472015380859375, 0.7888870239257812, 0.3690052032470703], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000085.npy"}
{"epoch": 0.12849584278155707, "step": 86, "batch_size": 64, "mean": 0.42513370513916016, "std": 0.7199453115463257, "min": -0.9382476806640625, "p10": -0.5451644897460937, "median": 0.4033479690551758, "p90": 1.3678531646728516, "max": 2.4727706909179688, "pos_frac": 0.765625, "sample": [0.6920127868652344, 0.5942459106445312, -0.40480804443359375, 0.1445465087890625, 1.5800304412841797, 1.0662879943847656, 0.07909393310546875, 0.13535308837890625, -0.7325897216796875, 0.19550323486328125, -0.40160369873046875, -0.9382476806640625, 0.4973907470703125, 1.49420166015625, 0.38654327392578125, 0.9719772338867188, 1.053415298461914, -0.5598049163818359, -0.1416015625, 0.038360595703125, 0.507568359375, 0.0274505615234375, 0.1330718994140625, 0.4616127014160156, 0.41419219970703125, 0.06599807739257812, 1.3719329833984375, 0.49037933349609375, 1.4617538452148438, 0.30788612365722656, 1.6620903015136719, 0.725128173828125, 0.52410888671875, 1.0976409912109375, 0.965240478515625, 0.2192840576171875, -0.590728759765625, 0.08028030395507812, 1.2963142395019531, 0.53955078125, -0.10941314697265625, 0.4014549255371094, 0.8979911804199219, -0.25252532958984375, 0.27114105224609375, 0.8060340881347656, 0.7394466400146484, 0.4998931884765625, 0.9129409790039062, -0.5110034942626953, 2.4163665771484375, 0.28656005859375, -0.6650314331054688, 0.0442352294921875, 0.9748878479003906, 2.4727706909179688, 0.4052410125732422, -0.6387939453125, -0.4076652526855469, -0.5651054382324219, 0.19118118286132812, -0.29932594299316406, 1.3583335876464844, 0.4678802490234375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000086.npy"}
{"epoch": 0.13000755857898716, "step": 87, "batch_size": 64, "mean": 0.5574126243591309, "std": 0.9524425268173218, "min": -2.3579559326171875, "p10": -0.6283886909484864, "median": 0.6787929534912109, "p90": 1.8264053344726567, "max": 2.33221435546875, "pos_frac": 0.75, "sample": [0.9008522033691406, 2.164093017578125, 0.6185302734375, 0.5389842987060547, 1.9844894409179688, 1.2913360595703125, -1.2901458740234375, -1.0147781372070312, 0.9735107421875, 1.0808238983154297, -0.2238006591796875, -0.6244306564331055, 0.40056419372558594, -0.5071220397949219, -0.3973388671875, 1.4698257446289062, 0.9396133422851562, 1.475799560546875, 1.2071170806884766, 1.0521621704101562, -1.399322509765625, 2.2938232421875, 1.0700607299804688, 0.01767730712890625, 1.0725936889648438, -2.3579559326171875, -0.38028717041015625, -0.4959068298339844, 1.8726577758789062, -0.6606616973876953, 0.427459716796875, 0.17159271240234375, 1.5858917236328125, -0.16487503051757812, 0.2098846435546875, -0.6300849914550781, 0.7345733642578125, 0.5628166198730469, 0.8307819366455078, 0.7302970886230469, 0.038677215576171875, 0.8162689208984375, 1.2478256225585938, -0.8892021179199219, 0.0355072021484375, -0.5330047607421875, 1.0074424743652344, 1.7184829711914062, 0.8495368957519531, 0.7964630126953125, 1.209564208984375, 1.5125579833984375, 1.88348388671875, 0.5677108764648438, 0.5054092407226562, 0.25399017333984375, 0.15265655517578125, -0.05263519287109375, 0.627288818359375, 1.896148681640625, 2.33221435546875, 0.796661376953125, 0.3109893798828125, 1.05926513671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000087.npy"}
{"epoch": 0.13151927437641722, "step": 88, "batch_size": 64, "mean": 0.4542158246040344, "std": 0.9697148203849792, "min": -2.1235809326171875, "p10": -0.7148777008056639, "median": 0.5528717041015625, "p90": 1.6607639312744147, "max": 2.733428955078125, "pos_frac": 0.6875, "sample": [0.5773773193359375, 0.5901145935058594, -0.49373626708984375, 1.1727828979492188, 2.7050628662109375, 0.24457550048828125, 2.0782089233398438, -0.6147308349609375, 1.2920074462890625, -0.2657661437988281, 1.5309906005859375, 0.900909423828125, 1.0290718078613281, 0.6196212768554688, -0.7467918395996094, 0.594024658203125, 0.4051971435546875, -0.12142562866210938, -2.1235809326171875, 0.709625244140625, 1.7163810729980469, 0.25174713134765625, 0.3979682922363281, -0.2597064971923828, -1.1726150512695312, 0.97613525390625, 1.3497810363769531, 2.733428955078125, -0.24860382080078125, -0.16523170471191406, -0.9556655883789062, -0.8095016479492188, 1.4085235595703125, 0.8936920166015625, 0.9423370361328125, -0.5441398620605469, 0.9519500732421875, 0.8719520568847656, 2.0661087036132812, -0.11601829528808594, 0.1530628204345703, 0.7346115112304688, 1.5200672149658203, 0.19450950622558594, -1.3959808349609375, 0.26959991455078125, 0.4305267333984375, -0.4343376159667969, 1.903106689453125, 0.8291149139404297, 0.836334228515625, 0.61956787109375, 1.3996734619140625, 0.0311431884765625, 0.5283660888671875, -0.48075294494628906, 0.15892791748046875, 0.022672653198242188, 0.6659927368164062, 2.0772151947021484, -0.1397571563720703, 0.59759521484375, -0.640411376953125, -1.183095932006836], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000088.npy"}
{"epoch": 0.1330309901738473, "step": 89, "batch_size": 64, "mean": 0.45661553740501404, "std": 0.8943817615509033, "min": -2.099822998046875, "p10": -0.5750144958496093, "median": 0.4176187515258789, "p90": 1.843901062011719, "max": 2.273681640625, "pos_frac": 0.703125, "sample": [0.5249710083007812, 0.4659385681152344, 0.37212181091308594, 0.4346923828125, 2.273681640625, 2.1484127044677734, 0.3339881896972656, 0.4005451202392578, 0.1317596435546875, 1.9873886108398438, 1.19366455078125, 1.1035842895507812, -0.1680450439453125, 1.27239990234375, -0.221923828125, 0.7203636169433594, 0.7739181518554688, -0.227142333984375, -0.1938323974609375, -1.32452392578125, 0.32659149169921875, -0.006137847900390625, 0.3867340087890625, 1.6408767700195312, 0.6330013275146484, 1.881927490234375, 0.5491790771484375, 0.024768829345703125, -0.3048667907714844, 0.673370361328125, -0.1065826416015625, 0.8605194091796875, 0.3442859649658203, -0.19913482666015625, 0.633544921875, 0.13674354553222656, 2.217254638671875, 0.5244407653808594, -2.099822998046875, -1.1899566650390625, 1.6525440216064453, 1.3590431213378906, 1.8916397094726562, 1.8972358703613281, 0.01506805419921875, -0.1878814697265625, 0.8432426452636719, 1.7551727294921875, 1.0262527465820312, -1.0867385864257812, -0.231353759765625, 0.2866077423095703, 0.6271896362304688, 0.38265228271484375, 0.6986179351806641, -0.59002685546875, 0.3826560974121094, 0.4914703369140625, -0.6614036560058594, -0.9380340576171875, 0.7400360107421875, -0.5399856567382812, 0.6393299102783203, -0.15863990783691406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000089.npy"}
{"epoch": 0.1345427059712774, "step": 90, "batch_size": 64, "mean": 0.4473702609539032, "std": 0.8737971782684326, "min": -1.825653076171875, "p10": -0.7045791625976562, "median": 0.4495992660522461, "p90": 1.3895774841308595, "max": 3.43170166015625, "pos_frac": 0.703125, "sample": [-0.9516868591308594, 1.3085346221923828, -0.11374473571777344, 0.5913619995117188, 1.0901432037353516, 0.40313720703125, 0.7903881072998047, 1.450347900390625, 0.8380279541015625, -1.825653076171875, 1.6673316955566406, 1.4075469970703125, 0.7007198333740234, 0.9467449188232422, 0.9835433959960938, -0.3424568176269531, 0.2749137878417969, 1.19952392578125, -0.17577362060546875, 0.18240737915039062, 1.4367904663085938, 1.0537490844726562, 0.8361968994140625, 2.49957275390625, 0.5222015380859375, 0.4960613250732422, -0.81072998046875, 3.43170166015625, 0.9656181335449219, 1.5759124755859375, 0.5499954223632812, 0.5762100219726562, 0.2252330780029297, 0.010894775390625, 0.16220855712890625, -0.8033294677734375, -0.05121612548828125, -0.7256317138671875, 0.20665740966796875, 0.8081417083740234, 0.5087966918945312, -0.65545654296875, 1.139678955078125, 1.101165771484375, 1.3476486206054688, 0.27101707458496094, 0.24285507202148438, 1.1868305206298828, -0.18138694763183594, -0.055309295654296875, -0.5175819396972656, 0.9746475219726562, 0.3603973388671875, -0.16154098510742188, -0.1188812255859375, 0.2580890655517578, -0.1617584228515625, -1.1668224334716797, 0.22859954833984375, 0.7120513916015625, -0.17458343505859375, -1.1390380859375, 0.07120323181152344, 1.1694793701171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000090.npy"}
{"epoch": 0.1360544217687075, "step": 91, "batch_size": 64, "mean": 0.6276512145996094, "std": 0.9792388677597046, "min": -2.0394821166992188, "p10": -0.23851623535156247, "median": 0.4860200881958008, "p90": 2.0109500885009766, "max": 3.2094192504882812, "pos_frac": 0.703125, "sample": [1.275146484375, 0.6606826782226562, 1.5388545989990234, -0.1519927978515625, 0.48876953125, 2.0076522827148438, -0.20818710327148438, -0.058109283447265625, 1.7215423583984375, 3.2094192504882812, -0.123870849609375, -0.0432586669921875, 1.305654525756836, 0.22425460815429688, 0.00843048095703125, -0.5780124664306641, 2.0229644775390625, 0.4532470703125, 0.056407928466796875, -1.2572097778320312, 1.436502456665039, 2.6863174438476562, 0.14011383056640625, 0.5611343383789062, 2.0702285766601562, 1.071258544921875, 0.8396682739257812, 0.203521728515625, 0.846923828125, 0.6867294311523438, 0.481292724609375, -0.12282180786132812, 0.8552398681640625, 0.1938953399658203, 1.7111568450927734, 0.9835853576660156, -0.1371002197265625, 0.995330810546875, -0.020732879638671875, -2.0394821166992188, 0.32131195068359375, -0.8321075439453125, -0.1004180908203125, -1.0896453857421875, 0.6993293762207031, 0.9036369323730469, -0.16768646240234375, 2.280691146850586, 0.5077285766601562, 1.2565536499023438, -0.06024169921875, -0.4034252166748047, -0.2515144348144531, 0.06917095184326172, 1.8630828857421875, 2.4691238403320312, 0.1272125244140625, -0.0404815673828125, 2.0123634338378906, 1.4658317565917969, 1.1918220520019531, 0.48327064514160156, 1.1708221435546875, 0.2980976104736328], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000091.npy"}
{"epoch": 0.13756613756613756, "step": 92, "batch_size": 64, "mean": 0.4553444981575012, "std": 0.9607120752334595, "min": -1.7937049865722656, "p10": -0.8024467468261718, "median": 0.48258209228515625, "p90": 1.7075443267822266, "max": 2.584197998046875, "pos_frac": 0.65625, "sample": [0.45241546630859375, -0.2768402099609375, -1.2571792602539062, -0.012035369873046875, 0.10550689697265625, 0.43056488037109375, 0.15979766845703125, 0.6853122711181641, -0.18640899658203125, -0.4998607635498047, -0.83819580078125, 0.8545894622802734, 1.4637069702148438, -0.49648284912109375, 2.1655197143554688, 1.5520210266113281, 0.5230331420898438, -0.23738479614257812, 1.7143077850341797, -0.6384315490722656, 0.483184814453125, -0.4301109313964844, 1.5624427795410156, 0.6472244262695312, 1.7510833740234375, 0.2686023712158203, 0.2713813781738281, 0.8525619506835938, -0.7190322875976562, 0.4819793701171875, -1.1375064849853516, 1.4431686401367188, 0.8504562377929688, -0.04482078552246094, 0.4757537841796875, -0.2909965515136719, 0.1836700439453125, 0.4888153076171875, -0.9145965576171875, -0.6360397338867188, 0.7248764038085938, 0.7278289794921875, 1.18426513671875, 1.1990890502929688, 0.8242740631103516, -1.0816268920898438, 1.9539947509765625, 1.1104812622070312, 2.584197998046875, -1.392822265625, -0.2537498474121094, 2.0731048583984375, 1.0848312377929688, -0.16360855102539062, -0.1136627197265625, -1.7937049865722656, 1.558837890625, 0.6414413452148438, 0.6312980651855469, 1.89959716796875, 1.691762924194336, 0.38787269592285156, 1.3299484252929688, 1.0823440551757812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000092.npy"}
{"epoch": 0.13907785336356765, "step": 93, "batch_size": 64, "mean": 0.45140883326530457, "std": 0.9907634258270264, "min": -2.0633544921875, "p10": -0.7996025085449218, "median": 0.32828330993652344, "p90": 1.683036041259766, "max": 3.68072509765625, "pos_frac": 0.671875, "sample": [0.5737419128417969, 0.91973876953125, 1.71722412109375, 0.3215370178222656, -2.0633544921875, -0.2271900177001953, 0.681304931640625, -0.4193763732910156, 0.027544021606445312, 1.0461311340332031, 2.3483543395996094, 1.5553054809570312, 3.68072509765625, -1.1938972473144531, 0.2832298278808594, 0.6359157562255859, 1.1150169372558594, 0.9442024230957031, -0.04430389404296875, -0.5848960876464844, 1.5963058471679688, 0.56048583984375, -0.0111236572265625, 0.1154327392578125, -0.012712478637695312, -0.27684593200683594, 1.6032638549804688, -0.1766376495361328, -1.1266345977783203, -0.454681396484375, 0.7559814453125, 0.20781707763671875, 0.8038215637207031, 1.7510833740234375, -0.7206954956054688, 0.31396484375, -1.0623779296875, 0.5783729553222656, -0.0864715576171875, 0.2493438720703125, 0.964630126953125, 0.6280632019042969, 0.8506088256835938, 1.261495590209961, 0.49068450927734375, 0.33502960205078125, 0.04154777526855469, 0.1961212158203125, 1.992523193359375, 0.27504730224609375, 2.4421157836914062, 0.7507076263427734, 1.3352584838867188, 1.0603294372558594, 0.20595932006835938, -0.6365718841552734, 2.0953025817871094, -0.851959228515625, -0.9492874145507812, -0.15946197509765625, 1.1586151123046875, -0.8334197998046875, -0.09360885620117188, 0.4057884216308594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000093.npy"}
{"epoch": 0.14058956916099774, "step": 94, "batch_size": 64, "mean": 0.6093254685401917, "std": 1.1611896753311157, "min": -4.065105438232422, "p10": -0.6408924102783203, "median": 0.595524787902832, "p90": 1.8433439254760742, "max": 4.096893310546875, "pos_frac": 0.734375, "sample": [1.8861198425292969, 0.6588172912597656, 1.3017196655273438, 1.8356494903564453, 0.539764404296875, -0.9998016357421875, -0.07104682922363281, 0.32407379150390625, 1.280426025390625, 1.6533660888671875, 0.3472003936767578, 2.8173294067382812, 0.177093505859375, -4.065105438232422, 0.6151580810546875, 1.2545280456542969, -0.6493873596191406, -0.14251327514648438, 0.20758056640625, 0.6379776000976562, 1.5147781372070312, 1.731475830078125, 0.8454322814941406, 1.3438720703125, 1.75732421875, 0.1969165802001953, 0.7051773071289062, -0.6210708618164062, -1.0770835876464844, 0.3537750244140625, 1.3783378601074219, -0.2965106964111328, 0.08250617980957031, -0.27838134765625, 1.6071395874023438, 0.5221214294433594, 1.0616455078125, 2.10992431640625, 0.06402206420898438, 4.096893310546875, 1.328542709350586, 0.9236297607421875, -0.4574127197265625, 0.028594970703125, 1.3047218322753906, 1.8466415405273438, -0.9263992309570312, 1.921600341796875, 0.8982696533203125, 1.1997318267822266, 2.59796142578125, 0.5758914947509766, 0.48626708984375, -0.23877906799316406, 0.8900680541992188, 0.15587997436523438, -0.8035202026367188, 1.1929969787597656, -0.4885711669921875, 1.024679183959961, 0.24019432067871094, -0.4285430908203125, -0.9123630523681641, -0.07049751281738281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000094.npy"}
{"epoch": 0.1421012849584278, "step": 95, "batch_size": 64, "mean": 0.6571837067604065, "std": 0.94516921043396, "min": -1.932373046875, "p10": -0.6170387268066405, "median": 0.6560726165771484, "p90": 1.7208374023437505, "max": 2.70068359375, "pos_frac": 0.796875, "sample": [2.635223388671875, -1.2334022521972656, 1.1681594848632812, 0.12620925903320312, 0.3730621337890625, 0.22801780700683594, 1.5466766357421875, 0.4973773956298828, 1.1451416015625, 1.5753097534179688, 1.1108245849609375, 0.783355712890625, 0.8159332275390625, 0.6777763366699219, -0.7928085327148438, 0.23052978515625, 1.450347900390625, 0.634368896484375, 0.5738277435302734, 0.9316062927246094, 0.1698150634765625, 1.45538330078125, -0.9667301177978516, 1.9166984558105469, -0.2752838134765625, -0.6788711547851562, 2.70068359375, 0.2799835205078125, 0.7685737609863281, -1.932373046875, 0.8433418273925781, 0.6857757568359375, 0.2931480407714844, 1.3220138549804688, -0.7388210296630859, 0.460601806640625, 1.9537811279296875, 0.9559860229492188, -0.12247467041015625, -0.40529632568359375, -0.4727630615234375, 0.22237396240234375, 0.31819915771484375, 0.8719043731689453, 0.0345916748046875, 0.9212074279785156, 1.602874755859375, -0.09255409240722656, 0.6215553283691406, 2.6625595092773438, 1.4566421508789062, 1.771392822265625, -0.9643745422363281, 0.6195449829101562, 1.5623092651367188, 2.461700439453125, 0.3104248046875, 0.5100784301757812, 1.1705093383789062, 0.83172607421875, 0.05874061584472656, 1.4872322082519531, 1.219146728515625, -0.2887382507324219], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000095.npy"}
{"epoch": 0.1436130007558579, "step": 96, "batch_size": 64, "mean": 0.6444511413574219, "std": 0.973292350769043, "min": -1.9362258911132812, "p10": -0.6396026611328124, "median": 0.6201896667480469, "p90": 1.8765544891357424, "max": 2.4892730712890625, "pos_frac": 0.734375, "sample": [1.6790275573730469, -1.9362258911132812, 1.2211456298828125, 0.67877197265625, 1.2193450927734375, -0.050811767578125, 0.63238525390625, -0.8572196960449219, 0.4608154296875, 0.6110305786132812, 0.5811004638671875, -0.08121490478515625, 1.015228271484375, 2.4892730712890625, 0.9558639526367188, 1.3256950378417969, -0.6743011474609375, -0.04305267333984375, 1.122528076171875, -0.41396331787109375, 1.8971824645996094, 1.748046875, 0.3675994873046875, -1.7099227905273438, 0.23876190185546875, 0.4647636413574219, 1.1884689331054688, 1.0710906982421875, 0.6293487548828125, -0.09783554077148438, 1.4555625915527344, 2.27276611328125, 2.142852783203125, 1.8284225463867188, -0.3821277618408203, 0.7118377685546875, -0.7395401000976562, 0.06668663024902344, 0.4709320068359375, 1.3526954650878906, 1.3239593505859375, -0.8727836608886719, 1.1482563018798828, 0.004100799560546875, 1.6544342041015625, 0.1458587646484375, 1.728189468383789, -0.9166259765625, 1.7495803833007812, -0.39727020263671875, 1.2482376098632812, 0.09574127197265625, 0.5422630310058594, 1.9554367065429688, 1.9493560791015625, 0.7939834594726562, 0.4619102478027344, 0.24264907836914062, -0.05004692077636719, 2.219024658203125, 1.607940673828125, -0.009725570678710938, 0.26602935791015625, -0.5586395263671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000096.npy"}
{"epoch": 0.14512471655328799, "step": 97, "batch_size": 64, "mean": 0.48872342705726624, "std": 1.0944247245788574, "min": -1.3875503540039062, "p10": -0.7087158203124999, "median": 0.3880043029785156, "p90": 1.943289947509766, "max": 3.7520751953125, "pos_frac": 0.625, "sample": [-0.3422698974609375, 1.2413482666015625, -1.2242393493652344, 3.1860904693603516, 0.7259731292724609, 2.5438232421875, -0.08193588256835938, -1.3492889404296875, 0.21755218505859375, -0.569091796875, -1.05914306640625, 0.14656829833984375, 0.30641937255859375, 0.9259548187255859, -0.7360610961914062, -1.3875503540039062, -0.1291332244873047, 1.4329204559326172, 0.8790206909179688, 0.06035423278808594, 0.5903778076171875, 0.441650390625, 0.2732124328613281, -0.4927825927734375, -0.025543212890625, -0.650909423828125, 0.14536285400390625, 0.9433746337890625, 0.6609764099121094, -0.277862548828125, 1.0489044189453125, -0.155242919921875, 1.8405532836914062, 0.6124057769775391, 0.7588634490966797, 0.1781463623046875, 2.761659622192383, -0.3459510803222656, -0.27901268005371094, 1.80157470703125, 1.1731986999511719, -0.10537338256835938, 3.7520751953125, 0.7481040954589844, 0.48854827880859375, -1.017435073852539, -0.08644866943359375, 0.36688232421875, 2.96148681640625, -0.733489990234375, -0.3490447998046875, 0.40912628173828125, 1.38995361328125, -0.36338043212890625, 2.1835403442382812, 1.9873199462890625, 1.5849018096923828, 0.5485000610351562, 0.45166015625, 1.1068801879882812, 0.4340705871582031, -0.18243408203125, 0.4401588439941406, -0.5275707244873047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000097.npy"}
{"epoch": 0.14663643235071808, "step": 98, "batch_size": 64, "mean": 0.4276837110519409, "std": 0.9096823930740356, "min": -1.1939201354980469, "p10": -0.8419265747070311, "median": 0.5301437377929688, "p90": 1.7297561645507815, "max": 2.5442733764648438, "pos_frac": 0.671875, "sample": [-0.3439903259277344, 0.7266445159912109, 0.7367935180664062, 0.8638916015625, 0.8691558837890625, 0.5839767456054688, 0.30733489990234375, 0.5025711059570312, 0.8301925659179688, 1.5118255615234375, -0.8668975830078125, -0.7488613128662109, 2.1147918701171875, -0.282318115234375, 1.6489105224609375, 0.5577163696289062, 0.3574943542480469, 2.5442733764648438, 0.6875438690185547, -0.9486846923828125, 0.7193031311035156, 2.076099395751953, -0.5585784912109375, 1.153411865234375, 0.2921619415283203, -0.3856639862060547, -0.20238494873046875, 0.10238265991210938, -0.8905849456787109, -0.20952606201171875, 0.9093399047851562, 0.49737548828125, 0.782135009765625, 0.9864959716796875, -0.9088249206542969, 0.1041259765625, 0.92340087890625, 2.0140609741210938, 0.09998321533203125, 0.14712905883789062, 1.3294143676757812, -0.575531005859375, -0.783660888671875, 1.0961380004882812, -1.0325927734375, -0.8712501525878906, 1.2951622009277344, 0.36096763610839844, -0.28314208984375, 1.9202117919921875, 0.6126251220703125, 1.9184112548828125, -0.7563858032226562, -1.1939201354980469, 0.798492431640625, 0.7169189453125, 1.0737533569335938, -0.19843482971191406, 0.21916580200195312, 1.764404296875, -0.4579734802246094, 0.76470947265625, 1.0937004089355469, -0.7436332702636719], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000098.npy"}
{"epoch": 0.14814814814814814, "step": 99, "batch_size": 64, "mean": 0.6673829555511475, "std": 1.0425124168395996, "min": -1.6840057373046875, "p10": -0.46651573181152334, "median": 0.5557451248168945, "p90": 1.9016254425048829, "max": 3.7293548583984375, "pos_frac": 0.71875, "sample": [2.6461868286132812, 2.819110870361328, 1.2296066284179688, 2.8020095825195312, 0.20740509033203125, 0.5011749267578125, 0.22040939331054688, -0.6733512878417969, -0.07599639892578125, 3.0561065673828125, 0.58477783203125, 1.8239288330078125, 1.9491310119628906, 1.6267585754394531, 0.24053573608398438, 0.18804168701171875, 0.07178115844726562, 0.2688922882080078, -0.3548469543457031, 1.6098079681396484, 1.1764984130859375, -0.2678184509277344, -0.11045455932617188, 0.25347137451171875, -0.3000946044921875, 0.1929779052734375, 1.45159912109375, -0.9720458984375, -0.624847412109375, 1.1707611083984375, 0.2543525695800781, 1.605316162109375, -0.3086395263671875, 0.8912277221679688, 0.9600257873535156, 0.5432357788085938, 1.306396484375, -1.0360908508300781, 0.6614742279052734, 1.8952674865722656, 0.6663990020751953, -0.00335693359375, 1.9043502807617188, 3.7293548583984375, -0.2869720458984375, 1.2233123779296875, 1.5151596069335938, -0.0681915283203125, -1.6840057373046875, 0.9277820587158203, 1.0277137756347656, 1.0084819793701172, 0.9542140960693359, 0.8196830749511719, -0.7114105224609375, 0.48465728759765625, 0.2128753662109375, 1.1764450073242188, 0.5682544708251953, -0.5030670166015625, 0.661041259765625, 0.12865066528320312, -0.14171600341796875, -0.3812294006347656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000099.npy"}
{"epoch": 0.14965986394557823, "step": 100, "batch_size": 64, "mean": 0.527890682220459, "std": 1.3799700736999512, "min": -2.970294952392578, "p10": -1.0219474792480467, "median": 0.2904186248779297, "p90": 2.4875102996826186, "max": 5.48468017578125, "pos_frac": 0.703125, "sample": [0.69244384765625, 0.01923370361328125, -1.0755767822265625, 1.2740020751953125, 0.496795654296875, -0.4243278503417969, 0.6859455108642578, 1.531198501586914, 2.6307525634765625, 1.036224365234375, 1.0308074951171875, 0.9687881469726562, 1.1439132690429688, 1.3214569091796875, 0.6150970458984375, 2.9506607055664062, -0.8968124389648438, -0.23153305053710938, 0.2135944366455078, 0.5439605712890625, 0.22320938110351562, 0.13393402099609375, -1.297760009765625, 0.20207977294921875, 0.5659561157226562, 1.2568817138671875, 0.14267539978027344, 1.1219940185546875, 0.06686592102050781, -1.357666015625, -0.3845481872558594, -1.562042236328125, -0.5758438110351562, -1.5050697326660156, -0.2871246337890625, -0.3373870849609375, 0.21045684814453125, -0.0971527099609375, -2.970294952392578, 2.697610855102539, -0.4193572998046875, 2.9631500244140625, -0.3785133361816406, 0.832183837890625, 0.2611274719238281, 2.153278350830078, 1.7570648193359375, 0.055973052978515625, 0.07978057861328125, -0.5851478576660156, 3.12811279296875, 1.6546173095703125, 0.5345611572265625, 0.31970977783203125, -0.0598907470703125, 1.067159652709961, 0.16977691650390625, 2.710906982421875, 1.679107666015625, 0.6111984252929688, 5.48468017578125, 0.025421142578125, -2.559234619140625, 1.52593994140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000100.npy"}
{"epoch": 0.15117157974300832, "step": 101, "batch_size": 64, "mean": 0.8248313069343567, "std": 1.0132062435150146, "min": -1.2912063598632812, "p10": -0.4748527526855467, "median": 0.7900924682617188, "p90": 2.1771278381347656, "max": 3.9773330688476562, "pos_frac": 0.78125, "sample": [0.0700836181640625, 0.6860198974609375, -0.6080398559570312, -0.67657470703125, 2.2496566772460938, 1.7540512084960938, 1.135833740234375, 0.2955589294433594, -0.30450439453125, 0.8406772613525391, -0.05925178527832031, 1.547119140625, 0.4966278076171875, 0.08847808837890625, 2.7434120178222656, 0.047977447509765625, 1.2071113586425781, 1.1514434814453125, -0.12019729614257812, -0.9507198333740234, 1.3614959716796875, 0.8069000244140625, 1.058746337890625, 0.2837944030761719, 1.0979499816894531, 1.4297561645507812, 1.9525604248046875, 2.676624298095703, 0.5360221862792969, 1.8842182159423828, 1.1118392944335938, 2.4554443359375, 2.041555404663086, 0.1636199951171875, -0.5478591918945312, -1.2912063598632812, 0.7377586364746094, -0.9305419921875, -0.27030181884765625, -0.016620635986328125, 0.773284912109375, 1.3048248291015625, 0.9082012176513672, 0.5289154052734375, 2.1995925903320312, 0.47833251953125, 1.0843658447265625, 0.525604248046875, -0.20552635192871094, 0.7106494903564453, 2.1247100830078125, 0.2601642608642578, 2.208637237548828, 1.815969467163086, -0.135498046875, 0.5487060546875, 1.3869476318359375, 1.121551513671875, -0.6182365417480469, 0.8172492980957031, 1.3899688720703125, 1.2158679962158203, 3.9773330688476562, 0.23106765747070312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000101.npy"}
{"epoch": 0.15268329554043839, "step": 102, "batch_size": 64, "mean": 0.5678646564483643, "std": 1.1672836542129517, "min": -2.636871337890625, "p10": -0.9031829833984374, "median": 0.5154151916503906, "p90": 1.904111099243164, "max": 3.2996826171875, "pos_frac": 0.671875, "sample": [1.233978271484375, -0.16217803955078125, -0.27504539489746094, 1.1251907348632812, -1.611663818359375, 0.52423095703125, -0.30496978759765625, 1.6433067321777344, -1.1257705688476562, -0.039562225341796875, 1.6429824829101562, 0.8265151977539062, 0.32404327392578125, 0.28852081298828125, 1.6278533935546875, -0.0960693359375, -1.093648910522461, 2.621246337890625, -0.66741943359375, 1.6182708740234375, -0.7949295043945312, 0.13415908813476562, 0.33742523193359375, 0.2026519775390625, 1.5618057250976562, 0.7063140869140625, 1.0420074462890625, 1.37213134765625, -0.9678192138671875, 0.48922157287597656, 1.9856796264648438, 1.9041671752929688, -0.6938629150390625, 1.6583290100097656, 0.33846282958984375, 0.894683837890625, 0.9355545043945312, 3.0092849731445312, 2.261383056640625, -0.4367561340332031, 0.20812034606933594, 3.2996826171875, -0.9561920166015625, 0.9060516357421875, -0.4424552917480469, 0.5065994262695312, 0.8628082275390625, 0.7440719604492188, 0.7926769256591797, 0.1116485595703125, 1.3862438201904297, 1.0769729614257812, 0.03186798095703125, -0.9495773315429688, -0.4636077880859375, -0.5828514099121094, -0.5115585327148438, 3.05902099609375, -0.03952789306640625, -2.636871337890625, 1.9039802551269531, 1.6337661743164062, 1.7105941772460938, 0.6521682739257812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000102.npy"}
{"epoch": 0.15419501133786848, "step": 103, "batch_size": 64, "mean": 0.6194085478782654, "std": 1.1317458152770996, "min": -2.20745849609375, "p10": -0.5907722473144531, "median": 0.6405315399169922, "p90": 2.0825656890869144, "max": 3.7946929931640625, "pos_frac": 0.703125, "sample": [-0.5523681640625, 0.45211029052734375, 1.59613037109375, -1.4020843505859375, 1.2195205688476562, 0.2223968505859375, 0.9446258544921875, -0.20123863220214844, 0.9190254211425781, 0.8123703002929688, 0.00254058837890625, 0.8600006103515625, -0.20257568359375, 0.3132972717285156, -1.3578338623046875, 1.6917953491210938, 1.9720783233642578, 1.9192962646484375, -2.20745849609375, 2.0979042053222656, 0.334136962890625, 0.5548019409179688, 0.4124279022216797, 0.7287483215332031, 0.7404518127441406, -0.143218994140625, -0.08445358276367188, 2.1312942504882812, -0.8320083618164062, 0.8208961486816406, 0.30272674560546875, 2.0467758178710938, 1.9212646484375, 0.23761558532714844, -0.9060745239257812, 1.0638580322265625, -0.6072311401367188, 0.88592529296875, 0.4604339599609375, -0.03479576110839844, 2.2185287475585938, 0.1494426727294922, 1.1094818115234375, 1.2047348022460938, 1.2232952117919922, -0.278594970703125, 2.101104736328125, -0.264251708984375, 2.2993812561035156, -0.4209556579589844, 3.549652099609375, 1.7973861694335938, -1.6700820922851562, 0.7263870239257812, -0.1552276611328125, -0.22191619873046875, 0.03787994384765625, 0.036968231201171875, -0.30855751037597656, 0.7262611389160156, 0.9833412170410156, 0.8005752563476562, 3.7946929931640625, 1.0695114135742188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000103.npy"}
{"epoch": 0.15570672713529857, "step": 104, "batch_size": 64, "mean": 0.6416438817977905, "std": 1.259078860282898, "min": -1.3468246459960938, "p10": -0.8673133850097654, "median": 0.48157596588134766, "p90": 2.492529296875, "max": 5.0255126953125, "pos_frac": 0.671875, "sample": [0.58673095703125, 0.9788818359375, 1.2060966491699219, 0.4550895690917969, -1.2836074829101562, -0.09140777587890625, 0.6476325988769531, 1.0026092529296875, 1.06842041015625, -0.538055419921875, 0.13100624084472656, -0.577850341796875, 5.0255126953125, 1.6746978759765625, 0.0595550537109375, -0.6618881225585938, 1.557373046875, 0.5132923126220703, 0.3259544372558594, 0.04917144775390625, -0.3022308349609375, -1.0909576416015625, 2.50421142578125, 1.535980224609375, -0.9983367919921875, -0.2337627410888672, 0.7878761291503906, -1.0211944580078125, 2.180257797241211, -0.04703330993652344, 0.5049610137939453, -0.08139610290527344, -0.4698143005371094, 0.11511421203613281, 1.1449470520019531, 0.7879161834716797, 0.0765380859375, -0.9811019897460938, 2.604736328125, 0.5486602783203125, 1.1611824035644531, 0.23621559143066406, 3.98150634765625, -0.33821678161621094, 1.2610626220703125, 1.7243309020996094, 1.54486083984375, 0.13237953186035156, -1.3468246459960938, -0.6395492553710938, 1.4584159851074219, 0.6958942413330078, 1.3722877502441406, 2.5049591064453125, 0.5324649810791016, 0.43323516845703125, -0.3398399353027344, -0.1323394775390625, 0.45819091796875, -0.955352783203125, -0.40212249755859375, 2.936878204345703, 2.46527099609375, 2.625732421875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000104.npy"}
{"epoch": 0.15721844293272866, "step": 105, "batch_size": 64, "mean": 0.5302802920341492, "std": 1.2047958374023438, "min": -2.2762908935546875, "p10": -0.7134994506835938, "median": 0.3614025115966797, "p90": 2.339589309692384, "max": 3.4521427154541016, "pos_frac": 0.65625, "sample": [0.9652519226074219, 0.2879791259765625, 0.7471466064453125, 0.8799324035644531, 0.6013393402099609, 1.4489803314208984, -2.0178070068359375, 2.8424072265625, -1.36309814453125, 0.8632869720458984, -0.2934684753417969, 1.7741374969482422, 1.3450965881347656, 1.7607574462890625, 0.3444862365722656, 2.4792251586914062, -0.0028476715087890625, -2.2762908935546875, -0.26111602783203125, 2.4555511474609375, 0.23299407958984375, 0.2986907958984375, -0.05825042724609375, -1.1812667846679688, -0.1947784423828125, -0.2393951416015625, 1.2004203796386719, 2.496429443359375, 0.9929885864257812, -0.4951019287109375, -0.11479949951171875, 0.8160800933837891, -0.3448333740234375, -0.07462310791015625, -0.05733489990234375, 1.8778839111328125, 0.33492279052734375, 1.07757568359375, -0.70477294921875, 0.5902862548828125, 0.3852500915527344, 0.17798995971679688, -0.7172393798828125, 1.6645641326904297, 2.747955322265625, 2.069011688232422, -2.1847381591796875, -0.329315185546875, -1.005523681640625, 0.00833892822265625, 0.3161163330078125, 0.6649436950683594, 0.20431900024414062, 2.8212661743164062, 1.0305938720703125, 0.01532745361328125, 3.4521427154541016, 0.37831878662109375, -0.5924243927001953, 1.216287612915039, -0.4873046875, 1.636749267578125, 0.8241119384765625, 0.6071319580078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000105.npy"}
{"epoch": 0.15873015873015872, "step": 106, "batch_size": 64, "mean": 0.5590271949768066, "std": 1.1443263292312622, "min": -2.972015380859375, "p10": -0.7068904876708983, "median": 0.6703596115112305, "p90": 1.870207595825196, "max": 3.181671142578125, "pos_frac": 0.671875, "sample": [0.667724609375, 1.5193328857421875, 1.3273296356201172, 0.9226608276367188, 0.24920654296875, -0.782501220703125, 3.181671142578125, 1.2640094757080078, -0.048297882080078125, 0.774749755859375, 0.621063232421875, 0.51129150390625, 2.4774017333984375, 0.47271728515625, -0.13733673095703125, -0.6085357666015625, 1.2507362365722656, 0.7789649963378906, -0.26854705810546875, -0.3134765625, -0.009723663330078125, 0.30464935302734375, -1.08392333984375, -0.0387420654296875, -0.2010650634765625, -0.3142127990722656, -0.2440929412841797, -0.9554595947265625, 0.7341156005859375, 2.9068756103515625, -0.7490425109863281, 0.30927467346191406, -0.4073638916015625, 0.7714157104492188, 1.2218475341796875, 1.937591552734375, 1.7129783630371094, -0.26180076599121094, 2.5109939575195312, 1.0770263671875, 1.1903533935546875, 0.6978683471679688, 2.973134994506836, 0.9492816925048828, -1.1573600769042969, 0.35695648193359375, 0.05318260192871094, 0.9344711303710938, -2.972015380859375, 0.6729946136474609, -0.588348388671875, 1.6459007263183594, 1.3528213500976562, 2.181795120239258, 1.2088775634765625, 0.11945343017578125, 0.058895111083984375, 1.4142818450927734, -0.4626884460449219, 1.2987823486328125, -2.3355865478515625, 0.8288116455078125, 0.8424491882324219, 1.4319229125976562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000106.npy"}
{"epoch": 0.1602418745275888, "step": 107, "batch_size": 64, "mean": 0.39563363790512085, "std": 1.3711782693862915, "min": -3.0797576904296875, "p10": -1.3511459350585937, "median": 0.30430030822753906, "p90": 2.060131454467774, "max": 3.7076416015625, "pos_frac": 0.671875, "sample": [0.1484851837158203, -2.4438323974609375, 0.623931884765625, 3.4798355102539062, 0.8453292846679688, 0.7504959106445312, -1.369232177734375, 1.4334793090820312, -1.3430709838867188, 0.854827880859375, -3.0797576904296875, 1.2116622924804688, 0.8679599761962891, 0.22647857666015625, 0.6800460815429688, -1.4292335510253906, 0.3675689697265625, 0.13149261474609375, -0.23473167419433594, -1.3311309814453125, -0.00861358642578125, 3.7076416015625, 3.4737625122070312, 0.13675308227539062, 1.9509620666503906, -2.0823440551757812, 1.7909717559814453, -1.4027519226074219, -0.49570465087890625, 0.02875518798828125, 0.7923736572265625, -0.8490180969238281, 0.428741455078125, 0.4235992431640625, 0.18047332763671875, 0.048065185546875, 2.9665756225585938, -0.09400558471679688, 1.3392868041992188, 2.7046051025390625, 1.9032440185546875, 2.1069183349609375, 0.5584793090820312, -1.1982421875, 1.0089664459228516, 0.8400650024414062, 0.298492431640625, 0.24945068359375, 0.5327129364013672, -1.3546066284179688, 2.7303848266601562, 1.4804763793945312, 0.18737220764160156, -0.9715232849121094, -0.18602561950683594, 1.2569580078125, 0.1898193359375, 0.625, 0.3101081848144531, -0.093353271484375, -0.42671966552734375, -0.880035400390625, -0.351593017578125, 1.0734710693359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000107.npy"}
{"epoch": 0.1617535903250189, "step": 108, "batch_size": 64, "mean": 0.738323450088501, "std": 1.2854849100112915, "min": -2.529052734375, "p10": -0.7580177307128906, "median": 0.7354049682617188, "p90": 2.5784149169921884, "max": 3.4606475830078125, "pos_frac": 0.640625, "sample": [1.4675140380859375, 1.0830078125, 2.8281173706054688, 1.5136489868164062, -0.35062217712402344, 2.3349151611328125, 1.5630340576171875, 1.8043155670166016, 3.4606475830078125, -0.2407379150390625, -0.3640594482421875, 3.2843856811523438, 1.3447113037109375, -0.2594261169433594, 1.1320648193359375, 0.783660888671875, -0.7697525024414062, -0.23474884033203125, 0.08311080932617188, 1.0815486907958984, 2.02569580078125, 0.2314453125, -0.401641845703125, -0.93341064453125, 1.1207275390625, -0.7306365966796875, 1.5721206665039062, -1.2195892333984375, 3.2137298583984375, 0.48604583740234375, 0.3747711181640625, -0.09650611877441406, 1.005615234375, 0.41341400146484375, 1.441558837890625, 3.262277603149414, -0.33304595947265625, -0.11772346496582031, -2.529052734375, 1.3686065673828125, -1.1376876831054688, 2.7065963745117188, 0.3398456573486328, -0.16271018981933594, -0.29653167724609375, 0.8576240539550781, -0.09394645690917969, 2.3987579345703125, -0.22716522216796875, -0.60028076171875, -1.5491981506347656, -0.871551513671875, 0.6586456298828125, 1.9044723510742188, 0.6871490478515625, 1.1434059143066406, 1.0844650268554688, 2.6554107666015625, 1.298004150390625, 1.9622573852539062, -0.21175193786621094, 1.8994255065917969, 0.0694732666015625, 1.0382537841796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000108.npy"}
{"epoch": 0.16326530612244897, "step": 109, "batch_size": 64, "mean": 0.8866567611694336, "std": 1.2254610061645508, "min": -0.8974037170410156, "p10": -0.5519176483154297, "median": 0.5802679061889648, "p90": 2.2170282363891602, "max": 5.324951171875, "pos_frac": 0.8125, "sample": [0.792449951171875, 2.0468215942382812, 3.55023193359375, -0.15713882446289062, 1.7516193389892578, 1.2105121612548828, 2.5649070739746094, 1.700836181640625, 0.84222412109375, 1.9367637634277344, 0.5892620086669922, 0.0584564208984375, 2.2252044677734375, 1.8749656677246094, 0.526458740234375, 0.146270751953125, 1.077371597290039, 0.42569732666015625, 0.6397819519042969, 1.2601547241210938, 1.4276561737060547, 2.1979503631591797, -0.7239227294921875, 0.12047195434570312, 0.9131622314453125, 4.356475830078125, -0.7819671630859375, 0.09921836853027344, 0.1280040740966797, 5.324951171875, 0.33305931091308594, 0.9147872924804688, -0.5134468078613281, -0.5684051513671875, 0.2568168640136719, 0.4653434753417969, -0.6175117492675781, -0.055145263671875, 0.41745758056640625, 1.6915817260742188, -0.3607330322265625, 0.5669193267822266, 0.46452903747558594, 0.3315887451171875, 0.21570587158203125, 0.01080322265625, 1.0649261474609375, 0.7091541290283203, 0.0622711181640625, -0.8974037170410156, 0.2325420379638672, 0.4908599853515625, -0.2764434814453125, 2.6570587158203125, 1.3626689910888672, 1.8391265869140625, 3.5811805725097656, 1.343658447265625, 0.5712738037109375, 0.6096305847167969, -0.6165542602539062, -0.6391010284423828, 1.2296142578125, 1.7433700561523438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000109.npy"}
{"epoch": 0.16477702191987906, "step": 110, "batch_size": 64, "mean": 0.4851612150669098, "std": 1.210650086402893, "min": -2.3746776580810547, "p10": -1.0435035705566407, "median": 0.43205738067626953, "p90": 1.6965932846069338, "max": 3.81573486328125, "pos_frac": 0.6875, "sample": [0.45913124084472656, 0.4099998474121094, -1.1993255615234375, 0.4259681701660156, 0.4279193878173828, 0.27175140380859375, 1.2295608520507812, 1.5611724853515625, -0.041778564453125, 2.445098876953125, 2.320323944091797, 1.0719871520996094, 0.7002029418945312, 2.5748443603515625, -0.13031005859375, -1.0447921752929688, 0.046398162841796875, 3.149932861328125, -0.4000720977783203, -2.3746776580810547, 0.45648765563964844, 1.093658447265625, 0.547149658203125, 1.2659912109375, 0.0667572021484375, 0.7002944946289062, 0.43619537353515625, -0.3923759460449219, -0.002361297607421875, -1.5326766967773438, 0.9549827575683594, 0.926605224609375, 0.9766159057617188, -0.737762451171875, -0.2857208251953125, 0.8881301879882812, -0.9989776611328125, 1.54864501953125, -0.12068939208984375, 1.057504653930664, 0.9609375, 0.7552032470703125, -0.01050567626953125, 0.1325531005859375, 0.11382675170898438, 1.1969833374023438, 1.4402618408203125, -1.066619873046875, -1.040496826171875, -2.1067962646484375, 0.3196563720703125, 0.3153228759765625, 0.21612548828125, -0.5363082885742188, 0.49700927734375, 3.81573486328125, 1.7270660400390625, 1.6254901885986328, 0.4128246307373047, 3.5077056884765625, -0.9681072235107422, 1.4112701416015625, 0.9906158447265625, -1.411224365234375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000110.npy"}
{"epoch": 0.16628873771730915, "step": 111, "batch_size": 64, "mean": 0.6044571399688721, "std": 1.337281584739685, "min": -3.71466064453125, "p10": -0.9910301208496093, "median": 0.76910400390625, "p90": 2.3068618774414062, "max": 2.998126983642578, "pos_frac": 0.671875, "sample": [-0.45401763916015625, -1.5162353515625, -0.8350677490234375, 2.9579925537109375, 1.0730628967285156, 0.75518798828125, -0.9310874938964844, 1.206207275390625, 2.0556869506835938, 0.7866859436035156, 1.551727294921875, 1.364898681640625, -0.6969757080078125, 0.7202072143554688, -0.267913818359375, 1.3548164367675781, -0.08755111694335938, 1.3449993133544922, -0.8154296875, 2.198526382446289, -0.46446990966796875, 0.891265869140625, 0.9855079650878906, 2.750457763671875, 0.5584564208984375, 1.3501415252685547, 0.6589736938476562, 2.7648277282714844, 0.6433525085449219, 0.8942947387695312, -0.16602706909179688, -1.1767330169677734, -3.71466064453125, -1.0167198181152344, 0.9604682922363281, 0.93170166015625, 0.5381450653076172, 2.3113021850585938, -0.6838569641113281, 2.0438690185546875, 2.170013427734375, 1.9925460815429688, 2.41302490234375, -0.5407276153564453, 1.0716285705566406, 0.2894916534423828, 0.9863338470458984, 2.998126983642578, -1.3505325317382812, 1.1569766998291016, 2.434112548828125, 2.2965011596679688, 1.427286148071289, -0.3956413269042969, -2.5657081604003906, 0.31940460205078125, -1.2162017822265625, 0.19301605224609375, -0.3215293884277344, 1.1374626159667969, 0.78302001953125, -0.2863273620605469, 0.4416084289550781, 0.42535400390625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000111.npy"}
{"epoch": 0.16780045351473924, "step": 112, "batch_size": 64, "mean": 0.7614700198173523, "std": 1.2925748825073242, "min": -3.282947540283203, "p10": -0.8418540954589843, "median": 0.8292036056518555, "p90": 2.335989379882813, "max": 3.6537322998046875, "pos_frac": 0.734375, "sample": [2.6019287109375, 3.6537322998046875, 2.378662109375, 1.1414108276367188, -0.21331024169921875, 1.2244186401367188, 3.2911911010742188, 0.9748954772949219, 2.6323165893554688, 2.5320167541503906, 0.2187347412109375, -0.1380157470703125, -0.93505859375, 1.4085845947265625, 0.0475311279296875, -1.084381103515625, 0.1912841796875, 0.09474945068359375, -2.6082534790039062, -0.24061965942382812, 1.3088531494140625, -0.8825836181640625, 2.0935020446777344, 1.6315841674804688, -0.204071044921875, 1.4014739990234375, 0.34717559814453125, 0.6907386779785156, 2.034942626953125, 1.0760345458984375, 2.236419677734375, 1.0090789794921875, -0.5103302001953125, 0.5727310180664062, 1.8592109680175781, 0.719970703125, 0.9065074920654297, 0.154632568359375, 3.35003662109375, 0.07094955444335938, -0.09694862365722656, 0.72784423828125, 1.7626991271972656, 0.9606418609619141, -0.340362548828125, 2.09173583984375, 1.6042633056640625, 1.6048202514648438, -0.8917293548583984, 0.24831771850585938, -3.282947540283203, 2.0680465698242188, -0.7468185424804688, 1.1867103576660156, 1.290313720703125, 0.3766288757324219, -0.1665325164794922, -0.9465923309326172, 0.9274749755859375, 0.6580810546875, 1.12921142578125, -0.29241943359375, 1.0710678100585938, 0.7518997192382812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000112.npy"}
{"epoch": 0.1693121693121693, "step": 113, "batch_size": 64, "mean": 0.6044440269470215, "std": 1.347267746925354, "min": -4.2308502197265625, "p10": -1.0253307342529294, "median": 0.5215549468994141, "p90": 2.5000017166137694, "max": 2.8277587890625, "pos_frac": 0.703125, "sample": [-0.6714706420898438, -1.3785476684570312, 0.01508331298828125, -0.23639297485351562, -0.7288780212402344, 2.823101043701172, 0.5873985290527344, -0.34745025634765625, 0.6213836669921875, 2.5020885467529297, -0.015697479248046875, 0.42403411865234375, -0.5651016235351562, 2.409149169921875, -1.1995964050292969, 0.499267578125, 0.5438423156738281, 2.4951324462890625, -1.1523818969726562, -1.7202262878417969, 0.3817901611328125, 0.4620170593261719, 1.4926261901855469, -1.426239013671875, -0.467620849609375, 0.7125625610351562, 1.9138259887695312, 1.0679702758789062, 0.6046867370605469, 0.030546188354492188, 2.3821163177490234, 2.8277587890625, 0.08272743225097656, -0.5517196655273438, 1.3713226318359375, 2.7826175689697266, -4.2308502197265625, 1.0752811431884766, 1.320526123046875, 0.362945556640625, 0.3098907470703125, 2.7896690368652344, 0.31329345703125, 2.3779678344726562, -1.2118415832519531, 0.2049560546875, 0.8131256103515625, 0.28037071228027344, -0.5329170227050781, 2.1240081787109375, -0.14202880859375, 0.7375717163085938, 0.7110614776611328, 1.92828369140625, 2.66094970703125, -0.1980876922607422, 1.5471343994140625, 0.11267280578613281, 1.2551116943359375, 0.8101043701171875, 1.5997276306152344, 0.8001556396484375, -0.3734550476074219, 2.667064666748047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000113.npy"}
{"epoch": 0.1708238851095994, "step": 114, "batch_size": 64, "mean": 1.0640310049057007, "std": 1.3352947235107422, "min": -1.3097610473632812, "p10": -0.5731361389160154, "median": 0.7521648406982422, "p90": 2.98054599761963, "max": 3.85498046875, "pos_frac": 0.765625, "sample": [0.0774688720703125, 0.5276565551757812, 0.5384750366210938, 1.0862159729003906, 2.3781166076660156, 0.46961212158203125, 1.5556297302246094, 0.5133056640625, 1.7352790832519531, -0.9543285369873047, -0.2076892852783203, 2.1468544006347656, 1.227081298828125, 0.3850860595703125, 0.44948577880859375, -1.1094894409179688, 1.8328628540039062, 0.8369922637939453, 0.06373977661132812, 0.3549957275390625, 1.4000701904296875, 0.29779052734375, 3.3037567138671875, 0.016307830810546875, 0.6626834869384766, 2.3193817138671875, 0.9719963073730469, 2.5420989990234375, -0.14751434326171875, 2.382244110107422, 0.028768539428710938, 0.7523078918457031, 3.0680160522460938, 1.9480743408203125, 1.9011726379394531, 0.18037796020507812, 1.6689682006835938, 1.9177780151367188, 2.776449203491211, -1.3097610473632812, 3.790008544921875, 2.4927101135253906, 2.4859161376953125, -0.18572616577148438, -0.0846710205078125, -0.10970687866210938, 3.85498046875, 0.8913764953613281, -0.8128032684326172, -0.2968330383300781, -0.40996551513671875, 3.6136474609375, 1.3716812133789062, 3.127777099609375, 2.536884307861328, 2.028167724609375, 0.6364231109619141, -0.06729888916015625, -0.8256149291992188, -0.9815769195556641, 3.6010284423828125, -0.64306640625, 0.7520217895507812, 0.7443084716796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000114.npy"}
{"epoch": 0.17233560090702948, "step": 115, "batch_size": 64, "mean": 0.7438331246376038, "std": 1.2980138063430786, "min": -1.4774551391601562, "p10": -0.6261756896972656, "median": 0.5631093978881836, "p90": 2.2557617187500005, "max": 5.150278091430664, "pos_frac": 0.734375, "sample": [-0.26399993896484375, 0.8351249694824219, 1.2421798706054688, -1.369781494140625, 3.075836181640625, 0.8668766021728516, 1.0060501098632812, 1.7374763488769531, 0.5494308471679688, 5.034423828125, 0.8428153991699219, -1.4774551391601562, 5.150278091430664, -0.6494827270507812, 0.49674224853515625, 0.925018310546875, 0.3733673095703125, 0.297149658203125, 0.3809661865234375, 0.05055999755859375, 1.8779029846191406, 0.743499755859375, -0.5383224487304688, -0.5717926025390625, -0.8273830413818359, 0.6751728057861328, -0.4253692626953125, 0.5431060791015625, -0.8281707763671875, 2.3108901977539062, 0.4025726318359375, 0.5845603942871094, 0.28073692321777344, 0.965545654296875, 0.8708267211914062, 2.1271286010742188, 0.01287078857421875, 1.5726985931396484, 0.740814208984375, 1.45770263671875, 1.9935302734375, -1.2355918884277344, 2.0120925903320312, -0.2252197265625, 2.5242919921875, 1.1198348999023438, 1.6216354370117188, 0.9141941070556641, -0.08491134643554688, 0.1043701171875, 0.26840972900390625, 0.3500518798828125, 3.1024837493896484, 0.06865692138671875, -0.36792945861816406, -0.042083740234375, -0.6926040649414062, -0.4717998504638672, -0.5441436767578125, 0.327117919921875, 1.5352973937988281, 0.5767879486083984, 0.7517509460449219, 2.9205322265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000115.npy"}
{"epoch": 0.17384731670445955, "step": 116, "batch_size": 64, "mean": 0.879035472869873, "std": 1.1776129007339478, "min": -1.977142333984375, "p10": -0.46249389648437494, "median": 0.5759391784667969, "p90": 2.6145002365112306, "max": 3.2762298583984375, "pos_frac": 0.78125, "sample": [1.6457138061523438, 0.4053955078125, 0.5248641967773438, 3.0017929077148438, 1.7140655517578125, -0.30854034423828125, -0.37811279296875, 0.3667144775390625, 1.2767086029052734, 1.907012939453125, 2.1060028076171875, -0.419769287109375, -0.39058685302734375, 2.6244354248046875, 0.4054241180419922, -0.21528053283691406, 0.20499610900878906, 0.030759811401367188, 1.8928756713867188, 0.4429283142089844, -1.977142333984375, 1.734832763671875, 2.5494651794433594, 0.6571617126464844, -0.1861419677734375, 1.6018524169921875, 0.1573333740234375, 2.7958526611328125, 1.6971893310546875, 0.9207649230957031, -0.5059623718261719, 0.26099395751953125, -0.393035888671875, 0.274383544921875, 3.2762298583984375, 0.62701416015625, 2.2845535278320312, 2.7300033569335938, 0.0496978759765625, 0.2065582275390625, 0.34510040283203125, 1.0061492919921875, 2.591318130493164, 0.7313919067382812, 2.07574462890625, 2.0128021240234375, 3.0769729614257812, -0.480804443359375, 0.03742218017578125, 1.5324172973632812, 0.12416458129882812, 1.5822830200195312, 0.21917724609375, -0.5287322998046875, 1.1953811645507812, 0.1115570068359375, 0.07834243774414062, 0.9100723266601562, -0.5182399749755859, 0.9036865234375, 2.5105438232421875, -0.7569541931152344, 2.6363697052001953, -0.7368984222412109], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000116.npy"}
{"epoch": 0.17535903250188964, "step": 117, "batch_size": 64, "mean": 0.767128586769104, "std": 1.3275485038757324, "min": -4.178764343261719, "p10": -0.5814245223999024, "median": 0.8458852767944336, "p90": 2.451040649414063, "max": 3.9985809326171875, "pos_frac": 0.75, "sample": [1.09368896484375, 2.581686019897461, 0.233489990234375, 0.5293121337890625, 0.3658905029296875, 1.3738594055175781, 1.401702880859375, 1.1019134521484375, 0.8710403442382812, 0.9213333129882812, 0.22417831420898438, -0.5850620269775391, 1.918701171875, -0.57293701171875, 0.7888946533203125, 2.0138702392578125, 0.28411865234375, 1.0149307250976562, 1.8016471862792969, 2.3568649291992188, -0.3367767333984375, 0.8673591613769531, 1.3952865600585938, 2.9710540771484375, -0.15622711181640625, 0.02204132080078125, -0.460479736328125, 1.7233505249023438, 0.298095703125, 3.9985809326171875, 0.25091552734375, 3.19757080078125, 2.1039047241210938, -0.5339145660400391, 1.05108642578125, 1.4114875793457031, 1.319375991821289, 2.4914016723632812, 1.3016777038574219, 2.5199432373046875, 0.7478866577148438, -0.06509780883789062, -0.06305694580078125, 1.2203941345214844, -0.639862060546875, -4.178764343261719, 0.8244113922119141, 0.47159576416015625, 1.7516021728515625, 0.6793746948242188, 0.09951019287109375, 1.1454238891601562, 1.6219749450683594, -1.6514396667480469, 2.1714630126953125, -1.8429527282714844, -0.7903614044189453, -0.3336524963378906, -1.5792999267578125, 0.11345672607421875, 2.7191848754882812, 1.2132720947265625, -0.111358642578125, 0.4176673889160156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000117.npy"}
{"epoch": 0.17687074829931973, "step": 118, "batch_size": 64, "mean": 0.8884795904159546, "std": 1.2573249340057373, "min": -1.962209701538086, "p10": -0.5893230438232422, "median": 0.8503007888793945, "p90": 2.720302963256836, "max": 3.6930923461914062, "pos_frac": 0.75, "sample": [1.2604446411132812, 0.6533889770507812, 1.8289642333984375, -0.1168365478515625, -1.962209701538086, 1.444122314453125, 0.97332763671875, 0.20784378051757812, 1.844522476196289, -0.5547828674316406, 0.8377494812011719, 1.19293212890625, 0.5255889892578125, 2.961994171142578, 0.38753509521484375, 0.2544708251953125, -1.7628250122070312, 1.4218521118164062, -0.4077186584472656, 0.8628520965576172, -0.143280029296875, 2.0526580810546875, 1.4352588653564453, 2.3095703125, 1.5336952209472656, 0.22111892700195312, -0.24317169189453125, -0.46021270751953125, 1.238229751586914, 0.017160415649414062, 1.4698104858398438, 1.10137939453125, 3.35662841796875, -0.30428504943847656, 0.3641204833984375, 0.504241943359375, -0.054290771484375, 3.6930923461914062, 2.7298736572265625, -1.5096282958984375, -0.80517578125, 2.0511932373046875, -0.7268905639648438, 1.52545166015625, 0.3202400207519531, 0.6344223022460938, -0.6041259765625, 0.9380035400390625, 2.741046905517578, 1.3705291748046875, 1.1402435302734375, 1.6324748992919922, 0.11963653564453125, 2.6979713439941406, 3.508617401123047, 0.7345542907714844, -0.7320556640625, 1.9268169403076172, 2.738475799560547, 2.507802963256836, 1.5774917602539062, -0.31447410583496094, 0.568603515625, 0.1466522216796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000118.npy"}
{"epoch": 0.17838246409674982, "step": 119, "batch_size": 64, "mean": 0.6178329586982727, "std": 1.4707363843917847, "min": -2.9482574462890625, "p10": -1.2831039428710938, "median": 0.7064008712768555, "p90": 2.453397369384766, "max": 3.5966949462890625, "pos_frac": 0.71875, "sample": [0.41162872314453125, 2.5049896240234375, -1.0445232391357422, 1.3946952819824219, -1.3176422119140625, 0.6195793151855469, 2.3209362030029297, 0.87347412109375, -0.9130363464355469, 0.1735687255859375, -1.0524177551269531, 0.5275650024414062, 0.9272232055664062, 1.8137664794921875, 1.8980026245117188, -2.3164901733398438, 0.24425506591796875, -2.097301483154297, 0.31098365783691406, 0.1920623779296875, -0.8129291534423828, 2.6864013671875, 0.37912750244140625, 1.619363784790039, 2.8499679565429688, 0.5553665161132812, -1.5243186950683594, 1.9339218139648438, 2.1876220703125, 3.37921142578125, 1.423553466796875, -1.901519775390625, 1.3487796783447266, 1.2073631286621094, -0.3077239990234375, 1.5318603515625, 0.6883449554443359, 0.33269500732421875, 0.81793212890625, 2.7960662841796875, 2.3330154418945312, 1.265167236328125, -0.5737552642822266, 0.7531661987304688, -2.0704269409179688, -1.0362129211425781, 1.11572265625, 1.767364501953125, 2.2384185791015625, 0.7454757690429688, 2.2267608642578125, -2.9482574462890625, -1.2025146484375, -0.7470016479492188, 0.7244434356689453, 0.079925537109375, 1.6861324310302734, 3.5966949462890625, 2.532855987548828, 0.74920654296875, 0.6883583068847656, 0.126953125, -0.7021903991699219, -0.47039794921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000119.npy"}
{"epoch": 0.17989417989417988, "step": 120, "batch_size": 64, "mean": 0.9073318243026733, "std": 1.2090052366256714, "min": -1.3836116790771484, "p10": -0.5340534210205077, "median": 0.7082099914550781, "p90": 2.364506912231446, "max": 4.7896575927734375, "pos_frac": 0.8125, "sample": [-0.5899658203125, 0.5512962341308594, -1.2252273559570312, 2.18048095703125, 2.1924285888671875, 0.42783355712890625, 1.4655838012695312, 0.7013397216796875, 0.6683578491210938, -1.3539962768554688, 0.807952880859375, -0.12088394165039062, 0.4054584503173828, 0.8501701354980469, 0.8530960083007812, 3.15966796875, 1.55096435546875, 0.19243812561035156, 0.6631546020507812, 0.6981887817382812, 2.7443313598632812, 0.7375564575195312, 1.1225509643554688, 0.7816219329833984, 0.4664764404296875, 0.8333625793457031, 1.5540847778320312, 1.746734619140625, 1.1886062622070312, 1.2950363159179688, 4.285072326660156, 0.102142333984375, 1.6264114379882812, 0.2637310028076172, 2.249927520751953, 0.21406936645507812, 1.5780525207519531, -0.365325927734375, 2.8694686889648438, 0.5859527587890625, -0.6872024536132812, 0.4989776611328125, -0.34600830078125, 4.7896575927734375, -0.3261394500732422, 0.7126083374023438, -1.3836116790771484, 2.1679534912109375, 1.4293365478515625, 1.0028152465820312, 0.6687812805175781, -0.4035911560058594, 2.4136123657226562, 0.28472900390625, 1.9022369384765625, -0.8447647094726562, 0.5068206787109375, 2.4326095581054688, 0.14127159118652344, -0.603973388671875, 0.740478515625, 0.2085437774658203, 2.102079391479492, 0.7038116455078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000120.npy"}
{"epoch": 0.18140589569160998, "step": 121, "batch_size": 64, "mean": 0.7750145792961121, "std": 1.2171183824539185, "min": -1.9221267700195312, "p10": -0.640726089477539, "median": 0.6997356414794922, "p90": 2.1482923507690437, "max": 4.84881591796875, "pos_frac": 0.75, "sample": [3.0050201416015625, 1.80206298828125, -1.0875701904296875, -0.5277175903320312, 2.560832977294922, 1.5770263671875, 2.5804786682128906, -0.17991256713867188, 1.5548477172851562, -0.40712738037109375, 0.7426605224609375, -0.5556106567382812, 4.1344757080078125, 0.441986083984375, 0.7357330322265625, 0.008893966674804688, 2.2450733184814453, -1.9221267700195312, 0.988311767578125, -1.1194572448730469, 1.1855087280273438, 0.8830451965332031, 1.2301864624023438, 0.58624267578125, 0.656707763671875, 1.2892990112304688, 0.7668075561523438, -0.8629360198974609, 0.7509918212890625, 0.9981269836425781, 0.08381271362304688, 4.84881591796875, 1.5199356079101562, 0.2845611572265625, -0.35651397705078125, -0.985992431640625, 0.1673736572265625, 0.6720199584960938, 1.9199371337890625, 0.6785964965820312, 0.7208747863769531, 0.47714996337890625, 1.2697601318359375, 1.8819122314453125, 0.4397430419921875, 0.47931671142578125, -0.7697296142578125, 1.9224700927734375, 1.6548309326171875, 1.2060680389404297, -0.4456958770751953, -0.48746490478515625, 1.4042510986328125, 1.510223388671875, 0.5194988250732422, 2.703399658203125, 0.5687351226806641, -0.6772041320800781, 0.4279155731201172, -0.237823486328125, 0.038501739501953125, 1.4799575805664062, 0.75347900390625, -0.13364410400390625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000121.npy"}
{"epoch": 0.18291761148904007, "step": 122, "batch_size": 64, "mean": 1.1058697700500488, "std": 1.573955774307251, "min": -3.513427734375, "p10": -0.3767772674560545, "median": 0.9789943695068359, "p90": 2.7502172470092776, "max": 6.35693359375, "pos_frac": 0.828125, "sample": [6.35693359375, 0.04698944091796875, -0.1548309326171875, 2.7669296264648438, -3.513427734375, -0.462982177734375, 1.245361328125, 0.6817550659179688, 0.3494243621826172, 1.019775390625, 2.040609359741211, 0.9659652709960938, 0.27313232421875, 0.2328643798828125, -0.17563247680664062, 1.73223876953125, 0.209075927734375, 1.7438812255859375, -0.7924919128417969, 2.236469268798828, 0.4959220886230469, 4.935344696044922, 1.1299934387207031, 2.425935745239258, 4.145526885986328, 0.41949462890625, 0.7286243438720703, -1.0254707336425781, 1.64056396484375, 1.5142555236816406, 0.4719696044921875, 2.7844314575195312, 0.03687477111816406, 1.6326141357421875, -1.9996719360351562, 1.4202919006347656, 0.7683010101318359, 0.01324462890625, 1.7477664947509766, 0.66510009765625, -0.004390716552734375, 1.5541915893554688, -1.0600852966308594, 3.94036865234375, 1.6073074340820312, 2.266630172729492, 1.1069355010986328, 0.9920234680175781, 0.53619384765625, 0.7343063354492188, 1.3106536865234375, 1.4668807983398438, 0.2829780578613281, 0.5083122253417969, -0.088226318359375, 1.1657524108886719, 0.9330635070800781, 1.015899658203125, 0.6211929321289062, 2.3927154541015625, 2.4852066040039062, 2.711221694946289, -1.088470458984375, 4.6318511962890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000122.npy"}
{"epoch": 0.18442932728647016, "step": 123, "batch_size": 64, "mean": 0.8363161087036133, "std": 1.433629035949707, "min": -3.3234100341796875, "p10": -1.013052368164062, "median": 0.9346199035644531, "p90": 2.4640560150146493, "max": 4.485382080078125, "pos_frac": 0.71875, "sample": [2.7272872924804688, 0.5812911987304688, -0.04444122314453125, -0.230377197265625, -1.5764274597167969, 0.07377052307128906, 0.394134521484375, -0.18643951416015625, 4.485382080078125, 2.260059356689453, 0.57025146484375, 2.20965576171875, -0.3873748779296875, -1.4842681884765625, 1.9956588745117188, 1.5969390869140625, 0.4431343078613281, 1.5132217407226562, -1.6586761474609375, 1.4365921020507812, 1.1132049560546875, 2.551483154296875, 1.08831787109375, 2.0390758514404297, 2.1510543823242188, 2.7774658203125, 1.3882026672363281, -0.24665260314941406, 1.7488861083984375, 1.0008926391601562, 0.08622550964355469, 3.724029541015625, -2.155668258666992, 1.5682125091552734, -1.2349624633789062, -1.4335098266601562, 1.7734756469726562, -0.3033561706542969, 0.1901874542236328, 1.0795249938964844, 0.9357147216796875, 0.4955940246582031, 1.548990249633789, 1.0926132202148438, 0.7036876678466797, 1.431365966796875, -0.10677146911621094, -0.12042617797851562, -0.1258392333984375, 2.700193405151367, 1.36407470703125, 0.9335250854492188, 0.03171539306640625, 2.1211013793945312, -3.3234100341796875, 2.0950927734375, -0.3539314270019531, 1.3503990173339844, -0.49526214599609375, 1.6838150024414062, 0.7197780609130859, 0.5269737243652344, 3.862548828125, 0.8272247314453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000123.npy"}
{"epoch": 0.18594104308390022, "step": 124, "batch_size": 64, "mean": 1.15788996219635, "std": 1.377743124961853, "min": -2.7148094177246094, "p10": -0.4243183135986328, "median": 1.0748882293701172, "p90": 3.0118103027343754, "max": 4.6872100830078125, "pos_frac": 0.8125, "sample": [-0.4307212829589844, 4.6872100830078125, 1.9942474365234375, -2.7148094177246094, 1.0396480560302734, 0.9001388549804688, 3.6061859130859375, 2.167919158935547, 1.1881103515625, 3.31866455078125, 1.6456222534179688, 1.7341575622558594, 0.5967006683349609, 3.0465087890625, 0.7177276611328125, 0.09962654113769531, 0.793609619140625, 1.8757295608520508, 0.12310028076171875, -1.0885601043701172, 4.329132080078125, 1.6968154907226562, 2.478504180908203, -0.4093780517578125, -0.23666763305664062, 1.2273330688476562, 0.6645259857177734, -0.13623046875, 1.9627532958984375, -0.75994873046875, 1.5296154022216797, 1.0626068115234375, 1.980072021484375, 2.1353759765625, 1.2388343811035156, 1.201568603515625, 3.446704864501953, -0.5587615966796875, 3.9158248901367188, 1.083587646484375, 0.3415069580078125, 2.93084716796875, -0.3455085754394531, 2.0128250122070312, 1.1332855224609375, 0.7652816772460938, 1.4971694946289062, 0.4168243408203125, -1.17803955078125, 2.359334945678711, 2.570892333984375, 0.23348617553710938, 0.07776451110839844, 1.0661888122558594, 1.3242950439453125, -0.1073150634765625, 0.9500465393066406, 0.04150390625, 0.011898040771484375, 0.909027099609375, -0.46502685546875, 1.5644378662109375, 0.9686050415039062, 1.8725433349609375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000124.npy"}
{"epoch": 0.1874527588813303, "step": 125, "batch_size": 64, "mean": 1.0070770978927612, "std": 1.4293771982192993, "min": -1.6960296630859375, "p10": -0.7965309143066406, "median": 0.9317131042480469, "p90": 3.155955314636231, "max": 4.7757110595703125, "pos_frac": 0.765625, "sample": [2.6554107666015625, 2.2383346557617188, 1.0516319274902344, 1.061187744140625, 0.624114990234375, -0.7150840759277344, 4.7757110595703125, 0.03226661682128906, 1.02276611328125, 1.4160270690917969, 3.626983642578125, 0.26773834228515625, -0.47121429443359375, 1.2596435546875, 0.54962158203125, 3.3059463500976562, 2.0203704833984375, 1.8689727783203125, 1.03564453125, 0.31311798095703125, 3.697071075439453, 0.07738876342773438, 1.7140045166015625, 1.1961593627929688, 0.4151763916015625, 0.8319587707519531, 3.0369338989257812, -0.8691368103027344, -1.1214141845703125, 0.8406600952148438, 2.6771087646484375, -0.1612548828125, -0.8054962158203125, 3.3839111328125, 2.2760982513427734, -0.7161483764648438, 1.9949321746826172, 0.6928291320800781, 2.121246337890625, 2.2382659912109375, 3.619903564453125, 0.10982513427734375, 1.1712570190429688, 1.5415000915527344, 1.7516593933105469, -1.2667388916015625, 0.2571258544921875, 3.2069644927978516, -0.1556549072265625, 0.042606353759765625, -1.4354248046875, 1.4184188842773438, 1.0929069519042969, 0.6005859375, -0.7756118774414062, 0.8018226623535156, 0.42974853515625, -0.6577987670898438, 1.6135025024414062, -0.4082965850830078, 0.7218894958496094, -0.8586196899414062, -1.6960296630859375, 1.8679046630859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000125.npy"}
{"epoch": 0.1889644746787604, "step": 126, "batch_size": 64, "mean": 0.8797396421432495, "std": 1.8501155376434326, "min": -3.0294723510742188, "p10": -0.984432220458984, "median": 0.7440299987792969, "p90": 2.817420959472657, "max": 10.086669921875, "pos_frac": 0.703125, "sample": [1.3351097106933594, 0.1996135711669922, 2.652130126953125, 0.9485626220703125, -0.3624591827392578, 3.1638526916503906, 2.8882598876953125, 0.3180732727050781, 0.7606735229492188, -1.1934852600097656, 2.126392364501953, 2.2616424560546875, -2.41162109375, 1.2231979370117188, 3.1446151733398438, 1.0377826690673828, 0.07483673095703125, -1.2172203063964844, -0.10869789123535156, -0.27156829833984375, -0.061676025390625, 2.0465011596679688, 0.40222740173339844, 0.6994867324829102, -0.09764862060546875, -0.29758453369140625, 0.727386474609375, -0.6662673950195312, 0.6475830078125, -1.12078857421875, 0.4045677185058594, 0.47381591796875, -1.5286121368408203, 1.2477798461914062, -0.052997589111328125, -0.6525230407714844, -0.3887367248535156, 10.086669921875, 1.5311775207519531, -3.0294723510742188, 1.0292282104492188, 1.3661575317382812, 1.0696544647216797, 0.8986396789550781, 0.8371124267578125, 1.6689834594726562, 1.668487548828125, 4.035087585449219, 0.8437957763671875, 1.9314956665039062, 0.7803077697753906, 3.3113555908203125, 1.3950271606445312, 1.1033935546875, 0.5240020751953125, -0.0421295166015625, -0.0681304931640625, 2.517049789428711, 0.47945404052734375, 1.588134765625, 0.00634765625, 4.580039978027344, -2.4524688720703125, 0.2917327880859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000126.npy"}
{"epoch": 0.19047619047619047, "step": 127, "batch_size": 64, "mean": 1.1577074527740479, "std": 1.3368220329284668, "min": -2.059417724609375, "p10": -0.19371051788330074, "median": 1.1194992065429688, "p90": 3.0586595535278325, "max": 4.472564697265625, "pos_frac": 0.796875, "sample": [0.1142730712890625, 2.723358154296875, 1.1156463623046875, 1.9198379516601562, 1.12335205078125, 1.133209228515625, 1.3358306884765625, 3.111551284790039, 0.1867828369140625, 2.7777099609375, 2.2160511016845703, 0.64019775390625, 1.1325836181640625, 2.9352455139160156, 2.2065811157226562, 0.6132965087890625, 1.575408935546875, 1.2503852844238281, -0.582550048828125, 1.2611274719238281, 2.0109786987304688, -2.059417724609375, 0.99200439453125, 3.2821311950683594, 1.160614013671875, -1.2520599365234375, 1.862457275390625, 0.857452392578125, -1.6285247802734375, 0.43059539794921875, 0.4945487976074219, 0.7628631591796875, 0.14646530151367188, 1.6061782836914062, -0.032917022705078125, 2.659025192260742, -0.025768280029296875, 3.2144317626953125, 2.918292999267578, 4.472564697265625, 0.2861900329589844, 2.3040809631347656, 0.5347518920898438, -0.6619739532470703, 1.848052978515625, 0.7325668334960938, -0.30999755859375, -0.09393310546875, 3.8866729736328125, -0.2073822021484375, 1.5323867797851562, 3.6156005859375, 0.7092475891113281, -0.03079986572265625, 1.0847969055175781, 0.1090545654296875, 1.8120956420898438, 0.496826171875, 0.045650482177734375, 1.479196548461914, 1.2563304901123047, -0.16180992126464844, 3.2364044189453125, -0.07252883911132812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000127.npy"}
{"epoch": 0.19198790627362056, "step": 128, "batch_size": 64, "mean": 0.9054335355758667, "std": 1.730657935142517, "min": -3.5445556640625, "p10": -1.0833532333374023, "median": 1.0943527221679688, "p90": 3.035258483886719, "max": 5.1027984619140625, "pos_frac": 0.6875, "sample": [2.3473663330078125, 2.716583251953125, 0.5321846008300781, 1.857950210571289, 1.7010269165039062, 1.1075897216796875, 1.08111572265625, 1.1971435546875, 0.9052162170410156, 3.069549560546875, 1.006683349609375, -2.71942138671875, 1.590240478515625, -0.26448631286621094, 1.3882102966308594, -1.0531425476074219, -0.039684295654296875, -1.0824813842773438, -1.7988128662109375, 2.7347259521484375, 2.9552459716796875, -2.19854736328125, -1.3396186828613281, 4.137908935546875, 4.038970947265625, 1.419219970703125, 0.3757476806640625, 3.2971115112304688, -0.3479194641113281, 2.398317337036133, 0.277801513671875, 1.4384078979492188, 1.3358383178710938, 1.6334991455078125, 1.6153373718261719, 1.4077301025390625, 0.13704681396484375, 3.7418689727783203, -1.3805999755859375, -0.9819793701171875, -1.0671157836914062, 1.155569076538086, 0.6814613342285156, 0.39698028564453125, -0.5476455688476562, -0.8608551025390625, 1.460836410522461, 2.62152099609375, 1.9527587890625, -1.0837268829345703, 1.2033843994140625, 0.62823486328125, 5.1027984619140625, 0.2239990234375, 2.4613876342773438, 1.0182571411132812, -0.35037994384765625, 1.28179931640625, 4.00787353515625, -0.291412353515625, -3.5445556640625, -0.6313972473144531, 2.1780014038085938, -0.288970947265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000128.npy"}
{"epoch": 0.19349962207105065, "step": 129, "batch_size": 64, "mean": 0.7994937896728516, "std": 1.503303050994873, "min": -3.477783203125, "p10": -0.8239009857177733, "median": 0.5411443710327148, "p90": 2.5750087738037113, "max": 5.261878967285156, "pos_frac": 0.65625, "sample": [1.60791015625, -0.5887794494628906, 1.5370311737060547, -1.0633392333984375, -0.016246795654296875, 3.3796119689941406, -1.261474609375, -0.8845043182373047, -0.00881195068359375, 2.357879638671875, 1.9266281127929688, 1.1175956726074219, 1.7248687744140625, 0.0956268310546875, -1.285552978515625, -0.6504669189453125, 2.608692169189453, 0.28729248046875, 3.1150970458984375, 2.4964141845703125, 0.6719970703125, -0.55340576171875, 0.9636993408203125, 0.38525390625, -0.6824932098388672, 3.69024658203125, 1.7889347076416016, 0.30852508544921875, 1.1760635375976562, 2.40069580078125, 1.6364669799804688, 0.5479736328125, -1.49591064453125, 2.3129329681396484, 0.28276634216308594, 0.8921585083007812, -1.3542709350585938, -0.35004425048828125, 0.5038661956787109, 3.4189300537109375, 1.2978363037109375, -0.22821617126464844, -0.11186599731445312, 0.5343151092529297, 0.6291847229003906, 0.9816169738769531, 0.440216064453125, 5.261878967285156, 2.136016845703125, 3.10400390625, 0.7031402587890625, 2.1577224731445312, 2.0958709716796875, -0.2714881896972656, 0.08136558532714844, -0.6490631103515625, 2.3501052856445312, -3.477783203125, -0.112335205078125, -0.06296348571777344, 0.27038002014160156, -0.356353759765625, -0.011224746704101562, 1.365386962890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000129.npy"}
{"epoch": 0.19501133786848074, "step": 130, "batch_size": 64, "mean": 1.020914912223816, "std": 1.3209627866744995, "min": -1.4043922424316406, "p10": -0.5640167236328125, "median": 1.0074844360351562, "p90": 2.6957986831665046, "max": 5.154773712158203, "pos_frac": 0.796875, "sample": [1.7746810913085938, 0.2712898254394531, 3.1617431640625, 1.71051025390625, 3.1246719360351562, 1.822509765625, 0.73748779296875, 2.783018112182617, 0.7613582611083984, 0.11056900024414062, -0.167755126953125, 2.28179931640625, 1.8981094360351562, 0.67816162109375, -0.5690383911132812, 1.1899566650390625, 1.4861278533935547, 0.44388580322265625, 1.3956680297851562, 1.6505584716796875, 2.1109275817871094, 2.3109207153320312, 3.1147308349609375, 2.36614990234375, 0.12370681762695312, -0.793304443359375, -0.352294921875, 0.06634521484375, -1.2283306121826172, 0.7392044067382812, 1.6034393310546875, 2.4922866821289062, 1.6317977905273438, 0.9438896179199219, -0.15452194213867188, 5.154773712158203, 2.0013275146484375, 0.5364856719970703, 3.384918212890625, 0.023525238037109375, 0.059490203857421875, 2.2672348022460938, -0.48203277587890625, -1.4043922424316406, 1.32611083984375, 1.1555862426757812, 1.2613296508789062, -0.5522994995117188, 1.6877670288085938, -0.44751739501953125, -1.1265335083007812, 1.3158836364746094, -1.1905975341796875, 1.1122970581054688, -0.9776840209960938, 0.8868217468261719, 0.108062744140625, 1.0436210632324219, 0.12384033203125, 0.33880615234375, 3.5214767456054688, 1.4549121856689453, 0.9713478088378906, 0.26373291015625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000130.npy"}
{"epoch": 0.1965230536659108, "step": 131, "batch_size": 64, "mean": 1.1049723625183105, "std": 1.7726188898086548, "min": -3.4256744384765625, "p10": -0.915949249267578, "median": 1.1852302551269531, "p90": 3.5425773620605483, "max": 4.465431213378906, "pos_frac": 0.71875, "sample": [-0.20238876342773438, 1.6396007537841797, 0.24996185302734375, -0.1929035186767578, -0.5377731323242188, 4.2305145263671875, 1.206808090209961, 1.6092872619628906, -0.5211181640625, 2.7580718994140625, 1.4451751708984375, 0.48558807373046875, -0.8072013854980469, -1.042388916015625, 0.7801666259765625, 1.7559185028076172, -3.15576171875, 0.9677963256835938, 3.6838912963867188, 0.402984619140625, -0.987030029296875, -1.5004653930664062, 1.8403167724609375, 2.2615203857421875, 1.0859222412109375, 0.48429107666015625, 1.3152103424072266, -2.2297210693359375, 0.8610496520996094, -0.5611724853515625, 1.4151077270507812, 2.0453414916992188, 3.7134857177734375, 2.566509246826172, 2.4898834228515625, 2.8111572265625, 0.7598762512207031, 4.3895263671875, 1.1636524200439453, 2.7814178466796875, 1.2283134460449219, 4.465431213378906, -0.8209915161132812, 3.0515823364257812, 1.3552169799804688, 0.00252532958984375, 0.47205162048339844, 3.2128448486328125, -0.8701858520507812, 1.69293212890625, -0.22205352783203125, -0.5238151550292969, 1.8635292053222656, -0.23488998413085938, -3.4256744384765625, 2.8461532592773438, 0.43254661560058594, 0.7251434326171875, 3.0618743896484375, -0.9355621337890625, 1.2330245971679688, 3.8084468841552734, 4.400209426879883, 2.4374732971191406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000131.npy"}
{"epoch": 0.1980347694633409, "step": 132, "batch_size": 64, "mean": 0.9653000831604004, "std": 1.7123063802719116, "min": -4.044097900390625, "p10": -0.9430816650390624, "median": 1.079437255859375, "p90": 2.8759227752685548, "max": 5.625335693359375, "pos_frac": 0.734375, "sample": [3.6293563842773438, 0.6359748840332031, 1.3736038208007812, 2.164165496826172, -0.4503746032714844, -1.0847835540771484, 2.1276702880859375, 0.11907386779785156, -0.79583740234375, 2.7964553833007812, 2.8463706970214844, 2.773679733276367, 1.3118972778320312, 1.11077880859375, -1.6792488098144531, 0.24660491943359375, -0.14117431640625, 2.1103668212890625, 0.8525466918945312, -0.7561721801757812, -0.21514892578125, -4.044097900390625, 1.6103382110595703, 0.5172805786132812, -1.2760124206542969, -1.5799713134765625, 1.4540519714355469, 3.1072311401367188, 2.5893917083740234, -0.18320465087890625, 0.5897750854492188, 0.1653289794921875, -3.750516891479492, 0.3548622131347656, 1.4599952697753906, 1.1936187744140625, 3.5978012084960938, 2.1954345703125, 2.293935775756836, 1.7608642578125, -0.5434684753417969, 5.625335693359375, 0.46669769287109375, 2.0359878540039062, 2.590414047241211, 1.5985336303710938, 0.35315704345703125, 0.4775390625, 0.9762859344482422, 1.6989612579345703, 2.111391067504883, -0.13573455810546875, 3.9613571166992188, 1.9328269958496094, 3.1869640350341797, 0.0149078369140625, 1.048095703125, -0.5606460571289062, 0.1131134033203125, 1.6317367553710938, -0.974456787109375, 1.1195831298828125, 2.8885879516601562, -0.869873046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000132.npy"}
{"epoch": 0.19954648526077098, "step": 133, "batch_size": 64, "mean": 0.955377995967865, "std": 1.5940176248550415, "min": -2.54974365234375, "p10": -0.7762151718139648, "median": 0.6990013122558594, "p90": 2.682621192932129, "max": 5.6649627685546875, "pos_frac": 0.734375, "sample": [2.2468719482421875, -0.401123046875, 0.446197509765625, 2.6955032348632812, 0.1941986083984375, 0.4785919189453125, 0.2586402893066406, 2.9682159423828125, 0.0019683837890625, -0.7633590698242188, 1.3478469848632812, 3.2814788818359375, 2.1215972900390625, 0.4249000549316406, -1.3681907653808594, 2.6076889038085938, 2.276092529296875, -2.3972320556640625, 2.309947967529297, -0.15956878662109375, -0.5466537475585938, 1.0327835083007812, -0.2983894348144531, 2.48394775390625, 2.0673770904541016, 0.16956710815429688, 1.7883758544921875, -0.4416961669921875, 0.7387008666992188, -0.5215282440185547, 1.4431228637695312, 0.5310211181640625, -2.54974365234375, 2.524343490600586, -0.9908905029296875, 1.716644287109375, -0.016254425048828125, 1.017547607421875, 2.6525630950927734, 2.1147613525390625, 0.6593017578125, 1.9744491577148438, 1.630584716796875, -2.4683456420898438, 0.12575531005859375, 3.854736328125, 1.4768753051757812, 2.2075729370117188, 0.290557861328125, 5.6649627685546875, 0.35736846923828125, 0.13909149169921875, -0.4338874816894531, 3.967987060546875, 1.32183837890625, 0.3419685363769531, 1.719207763671875, 2.973806381225586, 1.2895965576171875, -0.7817249298095703, 2.642425537109375, -0.311798095703125, -1.4364547729492188, 0.4524497985839844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000133.npy"}
{"epoch": 0.20105820105820105, "step": 134, "batch_size": 64, "mean": 1.3798508644104004, "std": 1.7071197032928467, "min": -2.760650634765625, "p10": -0.6509517669677732, "median": 1.4992408752441406, "p90": 3.2960651397705085, "max": 5.9425506591796875, "pos_frac": 0.8125, "sample": [1.0280227661132812, -0.4714202880859375, 1.8789443969726562, 1.6319084167480469, 0.8562049865722656, 1.3626670837402344, 0.35218048095703125, 3.0699691772460938, 3.3942947387695312, 1.532989501953125, 1.7916641235351562, 2.420440673828125, 2.1154861450195312, 1.4625816345214844, 0.6483936309814453, -0.18764305114746094, 2.5685997009277344, 1.5493698120117188, -0.2875175476074219, 1.4654922485351562, 2.6645278930664062, 1.0837326049804688, 2.6046104431152344, 0.3988075256347656, 0.34104156494140625, 3.392963409423828, 0.6861038208007812, 0.1248626708984375, -0.9507808685302734, 2.3295021057128906, -1.3653182983398438, 0.10435676574707031, 2.8560028076171875, 0.5119781494140625, 0.3392372131347656, -2.0372753143310547, 2.050994873046875, 1.7082653045654297, 2.0465545654296875, 1.8219757080078125, 5.9425506591796875, 2.4005508422851562, -1.8208770751953125, -2.760650634765625, 2.4435653686523438, -0.33660125732421875, 4.1078643798828125, -0.91668701171875, 3.967071533203125, 1.412750244140625, 0.3307838439941406, 2.540943145751953, 2.3438072204589844, 1.7490997314453125, 5.630035400390625, 0.48891448974609375, -0.7278938293457031, 1.6444168090820312, 1.0846500396728516, 1.6015205383300781, 3.004180908203125, 5.277915954589844, 0.3243122100830078, -0.31653594970703125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000134.npy"}
{"epoch": 0.20256991685563114, "step": 135, "batch_size": 64, "mean": 0.8812651634216309, "std": 1.6729177236557007, "min": -3.165721893310547, "p10": -1.0550222396850586, "median": 0.7825870513916016, "p90": 2.4195789337158207, "max": 5.525703430175781, "pos_frac": 0.671875, "sample": [1.9757575988769531, -1.2888755798339844, 2.4724769592285156, 0.3914794921875, 0.7928504943847656, -0.9460010528564453, -0.035980224609375, -1.10174560546875, -0.07069015502929688, 5.525703430175781, 2.004150390625, -0.37671661376953125, 2.1414031982421875, -0.12647247314453125, -0.3730316162109375, -1.4501075744628906, 2.1622238159179688, 1.07208251953125, -0.412078857421875, 0.8082675933837891, 0.7066917419433594, -0.22594451904296875, -3.165721893310547, 0.7380905151367188, 0.9055595397949219, 3.2460365295410156, 0.6025123596191406, -0.03226470947265625, 0.49794769287109375, 0.9640007019042969, 0.491729736328125, 0.8732261657714844, 2.2609100341796875, 0.6074581146240234, 2.2961502075195312, 1.078155517578125, 0.7723236083984375, 1.2625465393066406, 2.0653076171875, -2.3237037658691406, -0.17884445190429688, 4.7994537353515625, -0.0009517669677734375, 1.6127166748046875, 1.2746696472167969, 0.8138961791992188, -0.4243316650390625, -2.1852493286132812, 1.9886817932128906, 1.1959075927734375, 2.1003456115722656, -1.7086639404296875, 0.32074737548828125, 2.121591567993164, 4.3976898193359375, 4.497806549072266, -0.7048816680908203, 1.9251251220703125, -0.3540458679199219, 1.4841651916503906, 0.3616485595703125, 1.8263778686523438, 4.170475006103516, 0.2809333801269531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000135.npy"}
{"epoch": 0.20408163265306123, "step": 136, "batch_size": 64, "mean": 1.1316382884979248, "std": 2.3239142894744873, "min": -3.7392120361328125, "p10": -1.6195945739746092, "median": 0.9489021301269531, "p90": 4.621074104309082, "max": 7.965263366699219, "pos_frac": 0.71875, "sample": [6.857421875, 3.2520275115966797, -2.879323959350586, -1.1814498901367188, -1.709075927734375, 1.4021797180175781, -0.01717376708984375, 0.9315261840820312, -0.1225433349609375, 1.2426185607910156, 1.3242416381835938, -1.9406890869140625, -0.3845024108886719, 1.6867961883544922, -0.29926300048828125, 2.2170181274414062, 4.542806625366211, 0.6371612548828125, 0.7473716735839844, -1.12762451171875, 2.2169952392578125, 0.5774154663085938, 2.2821731567382812, 4.6546173095703125, -0.5720977783203125, 0.12632369995117188, 4.865119934082031, 0.9752273559570312, 0.8182525634765625, 6.7293701171875, 2.047637939453125, 0.15921783447265625, 1.825286865234375, 3.1824111938476562, -0.765167236328125, 1.139190673828125, 0.43540191650390625, 1.3630657196044922, 1.0755844116210938, 0.710601806640625, -3.4271984100341797, 1.595062255859375, 1.5852203369140625, 2.4035491943359375, 1.0496978759765625, 2.7514095306396484, -0.6112213134765625, 0.5952987670898438, -2.0699462890625, 7.965263366699219, 1.0669116973876953, -2.0636444091796875, -1.4108047485351562, 0.966278076171875, -0.36623382568359375, 2.346832275390625, 0.8834228515625, 0.42455291748046875, 5.069700241088867, 5.580007553100586, 2.150604248046875, 0.3605804443359375, 0.2925682067871094, -3.7392120361328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000136.npy"}
{"epoch": 0.20559334845049132, "step": 137, "batch_size": 64, "mean": 1.1874182224273682, "std": 1.870070457458496, "min": -3.50006103515625, "p10": -1.0897037506103515, "median": 1.3073177337646484, "p90": 3.121282958984375, "max": 6.434974670410156, "pos_frac": 0.734375, "sample": [-0.23296356201171875, -3.50006103515625, 2.196941375732422, 2.8406753540039062, 2.4841384887695312, 3.0954132080078125, 3.2470741271972656, 4.2631988525390625, -0.2654228210449219, 3.3701324462890625, 2.654630661010742, 1.9000625610351562, 2.835357666015625, 2.579010009765625, 3.8931427001953125, 1.3423385620117188, 1.638671875, 3.0510406494140625, 0.5198974609375, 0.7289943695068359, -0.9507522583007812, 1.383880615234375, 2.094940185546875, -1.6643104553222656, 0.47899818420410156, 0.36695098876953125, 2.581707000732422, 6.434974670410156, -1.0724296569824219, -0.5627059936523438, -1.9201812744140625, 0.2560844421386719, 0.11872100830078125, 1.7646560668945312, 6.3488311767578125, 0.25339317321777344, 0.10272216796875, -1.2168960571289062, 1.2722969055175781, -0.03046417236328125, -0.8290863037109375, 0.5519294738769531, 1.6539573669433594, 1.5560455322265625, 1.9025192260742188, -0.4849090576171875, 0.8598575592041016, 2.8118762969970703, 0.1535797119140625, 3.1323699951171875, -0.02447509765625, -1.09710693359375, 0.17177391052246094, -1.8726119995117188, 2.5660877227783203, 1.4820938110351562, 0.9674739837646484, -0.3075599670410156, -1.47802734375, 2.1558609008789062, 0.25006866455078125, 2.8870468139648438, 2.353179931640625, 1.95013427734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000137.npy"}
{"epoch": 0.20710506424792138, "step": 138, "batch_size": 64, "mean": 0.9443341493606567, "std": 1.8011301755905151, "min": -3.9569320678710938, "p10": -1.314938163757324, "median": 0.9905080795288086, "p90": 2.986742115020752, "max": 6.17730712890625, "pos_frac": 0.734375, "sample": [0.003696441650390625, 2.819671630859375, 2.390819549560547, 3.22198486328125, 1.4232635498046875, 1.1243877410888672, 1.1542510986328125, 1.9359054565429688, -1.253509521484375, 0.6985130310058594, 0.6354293823242188, -2.3551788330078125, 3.0123291015625, 0.7956085205078125, 6.17730712890625, -2.1016845703125, -2.1332664489746094, 2.366058349609375, 2.4014968872070312, 2.3025474548339844, 0.7089157104492188, 1.5768661499023438, 2.582855224609375, 2.137592315673828, 0.31739044189453125, 1.8043842315673828, 2.2863998413085938, 1.57342529296875, 0.3591156005859375, 0.34946441650390625, 2.92703914642334, -0.03764152526855469, 3.170307159423828, -0.36370849609375, -1.3412647247314453, -0.1436328887939453, -0.01177978515625, -2.7144088745117188, 0.07995033264160156, 3.048137664794922, 0.3172607421875, 2.363555908203125, 0.44763946533203125, 1.2136306762695312, 0.037273406982421875, -3.9569320678710938, 4.76371955871582, 0.7870979309082031, 1.2824935913085938, -0.1596221923828125, 1.03387451171875, 1.336639404296875, 2.25244140625, -1.2356643676757812, -1.8324127197265625, -0.17980575561523438, -1.0734539031982422, 1.0283374786376953, -0.13241195678710938, 1.0807952880859375, 3.9415359497070312, 0.9526786804199219, 2.632904052734375, 0.6067733764648438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000138.npy"}
{"epoch": 0.20861678004535147, "step": 139, "batch_size": 64, "mean": 1.2585362195968628, "std": 1.7008817195892334, "min": -1.7556419372558594, "p10": -0.8301074981689454, "median": 0.981048583984375, "p90": 3.816580200195313, "max": 5.4986419677734375, "pos_frac": 0.765625, "sample": [1.5796432495117188, 3.1119956970214844, 0.350250244140625, -0.1626739501953125, -1.7556419372558594, 2.1554832458496094, 0.28972625732421875, 3.7378387451171875, 1.0485877990722656, 0.28824806213378906, 0.10948562622070312, 5.4986419677734375, 2.860292434692383, -0.4465503692626953, 2.6539306640625, 3.8503265380859375, 4.615715026855469, 0.8603706359863281, -0.8349151611328125, 2.4748458862304688, -0.2523956298828125, -1.5508880615234375, -1.064361572265625, 0.9187107086181641, 2.1274795532226562, 0.8330764770507812, 1.645782470703125, 0.7219066619873047, 2.8566856384277344, 1.2680511474609375, 1.1577186584472656, 2.5327682495117188, 1.0175933837890625, -0.07489013671875, -0.35118865966796875, 1.9964141845703125, 1.6667404174804688, 1.3751106262207031, 0.9445037841796875, -0.040248870849609375, 3.3362045288085938, 0.3737754821777344, 4.055906295776367, 0.1624622344970703, -0.8188896179199219, 0.004924774169921875, 1.1496658325195312, 1.3589096069335938, 4.2458648681640625, 0.16285324096679688, 4.127777099609375, 0.202178955078125, 2.1277542114257812, -1.4689483642578125, -1.331085205078125, 0.70245361328125, -0.14235877990722656, 1.8951873779296875, -1.6512947082519531, 2.402740478515625, 4.185173034667969, 3.7353668212890625, 0.928741455078125, 0.7867889404296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000139.npy"}
{"epoch": 0.21012849584278157, "step": 140, "batch_size": 64, "mean": 1.5439517498016357, "std": 2.3902835845947266, "min": -6.923797607421875, "p10": -1.2722417831420898, "median": 1.5657730102539062, "p90": 4.459193420410156, "max": 5.8890838623046875, "pos_frac": 0.71875, "sample": [4.500946044921875, 0.778228759765625, 0.2620964050292969, -0.1435394287109375, 0.7245254516601562, 2.8582916259765625, -2.0702362060546875, -2.3205108642578125, 2.7483863830566406, 3.3604888916015625, -6.923797607421875, 2.580810546875, 2.0546340942382812, 1.293426513671875, 4.110195159912109, 1.7788467407226562, -1.2967166900634766, 0.9228134155273438, -0.800537109375, 3.1034011840820312, 1.7361030578613281, 3.2819976806640625, 1.9886016845703125, 2.0651092529296875, -0.2575111389160156, 1.5350341796875, 1.4252777099609375, 5.47838020324707, 0.8325347900390625, -0.829132080078125, -1.2151336669921875, -1.05059814453125, 2.7306346893310547, -0.8348236083984375, -0.5101280212402344, -0.3482208251953125, 4.264213562011719, 4.462310791015625, 4.4519195556640625, 4.1273345947265625, 2.2383575439453125, 4.0071258544921875, 5.465087890625, 3.8910579681396484, 1.5077323913574219, -1.8272113800048828, 4.948631286621094, 1.3995552062988281, 5.2978973388671875, -0.6286945343017578, -2.7408180236816406, 2.9974365234375, 1.0768699645996094, 1.6011428833007812, 5.8890838623046875, 1.1323089599609375, 4.066009521484375, -0.00150299072265625, 2.4934768676757812, 1.5965118408203125, 0.4192619323730469, 3.2741317749023438, -1.4510879516601562, 1.3048858642578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000140.npy"}
{"epoch": 0.21164021164021163, "step": 141, "batch_size": 64, "mean": 1.4029275178909302, "std": 2.1202025413513184, "min": -3.017120361328125, "p10": -1.0679058074951169, "median": 1.0166072845458984, "p90": 4.18316993713379, "max": 6.627128601074219, "pos_frac": 0.734375, "sample": [-3.017120361328125, 0.2330322265625, 3.04693603515625, 0.22127914428710938, 1.5130767822265625, 0.9353561401367188, 0.809722900390625, 2.20172119140625, 0.4725303649902344, -2.0881690979003906, 1.5019989013671875, 1.6796417236328125, 3.275236129760742, -0.5620098114013672, 4.016445159912109, 1.2739429473876953, 2.7689285278320312, 0.35195159912109375, -0.24831771850585938, -2.537078857421875, 0.8951034545898438, 0.3043231964111328, 0.2926216125488281, 2.3700180053710938, 3.0760040283203125, -0.2476806640625, 0.6770095825195312, 2.6862945556640625, -0.408782958984375, -1.23284912109375, -1.2157211303710938, -1.6432647705078125, 1.2719268798828125, 1.5621719360351562, 3.3679237365722656, 0.012035369873046875, 3.180461883544922, 0.6780014038085938, 6.627128601074219, 3.8883056640625, 4.5047607421875, 2.9591846466064453, 2.1216773986816406, 2.4726333618164062, 2.8983383178710938, -0.5497970581054688, -0.12982177734375, 2.0850448608398438, 5.278553009033203, 4.2546234130859375, -1.9247245788574219, 0.6977462768554688, 0.9884300231933594, 6.134185791015625, 2.8702850341796875, 6.167449951171875, -0.7230033874511719, 5.251569747924805, -0.4095001220703125, 1.216217041015625, 1.0447845458984375, 0.6842098236083984, -0.0581512451171875, -0.037471771240234375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000141.npy"}
{"epoch": 0.21315192743764172, "step": 142, "batch_size": 64, "mean": 1.0465269088745117, "std": 1.7961697578430176, "min": -3.0587120056152344, "p10": -1.0248371124267575, "median": 0.7502059936523438, "p90": 3.151304626464844, "max": 5.5347747802734375, "pos_frac": 0.734375, "sample": [0.7376861572265625, -2.0928115844726562, 0.5012435913085938, 2.3151397705078125, 3.611011505126953, 5.0677337646484375, 0.409423828125, -0.5705146789550781, 1.2086448669433594, 2.1034088134765625, 0.4240264892578125, -0.002513885498046875, 2.1349124908447266, -3.0587120056152344, 0.20864486694335938, 1.3129425048828125, 1.8558349609375, -2.837434768676758, 0.5588531494140625, 0.6428089141845703, 1.4642486572265625, 3.176513671875, 1.3328075408935547, 1.8357734680175781, 1.4398117065429688, 2.9225616455078125, -0.4676704406738281, 0.9376068115234375, 4.894100189208984, -1.1263313293457031, 0.762725830078125, -0.328948974609375, 2.5693511962890625, 5.5347747802734375, 1.30029296875, -2.178302764892578, 0.121124267578125, -1.1940174102783203, -0.141876220703125, 1.2088279724121094, 4.595672607421875, 3.0924835205078125, 0.5805816650390625, 0.5318069458007812, 0.0349578857421875, 2.6161766052246094, -1.3342227935791016, 0.654449462890625, -0.19343948364257812, 2.6032562255859375, -0.7880172729492188, -0.6976280212402344, 1.436004638671875, 2.421295166015625, 0.24470901489257812, 0.7051162719726562, 2.8991165161132812, 3.3704299926757812, 2.450653076171875, -0.5029315948486328, 0.47097015380859375, -0.550048828125, 2.5594024658203125, 1.1832199096679688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000142.npy"}
{"epoch": 0.2146636432350718, "step": 143, "batch_size": 64, "mean": 1.0988903045654297, "std": 2.1381947994232178, "min": -4.74609375, "p10": -0.9194351196289061, "median": 1.0603313446044922, "p90": 3.7057252883911143, "max": 6.184638977050781, "pos_frac": 0.765625, "sample": [1.0503158569335938, 1.644256591796875, 1.2977333068847656, 2.109027862548828, 1.6332550048828125, -0.802764892578125, 1.1181411743164062, -0.4121551513671875, 3.970338821411133, -2.2322044372558594, 3.1550235748291016, -0.2410430908203125, 4.8655548095703125, 0.94525146484375, 1.598480224609375, 1.7165374755859375, -4.194889068603516, 1.8589553833007812, 3.7966079711914062, 0.9876079559326172, 4.4368438720703125, -0.6009330749511719, 6.184638977050781, 2.8899765014648438, 1.7375946044921875, 0.5126724243164062, 2.241790771484375, -3.70440673828125, 0.4661140441894531, 0.5827484130859375, 0.858428955078125, 0.9839687347412109, -0.670379638671875, 2.8944931030273438, 4.904182434082031, -0.9694366455078125, -0.5731658935546875, 0.23723220825195312, 0.5529861450195312, 0.7252883911132812, 1.1440315246582031, -4.74609375, 3.1520156860351562, 0.9398956298828125, -0.008878707885742188, 1.4849700927734375, 1.4569778442382812, 3.196441650390625, 0.08874893188476562, 0.6785125732421875, -1.2377090454101562, 1.0703468322753906, 2.2938919067382812, 5.777862548828125, -3.65771484375, 3.4936656951904297, 1.54742431640625, 0.383209228515625, 2.7241363525390625, 1.9590644836425781, 0.020952224731445312, -0.43102455139160156, 1.1490592956542969, 0.294525146484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000143.npy"}
{"epoch": 0.2161753590325019, "step": 144, "batch_size": 64, "mean": 1.2274713516235352, "std": 2.0776610374450684, "min": -5.0457305908203125, "p10": -1.0359329223632812, "median": 1.0798988342285156, "p90": 4.334702491760255, "max": 6.1159515380859375, "pos_frac": 0.6875, "sample": [4.5272979736328125, 2.5616416931152344, -0.0584716796875, 0.12132835388183594, -0.9603691101074219, 0.8655338287353516, -0.8802547454833984, 2.3348007202148438, 0.6521835327148438, 0.1806812286376953, -0.43164825439453125, 0.7739639282226562, 1.6967658996582031, 1.929788589477539, 2.6945953369140625, 1.1175689697265625, 3.2467193603515625, 0.6409988403320312, -1.0639495849609375, 2.138031005859375, 1.776336669921875, 2.144397735595703, 0.4939727783203125, -0.3416900634765625, -0.012889862060546875, 0.7434921264648438, 2.20123291015625, 3.5340805053710938, -0.9734268188476562, -0.38446807861328125, 2.2717437744140625, 1.0422286987304688, -1.9637565612792969, -0.703887939453125, 2.1065711975097656, -0.2091064453125, -0.5911293029785156, 4.160245895385742, 4.4094696044921875, 4.525230407714844, 3.253143310546875, 2.1041641235351562, -1.0627212524414062, 2.3987884521484375, -1.2295913696289062, 1.6341094970703125, 1.3876991271972656, -1.4327545166015625, 3.286510467529297, -0.38198089599609375, 0.24497222900390625, -0.7558135986328125, 1.1692123413085938, 5.198511123657227, -2.5986175537109375, 4.689573287963867, 6.1159515380859375, 5.2733001708984375, 0.78302001953125, -5.0457305908203125, 1.7250518798828125, 0.7127723693847656, 2.2745418548583984, 2.4982051849365234], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000144.npy"}
{"epoch": 0.21768707482993196, "step": 145, "batch_size": 64, "mean": 1.2544798851013184, "std": 2.2611687183380127, "min": -5.2676849365234375, "p10": -1.1059808731079102, "median": 1.1298637390136719, "p90": 4.182637786865235, "max": 6.699134826660156, "pos_frac": 0.703125, "sample": [1.2126388549804688, 1.6995811462402344, -1.1017284393310547, -0.031036376953125, 0.6141738891601562, -5.2676849365234375, 2.3546905517578125, -0.0568084716796875, 6.457859039306641, 1.2850799560546875, 1.574594497680664, 3.2040557861328125, 0.15229415893554688, 2.0342330932617188, 0.5647773742675781, 0.7018203735351562, -0.6881790161132812, 3.930328369140625, -0.7965927124023438, 1.7068252563476562, 0.14687347412109375, 0.7174949645996094, -1.1663742065429688, 2.241546630859375, 4.749229431152344, -0.08515167236328125, -0.40760040283203125, 1.692474365234375, 4.0999603271484375, -0.06911468505859375, 6.699134826660156, 2.6038055419921875, 1.3382492065429688, 5.491020202636719, 0.631927490234375, 4.218070983886719, 1.639383316040039, 3.272418975830078, 1.30242919921875, 3.184415817260742, -2.2728729248046875, 1.6289253234863281, 4.6459808349609375, 3.187173843383789, 0.07641029357910156, -1.97821044921875, 0.9430656433105469, 1.1425399780273438, -0.9375953674316406, 0.7638168334960938, 2.3175926208496094, 2.0801963806152344, 2.8195133209228516, -0.3041725158691406, 0.9072494506835938, 5.9967041015625, 1.1171875, -0.8911590576171875, -3.345184326171875, 0.902069091796875, 3.3313751220703125, -1.75048828125, -1.1078033447265625, -0.8367195129394531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000145.npy"}
{"epoch": 0.21919879062736206, "step": 146, "batch_size": 64, "mean": 1.3463964462280273, "std": 2.106631278991699, "min": -2.3526344299316406, "p10": -1.558448028564453, "median": 1.2755661010742188, "p90": 3.778536224365235, "max": 8.428245544433594, "pos_frac": 0.75, "sample": [3.456695556640625, 2.998891830444336, -0.8611984252929688, 1.2712860107421875, 0.7305908203125, -1.5875320434570312, -0.9182090759277344, 1.746734619140625, 2.369831085205078, 2.1975936889648438, -2.3281326293945312, -2.1909408569335938, 2.9374237060546875, 2.2016830444335938, 0.19878387451171875, -1.4905853271484375, 2.6039161682128906, 0.2945556640625, 3.3436927795410156, -1.8398513793945312, 1.658843994140625, 4.89141845703125, 1.4342041015625, 0.6765956878662109, -2.3526344299316406, -0.2805213928222656, 1.080850601196289, 4.0012664794921875, -1.1323623657226562, 2.3137741088867188, 1.1627388000488281, -2.253673553466797, 6.6890869140625, 0.846038818359375, 0.35759735107421875, -0.5860328674316406, 1.2894287109375, 0.6803207397460938, 1.423858642578125, -0.49692726135253906, 3.8627395629882812, 1.4824295043945312, 0.6370201110839844, 2.5153350830078125, 3.582061767578125, 4.587715148925781, 2.675992965698242, 1.706686019897461, 3.2549362182617188, 1.27984619140625, 1.7854022979736328, 1.043060302734375, 2.9838733673095703, 1.2133712768554688, -1.591176986694336, -0.2153778076171875, 8.428245544433594, -0.63226318359375, 1.0990028381347656, 1.008218765258789, 2.5126113891601562, 0.8582992553710938, 1.5132980346679688, 4.038938522338867], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000146.npy"}
{"epoch": 0.22071050642479215, "step": 147, "batch_size": 64, "mean": 1.0431100130081177, "std": 2.1669564247131348, "min": -6.47955322265625, "p10": -1.2800270080566403, "median": 0.9717769622802734, "p90": 3.556937408447266, "max": 5.879302978515625, "pos_frac": 0.6875, "sample": [1.4829254150390625, 1.1606178283691406, 2.52679443359375, -0.7369003295898438, 1.3469123840332031, 2.47259521484375, -1.6773300170898438, 1.4583892822265625, -0.436920166015625, -0.3961772918701172, 2.38494873046875, 2.272014617919922, 2.9096527099609375, 3.56787109375, -0.4214019775390625, 3.5314254760742188, 5.093791961669922, -6.47955322265625, 2.709491729736328, -0.2955818176269531, 1.299703598022461, 0.0481109619140625, 0.2508392333984375, 0.232421875, 0.4729461669921875, 4.823270797729492, -0.9022293090820312, 2.809417724609375, 1.4226531982421875, 0.494903564453125, -0.15941810607910156, 1.0369796752929688, 0.686309814453125, -1.84228515625, -0.77557373046875, 2.0820846557617188, 2.774698257446289, 5.0043792724609375, 5.2810211181640625, -0.8084049224853516, -0.66729736328125, 0.1492156982421875, 0.7351455688476562, 2.73260498046875, 2.94024658203125, -1.4419403076171875, 5.879302978515625, -0.7261428833007812, 2.2002906799316406, 0.2835502624511719, 1.3085479736328125, 1.7618789672851562, 1.5679473876953125, -0.4643115997314453, -1.5037994384765625, 0.21094512939453125, 0.9065742492675781, 0.3536701202392578, 2.8142623901367188, -2.517059326171875, -2.91009521484375, 1.6509780883789062, 4.852407455444336, -0.06327629089355469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000147.npy"}
{"epoch": 0.2222222222222222, "step": 148, "batch_size": 64, "mean": 1.4712722301483154, "std": 2.2392170429229736, "min": -3.3433380126953125, "p10": -1.2786214828491205, "median": 1.3355712890625, "p90": 4.491445732116699, "max": 6.737060546875, "pos_frac": 0.75, "sample": [4.818572998046875, -2.255006790161133, 1.579315185546875, -1.4920578002929688, 3.8369064331054688, 6.069404602050781, 2.193096160888672, -3.056924819946289, 2.8919219970703125, 3.5998916625976562, -0.7806034088134766, 3.8486061096191406, 1.1659774780273438, 1.3160896301269531, 3.516307830810547, -0.27896881103515625, -0.7709617614746094, 4.528228759765625, 1.922698974609375, -0.46907806396484375, 2.1373939514160156, -1.7211990356445312, 4.61383056640625, 0.7525997161865234, 4.405618667602539, 5.53790283203125, 3.059307098388672, 0.09297561645507812, 6.737060546875, 1.4789657592773438, 1.3417434692382812, -0.6939697265625, 3.5398502349853516, -1.97259521484375, 1.73779296875, 3.2177810668945312, 0.6268310546875, 2.912443161010742, -3.3433380126953125, -0.19736099243164062, 1.3790817260742188, 0.19344329833984375, 3.976165771484375, 6.0478363037109375, -2.4115943908691406, 2.2916736602783203, 1.727102279663086, 1.2148094177246094, 1.90411376953125, 1.3293991088867188, 1.8644943237304688, 0.8256072998046875, 0.6524314880371094, 0.3557586669921875, 0.07769393920898438, 2.7454681396484375, 0.6961669921875, 1.0364227294921875, 1.050241470336914, -0.4976959228515625, 0.03994560241699219, 2.02325439453125, -0.36763763427734375, -0.4398040771484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000148.npy"}
{"epoch": 0.2237339380196523, "step": 149, "batch_size": 64, "mean": 1.6928033828735352, "std": 2.3128156661987305, "min": -3.9768905639648438, "p10": -0.9029851913452148, "median": 1.2421655654907227, "p90": 4.886675834655763, "max": 7.407131195068359, "pos_frac": 0.796875, "sample": [1.7938232421875, 3.742523193359375, 4.438758850097656, 0.4708976745605469, 1.91900634765625, -0.09619522094726562, -0.5917472839355469, 3.1367568969726562, -0.4282188415527344, 3.417928695678711, -1.6764297485351562, 1.2963504791259766, 0.26538848876953125, 1.0310478210449219, 2.2628173828125, 0.6780357360839844, 1.3972129821777344, 0.8324127197265625, 0.3991127014160156, 0.4320068359375, 0.8547840118408203, 6.288032531738281, 4.673820495605469, -1.9061241149902344, 7.407131195068359, 2.5681705474853516, 0.7881698608398438, -0.9144115447998047, 1.4373111724853516, -3.9768905639648438, -0.35128021240234375, 4.066978454589844, -1.0436019897460938, 6.869649887084961, 0.9346256256103516, 4.977899551391602, 3.6397857666015625, -0.8763236999511719, 5.2704010009765625, -0.7821216583251953, -1.1783905029296875, 1.7292938232421875, 0.06002235412597656, 3.6953887939453125, 6.420375823974609, 3.639383316040039, 1.1879806518554688, 2.7560176849365234, 1.0147781372070312, 0.6114044189453125, 1.5695114135742188, 0.7674331665039062, 0.8265228271484375, -2.037750244140625, 0.42235755920410156, 3.466705322265625, 1.746734619140625, 5.83526611328125, 3.6276702880859375, 1.4159164428710938, 0.9615478515625, 3.0356388092041016, 0.20750045776367188, 1.910614013671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000149.npy"}
{"epoch": 0.2252456538170824, "step": 150, "batch_size": 64, "mean": 1.9585299491882324, "std": 2.1404976844787598, "min": -2.7538833618164062, "p10": -0.5230422973632811, "median": 2.0854949951171875, "p90": 4.755265045166016, "max": 6.8963623046875, "pos_frac": 0.8125, "sample": [2.5380783081054688, 0.0799713134765625, 5.554107666015625, 2.0123977661132812, -2.4705734252929688, -0.223663330078125, 1.1249942779541016, 3.6043014526367188, 2.4692535400390625, 4.155107498168945, 0.29355621337890625, 2.1915054321289062, 0.6447601318359375, 2.961639404296875, -0.23161888122558594, 1.6913986206054688, 2.087860107421875, 5.9257049560546875, 2.366180419921875, 2.85772705078125, 2.0915679931640625, 0.38426971435546875, 3.7284393310546875, -1.9021453857421875, 0.4663238525390625, 3.646860122680664, 0.9153690338134766, 3.278583526611328, -0.40567779541015625, 4.845634460449219, 2.191648483276367, 2.0831298828125, 3.9331321716308594, 1.1837539672851562, 6.441307067871094, -2.7538833618164062, 1.406412124633789, 2.2905712127685547, 2.7007522583007812, 5.30255126953125, 1.994842529296875, 4.8534393310546875, -0.39665985107421875, 1.0663299560546875, 4.544403076171875, 0.7413787841796875, 3.7935562133789062, 3.822294235229492, -1.1785049438476562, 6.8963623046875, 0.3006744384765625, 0.3727264404296875, 2.1862106323242188, -0.00576019287109375, 0.9340858459472656, 3.3558731079101562, -1.2884292602539062, -1.492156982421875, 4.19049072265625, -0.5733413696289062, 1.0383071899414062, 3.2313156127929688, 1.9317855834960938, 3.565399169921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000150.npy"}
{"epoch": 0.22675736961451248, "step": 151, "batch_size": 64, "mean": 2.147468328475952, "std": 2.27799129486084, "min": -2.4498062133789062, "p10": -0.6337985992431641, "median": 2.197803497314453, "p90": 4.469093322753906, "max": 8.925018310546875, "pos_frac": 0.8125, "sample": [3.418609619140625, 1.7537250518798828, 4.015239715576172, -0.6558570861816406, 0.7392005920410156, 0.2739219665527344, 8.925018310546875, 0.2517814636230469, 5.4540863037109375, 0.6270484924316406, 1.8223953247070312, 3.3943233489990234, 2.607818603515625, 0.380523681640625, 2.376619338989258, 2.6861934661865234, 4.371147155761719, -0.3634376525878906, 1.936727523803711, 0.3475151062011719, -0.5752716064453125, -0.6898078918457031, 3.4513931274414062, -1.0196304321289062, 7.2248077392578125, -1.7319412231445312, 2.614715576171875, 2.2580223083496094, -0.5823287963867188, -0.452423095703125, 2.528026580810547, -1.3440475463867188, 4.3555450439453125, 2.4246673583984375, 1.0736236572265625, -0.8070068359375, 2.1947097778320312, 2.5336456298828125, 0.5086212158203125, 7.177154541015625, 0.6327762603759766, 1.9569740295410156, 4.056209564208984, 1.2685832977294922, 5.88531494140625, 4.511070251464844, 3.9961166381835938, 3.8463287353515625, 3.8663177490234375, 1.6439132690429688, 2.200897216796875, 3.0830459594726562, 1.5143985748291016, 6.01361083984375, -2.4498062133789062, 3.573823928833008, 3.1780872344970703, 1.5511856079101562, -0.16489219665527344, 1.430419921875, 2.2235641479492188, 4.325244903564453, 0.20389938354492188, 3.5858154296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000151.npy"}
{"epoch": 0.22826908541194255, "step": 152, "batch_size": 64, "mean": 1.6905221939086914, "std": 2.573993682861328, "min": -3.7663116455078125, "p10": -0.891940689086914, "median": 1.3593931198120117, "p90": 5.476993751525879, "max": 7.461097717285156, "pos_frac": 0.734375, "sample": [-0.7719097137451172, 0.8823699951171875, 3.4381961822509766, 0.7255706787109375, -3.5062637329101562, 0.04341888427734375, 3.1851654052734375, 3.196533203125, 0.981597900390625, -1.3272552490234375, 1.3118114471435547, -0.87237548828125, 1.8068313598632812, 0.5037002563476562, 4.372962951660156, 5.451568603515625, 0.571014404296875, 1.5830917358398438, 1.0237464904785156, -0.9003257751464844, -3.6438751220703125, 4.952890396118164, 2.4316940307617188, 2.4466629028320312, 0.075927734375, 0.46484947204589844, 3.4784469604492188, 1.5805435180664062, 0.6377086639404297, 1.7295989990234375, -0.4476318359375, 6.246295928955078, 0.6733684539794922, 0.5713958740234375, -0.25305938720703125, 0.198486328125, 1.4069747924804688, -0.6701126098632812, 3.320465087890625, 2.666595458984375, -1.7860565185546875, 5.487890243530273, 3.6012496948242188, -0.65570068359375, 5.140903472900391, 5.532131195068359, 3.5849075317382812, -0.4780158996582031, 2.7498092651367188, 2.7485809326171875, -3.7663116455078125, 6.067237854003906, 0.23387527465820312, 6.431831359863281, 3.923948287963867, 2.1116409301757812, 5.795196533203125, 4.719749450683594, -0.7762603759765625, 7.461097717285156, -0.47662353515625, -1.8460693359375, -0.2836456298828125, 3.105377197265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000152.npy"}
{"epoch": 0.22978080120937264, "step": 153, "batch_size": 64, "mean": 1.8225750923156738, "std": 2.1199917793273926, "min": -2.8994979858398438, "p10": -0.6518917083740234, "median": 1.5833988189697266, "p90": 4.6528663635253915, "max": 6.743282318115234, "pos_frac": 0.828125, "sample": [2.4421825408935547, 1.63507080078125, 1.6178054809570312, 4.78375244140625, 3.3797035217285156, 4.500099182128906, 3.4294967651367188, 0.9255905151367188, 1.2548294067382812, 1.3308181762695312, 1.2141189575195312, 5.140537261962891, 1.2905044555664062, 4.7183380126953125, 2.1204681396484375, -0.5784912109375, 2.9230728149414062, 0.4776153564453125, 0.18923187255859375, 5.61016845703125, 6.743282318115234, 4.1489105224609375, -1.301055908203125, 1.5489921569824219, 6.663204193115234, -1.3933582305908203, 4.878204345703125, 0.7451992034912109, 0.439910888671875, 4.191490173339844, -2.8994979858398438, 1.1705665588378906, -2.627635955810547, 1.1241722106933594, 2.4449310302734375, 0.37299156188964844, 0.9869918823242188, 4.119895935058594, 1.83990478515625, 1.7288589477539062, 1.9219818115234375, -0.6614761352539062, 4.361824035644531, 2.9749412536621094, 1.7637939453125, 0.18369293212890625, 3.0127410888671875, 1.4000015258789062, 2.3652420043945312, -0.5391674041748047, 0.8781890869140625, 2.442462921142578, 0.42389488220214844, 0.5456085205078125, 4.194664001464844, -1.7693939208984375, -0.36834716796875, 3.931520462036133, -1.0116424560546875, 3.30584716796875, 2.7325172424316406, 0.3111381530761719, -0.6295280456542969, 1.543426513671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000153.npy"}
{"epoch": 0.23129251700680273, "step": 154, "batch_size": 64, "mean": 1.3450573682785034, "std": 2.7528128623962402, "min": -5.48297119140625, "p10": -2.3064804077148438, "median": 1.600224494934082, "p90": 4.585367202758789, "max": 8.124473571777344, "pos_frac": 0.75, "sample": [-2.7502975463867188, 0.9573135375976562, -5.48297119140625, -5.204875946044922, 4.591575622558594, 3.0491085052490234, -2.3765010833740234, 1.0432491302490234, 0.551177978515625, -2.3260726928710938, 2.1813793182373047, 0.09681510925292969, 2.8073368072509766, 0.19757080078125, -1.7683944702148438, -2.2607650756835938, -0.8542060852050781, 1.7038211822509766, 0.2632904052734375, 1.2035140991210938, 4.245277404785156, 6.559551239013672, 5.4442596435546875, 2.1675186157226562, -3.1199798583984375, 2.059999465942383, 5.638694763183594, 0.7060470581054688, 6.73480224609375, -5.3488006591796875, 2.487520217895508, 1.8451824188232422, 4.157958984375, 2.717742919921875, 1.8302421569824219, 0.0549163818359375, 1.1558990478515625, 2.196086883544922, 3.0547618865966797, 1.14447021484375, -0.5494918823242188, 0.08011627197265625, 1.918050765991211, -0.6362228393554688, 0.8566665649414062, 8.124473571777344, 2.2613449096679688, 2.9645862579345703, 2.4655075073242188, -0.8471336364746094, -0.1281585693359375, 0.1781158447265625, 1.5985984802246094, -0.987762451171875, 4.93408203125, 2.4338607788085938, 4.570880889892578, -0.4367084503173828, 3.992319107055664, 3.4789390563964844, 3.4589767456054688, 1.6018505096435547, 0.9614372253417969, 2.43511962890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000154.npy"}
{"epoch": 0.2328042328042328, "step": 155, "batch_size": 64, "mean": 1.73045015335083, "std": 2.9542205333709717, "min": -4.564704895019531, "p10": -2.1355007171630858, "median": 1.6221904754638672, "p90": 5.639109230041504, "max": 9.002914428710938, "pos_frac": 0.75, "sample": [0.6462936401367188, 0.5042266845703125, 5.675048828125, 2.0295372009277344, 0.49228668212890625, 1.6396293640136719, 0.06815338134765625, 3.5972728729248047, -0.7791423797607422, 6.725124359130859, 4.283660888671875, 4.276082992553711, -0.397979736328125, -3.0282821655273438, 0.085052490234375, -1.7285614013671875, 2.2926025390625, 6.39874267578125, 2.8473663330078125, 5.55525016784668, 6.66650390625, 0.0067882537841796875, 3.3595504760742188, -2.212932586669922, 3.880584716796875, 7.3338165283203125, 1.3467864990234375, 3.72601318359375, -1.2638511657714844, 3.9339675903320312, -4.55804443359375, 1.2053108215332031, 9.002914428710938, 3.6617279052734375, 4.6305389404296875, 3.320220947265625, -0.13646697998046875, -1.1095962524414062, -3.65576171875, 2.0039596557617188, -2.8236541748046875, 0.8355979919433594, 3.3207855224609375, 1.2379913330078125, 4.940711975097656, 1.6047515869140625, 1.649322509765625, 4.6134796142578125, -2.4984588623046875, 2.1546554565429688, 3.049774169921875, 1.404022216796875, 0.08976173400878906, 0.354248046875, 1.1683855056762695, -0.21576690673828125, 2.2711143493652344, -1.9548263549804688, -4.564704895019531, -0.8383255004882812, 1.7582473754882812, 6.443824768066406, 4.014469146728516, 0.40900421142578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000155.npy"}
{"epoch": 0.23431594860166288, "step": 156, "batch_size": 64, "mean": 2.084041118621826, "std": 2.6511764526367188, "min": -3.262451171875, "p10": -0.9169752120971679, "median": 1.8191394805908203, "p90": 5.930484008789063, "max": 9.77740478515625, "pos_frac": 0.796875, "sample": [0.6757354736328125, -0.8779850006103516, -0.638275146484375, -2.636859893798828, 4.83294677734375, 6.0879364013671875, 9.19727897644043, 0.9858627319335938, 2.5384063720703125, 5.419261932373047, 1.2196502685546875, 3.9753646850585938, -1.022003173828125, 3.263427734375, 5.88897705078125, 4.272003173828125, 5.572418212890625, 2.7968215942382812, -0.539947509765625, 2.9177322387695312, 0.9944400787353516, 0.7598514556884766, 1.8179969787597656, 2.0435867309570312, 0.7585906982421875, 1.0637893676757812, -0.3736076354980469, 3.7961883544921875, 2.7654571533203125, 3.7684707641601562, 2.0411376953125, 1.820281982421875, -0.933685302734375, 2.41497802734375, 1.5492477416992188, 9.77740478515625, -3.262451171875, 2.454498291015625, -1.5967025756835938, 6.256904602050781, 3.5959548950195312, 6.5242919921875, 0.512908935546875, 1.3025245666503906, 0.5192642211914062, 0.5668601989746094, -0.3802032470703125, 0.33769989013671875, 2.047882080078125, 1.8758697509765625, 1.3178482055664062, -1.96868896484375, 2.721881866455078, 1.021820068359375, 3.3820114135742188, 0.531463623046875, -0.616851806640625, 2.5099563598632812, 5.948272705078125, 0.4418525695800781, 1.8988475799560547, 7.002838134765625, 1.508544921875, -1.0693435668945312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000156.npy"}
{"epoch": 0.23582766439909297, "step": 157, "batch_size": 64, "mean": 1.7060656547546387, "std": 2.5894224643707275, "min": -5.046119689941406, "p10": -0.6367210388183593, "median": 1.1487483978271484, "p90": 5.117234802246094, "max": 10.198471069335938, "pos_frac": 0.765625, "sample": [4.4347686767578125, 6.321800231933594, -0.6536788940429688, 3.0473175048828125, 3.8802108764648438, -2.3628387451171875, 2.2183685302734375, 0.0378570556640625, 5.519969940185547, -0.12812042236328125, -0.5684432983398438, 0.8901138305664062, 6.2014617919921875, 0.9404258728027344, -5.046119689941406, 0.09230232238769531, 0.6215667724609375, 1.0256500244140625, 3.553049087524414, 5.471656799316406, 2.3922348022460938, 3.475078582763672, 10.198471069335938, 2.5837154388427734, 4.64056396484375, 1.1203994750976562, 0.11296463012695312, -0.2584190368652344, 1.0952301025390625, -0.5971527099609375, 3.830982208251953, 0.2003021240234375, 2.810209274291992, 0.9150848388671875, 2.0917205810546875, 5.141693115234375, -1.469635009765625, 5.513973236083984, 3.512847900390625, 0.8629608154296875, -0.5710716247558594, 0.8277664184570312, -0.4456920623779297, 4.390655517578125, 5.0601654052734375, 1.0013351440429688, -0.4096221923828125, 1.8673877716064453, 1.2581539154052734, 0.3099842071533203, 1.863067626953125, -1.8233146667480469, 1.1770973205566406, 1.0539283752441406, 2.365631103515625, 3.467710494995117, 0.8399925231933594, 1.9636001586914062, -3.5339889526367188, 2.9672775268554688, 1.79071044921875, -0.42400360107421875, -2.784454345703125, 3.3053436279296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000157.npy"}
{"epoch": 0.23733938019652306, "step": 158, "batch_size": 64, "mean": 1.7115428447723389, "std": 2.4945318698883057, "min": -3.757568359375, "p10": -1.452313232421875, "median": 1.6557655334472656, "p90": 4.78684253692627, "max": 8.111968994140625, "pos_frac": 0.6875, "sample": [0.9433212280273438, 3.1990585327148438, 1.7728500366210938, 2.129779815673828, -3.1625518798828125, 3.114238739013672, 7.073486328125, 2.9760284423828125, -1.4673919677734375, -0.2444915771484375, 4.7299346923828125, 2.767242431640625, 2.222198486328125, 2.871000289916992, -2.891082763671875, 2.262491226196289, 5.9381103515625, 1.1584911346435547, 1.4388790130615234, 4.300407409667969, -0.6675186157226562, 1.8653411865234375, 3.0308876037597656, 0.9774589538574219, 4.81123161315918, 1.5386810302734375, -0.1777191162109375, 5.838905334472656, 2.4993133544921875, 2.57928466796875, -0.17840576171875, 3.0890655517578125, 4.054100036621094, 0.9523086547851562, 4.596559524536133, 3.393461227416992, 3.5694427490234375, -0.4124755859375, 1.2761211395263672, -2.2725067138671875, 5.75897216796875, 1.3949928283691406, 2.625030517578125, 5.671356201171875, 2.5422821044921875, 2.4580116271972656, -0.12137413024902344, -0.03946685791015625, 8.111968994140625, -0.7432327270507812, 0.9760169982910156, -0.2862701416015625, -1.4171295166015625, 0.799285888671875, -0.03144264221191406, -0.6190986633300781, -1.8119258880615234, 4.41204833984375, 2.7024574279785156, 1.0175743103027344, -1.9615287780761719, 0.47159576416015625, -0.10935020446777344, -3.757568359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000158.npy"}
{"epoch": 0.23885109599395313, "step": 159, "batch_size": 64, "mean": 1.8885865211486816, "std": 2.9497358798980713, "min": -8.919570922851562, "p10": -1.3030244827270507, "median": 1.630197525024414, "p90": 5.228288078308106, "max": 8.737777709960938, "pos_frac": 0.796875, "sample": [2.757122039794922, 3.831228256225586, 1.120086669921875, 4.968925476074219, 1.9199600219726562, -2.0589523315429688, -3.810699462890625, 6.304765701293945, 0.3005218505859375, 8.737777709960938, 4.5040740966796875, 1.4232406616210938, 6.337282180786133, 1.5666885375976562, -0.5630645751953125, 1.3671455383300781, 2.4595794677734375, 8.15252685546875, 2.3231201171875, 0.14536666870117188, 0.6618080139160156, -3.330474853515625, 1.6164627075195312, 2.4330215454101562, 1.3839874267578125, 1.3167133331298828, 1.6439323425292969, -1.6979598999023438, 4.948486328125, 8.483837127685547, -1.3619346618652344, 3.389923095703125, 1.781290054321289, -0.36553955078125, 0.76690673828125, 1.8296871185302734, 5.185935974121094, 2.5650711059570312, 0.7249526977539062, 0.39090728759765625, 1.4514617919921875, -1.5062255859375, 6.725494384765625, 0.3993549346923828, 2.207172393798828, 5.246438980102539, -1.165567398071289, 2.8172855377197266, -0.48954010009765625, 1.95867919921875, -0.1280689239501953, 4.5228271484375, 2.4818191528320312, 1.0962677001953125, 0.2804450988769531, -8.919570922851562, 4.214506149291992, 2.108165740966797, -0.3903045654296875, 1.353994369506836, 2.512664794921875, 1.4133758544921875, 4.9245452880859375, 3.6006011962890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000159.npy"}
{"epoch": 0.24036281179138322, "step": 160, "batch_size": 64, "mean": 1.8935188055038452, "std": 3.161186456680298, "min": -4.384101867675781, "p10": -1.5497854232788084, "median": 1.5363969802856445, "p90": 6.153559875488282, "max": 10.790420532226562, "pos_frac": 0.71875, "sample": [0.689697265625, -0.9027748107910156, 2.526885986328125, 7.195671081542969, 1.1923389434814453, 4.235206604003906, 0.173980712890625, -0.2798328399658203, 2.321165084838867, 4.176445007324219, 9.498992919921875, -0.5609130859375, 6.339849472045898, 3.6014862060546875, 3.324007034301758, 6.3048095703125, 0.10654830932617188, -3.2938308715820312, 3.3274688720703125, 4.430667877197266, -0.02625274658203125, -2.2254066467285156, 0.18883895874023438, -1.3289031982421875, 2.572172164916992, 3.034282684326172, 2.9235458374023438, 0.5749073028564453, 3.5053863525390625, 0.19604873657226562, 1.740488052368164, 2.128009796142578, 10.790420532226562, 8.72372055053711, 0.510345458984375, -1.4613170623779297, 4.2889404296875, 1.1731948852539062, 0.35962677001953125, 4.5992431640625, 0.5036163330078125, 3.660411834716797, -0.11301040649414062, -1.584625244140625, -0.6922569274902344, 2.721038818359375, -4.275993347167969, -1.4684925079345703, 2.9342193603515625, 1.9812469482421875, -2.9427413940429688, -0.6574306488037109, -1.1749114990234375, 4.2129364013671875, 3.6794891357421875, 1.2794876098632812, 0.31998443603515625, 5.8006439208984375, -4.384101867675781, 3.351123809814453, 3.9344482421875, 7.706150054931641, 1.332305908203125, -1.613494873046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000160.npy"}
{"epoch": 0.2418745275888133, "step": 161, "batch_size": 64, "mean": 1.9934735298156738, "std": 3.1191980838775635, "min": -6.910736083984375, "p10": -1.1656540870666503, "median": 1.5968494415283203, "p90": 6.164649009704591, "max": 10.303020477294922, "pos_frac": 0.71875, "sample": [1.6136970520019531, -4.595188140869141, 0.7087020874023438, 1.545806884765625, 2.2339630126953125, -0.7815170288085938, 3.130950927734375, 7.9162750244140625, -1.172342300415039, -1.1500482559204102, -1.874603271484375, 3.2487716674804688, 2.123279571533203, 3.8779449462890625, 5.8968048095703125, 1.9643688201904297, 2.6445865631103516, -6.910736083984375, 2.9979324340820312, -0.70623779296875, -1.5708274841308594, 1.228759765625, 1.1315765380859375, -0.30167388916015625, 2.3919677734375, -2.041057586669922, -0.17512893676757812, 3.363710403442383, 2.6729583740234375, -0.06594085693359375, 6.044746398925781, 5.682403564453125, 4.112102508544922, 7.294334411621094, 6.777099609375, -0.9739761352539062, -1.8930282592773438, 9.497848510742188, 4.207672119140625, 3.8179359436035156, 1.5269603729248047, 6.276069641113281, -0.3153514862060547, 1.7781295776367188, 1.1428146362304688, -0.3063926696777344, 1.1553421020507812, 3.119607925415039, 0.8889617919921875, 2.6506309509277344, -0.3341064453125, 2.0711936950683594, 6.216035842895508, 0.7289581298828125, 0.32482147216796875, 0.10947227478027344, 1.5800018310546875, 1.2567558288574219, 5.907073974609375, 2.5636444091796875, 10.303020477294922, -0.5723724365234375, 0.4660606384277344, 5.131084442138672], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000161.npy"}
{"epoch": 0.24338624338624337, "step": 162, "batch_size": 64, "mean": 2.7857651710510254, "std": 3.0921390056610107, "min": -3.7810211181640625, "p10": -1.448583984375, "median": 2.8053903579711914, "p90": 6.4878276824951175, "max": 11.471954345703125, "pos_frac": 0.828125, "sample": [5.8499298095703125, 6.406787872314453, -1.474822998046875, 1.1385841369628906, 0.889007568359375, 4.293651580810547, 3.013031005859375, 1.1217498779296875, 4.28497314453125, 6.461397171020508, 2.0002288818359375, 3.6748390197753906, 4.072332382202148, 11.471954345703125, 4.577690124511719, 2.967012405395508, 2.600311279296875, 8.888099670410156, 2.643768310546875, 8.424072265625, 3.7779693603515625, 3.535564422607422, 1.413909912109375, -1.387359619140625, 2.6058387756347656, -2.3730010986328125, 4.429656982421875, 2.4576988220214844, -2.4557418823242188, 4.778839111328125, 1.1166629791259766, 3.41845703125, 0.289581298828125, 6.459604263305664, 5.8843536376953125, 2.3396987915039062, 6.499155044555664, 4.028474807739258, 0.2458343505859375, -1.5939903259277344, -2.418304443359375, -1.2707138061523438, 6.648475646972656, 6.621084213256836, 3.8414363861083984, 2.4689178466796875, -2.0669097900390625, 3.0756683349609375, -3.7810211181640625, -0.3111419677734375, 1.0619468688964844, 0.4133453369140625, 3.847747802734375, 3.699983596801758, 8.0025634765625, 4.000450134277344, 0.14299774169921875, -1.3203582763671875, 1.6680908203125, 1.4862136840820312, 1.2672080993652344, 6.09581184387207, 5.10883903503418, 1.2308197021484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000162.npy"}
{"epoch": 0.24489795918367346, "step": 163, "batch_size": 64, "mean": 2.7662222385406494, "std": 3.509826183319092, "min": -4.949745178222656, "p10": -1.741928482055664, "median": 2.4468765258789062, "p90": 6.921141815185547, "max": 10.994915008544922, "pos_frac": 0.796875, "sample": [-0.37899303436279297, 4.7147979736328125, -4.949745178222656, 6.906097412109375, 3.6008148193359375, 0.5142326354980469, -4.186428070068359, 2.4952850341796875, 9.1121826171875, -2.071014404296875, 3.590667724609375, 5.468391418457031, 9.295829772949219, 1.6707229614257812, 1.9542312622070312, 3.6141281127929688, 3.6767616271972656, 0.16028594970703125, 9.809555053710938, 2.7864761352539062, 0.49994659423828125, 3.2201004028320312, 4.916608810424805, -1.1844558715820312, -2.113353729248047, 6.849250793457031, 2.3832359313964844, 5.435646057128906, 3.1652088165283203, 2.398468017578125, 4.001569747924805, 2.1404190063476562, 4.719276428222656, 0.45934486389160156, 5.6532745361328125, 5.4098358154296875, -3.024200439453125, 6.927589416503906, -1.6906776428222656, 1.1010513305664062, 3.063831329345703, 10.994915008544922, 1.9032363891601562, 5.3592681884765625, -0.505126953125, -0.8916206359863281, -0.6454391479492188, 0.7375831604003906, 1.2749099731445312, 0.7356719970703125, -4.089103698730469, 6.318849563598633, 5.328495025634766, 0.586395263671875, 1.7271537780761719, -1.7638931274414062, 7.937751770019531, 8.49005126953125, 6.084587097167969, 5.503946304321289, 4.105817794799805, 1.6248607635498047, 1.7624282836914062, 2.341238021850586], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000163.npy"}
{"epoch": 0.24640967498110355, "step": 164, "batch_size": 64, "mean": 2.730526924133301, "std": 3.721027135848999, "min": -6.994415283203125, "p10": -1.9044610977172844, "median": 2.217374801635742, "p90": 7.531835174560547, "max": 12.113731384277344, "pos_frac": 0.8125, "sample": [-2.939544677734375, 0.9303855895996094, 1.1306533813476562, 2.2184104919433594, 1.9386367797851562, 1.1540603637695312, 11.959800720214844, 8.107444763183594, -0.55499267578125, 1.4664115905761719, 4.4398040771484375, -1.0785140991210938, 5.066583633422852, -2.5523834228515625, 4.482460021972656, 4.97564697265625, 5.7311248779296875, 2.1985702514648438, 8.767181396484375, -0.8817234039306641, 1.59130859375, -2.8942413330078125, 3.1983871459960938, 2.216339111328125, 0.3884773254394531, 2.134004592895508, 7.377086639404297, 1.2553482055664062, 2.223236083984375, 0.2372283935546875, 4.527658462524414, 6.992931365966797, -1.1000728607177734, 3.6913108825683594, 5.244171142578125, -2.2491989135742188, 5.430809020996094, 12.113731384277344, 10.456727981567383, -4.011573791503906, 1.4814643859863281, 2.5531692504882812, 1.4870414733886719, 4.401605606079102, 2.321613311767578, 2.4182090759277344, 4.8470306396484375, 0.7165813446044922, 5.939788818359375, 1.0571250915527344, 1.5790863037109375, 2.2456016540527344, 7.964008331298828, -2.3074398040771484, 0.019956588745117188, -6.994415283203125, 0.5235366821289062, 7.598155975341797, 7.217674255371094, -0.4330940246582031, 7.03143310546875, 1.6771087646484375, 2.984363555908203, 3.0404281616210938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000164.npy"}
{"epoch": 0.24792139077853365, "step": 165, "batch_size": 64, "mean": 2.503763198852539, "std": 3.1647257804870605, "min": -4.308692932128906, "p10": -1.281958770751953, "median": 2.422191619873047, "p90": 6.283879280090333, "max": 9.9088134765625, "pos_frac": 0.78125, "sample": [7.3803253173828125, 3.336559295654297, 3.3867721557617188, 4.391338348388672, 1.3665351867675781, -0.10236358642578125, 1.4747085571289062, -1.33331298828125, 2.9675064086914062, 1.8693561553955078, 0.44187164306640625, 4.211769104003906, 6.126350402832031, 0.7527713775634766, -1.0174407958984375, 2.412750244140625, 2.0766067504882812, 2.2401390075683594, -3.0471267700195312, -0.00676727294921875, -4.308692932128906, 0.29558563232421875, 9.776863098144531, -0.847747802734375, 6.13934326171875, 7.314239501953125, 2.1677017211914062, 1.9341392517089844, 8.398426055908203, 6.957149505615234, 3.0297012329101562, -0.05969047546386719, 9.9088134765625, 5.779533386230469, 3.23101806640625, -3.653472900390625, 0.575653076171875, 5.9999237060546875, 4.322635650634766, 3.9781036376953125, -1.335550308227539, 0.7220535278320312, 0.8712921142578125, 0.6671848297119141, 6.345823287963867, -1.1621322631835938, 2.4316329956054688, 1.6464118957519531, 3.0267372131347656, 0.18600082397460938, -1.0952301025390625, 4.170127868652344, -3.5044784545898438, 4.965129852294922, 3.2658920288085938, -2.3978538513183594, 3.3382186889648438, 0.6251678466796875, 3.8032093048095703, 3.6160888671875, 4.648094177246094, 6.0132293701171875, 4.461448669433594, 5.0647735595703125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000165.npy"}
{"epoch": 0.2494331065759637, "step": 166, "batch_size": 64, "mean": 1.9662084579467773, "std": 3.2375712394714355, "min": -4.0970916748046875, "p10": -1.6410324096679685, "median": 1.6274967193603516, "p90": 6.397647857666016, "max": 10.464454650878906, "pos_frac": 0.671875, "sample": [3.9840087890625, -0.9915714263916016, -2.07452392578125, 5.298912048339844, 4.7422027587890625, 1.1797561645507812, -1.3276214599609375, 7.247627258300781, 2.0765838623046875, 3.6966552734375, -1.41650390625, 2.1425209045410156, 6.388408660888672, -1.7372589111328125, 4.27972412109375, -0.1879730224609375, 0.5983352661132812, 6.401607513427734, 1.077880859375, -0.46248626708984375, -1.1197967529296875, 5.704608917236328, -2.9102859497070312, 2.41424560546875, 6.567760467529297, 2.6141929626464844, 0.2677459716796875, 5.20855712890625, 4.2259979248046875, 2.5162734985351562, -1.9599456787109375, -3.5036964416503906, 1.5145339965820312, 8.752204895019531, 2.3710403442382812, -0.3005714416503906, 5.350013732910156, 1.4112777709960938, 2.8407821655273438, 2.319631576538086, 1.540802001953125, 9.25909423828125, -1.0362396240234375, -4.0970916748046875, 4.77862548828125, -1.3302383422851562, 0.6088485717773438, -0.7589149475097656, 1.7141914367675781, 10.464454650878906, 2.687835693359375, -2.6583251953125, 0.577117919921875, 3.6001205444335938, -1.2501678466796875, -0.7658576965332031, 4.0564727783203125, -0.9841880798339844, 7.160919189453125, -0.8271598815917969, 2.7686767578125, 3.9383773803710938, 0.8896961212158203, 0.2994346618652344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000166.npy"}
{"epoch": 0.2509448223733938, "step": 167, "batch_size": 64, "mean": 2.9351627826690674, "std": 3.473567008972168, "min": -5.664314270019531, "p10": -1.1028526306152342, "median": 2.3713598251342773, "p90": 7.273158645629883, "max": 10.817874908447266, "pos_frac": 0.8125, "sample": [0.4430809020996094, 1.4593696594238281, 1.9801464080810547, 3.8022193908691406, -3.241180419921875, 4.8556060791015625, 7.982475280761719, 0.8753166198730469, 2.4748592376708984, 3.8193359375, 5.710201263427734, 2.0529327392578125, 0.9290409088134766, 1.0467529296875, -0.25805091857910156, 0.8758201599121094, 5.586189270019531, 4.977752685546875, 7.1797332763671875, -1.6367645263671875, -5.664314270019531, 2.7740440368652344, 2.0892810821533203, 1.4702224731445312, 2.008626937866211, 1.1238250732421875, 2.2678604125976562, -0.6912212371826172, 2.1049556732177734, 6.132232666015625, 1.8621826171875, 6.45918083190918, 0.16385650634765625, 3.9241676330566406, 3.7938766479492188, 4.719612121582031, 8.041473388671875, 10.543525695800781, 6.8092041015625, 4.792396545410156, 0.8556098937988281, 3.5598297119140625, -1.8815994262695312, 2.5418758392333984, 5.977476119995117, -0.2030792236328125, -3.733245849609375, 8.738471984863281, -0.9383468627929688, 4.553230285644531, 5.714389801025391, -0.24885177612304688, 0.023912429809570312, 10.817874908447266, 4.366363525390625, 7.313198089599609, -1.1733551025390625, 10.409835815429688, 5.429126739501953, 6.51177978515625, 3.7607192993164062, 0.4762840270996094, 0.78118896484375, -1.4420852661132812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000167.npy"}
{"epoch": 0.25245653817082386, "step": 168, "batch_size": 64, "mean": 1.930140495300293, "std": 3.0414938926696777, "min": -3.9036827087402344, "p10": -1.4313331604003905, "median": 1.689413070678711, "p90": 6.344366073608401, "max": 8.817604064941406, "pos_frac": 0.671875, "sample": [4.117942810058594, -1.1628494262695312, 5.426349639892578, -1.48828125, 1.374856948852539, 5.229602813720703, 4.63531494140625, -1.0248336791992188, -3.0723609924316406, -2.0438079833984375, 2.2470245361328125, -0.7459087371826172, 1.7051010131835938, 7.485466003417969, 3.3017730712890625, 1.331594467163086, 2.7462539672851562, 2.1530799865722656, 0.18404388427734375, 3.36053466796875, 4.38970947265625, -0.766998291015625, 1.0700454711914062, 2.128314971923828, -0.2302722930908203, 3.4271507263183594, 7.7684783935546875, 1.7014350891113281, -3.6814041137695312, 1.6773910522460938, -0.6932830810546875, 1.6484146118164062, -3.19195556640625, 3.9553680419921875, 5.667835235595703, 0.162994384765625, 6.8218994140625, 1.7053070068359375, -0.6314773559570312, 0.8990249633789062, 0.6877956390380859, 8.817604064941406, -1.2593765258789062, 6.634307861328125, 5.382102966308594, -0.5844650268554688, 3.7220306396484375, -0.02203369140625, 2.0986557006835938, -0.9890365600585938, 3.056121826171875, 0.738616943359375, 4.449760437011719, 0.8784770965576172, -1.372528076171875, 7.143239974975586, -1.4565353393554688, -3.9036827087402344, -0.4929656982421875, -0.14355087280273438, 5.495534896850586, 4.926780700683594, 6.860137939453125, 3.2731246948242188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000168.npy"}
{"epoch": 0.25396825396825395, "step": 169, "batch_size": 64, "mean": 2.825845956802368, "std": 4.198537349700928, "min": -4.548563003540039, "p10": -2.0339859008789056, "median": 2.068012237548828, "p90": 7.871726989746094, "max": 17.025924682617188, "pos_frac": 0.75, "sample": [1.6828994750976562, 6.031597137451172, 4.607627868652344, -2.354267120361328, 4.449733734130859, 1.2285232543945312, -2.705486297607422, 0.3164176940917969, 4.9599456787109375, 3.6999969482421875, 1.9615936279296875, 0.7082595825195312, 6.470703125, 7.89971923828125, 0.26629638671875, -1.5320892333984375, 2.69256591796875, 1.14990234375, 3.5180892944335938, -0.7321548461914062, -0.0482177734375, 7.42340087890625, 9.323272705078125, 2.9185028076171875, -0.5746002197265625, 14.308784484863281, 2.0546112060546875, 1.9330062866210938, -0.19742202758789062, 4.879802703857422, 1.8152751922607422, 2.0814132690429688, 5.892528533935547, -1.3930435180664062, 4.539304733276367, 1.583200454711914, 6.491338729858398, -3.70159912109375, 3.020090103149414, 2.5749893188476562, 0.4130535125732422, 2.9794387817382812, 0.6757354736328125, -0.5060195922851562, 3.2005367279052734, 1.4160003662109375, 4.669944763183594, 0.161529541015625, 8.387557983398438, 4.5650177001953125, -1.0967941284179688, 5.023366928100586, 2.8371543884277344, -4.544158935546875, -2.24908447265625, 17.025924682617188, 8.863174438476562, 7.8064117431640625, -4.548563003540039, 12.43222427368164, 1.9472579956054688, 5.769048690795898, -3.1885986328125, -0.4305267333984375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000169.npy"}
{"epoch": 0.25547996976568405, "step": 170, "batch_size": 64, "mean": 2.5192790031433105, "std": 3.98819899559021, "min": -8.656923294067383, "p10": -2.814384078979492, "median": 2.499805450439453, "p90": 7.8610160827636735, "max": 10.820968627929688, "pos_frac": 0.75, "sample": [5.180419921875, 0.7874755859375, 2.4080333709716797, 3.66802978515625, 2.0236568450927734, 10.745750427246094, 2.4316558837890625, 3.244304656982422, 2.131500244140625, 8.892711639404297, -3.1446304321289062, 1.2887687683105469, 4.238380432128906, 10.820968627929688, 6.7155303955078125, -0.8242855072021484, 1.8874340057373047, 3.3012466430664062, 5.048553466796875, -0.5681610107421875, -1.0338554382324219, 8.04888916015625, 1.793264389038086, 0.5056362152099609, -2.3015365600585938, 3.2764053344726562, -4.696556091308594, -0.8286056518554688, -8.656923294067383, 2.294189453125, 6.506755828857422, 2.5679550170898438, 5.966789245605469, 3.1962718963623047, -5.157344818115234, 1.1000900268554688, -3.80206298828125, 6.1966094970703125, 7.422645568847656, -0.11873626708984375, 4.122886657714844, -0.4992485046386719, 0.7772064208984375, 3.854888916015625, 3.1424407958984375, 10.055892944335938, 4.03924560546875, 8.445274353027344, 3.3873138427734375, 1.7599945068359375, -1.4526443481445312, -1.851358413696289, 0.8675613403320312, 8.674369812011719, -3.0341758728027344, 0.6029930114746094, 5.104087829589844, 4.229576110839844, 0.9836845397949219, 3.1462574005126953, 6.3891754150390625, 5.565925598144531, 4.799919128417969, -4.434638977050781], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000170.npy"}
{"epoch": 0.25699168556311414, "step": 171, "batch_size": 64, "mean": 3.107259511947632, "std": 4.543727397918701, "min": -10.371391296386719, "p10": -0.9903060913085937, "median": 2.32466983795166, "p90": 9.520965957641604, "max": 11.710590362548828, "pos_frac": 0.78125, "sample": [3.118408203125, 3.0897483825683594, 9.049545288085938, 5.412448883056641, 4.131950378417969, 11.710590362548828, 4.345829010009766, 6.6425323486328125, 0.394256591796875, 1.9814872741699219, 0.06438446044921875, 1.4723052978515625, 10.184135437011719, 3.8318862915039062, 9.723003387451172, 0.0789031982421875, 5.629306793212891, 1.9117927551269531, 6.375421524047852, -0.1598663330078125, -0.05365180969238281, 2.205778121948242, 2.443561553955078, -3.0364990234375, 6.741798400878906, 0.09582328796386719, 2.011260986328125, 0.693023681640625, 8.602508544921875, -6.539636611938477, 2.7005577087402344, 3.6339187622070312, 10.015609741210938, -0.4548988342285156, 2.549457550048828, 8.656082153320312, -0.9140853881835938, -3.999114990234375, 10.233867645263672, 9.723556518554688, -10.371391296386719, 1.2487831115722656, 5.1449737548828125, 1.6219024658203125, 1.088836669921875, 8.6290283203125, 7.5235748291015625, 0.32770538330078125, -1.2123947143554688, 6.692588806152344, 2.5613670349121094, 10.735649108886719, 6.674039840698242, 2.1015472412109375, -1.0229721069335938, 1.315338134765625, 8.414077758789062, -0.016391754150390625, -0.6009101867675781, -6.83489990234375, 8.933563232421875, 0.7949600219726562, 1.3908843994140625, -0.5722312927246094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000171.npy"}
{"epoch": 0.2585034013605442, "step": 172, "batch_size": 64, "mean": 3.1506667137145996, "std": 4.235545635223389, "min": -10.878538131713867, "p10": -1.4239158630371092, "median": 3.1594200134277344, "p90": 8.785691642761233, "max": 13.212018966674805, "pos_frac": 0.8125, "sample": [-3.4970169067382812, 3.070690155029297, 1.1316986083984375, 2.06695556640625, 2.686614990234375, 1.558145523071289, -0.07017326354980469, 4.124393463134766, 5.6260833740234375, 9.190912246704102, 12.48284912109375, 4.214447021484375, 3.2912750244140625, -0.61553955078125, 4.4300994873046875, 13.212018966674805, 6.0657958984375, -3.1480865478515625, -1.4878082275390625, 3.0042037963867188, 8.119384765625, 1.6058845520019531, -4.535322189331055, -1.2748336791992188, 4.5516510009765625, 3.2741336822509766, 2.219329833984375, 5.0393218994140625, 3.8433685302734375, 1.1606826782226562, 3.9829750061035156, 3.7058486938476562, 4.0348358154296875, 8.37457275390625, 0.4022865295410156, -1.216400146484375, 8.014789581298828, 4.261253356933594, 9.572675704956055, 2.040966033935547, 0.2919502258300781, 3.248149871826172, 3.4041061401367188, 6.227609634399414, 6.84037971496582, -2.5708847045898438, 1.4198684692382812, 0.7131423950195312, 1.9588165283203125, 4.727354049682617, -3.0834617614746094, 7.33953857421875, 9.564804077148438, -10.878538131713867, 1.4490909576416016, 6.7286376953125, 1.2980575561523438, 2.2914657592773438, 0.81854248046875, 8.961885452270508, -0.41943359375, 0.1982440948486328, 11.655677795410156, 4.94268798828125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000172.npy"}
{"epoch": 0.2600151171579743, "step": 173, "batch_size": 64, "mean": 3.238369941711426, "std": 5.5073747634887695, "min": -7.616413116455078, "p10": -3.110066795349121, "median": 2.8436851501464844, "p90": 10.790581703186037, "max": 14.84576416015625, "pos_frac": 0.703125, "sample": [-4.480236053466797, 1.3674087524414062, -0.39772796630859375, 5.104114532470703, 2.7994384765625, 7.769035339355469, 3.824186325073242, -1.116943359375, -0.869873046875, 4.279533386230469, 0.43062591552734375, 2.8879318237304688, 9.052909851074219, -7.268774032592773, -2.954988479614258, 9.967292785644531, 11.2557373046875, 1.1373291015625, 3.227407455444336, 1.5578346252441406, 5.939056396484375, 11.369796752929688, -2.425445556640625, 12.96636962890625, -6.773656845092773, -6.0113067626953125, 6.998435974121094, 7.345573425292969, 2.472637176513672, 0.8127937316894531, -3.1765289306640625, 10.901535034179688, -2.4622364044189453, 10.382698059082031, 5.6755218505859375, 1.774251937866211, 8.10678482055664, -2.115264892578125, 7.09051513671875, 10.195480346679688, 3.8135833740234375, 0.5038223266601562, 1.807830810546875, 9.660545349121094, 5.239330291748047, -7.616413116455078, -0.13440704345703125, -1.6839447021484375, 14.84576416015625, 7.729530334472656, 10.53169059753418, -3.4748363494873047, 3.924102783203125, 5.670646667480469, 11.065624237060547, -2.7501907348632812, -0.8412246704101562, -2.816194534301758, 3.2094802856445312, 14.282928466796875, 0.16683197021484375, 5.475919723510742, 0.44451904296875, 1.561492919921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000173.npy"}
{"epoch": 0.2615268329554044, "step": 174, "batch_size": 64, "mean": 3.5528039932250977, "std": 5.246769428253174, "min": -10.234687805175781, "p10": -2.7894542694091795, "median": 3.219013214111328, "p90": 10.380493164062502, "max": 15.690460205078125, "pos_frac": 0.75, "sample": [3.0848159790039062, 3.600189208984375, 7.70904541015625, 2.303403854370117, 4.113922119140625, -2.477386474609375, 2.5059127807617188, 6.9513702392578125, 1.6167449951171875, 4.518232345581055, 10.52056884765625, 8.791259765625, 0.7793807983398438, -0.6656265258789062, 9.589996337890625, -3.448028564453125, 5.724214553833008, 6.875585556030273, 4.12158203125, -7.141502380371094, 10.05364990234375, 6.643585205078125, -1.9158134460449219, 7.24951171875, 15.125865936279297, -1.2511768341064453, 5.27032470703125, -1.4915847778320312, 12.898727416992188, 1.2933807373046875, 1.205718994140625, 15.690460205078125, 13.19926643371582, -2.6351890563964844, 4.685482025146484, 7.664520263671875, 1.5552520751953125, 7.90478515625, -0.23866844177246094, 10.718551635742188, -2.8555679321289062, 7.8156280517578125, 3.942394256591797, -0.24043846130371094, 0.4804115295410156, 6.877166748046875, 1.3752975463867188, -3.2361526489257812, 11.995414733886719, 1.142202377319336, -10.234687805175781, 7.2483673095703125, 1.2620468139648438, 2.0615882873535156, 6.7306365966796875, -3.5856857299804688, 5.221778869628906, -1.4479293823242188, 6.402812957763672, -4.244594573974609, 1.0507869720458984, 3.168853759765625, 3.2691726684570312, 0.4796180725097656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000174.npy"}
{"epoch": 0.26303854875283444, "step": 175, "batch_size": 64, "mean": 3.8157334327697754, "std": 5.544049263000488, "min": -8.883819580078125, "p10": -2.480907440185547, "median": 2.891972541809082, "p90": 10.871784400939942, "max": 14.610687255859375, "pos_frac": 0.734375, "sample": [-3.887155532836914, 4.823635101318359, -2.5956878662109375, 1.360198974609375, 0.07016754150390625, 7.550098419189453, -0.0057811737060546875, 9.942901611328125, 4.0676422119140625, -0.8780021667480469, 0.16204071044921875, 9.890937805175781, 3.419239044189453, 5.963951110839844, -1.9270191192626953, 11.00648307800293, 4.5082550048828125, 8.466529846191406, 8.15218734741211, 1.8481979370117188, 2.5142288208007812, 3.0008487701416016, 2.255859375, 7.035240173339844, 8.101959228515625, 5.093910217285156, -0.8458671569824219, -1.3427963256835938, 13.12053108215332, -8.883819580078125, 8.902061462402344, 0.7472915649414062, 2.7368087768554688, 5.230560302734375, -2.08282470703125, 0.07379531860351562, 1.6692657470703125, 14.404518127441406, -1.3322868347167969, -4.739349365234375, 4.0469970703125, -1.643310546875, 0.9865341186523438, 2.7830963134765625, -4.6095733642578125, -2.217254638671875, 9.702407836914062, -2.5566329956054688, 9.534828186035156, 10.214736938476562, 4.657688140869141, 9.466939926147461, 1.2084197998046875, -4.107902526855469, 1.42822265625, 10.557487487792969, -2.3042144775390625, 13.451690673828125, 6.9336700439453125, 14.610687255859375, 0.41448211669921875, 14.263046264648438, 5.524986267089844, 14.261150360107422], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000175.npy"}
{"epoch": 0.26455026455026454, "step": 176, "batch_size": 64, "mean": 2.624541759490967, "std": 5.649908542633057, "min": -12.225875854492188, "p10": -3.5875656127929685, "median": 2.4376583099365234, "p90": 9.475597000122075, "max": 18.299453735351562, "pos_frac": 0.671875, "sample": [12.646530151367188, -8.929733276367188, 6.308067321777344, 6.367654800415039, 0.42317962646484375, 5.0446319580078125, -1.685791015625, 4.958305358886719, 3.6011104583740234, 1.6577987670898438, -8.991737365722656, -4.141082763671875, -3.306640625, 4.717689514160156, -3.7079620361328125, 10.056529998779297, 0.3151435852050781, 5.627422332763672, -1.5520172119140625, -0.29974365234375, -0.5066413879394531, -1.012786865234375, 2.879281997680664, 14.502254486083984, 5.074302673339844, 2.3208847045898438, -2.9164199829101562, 5.055534362792969, -2.340007781982422, 5.855007171630859, 7.668212890625, -1.0040531158447266, -0.499053955078125, 6.9301300048828125, 2.554431915283203, 12.82840347290039, 0.7781753540039062, 4.0721282958984375, 1.1921348571777344, -3.7156448364257812, 1.7316665649414062, 1.6409454345703125, 4.586610794067383, 11.445137023925781, 3.60968017578125, 6.744625091552734, -1.4931640625, -2.2794418334960938, 8.120086669921875, 13.872535705566406, -1.6256103515625, 5.600822448730469, 0.9549179077148438, 5.7124176025390625, 1.4298248291015625, 3.4613285064697266, 18.299453735351562, 0.8189888000488281, -0.8293838500976562, 7.18524169921875, 4.536827087402344, -12.225875854492188, 4.1392364501953125, -6.2918243408203125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000176.npy"}
{"epoch": 0.2660619803476946, "step": 177, "batch_size": 64, "mean": 5.241544723510742, "std": 5.100275993347168, "min": -7.996917724609375, "p10": -0.18316879272460931, "median": 4.7786865234375, "p90": 12.0098783493042, "max": 17.742950439453125, "pos_frac": 0.859375, "sample": [3.0310897827148438, 1.4804649353027344, 6.795766830444336, 16.854888916015625, 7.636512756347656, -0.11363983154296875, 2.5583572387695312, 6.799041748046875, 13.296493530273438, 5.108314514160156, 17.742950439453125, 9.450660705566406, 11.643457412719727, -1.0200920104980469, 3.2664833068847656, -3.2487125396728516, 12.166915893554688, 1.7242279052734375, -2.0398006439208984, 6.439617156982422, 6.6250762939453125, 4.4568023681640625, 2.1129798889160156, 10.506448745727539, 7.881137847900391, 1.0924224853515625, 3.1174774169921875, 9.896539688110352, -1.3960590362548828, 1.1739311218261719, 5.1005706787109375, 4.003639221191406, 12.927841186523438, 7.750823974609375, 0.48519325256347656, 8.550384521484375, 11.342367172241211, 3.6636123657226562, 3.7052001953125, 5.980373382568359, 1.3705062866210938, 3.6869773864746094, 7.7548828125, 0.687164306640625, 8.591590881347656, -3.9343643188476562, 3.8716201782226562, 0.838531494140625, 9.57828140258789, -0.2129669189453125, 8.8931884765625, 0.5719928741455078, -7.996917724609375, 16.004192352294922, 5.646217346191406, 8.447349548339844, 5.306966781616211, 12.29327392578125, 6.2888641357421875, 2.512195587158203, 3.0009536743164062, 9.79117202758789, -0.1093902587890625, 4.026824951171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000177.npy"}
{"epoch": 0.2675736961451247, "step": 178, "batch_size": 64, "mean": 3.1327052116394043, "std": 4.778141021728516, "min": -7.2646026611328125, "p10": -3.5718080520629876, "median": 2.901021957397461, "p90": 10.749153709411624, "max": 13.245704650878906, "pos_frac": 0.796875, "sample": [2.2286319732666016, 9.825942993164062, 4.436058044433594, 1.0580062866210938, 8.397838592529297, 0.5671539306640625, -1.3958206176757812, 6.547084808349609, 5.557933807373047, 3.9870147705078125, -4.085990905761719, 4.470130920410156, 0.339324951171875, -5.09507942199707, 3.4489059448242188, 2.85150146484375, 11.144815444946289, 0.31038856506347656, -1.6553421020507812, 1.222463607788086, 11.426021575927734, 0.9374523162841797, 3.3440704345703125, -5.44110107421875, 3.2500991821289062, 1.4205322265625, 2.2158584594726562, 4.50726318359375, 1.5175018310546875, 2.439403533935547, 5.2187042236328125, 0.1401538848876953, 1.7151718139648438, 12.721755981445312, 1.8948822021484375, 6.27178955078125, 3.462005615234375, 2.7384414672851562, 13.245704650878906, 1.5018043518066406, -4.6056976318359375, 7.60028076171875, 12.549530029296875, 3.41455078125, -7.2646026611328125, 8.164138793945312, -0.9008216857910156, -3.066986083984375, 11.251861572265625, 3.5256290435791016, 7.2002410888671875, 12.496536254882812, 2.950542449951172, 5.985893249511719, 1.683980941772461, -0.0515899658203125, 7.5074615478515625, 0.6569976806640625, -4.7140655517578125, -3.009002685546875, 4.0416107177734375, 5.0045166015625, 5.17181396484375, -3.7881603240966797], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000178.npy"}
{"epoch": 0.2690854119425548, "step": 179, "batch_size": 64, "mean": 3.532163143157959, "std": 5.833497047424316, "min": -12.038360595703125, "p10": -3.4846759796142575, "median": 4.034276962280273, "p90": 10.123480606079104, "max": 18.939041137695312, "pos_frac": 0.71875, "sample": [4.055206298828125, -1.80792236328125, 0.9073562622070312, 6.4278717041015625, -0.38431549072265625, 6.1334228515625, 4.579957962036133, 4.195098876953125, 13.720596313476562, 6.021076202392578, 8.344022750854492, 10.477466583251953, 7.341358184814453, -5.805759429931641, 5.805454254150391, -3.9523468017578125, -12.038360595703125, -2.147571563720703, 1.277679443359375, 4.308372497558594, -1.5882339477539062, 8.612724304199219, 0.076202392578125, 6.2852935791015625, 10.32211685180664, -5.297859191894531, -0.6803359985351562, 7.635200500488281, 1.965423583984375, 2.392414093017578, 4.013347625732422, 9.659996032714844, 1.9332275390625, -3.7318878173828125, 1.4564933776855469, -4.8115081787109375, 0.6764068603515625, 17.849533081054688, 7.8083648681640625, -1.2360115051269531, 6.200983047485352, 6.513130187988281, 6.957115173339844, 6.968669891357422, -0.42989349365234375, 9.640762329101562, 11.841972351074219, 2.1077117919921875, 12.294273376464844, 18.939041137695312, 6.9764404296875, 5.307346343994141, 9.199432373046875, 0.8249435424804688, -1.490234375, 6.155426025390625, 1.3509559631347656, 0.9351959228515625, 2.862173080444336, -1.9892959594726562, -8.707651138305664, -1.106222152709961, 6.814445495605469, -2.907848358154297], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000179.npy"}
{"epoch": 0.2705971277399849, "step": 180, "batch_size": 64, "mean": 3.2362074851989746, "std": 6.137191295623779, "min": -9.76885986328125, "p10": -4.605236816406249, "median": 2.799741744995117, "p90": 12.221791076660159, "max": 17.54206085205078, "pos_frac": 0.71875, "sample": [11.271728515625, 0.5625534057617188, 5.001796722412109, 0.9825286865234375, 2.4969940185546875, 6.647861480712891, -7.061363220214844, 3.979167938232422, 0.13419151306152344, -1.533163070678711, 11.323434829711914, 7.747406005859375, -3.848541259765625, -0.6298046112060547, 5.687103271484375, 0.9954776763916016, -4.929534912109375, 9.443923950195312, -0.4862174987792969, -8.123275756835938, 12.57037353515625, 2.9006309509277344, 4.76017951965332, 5.03973388671875, 4.951686859130859, 7.182199478149414, -0.2526283264160156, 1.697052001953125, 2.994924545288086, 3.8853073120117188, 5.504203796386719, 10.571008682250977, -6.375877380371094, 0.6947174072265625, 5.304340362548828, 4.38031005859375, 1.3458423614501953, 13.293739318847656, 0.014940261840820312, -1.3667831420898438, 2.632701873779297, 6.065319061279297, 13.143369674682617, -1.2904586791992188, -3.1571807861328125, 2.2205772399902344, 16.232322692871094, -1.44549560546875, 17.54206085205078, -0.3578166961669922, 11.408432006835938, -6.3687744140625, 2.6988525390625, -2.9694976806640625, 0.5449295043945312, 3.1256141662597656, 16.331317901611328, -9.76885986328125, 4.395511627197266, -7.570304870605469, 6.48077392578125, 3.9151458740234375, 12.692588806152344, 1.857992172241211], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000180.npy"}
{"epoch": 0.272108843537415, "step": 181, "batch_size": 64, "mean": 4.972424507141113, "std": 5.1943511962890625, "min": -4.966470718383789, "p10": -0.8822631835937498, "median": 4.629560470581055, "p90": 11.626101303100588, "max": 18.19561767578125, "pos_frac": 0.859375, "sample": [2.9054718017578125, 8.17257308959961, 0.728912353515625, -0.1334228515625, 3.4274635314941406, 3.048433303833008, 4.188240051269531, -4.6642303466796875, 2.6607513427734375, 1.2617645263671875, 8.241462707519531, 2.2886009216308594, 4.572322845458984, 0.0588226318359375, 7.830835342407227, 6.345893859863281, 7.500936508178711, 6.4959716796875, 9.726730346679688, -3.1369781494140625, 6.2896270751953125, 13.453010559082031, 8.621517181396484, 4.686798095703125, 17.04345703125, 8.121780395507812, 3.5027847290039062, 18.19561767578125, 0.14481353759765625, -4.384788513183594, 10.306526184082031, 3.157245635986328, 13.908294677734375, -4.966470718383789, 9.75518798828125, 0.7757415771484375, -0.9923954010009766, 12.40765380859375, 0.9600028991699219, 0.36409759521484375, 7.495441436767578, 2.550302505493164, 5.9520263671875, 11.040023803710938, 9.38150405883789, 0.8397331237792969, 1.7561168670654297, 4.8983001708984375, 6.127159118652344, 0.5573825836181641, 1.3313446044921875, 11.877277374267578, 16.34368896484375, 2.9376754760742188, 5.762603759765625, 11.03021240234375, 3.3462753295898438, 5.3013458251953125, 6.5371856689453125, -1.8707771301269531, 6.094520568847656, -0.6252880096435547, 8.4139404296875, -1.7138824462890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000181.npy"}
{"epoch": 0.273620559334845, "step": 182, "batch_size": 64, "mean": 5.117817401885986, "std": 6.9557414054870605, "min": -14.474906921386719, "p10": -2.2054616928100588, "median": 4.141023635864258, "p90": 13.968859481811526, "max": 18.925018310546875, "pos_frac": 0.734375, "sample": [18.925018310546875, -2.2008914947509766, -2.2074203491210938, 4.976585388183594, 10.238758087158203, 9.030563354492188, 6.253875732421875, -2.4447097778320312, -3.447479248046875, 16.653579711914062, 7.737751007080078, 3.8395309448242188, 17.500396728515625, -1.5590744018554688, 9.88037109375, 7.8851318359375, 3.272918701171875, 4.227291107177734, 14.185955047607422, 3.6626663208007812, 5.279977798461914, 0.452301025390625, 1.92864990234375, 0.2095794677734375, 6.973747253417969, -1.4944000244140625, 13.022027969360352, 6.027021408081055, 12.804534912109375, 12.860029220581055, 17.223175048828125, 3.467357635498047, 8.144134521484375, 3.2549705505371094, 13.096115112304688, -2.8744964599609375, -0.11077880859375, 3.4106597900390625, 12.6505126953125, -0.04523658752441406, 2.302032470703125, -7.956933975219727, 16.459182739257812, -2.17132568359375, 12.804008483886719, 8.043403625488281, 2.3548583984375, -0.5928878784179688, 11.099113464355469, 4.054756164550781, 6.31585693359375, 13.462303161621094, -1.6544113159179688, 13.36402702331543, 14.679496765136719, 2.208108901977539, -1.9728469848632812, 5.332826614379883, 7.737438201904297, 1.5391254425048828, 0.9320602416992188, -8.4622802734375, -0.55340576171875, -14.474906921386719], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000182.npy"}
{"epoch": 0.2751322751322751, "step": 183, "batch_size": 64, "mean": 2.133739471435547, "std": 5.6255974769592285, "min": -15.072120666503906, "p10": -4.636223602294922, "median": 1.513040542602539, "p90": 10.294499969482423, "max": 12.689735412597656, "pos_frac": 0.703125, "sample": [1.225555419921875, 10.422714233398438, -0.8018035888671875, 0.050785064697265625, 2.1033992767333984, -2.0357208251953125, -2.517852783203125, 10.416908264160156, -5.749153137207031, 0.113677978515625, -15.072120666503906, 0.6444549560546875, 5.795083999633789, -2.7774505615234375, 5.029609680175781, 10.795267105102539, -0.9112567901611328, 4.4954833984375, 8.618782043457031, 5.849157333374023, 5.8578643798828125, -3.03240966796875, 9.396087646484375, -5.52031135559082, 0.7287254333496094, -2.3812942504882812, 5.4174652099609375, 1.5195388793945312, -6.39306640625, -4.7067718505859375, 12.689735412597656, 5.6735687255859375, 1.5065422058105469, 1.4798126220703125, -5.962837219238281, 7.1831512451171875, 4.28973388671875, 11.480361938476562, -9.569282531738281, 12.109737396240234, -4.403095245361328, 0.10007476806640625, 9.717453002929688, -0.03363990783691406, 1.7919654846191406, 4.003959655761719, 7.979551315307617, 2.9658145904541016, -4.190826416015625, -0.9071979522705078, 1.1121063232421875, 5.9174957275390625, 0.04492950439453125, 1.8872604370117188, 11.291305541992188, 0.1436614990234375, 4.435661315917969, -4.471611022949219, 0.9979305267333984, 0.18859100341796875, 2.6611480712890625, 10.008880615234375, 5.114070892333984, 2.7419586181640625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000183.npy"}
{"epoch": 0.2766439909297052, "step": 184, "batch_size": 64, "mean": 4.248322486877441, "std": 5.304624080657959, "min": -4.839437484741211, "p10": -2.2558116912841797, "median": 2.7974328994750977, "p90": 12.11415252685547, "max": 16.503921508789062, "pos_frac": 0.765625, "sample": [1.8806610107421875, 8.997413635253906, 12.598697662353516, 2.3307933807373047, 11.590530395507812, 2.905609130859375, -0.5172653198242188, -0.04736328125, -0.00482940673828125, 3.4457626342773438, -0.04659271240234375, 13.156181335449219, 2.590730667114258, 2.6892566680908203, 8.308349609375, -4.1841583251953125, 11.268234252929688, 11.138626098632812, 3.278156280517578, 16.503921508789062, -2.4731903076171875, 12.865869522094727, 4.444751739501953, 15.884468078613281, -2.27569580078125, 9.763965606689453, 1.2854557037353516, 8.398513793945312, 13.04842758178711, 1.3540878295898438, 3.1878128051757812, 0.3089141845703125, 3.2596206665039062, -2.2094154357910156, -0.2459239959716797, 3.2545394897460938, 12.33856201171875, 3.8118724822998047, 1.1677932739257812, 9.409019470214844, 2.6116085052490234, 1.2864761352539062, 9.92083740234375, -0.38922882080078125, 10.012306213378906, 5.2227020263671875, 4.5411529541015625, 2.1509017944335938, 0.024507522583007812, 10.504554748535156, -2.9202423095703125, -0.591796875, 2.1093978881835938, 2.5518112182617188, 7.87353515625, 11.332534790039062, -4.839437484741211, 4.7528533935546875, 0.28568267822265625, 4.1010894775390625, -2.8216171264648438, 0.7463188171386719, -2.8294010162353516, 1.793914794921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000184.npy"}
{"epoch": 0.2781557067271353, "step": 185, "batch_size": 64, "mean": 4.402966022491455, "std": 7.5710954666137695, "min": -10.901622772216797, "p10": -4.219289016723633, "median": 4.803098678588867, "p90": 14.976423263549806, "max": 18.7646484375, "pos_frac": 0.65625, "sample": [3.6065673828125, 18.077980041503906, -2.7849197387695312, 10.568258285522461, 18.133926391601562, -10.129222869873047, 5.1240386962890625, 3.5726280212402344, -2.2574920654296875, -0.10013580322265625, 17.076438903808594, 16.72687530517578, -0.43973541259765625, -2.8432464599609375, 15.132972717285156, 3.0315017700195312, 18.7646484375, 3.5181617736816406, 11.842666625976562, 12.61964225769043, -2.5343856811523438, 15.454147338867188, 5.4871063232421875, 4.8929901123046875, 1.766530990600586, 8.619197845458984, 4.803432464599609, 8.363105773925781, 5.159454345703125, 7.1024017333984375, -4.027284622192383, -1.0960006713867188, 9.175798416137695, 14.249755859375, -10.901622772216797, 8.42523193359375, 8.590972900390625, 6.292497634887695, 5.413623809814453, -1.0291519165039062, -8.799560546875, 4.802764892578125, -2.3521804809570312, 6.841888427734375, 10.070610046386719, -8.947616577148438, 10.629724502563477, -0.6542148590087891, 13.594223022460938, -2.0149898529052734, 2.6838226318359375, -8.521902084350586, 4.42877197265625, -4.000205993652344, 0.6126708984375, 5.065778732299805, -7.009555816650391, -4.301576614379883, 7.406223297119141, -2.3675079345703125, -0.5190620422363281, 3.9733924865722656, 13.107818603515625, 14.611141204833984], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000185.npy"}
{"epoch": 0.2796674225245654, "step": 186, "batch_size": 64, "mean": 4.7698259353637695, "std": 7.654938220977783, "min": -12.245613098144531, "p10": -4.31530532836914, "median": 4.536876678466797, "p90": 14.169695854187012, "max": 20.72021484375, "pos_frac": 0.71875, "sample": [13.8575439453125, 17.585540771484375, -8.843971252441406, 3.5249176025390625, 4.7153778076171875, -0.0769195556640625, 8.5411376953125, -5.420997619628906, -4.462730407714844, 10.277572631835938, 5.8798065185546875, -3.0540199279785156, 5.210845947265625, 14.19973373413086, 19.963653564453125, 14.099607467651367, 2.7385520935058594, 7.361259460449219, 2.0666847229003906, -9.017288208007812, 1.0392265319824219, 4.968423843383789, 9.92910385131836, 19.009666442871094, -12.245613098144531, -3.9262351989746094, -2.510232925415039, -3.512960433959961, 0.3886566162109375, 2.385662078857422, 3.334552764892578, 13.644979476928711, -3.9713134765625, -0.834808349609375, -1.5008125305175781, -1.3628997802734375, 6.7189483642578125, 14.047027587890625, 13.12933349609375, 3.1953697204589844, -1.5196075439453125, -8.008255004882812, 0.4527549743652344, 2.2087783813476562, 6.7853240966796875, 3.7150115966796875, 13.878816604614258, 9.35345458984375, -5.692108154296875, 5.115776062011719, 19.536956787109375, 6.831813812255859, 4.619438171386719, 18.76654052734375, 3.333099365234375, 4.227073669433594, 20.72021484375, 9.189393997192383, 6.942985534667969, 7.346860885620117, 4.454315185546875, 7.811248779296875, -1.2728271484375, 5.399435043334961], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000186.npy"}
{"epoch": 0.2811791383219955, "step": 187, "batch_size": 64, "mean": 4.478249549865723, "std": 7.111066818237305, "min": -14.732017517089844, "p10": -3.532375717163086, "median": 5.213069915771484, "p90": 13.288368225097656, "max": 19.361007690429688, "pos_frac": 0.71875, "sample": [9.062919616699219, -1.7672271728515625, 11.457183837890625, 15.494979858398438, 19.353271484375, 11.270179748535156, 5.853626251220703, 2.3412647247314453, 5.0048980712890625, 11.422798156738281, -8.175498962402344, 6.128143310546875, -0.5033912658691406, 8.927627563476562, 8.848175048828125, 3.593984603881836, 1.6833057403564453, 13.763900756835938, 6.7650909423828125, 7.613420486450195, -9.45782470703125, -14.732017517089844, 13.334144592285156, 0.18778038024902344, 3.603240966796875, -0.8520126342773438, 0.8134422302246094, 0.71478271484375, 1.650360107421875, 10.504261016845703, 10.063030242919922, 11.627372741699219, -7.933677673339844, -1.7194061279296875, 5.621925354003906, 4.773094177246094, 2.6935272216796875, 0.3548088073730469, -1.3831253051757812, 6.990438461303711, -1.867422103881836, -3.5967979431152344, 13.181556701660156, -0.5498943328857422, 5.421241760253906, 8.3978271484375, 13.885299682617188, -1.2920608520507812, 7.2993316650390625, 11.1134033203125, 12.718212127685547, -0.8695831298828125, 7.8447113037109375, 19.361007690429688, 2.5522918701171875, -3.3820571899414062, 8.53436279296875, 13.343032836914062, 3.2113800048828125, 7.174568176269531, -0.094512939453125, -8.275493621826172, -8.487823486328125, 5.9925994873046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000187.npy"}
{"epoch": 0.28269085411942557, "step": 188, "batch_size": 64, "mean": 4.657002925872803, "std": 7.441919803619385, "min": -21.395965576171875, "p10": -2.4411958694458007, "median": 3.8400611877441406, "p90": 14.651660156250008, "max": 22.17047882080078, "pos_frac": 0.78125, "sample": [3.8066787719726562, -2.3397769927978516, 18.015972137451172, -3.4010753631591797, -1.2100753784179688, 12.210678100585938, 10.150985717773438, 4.378171920776367, 9.171630859375, 2.23297119140625, 1.6166229248046875, 17.441055297851562, 9.662761688232422, -1.8662643432617188, 0.32796478271484375, 0.6003456115722656, 7.865348815917969, 15.451011657714844, 0.32481956481933594, -3.372241973876953, 4.656269073486328, 1.689849853515625, 1.2047882080078125, -0.41693115234375, 6.9411468505859375, 0.20108795166015625, -2.484661102294922, 10.261795043945312, 2.371143341064453, 9.697795867919922, 11.040817260742188, 1.0406494140625, 6.462310791015625, 22.17047882080078, 11.614784240722656, 0.08098602294921875, 12.352424621582031, 5.203826904296875, -1.6991500854492188, 3.2998428344726562, 19.532909393310547, 1.7461395263671875, 5.182043075561523, 2.9177207946777344, 0.44107818603515625, -0.7791900634765625, 5.335979461669922, 12.241584777832031, 3.1240463256835938, 0.6075897216796875, 16.917221069335938, -10.586395263671875, -1.8412246704101562, 6.52471923828125, 3.873443603515625, 4.130596160888672, 4.921497344970703, -4.819147109985352, 12.786506652832031, 16.856510162353516, 12.638574600219727, 4.313972473144531, -3.3788700103759766, -21.395965576171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000188.npy"}
{"epoch": 0.2842025699168556, "step": 189, "batch_size": 64, "mean": 4.635939598083496, "std": 7.032341480255127, "min": -11.71963119506836, "p10": -4.830697631835937, "median": 4.866243362426758, "p90": 14.517121124267579, "max": 18.10747528076172, "pos_frac": 0.828125, "sample": [1.9283103942871094, 5.16064453125, 6.380939483642578, 5.032707214355469, 5.155364990234375, 14.474166870117188, 12.633346557617188, 9.858037948608398, 0.7160587310791016, 2.7194976806640625, -9.094223022460938, 14.757793426513672, -8.556495666503906, 1.5111312866210938, 0.163330078125, -1.292531967163086, 14.535530090332031, 8.269367218017578, -9.29605484008789, 5.3744964599609375, 3.3389205932617188, 13.761253356933594, 4.841403961181641, 8.01971435546875, 0.02455902099609375, 3.193449020385742, 7.35211181640625, 4.983198165893555, 3.490427017211914, -0.9625015258789062, 12.676040649414062, -2.9629058837890625, 3.6506614685058594, 17.41053009033203, 15.0439453125, 4.891082763671875, 0.09875869750976562, 2.119781494140625, 1.5847244262695312, 9.29507064819336, 12.745067596435547, 4.989105224609375, 5.754133224487305, 3.460216522216797, 9.631813049316406, 7.289638519287109, -4.517173767089844, -10.584268569946289, 3.427978515625, 1.9892501831054688, 0.497283935546875, 15.280588150024414, -6.539703369140625, 18.10747528076172, 10.87115478515625, 4.574485778808594, -4.965065002441406, -11.71963119506836, 6.512712478637695, 1.8450927734375, 5.656940460205078, 0.46456146240234375, 14.049003601074219, 15.597803115844727], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000189.npy"}
{"epoch": 0.2857142857142857, "step": 190, "batch_size": 64, "mean": 6.251853942871094, "std": 7.374175071716309, "min": -9.169157028198242, "p10": -2.394378662109375, "median": 5.871931076049805, "p90": 15.618896484375, "max": 23.738525390625, "pos_frac": 0.8125, "sample": [5.516246795654297, 11.886260986328125, 21.844024658203125, -5.54119873046875, 9.079658508300781, 7.5760040283203125, 0.7960205078125, 1.304046630859375, -9.169157028198242, 18.799713134765625, 6.104717254638672, 11.896926879882812, 6.456964492797852, 0.6924819946289062, 2.773468017578125, 1.5138053894042969, -0.3365745544433594, 15.571258544921875, 2.500429153442383, 7.006093978881836, 13.254718780517578, 14.48501205444336, 0.798004150390625, 0.4218330383300781, -3.946918487548828, 4.6321868896484375, -9.022125244140625, 4.092456817626953, 4.544780731201172, 6.5203094482421875, 9.019676208496094, -1.6500320434570312, 5.6391448974609375, 7.2679443359375, 10.534820556640625, 0.0014324188232421875, 15.639312744140625, -2.1611061096191406, -3.3169898986816406, 6.854145050048828, 0.28424072265625, 14.175262451171875, 8.637222290039062, 23.738525390625, 5.225130081176758, 1.0584716796875, -5.28858757019043, 11.66094970703125, -2.39666748046875, -0.8108978271484375, 5.414613723754883, 7.684547424316406, 1.800222396850586, 10.452499389648438, 11.484039306640625, 14.427841186523438, 20.925765991210938, 4.8115692138671875, 12.896871566772461, 15.734718322753906, -2.3890380859375, 15.99114990234375, 11.571990966796875, 13.14840316772461], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000190.npy"}
{"epoch": 0.2872260015117158, "step": 191, "batch_size": 64, "mean": 4.756298542022705, "std": 7.729222297668457, "min": -18.100540161132812, "p10": -3.4661668777465815, "median": 4.016722679138184, "p90": 14.093866348266603, "max": 22.34259796142578, "pos_frac": 0.828125, "sample": [12.462051391601562, -2.744680404663086, 2.4517364501953125, 9.485824584960938, 20.582183837890625, 0.5686874389648438, 0.012847900390625, 17.122894287109375, 4.491373062133789, 2.4435882568359375, 2.6891632080078125, 22.34259796142578, 2.1730880737304688, 0.8853569030761719, 7.277580261230469, 3.5026607513427734, -13.021049499511719, 1.4294967651367188, 14.96038818359375, 4.998540878295898, 8.224700927734375, 11.289947509765625, 10.977245330810547, 10.206298828125, -7.22515869140625, 10.801979064941406, 5.541717529296875, 6.8733978271484375, -3.7753753662109375, 0.6334724426269531, 8.140268325805664, 0.1583404541015625, -1.064870834350586, 1.8011436462402344, 18.587112426757812, -18.100540161132812, 0.15753555297851562, -5.8283843994140625, 9.539104461669922, 10.479446411132812, 7.590625762939453, 9.220401763916016, 3.1015777587890625, 9.210807800292969, 8.68471908569336, 2.7256107330322266, 13.20648193359375, 13.653789520263672, -1.8627510070800781, 0.730438232421875, -2.711080551147461, 4.560352325439453, 14.282470703125, 3.542072296142578, -7.336284637451172, 9.0562744140625, 10.948905944824219, -12.683135986328125, 6.892293930053711, 1.02606201171875, 0.839813232421875, 14.470779418945312, 1.6479110717773438, 2.0732574462890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000191.npy"}
{"epoch": 0.2887377173091459, "step": 192, "batch_size": 64, "mean": 3.473459243774414, "std": 6.7660064697265625, "min": -18.26007843017578, "p10": -3.8928358078002927, "median": 2.9152183532714844, "p90": 12.508498191833501, "max": 20.84257698059082, "pos_frac": 0.6875, "sample": [-18.26007843017578, -6.460987091064453, -0.17994308471679688, 6.451358795166016, 8.436527252197266, 1.0244808197021484, -4.01014518737793, 9.265213012695312, 13.812461853027344, -1.7352828979492188, -4.889583587646484, -3.42645263671875, -2.4269638061523438, 0.2696037292480469, 1.870635986328125, 7.3039093017578125, 13.083023071289062, 3.5217742919921875, 0.4871253967285156, 1.576416015625, 13.167596817016602, 16.80193328857422, -1.1961250305175781, -1.4893646240234375, 10.614593505859375, -0.38479042053222656, 0.5775146484375, 7.296123504638672, 2.82879638671875, 2.733570098876953, 8.451766967773438, 8.107223510742188, 4.85576057434082, 11.167940139770508, -1.4902515411376953, -0.8841972351074219, 4.3096160888671875, 0.3427772521972656, -1.9720611572265625, 3.0016403198242188, 4.163991928100586, 1.0975608825683594, -10.555850982666016, 20.84257698059082, 9.586654663085938, 6.22291374206543, -3.6191139221191406, 3.471057891845703, 4.123725891113281, -5.544120788574219, 5.51568603515625, 15.264991760253906, 0.865753173828125, 7.4900054931640625, -1.1472301483154297, 6.983421325683594, 10.30856704711914, -1.060943603515625, 11.037773132324219, 14.338859558105469, 6.536535263061523, -4.216762542724609, 2.39642333984375, 5.645748138427734], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000192.npy"}
{"epoch": 0.29024943310657597, "step": 193, "batch_size": 64, "mean": 5.939870357513428, "std": 7.3793158531188965, "min": -15.252155303955078, "p10": -1.715423583984375, "median": 5.065986633300781, "p90": 15.345264434814453, "max": 22.796127319335938, "pos_frac": 0.796875, "sample": [1.7965240478515625, 3.1937923431396484, 8.225028991699219, 3.837230682373047, 4.772260665893555, -2.7407989501953125, -0.9713287353515625, 15.172096252441406, -0.39398193359375, 0.3589954376220703, 14.709430694580078, 1.4206352233886719, 12.323646545410156, 5.5996856689453125, 15.419479370117188, -0.35260772705078125, 5.116752624511719, 1.9365921020507812, 8.884223937988281, 10.7078857421875, -3.2049636840820312, 18.258440017700195, 22.796127319335938, -3.383106231689453, 7.564460754394531, 14.03729248046875, 1.7469406127929688, 7.757240295410156, -1.6097335815429688, 6.849849700927734, 16.145023345947266, 5.016357421875, 1.308929443359375, 10.259376525878906, 7.660896301269531, 1.0306167602539062, -1.7607192993164062, -0.7294273376464844, 9.146873474121094, 5.522041320800781, -3.6421661376953125, 2.385967254638672, 14.486522674560547, 19.27526092529297, -2.6484012603759766, 1.972900390625, 6.145200729370117, -0.7261581420898438, 1.227081298828125, 14.991539001464844, 10.405967712402344, 0.2599468231201172, 21.048355102539062, 20.708261489868164, 14.997417449951172, 5.1156158447265625, 9.295944213867188, -15.252155303955078, 7.538398742675781, 14.895675659179688, 1.70220947265625, 0.8367595672607422, 1.397369384765625, 0.3061332702636719], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000193.npy"}
{"epoch": 0.29176114890400606, "step": 194, "batch_size": 64, "mean": 4.901164531707764, "std": 8.879342079162598, "min": -16.7884521484375, "p10": -6.8056547164916985, "median": 5.6189374923706055, "p90": 15.599432373046875, "max": 21.137144088745117, "pos_frac": 0.75, "sample": [19.900222778320312, 4.226982116699219, 10.822105407714844, -7.267877578735352, 8.9232177734375, 1.9755744934082031, -10.406770706176758, 2.0570831298828125, 1.5523223876953125, 8.354995727539062, 4.7569122314453125, 7.608489990234375, 3.8697052001953125, 12.006942749023438, -13.22796630859375, -3.200958251953125, 10.305931091308594, -5.330970764160156, 4.470550537109375, -2.6320419311523438, 7.6068115234375, 2.9537582397460938, 11.800102233886719, 13.470840454101562, -3.6359481811523438, 7.741539001464844, -1.9173126220703125, 12.1768798828125, 0.28399658203125, 9.498680114746094, 1.668731689453125, 13.703824996948242, -8.550254821777344, -5.727134704589844, -14.5751953125, -1.516244888305664, 20.6258487701416, 8.755645751953125, 11.585247039794922, 16.2921085357666, -14.569652557373047, 6.754005432128906, 6.480962753295898, 2.297527313232422, 14.85430908203125, 21.137144088745117, 10.327535629272461, -16.7884521484375, 18.216773986816406, 12.459686279296875, 10.268590927124023, -1.80743408203125, 7.02081298828125, 17.000240325927734, 15.467498779296875, 15.655975341796875, 9.658496856689453, 0.19922256469726562, 0.15487289428710938, 11.552047729492188, 3.1549015045166016, 1.9537315368652344, -1.418375015258789, 2.6377391815185547], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000194.npy"}
{"epoch": 0.29327286470143615, "step": 195, "batch_size": 64, "mean": 6.003086090087891, "std": 8.10669231414795, "min": -11.125274658203125, "p10": -3.793479156494139, "median": 4.757865905761719, "p90": 18.188327598571778, "max": 22.648635864257812, "pos_frac": 0.734375, "sample": [3.01641845703125, 5.45030403137207, -2.0471954345703125, 5.298736572265625, 21.57788848876953, 9.367401123046875, -0.09965133666992188, 4.927070617675781, 6.6094207763671875, 20.508773803710938, 10.497467041015625, 11.164472579956055, -6.067314147949219, -4.3712615966796875, -7.135467529296875, -11.125274658203125, -1.292196273803711, 7.747402191162109, 4.057764053344727, 13.726524353027344, -0.909698486328125, 4.588661193847656, 9.614795684814453, 10.007644653320312, 4.069236755371094, 3.9056625366210938, 18.31583595275879, 19.09954833984375, 2.2571640014648438, 9.004783630371094, 4.372611999511719, -0.3300323486328125, 21.658287048339844, 7.677909851074219, 17.89080810546875, -1.41070556640625, -1.0012264251708984, 15.740806579589844, 11.393234252929688, 15.150337219238281, 3.9790000915527344, -2.4453201293945312, -0.5616073608398438, -0.1683349609375, 1.6883678436279297, 2.660726547241211, 0.942108154296875, 16.607864379882812, -9.951705932617188, 0.2967357635498047, -5.109081268310547, 0.9762115478515625, 22.648635864257812, 13.712043762207031, 11.159072875976562, 10.674468994140625, 10.536849975585938, 13.074111938476562, -7.08514404296875, 18.597427368164062, 5.1060791015625, 2.8063278198242188, 3.495208740234375, 7.650503158569336], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000195.npy"}
{"epoch": 0.2947845804988662, "step": 196, "batch_size": 64, "mean": 5.255779266357422, "std": 7.820141315460205, "min": -23.045513153076172, "p10": -3.168609619140625, "median": 4.5742340087890625, "p90": 16.480239105224612, "max": 21.334640502929688, "pos_frac": 0.78125, "sample": [2.2693328857421875, -23.045513153076172, -2.532949447631836, 3.001251220703125, 4.120038986206055, -5.340847015380859, 0.11101531982421875, 5.239501953125, 4.607025146484375, -3.7432098388671875, 1.5641021728515625, 17.225936889648438, 14.202934265136719, 7.759181976318359, 7.8677978515625, 3.1450653076171875, -9.427684783935547, 16.75749969482422, -3.254302978515625, 17.376983642578125, -1.4122066497802734, 4.379508972167969, 18.487804412841797, 1.1141395568847656, -1.9889259338378906, 4.320777893066406, 5.68548583984375, 6.141838073730469, 6.0538177490234375, 21.334640502929688, 3.7290496826171875, 11.907169342041016, 1.3269195556640625, -2.968658447265625, -1.6148147583007812, 9.407157897949219, 1.4124469757080078, 7.813142776489258, 5.4263763427734375, 16.788475036621094, 8.159408569335938, 6.6973419189453125, 11.753128051757812, 3.245738983154297, 12.421916961669922, -5.932304382324219, 0.9780330657958984, 4.54144287109375, 6.0603790283203125, 15.833297729492188, 13.888954162597656, 12.82609748840332, -1.3862838745117188, -1.280029296875, 7.72077751159668, 13.141632080078125, 15.728300094604492, 9.501174926757812, 2.231037139892578, 1.3775482177734375, 3.338054656982422, -5.312984466552734, 7.175437927246094, 18.414459228515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000196.npy"}
{"epoch": 0.2962962962962963, "step": 197, "batch_size": 64, "mean": 6.4464335441589355, "std": 10.398252487182617, "min": -19.352428436279297, "p10": -8.661231613159178, "median": 4.952279090881348, "p90": 18.260497665405275, "max": 24.523178100585938, "pos_frac": 0.765625, "sample": [-2.015666961669922, 24.177230834960938, 4.906955718994141, 1.9674224853515625, 4.184722900390625, 8.582420349121094, -10.239036560058594, -16.354400634765625, 2.69488525390625, -2.3398513793945312, 17.372711181640625, 17.826086044311523, -11.064628601074219, -9.635486602783203, 4.924013137817383, 16.65955352783203, 8.746810913085938, 2.9255104064941406, 24.523178100585938, 20.091476440429688, 2.6887969970703125, 6.390403747558594, 15.82351303100586, 9.79780387878418, 10.42523193359375, 1.99407958984375, 4.759223937988281, -6.387969970703125, 15.945253372192383, 11.396575927734375, 16.55828094482422, 16.639801025390625, -13.680221557617188, 16.396902084350586, -4.74700927734375, -19.352428436279297, 4.759937286376953, 4.9805450439453125, 17.96875, 10.061714172363281, -0.5417976379394531, 17.38719940185547, 19.12720489501953, 0.0868988037109375, -2.06341552734375, 21.540512084960938, 1.250335693359375, 0.40216827392578125, 4.347373962402344, 17.071624755859375, -4.88752555847168, -1.7509613037109375, 12.999580383300781, 14.650579452514648, 3.4984960556030273, -11.845367431640625, 23.15996551513672, 12.535934448242188, 0.8468475341796875, 10.022258758544922, 3.5862503051757812, 14.447799682617188, 7.961158752441406, 18.38553237915039], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000197.npy"}
{"epoch": 0.29780801209372637, "step": 198, "batch_size": 64, "mean": 7.0605268478393555, "std": 7.862888336181641, "min": -11.444778442382812, "p10": -3.221794700622558, "median": 7.361915588378906, "p90": 16.233576393127443, "max": 26.850482940673828, "pos_frac": 0.828125, "sample": [-2.2871551513671875, 14.276374816894531, 0.3506031036376953, 7.0551300048828125, 5.2766571044921875, 10.639694213867188, -3.622354507446289, 5.973365783691406, 3.4433422088623047, 6.271728515625, 13.199310302734375, 5.58050537109375, 0.09744644165039062, -7.110715866088867, 26.579851150512695, 19.323413848876953, 8.627609252929688, 3.43939208984375, 8.160293579101562, -0.3748207092285156, 10.442413330078125, 10.869718551635742, 16.329191207885742, -6.582788467407227, 19.870574951171875, -9.248519897460938, -4.107231140136719, 0.15976905822753906, 16.010475158691406, 11.299722671508789, 12.645553588867188, 12.205520629882812, 6.2644195556640625, -11.444778442382812, 6.2385711669921875, -0.31023406982421875, 0.8616981506347656, 2.0854415893554688, -5.023841857910156, 5.631355285644531, 7.668701171875, 12.197349548339844, 15.91695785522461, 10.039810180664062, 11.900983810424805, 7.9054107666015625, 0.18834877014160156, 20.763315200805664, 26.850482940673828, 8.120643615722656, 4.925514221191406, 2.2042999267578125, 10.409004211425781, 15.662050247192383, 5.5956268310546875, 8.152229309082031, 8.737945556640625, 7.935375213623047, -1.2869377136230469, 6.501712799072266, 6.823646545410156, 8.49957275390625, 17.212677001953125, 9.852298736572266], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000198.npy"}
{"epoch": 0.29931972789115646, "step": 199, "batch_size": 64, "mean": 5.297488212585449, "std": 10.330986022949219, "min": -18.685646057128906, "p10": -9.141678619384765, "median": 7.359708786010742, "p90": 16.606206512451173, "max": 25.84933090209961, "pos_frac": 0.71875, "sample": [12.797870635986328, 11.91126823425293, 3.2027969360351562, 6.629375457763672, -6.540227890014648, -12.428802490234375, 5.141288757324219, 16.876026153564453, 9.9390869140625, 8.20501708984375, 11.48455810546875, -4.073291778564453, 25.84933090209961, 13.748046875, 23.942001342773438, 17.149688720703125, 14.595630645751953, 12.796638488769531, 11.203781127929688, 15.021125793457031, -3.2590789794921875, -12.589488983154297, 12.336994171142578, 15.603630065917969, -2.3057422637939453, 1.0331573486328125, 6.859066009521484, -4.1841278076171875, 12.11684799194336, 11.317474365234375, 8.451889038085938, 1.8803749084472656, 15.34225082397461, -3.0473480224609375, -0.8169031143188477, 1.7657623291015625, 1.82568359375, -9.367843627929688, -9.702552795410156, 9.504776000976562, -18.645687103271484, 8.045013427734375, 2.6288719177246094, 15.034629821777344, -14.333473205566406, 7.8603515625, 9.003572463989258, 4.207370758056641, -3.3533668518066406, 1.83172607421875, 0.2981071472167969, -18.685646057128906, -6.2613983154296875, 15.976627349853516, -8.613960266113281, 0.1105499267578125, 16.998214721679688, -3.9958648681640625, 15.906951904296875, 10.719409942626953, 22.64061737060547, 0.5022048950195312, 21.525535583496094, 9.422855377197266], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000199.npy"}
{"epoch": 0.30083144368858655, "step": 200, "batch_size": 64, "mean": 7.785175800323486, "std": 10.806940078735352, "min": -23.911361694335938, "p10": -4.658149337768554, "median": 7.139091491699219, "p90": 24.115522766113287, "max": 26.945724487304688, "pos_frac": 0.78125, "sample": [4.132102966308594, 16.76681900024414, 25.810882568359375, 3.7439117431640625, 7.7714691162109375, 1.7642364501953125, 21.298049926757812, 3.8229904174804688, 7.175682067871094, -3.42010498046875, 2.608489990234375, 12.7901611328125, -6.741748809814453, 26.407546997070312, 1.6006240844726562, -1.710012435913086, 5.0796051025390625, -1.1856651306152344, 22.48699951171875, -8.363265991210938, 0.476409912109375, 0.8519134521484375, 11.128494262695312, 16.967899322509766, 1.4296646118164062, 13.042972564697266, 15.884275436401367, -5.425331115722656, 13.703205108642578, -0.3972930908203125, 19.932418823242188, 14.045036315917969, 7.102500915527344, 24.813461303710938, 24.819061279296875, -2.907533645629883, -23.911361694335938, 12.658195495605469, 26.945724487304688, 19.880226135253906, 18.230552673339844, 9.812164306640625, 26.705841064453125, 6.4955902099609375, 8.08087158203125, -15.530914306640625, 1.0895004272460938, 4.602020263671875, 9.48982048034668, 25.62531280517578, 1.6865921020507812, 8.99057388305664, 6.685041427612305, -4.45660400390625, 7.1897430419921875, 12.728363037109375, -4.744525909423828, -3.082622528076172, 8.62271499633789, 4.749164581298828, 0.4802265167236328, -5.144392967224121, 17.655044555664062, 19.412429809570312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000200.npy"}
{"epoch": 0.30234315948601664, "step": 201, "batch_size": 64, "mean": 6.5143938064575195, "std": 10.071527481079102, "min": -15.658920288085938, "p10": -7.439478302001953, "median": 5.23114013671875, "p90": 21.148456573486335, "max": 28.52845001220703, "pos_frac": 0.734375, "sample": [3.09918212890625, 10.332275390625, 7.1041259765625, 5.114288330078125, -0.21698951721191406, 8.246376037597656, 16.328968048095703, 3.423776626586914, 8.285568237304688, -0.31522369384765625, 1.2039260864257812, 13.137918472290039, 15.925912857055664, 13.527236938476562, 0.8396244049072266, -7.15728759765625, 28.52845001220703, 19.082054138183594, -0.028533935546875, 25.25457000732422, -7.560417175292969, 2.9025497436523438, 7.301704406738281, -8.47580337524414, 2.7560863494873047, 22.044235229492188, 10.922752380371094, 10.524768829345703, -8.651437759399414, 18.665313720703125, 1.8607444763183594, 22.0340576171875, 2.6380062103271484, -3.174297332763672, 13.986438751220703, 14.037181854248047, -1.6899642944335938, -8.631912231445312, 0.313201904296875, 5.347991943359375, 16.224884033203125, 8.024372100830078, -2.890932083129883, 15.175373077392578, -3.860177993774414, 16.550399780273438, 6.812740325927734, -10.577911376953125, 4.905878067016602, 22.804428100585938, -2.1388301849365234, 5.108306884765625, 7.025516510009766, 14.724292755126953, 9.723146438598633, 25.801406860351562, 14.626087188720703, -3.9649276733398438, 0.984893798828125, -9.485427856445312, 24.31679916381836, 1.9271392822265625, 1.8952178955078125, -15.658920288085938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000201.npy"}
{"epoch": 0.30385487528344673, "step": 202, "batch_size": 64, "mean": 7.564221382141113, "std": 10.052115440368652, "min": -17.6904296875, "p10": -4.476778411865234, "median": 8.619969367980957, "p90": 18.6509147644043, "max": 30.150482177734375, "pos_frac": 0.78125, "sample": [-4.091114044189453, 5.385101318359375, 28.825206756591797, 30.150482177734375, 18.59857177734375, -2.4813385009765625, 12.029624938964844, 14.800506591796875, 14.378288269042969, 12.357574462890625, 2.3015613555908203, 13.29837417602539, 2.373027801513672, 5.501213073730469, -2.8243484497070312, 13.525108337402344, 19.369964599609375, 0.8359832763671875, 12.517005920410156, -4.642063140869141, 7.125640869140625, 7.383472442626953, 13.684555053710938, 4.5796051025390625, -5.621768951416016, 15.570266723632812, 4.7477569580078125, 28.84494400024414, -0.25037384033203125, 14.703874588012695, -1.6141891479492188, 9.240522384643555, 9.323509216308594, 10.575103759765625, 2.6291656494140625, -17.6904296875, 13.856521606445312, 9.766695022583008, 12.515037536621094, -14.039505004882812, 8.919754028320312, 7.9886322021484375, 9.905014038085938, 6.669929504394531, 18.055328369140625, 7.580619812011719, 16.29486083984375, 3.563814163208008, 19.606597900390625, -1.9725112915039062, 0.8905029296875, -14.595584869384766, -0.7781848907470703, -5.748310089111328, 3.5381317138671875, 11.688369750976562, 19.389053344726562, 1.2135505676269531, 10.438953399658203, 14.235055923461914, -15.318641662597656, 18.01258087158203, 8.320184707641602, 18.67334747314453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000202.npy"}
{"epoch": 0.30536659108087677, "step": 203, "batch_size": 64, "mean": 6.927526473999023, "std": 10.273737907409668, "min": -18.015335083007812, "p10": -4.7321434020996085, "median": 5.466495513916016, "p90": 20.386658477783204, "max": 31.678237915039062, "pos_frac": 0.75, "sample": [6.283149719238281, 1.2237091064453125, -8.844955444335938, -4.024169921875, 3.281494140625, 8.649784088134766, 1.2972335815429688, 9.778656005859375, 3.6806869506835938, -2.9360885620117188, 16.90582275390625, 7.605842590332031, 7.091455459594727, 26.773040771484375, 1.8725814819335938, 31.678237915039062, 20.412918090820312, 1.6543655395507812, 17.554550170898438, 8.329193115234375, -18.015335083007812, -0.8628768920898438, 11.520584106445312, -2.3275108337402344, -1.0937328338623047, -5.7620391845703125, 4.5248565673828125, 26.955780029296875, 2.4623336791992188, 12.658653259277344, 8.47833251953125, 4.822563171386719, 20.32538604736328, 5.105886459350586, 12.15631103515625, 1.3786773681640625, 20.73023223876953, 12.68332290649414, 22.607803344726562, 6.132955551147461, 3.0700607299804688, -2.048126220703125, 3.7388534545898438, 8.529571533203125, 16.29861068725586, -0.9320945739746094, 19.3746337890625, 18.42444610595703, -5.035560607910156, 30.090896606445312, 10.004493713378906, 2.0890884399414062, 0.75067138671875, 5.4380035400390625, 6.7985076904296875, -10.657012939453125, -0.3756256103515625, 5.494987487792969, 19.8046875, 15.857301712036133, 12.946060180664062, -11.439682006835938, -7.579010009765625, -0.031742095947265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000203.npy"}
{"epoch": 0.30687830687830686, "step": 204, "batch_size": 64, "mean": 7.061635494232178, "std": 10.766473770141602, "min": -23.08606719970703, "p10": -3.811529731750488, "median": 6.531612396240234, "p90": 24.378340148925787, "max": 30.66246795654297, "pos_frac": 0.75, "sample": [27.1280517578125, 19.673683166503906, -7.139093399047852, 6.795564651489258, 6.45379638671875, 30.66246795654297, 2.803314208984375, 9.885751724243164, -6.348297119140625, 5.035369873046875, 24.89862060546875, -13.464698791503906, -3.9061031341552734, 7.0173492431640625, 6.609428405761719, 8.765586853027344, 12.36273193359375, 2.3382949829101562, -12.6944580078125, -0.08074188232421875, 13.00499153137207, 5.111656188964844, 11.110687255859375, 19.859588623046875, -1.1047782897949219, -3.097625732421875, 9.269287109375, 8.210342407226562, 2.3205394744873047, 5.8336029052734375, 25.12898826599121, -3.5908584594726562, 6.317365646362305, 12.707786560058594, 12.581186294555664, -2.058626174926758, 11.012630462646484, 27.427989959716797, -3.5598011016845703, 5.73907470703125, 3.830150604248047, 2.538036346435547, 17.05231475830078, 20.274356842041016, -8.286964416503906, 5.451534271240234, 10.433578491210938, -2.0672473907470703, 15.947479248046875, 0.15006637573242188, 23.164352416992188, 25.385650634765625, 0.8048496246337891, 13.065811157226562, 6.779533386230469, 25.829486846923828, 10.467567443847656, 0.6404838562011719, 0.5117988586425781, 10.833816528320312, -23.08606719970703, -3.131765365600586, 7.767152786254883, -1.43194580078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000204.npy"}
{"epoch": 0.30839002267573695, "step": 205, "batch_size": 64, "mean": 7.253925800323486, "std": 11.939494132995605, "min": -22.89361572265625, "p10": -7.157149124145508, "median": 7.710343360900879, "p90": 22.259577178955077, "max": 35.167381286621094, "pos_frac": 0.734375, "sample": [33.167449951171875, 2.7668323516845703, -0.5744342803955078, 5.8425750732421875, 8.452966690063477, 31.719951629638672, 7.62736701965332, 12.795066833496094, -22.89361572265625, 20.877410888671875, 3.354665756225586, 3.7084999084472656, 35.167381286621094, 8.93830680847168, 10.501298904418945, 1.3436660766601562, 4.631980895996094, -7.231536865234375, -3.273040771484375, 10.723190307617188, 4.525043487548828, 13.314804077148438, 1.5607051849365234, 7.7933197021484375, 15.492813110351562, -1.6942291259765625, -8.361198425292969, -0.903594970703125, 2.0034561157226562, 26.147621154785156, -1.9197998046875, -5.0858612060546875, 11.897186279296875, 9.023712158203125, -2.1259994506835938, 11.93408203125, 11.358213424682617, 8.500732421875, -1.8036346435546875, 18.238616943359375, -2.45074462890625, 19.443138122558594, 15.073165893554688, 12.469802856445312, 27.56940460205078, 22.4659423828125, 0.312103271484375, 6.3568115234375, 21.22557830810547, -8.393585205078125, 7.338951110839844, 8.217193603515625, 14.523277282714844, 11.595718383789062, -16.37451171875, 22.28740692138672, 15.168937683105469, -7.683418273925781, 22.19464111328125, -6.983577728271484, 2.689483642578125, -21.71338653564453, 1.2943687438964844, 10.082592010498047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000205.npy"}
{"epoch": 0.30990173847316704, "step": 206, "batch_size": 64, "mean": 5.3691887855529785, "std": 12.401838302612305, "min": -20.982742309570312, "p10": -10.736388778686521, "median": 3.8622779846191406, "p90": 22.37092742919922, "max": 28.827354431152344, "pos_frac": 0.65625, "sample": [-5.87959098815918, -2.745849609375, 4.502540588378906, -17.907508850097656, -8.570873260498047, -1.6208267211914062, 4.018379211425781, 8.463138580322266, 14.322029113769531, 8.887985229492188, 2.7394771575927734, 27.508750915527344, 10.488468170166016, -2.4543533325195312, -6.878929138183594, 13.77545166015625, -2.582550048828125, 22.593482971191406, -15.200714111328125, 5.197439193725586, 28.827354431152344, 10.529499053955078, 3.055938720703125, 3.186676025390625, -4.904966354370117, 21.85163116455078, 6.895967483520508, -16.683141708374023, 12.794622421264648, 18.249710083007812, 17.51458740234375, 26.1145076751709, 1.19915771484375, 7.444402694702148, 21.803466796875, -1.954620361328125, -20.982742309570312, -11.664466857910156, -4.185548782348633, 1.5899658203125, 12.859527587890625, 0.8522605895996094, 15.922576904296875, -5.984519958496094, -1.8054428100585938, -13.209033966064453, 24.597816467285156, 15.755615234375, 2.7253265380859375, 3.7061767578125, 16.626815795898438, 19.48065948486328, 26.257484436035156, -6.662162780761719, 13.284263610839844, 20.83563995361328, 3.678497314453125, -1.1540851593017578, 4.321460723876953, 24.940994262695312, -17.90891456604004, -2.9362030029296875, 0.02693939208984375, 8.078437805175781], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000206.npy"}
{"epoch": 0.31141345427059713, "step": 207, "batch_size": 64, "mean": 7.982369899749756, "std": 11.401705741882324, "min": -14.030433654785156, "p10": -4.385458374023437, "median": 5.750144958496094, "p90": 25.29790229797364, "max": 34.42835235595703, "pos_frac": 0.734375, "sample": [34.42835235595703, -0.6814727783203125, -2.512022018432617, 1.4627151489257812, 10.619422912597656, 9.394149780273438, 2.244293212890625, 3.7098236083984375, 1.5672836303710938, 23.420654296875, 7.719974517822266, 3.0661773681640625, -1.322671890258789, 26.06793975830078, -1.7630252838134766, -4.516777038574219, 6.8484954833984375, 14.985687255859375, 4.3550872802734375, 19.256046295166016, 23.501148223876953, 6.8715972900390625, 13.781314849853516, 5.57769775390625, -6.509405136108398, -6.117713928222656, -3.824575424194336, 28.297821044921875, -4.079048156738281, 5.9225921630859375, 23.360244750976562, 3.331829071044922, 30.761474609375, -1.3252792358398438, 4.8534698486328125, -2.8671875, -8.807332992553711, 12.882766723632812, 7.29656982421875, 1.5907783508300781, 14.399932861328125, 19.68133544921875, 3.8919296264648438, 10.296039581298828, 0.1189727783203125, 14.499618530273438, 10.986190795898438, -11.82379150390625, 19.61345100402832, 11.796401977539062, -14.030433654785156, 15.114507675170898, -0.9948577880859375, 10.849891662597656, 3.0810699462890625, 32.03697967529297, 28.806907653808594, 3.5396652221679688, 10.25439453125, 10.847442626953125, -8.90264892578125, -1.3360595703125, 4.705802917480469, 30.590045928955078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000207.npy"}
{"epoch": 0.3129251700680272, "step": 208, "batch_size": 64, "mean": 4.607153415679932, "std": 11.03698444366455, "min": -22.856983184814453, "p10": -8.566188049316406, "median": 3.9517736434936523, "p90": 19.415039062500004, "max": 26.04901885986328, "pos_frac": 0.625, "sample": [9.05763053894043, 0.7671070098876953, 18.342449188232422, -5.0726470947265625, 22.549606323242188, 19.69196319580078, 1.3564128875732422, -2.65179443359375, -9.329376220703125, -17.687789916992188, -1.7008056640625, -9.0775146484375, 18.768882751464844, 10.939674377441406, 12.796314239501953, 3.4865894317626953, 7.347076416015625, 9.365142822265625, -3.487039566040039, -7.1382904052734375, 15.222000122070312, -7.3089752197265625, 1.2767372131347656, -1.855255126953125, 9.779773712158203, 24.455848693847656, 16.222496032714844, -4.127660751342773, 7.1330413818359375, 13.92475700378418, -4.488821029663086, -1.0430831909179688, -4.31707763671875, 3.4065399169921875, -16.25524139404297, 13.340606689453125, 19.933483123779297, 9.842796325683594, 4.6751708984375, 1.0499458312988281, 26.04901885986328, 7.5895538330078125, 5.373130798339844, -2.3477249145507812, 18.325241088867188, 0.6141357421875, 23.262222290039062, -2.467113494873047, 13.335531234741211, 4.416957855224609, -10.469291687011719, 16.209762573242188, -0.7036361694335938, 13.164283752441406, -7.3730926513671875, 9.723918914794922, 7.9257354736328125, -5.5011444091796875, -13.490320205688477, -1.3671798706054688, -22.856983184814453, 3.452220916748047, 11.330955505371094, 21.47095489501953], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000208.npy"}
{"epoch": 0.3144368858654573, "step": 209, "batch_size": 64, "mean": 9.348740577697754, "std": 13.418309211730957, "min": -28.653526306152344, "p10": -5.244469451904297, "median": 7.427080154418945, "p90": 26.10232391357422, "max": 34.073326110839844, "pos_frac": 0.78125, "sample": [23.28377914428711, 24.623231887817383, 4.6749267578125, 7.859287261962891, -8.53094482421875, 30.279327392578125, 10.005929946899414, 23.80047607421875, 3.182464599609375, 31.34082794189453, -4.667488098144531, 6.145744323730469, -7.6214599609375, 19.864837646484375, -0.6887054443359375, 4.752128601074219, -2.5225830078125, -0.77996826171875, -2.1594161987304688, 33.190216064453125, 2.143016815185547, -9.365005493164062, 0.7519760131835938, 21.088760375976562, 5.1876373291015625, 5.5994110107421875, 16.447540283203125, 18.1002197265625, 27.34149169921875, 11.826393127441406, 16.60747718811035, 12.477325439453125, 4.564540863037109, 24.541114807128906, 25.427200317382812, 1.33697509765625, 26.39166259765625, 10.882530212402344, -28.653526306152344, 6.8095703125, -4.90142822265625, 1.9488105773925781, -3.3744354248046875, -5.391487121582031, 24.452728271484375, 13.02981185913086, 34.073326110839844, 22.955162048339844, 10.608695983886719, 19.746875762939453, 4.273885726928711, -11.153356552124023, 3.167051315307617, 12.09869384765625, 5.568634033203125, 2.376924514770508, 18.037975311279297, -26.779129028320312, 7.5844268798828125, 12.054292678833008, 23.83197021484375, 29.841968536376953, 7.269733428955078, 1.4593391418457031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000209.npy"}
{"epoch": 0.31594860166288735, "step": 210, "batch_size": 64, "mean": 9.002415657043457, "std": 13.351548194885254, "min": -16.447799682617188, "p10": -8.301068115234372, "median": 6.410961151123047, "p90": 26.827783584594727, "max": 36.32450485229492, "pos_frac": 0.765625, "sample": [27.219322204589844, 4.7108917236328125, 2.8211669921875, 13.259819030761719, -1.7524700164794922, 16.575027465820312, 36.32450485229492, 16.70904541015625, 30.583724975585938, 28.505035400390625, 2.8843727111816406, 28.877822875976562, 26.862876892089844, -3.604705810546875, 26.49554443359375, 2.194549560546875, -14.162725448608398, -15.856464385986328, 26.745899200439453, -9.478385925292969, 26.44860076904297, 13.540657043457031, 3.9810123443603516, -0.432464599609375, 25.177825927734375, 1.53271484375, -0.9412879943847656, 6.503944396972656, 25.62841796875, -5.553993225097656, 26.575294494628906, 0.9184932708740234, 8.471923828125, 6.3179779052734375, 10.59347152709961, -3.415264129638672, 1.0219917297363281, -4.914432525634766, -2.593852996826172, 25.29829978942871, -12.958829879760742, 12.012664794921875, 7.2133026123046875, 4.5321044921875, 20.260536193847656, 0.10669136047363281, 13.116598129272461, 1.4736270904541016, 8.880500793457031, 3.5741348266601562, 9.07559585571289, 19.03413963317871, 5.632831573486328, -14.325265884399414, 3.8494491577148438, 31.554405212402344, 18.717002868652344, 18.591053009033203, 18.99330711364746, -16.447799682617188, 1.9856910705566406, -12.355522155761719, 5.1415863037109375, 18.422637939453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000210.npy"}
{"epoch": 0.31746031746031744, "step": 211, "batch_size": 64, "mean": 8.534067153930664, "std": 12.497889518737793, "min": -17.489036560058594, "p10": -5.952279281616211, "median": 5.285987854003906, "p90": 27.355234909057618, "max": 40.995849609375, "pos_frac": 0.765625, "sample": [1.215362548828125, -17.489036560058594, 15.049362182617188, 4.734209060668945, 0.4704246520996094, 2.2364768981933594, -5.224151611328125, -1.1460113525390625, 7.002349853515625, 2.1254005432128906, 16.513275146484375, -0.10804367065429688, 14.021461486816406, 25.453887939453125, 7.064777374267578, 14.409244537353516, 1.409698486328125, -0.14291000366210938, 22.323623657226562, -6.290470123291016, 3.9774341583251953, -8.17218017578125, 14.226730346679688, -5.96209716796875, 3.479778289794922, -11.991479873657227, 0.6509628295898438, 5.7948150634765625, 31.667156219482422, 6.356296539306641, 15.927749633789062, 2.20361328125, 8.787069320678711, 2.3308486938476562, 27.406997680664062, 13.77469253540039, 9.093204498291016, 3.4948272705078125, -5.929370880126953, 16.433673858642578, 4.77716064453125, 40.995849609375, 30.736297607421875, 3.058919906616211, -0.43981170654296875, 20.49820327758789, 27.293094635009766, 15.618316650390625, 9.450790405273438, 29.986968994140625, -15.510139465332031, 6.754779815673828, 27.381866455078125, 24.29698944091797, 23.27617645263672, 23.287689208984375, 4.302482604980469, 14.003631591796875, -2.453092575073242, 28.794601440429688, -2.9417991638183594, -6.08648681640625, 0.4105072021484375, 1.507598876953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000211.npy"}
{"epoch": 0.31897203325774753, "step": 212, "batch_size": 64, "mean": 9.557143211364746, "std": 13.5269136428833, "min": -31.879730224609375, "p10": -5.0732236862182605, "median": 9.512225151062012, "p90": 27.826911544799806, "max": 35.46950912475586, "pos_frac": 0.796875, "sample": [28.467880249023438, 17.574527740478516, 21.037109375, 27.196598052978516, 2.160400390625, 3.2032241821289062, 35.46950912475586, 13.556747436523438, -2.1799087524414062, 23.400497436523438, -2.1391258239746094, -31.879730224609375, 24.942739486694336, 34.526336669921875, 7.598175048828125, 19.166656494140625, 24.2899169921875, 12.126863479614258, -14.612411499023438, 16.084501266479492, 10.831748962402344, 0.991119384765625, 0.7253646850585938, 6.075355529785156, -17.058212280273438, 1.1974639892578125, 8.852081298828125, -3.5844058990478516, 25.625455856323242, 28.792694091796875, 15.477527618408203, 13.508216857910156, 19.14940643310547, 9.928741455078125, 14.126296997070312, 4.959953308105469, -2.993274688720703, 12.487831115722656, 0.8401775360107422, 13.332592010498047, 0.75384521484375, 1.7643661499023438, 0.6784629821777344, -2.8491287231445312, 9.095708847045898, 19.45555877685547, 11.186033248901367, 33.901390075683594, -11.080020904541016, -5.7112884521484375, -1.3583412170410156, 1.7827606201171875, 15.869773864746094, -7.851818084716797, 15.689590454101562, 28.0970458984375, 24.24231719970703, 28.71759796142578, 21.016773223876953, 1.9621505737304688, 5.07086181640625, 7.3815765380859375, -10.013992309570312, 0.5992965698242188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000212.npy"}
{"epoch": 0.3204837490551776, "step": 213, "batch_size": 64, "mean": 10.995447158813477, "std": 14.516810417175293, "min": -33.77265167236328, "p10": -4.87284240722656, "median": 9.305355072021484, "p90": 30.386119842529304, "max": 43.022918701171875, "pos_frac": 0.78125, "sample": [22.543777465820312, 1.3207778930664062, 5.2155914306640625, 26.47905731201172, 10.430912017822266, 6.281944274902344, -6.038818359375, -2.3726463317871094, -1.92559814453125, 23.516014099121094, 27.431961059570312, 1.9869136810302734, 2.1026535034179688, -5.7020263671875, -33.77265167236328, -8.364761352539062, 19.18752098083496, 43.022918701171875, -0.8420448303222656, 15.478759765625, 24.791927337646484, -17.762939453125, 28.867820739746094, 36.013282775878906, 25.06658935546875, 5.343698501586914, -1.5732746124267578, -11.005973815917969, 3.9747695922851562, 5.468053817749023, -6.198692321777344, 6.618865966796875, 14.523689270019531, 10.048416137695312, 15.715744018554688, 13.886199951171875, 3.0605297088623047, 21.896209716796875, 27.469242095947266, 6.080291748046875, 3.9847488403320312, 8.562294006347656, 31.65428924560547, 1.1074905395507812, 24.787521362304688, 11.384521484375, 1.6358489990234375, 18.713336944580078, 32.71636962890625, 1.34796142578125, -2.938079833984375, 19.16930389404297, -1.128570556640625, 1.9907302856445312, 3.10650634765625, 16.04755401611328, -1.67425537109375, 33.819976806640625, 13.236412048339844, 11.357925415039062, 27.946182250976562, 31.27423858642578, 26.30475616455078, 31.036819458007812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000213.npy"}
{"epoch": 0.3219954648526077, "step": 214, "batch_size": 64, "mean": 10.645904541015625, "std": 14.614936828613281, "min": -21.945327758789062, "p10": -6.952447891235352, "median": 9.799899101257324, "p90": 31.26233882904053, "max": 43.67909240722656, "pos_frac": 0.71875, "sample": [17.01558494567871, 13.42431640625, -0.35052490234375, 7.914836883544922, 5.454620361328125, 20.560861587524414, 2.658252716064453, 6.402826309204102, 17.660430908203125, 27.797828674316406, 30.350618362426758, 21.88313102722168, 11.009672164916992, 1.2330474853515625, 17.934844970703125, 28.16326904296875, 18.798141479492188, -15.764686584472656, 17.625205993652344, 19.215938568115234, -0.8010787963867188, 17.194717407226562, 8.501411437988281, -7.07037353515625, -0.9325389862060547, 28.44508934020996, -8.104324340820312, 16.67462158203125, 34.56570816040039, 25.83531951904297, 14.526023864746094, -4.873619079589844, -5.576684951782227, 3.7357406616210938, 15.759941101074219, 25.20794677734375, -13.90093994140625, -1.33319091796875, 31.83588981628418, 43.67909240722656, 0.7647628784179688, -8.54061508178711, 25.519874572753906, 13.408885955810547, 34.8884162902832, 35.32093811035156, 5.080783843994141, 0.12007522583007812, 6.752769470214844, -6.677288055419922, -4.920989990234375, 35.832061767578125, 5.44184684753418, -2.563283920288086, 5.065835952758789, -0.7584247589111328, 31.653076171875, 8.590126037597656, -21.945327758789062, 16.159881591796875, 11.031288146972656, 14.382434844970703, -2.9673614501953125, -12.65882682800293], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000214.npy"}
{"epoch": 0.3235071806500378, "step": 215, "batch_size": 64, "mean": 10.953174591064453, "std": 15.440237045288086, "min": -21.60851287841797, "p10": -8.403816413879394, "median": 6.396636962890625, "p90": 29.818287658691407, "max": 35.975032806396484, "pos_frac": 0.703125, "sample": [9.264904022216797, -8.423919677734375, -21.60851287841797, 13.377288818359375, 8.395408630371094, -13.541961669921875, 4.151618957519531, -3.0193023681640625, 29.443410873413086, 31.01241683959961, 4.527656555175781, -2.704700469970703, -10.764595031738281, 5.798530578613281, 14.672439575195312, 34.55010986328125, 3.5897979736328125, 27.571678161621094, 7.568878173828125, -0.665802001953125, 5.323692321777344, 6.589622497558594, -0.002529144287109375, 22.290435791015625, 28.964481353759766, 0.5507755279541016, -3.8690338134765625, 26.726463317871094, 23.99938201904297, -8.356908798217773, -8.844348907470703, 27.736831665039062, 3.382770538330078, -7.635528564453125, -3.2023773193359375, -14.448089599609375, 20.75275421142578, 16.973297119140625, 25.277923583984375, 29.8614501953125, 4.843229293823242, 6.203651428222656, -4.93231201171875, -1.6227493286132812, 28.049530029296875, 29.31103515625, 35.176963806152344, 35.975032806396484, 25.339324951171875, 34.17529296875, 20.535053253173828, 28.72350311279297, 24.93408966064453, -11.049373626708984, 5.078468322753906, 1.9118213653564453, 5.37109375, 33.64366149902344, 24.446060180664062, -0.2553749084472656, 29.717575073242188, 2.317413330078125, 25.29095458984375, -7.447200775146484], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000215.npy"}
{"epoch": 0.3250188964474679, "step": 216, "batch_size": 64, "mean": 11.672144889831543, "std": 15.648802757263184, "min": -28.798675537109375, "p10": -4.937823677062988, "median": 10.457801818847656, "p90": 31.612817764282234, "max": 66.48959350585938, "pos_frac": 0.765625, "sample": [3.3582229614257812, 29.376846313476562, 14.770263671875, 26.533737182617188, 19.562097549438477, 7.731842041015625, 11.733375549316406, 4.030792236328125, 23.00885009765625, 29.842243194580078, 20.188133239746094, 15.543800354003906, 1.9246902465820312, 10.551971435546875, 10.363632202148438, -0.60113525390625, 6.079235076904297, 11.583663940429688, -4.472991943359375, 34.03620910644531, 19.18902587890625, 34.37870788574219, 13.987083435058594, 19.098426818847656, 14.870750427246094, 5.716312408447266, 32.37163543701172, 1.7151603698730469, 2.7801589965820312, 36.29254150390625, 22.277198791503906, -28.798675537109375, 7.607797622680664, 13.122352600097656, 7.887908935546875, 12.37841796875, -17.6939697265625, 0.7849216461181641, 17.18545913696289, -4.517278671264648, -2.2238311767578125, 66.48959350585938, 34.56620788574219, -6.002052307128906, 4.9905548095703125, -11.041084289550781, 14.268951416015625, 1.0368824005126953, -9.223003387451172, 22.198150634765625, 39.78764724731445, -11.621540069580078, -3.9408798217773438, -0.35845947265625, 29.617210388183594, -0.5543632507324219, 10.080652236938477, 19.75135040283203, -5.1180572509765625, 29.298583984375, -1.7734508514404297, 20.95306396484375, 9.980804443359375, 10.074920654296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000216.npy"}
{"epoch": 0.32653061224489793, "step": 217, "batch_size": 64, "mean": 12.108684539794922, "std": 15.351384162902832, "min": -22.811233520507812, "p10": -3.2076026916503895, "median": 9.743703842163086, "p90": 33.73451461791993, "max": 45.38276672363281, "pos_frac": 0.78125, "sample": [19.855865478515625, 32.97209930419922, -4.7808380126953125, 1.2458648681640625, -0.21096038818359375, -3.702312469482422, 2.549531936645508, 45.38276672363281, -0.7315826416015625, 0.54742431640625, 39.20201110839844, 40.18727111816406, 3.1092605590820312, -0.4732704162597656, 9.234611511230469, 3.7984085083007812, 28.288314819335938, 14.526241302490234, 8.190155029296875, 26.488357543945312, 6.171422958374023, 25.085241317749023, 2.6353759765625, 23.06804656982422, 7.802398681640625, 32.31040954589844, -22.811233520507812, -0.01714324951171875, 2.8963584899902344, 10.252796173095703, 31.980640411376953, 17.14020538330078, 5.89141845703125, -5.643394470214844, 12.543487548828125, -11.922943115234375, 2.153167724609375, 18.61834144592285, 1.0123443603515625, -9.634124755859375, -22.451446533203125, 30.383501052856445, 11.442359924316406, -2.0532798767089844, -0.342132568359375, 3.5823287963867188, 5.490837097167969, 11.698440551757812, 12.405036926269531, 42.78163146972656, 12.874860763549805, 26.03063201904297, 7.352376937866211, 1.6224346160888672, 37.343505859375, 16.706100463867188, 35.94287872314453, 16.809310913085938, 12.749595642089844, 28.82333755493164, 23.668258666992188, 34.06126403808594, 13.304550170898438, -0.4825439453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000217.npy"}
{"epoch": 0.328042328042328, "step": 218, "batch_size": 64, "mean": 7.752167701721191, "std": 17.23804473876953, "min": -40.47589874267578, "p10": -11.840415382385252, "median": 6.705181121826172, "p90": 29.88693656921387, "max": 44.3359375, "pos_frac": 0.6875, "sample": [1.595947265625, 1.6367301940917969, 5.8668365478515625, 15.586454391479492, 0.763336181640625, -1.5002593994140625, 7.326713562011719, -2.12384033203125, 1.0522613525390625, 30.645660400390625, 7.904541015625, 11.384017944335938, 26.270614624023438, -27.38251495361328, -10.444061279296875, -17.629446029663086, 6.319427490234375, 33.66645050048828, -24.839691162109375, -0.6054916381835938, 19.590301513671875, 12.364814758300781, 3.7079391479492188, 8.041936874389648, 20.814929962158203, 44.3359375, 29.15631103515625, 39.024688720703125, 18.6276912689209, 20.3272705078125, -2.9789695739746094, 24.760406494140625, 8.472305297851562, -10.613149642944336, -6.047267913818359, -3.982757568359375, 0.8095474243164062, 13.788156509399414, -3.129791259765625, 23.810531616210938, 30.200061798095703, 16.147232055664062, -1.6522369384765625, 4.34320068359375, 7.628406524658203, -12.366386413574219, 21.44072151184082, -0.3231639862060547, -13.753341674804688, -7.655609130859375, 22.516311645507812, -26.53014373779297, 11.332443237304688, 0.2125720977783203, 7.090934753417969, 27.82269287109375, 4.168132781982422, 36.74597930908203, -40.47589874267578, -3.7889785766601562, 13.779632568359375, 3.77044677734375, 28.4566650390625, 40.65452575683594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000218.npy"}
{"epoch": 0.3295540438397581, "step": 219, "batch_size": 64, "mean": 8.23349666595459, "std": 14.614946365356445, "min": -22.581905364990234, "p10": -9.500190734863281, "median": 6.8352813720703125, "p90": 29.91617202758789, "max": 36.959938049316406, "pos_frac": 0.703125, "sample": [7.199615478515625, 32.770259857177734, 22.79491424560547, 3.135507583618164, 14.676010131835938, -1.0396881103515625, 1.7752952575683594, 0.9883460998535156, 7.063697814941406, -13.167869567871094, -0.9623966217041016, 36.959938049316406, 16.85205078125, -9.234176635742188, -11.778091430664062, 11.693527221679688, 13.220352172851562, 2.4091320037841797, 12.251022338867188, 1.1522502899169922, 1.4221248626708984, -9.61419677734375, -6.6629180908203125, 7.1066436767578125, -13.764190673828125, -4.14703369140625, 4.899085998535156, -1.3298053741455078, 17.972885131835938, 6.133396148681641, 25.541046142578125, 30.474594116210938, 30.06385040283203, 34.10618591308594, 23.50464630126953, 29.964706420898438, -2.8423004150390625, 2.4468917846679688, -3.4283218383789062, 11.537479400634766, 8.947816848754883, 9.374015808105469, -22.280906677246094, 10.795478820800781, -0.8962783813476562, -2.9108200073242188, 22.87175750732422, 3.1473846435546875, 14.582962036132812, 35.88770294189453, 24.448158264160156, 26.861038208007812, 7.838459014892578, -1.4049949645996094, -22.581905364990234, 1.8704643249511719, 3.2143020629882812, 6.606864929199219, 25.127578735351562, -1.5264434814453125, 29.80292510986328, -20.29901123046875, 27.52338409423828, 7.7993927001953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000219.npy"}
{"epoch": 0.3310657596371882, "step": 220, "batch_size": 64, "mean": 9.964658737182617, "std": 17.040769577026367, "min": -25.47081756591797, "p10": -7.651459121704101, "median": 6.240787506103516, "p90": 33.00094909667969, "max": 45.90142822265625, "pos_frac": 0.671875, "sample": [2.271099090576172, -14.206253051757812, 15.231266021728516, 5.069023132324219, 1.2073650360107422, 12.2630615234375, 37.69976806640625, 7.2325439453125, 6.4904632568359375, 4.959571838378906, -2.8005828857421875, 40.10035705566406, 3.727191925048828, -23.692428588867188, -7.908130645751953, 2.2065582275390625, 35.74748992919922, 0.41439056396484375, -2.9682273864746094, 1.2042160034179688, 26.974082946777344, 9.024726867675781, -2.2605228424072266, -20.090227127075195, -18.410293579101562, -10.385538101196289, -6.078855514526367, -7.052558898925781, -2.53277587890625, 21.79755401611328, 26.59801483154297, 18.41175079345703, 19.966567993164062, 30.93084716796875, 34.749778747558594, 2.3764686584472656, 30.48785400390625, 21.999290466308594, 27.765342712402344, -3.7511520385742188, 1.7538604736328125, 16.210039138793945, -3.8208541870117188, 14.854515075683594, 45.90142822265625, -1.3699569702148438, 33.351829528808594, -0.18252182006835938, 7.440847396850586, 32.182228088378906, 26.3303279876709, -6.3447113037109375, 31.1700496673584, -0.20009231567382812, 12.242408752441406, 5.991111755371094, -25.47081756591797, 15.857860565185547, 27.440841674804688, -0.2444782257080078, 42.56178283691406, -4.847492218017578, 28.44198226928711, 13.718877792358398], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000220.npy"}
{"epoch": 0.3325774754346183, "step": 221, "batch_size": 64, "mean": 10.86303997039795, "std": 16.9837646484375, "min": -28.31214141845703, "p10": -8.523681259155273, "median": 7.685710906982422, "p90": 36.32310180664064, "max": 46.439697265625, "pos_frac": 0.765625, "sample": [2.7772998809814453, -5.880699157714844, 15.759162902832031, 16.16558837890625, -1.0465545654296875, 27.536285400390625, 6.815485000610352, 3.129974365234375, 17.472225189208984, -14.961418151855469, -8.25759506225586, 5.5533599853515625, 11.200115203857422, 8.389266967773438, -11.641693115234375, 4.795011520385742, 2.7316055297851562, -0.472930908203125, 13.57337760925293, 17.57217788696289, 27.287437438964844, 29.750030517578125, 11.247875213623047, 31.536834716796875, 43.94500732421875, 3.9827423095703125, 26.985504150390625, 44.52446365356445, -3.3590316772460938, 1.4647483825683594, 17.537551879882812, 38.374359130859375, 0.8717231750488281, 11.692718505859375, 44.84490966796875, 10.000877380371094, 24.15570068359375, -8.637718200683594, 20.036270141601562, 15.427520751953125, 21.68072509765625, -11.025447845458984, 4.9341278076171875, -0.5995922088623047, 0.2811851501464844, -1.4246292114257812, 0.8389339447021484, 39.1539306640625, 0.084716796875, 5.357242584228516, -23.464767456054688, -28.31214141845703, 1.4206771850585938, 46.439697265625, 29.195816040039062, -18.286630630493164, 7.3167724609375, 12.011253356933594, 42.004974365234375, 25.5904541015625, -0.9123001098632812, 2.9626007080078125, 29.05272674560547, 8.054649353027344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000221.npy"}
{"epoch": 0.3340891912320484, "step": 222, "batch_size": 64, "mean": 13.956724166870117, "std": 17.75596046447754, "min": -27.636573791503906, "p10": -6.308567428588866, "median": 13.327266693115234, "p90": 35.92926712036133, "max": 46.558685302734375, "pos_frac": 0.78125, "sample": [-1.88079833984375, -15.491065979003906, 18.16086196899414, 9.286865234375, 30.200580596923828, 31.396957397460938, 15.586048126220703, -4.433856964111328, 36.270111083984375, 40.191986083984375, 13.437347412109375, 4.1230316162109375, -24.95311737060547, 2.0745849609375, 29.938493728637695, 2.5607986450195312, 7.0479278564453125, 8.491710662841797, 5.722221374511719, 27.539020538330078, 20.29308319091797, -0.10283088684082031, 31.498077392578125, 31.330429077148438, -2.1875457763671875, 36.131568908691406, 4.38385009765625, 46.21502685546875, -4.28596305847168, 26.170166015625, 13.981325149536133, 8.289169311523438, 36.18304443359375, 31.46563720703125, 8.073226928710938, 13.217185974121094, 14.610477447509766, -7.1120147705078125, -4.265232086181641, -7.9060821533203125, -27.636573791503906, 14.70501708984375, 15.919853210449219, 10.1236572265625, 1.64691162109375, 46.558685302734375, 5.068122863769531, -27.27587127685547, 0.1051483154296875, 29.888343811035156, 30.41741180419922, -0.23372840881347656, 30.30145263671875, 40.711204528808594, 13.150527954101562, 9.075340270996094, 32.62221908569336, 32.23637390136719, 8.92724609375, 35.45722961425781, 22.797332763671875, 17.09197425842285, 34.311065673828125, -13.990890502929688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000222.npy"}
{"epoch": 0.3356009070294785, "step": 223, "batch_size": 64, "mean": 13.174043655395508, "std": 15.920175552368164, "min": -12.526493072509766, "p10": -4.875368690490722, "median": 12.507072448730469, "p90": 37.015024185180664, "max": 47.52177429199219, "pos_frac": 0.765625, "sample": [20.40052032470703, 5.354785919189453, 27.60492706298828, 8.280693054199219, -2.8722381591796875, 21.229951858520508, 0.310821533203125, 17.957382202148438, 10.1402587890625, 22.702651977539062, 3.022329330444336, 40.349273681640625, 24.175432205200195, 23.29383087158203, 0.9056129455566406, -7.842159271240234, -0.3428363800048828, 21.572662353515625, 2.0546875, -3.4613380432128906, 34.40522384643555, 12.178253173828125, -9.688426971435547, 23.846097946166992, -8.262214660644531, 2.9112091064453125, 1.9097881317138672, 21.98333740234375, -4.261104583740234, -9.065723419189453, 26.15875244140625, 7.52690315246582, -10.072708129882812, 3.7394332885742188, 0.7873077392578125, 41.895179748535156, 36.39264678955078, 19.04244613647461, -1.8604660034179688, -12.526493072509766, -4.665031433105469, -4.965513229370117, 18.08704376220703, 22.50841522216797, 39.6839599609375, 6.143115997314453, 12.9625244140625, 27.663881301879883, 14.678924560546875, 37.28175735473633, 47.52177429199219, -1.9038772583007812, 46.253868103027344, 41.286041259765625, 33.74127197265625, 12.835891723632812, 25.3480224609375, 27.99005889892578, 13.37042236328125, -2.279722213745117, 1.5520477294921875, 12.980636596679688, 1.999237060546875, 1.18731689453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000223.npy"}
{"epoch": 0.3371126228269085, "step": 224, "batch_size": 64, "mean": 7.5786590576171875, "std": 16.3422794342041, "min": -26.053361892700195, "p10": -12.540241241455078, "median": 3.852513313293457, "p90": 30.962076187133796, "max": 46.681739807128906, "pos_frac": 0.65625, "sample": [7.848072052001953, 46.28031921386719, 13.33441162109375, 8.7113037109375, 6.0578460693359375, 36.492671966552734, 17.611095428466797, 14.774383544921875, -0.7338771820068359, 3.5855331420898438, 32.278682708740234, 4.344165802001953, 27.331642150878906, 31.706645965576172, -13.25119400024414, -0.2208099365234375, 3.735910415649414, 14.913034439086914, 14.911872863769531, 1.8929595947265625, 0.4444427490234375, 20.517318725585938, -0.0122528076171875, -26.053361892700195, -12.703697204589844, -1.4609146118164062, -4.1679534912109375, -6.583953857421875, 9.96014404296875, -2.4813385009765625, 27.690589904785156, -23.145565032958984, -15.344785690307617, -6.9030303955078125, -7.32318115234375, 0.1922893524169922, -0.7075099945068359, -18.942367553710938, 4.267005920410156, 10.088363647460938, -17.674240112304688, 18.088592529296875, 26.61547088623047, 2.6735305786132812, -12.158843994140625, 3.2099227905273438, 11.488639831542969, 27.966259002685547, -7.178924560546875, -1.1394119262695312, -1.869436264038086, 14.997888565063477, 27.903453826904297, 34.573028564453125, 46.681739807128906, 14.145484924316406, 3.171295166015625, 3.9691162109375, 0.15848731994628906, 37.83566665649414, -2.7287139892578125, 29.224746704101562, 4.1357879638671875, 2.0097084045410156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000224.npy"}
{"epoch": 0.3386243386243386, "step": 225, "batch_size": 64, "mean": 13.439470291137695, "std": 17.788959503173828, "min": -23.735061645507812, "p10": -6.9235050201416, "median": 11.20750617980957, "p90": 40.778519821166995, "max": 49.56721496582031, "pos_frac": 0.796875, "sample": [0.8856601715087891, 14.666885375976562, 4.158664703369141, 0.338409423828125, -5.089019775390625, 22.3778076171875, -7.965770721435547, 0.18982696533203125, 21.91044807434082, 11.730419158935547, 7.270748138427734, 38.52809143066406, 5.019100189208984, 41.110076904296875, -10.60445785522461, 1.1579322814941406, 14.758125305175781, 0.2633209228515625, 26.281726837158203, 39.19782257080078, 42.184959411621094, 18.509475708007812, 44.696495056152344, 15.054092407226562, 49.56721496582031, 15.430656433105469, 10.11053466796875, 32.626922607421875, 40.004886627197266, -4.273307800292969, 1.5698318481445312, -21.77850341796875, 19.88506317138672, -7.709712982177734, 44.51390075683594, 14.949264526367188, 0.9456996917724609, 16.299636840820312, 24.959136962890625, 8.655494689941406, 1.0574569702148438, 5.928138732910156, 39.508270263671875, -0.39753150939941406, 17.793212890625, -14.33587646484375, -10.882843017578125, -1.760589599609375, 15.378631591796875, 4.777687072753906, 23.71623992919922, 0.7912425994873047, 5.250709533691406, 44.631629943847656, 31.458141326904297, -0.9851799011230469, 3.0507278442382812, 10.684593200683594, 42.864200592041016, 20.023008346557617, -23.735061645507812, 18.979476928710938, 35.637367248535156, -1.695098876953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000225.npy"}
{"epoch": 0.3401360544217687, "step": 226, "batch_size": 64, "mean": 10.95936107635498, "std": 20.019088745117188, "min": -32.84208679199219, "p10": -10.875880050659177, "median": 7.659109115600586, "p90": 36.66970252990723, "max": 45.429962158203125, "pos_frac": 0.75, "sample": [7.734275817871094, -8.137496948242188, 19.550949096679688, 14.202005386352539, 7.893218994140625, 7.961517333984375, 13.408149719238281, 2.487537384033203, 23.240650177001953, -11.884632110595703, 30.283756256103516, 5.35844612121582, 26.81739044189453, 35.6546516418457, -23.02269744873047, 0.6701812744140625, 4.021528244018555, -32.84208679199219, 36.52602767944336, 18.16083526611328, 39.41899490356445, -1.8873748779296875, 28.64702606201172, 3.501922607421875, 25.094497680664062, 0.09042739868164062, 4.816432952880859, 36.73127746582031, 33.96332550048828, 29.224990844726562, -6.011125564575195, 42.62583923339844, 7.583942413330078, 11.4664306640625, 43.271263122558594, 5.3790740966796875, 20.704360961914062, -32.16450500488281, -2.3906078338623047, 1.006296157836914, 1.1420440673828125, 5.1467132568359375, -8.522125244140625, -31.35308074951172, 39.72743225097656, 45.429962158203125, -22.111190795898438, 5.915092468261719, -28.968307495117188, 19.34235382080078, 6.833545684814453, 2.742856979370117, 10.70523452758789, 10.08377456665039, 2.8090057373046875, 42.85931396484375, 34.16082763671875, 35.501136779785156, -2.8069000244140625, -1.054849624633789, 36.368499755859375, -1.517507553100586, -5.223154067993164, 35.031715393066406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000226.npy"}
{"epoch": 0.3416477702191988, "step": 227, "batch_size": 64, "mean": 11.97449779510498, "std": 20.747207641601562, "min": -35.127044677734375, "p10": -14.333302307128903, "median": 5.959417343139648, "p90": 41.33274612426758, "max": 47.17878723144531, "pos_frac": 0.6875, "sample": [36.51111602783203, 11.461261749267578, -1.013702392578125, 39.01384735107422, 3.0958099365234375, -2.269916534423828, 30.88671112060547, 43.75762176513672, -35.127044677734375, -5.484457015991211, 44.48152160644531, -12.029762268066406, -1.7004432678222656, 43.07115936279297, 10.520538330078125, 2.6534347534179688, 1.390777587890625, -1.4437980651855469, 38.55259323120117, -3.0333328247070312, 10.145004272460938, 36.68901824951172, 5.862804412841797, 6.0560302734375, 47.17878723144531, 16.470870971679688, -3.0071983337402344, 30.107376098632812, 0.11519432067871094, -1.1153411865234375, 25.425018310546875, 46.688453674316406, 29.239730834960938, 4.87255859375, 6.7094573974609375, 30.34377670288086, 45.17478561401367, 1.72930908203125, 16.904754638671875, 40.86327362060547, 3.3347244262695312, 10.763877868652344, -17.151878356933594, 7.6553192138671875, 30.872711181640625, -2.51947021484375, 1.742767333984375, 3.4658584594726562, -18.45416259765625, -0.5018424987792969, -27.305946350097656, 25.91399383544922, 12.696380615234375, 32.277366638183594, 3.255422592163086, -17.097442626953125, 2.1399993896484375, -0.195098876953125, 41.238800048828125, -4.73358154296875, -24.309127807617188, -15.320533752441406, 37.479103088378906, 41.373008728027344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000227.npy"}
{"epoch": 0.3431594860166289, "step": 228, "batch_size": 64, "mean": 11.78009033203125, "std": 18.425840377807617, "min": -31.074337005615234, "p10": -9.969034576416014, "median": 8.188926696777344, "p90": 36.911511993408205, "max": 51.26502990722656, "pos_frac": 0.71875, "sample": [-0.15966224670410156, 36.50092315673828, 2.740020751953125, 35.221229553222656, 8.188064575195312, 20.67914581298828, 12.016561508178711, 43.48999786376953, 7.4275970458984375, -17.396202087402344, 21.2607364654541, 0.620330810546875, 7.96142578125, -31.074337005615234, 6.5309906005859375, 1.8419628143310547, 37.08747863769531, -10.374862670898438, 8.355602264404297, 14.665142059326172, 0.7240047454833984, 38.70458984375, -24.03014373779297, 8.189788818359375, -5.51446533203125, 33.78004837036133, 28.37198829650879, 51.26502990722656, 4.466054916381836, 26.008712768554688, -7.829265594482422, -9.022102355957031, 30.548336029052734, -0.7614517211914062, 21.739219665527344, -4.1817474365234375, 26.453472137451172, -1.8139152526855469, -13.017303466796875, -6.099315643310547, 17.094619750976562, 26.052764892578125, 22.87213134765625, -16.58211326599121, 7.384429931640625, 2.0307998657226562, 43.653045654296875, 45.905242919921875, -11.6387939453125, 7.112579345703125, -1.104278564453125, 8.104610443115234, 12.503494262695312, 13.805889129638672, -2.3906326293945312, 10.91781997680664, 35.523406982421875, 43.12401580810547, 30.067543029785156, 18.35626220703125, -6.865509033203125, 11.137855529785156, 28.312036514282227, 4.984855651855469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000228.npy"}
{"epoch": 0.34467120181405897, "step": 229, "batch_size": 64, "mean": 11.715320587158203, "std": 21.139482498168945, "min": -41.78205490112305, "p10": -10.161994171142577, "median": 11.284873962402344, "p90": 37.83474884033203, "max": 49.046539306640625, "pos_frac": 0.75, "sample": [24.794265747070312, 37.767906188964844, 20.22992706298828, 21.46685028076172, 15.832618713378906, 12.54852294921875, 29.4685115814209, 19.162031173706055, 16.879283905029297, -19.954864501953125, 32.299415588378906, 20.30615234375, 38.565589904785156, -10.790611267089844, 22.399948120117188, 5.87213134765625, -38.792236328125, -5.433330535888672, 37.86339569091797, 36.06166458129883, 10.481842041015625, 1.861185073852539, 12.284263610839844, 16.351463317871094, -24.786117553710938, 8.119525909423828, -8.289070129394531, -0.7819843292236328, 3.403738021850586, -8.695220947265625, -0.05780029296875, 42.453582763671875, -41.78205490112305, 35.97681427001953, -7.327247619628906, 3.784822463989258, 34.293846130371094, -22.567279815673828, 3.8003921508789062, 12.087905883789062, 42.387794494628906, 18.049470901489258, 0.3773193359375, -1.4991035461425781, 35.22922897338867, 0.011577606201171875, 43.31500244140625, 12.871952056884766, 36.331790924072266, 7.661020278930664, 5.571586608886719, -1.0501365661621094, 23.263832092285156, 3.3022613525390625, 45.607078552246094, 2.7569122314453125, -0.26462554931640625, 2.2025222778320312, 37.69381332397461, 49.046539306640625, 35.178253173828125, 2.7972640991210938, 2.0063858032226562, -40.22706604003906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000229.npy"}
{"epoch": 0.34618291761148906, "step": 230, "batch_size": 64, "mean": 13.247392654418945, "std": 19.01605987548828, "min": -38.09685516357422, "p10": -4.275773239135741, "median": 7.044792175292969, "p90": 37.96122436523438, "max": 61.730224609375, "pos_frac": 0.84375, "sample": [-4.955310821533203, 9.764389038085938, 33.683921813964844, 2.2416648864746094, 25.302127838134766, -16.45587158203125, 1.022378921508789, 0.9052352905273438, 4.584381103515625, 44.05207824707031, 10.601432800292969, 41.083953857421875, 41.441017150878906, 28.64080810546875, 16.930419921875, 23.672224044799805, 19.95553207397461, 35.52519226074219, -24.850303649902344, 3.094085693359375, -38.09685516357422, 3.2676239013671875, 2.520050048828125, 23.597553253173828, 14.744247436523438, -2.690185546875, 37.022117614746094, 3.6521759033203125, 38.184356689453125, 37.440582275390625, 23.451614379882812, 2.8664398193359375, -1.4277057647705078, 3.1686344146728516, 2.8553695678710938, 6.655494689941406, 5.2973175048828125, 4.920507431030273, 61.730224609375, 6.528472900390625, 5.2969970703125, 4.650169372558594, 2.1959190368652344, 24.293663024902344, 54.491554260253906, 30.6234130859375, -14.397161483764648, 44.081764221191406, 3.2046890258789062, 0.030277252197265625, 30.129302978515625, -14.629142761230469, 34.243316650390625, 33.42695617675781, 12.391056060791016, 7.434089660644531, 2.37994384765625, -0.7678546905517578, 7.462322235107422, 3.286285400390625, 8.759981155395508, 32.604888916015625, -7.380699157714844, 12.094036102294922], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000230.npy"}
{"epoch": 0.3476946334089191, "step": 231, "batch_size": 64, "mean": 18.866680145263672, "std": 20.33354377746582, "min": -41.711639404296875, "p10": -4.358425521850585, "median": 22.264423370361328, "p90": 45.75472793579102, "max": 54.98699188232422, "pos_frac": 0.828125, "sample": [41.274505615234375, 8.028915405273438, 27.972827911376953, 27.115386962890625, 1.2919921875, 21.91156768798828, 3.703571319580078, 44.28472900390625, 8.094352722167969, 9.801782608032227, 31.126699447631836, 2.2876739501953125, 23.088699340820312, -3.824970245361328, -5.25244140625, 27.380218505859375, 7.839599609375, 45.96349334716797, 12.8115234375, -11.510520935058594, 30.631256103515625, -4.160923004150391, 30.508285522460938, 33.34946823120117, 7.1751861572265625, 47.77464294433594, -4.4430694580078125, -16.19747543334961, 47.6722412109375, 26.35944366455078, 44.50199890136719, -33.9056510925293, 22.617279052734375, 45.267608642578125, 34.84534454345703, 14.271331787109375, 4.036649703979492, 16.228229522705078, 0.975006103515625, -3.2679672241210938, 54.98699188232422, 4.0081329345703125, 20.124393463134766, 16.761367797851562, 35.04230499267578, 8.231706619262695, 17.138031005859375, -7.158267974853516, 7.557407379150391, -1.7525749206542969, 36.49111557006836, 42.1954345703125, 23.320350646972656, 28.88562774658203, 36.29191589355469, 47.005149841308594, 25.492645263671875, 24.181859970092773, 25.86181640625, 27.895736694335938, -41.711639404296875, 13.847116470336914, 46.222808837890625, 48.91960906982422], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000231.npy"}
{"epoch": 0.3492063492063492, "step": 232, "batch_size": 64, "mean": 12.77048110961914, "std": 22.941038131713867, "min": -38.309783935546875, "p10": -10.48172035217285, "median": 8.320854187011719, "p90": 44.47232093811035, "max": 66.60281372070312, "pos_frac": 0.734375, "sample": [-0.0660552978515625, 44.39484786987305, 31.89781951904297, 50.408485412597656, 4.561304092407227, 44.505523681640625, 26.49083709716797, 33.169898986816406, -0.5151596069335938, -17.164344787597656, -16.124614715576172, 4.873260498046875, -3.4668045043945312, 8.411605834960938, 12.440593719482422, 42.04080581665039, 1.9230899810791016, 1.6639842987060547, 19.592178344726562, 1.7569389343261719, 0.13505172729492188, 26.443832397460938, 19.274505615234375, 20.140411376953125, 7.547027587890625, 26.088626861572266, -5.850351333618164, 50.430789947509766, -8.52420425415039, 10.143989562988281, 13.536603927612305, 41.56809997558594, -9.321104049682617, -3.0486679077148438, 10.16241455078125, 1.0512008666992188, 1.270364761352539, -35.541351318359375, 53.411651611328125, 10.706947326660156, 1.1256141662597656, 46.82501220703125, -7.179557800292969, 63.287353515625, 40.89520263671875, 66.60281372070312, -10.942325592041016, -9.406974792480469, -4.9976654052734375, 7.960222244262695, 17.52661895751953, 38.52778625488281, 26.203094482421875, 7.994842529296875, 2.2173194885253906, -32.36906433105469, 0.2796916961669922, 42.97142791748047, 11.3226318359375, 19.85507583618164, -13.957054138183594, -38.309783935546875, 8.2301025390625, 12.228382110595703], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000232.npy"}
{"epoch": 0.3507180650037793, "step": 233, "batch_size": 64, "mean": 10.630615234375, "std": 20.551122665405273, "min": -37.90801239013672, "p10": -9.22447395324707, "median": 6.166379928588867, "p90": 37.878220367431645, "max": 53.6844482421875, "pos_frac": 0.6875, "sample": [52.55713653564453, 11.477096557617188, 38.352195739746094, -13.781526565551758, 6.154850006103516, 16.641305923461914, 17.636314392089844, 15.58432388305664, -1.9946517944335938, -4.4766845703125, 6.177909851074219, 53.6844482421875, -3.652698516845703, -0.45545196533203125, -2.2134780883789062, 10.474838256835938, 35.41782760620117, 2.4968490600585938, -1.8029403686523438, 0.7832889556884766, 39.52915954589844, 5.598979949951172, 32.37609100341797, 35.77693176269531, -8.560440063476562, 19.82408905029297, -6.67791748046875, 18.14128875732422, -23.438270568847656, -23.249786376953125, 36.59185028076172, 8.779354095458984, 40.85005187988281, 16.792438507080078, 35.12596893310547, -37.90801239013672, 11.644172668457031, 2.154970169067383, -2.2061767578125, 1.1338348388671875, 5.479377746582031, -28.358078002929688, 25.0458984375, 0.25984954833984375, 13.369972229003906, -7.320220947265625, -5.794624328613281, 14.974517822265625, 36.77227783203125, 29.54541015625, -9.50905990600586, -8.006044387817383, 1.1001415252685547, 53.5941162109375, 21.677261352539062, 53.27516174316406, 3.268747329711914, -4.845569610595703, 35.99456787109375, 0.8600616455078125, 8.906906127929688, 0.2213592529296875, -17.21560287475586, 15.723381042480469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000233.npy"}
{"epoch": 0.35222978080120937, "step": 234, "batch_size": 64, "mean": 10.78758716583252, "std": 22.911170959472656, "min": -34.513526916503906, "p10": -14.12402572631836, "median": 4.406772613525391, "p90": 44.59977569580079, "max": 52.470001220703125, "pos_frac": 0.65625, "sample": [4.608634948730469, -31.40887451171875, 5.576148986816406, -7.672454833984375, 22.285987854003906, -1.5486717224121094, 2.4554100036621094, -10.038009643554688, 30.303573608398438, -12.325767517089844, 11.232315063476562, 20.546051025390625, 45.704383850097656, 15.73994255065918, 46.41139221191406, 31.781322479248047, -15.0399169921875, -28.041919708251953, 38.979339599609375, -2.9575252532958984, 2.9510498046875, 1.7999038696289062, 42.96532440185547, 52.470001220703125, -1.1505146026611328, -5.5530242919921875, 41.09682083129883, -1.121307373046875, -13.852088928222656, 5.319118499755859, 39.821327209472656, 48.60311508178711, 45.300254821777344, 40.714439392089844, -0.3217010498046875, -34.513526916503906, -1.079376220703125, 2.6842117309570312, 2.890705108642578, -12.885459899902344, 30.227890014648438, 35.31156921386719, 3.730091094970703, 20.668487548828125, 40.56116485595703, 3.158924102783203, 2.3589935302734375, 15.805473327636719, 45.58027648925781, -6.096200942993164, -14.240570068359375, 13.130121231079102, -28.444751739501953, 20.62142562866211, -33.092140197753906, 0.3919410705566406, 28.32861328125, 10.444374084472656, -6.602691650390625, 4.2049102783203125, 12.687973022460938, 49.347434997558594, 32.16905212402344, -12.577411651611328], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000234.npy"}
{"epoch": 0.35374149659863946, "step": 235, "batch_size": 64, "mean": 17.92596435546875, "std": 20.281660079956055, "min": -34.185081481933594, "p10": -2.1759710311889644, "median": 17.807273864746094, "p90": 44.9944149017334, "max": 55.59253692626953, "pos_frac": 0.84375, "sample": [1.3616371154785156, 15.15838623046875, 18.692306518554688, 23.702938079833984, 32.11262512207031, 10.594886779785156, 9.055931091308594, 55.59253692626953, 53.12284851074219, 30.961135864257812, 20.313217163085938, 39.47711944580078, 14.126392364501953, 25.097911834716797, 38.73228454589844, 34.18720626831055, 20.09037971496582, 3.9067420959472656, 34.30929183959961, 46.59392166137695, 3.8809661865234375, 7.907535552978516, 12.343109130859375, 40.24313735961914, 41.300010681152344, 28.416641235351562, -2.3116378784179688, -3.6458740234375, 44.95754623413086, -31.52130126953125, 25.836830139160156, 2.017976760864258, 6.543571472167969, 16.9222412109375, 45.37814712524414, 7.982761383056641, 20.225788116455078, 4.289894104003906, 22.331451416015625, -29.29254150390625, -1.8442630767822266, 38.77693176269531, 12.860382080078125, 6.331951141357422, 32.83668518066406, 5.083751678466797, -0.6074905395507812, 40.94649124145508, 55.054656982421875, 4.99305534362793, 5.023468017578125, -10.205032348632812, 49.202545166015625, 20.630889892578125, 1.6741561889648438, -34.185081481933594, 24.40996551513672, 30.911087036132812, 45.010215759277344, 3.235504150390625, 28.544464111328125, -1.859415054321289, 1.8307647705078125, -2.3899459838867188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000235.npy"}
{"epoch": 0.35525321239606955, "step": 236, "batch_size": 64, "mean": 13.915044784545898, "std": 24.07503318786621, "min": -54.446380615234375, "p10": -7.814044189453124, "median": 7.904017448425293, "p90": 46.34733085632324, "max": 61.71009826660156, "pos_frac": 0.6875, "sample": [40.5274772644043, -4.0052337646484375, 3.106842041015625, 33.34132766723633, 8.756393432617188, 1.9472503662109375, 39.43788146972656, 15.669570922851562, -4.582771301269531, 14.288349151611328, 29.441356658935547, 7.260284423828125, -3.3039379119873047, 3.6551666259765625, -6.495212554931641, 2.7900161743164062, 34.15742492675781, 43.45933532714844, 21.275100708007812, 47.001220703125, 1.8178119659423828, -1.7802276611328125, -0.163116455078125, -1.3088798522949219, 12.115699768066406, 7.703643798828125, 26.58948516845703, 40.5369873046875, 49.36503601074219, 29.76862335205078, 46.4173698425293, -0.4632110595703125, 1.6199188232421875, 11.94754409790039, 33.00846862792969, -15.906021118164062, 1.2581939697265625, 61.71009826660156, -1.3951644897460938, 8.104391098022461, -31.634050369262695, 12.547588348388672, 42.858856201171875, 46.18390655517578, 31.995346069335938, 22.022024154663086, -1.0218429565429688, -6.986991882324219, 49.734004974365234, 3.0374794006347656, -2.0711708068847656, -54.446380615234375, -48.61918640136719, 52.05643844604492, 51.02838134765625, 45.911277770996094, 8.658889770507812, -10.207033157348633, -8.168495178222656, -9.554737091064453, 45.98027801513672, -1.8211669921875, 6.819189071655273, 7.5858306884765625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000236.npy"}
{"epoch": 0.35676492819349964, "step": 237, "batch_size": 64, "mean": 15.910423278808594, "std": 24.301307678222656, "min": -49.17250061035156, "p10": -15.752486419677732, "median": 12.611647605895996, "p90": 46.86309204101563, "max": 64.00025939941406, "pos_frac": 0.78125, "sample": [40.79694366455078, 42.554962158203125, 0.915618896484375, 51.759552001953125, 0.09680938720703125, -10.776657104492188, 8.24212646484375, 13.499683380126953, 6.985858917236328, 38.39130401611328, 17.36432647705078, 20.29150390625, -13.080986022949219, -0.756988525390625, 38.0212516784668, 9.070899963378906, -1.325653076171875, 29.723678588867188, -24.397445678710938, 43.80779266357422, -49.17250061035156, 37.23291778564453, 15.668380737304688, 37.64325714111328, 58.96165466308594, -3.397371292114258, 34.519317626953125, 39.20441436767578, -16.897415161132812, 12.857641220092773, 38.186279296875, 20.578811645507812, 7.449073791503906, -33.167388916015625, 26.936643600463867, 2.8552818298339844, 4.7532806396484375, -32.53876495361328, 53.72981262207031, 4.405233383178711, 22.96270751953125, 12.365653991699219, 9.79388427734375, 9.822158813476562, 6.2310943603515625, 0.018459320068359375, 27.69353675842285, 37.357421875, 21.44921875, 47.99101257324219, 43.12116241455078, 6.762298583984375, 51.336395263671875, 64.00025939941406, -19.336685180664062, 5.105960845947266, -2.2253475189208984, 44.23127746582031, 5.891727447509766, 18.311195373535156, -2.6474571228027344, 7.63226318359375, -20.9642333984375, 50.36998748779297], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000237.npy"}
{"epoch": 0.35827664399092973, "step": 238, "batch_size": 64, "mean": 16.17724609375, "std": 23.57661247253418, "min": -40.977691650390625, "p10": -12.605672645568847, "median": 13.760161399841309, "p90": 48.05127029418945, "max": 56.58648681640625, "pos_frac": 0.734375, "sample": [-14.409191131591797, 6.151252746582031, 48.491641998291016, -12.006584167480469, 48.20764923095703, 56.58648681640625, 0.3708534240722656, 8.0738525390625, 48.103233337402344, -12.862424850463867, 9.355964660644531, -7.211761474609375, 47.930023193359375, 14.278884887695312, 32.81695556640625, 22.61507797241211, -1.4101066589355469, -8.932943344116211, 21.670333862304688, 51.24970245361328, 36.409515380859375, 29.922943115234375, 38.55495834350586, 31.554550170898438, 11.92752456665039, 46.58697509765625, 19.6080322265625, -5.285743713378906, -21.27342987060547, 35.98844909667969, 45.27220153808594, 34.277706146240234, 4.725057601928711, 44.80010986328125, -1.7979087829589844, 6.339487075805664, 21.86773681640625, -2.898527145385742, 33.72447204589844, 43.54388427734375, 3.5935134887695312, -11.657386779785156, 1.2715644836425781, 7.6748809814453125, 30.622352600097656, 33.875953674316406, 56.22111511230469, 22.854034423828125, 13.241437911987305, 23.704193115234375, 4.5638427734375, 30.996742248535156, 44.949989318847656, -15.980119705200195, -8.359445571899414, 0.6492042541503906, -40.977691650390625, 6.307975769042969, 54.651275634765625, -5.1268768310546875, 14.299041748046875, -22.094947814941406, -25.268329620361328, 2.414651870727539], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000238.npy"}
{"epoch": 0.35978835978835977, "step": 239, "batch_size": 64, "mean": 16.441516876220703, "std": 25.16668128967285, "min": -42.2532958984375, "p10": -11.023306274414061, "median": 11.785194396972656, "p90": 53.315637969970716, "max": 67.63279724121094, "pos_frac": 0.765625, "sample": [9.691421508789062, 9.953155517578125, 29.981826782226562, 0.22383499145507812, 49.85430908203125, 10.41900634765625, -7.584770202636719, 0.22159576416015625, -7.335052490234375, 10.885749816894531, -11.242813110351562, 23.554275512695312, -0.9028511047363281, 55.66181945800781, 42.03204345703125, 2.892242431640625, -10.436653137207031, -20.29157829284668, 4.718292236328125, 47.56573486328125, 2.5404319763183594, 10.713813781738281, 41.03401184082031, 54.79906463623047, 55.641170501708984, 9.88578987121582, 27.82018280029297, 37.8037223815918, 22.776565551757812, 8.172958374023438, 48.86593246459961, 21.723283767700195, -27.724143981933594, -10.511123657226562, 13.131027221679688, 42.969024658203125, -1.3874492645263672, 7.702665328979492, -15.27734375, 25.6815185546875, -1.3121414184570312, 35.755592346191406, 13.357505798339844, 7.4883880615234375, -37.271514892578125, 33.43916320800781, 58.82701110839844, 62.41644287109375, 67.63279724121094, -30.833492279052734, 0.03782463073730469, 6.2714080810546875, -42.2532958984375, 44.68528747558594, 15.270355224609375, 36.02427673339844, 55.162681579589844, -1.9848480224609375, 18.213943481445312, 33.95109558105469, 2.008443832397461, 19.640724182128906, 26.82215118408203, 12.684638977050781], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000239.npy"}
{"epoch": 0.36130007558578986, "step": 240, "batch_size": 64, "mean": 21.3055419921875, "std": 24.768512725830078, "min": -49.12628936767578, "p10": -5.172893142700195, "median": 18.42636489868164, "p90": 55.522087860107426, "max": 65.01725769042969, "pos_frac": 0.796875, "sample": [30.183757781982422, -5.4995269775390625, 46.296875, 1.1020050048828125, 36.260826110839844, 63.74761962890625, 18.20361328125, 31.203697204589844, -18.61809730529785, -14.060050964355469, 5.549522399902344, -3.1327342987060547, 38.780555725097656, 24.121938705444336, 38.62935256958008, 57.91230010986328, 40.83673858642578, -11.120532989501953, 52.637306213378906, 26.988845825195312, -4.410747528076172, 40.596839904785156, 15.534618377685547, -20.756629943847656, 17.05371856689453, 42.244659423828125, 9.314376831054688, 47.37646484375, 64.83476257324219, 18.64911651611328, 45.56829071044922, 65.01725769042969, 4.854881286621094, 1.428955078125, 1.1379470825195312, 41.29811477661133, 16.381187438964844, 40.146148681640625, 50.836326599121094, 1.536844253540039, 42.61894989013672, 4.551342010498047, -5.774204254150391, 24.59600830078125, 58.198875427246094, -49.12628936767578, -1.2669696807861328, -0.05176544189453125, 21.206815719604492, 4.14752197265625, 17.569683074951172, 54.68611145019531, -0.084014892578125, 20.57080078125, 0.2908897399902344, 8.105365753173828, 55.88036346435547, 56.67155456542969, 27.81414031982422, 6.137651443481445, 46.032676696777344, 2.1666183471679688, -1.5514144897460938, 11.496780395507812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000240.npy"}
{"epoch": 0.36281179138321995, "step": 241, "batch_size": 64, "mean": 13.889958381652832, "std": 24.913246154785156, "min": -47.06633758544922, "p10": -9.534612274169922, "median": 10.953893661499023, "p90": 46.692653274536134, "max": 69.70549774169922, "pos_frac": 0.65625, "sample": [-8.956436157226562, -7.982856750488281, 41.470314025878906, -23.35479164123535, 26.38782501220703, -9.185760498046875, 18.504989624023438, 37.31230926513672, 35.68928527832031, 37.07402038574219, -5.7459869384765625, 43.091651916503906, -9.684120178222656, 10.483087539672852, 2.1939544677734375, 54.739891052246094, 1.4516029357910156, 2.7127933502197266, -2.6946182250976562, 38.44120788574219, 26.560134887695312, 14.386825561523438, -5.1849365234375, 3.0270004272460938, 25.552688598632812, -3.161529541015625, 47.20246505737305, -0.19902610778808594, -3.3418197631835938, -12.305618286132812, 52.03650665283203, 62.934722900390625, 61.64900207519531, 21.46502685546875, -1.331085205078125, -33.222808837890625, -4.100868225097656, 19.412261962890625, 12.350067138671875, 2.851459503173828, -47.06633758544922, 1.9980888366699219, -0.08869361877441406, 46.04026794433594, 11.424699783325195, 16.497222900390625, -28.574005126953125, 69.70549774169922, 20.361106872558594, -29.45550537109375, 46.592071533203125, 27.47386932373047, 19.714462280273438, 13.977325439453125, 46.73575973510742, -5.697330474853516, 3.09161376953125, 41.629241943359375, 37.10993194580078, 26.476032257080078, 2.718120574951172, 6.331262588500977, -4.097963333129883, -2.4682064056396484], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000241.npy"}
{"epoch": 0.36432350718065004, "step": 242, "batch_size": 64, "mean": 19.99578857421875, "std": 27.287883758544922, "min": -53.576087951660156, "p10": -3.682270812988281, "median": 15.830272674560547, "p90": 59.079270935058595, "max": 84.89300537109375, "pos_frac": 0.859375, "sample": [-37.59076690673828, 21.015594482421875, 14.494499206542969, 3.9646453857421875, 32.16388702392578, -15.658756256103516, 58.30389404296875, 35.748443603515625, 15.880455017089844, 31.852859497070312, 40.65165710449219, 26.101417541503906, 0.06326675415039062, 72.92623138427734, 55.57645797729492, -4.4566497802734375, 62.29243087768555, -3.189910888671875, 6.5078887939453125, 14.730335235595703, 11.08026123046875, 26.054931640625, 59.41157531738281, 3.6371326446533203, 0.7975387573242188, 16.16009521484375, 15.78009033203125, 3.6378707885742188, 43.25788879394531, 25.133438110351562, 38.61997985839844, 1.3497505187988281, 69.44953918457031, 8.108078002929688, 40.6654052734375, 11.234764099121094, 0.4421653747558594, 1.2660484313964844, 84.89300537109375, 8.787967681884766, -20.25495147705078, 23.828590393066406, 26.65831756591797, 2.6927642822265625, 22.02918243408203, 0.9178085327148438, 35.326385498046875, 7.829841613769531, 51.9329833984375, 50.73722457885742, 63.497222900390625, 29.579492568969727, 70.26754760742188, -3.8522682189941406, 0.03397560119628906, 38.670265197753906, 41.046356201171875, 1.2442741394042969, 7.206321716308594, -53.576087951660156, 4.139595031738281, -3.2856101989746094, -34.47400665283203, 16.389856338500977], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000242.npy"}
{"epoch": 0.36583522297808013, "step": 243, "batch_size": 64, "mean": 17.38387107849121, "std": 26.08226203918457, "min": -48.317779541015625, "p10": -11.644943618774414, "median": 15.635169982910156, "p90": 53.74440460205078, "max": 63.705360412597656, "pos_frac": 0.75, "sample": [2.2350616455078125, 41.104339599609375, 25.164329528808594, -20.858856201171875, 53.78289794921875, 3.612184524536133, -48.317779541015625, 5.8754730224609375, 45.713218688964844, -1.4122543334960938, 37.13177490234375, 16.110671997070312, -1.8841094970703125, 10.301864624023438, 13.643196105957031, 62.69483947753906, -0.21323776245117188, -1.4868850708007812, 54.97359085083008, -3.8919448852539062, 1.3651885986328125, 54.704498291015625, 19.862380981445312, -3.059123992919922, 30.158981323242188, 25.118850708007812, 2.7125320434570312, 35.62879943847656, 53.49446105957031, 26.142135620117188, -27.448074340820312, -30.769969940185547, 30.015838623046875, -7.795270919799805, 50.21873474121094, 19.0991153717041, 39.98979187011719, 53.58892822265625, -11.647705078125, 15.15966796875, 61.299781799316406, 30.264686584472656, 28.201332092285156, 63.705360412597656, 4.0972900390625, 0.12725830078125, 1.9888858795166016, 7.396520614624023, 53.65458679199219, 50.114662170410156, 2.6763916015625, 25.63951873779297, 4.076911926269531, 42.4478759765625, -25.16922378540039, 22.04681396484375, 19.319507598876953, 18.586551666259766, -11.638500213623047, -9.699455261230469, -20.239830017089844, 60.55159378051758, 4.7503204345703125, 7.550811767578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000243.npy"}
{"epoch": 0.3673469387755102, "step": 244, "batch_size": 64, "mean": 23.37171173095703, "std": 25.796049118041992, "min": -38.5531005859375, "p10": -1.3637868881225577, "median": 17.905658721923828, "p90": 60.22068023681641, "max": 69.79051208496094, "pos_frac": 0.875, "sample": [9.063224792480469, 15.759574890136719, 12.874237060546875, 18.45795440673828, -16.801467895507812, 17.307233810424805, -2.6458358764648438, 58.900062561035156, 22.857072830200195, -20.557952880859375, 19.70962142944336, 53.23641586303711, 1.3442840576171875, 54.89796447753906, 25.970909118652344, 5.4793548583984375, 42.84016418457031, 62.456199645996094, -33.289512634277344, 54.036407470703125, 18.614990234375, 12.966987609863281, -38.5531005859375, 44.9276237487793, 9.095199584960938, 52.73839569091797, 28.301904678344727, 45.214263916015625, 60.786659240722656, 10.222780227661133, 66.74124145507812, 8.134788513183594, 2.874940872192383, -6.809638977050781, 1.1272506713867188, 11.884096145629883, -0.45591163635253906, 41.6641845703125, 52.34071350097656, 69.79051208496094, 1.662485122680664, 50.926788330078125, 0.9850788116455078, 13.402069091796875, 2.0986251831054688, 63.505794525146484, 20.513336181640625, 43.394775390625, 3.349559783935547, 16.123798370361328, 1.4812965393066406, 11.134468078613281, 58.66337966918945, 17.353363037109375, 64.27342224121094, 30.200729370117188, -1.7528762817382812, 23.172691345214844, 3.730327606201172, 5.488166809082031, 29.027542114257812, 24.703092575073242, 61.30409240722656, 57.54389953613281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000244.npy"}
{"epoch": 0.3688586545729403, "step": 245, "batch_size": 64, "mean": 26.814062118530273, "std": 27.470849990844727, "min": -37.57301330566406, "p10": -2.714118766784668, "median": 27.529865264892578, "p90": 64.28683166503907, "max": 76.68413543701172, "pos_frac": 0.859375, "sample": [30.57709503173828, 36.1824951171875, 32.32223129272461, 66.730224609375, 12.72047233581543, 4.480628967285156, 0.9915580749511719, 57.16523742675781, 52.560791015625, 33.25550079345703, 1.6371879577636719, 12.1182861328125, 59.88414001464844, 39.785579681396484, 32.53358459472656, 57.52156066894531, -4.0532379150390625, 54.443756103515625, -22.946197509765625, 76.68413543701172, 72.4303207397461, 3.0865554809570312, -0.149261474609375, 24.442842483520508, 53.63022232055664, 17.224594116210938, 15.184814453125, -2.3389739990234375, 56.93182373046875, 0.7691936492919922, 42.96308898925781, -31.106821060180664, 0.8119125366210938, 39.84247589111328, 5.3760986328125, 28.59105682373047, -8.884048461914062, 18.31812858581543, 48.54584884643555, 70.18152618408203, 26.676856994628906, 2.1409759521484375, 68.9552993774414, 10.098060607910156, -2.8748950958251953, 29.530866622924805, 62.06743621826172, 64.87569427490234, 0.7924957275390625, 3.31988525390625, 28.38287353515625, 53.2263069152832, -9.700714111328125, 46.99546432495117, 68.92893981933594, 22.830699920654297, -37.57301330566406, 62.912818908691406, 21.94792938232422, 3.738126754760742, 6.9122467041015625, 39.58905029296875, 16.883922576904297, 34.9961051940918], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000245.npy"}
{"epoch": 0.37037037037037035, "step": 246, "batch_size": 64, "mean": 22.51306915283203, "std": 33.81535720825195, "min": -57.862701416015625, "p10": -12.217256164550781, "median": 16.405277252197266, "p90": 63.244164657592776, "max": 90.79730224609375, "pos_frac": 0.75, "sample": [53.80764389038086, 60.094207763671875, 59.78466033935547, 59.47866439819336, 12.526611328125, 3.5991363525390625, -51.40645217895508, 61.81205749511719, 19.49120330810547, -8.561670303344727, 48.30387878417969, 58.12889099121094, 26.9007568359375, -3.613954544067383, 45.048004150390625, 44.29179763793945, 12.600105285644531, -4.0941314697265625, -5.176429748535156, -0.3944244384765625, 4.62115478515625, 44.26435470581055, 0.6050949096679688, 63.59724807739258, 77.06195831298828, -11.996017456054688, 20.857254028320312, 90.79730224609375, -12.813873291015625, 5.0644073486328125, 10.634309768676758, 62.42030334472656, 11.14453125, 5.781194686889648, -29.3978271484375, 25.067642211914062, -52.954185485839844, 10.136470794677734, 69.6291275024414, -0.0350341796875, 39.52965545654297, -34.69341278076172, -8.624504089355469, 69.11478424072266, 20.378917694091797, 2.53753662109375, 45.277503967285156, 59.38081359863281, 53.46369934082031, -57.862701416015625, 1.877105712890625, 21.995811462402344, 47.561729431152344, 79.0816421508789, 10.843372344970703, 2.6937637329101562, 58.460182189941406, 63.978782653808594, -1.2265625, -12.31207275390625, 29.062225341796875, 13.319351196289062, 47.31085205078125, 2.5818557739257812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000246.npy"}
{"epoch": 0.37188208616780044, "step": 247, "batch_size": 64, "mean": 12.170625686645508, "std": 29.670921325683594, "min": -66.2921142578125, "p10": -20.62179412841796, "median": 8.8781099319458, "p90": 53.88974761962891, "max": 87.58526611328125, "pos_frac": 0.6875, "sample": [-32.053367614746094, 12.42603874206543, 33.654144287109375, 3.9587326049804688, 2.9397926330566406, -8.682891845703125, 3.6640777587890625, 64.22370910644531, 68.40806579589844, -7.2408905029296875, 20.839401245117188, 26.748916625976562, 47.35224151611328, 24.33159065246582, 7.81646728515625, 29.606536865234375, 53.00187683105469, 19.165470123291016, -3.2077407836914062, -25.899169921875, 87.58526611328125, 59.987579345703125, -47.282005310058594, -4.485569000244141, 4.7806396484375, -5.5450439453125, 37.29665756225586, -4.283149719238281, -23.44135284423828, 34.581703186035156, 15.267181396484375, 6.1108856201171875, 9.912994384765625, 3.1799468994140625, -0.830810546875, 7.234504699707031, 42.56122589111328, -66.2921142578125, -11.838821411132812, 0.10813331604003906, -2.7528038024902344, 10.978958129882812, 0.5451507568359375, 54.270263671875, 59.30548095703125, 8.420459747314453, 0.4185752868652344, -12.414588928222656, 11.969867706298828, 30.850921630859375, 13.646652221679688, 33.88154602050781, -14.042823791503906, 10.888635635375977, -52.71436309814453, 32.42650604248047, 40.928653717041016, 55.11522674560547, -6.9235076904296875, -42.46913146972656, -7.22930908203125, 43.946319580078125, 14.876693725585938, 9.335760116577148], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000247.npy"}
{"epoch": 0.37339380196523053, "step": 248, "batch_size": 64, "mean": 23.729290008544922, "std": 35.61945724487305, "min": -43.60997009277344, "p10": -16.93644027709961, "median": 13.997364044189453, "p90": 70.18956451416015, "max": 90.0232925415039, "pos_frac": 0.71875, "sample": [5.039878845214844, 57.112884521484375, -30.010818481445312, -36.48728942871094, 80.7877426147461, 64.23391723632812, 55.909698486328125, 53.00830078125, -1.8076648712158203, -15.79279899597168, 0.8436374664306641, 69.46501159667969, 12.729888916015625, -34.59075927734375, 68.60783386230469, 34.49476623535156, 72.04083251953125, -11.504753112792969, -42.90764236450195, 12.331829071044922, -25.046897888183594, 4.793098449707031, 25.192344665527344, 11.128528594970703, 66.0811996459961, 48.80887222290039, 58.429359436035156, 10.906974792480469, 43.54697799682617, 69.477783203125, -16.02147674560547, 63.6417236328125, -1.3466644287109375, -15.035942077636719, 70.49461364746094, 68.21851348876953, 7.53839111328125, 17.351577758789062, 20.79871940612793, 15.841178894042969, 41.839317321777344, 56.04484558105469, 13.596641540527344, 81.33003234863281, 78.3665771484375, 31.10739517211914, 3.351766586303711, -1.2853279113769531, 4.600013732910156, 38.60070037841797, 90.0232925415039, 8.567150115966797, 30.661331176757812, 14.398086547851562, -43.60997009277344, 5.660310745239258, -7.6578369140625, 74.39413452148438, 65.22875213623047, -1.6047134399414062, -8.151578903198242, 6.730365753173828, -17.328567504882812, -4.491401672363281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000248.npy"}
{"epoch": 0.3749055177626606, "step": 249, "batch_size": 64, "mean": 9.62271785736084, "std": 33.44557189941406, "min": -82.80052185058594, "p10": -33.13845157623291, "median": 11.018248558044434, "p90": 54.31418838500977, "max": 68.68882751464844, "pos_frac": 0.703125, "sample": [17.576698303222656, -29.91596794128418, 5.2275848388671875, 40.398040771484375, 9.770103454589844, -17.820297241210938, -44.301536560058594, 32.43804931640625, 12.20318603515625, 32.14460754394531, 25.96449089050293, 52.72410202026367, 1.745697021484375, 53.585609436035156, -62.440093994140625, -15.288177490234375, 16.837188720703125, -39.482261657714844, -46.71185302734375, -27.808700561523438, 3.1291141510009766, 0.2942657470703125, -21.608924865722656, 6.817806243896484, 28.610740661621094, 12.322410583496094, -0.6636257171630859, -10.66961669921875, 65.50469970703125, -12.08477783203125, -22.078895568847656, 61.03776550292969, 11.367109298706055, 24.541105270385742, 39.84101486206055, -44.9609375, 17.29547882080078, 3.2109832763671875, 11.020566940307617, -23.73773193359375, 4.839262008666992, 67.84893798828125, 53.744781494140625, 43.8118896484375, 11.01593017578125, 7.836353302001953, -34.51951599121094, 28.744121551513672, 17.490150451660156, 11.116561889648438, 27.32969856262207, 63.7431755065918, 43.331695556640625, 1.4771347045898438, 15.295377731323242, 11.3458251953125, -14.679628372192383, 54.55821990966797, 68.68882751464844, -82.80052185058594, 67.51480102539062, -26.68379020690918, 4.392852783203125, 4.376739501953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000249.npy"}
{"epoch": 0.3764172335600907, "step": 250, "batch_size": 64, "mean": 24.05508041381836, "std": 32.998592376708984, "min": -84.99110412597656, "p10": -7.018776321411132, "median": 19.928841590881348, "p90": 68.60685653686525, "max": 84.43354034423828, "pos_frac": 0.8125, "sample": [45.315040588378906, 13.387868881225586, -7.267608642578125, 12.89326286315918, 17.45367431640625, 29.5562744140625, 36.463871002197266, 78.71434020996094, 69.69422149658203, 5.835838317871094, 52.741451263427734, -31.273178100585938, 36.64558410644531, 9.412155151367188, 44.844383239746094, 0.20316696166992188, 1.82281494140625, -84.99110412597656, 59.235511779785156, 1.8802719116210938, 29.180051803588867, -4.843441009521484, 25.779541015625, -34.70842742919922, 57.302024841308594, -9.378875732421875, 72.84251403808594, 35.21466064453125, 1.8714828491210938, 33.168731689453125, 11.379451751708984, 24.567794799804688, 66.06967163085938, 3.235797882080078, 29.783790588378906, 22.072265625, 0.31638336181640625, 62.375606536865234, 84.43354034423828, 16.575347900390625, -46.473228454589844, -1.8938026428222656, 30.536762237548828, 1.6458740234375, 77.69474792480469, 46.8641471862793, 39.27130126953125, 50.79770278930664, 71.40928649902344, 56.9190788269043, -1.60516357421875, -10.557609558105469, 46.18147277832031, 3.371429443359375, 2.95648193359375, 8.930549621582031, 78.90826416015625, 62.90418243408203, 12.359928131103516, -0.2762775421142578, -6.438167572021484, 64.54215240478516, 17.785417556762695, 13.8848876953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000250.npy"}
{"epoch": 0.3779289493575208, "step": 251, "batch_size": 64, "mean": 15.458715438842773, "std": 34.142791748046875, "min": -64.12371826171875, "p10": -27.896794128417966, "median": 15.290432929992676, "p90": 59.39466094970705, "max": 87.7523422241211, "pos_frac": 0.703125, "sample": [1.6894607543945312, 80.7568130493164, 81.13983917236328, -0.026372909545898438, 38.08293914794922, 42.11774444580078, -64.12371826171875, 19.509624481201172, 23.629371643066406, 87.7523422241211, 2.1777114868164062, -61.366294860839844, 8.954803466796875, -22.51873016357422, 6.935220718383789, -16.64571762084961, 30.238285064697266, 78.984375, 8.549581527709961, -54.07286071777344, -22.45752716064453, 76.66876983642578, -2.0991344451904297, -30.145645141601562, 25.967971801757812, -13.149940490722656, 16.771141052246094, 54.4083251953125, -10.936149597167969, 2.5201644897460938, 64.24966430664062, 49.745025634765625, 8.489986419677734, 20.884273529052734, 36.803401947021484, 24.876380920410156, 29.84778594970703, 29.659271240234375, -43.81584167480469, -1.0480384826660156, 47.18618392944336, 3.2471923828125, 53.91876983642578, 14.503780364990234, 10.33489990234375, 29.473541259765625, 9.945526123046875, 12.53460693359375, 19.675704956054688, 30.972320556640625, 61.53166198730469, 3.2079010009765625, 48.89231491088867, -24.813316345214844, -4.406074523925781, -1.1510181427001953, 36.98268127441406, -33.62957763671875, 22.650251388549805, -29.218284606933594, -15.720115661621094, 22.9276123046875, 45.229881286621094, 16.077085494995117], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000251.npy"}
{"epoch": 0.3794406651549509, "step": 252, "batch_size": 64, "mean": 14.927448272705078, "std": 37.03622817993164, "min": -83.53823852539062, "p10": -36.59866142272948, "median": 9.254096984863281, "p90": 64.60034637451173, "max": 88.49456787109375, "pos_frac": 0.734375, "sample": [25.786468505859375, -56.06495666503906, -9.003786087036133, -48.866119384765625, 41.426177978515625, -19.15619659423828, 5.857471466064453, 40.96913146972656, 0.7131156921386719, 77.89617156982422, 66.40362548828125, 40.459938049316406, 8.721511840820312, 4.657768249511719, 84.60324096679688, -10.83453369140625, 77.85447692871094, -47.717926025390625, 8.688941955566406, -0.19128799438476562, -44.27808380126953, 12.527399063110352, 29.231552124023438, 8.24959945678711, 43.2135009765625, 6.146087646484375, 9.78668212890625, -22.506851196289062, 61.84666061401367, 6.158336639404297, 36.821632385253906, 86.53239440917969, 57.08345413208008, 8.060111999511719, 28.03082275390625, 3.5277061462402344, 13.411239624023438, 17.35175895690918, 26.17630958557129, -26.05600357055664, 39.580814361572266, -16.225852966308594, 1.16790771484375, 51.538089752197266, 3.8653945922851562, 88.49456787109375, -22.303367614746094, 0.5615921020507812, -83.53823852539062, -43.2197151184082, -11.966867446899414, 4.154327392578125, 54.736175537109375, -41.116943359375, 6.867485046386719, 62.484153747558594, 24.64258575439453, 65.50728607177734, 16.45965576171875, 44.072792053222656, 40.21905517578125, 19.308494567871094, -13.313310623168945, 9.863052368164062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000252.npy"}
{"epoch": 0.38095238095238093, "step": 253, "batch_size": 64, "mean": 16.78863525390625, "std": 33.18363952636719, "min": -57.011898040771484, "p10": -25.55415630340576, "median": 12.643257141113281, "p90": 65.19212951660157, "max": 85.07160186767578, "pos_frac": 0.671875, "sample": [44.301265716552734, 1.1096115112304688, 77.99673461914062, -5.361274719238281, -1.3247928619384766, 28.326953887939453, 15.811271667480469, 57.30671310424805, 45.58421325683594, 18.53837776184082, 40.611358642578125, -8.159751892089844, 51.42338180541992, 2.2776870727539062, -3.061521530151367, -57.011898040771484, -16.706886291503906, 35.37646484375, -35.7286376953125, 41.80194854736328, 33.84016418457031, 37.6158447265625, -42.25997543334961, 40.979759216308594, 18.355987548828125, -2.4960784912109375, 65.59744262695312, 13.49066162109375, -3.5110855102539062, 27.21875762939453, 8.508636474609375, 7.331794738769531, 58.83034133911133, -3.8523216247558594, 5.7294921875, 84.06155395507812, 27.079193115234375, -45.15369415283203, 8.914962768554688, 30.066926956176758, 6.96875, -9.428817749023438, 12.688201904296875, -4.054210662841797, 75.41236114501953, -21.372373580932617, -13.494867324829102, 24.792388916015625, 78.30370330810547, 19.55765724182129, 12.598312377929688, 68.63619995117188, 85.07160186767578, 3.9420242309570312, 64.24639892578125, -26.2733154296875, 13.5181884765625, -25.638307571411133, 46.370079040527344, -25.357803344726562, -29.836910247802734, -4.535341262817383, 8.643932342529297, 10.25533676147461], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000253.npy"}
{"epoch": 0.382464096749811, "step": 254, "batch_size": 64, "mean": 20.583770751953125, "std": 34.1507568359375, "min": -63.840354919433594, "p10": -13.167141342163085, "median": 11.171609878540039, "p90": 72.81505355834963, "max": 85.04656982421875, "pos_frac": 0.65625, "sample": [-4.775634765625, 17.794662475585938, 63.3414306640625, 5.1422119140625, -4.1295013427734375, 51.428382873535156, 43.02875900268555, 67.07083129882812, 2.778360366821289, -0.000598907470703125, -3.4130401611328125, 9.239795684814453, 41.236183166503906, 59.966400146484375, 53.61054992675781, 59.31480026245117, 85.04656982421875, 25.189239501953125, -13.602455139160156, 39.94324493408203, -4.756507873535156, 84.50737762451172, -9.366508483886719, 75.27686309814453, -4.540351867675781, -63.840354919433594, -9.69580078125, 80.5577392578125, 20.49787712097168, -1.3506584167480469, 83.051513671875, -15.33575439453125, 5.8597412109375, -2.4360694885253906, -2.0969505310058594, 29.47956085205078, 1.9736671447753906, 58.80493927001953, 0.5641498565673828, 42.33732604980469, 13.103424072265625, -12.151409149169922, 15.407894134521484, -19.89751434326172, -11.636739730834961, 79.61478424072266, 3.4017105102539062, 76.50724792480469, 20.219879150390625, 13.25897216796875, 15.946792602539062, -19.790672302246094, -8.942527770996094, 64.25718688964844, 62.57905578613281, -4.728645324707031, 20.028297424316406, 3.363933563232422, 56.19355010986328, 32.57679748535156, 7.4767303466796875, -19.73968505859375, 5.384124755859375, -42.77381134033203], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000254.npy"}
{"epoch": 0.3839758125472411, "step": 255, "batch_size": 64, "mean": 22.705631256103516, "std": 38.189796447753906, "min": -65.41065979003906, "p10": -18.695563507080074, "median": 15.253151893615723, "p90": 82.60808715820315, "max": 99.1187744140625, "pos_frac": 0.71875, "sample": [77.67202758789062, 12.983566284179688, 99.1187744140625, 20.77471923828125, 60.56956481933594, 35.68060302734375, -2.6523513793945312, -13.768951416015625, 13.271625518798828, 59.94514465332031, -31.6929931640625, -0.8444347381591797, 20.40267562866211, 37.092262268066406, -45.24223327636719, 7.478343963623047, 38.77948760986328, 12.400611877441406, 16.530296325683594, -64.95201873779297, 77.881591796875, -7.254539489746094, 11.728668212890625, 9.6400146484375, 84.3218994140625, 38.46849822998047, -43.9034423828125, 24.466777801513672, 84.4346694946289, -6.775543212890625, 78.60919189453125, 51.463478088378906, -20.806968688964844, 37.26240921020508, -0.599945068359375, 13.976007461547852, 8.99528694152832, 38.06520080566406, 88.05562591552734, 7.560827255249023, 4.427549362182617, 35.02656555175781, 66.5081787109375, -2.20660400390625, 26.224565505981445, -2.0020599365234375, 13.899742126464844, 94.02470397949219, -43.57726287841797, 22.36963653564453, 2.5651016235351562, 84.37435913085938, 12.896080017089844, 85.71000671386719, 29.471702575683594, 21.205398559570312, 2.5508079528808594, 43.68437576293945, 65.15310668945312, -65.41065979003906, -0.9728927612304688, -1.7339859008789062, 29.951217651367188, -0.11573028564453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000255.npy"}
{"epoch": 0.3854875283446712, "step": 256, "batch_size": 64, "mean": 22.016427993774414, "std": 39.52329635620117, "min": -84.0827407836914, "p10": -20.17053470611572, "median": 20.36073112487793, "p90": 76.27729415893556, "max": 91.02099609375, "pos_frac": 0.703125, "sample": [-16.757041931152344, 28.179458618164062, 46.604736328125, 34.60847473144531, 42.51817321777344, -22.529937744140625, -20.547204971313477, 82.4227294921875, 31.134428024291992, 1.815439224243164, -34.97877502441406, 2.5334339141845703, -18.008316040039062, 90.75651550292969, 9.041580200195312, 88.1696548461914, 3.2382049560546875, 6.3889617919921875, -84.0827407836914, 45.700469970703125, 3.659465789794922, 36.490692138671875, -14.192991256713867, 15.768218994140625, -16.694305419921875, -61.08753967285156, 28.06597137451172, -8.453605651855469, 34.44463348388672, 63.63264465332031, -4.29534912109375, 36.799095153808594, 22.480445861816406, 77.758056640625, 72.1527099609375, 24.222801208496094, 78.11288452148438, 66.05878448486328, -8.182342529296875, 83.78045654296875, 19.963348388671875, 28.469697952270508, 91.02099609375, -22.435699462890625, 72.82218170166016, 20.758113861083984, 70.47531127929688, 4.682243347167969, 16.178876876831055, 66.02142333984375, -11.443832397460938, 69.8211898803711, 6.133941650390625, 5.82342529296875, 58.19647216796875, -19.291637420654297, 40.3631591796875, -7.802360534667969, -0.6485309600830078, 72.38795471191406, -57.06358337402344, 18.802963256835938, -14.155105590820312, 33.24183654785156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000256.npy"}
{"epoch": 0.3869992441421013, "step": 257, "batch_size": 64, "mean": 24.298580169677734, "std": 39.94623947143555, "min": -77.30756378173828, "p10": -15.396185302734368, "median": 12.542634963989258, "p90": 82.09365692138672, "max": 113.31301879882812, "pos_frac": 0.734375, "sample": [-0.8745059967041016, 8.1090087890625, -8.546661376953125, 113.31301879882812, 91.73927307128906, 26.513904571533203, 1.9984378814697266, 54.58411407470703, 20.404747009277344, 58.150238037109375, 81.67294311523438, 8.612800598144531, 3.4287891387939453, -5.7859039306640625, 62.75482177734375, 61.83592987060547, 82.26974487304688, 1.4362030029296875, -21.744033813476562, -5.148460388183594, 2.411226272583008, 22.552871704101562, 25.67315673828125, 15.751701354980469, 29.39557647705078, -3.7263031005859375, -1.2907295227050781, 69.50093078613281, 53.09955596923828, -1.2351226806640625, 71.54714965820312, 84.56866455078125, 8.969482421875, 3.447906494140625, 22.054298400878906, 9.665916442871094, -18.331695556640625, 77.34687805175781, 96.31576538085938, -54.19345474243164, -77.30756378173828, 67.4075927734375, 25.112403869628906, 27.570579528808594, 3.0271759033203125, -31.24110221862793, 0.19211578369140625, -1.3867645263671875, 41.079872131347656, 49.192771911621094, 47.73724365234375, 4.1986083984375, 88.00841522216797, 13.26578140258789, -56.68353271484375, 87.505615234375, -1.6635856628417969, -28.296829223632812, 81.68278503417969, 9.052459716796875, 43.15945053100586, 11.819488525390625, 9.558303833007812, -6.130317687988281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000257.npy"}
{"epoch": 0.3885109599395314, "step": 258, "batch_size": 64, "mean": 21.659503936767578, "std": 29.889745712280273, "min": -52.05229187011719, "p10": -5.058204078674314, "median": 18.47098159790039, "p90": 58.46318283081057, "max": 95.244140625, "pos_frac": 0.859375, "sample": [25.936492919921875, 4.038372039794922, 27.061485290527344, -3.131216049194336, 7.037574768066406, 72.600830078125, 92.95156860351562, 13.93267822265625, 48.53431701660156, 4.130521774291992, 11.383749008178711, 17.122657775878906, -52.05229187011719, 22.339921951293945, 31.941810607910156, 47.62434387207031, -5.884056091308594, -12.780284881591797, 26.601150512695312, -43.287147521972656, -0.19759750366210938, 40.25244903564453, -42.575645446777344, 33.32860565185547, 5.9857177734375, 40.38044738769531, 15.2088623046875, 35.7816162109375, 15.150630950927734, 13.04803466796875, 14.8369140625, 3.9172096252441406, 54.16712188720703, 49.37109375, 15.434709548950195, 1.7606277465820312, 6.432891845703125, 12.093826293945312, 22.753570556640625, 95.244140625, 30.396636962890625, -6.110870361328125, 84.59521484375, 37.127227783203125, 21.557785034179688, 78.790771484375, 45.40650939941406, 36.801700592041016, 3.3110580444335938, 69.2205581665039, 1.1847457885742188, 21.391372680664062, 34.71819305419922, 22.009597778320312, 9.821311950683594, 8.89146614074707, 19.819305419921875, -49.9967041015625, 7.3031768798828125, 2.0603885650634766, 28.71288299560547, 39.42533874511719, 60.304351806640625, 10.988546371459961], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000258.npy"}
{"epoch": 0.3900226757369615, "step": 259, "batch_size": 64, "mean": 33.63237762451172, "std": 39.632965087890625, "min": -54.85014343261719, "p10": -20.65388755798338, "median": 39.219900131225586, "p90": 85.4992462158203, "max": 99.2623291015625, "pos_frac": 0.8125, "sample": [-1.080780029296875, -50.31640625, -54.68394470214844, 71.14433288574219, 21.73809814453125, 75.41685485839844, 85.4627914428711, 63.645263671875, 49.29264831542969, -54.85014343261719, 51.62806701660156, 3.5865631103515625, 30.461549758911133, 55.876556396484375, 52.906192779541016, -3.0133209228515625, -38.2974853515625, 3.020599365234375, 91.1500244140625, 23.146575927734375, 46.73593521118164, 63.97624206542969, 74.93002319335938, 87.00950622558594, 49.46167755126953, -4.586841583251953, -27.539764404296875, 35.280235290527344, 2.8915481567382812, 8.833559036254883, 66.32073974609375, 55.795989990234375, -3.9115943908691406, 23.627525329589844, 55.80950164794922, 38.92108154296875, 84.92072296142578, 2.681060791015625, 56.87449645996094, 38.33885192871094, 85.5148696899414, 39.51871871948242, 24.69178009033203, 99.2623291015625, 5.9130096435546875, 4.871734619140625, -34.714866638183594, -1.219045639038086, -54.66724395751953, 1.1657943725585938, 91.90586853027344, 88.47590637207031, 62.28155517578125, 48.004180908203125, 43.46614074707031, 53.83008575439453, 34.599365234375, 28.074462890625, 16.80193328857422, 33.99897003173828, 62.15208435058594, 90.00940704345703, 52.62664794921875, 43.30403137207031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000259.npy"}
{"epoch": 0.3915343915343915, "step": 260, "batch_size": 64, "mean": 25.3404541015625, "std": 41.61600875854492, "min": -67.58354187011719, "p10": -25.793962097167963, "median": 16.418731689453125, "p90": 83.17778930664063, "max": 92.94204711914062, "pos_frac": 0.6875, "sample": [81.92672729492188, -41.04145812988281, 62.83854293823242, -18.15865707397461, -2.648052215576172, 90.90240478515625, 23.262466430664062, 15.399162292480469, 86.87318420410156, -36.670135498046875, 78.70709228515625, 44.41718292236328, 76.28141784667969, 42.24003601074219, 54.92994689941406, -21.572525024414062, 75.24452209472656, 41.681480407714844, 11.991361618041992, 91.80175018310547, 65.1668930053711, 11.086135864257812, 87.42916870117188, 85.23885345458984, -5.445770263671875, 14.928943634033203, 74.40129089355469, 83.71395874023438, 4.537517547607422, -2.09423828125, 12.727096557617188, 12.873001098632812, 58.703025817871094, 13.392585754394531, 0.41063880920410156, 46.504634857177734, 12.670005798339844, -15.350273132324219, -67.58354187011719, 20.97399139404297, 2.239107131958008, 43.33599853515625, -0.6155910491943359, -4.9267578125, 17.43830108642578, 47.96430969238281, 92.94204711914062, -14.345611572265625, 79.67048645019531, -17.06829833984375, -49.06941223144531, 45.84892272949219, 27.7901611328125, 6.80828857421875, 59.45281982421875, 39.604949951171875, -37.0615234375, 65.42401123046875, 47.09735107421875, -27.6031494140625, -0.7488670349121094, -55.17668914794922, -19.317211151123047, -0.5849685668945312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000260.npy"}
{"epoch": 0.3930461073318216, "step": 261, "batch_size": 64, "mean": 21.40045928955078, "std": 36.345767974853516, "min": -48.82720947265625, "p10": -21.841885757446285, "median": 16.095584869384766, "p90": 77.3747772216797, "max": 99.54679107666016, "pos_frac": 0.703125, "sample": [3.165557861328125, 24.928253173828125, 28.84438133239746, 2.9002552032470703, -9.265151977539062, 51.70975875854492, -1.3715648651123047, 90.96446228027344, 18.60186767578125, -11.779268264770508, -38.74987030029297, -10.315505981445312, -1.5955352783203125, 78.07896423339844, 55.38813018798828, -9.54693603515625, -1.8548049926757812, 90.54898071289062, 4.958818435668945, 82.37635803222656, 9.426597595214844, 31.587425231933594, -16.824016571044922, -23.992401123046875, 70.36459350585938, 24.47850799560547, 99.54679107666016, 18.526966094970703, -4.383827209472656, 27.88139533996582, 38.31624221801758, 48.98741912841797, 39.35899353027344, 80.092041015625, 23.433507919311523, 0.3537731170654297, 79.96891784667969, -48.82720947265625, -32.17841339111328, 58.47740173339844, 17.25104522705078, 5.570158004760742, 55.20085525512695, 56.329307556152344, 14.633243560791016, -40.489295959472656, 3.1239757537841797, -33.72947692871094, -46.087059020996094, 14.1146240234375, 24.811317443847656, -7.7160797119140625, 56.462738037109375, 75.73167419433594, 28.511123657226562, -12.697792053222656, 34.32229232788086, 14.94012451171875, -2.0659942626953125, 52.14744567871094, 9.199783325195312, 62.92234802246094, 14.474365234375, 0.08679389953613281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000261.npy"}
{"epoch": 0.3945578231292517, "step": 262, "batch_size": 64, "mean": 24.20038604736328, "std": 35.03689956665039, "min": -48.657981872558594, "p10": -24.6344856262207, "median": 17.257442474365234, "p90": 68.5267433166504, "max": 96.14093017578125, "pos_frac": 0.734375, "sample": [-32.01129150390625, 45.8973388671875, 14.703437805175781, 2.162160873413086, -4.808837890625, 56.72012710571289, 8.7158203125, 5.4142608642578125, 16.396827697753906, 80.01576232910156, -7.716575622558594, 56.459693908691406, -24.234054565429688, 7.318906784057617, 55.234745025634766, -29.51910400390625, 31.772600173950195, 78.37617492675781, 24.49969482421875, 39.26449966430664, -48.657981872558594, -4.164813995361328, 3.0289382934570312, 18.118057250976562, 31.827964782714844, 62.78195571899414, 12.270111083984375, 96.14093017578125, -29.74908447265625, 8.708150863647461, 68.53630828857422, 48.34342956542969, 16.205093383789062, 66.15254211425781, -4.193443298339844, 83.87468719482422, 14.922941207885742, 38.7653923034668, 88.82504272460938, 1.613311767578125, -1.7462749481201172, -28.70831298828125, -23.042236328125, 63.56590270996094, 68.50442504882812, 39.43415832519531, 39.633453369140625, 5.142679214477539, -26.479751586914062, 20.769405364990234, 49.024261474609375, -2.023468017578125, 66.31806182861328, -3.087972640991211, 37.01966857910156, 26.349178314208984, 66.2322998046875, 72.06387329101562, 4.17791748046875, 45.10725402832031, 3.3762664794921875, -6.369972229003906, 60.35828399658203, -24.80609893798828], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000262.npy"}
{"epoch": 0.3960695389266818, "step": 263, "batch_size": 64, "mean": 20.865982055664062, "std": 35.84383010864258, "min": -65.5619888305664, "p10": -16.133981895446777, "median": 19.136926651000977, "p90": 65.13028488159179, "max": 91.53880310058594, "pos_frac": 0.6875, "sample": [28.471542358398438, 49.424869537353516, -44.430015563964844, -7.8261260986328125, -13.300786972045898, -8.949897766113281, 52.125732421875, 0.417694091796875, 65.23809051513672, 64.21772003173828, 36.91595458984375, 1.538167953491211, -2.5435028076171875, 43.20998764038086, 2.39056396484375, -36.33600616455078, 58.133453369140625, -5.4810638427734375, 9.092056274414062, -10.645851135253906, 39.95127868652344, 64.95108032226562, 89.88102722167969, 22.316646575927734, -4.728126525878906, 1.251455307006836, 69.9587631225586, 46.813926696777344, 65.2011947631836, 58.88707733154297, 87.86709594726562, 15.194686889648438, -16.3931884765625, 4.0794677734375, -11.340755462646484, 27.714210510253906, -65.5619888305664, 56.76860427856445, 28.270782470703125, 60.727691650390625, 13.481414794921875, -2.8755874633789062, -10.375289916992188, 64.96482849121094, -16.184600830078125, 7.569854736328125, 27.0816650390625, 57.684669494628906, 44.08134460449219, 18.295196533203125, -8.96197509765625, 91.53880310058594, -0.980255126953125, 1.6447639465332031, 22.7335205078125, -16.015871047973633, 22.59326934814453, 54.36487579345703, 19.978656768798828, -16.82151222229004, 1.1728668212890625, -65.07945251464844, 72.36453247070312, 29.693580627441406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000263.npy"}
{"epoch": 0.3975812547241119, "step": 264, "batch_size": 64, "mean": 23.831483840942383, "std": 38.00233459472656, "min": -71.97615051269531, "p10": -23.715273284912108, "median": 20.32912826538086, "p90": 74.4424545288086, "max": 99.99903869628906, "pos_frac": 0.71875, "sample": [41.533935546875, -24.890716552734375, 15.610389709472656, 0.8404521942138672, 1.7099742889404297, -9.390268325805664, 3.427947998046875, -1.8752365112304688, 43.77011489868164, 25.315673828125, -28.38309669494629, 4.007917404174805, -71.97615051269531, 37.42144775390625, -16.659019470214844, 16.482345581054688, 50.025047302246094, 49.39912414550781, 11.341468811035156, 60.916717529296875, 50.591156005859375, 39.2789306640625, 72.74894714355469, 41.73988342285156, 43.06834411621094, 99.99903869628906, -0.5604991912841797, 22.713790893554688, -11.934249877929688, 51.40673828125, 13.052574157714844, 2.9933242797851562, 63.668418884277344, 1.6070175170898438, -15.55288314819336, 26.054641723632812, 86.60606384277344, -27.104331970214844, 56.46044158935547, 91.99951171875, -17.938308715820312, -20.972572326660156, 17.94446563720703, 80.20600891113281, 52.72051239013672, 61.70353317260742, -37.74742126464844, 38.35423278808594, -36.321014404296875, 17.437042236328125, -20.9444580078125, 2.4542160034179688, 90.81671905517578, 34.62495040893555, 37.56711196899414, -27.744457244873047, 16.11054229736328, 63.98699951171875, 25.616012573242188, -2.6670188903808594, -6.8739776611328125, 98.77940368652344, 65.46923828125, 75.16824340820312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000264.npy"}
{"epoch": 0.39909297052154197, "step": 265, "batch_size": 64, "mean": 28.10588836669922, "std": 45.01197814941406, "min": -79.43333435058594, "p10": -30.266352081298827, "median": 24.328593254089355, "p90": 88.88290481567383, "max": 98.28874206542969, "pos_frac": 0.75, "sample": [6.090545654296875, 3.884725570678711, -2.7984161376953125, -30.975265502929688, 14.224374771118164, -0.8197402954101562, 33.10491180419922, 17.962120056152344, -33.81291198730469, 89.02946472167969, 31.058460235595703, 9.20361328125, 38.684818267822266, 53.69541931152344, -79.43333435058594, -7.533958435058594, 36.28021240234375, 2.6628265380859375, 14.113912582397461, -65.86563873291016, 25.554039001464844, 71.06716918945312, 78.97432708740234, 20.355253219604492, 5.324274063110352, -7.803165435791016, 69.35739135742188, 7.1009063720703125, 32.56114196777344, 23.103147506713867, 94.83158874511719, 98.28874206542969, -28.612220764160156, 87.38066864013672, 56.41584777832031, 92.76367950439453, 3.157217025756836, 5.104526519775391, -7.582305908203125, -22.708297729492188, 31.360023498535156, 77.48139953613281, 85.61746978759766, 88.54093170166016, 91.96977996826172, 67.49590301513672, 93.28820037841797, 44.48009490966797, 13.261955261230469, -43.53917694091797, 20.646026611328125, 25.631942749023438, -7.742206573486328, 85.65989685058594, 75.93722534179688, 81.65077209472656, 92.48822021484375, -58.18208312988281, 42.167999267578125, -9.19887924194336, 56.448631286621094, 3.9698715209960938, 60.40052032470703, -54.44781494140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000265.npy"}
{"epoch": 0.40060468631897206, "step": 266, "batch_size": 64, "mean": 25.83509635925293, "std": 39.261451721191406, "min": -71.61661529541016, "p10": -11.072357368469238, "median": 21.235530853271484, "p90": 82.49664611816407, "max": 109.16537475585938, "pos_frac": 0.734375, "sample": [-6.2207489013671875, 83.42643737792969, 83.6796875, 5.547519683837891, 41.367576599121094, -1.3421573638916016, -4.312950134277344, -46.49363708496094, 3.9252758026123047, 22.173423767089844, 34.20011520385742, 4.75445556640625, 7.404747009277344, 48.16948699951172, 46.6585693359375, 53.92365646362305, 7.6619110107421875, 109.16537475585938, -9.097492218017578, -11.118759155273438, 92.55337524414062, 25.475112915039062, -4.1566619873046875, 94.8415298461914, 43.399131774902344, 90.89537048339844, 6.8955078125, 65.94721984863281, 37.894859313964844, -16.764373779296875, 74.71322631835938, 24.192298889160156, 78.26811981201172, -2.5471954345703125, 36.252079010009766, -4.924995422363281, 37.707984924316406, 48.57464599609375, 57.19145584106445, -2.0212135314941406, 0.830322265625, -37.21635437011719, -32.626991271972656, 8.677860260009766, -71.61661529541016, 11.285171508789062, 20.297637939453125, 14.068618774414062, 30.209861755371094, 60.849754333496094, -50.675960540771484, 2.596853256225586, -5.32403564453125, 32.060272216796875, 77.339599609375, 41.14034652709961, 93.7395248413086, -10.964086532592773, 61.54058837890625, 4.954063415527344, 58.413230895996094, 5.210744857788086, 80.32713317871094, 0.4686393737792969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000266.npy"}
{"epoch": 0.4021164021164021, "step": 267, "batch_size": 64, "mean": 19.258594512939453, "std": 40.35269546508789, "min": -93.19387817382812, "p10": -26.09142742156982, "median": 17.222091674804688, "p90": 75.76970520019532, "max": 100.6767578125, "pos_frac": 0.703125, "sample": [52.39349365234375, 0.9080352783203125, 2.0071678161621094, -65.13394927978516, 4.6420135498046875, -12.030391693115234, 67.63113403320312, 86.89707946777344, 58.96482849121094, 4.7233734130859375, -6.262197494506836, 22.123611450195312, 1.5053596496582031, 97.14205169677734, 3.6789932250976562, -37.265167236328125, 24.138259887695312, -34.96244430541992, -20.122039794921875, -72.71340942382812, -28.649736404418945, -15.269580841064453, -4.201713562011719, 39.16337585449219, 66.24299621582031, 78.60862731933594, -0.10235595703125, 12.61968994140625, 64.21549987792969, -20.009124755859375, 28.646385192871094, 1.5010566711425781, 73.57791137695312, 41.025875091552734, -12.284111022949219, 20.427173614501953, 34.43199157714844, -1.9549560546875, 2.1606292724609375, 100.6767578125, 23.801605224609375, 0.38130950927734375, 68.771484375, -1.3916263580322266, 76.70904541015625, 46.062679290771484, 79.07221221923828, 20.996131896972656, -93.19387817382812, 77.66083526611328, 17.803253173828125, -30.837799072265625, -2.2440433502197266, 16.64093017578125, 18.664260864257812, 9.90274429321289, 44.36016845703125, 40.268211364746094, 70.76799774169922, 19.411911010742188, 1.5723381042480469, 50.95438003540039, 18.51441192626953, -1.1907844543457031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000267.npy"}
{"epoch": 0.4036281179138322, "step": 268, "batch_size": 64, "mean": 25.0127010345459, "std": 33.90481185913086, "min": -44.986968994140625, "p10": -6.723542022705077, "median": 17.039085388183594, "p90": 77.10038070678712, "max": 98.5246353149414, "pos_frac": 0.703125, "sample": [8.935455322265625, 98.5246353149414, 0.022541046142578125, -5.465850830078125, 63.287261962890625, 77.64546966552734, 21.820838928222656, -5.154365539550781, -15.993610382080078, 42.20561599731445, 69.94047546386719, -0.28118896484375, -0.7394390106201172, 10.016193389892578, 13.205024719238281, 10.063606262207031, 43.243690490722656, -12.02984619140625, -2.3585758209228516, 75.82850646972656, 51.319908142089844, 61.45277404785156, -1.8089771270751953, 6.0980377197265625, -6.1208953857421875, 18.819580078125, 11.555953979492188, 42.76426696777344, 15.53118896484375, 2.416229248046875, 67.77738952636719, 3.818714141845703, 20.907649993896484, 20.75585174560547, -8.371795654296875, 80.58666229248047, -3.3205108642578125, -12.012187957763672, -38.087188720703125, 19.29714584350586, 49.743438720703125, 4.860139846801758, 1.0370559692382812, 78.72540283203125, 50.010353088378906, 43.958892822265625, 75.80464172363281, -1.737152099609375, 18.546981811523438, -44.986968994140625, -2.1322784423828125, -6.981819152832031, 45.110206604003906, -0.39890098571777344, 97.0341796875, 51.48548889160156, 18.55987548828125, 23.23143768310547, 90.83076477050781, 30.179502487182617, -0.4444465637207031, 90.76860046386719, 37.53419876098633, 3.9769515991210938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000268.npy"}
{"epoch": 0.4051398337112623, "step": 269, "batch_size": 64, "mean": 27.667041778564453, "std": 37.95743179321289, "min": -69.98845672607422, "p10": -7.39667739868164, "median": 19.970104217529297, "p90": 84.21973876953126, "max": 107.43902587890625, "pos_frac": 0.75, "sample": [-2.1602783203125, -27.406326293945312, 7.7080230712890625, 77.62982177734375, 22.395233154296875, -3.0590286254882812, 20.28002166748047, 72.50696563720703, 13.433189392089844, -4.8273468017578125, -7.569915771484375, 22.979122161865234, 44.91736602783203, 21.87096405029297, 94.34449768066406, 88.31423950195312, -4.534862518310547, 37.156036376953125, 101.80712127685547, -48.81652069091797, 21.948089599609375, -9.2467041015625, 74.93197631835938, 19.660186767578125, 4.33880615234375, 87.16303253173828, -15.547433853149414, 107.43902587890625, 37.793853759765625, 84.92345428466797, 12.517122268676758, 17.151962280273438, -6.992454528808594, 59.16340637207031, -2.5405426025390625, 3.912313461303711, 2.2725296020507812, 26.363479614257812, -1.6576766967773438, 67.88670349121094, 82.5777359008789, 37.803836822509766, 10.296241760253906, 7.849004745483398, 80.4910659790039, -25.25152015686035, 46.837738037109375, 13.946151733398438, -0.7192192077636719, 2.9511795043945312, 64.29220581054688, 0.6549072265625, 87.70086669921875, 9.257377624511719, -1.1501388549804688, -69.98845672607422, 7.0838623046875, 58.63706970214844, 28.098129272460938, 47.555091857910156, 55.660362243652344, 22.656234741210938, 70.7517318725586, 14.2498779296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000269.npy"}
{"epoch": 0.40665154950869237, "step": 270, "batch_size": 64, "mean": 21.389820098876953, "std": 39.946868896484375, "min": -83.50619506835938, "p10": -12.670439147949217, "median": 13.629142761230469, "p90": 78.75046691894532, "max": 102.88921356201172, "pos_frac": 0.765625, "sample": [39.380393981933594, 94.0407943725586, -13.15386962890625, 92.37272644042969, 37.789371490478516, 92.09074401855469, 8.544857025146484, 21.112125396728516, 47.863616943359375, 22.09686279296875, 102.88921356201172, 55.08778381347656, 13.491653442382812, 56.63743591308594, 2.477001190185547, -34.57695007324219, 19.361663818359375, 45.70153045654297, 16.968891143798828, 61.141143798828125, 24.31591033935547, 56.71003723144531, 10.135169982910156, -41.64365768432617, -6.640525817871094, 4.941875457763672, 68.00399017333984, -83.50619506835938, 21.19613265991211, 8.211833953857422, 5.3036346435546875, 4.24237060546875, 3.0965652465820312, 27.961875915527344, -3.6135177612304688, 5.12738037109375, 93.21573638916016, -23.439971923828125, 16.27767562866211, -82.95625305175781, 79.60125732421875, 65.03331756591797, 64.08601379394531, -8.60038948059082, 9.566797256469727, 22.709938049316406, -3.272216796875, 7.556632995605469, 81.83387756347656, 33.29900360107422, 76.76528930664062, 0.7367916107177734, -11.542434692382812, 56.500762939453125, 2.2039871215820312, -6.182395935058594, -0.28232574462890625, -68.7590103149414, 40.2725830078125, 9.224128723144531, 11.681686401367188, -2.797637939453125, 7.289039611816406, 13.766632080078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000270.npy"}
{"epoch": 0.40816326530612246, "step": 271, "batch_size": 64, "mean": 18.96731185913086, "std": 45.364017486572266, "min": -79.93426513671875, "p10": -26.095703125, "median": 5.213886260986328, "p90": 91.33197860717775, "max": 124.53167724609375, "pos_frac": 0.609375, "sample": [24.159805297851562, 36.27483367919922, -6.6372833251953125, 10.607101440429688, 99.26226043701172, 68.51800537109375, 13.199129104614258, 2.4826278686523438, 83.43389129638672, -33.48419952392578, 44.00244140625, -4.552585601806641, 109.71475219726562, -13.447853088378906, 46.287269592285156, 91.99061584472656, 2.1899776458740234, 7.2382965087890625, 0.8147087097167969, 101.64036560058594, 8.727783203125, 2.5020904541015625, 69.9734115600586, -9.00301742553711, -9.86083984375, 0.43625640869140625, -26.228866577148438, 45.37257385253906, -3.6447601318359375, -18.151948928833008, 6.486940383911133, 59.085880279541016, 20.97369384765625, -62.648895263671875, -4.10736083984375, 77.66805267333984, 15.678974151611328, 4.515296936035156, 48.83794403076172, 89.79515838623047, -2.1386032104492188, -40.173545837402344, 124.53167724609375, -37.162864685058594, -6.905677795410156, 5.9124755859375, -3.6622848510742188, -62.88189697265625, -79.93426513671875, -1.1175308227539062, 16.399555206298828, -13.596542358398438, 86.19257354736328, 95.49668884277344, 38.958335876464844, -0.7939453125, -22.386688232421875, 45.785911560058594, -5.872261047363281, 3.9967384338378906, -12.821455001831055, -25.784988403320312, 103.80852508544922, 7.955535888671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000271.npy"}
{"epoch": 0.40967498110355255, "step": 272, "batch_size": 64, "mean": 37.57149124145508, "std": 36.19429397583008, "min": -33.594608306884766, "p10": -0.6935497283935546, "median": 37.54483604431152, "p90": 94.18352432250977, "max": 107.01348114013672, "pos_frac": 0.859375, "sample": [11.159370422363281, -1.5271530151367188, -13.350906372070312, 44.975582122802734, 46.86505126953125, 100.54723358154297, 25.325424194335938, 87.9698257446289, 81.70801544189453, 5.924774169921875, 65.00029754638672, 36.45988082885742, 12.263542175292969, 1.6361770629882812, 55.174964904785156, 41.39237976074219, 56.349220275878906, 3.1319580078125, 75.43678283691406, 56.19529342651367, 78.736328125, 10.841163635253906, 60.89118576049805, 57.19292449951172, 102.40221405029297, 11.092277526855469, 38.629791259765625, -23.10137176513672, 12.012275695800781, 102.00052642822266, 40.111000061035156, 101.98570251464844, 60.53215026855469, 0.3347892761230469, 44.30982971191406, 20.737590789794922, 17.114778518676758, 0.5587196350097656, 21.657363891601562, -33.594608306884766, 68.65985870361328, 101.21064758300781, 107.01348114013672, 36.188941955566406, 1.6425361633300781, 10.062332153320312, 92.16795349121094, 43.435340881347656, 42.86801528930664, 71.50411224365234, 31.579092025756836, 72.4825668334961, -8.899429321289062, 0.7271575927734375, 13.381797790527344, 19.473175048828125, 52.030418395996094, 39.279876708984375, -10.263553619384766, -0.21065902709960938, -0.5541553497314453, -0.7532901763916016, 95.0473403930664, 9.419546127319336], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000272.npy"}
{"epoch": 0.41118669690098264, "step": 273, "batch_size": 64, "mean": 25.617786407470703, "std": 50.630043029785156, "min": -99.61708068847656, "p10": -42.30457763671875, "median": 26.440750122070312, "p90": 87.43654022216796, "max": 138.27603149414062, "pos_frac": 0.71875, "sample": [-0.044033050537109375, 47.79517364501953, 87.21444702148438, -42.40269470214844, 60.8367919921875, -5.209892272949219, -68.50103759765625, 61.46730422973633, -83.64332580566406, 50.19867706298828, 30.805374145507812, -42.07563781738281, 38.57014465332031, -1.9683284759521484, 5.722938537597656, 36.341915130615234, 91.43250274658203, 14.661144256591797, 72.88568878173828, 84.36078643798828, 78.09090423583984, 9.302875518798828, -5.457004547119141, -1.3829154968261719, -99.61708068847656, 66.5770034790039, 53.65631103515625, 2.4043502807617188, -26.992799758911133, 2.439105987548828, 51.59996032714844, 138.27603149414062, 87.53172302246094, -84.09757995605469, -11.449142456054688, 40.901649475097656, 53.06071090698242, 63.57429885864258, 4.689521789550781, 74.64131164550781, -60.06153106689453, 22.076126098632812, 55.80809783935547, -9.765403747558594, 32.5986442565918, 118.20045471191406, 102.1067886352539, 49.083953857421875, 8.943519592285156, 53.10200881958008, 1.4481258392333984, 21.556320190429688, 0.8806629180908203, -10.066947937011719, 13.239044189453125, 95.97191619873047, 1.0394783020019531, 43.2314453125, 21.315113067626953, -13.148979187011719, 77.65010070800781, 99.30924987792969, -57.92818069458008, 36.75122833251953], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000273.npy"}
{"epoch": 0.4126984126984127, "step": 274, "batch_size": 64, "mean": 28.930540084838867, "std": 48.54859924316406, "min": -88.0556869506836, "p10": -26.47051448822021, "median": 24.040225982666016, "p90": 95.13912429809571, "max": 106.32150268554688, "pos_frac": 0.734375, "sample": [31.25861358642578, -34.058921813964844, -5.175102233886719, -0.9058074951171875, -12.509902954101562, -74.46673583984375, 82.35159301757812, 82.80424499511719, 14.133228302001953, 13.977903366088867, 50.80940246582031, 80.19886016845703, -3.0638656616210938, -15.34521484375, 34.61602783203125, 106.32150268554688, 99.93936157226562, -51.102088928222656, 15.99356460571289, 14.434364318847656, 95.58908081054688, 88.35034942626953, 50.98128128051758, 93.7883529663086, 57.402183532714844, 2.21173095703125, 94.08922576904297, 102.88896179199219, 26.244123458862305, 3.835235595703125, 2.7797794342041016, 2.666522979736328, 87.29593658447266, -5.722511291503906, -12.741861343383789, 37.665401458740234, -60.591644287109375, -21.352569580078125, 65.55029296875, 6.4516448974609375, -28.66391944885254, 49.360992431640625, 8.36330795288086, 15.008041381835938, -8.529607772827148, 23.333541870117188, 25.732383728027344, -88.0556869506836, 29.442611694335938, 26.48526382446289, 97.63480377197266, 92.4382095336914, 3.965679168701172, 29.452346801757812, 7.719268798828125, -12.03312873840332, -50.17470169067383, 8.148881912231445, 99.88056945800781, 92.9046630859375, 60.09739303588867, 24.746910095214844, 105.80260467529297, 90.90157318115234], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000274.npy"}
{"epoch": 0.41421012849584277, "step": 275, "batch_size": 64, "mean": 30.950244903564453, "std": 43.15102005004883, "min": -100.84375762939453, "p10": -12.5193977355957, "median": 20.979830741882324, "p90": 96.19285659790042, "max": 110.87442779541016, "pos_frac": 0.78125, "sample": [27.426559448242188, 8.420907974243164, -7.285728454589844, 51.51683044433594, 89.88900756835938, 12.619014739990234, 35.66712188720703, -100.84375762939453, 103.05928039550781, 83.85943603515625, 16.869400024414062, 16.88726806640625, 0.12364959716796875, 99.78997039794922, -1.1027755737304688, 10.878917694091797, 75.6435546875, 25.596981048583984, 91.4339599609375, -19.429672241210938, -67.61190795898438, 105.8039321899414, 11.383918762207031, 5.132423400878906, 68.70134735107422, -0.40546417236328125, -29.023509979248047, 8.214164733886719, -1.0367393493652344, 10.664321899414062, 15.359619140625, 110.87442779541016, 33.08605194091797, 27.129409790039062, 98.23238372802734, 64.60606384277344, 48.649436950683594, 21.441381454467773, -8.257217407226562, -0.2765350341796875, 32.79841613769531, 10.8406982421875, 20.518280029296875, 1.433370590209961, 12.10515022277832, 102.90389251708984, 41.95048522949219, 79.18498992919922, -14.346046447753906, 48.263118743896484, 1.1264076232910156, 2.0651092529296875, -2.2792911529541016, 12.446090698242188, 99.50495147705078, 77.82024383544922, -14.547664642333984, 57.53515625, 61.3516845703125, 79.40624237060547, 56.126136779785156, 30.569412231445312, -22.09584617614746, 62.44727325439453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000275.npy"}
{"epoch": 0.41572184429327286, "step": 276, "batch_size": 64, "mean": 28.09267234802246, "std": 43.017784118652344, "min": -82.77752685546875, "p10": -12.268635559082028, "median": 22.6466064453125, "p90": 88.06622314453125, "max": 110.55722045898438, "pos_frac": 0.78125, "sample": [88.65811157226562, 6.553092956542969, -7.834905624389648, 54.208900451660156, 4.849569320678711, 23.80669403076172, 110.55722045898438, -67.71957397460938, 46.762855529785156, 17.65418815612793, 14.64461898803711, 24.691007614135742, 36.083351135253906, 62.24089050292969, -5.65777587890625, 98.68359375, -1.3962516784667969, 1.713470458984375, 14.854873657226562, 5.669620513916016, 28.50131607055664, 79.50457000732422, 76.56372833251953, 104.23880004882812, 5.86395263671875, -50.213401794433594, 78.40234375, -30.951866149902344, 34.501731872558594, 82.59004211425781, 6.170162200927734, 7.38299560546875, -29.954322814941406, -82.77752685546875, 63.303131103515625, -13.6046142578125, 86.68515014648438, -5.723991394042969, 31.54498291015625, 70.81143951416016, -57.01203155517578, 14.623321533203125, 96.68594360351562, 1.0812873840332031, -6.389411926269531, 34.27324676513672, 72.80493927001953, 30.6982421875, 77.38897705078125, 31.973785400390625, -6.595283508300781, -9.151351928710938, 18.57984161376953, 29.64969253540039, 13.442138671875, 104.32752990722656, 93.75395965576172, 49.04960250854492, 21.48651885986328, 19.43982696533203, 16.812257766723633, 40.30726623535156, 36.638858795166016, 2.199676513671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000276.npy"}
{"epoch": 0.41723356009070295, "step": 277, "batch_size": 64, "mean": 18.521484375, "std": 42.40331268310547, "min": -79.8580322265625, "p10": -37.06490039825438, "median": 13.040191650390625, "p90": 77.52691421508791, "max": 112.1702880859375, "pos_frac": 0.65625, "sample": [52.30435562133789, -58.710906982421875, 63.88368225097656, -2.4511871337890625, 62.56897735595703, -53.72749710083008, 8.017337799072266, 13.031692504882812, 80.27367401123047, 14.417160034179688, -25.107101440429688, -1.3760147094726562, 37.1043701171875, 0.1121978759765625, -16.03781509399414, 41.94221496582031, 21.211278915405273, -2.7252197265625, 1.656524658203125, 16.222949981689453, 25.43694496154785, 38.72650909423828, 3.7374916076660156, 5.095817565917969, -3.801952362060547, 41.232177734375, -3.7181549072265625, -44.159210205078125, 13.048690795898438, 79.5244140625, 0.40421104431152344, 35.53375244140625, -26.258691787719727, 95.32957458496094, 24.67839813232422, 70.38681030273438, -12.466316223144531, -54.575828552246094, 16.730682373046875, 23.538284301757812, -1.0713958740234375, -41.69613265991211, 98.47752380371094, 72.20556640625, 30.483505249023438, 4.161853790283203, -2.7540969848632812, 55.304534912109375, -56.703895568847656, 112.1702880859375, 52.702911376953125, -19.551912307739258, 92.15158081054688, 37.61521911621094, -4.06304931640625, 0.7666702270507812, 72.86608123779297, -79.8580322265625, 45.9822998046875, 99.04739379882812, -13.377517700195312, 4.427360534667969, -0.08545875549316406, 45.13933563232422], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000277.npy"}
{"epoch": 0.41874527588813304, "step": 278, "batch_size": 64, "mean": 28.93728256225586, "std": 47.22331237792969, "min": -88.86378479003906, "p10": -16.20767765045166, "median": 22.876949310302734, "p90": 96.88439788818361, "max": 112.80070495605469, "pos_frac": 0.78125, "sample": [-15.238180160522461, -55.24534606933594, 79.774658203125, 20.925308227539062, 1.8147125244140625, -4.113059997558594, -74.84902954101562, 41.43830871582031, 107.93050384521484, 110.21589660644531, 27.183265686035156, 8.372795104980469, 89.8591079711914, 99.36459350585938, 28.32141876220703, -8.674055099487305, 26.24313735961914, 52.33773422241211, 57.89649963378906, 77.88369750976562, 54.975059509277344, 46.85688781738281, 8.555343627929688, 1.4262313842773438, -73.56546020507812, 112.80070495605469, -28.762435913085938, 55.91700744628906, 13.5780029296875, 3.7581329345703125, 10.761798858642578, 24.828590393066406, -4.669677734375, -16.62317657470703, 75.23905944824219, 3.3259201049804688, 91.09727478027344, 14.378067016601562, 51.3253288269043, 65.2507095336914, 105.87903594970703, 108.10601806640625, -3.164234161376953, 46.347381591796875, 30.970531463623047, 37.64378356933594, 1.2227706909179688, 2.913583755493164, -44.00585174560547, 3.2413482666015625, 45.89617156982422, 74.64300537109375, -88.86378479003906, -2.3334007263183594, 4.348594665527344, 89.5633544921875, 3.548553466796875, 100.49819946289062, 45.881534576416016, 10.809791564941406, 5.595495223999023, -6.791034698486328, 85.76739501953125, 12.372634887695312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000278.npy"}
{"epoch": 0.42025699168556313, "step": 279, "batch_size": 64, "mean": 39.80183410644531, "std": 41.80936813354492, "min": -64.41661071777344, "p10": -1.3696569442749016, "median": 33.07258415222168, "p90": 104.60911560058594, "max": 110.18145751953125, "pos_frac": 0.875, "sample": [-0.6164340972900391, 4.207950592041016, 110.18145751953125, 13.943069458007812, -64.41661071777344, 75.98068237304688, 9.226966857910156, -1.6924667358398438, 9.063232421875, 50.284976959228516, 18.635520935058594, 87.42906188964844, 49.37902069091797, 92.18754577636719, 34.057796478271484, 109.58570861816406, 57.5875244140625, 87.77366638183594, -28.154943466186523, 3.582622528076172, -31.213390350341797, 94.14607238769531, 57.33631896972656, 20.622074127197266, 7.2805328369140625, 49.100128173828125, 23.527925491333008, 0.7213020324707031, 16.27588653564453, 74.71757507324219, 31.235565185546875, 73.42741394042969, 109.44098663330078, 39.599395751953125, -10.471611022949219, 86.13641357421875, 46.49847412109375, 32.087371826171875, -19.65392303466797, 105.05845642089844, 105.68562316894531, 7.17548942565918, 103.56065368652344, 6.1989898681640625, 19.993133544921875, 25.523094177246094, 7.216190338134766, 3.6519699096679688, 38.0811767578125, 107.68170928955078, 64.13067626953125, -8.235710144042969, 51.84133529663086, 101.55012512207031, 1.4556503295898438, 9.114166259765625, 17.713165283203125, 11.277008056640625, 67.2757568359375, 57.69188690185547, 68.15009307861328, 43.17478942871094, 107.91989135742188, 5.391021728515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000279.npy"}
{"epoch": 0.4217687074829932, "step": 280, "batch_size": 64, "mean": 42.298980712890625, "std": 42.46880340576172, "min": -37.02916717529297, "p10": -1.7738250732421872, "median": 27.45783042907715, "p90": 106.94368438720703, "max": 127.50045776367188, "pos_frac": 0.859375, "sample": [66.66950988769531, 108.79544067382812, 3.844236373901367, 69.45286560058594, -2.963348388671875, 107.17594909667969, 1.195556640625, -0.8114852905273438, 1.7623310089111328, 74.4989013671875, 16.10169219970703, 4.013511657714844, 78.12321472167969, 79.58329772949219, 105.03340148925781, 101.20697021484375, 34.82704162597656, 33.79670333862305, 101.74916076660156, 25.68633270263672, 3.7049102783203125, 3.1614952087402344, -2.0329818725585938, -1.4597511291503906, 106.4017333984375, 77.95765686035156, 17.886768341064453, 74.68114471435547, 91.15298461914062, 127.50045776367188, 32.454837799072266, 11.64080810546875, 17.940704345703125, 79.22824096679688, 4.563686370849609, 39.608154296875, 115.01493835449219, -3.051727294921875, 109.22795867919922, 100.19225311279297, -37.02916717529297, -1.9084281921386719, 6.283164978027344, -1.9338188171386719, 67.84933471679688, 33.067752838134766, 38.531982421875, -3.521181106567383, 20.308990478515625, 6.100704193115234, 107.30398559570312, 29.229328155517578, 7.794528961181641, 83.0779037475586, 17.48841667175293, 13.206100463867188, 15.335941314697266, 12.930526733398438, 9.302370071411133, 16.494728088378906, 11.172370910644531, 93.7657470703125, 109.55720520019531, 37.21086120605469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000280.npy"}
{"epoch": 0.42328042328042326, "step": 281, "batch_size": 64, "mean": 38.107269287109375, "std": 48.45594024658203, "min": -75.23761749267578, "p10": -11.56676902770996, "median": 30.02034854888916, "p90": 103.60713348388671, "max": 114.11410522460938, "pos_frac": 0.8125, "sample": [-26.604076385498047, 80.4207763671875, 11.424789428710938, 1.086029052734375, 91.85429382324219, 74.79611206054688, 29.205041885375977, 60.041534423828125, 37.11224365234375, 108.98565673828125, 38.69697570800781, 105.87227630615234, 63.31169509887695, 58.64801788330078, 3.5296173095703125, 4.310937881469727, 5.74070930480957, 103.52314758300781, 0.6660213470458984, 87.93834686279297, 79.98204040527344, 2.8211135864257812, -3.928558349609375, 30.835655212402344, 91.02428436279297, 4.755504608154297, 11.110206604003906, -8.325199127197266, 37.643943786621094, -75.23761749267578, 5.044700622558594, 3.8154449462890625, 5.630058288574219, 9.262277603149414, 103.64312744140625, -52.727577209472656, 73.29257202148438, 114.11410522460938, 17.66048812866211, -41.882015228271484, 104.19332122802734, 75.31065368652344, 107.49694061279297, 66.82455444335938, 108.35308074951172, 25.032691955566406, 95.18852233886719, 7.613025665283203, 102.5536880493164, 41.73713684082031, -1.75067138671875, 2.2409744262695312, 101.94900512695312, -44.64479064941406, 2.4331932067871094, 85.85910034179688, 91.22187805175781, -9.609466552734375, 70.52267456054688, 91.31609344482422, -12.40561294555664, -6.586858749389648, 4.616353988647461, -19.694761276245117], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000281.npy"}
{"epoch": 0.42479213907785335, "step": 282, "batch_size": 64, "mean": 33.2713737487793, "std": 53.93399429321289, "min": -91.20254516601562, "p10": -38.363702011108394, "median": 26.303120613098145, "p90": 107.91553649902346, "max": 133.14834594726562, "pos_frac": 0.6875, "sample": [35.672218322753906, -26.98418426513672, -52.42161178588867, -1.9486236572265625, 6.238090515136719, 79.9593505859375, 67.81085205078125, 8.47593879699707, -2.378612518310547, 117.28897094726562, 73.9822998046875, 99.00193786621094, 26.64737892150879, 79.4849853515625, 100.9346923828125, 18.5677490234375, 112.83554077148438, 48.640254974365234, -4.45513916015625, 70.77978515625, 25.9588623046875, 40.9361572265625, -3.029153823852539, 109.78549194335938, 20.00398826599121, -3.4777069091796875, -49.95295715332031, 94.41287231445312, 101.02586364746094, 8.282207489013672, -39.63455581665039, -19.018556594848633, -91.20254516601562, 97.06178283691406, 5.128141403198242, -47.04882049560547, 112.69932556152344, 8.718315124511719, -35.39837646484375, 29.454010009765625, 82.95185089111328, 90.8232421875, 2.208831787109375, 33.414608001708984, 103.55230712890625, 133.14834594726562, 47.36472702026367, -0.5767555236816406, -4.920356750488281, 21.529129028320312, -12.406723022460938, -16.880645751953125, -20.106597900390625, 19.43478775024414, -48.30724334716797, 118.0397720336914, 115.09632873535156, 71.53553771972656, -62.345664978027344, 31.592071533203125, 53.858177185058594, 55.722389221191406, 82.82557678222656, 8.978042602539062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000282.npy"}
{"epoch": 0.42630385487528344, "step": 283, "batch_size": 64, "mean": 30.390365600585938, "std": 50.57979965209961, "min": -98.3125, "p10": -28.69057922363281, "median": 16.215795516967773, "p90": 94.82052612304688, "max": 117.2327651977539, "pos_frac": 0.78125, "sample": [113.12922668457031, -56.991477966308594, -33.30506896972656, -29.27392578125, 70.17994689941406, 62.0424690246582, 4.535060882568359, 2.3597640991210938, 89.1425552368164, 0.9175033569335938, -15.222122192382812, 85.71885681152344, 14.108329772949219, 16.373363494873047, 104.42024230957031, 11.167705535888672, 10.317825317382812, 92.89656066894531, -27.329437255859375, 16.0582275390625, 94.189697265625, 61.11323547363281, 92.47415161132812, 12.143630981445312, 56.24967956542969, 105.99121856689453, 77.06621551513672, 0.8245925903320312, 117.2327651977539, 10.648735046386719, 4.359092712402344, 95.09088134765625, -54.646759033203125, 6.748146057128906, 9.540084838867188, 57.630828857421875, -65.98934936523438, 25.930633544921875, 49.510101318359375, 91.28678894042969, 103.49893188476562, -14.950674057006836, 94.0649642944336, 92.17207336425781, 16.571025848388672, 18.567180633544922, 22.035301208496094, -98.3125, 6.970563888549805, -53.384368896484375, 76.99964141845703, 1.5451774597167969, 49.42047119140625, -2.408863067626953, 43.281982421875, 79.9367446899414, -3.80841064453125, -13.459636688232422, 13.65770149230957, 3.8072662353515625, -18.2811279296875, 110.99929809570312, 11.476715087890625, 25.944074630737305], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000283.npy"}
{"epoch": 0.42781557067271353, "step": 284, "batch_size": 64, "mean": 33.095237731933594, "std": 46.77497100830078, "min": -99.68183898925781, "p10": -3.3311256408691383, "median": 15.894235610961914, "p90": 103.90989990234377, "max": 120.41670227050781, "pos_frac": 0.84375, "sample": [-4.5858612060546875, 4.919746398925781, 43.46735382080078, 17.814056396484375, 117.52787017822266, 66.4041748046875, 4.139822006225586, -12.2601318359375, 120.41670227050781, 5.555629730224609, 14.61355972290039, 16.725017547607422, 1.3196945190429688, 3.228271484375, 89.63749694824219, -4.210681915283203, 8.356979370117188, 69.7232894897461, 1.9598236083984375, 119.60951232910156, 10.764869689941406, 26.328048706054688, 6.315765380859375, -28.245941162109375, -13.280540466308594, 10.72735595703125, 5.603782653808594, 23.82373809814453, 46.4097785949707, 1.6641349792480469, 98.92900085449219, 56.130516052246094, 66.07572174072266, 78.37425231933594, 8.781881332397461, 15.063453674316406, 76.40990447998047, 106.04457092285156, 113.40548706054688, 4.954017639160156, 69.3252944946289, 120.33586120605469, 50.31877136230469, -1.2649955749511719, 1.0033817291259766, -0.42481231689453125, 79.97335815429688, 9.376134872436523, 54.3299560546875, 94.58792114257812, 2.344205856323242, 59.60065841674805, 76.13722229003906, 6.3401641845703125, 8.377166748046875, 23.940261840820312, 116.96224975585938, 35.53155517578125, 33.917240142822266, -99.68183898925781, 60.430503845214844, -93.06806945800781, 12.339729309082031, -1.2788276672363281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000284.npy"}
{"epoch": 0.4293272864701436, "step": 285, "batch_size": 64, "mean": 28.496816635131836, "std": 48.52507019042969, "min": -97.45296478271484, "p10": -31.920432472229, "median": 19.378978729248047, "p90": 106.06190948486328, "max": 114.12158203125, "pos_frac": 0.734375, "sample": [33.877357482910156, 6.564445495605469, -12.59344482421875, 106.12672424316406, 7.166015625, -35.694087982177734, -97.45296478271484, 10.893028259277344, 47.2368278503418, -26.513391494750977, 48.32225036621094, 105.91067504882812, 94.52513885498047, 28.246253967285156, 82.56690979003906, -48.50904846191406, 49.22868347167969, 109.7716293334961, 2.1392555236816406, 5.459751129150391, 22.65790557861328, 1.5157737731933594, -34.237735748291016, -34.53486251831055, 4.398895263671875, -61.742881774902344, 2.739471435546875, 16.100051879882812, 27.4813232421875, -42.87078094482422, -9.585357666015625, 114.12158203125, -11.131217956542969, 102.34745025634766, 53.29670715332031, 62.137149810791016, 36.149452209472656, 5.9423370361328125, 64.85807800292969, 85.93389892578125, 59.52400207519531, -0.38282012939453125, -0.9494552612304688, 111.12574768066406, 80.84691619873047, 30.212242126464844, 0.8292427062988281, 108.0052490234375, -7.21270751953125, -3.0665550231933594, 40.146080017089844, 67.31819915771484, 13.461540222167969, 55.026954650878906, 5.2745208740234375, 35.97941589355469, 110.0254898071289, 108.22309875488281, -22.577125549316406, 4.597198486328125, 0.44976806640625, 52.14299011230469, -6.727264404296875, 58.67430114746094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000285.npy"}
{"epoch": 0.4308390022675737, "step": 286, "batch_size": 64, "mean": 45.10606384277344, "std": 48.958065032958984, "min": -52.43107604980469, "p10": -0.6963649749755858, "median": 31.174280166625977, "p90": 119.74642791748047, "max": 137.65185546875, "pos_frac": 0.875, "sample": [53.15455627441406, 10.567422866821289, 1.0627765655517578, 58.88005828857422, 3.5380859375, 108.63541412353516, 7.079719543457031, 0.8582401275634766, 23.479509353637695, -3.835906982421875, 71.78018188476562, 8.571044921875, 29.75817108154297, 93.18954467773438, 98.5588607788086, 124.0113525390625, 97.34175109863281, 33.37543487548828, -17.31381607055664, 10.709121704101562, 71.92010498046875, 0.707794189453125, 32.590389251708984, 114.01066589355469, 4.133953094482422, -52.43107604980469, 73.05593872070312, 38.63837814331055, -28.07044219970703, 54.98899841308594, 101.42066955566406, 56.06131362915039, 110.67494201660156, 120.13111877441406, 58.65338897705078, 121.65875244140625, 26.658435821533203, 97.8818588256836, -40.573429107666016, 1.9779510498046875, 120.62533569335938, 137.65185546875, 18.57391357421875, -0.7629547119140625, 76.78164672851562, 15.34259033203125, 16.946199417114258, 12.949935913085938, 120.18180084228516, 113.01678466796875, 118.84881591796875, 29.757431030273438, 18.40401840209961, 5.661169052124023, 4.481410980224609, 22.119644165039062, 73.4387435913086, 11.647594451904297, -38.69334411621094, 123.88221740722656, 47.65602111816406, -0.5409889221191406, 51.74853515625, 9.578666687011719], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000286.npy"}
{"epoch": 0.4323507180650038, "step": 287, "batch_size": 64, "mean": 45.031578063964844, "std": 51.35892868041992, "min": -95.50527954101562, "p10": -4.275063323974607, "median": 34.2608757019043, "p90": 117.68679809570312, "max": 129.35043334960938, "pos_frac": 0.828125, "sample": [42.34247589111328, -23.247207641601562, 9.8043212890625, 92.15872192382812, 129.35043334960938, 111.10096740722656, 8.74224853515625, 34.08263397216797, 47.27794647216797, -5.1118621826171875, 49.938941955566406, 17.793132781982422, 41.100791931152344, 22.695022583007812, 121.36634826660156, 122.07707977294922, 121.55294799804688, 59.413490295410156, 110.71875, 76.17179870605469, 37.543418884277344, 34.439117431640625, 96.37249755859375, 2.7122573852539062, 6.994815826416016, 91.6593017578125, -1.831085205078125, -25.28589630126953, 97.54521942138672, 0.13493728637695312, 92.92957305908203, 4.7926177978515625, -95.50527954101562, 4.987968444824219, -6.263479232788086, 62.150474548339844, 22.65924072265625, -0.4603233337402344, 24.93895721435547, 95.18217468261719, 105.98723602294922, -17.922134399414062, 118.24169921875, 98.20661926269531, -18.05524444580078, 3.5060882568359375, 119.52119445800781, 3.159160614013672, 99.44839477539062, 103.11886596679688, 0.1938934326171875, 3.5709228515625, 78.55548095703125, 4.931035995483398, 1.3966255187988281, 26.605209350585938, 120.61979675292969, 7.454521179199219, 62.38771057128906, -0.0992584228515625, -2.3225326538085938, 116.39202880859375, 111.2691879272461, 0.8288345336914062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000287.npy"}
{"epoch": 0.43386243386243384, "step": 288, "batch_size": 64, "mean": 32.744781494140625, "std": 52.460025787353516, "min": -88.75093078613281, "p10": -24.754982757568357, "median": 27.123839378356934, "p90": 109.1789939880371, "max": 126.73743438720703, "pos_frac": 0.703125, "sample": [25.84371566772461, 2.1932449340820312, -23.412506103515625, -88.75093078613281, 28.68811798095703, 15.227588653564453, 19.361448287963867, -25.33032989501953, 72.85984802246094, 47.08623504638672, 18.03005599975586, 70.16372680664062, 20.802841186523438, 108.2214126586914, -9.4591064453125, 77.83120727539062, -18.795623779296875, 45.71613693237305, -18.092498779296875, 56.05381393432617, -80.09526824951172, 109.60338592529297, 126.73743438720703, 109.58938598632812, -8.651092529296875, 118.86837768554688, 30.63237190246582, 46.792266845703125, 4.941001892089844, 23.91252326965332, 119.11103820800781, -66.97280883789062, 49.665283203125, 65.69842529296875, 115.36830139160156, 55.347835540771484, 47.88560485839844, -30.085037231445312, 32.266212463378906, -20.49913787841797, -39.483604431152344, -56.58270263671875, 107.75711822509766, -0.3685302734375, -0.41405487060546875, -13.901199340820312, 87.86888885498047, -7.4365997314453125, 93.82848358154297, 94.57965087890625, 96.27733612060547, 0.2443084716796875, 73.814208984375, 7.32171630859375, -0.7102584838867188, -0.14728546142578125, 70.62238311767578, 14.949165344238281, 0.7389163970947266, 114.45724487304688, 28.403963088989258, 79.74543762207031, 13.47735595703125, 56.26968002319336], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000288.npy"}
{"epoch": 0.43537414965986393, "step": 289, "batch_size": 64, "mean": 29.594276428222656, "std": 55.03404998779297, "min": -97.03804016113281, "p10": -33.33634796142578, "median": 14.65349292755127, "p90": 111.34924926757813, "max": 130.57443237304688, "pos_frac": 0.703125, "sample": [20.30828857421875, 11.745880126953125, 103.18339538574219, -14.620870590209961, -33.00542449951172, -14.061920166015625, -10.13607406616211, 22.365467071533203, 81.74898529052734, 26.360313415527344, 13.259994506835938, 36.67756652832031, 101.49478149414062, 98.97102355957031, 31.182632446289062, 16.0469913482666, 90.433349609375, -34.58598327636719, -13.075885772705078, 110.3826904296875, 11.982574462890625, 115.41929626464844, 0.9903564453125, -38.50361633300781, 94.3168716430664, 1.6879634857177734, 111.76348876953125, 3.3406143188476562, -8.411396026611328, 116.86721801757812, 0.5264434814453125, 24.39015007019043, -22.559101104736328, -97.03804016113281, 127.408203125, -3.653453826904297, 4.513299942016602, 130.57443237304688, 9.975631713867188, 19.795005798339844, 3.0824108123779297, 18.33704376220703, -31.777587890625, 19.891298294067383, 43.962867736816406, -33.478172302246094, 44.12263107299805, -3.4022178649902344, -48.32328796386719, -70.16070556640625, 81.34992980957031, 18.881256103515625, 3.8770294189453125, 11.551162719726562, 111.97171783447266, 108.26480102539062, 85.80291748046875, 108.61225891113281, 113.94548034667969, 4.006309509277344, -0.5666389465332031, -42.659088134765625, 104.02261352539062, -5.341434478759766], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000289.npy"}
{"epoch": 0.436885865457294, "step": 290, "batch_size": 64, "mean": 36.85968780517578, "std": 57.31482696533203, "min": -101.42191314697266, "p10": -15.281102943420409, "median": 29.447711944580078, "p90": 118.96206588745117, "max": 122.57669067382812, "pos_frac": 0.765625, "sample": [64.706787109375, 35.714073181152344, 100.62708282470703, 121.3377685546875, 43.865142822265625, 7.570623397827148, -14.378175735473633, -5.295793533325195, 2.454925537109375, 121.50147247314453, 47.27020263671875, 17.179309844970703, 30.876663208007812, -68.22249603271484, 35.8267822265625, 0.4733734130859375, -9.856361389160156, -22.955177307128906, -15.668071746826172, -10.280494689941406, 52.63838195800781, 29.437294006347656, 119.51622772216797, 122.42045593261719, 26.140098571777344, 30.186080932617188, 7.127635955810547, -3.650543212890625, 10.232366561889648, -67.85211181640625, 113.8754653930664, 121.6021728515625, 105.00763702392578, 39.86606216430664, 57.92518615722656, 122.57320404052734, 114.4597396850586, 65.7086181640625, 35.648216247558594, -78.4068603515625, 110.30088806152344, 92.87301635742188, 11.265880584716797, -101.42191314697266, -0.9031143188476562, -5.753326416015625, -4.89384651184082, 122.57669067382812, 112.68004608154297, 104.9618148803711, 15.338062286376953, 29.4581298828125, 3.52423095703125, 12.995071411132812, -97.33888244628906, 107.21185302734375, 13.600959777832031, 117.66902160644531, 26.889663696289062, 28.34595489501953, 76.29902648925781, 27.83612060546875, 47.18658447265625, 1.1151580810546875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000290.npy"}
{"epoch": 0.4383975812547241, "step": 291, "batch_size": 64, "mean": 38.22565841674805, "std": 53.5938835144043, "min": -103.57132720947266, "p10": -16.119858169555663, "median": 16.889495849609375, "p90": 115.88039779663086, "max": 147.365478515625, "pos_frac": 0.75, "sample": [111.16702270507812, 119.24185180664062, 17.967636108398438, -23.414474487304688, 24.074142456054688, 14.87109375, -12.790443420410156, 1.9161262512207031, 116.18145751953125, 3.5676345825195312, 77.07291412353516, 73.48226928710938, -23.662158966064453, 1.1230525970458984, 108.5067138671875, 104.1373062133789, -17.081748962402344, -5.782867431640625, 13.9559326171875, 94.79077911376953, -3.90509033203125, 6.90771484375, 24.324691772460938, 33.18364715576172, 56.74960708618164, 100.82467651367188, 89.32699584960938, -0.7483501434326172, 121.09375, 3.2408294677734375, 48.48474884033203, 124.52180480957031, -1.3951072692871094, 13.754570007324219, -13.557525634765625, 4.605615615844727, 75.23780822753906, -20.3095703125, 121.72036743164062, 9.256889343261719, -9.8533935546875, 115.17792510986328, -103.57132720947266, 21.1376953125, 119.8798828125, 32.64048767089844, 57.517730712890625, 107.78512573242188, 70.36139678955078, 147.365478515625, -13.875446319580078, 11.205795288085938, -25.129955291748047, 13.558723449707031, -43.71142578125, 5.786846160888672, 15.811355590820312, 113.4002456665039, 8.030616760253906, 79.7156753540039, 85.30587768554688, 52.90796661376953, 2.9263458251953125, -10.573776245117188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000291.npy"}
{"epoch": 0.4399092970521542, "step": 292, "batch_size": 64, "mean": 31.444477081298828, "std": 56.53643035888672, "min": -107.42623901367188, "p10": -33.59621505737303, "median": 19.377357482910156, "p90": 101.74904022216798, "max": 121.82402801513672, "pos_frac": 0.734375, "sample": [-90.9708251953125, -7.658973693847656, 90.3838882446289, 0.18801116943359375, 120.99638366699219, 47.44689178466797, 33.5494270324707, 121.41830444335938, 91.43173217773438, -15.031906127929688, -47.053497314453125, 3.9365882873535156, 11.148519515991211, 10.822517395019531, -60.787445068359375, -1.383392333984375, 58.049224853515625, 58.86570739746094, 80.46546936035156, 10.329025268554688, 38.02186584472656, 85.31108093261719, 77.580810546875, -40.015647888183594, -8.621246337890625, -1.5769271850585938, 0.8200569152832031, 102.4061508178711, -18.617538452148438, 80.1694564819336, 21.042457580566406, 90.48906707763672, 6.016359329223633, -5.373992919921875, 0.10964775085449219, -101.74553680419922, 98.74278259277344, -107.42623901367188, 0.03607940673828125, 7.574699401855469, -4.696258544921875, 100.21578216552734, 85.09402465820312, -6.103891372680664, 121.10498046875, 85.40599060058594, 8.69500732421875, 65.77536010742188, 31.668926239013672, 4.8900299072265625, 115.61041259765625, 94.65171813964844, 2.182069778442383, 97.29830932617188, 57.43368911743164, 17.712257385253906, 53.1999397277832, 23.101947784423828, 118.31111145019531, 22.55274200439453, 16.962310791015625, -5.2414093017578125, -56.291587829589844, 121.82402801513672], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000292.npy"}
{"epoch": 0.4414210128495843, "step": 293, "batch_size": 64, "mean": 45.242408752441406, "std": 62.27794647216797, "min": -118.57060241699219, "p10": -14.782803726196288, "median": 32.5662956237793, "p90": 119.82091522216797, "max": 144.44635009765625, "pos_frac": 0.765625, "sample": [4.407886505126953, 28.426063537597656, 7.095134735107422, 119.48590087890625, 70.53231811523438, 20.44814682006836, -15.242694854736328, 21.97195816040039, 76.38826751708984, 4.75115966796875, 119.96449279785156, -9.123708724975586, 120.76530456542969, 97.80237579345703, 96.89918518066406, 5.338020324707031, 108.34986877441406, 75.26737976074219, 99.59326171875, 103.6470718383789, 84.80201721191406, -32.51268005371094, -8.980979919433594, -81.87785339355469, 98.96492004394531, 0.9894256591796875, 81.66293334960938, -45.950504302978516, 18.538902282714844, 136.15762329101562, 0.9188575744628906, 130.31341552734375, 122.5509033203125, -81.80758666992188, 101.09494018554688, 3.5903663635253906, 115.89966583251953, -118.57060241699219, -4.231815338134766, 16.398895263671875, 24.7718505859375, 118.75469970703125, -63.95767593383789, -10.259963989257812, -2.668121337890625, 1.3942337036132812, 36.70652770996094, 100.49223327636719, 144.44635009765625, 18.616836547851562, 107.00567626953125, -2.610870361328125, 57.18494415283203, 1.2587242126464844, 109.66590881347656, -13.709724426269531, 106.7001953125, 13.411514282226562, 78.40149688720703, 122.44760131835938, 43.674774169921875, -7.867710113525391, 102.37101745605469, 114.5653076171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000293.npy"}
{"epoch": 0.4429327286470144, "step": 294, "batch_size": 64, "mean": 30.249427795410156, "std": 50.13374328613281, "min": -92.17405700683594, "p10": -15.169700050354002, "median": 9.022289276123047, "p90": 115.87930374145508, "max": 140.16067504882812, "pos_frac": 0.71875, "sample": [115.69706726074219, 19.826583862304688, 40.950775146484375, 7.59619140625, 107.73379516601562, 17.88813018798828, 122.63249206542969, -16.153274536132812, -38.35481262207031, 4.083244323730469, 50.3311767578125, -92.17405700683594, -12.874692916870117, 18.221649169921875, 14.848047256469727, 89.95571899414062, 115.95740509033203, 6.557258605957031, 119.84083557128906, 17.45197296142578, 0.451446533203125, 10.448387145996094, 2.4764347076416016, 55.9814453125, -4.0767669677734375, 126.47081756591797, -4.478717803955078, 64.23606872558594, -1.5535964965820312, 134.62374877929688, 109.51406860351562, -24.789478302001953, 1.966623306274414, 140.16067504882812, -6.809518814086914, 0.9006805419921875, -4.416572570800781, -2.7459030151367188, -19.283531188964844, 31.669326782226562, 1.5475082397460938, -0.69171142578125, 112.21941375732422, -0.3923912048339844, 3.895345687866211, 51.49712371826172, 22.490007400512695, 30.791614532470703, 4.904766082763672, 122.37841796875, -18.502033233642578, 41.445255279541016, 81.15924072265625, 3.5105133056640625, 5.902318954467773, 47.31166076660156, 2.4905853271484375, -1.8770084381103516, 16.810699462890625, 4.7143096923828125, 81.90066528320312, -29.011432647705078, 37.39503860473633, -6.687652587890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000294.npy"}
{"epoch": 0.4444444444444444, "step": 295, "batch_size": 64, "mean": 28.76223373413086, "std": 54.57584762573242, "min": -94.53628540039062, "p10": -24.72507019042968, "median": 9.375619888305664, "p90": 107.85434646606447, "max": 142.35092163085938, "pos_frac": 0.734375, "sample": [114.08041381835938, 9.337142944335938, -3.665985107421875, -6.4223175048828125, 53.42902374267578, 78.8941650390625, 109.00064849853516, 24.500858306884766, 26.22388458251953, 20.043014526367188, 123.27086639404297, -13.321468353271484, 142.35092163085938, 14.452980041503906, 9.41409683227539, 8.874519348144531, 2.33819580078125, 4.125679016113281, -15.832412719726562, 92.46562194824219, 114.26187133789062, -81.57745361328125, -2.4295082092285156, 5.881437301635742, 58.683319091796875, 3.39202880859375, 5.336305618286133, -12.68740463256836, 99.39846801757812, 5.326072692871094, -28.536209106445312, -3.9534568786621094, 126.28663635253906, 105.17964172363281, 21.84088134765625, -67.44232177734375, 104.97380065917969, 25.94827651977539, 57.732723236083984, 121.70820617675781, 69.5797119140625, -94.53628540039062, 6.7517242431640625, -76.86419677734375, 77.32414245605469, 40.99193572998047, -47.15167236328125, 3.1253738403320312, -41.52752685546875, 64.5719223022461, -7.94801139831543, 3.2519264221191406, 19.943138122558594, 5.634674072265625, 5.149404525756836, 5.968244552612305, 90.9287109375, -8.272735595703125, 19.59778594970703, 1.687713623046875, 82.65013122558594, 69.98580932617188, 101.78374481201172, -4.725757598876953], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000295.npy"}
{"epoch": 0.4459561602418745, "step": 296, "batch_size": 64, "mean": 35.291046142578125, "std": 56.69245147705078, "min": -95.20028686523438, "p10": -39.19529991149901, "median": 22.251319885253906, "p90": 116.91401062011721, "max": 137.93704223632812, "pos_frac": 0.796875, "sample": [54.22190856933594, 102.49919128417969, 97.0086669921875, 12.401792526245117, 1.0414199829101562, 12.196517944335938, 2.913095474243164, 133.9951171875, 3.133228302001953, 78.35728454589844, 21.461620330810547, 13.679088592529297, -58.884368896484375, 123.09782409667969, 1.7670669555664062, 37.73283386230469, 103.29022216796875, 75.54951477050781, 10.209396362304688, 32.7171630859375, 1.0676727294921875, 23.041019439697266, 5.193206787109375, -57.45208740234375, -22.32269287109375, -15.656753540039062, 2.2401885986328125, 37.03661346435547, 12.832893371582031, 28.009254455566406, 10.482877731323242, -58.75864028930664, 27.62188720703125, 0.73992919921875, 137.93704223632812, 49.99244689941406, -2.6503448486328125, -43.97847366333008, 99.59899139404297, -58.06498718261719, 34.376365661621094, 77.77000427246094, 124.1883316040039, -95.20028686523438, 95.96348571777344, 20.42303466796875, 85.00759887695312, 85.24779510498047, 83.7135009765625, 77.09375762939453, 111.96064758300781, 12.845947265625, 119.03688049316406, 11.724233627319336, 136.04473876953125, -5.4791412353515625, 10.905776977539062, 124.33393096923828, -12.69732666015625, -28.034561157226562, -66.40377807617188, 84.53973388671875, 78.39840698242188, 57.569068908691406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000296.npy"}
{"epoch": 0.4474678760393046, "step": 297, "batch_size": 64, "mean": 39.51006317138672, "std": 50.20656204223633, "min": -85.0361557006836, "p10": -8.032652854919434, "median": 26.721290588378906, "p90": 116.17851943969727, "max": 126.742919921875, "pos_frac": 0.78125, "sample": [92.81709289550781, -37.21488952636719, 3.4799118041992188, 82.78746795654297, 39.6048583984375, 18.37054443359375, 12.960029602050781, 114.35237884521484, -3.255706787109375, -85.0361557006836, 91.13673400878906, -2.7884063720703125, 17.670551300048828, 25.213298797607422, 3.418264389038086, 120.62244415283203, 15.204010009765625, 62.7745246887207, 15.540332794189453, 3.9813079833984375, 27.053024291992188, 6.068389892578125, -23.496505737304688, -64.01299285888672, -7.217607498168945, 116.96115112304688, 31.742897033691406, 7.041168212890625, 98.50971221923828, 4.513359069824219, 111.71617889404297, 121.63516235351562, -4.938690185546875, -7.093324661254883, 9.811267852783203, 15.996511459350586, 8.427459716796875, -8.3819580078125, 100.235595703125, 61.93738555908203, 51.948944091796875, 29.63797378540039, 87.30509948730469, 61.63819885253906, 26.389556884765625, 76.16068267822266, 28.352622985839844, 0.5084123611450195, -11.235382080078125, 119.01510620117188, -1.8704795837402344, 90.2021484375, 107.50227355957031, 80.37092590332031, -1.8831443786621094, 55.554473876953125, 119.00265502929688, 65.8426513671875, 3.84576416015625, 126.742919921875, -11.048240661621094, 31.99850082397461, 125.4186019897461, 69.09709930419922], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000297.npy"}
{"epoch": 0.4489795918367347, "step": 298, "batch_size": 64, "mean": 29.97411346435547, "std": 60.637428283691406, "min": -112.7193832397461, "p10": -46.865143585205075, "median": 11.745721817016602, "p90": 115.619881439209, "max": 135.05335998535156, "pos_frac": 0.71875, "sample": [31.623680114746094, -76.71475982666016, 135.05335998535156, 9.927398681640625, 124.60609436035156, 7.0505523681640625, 92.95140075683594, 2.0160369873046875, -60.02454376220703, 95.10185241699219, 124.61708068847656, 113.3048324584961, -57.801666259765625, 42.7548828125, 2.3291244506835938, 49.49568557739258, 71.6585693359375, -37.10003662109375, 92.14100646972656, 84.82966613769531, 5.454227447509766, -1.9728889465332031, 86.58625030517578, 27.4595947265625, 30.195289611816406, 9.432037353515625, -83.33527374267578, 8.364599227905273, 71.81916809082031, 116.61204528808594, 1.5079669952392578, 125.14237976074219, -4.895294189453125, 14.091056823730469, 23.19466781616211, 98.20138549804688, 5.262693405151367, 118.27913665771484, 4.053079605102539, 2.7583236694335938, -0.9080810546875, -3.0694751739501953, -5.414794921875, 14.82110595703125, 21.630233764648438, 129.13504028320312, 110.34806823730469, -1.7433700561523438, -17.795429229736328, 4.3545684814453125, 41.854373931884766, 9.72772216796875, -41.383399963378906, -49.21446228027344, -5.195587158203125, 96.2231674194336, -83.10122680664062, 87.07447814941406, -19.075592041015625, -112.7193832397461, 111.63549041748047, 13.564044952392578, 2.0964889526367188, 109.46875762939453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000298.npy"}
{"epoch": 0.4504913076341648, "step": 299, "batch_size": 64, "mean": 26.14179229736328, "std": 60.901329040527344, "min": -103.95539093017578, "p10": -59.347006225585936, "median": 16.532028198242188, "p90": 106.59142379760745, "max": 145.43276977539062, "pos_frac": 0.65625, "sample": [99.42054748535156, 81.5980224609375, -6.728239059448242, -15.469612121582031, 4.300081253051758, 96.2275390625, 16.7467041015625, 53.25595474243164, 145.43276977539062, -0.5400276184082031, -52.90143585205078, 67.57572937011719, -103.95539093017578, 3.4325485229492188, -82.88209533691406, -19.197097778320312, 16.264190673828125, 112.13325500488281, 48.954254150390625, -60.253684997558594, 112.04876708984375, -24.275188446044922, 32.65641403198242, 57.30232620239258, -64.22477722167969, -84.89252471923828, 0.35347747802734375, 4.945560455322266, 126.69709777832031, -0.7587127685546875, 93.78573608398438, -2.9877700805664062, 52.824073791503906, -86.05465698242188, 4.046295166015625, 10.9896240234375, 54.2838134765625, 108.98631286621094, 16.42523956298828, 16.638816833496094, 58.21527099609375, -44.253807067871094, 100.0199966430664, 101.00334930419922, 114.37739562988281, -57.231422424316406, 12.726797103881836, 85.15204620361328, 82.23953247070312, 71.3592529296875, -1.1752300262451172, 23.8878173828125, 70.85511779785156, -68.59982299804688, 8.7852783203125, 80.39559936523438, -25.266448974609375, 92.72770690917969, -0.8379077911376953, 24.203758239746094, -0.8695735931396484, 50.18628692626953, -54.234920501708984, 117.20475769042969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000299.npy"}
{"epoch": 0.4520030234315949, "step": 300, "batch_size": 64, "mean": 33.914119720458984, "std": 58.671775817871094, "min": -93.97015380859375, "p10": -43.09077606201172, "median": 29.93411922454834, "p90": 114.84218673706056, "max": 133.4669952392578, "pos_frac": 0.734375, "sample": [32.62213897705078, 106.44983673095703, 2.9283905029296875, 72.28111267089844, -0.9503822326660156, 22.321304321289062, 95.6396255493164, 11.172576904296875, -1.509237289428711, -82.4806900024414, 127.50338745117188, 24.82776641845703, 81.5314712524414, 121.83114624023438, 124.66725158691406, 64.7694091796875, 118.59031677246094, -43.6209716796875, -25.387245178222656, 82.51573181152344, 3.0587635040283203, 82.5044174194336, 37.065826416015625, -15.363426208496094, -67.34138488769531, 76.53564453125, 100.06326293945312, -15.62261962890625, 30.59223175048828, 37.93998718261719, 6.710201263427734, -91.12504577636719, 83.13581848144531, -0.404693603515625, -93.97015380859375, 47.56991195678711, 29.2760066986084, 123.76298522949219, 1.5462646484375, 112.90973663330078, 47.342132568359375, 35.99948501586914, 3.811737060546875, 133.4669952392578, 90.0575942993164, -41.85365295410156, 1.0060920715332031, 89.70864868164062, -2.79498291015625, 75.68540954589844, 67.24588012695312, 9.40591812133789, 28.923583984375, 41.79249572753906, -10.671928405761719, -65.54875946044922, 24.738479614257812, 102.78286743164062, 72.48696899414062, -66.61636352539062, 12.721900939941406, 1.5394210815429688, 115.67037963867188, -20.943252563476562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000300.npy"}
{"epoch": 0.45351473922902497, "step": 301, "batch_size": 64, "mean": 30.530223846435547, "std": 55.7114143371582, "min": -117.0846176147461, "p10": -28.530914688110347, "median": 24.509875297546387, "p90": 111.58396148681642, "max": 150.63406372070312, "pos_frac": 0.71875, "sample": [93.67819213867188, 12.536949157714844, 38.685035705566406, 31.938663482666016, 37.17755126953125, -30.040176391601562, 89.06974029541016, 121.45365905761719, 89.534423828125, 10.980964660644531, 32.692726135253906, 26.0336856842041, 20.93048095703125, -18.217697143554688, -100.1957778930664, 12.998672485351562, -1.457489013671875, 75.94363403320312, 86.93109893798828, 4.148181915283203, -42.645668029785156, 45.26228713989258, 117.132080078125, -7.330024719238281, 47.94414520263672, 2.4194869995117188, -3.290983200073242, 112.4811782836914, 80.03399658203125, -18.25493621826172, -20.974468231201172, 28.249427795410156, 56.254093170166016, 93.65206909179688, -44.07160949707031, -0.2143096923828125, -18.668991088867188, 150.63406372070312, -54.70136260986328, 22.986064910888672, -10.362569808959961, 29.56410026550293, 36.27833557128906, -70.19786071777344, 3.6961135864257812, 83.09394836425781, 100.91331481933594, 50.53492736816406, 19.356781005859375, 115.56181335449219, 18.10975456237793, 69.70271301269531, 64.2001953125, -25.00930404663086, -16.531021118164062, 7.697345733642578, 114.5706787109375, 109.4904556274414, 31.728683471679688, 7.835334777832031, 22.092636108398438, -117.0846176147461, 124.03341674804688, 2.9400558471679688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000301.npy"}
{"epoch": 0.455026455026455, "step": 302, "batch_size": 64, "mean": 22.355518341064453, "std": 53.07563018798828, "min": -114.89791870117188, "p10": -38.38638267517089, "median": 17.61827278137207, "p90": 97.58284301757816, "max": 139.27352905273438, "pos_frac": 0.703125, "sample": [55.893104553222656, 133.1986083984375, 3.813690185546875, 139.27352905273438, 42.22099304199219, 24.126733779907227, 103.58863830566406, -10.746833801269531, 119.6364974975586, 17.140121459960938, 37.63665771484375, 35.04882049560547, 7.475469589233398, 5.713596343994141, 78.21734619140625, 111.02293395996094, 2.496856689453125, 15.15399169921875, 5.887779235839844, 3.963836669921875, 19.423717498779297, -87.2077865600586, -88.84706115722656, 20.642623901367188, 33.32231140136719, -9.511556625366211, 28.374656677246094, 52.47925567626953, -7.757328033447266, -4.9359283447265625, -22.00745391845703, 46.43922424316406, 55.49479675292969, 4.768106460571289, -43.032958984375, 100.83518981933594, 122.1971435546875, -25.931137084960938, 10.262237548828125, 26.394451141357422, 54.03382873535156, 89.99403381347656, -1.2543144226074219, 44.8206787109375, 28.072357177734375, 37.05134582519531, -100.06062316894531, -4.444068908691406, -10.850372314453125, 44.882667541503906, 77.17285919189453, 49.731903076171875, -0.26905059814453125, 1.7254180908203125, 18.096424102783203, -48.78535842895508, -34.86882019042969, 84.50289916992188, -6.76190185546875, -39.8939094543457, -114.89791870117188, 14.018829345703125, 73.00773620605469, 13.563650131225586], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000302.npy"}
{"epoch": 0.4565381708238851, "step": 303, "batch_size": 64, "mean": 41.81725311279297, "std": 60.561119079589844, "min": -97.42330932617188, "p10": -46.60631790161132, "median": 46.40812683105469, "p90": 119.27970428466799, "max": 136.28057861328125, "pos_frac": 0.703125, "sample": [-8.230453491210938, 55.660789489746094, 94.043212890625, 24.24309730529785, 126.3907470703125, 102.55517578125, 80.91777038574219, -49.160614013671875, 97.39221954345703, 51.36952209472656, 112.62483215332031, -40.64629364013672, 30.237491607666016, -6.578142166137695, -77.47224426269531, -63.826988220214844, -54.37846374511719, 108.73049926757812, -97.42330932617188, -18.1893310546875, -0.0961151123046875, 13.438505172729492, 136.28057861328125, 74.08226013183594, -35.7764892578125, 77.81204223632812, 84.31201171875, -53.464088439941406, -3.0425491333007812, 29.839935302734375, 61.261940002441406, 2.25311279296875, 6.365119934082031, 41.44673156738281, 17.418685913085938, -13.71352767944336, 115.63218688964844, 18.290687561035156, 93.93013000488281, 83.67221069335938, 17.331016540527344, 104.2511215209961, 60.442420959472656, 122.51237487792969, -2.6289138793945312, 96.05207824707031, 127.79441833496094, 34.64140319824219, 58.87446594238281, 106.29586029052734, -8.800094604492188, -2.1378345489501953, -49.3566780090332, 19.247772216796875, 125.76628112792969, 85.34474182128906, 23.277528762817383, 60.11848449707031, 78.99775695800781, 109.854248046875, 120.84292602539062, -37.587379455566406, 78.30918884277344, 128.6580810546875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000303.npy"}
{"epoch": 0.4580498866213152, "step": 304, "batch_size": 64, "mean": 41.36079406738281, "std": 47.26332092285156, "min": -107.73617553710938, "p10": -6.18816795349121, "median": 36.32781982421875, "p90": 110.12300186157228, "max": 135.26251220703125, "pos_frac": 0.8125, "sample": [112.6588134765625, -6.39996337890625, 49.58638000488281, 116.55278015136719, -5.7538909912109375, 53.11484909057617, 34.81990051269531, 62.80047607421875, -28.235639572143555, 14.389392852783203, 7.070587158203125, 83.98660278320312, 79.58795166015625, 37.153167724609375, 94.42733764648438, 47.98456573486328, -1.8494377136230469, 34.8644905090332, 88.459716796875, -107.73617553710938, 27.37847137451172, 101.79232788085938, 35.502471923828125, 23.995338439941406, -2.8400344848632812, 5.897735595703125, 10.548791885375977, 46.75619888305664, 34.10533142089844, 107.7126235961914, 44.89257049560547, 2.9700565338134766, 121.27090454101562, 44.27490997314453, 7.536895751953125, -53.26300048828125, 74.27215576171875, 85.54093933105469, 25.297828674316406, 69.737548828125, 2.628702163696289, -1.8014812469482422, 135.26251220703125, 40.15802001953125, 1.5068798065185547, 11.138259887695312, 57.53411865234375, 64.99100494384766, 111.15602111816406, 123.87262725830078, -17.407012939453125, -1.5743522644042969, 83.81753540039062, 3.389110565185547, 85.17499542236328, -6.374286651611328, 43.23609161376953, 119.1848373413086, 78.00971984863281, 27.5714111328125, 18.466690063476562, -18.167583465576172, 29.507797241210938, 74.94729614257812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000304.npy"}
{"epoch": 0.4595616024187453, "step": 305, "batch_size": 64, "mean": 31.717294692993164, "std": 58.222312927246094, "min": -117.54310607910156, "p10": -48.42933502197265, "median": 26.551830291748047, "p90": 107.63830642700195, "max": 134.73439025878906, "pos_frac": 0.75, "sample": [105.80474853515625, 12.628623962402344, 10.761337280273438, -62.122779846191406, -5.818572998046875, 0.9832649230957031, 129.1018524169922, -59.20367431640625, 37.52435302734375, 114.43702697753906, 24.11263656616211, 128.65296936035156, -17.528045654296875, 87.86604309082031, -87.71498107910156, -15.526968002319336, 17.849708557128906, 5.814411163330078, -63.557830810546875, 70.19781494140625, 9.48770523071289, 28.413421630859375, 16.32343292236328, 52.266021728515625, 40.04035186767578, 59.7564697265625, 67.73059844970703, 1.070526123046875, 43.524658203125, -117.54310607910156, 4.925506591796875, 43.037620544433594, 107.27064514160156, 90.95943450927734, -15.797576904296875, 37.48176574707031, 134.73439025878906, -3.1798858642578125, 20.17702293395996, 54.123565673828125, 10.166046142578125, 11.556961059570312, 57.6845703125, 126.90889739990234, 95.28221893310547, -7.4522857666015625, 12.675888061523438, -43.290130615234375, 119.184326171875, 72.33317565917969, 90.3814926147461, 24.69023895263672, 29.45553207397461, 97.05770874023438, 60.98258972167969, 73.72686767578125, -40.82267761230469, 65.89299011230469, -80.58255004882812, 107.7958755493164, -19.056859970092773, 11.997682571411133, 94.90550231933594, -50.63185119628906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000305.npy"}
{"epoch": 0.46107331821617537, "step": 306, "batch_size": 64, "mean": 32.04744338989258, "std": 46.583473205566406, "min": -99.3863754272461, "p10": -14.585342025756832, "median": 22.42153549194336, "p90": 98.13080902099611, "max": 144.621337890625, "pos_frac": 0.78125, "sample": [124.76962280273438, 72.24675750732422, 19.836179733276367, 118.81634521484375, 21.0777587890625, 78.56021881103516, 7.994438171386719, 16.197856903076172, 25.348562240600586, 54.092315673828125, 104.55632781982422, 91.81793212890625, -39.07971954345703, 12.628910064697266, -16.308109283447266, -99.3863754272461, 99.82774353027344, 23.782379150390625, 45.03557205200195, -4.131710052490234, 4.789772033691406, 86.94632720947266, 86.8271713256836, 10.456502914428711, 7.833555221557617, 9.931999206542969, -19.08534049987793, 86.61692810058594, 3.5889053344726562, 54.295501708984375, -26.485260009765625, 94.17129516601562, -5.924060821533203, 4.42228889465332, 67.06177520751953, 30.572669982910156, 108.72840881347656, 128.29945373535156, 12.8134765625, -3.9418258666992188, 35.361976623535156, 3.975252151489258, 28.002460479736328, 41.55419158935547, -42.219947814941406, 23.76531219482422, -2.039663314819336, 41.558074951171875, 5.947700500488281, 25.500732421875, 64.50818634033203, 57.83690643310547, 1.9908599853515625, -1.1031494140625, 19.891708374023438, 12.052967071533203, 25.66063690185547, -10.5655517578125, 33.70258331298828, 144.621337890625, -38.587921142578125, 17.426101684570312, 63.928924560546875, -1.3357048034667969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000306.npy"}
{"epoch": 0.46258503401360546, "step": 307, "batch_size": 64, "mean": 15.79727554321289, "std": 57.541934967041016, "min": -93.02023315429688, "p10": -51.77424659729004, "median": 5.1778717041015625, "p90": 103.43731536865236, "max": 130.12538146972656, "pos_frac": 0.5625, "sample": [89.35994720458984, -46.34673309326172, 7.419864654541016, -21.871673583984375, -93.02023315429688, 51.623817443847656, 130.12538146972656, -38.03742218017578, 116.37985229492188, 86.03627014160156, -35.12432861328125, 16.072555541992188, 18.576305389404297, 119.4530029296875, -90.96383666992188, 39.363983154296875, 26.064895629882812, 0.07524871826171875, 65.27241516113281, -34.273162841796875, 44.26042175292969, -74.32604217529297, -1.8333816528320312, -70.11756896972656, 18.714340209960938, -59.446624755859375, -41.630401611328125, 42.26837921142578, 55.72114562988281, -1.1734695434570312, -18.443458557128906, 18.656522750854492, -0.7208251953125, -53.21586227416992, -48.41047668457031, 47.8380126953125, -34.91392517089844, -0.00995635986328125, 97.26695251464844, -8.04998779296875, 2.3243885040283203, -14.510063171386719, 2.9358787536621094, 110.01268005371094, 112.19856262207031, -28.509483337402344, 32.627784729003906, 106.08175659179688, 72.22933197021484, 50.56073760986328, -71.83782958984375, 36.99895477294922, 90.51299285888672, -1.4368953704833984, 49.33143615722656, 44.297115325927734, 2.8707504272460938, 115.45230102539062, -46.56415557861328, -38.64435577392578, -46.90676498413086, 26.412567138671875, 88.71289825439453, -2.7449378967285156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000307.npy"}
{"epoch": 0.46409674981103555, "step": 308, "batch_size": 64, "mean": 30.68783187866211, "std": 53.82151794433594, "min": -119.14776611328125, "p10": -34.826884460449214, "median": 24.200960159301758, "p90": 106.19354553222657, "max": 128.06982421875, "pos_frac": 0.75, "sample": [110.88317108154297, 52.99706268310547, 44.40330505371094, -29.390514373779297, 101.2454833984375, 75.79641723632812, -18.146148681640625, 34.08001708984375, 44.225677490234375, -1.7432785034179688, 106.73876953125, 97.75276184082031, 116.75660705566406, 12.0439453125, 1.251983642578125, -50.13663101196289, 53.49524688720703, 104.92135620117188, 49.42656707763672, 27.242389678955078, 65.92965698242188, -14.067035675048828, 108.86943817138672, 50.19412612915039, 14.798591613769531, 45.95703125, -2.398773193359375, 65.68142700195312, 59.16162109375, -81.3040771484375, 128.06982421875, 8.959068298339844, 21.84701919555664, -37.15675735473633, 53.94291687011719, 0.5246124267578125, 86.4717025756836, -2.3014259338378906, 0.7661285400390625, -119.14776611328125, 7.9298095703125, 1.1401290893554688, 67.08421325683594, 121.34477233886719, 14.860980987548828, 26.554901123046875, 50.238990783691406, -12.366012573242188, 77.4956283569336, 3.283428192138672, 94.7511978149414, 5.6048583984375, -11.861831665039062, 60.928245544433594, 95.87570190429688, 15.767127990722656, 113.43122100830078, -47.2769775390625, -52.72099304199219, 7.027814865112305, 5.469663619995117, -77.93632507324219, 11.125534057617188, -2.3723583221435547], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000308.npy"}
{"epoch": 0.4656084656084656, "step": 309, "batch_size": 64, "mean": 31.875831604003906, "std": 47.73465347290039, "min": -80.56820678710938, "p10": -18.199723815917967, "median": 29.681879997253418, "p90": 94.79745559692384, "max": 134.95211791992188, "pos_frac": 0.6875, "sample": [70.12309265136719, 132.8091278076172, 8.25674819946289, 96.0762939453125, 49.913673400878906, -0.06371307373046875, 81.31622314453125, -10.2845458984375, 60.87678527832031, 13.131385803222656, 32.37884521484375, -0.8835182189941406, -40.657432556152344, -5.718671798706055, 17.289321899414062, -80.56820678710938, 29.996549606323242, 90.23075866699219, -31.32410430908203, -3.576547622680664, 3.47235107421875, 48.50777816772461, 80.07109069824219, -3.529695510864258, 71.62814331054688, 76.97372436523438, -5.752433776855469, 29.726659774780273, 21.66321563720703, -0.1584320068359375, 15.527814865112305, 29.637100219726562, -19.103225708007812, 95.60150146484375, 71.02926635742188, 49.09274673461914, 30.343551635742188, 5.142047882080078, -10.052675247192383, 59.75299835205078, -5.7179412841796875, 111.9940185546875, 72.76823425292969, 10.455230712890625, 134.95211791992188, -75.78195190429688, 92.92134857177734, 45.36574935913086, 19.70857048034668, 44.60539245605469, -16.091552734375, 4.168846130371094, 35.11528778076172, 53.465057373046875, 120.01236724853516, 29.468902587890625, 40.43425750732422, 81.98184204101562, -29.155181884765625, -36.53643798828125, -14.57489013671875, -9.961715698242188, 52.50982666015625, 119.05032348632812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000309.npy"}
{"epoch": 0.4671201814058957, "step": 310, "batch_size": 64, "mean": 19.62242889404297, "std": 51.486820220947266, "min": -66.24376678466797, "p10": -40.40702590942382, "median": 9.409549713134766, "p90": 91.83303833007814, "max": 141.1099853515625, "pos_frac": 0.609375, "sample": [92.99891662597656, -0.7918853759765625, 20.649913787841797, 36.24848937988281, 27.768783569335938, -63.96070098876953, -13.376190185546875, 72.56941223144531, 86.8973617553711, 0.13953590393066406, -28.645517349243164, -35.38877868652344, 56.79210662841797, 119.05252075195312, 31.57745361328125, -1.3779869079589844, 0.20138168334960938, 65.16677856445312, 131.7449493408203, -25.79443359375, 3.1278724670410156, -28.497188568115234, -66.24376678466797, 141.1099853515625, -31.013099670410156, -37.36225128173828, 41.86194610595703, -19.226303100585938, 18.12823486328125, 89.11265563964844, -46.48951721191406, 66.32743835449219, -2.8185882568359375, 86.547607421875, -41.71192932128906, 47.259765625, 5.750877380371094, 52.011749267578125, 19.45154571533203, 50.466094970703125, 2.0718002319335938, 13.068222045898438, -56.40446472167969, -26.74017333984375, 46.07453155517578, 30.63128662109375, -3.0894775390625, 46.65376281738281, -3.2645492553710938, -8.436798095703125, 4.091163635253906, 110.36943054199219, 94.69416046142578, -65.41502380371094, 1.092081069946289, -32.083595275878906, -19.7294921875, 108.75092315673828, 33.104652404785156, -18.313201904296875, -57.28666687011719, 88.39159393310547, 21.936216354370117, 25.403854370117188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000310.npy"}
{"epoch": 0.46863189720332576, "step": 311, "batch_size": 64, "mean": 30.218578338623047, "std": 49.6635627746582, "min": -128.88417053222656, "p10": -18.721763038635252, "median": 24.969532012939453, "p90": 101.15419769287111, "max": 177.15829467773438, "pos_frac": 0.75, "sample": [96.58047485351562, 2.4212265014648438, 16.80329132080078, -28.629859924316406, -14.274826049804688, -22.339553833007812, 42.44301986694336, 108.71576690673828, -4.891233444213867, 70.0263671875, 5.9026641845703125, -128.88417053222656, 2.7560501098632812, 66.11348724365234, 83.13555908203125, -0.9423885345458984, 74.43556213378906, 39.10181427001953, 105.55145263671875, 39.52069091796875, 3.739990234375, 54.87751770019531, 0.6342792510986328, 28.258155822753906, 77.02704620361328, 11.623645782470703, -29.634395599365234, -10.72043228149414, 13.697021484375, 30.042266845703125, -19.661026000976562, 125.5325927734375, 143.9552459716797, 36.447998046875, 6.906787872314453, 56.322105407714844, 6.078895568847656, 47.371891021728516, 19.065887451171875, 40.158050537109375, 177.15829467773438, 29.786453247070312, -9.546943664550781, 103.11436462402344, -40.32129669189453, 11.830856323242188, 17.054183959960938, -8.135459899902344, 39.78001403808594, 15.172119140625, 127.07023620605469, 38.45672607421875, 45.73113250732422, -16.530149459838867, 25.404991149902344, 75.16096496582031, 1.1204452514648438, -37.21665954589844, -10.583343505859375, 24.534072875976562, -0.23764801025390625, 30.21900177001953, 54.89118957519531, 44.80657958984375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000311.npy"}
{"epoch": 0.47014361300075586, "step": 312, "batch_size": 64, "mean": 39.848915100097656, "std": 46.23249435424805, "min": -70.20339965820312, "p10": -9.666482162475585, "median": 31.796886444091797, "p90": 107.1795936584473, "max": 129.73745727539062, "pos_frac": 0.859375, "sample": [29.02065658569336, 2.645923614501953, -12.778564453125, 8.0535888671875, 23.96265411376953, 6.213996887207031, 47.17335891723633, -0.3295326232910156, 16.163253784179688, 17.466285705566406, 126.99527740478516, 81.97691345214844, 67.70267486572266, 78.09033966064453, 2.204357147216797, 76.18880462646484, 10.831153869628906, 0.8408012390136719, 7.492315292358398, 54.45172119140625, 120.73992156982422, 19.85431671142578, 11.392547607421875, 15.193504333496094, 34.08110809326172, 97.0444564819336, 11.333030700683594, -10.13250732421875, 28.12030029296875, 68.59259796142578, 124.64288330078125, -57.04993438720703, 61.673133850097656, 54.214500427246094, -66.53765106201172, 35.61972427368164, 94.23629760742188, 15.0577392578125, 80.6158447265625, 126.72917175292969, 129.73745727539062, 1.0218849182128906, 28.641952514648438, 52.31336975097656, 89.39517211914062, 25.00959014892578, 37.77375030517578, 111.52322387695312, 51.83531188964844, -8.579090118408203, -22.322463989257812, 66.49031066894531, 119.46530151367188, 70.44297790527344, 29.512664794921875, 77.53886413574219, 68.22130584716797, 44.84593200683594, -70.20339965820312, 70.17355346679688, 51.64832305908203, -18.286422729492188, 25.86041259765625, 8.483566284179688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000312.npy"}
{"epoch": 0.47165532879818595, "step": 313, "batch_size": 64, "mean": 21.175430297851562, "std": 49.661590576171875, "min": -79.16624450683594, "p10": -33.66112289428711, "median": 11.102115631103516, "p90": 93.32003097534181, "max": 148.70074462890625, "pos_frac": 0.671875, "sample": [2.170339584350586, 28.893089294433594, 49.05249786376953, 13.278728485107422, 11.132820129394531, 36.30608367919922, 3.5910263061523438, 20.77431297302246, 58.878395080566406, -61.067222595214844, 1.5187721252441406, 40.63292694091797, 18.619224548339844, 32.113433837890625, 4.100105285644531, -1.271728515625, 0.6611709594726562, -33.08716583251953, -1.8234405517578125, -3.6052398681640625, 43.042564392089844, 74.32099151611328, 3.6795902252197266, 136.0355987548828, -78.53741455078125, -30.886579513549805, -5.100044250488281, 11.0714111328125, 88.96978759765625, 29.381675720214844, 127.98519897460938, 96.54292297363281, -51.66007995605469, -5.236749649047852, 94.15591430664062, 29.979660034179688, 1.901275634765625, 4.684810638427734, -1.8801116943359375, 123.25243377685547, 20.290058135986328, 20.22543716430664, 43.33258056640625, -5.1970977783203125, 148.70074462890625, 1.0166473388671875, -14.135643005371094, -15.460578918457031, 18.035247802734375, -1.2480850219726562, 91.36963653564453, 26.99028778076172, 44.63690948486328, -1.1550712585449219, -2.4475364685058594, -66.22880554199219, 71.29834747314453, -79.16624450683594, 103.86172485351562, -33.9071044921875, -42.471675872802734, 19.780426025390625, 5.264617919921875, 89.27171325683594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000313.npy"}
{"epoch": 0.47316704459561604, "step": 314, "batch_size": 64, "mean": 42.15062713623047, "std": 52.70272445678711, "min": -60.211219787597656, "p10": -7.010929298400878, "median": 30.57467746734619, "p90": 120.96314926147461, "max": 148.77606201171875, "pos_frac": 0.765625, "sample": [22.79126739501953, 10.406993865966797, 13.926567077636719, 128.83665466308594, 42.82394790649414, 74.25464630126953, 0.0103302001953125, 97.36768341064453, -60.211219787597656, 41.137413024902344, -2.71380615234375, -3.497333526611328, 139.476318359375, 37.88133239746094, -3.54833984375, 148.77606201171875, 41.35649490356445, 31.75096321105957, 109.02657318115234, 7.803352355957031, 24.636184692382812, -0.37485504150390625, 106.69615173339844, 119.87059020996094, 115.37869262695312, -55.77423095703125, 22.139877319335938, 125.03497314453125, 10.444717407226562, 1.41741943359375, 75.97294616699219, 43.771270751953125, 25.536640167236328, -6.025909423828125, 1.8710880279541016, 34.495845794677734, 68.56561279296875, -20.845748901367188, 11.866601943969727, 71.41046142578125, -7.433080673217773, -0.31646728515625, 121.43138885498047, 60.28713607788086, 107.36319732666016, 11.979385375976562, 81.61711120605469, 113.30517578125, -4.187263488769531, 123.3920669555664, 29.398391723632812, 113.03475189208984, 71.88796997070312, 13.714775085449219, 78.30523681640625, 24.294296264648438, -18.100112915039062, 38.38860321044922, 9.914993286132812, -1.77685546875, 46.78883361816406, -48.18983459472656, -47.591861724853516, 126.38817596435547], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000314.npy"}
{"epoch": 0.47467876039304613, "step": 315, "batch_size": 64, "mean": 32.72768783569336, "std": 48.14788055419922, "min": -92.77214050292969, "p10": -25.613632202148438, "median": 30.832984924316406, "p90": 95.06671295166015, "max": 133.0277557373047, "pos_frac": 0.75, "sample": [-7.793846130371094, 70.8607177734375, 105.28604888916016, -5.03343391418457, -0.6037158966064453, 37.44416046142578, -92.77214050292969, -1.2457332611083984, -58.67985534667969, 22.153846740722656, -55.98571014404297, 32.61383819580078, 0.5544166564941406, 110.99169158935547, -0.9563636779785156, 49.38292694091797, 20.66303253173828, -14.329414367675781, 11.459842681884766, 0.6021862030029297, 38.047019958496094, 53.82997131347656, 1.6334114074707031, 40.074241638183594, 24.640655517578125, 46.90340042114258, 79.0606689453125, -23.088043212890625, 28.78847885131836, 35.91017150878906, 26.605667114257812, 56.345088958740234, 92.01172637939453, 19.818801879882812, 23.235214233398438, 87.84271240234375, 88.96662902832031, -23.960189819335938, -33.720184326171875, 46.63304138183594, 94.81487274169922, -3.5311737060546875, 88.48609924316406, 110.80253601074219, -26.322250366210938, 58.447174072265625, 50.94215393066406, 133.0277557373047, -38.01527786254883, 29.05213165283203, 45.67051696777344, 5.66961669921875, 87.02301025390625, 0.8560733795166016, 68.12297058105469, 10.104042053222656, 95.17464447021484, 51.599273681640625, 111.51170349121094, -43.700584411621094, 66.84891510009766, 1.3063907623291016, 117.47904968261719, 45.011474609375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000315.npy"}
{"epoch": 0.47619047619047616, "step": 316, "batch_size": 64, "mean": 41.38255310058594, "std": 47.1692008972168, "min": -43.423377990722656, "p10": -15.577214813232418, "median": 41.014991760253906, "p90": 111.23598403930664, "max": 124.97417449951172, "pos_frac": 0.75, "sample": [-8.839263916015625, 26.266265869140625, 57.31480407714844, -12.019798278808594, -32.593597412109375, -6.836875915527344, 45.521636962890625, 36.689117431640625, 22.44518280029297, 110.4058609008789, 124.97417449951172, 65.944580078125, 70.87493896484375, 78.52424621582031, 21.725616455078125, -39.41062927246094, -43.423377990722656, 13.046348571777344, -18.412675857543945, 76.25593566894531, 17.383302688598633, 116.82829284667969, 3.0400009155273438, 9.262306213378906, 29.346023559570312, 35.668575286865234, 43.47942352294922, 4.3969268798828125, -9.868690490722656, 31.037193298339844, 108.28176879882812, 73.81925964355469, 81.75273132324219, 116.80976867675781, -11.197870254516602, 69.41400146484375, -17.101821899414062, 76.15159606933594, 55.45856475830078, 79.26592254638672, 110.02786254882812, 61.58177947998047, 38.550559997558594, -1.7986717224121094, 111.59175109863281, 2.971010208129883, 0.8318023681640625, 122.5760498046875, -8.722366333007812, 54.71258544921875, -20.281463623046875, -30.33330726623535, 6.9149017333984375, -11.677589416503906, 51.25001525878906, 46.84880065917969, -0.2959709167480469, 120.83837890625, 80.77159118652344, 92.80487823486328, 63.826560974121094, 50.02391052246094, 121.63142395019531, 82.15924835205078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000316.npy"}
{"epoch": 0.47770219198790626, "step": 317, "batch_size": 64, "mean": 44.12655258178711, "std": 51.27021026611328, "min": -81.2801513671875, "p10": -9.729949951171873, "median": 38.38996887207031, "p90": 124.30546264648437, "max": 143.59030151367188, "pos_frac": 0.796875, "sample": [60.353126525878906, 69.77021789550781, 47.62666702270508, 79.23104858398438, 35.9801025390625, 143.59030151367188, -21.12453842163086, 54.63977813720703, -30.466352462768555, 29.645896911621094, 2.7265663146972656, 41.832637786865234, -10.357063293457031, 10.84906005859375, 44.93650817871094, 28.364025115966797, 97.04962158203125, 34.74412536621094, 79.17400360107422, 126.79402160644531, -42.7060546875, -3.5143508911132812, 35.823211669921875, -41.507720947265625, 41.07970428466797, 29.237136840820312, 99.4666748046875, 68.0489273071289, 76.5988998413086, -81.2801513671875, 129.10003662109375, 12.709915161132812, 57.63182830810547, 132.2005157470703, 124.35013580322266, -61.46088409423828, 52.75434875488281, -0.22417831420898438, 62.88018798828125, 130.05062866210938, 12.996540069580078, 37.34574890136719, 133.83773803710938, -8.266685485839844, 19.13018035888672, 8.505844116210938, 24.20868682861328, 110.81935119628906, 9.149581909179688, -1.1501083374023438, 76.58042907714844, 21.99386215209961, 39.43418884277344, 124.20122528076172, -0.1671600341796875, 44.91246032714844, -7.819061279296875, 50.6964111328125, 108.42989349365234, 94.31094360351562, 1.4426116943359375, 110.21522521972656, 29.631240844726562, 37.06159973144531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000317.npy"}
{"epoch": 0.47921390778533635, "step": 318, "batch_size": 64, "mean": 30.102907180786133, "std": 63.10503387451172, "min": -97.73421478271484, "p10": -49.60027389526367, "median": 27.713668823242188, "p90": 130.6276626586914, "max": 149.95533752441406, "pos_frac": 0.6875, "sample": [36.4235954284668, 99.95215606689453, 13.679450988769531, -77.72688293457031, 102.35967254638672, 109.63983154296875, 24.412322998046875, -36.65087890625, 60.04759216308594, 131.388427734375, 20.868854522705078, -43.717079162597656, 58.15813064575195, 49.187477111816406, 115.65132141113281, -0.7993583679199219, 24.98846435546875, -49.62987518310547, 37.136497497558594, 3.079069137573242, -10.416618347167969, 61.09759521484375, 130.52752685546875, -15.463905334472656, 9.989768981933594, 49.896854400634766, 134.8435516357422, -88.40135192871094, 101.01611328125, 90.52361297607422, -14.68963623046875, 30.438873291015625, -38.175743103027344, 19.005741119384766, 0.8787384033203125, 133.01556396484375, 149.95533752441406, -38.05065155029297, 130.6705780029297, 5.186470031738281, -49.53120422363281, 30.730972290039062, 56.515594482421875, -97.73421478271484, -0.7973556518554688, 42.67463684082031, 22.00434112548828, 131.13992309570312, 0.01775360107421875, 82.37001037597656, 0.8608760833740234, 137.54873657226562, 40.466217041015625, 68.53750610351562, -71.15969848632812, -25.26795196533203, 56.2991943359375, -73.34382629394531, 41.991424560546875, -3.563201904296875, 68.20916748046875, -61.014564514160156, -27.573017120361328, 36.90754699707031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000318.npy"}
{"epoch": 0.48072562358276644, "step": 319, "batch_size": 64, "mean": 42.75314712524414, "std": 58.36390686035156, "min": -78.68328094482422, "p10": -29.86054534912109, "median": 41.6118278503418, "p90": 128.0707633972168, "max": 159.3751220703125, "pos_frac": 0.734375, "sample": [41.04895782470703, 132.27691650390625, 19.69182586669922, 121.0860595703125, 29.16425895690918, 98.538818359375, 3.870952606201172, 45.21192932128906, 42.17469787597656, 44.656158447265625, 121.8893814086914, 111.73392486572266, -3.089061737060547, -0.5369491577148438, 18.239181518554688, 50.615325927734375, 9.497077941894531, 143.4672393798828, 3.7189407348632812, 73.62789916992188, 6.052158355712891, -24.23455810546875, 48.09539794921875, 62.31862258911133, 149.6666717529297, -5.760719299316406, 127.68206024169922, 64.47821807861328, 129.19155883789062, 128.2373504638672, -32.27168273925781, -49.85194396972656, 40.696983337402344, 159.3751220703125, -7.107107162475586, 65.48080444335938, -45.7664794921875, 94.12899780273438, 70.9340591430664, -44.75385284423828, 125.92050170898438, 44.63353729248047, -13.929573059082031, -0.25263214111328125, 9.719337463378906, 12.952024459838867, -67.89436340332031, -5.691352844238281, -56.49224090576172, 87.77053833007812, -5.905403137207031, 30.878005981445312, 17.478713989257812, -15.490249633789062, 38.142250061035156, 96.33976745605469, 56.169281005859375, 59.46944808959961, 67.99938201904297, 12.487085342407227, 80.89659118652344, -78.68328094482422, 134.72158813476562, 61.48716735839844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000319.npy"}
{"epoch": 0.48223733938019653, "step": 320, "batch_size": 64, "mean": 34.23474884033203, "std": 58.51535415649414, "min": -110.94578552246094, "p10": -28.964812469482414, "median": 9.093643188476562, "p90": 118.01186752319337, "max": 172.23944091796875, "pos_frac": 0.6875, "sample": [92.81969451904297, -40.73552322387695, 0.6283321380615234, 42.144508361816406, 7.94964599609375, 28.575773239135742, 123.25975036621094, 2.9913787841796875, 82.88851165771484, -1.5832481384277344, -32.11543273925781, 113.95137023925781, -0.14142417907714844, -0.3654441833496094, -47.489646911621094, 111.51318359375, 64.40607452392578, 2.0289077758789062, -16.587059020996094, 93.38673400878906, -33.71416473388672, 4.398876190185547, -110.94578552246094, -0.1532573699951172, 5.3833465576171875, 172.23944091796875, 141.17681884765625, 107.53632354736328, 89.25474548339844, -13.331207275390625, 49.63561248779297, -17.859506607055664, 8.057426452636719, 0.119110107421875, -0.6142253875732422, 119.51952362060547, 6.281585693359375, -21.613365173339844, -6.299327850341797, -8.065792083740234, 34.89133071899414, 130.91549682617188, 10.129859924316406, 45.83750915527344, 13.820358276367188, 49.40019989013672, 138.7547149658203, 48.004478454589844, 0.38751220703125, -36.53204345703125, 2.377086639404297, -65.97783660888672, 133.30227661132812, -4.033287048339844, 3.098865509033203, 49.29796600341797, 19.137985229492188, 92.24447631835938, 86.3392333984375, 57.889137268066406, 60.89924621582031, -9.577232360839844, 114.49400329589844, 97.39057922363281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000320.npy"}
{"epoch": 0.4837490551776266, "step": 321, "batch_size": 64, "mean": 39.27128219604492, "std": 59.61345672607422, "min": -104.65850830078125, "p10": -43.341794586181635, "median": 39.718421936035156, "p90": 118.40801849365235, "max": 134.64453125, "pos_frac": 0.796875, "sample": [105.35568237304688, 88.3877944946289, 2.3314666748046875, 100.1683120727539, 90.17623901367188, -26.21080780029297, 133.66111755371094, 80.44042205810547, 119.41517639160156, 38.34906768798828, 133.6300048828125, 27.263019561767578, 48.81296157836914, 71.29994201660156, -104.65850830078125, 5.399440765380859, 26.609397888183594, 62.490020751953125, 29.937896728515625, 112.0058822631836, 41.81181335449219, 18.344078063964844, -55.46251678466797, -1.514404296875, 41.08777618408203, -69.79795837402344, -70.5846176147461, -73.65007019042969, 57.91392517089844, 44.72499084472656, 24.442550659179688, 42.63603591918945, 103.73933410644531, 72.495849609375, 1.1031265258789062, 88.53713989257812, 59.109893798828125, -0.9218616485595703, 26.40726089477539, 105.31531524658203, 2.064697265625, -3.1195716857910156, 24.555843353271484, 103.0547866821289, 0.7653961181640625, -92.87467956542969, 121.50897216796875, 134.20504760742188, 116.0579833984375, 61.349910736083984, 7.591617584228516, 7.388513565063477, 19.30467987060547, 122.78096008300781, 104.13327026367188, -8.916099548339844, 25.090118408203125, 59.330810546875, 134.64453125, -36.97614288330078, 12.189064025878906, 1.2386398315429688, 43.46141815185547, -46.06993103027344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000321.npy"}
{"epoch": 0.4852607709750567, "step": 322, "batch_size": 64, "mean": 47.356597900390625, "std": 55.99868392944336, "min": -99.48309326171875, "p10": -8.129712677001953, "median": 43.96798324584961, "p90": 127.0623550415039, "max": 137.24032592773438, "pos_frac": 0.875, "sample": [-18.06911277770996, 61.11597442626953, 131.53021240234375, 16.742279052734375, 97.86795043945312, 17.70360565185547, 137.24032592773438, 98.00790405273438, 125.05300903320312, 25.99574089050293, 37.107948303222656, 45.885658264160156, 121.88221740722656, 135.7650146484375, 50.5882568359375, 2.5917434692382812, 16.4652099609375, 53.43351745605469, 6.330894470214844, 8.776763916015625, 68.48094940185547, 8.171791076660156, -73.97145080566406, 126.08443450927734, 115.66122436523438, 10.319488525390625, 30.116378784179688, 20.822107315063477, 0.5697402954101562, 1.2952117919921875, -51.649658203125, 89.7568359375, 48.694305419921875, 117.77617645263672, 89.81413269042969, 63.53822326660156, 1.9553909301757812, 133.91607666015625, 6.32635498046875, -8.086502075195312, 67.53457641601562, 46.00666046142578, 1.2439403533935547, 125.84605407714844, 42.70106506347656, 128.31646728515625, 126.77935791015625, 70.73387145996094, -24.122222900390625, -99.48309326171875, 35.99924087524414, 14.780899047851562, 24.097858428955078, 52.377906799316406, 127.18363952636719, 27.560821533203125, 45.234901428222656, 17.471725463867188, 51.813201904296875, -50.47724533081055, 133.13455200195312, -8.148231506347656, 86.78626251220703, 15.843690872192383], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000322.npy"}
{"epoch": 0.48677248677248675, "step": 323, "batch_size": 64, "mean": 30.61416244506836, "std": 66.77407836914062, "min": -119.4692611694336, "p10": -39.381613159179686, "median": 8.429084777832031, "p90": 129.12370910644532, "max": 155.35546875, "pos_frac": 0.71875, "sample": [3.2687911987304688, -90.63087463378906, 69.96255493164062, 27.78858184814453, -3.0031814575195312, 105.38502502441406, 18.138961791992188, 13.255619049072266, 6.161674499511719, 125.02203369140625, 35.999542236328125, -16.155744552612305, 130.4752197265625, 5.689521789550781, 124.82624053955078, -37.218692779541016, -49.5980224609375, 61.334564208984375, 108.85491180419922, 1.2441558837890625, 37.81785583496094, 125.44187927246094, 3.7484283447265625, 9.18084716796875, 143.392333984375, 1.5919265747070312, -40.18115234375, 66.15858459472656, 0.8512706756591797, -13.586181640625, 155.35546875, -37.516021728515625, 10.597213745117188, -67.97703552246094, 133.0059356689453, 5.483573913574219, -81.96112060546875, 125.97018432617188, 58.84257507324219, 0.46416664123535156, 7.640235900878906, 71.92149353027344, -5.537872314453125, -7.273078918457031, 146.10838317871094, 145.36952209472656, 136.32418823242188, -119.4692611694336, 116.37710571289062, 2.7415390014648438, -119.1554946899414, -0.8196334838867188, 88.37200927734375, 4.71533203125, -19.418670654296875, 86.53295135498047, 71.95881652832031, 44.71772766113281, 7.6773223876953125, -3.375713348388672, 28.202693939208984, 9.987197875976562, 2.78448486328125, -14.556480407714844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000323.npy"}
{"epoch": 0.48828420256991684, "step": 324, "batch_size": 64, "mean": 44.276397705078125, "std": 54.64817428588867, "min": -90.05611419677734, "p10": -9.800782775878906, "median": 34.17530059814453, "p90": 121.58735580444336, "max": 148.17044067382812, "pos_frac": 0.828125, "sample": [76.60708618164062, 37.59571838378906, 25.365234375, 6.101921081542969, -77.71015930175781, 139.7252655029297, 56.356807708740234, 55.36920166015625, -15.884674072265625, -9.48992919921875, 1.7714614868164062, 113.36067199707031, 75.34590148925781, 41.60322189331055, 100.87287902832031, 21.63003158569336, 121.27649688720703, -29.250381469726562, 70.24260711669922, 1.655975341796875, 36.56321716308594, 10.86553955078125, 6.249958038330078, 75.93013763427734, 33.03888702392578, 20.803192138671875, -0.9697036743164062, 26.137290954589844, 148.17044067382812, 14.446453094482422, 19.47657012939453, -90.05611419677734, 50.881080627441406, 122.99260711669922, 17.042083740234375, 57.46986770629883, 27.070213317871094, -14.453008651733398, 74.41007995605469, 89.20691680908203, 35.31171417236328, -41.46217346191406, 146.0863037109375, 14.450462341308594, 133.47064208984375, 17.110530853271484, 0.9404029846191406, 118.28140258789062, 48.28525924682617, 119.62081146240234, 142.16050720214844, 0.8253364562988281, 8.058391571044922, 100.40827941894531, 121.7205810546875, 1.8342132568359375, 37.040626525878906, -9.934005737304688, -2.8834915161132812, 48.640995025634766, 117.88082885742188, 110.64231872558594, 28.309120178222656, -0.9306926727294922], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000324.npy"}
{"epoch": 0.4897959183673469, "step": 325, "batch_size": 64, "mean": 39.421478271484375, "std": 62.06978225708008, "min": -156.0183868408203, "p10": -32.838969421386714, "median": 37.97574806213379, "p90": 115.77892227172852, "max": 160.1978759765625, "pos_frac": 0.75, "sample": [33.424015045166016, 124.53047180175781, 2.0856094360351562, 99.06061553955078, -35.76740264892578, 82.29624938964844, 89.869873046875, 53.34856414794922, 55.34870910644531, 56.073143005371094, 57.26506042480469, 108.59831237792969, -3.4084720611572266, -23.098724365234375, 24.481430053710938, -6.314319610595703, 116.0529556274414, 92.25527954101562, 103.76716613769531, -93.91719818115234, 144.45217895507812, -26.005958557128906, 114.9811019897461, 1.303018569946289, -0.7255210876464844, 135.24472045898438, 15.413249969482422, 115.13951110839844, -48.0049934387207, 58.59131622314453, -62.02476501464844, 40.007625579833984, 95.69463348388672, 16.028289794921875, -68.06278228759766, 32.501739501953125, -49.99721908569336, 112.49908447265625, 35.943870544433594, 78.76239013671875, 66.96192932128906, 17.634885787963867, 8.545074462890625, 160.1978759765625, -156.0183868408203, 46.56653594970703, 27.78631591796875, 23.64098358154297, 110.57975769042969, 80.44950866699219, 138.15179443359375, -10.588569641113281, 43.792213439941406, -8.4544677734375, 4.984840393066406, -9.777191162109375, 47.61830139160156, 55.02399826049805, 119.95002746582031, 43.92366027832031, 9.679580688476562, 23.687904357910156, 15.326484680175781, -14.381149291992188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000325.npy"}
{"epoch": 0.491307634164777, "step": 326, "batch_size": 64, "mean": 20.207664489746094, "std": 61.34089660644531, "min": -107.40586853027344, "p10": -63.48169021606445, "median": 14.569503784179688, "p90": 108.04640579223633, "max": 139.80308532714844, "pos_frac": 0.59375, "sample": [-11.967918395996094, -3.8690147399902344, 7.564975738525391, 29.445476531982422, 43.44859313964844, -90.7220458984375, 133.99891662597656, 61.83500671386719, -3.727813720703125, 72.37456512451172, -45.97509765625, 39.119144439697266, 19.07400131225586, -1.0421142578125, 61.08645248413086, 98.41681671142578, 4.856290817260742, 68.87516784667969, -4.965747833251953, 27.897781372070312, 114.7168197631836, 0.116790771484375, 22.638778686523438, 68.06948852539062, -59.21086883544922, -23.562103271484375, -107.40586853027344, -15.3594970703125, -16.458740234375, -5.0909423828125, 139.80308532714844, -89.17620086669922, 19.03728485107422, -3.241046905517578, 43.547019958496094, 24.635711669921875, 5.119140625, -17.696807861328125, -50.687164306640625, -14.639602661132812, 92.89771270751953, 130.24737548828125, -77.48176574707031, -3.8409347534179688, 106.86808776855469, 34.15679931640625, 25.69430923461914, -90.21039581298828, 97.9329833984375, 130.71299743652344, -1.2267112731933594, -4.092119216918945, -44.31730651855469, 108.55139923095703, 14.747100830078125, 29.623451232910156, -94.74285888671875, 68.61656188964844, 63.635868072509766, 11.80938720703125, 110.47525787353516, -65.31204223632812, 14.39190673828125, 93.27476501464844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000326.npy"}
{"epoch": 0.4928193499622071, "step": 327, "batch_size": 64, "mean": 31.55462074279785, "std": 63.20494842529297, "min": -98.43331909179688, "p10": -51.47156715393066, "median": 22.570411682128906, "p90": 118.19459686279299, "max": 158.5616455078125, "pos_frac": 0.703125, "sample": [49.654151916503906, -53.13506317138672, -15.949790954589844, 100.13681030273438, 42.51970672607422, 22.9124755859375, -76.72872161865234, 62.722434997558594, -78.39210510253906, 74.59445190429688, 137.07284545898438, 25.12625503540039, -73.0751953125, 111.88726806640625, 21.088035583496094, 130.232666015625, 95.08340454101562, 18.246814727783203, 24.140914916992188, -47.5900764465332, 0.6575794219970703, 7.484851837158203, 50.016334533691406, 152.75759887695312, 131.0837860107422, 9.634750366210938, 18.046051025390625, 84.38734436035156, 61.016990661621094, -3.8752899169921875, 15.941375732421875, 2.004749298095703, -81.98825073242188, 98.73161315917969, 96.65286254882812, -5.774604797363281, 38.84886169433594, 22.228347778320312, -4.177768707275391, 7.2704925537109375, 137.2202911376953, 14.3717041015625, -88.59574890136719, -1.5537567138671875, 87.30860137939453, 158.5616455078125, -42.96099853515625, 87.03703308105469, 120.4725112915039, 39.70280456542969, 83.72863006591797, -9.397172927856445, 24.056045532226562, 15.976810455322266, -29.805614471435547, -17.380578994750977, 112.87946319580078, 105.0823745727539, 26.8865966796875, 4.068172454833984, -8.889366149902344, -98.43331909179688, 41.79786682128906, -14.133293151855469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000327.npy"}
{"epoch": 0.4943310657596372, "step": 328, "batch_size": 64, "mean": 39.34053421020508, "std": 57.65131378173828, "min": -97.5099105834961, "p10": -21.1927635192871, "median": 21.114356994628906, "p90": 121.58308715820314, "max": 151.85531616210938, "pos_frac": 0.765625, "sample": [103.08216857910156, 5.774553298950195, 7.578847885131836, 44.2425537109375, 118.15971374511719, 8.552816390991211, -11.979915618896484, 91.06249237060547, 77.17806243896484, 14.525909423828125, 3.5450515747070312, 99.20425415039062, 128.22886657714844, 7.472682952880859, 49.225616455078125, -45.21574401855469, 132.87411499023438, 28.786542892456055, 17.293106079101562, -7.285167694091797, 15.671112060546875, -4.5955810546875, 6.727518081665039, 29.08358383178711, -12.976730346679688, -26.36534881591797, 134.40643310546875, 10.022239685058594, -65.59808349609375, 28.591392517089844, -97.5099105834961, 114.94646453857422, 97.660400390625, 2.439453125, 123.05024719238281, -24.71392059326172, 22.092666625976562, 0.1452178955078125, 151.85531616210938, 20.13604736328125, 117.117431640625, 117.84794616699219, -3.5740890502929688, -45.85987854003906, 27.122329711914062, 101.13579559326172, -1.563924789428711, 8.658187866210938, -0.5001983642578125, 17.458301544189453, 72.4312515258789, -41.53047180175781, 73.54153442382812, 23.84283447265625, 0.4581451416015625, 68.83200073242188, 99.07830047607422, -1.956563949584961, 125.10192108154297, 27.16730499267578, 94.65846252441406, 0.3234710693359375, 138.2964630126953, 102.33257293701172], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000328.npy"}
{"epoch": 0.4958427815570673, "step": 329, "batch_size": 64, "mean": 39.97403335571289, "std": 58.99883270263672, "min": -85.71858978271484, "p10": -28.673413467407226, "median": 27.18001937866211, "p90": 131.79718322753908, "max": 154.491455078125, "pos_frac": 0.765625, "sample": [25.81635093688965, 15.276355743408203, 50.737518310546875, 66.26164245605469, -43.56666564941406, 40.16364288330078, 154.491455078125, -11.525421142578125, 77.23599243164062, 29.090614318847656, 97.02374267578125, 56.722869873046875, -7.063055038452148, 97.57119750976562, 0.1533336639404297, 21.270973205566406, 25.85680389404297, 102.4508056640625, -8.276153564453125, 40.940521240234375, -8.482879638671875, 8.152315139770508, 140.66290283203125, 37.815155029296875, -85.71858978271484, 138.37667846679688, 143.21592712402344, 97.5497817993164, 47.09942626953125, 25.59117889404297, 122.27056121826172, 121.31388854980469, -48.80662536621094, 58.45458984375, 3.026021957397461, 7.151336669921875, 125.15444946289062, 0.7548961639404297, 135.48336791992188, 88.29074096679688, 132.87954711914062, 113.85067749023438, 6.438152313232422, -54.02922821044922, 1.9838142395019531, 129.27166748046875, 0.5278854370117188, 28.50323486328125, 14.695999145507812, 34.66008377075195, -24.805416107177734, 35.56396484375, 1.470458984375, -29.29610824584961, 138.5244140625, 82.52760314941406, -44.459991455078125, 59.19487762451172, -13.95611572265625, -27.220458984375, 16.407577514648438, 2.916107177734375, -0.08204078674316406, -35.21604919433594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000329.npy"}
{"epoch": 0.4973544973544973, "step": 330, "batch_size": 64, "mean": 44.68868637084961, "std": 72.08991241455078, "min": -137.86154174804688, "p10": -50.08421478271483, "median": 45.51388359069824, "p90": 132.1488006591797, "max": 158.24610900878906, "pos_frac": 0.734375, "sample": [1.3657379150390625, -26.919754028320312, -14.207052230834961, -124.62705993652344, 43.455413818359375, -137.86154174804688, 116.37759399414062, 30.86121368408203, -19.17485809326172, 126.86933135986328, 128.88250732421875, 61.56855010986328, -84.45194244384766, -0.26929473876953125, 11.965152740478516, 123.90315246582031, 145.57766723632812, -80.05376434326172, 158.24610900878906, 135.3096466064453, 71.68365478515625, -0.5720558166503906, 112.46817016601562, 0.6480960845947266, 126.05198669433594, 103.58369445800781, 0.20372772216796875, 14.948089599609375, -37.23027038574219, 8.239723205566406, 50.117774963378906, 117.24723052978516, -4.476837158203125, 43.48623275756836, -64.44271850585938, 108.03313446044922, 92.35081481933594, 133.41082763671875, 66.01507568359375, 29.080604553222656, -10.387397766113281, 98.36186218261719, 53.443359375, 107.80382537841797, 129.20407104492188, 122.05065155029297, 1.5812664031982422, 51.890682220458984, 6.472198486328125, 48.66120910644531, -6.753997802734375, 19.0799560546875, 137.83889770507812, 121.7485580444336, -19.179336547851562, 18.7677001953125, 47.541534423828125, -55.593048095703125, 144.8570556640625, -73.65779113769531, 88.40909576416016, 106.09164428710938, 14.284202575683594, 139.8957977294922], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000330.npy"}
{"epoch": 0.4988662131519274, "step": 331, "batch_size": 64, "mean": 54.10665512084961, "std": 68.03654479980469, "min": -141.52694702148438, "p10": -11.226979064941405, "median": 57.96310043334961, "p90": 139.53559112548828, "max": 184.8876495361328, "pos_frac": 0.796875, "sample": [144.12428283691406, 11.388504028320312, 16.978015899658203, 91.96417236328125, 119.39239501953125, 142.83033752441406, 72.63623046875, 72.19938659667969, 90.869873046875, 6.778564453125, -4.757396697998047, 150.85089111328125, 135.19329833984375, 137.27444458007812, 114.33973693847656, 6.897102355957031, 148.48699951171875, 26.658477783203125, 33.498077392578125, 140.50465393066406, 94.88844299316406, 3.913026809692383, 6.487518310546875, 3.2341461181640625, -11.494056701660156, 135.8563232421875, 94.74571990966797, 184.8876495361328, 16.619033813476562, 102.11367797851562, 82.72860717773438, 49.23332595825195, 135.81201171875, 72.39874267578125, -10.603797912597656, -3.1544723510742188, 49.89221954345703, -85.40962219238281, -82.41497802734375, 73.17922973632812, -63.38203430175781, 147.0141143798828, 33.59393310546875, -1.6986083984375, 31.089120864868164, -141.52694702148438, 95.75743865966797, -1.481048583984375, -13.863868713378906, 19.513017654418945, 6.958808898925781, 81.3271484375, 81.47013854980469, 27.93133544921875, 50.320716857910156, 3.9652328491210938, -0.7942819595336914, 124.0813217163086, 65.60548400878906, -77.83819580078125, 134.85963439941406, 94.41200256347656, 70.40725708007812, 124.08349609375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000331.npy"}
{"epoch": 0.5003779289493575, "step": 332, "batch_size": 64, "mean": 44.427947998046875, "std": 60.37128829956055, "min": -86.80805969238281, "p10": -4.652624320983886, "median": 25.914241790771484, "p90": 136.83228149414063, "max": 159.13070678710938, "pos_frac": 0.8125, "sample": [-57.91580581665039, 6.851081848144531, 5.924018859863281, -0.9451751708984375, 25.068506240844727, 92.51390838623047, 15.497611999511719, 36.033966064453125, 21.0238037109375, 32.20527267456055, 0.6538314819335938, 159.13070678710938, 85.30082702636719, 66.82398986816406, 132.83493041992188, 8.535675048828125, 18.304962158203125, 0.6263389587402344, 84.04655456542969, 43.835235595703125, 74.58808135986328, 8.602279663085938, -86.80805969238281, 136.63455200195312, 93.2144775390625, 105.41686248779297, -63.341522216796875, 21.821060180664062, 2.3266754150390625, -1.2826919555664062, 36.721214294433594, 139.02528381347656, 14.192291259765625, -5.1298675537109375, 2.001800537109375, 25.283935546875, -2.3049755096435547, 153.89605712890625, -71.19340515136719, 5.022562026977539, 97.14938354492188, 130.7962646484375, 94.5275650024414, -3.5390567779541016, 32.12348937988281, 2.8575210571289062, 77.17340087890625, 110.25204467773438, 101.26531982421875, 47.59990692138672, 148.2104034423828, 26.54454803466797, 55.743900299072266, -1.0105056762695312, 33.287147521972656, -18.152969360351562, 139.52728271484375, 4.045886993408203, 14.258163452148438, 136.91702270507812, 158.8154296875, 114.85169982910156, -42.71006393432617, 17.818115234375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000332.npy"}
{"epoch": 0.5018896447467877, "step": 333, "batch_size": 64, "mean": 47.08435821533203, "std": 59.77134323120117, "min": -96.50389862060547, "p10": -7.050238227844236, "median": 36.8652229309082, "p90": 138.68627014160157, "max": 165.25051879882812, "pos_frac": 0.796875, "sample": [17.62055206298828, -7.90449333190918, 7.4641876220703125, 68.02104187011719, 14.687591552734375, -0.6295394897460938, 29.278728485107422, 45.59454345703125, 4.831714630126953, -44.62017822265625, 36.08098602294922, 61.05588150024414, 74.5880355834961, 96.66771697998047, 125.84396362304688, 18.5184326171875, 55.65435791015625, 139.03756713867188, 154.43374633789062, 146.92910766601562, -4.8948516845703125, 62.2259521484375, 37.10832977294922, 36.72528076171875, 55.59141540527344, -55.433876037597656, -20.517337799072266, 51.639549255371094, 7.597434997558594, 39.92644500732422, 132.20071411132812, 137.8665771484375, -13.20404052734375, 37.005165100097656, 90.46705627441406, 19.693828582763672, 155.27847290039062, 2.9294967651367188, 12.511577606201172, 15.302337646484375, -0.8274707794189453, 48.45799255371094, -96.50389862060547, 4.666234970092773, 52.432655334472656, -67.20811462402344, 165.25051879882812, 26.49945831298828, -1.076995849609375, 23.596214294433594, 7.3285369873046875, -5.056976318359375, 84.63568115234375, 142.49058532714844, 13.174158096313477, -2.4203567504882812, 82.7500228881836, 5.944252014160156, 132.71987915039062, 50.78080749511719, 156.21380615234375, 89.96942901611328, 127.7850570678711, 128.6240234375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000333.npy"}
{"epoch": 0.5034013605442177, "step": 334, "batch_size": 64, "mean": 34.018856048583984, "std": 69.0396728515625, "min": -124.48040008544922, "p10": -53.46560287475584, "median": 19.105606079101562, "p90": 127.02838821411135, "max": 166.9580078125, "pos_frac": 0.6875, "sample": [-62.11908721923828, -124.48040008544922, 55.31291961669922, 73.53092956542969, 67.32551574707031, -9.96539306640625, -108.02381896972656, 97.8669662475586, 20.970413208007812, -106.17987823486328, 4.3372955322265625, 166.9580078125, -11.671096801757812, 72.78668212890625, 3.2138214111328125, 88.8929672241211, -2.73828125, -24.044740676879883, 14.028106689453125, 95.38250732421875, 75.05828857421875, 150.46139526367188, 25.630958557128906, 68.75513458251953, 87.45011901855469, -63.729373931884766, 26.343467712402344, 0.8473110198974609, 119.23149108886719, 5.318637847900391, -3.0393238067626953, 0.5411148071289062, 145.50616455078125, -97.77032470703125, 77.65449523925781, 139.86669921875, 56.12652587890625, 1.800628662109375, 21.584442138671875, -0.24375152587890625, 4.437625885009766, -23.913463592529297, -6.42364501953125, -5.5041656494140625, 121.16167449951172, -7.796600341796875, 0.7445297241210938, 16.268630981445312, -30.299407958984375, 109.63685607910156, 109.23613739013672, 59.88855743408203, -62.39787292480469, 86.41797637939453, -33.274139404296875, 17.240798950195312, 29.319869995117188, 145.5193634033203, -3.387115478515625, 14.266237258911133, 115.08151245117188, 78.21538543701172, 164.44781494140625, 129.54269409179688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000334.npy"}
{"epoch": 0.5049130763416477, "step": 335, "batch_size": 64, "mean": 38.389739990234375, "std": 66.53865051269531, "min": -103.664306640625, "p10": -39.618225479125975, "median": 26.252017974853516, "p90": 142.04629211425782, "max": 157.05328369140625, "pos_frac": 0.75, "sample": [98.63870239257812, 68.72577667236328, -43.846778869628906, 157.05328369140625, 25.25640869140625, 49.992767333984375, 31.360702514648438, 0.9204807281494141, 140.54501342773438, 90.39862060546875, 150.11656188964844, 137.85716247558594, 22.018829345703125, 57.74354553222656, 145.9928741455078, 139.8809356689453, 5.722801208496094, 37.379478454589844, 20.527732849121094, 142.689697265625, -42.97060012817383, 2.38299560546875, 52.781593322753906, 147.61904907226562, -28.046340942382812, 7.879638671875, 75.94100952148438, -40.97667694091797, -32.408203125, 61.676368713378906, 132.12997436523438, 5.230064392089844, 7.127593994140625, 23.70531463623047, 106.32304382324219, -28.787532806396484, 143.59750366210938, -18.847270965576172, 27.24762725830078, 142.77667236328125, 6.066314697265625, -3.292726516723633, 31.7291259765625, 19.51837921142578, 59.67430114746094, 109.98123931884766, -9.949913024902344, -90.33401489257812, 27.601139068603516, 1.2290973663330078, 51.934722900390625, -36.44850540161133, -8.758354187011719, -101.30075073242188, -103.664306640625, -58.877296447753906, 121.26158142089844, 78.58677673339844, 2.1124725341796875, 68.41079711914062, -15.378402709960938, 9.874229431152344, 15.325881958007812, 58.28509521484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000335.npy"}
{"epoch": 0.5064247921390779, "step": 336, "batch_size": 64, "mean": 39.463714599609375, "std": 63.77227783203125, "min": -100.09941864013672, "p10": -28.708988189697266, "median": 23.982940673828125, "p90": 137.41546173095705, "max": 184.9031524658203, "pos_frac": 0.71875, "sample": [-37.38330078125, 2.812835693359375, 120.2222671508789, 27.106033325195312, 153.3558349609375, 131.23403930664062, 61.0175666809082, -2.8767013549804688, 143.363525390625, 2.1541996002197266, 1.9385395050048828, -6.957170486450195, 140.86605834960938, -2.87603759765625, 124.86289978027344, 21.110797882080078, 6.055809020996094, 93.09271240234375, 154.20321655273438, -78.77721405029297, -28.626502990722656, 51.909515380859375, 139.79583740234375, 44.530189514160156, -5.246376037597656, 108.84590148925781, 28.008747100830078, -100.09941864013672, -14.754959106445312, -69.31597900390625, 141.4754180908203, -47.89569854736328, 58.14569854736328, 21.158294677734375, 18.879383087158203, 6.717374801635742, 1.711212158203125, -3.2118759155273438, 52.69544219970703, 114.43281555175781, 10.785057067871094, -14.197288513183594, 47.615875244140625, 11.446220397949219, 120.22148132324219, 26.807586669921875, 63.86631774902344, -2.036651611328125, 27.501630783081055, 97.04145812988281, 6.917411804199219, -2.7058868408203125, 65.16077423095703, 131.8612518310547, 184.9031524658203, 71.20979309082031, -54.137939453125, 12.642852783203125, 51.83513641357422, -11.044937133789062, 30.586029052734375, 86.34373474121094, 18.118087768554688, -28.744338989257812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000336.npy"}
{"epoch": 0.5079365079365079, "step": 337, "batch_size": 64, "mean": 31.06181526184082, "std": 60.1038703918457, "min": -122.0667953491211, "p10": -31.278658294677733, "median": 27.52184295654297, "p90": 114.43272399902344, "max": 159.865478515625, "pos_frac": 0.703125, "sample": [4.694068908691406, -68.8497085571289, -122.0667953491211, -26.40679931640625, 32.77757263183594, 113.06083679199219, -24.34465217590332, 114.75210571289062, 32.685546875, -6.580833435058594, 118.2037353515625, 13.069023132324219, 38.90294647216797, -113.901123046875, -33.4146728515625, -9.294731140136719, 25.446517944335938, 58.201141357421875, 32.81147003173828, -16.910858154296875, 9.218620300292969, 0.7051563262939453, -30.047500610351562, -51.67599105834961, 93.69824981689453, 82.49179077148438, -14.221389770507812, 46.93811798095703, 51.931251525878906, 140.5295867919922, 32.70616912841797, 3.7916183471679688, 132.47080993652344, -4.113044738769531, 117.58267211914062, 60.09550476074219, 113.6875, -0.7223663330078125, 2.3339920043945312, 77.74484252929688, 133.33551025390625, 14.661178588867188, 108.69805908203125, 14.572952270507812, 51.10821533203125, -12.949234008789062, 71.00984191894531, 5.2869720458984375, 28.9603271484375, 26.083358764648438, 80.23042297363281, 159.865478515625, -1.5520401000976562, -9.5418701171875, 31.00103759765625, 87.40971374511719, 7.530115127563477, -31.806297302246094, 73.58456420898438, 69.78133392333984, -88.36416625976562, 10.921920776367188, 93.74899291992188, 36.39944076538086], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000337.npy"}
{"epoch": 0.509448223733938, "step": 338, "batch_size": 64, "mean": 34.35435485839844, "std": 65.03605651855469, "min": -136.0521697998047, "p10": -30.846380233764645, "median": 16.558069229125977, "p90": 131.1405212402344, "max": 148.88677978515625, "pos_frac": 0.78125, "sample": [-16.456298828125, 30.264118194580078, 135.63421630859375, 120.05419921875, 128.676513671875, 111.78958129882812, -84.12379455566406, 117.3573989868164, 15.828765869140625, 131.5294189453125, 148.82656860351562, 6.560676574707031, 0.03368377685546875, 27.331451416015625, 2.8612060546875, -76.67179870605469, 7.6649322509765625, -95.54962158203125, 1.9825592041015625, 1.9979171752929688, -12.087715148925781, 130.23309326171875, 17.287372589111328, 0.4092693328857422, 4.049919128417969, 84.23017883300781, -136.0521697998047, 77.97542572021484, 74.70468139648438, 60.895606994628906, 9.145927429199219, 55.73040771484375, 13.93398666381836, 30.314342498779297, 66.3921890258789, 61.90350341796875, 6.983448028564453, 11.010025024414062, 148.88677978515625, 5.9374237060546875, 9.503700256347656, 106.15641784667969, -32.6373291015625, 105.68470764160156, 1.2781028747558594, 22.485733032226562, -39.26482391357422, -19.913009643554688, 30.31415557861328, -26.667499542236328, 25.33728790283203, 143.1029510498047, 5.457370758056641, 1.846527099609375, 50.7007942199707, 61.1153450012207, -13.195358276367188, 138.4473876953125, 118.9238052368164, -63.32352066040039, -15.121484756469727, 27.24486541748047, 140.73304748535156, -7.006046295166016], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000338.npy"}
{"epoch": 0.5109599395313681, "step": 339, "batch_size": 64, "mean": 36.96049880981445, "std": 58.74241638183594, "min": -114.14578247070312, "p10": -19.2326566696167, "median": 31.417585372924805, "p90": 126.27729949951174, "max": 154.14785766601562, "pos_frac": 0.75, "sample": [45.78660583496094, 25.513870239257812, 1.257080078125, -10.58355712890625, -12.275199890136719, 128.9908447265625, 19.87653350830078, 40.15596008300781, 8.987735748291016, -68.72071075439453, 62.439178466796875, -18.92974853515625, 5.370307922363281, 38.38154983520508, 46.42518997192383, 0.8338775634765625, -72.38632202148438, 95.41949462890625, 101.55746459960938, 5.616243362426758, 119.94569396972656, 24.452857971191406, 47.686805725097656, 54.335731506347656, 26.70514678955078, 136.09210205078125, 0.9347190856933594, 18.245182037353516, -18.38793182373047, 55.0122184753418, 50.02260971069336, 94.55525207519531, 146.76724243164062, 148.009765625, 61.83184051513672, 9.436542510986328, 69.41019439697266, -24.893051147460938, 67.39862060546875, 55.496551513671875, 142.69351196289062, 25.487014770507812, -19.36247444152832, -5.922843933105469, 36.886993408203125, -2.516448974609375, 5.906005859375, 53.6285400390625, -88.15081024169922, -33.70489501953125, 14.086465835571289, 6.4109954833984375, 99.28845977783203, 114.21957397460938, -114.14578247070312, 154.14785766601562, -6.8756103515625, 44.20771789550781, -0.0046234130859375, 86.19035339355469, 82.144775390625, 36.13002395629883, -0.9594173431396484, 148.91209411621094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000339.npy"}
{"epoch": 0.5124716553287982, "step": 340, "batch_size": 64, "mean": 39.82217788696289, "std": 71.1053695678711, "min": -137.51597595214844, "p10": -36.638143920898436, "median": 26.73036289215088, "p90": 143.8472625732422, "max": 155.12652587890625, "pos_frac": 0.640625, "sample": [61.65007019042969, 138.90655517578125, 19.900516510009766, 54.74137878417969, -4.5236968994140625, 149.8458709716797, 1.0645084381103516, -3.2324771881103516, -37.33692932128906, 144.69442749023438, 10.445899963378906, 99.72425842285156, 48.33204650878906, 102.69544982910156, 146.85330200195312, 100.379150390625, 26.009798049926758, -50.86521911621094, 35.03680419921875, 64.85972595214844, -16.168291091918945, -8.246118545532227, -81.85771179199219, 81.2797622680664, -9.096710205078125, 152.84466552734375, -8.605621337890625, 132.72987365722656, -8.726325988769531, 141.87054443359375, -87.98065185546875, 126.90150451660156, 3.97186279296875, 27.450927734375, 119.88658142089844, 67.70111083984375, 131.813720703125, -1.3516864776611328, -8.6392822265625, 136.1485137939453, 98.92679595947266, -27.65857696533203, -6.731758117675781, 144.9370574951172, 34.0227165222168, -4.665246963500977, -35.00764465332031, -72.60218811035156, 2.0834808349609375, 132.3887939453125, -12.722732543945312, 47.75800323486328, 82.78363800048828, -12.545053482055664, -137.51597595214844, -51.52778625488281, 145.02764892578125, 10.074363708496094, 34.97545623779297, 155.12652587890625, 27.703983306884766, 3.7483596801757812, -26.33050537109375, 15.261894226074219], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000340.npy"}
{"epoch": 0.5139833711262283, "step": 341, "batch_size": 64, "mean": 44.4967041015625, "std": 67.03886413574219, "min": -114.00514221191406, "p10": -33.86145401000976, "median": 31.452489852905273, "p90": 139.35931243896485, "max": 154.15786743164062, "pos_frac": 0.75, "sample": [-66.783203125, 61.296119689941406, 119.67647552490234, 74.9576416015625, 112.67240142822266, 78.53208923339844, 110.63188934326172, 154.15786743164062, 138.03060913085938, -18.701995849609375, 130.5591583251953, 18.429969787597656, -96.01274871826172, -3.3057403564453125, 9.052846908569336, 2.784931182861328, 100.51265716552734, 3.1642684936523438, 112.81851196289062, 147.54275512695312, 118.61433410644531, 142.92572021484375, 143.08016967773438, 66.33644104003906, -57.1077880859375, 139.9287567138672, 92.14604187011719, 11.18856430053711, -25.08953094482422, -13.955429077148438, 37.861454010009766, 9.355133056640625, -13.421112060546875, 1.14263916015625, 20.878585815429688, 136.28616333007812, 42.31007766723633, 24.80225944519043, 51.905555725097656, -43.02489471435547, -22.53040313720703, -59.227508544921875, 63.636985778808594, 132.01104736328125, 3.5668869018554688, 143.0579071044922, 23.739288330078125, 58.75865173339844, 82.3885726928711, 144.26318359375, 49.71624755859375, 9.576011657714844, 94.76531982421875, -37.620849609375, 3.1203460693359375, 106.48695373535156, 68.71263122558594, 25.04352569580078, -0.06860542297363281, -17.345657348632812, -15.930519104003906, 20.28302001953125, 9.211332321166992, -114.00514221191406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000341.npy"}
{"epoch": 0.5154950869236583, "step": 342, "batch_size": 64, "mean": 46.622711181640625, "std": 64.56180572509766, "min": -95.51032257080078, "p10": -15.675201416015623, "median": 29.747163772583008, "p90": 145.5354232788086, "max": 180.2215118408203, "pos_frac": 0.734375, "sample": [17.960914611816406, 38.92271423339844, -1.0542221069335938, -1.6611862182617188, 2.6641788482666016, 130.08627319335938, 30.294498443603516, 118.146484375, -43.84541320800781, 20.4013671875, -1.104318618774414, 29.1998291015625, -2.4804439544677734, 146.45828247070312, 12.264656066894531, -65.39546203613281, 4.596258163452148, -95.51032257080078, -1.4859180450439453, -27.659156799316406, 31.991291046142578, 144.2924346923828, 92.16789245605469, 145.62242126464844, -16.282974243164062, 53.916664123535156, 21.875991821289062, 77.02850341796875, 107.43235778808594, 69.00942993164062, 40.662532806396484, 43.741825103759766, 23.617965698242188, 125.77200317382812, 38.093505859375, -13.814340591430664, 114.28822326660156, 7.7436676025390625, -14.257064819335938, 140.29747009277344, 7.9586029052734375, 149.4098358154297, 48.6181640625, 73.42166137695312, -43.653167724609375, 149.8358154296875, 2.9914016723632812, -4.000278472900391, 180.2215118408203, 154.28602600097656, 98.35980987548828, -55.14057922363281, 16.827613830566406, 145.33242797851562, -11.775646209716797, 148.7328643798828, 51.95345687866211, -1.4398422241210938, 7.776542663574219, 62.30622482299805, 74.28521728515625, 28.820201873779297, 10.010238647460938, 144.7164306640625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000342.npy"}
{"epoch": 0.5170068027210885, "step": 343, "batch_size": 64, "mean": 29.168371200561523, "std": 59.822017669677734, "min": -126.60848999023438, "p10": -24.174663543701165, "median": 21.488513946533203, "p90": 104.8683837890625, "max": 179.45703125, "pos_frac": 0.71875, "sample": [73.66087341308594, 4.6456146240234375, -16.63593292236328, 3.173248291015625, -4.963874816894531, 68.01515197753906, -3.932037353515625, -0.9815731048583984, 0.85064697265625, -1.7196197509765625, -119.85430145263672, 37.321807861328125, 24.855178833007812, 70.8724594116211, 179.45703125, -6.727260589599609, 129.501953125, 119.32427978515625, 103.32097625732422, 95.31259155273438, 33.088714599609375, 3.0147151947021484, 49.813568115234375, 104.48985290527344, 11.181880950927734, 50.35066604614258, 18.121849060058594, -46.286773681640625, 30.408111572265625, -5.248376846313477, -73.21404266357422, 36.117835998535156, 104.89471435546875, 12.449684143066406, 9.203948974609375, -1.315399169921875, 70.26119232177734, 110.31690979003906, -124.39234924316406, 135.9969940185547, -27.405548095703125, 32.69390869140625, 45.095802307128906, 141.78408813476562, 46.409942626953125, 7.687553405761719, 34.8709716796875, 35.335384368896484, 10.32586669921875, 104.80694580078125, 96.3294906616211, 13.724258422851562, -5.174446105957031, 81.14472198486328, 47.427711486816406, 25.298904418945312, -8.343536376953125, 4.778404235839844, 47.932830810546875, 15.120140075683594, -126.60848999023438, -7.04669189453125, -40.71833419799805, 6.554929733276367], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000343.npy"}
{"epoch": 0.5185185185185185, "step": 344, "batch_size": 64, "mean": 45.5706787109375, "std": 61.29072952270508, "min": -87.05777740478516, "p10": -21.32268600463867, "median": 36.38923263549805, "p90": 136.34670715332032, "max": 186.29348754882812, "pos_frac": 0.765625, "sample": [-50.30023193359375, 122.81926727294922, 9.222785949707031, 37.251827239990234, 136.57418823242188, -1.6100711822509766, 11.97097396850586, 35.52663803100586, 14.070503234863281, 48.96112060546875, 4.741912841796875, 13.028533935546875, 23.470396041870117, 94.61814880371094, 79.61315155029297, 68.02256774902344, 84.44247436523438, 186.29348754882812, -0.8601150512695312, 72.86807250976562, -29.901857376098633, -19.768829345703125, -21.988624572753906, 152.56024169921875, 7.606777191162109, -6.917182922363281, -40.33335876464844, 2.2797012329101562, 152.6636505126953, 62.23188400268555, -24.932174682617188, 143.5811767578125, -4.152671813964844, 81.43754577636719, 47.9661865234375, 137.8719024658203, 135.81591796875, 9.896537780761719, 84.74922180175781, -79.9532699584961, 1.3816413879394531, 8.58517074584961, -17.935401916503906, 1.9857311248779297, -5.608787536621094, -9.936746597290039, 136.94720458984375, 129.8596954345703, 50.49470520019531, 57.59394454956055, 92.84564971923828, 48.878265380859375, 89.00914764404297, 8.694475173950195, 129.553466796875, 18.92601203918457, 26.201751708984375, 91.56353759765625, 6.8326568603515625, 111.04024505615234, -87.05777740478516, 87.03424072265625, 103.81805419921875, 54.37828063964844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000344.npy"}
{"epoch": 0.5200302343159486, "step": 345, "batch_size": 64, "mean": 39.27054214477539, "std": 62.32624816894531, "min": -98.4980697631836, "p10": -30.00853462219238, "median": 32.30607223510742, "p90": 124.04730682373048, "max": 176.50962829589844, "pos_frac": 0.78125, "sample": [40.976097106933594, 13.227840423583984, 149.48159790039062, -78.00885772705078, 15.76803970336914, 5.8484954833984375, 2.8815383911132812, -47.55035400390625, 35.00896453857422, -9.373641967773438, 119.11660766601562, 87.38528442382812, 82.96857452392578, 87.56327819824219, -60.025577545166016, 84.84343719482422, -66.47840118408203, 45.55361557006836, 22.77129364013672, 102.36126708984375, 176.50962829589844, 10.081228256225586, 84.56849670410156, 37.50878143310547, 107.19246673583984, -1.7224884033203125, 6.5816802978515625, 36.09845733642578, 7.142250061035156, 45.54883575439453, 75.49756622314453, 148.68563842773438, 96.33786010742188, 65.72200775146484, 22.72613525390625, -98.4980697631836, 150.0850067138672, 147.4462890625, 61.87166976928711, -11.571361541748047, 122.6080322265625, 35.261268615722656, -31.6451416015625, 124.66413879394531, 29.603179931640625, -20.83971405029297, 18.55547332763672, -15.029014587402344, 17.41650390625, 20.356666564941406, 18.659278869628906, 8.717903137207031, -12.037490844726562, 22.0787353515625, 5.7945404052734375, 66.22413635253906, 0.6238842010498047, -94.49308776855469, -26.18978500366211, 35.268898010253906, 57.27198791503906, 67.45693969726562, 119.20188903808594, 141.65432739257812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000345.npy"}
{"epoch": 0.5215419501133787, "step": 346, "batch_size": 64, "mean": 45.19707489013672, "std": 66.65189361572266, "min": -141.74862670898438, "p10": -38.59212608337402, "median": 32.358354568481445, "p90": 142.2462356567383, "max": 156.06771850585938, "pos_frac": 0.765625, "sample": [89.34185791015625, 104.74775695800781, 81.58737182617188, 81.9617691040039, 101.52923583984375, 37.56871032714844, 21.42833709716797, 75.40714263916016, 130.83482360839844, 152.20103454589844, 142.87710571289062, 3.9821929931640625, 39.197418212890625, 126.21477508544922, -32.17158126831055, 147.40841674804688, -80.71592712402344, 30.837627410888672, -41.343788146972656, 41.6876335144043, 70.770751953125, 64.71197509765625, 20.774917602539062, 33.87908172607422, 136.57186889648438, 155.19381713867188, -141.74862670898438, 16.41045379638672, 5.457500457763672, -57.39399719238281, 6.467681884765625, -58.31298065185547, 125.13185119628906, -1.1373672485351562, 25.38528060913086, 19.21704864501953, 3.8847579956054688, -8.050712585449219, 108.35555267333984, 116.41786193847656, 147.8233642578125, 88.07652282714844, -5.6151885986328125, 141.76612854003906, 28.556472778320312, 20.34313201904297, 101.02085876464844, 53.593597412109375, -46.199615478515625, 56.23829650878906, 34.6986083984375, 19.12952423095703, 28.318845748901367, -5.0335235595703125, -67.0919418334961, 6.754432678222656, 13.2745361328125, 90.77529907226562, -20.09180450439453, 156.06771850585938, -9.322879791259766, -1.2558917999267578, 21.765853881835938, 142.45199584960938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000346.npy"}
{"epoch": 0.5230536659108088, "step": 347, "batch_size": 64, "mean": 34.057273864746094, "std": 62.30561828613281, "min": -90.91012573242188, "p10": -36.74613647460937, "median": 17.784404754638672, "p90": 135.1233917236328, "max": 164.15316772460938, "pos_frac": 0.703125, "sample": [36.942626953125, -2.2776107788085938, 34.31254577636719, 42.21971130371094, -82.46000671386719, 13.338943481445312, 74.14578247070312, 39.62195587158203, 89.5232162475586, 135.45050048828125, 0.436920166015625, 124.33583068847656, 7.877960205078125, -64.30889129638672, -34.905731201171875, 158.01388549804688, 14.456077575683594, 45.42424392700195, -22.71759033203125, 3.1778640747070312, 10.9853515625, 2.9791431427001953, 87.99354553222656, 6.5469207763671875, -43.3072509765625, 13.70547866821289, 72.77977752685547, 106.6964340209961, 134.36013793945312, -50.648406982421875, 23.249061584472656, -15.808187484741211, -31.392906188964844, 95.19312286376953, -8.609560012817383, 43.3512077331543, -9.218666076660156, 97.5649185180664, -29.2908935546875, 23.15196990966797, 49.6197624206543, 10.858383178710938, -37.534881591796875, 71.32489013671875, 3.801492691040039, 149.0384521484375, -43.99986267089844, -10.208526611328125, 164.15316772460938, 154.14242553710938, 142.7247772216797, 113.8525390625, 31.623260498046875, 21.11273193359375, -90.91012573242188, 51.52970886230469, 54.342750549316406, 6.04730224609375, -5.280315399169922, 0.3605022430419922, -14.51810073852539, -1.3294830322265625, 139.0283203125, 76.99688720703125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000347.npy"}
{"epoch": 0.5245653817082389, "step": 348, "batch_size": 64, "mean": 44.13256072998047, "std": 69.9046859741211, "min": -92.7342529296875, "p10": -32.05640869140625, "median": 24.753494262695312, "p90": 148.48793182373046, "max": 192.52978515625, "pos_frac": 0.6875, "sample": [121.82441711425781, 142.92733764648438, 6.83233642578125, 2.7576370239257812, 148.18978881835938, 47.337554931640625, -31.80023193359375, -11.919441223144531, -92.7342529296875, 10.152763366699219, 159.65231323242188, -59.58964538574219, 24.287918090820312, 64.6208267211914, 122.41159057617188, 121.98404693603516, 25.219070434570312, 162.84786987304688, 35.2065544128418, 157.68060302734375, 3.7943572998046875, 110.39784240722656, 122.81571197509766, 9.930709838867188, 0.14669036865234375, 138.99899291992188, -3.2200546264648438, 134.04747009277344, -32.16619873046875, -15.067756652832031, 71.62112426757812, -1.7553482055664062, 2.248758316040039, 192.52978515625, 20.90398406982422, -66.70948028564453, 43.49750900268555, -54.570377349853516, -12.086639404296875, 65.95722198486328, 154.27513122558594, 148.61570739746094, 46.31903076171875, -0.2894859313964844, 65.8067626953125, -14.602752685546875, -31.7996883392334, -39.040870666503906, -11.829032897949219, 77.83963012695312, -16.090816497802734, 154.8173370361328, -22.302183151245117, 63.16077423095703, 83.420654296875, 3.9127578735351562, 79.5386962890625, 32.383331298828125, -34.86177062988281, 1.4373226165771484, -19.887222290039062, 68.92524719238281, 133.41387939453125, 12.11831283569336], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000348.npy"}
{"epoch": 0.5260770975056689, "step": 349, "batch_size": 64, "mean": 43.63902282714844, "std": 64.3814697265625, "min": -94.04632568359375, "p10": -24.5285400390625, "median": 23.757476806640625, "p90": 139.57667694091796, "max": 163.3955841064453, "pos_frac": 0.6875, "sample": [-58.02463150024414, 25.988067626953125, 137.7149200439453, 26.332443237304688, 43.60670471191406, -0.0728912353515625, 49.80467224121094, 135.05776977539062, 43.647850036621094, 13.848152160644531, -1.5220775604248047, 19.210721969604492, 136.56275939941406, 9.190673828125, 22.308204650878906, 82.78836059570312, 100.29584503173828, -52.470787048339844, -94.04632568359375, 148.1053924560547, 23.620468139648438, 100.90025329589844, -25.210159301757812, 2.172313690185547, 117.99847412109375, 23.894485473632812, 15.797935485839844, 53.44955062866211, -1.7587738037109375, 59.9627685546875, 116.08029174804688, 120.84393310546875, 6.382926940917969, 22.51490592956543, -10.451492309570312, 140.37457275390625, 23.56208610534668, 159.1959991455078, -77.60205841064453, 46.30070114135742, 151.43389892578125, 10.330360412597656, 6.299797058105469, 26.14141845703125, 67.97270202636719, 150.20620727539062, 135.67333984375, -1.7658576965332031, -10.300521850585938, -1.905649185180664, 100.84487915039062, -22.938095092773438, -0.29467010498046875, -3.9465866088867188, 38.11366271972656, 160.39259338378906, -27.936264038085938, -8.80816650390625, -31.43126678466797, 104.03251647949219, -0.9263629913330078, -0.9993515014648438, 82.95845794677734, 163.3955841064453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000349.npy"}
{"epoch": 0.527588813303099, "step": 350, "batch_size": 64, "mean": 52.83354187011719, "std": 64.78709411621094, "min": -84.41421508789062, "p10": -21.887252044677727, "median": 46.50280952453613, "p90": 144.66784973144533, "max": 168.109619140625, "pos_frac": 0.796875, "sample": [132.40335083007812, -14.516525268554688, 72.72928619384766, -13.4940185546875, 76.4272689819336, 18.737871170043945, 130.45416259765625, 69.29945373535156, 71.20835876464844, 139.078125, 11.362396240234375, -46.83009338378906, 35.508453369140625, 85.95820617675781, 35.162353515625, 63.35654067993164, 116.94876861572266, 21.752788543701172, -8.180503845214844, -6.035463333129883, 44.04937744140625, 65.33373260498047, -83.97869110107422, 93.56048583984375, 69.2435073852539, 3.100311279296875, -84.41421508789062, 167.09339904785156, -8.61003303527832, 34.986759185791016, 111.1346206665039, -73.52375793457031, 129.5552978515625, 4.965965270996094, 122.38960266113281, 149.62704467773438, 61.856727600097656, 117.19438171386719, 147.06344604492188, 168.109619140625, 54.11200714111328, 159.79306030273438, 148.7639923095703, -25.04613494873047, 42.7487678527832, 4.027381896972656, 5.193763732910156, 15.300634384155273, -41.4832878112793, 62.40393829345703, 69.34858703613281, 48.956241607666016, 1.0951404571533203, 35.12659454345703, 29.825416564941406, -58.277313232421875, 41.82330322265625, 35.87181854248047, -3.4540863037109375, 31.695098876953125, 151.639892578125, 109.26663208007812, 94.74507904052734, 137.80191040039062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000350.npy"}
{"epoch": 0.5291005291005291, "step": 351, "batch_size": 64, "mean": 43.94773864746094, "std": 71.4417495727539, "min": -148.15538024902344, "p10": -38.18242721557617, "median": 39.599138259887695, "p90": 134.18377532958985, "max": 151.54678344726562, "pos_frac": 0.78125, "sample": [0.30489158630371094, 99.08032989501953, 144.96539306640625, 94.69001007080078, 8.021160125732422, 123.38442993164062, -15.83980941772461, 39.34283447265625, 138.74459838867188, 28.781707763671875, 25.358642578125, 139.3514862060547, 13.801996231079102, -25.182098388671875, 24.97095489501953, -18.083290100097656, 14.642929077148438, 8.924720764160156, 113.20964813232422, 75.45736694335938, 64.60970306396484, -92.64945220947266, 134.9446563720703, 12.737014770507812, 132.40838623046875, 142.36865234375, 86.03938293457031, 85.53353881835938, -143.48513793945312, 49.77033996582031, 126.26780700683594, -76.50077056884766, 99.62538146972656, 4.0217132568359375, 109.6556625366211, -54.45050048828125, -9.282707214355469, 129.75079345703125, 32.213768005371094, 39.85544204711914, 79.95960235595703, 151.54678344726562, 25.697677612304688, -40.281394958496094, 68.94056701660156, 14.9381103515625, 73.41569519042969, 57.664398193359375, 2.7842559814453125, 108.24163055419922, -33.28483581542969, -0.9893226623535156, 102.01020812988281, 88.75984954833984, 142.68702697753906, 35.002037048339844, 26.12731170654297, 82.06040954589844, -148.15538024902344, 9.940710067749023, -120.49079895019531, 100.40731811523438, 86.58696746826172, -8.275115966796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000351.npy"}
{"epoch": 0.5306122448979592, "step": 352, "batch_size": 64, "mean": 45.857688903808594, "std": 64.49071502685547, "min": -102.96881103515625, "p10": -13.053611373901365, "median": 36.02412796020508, "p90": 141.97573547363282, "max": 186.94326782226562, "pos_frac": 0.75, "sample": [139.30169677734375, 168.40200805664062, 137.66407775878906, 7.2129364013671875, 128.31610107421875, -102.96881103515625, -13.97696304321289, 0.14669418334960938, 52.18442153930664, 58.82978439331055, 57.18956756591797, 113.33990478515625, -68.226806640625, -2.083995819091797, 63.514617919921875, 10.519241333007812, -10.497528076171875, -8.057876586914062, -48.90821838378906, 16.84502410888672, 77.84808349609375, 53.28802490234375, 20.611480712890625, 149.4945831298828, 142.23727416992188, 66.15380859375, 31.67974090576172, -102.6921615600586, 19.701473236083984, -7.605144500732422, 28.238483428955078, -0.5963077545166016, 1.9661331176757812, 54.149139404296875, 62.00752258300781, 89.36834716796875, 27.80510902404785, 144.11981201171875, 150.04055786132812, 43.755958557128906, 141.365478515625, -5.946388244628906, 186.94326782226562, 28.368532180786133, 93.2540054321289, 131.97714233398438, -10.899124145507812, 9.526687622070312, -14.531145095825195, 40.79397201538086, -1.358123779296875, 42.974700927734375, 13.299713134765625, 122.86206817626953, 97.1438217163086, 2.6601409912109375, -20.05455780029297, 143.4429168701172, -5.211261749267578, 17.107803344726562, 0.6329689025878906, 88.09477996826172, 40.36851501464844, 41.7584228515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000352.npy"}
{"epoch": 0.5321239606953893, "step": 353, "batch_size": 64, "mean": 46.344337463378906, "std": 66.35865783691406, "min": -149.1470489501953, "p10": -10.731399154663082, "median": 31.799238204956055, "p90": 143.0742630004883, "max": 159.27752685546875, "pos_frac": 0.78125, "sample": [84.8438949584961, 147.34640502929688, -51.55625534057617, 34.6648063659668, -77.02334594726562, 22.506099700927734, 84.9971694946289, -5.244728088378906, 136.92897033691406, 16.535133361816406, 7.296840667724609, 50.431617736816406, 80.93309020996094, -21.488800048828125, 144.14404296875, 8.415727615356445, 32.80046081542969, 115.76854705810547, 117.75299072265625, 145.1168212890625, 92.93681335449219, 156.59564208984375, 1.691732406616211, 66.53009796142578, -6.635475158691406, -2.300861358642578, 150.01840209960938, 22.91260528564453, 135.92486572265625, 68.70620727539062, 83.67337036132812, 4.248012542724609, 67.20394897460938, 0.9059638977050781, -149.1470489501953, 130.71078491210938, -65.17233276367188, -51.61418533325195, 135.67196655273438, 130.6985321044922, 57.34366226196289, 144.46066284179688, 159.27752685546875, 3.4759674072265625, 0.1365509033203125, 30.798015594482422, 101.70802307128906, 13.354791641235352, -12.443180084228516, 12.226905822753906, 37.378761291503906, 21.8760986328125, -6.73724365234375, 33.430992126464844, -0.26020050048828125, 11.698928833007812, 3.0095748901367188, 18.37420654296875, -5.4525146484375, 9.3458251953125, 101.27297973632812, -0.37458038330078125, 140.57810974121094, 42.82926940917969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000353.npy"}
{"epoch": 0.5336356764928194, "step": 354, "batch_size": 64, "mean": 50.012725830078125, "std": 71.34684753417969, "min": -109.71479797363281, "p10": -59.197801971435545, "median": 42.998796463012695, "p90": 146.74646606445313, "max": 168.15682983398438, "pos_frac": 0.78125, "sample": [158.801513671875, 43.69890213012695, -27.824867248535156, 115.72445678710938, -60.249488830566406, 26.552927017211914, 11.915382385253906, 99.30054473876953, 106.4027099609375, 11.501949310302734, 105.13944244384766, 149.50523376464844, 78.68036651611328, 48.31877136230469, -14.338600158691406, 1.2922592163085938, 124.8172378540039, -26.206771850585938, 125.68763732910156, 168.15682983398438, 84.48934936523438, 142.18331909179688, 31.8681640625, 52.681236267089844, -68.24581909179688, 2.1743602752685547, -109.71479797363281, 128.3885955810547, 30.008583068847656, 55.724571228027344, 33.129486083984375, 108.15171813964844, 146.0806884765625, 43.76239013671875, 38.127410888671875, 27.91967010498047, 83.45494079589844, -62.489097595214844, 148.43057250976562, 8.560989379882812, 99.32547760009766, 131.53140258789062, -88.78009796142578, -73.98597717285156, 147.70361328125, -62.332183837890625, 153.8560333251953, 124.13050842285156, 20.882715225219727, -24.484886169433594, 31.671104431152344, 42.29869079589844, 107.737548828125, 147.03179931640625, 56.72239303588867, 23.942363739013672, 7.179481506347656, 8.08572769165039, -11.151374816894531, 92.5472412109375, 38.061363220214844, 129.76651000976562, -56.743865966796875, -15.74380111694336], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000354.npy"}
{"epoch": 0.5351473922902494, "step": 355, "batch_size": 64, "mean": 51.47796630859375, "std": 73.3827896118164, "min": -126.82723236083984, "p10": -50.74504928588867, "median": 47.966087341308594, "p90": 147.116943359375, "max": 166.05947875976562, "pos_frac": 0.703125, "sample": [78.91877746582031, 156.93353271484375, -51.82698059082031, 6.398643493652344, 136.41748046875, 45.94380187988281, 42.13761901855469, -22.237516403198242, -58.730186462402344, 61.234413146972656, 60.810997009277344, 24.472991943359375, 79.65072631835938, -51.50494384765625, 134.7515869140625, 93.87789916992188, -2.596294403076172, 49.43838119506836, -61.23137283325195, 91.78168487548828, -63.46739196777344, 71.23466491699219, 30.667892456054688, 131.8959503173828, 137.3639373779297, 48.68229675292969, 105.62841796875, 27.041797637939453, 109.76433563232422, 166.05947875976562, -48.971961975097656, 40.399757385253906, -0.6561927795410156, 47.2498779296875, 8.787773132324219, -126.82723236083984, 90.26979064941406, -5.79437255859375, 55.692752838134766, 144.2592010498047, -1.3996200561523438, 17.893157958984375, 133.3702392578125, -19.684677124023438, -0.20365333557128906, 148.42431640625, 118.72836303710938, 10.242534637451172, 148.96693420410156, -116.97651672363281, 143.21456909179688, 135.05467224121094, 149.62550354003906, -1.8779029846191406, 148.34169006347656, -40.96967315673828, -2.3939285278320312, 22.727237701416016, 156.526611328125, -6.1277008056640625, 113.80565643310547, 23.07646369934082, 101.45708465576172, 128.84616088867188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000355.npy"}
{"epoch": 0.5366591080876795, "step": 356, "batch_size": 64, "mean": 42.194644927978516, "std": 82.11406707763672, "min": -153.1595916748047, "p10": -62.30451774597168, "median": 27.32286262512207, "p90": 146.82769775390625, "max": 157.91314697265625, "pos_frac": 0.703125, "sample": [147.20098876953125, 156.92889404296875, 20.415023803710938, -87.21156311035156, 126.96436309814453, 10.351676940917969, 0.7959213256835938, 145.3322296142578, 83.88871002197266, 1.5311393737792969, -122.9825439453125, -143.05801391601562, 82.30975341796875, -89.68973541259766, 148.20465087890625, 157.91314697265625, 6.9389495849609375, 24.196491241455078, 135.6943817138672, -0.2041606903076172, 147.0714111328125, -153.1595916748047, 142.93704223632812, 14.80828857421875, 3.0390625, 151.6355743408203, -60.04172897338867, -39.699737548828125, 54.18544387817383, 30.449234008789062, 76.87452697753906, 7.760009765625, -9.163969039916992, 2.2836170196533203, -63.27428436279297, 122.81637573242188, 125.22815704345703, -45.994384765625, 74.48567199707031, 119.0772705078125, -0.4531593322753906, 1.588836669921875, 146.259033203125, -19.016220092773438, 58.597015380859375, 82.0575942993164, 110.46005249023438, 13.299278259277344, -51.211158752441406, 72.44294738769531, -0.6644058227539062, 81.58566284179688, 145.186767578125, 150.06085205078125, -5.6900482177734375, 102.85958862304688, 34.65897750854492, -17.989158630371094, 141.62948608398438, -29.6240234375, -74.05647277832031, 145.71188354492188, 20.3109130859375, 85.61468505859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000356.npy"}
{"epoch": 0.5381708238851096, "step": 357, "batch_size": 64, "mean": 37.79161834716797, "std": 67.40042877197266, "min": -135.77671813964844, "p10": -41.44775047302246, "median": 29.11437225341797, "p90": 133.56835174560547, "max": 152.5031280517578, "pos_frac": 0.75, "sample": [122.7257080078125, -5.130407333374023, 124.59207916259766, -28.223602294921875, -123.3817367553711, 147.91448974609375, 38.0047607421875, 68.29396057128906, 18.773719787597656, 14.612701416015625, -22.082000732421875, 65.95613861083984, -42.24085235595703, 10.66006088256836, 152.0290069580078, 80.83233642578125, 134.04554748535156, 62.71559143066406, 80.2595443725586, 2.963146209716797, 70.72374725341797, 46.89092254638672, 54.09052658081055, 25.105241775512695, 21.00806427001953, 72.13408660888672, -4.550962448120117, 2.3669891357421875, -48.0692138671875, 43.7916374206543, -47.03114318847656, 15.473867416381836, -23.584697723388672, 100.76165008544922, 83.36278533935547, 146.35177612304688, 132.45489501953125, 32.59010314941406, 76.58253479003906, 147.58602905273438, -14.924079895019531, 55.937591552734375, 12.584365844726562, 2.0500106811523438, 134.4515380859375, 43.613555908203125, 2.5754470825195312, -22.89263153076172, -39.5971794128418, 18.0001220703125, 44.66267776489258, 152.5031280517578, 67.87079620361328, 15.583728790283203, 113.54784393310547, 25.638641357421875, 4.003122329711914, 0.9425430297851562, 121.06549072265625, -108.73039245605469, -53.66334533691406, -135.77671813964844, 128.10369873046875, -0.2455463409423828], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000357.npy"}
{"epoch": 0.5396825396825397, "step": 358, "batch_size": 64, "mean": 52.44757843017578, "std": 69.47084045410156, "min": -109.08885192871094, "p10": -35.31323318481445, "median": 38.377511978149414, "p90": 151.3732131958008, "max": 191.6251220703125, "pos_frac": 0.75, "sample": [108.45219421386719, 98.92672729492188, 109.00668334960938, 152.23583984375, -37.63236999511719, 150.31442260742188, 147.06851196289062, -28.02989959716797, -5.605838775634766, 28.034500122070312, -7.502208709716797, 107.49409484863281, 35.268028259277344, 86.84562683105469, 141.77481079101562, 33.95787048339844, -45.26397705078125, 29.984634399414062, 111.14076232910156, 21.061481475830078, 41.413238525390625, 20.68056869506836, -31.209678649902344, 117.28986358642578, 138.49774169921875, 2.1146774291992188, 20.346923828125, -7.9111785888671875, -18.978042602539062, 7.645099639892578, -68.6458511352539, 27.69548797607422, -37.0718994140625, 6.345304489135742, -18.829261779785156, 60.93363952636719, 85.31877136230469, 48.63175964355469, 19.806007385253906, -0.33646202087402344, 155.53150939941406, 169.23812866210938, 191.6251220703125, 4.805614471435547, 169.90054321289062, 151.8269805908203, 83.44408416748047, -48.00429153442383, 42.417144775390625, 49.214202880859375, -1.4059333801269531, -43.72150421142578, -109.08885192871094, 129.31134033203125, 4.84039306640625, 132.48446655273438, 35.3417854309082, 77.59689331054688, 19.848663330078125, 90.99601745605469, 87.4312515258789, 104.79907989501953, 45.803070068359375, 161.14051818847656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000358.npy"}
{"epoch": 0.5411942554799698, "step": 359, "batch_size": 64, "mean": 25.983016967773438, "std": 63.23970031738281, "min": -99.393798828125, "p10": -54.416772460937494, "median": 19.97762107849121, "p90": 113.73329391479497, "max": 164.1874542236328, "pos_frac": 0.65625, "sample": [-44.57786560058594, 69.27644348144531, -24.1614990234375, 10.094467163085938, 54.617942810058594, 58.71318817138672, -26.11362075805664, 27.54206085205078, -60.83282470703125, -26.4158935546875, 164.1874542236328, 62.906097412109375, -10.855224609375, 36.9937744140625, -2.1975936889648438, 74.47723388671875, 139.59051513671875, -38.2418212890625, -0.6790313720703125, -98.13458251953125, 8.626956939697266, 31.774272918701172, 16.98406982421875, -47.47508239746094, 3.3795623779296875, 137.643310546875, 61.031982421875, 18.845943450927734, 42.706878662109375, -90.48474884033203, 145.22781372070312, 155.42724609375, -50.423004150390625, 88.6922607421875, 21.109298706054688, 18.02558135986328, 3.6987075805664062, 81.50325775146484, 22.900054931640625, 26.771568298339844, -59.975196838378906, 94.89402770996094, 119.32256317138672, 57.7742919921875, -2.9666213989257812, 27.348114013671875, -39.923622131347656, -56.128387451171875, -57.89836120605469, 55.673370361328125, 39.96333694458008, -4.273286819458008, -99.393798828125, -7.878961563110352, 89.63732147216797, 1.09405517578125, 95.61061096191406, 14.585588455200195, 74.50173950195312, 100.69166564941406, 21.924083709716797, 149.38674926757812, 4.253414154052734, -17.464752197265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000359.npy"}
{"epoch": 0.5427059712773998, "step": 360, "batch_size": 64, "mean": 44.06662368774414, "std": 66.47775268554688, "min": -98.1255111694336, "p10": -20.189468383789062, "median": 24.10453224182129, "p90": 138.38518524169922, "max": 170.77890014648438, "pos_frac": 0.6875, "sample": [-18.3240966796875, 150.42428588867188, 7.53399658203125, 1.59405517578125, 54.078857421875, 96.01644134521484, -2.196643829345703, 37.26454544067383, 27.817340850830078, 127.00009155273438, -0.2826862335205078, -66.77383422851562, 1.0570526123046875, -59.82053756713867, 132.00213623046875, -48.59037780761719, 127.82684326171875, -25.60531997680664, -0.3812522888183594, 141.311279296875, 22.42776870727539, -2.7615013122558594, -55.616607666015625, 17.706329345703125, -3.981414794921875, -3.591461181640625, 121.39614868164062, 8.364952087402344, 101.7934341430664, -1.0367279052734375, 29.492347717285156, 19.482742309570312, 89.22251892089844, -17.93115234375, 25.781295776367188, 133.59561157226562, -11.242340087890625, 73.56624603271484, 139.70484924316406, 12.538421630859375, 165.5523681640625, 0.066802978515625, 129.82147216796875, -10.173192977905273, 170.77890014648438, 19.88501739501953, 142.79995727539062, 9.964813232421875, 42.81345748901367, 135.30596923828125, 133.91226196289062, 38.749053955078125, 154.84706115722656, -19.56540870666504, -4.484968185424805, 52.96527862548828, -20.45692253112793, -98.1255111694336, 40.94486999511719, 109.8380126953125, 36.870697021484375, 7.1377716064453125, 92.1008071899414, 107.85166931152344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000360.npy"}
{"epoch": 0.54421768707483, "step": 361, "batch_size": 64, "mean": 51.81316375732422, "std": 64.5231704711914, "min": -93.33238220214844, "p10": -19.244376754760737, "median": 49.63270950317383, "p90": 142.4635711669922, "max": 161.77627563476562, "pos_frac": 0.75, "sample": [63.528038024902344, 95.34886932373047, 36.76356887817383, -7.755706787109375, 151.1954345703125, 61.32769012451172, -37.77809524536133, -93.33238220214844, 47.75104904174805, 53.55037307739258, 92.416015625, -41.637203216552734, -1.6394500732421875, 144.9722900390625, 113.28627014160156, 82.64153289794922, -21.47281265258789, -4.579082489013672, 62.42657470703125, 6.524993896484375, 38.371246337890625, 48.63384246826172, 12.369712829589844, 151.17263793945312, 132.91212463378906, 101.92880249023438, 134.46929931640625, -84.19429016113281, 136.60989379882812, 95.36283874511719, 33.288944244384766, -5.6590118408203125, 112.04916381835938, 161.77627563476562, 95.05508422851562, 57.843231201171875, 132.34487915039062, 80.80162048339844, -56.00166320800781, -2.2751541137695312, -0.8951282501220703, 6.4110870361328125, 145.23936462402344, 134.2437286376953, 11.82815170288086, 150.48330688476562, 123.00592041015625, 10.987419128417969, -55.319252014160156, -12.737045288085938, 75.41357421875, 17.51513671875, 83.57623291015625, 15.715938568115234, 152.58486938476562, 22.411619186401367, 117.31732177734375, 8.899894714355469, -4.7674713134765625, 50.63157653808594, 63.81085968017578, 32.431129455566406, 0.9013328552246094, -14.044692993164062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000361.npy"}
{"epoch": 0.54572940287226, "step": 362, "batch_size": 64, "mean": 52.24518585205078, "std": 77.07279205322266, "min": -144.29312133789062, "p10": -34.84226989746094, "median": 51.99021530151367, "p90": 145.79051666259767, "max": 238.924560546875, "pos_frac": 0.78125, "sample": [-35.62385559082031, 2.407346725463867, 18.513946533203125, 1.0143184661865234, 1.8198928833007812, -144.29312133789062, 22.36223602294922, 79.99246215820312, 238.924560546875, 122.57490539550781, 10.810447692871094, -22.827373504638672, 20.921592712402344, -3.9102096557617188, -1.6718063354492188, 2.4410667419433594, 83.10829162597656, 78.10287475585938, 86.42562866210938, 150.50839233398438, -3.142850875854492, 18.119781494140625, 57.86248779296875, 77.65646362304688, 141.51608276367188, 4.1792144775390625, 125.19284057617188, 4.16314697265625, 1.62103271484375, 109.35704040527344, 183.73358154296875, 3.839151382446289, -35.287193298339844, 144.14060974121094, 26.72760772705078, 155.42160034179688, 133.21652221679688, 16.924652099609375, 71.4848861694336, -33.24958801269531, 1.885650634765625, 131.69296264648438, 146.28970336914062, 144.62574768066406, 19.865341186523438, 48.455047607421875, 111.48902130126953, 161.7472686767578, 121.4247817993164, 112.94308471679688, -33.804115295410156, -3.333587646484375, 132.76449584960938, -100.33531188964844, 55.52538299560547, 65.55681610107422, 142.92787170410156, -87.55233764648438, -56.357749938964844, 81.18278503417969, 170.9368896484375, -75.1392593383789, 77.46597290039062, 58.35687255859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000362.npy"}
{"epoch": 0.54724111866969, "step": 363, "batch_size": 64, "mean": 51.07155990600586, "std": 68.77987670898438, "min": -141.39625549316406, "p10": -16.32518577575683, "median": 42.456581115722656, "p90": 151.1182586669922, "max": 210.76004028320312, "pos_frac": 0.828125, "sample": [88.74981689453125, -99.05070495605469, 152.81015014648438, 53.34088897705078, 72.96774291992188, 43.26708221435547, 7.6801300048828125, 156.2358856201172, 13.907073974609375, -28.061031341552734, -1.4351959228515625, 26.293212890625, 210.76004028320312, 158.5606689453125, 10.237968444824219, 166.72628784179688, 41.62997817993164, 63.64134979248047, 27.135780334472656, 115.86666870117188, 7.8730316162109375, 94.47901916503906, 11.759050369262695, 140.3743438720703, 145.19737243652344, 34.89253234863281, -5.279144287109375, 20.15484619140625, -8.866310119628906, 26.03479766845703, -23.019323348999023, -141.39625549316406, 72.83121490478516, -84.93827819824219, 85.8206787109375, 73.99093627929688, 41.6878662109375, 101.57080078125, 39.611534118652344, -1.6100425720214844, 4.492588043212891, 7.629264831542969, 49.33416748046875, 148.83253479003906, 6.886823654174805, 152.155517578125, 108.95811462402344, 43.22529602050781, 140.13815307617188, 8.303466796875, 138.1004180908203, 1.8942413330078125, 123.8974838256836, 72.91238403320312, -55.68043518066406, 24.461334228515625, 152.0978546142578, -19.521846771240234, 54.70341491699219, 44.003238677978516, 0.2309246063232422, 78.89041137695312, 61.09339904785156, 9.108682632446289], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000363.npy"}
{"epoch": 0.5487528344671202, "step": 364, "batch_size": 64, "mean": 55.89947509765625, "std": 83.79508209228516, "min": -144.48997497558594, "p10": -39.93781242370605, "median": 59.295650482177734, "p90": 159.84512634277345, "max": 193.25010681152344, "pos_frac": 0.734375, "sample": [172.20037841796875, -2.5504302978515625, 20.926895141601562, 158.6639404296875, 160.35134887695312, 12.002326965332031, 73.22377014160156, 120.41381072998047, -30.186241149902344, 170.20614624023438, 181.56776428222656, 111.92505645751953, -34.128902435302734, -58.89360809326172, 81.48025512695312, 150.6573944091797, 64.19882202148438, -10.470333099365234, 193.25010681152344, 0.496246337890625, 166.20156860351562, 58.561309814453125, 117.02220153808594, 150.70159912109375, -111.60716247558594, -64.50872039794922, 152.65872192382812, 4.693208694458008, 0.5241184234619141, 115.90087890625, 146.1439666748047, 60.029991149902344, -2.4424896240234375, 64.3277359008789, 20.247848510742188, 4.4615325927734375, 117.4822998046875, 157.76223754882812, -29.35265350341797, 21.482072830200195, 120.66407775878906, 72.4977035522461, -42.427345275878906, -90.12733459472656, 163.48635864257812, 9.332096099853516, -13.413421630859375, 145.83566284179688, 3.4343490600585938, 130.28753662109375, 8.879531860351562, 118.54823303222656, -116.83070373535156, 54.67937469482422, -144.48997497558594, 15.955368041992188, 74.44725799560547, 107.43702697753906, -1.6681270599365234, 9.157501220703125, -0.3385658264160156, -24.409591674804688, 151.3994140625, 139.6051483154297], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000364.npy"}
{"epoch": 0.5502645502645502, "step": 365, "batch_size": 64, "mean": 52.74662780761719, "std": 61.377777099609375, "min": -87.15988159179688, "p10": -8.985666084289544, "median": 46.3962516784668, "p90": 138.58911743164063, "max": 188.77716064453125, "pos_frac": 0.8125, "sample": [44.2633056640625, 8.248865127563477, 83.57192993164062, 1.0697364807128906, 48.28126525878906, 1.6817550659179688, 140.31686401367188, 107.49107360839844, 126.36882019042969, 75.85588073730469, 134.55770874023438, 59.852821350097656, -87.15988159179688, -1.2244911193847656, 11.924713134765625, 130.3549041748047, -1.8850669860839844, 1.7136650085449219, -17.225868225097656, 0.24946212768554688, 36.62725830078125, 85.0495376586914, 118.78803253173828, 7.429765701293945, -0.0048809051513671875, -83.40586853027344, 62.511329650878906, 159.5105438232422, 21.595314025878906, 85.08380126953125, 13.669601440429688, 11.960847854614258, 125.56060791015625, 39.520015716552734, 126.93418884277344, 66.76541137695312, 124.42815399169922, -3.006307601928711, 47.64529800415039, 46.36426544189453, 150.90158081054688, 80.85684204101562, 46.42823791503906, -12.089324951171875, 58.60084533691406, 141.58401489257812, 4.540685653686523, -14.609823226928711, 188.77716064453125, 50.27381134033203, 86.78894805908203, 2.6165599822998047, 122.64080047607422, 18.2447509765625, -0.4470710754394531, 31.18072509765625, 1.8161773681640625, -34.14095687866211, 80.97412109375, 7.287742614746094, 155.8271484375, 143.072509765625, -11.548248291015625, 114.87272644042969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000365.npy"}
{"epoch": 0.5517762660619804, "step": 366, "batch_size": 64, "mean": 45.25122833251953, "std": 80.38557434082031, "min": -140.32659912109375, "p10": -60.83438415527343, "median": 39.034278869628906, "p90": 149.56640625, "max": 181.16355895996094, "pos_frac": 0.71875, "sample": [1.6787948608398438, 157.89468383789062, 98.40880584716797, 52.84036636352539, -4.656593322753906, 138.0341339111328, -42.236328125, -33.16680145263672, -41.87290954589844, -11.398635864257812, -55.45849609375, 88.2564926147461, 29.42469024658203, 79.31526184082031, 47.91752624511719, 36.510528564453125, -71.84925842285156, 49.196136474609375, 1.4849567413330078, 103.62747955322266, 54.596343994140625, 51.868797302246094, 71.46070861816406, -14.616008758544922, 12.564741134643555, 117.52426147460938, 37.829898834228516, 147.6275634765625, -140.32659912109375, -17.988008499145508, 171.40530395507812, 3.7239456176757812, -3.067953109741211, 181.16355895996094, -73.17387390136719, 3.095388412475586, -63.138336181640625, -98.57408142089844, 32.74980545043945, 40.2386589050293, 2.8588333129882812, -8.288166046142578, 3.7283687591552734, 30.02001953125, -95.13763427734375, 5.7184906005859375, 22.88869285583496, 167.34275817871094, 159.1122589111328, 55.51512145996094, 54.532073974609375, 141.5843963623047, 137.8668212890625, -105.10870361328125, 146.8589630126953, -38.342254638671875, 145.24813842773438, 136.56434631347656, 160.9012451171875, 130.98936462402344, 150.3973388671875, 98.301513671875, 147.39675903320312, 106.21490478515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000366.npy"}
{"epoch": 0.5532879818594104, "step": 367, "batch_size": 64, "mean": 45.237735748291016, "std": 70.09475708007812, "min": -153.446044921875, "p10": -32.29785480499267, "median": 48.580780029296875, "p90": 140.8574005126953, "max": 155.9742889404297, "pos_frac": 0.796875, "sample": [2.117778778076172, 152.90728759765625, 66.08113098144531, -7.8766326904296875, 77.26068878173828, -0.3862762451171875, 127.00495910644531, 143.94711303710938, 96.99069213867188, 1.6819801330566406, 27.228805541992188, 23.07501220703125, 5.998832702636719, 13.969375610351562, 148.56231689453125, 17.240497589111328, 69.74482727050781, -3.4310531616210938, -37.82337951660156, 134.78359985351562, 22.784164428710938, 76.11244201660156, -8.098207473754883, -42.847999572753906, 39.643798828125, -149.7747802734375, -37.396507263183594, 78.8315658569336, 14.67868423461914, 53.14251708984375, 81.88235473632812, 115.11338806152344, 89.33104705810547, 61.051605224609375, 99.04766845703125, 3.9952926635742188, 69.13697814941406, -77.17597961425781, 8.941335678100586, 149.17457580566406, 11.715499877929688, -153.446044921875, 16.040910720825195, 155.9742889404297, 94.49217987060547, -17.57464599609375, 75.27261352539062, 9.897029876708984, 113.5435791015625, 27.443233489990234, 18.795547485351562, 56.62364959716797, -120.7166976928711, 138.75128173828125, 76.64360046386719, 141.65142822265625, 44.65550231933594, -20.400999069213867, 92.90219116210938, 144.81915283203125, 20.2415771484375, 69.72795104980469, 52.50605773925781, 139.00466918945312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000367.npy"}
{"epoch": 0.5547996976568406, "step": 368, "batch_size": 64, "mean": 42.564666748046875, "std": 69.29644012451172, "min": -150.04251098632812, "p10": -46.7190559387207, "median": 38.23601722717285, "p90": 129.84268035888672, "max": 163.7069854736328, "pos_frac": 0.734375, "sample": [-150.04251098632812, 112.82772827148438, 3.6641006469726562, 19.83829689025879, 71.25137329101562, 91.26065826416016, 126.04559326171875, 55.968544006347656, -12.484550476074219, -10.630218505859375, 154.8544158935547, 3.3804283142089844, 38.29466247558594, 29.735427856445312, 152.7469482421875, 126.86470031738281, -44.10723114013672, -47.838409423828125, -71.02128601074219, 92.92182159423828, 51.62297058105469, 19.396886825561523, 57.00298309326172, -25.328086853027344, 151.27622985839844, 17.204288482666016, -17.450790405273438, 131.11895751953125, 163.7069854736328, 45.380855560302734, -18.223859786987305, 143.2088623046875, 74.5029067993164, 1.6468124389648438, 22.124988555908203, 86.87469482421875, 35.73711395263672, 96.38339233398438, 9.267753601074219, 79.41398620605469, 61.18511962890625, -7.539968490600586, 120.51666259765625, 27.582077026367188, -25.330230712890625, -21.04511260986328, 38.177371978759766, -14.017852783203125, 126.23308563232422, -51.30335998535156, 58.49391174316406, 113.8756103515625, 125.2813720703125, 132.7913818359375, 0.1844654083251953, 71.19007873535156, 48.32378387451172, -52.8648681640625, -83.4817886352539, 99.43864440917969, 118.19659423828125, 34.56309509277344, -99.33708190917969, 34.627227783203125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000368.npy"}
{"epoch": 0.5563114134542706, "step": 369, "batch_size": 64, "mean": 37.005409240722656, "std": 69.01641845703125, "min": -135.79446411132812, "p10": -41.28927841186523, "median": 31.1099853515625, "p90": 141.78901672363284, "max": 187.0279541015625, "pos_frac": 0.734375, "sample": [13.422203063964844, -78.77735137939453, 41.63811492919922, 68.89788818359375, -80.7073745727539, 59.280731201171875, 48.39701461791992, 0.6655654907226562, 103.0096435546875, 12.329708099365234, 13.434165954589844, 116.31082153320312, 130.1216583251953, 96.40840911865234, 143.57781982421875, 4.1255950927734375, 77.17303466796875, 28.483779907226562, 25.469276428222656, 39.3027458190918, 0.24951171875, -135.79446411132812, 45.5526123046875, -74.39788818359375, 35.27641296386719, 84.17681884765625, -16.675952911376953, 69.55848693847656, 40.768245697021484, -38.562652587890625, 137.61514282226562, -26.445465087890625, 53.51636505126953, 152.09286499023438, 77.43232727050781, 127.05659484863281, 160.40298461914062, 1.235626220703125, 14.20111083984375, -74.22066497802734, -3.456390380859375, 45.56059265136719, -12.369182586669922, 67.16902923583984, -10.383472442626953, 11.057868957519531, 187.0279541015625, 3.4174270629882812, 125.59304809570312, -11.864280700683594, -4.078239440917969, 29.86406707763672, -1.7554740905761719, 151.3831329345703, 1.9742507934570312, -10.337003707885742, 0.07257652282714844, 159.1497039794922, 32.35590362548828, 34.896392822265625, -42.45783233642578, 66.00064086914062, -91.50859832763672, 145.43260192871094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000369.npy"}
{"epoch": 0.5578231292517006, "step": 370, "batch_size": 64, "mean": 51.95973205566406, "std": 77.94548797607422, "min": -167.7449188232422, "p10": -24.412602615356434, "median": 43.152313232421875, "p90": 157.43889465332032, "max": 173.837890625, "pos_frac": 0.75, "sample": [59.00714874267578, 156.98150634765625, 108.44879150390625, 51.66734313964844, 173.837890625, 3.5863494873046875, 165.65391540527344, 6.9219970703125, 157.63491821289062, 5.150043487548828, 9.693771362304688, 36.611053466796875, 1.2821502685546875, 132.9945068359375, 173.1306915283203, 78.53738403320312, 29.824172973632812, -72.65096282958984, -11.731246948242188, 43.55255126953125, 49.9659423828125, 80.52952575683594, 75.77778625488281, 78.6595458984375, 153.92337036132812, 109.24981689453125, 78.07612609863281, 34.18217849731445, -3.278106689453125, -4.7948455810546875, -2.392059326171875, -11.277061462402344, -1.4665451049804688, -76.45790100097656, 5.036651611328125, 74.27753448486328, 165.73190307617188, 42.7520751953125, 151.51577758789062, 131.4111785888672, 165.9373321533203, -167.7449188232422, 26.320161819458008, 73.88621520996094, 18.93623924255371, 158.68643188476562, 88.80082702636719, 114.33399963378906, 144.0930938720703, 153.38319396972656, -154.7593231201172, 13.282264709472656, 136.4658203125, -2.763467788696289, -43.12709045410156, 16.203598022460938, 21.93280029296875, -4.4063568115234375, -70.61143493652344, -1.199136734008789, 47.362281799316406, 144.29055786132812, -29.847469329833984, 34.410179138183594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000370.npy"}
{"epoch": 0.5593348450491308, "step": 371, "batch_size": 64, "mean": 46.544769287109375, "std": 71.35900115966797, "min": -81.25830078125, "p10": -52.57638359069824, "median": 38.293514251708984, "p90": 149.78711700439456, "max": 178.09048461914062, "pos_frac": 0.71875, "sample": [136.55801391601562, -81.25830078125, 164.66543579101562, 38.83617401123047, 92.15614318847656, 3.3931884765625, 107.56285095214844, 18.01383399963379, -20.904090881347656, 13.019109725952148, -18.6025390625, 115.23063659667969, -53.66190719604492, 160.74630737304688, -0.3512916564941406, 173.1201171875, -4.729394912719727, 109.3034896850586, 15.68402099609375, 152.2052764892578, 25.804641723632812, -70.55143737792969, -14.60130500793457, 4.34033203125, 40.55174255371094, 129.14590454101562, 117.06670379638672, 178.09048461914062, 124.1807861328125, 81.1628646850586, 102.05470275878906, 161.85385131835938, -67.79619598388672, 3.99517822265625, 70.18903350830078, 23.339462280273438, -4.9584808349609375, -69.47098541259766, -12.155654907226562, 160.93064880371094, -62.910377502441406, 15.697711944580078, 5.161771774291992, 71.66197204589844, 58.436065673828125, -34.15168762207031, 101.8430404663086, 73.79727935791016, -33.33202362060547, 37.7508544921875, 97.75341033935547, -10.366317749023438, 5.120475769042969, 91.03530883789062, 88.8909912109375, 0.38470458984375, 104.56908416748047, 15.696006774902344, 68.04231262207031, 55.01819610595703, 144.14474487304688, 92.66429901123047, -50.043495178222656, -62.15864181518555], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000371.npy"}
{"epoch": 0.5608465608465608, "step": 372, "batch_size": 64, "mean": 28.69495391845703, "std": 84.24031066894531, "min": -153.01385498046875, "p10": -82.62195892333983, "median": 14.234917640686035, "p90": 154.2288955688477, "max": 170.4378662109375, "pos_frac": 0.6875, "sample": [-33.20088195800781, 14.509586334228516, 146.892578125, 103.16201782226562, 65.07240295410156, 65.67638397216797, 164.4748077392578, -33.69658660888672, 138.65692138671875, 106.07601165771484, 170.4378662109375, -0.6842193603515625, 99.95826721191406, 166.89971923828125, 33.96843719482422, 102.912353515625, 122.23269653320312, -137.4112091064453, 12.069221496582031, 20.215904235839844, 14.358882904052734, -11.448041915893555, 31.308746337890625, 147.76544189453125, 6.518543243408203, -9.316047668457031, -99.23361206054688, 13.748397827148438, 158.79718017578125, -78.396484375, 92.85987091064453, 34.57411193847656, -38.95090103149414, -6.598903656005859, -84.43287658691406, 14.110952377319336, -153.01385498046875, 10.009674072265625, 122.98967742919922, 39.77893829345703, 167.74993896484375, 58.14211654663086, 99.90908813476562, 50.596893310546875, -62.55464172363281, 58.7835693359375, -142.17111206054688, 2.2659988403320312, 4.921430587768555, 8.625251770019531, 4.4002227783203125, -142.56521606445312, 88.01216125488281, 2.3115005493164062, -46.85211944580078, -50.111610412597656, -94.53640747070312, 36.091758728027344, 161.8077850341797, 156.9989471435547, -58.573184967041016, -17.238037109375, 10.204780578613281, 6.6059112548828125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000372.npy"}
{"epoch": 0.562358276643991, "step": 373, "batch_size": 64, "mean": 38.738624572753906, "std": 84.87754821777344, "min": -147.19728088378906, "p10": -83.15295257568359, "median": 37.626935958862305, "p90": 151.76422271728515, "max": 183.21290588378906, "pos_frac": 0.6875, "sample": [-121.76080322265625, 140.8915557861328, 141.6925048828125, 90.7640380859375, 144.1713409423828, -120.69409942626953, -38.699241638183594, 133.98529052734375, 101.29179382324219, 30.352128982543945, 114.81298828125, -115.2004165649414, 25.476974487304688, -37.60639953613281, -84.85902404785156, -23.618350982666016, 50.241455078125, 75.65979766845703, -147.19728088378906, -31.930770874023438, 71.53276062011719, -23.054527282714844, 83.47919464111328, -56.50415802001953, 152.5720977783203, 147.02354431152344, -12.507125854492188, -4.8074188232421875, 86.2752914428711, 94.7999496459961, 156.97494506835938, 3.1726226806640625, 44.900943756103516, 149.87918090820312, -80.70155334472656, 1.8294563293457031, -17.095916748046875, 53.728912353515625, 1.8112106323242188, 142.3791961669922, 109.72116088867188, -116.45736694335938, -84.20355224609375, -13.68808364868164, 42.14088439941406, 114.3182144165039, -68.41484069824219, 153.54287719726562, 10.327484130859375, 43.490821838378906, -0.0758209228515625, 156.9239044189453, 2.4484291076660156, 183.21290588378906, 30.006567001342773, 33.11298751831055, 65.80323791503906, 54.19952392578125, 28.547256469726562, 177.9910125732422, 49.74760437011719, 5.236248016357422, 24.69898223876953, 153.17947387695312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000373.npy"}
{"epoch": 0.563869992441421, "step": 374, "batch_size": 64, "mean": 65.550048828125, "std": 72.21820068359375, "min": -136.97779846191406, "p10": -5.46664810180664, "median": 54.8930606842041, "p90": 158.97607421875, "max": 202.01388549804688, "pos_frac": 0.859375, "sample": [202.01388549804688, 135.8840789794922, 136.2621612548828, -4.746070861816406, 41.450260162353516, 127.72014617919922, 167.830078125, 159.23452758789062, 158.37301635742188, 45.391014099121094, 84.10572814941406, 15.34423828125, 104.48384094238281, 193.11337280273438, 84.02274322509766, 23.042877197265625, 145.982177734375, 3.7843780517578125, 21.196212768554688, 50.59949493408203, 28.88971710205078, 128.4606475830078, 42.697208404541016, -83.0447998046875, 99.08322143554688, 93.59386444091797, 44.30949783325195, 114.9643325805664, -4.649940490722656, 95.29925537109375, 73.12132263183594, 69.05072021484375, 1.3813858032226562, 5.821952819824219, 78.53227233886719, 51.12070846557617, 11.668289184570312, -136.97779846191406, 172.071044921875, 91.32978820800781, 44.14487838745117, -102.49451446533203, 22.790578842163086, 175.43638610839844, 174.9075927734375, 39.86528778076172, -31.62322998046875, 16.328369140625, 61.42523956298828, 44.25067901611328, 144.63018798828125, 135.73309326171875, 148.4752655029297, 58.66541290283203, 12.405460357666016, 3.4823665618896484, 43.67345428466797, 85.21516418457031, 128.55357360839844, 148.5527801513672, -53.32530975341797, -9.690155029296875, 37.765228271484375, -5.7754669189453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000374.npy"}
{"epoch": 0.5653817082388511, "step": 375, "batch_size": 64, "mean": 43.547298431396484, "std": 76.17642211914062, "min": -155.011474609375, "p10": -44.16235198974609, "median": 37.225894927978516, "p90": 148.84722747802735, "max": 173.52255249023438, "pos_frac": 0.703125, "sample": [-44.523582458496094, -73.10892486572266, 39.48809051513672, 99.86163330078125, 80.55854034423828, 22.977399826049805, 165.6202392578125, -3.1636962890625, 5.95068359375, 89.04463195800781, -10.042709350585938, 109.83949279785156, -39.35047912597656, 149.2338409423828, 5.842071533203125, -3.1967735290527344, 54.185546875, -107.79695129394531, 102.6805191040039, -4.052577972412109, 46.284996032714844, 147.94512939453125, 155.91732788085938, -49.2611083984375, 102.32170867919922, 153.57144165039062, 1.5423812866210938, -33.44952392578125, 66.5962905883789, 139.02647399902344, 4.974723815917969, 0.29804039001464844, 11.66493034362793, -43.319480895996094, 88.49700927734375, 66.27739715576172, 82.9671859741211, 20.381057739257812, 160.96339416503906, 64.41034698486328, 104.49017333984375, 34.96369934082031, 132.61355590820312, -155.011474609375, -10.672531127929688, 13.861114501953125, 160.16522216796875, 3.6007232666015625, 173.52255249023438, 80.40299987792969, 139.63079833984375, 49.51820373535156, -23.14061737060547, 124.6644287109375, -33.7790412902832, 15.653327941894531, 82.81957244873047, -84.64341735839844, -23.35491180419922, 147.93801879882812, -5.0419158935546875, 101.6209487915039, 9.479408264160156, -79.93043518066406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000375.npy"}
{"epoch": 0.5668934240362812, "step": 376, "batch_size": 64, "mean": 54.46429443359375, "std": 80.22075653076172, "min": -138.92105102539062, "p10": -52.80082473754882, "median": 44.40786170959473, "p90": 158.63482513427735, "max": 221.57632446289062, "pos_frac": 0.703125, "sample": [34.868194580078125, 3.9787521362304688, -2.456178665161133, 31.4195556640625, -6.724456787109375, -16.051738739013672, -0.49558258056640625, 139.38731384277344, -5.4918670654296875, 50.5352783203125, 101.6116943359375, 91.31858825683594, -138.92105102539062, -3.265625, -58.56993103027344, 8.981826782226562, 169.93438720703125, 66.19371032714844, 14.302661895751953, 221.57632446289062, 39.48091125488281, 162.94180297851562, 3.654144287109375, 147.77940368652344, 178.9899139404297, -20.38128662109375, 142.65850830078125, 144.33509826660156, 145.39199829101562, 105.09160614013672, 113.89158630371094, 43.18780517578125, 150.97705078125, 33.057769775390625, 70.27718353271484, 160.0186767578125, 87.62081909179688, -44.053462982177734, 160.72789001464844, -88.49650573730469, 64.62569427490234, -64.8668441772461, 126.0374526977539, 127.52972412109375, 108.30224609375, 117.88699340820312, -56.18669891357422, 118.32405090332031, -44.90045166015625, 142.67347717285156, -70.67552947998047, 26.432395935058594, 13.429763793945312, 45.6279182434082, 162.1681671142578, -3.3840274810791016, 89.88481140136719, -11.675716400146484, 155.4058380126953, 81.58639526367188, -36.64533996582031, 18.46242332458496, 0.6575546264648438, -64.26826477050781], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000376.npy"}
{"epoch": 0.5684051398337112, "step": 377, "batch_size": 64, "mean": 40.009239196777344, "std": 73.97216796875, "min": -145.2893524169922, "p10": -51.793272399902335, "median": 30.795957565307617, "p90": 141.89226684570312, "max": 173.6995849609375, "pos_frac": 0.71875, "sample": [96.24095153808594, -30.64685821533203, 121.15544891357422, 118.16266632080078, 67.25068664550781, 6.806434631347656, -88.06197357177734, -64.6240234375, 46.554298400878906, 28.146583557128906, 157.98477172851562, 3.0017318725585938, 117.4696273803711, -2.624542236328125, 143.5203857421875, 137.50027465820312, -102.04460906982422, 72.2844009399414, -56.835777282714844, 0.1868896484375, 154.33607482910156, -10.283966064453125, 84.18180847167969, 20.586563110351562, 91.70388793945312, -135.3839874267578, 19.03331756591797, 173.6995849609375, 33.44533157348633, -13.073163986206055, 134.37530517578125, -7.819169998168945, 104.28630065917969, 1.3805999755859375, 10.278396606445312, -59.59313201904297, 26.32623291015625, 42.777976989746094, 142.055908203125, 23.082305908203125, -15.243263244628906, 91.97618103027344, 3.667612075805664, 71.11548614501953, 59.06878662109375, 154.4834747314453, 47.90505599975586, -11.772296905517578, -40.027427673339844, 57.56620788574219, 132.32144165039062, 5.7799072265625, -13.620264053344727, 155.8743133544922, 1.205413818359375, -17.57976722717285, 141.51043701171875, 79.3115463256836, -145.2893524169922, 9.200202941894531, 67.94105529785156, -6.703590393066406, 46.43299865722656, 78.64347839355469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000377.npy"}
{"epoch": 0.5699168556311414, "step": 378, "batch_size": 64, "mean": 38.03527069091797, "std": 71.66476440429688, "min": -146.86831665039062, "p10": -23.119884109497068, "median": 20.718936920166016, "p90": 145.97280426025392, "max": 178.78775024414062, "pos_frac": 0.734375, "sample": [121.1538314819336, 80.18704986572266, 0.6799030303955078, 10.389106750488281, 84.06478118896484, 134.129638671875, -33.03126525878906, 8.761394500732422, 42.67140197753906, 139.8005828857422, -146.86831665039062, 4.404382705688477, 178.78775024414062, -2.1767845153808594, 151.9845428466797, 105.2748794555664, 60.77935791015625, 2.4359588623046875, 154.37057495117188, -4.146518707275391, 83.68911743164062, 18.917999267578125, 8.251178741455078, 23.390220642089844, 150.8972930908203, 11.072898864746094, 56.23749542236328, 14.0360107421875, -75.19439697265625, 21.61435317993164, 93.53865051269531, 24.52983856201172, -74.07292175292969, -2.373929977416992, 84.35784912109375, 20.469459533691406, -21.958984375, 21.89556121826172, 122.64411926269531, -5.538137435913086, 163.2713165283203, 79.82575988769531, 0.5175933837890625, 138.0348358154297, 20.968414306640625, 122.37974548339844, -11.546924591064453, 154.106201171875, 68.79666900634766, 9.955129623413086, 3.720531463623047, -7.586883544921875, 8.116130828857422, -122.35870361328125, -23.617412567138672, 59.51408386230469, -10.327680587768555, 29.69598388671875, 5.847665786743164, 63.48877716064453, -4.357677459716797, -125.64653015136719, -7.21360969543457, 148.6180419921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000378.npy"}
{"epoch": 0.5714285714285714, "step": 379, "batch_size": 64, "mean": 49.586158752441406, "std": 73.70210266113281, "min": -127.03343963623047, "p10": -32.90822219848633, "median": 37.73535346984863, "p90": 156.04318237304688, "max": 197.21206665039062, "pos_frac": 0.796875, "sample": [67.61146545410156, -17.76513671875, 143.34698486328125, -3.6053314208984375, 22.290313720703125, 46.81208801269531, 23.1091365814209, -127.03343963623047, 65.22673034667969, 7.260505676269531, 12.360298156738281, 26.943603515625, -79.4271240234375, 122.20707702636719, 18.46955108642578, 194.70037841796875, 37.43414306640625, -37.99269485473633, 127.29314422607422, 157.1064910888672, 195.77508544921875, 76.37667846679688, 153.9805450439453, 114.53211212158203, 29.359161376953125, -31.84259033203125, 75.78437042236328, -84.26158142089844, 45.51978302001953, -54.830841064453125, 197.21206665039062, 38.32209014892578, 1.9254188537597656, 37.698062896728516, -66.994384765625, 135.1387939453125, 148.94857788085938, 8.880226135253906, 11.879470825195312, -7.066844940185547, -33.36492156982422, 9.117630004882812, 45.55664825439453, 165.08248901367188, 127.47955322265625, 13.508411407470703, 156.9271697998047, 173.91116333007812, 0.9256935119628906, -16.714859008789062, 132.0782928466797, 15.918365478515625, 19.430007934570312, 47.99366760253906, 120.5779037475586, 1.1217803955078125, 37.77264404296875, 61.5943603515625, 41.71792984008789, 2.3792591094970703, 95.36272430419922, 42.29823303222656, 94.49989318847656, -14.344192504882812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000379.npy"}
{"epoch": 0.5729402872260015, "step": 380, "batch_size": 64, "mean": 62.967567443847656, "std": 78.52870178222656, "min": -154.6351318359375, "p10": -19.228363609313963, "median": 72.03094100952148, "p90": 155.63795623779296, "max": 192.908935546875, "pos_frac": 0.8125, "sample": [35.0892448425293, 148.69876098632812, 80.25047302246094, 26.00848388671875, -63.719757080078125, -110.38724517822266, 156.56597900390625, 154.36822509765625, -6.593029022216797, 126.30206298828125, 106.62583923339844, 20.081836700439453, -10.218036651611328, 80.86044311523438, 5.478809356689453, -42.537322998046875, 3.3561058044433594, 49.59034729003906, 138.3496551513672, 156.2958221435547, 94.29338073730469, 56.51129913330078, -11.273101806640625, -154.6351318359375, 142.61825561523438, 1.8567733764648438, 127.28624725341797, 72.6503677368164, -16.307924270629883, 33.712005615234375, 105.92312622070312, 130.29318237304688, 71.62574005126953, 52.10536575317383, 192.908935546875, 6.386157989501953, 133.40335083007812, 126.7643814086914, 19.6884765625, 2.8991775512695312, 142.59414672851562, -136.46539306640625, 154.06301879882812, 155.83297729492188, 47.30986022949219, 126.15538787841797, -1.6802444458007812, 111.46456909179688, -20.47998046875, 105.30770874023438, 12.109550476074219, 7.258512496948242, -34.27095031738281, 171.55343627929688, 154.59300231933594, 155.1829071044922, 12.617345809936523, 161.33151245117188, 108.10822296142578, 162.52935791015625, 76.48526763916016, 72.43614196777344, 25.444734573364258, 17.26654052734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000380.npy"}
{"epoch": 0.5744520030234316, "step": 381, "batch_size": 64, "mean": 24.82586669921875, "std": 76.90789031982422, "min": -174.39463806152344, "p10": -64.08070068359375, "median": 10.514243125915527, "p90": 150.16282653808597, "max": 193.13681030273438, "pos_frac": 0.65625, "sample": [8.838165283203125, 2.0223236083984375, 1.011098861694336, -84.93736267089844, -174.39463806152344, 5.653591156005859, -1.7846145629882812, 24.171070098876953, 165.04013061523438, 66.52493286132812, 12.760581970214844, -6.702838897705078, 6.359893798828125, 32.325443267822266, -5.435625076293945, 193.13681030273438, 103.3050537109375, 0.4272956848144531, 61.54972839355469, -1.6600112915039062, 9.432462692260742, -80.3287582397461, 13.284585952758789, 166.92970275878906, -104.9120101928711, 72.42171478271484, -54.6363525390625, -47.057029724121094, 102.4354248046875, -2.254354476928711, 60.53279113769531, 142.52099609375, 15.629936218261719, 11.596023559570312, -26.569774627685547, 158.75051879882812, 38.24814987182617, -28.251155853271484, -113.72468566894531, 16.922515869140625, 153.43789672851562, -9.75897216796875, 26.69336700439453, 9.173810958862305, 49.19336700439453, 161.03988647460938, 113.15570068359375, 59.18836212158203, -2.444814682006836, 21.44678497314453, 3.8355255126953125, 115.64344024658203, 117.97044372558594, -10.14849853515625, 26.24190902709961, 4.657257080078125, 92.24552917480469, -149.5941162109375, -60.67107391357422, 39.34822082519531, -0.5743408203125, -65.5419692993164, -20.21123504638672, 155.3472137451172], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000381.npy"}
{"epoch": 0.5759637188208617, "step": 382, "batch_size": 64, "mean": 42.540550231933594, "std": 64.5793685913086, "min": -139.06622314453125, "p10": -29.34012451171875, "median": 31.310781478881836, "p90": 138.9886413574219, "max": 178.53634643554688, "pos_frac": 0.75, "sample": [-7.1563720703125, 1.1612491607666016, 165.52120971679688, 54.538352966308594, 155.85235595703125, -5.4687957763671875, 20.295944213867188, 32.198028564453125, 97.88734436035156, 73.9697265625, 65.33978271484375, -0.9022216796875, -24.451644897460938, 19.906494140625, 132.7782440185547, 16.0245361328125, -29.516189575195312, -29.743438720703125, -139.06622314453125, 123.08125305175781, -1.3701763153076172, 47.32484436035156, 21.823013305664062, 1.8688011169433594, -28.929306030273438, 1.013427734375, 77.47675323486328, 37.548583984375, -54.91522979736328, 162.54867553710938, 65.28021240234375, 12.379924774169922, -49.870540618896484, -47.053306579589844, 155.0048828125, 4.1236114501953125, -32.35390090942383, 2.50372314453125, 31.5279541015625, 139.94412231445312, 55.63710021972656, 178.53634643554688, 0.9520721435546875, 32.30140686035156, 29.402694702148438, 91.87227630615234, -11.132980346679688, -0.8501052856445312, 16.64764404296875, 81.98297882080078, 76.49720764160156, 118.28115844726562, -26.847126007080078, 36.98225021362305, 119.19651794433594, 1.8675193786621094, 78.28829956054688, 31.093608856201172, 136.75918579101562, 148.59005737304688, 88.08396911621094, 95.35087585449219, 57.060550689697266, 17.916114807128906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000382.npy"}
{"epoch": 0.5774754346182918, "step": 383, "batch_size": 64, "mean": 46.73271179199219, "std": 81.57247161865234, "min": -152.97799682617188, "p10": -56.29049339294433, "median": 28.920204162597656, "p90": 161.24839477539064, "max": 184.64010620117188, "pos_frac": 0.6875, "sample": [-0.8228855133056641, 137.9639434814453, 94.18894958496094, 6.661796569824219, 25.28411102294922, -46.3485221862793, 110.687255859375, 52.735206604003906, 17.06690216064453, -75.5930404663086, 5.230564117431641, -70.60067749023438, 77.53146362304688, 61.18623352050781, 104.81188201904297, 50.45274353027344, 141.65972900390625, -60.55133819580078, 167.98724365234375, 131.2569122314453, 22.536941528320312, -13.5596923828125, 164.26902770996094, -25.38494873046875, 153.42123413085938, 30.410675048828125, 47.931610107421875, -14.292648315429688, 19.873125076293945, 170.82342529296875, 131.52342224121094, -40.87251281738281, -0.10466766357421875, 184.01177978515625, -13.76205062866211, 88.75265502929688, 138.36477661132812, -152.97799682617188, 8.294471740722656, 78.16096496582031, -89.73833465576172, 35.00630187988281, 17.085800170898438, 42.72991943359375, 163.28143310546875, -78.11244201660156, 9.906730651855469, 27.429733276367188, -43.47187042236328, 184.64010620117188, 21.72463035583496, 171.32745361328125, 125.7290267944336, 10.165931701660156, -6.006195068359375, 133.14312744140625, 131.18597412109375, -10.787139892578125, -6.864837646484375, -91.11700439453125, 151.5133514404297, -30.476715087890625, 156.504638671875, 57.88578796386719], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000383.npy"}
{"epoch": 0.5789871504157218, "step": 384, "batch_size": 64, "mean": 44.81376647949219, "std": 69.5692138671875, "min": -173.85470581054688, "p10": -34.7048542022705, "median": 42.144351959228516, "p90": 152.14438934326174, "max": 179.86895751953125, "pos_frac": 0.828125, "sample": [-46.726776123046875, 149.3169708251953, 42.13423156738281, 115.92678833007812, 4.269275665283203, 71.61844635009766, 93.2892837524414, 57.479042053222656, 159.8528594970703, -33.33347702026367, 45.457763671875, 43.70683288574219, 30.68438720703125, 8.31072998046875, -10.273078918457031, 14.831619262695312, 4.8308868408203125, 165.94943237304688, 4.58868408203125, 154.09861755371094, -13.903205871582031, 36.168731689453125, 23.890403747558594, -35.29258728027344, -92.46418762207031, 82.16377258300781, 179.86895751953125, 42.15447235107422, -101.03636932373047, 86.24931335449219, 159.42050170898438, 101.17720031738281, 1.7962188720703125, 59.42310333251953, 72.30422973632812, 1.1288070678710938, 167.12271118164062, 26.964679718017578, -173.85470581054688, 139.25357055664062, 64.96461486816406, 58.471954345703125, 96.25223541259766, -38.142356872558594, 153.35614013671875, 75.14224243164062, 31.39957046508789, 7.224870681762695, 29.485782623291016, 115.6517105102539, 71.3304443359375, 15.367767333984375, 62.94915771484375, 31.831100463867188, 9.837684631347656, 1.1657752990722656, -83.40042877197266, 55.68678283691406, -0.0606231689453125, 6.070537567138672, 59.70318603515625, 116.5624771118164, 62.524688720703125, 26.15778350830078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000384.npy"}
{"epoch": 0.5804988662131519, "step": 385, "batch_size": 64, "mean": 58.79779052734375, "std": 70.34893798828125, "min": -59.730506896972656, "p10": -32.2554817199707, "median": 51.968618392944336, "p90": 157.9426483154297, "max": 181.77195739746094, "pos_frac": 0.765625, "sample": [-59.730506896972656, -8.573596954345703, 151.47259521484375, 76.50103759765625, 40.53569030761719, 12.913360595703125, 34.245086669921875, 105.06353759765625, 16.701828002929688, 37.873992919921875, -47.75407028198242, 93.64384460449219, 78.4559555053711, 152.45143127441406, 48.411643981933594, 130.0394744873047, 161.22793579101562, 132.39723205566406, -4.132225036621094, 55.52559280395508, -33.47822570800781, -48.12657165527344, 33.48777770996094, 155.80514526367188, 156.65992736816406, 158.82554626464844, 41.96674728393555, 57.85392761230469, 60.119712829589844, 116.46983337402344, 105.36344909667969, -48.426544189453125, 42.78736877441406, 11.84564208984375, 57.33627700805664, 158.4923858642578, 137.29800415039062, 93.51504516601562, 77.9962158203125, 87.69055938720703, -25.026416778564453, -40.074806213378906, 88.04171752929688, 31.8746337890625, -2.3494033813476562, 181.77195739746094, 173.8528594970703, -52.27897644042969, 0.4114723205566406, -23.194244384765625, -11.535064697265625, 32.62602615356445, -29.40241241455078, 0.98272705078125, 118.41755676269531, 78.7993392944336, 8.982139587402344, 152.1310577392578, -26.762001037597656, 170.2694854736328, 6.746593475341797, 0.43372344970703125, 161.43788146972656, 136.15061950683594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000385.npy"}
{"epoch": 0.582010582010582, "step": 386, "batch_size": 64, "mean": 37.383819580078125, "std": 68.23109436035156, "min": -158.23117065429688, "p10": -33.92898025512695, "median": 50.21476364135742, "p90": 121.29777755737305, "max": 175.142822265625, "pos_frac": 0.734375, "sample": [79.3726806640625, -83.71855163574219, 4.763973236083984, 58.70098876953125, 67.58601379394531, -29.499053955078125, -1.2764053344726562, 69.71232604980469, 66.14457702636719, 75.7790298461914, 22.783721923828125, 58.291961669921875, -125.74526977539062, 14.779586791992188, 16.368385314941406, 127.22789764404297, 110.92459106445312, 123.00552368164062, 121.7408676147461, 89.76863098144531, 17.31788444519043, 85.22908782958984, 53.68738555908203, -56.836246490478516, -35.50794982910156, 110.76719665527344, 120.26390075683594, -18.78472900390625, 101.5658950805664, 80.16838836669922, 10.294349670410156, 56.446685791015625, -16.912315368652344, 5.9033203125, -28.62411117553711, -30.9127197265625, 48.18799591064453, 12.584808349609375, 73.478271484375, 152.90615844726562, 11.5390625, -33.936668395996094, 69.95487213134766, 81.07171630859375, 107.2332534790039, -19.953662872314453, -32.568603515625, 21.141807556152344, 0.2642955780029297, 175.142822265625, 6.8667755126953125, -33.911041259765625, -104.56858825683594, 99.76521301269531, 163.66519165039062, 68.82349395751953, 52.24153137207031, 69.38246154785156, 89.80711364746094, 128.662841796875, 16.272140502929688, -5.651638031005859, 11.616409301757812, -158.23117065429688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000386.npy"}
{"epoch": 0.5835222978080121, "step": 387, "batch_size": 64, "mean": 55.425376892089844, "std": 66.82373809814453, "min": -109.59193420410156, "p10": -16.724974441528317, "median": 41.933319091796875, "p90": 159.60660095214845, "max": 208.70152282714844, "pos_frac": 0.828125, "sample": [31.542694091796875, 4.309989929199219, 125.9151382446289, -29.88151741027832, 4.9282684326171875, 168.00926208496094, 61.86647033691406, -13.534778594970703, 46.240882873535156, 208.70152282714844, 27.190080642700195, -2.864961624145508, 35.33221435546875, 44.38432312011719, 161.09312438964844, 2.8128128051757812, 32.614715576171875, 130.8361358642578, 143.65469360351562, 0.530364990234375, 97.801513671875, 163.73956298828125, 88.6181640625, -20.87874984741211, 2.8184890747070312, 116.04653930664062, 3.5586776733398438, 9.88499641418457, 148.30862426757812, 64.40497589111328, 1.1707649230957031, -46.23942565917969, 41.377723693847656, 156.13804626464844, -23.444366455078125, 102.4347915649414, 67.59980773925781, 22.78264045715332, 13.15188980102539, -9.547286987304688, 1.3528881072998047, 51.94329833984375, 89.94669342041016, -109.59193420410156, -30.1373291015625, 49.61408615112305, 113.7570571899414, 171.58688354492188, 41.47135925292969, 151.06973266601562, -9.454692840576172, 42.39527893066406, 21.82159423828125, 57.34197998046875, 64.22909545898438, 36.25361251831055, -18.092201232910156, 6.631872177124023, 88.94429016113281, 168.13209533691406, 188.9146728515625, 74.26124572753906, 36.16627502441406, 75.25740051269531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000387.npy"}
{"epoch": 0.5850340136054422, "step": 388, "batch_size": 64, "mean": 35.175655364990234, "std": 78.1881103515625, "min": -143.03823852539062, "p10": -46.882171630859375, "median": 16.177780151367188, "p90": 144.37156677246094, "max": 197.16595458984375, "pos_frac": 0.6875, "sample": [146.0625, 75.46222686767578, 107.89878845214844, 174.530517578125, 41.758949279785156, 92.04008483886719, 36.80976867675781, 15.410659790039062, 14.710868835449219, -27.682411193847656, 29.26335906982422, 133.6025390625, -10.232070922851562, 87.95740509033203, -24.841812133789062, 197.16595458984375, 2.077068328857422, 193.35401916503906, 16.944900512695312, 105.84339904785156, 131.7646484375, 4.5571136474609375, 133.2345733642578, 12.26129150390625, 145.46299743652344, -143.03823852539062, 44.6611328125, 55.438018798828125, 8.879724502563477, 4.225069046020508, -0.6320056915283203, -19.523229598999023, -8.164604187011719, 141.1962890625, -6.782278060913086, 29.076873779296875, 25.385868072509766, -48.327972412109375, 48.24589538574219, -82.5396728515625, 53.65404510498047, 132.38616943359375, -107.89793395996094, 120.29804992675781, 143.74415588378906, 7.116455078125, -3.4742965698242188, 1.6911087036132812, 20.78582763671875, 144.6404571533203, -2.8167648315429688, 153.4877166748047, -43.508636474609375, 8.886215209960938, -119.22530364990234, -37.1474609375, -32.87751770019531, 69.25837707519531, -7.720973968505859, 10.347000122070312, 50.49320983886719, -104.73407745361328, -93.94375610351562, 4.281755447387695], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000388.npy"}
{"epoch": 0.5865457294028723, "step": 389, "batch_size": 64, "mean": 56.80760955810547, "std": 74.8628921508789, "min": -151.43789672851562, "p10": -25.00465469360351, "median": 51.846500396728516, "p90": 155.6728515625, "max": 199.40689086914062, "pos_frac": 0.71875, "sample": [-36.410675048828125, -6.964338302612305, 75.90528869628906, 51.3013916015625, 98.00469970703125, 48.113983154296875, 116.52000427246094, 136.88235473632812, 176.2115478515625, 100.324462890625, -58.61164855957031, -28.217422485351562, 67.73004150390625, 106.86280059814453, 25.133712768554688, 81.07019805908203, 85.06307983398438, 146.62420654296875, 73.41041564941406, 34.61166000366211, 144.36325073242188, 134.7547607421875, -2.418710708618164, 16.974151611328125, 121.87954711914062, -15.175796508789062, 137.4715118408203, -18.40587615966797, 26.995742797851562, 74.70809173583984, -1.0850067138671875, 131.40106201171875, -12.681549072265625, 153.38858032226562, 162.83534240722656, 19.100616455078125, 168.20912170410156, 91.18157196044922, 25.276721954345703, -151.43789672851562, 166.98452758789062, -4.684070587158203, 8.116693496704102, -15.501873016357422, 52.39160919189453, 12.732192993164062, 147.14044189453125, -37.6945915222168, -92.53126525878906, 25.83843994140625, 0.01064300537109375, 3.3960189819335938, 62.81561279296875, -3.6694488525390625, -27.83270263671875, 176.96400451660156, 156.65182495117188, 135.9233856201172, 53.6693115234375, 199.40689086914062, -5.66252326965332, 119.21627807617188, 2.8617420196533203, -1.7572517395019531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000389.npy"}
{"epoch": 0.5880574452003023, "step": 390, "batch_size": 64, "mean": 43.51176452636719, "std": 67.2450942993164, "min": -140.4101104736328, "p10": -38.93794326782226, "median": 35.306148529052734, "p90": 141.57104949951173, "max": 170.58352661132812, "pos_frac": 0.78125, "sample": [-61.4095458984375, 62.15863037109375, 52.01665496826172, 25.410781860351562, 129.572021484375, 37.17301940917969, 124.84335327148438, 113.91881561279297, -45.53070068359375, 37.83625030517578, 0.5746593475341797, -23.632980346679688, 118.95118713378906, 18.694379806518555, 2.5891342163085938, 170.58352661132812, 141.13783264160156, -40.627716064453125, 96.68572998046875, -43.618385314941406, 103.20616149902344, 5.591941833496094, 7.3602294921875, 143.80865478515625, 122.4068603515625, 141.7567138671875, 9.917144775390625, -67.74491882324219, 0.9586372375488281, 104.9636001586914, -16.8668212890625, 8.45755672454834, -17.367048263549805, 33.43927764892578, -7.332096099853516, -34.995140075683594, -140.4101104736328, 77.16949462890625, 147.10850524902344, 0.5107154846191406, 147.52508544921875, 12.579416275024414, 7.1756439208984375, 43.66654968261719, -1.0375022888183594, 61.468109130859375, 15.603752136230469, 32.17316436767578, 147.98043823242188, 40.13274383544922, 123.34911346435547, 99.57818603515625, 5.5611572265625, 47.713104248046875, 2.470478057861328, 89.74905395507812, 0.41055870056152344, -63.81126403808594, 75.81997680664062, -13.157690048217773, 67.86820983886719, 70.74322509765625, 64.24921417236328, 167.67616271972656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000390.npy"}
{"epoch": 0.5895691609977324, "step": 391, "batch_size": 64, "mean": 52.93098449707031, "std": 81.88585662841797, "min": -149.80856323242188, "p10": -49.09795608520508, "median": 59.68182945251465, "p90": 157.18702697753906, "max": 207.1945343017578, "pos_frac": 0.765625, "sample": [10.009170532226562, -149.80856323242188, 157.8505859375, 61.54876708984375, 57.285301208496094, 157.36468505859375, 8.748363494873047, 156.77249145507812, 152.16539001464844, 206.9227294921875, 70.14353942871094, -135.97674560546875, 119.00550842285156, -18.41791534423828, 78.68708801269531, 174.74867248535156, -85.10118865966797, 36.03004455566406, -37.256591796875, 25.470787048339844, -5.230129241943359, 121.37267303466797, 71.73028564453125, 55.26531219482422, 27.098403930664062, -45.00761413574219, -102.24462890625, -17.5804443359375, 83.65473937988281, -5.034355163574219, 207.1945343017578, 104.90280151367188, 84.95022583007812, -8.693443298339844, 63.65313720703125, 0.862274169921875, 24.962373733520508, 79.11848449707031, 93.67362976074219, 126.39640045166016, 103.74886322021484, 72.68742370605469, -116.4894027709961, -65.28184509277344, 58.634254455566406, 36.300838470458984, 164.87753295898438, 11.371772766113281, 145.59213256835938, 130.30917358398438, 3.24884033203125, -8.817550659179688, 150.6044464111328, 132.15341186523438, 166.65731811523438, 117.39726257324219, 99.40667724609375, 47.13385009765625, -50.85095977783203, 10.480003356933594, 19.497604370117188, 60.72940444946289, 90.0484390258789, 0.906585693359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000391.npy"}
{"epoch": 0.5910808767951625, "step": 392, "batch_size": 64, "mean": 58.642520904541016, "std": 80.3648452758789, "min": -142.34381103515625, "p10": -39.52804260253906, "median": 50.623355865478516, "p90": 167.47654876708987, "max": 236.91265869140625, "pos_frac": 0.796875, "sample": [-117.2043228149414, -0.21926116943359375, 48.06865692138672, 57.65415954589844, 14.558441162109375, -11.609809875488281, 154.53424072265625, -41.56071472167969, 93.91151428222656, 147.81997680664062, 68.56718444824219, -142.34381103515625, 114.21661376953125, 156.5953369140625, 177.4478759765625, 55.208831787109375, 13.076324462890625, 141.84344482421875, 153.9720916748047, 11.483367919921875, 32.104103088378906, -1.0523681640625, 72.54691314697266, -75.25286865234375, 128.33432006835938, 141.8497314453125, 160.424072265625, 51.16015625, 77.24227905273438, -76.05989074707031, 50.08655548095703, 170.49903869628906, 116.42579650878906, 10.754386901855469, -43.77192687988281, 78.98551940917969, 30.20237922668457, 101.21365356445312, 92.14927673339844, 140.19850158691406, 178.86514282226562, 175.8216552734375, -67.02584838867188, 103.7489013671875, 1.0616073608398438, 1.7541160583496094, 5.0394744873046875, 16.265853881835938, 170.7068328857422, 3.02496337890625, -0.9603958129882812, 236.91265869140625, 6.036884307861328, 36.2596435546875, -34.78514099121094, 8.644098281860352, 182.40966796875, -9.7254638671875, 29.26446533203125, 118.30917358398438, 10.018587112426758, 100.1344985961914, 31.519241333007812, 95.76095581054688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000392.npy"}
{"epoch": 0.5925925925925926, "step": 393, "batch_size": 64, "mean": 58.54799270629883, "std": 71.23196411132812, "min": -93.46200561523438, "p10": -14.264326858520509, "median": 46.70939064025879, "p90": 158.00748443603516, "max": 186.03826904296875, "pos_frac": 0.75, "sample": [58.21846008300781, -3.5475101470947266, 131.86363220214844, 159.39935302734375, 157.97549438476562, 79.66710662841797, 36.63142776489258, 53.798240661621094, 67.66230010986328, 44.60812759399414, -13.447622299194336, 3.5250625610351562, 173.61642456054688, 136.0609130859375, 52.57801818847656, -34.40625762939453, 2.2216110229492188, 165.0857696533203, 3.3167495727539062, -2.1600914001464844, -10.0313720703125, 3.253358840942383, 5.671905517578125, 177.073486328125, 128.12709045410156, 145.7799835205078, -22.22751808166504, 138.99371337890625, 118.19969177246094, 130.07730102539062, 49.516204833984375, -58.65085220336914, 130.12818908691406, 158.0211944580078, -0.37505531311035156, 35.61619567871094, -85.88075256347656, 164.024169921875, -0.9459075927734375, -13.18556022644043, 80.97916412353516, 17.449142456054688, 121.76838684082031, -14.09994125366211, 48.81065368652344, 123.67686462402344, 132.59857177734375, 18.60858154296875, 36.54743576049805, 100.28482055664062, 151.46832275390625, 186.03826904296875, 60.24779510498047, -5.778465270996094, 15.379440307617188, 124.23119354248047, 25.518157958984375, 37.65266418457031, -93.46200561523438, -14.33477783203125, 8.238658905029297, 123.10183715820312, -14.897384643554688, 11.191543579101562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000393.npy"}
{"epoch": 0.5941043083900227, "step": 394, "batch_size": 64, "mean": 50.85765838623047, "std": 63.534847259521484, "min": -138.64988708496094, "p10": -9.421225547790526, "median": 52.71677589416504, "p90": 135.03465118408207, "max": 184.57156372070312, "pos_frac": 0.8125, "sample": [34.488067626953125, 158.40231323242188, 113.77955627441406, 25.480117797851562, 21.07794189453125, 32.16041564941406, -7.747100830078125, 47.21287536621094, 168.58847045898438, 47.22410583496094, 144.94601440429688, 97.21905517578125, 13.87087631225586, 55.2433967590332, -40.989261627197266, 59.00879669189453, -4.090877532958984, -9.943124771118164, -8.203460693359375, 18.065845489501953, 125.93284606933594, -30.840953826904297, 59.857234954833984, 81.03648376464844, 72.03053283691406, 169.80889892578125, 118.1049575805664, 97.42207336425781, 92.03072357177734, 3.8062210083007812, -82.15492248535156, 13.075759887695312, 151.23313903808594, 83.25848388671875, 34.00360107421875, 85.21661376953125, 1.0448169708251953, 72.83564758300781, 47.08563232421875, 8.33694076538086, 90.61054992675781, -56.21833419799805, -8.036865234375, 59.31388854980469, 81.83114624023438, 184.57156372070312, -138.64988708496094, 114.07478332519531, -2.5924453735351562, 67.82737731933594, -87.47496032714844, 71.29630279541016, 40.19194793701172, 63.80683898925781, 65.58085632324219, 50.190155029296875, 65.20169067382812, 105.46484375, 4.496604919433594, 21.455413818359375, 12.762893676757812, 138.9354248046875, 43.65673828125, 97.67485809326172], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000394.npy"}
{"epoch": 0.5956160241874527, "step": 395, "batch_size": 64, "mean": 40.60387420654297, "std": 69.68480682373047, "min": -108.47905731201172, "p10": -35.48719177246094, "median": 21.11608123779297, "p90": 146.87015228271486, "max": 201.40740966796875, "pos_frac": 0.71875, "sample": [103.5180435180664, 119.00650024414062, 99.46775817871094, 5.576240539550781, 113.6318130493164, 37.387908935546875, -1.8368587493896484, 78.37252044677734, 164.33541870117188, -1.700265884399414, 31.798660278320312, 76.86605834960938, 152.10984802246094, 108.5779037475586, 17.060585021972656, 159.61180114746094, -15.15338134765625, 79.110107421875, -34.598968505859375, 8.5323486328125, -2.2221755981445312, 14.073921203613281, 9.878204345703125, 49.49458694458008, 128.41212463378906, -108.47905731201172, 25.17157745361328, 143.66400146484375, 44.00244140625, -3.2297439575195312, 46.198516845703125, 201.40740966796875, 40.107208251953125, 38.485984802246094, -77.27638244628906, 31.909408569335938, -38.97697067260742, 8.75485610961914, 93.7557144165039, -34.5373420715332, -11.76690673828125, -0.017072677612304688, 148.2442169189453, 11.604686737060547, 6.820636749267578, 68.27984619140625, 2.4335250854492188, 171.80384826660156, -58.14265441894531, -13.353372573852539, 2.864837646484375, 192.24928283691406, -35.86785888671875, 4.130649566650391, 29.58722686767578, 134.69061279296875, -52.0814094543457, 16.6746826171875, 66.87538146972656, -84.64659881591797, 8.77450180053711, -14.621749877929688, 3.4424991607666016, 88.40094757080078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000395.npy"}
{"epoch": 0.5971277399848829, "step": 396, "batch_size": 64, "mean": 49.21974563598633, "std": 65.85359191894531, "min": -122.11994171142578, "p10": -19.62967872619629, "median": 28.356456756591797, "p90": 146.4351318359375, "max": 202.8743438720703, "pos_frac": 0.8125, "sample": [159.01596069335938, 202.8743438720703, 126.57176208496094, 146.46481323242188, 127.947265625, 87.72602844238281, 13.405208587646484, 146.36587524414062, 49.74445343017578, 26.27582550048828, 86.95858001708984, 16.373931884765625, 105.17093658447266, -32.96828842163086, 36.661659240722656, -0.9249649047851562, 27.923065185546875, 76.00006866455078, 1.1065139770507812, 40.24097442626953, 63.01640319824219, -37.07301330566406, 169.85848999023438, 37.724891662597656, 6.409721374511719, 34.22962188720703, 113.26275634765625, -122.11994171142578, 101.920654296875, 88.14224243164062, 9.330028533935547, -49.6652717590332, 0.5967254638671875, 118.13188171386719, 16.762237548828125, 15.407890319824219, 79.40962982177734, 9.999053955078125, 173.85931396484375, 146.23651123046875, -8.97043228149414, 18.96053123474121, 1.0045051574707031, 84.759033203125, 11.024971008300781, 58.96382141113281, 28.78984832763672, 10.752067565917969, 63.81829833984375, -34.09266662597656, 12.20977783203125, -19.671855926513672, 11.464883804321289, 20.766380310058594, -12.301639556884766, 145.62518310546875, 56.104148864746094, 156.86366271972656, 0.7525177001953125, 25.094945907592773, -19.531265258789062, -26.946735382080078, -5.998260498046875, 152.24815368652344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000396.npy"}
{"epoch": 0.5986394557823129, "step": 397, "batch_size": 64, "mean": 27.40616226196289, "std": 81.91283416748047, "min": -168.24781799316406, "p10": -77.18323822021483, "median": 26.751266479492188, "p90": 128.77402954101564, "max": 211.15823364257812, "pos_frac": 0.71875, "sample": [2.063070297241211, 28.00170135498047, -72.06802368164062, 55.99041748046875, -47.147071838378906, 69.70773315429688, 115.55577087402344, 58.36858367919922, 122.47016906738281, -2.1381988525390625, 1.6292953491210938, 64.19001770019531, 7.889610290527344, 92.73979187011719, 82.87512969970703, 90.10193634033203, -167.70919799804688, -113.33099365234375, 36.49740219116211, -11.226966857910156, 94.5759048461914, -17.974075317382812, 64.68627166748047, 109.96115112304688, 25.856170654296875, -168.24781799316406, -79.37547302246094, 8.305412292480469, 157.5054931640625, 58.10887145996094, 25.373252868652344, 23.913488388061523, 130.03775024414062, 29.360809326171875, 148.2843780517578, -23.008453369140625, 40.5690803527832, 125.82534790039062, 111.40554809570312, 6.243547439575195, 2.907501220703125, 172.66058349609375, 112.74791717529297, 46.584503173828125, -161.52978515625, 27.6463623046875, 211.15823364257812, -51.71685791015625, -41.54438781738281, 2.3691864013671875, 1.3931922912597656, 46.103797912597656, 0.01381683349609375, 8.251441955566406, 0.5463008880615234, -27.62364959716797, 33.54674530029297, 43.984031677246094, -33.28666687011719, 132.5105438232422, -140.5025177001953, -82.4124526977539, -0.15003013610839844, 164.4698028564453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000397.npy"}
{"epoch": 0.600151171579743, "step": 398, "batch_size": 64, "mean": 60.99681854248047, "std": 69.44489288330078, "min": -65.2862777709961, "p10": -9.94043388366699, "median": 47.42409896850586, "p90": 157.56898651123046, "max": 252.0152587890625, "pos_frac": 0.75, "sample": [127.61871337890625, 79.26191711425781, 120.81935119628906, 66.93898010253906, 157.0778350830078, 12.042224884033203, 162.6813201904297, 124.49383544921875, -10.365951538085938, 163.22616577148438, 100.80886840820312, 18.573043823242188, -8.557510375976562, -4.325126647949219, 101.2269287109375, 3.6740570068359375, 105.42302703857422, -6.21588134765625, 47.600555419921875, -37.130062103271484, 93.94807434082031, -4.240488052368164, 164.395263671875, 186.9059295654297, -0.5192489624023438, 148.1756591796875, 144.2635498046875, 101.3051986694336, 47.247642517089844, 26.229347229003906, -37.31870651245117, 44.82731628417969, 10.856521606445312, 140.49594116210938, -1.5170974731445312, -6.82166862487793, 40.094696044921875, 133.22122192382812, 12.408157348632812, 59.36506652832031, -30.743881225585938, -11.476585388183594, 146.8885955810547, 76.17694091796875, 95.802978515625, 121.1478042602539, 14.302938461303711, 8.294883728027344, -8.947559356689453, 70.471435546875, 2.5076828002929688, 252.0152587890625, 15.928871154785156, -5.135566711425781, 72.17295837402344, -18.62059783935547, 157.77947998046875, -65.2862777709961, 48.63605499267578, 5.636140823364258, 9.323970794677734, 133.8201141357422, 160.28907775878906, 24.6169376373291], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000398.npy"}
{"epoch": 0.6016628873771731, "step": 399, "batch_size": 64, "mean": 66.17098999023438, "std": 77.01811218261719, "min": -101.14070129394531, "p10": -28.456905364990234, "median": 58.684410095214844, "p90": 169.5206024169922, "max": 258.53741455078125, "pos_frac": 0.765625, "sample": [115.71421813964844, 45.79302215576172, 114.50102996826172, 165.39471435546875, -29.227630615234375, 58.469818115234375, 126.68585205078125, 27.111732482910156, -5.970706939697266, -12.105461120605469, -70.70040893554688, 70.3084716796875, 78.481201171875, -61.47061538696289, 31.649795532226562, 66.60673522949219, 76.75193786621094, 43.829071044921875, 71.47825622558594, 175.80783081054688, -29.373291015625, -9.832138061523438, -101.14070129394531, 117.27598571777344, 203.78909301757812, 35.00270080566406, 15.362964630126953, 258.53741455078125, 33.602210998535156, 7.410453796386719, 14.032234191894531, -0.4219684600830078, 71.64950561523438, 29.525470733642578, 41.64274597167969, -26.658546447753906, 56.123023986816406, 158.47596740722656, 40.935447692871094, 174.4788055419922, 58.12834167480469, 0.4620838165283203, 164.9218292236328, -18.5672607421875, 165.76974487304688, 106.78520965576172, 71.86762237548828, 64.67048645019531, -46.372161865234375, -22.324607849121094, 158.78848266601562, 172.19989013671875, 157.81161499023438, 119.2387466430664, 134.82369995117188, 157.23297119140625, 128.307861328125, 178.50152587890625, 58.89900207519531, -37.49277114868164, 171.12811279296875, -9.10284423828125, 67.23307037353516, 52.506431579589844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000399.npy"}
{"epoch": 0.6031746031746031, "step": 400, "batch_size": 64, "mean": 40.524681091308594, "std": 71.22846221923828, "min": -139.43313598632812, "p10": -35.16650619506835, "median": 21.9365291595459, "p90": 144.17032775878909, "max": 184.33648681640625, "pos_frac": 0.6875, "sample": [-5.018726348876953, 84.19934844970703, 30.011749267578125, -18.379440307617188, 82.70263671875, 110.98855590820312, 12.546958923339844, 115.98846435546875, 101.95578002929688, 138.28001403808594, 40.27644729614258, -47.37244415283203, 30.27926254272461, 1.8897552490234375, 3.435314178466797, 13.299072265625, 2.989105224609375, 146.6947479248047, 117.55692291259766, 109.41708374023438, -27.67352294921875, 28.155826568603516, 85.369873046875, 3.5016422271728516, -42.707275390625, 79.88243865966797, 19.680410385131836, 11.118301391601562, 21.671863555908203, -4.118478775024414, -69.1800537109375, -4.12721061706543, 113.63743591308594, -4.273101806640625, 163.78782653808594, -4.256763458251953, -55.48521423339844, 22.472660064697266, 158.97015380859375, 100.69766235351562, 162.06509399414062, 101.79084014892578, -139.43313598632812, 89.7578125, 5.866569519042969, 18.43695068359375, 0.20574951171875, -22.319107055664062, -19.893028259277344, 184.33648681640625, 49.735774993896484, 124.47847747802734, -38.377784729003906, 41.95476531982422, -7.564443588256836, 167.26695251464844, 176.80157470703125, 73.12979125976562, 22.201194763183594, -8.415435791015625, -117.19132995605469, -10.227642059326172, -4.704521179199219, 74.81288146972656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000400.npy"}
{"epoch": 0.6046863189720333, "step": 401, "batch_size": 64, "mean": 56.537445068359375, "std": 69.14269256591797, "min": -65.66293334960938, "p10": -28.839103126525877, "median": 38.937007904052734, "p90": 159.57760009765624, "max": 195.70059204101562, "pos_frac": 0.8125, "sample": [-9.928947448730469, 120.03096008300781, 131.1490478515625, 53.2874755859375, -29.11697769165039, 0.9173545837402344, 10.1563720703125, 34.66108703613281, -33.72128677368164, 6.624580383300781, 159.5399169921875, 0.1404266357421875, -65.66293334960938, -0.712799072265625, 39.51988983154297, -30.073570251464844, 71.94638061523438, 186.51507568359375, 140.03282165527344, 150.76742553710938, 54.61272430419922, 62.92384719848633, 26.313880920410156, 14.345935821533203, -31.355194091796875, -54.99980926513672, 55.87164306640625, 77.45865631103516, 59.27849578857422, 52.62879180908203, 85.75137329101562, 26.305519104003906, 6.433931350708008, 119.3629379272461, -3.8152084350585938, 172.26296997070312, 4.122030258178711, 124.41329193115234, 19.072206497192383, 148.5460205078125, 8.309776306152344, 163.43777465820312, 67.19296264648438, -28.19072914123535, 154.79318237304688, 16.720172882080078, 38.3541259765625, 195.70059204101562, 146.04470825195312, 89.31339263916016, 90.56043243408203, 3.3658504486083984, 16.149295806884766, 5.6223297119140625, -46.57589340209961, 76.80516815185547, 66.72332763671875, 19.25111198425293, 159.59375, 170.80526733398438, 38.09587097167969, -10.979061126708984, 29.21164321899414, 192.48504638671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000401.npy"}
{"epoch": 0.6061980347694633, "step": 402, "batch_size": 64, "mean": 47.936309814453125, "std": 78.5478515625, "min": -109.94947814941406, "p10": -50.59887237548828, "median": 40.543792724609375, "p90": 160.3211074829102, "max": 185.2328643798828, "pos_frac": 0.71875, "sample": [82.45623779296875, 169.47817993164062, 119.71714782714844, 119.08899688720703, 107.70698547363281, 67.04752349853516, 185.2328643798828, 15.81201171875, 51.16853332519531, 72.76163482666016, -15.897438049316406, 181.79629516601562, 39.853981018066406, -52.354034423828125, 41.217254638671875, 47.974727630615234, -0.9810943603515625, 16.543092727661133, 26.451553344726562, 27.9071044921875, 111.46056365966797, -0.4365425109863281, 2.9485244750976562, -98.89543151855469, 169.10427856445312, 2.644306182861328, 81.7895278930664, 144.83700561523438, 55.69633865356445, 19.926498413085938, 164.50975036621094, 150.547607421875, -11.662918090820312, 86.11503601074219, -4.1032867431640625, 54.64033508300781, 145.89837646484375, 143.36964416503906, 10.4532470703125, 80.90970611572266, -55.82994842529297, -97.13519287109375, -105.71041870117188, 90.61126708984375, 17.521148681640625, 16.91706657409668, 39.870330810546875, -46.50349426269531, 5.496177673339844, -1.7233772277832031, 133.37791442871094, 166.38731384277344, 132.06703186035156, 183.3230438232422, -22.332443237304688, -18.297964096069336, -10.750114440917969, -109.94947814941406, -109.23692321777344, 65.92819213867188, 33.26206588745117, 136.03347778320312, -34.924049377441406, 46.78791809082031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000402.npy"}
{"epoch": 0.6077097505668935, "step": 403, "batch_size": 64, "mean": 57.51924133300781, "std": 80.4508056640625, "min": -139.13671875, "p10": -38.24497947692871, "median": 35.85976028442383, "p90": 162.25987854003907, "max": 230.23870849609375, "pos_frac": 0.78125, "sample": [-4.379974365234375, -4.035400390625, 195.2777099609375, 16.50262451171875, 159.9156494140625, 34.458465576171875, 184.06195068359375, 114.18644714355469, 17.651718139648438, 11.49981689453125, 65.30082702636719, -38.90614318847656, 123.04197692871094, -43.58646011352539, 151.29629516601562, 98.73722839355469, 50.92247772216797, 8.498170852661133, 23.935951232910156, 4.245367050170898, -5.127927780151367, -59.160884857177734, 230.23870849609375, 144.39471435546875, -13.5538330078125, 118.61961364746094, 156.85610961914062, 4.763460159301758, 163.77894592285156, 18.96063995361328, -1.2354545593261719, 68.58039855957031, 23.625473022460938, 37.26105499267578, 58.554229736328125, -31.96175765991211, 1.5494747161865234, 162.8180389404297, 17.435327529907227, 59.02276611328125, 19.52264404296875, 1.0474414825439453, 160.95750427246094, 22.656993865966797, 121.41082763671875, -55.09552001953125, 129.48080444335938, 131.64566040039062, 66.39546203613281, 14.70556640625, 145.58847045898438, 168.97955322265625, -139.13671875, -37.21097183227539, 171.40585327148438, -119.08116149902344, 140.80532836914062, 83.74642181396484, 57.84328842163086, 152.59010314941406, -38.68812561035156, 1.9342727661132812, 25.870216369628906, 129.8136749267578], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000403.npy"}
{"epoch": 0.6092214663643235, "step": 404, "batch_size": 64, "mean": 59.303890228271484, "std": 80.98713684082031, "min": -96.80717468261719, "p10": -42.9811164855957, "median": 48.53437423706055, "p90": 167.05310668945313, "max": 212.8546142578125, "pos_frac": 0.75, "sample": [-9.397216796875, 82.85586547851562, 94.35176849365234, 5.1017608642578125, 48.41992950439453, 12.877479553222656, 88.81324005126953, 10.310417175292969, -41.48876190185547, -17.342147827148438, 169.44293212890625, -16.225128173828125, -96.80717468261719, -84.97344970703125, -1.0650062561035156, 171.74639892578125, -19.816261291503906, 1.7363052368164062, 166.2850341796875, 149.81549072265625, 93.83587646484375, 124.7494125366211, 106.29971313476562, 177.43798828125, -43.620697021484375, 133.48104858398438, -52.65874481201172, 157.10305786132812, 144.30345153808594, 132.20962524414062, 6.559074401855469, 119.17840576171875, -15.363225936889648, 132.52090454101562, -20.469078063964844, 6.583747863769531, 23.819862365722656, 123.8046646118164, 101.16954803466797, 6.680950164794922, -0.7705764770507812, 0.25337982177734375, 110.79293823242188, 29.594390869140625, 152.28770446777344, 14.75229263305664, 113.6292724609375, 166.71417236328125, 170.45587158203125, 48.606361389160156, 1.3799476623535156, 212.8546142578125, 211.8577880859375, 57.59660339355469, 135.7906951904297, 107.19066619873047, 167.1983642578125, -63.0556640625, -57.598114013671875, -80.56993103027344, 48.46238708496094, 56.277286529541016, 7.169120788574219, 12.312606811523438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000404.npy"}
{"epoch": 0.6107331821617535, "step": 405, "batch_size": 64, "mean": 46.69974136352539, "std": 73.62545013427734, "min": -124.45770263671875, "p10": -36.71433944702148, "median": 25.89072322845459, "p90": 160.18121795654298, "max": 188.63363647460938, "pos_frac": 0.671875, "sample": [136.83763122558594, -15.19527816772461, -66.46466064453125, 161.2484893798828, 15.567893981933594, 12.032028198242188, 157.69091796875, -37.0267333984375, -1.4015235900878906, -43.07722473144531, -19.869186401367188, 3.4943599700927734, 2.351633071899414, -5.164649963378906, 71.65230560302734, 13.281852722167969, 188.63363647460938, -3.447263717651367, -5.755577087402344, 106.87188720703125, 155.31036376953125, 108.86868286132812, 13.693519592285156, 62.54307556152344, -1.5328292846679688, 34.031124114990234, 100.32778930664062, 163.01681518554688, -45.43951416015625, 97.4774169921875, 21.905010223388672, -7.334918975830078, 124.79833984375, 141.5767822265625, -13.790756225585938, 71.95464324951172, 15.858463287353516, -124.45770263671875, 64.75213623046875, 39.91438293457031, 31.63129425048828, -40.096946716308594, 53.565773010253906, 48.206634521484375, 8.595954895019531, 23.604955673217773, 16.44866180419922, 39.18061828613281, 173.99122619628906, 136.41781616210938, -19.53457260131836, -5.844818115234375, 107.01460266113281, 28.176490783691406, 164.4137725830078, 165.74826049804688, -21.943389892578125, 51.47884750366211, 124.52108001708984, -63.177947998046875, -14.095069885253906, 187.88365173339844, 132.8485870361328, -35.98542022705078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000405.npy"}
{"epoch": 0.6122448979591837, "step": 406, "batch_size": 64, "mean": 60.04176712036133, "std": 81.47892761230469, "min": -156.5906524658203, "p10": -35.76196250915527, "median": 55.61143493652344, "p90": 162.39232940673827, "max": 187.4104766845703, "pos_frac": 0.75, "sample": [155.52178955078125, 31.964447021484375, -38.785911560058594, -21.432937622070312, 123.25941467285156, 86.11363983154297, 148.41445922851562, 107.62548828125, -45.88153839111328, 184.9376983642578, 113.9579086303711, 125.01334381103516, 1.5676612854003906, -8.105560302734375, 172.67227172851562, 162.01890563964844, 178.29519653320312, 19.178735733032227, 53.88501739501953, 161.61819458007812, 79.89743041992188, 23.356300354003906, 97.29078674316406, 66.67688751220703, 16.06720733642578, 118.128173828125, -28.70608139038086, 106.05595397949219, -2.5823917388916016, 48.754703521728516, -0.57470703125, -2.9815292358398438, -131.1904754638672, -18.137630462646484, 44.32643127441406, 162.5523681640625, 57.337852478027344, 169.40704345703125, -57.70700454711914, 178.5447998046875, 97.74781799316406, 36.491455078125, 40.881507873535156, 187.4104766845703, 9.126846313476562, 156.8912811279297, -80.09712219238281, 20.11045265197754, 95.57791900634766, 155.37191772460938, 14.925168991088867, -26.183334350585938, 148.5916290283203, -92.19355773925781, 32.29865264892578, -156.5906524658203, 81.63253784179688, 89.4315185546875, -17.9453125, 28.675491333007812, 105.14247131347656, 90.1092529296875, 41.674774169921875, 145.23753356933594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000406.npy"}
{"epoch": 0.6137566137566137, "step": 407, "batch_size": 64, "mean": 46.884368896484375, "std": 78.60919952392578, "min": -126.11981201171875, "p10": -48.25475006103515, "median": 43.622175216674805, "p90": 159.0229431152344, "max": 184.8699188232422, "pos_frac": 0.671875, "sample": [1.7729339599609375, -5.746747970581055, -50.210540771484375, 72.40674591064453, -39.391136169433594, 153.2113800048828, 153.52430725097656, -43.69123840332031, -77.37733459472656, 102.7967529296875, 55.26017761230469, 46.73244094848633, -0.303375244140625, 3.12518310546875, -92.55220031738281, 16.42487335205078, 67.98021697998047, 128.2847900390625, 176.63751220703125, 160.2659454345703, 48.98736572265625, 2.245676040649414, 57.22061538696289, 37.56925964355469, 58.101959228515625, 90.54776000976562, 2.9901199340820312, -12.896942138671875, 139.51043701171875, 182.58135986328125, 95.45196533203125, 2.1956100463867188, 167.26092529296875, 161.60914611816406, 104.00613403320312, -3.3559799194335938, -60.169525146484375, 109.96755981445312, -63.5010986328125, 40.51190948486328, 4.58355712890625, 95.583740234375, 156.1226043701172, 81.72222137451172, 176.15565490722656, 184.8699188232422, 110.3581314086914, -0.38903045654296875, 121.81182861328125, -0.8683052062988281, 35.71357727050781, -109.6123046875, -126.11981201171875, -13.743743896484375, -34.72407531738281, 39.70502853393555, -15.553773880004883, 147.89834594726562, -33.68770980834961, -6.7123260498046875, -4.605171203613281, 81.46176147460938, 73.71765899658203, 46.92674255371094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000407.npy"}
{"epoch": 0.6152683295540439, "step": 408, "batch_size": 64, "mean": 71.02813720703125, "std": 76.15838623046875, "min": -105.11711120605469, "p10": -14.826384544372557, "median": 65.9044075012207, "p90": 169.40750885009766, "max": 188.2010498046875, "pos_frac": 0.78125, "sample": [48.65782165527344, 160.43051147460938, 145.42080688476562, 80.203125, -1.8056488037109375, -61.09230041503906, -105.11711120605469, 5.498165130615234, 137.52804565429688, 77.13014221191406, 57.77366638183594, 183.4635009765625, -0.46462059020996094, 99.27488708496094, 100.82948303222656, 188.2010498046875, 29.52169418334961, -32.282691955566406, 58.609046936035156, 28.498580932617188, 88.26690673828125, 2.4013900756835938, -7.946977615356445, 136.29396057128906, 186.83584594726562, -68.33598327636719, 62.972251892089844, 184.8450927734375, 108.5150375366211, 143.54771423339844, 170.3492889404297, -13.062305450439453, 113.25664520263672, 12.356962203979492, 9.623779296875, 163.0048065185547, 34.285404205322266, -15.582418441772461, 185.8717498779297, 41.69181823730469, 54.304847717285156, 165.4956512451172, 163.69314575195312, 174.90640258789062, 133.03700256347656, 14.886098861694336, 136.93960571289062, -16.985857009887695, 55.8812255859375, -1.5006561279296875, 151.46597290039062, 87.79495239257812, 167.21002197265625, -91.27517700195312, 143.66513061523438, 114.08966064453125, 42.80533218383789, -1.1083793640136719, 6.6303253173828125, 34.88381576538086, 109.35208892822266, 68.83656311035156, -2.7559661865234375, 94.07987976074219], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000408.npy"}
{"epoch": 0.6167800453514739, "step": 409, "batch_size": 64, "mean": 35.53091049194336, "std": 73.34093475341797, "min": -150.79710388183594, "p10": -33.340430831909174, "median": 20.30583095550537, "p90": 152.4740966796875, "max": 180.80413818359375, "pos_frac": 0.65625, "sample": [11.578483581542969, 16.58786964416504, -20.173095703125, -24.272491455078125, 2.7741050720214844, -24.008182525634766, 33.62262725830078, -26.268848419189453, 46.41916275024414, -23.599876403808594, -24.21584701538086, 171.54776000976562, -8.059083938598633, 169.54312133789062, 89.55448913574219, 8.312408447265625, 64.34310913085938, -2.2161331176757812, -10.345958709716797, 175.36898803710938, 122.30841064453125, -23.552249908447266, 107.99272155761719, 148.70565795898438, 10.576202392578125, -56.058197021484375, -8.089736938476562, 70.89105224609375, 61.10028076171875, 4.734832763671875, -45.68177795410156, 55.55059814453125, 100.04923248291016, -20.153522491455078, 116.18811798095703, -13.199621200561523, 154.08914184570312, 107.61539459228516, 43.10780334472656, 20.01410484313965, -63.9138298034668, -8.883110046386719, 81.41389465332031, -67.96672058105469, 20.597557067871094, -25.423171997070312, 6.05328369140625, 70.54177856445312, -142.46163940429688, 170.1166229248047, 33.46620559692383, 106.01653289794922, 0.0643310546875, 170.39181518554688, -150.79710388183594, 12.374099731445312, 51.455108642578125, 57.522552490234375, 70.96575164794922, 61.50433349609375, 180.80413818359375, -36.37110900878906, 37.93658447265625, 55.8894157409668], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000409.npy"}
{"epoch": 0.618291761148904, "step": 410, "batch_size": 64, "mean": 56.84495544433594, "std": 64.12248229980469, "min": -98.92133331298828, "p10": -17.58988418579101, "median": 52.922096252441406, "p90": 150.7980178833008, "max": 190.4605255126953, "pos_frac": 0.796875, "sample": [51.299095153808594, 71.56434631347656, 15.14211654663086, -0.7053718566894531, 163.20779418945312, 12.718032836914062, 86.19003295898438, 35.95134353637695, -23.20233154296875, -98.92133331298828, 74.19263458251953, 156.12057495117188, 3.3521881103515625, -42.34783172607422, 142.50021362304688, 69.92107391357422, 116.447021484375, -6.424346923828125, 78.24842071533203, 173.77357482910156, 190.4605255126953, 45.433197021484375, -3.6347503662109375, 91.22376251220703, 4.363372802734375, 109.37344360351562, 13.415420532226562, 95.969482421875, 27.79064178466797, -29.96778106689453, 104.54520416259766, -19.543136596679688, 31.080127716064453, 126.35883331298828, 28.366485595703125, 186.73068237304688, 123.60945129394531, 24.051319122314453, 54.9578857421875, 68.48919677734375, -13.032295227050781, -35.874908447265625, 66.50846099853516, 53.74702453613281, 126.77039337158203, 71.22523498535156, 147.7135467529297, -2.200897216796875, 12.796039581298828, 52.09716796875, 16.446792602539062, 83.03994750976562, 184.95938110351562, 65.65081787109375, 13.734512329101562, 74.61986541748047, -6.160457611083984, 40.4849853515625, 2.6687774658203125, 124.21480560302734, 152.11993408203125, 65.47859954833984, -32.59611511230469, 21.565040588378906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000410.npy"}
{"epoch": 0.6198034769463341, "step": 411, "batch_size": 64, "mean": 55.70112609863281, "std": 79.97290802001953, "min": -174.96670532226562, "p10": -38.225365447998044, "median": 47.84761047363281, "p90": 163.00161437988282, "max": 225.99380493164062, "pos_frac": 0.71875, "sample": [-6.770545959472656, 152.32688903808594, 124.31810760498047, 148.84149169921875, 95.4902114868164, -10.859512329101562, -1.2748870849609375, -174.96670532226562, 93.87100982666016, 2.247100830078125, 75.36226654052734, 175.98846435546875, 113.104248046875, 87.25923156738281, 148.2113037109375, 66.11922454833984, 84.99014282226562, -38.27302551269531, 53.380210876464844, 72.72618103027344, 7.120094299316406, 171.31385803222656, 153.68490600585938, 8.06855583190918, -25.752593994140625, 124.00848388671875, 1.7616653442382812, 17.07394790649414, -47.53122329711914, -32.43495178222656, 27.83673095703125, -11.005334854125977, -38.114158630371094, 35.42315673828125, 75.93963623046875, 29.851314544677734, -2.9695606231689453, 154.4142303466797, 9.062744140625, 225.99380493164062, 135.88076782226562, 42.64076232910156, -16.06536865234375, 163.6278076171875, 164.25653076171875, 165.67039489746094, -68.28549194335938, 31.306228637695312, -9.371417999267578, -69.91265869140625, 27.17607307434082, -48.181884765625, 161.54049682617188, 132.57798767089844, 69.87114715576172, 106.44791412353516, 104.9959487915039, 25.989782333374023, 204.263671875, 53.05445861816406, -39.397743225097656, 1.5674896240234375, -4.4253082275390625, 83.80767059326172], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000411.npy"}
{"epoch": 0.6213151927437641, "step": 412, "batch_size": 64, "mean": 52.322906494140625, "std": 72.10680389404297, "min": -82.66917419433594, "p10": -35.33040885925293, "median": 36.53636932373047, "p90": 159.29747161865237, "max": 186.83628845214844, "pos_frac": 0.765625, "sample": [-75.91068267822266, 147.6590576171875, 55.20545196533203, 169.92117309570312, 44.990989685058594, 106.02311706542969, 27.614181518554688, 157.08761596679688, -59.834598541259766, -0.4399986267089844, 66.43285369873047, 8.046195983886719, 75.86593627929688, 162.7431640625, -8.202125549316406, -26.791648864746094, 143.31118774414062, -82.66917419433594, 35.13445281982422, 1.4942626953125, 160.2445526123047, 103.95756530761719, 37.5350341796875, 69.49482727050781, 66.40538024902344, -68.03766632080078, 20.347640991210938, 15.245784759521484, 27.039997100830078, 35.53770446777344, 0.2671375274658203, 181.8680877685547, 3.0669918060302734, 186.83628845214844, -30.173011779785156, 16.607620239257812, 33.22320556640625, -13.519725799560547, 138.85015869140625, -3.3292922973632812, -24.239084243774414, 11.260757446289062, 67.7384033203125, -37.54072189331055, 24.133686065673828, 160.613037109375, 131.17242431640625, 120.60698699951172, 170.1315155029297, -52.16376876831055, 76.51030731201172, 97.88909912109375, 51.507572174072266, 9.871208190917969, 46.60419845581055, 143.3085479736328, -2.8899784088134766, 63.824974060058594, -40.565067291259766, 76.89772033691406, 148.6663055419922, 6.699201583862305, 156.7645263671875, 12.714385986328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000412.npy"}
{"epoch": 0.6228269085411943, "step": 413, "batch_size": 64, "mean": 44.45771026611328, "std": 89.19840240478516, "min": -174.86131286621094, "p10": -68.33735466003418, "median": 52.90021514892578, "p90": 159.58789367675783, "max": 251.78538513183594, "pos_frac": 0.703125, "sample": [7.055362701416016, -103.92503356933594, 96.08277130126953, 58.36235046386719, -14.146736145019531, -34.456634521484375, -1.1975250244140625, 172.15975952148438, -52.777801513671875, 189.10861206054688, -169.37078857421875, 160.63912963867188, 79.92337799072266, -56.45899963378906, 13.710433959960938, 17.574234008789062, -62.46780776977539, 147.8561248779297, -0.6547470092773438, 14.794076919555664, 182.94131469726562, 95.27171325683594, 52.98344421386719, 66.45855712890625, 94.88156127929688, 39.11113739013672, 251.78538513183594, 62.074615478515625, 18.13762664794922, 45.69818878173828, 82.94456481933594, 127.63875579833984, 70.866943359375, 60.1700439453125, 171.6066131591797, -70.85287475585938, 85.96476745605469, -13.452804565429688, 72.1307144165039, 14.154144287109375, -125.88384246826172, 145.9527587890625, 163.7181396484375, -78.12085723876953, 36.247528076171875, -83.4791259765625, -174.86131286621094, 35.33246612548828, 67.4529037475586, 14.750102996826172, -25.105880737304688, -50.942169189453125, 131.6051025390625, -1.6519355773925781, 52.816986083984375, 138.6896209716797, 66.75930786132812, 149.64035034179688, 26.408039093017578, 58.127647399902344, -60.20024108886719, 114.57418060302734, 157.135009765625, 114.00434875488281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000413.npy"}
{"epoch": 0.6243386243386243, "step": 414, "batch_size": 64, "mean": 56.88047409057617, "std": 73.6666488647461, "min": -146.98336791992188, "p10": -26.0454978942871, "median": 51.52094268798828, "p90": 165.25286560058595, "max": 181.91598510742188, "pos_frac": 0.765625, "sample": [133.2017822265625, -11.339202880859375, 98.80621337890625, 61.98199462890625, 2.6681900024414062, 42.939979553222656, -68.60885620117188, 99.21514892578125, 78.34427642822266, 11.383316040039062, 47.44781494140625, 151.23414611816406, 127.01342010498047, -47.610755920410156, 166.34356689453125, 61.469364166259766, -85.1829833984375, -30.469871520996094, 11.430168151855469, 80.81783294677734, 12.943780899047852, 67.7932357788086, 179.07464599609375, -16.423194885253906, 170.35008239746094, 20.785003662109375, 58.7513313293457, 18.89019012451172, 86.02415466308594, 123.61822509765625, 165.22067260742188, 13.518661499023438, 126.55860137939453, -30.169342041015625, 180.63111877441406, 82.19890594482422, 149.86370849609375, 36.33549499511719, 55.59407043457031, 152.79112243652344, 36.33702087402344, -13.299507141113281, 18.407569885253906, -2.750631332397461, 93.92430114746094, -1.5742340087890625, 1.0083541870117188, 110.56964111328125, -146.98336791992188, 66.44278717041016, 85.49932861328125, -3.051006317138672, -0.7048568725585938, 38.51461410522461, 165.26666259765625, 173.79196166992188, 34.99738311767578, 13.785202026367188, 131.14505004882812, 114.72235870361328, -53.25978088378906, -8.599668502807617, 18.808971405029297, 181.91598510742188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000414.npy"}
{"epoch": 0.6258503401360545, "step": 415, "batch_size": 64, "mean": 49.402931213378906, "std": 79.60009002685547, "min": -144.02529907226562, "p10": -39.46945648193359, "median": 38.65982627868652, "p90": 161.97137908935548, "max": 223.390869140625, "pos_frac": 0.734375, "sample": [-8.302309036254883, 119.16150665283203, 27.11972999572754, 53.508785247802734, 41.46388626098633, 10.118431091308594, 145.4659881591797, 171.60873413085938, -38.51531982421875, 122.0096664428711, 79.58968353271484, -55.676300048828125, 141.9288330078125, 158.5594024658203, 7.1784515380859375, 164.0397491455078, 179.681884765625, 88.3780517578125, 89.86270141601562, -1.3847064971923828, -13.181381225585938, 80.85066223144531, 86.51311492919922, -1.1860733032226562, 25.327491760253906, 163.43365478515625, -121.4432373046875, 16.20313262939453, 56.88420104980469, 81.43539428710938, -21.229812622070312, 133.4587860107422, 70.37200164794922, -1.9135284423828125, -136.05960083007812, 223.390869140625, 101.59891510009766, 112.61876678466797, 149.91763305664062, -39.87837219238281, 94.47864532470703, -76.70835876464844, -67.89146423339844, -144.02529907226562, 16.221839904785156, 69.21662139892578, -10.733177185058594, 17.498687744140625, -2.2421493530273438, 4.26579475402832, 7.954694747924805, 59.030860900878906, 29.912704467773438, 14.153194427490234, 8.118026733398438, 70.47720336914062, -2.5171031951904297, 118.3029556274414, 85.38296508789062, 3.0205612182617188, 17.51911163330078, 35.85576629638672, 172.21241760253906, 179.37384033203125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000415.npy"}
{"epoch": 0.6273620559334845, "step": 416, "batch_size": 64, "mean": 42.04512023925781, "std": 82.40144348144531, "min": -149.33551025390625, "p10": -53.03068733215332, "median": 30.953049659729004, "p90": 159.3841796875, "max": 262.3887634277344, "pos_frac": 0.671875, "sample": [9.108306884765625, -131.46572875976562, -8.961872100830078, 50.742706298828125, 42.25260925292969, 162.95323181152344, -37.43309020996094, -54.93299865722656, 159.66217041015625, 174.984619140625, 23.2432861328125, 144.76979064941406, 262.3887634277344, 14.998870849609375, -1.7870216369628906, 76.16217803955078, -1.8425960540771484, 98.5589370727539, -149.33551025390625, -26.533111572265625, 3.7972545623779297, 111.89920806884766, 29.17216682434082, 158.73553466796875, -0.4643745422363281, 109.34001159667969, 52.63232421875, 169.53640747070312, 183.93353271484375, 76.2002182006836, 70.6063003540039, 124.28840637207031, -18.271957397460938, -54.29686737060547, 3.5057544708251953, -2.1701583862304688, -75.44573974609375, -118.16859436035156, -1.8809223175048828, 4.663547515869141, -50.07626724243164, 32.73393249511719, -19.070602416992188, 6.2799835205078125, 92.8134765625, -20.663463592529297, 103.5993423461914, -74.73416137695312, -0.3300189971923828, 102.046142578125, 65.10086059570312, 6.178247451782227, -43.40180969238281, 106.26455688476562, 17.51921844482422, 37.160831451416016, 1.1985855102539062, 47.85005187988281, 78.85958099365234, 33.296043395996094, 153.51193237304688, 132.52923583984375, 170.6553497314453, 76.42098236083984], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000416.npy"}
{"epoch": 0.6288737717309146, "step": 417, "batch_size": 64, "mean": 56.72645568847656, "std": 84.83000183105469, "min": -215.23519897460938, "p10": -46.06148071289062, "median": 60.3596305847168, "p90": 158.79067687988282, "max": 235.8882293701172, "pos_frac": 0.796875, "sample": [26.233543395996094, 4.846900939941406, -10.333641052246094, -162.1650390625, 165.35855102539062, 144.38665771484375, -215.23519897460938, 164.43258666992188, 64.6312255859375, 98.88153076171875, 235.8882293701172, 28.021652221679688, 21.271156311035156, 60.75287628173828, 90.7102279663086, -91.76805114746094, 157.0029754638672, 99.2385482788086, 3.09075927734375, 66.6302261352539, 118.2464599609375, 40.674957275390625, 75.70466613769531, 59.96638488769531, 52.14922332763672, -41.44609069824219, 158.47329711914062, 45.697540283203125, 27.717937469482422, 137.25619506835938, -75.55754852294922, 156.07867431640625, 165.158935546875, -64.42713928222656, 126.50363159179688, -33.0018310546875, 44.56132507324219, -4.615638732910156, 143.11163330078125, 100.09520721435547, 158.92669677734375, 8.678627014160156, 164.79257202148438, 65.1141128540039, -48.03950500488281, 27.413345336914062, 8.816347122192383, 48.93730926513672, 1.2719268798828125, 75.19429016113281, 23.7774658203125, 146.99417114257812, 8.017227172851562, -19.958984375, 91.67928314208984, -58.86315155029297, 86.48429870605469, 31.757526397705078, 89.15086364746094, -0.9425868988037109, 125.20892333984375, 66.47412109375, 142.74354553222656, 202.64117431640625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000417.npy"}
{"epoch": 0.6303854875283447, "step": 418, "batch_size": 64, "mean": 43.83423614501953, "std": 78.45767211914062, "min": -115.24755859375, "p10": -46.15416717529296, "median": 30.84957504272461, "p90": 153.99471130371094, "max": 265.20416259765625, "pos_frac": 0.671875, "sample": [70.51707458496094, 128.6776123046875, 116.34931182861328, 50.607208251953125, 58.99552536010742, 153.73727416992188, 13.605148315429688, 94.37859344482422, 170.02023315429688, -41.68572235107422, 154.10504150390625, 184.9149627685547, -42.26649475097656, 96.93362426757812, 72.48168182373047, -72.58527374267578, 143.9458770751953, 12.769309997558594, 146.82623291015625, -10.065818786621094, -0.6748695373535156, 265.20416259765625, 86.42180633544922, -14.774360656738281, 161.52853393554688, 26.08563995361328, -20.43572998046875, 0.7687606811523438, 18.793254852294922, -47.8203125, -115.24755859375, -55.29193115234375, 176.8297576904297, 35.61351013183594, 169.60699462890625, -16.063385009765625, 5.847873687744141, -13.861526489257812, 14.100540161132812, 16.059898376464844, -41.75933074951172, -50.68177032470703, -81.0662841796875, -24.044710159301758, 100.27532196044922, -16.643760681152344, -9.850547790527344, 37.763916015625, 5.386846542358398, -3.4156723022460938, 35.958656311035156, 50.5477294921875, 130.14291381835938, 121.38296508789062, 15.310295104980469, 14.265655517578125, -80.69535827636719, 78.78305053710938, 86.78219604492188, 110.2609634399414, 81.96553802490234, -38.52436828613281, 45.239105224609375, 43.055419921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000418.npy"}
{"epoch": 0.6318972033257747, "step": 419, "batch_size": 64, "mean": 63.256935119628906, "std": 82.83282470703125, "min": -137.9763641357422, "p10": -17.64232139587402, "median": 32.116201400756836, "p90": 175.6050231933594, "max": 221.50987243652344, "pos_frac": 0.828125, "sample": [-7.128582000732422, 221.50987243652344, 10.880302429199219, -2.4276046752929688, -67.62161254882812, 32.295501708984375, 175.9549102783203, 23.736572265625, 19.16040802001953, 20.6815185546875, -137.9763641357422, 157.80703735351562, 163.7473907470703, 95.23161315917969, -43.764854431152344, 13.970077514648438, 67.53449249267578, 25.138427734375, 183.12020874023438, 21.910659790039062, 64.6522216796875, 79.16754913330078, -18.926498413085938, 101.20075988769531, 5.0757904052734375, 2.3931427001953125, 191.25018310546875, -18.889968872070312, 208.44215393066406, 90.64605712890625, 23.852283477783203, -71.28329467773438, -14.731143951416016, 160.29830932617188, 220.1896514892578, 79.44175720214844, 151.01829528808594, 7.449005126953125, 17.598846435546875, 92.30874633789062, 0.3424053192138672, 174.7886199951172, 12.441131591796875, 9.436386108398438, 162.334228515625, 9.784194946289062, 74.83260345458984, 40.29397201538086, 26.61801528930664, 169.36306762695312, 162.2887420654297, 0.6030082702636719, 31.936901092529297, 22.090377807617188, 140.85630798339844, 16.253549575805664, 122.35771942138672, 130.04135131835938, 193.3023223876953, -8.506805419921875, 94.58772277832031, 54.9783935546875, -99.3591537475586, 161.86492919921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000419.npy"}
{"epoch": 0.6334089191232048, "step": 420, "batch_size": 64, "mean": 33.70429992675781, "std": 86.37425994873047, "min": -168.36447143554688, "p10": -67.60417251586912, "median": 11.680610656738281, "p90": 170.32270050048828, "max": 192.83670043945312, "pos_frac": 0.65625, "sample": [-5.8159027099609375, -21.956302642822266, 134.56069946289062, 6.829124450683594, 87.29277801513672, -137.15773010253906, 14.309829711914062, 7.692913055419922, 5.788581848144531, -163.3100128173828, -13.162384033203125, 80.82173919677734, 0.9610977172851562, 147.565185546875, 9.0513916015625, 171.47421264648438, 183.35360717773438, 79.26736450195312, 173.8178253173828, 149.716064453125, 21.5401668548584, -43.08427047729492, -0.4493846893310547, 53.031005859375, 93.81979370117188, 161.62930297851562, -83.06288146972656, -78.79920196533203, 56.972015380859375, 6.421291351318359, -96.29714965820312, 36.473724365234375, -24.17530059814453, 23.25027847290039, 0.6600151062011719, -72.59089660644531, -13.136482238769531, -2.8989486694335938, 171.9317626953125, 16.12762451171875, -18.688743591308594, -168.36447143554688, -55.968482971191406, -5.3031463623046875, 147.08627319335938, 56.26671600341797, 19.628833770751953, 169.73497009277344, 1.10931396484375, 41.76171875, 177.49752807617188, 39.81378173828125, 131.44505310058594, 3.456541061401367, 52.0691032409668, 118.69747924804688, 170.5745849609375, -4.933326721191406, 3.529094696044922, 18.572463989257812, -2.504913330078125, 192.83670043945312, -46.650428771972656, -23.053768157958984], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000420.npy"}
{"epoch": 0.6349206349206349, "step": 421, "batch_size": 64, "mean": 47.83168029785156, "std": 94.42495727539062, "min": -168.8746795654297, "p10": -72.35409545898438, "median": 32.34092330932617, "p90": 164.81371307373047, "max": 237.69869995117188, "pos_frac": 0.71875, "sample": [-126.18281555175781, 157.2607421875, 3.3395538330078125, 159.36474609375, 142.72067260742188, 15.968017578125, 87.18748474121094, 8.562454223632812, -0.13648223876953125, 237.69869995117188, 202.2661895751953, -74.80560302734375, 165.99105834960938, 50.214439392089844, -45.9808235168457, 54.90079116821289, 17.736404418945312, -44.76661682128906, 173.91915893554688, 122.85116577148438, 108.54231262207031, 109.97926330566406, 2.1334705352783203, 3.8933944702148438, 76.52647399902344, -3.338825225830078, 117.54618072509766, -168.8746795654297, -144.14935302734375, -55.7469367980957, 10.21287727355957, 161.00210571289062, 131.34384155273438, 9.284337997436523, 63.35436248779297, 170.311279296875, 139.6374053955078, 127.87577056884766, 24.81713104248047, 2.6397972106933594, 157.27662658691406, -3.2801475524902344, 86.3046875, -145.29605102539062, -95.25117492675781, 105.95361328125, 2.9940948486328125, 54.08637237548828, -97.58499145507812, 26.993301391601562, 210.86239624023438, 170.87677001953125, -0.08621978759765625, 30.086410522460938, 23.92645263671875, -66.6339111328125, 34.595436096191406, 78.518798828125, 105.80799865722656, -59.234336853027344, 162.0665740966797, 112.14948272705078, -26.300613403320312, -2.7031593322753906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000421.npy"}
{"epoch": 0.636432350718065, "step": 422, "batch_size": 64, "mean": 63.94179916381836, "std": 77.49748229980469, "min": -94.07740020751953, "p10": -35.78712005615233, "median": 50.044267654418945, "p90": 165.17317352294924, "max": 205.85317993164062, "pos_frac": 0.765625, "sample": [169.60906982421875, 205.85317993164062, -61.951438903808594, -4.153415679931641, 166.264404296875, -13.236808776855469, 162.62696838378906, -47.63920593261719, 101.78828430175781, 190.01449584960938, 7.5050811767578125, 50.05314636230469, 8.291275024414062, 31.56048583984375, 157.15206909179688, 32.96340560913086, 159.3485870361328, 169.79627990722656, -84.50137329101562, 140.00381469726562, 74.68144226074219, -40.325286865234375, -4.818088531494141, -0.2609138488769531, 91.96424865722656, 154.8794708251953, 132.34852600097656, 127.18814849853516, 196.34893798828125, 96.87353515625, 99.91453552246094, 83.56245422363281, 37.65935134887695, -7.665620803833008, 30.815269470214844, 36.27199172973633, 105.92599487304688, 46.70819091796875, 162.41796875, 19.4727783203125, -21.28514862060547, -62.4063606262207, 146.58831787109375, 104.08358764648438, 5.0914306640625, -27.56513214111328, -4.6841278076171875, 109.68900299072266, 12.261322021484375, 51.763797760009766, 50.0353889465332, 200.78268432617188, 26.289155960083008, -39.310829162597656, 27.874248504638672, 90.40196228027344, 12.813299179077148, -94.07740020751953, 27.7006893157959, 75.52715301513672, 150.77549743652344, 110.82878112792969, 18.464614868164062, 135.32211303710938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000422.npy"}
{"epoch": 0.6379440665154951, "step": 423, "batch_size": 64, "mean": 63.5325927734375, "std": 80.53909301757812, "min": -110.43018341064453, "p10": -13.052191162109374, "median": 47.64168739318848, "p90": 173.19098663330078, "max": 227.8249053955078, "pos_frac": 0.75, "sample": [26.146102905273438, -3.96490478515625, 153.71311950683594, -71.38035583496094, 182.07701110839844, 3.629903793334961, 67.24404907226562, -42.74651336669922, 166.45018005371094, 47.342803955078125, -12.298294067382812, 154.8543243408203, 5.888214111328125, -53.81616973876953, -3.3391876220703125, 1.63623046875, 13.96759033203125, -6.2463836669921875, 173.42433166503906, -79.74978637695312, 87.24546813964844, 109.0408706665039, -13.375289916992188, -4.536264419555664, 74.15538024902344, 200.49205017089844, 133.15380859375, 5.360809326171875, -110.43018341064453, 98.74881744384766, 172.64651489257812, 147.012939453125, 99.30816650390625, 120.03468322753906, 86.74214935302734, 34.26353454589844, 21.19561767578125, 154.40953063964844, 187.9595489501953, 175.72840881347656, 11.161460876464844, 8.964645385742188, 4.528961181640625, 105.43597412109375, 140.03729248046875, 152.86810302734375, 166.02114868164062, 59.85499954223633, -5.403507232666016, 63.09453582763672, 227.8249053955078, 101.60173034667969, -3.4166011810302734, 15.848596572875977, 121.9707260131836, 180.79193115234375, -3.5770263671875, 5.361083984375, -10.788978576660156, 23.604629516601562, 167.1373291015625, 2.316070556640625, 47.94057083129883, -19.08149528503418], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000423.npy"}
{"epoch": 0.6394557823129252, "step": 424, "batch_size": 64, "mean": 61.598793029785156, "std": 83.6519546508789, "min": -134.18560791015625, "p10": -41.16196022033691, "median": 46.39569854736328, "p90": 174.67302856445315, "max": 238.0478515625, "pos_frac": 0.796875, "sample": [-7.545013427734375, 196.43060302734375, 29.233505249023438, 140.69287109375, -72.42379760742188, 149.36676025390625, 238.0478515625, 145.88735961914062, -97.96995544433594, -66.96510314941406, 15.482715606689453, 13.199335098266602, 62.394798278808594, -134.18560791015625, 132.32728576660156, 9.90847396850586, 156.12135314941406, -38.110389709472656, 55.875640869140625, 13.785285949707031, -24.56886863708496, 86.2509994506836, 146.44683837890625, 19.43169403076172, -42.46977615356445, 167.94659423828125, 196.37591552734375, 11.357711791992188, 182.2811279296875, 31.684053421020508, 82.56346893310547, 103.9977798461914, 8.531120300292969, 56.77002716064453, 27.394195556640625, 48.26393127441406, 148.04132080078125, 190.9744415283203, 136.74798583984375, 96.2861557006836, 107.19441986083984, 17.585044860839844, 14.89224624633789, 175.95578002929688, 13.450782775878906, 175.57907104492188, 108.38731384277344, -0.8357601165771484, -65.58023834228516, -1.1751956939697266, 164.7017822265625, 67.32470703125, 143.24078369140625, 172.55892944335938, 6.545310974121094, 15.889671325683594, 45.69831848144531, -72.56024169921875, 47.09307861328125, 123.81349182128906, 5.556007385253906, 20.41631317138672, -3.2807464599609375, 44.01141357421875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000424.npy"}
{"epoch": 0.6409674981103552, "step": 425, "batch_size": 64, "mean": 50.64934539794922, "std": 81.61843872070312, "min": -129.736328125, "p10": -50.7958324432373, "median": 37.95551681518555, "p90": 169.16596374511718, "max": 200.6346435546875, "pos_frac": 0.734375, "sample": [154.44400024414062, 113.78694915771484, 37.91474151611328, 9.741973876953125, 169.20889282226562, -17.191524505615234, 191.724609375, -9.682926177978516, 63.84801483154297, 88.66896057128906, 93.27947998046875, -73.57183837890625, 54.811763763427734, -16.916015625, -42.09214401245117, 106.9249267578125, 31.84001922607422, 174.4861602783203, 11.23321533203125, 59.459720611572266, -121.71781921386719, 60.00770568847656, 1.0743293762207031, 64.76948547363281, 126.08056640625, 8.544380187988281, 0.4031352996826172, 187.05178833007812, 11.238197326660156, 172.48272705078125, 160.9440460205078, -54.266727447509766, 103.13677978515625, -1.95697021484375, 40.97306823730469, 44.079254150390625, 37.99629211425781, 169.0657958984375, 19.16202163696289, 29.974834442138672, 174.66012573242188, -98.67974090576172, 53.82997131347656, 102.3795166015625, 27.632888793945312, -129.736328125, 26.551071166992188, 66.170654296875, 51.533905029296875, -9.601303100585938, 166.67994689941406, -13.510967254638672, -67.23606872558594, 165.2249755859375, 15.781761169433594, -0.3199653625488281, 148.28677368164062, 37.06999206542969, -6.877777099609375, 200.6346435546875, -45.093563079833984, 16.819664001464844, 151.63546752929688, -53.239662170410156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000425.npy"}
{"epoch": 0.6424792139077853, "step": 426, "batch_size": 64, "mean": 30.56137466430664, "std": 90.51251983642578, "min": -213.42189025878906, "p10": -72.2149040222168, "median": 21.106475830078125, "p90": 151.1691757202149, "max": 208.3450927734375, "pos_frac": 0.625, "sample": [-1.6804924011230469, -3.269603729248047, 127.7297134399414, 155.78074645996094, 27.830909729003906, 135.09107971191406, 17.068466186523438, 169.21295166015625, 13.435203552246094, 57.15605163574219, -13.780509948730469, -12.191238403320312, 0.5170516967773438, -141.08502197265625, 61.18067932128906, 25.144485473632812, 93.38935852050781, 29.52740478515625, 45.426963806152344, -33.011436462402344, -167.66224670410156, -16.871734619140625, 32.480224609375, 66.36064147949219, -9.827476501464844, 5.2565460205078125, 41.44325256347656, -196.72325134277344, 111.29522705078125, -4.576637268066406, 162.05465698242188, 106.40533447265625, 140.40884399414062, -27.83515739440918, -213.42189025878906, 67.91763305664062, -6.5221099853515625, 0.7316570281982422, -0.8885345458984375, 208.3450927734375, 161.124267578125, 195.4855499267578, -110.2220458984375, -118.03631591796875, 3.3676605224609375, 186.5968017578125, -7.5361785888671875, 1.4964828491210938, -4.697086334228516, -69.96290588378906, 84.5624008178711, 109.81969451904297, -2.6497039794921875, 87.52110290527344, 38.50438690185547, -73.18004608154297, 11.265613555908203, 119.23230743408203, 33.65159606933594, -49.61993408203125, 101.884765625, 73.37895965576172, 136.9108428955078, -4.812953948974609], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000426.npy"}
{"epoch": 0.6439909297052154, "step": 427, "batch_size": 64, "mean": 48.38745880126953, "std": 88.40745544433594, "min": -147.81385803222656, "p10": -69.90603485107421, "median": 34.38432312011719, "p90": 168.42974243164065, "max": 263.06304931640625, "pos_frac": 0.765625, "sample": [77.3809814453125, 17.96681022644043, -9.366870880126953, -85.52796173095703, -47.67765808105469, 146.04798889160156, 171.5369873046875, 28.180042266845703, -3.3359222412109375, -78.84760284423828, 144.75494384765625, -25.960559844970703, -74.66740417480469, -101.90643310546875, 39.656166076660156, 142.5965576171875, 92.42359161376953, 73.74092102050781, 10.594459533691406, -18.922561645507812, 0.33859825134277344, 70.17965698242188, 140.4031982421875, 74.75299072265625, 27.182113647460938, 157.73361206054688, 129.773681640625, 139.85000610351562, 9.296554565429688, 32.92198944091797, 178.4801025390625, 12.594512939453125, 63.23944091796875, 35.846656799316406, 17.11125946044922, 163.62335205078125, 19.391639709472656, -147.81385803222656, 159.51791381835938, 173.99349975585938, 4.398887634277344, 21.930137634277344, 36.09295654296875, 13.026451110839844, -136.31021118164062, -53.141136169433594, 140.7757568359375, 47.2603759765625, 18.872406005859375, -110.45210266113281, 170.4896240234375, 202.54745483398438, 1.4102020263671875, 68.11690521240234, 79.51697540283203, 20.327850341796875, 84.27877807617188, 59.58808517456055, 5.5482940673828125, -17.786224365234375, 91.47431182861328, -58.796173095703125, 263.06304931640625, 187.48141479492188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000427.npy"}
{"epoch": 0.6455026455026455, "step": 428, "batch_size": 64, "mean": 57.95905303955078, "std": 87.14344787597656, "min": -160.22911071777344, "p10": -33.8057918548584, "median": 48.52906608581543, "p90": 177.69954376220704, "max": 278.8356018066406, "pos_frac": 0.703125, "sample": [-86.20326232910156, 91.54232025146484, 179.88916015625, -1.1196746826171875, 2.4251556396484375, 4.1704559326171875, 23.07931900024414, -117.92779541015625, -49.8980712890625, 38.20885467529297, 182.46673583984375, -33.58820343017578, 130.99981689453125, -16.78894805908203, 158.05099487304688, 46.815765380859375, 162.89968872070312, 20.031227111816406, -4.144081115722656, 47.56016540527344, 176.3999786376953, -0.6826515197753906, 1.1463394165039062, 2.1795501708984375, 184.3032684326172, 93.79270935058594, 64.24666595458984, -160.22911071777344, 57.43074035644531, -75.37869262695312, 73.79119873046875, -1.0818710327148438, 16.202747344970703, 49.49796676635742, 63.05866241455078, 180.09359741210938, 157.43115234375, 136.01353454589844, 61.38684844970703, -36.474876403808594, 184.51377868652344, 278.8356018066406, 95.21446990966797, 151.12722778320312, -29.723281860351562, -2.721893310546875, 88.84394836425781, 55.37500762939453, -0.7796802520751953, -16.502288818359375, 14.572639465332031, 24.236228942871094, -0.9653778076171875, 1.7046318054199219, 165.47750854492188, 172.8762664794922, 178.25650024414062, 72.67648315429688, 136.584716796875, 151.66726684570312, 91.01678466796875, 133.81259155273438, -24.41797637939453, -33.899044036865234], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000428.npy"}
{"epoch": 0.6470143613000756, "step": 429, "batch_size": 64, "mean": 44.62921142578125, "std": 84.17464447021484, "min": -132.99957275390625, "p10": -64.65152015686033, "median": 31.949337005615234, "p90": 164.1390289306641, "max": 185.80679321289062, "pos_frac": 0.75, "sample": [84.35255432128906, -122.608642578125, 2.171489715576172, -2.7466964721679688, 152.45486450195312, 113.87760162353516, 51.11404037475586, 20.97399139404297, -12.538997650146484, 11.865142822265625, 31.234649658203125, -116.72573852539062, 67.96527099609375, 62.65830993652344, -1.725738525390625, 137.17189025878906, -132.99957275390625, 21.589447021484375, -21.978240966796875, 107.9590835571289, 119.91127014160156, -37.614986419677734, 126.67140197753906, 158.79971313476562, 166.42730712890625, 167.64254760742188, 178.50660705566406, 4.78680419921875, 32.664024353027344, -37.894290924072266, -76.11890411376953, -34.083805084228516, 8.753181457519531, -15.077537536621094, 93.66422271728516, 2.2750244140625, -83.16815185546875, 78.5597152709961, 3.8252944946289062, 3.00323486328125, 166.64218139648438, 147.91604614257812, 36.744529724121094, 151.5830078125, 144.9932403564453, -130.5382080078125, -120.65568542480469, 15.524715423583984, 16.751739501953125, 129.47430419921875, 22.336585998535156, 34.76439666748047, 10.679023742675781, -18.399578094482422, 52.65963363647461, 55.90596008300781, 185.80679321289062, 66.8581771850586, 177.81582641601562, 166.97457885742188, 4.231781005859375, 115.66004943847656, 31.001205444335938, 75.94194793701172], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000429.npy"}
{"epoch": 0.6485260770975056, "step": 430, "batch_size": 64, "mean": 71.38347625732422, "std": 86.31891632080078, "min": -204.55105590820312, "p10": -19.00901260375975, "median": 73.46363067626953, "p90": 176.39867553710937, "max": 193.15719604492188, "pos_frac": 0.796875, "sample": [81.7264404296875, 33.17878723144531, -25.962005615234375, 24.47565460205078, -2.3692245483398438, 165.4772491455078, -0.07126998901367188, 165.26844787597656, 38.178916931152344, 100.6665267944336, -93.3083724975586, 143.57379150390625, 182.52081298828125, 11.50132942199707, 169.2533416748047, 100.76203155517578, 86.21637725830078, 31.306045532226562, 184.17530822753906, -30.109350204467773, 49.88615417480469, 192.73980712890625, 152.29727172851562, 110.50967407226562, -0.6581344604492188, 173.8637237548828, 18.518875122070312, 175.20742797851562, 112.549560546875, 171.00242614746094, 93.92439270019531, 4.927825927734375, 69.93153381347656, 158.87704467773438, -55.996585845947266, 176.90921020507812, 193.15719604492188, 131.26950073242188, -106.85990905761719, 142.93609619140625, 9.397537231445312, 82.0455093383789, 33.24363327026367, 65.7979507446289, 16.253982543945312, 148.56378173828125, 13.187660217285156, 1.884023666381836, -204.55105590820312, -2.7853622436523438, -0.89080810546875, 164.35284423828125, -1.3967971801757812, 162.46441650390625, 29.35198974609375, 182.5174560546875, 184.6766815185547, 120.50703430175781, 148.62298583984375, 4.3236236572265625, 76.9957275390625, -48.43837356567383, 13.353363037109375, 37.610809326171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000430.npy"}
{"epoch": 0.6500377928949358, "step": 431, "batch_size": 64, "mean": 49.56535339355469, "std": 81.35397338867188, "min": -160.84552001953125, "p10": -26.477571868896483, "median": 31.975048065185547, "p90": 163.88995819091798, "max": 200.45797729492188, "pos_frac": 0.75, "sample": [-160.84552001953125, 127.14080810546875, 74.37472534179688, 38.059959411621094, 11.21609878540039, 117.85205841064453, 84.34442138671875, 133.64425659179688, 145.13343811035156, -1.3299198150634766, 5.069843292236328, 174.093505859375, 147.66754150390625, -14.823287963867188, 44.62689971923828, 17.81032371520996, 200.45797729492188, -1.765289306640625, 182.86209106445312, 25.89013671875, 60.63642120361328, -155.31068420410156, 22.298851013183594, 90.97686767578125, -41.64546203613281, 168.89198303222656, 74.69892883300781, -86.3671875, 54.9888916015625, -8.027830123901367, 54.794158935546875, -5.727710723876953, 141.29852294921875, 172.56280517578125, -82.84107971191406, 1.9137344360351562, 161.44522094726562, 86.41249084472656, -20.895782470703125, -54.25079345703125, -18.493804931640625, -0.69659423828125, 3.7071762084960938, 133.66575622558594, 2.9356307983398438, 18.66869354248047, 156.22560119628906, 39.98992919921875, 3.3391189575195312, 195.43482971191406, 12.946380615234375, 45.392765045166016, 2.501983642578125, 22.432559967041016, 104.2741928100586, 6.826568603515625, 45.8948974609375, -23.311790466308594, 17.25375747680664, 164.35824584960938, 162.7972869873047, 95.57958221435547, 20.961639404296875, -27.834335327148438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000431.npy"}
{"epoch": 0.6515495086923658, "step": 432, "batch_size": 64, "mean": 53.36254119873047, "std": 84.93352508544922, "min": -189.82997131347656, "p10": -38.61286239624023, "median": 46.13279724121094, "p90": 169.21895751953124, "max": 190.65658569335938, "pos_frac": 0.703125, "sample": [-9.107101440429688, -33.43797302246094, -50.31706237792969, -189.82997131347656, 3.8060665130615234, -0.5761394500732422, 8.065658569335938, 72.94961547851562, -40.292198181152344, 167.79229736328125, 102.465576171875, -85.68184661865234, 32.660247802734375, 67.64569091796875, 169.83038330078125, -131.189453125, 61.07543182373047, 10.26177978515625, 187.36265563964844, 109.72769165039062, 3.3101730346679688, 89.30396270751953, 140.67672729492188, -6.9240875244140625, -6.218486785888672, 177.2890625, 4.793739318847656, 167.4193115234375, 146.92637634277344, -29.637006759643555, -55.313568115234375, 37.894771575927734, 8.110506057739258, -34.69441223144531, 45.58856201171875, 66.21353912353516, 175.56492614746094, 132.957275390625, 46.677032470703125, 141.46856689453125, -26.89493179321289, 47.65021896362305, 126.34645080566406, 15.8271484375, 133.166748046875, -7.9973602294921875, 45.28070831298828, 172.84832763671875, 13.74587631225586, 178.14462280273438, 133.6517333984375, 105.64533996582031, 118.35132598876953, -25.144882202148438, 58.860809326171875, 190.65658569335938, -9.093032836914062, 142.1764373779297, 48.680145263671875, 8.672260284423828, -59.31365966796875, 160.57296752929688, -0.34130859375, 139.0918426513672], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000432.npy"}
{"epoch": 0.6530612244897959, "step": 433, "batch_size": 64, "mean": 67.21623229980469, "std": 83.06703186035156, "min": -170.19290161132812, "p10": -39.270268630981434, "median": 57.01664352416992, "p90": 171.06000061035155, "max": 190.7563018798828, "pos_frac": 0.796875, "sample": [53.81430435180664, 134.11981201171875, 14.200756072998047, 57.20046615600586, -5.698631286621094, 34.395599365234375, 0.35395240783691406, 119.60940551757812, -22.922077178955078, 6.7415771484375, 11.627761840820312, 28.346031188964844, 118.47705078125, 37.39672088623047, 171.44403076171875, -100.94731903076172, 88.65222930908203, 158.1258087158203, 142.59451293945312, -43.01509475708008, 59.46888732910156, 36.080322265625, 4.377189636230469, 171.6260528564453, -170.19290161132812, 62.58651351928711, 157.25204467773438, 29.028976440429688, 158.61788940429688, 143.68692016601562, 38.771278381347656, 90.36370086669922, -44.78339385986328, 1.6667556762695312, 56.832820892333984, -10.724197387695312, -45.28367233276367, 63.74574279785156, 66.86090087890625, 173.84254455566406, 190.7563018798828, 171.14654541015625, 183.45419311523438, 50.33064270019531, 139.24977111816406, -90.03763580322266, 48.962310791015625, 119.78878784179688, -20.049781799316406, 169.75942993164062, -25.161155700683594, 116.23572540283203, 161.9244384765625, -30.53234100341797, 48.88410186767578, 165.32098388671875, 118.17566680908203, 56.068939208984375, 170.85806274414062, 182.20025634765625, 165.78109741210938, 38.05259704589844, 167.57618713378906, -45.247528076171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000433.npy"}
{"epoch": 0.654572940287226, "step": 434, "batch_size": 64, "mean": 70.45217895507812, "std": 95.08000183105469, "min": -181.8292694091797, "p10": -55.9563766479492, "median": 77.00128173828125, "p90": 179.45250549316407, "max": 265.12249755859375, "pos_frac": 0.796875, "sample": [-169.66452026367188, 72.7496337890625, 123.52764892578125, 27.90826416015625, 153.25782775878906, 181.59767150878906, -64.00996398925781, 178.70822143554688, -66.7159423828125, 71.37979125976562, -0.3644847869873047, -37.1646728515625, -181.8292694091797, 73.15011596679688, 150.13450622558594, -66.557861328125, 123.93302917480469, 71.20541381835938, 265.12249755859375, 177.48109436035156, 94.97469329833984, 2.4291629791259766, 200.4861602783203, 6.133262634277344, -121.60657501220703, 29.513687133789062, 174.23257446289062, 107.84397888183594, -1.9834098815917969, 122.09880065917969, 8.52456283569336, 99.18757629394531, 105.75938415527344, 171.01234436035156, 186.1493377685547, 169.1720733642578, -10.818649291992188, 95.96131134033203, 189.24978637695312, 8.797828674316406, 179.771484375, 7.3858184814453125, 28.88302993774414, 141.73190307617188, 17.369979858398438, 28.058937072753906, 27.465593338012695, 104.08361053466797, 80.85244750976562, 155.2665252685547, 6.097827911376953, 183.05950927734375, -123.48091125488281, -12.332992553710938, 11.984565734863281, 108.39615631103516, 143.9746551513672, 167.10484313964844, 140.17747497558594, -1.6576042175292969, 49.73641586303711, 118.65937042236328, 170.69949340820312, 54.684120178222656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000434.npy"}
{"epoch": 0.656084656084656, "step": 435, "batch_size": 64, "mean": 56.22063446044922, "std": 86.45177459716797, "min": -150.46456909179688, "p10": -43.306819915771484, "median": 45.520559310913086, "p90": 172.16770935058594, "max": 191.38665771484375, "pos_frac": 0.703125, "sample": [102.0406265258789, 85.97895812988281, -7.245796203613281, 71.84835815429688, -103.62767791748047, 30.580284118652344, 12.825576782226562, 0.00147247314453125, 143.2080078125, -21.932968139648438, 165.57481384277344, 165.38735961914062, -11.653167724609375, 100.58307647705078, 165.014404296875, 180.8945770263672, 31.163536071777344, -40.88868713378906, -13.204193115234375, 162.23040771484375, 51.152748107910156, -5.001335144042969, 152.37835693359375, 182.15640258789062, 16.720504760742188, 44.050010681152344, -44.343162536621094, 46.332855224609375, 172.84423828125, 180.0556640625, -8.786834716796875, 66.41522216796875, -150.46456909179688, 133.5166778564453, 44.7082633972168, 86.24583435058594, -7.966392517089844, -23.5460205078125, 131.49041748046875, 170.58914184570312, 191.38665771484375, 186.51065063476562, 174.3462677001953, 64.12297821044922, 0.34253692626953125, 54.021018981933594, 42.92863845825195, -34.29364776611328, -34.348690032958984, 162.36492919921875, -85.75959014892578, 18.515213012695312, -59.11677551269531, 103.29618072509766, 92.71322631835938, 89.40864562988281, 158.22300720214844, -69.96528625488281, -93.89252471923828, 41.58067321777344, 1.763702392578125, -18.995391845703125, 151.26016235351562, 4.38104248046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000435.npy"}
{"epoch": 0.6575963718820862, "step": 436, "batch_size": 64, "mean": 45.97420120239258, "std": 90.24996948242188, "min": -166.2494659423828, "p10": -73.06809997558594, "median": 32.40401840209961, "p90": 163.9751205444336, "max": 196.75445556640625, "pos_frac": 0.703125, "sample": [112.4766845703125, -29.632823944091797, 163.40277099609375, 2.1887073516845703, 93.68031311035156, 5.448274612426758, 77.7506103515625, 156.20884704589844, -94.93994903564453, 31.906639099121094, 32.901397705078125, 164.2204132080078, 7.840980529785156, 116.58511352539062, -40.04734420776367, 175.65682983398438, 134.28427124023438, 109.62040710449219, -164.20094299316406, 74.96390533447266, 14.546550750732422, 69.0398178100586, -21.836326599121094, 12.184410095214844, 45.689002990722656, -11.376350402832031, 149.64822387695312, -1.4202938079833984, 130.45278930664062, 23.86321258544922, 71.98432922363281, -12.2445068359375, -166.2494659423828, 98.47769165039062, -21.34134864807129, 48.15521240234375, 157.88446044921875, -75.2380599975586, -11.963104248046875, -75.58123779296875, 148.12796020507812, 117.97830963134766, 24.686267852783203, 12.354141235351562, -101.43746948242188, 120.71189880371094, -66.9690170288086, 195.57647705078125, 85.80380249023438, 196.75445556640625, 140.87991333007812, 1.1140518188476562, 181.90438842773438, 104.39714050292969, -20.13770294189453, 10.745220184326172, 171.9533233642578, -68.0048599243164, 11.680152893066406, 0.5095615386962891, -46.640350341796875, 183.0457000732422, -97.53776550292969, 79.8631820678711], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000436.npy"}
{"epoch": 0.6591080876795162, "step": 437, "batch_size": 64, "mean": 50.77096176147461, "std": 97.22096252441406, "min": -225.46420288085938, "p10": -70.2551383972168, "median": 55.912832260131836, "p90": 173.33938140869142, "max": 224.722900390625, "pos_frac": 0.65625, "sample": [224.722900390625, 194.87322998046875, 164.89166259765625, 96.84664916992188, 56.66460418701172, -23.050678253173828, 146.68362426757812, 71.85279846191406, 196.13201904296875, 55.16106033325195, 173.7034454345703, -7.575286865234375, -225.46420288085938, 58.68949890136719, -98.75086975097656, -79.56884002685547, 25.71246337890625, 108.55943298339844, 152.76272583007812, -139.97738647460938, 185.8074188232422, 195.48016357421875, 171.218017578125, -25.004039764404297, -31.036670684814453, 79.76899719238281, 170.3612060546875, -16.838428497314453, -146.05892944335938, -73.01358795166016, -14.440666198730469, -59.86122131347656, 48.257896423339844, -14.548789978027344, -5.241416931152344, 172.48989868164062, 77.52687072753906, -22.572479248046875, 72.5488052368164, 88.99888610839844, 41.85115051269531, -6.234283447265625, -30.81557846069336, 172.1850128173828, 39.6065673828125, 162.91644287109375, 57.935523986816406, 74.85572814941406, 81.04856872558594, -120.9875259399414, 74.12348175048828, 53.59926986694336, 153.2750244140625, 136.41600036621094, 0.4049072265625, 15.825233459472656, 40.227317810058594, 84.89372253417969, -63.818756103515625, -0.3113746643066406, -0.03747749328613281, 73.04727172851562, 14.676483154296875, 187.94808959960938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000437.npy"}
{"epoch": 0.6606198034769464, "step": 438, "batch_size": 64, "mean": 71.32022094726562, "std": 86.7859878540039, "min": -99.76544189453125, "p10": -18.987296295166015, "median": 52.36378479003906, "p90": 171.4025588989258, "max": 334.1785888671875, "pos_frac": 0.734375, "sample": [192.2884063720703, -3.8213653564453125, 334.1785888671875, -46.52256774902344, -0.8630638122558594, -49.09241485595703, -31.678455352783203, 157.34402465820312, 2.0857105255126953, 180.6266632080078, -17.37667465209961, 190.63894653320312, -11.473030090332031, 9.203514099121094, 172.35421752929688, 0.7319793701171875, 153.9478302001953, -99.76544189453125, 148.55856323242188, 99.2620620727539, 137.05181884765625, 142.649658203125, -7.52447509765625, 25.56451416015625, 64.33357238769531, 164.0438995361328, 45.444580078125, 29.024730682373047, 169.18202209472656, 128.27784729003906, 88.28616333007812, -22.25762939453125, 142.9130859375, 10.842878341674805, 9.402503967285156, 163.83908081054688, 154.7350311279297, -7.44447135925293, 88.87960815429688, 47.14903259277344, 150.04208374023438, -6.4038543701171875, 63.187042236328125, 223.560302734375, 150.70071411132812, 14.898223876953125, 118.42896270751953, 33.97981262207031, 15.70416259765625, -1.6400032043457031, -19.677562713623047, -7.726327896118164, 20.90422248840332, 159.67123413085938, 134.07516479492188, -89.38335418701172, 166.14508056640625, 36.027748107910156, -2.8418502807617188, 180.71604919433594, 6.785648345947266, 57.57853698730469, 133.350830078125, 71.39044952392578], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000438.npy"}
{"epoch": 0.6621315192743764, "step": 439, "batch_size": 64, "mean": 58.80029296875, "std": 83.2633285522461, "min": -201.0333251953125, "p10": -16.91549377441406, "median": 34.87534713745117, "p90": 181.8637466430664, "max": 210.1024627685547, "pos_frac": 0.734375, "sample": [135.90203857421875, 2.1153526306152344, 26.951597213745117, -23.18842315673828, 27.625240325927734, 160.7420654296875, 101.33943176269531, 63.068145751953125, -18.01178741455078, 197.99668884277344, 198.70396423339844, -3.098459243774414, 176.56692504882812, 10.209976196289062, 29.531784057617188, 194.8514404296875, 150.7347412109375, 11.337684631347656, 172.50540161132812, -18.594985961914062, 6.289531707763672, 123.6093521118164, 54.72400665283203, -2.5516357421875, 174.2541046142578, 31.647483825683594, 179.63357543945312, -100.56886291503906, -0.2650585174560547, 182.8195343017578, -21.606489181518555, 183.7173614501953, -34.585365295410156, -13.964019775390625, 61.88243865966797, 0.6967315673828125, -1.6385345458984375, 10.854156494140625, -201.0333251953125, 31.301929473876953, 18.04949951171875, 26.692062377929688, 171.84910583496094, 38.10321044921875, 58.60714340209961, 173.4006805419922, 61.50396728515625, 194.12796020507812, 42.22186279296875, 40.358001708984375, 15.554920196533203, 97.05735778808594, -6.423866271972656, 121.00237274169922, 47.9674072265625, 210.1024627685547, 74.06423950195312, -8.40631103515625, -14.357475280761719, -2.2736663818359375, 16.549762725830078, 65.78361511230469, 65.21046447753906, -6.031959533691406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000439.npy"}
{"epoch": 0.6636432350718064, "step": 440, "batch_size": 64, "mean": 63.36556625366211, "std": 90.6329116821289, "min": -148.3852996826172, "p10": -43.46616554260253, "median": 48.345359802246094, "p90": 178.42002563476564, "max": 227.17947387695312, "pos_frac": 0.765625, "sample": [-21.80077362060547, 38.21830749511719, 53.28880310058594, 204.92929077148438, 177.91775512695312, 227.17947387695312, 179.08509826660156, -20.15013885498047, 175.86032104492188, 39.24969482421875, 16.603233337402344, 167.55587768554688, 59.206871032714844, 163.40147399902344, 123.94692993164062, 39.497344970703125, 172.61965942382812, -29.62812042236328, -129.74862670898438, 42.19743347167969, 95.35442352294922, -10.251550674438477, 6.0605926513671875, 140.73443603515625, 54.018836975097656, 132.22879028320312, 98.41584777832031, 168.0576171875, 10.915388107299805, 20.710784912109375, 3.857076644897461, 51.623985290527344, 145.77667236328125, 0.8622856140136719, 0.0727996826171875, 56.73724365234375, 36.583526611328125, -148.3852996826172, 19.177072525024414, 180.80380249023438, 123.7926025390625, 179.28944396972656, -2.0565948486328125, 172.05859375, -45.710208892822266, 37.81208038330078, 180.6844940185547, 45.066734313964844, 159.27655029296875, -34.66210174560547, -56.64737319946289, -70.12728881835938, 72.16627502441406, -38.230064392089844, 83.95288848876953, 30.39651870727539, -12.077762603759766, -46.426658630371094, 163.59661865234375, -140.6666259765625, 39.75608825683594, 178.63528442382812, 175.086669921875, 117.64579772949219], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000440.npy"}
{"epoch": 0.6651549508692366, "step": 441, "batch_size": 64, "mean": 66.45780181884766, "std": 73.48334503173828, "min": -70.4953384399414, "p10": -17.418550872802733, "median": 55.98210525512695, "p90": 170.1917984008789, "max": 189.72329711914062, "pos_frac": 0.8125, "sample": [152.07009887695312, 84.81343078613281, 177.43983459472656, 180.8746337890625, -3.541189193725586, 142.78700256347656, 161.09173583984375, -13.768211364746094, 73.08114624023438, -9.759979248046875, -39.00221633911133, 145.95005798339844, -14.336402893066406, 171.8931884765625, 28.078807830810547, 26.318206787109375, 65.21562957763672, 9.0321044921875, -43.33895492553711, 168.2382354736328, 43.59376525878906, 61.147621154785156, 95.58966064453125, 170.0554962158203, 0.7498397827148438, 172.9326629638672, -0.5634346008300781, 10.416053771972656, 189.72329711914062, 52.890533447265625, 52.061256408691406, 51.8673095703125, -39.947731018066406, 107.64859008789062, 119.2821044921875, 7.883201599121094, -64.81398010253906, 168.6149444580078, 63.68201446533203, 63.9442138671875, 26.05499267578125, 170.25021362304688, 29.22370147705078, 14.203441619873047, 98.04090881347656, 41.3270263671875, 92.40809631347656, 2.0147247314453125, 108.59292602539062, -18.739471435546875, 163.42474365234375, 26.5200138092041, 167.1048583984375, 184.95098876953125, -56.435760498046875, 37.408164978027344, 67.20279693603516, 7.149238586425781, 12.909339904785156, 103.56818389892578, -70.4953384399414, 59.07367706298828, 156.80551147460938, 40.841888427734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000441.npy"}
{"epoch": 0.6666666666666666, "step": 442, "batch_size": 64, "mean": 63.97719955444336, "std": 82.06916046142578, "min": -162.88946533203125, "p10": -32.1145149230957, "median": 59.81579780578613, "p90": 168.1040496826172, "max": 221.47146606445312, "pos_frac": 0.796875, "sample": [27.42150115966797, 165.79531860351562, 43.34400939941406, -40.5145263671875, 51.84669494628906, 221.47146606445312, 162.15725708007812, 174.2498016357422, 69.16351318359375, 161.8296661376953, 60.93143081665039, 151.7664794921875, 153.28079223632812, -69.45089721679688, 126.24541473388672, 102.0319595336914, 0.0427703857421875, 23.176834106445312, 23.9254150390625, -34.646385192871094, -84.99613952636719, 154.285888671875, 104.69734191894531, -5.922760009765625, 148.14456176757812, 4.731773376464844, -43.94731903076172, 98.4556884765625, 16.732755661010742, 154.12342834472656, -0.8458251953125, 72.06856536865234, 163.46466064453125, 161.27597045898438, 198.03085327148438, 97.99217224121094, 7.465263366699219, 1.8262062072753906, 175.63743591308594, 58.700164794921875, 83.96688079833984, 7.803077697753906, 2.117246627807617, 21.202800750732422, 1.1935062408447266, -162.88946533203125, -11.694259643554688, 170.07464599609375, -26.206817626953125, -19.933731079101562, 173.7656707763672, 125.9214096069336, -48.89555358886719, 68.19007110595703, 73.2835693359375, -8.994405746459961, 96.70648193359375, 15.872222900390625, 89.02252197265625, 163.34930419921875, 40.97822570800781, 169.093505859375, 2.9177474975585938, 11.706930160522461], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000442.npy"}
{"epoch": 0.6681783824640968, "step": 443, "batch_size": 64, "mean": 80.46562194824219, "std": 92.83672332763672, "min": -169.79776000976562, "p10": -12.440770530700682, "median": 78.54653549194336, "p90": 176.52192077636718, "max": 280.2242126464844, "pos_frac": 0.796875, "sample": [8.750385284423828, 158.88978576660156, 61.62236022949219, 171.847900390625, 28.20520782470703, 35.5235595703125, 153.37538146972656, -78.54501342773438, 36.9962158203125, 89.87199401855469, -27.97997283935547, -12.730684280395508, 61.1933708190918, 168.2349090576172, 70.08903503417969, 201.56927490234375, 14.407419204711914, 130.48477172851562, 100.66144561767578, 165.79226684570312, 280.2242126464844, -169.79776000976562, -11.737815856933594, 162.6031494140625, 2.945646286010742, 209.79336547851562, 149.06060791015625, 79.84172821044922, 164.78494262695312, 168.11752319335938, 153.8048095703125, 123.83763122558594, 144.1431427001953, 164.69717407226562, 50.803680419921875, 177.07177734375, 165.18927001953125, 175.23892211914062, 127.47308349609375, 181.04641723632812, 6.416751861572266, 152.4320526123047, -9.355997085571289, 91.98273468017578, 172.87635803222656, 166.2340545654297, -11.764305114746094, 12.207489013671875, 5.80181884765625, 6.401304244995117, 5.481784820556641, -70.61540985107422, 77.2513427734375, -0.8441009521484375, 18.246761322021484, 47.45427322387695, -156.7689971923828, -16.36614990234375, 205.29383850097656, 49.133018493652344, -0.028865814208984375, -7.213768005371094, 149.2792510986328, 218.86293029785156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000443.npy"}
{"epoch": 0.6696900982615268, "step": 444, "batch_size": 64, "mean": 29.7401180267334, "std": 90.70862579345703, "min": -172.32965087890625, "p10": -72.7663803100586, "median": 13.779987335205078, "p90": 163.7701370239258, "max": 223.04647827148438, "pos_frac": 0.640625, "sample": [127.62833404541016, -1.9670829772949219, -47.254676818847656, 2.8107852935791016, 43.122711181640625, 31.03616714477539, 84.62654876708984, -26.782943725585938, 3.8755455017089844, 128.3122100830078, 6.648124694824219, 77.25952911376953, -133.8643798828125, -162.66119384765625, 34.21492004394531, 14.2509765625, 83.03663635253906, 29.165918350219727, 58.86037063598633, -62.039215087890625, 121.99295043945312, 195.84933471679688, -0.7976150512695312, -42.82395935058594, -8.213485717773438, 20.04560089111328, 6.1628265380859375, -8.530981063842773, -71.35000610351562, -68.37834167480469, 159.26953125, 7.287448883056641, 165.53326416015625, 40.561920166015625, 11.53302001953125, 159.15826416015625, 31.587234497070312, 168.9065704345703, -10.896713256835938, -172.32965087890625, 20.10198974609375, 223.04647827148438, -124.36614990234375, -43.930870056152344, -4.748014450073242, 170.39263916015625, 141.82284545898438, 117.1240234375, 177.2868194580078, 67.45025634765625, 17.6929874420166, 99.21955871582031, 3.527873992919922, 13.308998107910156, 167.21258544921875, -14.899370193481445, -73.37339782714844, 26.59917640686035, 7.611846923828125, -90.67797088623047, -145.39544677734375, 159.6561737060547, -3.043062210083008, -3.0988941192626953], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000444.npy"}
{"epoch": 0.671201814058957, "step": 445, "batch_size": 64, "mean": 69.72836303710938, "std": 86.56059265136719, "min": -112.49141693115234, "p10": -39.15094299316406, "median": 76.53562545776367, "p90": 177.07732543945312, "max": 210.42431640625, "pos_frac": 0.765625, "sample": [46.52633285522461, 210.42431640625, 10.893272399902344, -22.849319458007812, -20.67755126953125, 19.775360107421875, 171.21978759765625, 151.34202575683594, 47.338958740234375, 79.26307678222656, -0.764190673828125, -106.7069091796875, -33.05381774902344, 10.072677612304688, -80.84246826171875, 162.5133514404297, -52.44636535644531, 7.269554138183594, 6.410919189453125, 66.41960906982422, -0.0258026123046875, 176.90484619140625, -112.49141693115234, 43.284698486328125, -58.69994354248047, 178.083251953125, 6.085182189941406, 169.74169921875, 34.51526641845703, -14.475494384765625, -54.43192672729492, -7.221136093139648, 93.39464569091797, 188.67584228515625, 177.1512451171875, 103.01849365234375, 159.4097900390625, 172.95559692382812, 79.27132415771484, -40.91631317138672, 181.78643798828125, 73.08260345458984, 82.72476196289062, 194.30165100097656, 173.23536682128906, 169.5614471435547, 187.40023803710938, 10.786355972290039, 138.4638214111328, -35.03174591064453, 109.50660705566406, 2.2453384399414062, 5.79669189453125, 85.1162338256836, 73.80817413330078, 115.73483276367188, 165.92532348632812, 166.53164672851562, 79.86714935302734, 84.5625228881836, 142.27337646484375, 98.0055160522461, 176.31692504882812, 14.255332946777344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000445.npy"}
{"epoch": 0.672713529856387, "step": 446, "batch_size": 64, "mean": 71.51223754882812, "std": 79.83660888671875, "min": -101.570556640625, "p10": -17.03611431121826, "median": 57.16141319274902, "p90": 180.40939331054688, "max": 233.16278076171875, "pos_frac": 0.78125, "sample": [5.650030136108398, -4.117561340332031, -101.570556640625, 89.19229125976562, 28.610313415527344, 60.53169250488281, 57.69553756713867, -18.912330627441406, 233.16278076171875, 56.18988037109375, 18.811588287353516, 136.35926818847656, 153.611328125, 160.45904541015625, 121.00507354736328, 180.72781372070312, 16.726905822753906, 89.13252258300781, 70.9113540649414, 36.685386657714844, 3.473499298095703, 117.17418670654297, -0.0015869140625, 161.42218017578125, 179.66641235351562, 71.22134399414062, 3.131275177001953, 162.718017578125, 182.03526306152344, -44.674095153808594, 161.05233764648438, 0.8488292694091797, -70.95215606689453, -2.1705551147460938, -16.615638732910156, -1.653848648071289, 19.79560089111328, -26.96261215209961, 63.566993713378906, 146.86578369140625, 31.1201171875, 188.4679412841797, 186.02444458007812, 166.58229064941406, 213.27877807617188, -1.1634578704833984, 196.12615966796875, 166.86460876464844, -17.216318130493164, 54.71632766723633, 67.22899627685547, 56.627288818359375, 49.650665283203125, 102.81355285644531, 173.3540802001953, 176.44151306152344, 5.388263702392578, 5.885528564453125, 2.3752517700195312, 98.97682189941406, -1.5008087158203125, 51.07438278198242, -21.4273681640625, 124.29045867919922], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000446.npy"}
{"epoch": 0.674225245653817, "step": 447, "batch_size": 64, "mean": 80.61550903320312, "std": 93.48397064208984, "min": -163.04800415039062, "p10": -53.600796508789045, "median": 88.86659622192383, "p90": 184.06912994384766, "max": 204.37518310546875, "pos_frac": 0.8125, "sample": [204.37518310546875, 184.20497131347656, 150.8236083984375, 20.791702270507812, 54.70861053466797, -15.446815490722656, 191.92242431640625, 18.13909912109375, -110.04193878173828, -13.590774536132812, 52.18682098388672, 149.628662109375, 5.827247619628906, 139.2947998046875, 182.728759765625, 177.57704162597656, 84.03101348876953, 59.06563186645508, 66.66595458984375, 93.70217895507812, 13.685783386230469, -60.82233428955078, 64.93354797363281, -36.75054168701172, 26.864151000976562, 196.96121215820312, 179.53782653808594, 186.9279022216797, 122.25140380859375, -71.30667114257812, 65.9921875, 82.48149108886719, 172.99053955078125, 127.7889404296875, 61.7776985168457, 164.99661254882812, 190.73760986328125, 122.51983642578125, 183.75216674804688, 5.723979949951172, 18.034088134765625, 107.22543334960938, 172.13037109375, -106.67707061767578, -4.520648956298828, -134.49227905273438, 30.79100799560547, 161.77468872070312, 181.119140625, 117.5727767944336, 2.242950439453125, 165.3516082763672, -77.51866149902344, 181.76068115234375, 118.23468017578125, 79.13265991210938, 113.75831604003906, -163.04800415039062, 133.09815979003906, 185.11087036132812, 177.0943603515625, -11.980491638183594, 68.807861328125, 146.75283813476562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000447.npy"}
{"epoch": 0.6757369614512472, "step": 448, "batch_size": 64, "mean": 64.70095825195312, "std": 85.69856262207031, "min": -166.10650634765625, "p10": -32.82727966308592, "median": 65.62990951538086, "p90": 174.28380737304687, "max": 226.67575073242188, "pos_frac": 0.796875, "sample": [168.24957275390625, -124.15962982177734, 174.0797119140625, 84.02522277832031, 68.01380157470703, 110.21259307861328, -1.7677993774414062, 143.1417999267578, 28.37401580810547, -1.8855819702148438, 98.68553161621094, 93.7091293334961, 67.92552185058594, -15.611099243164062, 165.87246704101562, 186.40321350097656, -4.488334655761719, 189.79159545898438, 15.671028137207031, 13.762874603271484, 9.830093383789062, 164.84716796875, 68.12348175048828, 94.02589416503906, 115.73355865478516, -0.7829627990722656, 171.90255737304688, 66.71331024169922, 20.75647735595703, -40.20564270019531, -70.22029876708984, 39.33168029785156, 66.02118682861328, 2.4120941162109375, 4.683509826660156, 112.16773986816406, -46.07221984863281, 23.25352668762207, 205.78118896484375, -6.97911262512207, 5.174581527709961, -166.10650634765625, 100.88861083984375, 34.83430480957031, 71.4185791015625, 161.59420776367188, 97.82781982421875, 164.11666870117188, 63.69470977783203, 39.42912292480469, -56.3438720703125, 46.97898864746094, 65.23863220214844, 163.8159637451172, 1.7766952514648438, 226.67575073242188, 3.1473236083984375, -106.66996002197266, 164.9855194091797, 178.19949340820312, 174.37127685546875, 48.28428649902344, 10.270439147949219, 185.92984008789062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000448.npy"}
{"epoch": 0.6772486772486772, "step": 449, "batch_size": 64, "mean": 31.2590274810791, "std": 78.7205581665039, "min": -133.08172607421875, "p10": -77.62321777343747, "median": 13.977977752685547, "p90": 142.79025268554688, "max": 192.95599365234375, "pos_frac": 0.640625, "sample": [2.768075942993164, 17.267120361328125, -98.45472717285156, 59.962432861328125, 10.688835144042969, -0.41603851318359375, -14.055221557617188, 49.23688507080078, 49.113990783691406, 182.011474609375, 143.324462890625, 101.14118194580078, 0.5318222045898438, -35.92601776123047, -15.833099365234375, 0.22515869140625, 192.95599365234375, 2.6789016723632812, -2.0085906982421875, 47.33285140991211, -50.34553146362305, 67.30904388427734, 174.44845581054688, 118.64739990234375, -133.08172607421875, -12.556636810302734, 114.07205963134766, -58.20172119140625, 128.69964599609375, 48.48291015625, 42.97419738769531, -22.532676696777344, 34.109375, -88.27781677246094, 120.81969451904297, 9.967338562011719, 176.08644104003906, -8.974239349365234, 58.309383392333984, 35.22688293457031, 6.344381332397461, 7.1818389892578125, -85.94671630859375, 28.375503540039062, -87.0008544921875, 95.35884094238281, -4.029052734375, 181.60086059570312, 7.708690643310547, 93.75061798095703, 157.25473022460938, -19.15277099609375, 22.293678283691406, -8.225120544433594, -51.642608642578125, 98.36724853515625, 122.6927719116211, -87.43758392333984, 141.54376220703125, -55.90819549560547, 80.33943176269531, -106.22311401367188, 23.071781158447266, -7.4682159423828125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000449.npy"}
{"epoch": 0.6787603930461074, "step": 450, "batch_size": 64, "mean": 49.743858337402344, "std": 92.36797332763672, "min": -133.74981689453125, "p10": -53.011747360229485, "median": 23.46269130706787, "p90": 178.41804656982424, "max": 261.1431579589844, "pos_frac": 0.640625, "sample": [1.385915756225586, -20.755996704101562, -54.776512145996094, -38.287208557128906, 261.1431579589844, 191.00991821289062, 79.60923767089844, 41.565155029296875, 155.886474609375, 180.07919311523438, 91.73731231689453, 23.66937828063965, 51.39470672607422, 47.72413635253906, 1.1297836303710938, 146.43011474609375, -2.5738677978515625, 56.44389343261719, 165.58729553222656, 214.62908935546875, -2.2228965759277344, 37.94960403442383, -8.476547241210938, -29.99341583251953, 77.34278869628906, -45.43428421020508, -0.41565704345703125, 171.73098754882812, 70.34575653076172, 1.2085208892822266, -4.361564636230469, 170.55361938476562, 29.65618133544922, 122.46612548828125, -103.07908630371094, 180.12896728515625, 157.29541015625, -2.408885955810547, 200.84054565429688, 32.49763870239258, 6.9120941162109375, -5.187416076660156, 108.04988861083984, -105.20879364013672, 23.256004333496094, 2.613710403442383, 134.2084197998047, -0.49517822265625, 128.41510009765625, -48.89396286010742, 20.334070205688477, -133.74981689453125, 17.49333381652832, 94.76981353759766, 228.35702514648438, -3.2620773315429688, -70.5391845703125, 164.19479370117188, -99.9327392578125, 174.5420379638672, -74.43806457519531, 12.584800720214844, -24.777536392211914, -14.294364929199219], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000450.npy"}
{"epoch": 0.6802721088435374, "step": 451, "batch_size": 64, "mean": 45.899452209472656, "std": 86.90593719482422, "min": -156.39401245117188, "p10": -27.05132369995117, "median": 27.482269287109375, "p90": 166.96219329833986, "max": 215.21188354492188, "pos_frac": 0.6875, "sample": [117.17698669433594, -21.99285888671875, 43.72831726074219, 50.01918029785156, 136.15208435058594, 18.68634033203125, -156.39401245117188, 39.32696533203125, 2.087900161743164, 183.91275024414062, 93.44219970703125, 80.19628143310547, 169.1595458984375, 30.48345947265625, 157.2138671875, 33.31095886230469, 64.60839080810547, -6.9950408935546875, 11.124814987182617, 85.1100082397461, -0.30641746520996094, -130.69049072265625, -121.52267456054688, 215.21188354492188, 161.8350372314453, -28.372344970703125, 68.5540771484375, -127.04010009765625, -31.704238891601562, -1.1402931213378906, 44.945167541503906, 214.32421875, -22.056652069091797, -1.2032947540283203, 144.73724365234375, 84.94986724853516, 172.97048950195312, 138.77590942382812, -20.552871704101562, 13.258811950683594, 16.843704223632812, 106.97322082519531, 2.7818679809570312, -3.7027816772460938, 93.94024658203125, 7.373199462890625, 5.681461334228516, 142.03489685058594, 185.0055694580078, 110.1051025390625, -17.857818603515625, 148.71444702148438, 24.4810791015625, -16.025962829589844, 0.8043365478515625, 10.6680908203125, 152.6836700439453, 0.7272300720214844, -5.282136917114258, 51.33487319946289, 178.60165405273438, -12.630069732666016, -127.05329895019531, -23.96894073486328], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000451.npy"}
{"epoch": 0.6817838246409675, "step": 452, "batch_size": 64, "mean": 54.2625617980957, "std": 94.36571502685547, "min": -163.35284423828125, "p10": -53.98387145996094, "median": 39.232242584228516, "p90": 179.78668518066408, "max": 220.2400665283203, "pos_frac": 0.765625, "sample": [21.20006561279297, -17.38660430908203, 18.03675651550293, 100.61634826660156, 29.96155548095703, 96.54937744140625, 5.929603576660156, 167.9662628173828, 155.05648803710938, 181.69711303710938, 12.431713104248047, 211.6178436279297, -163.35284423828125, 156.12306213378906, 148.68966674804688, 92.03154754638672, 220.2400665283203, 84.55783081054688, 121.68389892578125, 76.06069946289062, 39.500709533691406, -26.753707885742188, 211.03265380859375, 133.59812927246094, 32.82148361206055, -12.831720352172852, 155.58546447753906, 33.438323974609375, 4.957984924316406, 16.56244659423828, 7.084476470947266, 199.00677490234375, -110.871337890625, 176.25457763671875, -54.115020751953125, 103.19596862792969, 38.963775634765625, 87.00822448730469, -132.4886474609375, 6.257724761962891, 104.38123321533203, -53.6778564453125, 86.39766693115234, 52.84283447265625, 134.6571044921875, -134.51698303222656, 181.30044555664062, 143.43533325195312, 25.118637084960938, 3.7519989013671875, 54.43305969238281, -9.304718017578125, 5.612741470336914, 135.5396270751953, 4.073352813720703, 4.5774383544921875, -40.789817810058594, 184.15750122070312, 135.38290405273438, -98.48849487304688, -10.678436279296875, 96.23407745361328, -158.72291564941406, -0.831451416015625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000452.npy"}
{"epoch": 0.6832955404383976, "step": 453, "batch_size": 64, "mean": 57.81760787963867, "std": 92.2052001953125, "min": -134.15118408203125, "p10": -57.35127334594726, "median": 40.34218978881836, "p90": 181.73224487304688, "max": 242.56410217285156, "pos_frac": 0.703125, "sample": [121.14960479736328, 33.72731018066406, -8.5810546875, -16.55158233642578, 189.41641235351562, 19.529281616210938, -14.602554321289062, 157.377197265625, 148.36302185058594, -21.810997009277344, -114.65885925292969, -3.86041259765625, 72.39999389648438, 0.341064453125, 162.4224853515625, 162.38052368164062, 242.56410217285156, 64.35488891601562, 180.28756713867188, 79.96640014648438, 0.5052471160888672, -11.191047668457031, 40.16670227050781, 21.448013305664062, 6.701019287109375, -129.2274169921875, -134.15118408203125, 182.29397583007812, -7.344566345214844, 166.93603515625, 180.42153930664062, 40.517677307128906, 106.44537353515625, -58.851898193359375, -3.2131004333496094, 47.473846435546875, 160.55267333984375, 164.517578125, -86.0539779663086, 161.8719940185547, 66.64665985107422, 205.05235290527344, -77.34278869628906, 2.974864959716797, -17.122482299804688, 36.63179016113281, -53.849815368652344, 26.921634674072266, 128.01849365234375, 72.06025695800781, -3.8072166442871094, -23.66489028930664, 70.12174224853516, 94.18717956542969, 106.34416198730469, -69.94265747070312, 1.0732917785644531, 75.94972229003906, 183.74594116210938, 28.7840576171875, 205.18038940429688, 139.4075164794922, 193.750732421875, 5.17303466796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000453.npy"}
{"epoch": 0.6848072562358276, "step": 454, "batch_size": 64, "mean": 64.71170043945312, "std": 88.2756576538086, "min": -205.37176513671875, "p10": -16.42158794403076, "median": 59.939435958862305, "p90": 175.7208969116211, "max": 251.6607666015625, "pos_frac": 0.796875, "sample": [161.8056182861328, -37.18788146972656, -7.326446533203125, 174.51541137695312, -205.37176513671875, -14.710634231567383, 145.1347198486328, 50.67975616455078, 5.886959075927734, 25.541366577148438, 80.08137512207031, 176.23753356933594, 24.662445068359375, -179.5242462158203, 180.66375732421875, 109.70549011230469, 172.13888549804688, 79.90525817871094, 192.43878173828125, 31.838333129882812, -9.4825439453125, -97.02670288085938, 5.011064529418945, 162.71109008789062, 194.6287841796875, 251.6607666015625, -2.1454334259033203, 34.634185791015625, -17.15485382080078, 17.20391082763672, -19.294204711914062, 114.38361358642578, 133.38035583496094, 19.98780059814453, 104.05587005615234, 66.28917694091797, 193.5577392578125, -39.76205825805664, 177.1569366455078, 75.99050903320312, 103.47392272949219, 91.88824462890625, -0.02231597900390625, 0.7308225631713867, 11.616043090820312, 20.657005310058594, 170.468017578125, 110.5584716796875, 1.840555191040039, 37.959136962890625, 117.66737365722656, 142.49880981445312, 8.015514373779297, 154.97756958007812, 145.96388244628906, 31.342269897460938, 114.04070281982422, 66.01109313964844, 0.15218353271484375, 133.0989532470703, 98.8256607055664, 53.86777877807617, 2.914461135864258, -9.898101806640625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000454.npy"}
{"epoch": 0.6863189720332578, "step": 455, "batch_size": 64, "mean": 61.74366760253906, "std": 85.58280181884766, "min": -128.39166259765625, "p10": -40.39904251098632, "median": 42.52767372131348, "p90": 181.75690612792968, "max": 232.076904296875, "pos_frac": 0.78125, "sample": [90.455322265625, 24.235931396484375, 74.84148406982422, 204.59426879882812, 93.43795013427734, 2.2855968475341797, 218.56541442871094, 150.16928100585938, -33.890541076660156, 20.191329956054688, 114.63758850097656, -128.39166259765625, 78.16302490234375, 105.66389465332031, 63.92510986328125, 232.076904296875, -18.494163513183594, 1.34130859375, 11.274505615234375, -87.83318328857422, 195.1070556640625, 1.492319107055664, 144.07684326171875, -3.066242218017578, -2.65289306640625, 181.78289794921875, 47.54032897949219, 108.07408142089844, 177.7204132080078, 56.59807586669922, 11.24874496459961, 15.463129043579102, 178.22503662109375, 16.450050354003906, 49.18848419189453, -3.5558853149414062, -67.54132080078125, 116.04869842529297, 153.7081298828125, 170.40670776367188, -46.71437454223633, 16.66309356689453, 84.094970703125, -18.746673583984375, 160.892333984375, 208.8320770263672, 81.3650131225586, 61.592979431152344, 15.413070678710938, 2.9888916015625, 181.69625854492188, 34.36467742919922, -2.4103965759277344, 164.04983520507812, 10.686691284179688, 14.54046630859375, 30.52572250366211, -62.829246520996094, -43.18840026855469, 24.32048797607422, 55.998741149902344, 37.515018463134766, -49.317928314208984, 225.6973876953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000455.npy"}
{"epoch": 0.6878306878306878, "step": 456, "batch_size": 64, "mean": 53.519752502441406, "std": 89.1139907836914, "min": -136.4414825439453, "p10": -27.672329139709465, "median": 35.038442611694336, "p90": 176.90924072265625, "max": 257.6181335449219, "pos_frac": 0.734375, "sample": [104.32365417480469, -6.526710510253906, 0.5560359954833984, 111.0857925415039, 257.6181335449219, 2.1527748107910156, -30.93484115600586, 73.23397827148438, -5.299312591552734, 136.0522003173828, 49.66831970214844, 25.58407211303711, 216.84255981445312, 167.40060424804688, 18.248672485351562, 2.3645172119140625, 100.61045837402344, 70.32584381103516, -0.04507637023925781, 177.67562866210938, -107.44468688964844, 12.4764404296875, 58.52698516845703, -10.937187194824219, 153.41726684570312, -20.05980110168457, -136.4414825439453, -135.9081573486328, 141.39498901367188, -70.67880249023438, 8.425121307373047, 69.22386169433594, 6.43968391418457, 110.73674011230469, 6.311592102050781, 62.20832824707031, 55.424346923828125, -3.5622406005859375, 98.61201477050781, 7.909038543701172, -9.531850814819336, 44.2380256652832, 68.84413146972656, 87.97987365722656, -99.36158752441406, -106.2017593383789, 128.09848022460938, 2.63177490234375, -11.945610046386719, -6.587085723876953, 138.26910400390625, 178.3930206298828, 17.228439331054688, 212.77711486816406, 17.62506103515625, 144.25721740722656, 200.0804901123047, 25.83885955810547, 153.119384765625, 175.12100219726562, 10.991327285766602, -7.26422119140625, 65.71229553222656, 217.9392852783203], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000456.npy"}
{"epoch": 0.6893424036281179, "step": 457, "batch_size": 64, "mean": 59.67945861816406, "std": 77.46920013427734, "min": -103.48977661132812, "p10": -9.064084053039547, "median": 45.43422317504883, "p90": 164.50708312988283, "max": 194.04409790039062, "pos_frac": 0.84375, "sample": [13.310920715332031, 2.5692901611328125, 2.4912986755371094, 110.1005859375, 52.489158630371094, -103.48977661132812, 29.52280044555664, 39.60838317871094, 36.31450653076172, 3.50860595703125, 22.86324119567871, 143.5562744140625, 153.99354553222656, 110.52498626708984, -6.522350311279297, -101.863037109375, 169.4938201904297, 22.651138305664062, 0.11693763732910156, 150.17575073242188, 160.32614135742188, 75.78886413574219, 194.04409790039062, 51.73747253417969, 5.287029266357422, 69.95741271972656, -10.153398513793945, 163.672607421875, 1.3170242309570312, -0.18394088745117188, -99.63616943359375, -77.49671173095703, 8.049064636230469, 155.32376098632812, 23.434192657470703, -42.30610656738281, -79.69746398925781, 40.170562744140625, 112.0503158569336, 31.104660034179688, 96.48645782470703, 4.275421142578125, 173.74411010742188, 144.40078735351562, 0.6719703674316406, 13.966560363769531, 117.54844665527344, 50.69788360595703, 10.390815734863281, 175.6094970703125, 106.61043548583984, 104.2272720336914, 164.86471557617188, 119.75123596191406, 115.63165283203125, 182.05386352539062, 184.41085815429688, 57.74726867675781, 19.61329460144043, 23.029552459716797, 157.02194213867188, -5.410614013671875, 99.26979064941406, 68.66624450683594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000457.npy"}
{"epoch": 0.690854119425548, "step": 458, "batch_size": 64, "mean": 61.582176208496094, "std": 92.6472396850586, "min": -186.65957641601562, "p10": -17.33183135986328, "median": 49.0064640045166, "p90": 185.0380798339844, "max": 229.0122833251953, "pos_frac": 0.78125, "sample": [141.78668212890625, 142.7586669921875, 4.182016372680664, 95.11312866210938, 59.409034729003906, 30.511234283447266, 132.4693145751953, -186.65957641601562, -0.10963821411132812, 11.205780029296875, -16.520416259765625, -55.982391357421875, -17.679580688476562, 190.03196716308594, 173.30335998535156, 42.87317657470703, 57.086036682128906, 174.07411193847656, 66.00537109375, 63.14839172363281, 7.519311904907227, 27.045364379882812, 30.15320587158203, -8.081001281738281, 132.22288513183594, 21.417953491210938, 198.34361267089844, 69.92623138427734, 20.251808166503906, 175.0419921875, 200.28561401367188, 74.66828155517578, 83.71951293945312, -91.5130386352539, 35.91108703613281, -0.2148113250732422, 9.70635986328125, 23.83160400390625, 83.8323974609375, 159.8920135498047, 126.75344848632812, 55.13975143432617, -173.46902465820312, -3.6784019470214844, 220.41387939453125, 1.2212486267089844, 150.631591796875, -10.668203353881836, 176.53721618652344, 195.20057678222656, -13.874553680419922, 229.0122833251953, 116.71192932128906, 22.28214454650879, -95.06752014160156, 180.94529724121094, 33.940147399902344, 169.643798828125, -111.4981689453125, 20.993118286132812, 7.832210540771484, 28.589828491210938, 186.79212951660156, 65.90750122070312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000458.npy"}
{"epoch": 0.6923658352229781, "step": 459, "batch_size": 64, "mean": 75.65729522705078, "std": 89.16734313964844, "min": -161.72564697265625, "p10": -15.647237014770505, "median": 73.2525749206543, "p90": 190.73949127197267, "max": 237.0718536376953, "pos_frac": 0.796875, "sample": [40.814109802246094, 111.15238952636719, -13.284942626953125, 137.673095703125, 11.026817321777344, -161.72564697265625, 2.9765186309814453, 163.23965454101562, 133.06631469726562, 161.3193359375, 196.66815185546875, -38.63228225708008, 106.88319396972656, -53.538307189941406, 180.322265625, 200.73194885253906, 64.38130187988281, 75.23724365234375, 9.794139862060547, 162.4175262451172, 134.22186279296875, 130.22447204589844, 95.4674072265625, 29.65416717529297, 9.208345413208008, 172.21319580078125, 47.737892150878906, 159.83169555664062, 43.20680618286133, 135.97181701660156, 196.90325927734375, 31.970497131347656, -9.806634902954102, 8.889898300170898, -7.2971954345703125, 90.5938720703125, 80.90367889404297, -104.46290588378906, 166.38320922851562, -16.659648895263672, -2.4213504791259766, 106.84735107421875, 169.69712829589844, 2.4401016235351562, 237.0718536376953, -95.30133056640625, 71.26790618896484, 16.73059844970703, -53.219696044921875, 118.01494598388672, -6.23876953125, 0.97760009765625, -3.2099056243896484, 210.98748779296875, 123.06753540039062, 41.17051696777344, 166.81504821777344, 59.898284912109375, 191.38731384277344, 186.81509399414062, 190.85411071777344, 9.768592834472656, 190.4720458984375, 22.495826721191406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000459.npy"}
{"epoch": 0.6938775510204082, "step": 460, "batch_size": 64, "mean": 73.07688903808594, "std": 88.36984252929688, "min": -110.13455200195312, "p10": -31.315242767333984, "median": 53.812217712402344, "p90": 180.62850952148438, "max": 258.76776123046875, "pos_frac": 0.8125, "sample": [207.47511291503906, 105.59246826171875, 207.63916015625, -31.358352661132812, 16.358070373535156, 178.41329956054688, 196.78053283691406, 56.97882080078125, 258.76776123046875, 3.310955047607422, -37.908935546875, -39.52775573730469, 3.2189559936523438, 17.392478942871094, 8.26470947265625, 3.4686965942382812, 11.001344680786133, 172.328125, -46.277374267578125, 169.109619140625, 68.4666519165039, 177.91400146484375, 29.751625061035156, 181.91848754882812, 170.4500732421875, 91.38404083251953, 6.485561370849609, 60.529693603515625, 176.92422485351562, 70.056396484375, 126.19015502929688, 176.66114807128906, 145.61070251464844, 131.25852966308594, 28.930139541625977, 61.252410888671875, 2.9489097595214844, 33.17387390136719, 19.45477294921875, 0.5615978240966797, 91.86558532714844, 176.92652893066406, -31.21465301513672, 10.205141067504883, 145.22459411621094, 180.7741241455078, -28.244476318359375, 0.608428955078125, 146.28292846679688, -66.71783447265625, 2.1201343536376953, 36.1636962890625, 174.83370971679688, 50.64561462402344, -4.2481842041015625, 180.2887420654297, 40.308021545410156, 239.5240478515625, -110.13455200195312, -35.19615173339844, 88.22268676757812, -3.4901123046875, -3.7668533325195312, 174.98922729492188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000460.npy"}
{"epoch": 0.6953892668178382, "step": 461, "batch_size": 64, "mean": 74.86151123046875, "std": 88.6667709350586, "min": -107.58601379394531, "p10": -41.53958053588866, "median": 72.30746459960938, "p90": 183.7237075805664, "max": 221.97366333007812, "pos_frac": 0.75, "sample": [191.4829864501953, 83.43665313720703, 41.83445739746094, 33.244468688964844, 160.64288330078125, 177.0277557373047, 70.27440643310547, 158.43960571289062, 154.12026977539062, -92.46327209472656, -26.79486083984375, 74.96061706542969, 59.60285949707031, 3.0140018463134766, 117.22119903564453, -11.729095458984375, -25.988632202148438, 181.06463623046875, 1.5843353271484375, 4.558431625366211, 129.19998168945312, 151.76560974121094, 149.84063720703125, 4.329833984375, 39.218170166015625, 124.60191345214844, -53.555870056152344, -6.824974060058594, -26.01116180419922, 115.42900848388672, 174.40206909179688, -49.322181701660156, -18.5272159576416, 74.34052276611328, 158.94595336914062, 188.73593139648438, 221.97366333007812, 164.16806030273438, 53.35248565673828, 210.63540649414062, 184.3054656982422, 171.55865478515625, 113.83642578125, 47.14935302734375, 1.5907039642333984, -52.37261962890625, 89.63008117675781, -0.2056560516357422, 161.70166015625, 17.622657775878906, 201.9636993408203, 91.9760971069336, 57.195640563964844, 20.381179809570312, 182.36627197265625, -47.817108154296875, 194.36801147460938, -107.58601379394531, 42.334083557128906, -11.352912902832031, 171.83169555664062, -32.778221130371094, 176.50088500976562, -45.29444885253906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000461.npy"}
{"epoch": 0.6969009826152683, "step": 462, "batch_size": 64, "mean": 77.89765930175781, "std": 89.31340026855469, "min": -175.86843872070312, "p10": -21.0456558227539, "median": 76.91939544677734, "p90": 183.46737365722657, "max": 221.36053466796875, "pos_frac": 0.796875, "sample": [148.3418731689453, 190.7702178955078, 184.93966674804688, 173.4464111328125, 4.7444610595703125, 29.922927856445312, 117.2178726196289, 0.5441322326660156, -5.615455627441406, 3.1412582397460938, -26.062278747558594, 171.2615509033203, 165.64535522460938, 31.21166229248047, 177.0272979736328, 184.45443725585938, -14.522323608398438, 149.08152770996094, 28.821434020996094, 103.67276763916016, 153.8345184326172, 44.11909484863281, 47.85674285888672, 154.8077392578125, 66.22463989257812, 7.6298675537109375, 161.4314422607422, 184.01492309570312, -8.46725845336914, 154.22378540039062, -23.84136962890625, -26.234115600585938, -1.6280288696289062, 170.54556274414062, 184.60415649414062, -50.063507080078125, -29.96632957458496, 108.20515441894531, 57.049400329589844, 121.48023986816406, 170.97537231445312, 186.4699249267578, 93.99280548095703, 23.42351531982422, 27.073165893554688, 182.18975830078125, -0.1248779296875, 15.99234390258789, 168.39840698242188, 163.14234924316406, 87.61415100097656, -154.69688415527344, 157.853271484375, 174.42446899414062, 150.76649475097656, 6.711677551269531, 18.851503372192383, 221.36053466796875, 95.19001007080078, 64.9510269165039, -175.86843872070312, 4.457923889160156, -4.897989273071289, 13.328052520751953], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000462.npy"}
{"epoch": 0.6984126984126984, "step": 463, "batch_size": 64, "mean": 48.669517517089844, "std": 83.17489624023438, "min": -173.85134887695312, "p10": -55.28206253051757, "median": 40.18065643310547, "p90": 170.68773345947267, "max": 196.1166534423828, "pos_frac": 0.71875, "sample": [-18.284420013427734, 146.40994262695312, 104.86463928222656, -173.85134887695312, 102.56393432617188, 136.0401153564453, -25.8912296295166, 175.16119384765625, 178.80014038085938, 108.60385131835938, -60.1596565246582, -28.178543090820312, 3.5715255737304688, 40.530128479003906, 125.9227294921875, -69.19546508789062, 46.29846954345703, 78.09071350097656, 23.73401641845703, -72.14681243896484, 72.24777221679688, 23.17804718017578, -7.0700531005859375, 174.40252685546875, 79.79022979736328, -12.650665283203125, 14.657272338867188, 18.381690979003906, -0.22283935546875, 13.736202239990234, 31.419296264648438, 1.7491989135742188, 176.99684143066406, 168.61656188964844, 53.1404914855957, 114.88079071044922, 132.16998291015625, 39.83118438720703, 1.9227066040039062, 196.1166534423828, 60.93098068237305, 8.661491394042969, 47.144775390625, -115.31501770019531, -25.23227310180664, -45.58819580078125, 182.33277893066406, 132.07455444335938, 152.6682891845703, 109.51828002929688, 42.0395622253418, -125.8759536743164, -3.0510177612304688, -59.43657684326172, 31.883712768554688, 171.57537841796875, 56.92952346801758, -0.9997711181640625, 67.97045135498047, 108.49732208251953, -8.058891296386719, 36.23643493652344, 6.312734603881836, 167.4527587890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000463.npy"}
{"epoch": 0.6999244142101285, "step": 464, "batch_size": 64, "mean": 70.26266479492188, "std": 97.03780364990234, "min": -165.57907104492188, "p10": -32.559014511108394, "median": 63.10038757324219, "p90": 188.66159820556643, "max": 203.76026916503906, "pos_frac": 0.734375, "sample": [-20.324447631835938, 22.338905334472656, 173.06314086914062, -34.478668212890625, 171.1450958251953, -1.406951904296875, 3.0758438110351562, 171.60836791992188, 99.88796997070312, 141.5560302734375, 171.11239624023438, -20.639488220214844, 162.68185424804688, 113.4961929321289, 92.50128173828125, -59.168914794921875, 194.55413818359375, 146.46383666992188, 16.269638061523438, 167.4493408203125, 13.5377197265625, 203.76026916503906, -5.746042251586914, 114.25563049316406, 17.23577880859375, 172.62811279296875, 36.42998504638672, 200.15296936035156, 107.1336669921875, 124.57601928710938, 60.81492614746094, 6.824668884277344, 176.8406219482422, 200.95082092285156, -165.57907104492188, -23.312957763671875, 179.75421142578125, 47.804420471191406, -124.406494140625, -84.38130187988281, 160.76947021484375, 12.986167907714844, 40.73603439331055, 203.099365234375, -75.30583953857422, -7.952156066894531, 13.772132873535156, 65.38584899902344, -13.011703491210938, 172.73414611816406, 122.66949462890625, 192.4790496826172, -154.51226806640625, -28.079822540283203, 50.663291931152344, -19.949748992919922, 15.712409973144531, 177.89593505859375, 124.45067596435547, 195.39083862304688, -18.649635314941406, 131.8374481201172, 151.54339599609375, 11.6866455078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000464.npy"}
{"epoch": 0.7014361300075586, "step": 465, "batch_size": 64, "mean": 60.45001220703125, "std": 93.53370666503906, "min": -127.42662048339844, "p10": -66.00580139160157, "median": 62.541046142578125, "p90": 179.61341247558593, "max": 238.17578125, "pos_frac": 0.703125, "sample": [-32.844940185546875, 238.17578125, 161.343017578125, -88.1910400390625, 137.8236083984375, 138.42337036132812, -23.310867309570312, 183.5198974609375, 63.753997802734375, 57.792442321777344, 61.328094482421875, 177.57012939453125, 122.16749572753906, 195.9010009765625, -52.63317108154297, 162.28317260742188, 1.5758819580078125, 177.428955078125, 83.00628662109375, -4.458946228027344, -66.25279235839844, -98.79017639160156, 180.48910522460938, 191.90440368652344, 111.38059997558594, 185.7849884033203, -73.3875961303711, -105.70457458496094, -17.730201721191406, 86.18070220947266, 1.7626266479492188, -23.03278350830078, 45.5650634765625, 86.00428009033203, 143.99383544921875, 31.182403564453125, -65.42948913574219, -3.4648818969726562, 3.8554439544677734, 165.10035705566406, -95.78736877441406, 0.37612152099609375, 82.35997009277344, 28.620033264160156, -13.162818908691406, -127.42662048339844, -52.29954147338867, 82.24394226074219, 29.21234130859375, 169.86917114257812, 4.917167663574219, 188.83584594726562, 17.893836975097656, 108.73159790039062, 34.39842987060547, -26.943822860717773, 78.08112335205078, -8.634679794311523, 103.61271667480469, 93.49890899658203, 170.90383911132812, 129.6274871826172, 159.87918090820312, 169.92837524414062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000465.npy"}
{"epoch": 0.7029478458049887, "step": 466, "batch_size": 64, "mean": 37.31903839111328, "std": 82.732177734375, "min": -192.6854248046875, "p10": -51.17476272583007, "median": 21.82399845123291, "p90": 165.39779510498047, "max": 217.46212768554688, "pos_frac": 0.703125, "sample": [1.1475753784179688, 147.3319091796875, -33.513671875, -101.4466552734375, 20.799402236938477, -1.5455646514892578, 12.221771240234375, -10.8826904296875, 22.848594665527344, -192.6854248046875, 67.75190734863281, 65.45365905761719, 173.91677856445312, -102.94983673095703, 73.66870880126953, 17.861175537109375, 101.27104187011719, 200.75643920898438, 24.491455078125, 191.29446411132812, 60.93488693237305, 9.100822448730469, 163.8771514892578, -0.2818412780761719, -6.611053466796875, 4.201240539550781, 109.26091003417969, -63.088680267333984, 217.46212768554688, 23.485595703125, -2.23931884765625, 86.96537780761719, -12.02425765991211, 26.873428344726562, 158.0960693359375, 47.29582595825195, 123.018310546875, 55.022071838378906, -18.444324493408203, 22.990020751953125, -11.730140686035156, -106.7718505859375, -54.01659393310547, -80.56596374511719, 121.75137329101562, 27.201385498046875, 8.003952026367188, 122.4632339477539, 203.71926879882812, -0.08848190307617188, -44.5438232421875, 45.298858642578125, -19.51250457763672, 40.25950622558594, 5.293052673339844, 2.164754867553711, 8.309041976928711, 13.21674919128418, 25.45222282409668, 166.04949951171875, 185.27142333984375, 2.7553977966308594, 8.499519348144531, 36.252891540527344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000466.npy"}
{"epoch": 0.7044595616024187, "step": 467, "batch_size": 64, "mean": 52.71526336669922, "std": 81.35128021240234, "min": -145.12179565429688, "p10": -21.47787628173828, "median": 23.606555938720703, "p90": 173.9375106811524, "max": 303.8592224121094, "pos_frac": 0.828125, "sample": [3.1769866943359375, 93.34617614746094, 21.46820068359375, 30.999420166015625, 25.112152099609375, 80.11116790771484, 1.7349319458007812, -13.414867401123047, -16.04787826538086, 72.88294219970703, 96.78115844726562, -69.71119689941406, 179.66140747070312, -32.27281951904297, 61.33192443847656, 120.9275894165039, 73.5418930053711, 20.862571716308594, -117.22172546386719, 11.472869873046875, 140.53262329101562, -76.2166748046875, -18.461273193359375, 120.50531768798828, 193.36814880371094, 17.10333251953125, 22.10095977783203, 92.98269653320312, -145.12179565429688, 133.22161865234375, -3.2661972045898438, 44.45631408691406, 11.122503280639648, 10.654632568359375, 1.7044830322265625, 30.141677856445312, 88.80421447753906, 185.08929443359375, 96.56558227539062, 70.1839599609375, 0.4182586669921875, 0.24946022033691406, 4.113685607910156, 37.345306396484375, 4.467714309692383, -22.770706176757812, 154.0384521484375, 3.6728134155273438, 141.46273803710938, 114.16246032714844, 303.8592224121094, 8.165290832519531, 12.842155456542969, 12.771833419799805, 90.87667083740234, 102.64986419677734, 19.106773376464844, 182.80062866210938, 12.044029235839844, 207.48265075683594, -33.60581970214844, 178.22621154785156, 163.9305419921875, 15.252326965332031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000467.npy"}
{"epoch": 0.7059712773998488, "step": 468, "batch_size": 64, "mean": 74.44082641601562, "std": 92.82128143310547, "min": -158.00064086914062, "p10": -31.67081146240234, "median": 67.53831481933594, "p90": 183.86343231201172, "max": 211.81350708007812, "pos_frac": 0.828125, "sample": [2.380189895629883, 40.22434997558594, 110.84947204589844, 211.81350708007812, 33.487220764160156, -33.323974609375, 2.712911605834961, 143.1781463623047, 167.81787109375, 72.0968017578125, 128.30540466308594, 196.95172119140625, 184.34864807128906, -158.00064086914062, 16.459152221679688, 38.38447570800781, 53.40123748779297, -41.376861572265625, -145.8172607421875, 103.142822265625, 173.01693725585938, 98.80709838867188, 154.3505859375, 181.41986083984375, 58.489227294921875, 146.68482971191406, 175.72369384765625, 0.22321128845214844, -12.829597473144531, -127.37706756591797, 191.58489990234375, -105.79208374023438, 48.66986083984375, -2.2031936645507812, 28.61522674560547, -48.37218475341797, 126.78043365478516, 142.74716186523438, -27.813430786132812, 168.5270538330078, 41.82191467285156, 62.979827880859375, 182.73126220703125, 194.0811767578125, 166.1725616455078, 1.8558998107910156, 25.93550682067871, 34.91740417480469, 10.386955261230469, 4.216650009155273, 175.926025390625, 30.508399963378906, 115.02471923828125, 29.33544921875, 211.49539184570312, 98.751708984375, -18.509536743164062, 199.4847869873047, 170.43621826171875, 177.6279754638672, 171.15652465820312, 90.97440338134766, 75.81166076660156, 12.802162170410156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000468.npy"}
{"epoch": 0.7074829931972789, "step": 469, "batch_size": 64, "mean": 70.00105285644531, "std": 78.70305633544922, "min": -118.22528076171875, "p10": -14.119175720214843, "median": 55.92304992675781, "p90": 169.31480865478517, "max": 229.76165771484375, "pos_frac": 0.828125, "sample": [19.335281372070312, 167.67262268066406, 111.57225036621094, 145.79385375976562, 10.857551574707031, 6.93487548828125, 64.13716125488281, 144.57142639160156, 103.34017944335938, -118.22528076171875, 166.6857452392578, 87.07612609863281, 125.47630310058594, 0.41571807861328125, -3.8978805541992188, 165.65687561035156, 123.88556671142578, 64.21704864501953, 132.80137634277344, 176.75079345703125, 183.07476806640625, 7.105247497558594, 138.68942260742188, 2.47320556640625, 131.98019409179688, -86.54468536376953, 44.41699981689453, 13.394332885742188, 5.233015060424805, 47.70893859863281, -3.4851150512695312, 0.8382091522216797, 82.01470947265625, 191.03433227539062, 170.20877075195312, -12.986663818359375, 0.020740509033203125, -14.604537963867188, 169.90065002441406, 144.48976135253906, 156.92498779296875, 89.4688491821289, 35.42734909057617, 116.22037506103516, 16.961143493652344, -0.9754486083984375, 40.743499755859375, 0.7282333374023438, 167.94784545898438, 17.715499877929688, -27.230072021484375, -15.197429656982422, 211.04489135742188, 0.6246261596679688, 229.76165771484375, 8.277000427246094, 110.70318603515625, 31.636276245117188, 115.34133911132812, -20.24335289001465, 123.45211791992188, 9.453886032104492, -15.5166015625, 166.77748107910156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000469.npy"}
{"epoch": 0.708994708994709, "step": 470, "batch_size": 64, "mean": 64.83692932128906, "std": 89.22238159179688, "min": -187.9608612060547, "p10": -34.748398208618156, "median": 62.29106903076172, "p90": 183.64797363281252, "max": 205.39154052734375, "pos_frac": 0.765625, "sample": [178.84945678710938, 75.96045684814453, 205.39154052734375, -187.9608612060547, 64.26359558105469, 154.555908203125, 23.075393676757812, 54.880958557128906, 168.91336059570312, 82.97314453125, 177.86221313476562, -80.2784423828125, 186.11441040039062, 8.410385131835938, 153.6951446533203, 28.66565704345703, 81.15921783447266, -83.57469177246094, -11.106231689453125, 152.4478302001953, -25.698169708251953, 157.9854278564453, 3.958925247192383, -85.25802612304688, 188.16818237304688, 44.21742248535156, 169.11160278320312, 130.0811004638672, 6.218505859375, -7.013010025024414, 100.90963745117188, 3.0025177001953125, 133.12596130371094, 20.4053955078125, -66.21540832519531, 7.067008972167969, 88.22352600097656, 2.2931957244873047, 60.31854248046875, -20.964248657226562, 185.14356994628906, 180.1582489013672, 189.870849609375, 9.201423645019531, -21.40576934814453, -22.12628173828125, 13.53321647644043, 162.58995056152344, 110.13919067382812, 145.31973266601562, 104.0166015625, 188.398681640625, -38.62706756591797, 136.30862426757812, 17.542373657226562, 147.0084228515625, 81.1065444946289, 77.460693359375, -8.607437133789062, -41.25653839111328, 187.74447631835938, -1.0542240142822266, 2.652660369873047, 0.2091827392578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000470.npy"}
{"epoch": 0.7105064247921391, "step": 471, "batch_size": 64, "mean": 75.26802825927734, "std": 90.08958435058594, "min": -169.97740173339844, "p10": -21.58872070312499, "median": 75.24839401245117, "p90": 183.85903930664062, "max": 233.1326904296875, "pos_frac": 0.8125, "sample": [5.5396575927734375, 210.05809020996094, 146.31053161621094, 146.3852081298828, 62.098052978515625, -25.02447509765625, 171.41748046875, 46.343048095703125, 173.30564880371094, 195.1376953125, 77.38045501708984, 41.701934814453125, 181.540283203125, 13.701210021972656, -146.99781799316406, -1.985464096069336, 84.46821594238281, 186.4005126953125, 181.1670684814453, -84.57403564453125, 26.640064239501953, 51.020572662353516, 127.69507598876953, 100.56475830078125, 163.70877075195312, -169.97740173339844, 93.32147979736328, 102.40380859375, 178.3133087158203, 9.580329895019531, 144.76626586914062, 116.0657958984375, 24.936046600341797, 29.955230712890625, 74.82745361328125, 75.6693344116211, 181.91122436523438, -39.40702819824219, 38.856605529785156, 174.32546997070312, 12.696205139160156, 8.116455078125, 179.8337860107422, 192.59014892578125, -3.108358383178711, 12.806106567382812, 25.805374145507812, 63.06050491333008, -43.206748962402344, 159.4061737060547, -13.57196044921875, 184.69381713867188, 88.08316802978516, 12.733505249023438, 154.0574951171875, -1.6041450500488281, 24.481063842773438, 108.26216125488281, -82.95445251464844, 233.1326904296875, 18.5430908203125, 114.25244140625, -4.3590850830078125, 203.8539276123047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000471.npy"}
{"epoch": 0.7120181405895691, "step": 472, "batch_size": 64, "mean": 58.34724426269531, "std": 102.58890533447266, "min": -195.48777770996094, "p10": -72.56583023071285, "median": 49.443756103515625, "p90": 182.95236053466797, "max": 295.569091796875, "pos_frac": 0.75, "sample": [5.542900085449219, 40.527320861816406, 85.5615005493164, -27.305221557617188, 184.02413940429688, 164.49671936035156, -14.582107543945312, 142.3748779296875, 197.01742553710938, 100.66647338867188, 85.00849914550781, 214.0406494140625, 13.517065048217773, -30.454986572265625, -6.275396347045898, 178.98785400390625, 7.603033065795898, 71.25102996826172, 177.63446044921875, 23.762908935546875, -98.07477569580078, 33.65663146972656, -26.150257110595703, 167.427734375, 6.101470947265625, 161.27420043945312, 24.319137573242188, 183.274658203125, 180.2685546875, -13.120136260986328, 15.995292663574219, 74.89608764648438, 78.31391143798828, -141.26979064941406, -29.33856964111328, 47.94769287109375, 204.71575927734375, 295.569091796875, -4.993644714355469, 0.6040725708007812, 174.38438415527344, 182.20033264160156, -195.48777770996094, 184.24896240234375, -125.59925842285156, -90.61333465576172, 170.07684326171875, 16.048852920532227, -148.30450439453125, 115.6310043334961, 6.515541076660156, 50.9398193359375, -1.9560089111328125, 52.333152770996094, 22.498794555664062, 60.10027313232422, 66.32865905761719, 31.386741638183594, 180.85389709472656, 38.44212341308594, 85.2525634765625, 63.79651641845703, -141.791015625, 162.12057495117188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000472.npy"}
{"epoch": 0.7135298563869993, "step": 473, "batch_size": 64, "mean": 60.171531677246094, "std": 104.79956817626953, "min": -163.4066162109375, "p10": -83.47698440551757, "median": 37.82400131225586, "p90": 197.96938934326175, "max": 260.3455810546875, "pos_frac": 0.734375, "sample": [217.71530151367188, 192.0904083251953, 24.620405197143555, -134.25613403320312, -114.90547180175781, 141.3509979248047, 37.34617614746094, 18.07379150390625, 38.30182647705078, 191.6116943359375, 172.70779418945312, 30.322904586791992, -2.9331302642822266, 75.52012634277344, 0.7135009765625, -55.634849548339844, 217.5525360107422, 203.551025390625, 10.689445495605469, 186.80172729492188, -117.82716369628906, 29.922569274902344, 93.3394775390625, -76.75411224365234, -60.486907958984375, 163.5595245361328, 144.91319274902344, 139.48086547851562, 10.799806594848633, -7.920677185058594, -8.399658203125, 2.9709854125976562, -163.4066162109375, -24.78162384033203, -22.2522029876709, 87.10847473144531, 260.3455810546875, 218.9547119140625, 15.189281463623047, 11.333414077758789, 218.24920654296875, 164.55935668945312, 175.77508544921875, 105.42144775390625, 84.3265380859375, 78.771240234375, 66.59764862060547, 3.893402099609375, 32.87519836425781, 162.92271423339844, 118.72402954101562, -1.9393844604492188, 105.39030456542969, 114.50695037841797, -11.110061645507812, 113.78846740722656, 200.48895263671875, -113.33073425292969, 15.712600708007812, -86.35821533203125, 13.963768005371094, 119.90416717529297, 174.98556518554688, -154.46917724609375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000473.npy"}
{"epoch": 0.7150415721844293, "step": 474, "batch_size": 64, "mean": 56.910247802734375, "std": 100.32646942138672, "min": -174.01153564453125, "p10": -49.05256729125976, "median": 31.245811462402344, "p90": 185.24294281005862, "max": 251.79103088378906, "pos_frac": 0.6875, "sample": [187.08633422851562, 169.84075927734375, 140.66998291015625, -58.52168655395508, -8.04520034790039, 8.80831527709961, 176.96578979492188, 199.73590087890625, 80.42225646972656, 56.989776611328125, 101.46194458007812, 77.09370422363281, -174.01153564453125, 126.52932739257812, 58.077606201171875, 180.9416961669922, -100.2833480834961, 0.8320198059082031, -34.809967041015625, 29.213653564453125, -48.60730743408203, -11.108650207519531, 149.6876220703125, 23.987741470336914, 153.24073791503906, 28.834257125854492, -0.12275123596191406, -30.830429077148438, -7.351890563964844, -12.020668029785156, 152.1676025390625, 40.916934967041016, -155.48577880859375, 1.7170181274414062, -14.41818618774414, 174.42364501953125, 13.671379089355469, 198.89883422851562, 193.21954345703125, 3.4896011352539062, 27.349416732788086, 33.27796936035156, 221.24142456054688, 151.74012756347656, -37.40378189086914, 251.79103088378906, 130.5078125, 80.69709777832031, 139.7586669921875, -2.6225223541259766, 174.36431884765625, 229.69732666015625, -49.24339294433594, 36.77715301513672, 17.912506103515625, -19.365741729736328, 174.29139709472656, -139.84185791015625, -4.301120758056641, 90.2774658203125, 154.47030639648438, -115.45112609863281, 13.390518188476562, 9.634349822998047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000474.npy"}
{"epoch": 0.7165532879818595, "step": 475, "batch_size": 64, "mean": 84.00973510742188, "std": 91.7249984741211, "min": -67.08718872070312, "p10": -12.743984603881831, "median": 51.275678634643555, "p90": 189.503173828125, "max": 253.59393310546875, "pos_frac": 0.796875, "sample": [-7.838884353637695, 2.6596946716308594, -0.8566665649414062, 2.0619049072265625, 166.02996826171875, 169.82254028320312, 175.58038330078125, 190.02255249023438, -3.4171390533447266, -28.188575744628906, 168.11007690429688, -67.08718872070312, 149.8009033203125, 237.47262573242188, 11.006113052368164, -2.2419281005859375, 158.5647735595703, 184.2398223876953, 10.263347625732422, 118.02722930908203, 33.382118225097656, 8.048812866210938, -14.846170425415039, 8.217971801757812, 151.31936645507812, 7.045307159423828, 24.56334686279297, 141.07411193847656, 157.21429443359375, -5.617729187011719, 195.3812255859375, 182.17868041992188, 183.89328002929688, 188.29129028320312, 49.1595458984375, 1.9374961853027344, 155.0152130126953, 5.861259460449219, 123.25381469726562, 24.594799041748047, 239.77215576171875, 170.63833618164062, 9.992145538330078, 20.755996704101562, 148.15707397460938, -52.53132247924805, 23.91335678100586, 98.99247741699219, 253.59393310546875, 22.02521514892578, 176.7034149169922, 216.38442993164062, 213.85040283203125, 115.9782485961914, 167.95870971679688, -62.00457000732422, -59.092018127441406, 173.6160888671875, 49.49676513671875, 178.77593994140625, 1.5735054016113281, -2.4417495727539062, -36.54011535644531, 53.05459213256836], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000475.npy"}
{"epoch": 0.7180650037792895, "step": 476, "batch_size": 64, "mean": 57.906463623046875, "std": 93.72097778320312, "min": -127.97189331054688, "p10": -52.98071060180663, "median": 51.051910400390625, "p90": 174.5807342529297, "max": 258.8446350097656, "pos_frac": 0.6875, "sample": [195.53384399414062, 71.92341613769531, -13.795080184936523, 78.64065551757812, -59.51863098144531, 34.811431884765625, -42.003082275390625, 175.80657958984375, 258.8446350097656, 51.2657470703125, 206.65029907226562, -1.8536720275878906, 171.72042846679688, 155.13966369628906, 133.6802215576172, 129.18145751953125, 91.90988159179688, 3.6201553344726562, 61.871299743652344, 50.83807373046875, -118.9390640258789, 2.3445777893066406, 124.37835693359375, 162.53689575195312, 38.12663269042969, -127.18862915039062, 24.782447814941406, -18.32684326171875, -29.760345458984375, 12.173831939697266, 193.1392364501953, -19.134231567382812, 106.3060302734375, 17.72781753540039, 73.6698226928711, -57.11335754394531, -118.47023010253906, -127.97189331054688, 42.41236877441406, 11.963249206542969, -11.008377075195312, 141.1990509033203, 15.223358154296875, -18.61859130859375, 115.43974304199219, 59.00592041015625, 156.32650756835938, -10.003059387207031, -19.069549560546875, 155.32406616210938, 118.36686706542969, -70.15811157226562, 189.89813232421875, -26.706451416015625, 98.37348937988281, 214.06582641601562, 2.5374698638916016, 65.28469848632812, 169.4921875, 154.13787841796875, -17.914445877075195, 162.95164489746094, 158.27943420410156, -43.337867736816406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000476.npy"}
{"epoch": 0.7195767195767195, "step": 477, "batch_size": 64, "mean": 54.3778076171875, "std": 112.20218658447266, "min": -188.10064697265625, "p10": -129.30485382080076, "median": 47.8027229309082, "p90": 192.52828063964844, "max": 249.45916748046875, "pos_frac": 0.75, "sample": [-142.5804901123047, -98.328369140625, 152.819091796875, 62.628379821777344, 0.504638671875, 180.85452270507812, 6.226814270019531, 23.56424903869629, 39.14215087890625, 2.0710906982421875, 190.35357666015625, 46.54644775390625, -146.2874298095703, -3.6653060913085938, 22.590578079223633, 5.420650482177734, -14.623458862304688, 232.1904296875, 137.9866180419922, -78.16790771484375, 196.24974060058594, 122.957275390625, 6.09503173828125, 63.369102478027344, 102.64767456054688, -14.879005432128906, -145.65118408203125, 1.4316940307617188, 182.7681884765625, 5.292518615722656, -170.42982482910156, 17.295555114746094, 53.77516174316406, 26.03908920288086, -186.72195434570312, -39.5372428894043, -40.44499206542969, 49.058998107910156, 125.71315002441406, 41.329193115234375, 198.6517333984375, -1.8160934448242188, 193.46029663085938, 184.13177490234375, 81.521484375, 199.16796875, -179.40512084960938, 106.05380249023438, 119.94364166259766, -6.00189208984375, 249.45916748046875, 142.52296447753906, 8.14421272277832, 174.69676208496094, 195.3827667236328, 52.64521789550781, -188.10064697265625, 137.86019897460938, 181.40963745117188, 185.795166015625, 81.94820404052734, 166.6134033203125, 28.57342529296875, 151.9170684814453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000477.npy"}
{"epoch": 0.7210884353741497, "step": 478, "batch_size": 64, "mean": 49.316490173339844, "std": 97.84859466552734, "min": -163.51101684570312, "p10": -75.63732223510742, "median": 38.63574981689453, "p90": 179.5256072998047, "max": 210.23765563964844, "pos_frac": 0.6875, "sample": [-3.427366256713867, 174.3419647216797, 61.063934326171875, 158.45245361328125, 128.17982482910156, -2.4921951293945312, 185.82040405273438, 73.19253540039062, -16.33269691467285, 98.44937133789062, 2.676483154296875, 10.34490966796875, 89.80401611328125, 196.86215209960938, -37.28533172607422, 16.106964111328125, -90.72306823730469, 42.635711669921875, -12.478096008300781, 58.71923828125, -48.57788848876953, -35.233123779296875, 172.52459716796875, 136.7605743408203, 171.5001220703125, -74.24918365478516, 140.82125854492188, 131.479736328125, 88.50836181640625, -163.51101684570312, 64.37933349609375, 7.059741973876953, -136.59417724609375, -145.130615234375, 14.436054229736328, 142.83570861816406, 175.46905517578125, 21.9976806640625, 177.63134765625, 27.91278839111328, 185.25979614257812, -6.223703384399414, -121.46582794189453, 48.790550231933594, 180.33743286132812, 56.661834716796875, -1.8699569702148438, 210.23765563964844, 24.351638793945312, -2.3866844177246094, 192.2630157470703, 80.2592544555664, -38.628875732421875, 123.36296844482422, 3.2274932861328125, 191.35653686523438, 148.8172149658203, -34.55384063720703, 3.2242202758789062, 34.63578796386719, -76.23223876953125, -158.82293701171875, 96.46106719970703, 13.261476516723633], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000478.npy"}
{"epoch": 0.7226001511715797, "step": 479, "batch_size": 64, "mean": 44.27500915527344, "std": 92.04733276367188, "min": -209.09756469726562, "p10": -69.76483840942383, "median": 29.87160873413086, "p90": 160.54310607910156, "max": 250.3983154296875, "pos_frac": 0.671875, "sample": [7.555431365966797, 21.485065460205078, 120.24807739257812, -125.85411071777344, -72.44879150390625, 75.48835754394531, 200.30934143066406, 84.25698852539062, 116.4945068359375, -69.11469268798828, 50.266414642333984, -41.21451950073242, 11.96209716796875, -10.43476676940918, -13.031425476074219, 23.71704864501953, 205.0009002685547, 172.61053466796875, -38.01631164550781, 0.7765922546386719, -41.82295608520508, 49.48118209838867, 250.3983154296875, 186.59979248046875, 32.548728942871094, -124.47247314453125, 61.65653991699219, 27.194488525390625, -95.2422103881836, -25.81505584716797, 0.8751678466796875, 125.81549835205078, -3.1168289184570312, 161.19921875, -2.263031005859375, 56.18355941772461, -118.44866943359375, -9.2996826171875, 11.322288513183594, 140.98565673828125, 128.23631286621094, -209.09756469726562, -14.105979919433594, 78.91767883300781, 159.01217651367188, 171.7139892578125, 90.7441635131836, -37.250640869140625, -1.8176956176757812, 91.56621551513672, 39.170135498046875, 127.37419891357422, -70.04347229003906, 153.22128295898438, -6.097795486450195, 124.64173889160156, 116.99944305419922, 19.4971866607666, 60.12000274658203, 142.72146606445312, 1.9572734832763672, 18.827880859375, 98.17660522460938, 145.27969360351562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000479.npy"}
{"epoch": 0.7241118669690099, "step": 480, "batch_size": 64, "mean": 67.61965942382812, "std": 96.7030258178711, "min": -191.1873779296875, "p10": -37.66081008911133, "median": 76.31464385986328, "p90": 172.89522399902344, "max": 293.22259521484375, "pos_frac": 0.78125, "sample": [88.11527252197266, -1.18280029296875, 171.01702880859375, -156.95449829101562, 139.6343231201172, 14.261022567749023, 156.05126953125, -191.1873779296875, 35.5443115234375, 293.22259521484375, -37.66417694091797, 82.67044067382812, 163.60777282714844, -31.80321502685547, 99.12747192382812, -11.478073120117188, -37.24275207519531, 176.95339965820312, -37.6529541015625, 9.851680755615234, -109.64041137695312, 151.76419067382812, 22.18280792236328, -106.96468353271484, 147.60345458984375, 151.26405334472656, 245.22515869140625, 90.6085205078125, 169.5239715576172, 69.95884704589844, 99.94615936279297, 1.9180717468261719, 15.402839660644531, 126.60601806640625, 41.835235595703125, 89.11451721191406, 191.90464782714844, 135.92822265625, -122.27725982666016, 101.28807067871094, 165.11634826660156, 94.28893280029297, 59.344688415527344, 163.03076171875, 31.884017944335938, 135.54510498046875, 15.839111328125, 7.463165283203125, 58.93856430053711, 58.68098831176758, -4.288366317749023, 9.646026611328125, 173.70016479492188, 86.61076354980469, 152.70980834960938, 110.72926330566406, 183.9976348876953, 166.23947143554688, 14.472942352294922, -45.77107238769531, 51.79969787597656, 196.0231475830078, -7.90167236328125, 11.4757080078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000480.npy"}
{"epoch": 0.7256235827664399, "step": 481, "batch_size": 64, "mean": 59.488807678222656, "std": 96.7834701538086, "min": -173.69918823242188, "p10": -55.10723419189452, "median": 49.870140075683594, "p90": 175.96078491210938, "max": 271.83160400390625, "pos_frac": 0.734375, "sample": [50.700927734375, -1.6815872192382812, 174.20437622070312, 61.222434997558594, -76.4039306640625, 130.0005645751953, 154.56381225585938, 68.66813659667969, -74.59385681152344, -96.01667785644531, 168.95919799804688, 49.03935241699219, 170.76478576660156, 67.6845932006836, 8.952095031738281, 133.55078125, 170.89288330078125, 26.309707641601562, 137.1680908203125, 158.50167846679688, -171.88018798828125, -4.092658996582031, 157.73663330078125, 56.272613525390625, -0.9735832214355469, -34.38469696044922, 42.58219909667969, 170.94720458984375, 2.6692466735839844, 93.84671020507812, 0.6700458526611328, 176.755615234375, 33.365478515625, -43.60896301269531, 58.63517761230469, -11.840621948242188, 113.6995620727539, 8.119678497314453, 3.5462989807128906, 4.9315338134765625, 30.631988525390625, 36.000083923339844, 176.71353149414062, 187.45664978027344, -7.5850067138671875, 194.926513671875, -38.628143310546875, -15.028938293457031, 178.67367553710938, 172.71856689453125, 165.444091796875, 10.967117309570312, 76.75495147705078, -31.68201446533203, -60.035064697265625, 271.83160400390625, -173.69918823242188, 111.67945098876953, 94.9747314453125, 254.17300415039062, 92.36258697509766, 24.779953002929688, -98.23089599609375, 12.599864959716797], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000481.npy"}
{"epoch": 0.72713529856387, "step": 482, "batch_size": 64, "mean": 58.54414367675781, "std": 107.72808837890625, "min": -220.66233825683594, "p10": -67.40612487792968, "median": 45.652286529541016, "p90": 194.40104980468752, "max": 289.6829528808594, "pos_frac": 0.703125, "sample": [20.546524047851562, -220.66233825683594, -21.669090270996094, 126.02271270751953, 72.75782012939453, 142.45196533203125, 40.72064208984375, -34.236473083496094, 5.647712707519531, 202.47300720214844, 92.24737548828125, 177.036376953125, 158.43603515625, 35.967308044433594, 113.32498168945312, 139.26324462890625, 204.27044677734375, -0.5764541625976562, -178.66897583007812, 105.98920440673828, 196.8734130859375, -3.6212158203125, 289.6829528808594, -32.18580627441406, -2.0536727905273438, 135.55279541015625, 130.83724975585938, 65.61859130859375, -7.5637969970703125, 6.149066925048828, -9.329490661621094, 152.7704620361328, -87.487060546875, 109.78622436523438, 188.6322021484375, 165.66946411132812, 103.66109466552734, 27.055583953857422, -15.09262466430664, -48.187957763671875, 280.9777526855469, 96.20584106445312, 50.58393096923828, 5.49627685546875, 0.9658203125, 176.3775634765625, 55.293190002441406, 2.254159927368164, 258.727294921875, -124.81303405761719, 103.10865020751953, 21.279006958007812, 68.25970458984375, 28.565963745117188, 155.158447265625, 0.6723728179931641, -62.293495178222656, 233.2706756591797, -69.59725189208984, -96.3072509765625, 133.2544708251953, -16.440834045410156, -141.19476318359375, 38.911415100097656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000482.npy"}
{"epoch": 0.7286470143613001, "step": 483, "batch_size": 64, "mean": 72.47357177734375, "std": 90.72322845458984, "min": -180.34617614746094, "p10": -20.24443359375, "median": 61.865386962890625, "p90": 183.0038314819336, "max": 248.46109008789062, "pos_frac": 0.765625, "sample": [13.485710144042969, 183.62608337402344, -13.425045013427734, 87.26754760742188, 155.9192657470703, 171.3792266845703, 164.5116729736328, -36.451202392578125, -20.32562255859375, -13.43234634399414, 73.07072448730469, 162.49224853515625, 106.21337890625, 140.20835876464844, -54.7202262878418, 149.0921173095703, 62.79299545288086, -42.477455139160156, 3.1578369140625, 171.1182403564453, 45.87617492675781, 179.45645141601562, 174.61941528320312, 28.657730102539062, 187.64752197265625, 61.28704071044922, 204.5915985107422, -21.462745666503906, -2.7876129150390625, 69.715576171875, 106.08753204345703, -143.94515991210938, 179.2659149169922, 2.599458694458008, -20.05499267578125, 248.46109008789062, 9.675262451171875, -180.34617614746094, 45.708641052246094, -17.34235382080078, 62.44373321533203, 40.103904724121094, 180.28536987304688, 114.6009292602539, 1.74310302734375, 73.36869812011719, 215.87937927246094, 185.30038452148438, 70.47493743896484, 148.15223693847656, 35.144752502441406, 46.29652404785156, 36.16899871826172, 198.28900146484375, -13.020936965942383, 31.759244918823242, 60.47360610961914, 5.947290420532227, 181.55191040039062, 157.4573974609375, 15.542556762695312, -5.679176330566406, -2.0509605407714844, 176.86151123046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000483.npy"}
{"epoch": 0.7301587301587301, "step": 484, "batch_size": 64, "mean": 51.01061248779297, "std": 72.46576690673828, "min": -118.74713897705078, "p10": -26.0990982055664, "median": 37.718069076538086, "p90": 153.50550689697266, "max": 212.9813232421875, "pos_frac": 0.796875, "sample": [29.603355407714844, 22.8165283203125, 0.9512786865234375, 3.2300796508789062, 189.09701538085938, 5.229280471801758, 78.61009216308594, -20.48711395263672, 151.7126007080078, -0.7415695190429688, 28.19872283935547, -113.11890411376953, 65.78140258789062, 70.69636535644531, 93.28904724121094, 95.20164489746094, -59.05336380004883, -13.283878326416016, 84.32545471191406, 68.38359832763672, -0.9188022613525391, 154.27389526367188, 124.99971008300781, 6.534912109375, 96.98955535888672, 107.91874694824219, 185.02659606933594, 36.59246826171875, 90.47203063964844, 22.935914993286133, 38.84366989135742, 14.05474853515625, 1.2734832763671875, -37.197227478027344, 212.9813232421875, -9.634281158447266, 11.701421737670898, 18.19585418701172, 64.59854125976562, 2.0335540771484375, 136.99887084960938, -28.504234313964844, 204.12847900390625, -47.526214599609375, 125.37689971923828, 90.7103042602539, 174.33489990234375, -0.18076705932617188, 73.20367431640625, 57.434814453125, 189.12799072265625, 15.872394561767578, 80.08079528808594, 12.799758911132812, -66.11164093017578, -118.74713897705078, 50.3946418762207, 28.878814697265625, 112.44108581542969, 110.63001251220703, 33.114776611328125, 54.46636962890625, 39.879642486572266, 13.757064819335938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000484.npy"}
{"epoch": 0.7316704459561603, "step": 485, "batch_size": 64, "mean": 53.34347152709961, "std": 83.34893798828125, "min": -177.30203247070312, "p10": -17.86107940673828, "median": 22.116477966308594, "p90": 176.24170989990233, "max": 245.63226318359375, "pos_frac": 0.78125, "sample": [10.096336364746094, 101.22403717041016, 85.96684265136719, 122.79495239257812, 97.99266052246094, -4.426033020019531, 2.6170406341552734, 8.899908065795898, 42.86595153808594, -47.63233184814453, -129.80088806152344, 27.0494384765625, 180.83432006835938, 176.2803955078125, -9.0531005859375, 176.1514434814453, 7.484622955322266, 171.76339721679688, 217.17950439453125, 17.145444869995117, 15.775426864624023, 91.79803466796875, -24.119766235351562, 162.1358642578125, -5.880401611328125, 0.2780036926269531, -177.30203247070312, 93.13438415527344, 50.646705627441406, 82.07936096191406, -12.959564208984375, 0.4787330627441406, 62.41991424560547, 0.06084632873535156, 152.98486328125, -67.09843444824219, 245.63226318359375, 186.96481323242188, -26.989242553710938, 167.3800506591797, 13.838748931884766, -17.77544593811035, 40.7669677734375, 4.704246520996094, 77.80256652832031, 1.5954418182373047, -4.597761154174805, 174.73704528808594, 107.18869018554688, 13.126205444335938, 15.084911346435547, 36.023406982421875, 131.412841796875, 180.52516174316406, 72.78592681884766, 68.12399291992188, 60.697364807128906, 182.093994140625, -14.30499267578125, -17.89777946472168, 10.583587646484375, 0.6030693054199219, 17.183517456054688, 6.8268280029296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000485.npy"}
{"epoch": 0.7331821617535903, "step": 486, "batch_size": 64, "mean": 49.296295166015625, "std": 99.68440246582031, "min": -174.8459014892578, "p10": -59.96106948852539, "median": 38.43530464172363, "p90": 186.0922088623047, "max": 303.8211975097656, "pos_frac": 0.6875, "sample": [38.56786346435547, -3.2009429931640625, 216.50311279296875, 4.452569961547852, -12.661163330078125, -3.502429962158203, 30.084213256835938, 303.8211975097656, 17.278953552246094, -6.5301971435546875, 2.751150131225586, 150.95135498046875, 105.1273422241211, 79.21549987792969, 59.300514221191406, 26.849075317382812, -14.450599670410156, 187.06475830078125, -1.4600753784179688, 175.01022338867188, 121.16248321533203, 39.510955810546875, -49.407955169677734, -50.008819580078125, 198.2357177734375, -123.48786926269531, 34.61480712890625, -72.52537536621094, 64.19622039794922, -173.52310180664062, 185.91543579101562, 64.99699401855469, 35.95849609375, -128.52908325195312, 2.489980697631836, 127.75347137451172, 68.1756362915039, 145.32559204101562, -116.97936248779297, 38.3027458190918, 0.3344917297363281, 172.5370635986328, 47.05438232421875, 186.16796875, 184.25570678710938, -60.455406188964844, 75.20402526855469, 147.47996520996094, 55.5250244140625, 175.2554473876953, 69.70834350585938, -24.051345825195312, 18.670761108398438, 203.53414916992188, -4.502021789550781, -58.8076171875, 2.643321990966797, 98.86177062988281, 213.3031768798828, -13.043146133422852, 39.461402893066406, -174.8459014892578, 64.85452270507812, -31.532630920410156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000486.npy"}
{"epoch": 0.7346938775510204, "step": 487, "batch_size": 64, "mean": 61.060760498046875, "std": 92.23595428466797, "min": -151.0253448486328, "p10": -42.893116760253896, "median": 44.268096923828125, "p90": 179.78656005859375, "max": 326.5586242675781, "pos_frac": 0.75, "sample": [107.57252502441406, -51.563751220703125, 130.9473876953125, 103.52496337890625, 70.8729019165039, 10.839811325073242, -56.62568664550781, 15.413803100585938, 108.8135757446289, 144.5759735107422, -13.932769775390625, -34.41850280761719, 31.722129821777344, -46.22224426269531, -9.059440612792969, 182.49542236328125, 124.52705383300781, 28.23324966430664, 285.4844970703125, 87.1962890625, -4.042579650878906, 143.034912109375, 36.94819641113281, 184.9493865966797, 134.76278686523438, -15.308879852294922, 52.061309814453125, -135.56387329101562, 16.120338439941406, 39.579437255859375, 163.431640625, 62.17749786376953, 68.34515380859375, -0.2497386932373047, 180.83135986328125, 11.630882263183594, 123.03335571289062, -48.7364501953125, 47.32623291015625, 106.47216796875, 156.1331787109375, 26.563953399658203, 51.102752685546875, 192.05975341796875, -141.21449279785156, 129.29721069335938, 0.8949737548828125, -35.125152587890625, 78.183349609375, -151.0253448486328, -10.495813369750977, 177.34869384765625, 22.29975128173828, 41.2099609375, 8.245010375976562, 151.24923706054688, 25.480632781982422, 29.10015869140625, -14.70068359375, 181.65745544433594, 103.3212661743164, 37.192222595214844, 326.5586242675781, 135.35162353515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000487.npy"}
{"epoch": 0.7362055933484505, "step": 488, "batch_size": 64, "mean": 65.61809539794922, "std": 90.2819595336914, "min": -165.0917510986328, "p10": -27.36950378417968, "median": 43.946041107177734, "p90": 189.00410919189454, "max": 230.7325439453125, "pos_frac": 0.703125, "sample": [21.312347412109375, -19.01671600341797, -2.411355972290039, -4.366445541381836, 25.971458435058594, -66.40042877197266, -6.558357238769531, 180.20697021484375, 177.9126434326172, 79.62088012695312, -0.13027381896972656, 9.148834228515625, 183.24227905273438, 12.727012634277344, 7.096458435058594, 212.30108642578125, -21.21247100830078, 175.30323791503906, 4.68359375, 105.62547302246094, 114.4673080444336, 36.63733673095703, 5.477622985839844, -5.9659271240234375, -59.937538146972656, 42.26628875732422, -78.63656616210938, 189.29754638671875, -3.3388214111328125, 186.85305786132812, 28.296737670898438, -62.79217529296875, -12.531759262084961, 169.9280548095703, 195.9793243408203, 108.32195281982422, 178.41058349609375, -30.00823211669922, 202.45553588867188, 187.32469177246094, 200.59841918945312, 143.76010131835938, 163.35427856445312, -4.754150390625, 70.47595977783203, 116.40319061279297, -7.9731903076171875, 73.77629089355469, -41.20790100097656, 99.40470886230469, 79.294189453125, 93.508056640625, -165.0917510986328, 98.55511474609375, 5.011810302734375, 67.66144561767578, 230.7325439453125, 52.12618637084961, 189.25523376464844, 24.94192123413086, 10.829292297363281, 45.62579345703125, -2.7085800170898438, 188.41815185546875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000488.npy"}
{"epoch": 0.7377173091458806, "step": 489, "batch_size": 64, "mean": 41.244510650634766, "std": 98.19387817382812, "min": -187.68692016601562, "p10": -67.57996406555175, "median": 21.065101623535156, "p90": 169.54671936035157, "max": 239.34921264648438, "pos_frac": 0.640625, "sample": [-153.1208038330078, 159.2813720703125, 14.403701782226562, 156.70907592773438, 131.09347534179688, 174.80169677734375, -33.517730712890625, 132.967529296875, 51.352901458740234, 215.677490234375, 109.20404815673828, 15.934532165527344, 119.11367797851562, -71.15499877929688, -4.584903717041016, -35.01646423339844, -18.161590576171875, -38.627017974853516, 157.90289306640625, 49.02454376220703, 183.028076171875, 73.63285827636719, 133.6622772216797, 15.427410125732422, 135.70025634765625, 139.23374938964844, -8.543861389160156, -45.385154724121094, -28.610870361328125, 42.75846481323242, -172.0526123046875, -59.238216400146484, 170.62069702148438, 18.96215057373047, 4.4114990234375, -12.811111450195312, -24.83795928955078, -1.4663715362548828, 167.040771484375, 153.5501708984375, 36.89857482910156, 75.52386474609375, 7.790596008300781, 239.34921264648438, 52.91712188720703, -108.16909790039062, 185.42587280273438, 11.738052368164062, 109.72528839111328, 66.18611145019531, -144.02908325195312, 13.018623352050781, -12.94216537475586, 9.667623519897461, -4.248786926269531, -187.68692016601562, 156.333251953125, 23.168052673339844, -105.71862030029297, 51.240936279296875, -58.512413024902344, 176.06967163085938, -2.312009811401367, 29.849197387695312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000489.npy"}
{"epoch": 0.7392290249433107, "step": 490, "batch_size": 64, "mean": 49.7767333984375, "std": 112.27972412109375, "min": -227.92498779296875, "p10": -105.76571960449218, "median": 46.32100868225098, "p90": 189.8873748779297, "max": 259.548583984375, "pos_frac": 0.65625, "sample": [-32.39215850830078, -6.7800140380859375, -227.92498779296875, 1.4086380004882812, -7.314264297485352, 89.12423706054688, 28.366287231445312, 181.0442352294922, 192.60333251953125, 221.12506103515625, -165.91378784179688, 186.12142944335938, -185.85794067382812, 6.327802658081055, 48.772003173828125, 82.15589141845703, 136.64663696289062, 94.3409652709961, 242.91546630859375, 25.665870666503906, 19.19261932373047, 189.0340576171875, 160.3893280029297, 202.71051025390625, -112.11872863769531, -28.593994140625, 177.14695739746094, -2.066791534423828, 43.87001419067383, 21.743812561035156, -132.752685546875, 105.89755249023438, 79.16061401367188, 101.71137237548828, -2.003599166870117, 4.0266876220703125, -14.278861999511719, 190.25308227539062, -114.29104614257812, -12.209686279296875, 109.3473892211914, -76.79527282714844, 72.83590698242188, 142.0318603515625, -5.794145584106445, 127.8038330078125, 27.326248168945312, 4.079042434692383, -72.59217071533203, -12.519561767578125, 259.548583984375, 147.29086303710938, -12.372631072998047, 102.49459838867188, 207.2506561279297, 126.85894775390625, 187.37271118164062, 60.924400329589844, 59.25419616699219, 180.19354248046875, -187.2337646484375, -90.94203186035156, -16.572433471679688, 58.664119720458984], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000490.npy"}
{"epoch": 0.7407407407407407, "step": 491, "batch_size": 64, "mean": 57.37417984008789, "std": 87.49767303466797, "min": -191.77235412597656, "p10": -39.12135848999023, "median": 32.58068084716797, "p90": 175.5611572265625, "max": 279.680419921875, "pos_frac": 0.71875, "sample": [154.2594451904297, -15.929893493652344, 120.24078369140625, 31.20245361328125, 165.30142211914062, 95.0680923461914, -1.389261245727539, 133.96517944335938, 160.12823486328125, 42.60829544067383, 14.4481201171875, -102.89749908447266, 184.64854431152344, 40.285743713378906, 165.78956604003906, -63.06914520263672, 14.033878326416016, 118.16812133789062, 7.988658905029297, 102.68232727050781, -49.665069580078125, 111.25723266601562, 182.54783630371094, -4.920063018798828, 60.113243103027344, 146.37850952148438, 11.392112731933594, -9.263290405273438, -5.454841613769531, 7.594211578369141, 184.1997833251953, 107.07646179199219, -31.832427978515625, 43.120948791503906, 141.12277221679688, 22.64664077758789, 22.390357971191406, 279.680419921875, 33.95890808105469, -42.24518585205078, 176.70181274414062, 5.9655303955078125, 28.156410217285156, -9.330087661743164, 1.5550193786621094, 102.69155883789062, -11.8541259765625, -56.93414306640625, 96.80123138427734, 186.8532257080078, -1.3833389282226562, 134.59690856933594, 192.6724395751953, 168.98806762695312, 8.239648818969727, -2.5554046630859375, 16.41687774658203, 64.16287994384766, 70.11965942382812, -191.77235412597656, -47.92756652832031, 4.780969619750977, -15.528812408447266, 172.89962768554688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000491.npy"}
{"epoch": 0.7422524565381708, "step": 492, "batch_size": 64, "mean": 44.112030029296875, "std": 98.88451385498047, "min": -209.31678771972656, "p10": -63.95287704467772, "median": 19.069828033447266, "p90": 187.36629028320314, "max": 242.40309143066406, "pos_frac": 0.6875, "sample": [242.40309143066406, 180.82220458984375, -110.64730834960938, 0.903106689453125, 197.80914306640625, 0.8479518890380859, 118.04440307617188, 62.875858306884766, 75.20143127441406, -1.5074939727783203, 124.06489562988281, 190.1708984375, 41.521728515625, -26.442649841308594, 13.421707153320312, 103.96865844726562, -190.7701416015625, 106.89279174804688, 3.6883544921875, 85.26325988769531, 149.59866333007812, -69.04706573486328, -1.899749755859375, -25.22644805908203, 18.844200134277344, 90.3316650390625, 107.98795318603516, 200.50607299804688, -52.066436767578125, -8.79815673828125, 177.17767333984375, 13.97618293762207, -6.461143493652344, 78.89442443847656, -1.1597652435302734, 17.782028198242188, -209.31678771972656, 203.7022705078125, -73.46746826171875, 0.16950225830078125, 118.28414916992188, 3.5116806030273438, 88.62337493896484, 3.800159454345703, -38.55085754394531, 144.05404663085938, 30.966487884521484, 59.01716613769531, -43.91724395751953, 143.80006408691406, -182.63323974609375, -31.219322204589844, -9.280120849609375, 144.94969177246094, 196.06027221679688, 13.049419403076172, 124.09933471679688, 37.352752685546875, 113.92744445800781, 19.295455932617188, 3.2268314361572266, -106.68562316894531, -29.544570922851562, 190.92303466796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000492.npy"}
{"epoch": 0.7437641723356009, "step": 493, "batch_size": 64, "mean": 68.136962890625, "std": 95.61085510253906, "min": -178.29672241210938, "p10": -46.75055694580078, "median": 55.48656463623047, "p90": 192.0588836669922, "max": 221.03384399414062, "pos_frac": 0.734375, "sample": [7.525932312011719, 17.109783172607422, -44.695343017578125, 162.1589813232422, -18.546043395996094, 69.48651123046875, -157.10116577148438, 12.00997543334961, 2.2340869903564453, -29.60419464111328, 72.85514831542969, 221.03384399414062, 153.05459594726562, 192.88714599609375, 54.201778411865234, -59.60673904418945, 143.45513916015625, -1.8930511474609375, 116.14750671386719, 166.14669799804688, 44.44141387939453, 28.46221160888672, -0.9763965606689453, 38.807342529296875, 55.265480041503906, -80.96461486816406, 148.9983367919922, -1.8701171875, 193.65060424804688, 141.07550048828125, 132.4214630126953, 168.8565216064453, -79.3580322265625, -21.253803253173828, -70.78932189941406, 136.62496948242188, 181.7483673095703, -27.561920166015625, 190.13253784179688, 0.9381656646728516, 213.43283081054688, 100.11677551269531, 143.6159210205078, 188.79379272460938, 57.91010284423828, 197.2970428466797, -47.63136291503906, 192.88446044921875, 137.75994873046875, -2.8398666381835938, -9.001449584960938, 157.14141845703125, 45.79851531982422, 16.668960571289062, 200.73968505859375, 146.939453125, 36.960906982421875, 33.581817626953125, 135.92697143554688, 187.6522216796875, 8.430335998535156, 83.66732788085938, 55.70764923095703, -178.29672241210938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000493.npy"}
{"epoch": 0.745275888133031, "step": 494, "batch_size": 64, "mean": 56.99005889892578, "std": 105.8646469116211, "min": -215.93978881835938, "p10": -59.17877845764159, "median": 46.15608787536621, "p90": 181.30137481689454, "max": 291.925048828125, "pos_frac": 0.6875, "sample": [129.158203125, 34.325294494628906, 44.32445526123047, 27.812179565429688, 175.77374267578125, 6.2464141845703125, -9.557769775390625, 162.3254852294922, 105.97920989990234, -108.63253784179688, 12.501640319824219, -30.800392150878906, 2.5803680419921875, 125.29236602783203, 105.11650085449219, 67.26248168945312, 173.6504669189453, -2.4590682983398438, 47.98772048950195, -64.30319213867188, 16.954906463623047, 291.925048828125, 132.66000366210938, -17.30712890625, 42.53534698486328, 180.26278686523438, 185.20591735839844, -4.087287902832031, 138.37979125976562, 180.8318634033203, 167.2613525390625, -181.19110107421875, -191.91683959960938, 165.56216430664062, -5.153749465942383, -127.38436889648438, -76.63904571533203, 168.56051635742188, 19.771499633789062, 148.0606689453125, 48.65019226074219, 0.07412528991699219, -215.93978881835938, -2.0952110290527344, 28.82839584350586, 48.066078186035156, 205.4557342529297, -1.4062023162841797, -25.68883514404297, 30.096572875976562, 185.97772216796875, 66.21743774414062, 90.95671081542969, 227.17578125, 49.01201629638672, 83.31055450439453, -39.127410888671875, -5.622699737548828, 126.17111206054688, 157.04583740234375, 227.62838745117188, -10.57952880859375, -47.2218132019043, 181.50259399414062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000494.npy"}
{"epoch": 0.7467876039304611, "step": 495, "batch_size": 64, "mean": 46.89491271972656, "std": 101.37512969970703, "min": -172.7947540283203, "p10": -65.54985847473144, "median": 22.327869415283203, "p90": 182.83912353515626, "max": 318.7225341796875, "pos_frac": 0.6875, "sample": [-66.5946273803711, -70.7607192993164, -172.7947540283203, -152.8416748046875, 154.0736846923828, -5.406463623046875, -127.48480224609375, 24.385971069335938, 88.8370361328125, -63.09181213378906, 0.7061233520507812, -143.09625244140625, -8.570053100585938, 91.28309631347656, 70.33712005615234, 181.47723388671875, -22.56676483154297, 20.46142578125, 125.11133575439453, -2.3060741424560547, 27.613361358642578, 121.28135681152344, 154.00303649902344, 2.869211196899414, 158.90853881835938, 183.42279052734375, -18.409290313720703, 93.09783172607422, -0.23102951049804688, 58.34696578979492, 318.7225341796875, 6.980859756469727, 24.194313049316406, -63.112064361572266, 18.133169174194336, 37.26495361328125, 4.2559051513671875, 144.3612518310547, 261.4027404785156, 158.79244995117188, -70.1534423828125, -6.185478210449219, 52.11149978637695, 65.66781616210938, 202.8970489501953, 2.6076278686523438, 70.32965850830078, -4.0011138916015625, 1.09161376953125, 185.81480407714844, 2.723295211791992, 155.84568786621094, 56.34581756591797, -29.926040649414062, -45.88518524169922, 259.6288146972656, 65.34359741210938, -1.1472091674804688, 7.279670715332031, 44.39007568359375, 121.55098724365234, 12.472824096679688, 235.30775451660156, 4.106403350830078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000495.npy"}
{"epoch": 0.7482993197278912, "step": 496, "batch_size": 64, "mean": 56.844696044921875, "std": 108.3113021850586, "min": -196.29010009765625, "p10": -71.6867904663086, "median": 32.754990577697754, "p90": 190.018083190918, "max": 434.251953125, "pos_frac": 0.71875, "sample": [121.00930786132812, 129.18569946289062, 196.694091796875, 169.04713439941406, -98.1129379272461, 27.142822265625, 84.7071533203125, 237.72048950195312, 19.381433486938477, 114.46141052246094, -82.73869323730469, 36.63689422607422, 46.90441131591797, 41.58636474609375, 5.002058029174805, -10.900123596191406, -5.540912628173828, 28.87308692932129, 166.24806213378906, 23.212303161621094, -53.171485900878906, 434.251953125, -3.3377037048339844, 191.0267333984375, 80.1728515625, -196.29010009765625, 256.7842712402344, 129.4923095703125, -1.5675621032714844, 195.48193359375, -90.74860382080078, 9.143451690673828, 187.66456604003906, 179.92291259765625, 8.429244995117188, 1.5739917755126953, -109.43563842773438, -8.428300857543945, 43.102874755859375, 122.09101104736328, 19.615386962890625, 65.88018798828125, 177.2884521484375, 92.9487075805664, 141.778076171875, 42.79222106933594, -29.886672973632812, 88.6868667602539, -16.63713836669922, 2.433116912841797, 203.10353088378906, -157.79949951171875, -30.786300659179688, 3.5883846282958984, 159.264892578125, 170.52268981933594, 7.573020935058594, 145.60040283203125, -68.89778137207031, 10.002716064453125, -72.882080078125, 52.41924285888672, 9.666404724121094, -4.893119812011719], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000496.npy"}
{"epoch": 0.7498110355253212, "step": 497, "batch_size": 64, "mean": 67.83000946044922, "std": 93.81342315673828, "min": -173.18246459960938, "p10": -18.610892868041987, "median": 64.09584426879883, "p90": 182.16270904541017, "max": 228.15460205078125, "pos_frac": 0.6875, "sample": [177.510986328125, 37.295448303222656, 228.15460205078125, 166.14041137695312, 185.34814453125, 17.763137817382812, -5.1829986572265625, -5.8120269775390625, 8.110551834106445, 91.73471069335938, 129.51498413085938, 139.71926879882812, 101.21573638916016, -5.8740692138671875, 149.56515502929688, -2.7151412963867188, 108.23008728027344, -171.6136474609375, 175.41879272460938, 197.25009155273438, -3.072406768798828, -0.3007164001464844, -1.9849319458007812, -6.184684753417969, 57.25108337402344, -29.593612670898438, -13.093505859375, 184.07162475585938, 144.4789276123047, 164.26864624023438, -5.640275955200195, 83.98143005371094, 120.45352172851562, 193.0454864501953, -173.18246459960938, 170.04318237304688, 173.20001220703125, 91.22827911376953, 83.34829711914062, -30.653762817382812, 172.89443969726562, 124.15985870361328, -7.887348175048828, 68.26467895507812, 66.18806457519531, -20.3824462890625, 205.10247802734375, 143.8621826171875, -6.9399566650390625, 28.888885498046875, 47.45487594604492, 33.57647705078125, 56.534698486328125, 147.50314331054688, 62.003623962402344, 207.9000701904297, -128.72314453125, 2.8196487426757812, -14.47726821899414, 177.7085723876953, 45.0396728515625, 38.98670196533203, -129.83465576171875, 97.03897094726562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000497.npy"}
{"epoch": 0.7513227513227513, "step": 498, "batch_size": 64, "mean": 65.3443374633789, "std": 86.75074005126953, "min": -232.21975708007812, "p10": -17.441781616210935, "median": 49.46941947937012, "p90": 187.70516662597657, "max": 226.8970489501953, "pos_frac": 0.828125, "sample": [-19.754989624023438, 95.92176055908203, 162.2802276611328, 80.46958923339844, 45.0919189453125, -4.039283752441406, 154.1116943359375, 152.45892333984375, -9.920795440673828, -46.375152587890625, 168.93077087402344, 53.846920013427734, 71.34249877929688, -18.19573211669922, 146.5145263671875, 54.751312255859375, 6.96966552734375, -51.73578643798828, 76.0784683227539, 133.26473999023438, 226.8970489501953, 38.72199249267578, 182.01394653320312, 15.97787857055664, 60.514060974121094, 30.388870239257812, -27.47850799560547, 166.19158935546875, 160.98428344726562, 37.19721221923828, 193.0625762939453, 175.45538330078125, 1.3950309753417969, 15.282882690429688, 64.3208999633789, -73.1708984375, -5.9297027587890625, 54.745452880859375, 1.7461681365966797, 7.274148941040039, 86.76850891113281, 185.59217834472656, -232.21975708007812, 136.66226196289062, 89.93204498291016, 42.65038299560547, 2.232593536376953, 38.77454376220703, 13.152362823486328, 60.915042877197266, 205.51611328125, 188.61073303222656, 84.64461517333984, 13.779655456542969, 189.58778381347656, 4.314657211303711, 224.11485290527344, 18.35906982421875, 5.998208999633789, 10.359283447265625, 24.450927734375, -15.682563781738281, 17.94823455810547, 207.9740753173828], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000498.npy"}
{"epoch": 0.7528344671201814, "step": 499, "batch_size": 64, "mean": 63.620235443115234, "std": 110.61506652832031, "min": -182.36257934570312, "p10": -98.2759956359863, "median": 76.48042678833008, "p90": 196.10716400146487, "max": 314.70892333984375, "pos_frac": 0.75, "sample": [115.90592193603516, 197.6756591796875, -177.89993286132812, 215.986083984375, 14.443069458007812, 73.45004272460938, 28.571565628051758, -182.36257934570312, 16.993595123291016, 114.02998352050781, 79.51081085205078, 150.15017700195312, 162.9517364501953, -135.7811737060547, 153.58111572265625, 4.490562438964844, 314.70892333984375, -113.16724395751953, 192.4473419189453, 179.22787475585938, 58.86949157714844, -34.659698486328125, -36.833656311035156, -109.30216979980469, -72.54825592041016, 114.05535888671875, 50.329254150390625, 201.76820373535156, -46.11351013183594, 84.54029083251953, -34.77174758911133, 19.217670440673828, 113.69044494628906, 143.51808166503906, 71.5537109375, -22.8751220703125, 26.67596435546875, 128.9024658203125, 4.758092880249023, 82.29743957519531, 205.70785522460938, 9.675819396972656, 191.60894775390625, 61.96269607543945, -131.62403869628906, 95.52457427978516, 96.05948638916016, 91.87157440185547, 171.57528686523438, 225.2732391357422, 186.19741821289062, 10.718994140625, 65.533935546875, -64.59445190429688, -170.4664306640625, 129.06875610351562, 110.08180236816406, 145.42831420898438, -41.73175048828125, 223.4219207763672, -2.8996810913085938, 170.81954956054688, 124.51367950439453, 19.981613159179688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000499.npy"}
{"epoch": 0.7543461829176115, "step": 500, "batch_size": 64, "mean": 59.02923583984375, "std": 107.75094604492188, "min": -269.74688720703125, "p10": -83.47407913208008, "median": 71.71420288085938, "p90": 182.82216186523436, "max": 225.4955596923828, "pos_frac": 0.765625, "sample": [-77.74046325683594, 130.65133666992188, 14.182914733886719, 5.66901969909668, 116.3997802734375, 94.57736206054688, 89.21461486816406, 141.32635498046875, 98.71456909179688, 187.63528442382812, 74.3643798828125, 180.5616455078125, 175.71351623535156, -85.93134307861328, 182.87474060058594, 15.798263549804688, 179.0107879638672, 55.66471862792969, 24.511573791503906, -15.318389892578125, 196.17933654785156, 166.32186889648438, 19.472267150878906, 121.71015930175781, -8.951492309570312, 184.9045867919922, 84.91960144042969, 7.329273223876953, -38.344932556152344, -129.90756225585938, -138.14462280273438, 73.01319885253906, 15.200130462646484, 36.17414855957031, -137.61563110351562, 4.137487411499023, 78.62368774414062, 225.4955596923828, 191.96365356445312, -0.42559051513671875, 127.52775573730469, 14.803945541381836, 176.93431091308594, 45.66518020629883, -51.482669830322266, 136.26025390625, 70.41520690917969, -269.74688720703125, 176.37435913085938, -186.43881225585938, 205.4777374267578, -28.27056884765625, 2.4497299194335938, 153.65768432617188, 153.2267608642578, 6.763088226318359, 2.6931991577148438, 58.01292419433594, 114.87056732177734, -33.254295349121094, -129.69403076171875, 134.38449096679688, 182.69947814941406, 174.60574340820312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000500.npy"}
{"epoch": 0.7558578987150416, "step": 501, "batch_size": 64, "mean": 52.8176155090332, "std": 73.7641372680664, "min": -160.8614044189453, "p10": -11.046379089355469, "median": 31.381546020507812, "p90": 170.32501831054694, "max": 205.15643310546875, "pos_frac": 0.75, "sample": [22.806419372558594, 179.91983032226562, -11.268875122070312, 48.03500747680664, 115.72361755371094, 123.16990661621094, -18.272964477539062, 27.049095153808594, 7.549571990966797, 135.28024291992188, 155.5656280517578, 20.390287399291992, 78.33786010742188, 4.318084716796875, -12.210309982299805, -6.272222518920898, 60.22875213623047, 175.95748901367188, -4.565315246582031, -5.79533576965332, 32.529640197753906, 0.7382278442382812, -3.3712425231933594, 203.9869842529297, 181.2069091796875, 151.86959838867188, 150.17648315429688, 205.15643310546875, 0.018085479736328125, 4.1900787353515625, 41.988319396972656, -39.40838623046875, -3.8807754516601562, 0.001300811767578125, 65.58053588867188, 48.27972412109375, 109.98948669433594, 11.873512268066406, 3.104900360107422, 76.13999938964844, 51.506866455078125, 157.18258666992188, 15.47146224975586, 87.81279754638672, -8.699699401855469, -160.8614044189453, 59.660064697265625, 90.18519592285156, -58.331111907958984, 30.23345184326172, 38.650848388671875, 61.307594299316406, 14.306312561035156, -14.043777465820312, 81.09654235839844, -8.911600112915039, 18.512054443359375, 4.172296524047852, 188.76223754882812, 178.0401153564453, 76.05879211425781, -0.4842491149902344, 153.11048889160156, -10.5272216796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000501.npy"}
{"epoch": 0.7573696145124716, "step": 502, "batch_size": 64, "mean": 55.175052642822266, "std": 84.31319427490234, "min": -157.87110900878906, "p10": -41.52449569702147, "median": 42.97465896606445, "p90": 181.05060577392578, "max": 219.9137420654297, "pos_frac": 0.75, "sample": [-30.576148986816406, 2.682353973388672, 14.027587890625, -64.62666320800781, 157.28451538085938, 126.26106262207031, 8.909049987792969, 28.134735107421875, -115.03121185302734, 81.33502197265625, 19.110820770263672, 192.26025390625, -5.583961486816406, -49.93524932861328, 99.05123901367188, 219.9137420654297, 19.13311767578125, -157.87110900878906, -13.976936340332031, 81.08387756347656, 20.581863403320312, -14.617610931396484, 36.23680114746094, 159.51548767089844, 181.9942169189453, 82.8431396484375, 147.94491577148438, -15.586261749267578, -1.8753719329833984, 42.3167724609375, 18.218116760253906, 5.881406784057617, 101.76866149902344, 4.170309066772461, 104.365478515625, 90.60403442382812, 185.3974609375, 205.3341522216797, -46.216644287109375, 3.6778907775878906, -17.853233337402344, 43.632545471191406, 72.42321014404297, 7.11590576171875, 94.54879760742188, -7.313446044921875, 26.39654541015625, 181.27017211914062, 84.8098373413086, -97.54054260253906, 80.53915405273438, 66.20608520507812, 158.407958984375, 91.76763916015625, 212.97760009765625, -67.68814086914062, 54.10302734375, 163.9110107421875, -13.419815063476562, 180.5382843017578, 4.026279449462891, 65.5377426147461, 125.3253173828125, 97.34044647216797], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000502.npy"}
{"epoch": 0.7588813303099018, "step": 503, "batch_size": 64, "mean": 59.56558609008789, "std": 95.37307739257812, "min": -165.06039428710938, "p10": -42.437626647949216, "median": 33.292049407958984, "p90": 184.95993194580078, "max": 202.8287353515625, "pos_frac": 0.78125, "sample": [147.56890869140625, 4.232570648193359, -24.779571533203125, 70.49095153808594, 118.42467498779297, 21.75196075439453, 202.8287353515625, -165.06039428710938, 0.25722503662109375, 118.93273162841797, 153.00680541992188, -12.315444946289062, 22.115882873535156, 187.17320251464844, -37.48542785644531, 98.30706787109375, -44.55999755859375, 7.66131591796875, 199.2884063720703, -66.26506805419922, 12.753288269042969, 130.57431030273438, 176.4237823486328, 34.93190002441406, 4.037147521972656, 28.290283203125, 132.7508544921875, 193.52725219726562, 3.860675811767578, 120.87842559814453, -14.246959686279297, 69.63859558105469, 188.29388427734375, 186.6271209716797, 184.61148071289062, -13.076072692871094, 124.413330078125, -19.55280303955078, -129.7751922607422, 31.652198791503906, 1.8982048034667969, 23.74212646484375, 4.366266250610352, 102.20911407470703, -1.5373268127441406, 10.973190307617188, 55.458953857421875, 185.10926818847656, 129.89065551757812, 8.377792358398438, 175.6712188720703, 173.32516479492188, -140.2876739501953, 7.494010925292969, 172.1607208251953, 139.4459686279297, 4.13713264465332, -87.28146362304688, 139.3607635498047, 178.43321228027344, 13.581062316894531, 102.8965072631836, 123.1537857055664, -158.56919860839844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000503.npy"}
{"epoch": 0.7603930461073318, "step": 504, "batch_size": 64, "mean": 77.62858581542969, "std": 84.6902084350586, "min": -133.76531982421875, "p10": -10.119729232788085, "median": 70.39220428466797, "p90": 187.78314056396488, "max": 250.03631591796875, "pos_frac": 0.828125, "sample": [10.777191162109375, 180.2086181640625, -12.494522094726562, 129.68943786621094, 213.50230407714844, -10.581954956054688, 133.8040313720703, 26.34712791442871, 190.46214294433594, 7.361322402954102, 40.034027099609375, 25.368255615234375, 13.593917846679688, 195.8695526123047, 31.916248321533203, 160.6978302001953, 167.9557647705078, -109.37725067138672, 110.53750610351562, 149.36007690429688, -9.041202545166016, 102.69331359863281, 2.9625930786132812, 170.5518341064453, 196.49127197265625, -22.1639404296875, 83.9939956665039, 179.9533233642578, 102.14495849609375, 250.03631591796875, 181.53213500976562, 216.14138793945312, -4.322635650634766, 56.59529495239258, 79.31597900390625, -7.5855560302734375, 5.3787689208984375, -115.22319793701172, 162.52899169921875, 73.44509887695312, 166.86810302734375, -13.336990356445312, -2.826862335205078, 18.037506103515625, 46.598419189453125, 168.72837829589844, 94.00108337402344, 54.59398651123047, 66.85971069335938, 119.86224365234375, 133.2071533203125, 58.7585563659668, -133.76531982421875, 88.4619140625, 128.28634643554688, 24.15099334716797, 14.450361251831055, 67.33930969238281, 125.42355346679688, 11.761972427368164, 17.03690528869629, 199.99777221679688, 90.02491760253906, 63.24913787841797], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000504.npy"}
{"epoch": 0.7619047619047619, "step": 505, "batch_size": 64, "mean": 49.71614074707031, "std": 104.28145599365234, "min": -217.80889892578125, "p10": -59.36326675415038, "median": 25.761381149291992, "p90": 183.5351043701172, "max": 251.75830078125, "pos_frac": 0.6875, "sample": [117.75360107421875, 1.0613136291503906, 103.99345397949219, 26.585140228271484, 1.0014533996582031, 251.75830078125, -164.42007446289062, 223.8160858154297, 161.22422790527344, 180.5655975341797, 184.73260498046875, -18.927207946777344, 29.24188995361328, 0.9214935302734375, 18.13037109375, -16.57421875, -15.997451782226562, -9.14181900024414, 124.5160140991211, 4.449901580810547, -89.12165832519531, 172.2217254638672, 140.28082275390625, 23.96689224243164, 77.9213638305664, 221.66171264648438, 7.635398864746094, -62.75477600097656, 32.892913818359375, -0.18761825561523438, -36.0308837890625, -20.55377960205078, 180.74093627929688, -129.08889770507812, 196.32354736328125, 160.851806640625, 131.663818359375, 175.6225128173828, -0.6081981658935547, -15.614883422851562, 24.9376220703125, 14.897636413574219, -8.142890930175781, -145.07498168945312, 29.167266845703125, 62.75581359863281, 75.81237030029297, -10.678239822387695, 199.90086364746094, -139.89675903320312, -18.28022003173828, 167.2208251953125, 178.89913940429688, 53.654632568359375, 174.547119140625, 27.228805541992188, 139.18011474609375, 11.947463989257812, 9.510810852050781, 196.17169189453125, 33.53467559814453, 1.28436279296875, -51.449745178222656, -217.80889892578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000505.npy"}
{"epoch": 0.763416477702192, "step": 506, "batch_size": 64, "mean": 55.080204010009766, "std": 101.44401550292969, "min": -238.42886352539062, "p10": -42.32557716369629, "median": 53.705970764160156, "p90": 177.66859130859376, "max": 232.64410400390625, "pos_frac": 0.734375, "sample": [40.00659942626953, 146.96249389648438, 5.015384674072266, -17.9281005859375, 166.8931121826172, 175.24208068847656, 70.19364929199219, -44.28835678100586, 154.27365112304688, -37.745758056640625, 2.952878952026367, 81.51888275146484, 48.242759704589844, 28.734169006347656, 79.08915710449219, 103.04275512695312, -177.6556396484375, 209.92640686035156, 230.650146484375, 14.487373352050781, 173.332763671875, -138.21035766601562, -7.6793365478515625, -238.42886352539062, -145.38128662109375, -177.24688720703125, 23.804229736328125, 187.9789581298828, 191.65719604492188, 177.1129150390625, 184.06976318359375, -25.687759399414062, 232.64410400390625, -17.74365234375, 177.90673828125, 32.99059295654297, 150.83973693847656, 74.44376373291016, 136.162109375, -27.329696655273438, -63.198028564453125, 6.9081878662109375, 7.155189514160156, 10.572914123535156, 97.4161376953125, 74.57957458496094, 0.11832046508789062, 41.997459411621094, 121.66474914550781, 60.65509796142578, 59.16918182373047, 149.62954711914062, -4.45513916015625, 142.25033569335938, 120.84129333496094, -1.8938369750976562, 1.3884201049804688, 119.07569122314453, -6.522300720214844, -6.449825286865234, 63.13738250732422, 142.94482421875, 101.014404296875, 42.284881591796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000506.npy"}
{"epoch": 0.764928193499622, "step": 507, "batch_size": 64, "mean": 58.5622673034668, "std": 104.04167175292969, "min": -207.12396240234375, "p10": -55.172533416748045, "median": 37.672515869140625, "p90": 203.21419525146487, "max": 295.44677734375, "pos_frac": 0.671875, "sample": [167.29876708984375, 1.589996337890625, -13.027999877929688, 29.670318603515625, 295.44677734375, -6.338554382324219, 233.05975341796875, -207.12396240234375, -81.3839111328125, 109.29145812988281, 77.34916687011719, 51.63175964355469, -51.09356689453125, 140.0614013671875, -56.92066192626953, -17.571044921875, 93.38539123535156, -14.488960266113281, 179.1466827392578, 26.275135040283203, 37.33746337890625, 38.007568359375, 172.85391235351562, -37.82099914550781, 190.24966430664062, -8.208902359008789, 65.52967071533203, 207.56256103515625, 88.84175109863281, 2.025726318359375, 46.17486572265625, 29.925567626953125, 247.79974365234375, -50.29546356201172, 188.58831787109375, 123.03524017333984, -49.28948974609375, 193.06800842285156, -82.61197662353516, -0.1366119384765625, -26.387237548828125, 10.7926025390625, -3.1472396850585938, 138.5831756591797, 123.99874877929688, 31.192642211914062, 39.013038635253906, 10.4361572265625, 285.0506591796875, 212.38861083984375, -65.1627197265625, 106.98568725585938, -40.19135284423828, 3.0487747192382812, 110.94256591796875, -9.895896911621094, 100.18212890625, -93.60464477539062, 58.289207458496094, -108.76771545410156, 14.1221923828125, 208.29959106445312, 107.06619262695312, 175.85548400878906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000507.npy"}
{"epoch": 0.7664399092970522, "step": 508, "batch_size": 64, "mean": 64.44580078125, "std": 98.19229125976562, "min": -166.16152954101562, "p10": -57.37180099487304, "median": 50.422203063964844, "p90": 194.81290283203126, "max": 228.3653564453125, "pos_frac": 0.71875, "sample": [158.10028076171875, 17.499114990234375, 139.14051818847656, 29.773422241210938, 30.396591186523438, -51.71783447265625, 219.142822265625, -2.0878677368164062, 130.89730834960938, 195.42874145507812, 226.8892059326172, 167.65921020507812, 41.05500793457031, 16.0885009765625, -17.437744140625, 129.002685546875, -166.16152954101562, 32.71685791015625, 32.27455139160156, -65.67951965332031, 26.545654296875, 11.457687377929688, 147.85726928710938, -59.79492950439453, -61.96471405029297, -147.5326385498047, 76.18679809570312, 0.5269184112548828, 228.3653564453125, 101.42269897460938, -26.124725341796875, 200.2639617919922, 82.73994445800781, 85.74869537353516, 144.388427734375, 69.35421752929688, 98.74162292480469, 61.571502685546875, 179.39117431640625, -18.66808319091797, -2.168731689453125, -15.282829284667969, 172.009521484375, 188.61318969726562, -6.2422027587890625, -91.65435791015625, 134.86163330078125, 162.71551513671875, 205.05706787109375, 4.071321487426758, 183.42462158203125, -1.09814453125, 55.05329513549805, -1.1185359954833984, 13.939361572265625, -138.2789764404297, -21.423782348632812, 187.931396484375, 193.37594604492188, 129.97552490234375, 54.29450988769531, 46.549896240234375, 8.443756103515625, 198.0255126953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000508.npy"}
{"epoch": 0.7679516250944822, "step": 509, "batch_size": 64, "mean": 56.46040344238281, "std": 91.68782043457031, "min": -143.32818603515625, "p10": -78.42787437438963, "median": 56.60816192626953, "p90": 175.47443237304688, "max": 212.80621337890625, "pos_frac": 0.71875, "sample": [62.44244384765625, -25.835041046142578, 66.66213989257812, 171.611083984375, 44.50282287597656, -0.8338298797607422, 8.066993713378906, -16.136980056762695, -95.31583404541016, 72.13241577148438, 15.640342712402344, 61.855499267578125, -88.28323364257812, 81.02587890625, 155.168701171875, -121.34010314941406, 92.65997314453125, 133.30569458007812, -17.449676513671875, 16.81658935546875, 148.70555114746094, -102.3478012084961, 31.727691650390625, -12.089736938476562, 212.80621337890625, 44.94218444824219, -24.28008270263672, 39.97085189819336, 70.8587646484375, -4.843597412109375, 198.6578369140625, 42.76198196411133, 179.95736694335938, 134.9028778076172, 52.7584228515625, -1.4841670989990234, 174.18850708007812, 89.42404174804688, 128.81475830078125, 176.02554321289062, 13.250999450683594, 0.9955596923828125, 203.68426513671875, -61.223873138427734, 106.80284118652344, 190.5476531982422, 49.97773742675781, 60.45790100097656, 199.16061401367188, 169.19049072265625, 171.29293823242188, -21.185867309570312, -31.614524841308594, -143.32818603515625, 8.948577880859375, 158.1922149658203, 105.10201263427734, -126.73651123046875, -85.80101776123047, 71.1999282836914, 7.351764678955078, 66.71515655517578, 161.17784118652344, 141.15408325195312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000509.npy"}
{"epoch": 0.7694633408919124, "step": 510, "batch_size": 64, "mean": 66.65662384033203, "std": 107.08175659179688, "min": -170.268798828125, "p10": -74.10163269042964, "median": 55.312110900878906, "p90": 210.16232299804688, "max": 274.9521179199219, "pos_frac": 0.78125, "sample": [58.154632568359375, 172.6317901611328, 171.78573608398438, 155.6089630126953, 3.93121337890625, 16.261383056640625, 13.373210906982422, 207.92654418945312, 0.35186004638671875, 1.280181884765625, -28.108238220214844, 24.69623565673828, 35.44088363647461, 41.66041564941406, -138.94921875, -8.015716552734375, 5.8512115478515625, 53.636322021484375, -6.669281005859375, 17.13377571105957, 63.01844787597656, 176.59243774414062, 197.59544372558594, -93.8130874633789, 5.83074951171875, 175.23507690429688, 56.662681579589844, -20.061172485351562, -0.21033096313476562, 152.39340209960938, 48.02433776855469, 125.61457824707031, 236.24981689453125, 77.66828155517578, 8.62839126586914, 145.44851684570312, 99.24507904052734, 227.06622314453125, 62.584869384765625, -17.4891357421875, 10.352750778198242, -158.79498291015625, 175.29617309570312, 62.402828216552734, 215.3359375, 21.60802459716797, 144.13861083984375, 218.93069458007812, 268.8664245605469, -94.89837646484375, 53.96154022216797, 72.35871887207031, 177.4166259765625, 158.77105712890625, -116.9111328125, 0.18556594848632812, 192.26028442382812, 211.12051391601562, -4.789161682128906, 91.00255584716797, 274.9521179199219, 73.20126342773438, -134.7418975830078, -170.268798828125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000510.npy"}
{"epoch": 0.7709750566893424, "step": 511, "batch_size": 64, "mean": 77.87153625488281, "std": 97.55705261230469, "min": -131.60061645507812, "p10": -48.72259979248046, "median": 74.14213943481445, "p90": 196.09718170166016, "max": 224.64163208007812, "pos_frac": 0.78125, "sample": [202.23675537109375, 188.94569396972656, 75.73983764648438, 186.0629425048828, 14.286014556884766, -11.472061157226562, 41.40031814575195, -0.8564682006835938, 168.64710998535156, 193.44503784179688, 89.93019104003906, -53.395240783691406, 12.311141967773438, 45.03545379638672, 181.778076171875, 209.4848175048828, 155.2729034423828, -37.81977081298828, 138.7137908935547, 110.92304992675781, 181.27842712402344, -66.5048828125, 72.54444122314453, -96.6003646850586, -110.30674743652344, 7.881378173828125, 196.99227905273438, 135.0868682861328, 121.4055404663086, 28.76934814453125, 27.418611526489258, 139.86367797851562, 24.032440185546875, 48.78271484375, 171.42117309570312, 118.42601013183594, 169.5511474609375, 32.11429977416992, -16.326171875, 16.450925827026367, 34.97767639160156, 96.65623474121094, 178.2857208251953, 208.9008331298828, 224.64163208007812, 151.86880493164062, -130.64456176757812, 44.75445556640625, -15.344865798950195, 181.99319458007812, 41.25208282470703, 0.7271804809570312, 130.74258422851562, 163.96539306640625, 9.767303466796875, 39.535499572753906, -95.23722839355469, 214.7381591796875, 196.4058380126953, 150.36444091796875, 195.37698364257812, -3.293079376220703, -131.60061645507812, -18.00570297241211], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000511.npy"}
{"epoch": 0.7724867724867724, "step": 512, "batch_size": 64, "mean": 52.716392517089844, "std": 98.70779418945312, "min": -230.65496826171875, "p10": -76.38739051818845, "median": 36.614501953125, "p90": 176.5593200683594, "max": 216.414306640625, "pos_frac": 0.75, "sample": [157.4920654296875, 52.415679931640625, 32.238914489746094, -151.880126953125, -86.0887451171875, 180.41217041015625, 33.352352142333984, 26.914838790893555, 46.222511291503906, 134.0636749267578, 155.6046905517578, 191.6316680908203, -230.65496826171875, -53.75089645385742, 12.37896728515625, 3.1717529296875, -18.82579803466797, 1.6544303894042969, 160.99588012695312, 3.3277149200439453, 188.96051025390625, 173.9042205810547, 12.656578063964844, 151.68594360351562, -20.748733520507812, 149.29168701171875, -90.41860961914062, 168.13507080078125, 1.0184249877929688, 163.35638427734375, 2.418231964111328, 123.64981079101562, 132.0321807861328, 5.311727523803711, 143.144287109375, 20.410308837890625, 4.998085021972656, 99.76007080078125, 0.41539955139160156, 182.65069580078125, -5.2922210693359375, -145.6453857421875, -7.932229995727539, -28.405410766601562, 105.15235900878906, 195.78773498535156, 43.28571319580078, 31.096302032470703, 37.80535888671875, -0.5461845397949219, 170.43434143066406, -3.2454452514648438, 35.42364501953125, -126.75299072265625, 70.97706604003906, 128.77284240722656, 216.414306640625, 92.99836730957031, 177.6972198486328, 149.3218994140625, 39.15734100341797, -41.874732971191406, 76.58079528808594, -100.67069244384766], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000512.npy"}
{"epoch": 0.7739984882842026, "step": 513, "batch_size": 64, "mean": 64.99451446533203, "std": 84.64938354492188, "min": -152.385986328125, "p10": -19.566102218627922, "median": 52.24606704711914, "p90": 189.37776641845704, "max": 257.7676696777344, "pos_frac": 0.796875, "sample": [252.20718383789062, 135.1299285888672, 179.0109100341797, 115.20487976074219, 121.88105010986328, 66.69424438476562, 140.81204223632812, 0.5743980407714844, -0.7073402404785156, -2.5073394775390625, 57.002403259277344, -88.28250122070312, 162.60659790039062, 105.0429916381836, 92.3770751953125, 91.42948150634766, -22.224185943603516, 13.280448913574219, 188.7167205810547, 22.76451873779297, 47.48973083496094, 198.01072692871094, 123.84436798095703, 75.93876647949219, 197.16976928710938, 3.3449783325195312, -13.363906860351562, 107.97061157226562, 36.10453796386719, 90.92373657226562, 18.297874450683594, 65.83401489257812, 129.67050170898438, -152.385986328125, 45.04370880126953, 14.359216690063477, -47.03038787841797, 157.39312744140625, 257.7676696777344, 12.993671417236328, 0.7687873840332031, -1.7139644622802734, 214.50149536132812, 4.403900146484375, 122.95855712890625, 189.66107177734375, 202.36251831054688, -35.508689880371094, 153.970947265625, 87.27998352050781, 64.4914779663086, -1.906585693359375, 36.842750549316406, 72.69841003417969, -9.90374755859375, -67.7965087890625, 31.27521514892578, 11.808147430419922, 10.549144744873047, 25.65382194519043, 81.37092590332031, -47.405662536621094, 8.952102661132812, 3.9445419311523438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000513.npy"}
{"epoch": 0.7755102040816326, "step": 514, "batch_size": 64, "mean": 72.72770690917969, "std": 97.65734100341797, "min": -198.40237426757812, "p10": -51.15133132934569, "median": 87.0160140991211, "p90": 181.63469238281252, "max": 296.33294677734375, "pos_frac": 0.765625, "sample": [5.419492721557617, 129.07017517089844, 107.51473999023438, 105.34822082519531, 104.86625671386719, 82.9408187866211, 145.69723510742188, -17.24542236328125, 59.18153381347656, -17.436267852783203, 37.632694244384766, 187.59129333496094, 128.38218688964844, 99.47860717773438, -13.922632217407227, 67.35041046142578, -84.65740966796875, 125.51377868652344, 9.332733154296875, 70.30315399169922, -178.32305908203125, 1.2020339965820312, 175.7477264404297, 145.58023071289062, 196.33737182617188, 6.4244537353515625, 57.9224853515625, 150.31150817871094, 120.43647766113281, 205.87677001953125, -58.068206787109375, -64.86824798583984, 160.38465881347656, 175.82748413085938, 28.834205627441406, 91.0912094116211, 76.8553466796875, 165.93011474609375, -8.07000732421875, 113.67842102050781, 46.420310974121094, -81.84873962402344, -27.295452117919922, 12.721771240234375, 159.09925842285156, 100.99100494384766, 168.36074829101562, -24.63134765625, 15.908892631530762, 104.23909759521484, -36.406166076660156, 37.93984603881836, -1.2501754760742188, 166.68283081054688, -198.40237426757812, 118.67542266845703, 197.58360290527344, 166.41726684570312, 230.593994140625, 3.3631343841552734, 296.33294677734375, 183.27996826171875, -57.47068786621094, 177.79571533203125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000514.npy"}
{"epoch": 0.7770219198790628, "step": 515, "batch_size": 64, "mean": 57.247528076171875, "std": 96.8034439086914, "min": -187.00381469726562, "p10": -77.38834152221679, "median": 57.022884368896484, "p90": 189.15560150146487, "max": 208.9429473876953, "pos_frac": 0.703125, "sample": [-13.536205291748047, 119.86327362060547, 22.08959197998047, 50.38495635986328, 207.3029327392578, 160.36709594726562, 19.336971282958984, 180.0474853515625, 5.742279052734375, 183.00338745117188, 80.14319610595703, 48.61878204345703, 99.54508209228516, -7.632455825805664, 121.29052734375, 61.85455322265625, -107.6329345703125, 190.6766815185547, -68.61766815185547, -128.36129760742188, 73.09281158447266, -1.0667343139648438, -30.307052612304688, 32.41026306152344, -187.00381469726562, 114.36872863769531, 185.60641479492188, 61.28724670410156, -21.116111755371094, -31.28313446044922, 135.37860107421875, 153.1016082763672, 0.2532844543457031, -1.740316390991211, 34.6275634765625, 166.71463012695312, -92.75527954101562, 157.22805786132812, -10.500402450561523, 101.74790954589844, -89.94725036621094, 84.13380432128906, 202.8874053955078, -128.62249755859375, 69.84324645996094, 199.04942321777344, 19.77243995666504, -20.01125144958496, 66.29979705810547, 184.4727783203125, 29.70287322998047, 121.23925018310547, 105.92337036132812, 199.2349395751953, 208.9429473876953, -81.14720153808594, 132.08827209472656, 3.9294166564941406, 198.57119750976562, -42.765953063964844, 52.758522033691406, 77.701904296875, -2.4060230255126953, 7.659538269042969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000515.npy"}
{"epoch": 0.7785336356764928, "step": 516, "batch_size": 64, "mean": 56.167274475097656, "std": 96.82243347167969, "min": -206.63595581054688, "p10": -56.7441047668457, "median": 64.10639190673828, "p90": 167.44957733154297, "max": 207.74949645996094, "pos_frac": 0.75, "sample": [162.7247772216797, 195.31727600097656, 148.6984100341797, -84.22798156738281, -56.953826904296875, 16.36191177368164, -46.296226501464844, 139.3307647705078, 167.5951690673828, 120.00199890136719, 41.49352264404297, 23.8718204498291, 109.2120361328125, 204.9944610595703, 1.5930290222167969, 10.296953201293945, 115.3292236328125, 163.0757598876953, 207.74949645996094, 152.08876037597656, -56.25475311279297, -187.77597045898438, 72.61347961425781, 99.49116516113281, 165.13534545898438, 65.08787536621094, 151.01748657226562, 6.4091033935546875, 155.10877990722656, 122.04460906982422, -206.63595581054688, 8.584243774414062, 0.6762180328369141, -39.18663787841797, -7.152595520019531, 173.44622802734375, 167.10986328125, -3.928243637084961, 3.431568145751953, 9.511890411376953, 122.40594482421875, -163.94479370117188, 98.15441131591797, 184.74368286132812, 63.124908447265625, 32.03924560546875, 163.41029357910156, 49.731563568115234, 76.96070861816406, 90.90422058105469, 5.735832214355469, 140.8031005859375, 8.12451171875, -55.77079772949219, -84.04995727539062, 110.78791809082031, -39.12644577026367, 101.00428771972656, -5.808113098144531, 11.065105438232422, 79.40815734863281, -1.9318656921386719, -58.79998016357422, 174.74244689941406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000516.npy"}
{"epoch": 0.780045351473923, "step": 517, "batch_size": 64, "mean": 39.611785888671875, "std": 90.68933868408203, "min": -120.00444030761719, "p10": -85.86691131591797, "median": 22.050386428833008, "p90": 178.5036163330078, "max": 226.83984375, "pos_frac": 0.671875, "sample": [14.96639633178711, -7.2400665283203125, -12.273914337158203, 103.73619079589844, -65.77362823486328, 20.573902130126953, -42.31562805175781, 80.12295532226562, 203.45816040039062, 23.644290924072266, 12.484725952148438, -19.75518798828125, -71.28530883789062, 0.9633636474609375, 31.785934448242188, -67.5058822631836, -1.753732681274414, -1.7887191772460938, 57.573585510253906, 8.550132751464844, 115.36135864257812, -120.00444030761719, 178.83132934570312, 163.83642578125, -104.24966430664062, 52.393882751464844, -8.56243896484375, 89.7617416381836, 39.476348876953125, 159.40130615234375, 77.38265228271484, 162.09825134277344, 7.451234817504883, 46.872093200683594, -85.883056640625, -96.2502212524414, 16.42365264892578, 140.34494018554688, -44.294891357421875, -113.51714324951172, 81.5623550415039, 186.47817993164062, 226.83984375, 43.621341705322266, 96.32481384277344, 145.12930297851562, 10.650516510009766, 84.88456726074219, -99.17994689941406, -28.708763122558594, -85.82923889160156, 91.12887573242188, 11.688003540039062, 177.73895263671875, 211.21585083007812, 181.41961669921875, 48.86012268066406, -7.835819244384766, 23.526870727539062, 14.936527252197266, 179.4180450439453, -107.26924133300781, 99.59129333496094, 3.92120361328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000517.npy"}
{"epoch": 0.781557067271353, "step": 518, "batch_size": 64, "mean": 57.238216400146484, "std": 91.81625366210938, "min": -151.10865783691406, "p10": -55.441747665405266, "median": 33.31150436401367, "p90": 188.3818603515625, "max": 255.0008544921875, "pos_frac": 0.71875, "sample": [94.58517456054688, -2.227752685546875, -1.1986160278320312, 190.5658721923828, -14.322341918945312, 4.787651062011719, 13.442298889160156, 135.43063354492188, -49.61978530883789, 26.100791931152344, 61.15482711791992, -67.84542846679688, 36.15941619873047, 83.21495056152344, 113.39241790771484, -57.93687438964844, 37.03678894042969, 116.53356170654297, -76.15354919433594, 166.42156982421875, 253.5655517578125, -151.10865783691406, 1.0437698364257812, 97.51216125488281, 153.86195373535156, 153.6486358642578, -61.24665832519531, 139.96292114257812, -69.79263305664062, 126.8584213256836, 6.571849822998047, -42.47785186767578, 2.628164291381836, 94.79042053222656, 30.463592529296875, -3.22723388671875, 3.453777313232422, -4.2431793212890625, 81.16728210449219, 240.99012756347656, 17.379714965820312, 62.463478088378906, 0.7480792999267578, 196.51275634765625, 128.02711486816406, 255.0008544921875, -64.52243041992188, 158.83096313476562, 188.98358154296875, -47.32170867919922, -8.543024063110352, -13.786661148071289, 186.97784423828125, 0.3739433288574219, 46.605369567871094, 162.95513916015625, 121.055908203125, 62.94175720214844, -3.6984195709228516, 0.8850784301757812, 3.4721145629882812, 4.384763717651367, 215.96713256835938, 123.60845947265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000518.npy"}
{"epoch": 0.783068783068783, "step": 519, "batch_size": 64, "mean": 34.670284271240234, "std": 116.65364074707031, "min": -271.37799072265625, "p10": -112.33962554931641, "median": 30.78374481201172, "p90": 183.32825775146483, "max": 213.61492919921875, "pos_frac": 0.640625, "sample": [-140.13330078125, 177.8987274169922, 65.50921630859375, -190.2329559326172, -44.913211822509766, 179.03269958496094, -5.419017791748047, 213.61492919921875, 28.413558959960938, 164.30242919921875, 34.189239501953125, 25.67527198791504, 157.9971923828125, -26.245811462402344, 64.58216094970703, 92.86125183105469, 179.96812438964844, 22.93717384338379, -160.03750610351562, -34.148223876953125, 173.1637725830078, 170.0882110595703, -175.75880432128906, 200.1105194091797, 120.04817199707031, 186.9931182861328, -72.47496032714844, 5.411064147949219, 73.68280029296875, 84.78225708007812, -57.410667419433594, 156.4418182373047, 26.813278198242188, -80.33425903320312, 90.78591918945312, -54.199554443359375, -61.98683166503906, 1.3888359069824219, 204.72450256347656, -108.66326904296875, 107.45811462402344, 4.314109802246094, 166.66622924804688, 74.16638946533203, 43.87555694580078, -271.37799072265625, 58.86822509765625, -44.81361389160156, 33.1539306640625, -119.27857208251953, 183.27366638183594, -52.195899963378906, -113.91520690917969, 209.89645385742188, 44.90742492675781, 16.73261260986328, 200.5979766845703, 105.31319427490234, 2.590312957763672, -106.645263671875, -55.79383850097656, -44.14928436279297, 183.35165405273438, -97.55609130859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000519.npy"}
{"epoch": 0.7845804988662132, "step": 520, "batch_size": 64, "mean": 49.06908416748047, "std": 90.05728149414062, "min": -138.9610137939453, "p10": -54.14327316284179, "median": 32.18201732635498, "p90": 173.48725738525394, "max": 268.41168212890625, "pos_frac": 0.734375, "sample": [138.4800567626953, -28.010705947875977, -2.645122528076172, -9.628170013427734, 176.0036163330078, 2.019866943359375, 83.99041748046875, 6.418754577636719, 0.7209968566894531, 268.41168212890625, 151.92916870117188, 9.207910537719727, -1.275665283203125, -57.637046813964844, 61.58087158203125, 73.57003021240234, 4.423149108886719, -45.99113464355469, 7.574481964111328, 167.61575317382812, 236.04928588867188, 83.6392822265625, 48.922157287597656, 187.30104064941406, 68.77926635742188, 62.99982452392578, 24.697799682617188, -89.50917053222656, -104.17699432373047, 145.34271240234375, 58.44200134277344, -0.41809844970703125, 7.717987060546875, 21.187782287597656, 3.7361297607421875, -0.7514076232910156, 184.05804443359375, 103.36824798583984, 6.9682769775390625, -93.39722442626953, -137.6626434326172, 34.16709899902344, 55.12835693359375, -130.0769805908203, 76.79608154296875, 55.691505432128906, 23.794143676757812, 131.13058471679688, 110.170166015625, 26.373016357421875, 44.74061584472656, 188.7634735107422, 155.75682067871094, 49.65251922607422, 224.9718780517578, 30.196935653686523, 71.01742553710938, 1.1728134155273438, -7.527103424072266, -0.2305755615234375, -3.3683242797851562, 156.81227111816406, 160.19650268554688, -138.9610137939453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000520.npy"}
{"epoch": 0.7860922146636432, "step": 521, "batch_size": 64, "mean": 72.15047454833984, "std": 92.06861114501953, "min": -193.99191284179688, "p10": -15.871900939941401, "median": 73.9292984008789, "p90": 176.37014465332032, "max": 288.0942077636719, "pos_frac": 0.75, "sample": [164.52651977539062, 43.06449890136719, 153.8922576904297, 54.522979736328125, 170.23927307128906, 130.04849243164062, 171.15655517578125, -4.690589904785156, -9.365615844726562, 62.112884521484375, 68.23193359375, -27.428020477294922, 152.12173461914062, 71.77440643310547, 142.17930603027344, 111.00164031982422, 19.02256965637207, 20.412208557128906, -1.4925384521484375, 177.37060546875, -6.293861389160156, 172.17724609375, 32.71929931640625, 11.0447998046875, -7.558977127075195, 133.12466430664062, -97.24790954589844, 176.51797485351562, -10.145126342773438, 176.02520751953125, 219.6994171142578, 288.0942077636719, -121.51953887939453, 20.351070404052734, 0.8995895385742188, -22.041160583496094, -1.7959365844726562, 107.58488464355469, 2.3035736083984375, 212.63128662109375, 206.29750061035156, 18.406917572021484, -90.78430938720703, 81.65579986572266, -2.752592086791992, 11.624799728393555, 153.36654663085938, 102.56050109863281, 2.7640724182128906, 110.50990295410156, 151.39451599121094, 99.1422348022461, -193.99191284179688, 233.40109252929688, 104.80838012695312, 87.67964172363281, 55.14007568359375, -18.32623291015625, 87.11734008789062, -8.2752685546875, 120.57211303710938, 76.08419036865234, 166.45196533203125, 107.5113525390625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000521.npy"}
{"epoch": 0.7876039304610734, "step": 522, "batch_size": 64, "mean": 47.63689422607422, "std": 109.89645385742188, "min": -220.58340454101562, "p10": -83.59985427856444, "median": 38.93418502807617, "p90": 195.94884033203127, "max": 239.4177703857422, "pos_frac": 0.65625, "sample": [-43.82757568359375, 81.0060043334961, 4.450616836547852, 69.33332061767578, 91.32930755615234, 178.94923400878906, 44.37616729736328, 214.3980712890625, 167.58609008789062, 79.3462905883789, -159.3767547607422, -127.070068359375, 43.76689147949219, 196.6910400390625, 38.945892333984375, 169.1929931640625, -57.14684295654297, 78.30523681640625, 143.8832244873047, 15.110960006713867, 194.217041015625, 0.2959327697753906, 59.730552673339844, -220.58340454101562, 237.15945434570312, 2.0191268920898438, -1.5793876647949219, -71.7165756225586, -216.17605590820312, 10.725557327270508, -23.843910217285156, 188.72604370117188, -35.83793640136719, 204.9510955810547, 145.8474884033203, 89.02479553222656, -8.713661193847656, 40.00700378417969, -3.4326438903808594, 38.50244903564453, 38.92247772216797, -6.371795654296875, 7.811012268066406, -18.519569396972656, 116.345703125, -0.5020370483398438, 119.55218505859375, -182.53587341308594, 196.98219299316406, -51.10652160644531, -5.378654479980469, -92.71681213378906, 154.549560546875, 239.4177703857422, -8.235870361328125, 10.543235778808594, -88.69268798828125, 225.78871154785156, 143.53250122070312, 133.17047119140625, 10.603158950805664, -19.506649017333984, 103.22557067871094, 163.31027221679688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000522.npy"}
{"epoch": 0.7891156462585034, "step": 523, "batch_size": 64, "mean": 54.623382568359375, "std": 102.03408813476562, "min": -254.09027099609375, "p10": -94.06837310791013, "median": 73.87549209594727, "p90": 159.44361114501953, "max": 208.7760467529297, "pos_frac": 0.78125, "sample": [24.752140045166016, 198.98410034179688, 1.3960914611816406, -9.721839904785156, 122.5297622680664, 89.7955551147461, 64.23323059082031, 33.37718963623047, 95.8958740234375, 39.03141784667969, 0.03151893615722656, 77.15104675292969, 198.64932250976562, -130.31195068359375, 70.60770416259766, 77.14328002929688, 116.66110229492188, -6.061346054077148, 64.22834777832031, -162.1580810546875, 12.45697021484375, 174.42425537109375, -45.35154724121094, 103.2078857421875, -111.66738891601562, 159.70240783691406, -66.70154571533203, 25.708126068115234, 1.0538711547851562, 148.2889404296875, 18.864538192749023, 125.93635559082031, 132.7841796875, 116.59723663330078, 158.83975219726562, 152.87026977539062, 165.52821350097656, 0.02288818359375, 19.171165466308594, -254.09027099609375, 79.61154174804688, -147.8929443359375, 208.47552490234375, 101.2867431640625, 157.07977294921875, 2.5294113159179688, 28.03030776977539, 110.89590454101562, 137.46060180664062, -79.0008773803711, 208.7760467529297, 63.1900749206543, 151.06419372558594, -100.52587127685547, 146.0338134765625, 157.29495239257812, 92.35003662109375, 153.60208129882812, -6.578590393066406, -172.9392852783203, -48.91246795654297, 104.88980102539062, 16.2889404296875, 129.02606201171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000523.npy"}
{"epoch": 0.7906273620559335, "step": 524, "batch_size": 64, "mean": 57.72763442993164, "std": 88.462646484375, "min": -182.08389282226562, "p10": -46.358073425292964, "median": 52.84110641479492, "p90": 174.8596923828125, "max": 255.08062744140625, "pos_frac": 0.796875, "sample": [158.6986846923828, 27.445236206054688, 204.2188720703125, 66.0120849609375, -16.057281494140625, -18.263938903808594, -71.36125946044922, -40.620361328125, 104.53572082519531, 75.06005859375, 189.1934814453125, 0.7559280395507812, -65.06697082519531, 73.39328002929688, 42.970611572265625, 200.878662109375, -159.76698303222656, -48.81709289550781, 11.985965728759766, 168.81060791015625, 169.9180145263672, -182.08389282226562, 145.1306915283203, -1.0773086547851562, 13.836761474609375, 25.212539672851562, 177.2720947265625, 97.89189910888672, 102.88773345947266, 5.244049072265625, -8.898216247558594, 90.9626693725586, 31.626541137695312, 99.48670196533203, 87.34274291992188, 59.80514144897461, 38.74268341064453, 154.1271514892578, 24.426132202148438, -72.5940170288086, 53.398651123046875, 4.4307403564453125, 4.542732238769531, 255.08062744140625, 40.149070739746094, 126.31305694580078, 170.3507080078125, 4.814487457275391, 53.507171630859375, 42.554656982421875, 102.14427947998047, 21.762741088867188, 52.28356170654297, 190.3188018798828, 69.41456604003906, 124.98339080810547, 176.7921142578125, 88.91121673583984, -124.2830810546875, 24.45197296142578, 153.41677856445312, -3.6294326782226562, 92.71659851074219, 6.877593994140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000524.npy"}
{"epoch": 0.7921390778533636, "step": 525, "batch_size": 64, "mean": 49.222312927246094, "std": 96.37015533447266, "min": -193.23712158203125, "p10": -57.366596984863264, "median": 29.79552459716797, "p90": 177.50227813720704, "max": 255.0984649658203, "pos_frac": 0.609375, "sample": [-18.012420654296875, -176.96524047851562, -8.140899658203125, -34.313472747802734, -111.94075775146484, 45.365386962890625, 14.7689208984375, 56.883453369140625, 215.52197265625, -6.632987976074219, 139.1442413330078, -1.8187713623046875, 130.91592407226562, 26.849029541015625, 127.96664428710938, 207.5323486328125, 101.74009704589844, -28.073204040527344, 152.6776580810547, -38.016632080078125, 101.11528778076172, -3.173053741455078, 205.38616943359375, 166.3283233642578, -6.871070861816406, 92.1783447265625, -17.592926025390625, 99.51499938964844, 47.622596740722656, 110.62648010253906, -7.590545654296875, 175.60659790039062, 32.74201965332031, 102.7364501953125, 81.89703369140625, -22.437942504882812, 43.222171783447266, 9.049911499023438, 255.0984649658203, -23.17247200012207, -65.65943908691406, 47.834877014160156, 1.0773963928222656, -193.23712158203125, -11.526580810546875, 178.31471252441406, -1.1147575378417969, 26.46051788330078, 18.74713134765625, 157.7182159423828, -83.75724029541016, 182.607421875, 16.475379943847656, -26.449447631835938, 151.70925903320312, 218.25015258789062, -74.9365234375, 79.72479248046875, 113.06765747070312, 116.7479248046875, -9.235687255859375, 166.44248962402344, -2.6615962982177734, -94.10948181152344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000525.npy"}
{"epoch": 0.7936507936507936, "step": 526, "batch_size": 64, "mean": 41.36842727661133, "std": 91.8499984741211, "min": -179.96597290039062, "p10": -73.12899932861326, "median": 44.05366516113281, "p90": 157.3151977539063, "max": 237.60952758789062, "pos_frac": 0.703125, "sample": [198.82875061035156, 11.256420135498047, -41.123878479003906, 3.4878597259521484, -86.57921600341797, 7.4698638916015625, 1.0375099182128906, 228.99769592285156, -133.6632843017578, -145.6736602783203, 66.6474609375, 72.0032958984375, -135.48538208007812, 20.055877685546875, 141.29225158691406, -34.050514221191406, 73.8349609375, 147.46051025390625, 23.71363067626953, 135.3463592529297, 56.654693603515625, 190.51930236816406, 25.9476318359375, 131.00259399414062, 176.7474365234375, 7.517185211181641, 40.774879455566406, 124.8600845336914, -58.22406005859375, 93.200927734375, -4.351388931274414, 76.4977798461914, 47.33245086669922, -15.811227798461914, 161.53863525390625, -2.978609085083008, 101.95842742919922, 10.669502258300781, 36.903900146484375, -2.6950302124023438, 82.29387664794922, 24.095294952392578, -30.736282348632812, 119.21550750732422, 53.44838333129883, -172.77947998046875, -0.4015464782714844, 139.40460205078125, 110.66514587402344, -17.309112548828125, 68.12850952148438, -21.82204818725586, 237.60952758789062, 62.63576126098633, 180.4976043701172, 6.711639404296875, -79.51683044433594, 68.99986267089844, 56.905982971191406, 58.58721923828125, 89.32200622558594, -30.30126953125, 68.96917724609375, -179.96597290039062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000526.npy"}
{"epoch": 0.7951625094482238, "step": 527, "batch_size": 64, "mean": 36.05781173706055, "std": 98.7869873046875, "min": -195.8603515625, "p10": -82.04953536987303, "median": 15.017051696777344, "p90": 176.75734558105472, "max": 252.60333251953125, "pos_frac": 0.625, "sample": [-0.3734912872314453, 26.672088623046875, 30.99748992919922, 97.73121643066406, 14.477371215820312, 2.15570068359375, 213.68057250976562, 80.168701171875, 159.96121215820312, 252.60333251953125, 163.85159301757812, -44.245880126953125, 169.10406494140625, -1.7672195434570312, 26.994903564453125, -90.82807159423828, 202.42816162109375, 0.8490543365478516, 94.82815551757812, -61.5662841796875, -127.23418426513672, -38.40937042236328, -35.90019226074219, -93.16044616699219, 93.11407470703125, 12.094268798828125, 217.16641235351562, 34.03935241699219, 9.927471160888672, 11.04534912109375, 86.85906982421875, 24.452743530273438, -46.33409118652344, 43.25015640258789, 31.26114273071289, 188.24362182617188, 52.544517517089844, -14.749786376953125, -12.858243942260742, 159.17764282226562, 77.2362289428711, 7.218799591064453, 160.59637451171875, -1.6415290832519531, -25.690582275390625, -8.722808837890625, -125.2208251953125, 180.03732299804688, -6.2592315673828125, -3.3052234649658203, -19.72747802734375, 0.9605007171630859, 92.77676391601562, 126.16666412353516, -195.8603515625, -39.212974548339844, 103.92555236816406, -9.125816345214844, 101.41392517089844, 190.66009521484375, -165.88861083984375, -179.23561096191406, 98.78988647460938, 15.556732177734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000527.npy"}
{"epoch": 0.7966742252456538, "step": 528, "batch_size": 64, "mean": 40.661521911621094, "std": 105.47079467773438, "min": -210.81744384765625, "p10": -63.1897331237793, "median": 22.18391704559326, "p90": 181.79112701416017, "max": 252.696044921875, "pos_frac": 0.703125, "sample": [147.22056579589844, -210.81744384765625, 194.14718627929688, -8.84050178527832, 56.09862518310547, 128.45223999023438, 73.59059143066406, -7.640472412109375, 19.571914672851562, -2.989593505859375, -1.2599334716796875, -50.41340637207031, 182.8569793701172, -174.02304077148438, 2.0878849029541016, -0.8107452392578125, 164.14073181152344, 11.070699691772461, 16.255247116088867, -63.272613525390625, -51.85649871826172, 252.696044921875, -0.8462448120117188, -180.56106567382812, -9.919151306152344, 24.27686309814453, -62.99634552001953, -40.314414978027344, 26.795682907104492, 168.1927490234375, 81.19517517089844, 15.316650390625, 31.29348373413086, 208.51116943359375, 113.16957092285156, 10.140348434448242, 201.4268341064453, 2.6645736694335938, 58.54866027832031, 94.07360076904297, 179.30413818359375, 191.52085876464844, 24.808530807495117, 20.090970993041992, -171.3413848876953, 116.24211120605469, 10.954517364501953, 7.468353271484375, 144.7105712890625, 1.0450057983398438, 48.90696716308594, -165.17355346679688, -172.79759216308594, 247.1322021484375, 119.97195434570312, 55.16289138793945, 4.286869049072266, -6.558341979980469, 169.39801025390625, 3.6473941802978516, 130.37567138671875, 61.11643981933594, 128.7667236328125, 36.06550598144531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000528.npy"}
{"epoch": 0.7981859410430839, "step": 529, "batch_size": 64, "mean": 88.30415344238281, "std": 90.65596008300781, "min": -168.6039581298828, "p10": -11.860997009277343, "median": 99.2132568359375, "p90": 194.9896224975586, "max": 226.83743286132812, "pos_frac": 0.828125, "sample": [102.84588623046875, -49.68790817260742, 111.75718688964844, -1.6534652709960938, 57.560306549072266, 2.0443267822265625, 226.83743286132812, 191.35411071777344, -33.893733978271484, 12.792823791503906, 8.80421257019043, 0.6160793304443359, 19.64300537109375, 89.56324005126953, 152.68783569335938, 130.05023193359375, -64.22847747802734, 161.27195739746094, -10.049591064453125, 160.62078857421875, 199.2468719482422, -12.637313842773438, -9.898162841796875, 175.69976806640625, 93.3486099243164, 174.6270294189453, -28.126853942871094, 33.38671112060547, 167.9380645751953, 35.1553955078125, 202.53900146484375, 149.72531127929688, 110.83860778808594, 183.79071044921875, 96.00582885742188, 108.94751739501953, 175.68075561523438, 182.73165893554688, 13.541179656982422, 212.370361328125, 197.62838745117188, 183.33868408203125, 216.6621551513672, 24.108474731445312, 154.23397827148438, 176.16940307617188, 60.16798400878906, -2.0522422790527344, 110.36949920654297, 34.956451416015625, 47.42536926269531, 98.09650421142578, 50.04889678955078, 191.1678466796875, -119.41462707519531, 196.54769897460938, 100.33000946044922, 184.3309326171875, 118.82145690917969, 63.48875427246094, -168.6039581298828, 9.541206359863281, 6.057472229003906, 184.19784545898438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000529.npy"}
{"epoch": 0.799697656840514, "step": 530, "batch_size": 64, "mean": 51.91332244873047, "std": 90.5394287109375, "min": -160.37405395507812, "p10": -24.76815700531006, "median": 29.80315399169922, "p90": 178.2909652709961, "max": 251.56307983398438, "pos_frac": 0.734375, "sample": [230.92626953125, 73.11944580078125, 12.45524787902832, 3.846912384033203, 7.179573059082031, 3.0684814453125, 221.48391723632812, -9.461494445800781, 156.4914093017578, 151.4831085205078, 15.823810577392578, 197.9957275390625, -25.526742935180664, 35.843353271484375, 105.30891418457031, 57.31035614013672, 78.96501159667969, 96.94735717773438, 164.2494354248047, 15.931526184082031, 0.9195823669433594, -17.446426391601562, -160.37405395507812, 160.89691162109375, 125.04411315917969, -0.08946800231933594, 72.99714660644531, 70.71235656738281, 17.889877319335938, 179.83416748046875, 20.39263153076172, 11.945762634277344, 211.49581909179688, 251.56307983398438, 25.742263793945312, 46.342491149902344, 53.5263557434082, 93.28030395507812, -84.67388916015625, 6.793148040771484, -15.139617919921875, 62.63349914550781, -12.752769470214844, 46.746795654296875, -90.75955963134766, -20.49022674560547, 177.1184844970703, -14.311210632324219, -120.84144592285156, -107.87594604492188, -5.928354263305664, -85.11160278320312, 178.79345703125, -5.785182952880859, 169.3151092529297, 97.57640075683594, 130.34365844726562, -22.998123168945312, 3.9112548828125, 5.3783721923828125, 169.3338165283203, 66.80883026123047, 2.3892784118652344, 33.864044189453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000530.npy"}
{"epoch": 0.8012093726379441, "step": 531, "batch_size": 64, "mean": 55.718894958496094, "std": 89.81649780273438, "min": -190.6752166748047, "p10": -41.974523544311516, "median": 42.820919036865234, "p90": 185.1836624145508, "max": 214.54443359375, "pos_frac": 0.703125, "sample": [-45.70835876464844, 114.4091796875, 201.12286376953125, 114.49797058105469, 52.034820556640625, 85.22381591796875, 4.11029052734375, 211.51150512695312, 40.190147399902344, -8.73818588256836, 58.42566680908203, 1.0047893524169922, 214.54443359375, -1.0394439697265625, 126.79098510742188, 170.04656982421875, -135.41842651367188, -62.126686096191406, 18.88141632080078, 180.08389282226562, 59.12181091308594, 208.9759063720703, -66.32891082763672, -9.190361022949219, -29.27850341796875, 16.68131446838379, 86.46672821044922, 74.60285949707031, 9.425277709960938, 1.3597564697265625, 60.551353454589844, -7.671060562133789, -2.0109310150146484, 17.745651245117188, -190.6752166748047, -3.742938995361328, 90.63043212890625, 151.129638671875, 5.195945739746094, -19.851242065429688, 53.68193054199219, 79.81099700927734, 178.62376403808594, 177.90493774414062, -61.88563537597656, -52.032432556152344, 187.36927795410156, 171.98171997070312, 93.70106506347656, 31.265777587890625, -12.353843688964844, 19.861705780029297, 91.30055236816406, 166.09300231933594, 79.50669860839844, -33.26224136352539, 45.451690673828125, -8.189044952392578, 174.0290985107422, 192.2689208984375, -1.89068603515625, 0.6831569671630859, 188.15191650390625, 10.952247619628906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000531.npy"}
{"epoch": 0.8027210884353742, "step": 532, "batch_size": 64, "mean": 60.22857666015625, "std": 93.20708465576172, "min": -158.16702270507812, "p10": -69.80602111816404, "median": 53.9239387512207, "p90": 180.88819274902343, "max": 229.8347930908203, "pos_frac": 0.734375, "sample": [87.80990600585938, 94.19622039794922, 172.71249389648438, 178.39337158203125, -29.74505615234375, -16.201446533203125, 117.62679290771484, 187.65884399414062, 4.091869354248047, 102.48042297363281, -103.55249786376953, 143.8178253173828, 162.9145050048828, 57.9000244140625, 52.56452941894531, 33.464324951171875, 66.94448852539062, 87.46054077148438, 6.059736251831055, -96.13795471191406, -42.288360595703125, 11.651948928833008, 175.35781860351562, 55.283348083496094, -9.862457275390625, 52.46989440917969, 103.19561767578125, 229.8347930908203, 98.21343994140625, 208.4678955078125, 157.3926544189453, 3.6442832946777344, -1.981842041015625, 194.17715454101562, 14.936363220214844, 51.57945251464844, 110.33428955078125, -81.59930419921875, 10.884979248046875, 26.888612747192383, -93.13914489746094, -2.2062225341796875, 22.383464813232422, -32.95537185668945, 195.15484619140625, 27.90221405029297, 181.40106201171875, -36.403160095214844, 40.423988342285156, 179.69149780273438, 146.34498596191406, 122.8324966430664, -24.120521545410156, 193.976318359375, -158.16702270507812, 81.65357971191406, -116.7837142944336, 84.94284057617188, 173.0690460205078, -82.5547866821289, 0.9508609771728516, -14.627046585083008, 138.59312438964844, 147.22637939453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000532.npy"}
{"epoch": 0.8042328042328042, "step": 533, "batch_size": 64, "mean": 60.79398727416992, "std": 88.68106842041016, "min": -180.19898986816406, "p10": -25.619564437866206, "median": 40.510868072509766, "p90": 171.48626098632815, "max": 235.96878051757812, "pos_frac": 0.703125, "sample": [45.00714874267578, 168.09996032714844, -7.444892883300781, 31.270492553710938, 102.8514404296875, 85.79645538330078, 4.4229583740234375, -27.267047882080078, 178.05796813964844, 122.97981262207031, -36.28057861328125, -0.4274559020996094, -4.737052917480469, -0.29524803161621094, 95.47059631347656, 162.63250732421875, 10.220438003540039, 117.67552947998047, -13.336212158203125, -4.184989929199219, 9.116722106933594, -1.392416000366211, 110.51507568359375, 12.659652709960938, -72.58490753173828, 130.47218322753906, 135.1725311279297, 32.44819641113281, 10.773193359375, -164.0026397705078, -21.775436401367188, -14.597980499267578, 181.11195373535156, 154.4290771484375, -2.409517288208008, 204.70863342285156, 35.351837158203125, 235.96878051757812, 49.63630676269531, -47.44942092895508, 36.01458740234375, 146.972900390625, 132.03102111816406, 98.76029205322266, 169.32046508789062, 185.9840087890625, 76.08757019042969, -78.3958740234375, 21.278533935546875, -2.4874343872070312, 5.574041366577148, 159.5535888671875, 136.91336059570312, -12.536392211914062, 2.0481128692626953, 64.81204986572266, 167.15733337402344, 130.27012634277344, 17.88628387451172, 182.76165771484375, 172.41445922851562, 167.2559051513672, -180.19898986816406, 82.67399597167969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000533.npy"}
{"epoch": 0.8057445200302343, "step": 534, "batch_size": 64, "mean": 59.431175231933594, "std": 101.9652328491211, "min": -188.51010131835938, "p10": -58.176870346069336, "median": 45.16385841369629, "p90": 189.1638626098633, "max": 264.69268798828125, "pos_frac": 0.65625, "sample": [38.07947540283203, -2.2035770416259766, 196.93359375, 10.413211822509766, -18.985197067260742, 80.2038345336914, 25.030776977539062, -188.51010131835938, 100.81787109375, -92.77320861816406, -5.704179763793945, 27.858856201171875, 161.22735595703125, 158.89450073242188, -85.87598419189453, 264.69268798828125, 122.90178680419922, 44.87051773071289, 45.45719909667969, -19.155792236328125, 162.84716796875, 223.9241180419922, 101.45553588867188, 54.3349609375, -8.332862854003906, -95.73974609375, -173.6884765625, -10.160116195678711, 212.47821044921875, 13.057411193847656, 116.16082763671875, -81.68998718261719, 204.82496643066406, 181.61102294921875, 187.2799530029297, 117.61361694335938, -59.371891021728516, 56.92462921142578, 234.93162536621094, 181.17364501953125, 78.45999145507812, 187.23928833007812, -46.10230255126953, 1.526092529296875, 174.7933349609375, 119.96902465820312, -0.26418304443359375, 138.57493591308594, 17.57732391357422, -16.980941772460938, -0.746063232421875, 189.97125244140625, 96.05461883544922, -17.128742218017578, -16.65776824951172, 6.4163360595703125, 139.32568359375, -55.38848876953125, 106.33592224121094, -21.905197143554688, 63.52227020263672, 41.36724090576172, 176.0843963623047, -42.25710678100586], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000534.npy"}
{"epoch": 0.8072562358276644, "step": 535, "batch_size": 64, "mean": 49.48677444458008, "std": 95.00299072265625, "min": -189.62132263183594, "p10": -49.285100936889634, "median": 32.54797172546387, "p90": 180.47696838378909, "max": 225.66990661621094, "pos_frac": 0.65625, "sample": [24.66832733154297, -32.09807205200195, 108.88276672363281, 46.02338790893555, 16.99614715576172, 27.757488250732422, -178.91476440429688, -53.84244155883789, -106.4710464477539, 19.483169555664062, 100.17626953125, -63.8839111328125, 41.121253967285156, 139.5804443359375, 1.5440826416015625, -15.892906188964844, -20.034423828125, 57.5810546875, -0.4642963409423828, -34.19157409667969, 18.479507446289062, 76.53047180175781, 160.90863037109375, -38.65130615234375, 153.34033203125, 9.130531311035156, 196.7586669921875, -123.75880432128906, -29.21662139892578, 2.9197635650634766, 170.3020782470703, 185.17538452148438, 107.01690673828125, -4.683601379394531, 79.19581604003906, 174.63290405273438, 37.33845520019531, 225.66990661621094, -12.709136962890625, 122.10334014892578, 39.908546447753906, -14.64068603515625, 222.81240844726562, 143.84776306152344, -11.111251831054688, -189.62132263183594, -63.52093505859375, 123.9809341430664, -4.122562408447266, 118.31958770751953, 82.24169158935547, -6.041450500488281, 44.47227096557617, 201.25439453125, 4.1386260986328125, 144.6221160888672, 1.8017501831054688, 100.88296508789062, 182.9815673828125, -11.767898559570312, 90.29302978515625, -6.958824157714844, 221.68783569335938, 163.18896484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000535.npy"}
{"epoch": 0.8087679516250945, "step": 536, "batch_size": 64, "mean": 78.75794982910156, "std": 84.85926818847656, "min": -63.66790771484375, "p10": -5.279485321044921, "median": 55.04288864135742, "p90": 197.610368347168, "max": 247.61740112304688, "pos_frac": 0.796875, "sample": [62.56644058227539, 14.751396179199219, -43.62059783935547, 182.02557373046875, 105.02609252929688, 18.783401489257812, -1.03973388671875, -16.08216094970703, 98.304931640625, 15.141397476196289, -63.66790771484375, 130.32901000976562, 108.15216827392578, 207.13601684570312, 174.64718627929688, 139.14773559570312, 48.8389892578125, 186.2422332763672, 236.1746368408203, 61.246788024902344, 3.6023693084716797, 3.133394241333008, 182.2750701904297, 145.9737548828125, 24.328079223632812, 192.25958251953125, 152.21849060058594, 3.5462818145751953, 87.52465057373047, 121.05137634277344, 200.84396362304688, -2.2624549865722656, 2.891033172607422, 90.84371948242188, -17.4852294921875, -13.976776123046875, 183.88824462890625, 8.41302490234375, 199.0818634033203, 1.0634517669677734, 194.1768798828125, 192.15701293945312, 23.727033615112305, -4.070894241333008, 7.184894561767578, 21.005905151367188, 12.348579406738281, 75.23909759521484, 67.45556640625, 247.61740112304688, -2.303457260131836, 167.5891876220703, 44.184051513671875, 212.1944580078125, 139.3762969970703, 48.33367919921875, 30.88927459716797, 224.47935485839844, -5.797452926635742, -44.17573547363281, 29.928884506225586, 127.91925048828125, -2.252716064453125, -0.0148773193359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000536.npy"}
{"epoch": 0.8102796674225246, "step": 537, "batch_size": 64, "mean": 59.59520721435547, "std": 96.14234924316406, "min": -140.28424072265625, "p10": -46.777184295654294, "median": 42.543052673339844, "p90": 186.8914505004883, "max": 263.64788818359375, "pos_frac": 0.671875, "sample": [151.3842315673828, 102.97635650634766, -10.906646728515625, 118.97616577148438, 207.544677734375, 175.0838165283203, 135.311767578125, -1.4027538299560547, 95.64288330078125, 80.74867248535156, 74.2685317993164, -3.328481674194336, 188.27024841308594, 175.7978515625, 135.9375762939453, 156.03851318359375, 183.00259399414062, 4.486471176147461, -1.6162567138671875, -11.431753158569336, -0.8343963623046875, 44.472503662109375, 16.726608276367188, 9.486656188964844, 1.6008663177490234, 190.27786254882812, 94.850341796875, 9.354240417480469, -88.98998260498047, -8.18549919128418, 32.58324432373047, 31.608341217041016, 126.5946273803711, -30.075660705566406, -118.50669860839844, 11.57773208618164, 244.91497802734375, 99.21209716796875, 4.756927490234375, -9.970043182373047, 26.256568908691406, 183.67425537109375, -124.36311340332031, 84.80104064941406, -25.908674240112305, -16.11287498474121, 89.95480346679688, -26.31598663330078, 99.38775634765625, 46.75550842285156, -4.08978271484375, 147.88137817382812, -42.2578125, 172.8737030029297, 159.612548828125, 263.64788818359375, -93.57435607910156, 90.30271911621094, -88.84297943115234, 40.61360168457031, 191.73934936523438, 208.81683349609375, -140.28424072265625, -48.71405792236328], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000537.npy"}
{"epoch": 0.8117913832199547, "step": 538, "batch_size": 64, "mean": 66.96125793457031, "std": 88.34017181396484, "min": -179.70004272460938, "p10": -42.60556182861327, "median": 52.11919021606445, "p90": 177.38057861328124, "max": 272.59002685546875, "pos_frac": 0.765625, "sample": [-48.59978485107422, 171.63861083984375, 39.31653594970703, 16.690486907958984, 75.33140563964844, -179.70004272460938, 68.78517150878906, -31.618682861328125, 94.67140197753906, 162.7863311767578, 84.52640533447266, 177.03890991210938, 34.9027099609375, 160.1684112548828, 24.47321891784668, 27.414703369140625, 138.5875701904297, 176.8291473388672, 14.773246765136719, 42.473819732666016, 55.194915771484375, 1.8146209716796875, 184.4613800048828, 105.43014526367188, 43.26483154296875, -74.89955139160156, 150.71063232421875, -15.601478576660156, 212.42050170898438, 5.360862731933594, -47.31422424316406, 162.29876708984375, -14.476226806640625, -47.32354736328125, 4.658210754394531, 48.673851013183594, 43.5662841796875, 83.1583251953125, -1.0327606201171875, 49.312744140625, 234.28128051757812, -5.71405029296875, 196.85299682617188, 86.34449768066406, -85.51490020751953, 128.5904998779297, -29.282371520996094, 177.52700805664062, 139.55311584472656, 272.59002685546875, 169.99099731445312, 42.11235046386719, 21.928329467773438, -57.3341064453125, 148.81021118164062, 105.73828125, 197.94691467285156, -0.11911392211914062, 98.7025146484375, 36.621307373046875, 54.925636291503906, 64.95630645751953, 97.1388168334961, -11.293651580810547], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000538.npy"}
{"epoch": 0.8133030990173847, "step": 539, "batch_size": 64, "mean": 50.87729263305664, "std": 90.15190887451172, "min": -145.84889221191406, "p10": -23.673888397216796, "median": 28.508441925048828, "p90": 187.0766571044922, "max": 248.95059204101562, "pos_frac": 0.71875, "sample": [68.22840118408203, 110.75883483886719, -1.0614547729492188, 200.50765991210938, 17.701147079467773, -11.357505798339844, 188.14541625976562, 184.5828857421875, 133.64962768554688, 4.907691955566406, 47.64207458496094, 64.42098999023438, 61.87762451171875, 0.201995849609375, -53.223419189453125, -7.425941467285156, -5.2925262451171875, -2.8483047485351562, 193.5908660888672, 120.64707946777344, 4.28184700012207, -51.560237884521484, 39.27870178222656, -142.731201171875, 8.115943908691406, -122.46539306640625, -16.204336166381836, 127.42431640625, 228.57196044921875, 39.54718780517578, 14.865226745605469, -5.201446533203125, -23.944442749023438, 30.33301544189453, 240.630615234375, 52.45813751220703, 6.801170349121094, 48.374053955078125, 183.08279418945312, 19.944978713989258, -13.352706909179688, 72.28153991699219, 96.15017700195312, 16.66143035888672, 77.7996826171875, 59.72297668457031, -117.16252899169922, 146.57269287109375, 8.294136047363281, -16.06292724609375, 181.9342041015625, 79.81426239013672, 14.29141616821289, -23.04259490966797, 159.6593017578125, 53.916107177734375, -4.4740447998046875, 14.212467193603516, -145.84889221191406, 92.0425033569336, 16.777345657348633, 213.06973266601562, 248.95059204101562, 26.683868408203125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000539.npy"}
{"epoch": 0.8148148148148148, "step": 540, "batch_size": 64, "mean": 73.51258850097656, "std": 88.95764923095703, "min": -138.73452758789062, "p10": -25.94640731811523, "median": 55.660404205322266, "p90": 190.75428619384766, "max": 220.82431030273438, "pos_frac": 0.8125, "sample": [-13.24749755859375, 0.787139892578125, 191.42251586914062, 185.71421813964844, 3.960935592651367, -138.73452758789062, 124.68792724609375, 85.79301452636719, 69.86714935302734, 3.654224395751953, 52.6922492980957, 118.98788452148438, 181.93267822265625, 54.622703552246094, -101.52572631835938, 56.69810485839844, 169.229736328125, 130.40005493164062, 8.294933319091797, 109.19126892089844, 19.19012451171875, -30.667526245117188, 115.5517578125, 36.19761657714844, 207.79258728027344, 164.75836181640625, -46.658592224121094, 117.39404296875, -43.35906982421875, 22.589279174804688, 62.4925537109375, 186.6555938720703, 220.82431030273438, 134.59454345703125, 206.01177978515625, -4.828502655029297, -12.028915405273438, 177.92538452148438, -27.65673828125, 1.810811996459961, -0.5040626525878906, 171.3465118408203, 173.941650390625, 194.84271240234375, 193.0167694091797, 20.515335083007812, 183.6505889892578, 45.09284591674805, 46.89959716796875, -94.22864532470703, 33.06092834472656, 77.03436279296875, 39.430023193359375, 4.806182861328125, 0.35887908935546875, 34.3739013671875, -21.95563507080078, 214.82362365722656, 26.924331665039062, 99.4912109375, 181.89614868164062, 7.804075241088867, 79.97097778320312, 189.19508361816406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000540.npy"}
{"epoch": 0.8163265306122449, "step": 541, "batch_size": 64, "mean": 68.03440856933594, "std": 87.86502838134766, "min": -171.3309326171875, "p10": -34.09289302825927, "median": 63.01278305053711, "p90": 180.67784118652344, "max": 211.01962280273438, "pos_frac": 0.734375, "sample": [167.5911407470703, 70.84907531738281, 98.42646789550781, 181.16677856445312, -5.141998291015625, 166.53903198242188, -69.0472412109375, 175.35073852539062, 177.14447021484375, 3.080841064453125, 19.596755981445312, -2.60992431640625, 2.6059799194335938, 128.04554748535156, -6.482940673828125, -171.3309326171875, 166.2677001953125, 179.5369873046875, 60.1875, 183.7252197265625, -55.664466857910156, 40.9398193359375, 189.8328857421875, 183.6810302734375, 164.41448974609375, 114.8571548461914, 169.40371704101562, -38.999473571777344, -62.00485610961914, -3.2869300842285156, 103.8379898071289, 19.21074867248535, 185.67138671875, 211.01962280273438, 5.946403503417969, 117.08889770507812, 98.75450134277344, -11.905181884765625, 134.552734375, -8.162973403930664, -2.025663375854492, -103.37260437011719, 132.99844360351562, 170.54721069335938, 19.92827796936035, 64.71372985839844, 150.00662231445312, 140.3868408203125, -47.35633850097656, 17.758132934570312, 0.8987483978271484, 154.74398803710938, 61.31183624267578, 47.08140563964844, -8.912399291992188, 20.69195556640625, 66.37979125976562, 83.49043273925781, -22.64420509338379, 23.85761260986328, 105.40336608886719, -5.665491104125977, 186.48968505859375, 12.802101135253906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000541.npy"}
{"epoch": 0.817838246409675, "step": 542, "batch_size": 64, "mean": 76.51224517822266, "std": 101.96780395507812, "min": -207.7124481201172, "p10": -63.495328521728496, "median": 81.05498504638672, "p90": 193.6114013671875, "max": 241.0106201171875, "pos_frac": 0.8125, "sample": [154.93568420410156, 165.08203125, 7.697319030761719, 232.47586059570312, 76.34274291992188, 13.478302001953125, 115.59854125976562, 71.782470703125, 137.4042510986328, 88.7731704711914, 14.646068572998047, -8.559898376464844, -70.92302703857422, 185.96771240234375, 14.977018356323242, 166.8772430419922, 50.21520233154297, 241.0106201171875, -121.73213195800781, 181.08914184570312, 178.72543334960938, 221.4639892578125, 29.20221710205078, -2.297271728515625, -142.0297393798828, -207.7124481201172, 124.59466552734375, 85.79158020019531, 124.04733276367188, -46.164031982421875, 181.2147979736328, 188.8494873046875, 1.0075054168701172, 195.6522216796875, -94.25438690185547, -4.945980072021484, 3.057525634765625, 106.11019897460938, 0.23268890380859375, 121.51146697998047, 13.75775146484375, 20.941436767578125, 187.21815490722656, -78.47468566894531, 82.27351379394531, 0.41522979736328125, 188.6803436279297, 19.340782165527344, 82.97408294677734, 201.68569946289062, 159.22799682617188, 199.90411376953125, -74.20628356933594, 48.570709228515625, -5.448587417602539, 5.705066680908203, 13.976089477539062, 185.6517791748047, 170.46920776367188, 76.40900421142578, 79.83645629882812, 210.06338500976562, 139.01483154296875, 187.602294921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000542.npy"}
{"epoch": 0.8193499622071051, "step": 543, "batch_size": 64, "mean": 71.01347351074219, "std": 91.6890869140625, "min": -201.59420776367188, "p10": -14.052851676940909, "median": 69.8774642944336, "p90": 186.4872039794922, "max": 226.76065063476562, "pos_frac": 0.8125, "sample": [44.24177551269531, 33.67367172241211, 75.12062072753906, 0.2626628875732422, 109.56669616699219, 12.797016143798828, 125.45761108398438, 185.47653198242188, -167.52781677246094, 190.3681640625, 98.3543701171875, 147.9465789794922, 132.16709899902344, 171.76014709472656, 215.75445556640625, 97.81645202636719, 177.45974731445312, -121.31077575683594, 66.20215606689453, 192.5449676513672, 19.850255966186523, -3.8469467163085938, 1.6051673889160156, 38.1248779296875, 8.318248748779297, 178.7674560546875, -201.59420776367188, 1.1073532104492188, 2.2243995666503906, -49.684593200683594, 54.11601257324219, 177.67535400390625, 204.67669677734375, 4.159450531005859, 167.5687713623047, 51.01475524902344, 186.92034912109375, -5.904474258422852, -29.13933563232422, -3.7014389038085938, 87.29447937011719, 123.46788787841797, 135.72372436523438, 34.05975341796875, 181.5994415283203, 25.633155822753906, 147.14996337890625, 226.76065063476562, 31.230175018310547, 120.90361022949219, -5.284160614013672, 73.55277252197266, 78.13224792480469, -17.545013427734375, 100.07438659667969, 30.585723876953125, 115.50302124023438, 36.525726318359375, 6.431793212890625, -48.54252243041992, 146.6284942626953, 207.26754760742188, -1.4577980041503906, 118.776611328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000543.npy"}
{"epoch": 0.8208616780045351, "step": 544, "batch_size": 64, "mean": 76.49650573730469, "std": 100.63465881347656, "min": -143.08277893066406, "p10": -56.94599304199217, "median": 80.33097076416016, "p90": 214.5223571777344, "max": 249.23846435546875, "pos_frac": 0.796875, "sample": [7.68841552734375, -111.47457885742188, 57.329376220703125, 18.704790115356445, 143.09619140625, 52.04168701171875, -88.39215087890625, 33.43762969970703, -25.712554931640625, 27.847434997558594, 49.936431884765625, 89.21774291992188, -6.802734375, 183.394287109375, 94.18184661865234, 10.815465927124023, -129.4656524658203, 94.3301010131836, 96.5627212524414, 46.569915771484375, 137.9816436767578, 141.00741577148438, 157.911865234375, 175.66307067871094, 215.26712036132812, 60.612518310546875, 7.3032684326171875, -6.589080810546875, 1.3361797332763672, -143.08277893066406, 249.23846435546875, -136.1769561767578, 175.33718872070312, -4.927314758300781, 188.28770446777344, 138.5210723876953, 218.79263305664062, -65.2174072265625, 33.32868957519531, 77.6509017944336, 171.5017852783203, 31.89788055419922, 212.78457641601562, 57.300270080566406, 83.01103973388672, 92.27642822265625, 121.90359497070312, 138.8124237060547, 219.70028686523438, -37.646026611328125, 17.983415603637695, 169.93780517578125, 239.60116577148438, 237.75352478027344, -86.99170684814453, -12.241775512695312, 151.01290893554688, 32.434417724609375, 186.03176879882812, 163.66848754882812, 0.13555908203125, 86.66221618652344, 112.90704345703125, 239.78668212890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000544.npy"}
{"epoch": 0.8223733938019653, "step": 545, "batch_size": 64, "mean": 47.767616271972656, "std": 96.1666259765625, "min": -194.4877166748047, "p10": -66.40205307006836, "median": 26.718679428100586, "p90": 185.85466613769532, "max": 218.42117309570312, "pos_frac": 0.6875, "sample": [126.77349853515625, 110.76810455322266, -57.612060546875, 197.5607452392578, 25.747875213623047, 0.1309356689453125, -81.8490982055664, -66.95475006103516, -29.89483070373535, 56.746761322021484, 105.77841186523438, -8.006317138671875, 184.00381469726562, 51.96107482910156, 49.336814880371094, 9.907630920410156, 71.62794494628906, 186.64788818359375, -62.703094482421875, 115.62887573242188, -14.517799377441406, 201.796630859375, -14.585700988769531, 124.49334716796875, 85.74801635742188, -10.619285583496094, 65.5402603149414, 22.2471866607666, -17.781295776367188, 2.353452682495117, -1.9841041564941406, -81.14236450195312, -65.1124267578125, 128.94247436523438, -98.4757080078125, -4.80415153503418, 181.0041046142578, 188.43153381347656, 155.4364776611328, 0.9290695190429688, -6.772556304931641, 218.42117309570312, 137.16554260253906, 11.058929443359375, 15.570526123046875, 82.79137420654297, -38.431983947753906, 132.15377807617188, 44.04058837890625, -194.4877166748047, 14.372772216796875, 116.87503051757812, 27.689483642578125, 218.05699157714844, 96.11410522460938, -132.388671875, 3.5184898376464844, 139.94070434570312, -149.6659393310547, 0.7449512481689453, 79.56526947021484, 177.01319885253906, 207.87911987304688, 22.402484893798828], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000545.npy"}
{"epoch": 0.8238851095993953, "step": 546, "batch_size": 64, "mean": 55.045677185058594, "std": 99.84337615966797, "min": -214.53280639648438, "p10": -77.23039550781249, "median": 44.24137878417969, "p90": 179.37237243652345, "max": 250.6864013671875, "pos_frac": 0.78125, "sample": [179.89529418945312, -33.93145751953125, 178.1522216796875, 46.563255310058594, 109.16957092285156, 164.1166534423828, 2.72821044921875, -33.05528259277344, 26.69268798828125, -87.16371154785156, -214.53280639648438, 143.01768493652344, 18.63698959350586, 20.405044555664062, -154.47825622558594, 218.91470336914062, 8.372739791870117, -6.3350067138671875, -55.103126525878906, 31.300315856933594, -151.86630249023438, 58.96839904785156, 113.96633911132812, 125.49087524414062, 17.502052307128906, 99.79029846191406, 144.4176025390625, -35.82147216796875, 134.2799835205078, 198.42819213867188, 132.91769409179688, 32.64141082763672, 158.67221069335938, 32.13508605957031, 17.773616790771484, -72.7672119140625, 26.24279022216797, 250.6864013671875, 117.12016296386719, 110.04869079589844, 2.835765838623047, 41.91950225830078, -91.51234436035156, 100.6402359008789, 26.72649574279785, 97.29956817626953, -24.257278442382812, 11.825366973876953, 96.3127670288086, 111.1270751953125, 158.84397888183594, 30.26630401611328, 207.6976318359375, -79.1431884765625, 7.620780944824219, 59.528045654296875, 138.32183837890625, 87.48817443847656, 184.3525848388672, 63.92145538330078, 190.15057373046875, -154.40118408203125, 176.77650451660156, 4.590177536010742], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000546.npy"}
{"epoch": 0.8253968253968254, "step": 547, "batch_size": 64, "mean": 72.02572631835938, "std": 97.05525207519531, "min": -200.79376220703125, "p10": -13.436121368408196, "median": 66.81401062011719, "p90": 189.73265380859377, "max": 273.6082763671875, "pos_frac": 0.828125, "sample": [3.533477783203125, 36.52033996582031, 13.618490219116211, 14.814437866210938, 214.18682861328125, 87.27173614501953, -68.05303192138672, 217.36566162109375, 76.80392456054688, 155.30430603027344, 167.06658935546875, 139.557373046875, 47.470787048339844, 79.57804870605469, 76.7054443359375, -6.503257751464844, 98.44517517089844, 177.8135223388672, 15.82821273803711, 15.108413696289062, 186.9021759033203, 158.54054260253906, 184.886474609375, 0.820068359375, 230.4808349609375, 152.02346801757812, 23.415054321289062, 101.66414642333984, 47.6669921875, 4.276161193847656, 75.63026428222656, -1.197397232055664, -198.8533935546875, 105.23991394042969, -37.6845703125, 69.09788513183594, 187.7786865234375, 7.014850616455078, 192.033447265625, 62.10285186767578, 145.91973876953125, 180.09422302246094, 116.62968444824219, 148.8446502685547, -140.256591796875, 64.53013610839844, 3.0339126586914062, -200.79376220703125, 28.531570434570312, 20.185043334960938, 190.570068359375, 181.43923950195312, 273.6082763671875, -4.242347717285156, 89.97295379638672, -16.4073486328125, 36.9394416809082, 145.0769805908203, -3.5387439727783203, 28.289817810058594, 14.269308090209961, 8.579910278320312, 201.62460327148438, -17.529251098632812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000547.npy"}
{"epoch": 0.8269085411942555, "step": 548, "batch_size": 64, "mean": 48.743003845214844, "std": 91.7282943725586, "min": -135.9672088623047, "p10": -72.82500762939453, "median": 40.759538650512695, "p90": 188.8016586303711, "max": 216.60914611816406, "pos_frac": 0.6875, "sample": [136.11158752441406, 156.76754760742188, -27.06817626953125, -46.692626953125, -135.9672088623047, 58.12046813964844, -8.811485290527344, 216.60914611816406, 154.2418670654297, 56.92888641357422, -30.928741455078125, 187.4669952392578, 0.24549484252929688, 12.735740661621094, 29.416915893554688, 1.057474136352539, -71.53684997558594, 43.11872863769531, -2.4456329345703125, 94.15582275390625, 204.08615112304688, 57.28450012207031, -83.8048095703125, 13.2554931640625, 128.71151733398438, 38.40034866333008, 205.67816162109375, 189.3736572265625, 19.875244140625, 111.21244812011719, -51.17360305786133, 105.06329345703125, -14.2943115234375, 120.62588500976562, 22.07061767578125, -31.360336303710938, 195.7097625732422, 60.484764099121094, -70.30096435546875, 118.51920318603516, 64.94454193115234, -0.34789466857910156, 68.42284393310547, -77.01220703125, 205.99484252929688, 13.535659790039062, -38.98664093017578, 75.25740051269531, -112.00713348388672, 127.1368179321289, 57.76336669921875, -111.36898803710938, 60.78553009033203, 151.90016174316406, 77.6767807006836, 17.788124084472656, 154.4184112548828, -25.035888671875, 24.084075927734375, -77.32481384277344, 190.27366638183594, 169.5958709716797, -73.3770751953125, 12.491622924804688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000548.npy"}
{"epoch": 0.8284202569916855, "step": 549, "batch_size": 64, "mean": 55.51802444458008, "std": 98.03534698486328, "min": -190.4209747314453, "p10": -70.63365936279297, "median": 52.223289489746094, "p90": 180.24990692138672, "max": 222.26287841796875, "pos_frac": 0.671875, "sample": [25.12725830078125, 15.477119445800781, -75.17217254638672, 98.4664535522461, -6.650932312011719, -70.50254821777344, 85.6492691040039, 3.163522720336914, 87.1526107788086, 200.55691528320312, 212.10899353027344, 4.071723937988281, 92.61382293701172, 163.69686889648438, -7.09765625, -38.10291290283203, -79.54304504394531, 78.53729248046875, 130.99026489257812, 151.55418395996094, 158.18646240234375, 136.13893127441406, 160.30630493164062, -105.12767028808594, 61.56439208984375, 6.4361572265625, 33.98368453979492, 190.4730224609375, 145.16030883789062, -59.19744110107422, 187.72549438476562, 134.28433227539062, 178.17181396484375, -20.989931106567383, 158.54153442382812, 222.26287841796875, -8.332296371459961, -111.17617797851562, -16.1297607421875, 149.54971313476562, -186.21401977539062, 23.757049560546875, -9.004188537597656, 46.997013092041016, 57.44956588745117, 138.3497772216797, -22.543609619140625, 138.24501037597656, 208.63314819335938, 38.10908508300781, -1.5475387573242188, 122.80213928222656, 134.52420043945312, 69.50450134277344, 84.00608825683594, 1.9228363037109375, 4.64961051940918, -2.003520965576172, -15.275287628173828, 127.63922882080078, 181.14051818847656, -190.4209747314453, -70.68984985351562, -0.8058929443359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000549.npy"}
{"epoch": 0.8299319727891157, "step": 550, "batch_size": 64, "mean": 36.81257629394531, "std": 90.5447998046875, "min": -187.1207275390625, "p10": -82.76085281372067, "median": 18.22628116607666, "p90": 150.64644317626954, "max": 244.79751586914062, "pos_frac": 0.734375, "sample": [81.6380615234375, 196.82388305664062, 187.43106079101562, 34.18666076660156, -108.4632568359375, 7.833381652832031, 0.15670204162597656, -38.532318115234375, -23.45136260986328, 85.95692443847656, 151.07513427734375, 33.571685791015625, 19.331008911132812, 2.5754623413085938, 9.048784255981445, 197.02597045898438, 200.2891082763672, 87.31246185302734, 16.923736572265625, 31.658184051513672, 134.1201171875, -7.987087249755859, 110.51798248291016, 81.47543334960938, 145.28526306152344, -33.44978332519531, 244.79751586914062, -6.854816436767578, 13.905540466308594, -98.98112487792969, -101.94064331054688, 19.906997680664062, -35.64959716796875, 60.98817443847656, 6.181488037109375, 75.3258056640625, 58.82884979248047, 9.520904541015625, -58.63402557373047, 9.812255859375, -93.10092163085938, 95.47370910644531, 16.955753326416016, 56.68257522583008, 3.4224700927734375, 121.50787353515625, 16.006134033203125, 17.121553421020508, 56.812774658203125, 123.6810302734375, -147.37158203125, 127.37808227539062, 125.01922607421875, 59.74842071533203, 155.54666137695312, 12.161378860473633, -187.1207275390625, 8.258550643920898, 149.6461639404297, -164.67453002929688, 111.03208923339844, -38.850852966308594, -10.799808502197266, -58.09151077270508], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000550.npy"}
{"epoch": 0.8314436885865457, "step": 551, "batch_size": 64, "mean": 46.996219635009766, "std": 84.09393310546875, "min": -203.10971069335938, "p10": -17.715975952148437, "median": 24.159966468811035, "p90": 175.8296813964844, "max": 321.71539306640625, "pos_frac": 0.734375, "sample": [-10.575798034667969, 84.15301513671875, 89.48133850097656, 39.821197509765625, 116.25445556640625, 49.152931213378906, 321.71539306640625, -6.948040008544922, 29.691862106323242, 29.08245086669922, -31.374561309814453, 16.87403106689453, 170.508544921875, 4.699090957641602, 33.743865966796875, 27.301494598388672, 89.937255859375, 22.997032165527344, -203.10971069335938, -15.466300964355469, -7.544984817504883, 7.704429626464844, 113.81482696533203, 55.189727783203125, 86.31452941894531, 187.52468872070312, -17.945709228515625, 94.0197525024414, 33.60566711425781, 151.7926025390625, 25.322900772094727, -38.490325927734375, -0.45131683349609375, -17.179931640625, 5.55645751953125, 208.51126098632812, 2.7576904296875, 158.62527465820312, -11.399419784545898, 9.459426879882812, 7.621002197265625, 26.180984497070312, -8.629539489746094, 5.137607574462891, 167.4099578857422, 209.8426513671875, 178.11016845703125, 16.524539947509766, 18.530128479003906, 0.7338905334472656, 60.82695770263672, 68.16168975830078, 29.780181884765625, -46.04427719116211, 6.601127624511719, -3.088226318359375, -9.86257553100586, 194.16188049316406, 5.887847900390625, 6.951145172119141, -97.19757080078125, -20.40341567993164, 193.1853485107422, 92.20950317382812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000551.npy"}
{"epoch": 0.8329554043839759, "step": 552, "batch_size": 64, "mean": 58.29364776611328, "std": 84.42240905761719, "min": -191.64559936523438, "p10": -27.54518432617186, "median": 60.16832733154297, "p90": 156.3301681518555, "max": 222.96278381347656, "pos_frac": 0.8125, "sample": [7.624223709106445, 33.3624267578125, 20.97602081298828, -11.671516418457031, 130.69837951660156, 47.13009262084961, 1.7467098236083984, 31.999588012695312, 41.42277526855469, 137.43780517578125, 173.16567993164062, 222.96278381347656, 124.8868408203125, 56.978179931640625, -11.730552673339844, -68.02518463134766, 9.27410888671875, 123.9010009765625, 60.339256286621094, -53.24165344238281, 50.3453369140625, 120.83612060546875, 126.41893005371094, 145.62098693847656, 137.25262451171875, -2.3281784057617188, 46.533897399902344, 132.8024444580078, 160.62049865722656, -191.64559936523438, 6.960905075073242, 1.4034347534179688, 107.61618041992188, -174.4031219482422, 83.3143081665039, -0.18422698974609375, 116.23956298828125, 45.22682189941406, 16.671226501464844, -1.9796600341796875, 180.21917724609375, 8.770965576171875, -34.32288360595703, 77.71839904785156, 146.31939697265625, 15.7119140625, 86.71683502197266, 64.36441802978516, -187.2952423095703, 178.3712921142578, 187.6373748779297, 93.54570770263672, 8.383138656616211, -43.164772033691406, 109.51123046875, 88.2319564819336, 78.81310272216797, 77.32376861572266, 169.96261596679688, 77.02532958984375, 59.997398376464844, 86.96558380126953, 134.48231506347656, 58.944923400878906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000552.npy"}
{"epoch": 0.8344671201814059, "step": 553, "batch_size": 64, "mean": 54.79808807373047, "std": 100.12044525146484, "min": -192.19192504882812, "p10": -64.25533943176269, "median": 51.09065246582031, "p90": 201.92363128662117, "max": 261.67657470703125, "pos_frac": 0.6875, "sample": [223.708740234375, -31.17596435546875, 1.2744598388671875, 83.37276458740234, 10.282722473144531, -14.265161514282227, 160.8046875, 169.2376708984375, -16.014442443847656, -12.074195861816406, 55.809410095214844, 72.20993041992188, 209.10316467285156, 234.62083435058594, -28.25226593017578, 114.42483520507812, -8.94386100769043, 72.00774383544922, 76.98910522460938, -192.19192504882812, -5.021383285522461, 112.67372131347656, -13.115791320800781, 137.89956665039062, 155.77320861816406, 35.41936492919922, 17.776771545410156, 50.850860595703125, 0.749847412109375, -44.48130798339844, 234.49920654296875, 175.3462677001953, 96.43517303466797, 261.67657470703125, 34.43461608886719, 105.00847625732422, 185.17138671875, 156.9011993408203, 51.3304443359375, 90.67890167236328, 36.741416931152344, 76.70985412597656, 27.694486618041992, -7.264408111572266, -67.51992797851562, 63.37299346923828, -1.4086875915527344, 210.39788818359375, -171.71463012695312, -92.8294677734375, 67.32504272460938, 153.19911193847656, 111.25982666015625, -70.3653335571289, -74.26996612548828, 25.971450805664062, 0.5724067687988281, -139.81361389160156, 113.270263671875, 90.8876724243164, -36.88593292236328, -56.63796615600586, 218.66671752929688, 8.783035278320312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000553.npy"}
{"epoch": 0.8359788359788359, "step": 554, "batch_size": 64, "mean": 65.84025573730469, "std": 94.04472351074219, "min": -186.4205780029297, "p10": -27.98949127197265, "median": 57.70381164550781, "p90": 183.67156524658205, "max": 230.21351623535156, "pos_frac": 0.75, "sample": [71.26481628417969, 211.6406707763672, 188.98480224609375, 215.18304443359375, -4.15142822265625, 83.56330871582031, 9.775712966918945, 6.557590484619141, 69.44761657714844, 160.26101684570312, -4.186927795410156, -13.580787658691406, -3.6725616455078125, 121.68065643310547, -186.4205780029297, 49.71525573730469, 116.98736572265625, -16.4351806640625, 135.48367309570312, 22.002988815307617, -12.447471618652344, 159.66543579101562, 34.47813415527344, -100.43838500976562, 111.64552307128906, 130.58489990234375, 189.33203125, 33.60139846801758, -1.4452400207519531, 28.06737518310547, 184.54095458984375, 4.8517303466796875, 119.38831329345703, 44.61874771118164, 4.490135192871094, -76.33735656738281, 180.46612548828125, 24.814332962036133, 122.6945571899414, -21.750213623046875, -128.48951721191406, 114.00914001464844, 65.69236755371094, 102.25932312011719, -30.663467407226562, 124.19125366210938, 14.876714706420898, 114.77818298339844, 46.3487548828125, 2.0419921875, 143.48081970214844, -2.7197341918945312, -57.3134651184082, 177.95973205566406, 199.1316375732422, 10.013862609863281, 181.16452026367188, 166.3633575439453, 107.60921478271484, 181.6429901123047, 230.21351623535156, -135.2393798828125, 178.283203125, 13.219100952148438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000554.npy"}
{"epoch": 0.8374905517762661, "step": 555, "batch_size": 64, "mean": 56.18605422973633, "std": 104.29386901855469, "min": -202.35748291015625, "p10": -92.63329544067382, "median": 55.882822036743164, "p90": 177.4351013183594, "max": 214.8528594970703, "pos_frac": 0.75, "sample": [0.44040489196777344, -198.62429809570312, 65.49337768554688, 59.42372131347656, 169.82916259765625, 24.866371154785156, -159.5803985595703, -23.666107177734375, -121.91395568847656, 94.73439025878906, 198.339599609375, -34.09925842285156, 120.81106567382812, 32.69114303588867, 141.33026123046875, 202.87686157226562, 39.03422546386719, -2.4910659790039062, 12.52239990234375, 176.59231567382812, 67.81665802001953, 1.2374420166015625, 197.0503692626953, 58.234466552734375, 177.79629516601562, 27.669326782226562, 137.56344604492188, 176.0884552001953, 27.802139282226562, 185.17764282226562, 154.96212768554688, 163.92266845703125, 75.81188201904297, 161.1441650390625, -113.08777618408203, 191.536865234375, -202.35748291015625, 149.1504669189453, -90.00440216064453, 53.53117752075195, 105.16189575195312, 43.833518981933594, -18.50157928466797, 19.50525665283203, -47.90266418457031, -2.844837188720703, 101.4927978515625, 91.92182159423828, 171.85989379882812, -16.981170654296875, 139.5029296875, 49.569557189941406, -184.32020568847656, -3.8093643188476562, 168.07308959960938, 25.195999145507812, 99.83415222167969, 33.152587890625, 103.33444213867188, -93.75996398925781, 214.8528594970703, 29.701955795288086, 154.50753784179688, 12.870794296264648], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000555.npy"}
{"epoch": 0.8390022675736961, "step": 556, "batch_size": 64, "mean": 48.79975128173828, "std": 104.04551696777344, "min": -187.30369567871094, "p10": -76.66857833862304, "median": 19.58550262451172, "p90": 189.9279327392578, "max": 250.34466552734375, "pos_frac": 0.71875, "sample": [166.0728759765625, 17.617279052734375, 31.5157470703125, -10.708499908447266, 10.658432006835938, 175.37265014648438, 196.9469757080078, 57.19611740112305, -6.449052810668945, -70.65679931640625, 14.697099685668945, 3.9835433959960938, 87.9090805053711, 158.354736328125, 22.77151107788086, -10.027992248535156, 37.1568603515625, -5.082542419433594, 118.9927978515625, 175.10333251953125, -14.568061828613281, 151.0749053955078, 188.60891723632812, 15.255149841308594, -51.72264862060547, 15.866737365722656, -187.30369567871094, 35.04668426513672, 139.10391235351562, -122.95318603515625, 0.9275302886962891, -182.91824340820312, 89.4476318359375, 7.294609069824219, 20.884597778320312, 2.2260494232177734, 143.09054565429688, 168.4730224609375, 5.3300018310546875, 2.667417526245117, 183.33096313476562, 250.34466552734375, 47.290401458740234, 12.983352661132812, 216.8538055419922, 211.49169921875, -52.750244140625, 18.286407470703125, -131.95777893066406, -3.05950927734375, 196.65696716308594, -98.88467407226562, -36.758079528808594, 187.00286865234375, 4.236545562744141, -72.09609985351562, 44.12261962890625, -78.62821197509766, 83.46876525878906, 177.3819580078125, 190.49322509765625, 194.6060791015625, 71.81321716308594, -90.30082702636719], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000556.npy"}
{"epoch": 0.8405139833711263, "step": 557, "batch_size": 64, "mean": 74.97007751464844, "std": 85.94847106933594, "min": -106.85012817382812, "p10": -11.78196105957031, "median": 63.751216888427734, "p90": 194.60069580078127, "max": 213.75253295898438, "pos_frac": 0.765625, "sample": [195.09909057617188, 123.3779525756836, 92.59672546386719, 8.594711303710938, 0.8826179504394531, -37.648216247558594, 189.8171844482422, 24.35826873779297, 184.96954345703125, 89.61746215820312, 1.7072601318359375, -10.062637329101562, -3.0015716552734375, 102.42975616455078, 7.591663360595703, 68.22885131835938, 74.1966552734375, 4.695594787597656, 180.0054931640625, 95.11237335205078, 26.809139251708984, 14.67083740234375, 204.34744262695312, 28.7586669921875, -70.77737426757812, -40.985076904296875, 179.27206420898438, 201.48355102539062, 32.26070785522461, -39.173377990722656, 198.99325561523438, 119.7542724609375, 175.00286865234375, -8.979999542236328, -7.9461822509765625, 190.51466369628906, 110.52571868896484, 134.04904174804688, 37.61216735839844, 196.07205200195312, -6.4266510009765625, 12.933504104614258, -7.125297546386719, -1.207489013671875, -0.25266456604003906, -106.85012817382812, 187.52818298339844, 213.75253295898438, 175.1337890625, 42.57819366455078, 9.731697082519531, 66.79857635498047, 30.051528930664062, 161.97674560546875, 153.3521728515625, 60.703857421875, 193.43777465820312, 95.39970397949219, 30.62350082397461, 95.18101501464844, -41.86003112792969, -12.518814086914062, 200.98341369628906, 169.32630920410156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000557.npy"}
{"epoch": 0.8420256991685563, "step": 558, "batch_size": 64, "mean": 79.395263671875, "std": 93.37689971923828, "min": -151.42697143554688, "p10": -24.986510467529286, "median": 73.02916717529297, "p90": 195.70401000976562, "max": 275.6680603027344, "pos_frac": 0.796875, "sample": [166.32943725585938, 201.9716033935547, 196.42453002929688, -9.705726623535156, 194.02279663085938, 6.1506500244140625, 183.11703491210938, -48.85798263549805, 1.809326171875, 72.47004699707031, -13.627815246582031, 166.90847778320312, -32.19139099121094, 16.589759826660156, 0.276336669921875, 146.84088134765625, 180.97279357910156, -1.2523193359375, 99.88972473144531, 97.53746795654297, 275.6680603027344, 157.56690979003906, 15.319988250732422, 182.65505981445312, 134.04649353027344, -61.773101806640625, 154.10629272460938, 15.759452819824219, 169.12869262695312, 77.15276336669922, 174.81808471679688, 19.968215942382812, 73.58828735351562, 209.8699188232422, 12.810050964355469, 2.552541732788086, 94.64645385742188, 8.1773681640625, 66.20042419433594, 229.25717163085938, 147.70034790039062, -29.854522705078125, 236.44119262695312, 8.286026000976562, 186.88563537597656, 186.38134765625, -9.389236450195312, -52.44062042236328, 210.1826629638672, 3.484832763671875, 117.25773620605469, -151.42697143554688, 90.00691986083984, -37.630035400390625, 55.687042236328125, -5.89544677734375, 134.1743621826172, 8.807823181152344, -13.024673461914062, 0.709197998046875, 190.45736694335938, 42.994895935058594, 95.25997161865234, 29.046142578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000558.npy"}
{"epoch": 0.8435374149659864, "step": 559, "batch_size": 64, "mean": 67.82418823242188, "std": 104.82683563232422, "min": -157.4228515625, "p10": -36.794667816162104, "median": 41.79124069213867, "p90": 201.181169128418, "max": 350.14837646484375, "pos_frac": 0.6875, "sample": [12.530105590820312, -24.120872497558594, -5.029216766357422, -12.674583435058594, 104.49275970458984, 350.14837646484375, 224.88955688476562, 247.41778564453125, 36.746177673339844, -3.0486412048339844, -120.27651977539062, 129.72317504882812, 64.53780364990234, 98.53068542480469, 5.643400192260742, 187.3201446533203, 162.35537719726562, 87.0539321899414, 144.0703582763672, 98.55561828613281, 203.00289916992188, 26.700332641601562, -29.847854614257812, 146.2752227783203, -0.9284286499023438, 51.991363525390625, -120.18344116210938, 184.34597778320312, 46.8363037109375, 191.53640747070312, 27.794227600097656, 70.33995056152344, -20.594802856445312, 243.185791015625, 207.59605407714844, -39.771873474121094, 108.94749450683594, 8.112005233764648, -28.25513458251953, -17.049705505371094, -8.437583923339844, -24.08094024658203, -24.849529266357422, 26.823486328125, 135.9583740234375, 193.16043090820312, 152.68934631347656, -43.703285217285156, 165.70962524414062, -13.207504272460938, 33.068603515625, 196.9304656982422, -111.26394653320312, 10.782730102539062, 168.438232421875, 207.4752655029297, 14.851253509521484, 114.0940933227539, 0.34525299072265625, 15.207612991333008, 131.3120574951172, 181.31797790527344, -73.34955596923828, -157.4228515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000559.npy"}
{"epoch": 0.8450491307634165, "step": 560, "batch_size": 64, "mean": 60.24045181274414, "std": 86.1467514038086, "min": -140.96743774414062, "p10": -32.860194396972645, "median": 41.09950637817383, "p90": 185.13098754882813, "max": 201.15350341796875, "pos_frac": 0.75, "sample": [57.27393341064453, 52.088287353515625, 189.88421630859375, 16.68830108642578, -140.96743774414062, -13.440238952636719, 191.26388549804688, 19.278213500976562, 158.28826904296875, 2.091470718383789, 55.22706604003906, 16.32447052001953, 103.56256103515625, 46.38026428222656, 3.6375732421875, -2.29718017578125, -2.9383697509765625, 2.312051773071289, 17.845462799072266, 135.7200469970703, -78.40090942382812, 134.15460205078125, 63.52379608154297, 4.9739532470703125, 178.83094787597656, 147.653076171875, -10.390205383300781, 181.13514709472656, 186.8058319091797, 8.830999374389648, 4.648834228515625, -7.575630187988281, -22.10869598388672, 201.15350341796875, 150.6676788330078, -73.6556396484375, 28.16020965576172, -55.64701843261719, 183.703857421875, -37.467979431152344, 190.75662231445312, 116.92791748046875, 35.818748474121094, 86.29277038574219, 2.6011734008789062, 105.06995391845703, 1.7669143676757812, 186.34988403320312, 35.20400619506836, -19.7608642578125, -1.6389007568359375, 174.84207153320312, -56.19023895263672, -3.5509490966796875, 123.78372192382812, 185.74261474609375, -101.55604553222656, 62.49043655395508, 179.95443725585938, 48.30269241333008, 174.632568359375, 95.640625, 32.93317794799805, 101.7564697265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000560.npy"}
{"epoch": 0.8465608465608465, "step": 561, "batch_size": 64, "mean": 61.01319122314453, "std": 94.80498504638672, "min": -151.89395141601562, "p10": -44.092987823486325, "median": 34.98365020751953, "p90": 192.19048156738282, "max": 252.84754943847656, "pos_frac": 0.734375, "sample": [5.261100769042969, 9.597602844238281, 33.787017822265625, -6.999391555786133, 18.078935623168945, 169.70028686523438, 90.42909240722656, -18.308998107910156, 175.97073364257812, 28.322509765625, 16.5560302734375, 123.31730651855469, 1.1726570129394531, 109.49873352050781, 15.452880859375, -90.90896606445312, 24.828720092773438, 171.60687255859375, 193.03012084960938, -3.3980255126953125, -45.30421447753906, 161.968994140625, 142.52223205566406, -70.15940856933594, 138.3612060546875, -16.163055419921875, 126.32886505126953, -49.747257232666016, -151.89395141601562, 225.819091796875, -14.390483856201172, 36.260223388671875, 190.2313232421875, 194.87794494628906, 36.74198913574219, 36.18028259277344, 48.77515411376953, 36.857139587402344, 17.461523056030273, 227.97860717773438, 252.84754943847656, 212.15760803222656, 167.25900268554688, 193.9031524658203, 151.3494110107422, -112.80323028564453, -41.26679229736328, 26.875747680664062, 73.46202087402344, 11.846601486206055, -33.6334228515625, 58.57147216796875, 4.993782043457031, -15.991788864135742, 108.16888427734375, 189.1170654296875, 4.661252975463867, 137.9073486328125, 103.30184936523438, -89.351806640625, -3.5697994232177734, 26.782472610473633, 172.275634765625, -33.721160888671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000561.npy"}
{"epoch": 0.8480725623582767, "step": 562, "batch_size": 64, "mean": 74.48919677734375, "std": 96.2115478515625, "min": -109.10235595703125, "p10": -34.21878929138183, "median": 57.30817794799805, "p90": 186.53130493164062, "max": 294.5999755859375, "pos_frac": 0.71875, "sample": [9.008979797363281, 179.5227813720703, -84.96075439453125, 234.65670776367188, 200.5283966064453, 184.863037109375, 26.44109344482422, 274.141845703125, 171.11798095703125, -25.657310485839844, 73.82987976074219, 47.61920166015625, -107.08499145507812, 23.665416717529297, 32.503173828125, -1.2661094665527344, 18.63482666015625, 29.438323974609375, 136.37994384765625, -68.14158630371094, 133.37701416015625, 184.20030212402344, -39.23411560058594, -15.857582092285156, -12.307708740234375, 183.2204132080078, -3.801961898803711, 31.29863739013672, 33.481536865234375, 3.3972740173339844, 173.10214233398438, -30.979705810546875, 166.77471923828125, 179.04241943359375, 83.12042236328125, 294.5999755859375, -35.60696792602539, -12.041854858398438, 2.4485740661621094, 206.95437622070312, -1.3033676147460938, 15.288185119628906, 92.83963775634766, 120.63291931152344, 133.55274963378906, 187.24627685546875, 190.66708374023438, 138.08197021484375, 151.32977294921875, -11.825225830078125, 172.47830200195312, 56.157073974609375, -23.368242263793945, 58.45928192138672, 112.29133605957031, 81.9164810180664, 173.8157958984375, 178.42393493652344, 135.65704345703125, -109.10235595703125, -5.167816162109375, -43.31761169433594, 20.692176818847656, 61.434478759765625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000562.npy"}
{"epoch": 0.8495842781557067, "step": 563, "batch_size": 64, "mean": 67.2710952758789, "std": 102.52279663085938, "min": -189.39990234375, "p10": -75.67349395751951, "median": 67.96399688720703, "p90": 193.12996520996094, "max": 223.00852966308594, "pos_frac": 0.75, "sample": [95.73465728759766, 155.74644470214844, 42.92501449584961, 0.13720703125, 185.7030029296875, 122.92193603515625, -189.39990234375, 199.604248046875, 156.83888244628906, 2.0889663696289062, 12.14310073852539, 170.00672912597656, 9.974357604980469, -86.63392639160156, 184.82479858398438, 58.25505065917969, -160.1918487548828, 99.89873504638672, -36.775718688964844, 223.00852966308594, 33.089874267578125, 208.40673828125, -50.099151611328125, 194.26541137695312, 164.87734985351562, 100.13621520996094, 122.68766784667969, -0.662139892578125, 78.89305114746094, -138.40377807617188, -2.0530853271484375, -20.890377044677734, 4.175468444824219, 44.21331787109375, -1.3108177185058594, 166.82162475585938, 72.22297668457031, -88.93901062011719, 112.4019775390625, 23.356822967529297, 189.62548828125, 195.89312744140625, 181.7540740966797, -34.61039733886719, -91.02407836914062, 63.70501708984375, 46.776214599609375, -39.01262664794922, 190.4805908203125, 114.31172943115234, -122.50357055664062, 78.6512451171875, 42.92741012573242, 139.7548828125, 146.13560485839844, 58.85774230957031, 196.4586181640625, 215.20855712890625, 31.082504272460938, 182.3912353515625, 126.14271545410156, -32.59025573730469, 139.78118896484375, 15.152786254882812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000563.npy"}
{"epoch": 0.8510959939531368, "step": 564, "batch_size": 64, "mean": 83.06480407714844, "std": 103.18859100341797, "min": -202.27210998535156, "p10": -32.77064266204833, "median": 87.80599975585938, "p90": 204.78780364990234, "max": 232.0377655029297, "pos_frac": 0.8125, "sample": [182.8304901123047, 8.583839416503906, 203.95846557617188, 178.0498046875, 86.67115783691406, 223.27474975585938, 144.2478790283203, -189.65367126464844, -24.687332153320312, 5.06201171875, 197.5302734375, 171.10610961914062, 44.35239791870117, -27.456087112426758, 1.07666015625, 18.235549926757812, -16.823287963867188, 156.27159118652344, 149.70474243164062, 82.79608917236328, 37.97731399536133, 1.357492446899414, 114.74378967285156, 143.25933837890625, 205.1432342529297, 150.59027099609375, 217.49154663085938, -75.72789001464844, -3.108121871948242, 127.68603515625, 221.78533935546875, 232.0377655029297, 209.578857421875, 104.82408142089844, 32.3133544921875, -138.31814575195312, 164.33262634277344, 75.79441833496094, 85.98237609863281, 8.098716735839844, 179.0398406982422, 54.80342102050781, 63.18394470214844, -20.972078323364258, -81.12974548339844, 73.20974731445312, 72.17627716064453, 196.0299072265625, 101.17884063720703, -65.36183166503906, 88.94084167480469, 135.9879608154297, 160.96945190429688, 11.722429275512695, 160.40325927734375, 23.67180633544922, 187.5142364501953, 209.41629028320312, -202.27210998535156, 191.56838989257812, -35.048309326171875, 17.391742706298828, 108.76214599609375, 173.98744201660156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000564.npy"}
{"epoch": 0.8526077097505669, "step": 565, "batch_size": 64, "mean": 55.58483123779297, "std": 88.9991683959961, "min": -163.50350952148438, "p10": -53.09539756774902, "median": 38.54973220825195, "p90": 174.53611602783204, "max": 262.60284423828125, "pos_frac": 0.75, "sample": [11.315849304199219, 1.7728919982910156, -12.691688537597656, 17.514007568359375, 24.97028350830078, 37.3868408203125, 76.73202514648438, 37.557456970214844, -2.330169677734375, 19.03205108642578, 29.16307830810547, 132.13265991210938, 240.75735473632812, -163.50350952148438, -15.27056884765625, 152.8583526611328, 64.36883544921875, 110.02693176269531, 262.60284423828125, 204.28616333007812, 106.4110336303711, -85.35945129394531, 6.506965637207031, 18.57868194580078, 62.413002014160156, 112.62539672851562, 80.76697540283203, 39.54200744628906, 59.79639434814453, 83.4946517944336, 175.49356079101562, 7.558588027954102, -37.14884948730469, -50.923526763916016, 110.03982543945312, -55.45615005493164, 9.26152229309082, 157.98194885253906, -158.2425537109375, 64.09410095214844, 145.74420166015625, -3.0491371154785156, -54.02619934082031, 68.72952270507812, -62.121978759765625, 123.75094604492188, 27.907745361328125, 172.3020782470703, 213.4518585205078, 6.800239562988281, 121.88250732421875, -12.466842651367188, -73.10507202148438, -3.0362281799316406, 31.31467056274414, 32.81711196899414, -41.891815185546875, 157.1864013671875, 71.68891143798828, 90.34860229492188, 188.3778533935547, 128.94024658203125, 113.2579574584961, 176.50999450683594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000565.npy"}
{"epoch": 0.854119425547997, "step": 566, "batch_size": 64, "mean": 54.115760803222656, "std": 111.90186309814453, "min": -252.12513732910156, "p10": -81.69959564208983, "median": 45.85613822937012, "p90": 190.80598602294924, "max": 263.545654296875, "pos_frac": 0.671875, "sample": [92.52188110351562, 57.07152557373047, -70.991455078125, 113.60641479492188, 6.837493896484375, 46.80912399291992, 187.9416961669922, 183.24163818359375, 1.389089584350586, 120.32298278808594, 58.632320404052734, 67.39576721191406, 44.90315246582031, 31.219341278076172, 11.030448913574219, 263.545654296875, -252.12513732910156, 155.3651123046875, 197.2015380859375, 147.67019653320312, 202.9744415283203, -10.812667846679688, -45.22367477416992, 68.76319122314453, 2.360372543334961, -84.85350036621094, 105.25967407226562, 173.85491943359375, 186.21282958984375, 3.2035064697265625, 168.36317443847656, 156.3119354248047, 163.73394775390625, 2.642059326171875, -47.79156494140625, -182.23513793945312, -65.43010711669922, 44.23562240600586, 259.832763671875, -74.34048461914062, 29.402420043945312, -85.74630737304688, 246.0399932861328, -24.781394958496094, 180.4618377685547, -4.515953063964844, -103.5744400024414, -96.71553802490234, 40.48383331298828, -10.976707458496094, -57.74191665649414, -0.5366897583007812, -64.43492126464844, 183.49240112304688, -5.858074188232422, 68.99681091308594, -63.66529083251953, 198.4674072265625, 78.46131134033203, -109.7978744506836, 159.8690643310547, 117.43126678466797, 105.9639892578125, 192.03353881835938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000566.npy"}
{"epoch": 0.8556311413454271, "step": 567, "batch_size": 64, "mean": 67.20326232910156, "std": 94.05794525146484, "min": -135.40155029296875, "p10": -58.836993026733396, "median": 66.54410171508789, "p90": 183.60341644287112, "max": 253.108154296875, "pos_frac": 0.734375, "sample": [97.86222839355469, -0.8720893859863281, -0.01692962646484375, 177.9685516357422, -63.67741394042969, 160.6386260986328, 196.44253540039062, 169.81991577148438, 178.7960968017578, 27.20195770263672, 171.08834838867188, 61.755523681640625, 105.0260238647461, -35.70500183105469, 113.07887268066406, -10.324501037597656, -4.279266357421875, 30.7762451171875, 20.92127227783203, 64.70143127441406, 97.48214721679688, 11.230812072753906, 1.8470592498779297, 169.27508544921875, -53.972206115722656, 0.7867107391357422, -135.40155029296875, -105.85399627685547, 76.99576568603516, -119.28641510009766, 160.33331298828125, 50.06721496582031, 185.6636962890625, 253.108154296875, 202.5609130859375, -0.3471260070800781, 113.8952865600586, 54.50068664550781, 119.8701171875, 192.71250915527344, -105.39774322509766, 101.1006851196289, 33.304779052734375, 86.34654235839844, 169.16555786132812, 161.9430389404297, -0.8884735107421875, 4.0237274169921875, 173.635986328125, 147.33432006835938, 68.38677215576172, 20.94864273071289, 187.40524291992188, -0.05438232421875, 7.7935333251953125, -131.75204467773438, -60.92190170288086, 101.70069885253906, 198.18728637695312, -3.2327880859375, 143.74391174316406, 92.57398986816406, 46.56724548339844, 122.42346954345703], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000567.npy"}
{"epoch": 0.8571428571428571, "step": 568, "batch_size": 64, "mean": 82.55564880371094, "std": 93.90958404541016, "min": -75.08941650390625, "p10": -28.722533416748036, "median": 59.022342681884766, "p90": 203.9066635131836, "max": 317.53851318359375, "pos_frac": 0.796875, "sample": [121.53205108642578, 45.24835205078125, 62.678470611572266, 22.981399536132812, 93.93450927734375, 144.71865844726562, 149.56661987304688, -67.45802307128906, 12.554828643798828, -7.5625, -69.90042877197266, 180.9837646484375, 54.67301940917969, 233.96554565429688, 0.1902313232421875, -58.40171813964844, 42.134132385253906, -6.763618469238281, 47.63536834716797, 136.06304931640625, -73.00921630859375, -11.542327880859375, 55.366214752197266, 181.7276611328125, 204.0987091064453, 203.45855712890625, 140.57107543945312, 231.53927612304688, -75.08941650390625, 47.24688720703125, 167.4922332763672, 98.89981079101562, 91.48416137695312, -16.35639190673828, 148.87606811523438, 15.796319961547852, -5.51373291015625, 81.68134307861328, -1.5481433868408203, 169.02557373046875, 4.4107513427734375, 190.87930297851562, 105.74249267578125, 97.33316040039062, 166.32672119140625, 33.944435119628906, 28.376907348632812, 0.380645751953125, 16.7706298828125, 38.02876281738281, 80.47500610351562, 174.919189453125, -34.022308349609375, 144.08839416503906, 185.56117248535156, 39.488983154296875, 0.3478240966796875, 269.397216796875, 220.50833129882812, 215.99766540527344, -37.83662414550781, 43.22321319580078, 188.70263671875, 317.53851318359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000568.npy"}
{"epoch": 0.8586545729402872, "step": 569, "batch_size": 64, "mean": 42.50354766845703, "std": 92.11004638671875, "min": -180.84652709960938, "p10": -53.779261016845695, "median": 28.9630126953125, "p90": 177.9466735839844, "max": 237.4973907470703, "pos_frac": 0.734375, "sample": [172.89736938476562, -44.88987731933594, 3.4785289764404297, 172.1009063720703, 33.833099365234375, 184.04510498046875, 14.307357788085938, 158.94451904296875, 45.7598876953125, 185.26556396484375, 91.78494262695312, -32.91783905029297, 6.118541717529297, 42.10746765136719, -180.84652709960938, 59.55418395996094, 20.523845672607422, 237.4973907470703, -23.93031883239746, 82.10978698730469, 153.65628051757812, 105.63034057617188, 3.888439178466797, 205.76962280273438, -16.985321044921875, 174.47450256347656, 95.18408203125, 24.757442474365234, 30.608551025390625, -123.6792221069336, -144.22415161132812, 65.57359313964844, 6.040092468261719, 9.839702606201172, -57.58899688720703, 6.723114013671875, 193.80694580078125, 3.6739063262939453, 35.97730255126953, 21.616098403930664, 70.29056549072266, -27.936927795410156, -114.82145690917969, -13.680046081542969, 97.99531555175781, -109.9430160522461, 39.47674560546875, -8.942092895507812, 25.626537322998047, 106.48638153076172, 52.48500442504883, 51.78596496582031, 178.47047424316406, -18.31584930419922, -40.17469024658203, 34.24018096923828, 92.13444519042969, 20.616409301757812, 191.272216796875, -43.25318908691406, -110.36748504638672, 27.317474365234375, 176.72447204589844, 20.25322723388672], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000569.npy"}
{"epoch": 0.8601662887377173, "step": 570, "batch_size": 64, "mean": 83.39543151855469, "std": 94.35749053955078, "min": -135.9524688720703, "p10": -25.84933166503906, "median": 83.23954391479492, "p90": 196.8255172729492, "max": 315.7513122558594, "pos_frac": 0.78125, "sample": [189.0028076171875, -62.090797424316406, 169.71633911132812, 63.47365951538086, 140.59750366210938, 20.812095642089844, -62.51458740234375, 0.0160980224609375, 55.1070556640625, -9.752761840820312, 84.7120590209961, 81.76702880859375, 1.9001216888427734, 153.6749267578125, -8.222084045410156, 158.72940063476562, -0.3196887969970703, -2.6301612854003906, 81.02809143066406, 125.33718872070312, 109.54962158203125, -124.1644287109375, 29.72801971435547, 177.9504852294922, 197.2286834716797, 3.4431819915771484, 74.51048278808594, 175.6049041748047, 204.52474975585938, -26.313133239746094, 10.025123596191406, 122.35688018798828, 168.51087951660156, -72.76838684082031, 4.587467193603516, 128.34188842773438, 168.56219482421875, 89.80772399902344, 188.01644897460938, 54.20994567871094, 213.47720336914062, 121.79574584960938, 222.12741088867188, -18.663686752319336, 38.879966735839844, 133.61720275878906, 141.45050048828125, 207.4917755126953, 113.47427368164062, -135.9524688720703, 177.3301239013672, -24.767127990722656, 60.25170135498047, 150.77056884765625, 204.7379608154297, 38.387359619140625, 66.29768371582031, -3.0225963592529297, 108.80546569824219, 165.66055297851562, 315.7513122558594, 38.562747955322266, 195.88479614257812, -59.09776306152344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000570.npy"}
{"epoch": 0.8616780045351474, "step": 571, "batch_size": 64, "mean": 47.2779655456543, "std": 97.58845520019531, "min": -185.46743774414062, "p10": -94.04218749999998, "median": 56.22933578491211, "p90": 171.69019012451176, "max": 226.2147216796875, "pos_frac": 0.71875, "sample": [112.74681091308594, 107.27057647705078, 55.17512512207031, 17.56373405456543, 0.8378753662109375, 113.8575439453125, 54.484619140625, 93.10030364990234, 137.19468688964844, 226.2147216796875, 95.90522003173828, 69.83965301513672, -111.51151275634766, -3.4685935974121094, -2.049652099609375, 118.01341247558594, 5.345947265625, 160.48831176757812, 10.133766174316406, 116.11539459228516, 163.84371948242188, 11.156593322753906, 63.52122497558594, 76.74305725097656, 193.49658203125, 85.81600952148438, 106.80177307128906, 108.9579086303711, -185.46743774414062, 17.137107849121094, 0.5657444000244141, -100.76130676269531, -9.915153503417969, -125.5896987915039, 105.52503967285156, -2.750570297241211, -47.929847717285156, 0.13514328002929688, -78.36424255371094, -35.49806213378906, 33.97374725341797, 88.7374267578125, 175.05296325683594, 114.72103881835938, 63.26316452026367, 186.3602294921875, 188.17178344726562, -177.60574340820312, 49.140525817871094, -24.27251434326172, -49.93904113769531, -151.02818298339844, 93.73983001708984, 55.46159362792969, 162.22373962402344, -20.85851287841797, 139.6685791015625, 144.46261596679688, 56.99707794189453, 180.7249298095703, -44.898433685302734, 4.3354949951171875, -156.4849395751953, 189.16087341308594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000571.npy"}
{"epoch": 0.8631897203325775, "step": 572, "batch_size": 64, "mean": 56.937442779541016, "std": 84.93143463134766, "min": -134.253662109375, "p10": -41.38083724975585, "median": 42.98287391662598, "p90": 176.63070068359377, "max": 207.76727294921875, "pos_frac": 0.734375, "sample": [27.98754119873047, -60.84840393066406, 2.3434219360351562, 101.79095458984375, 150.54835510253906, 44.40815734863281, 174.46682739257812, 43.20721435546875, -6.867042541503906, 191.11087036132812, -4.2155609130859375, 140.51158142089844, 121.09652709960938, 189.722412109375, -19.52688217163086, 178.73190307617188, -45.386688232421875, 24.07798194885254, 27.40413475036621, 115.06825256347656, 42.7585334777832, 47.031829833984375, 112.53117370605469, 177.55807495117188, 171.62936401367188, 80.91265106201172, -32.033851623535156, 121.69756317138672, -6.236442565917969, 17.301376342773438, 207.76727294921875, 179.67701721191406, 173.16912841796875, -123.71405029296875, 124.24750518798828, 40.89902114868164, 38.388641357421875, -66.16256713867188, 6.180210113525391, 99.2425537109375, 84.58635711669922, -9.447195053100586, 139.29840087890625, 120.02780151367188, 44.18303680419922, 130.57183837890625, 25.280426025390625, -112.66999816894531, 198.975830078125, 10.311813354492188, 86.62396240234375, 113.43650817871094, -29.167984008789062, -0.2277679443359375, -134.253662109375, -31.533676147460938, 24.03809356689453, 111.26278686523438, -13.726470947265625, -71.41744995117188, 124.23800659179688, 1.4710750579833984, 18.83514976501465, 4.822885513305664], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000572.npy"}
{"epoch": 0.8647014361300076, "step": 573, "batch_size": 64, "mean": 57.960819244384766, "std": 107.94557189941406, "min": -208.62783813476562, "p10": -73.53320693969727, "median": 46.227169036865234, "p90": 212.99604187011718, "max": 298.7003173828125, "pos_frac": 0.734375, "sample": [40.083526611328125, 126.31956481933594, -175.2351837158203, 178.881103515625, -75.69461822509766, 40.81622314453125, -4.33531379699707, 21.210960388183594, 49.647216796875, -68.48991394042969, -177.8114013671875, -137.37841796875, -12.454740524291992, -33.61170959472656, -20.15606689453125, 124.055908203125, -16.534591674804688, 28.208755493164062, 190.755615234375, -208.62783813476562, 42.847999572753906, 76.38640594482422, 213.63580322265625, 77.13350677490234, -13.13775634765625, 13.039386749267578, 298.7003173828125, 2.3646202087402344, 139.34384155273438, 91.65557098388672, 25.485321044921875, 22.69488525390625, 89.85568237304688, 49.60633850097656, 96.4923095703125, 14.76699447631836, 71.33665466308594, 70.08551025390625, 30.53522491455078, 231.5712890625, 169.91549682617188, 154.3885955810547, 172.32388305664062, -4.503206253051758, -30.325454711914062, 97.6184310913086, 220.1267547607422, 10.868942260742188, 35.76171112060547, 91.78266143798828, 211.50326538085938, 234.86749267578125, 116.08908081054688, 122.79302215576172, 3.7825241088867188, -121.95521545410156, 251.940185546875, -18.40807342529297, 89.85008239746094, 105.31004333496094, 131.91575622558594, 21.06568145751953, -95.12864685058594, 223.86026000976562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000573.npy"}
{"epoch": 0.8662131519274376, "step": 574, "batch_size": 64, "mean": 73.18067169189453, "std": 78.90535736083984, "min": -108.21212005615234, "p10": -7.234431457519531, "median": 61.22980880737305, "p90": 185.4522186279297, "max": 273.0780334472656, "pos_frac": 0.828125, "sample": [62.786277770996094, -48.328041076660156, -10.463478088378906, -108.21212005615234, 93.36763000488281, 105.37638854980469, 45.45647430419922, 23.44866943359375, 5.143152236938477, -18.55328369140625, -7.1703338623046875, 78.18614196777344, 11.795486450195312, 35.923423767089844, -4.824859619140625, 2.0383071899414062, 6.197704315185547, 42.22782897949219, 43.18486785888672, 107.37664794921875, 166.11968994140625, 26.105186462402344, 36.27886962890625, 154.6927490234375, -5.025154113769531, 188.53770446777344, 38.06305694580078, 179.4585418701172, -0.3275737762451172, 186.58544921875, 81.61041259765625, 0.1120147705078125, 88.16065979003906, 31.4642333984375, 71.41363525390625, 139.52822875976562, -69.38304901123047, 133.7399444580078, 188.50106811523438, 10.130342483520508, 110.051513671875, 20.681884765625, 181.06173706054688, 177.48941040039062, 144.52951049804688, 190.979736328125, 109.72309875488281, 68.9432373046875, 59.67333984375, 16.42467498779297, -7.26190185546875, 189.55552673339844, 20.27264404296875, 118.94409942626953, 104.16536712646484, 24.980125427246094, 63.95602035522461, 181.6830596923828, 182.80801391601562, 207.8456268310547, -10.310462951660156, 126.79441833496094, 273.0780334472656, 16.771331787109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000574.npy"}
{"epoch": 0.8677248677248677, "step": 575, "batch_size": 64, "mean": 52.35631561279297, "std": 110.10914611816406, "min": -190.04176330566406, "p10": -88.55299987792966, "median": 26.81995391845703, "p90": 205.11578826904298, "max": 314.3399658203125, "pos_frac": 0.65625, "sample": [123.92069244384766, -139.89584350585938, -190.04176330566406, -4.883752822875977, 314.3399658203125, 59.90093994140625, -11.291900634765625, 111.7654800415039, -6.4623565673828125, -14.993602752685547, 132.14039611816406, -43.67082977294922, 205.64793395996094, -47.3040657043457, -132.6763458251953, 85.75779724121094, -124.79924774169922, 85.71896362304688, 0.46736907958984375, 6.437736511230469, 58.48828887939453, 203.87411499023438, 155.9422607421875, 11.5408935546875, 4.349586486816406, 13.805923461914062, -146.9451904296875, 133.22569274902344, 210.34213256835938, -9.485847473144531, 21.21601676940918, 62.19483184814453, 192.25523376464844, 244.3042449951172, 170.64076232910156, -73.21794128417969, 207.76739501953125, -7.65472412109375, 219.50970458984375, 152.05641174316406, -25.998641967773438, 29.149642944335938, 113.57828521728516, 24.490264892578125, -5.3948822021484375, -6.764900207519531, 44.505035400390625, -170.61087036132812, 123.10566711425781, 245.3642578125, 200.37709045410156, 117.39210510253906, 6.8170623779296875, 33.70579147338867, -95.12516784667969, 19.45679473876953, 80.59765625, -12.245220184326172, 21.934412002563477, 98.37892150878906, -12.562004089355469, 156.98178100585938, 147.88050842285156, -18.496646881103516], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000575.npy"}
{"epoch": 0.8692365835222978, "step": 576, "batch_size": 64, "mean": 73.40779113769531, "std": 97.04320526123047, "min": -190.40760803222656, "p10": -46.970750427246095, "median": 60.7308464050293, "p90": 196.51170806884767, "max": 272.7506103515625, "pos_frac": 0.75, "sample": [35.378997802734375, 173.27117919921875, -7.400993347167969, 52.89892578125, 103.95916748046875, 67.03211975097656, 190.276123046875, 22.85539436340332, 57.313018798828125, 139.5668182373047, -46.91747283935547, -76.35037231445312, 178.0839080810547, -63.514129638671875, 112.12398529052734, -93.53787231445312, -46.99358367919922, 169.26011657714844, 272.7506103515625, 96.7896728515625, 187.89805603027344, 208.80960083007812, 103.24894714355469, 2.286672592163086, 151.135009765625, 94.49453735351562, 9.606134414672852, -67.81970977783203, 240.52777099609375, -49.761619567871094, 24.216140747070312, 27.45514678955078, 43.498138427734375, 182.90277099609375, 88.15753173828125, 50.29400634765625, 229.65550231933594, -6.3925628662109375, -3.7578125, 5.900382995605469, 189.5984344482422, -9.348529815673828, 176.4013671875, 12.099729537963867, 192.8889923095703, 6.5687408447265625, 199.26194763183594, 155.19384765625, -2.020711898803711, 198.06430053710938, 167.3800811767578, -190.40760803222656, 199.8461151123047, 78.63206481933594, 64.14867401123047, 170.8763427734375, -35.88166809082031, -2.5510787963867188, 30.95459747314453, 99.0870132446289, -0.9682559967041016, 40.34986114501953, 11.172111511230469, 87.55169677734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000576.npy"}
{"epoch": 0.8707482993197279, "step": 577, "batch_size": 64, "mean": 52.93243408203125, "std": 98.03273010253906, "min": -204.79019165039062, "p10": -49.850492477416985, "median": 42.39693641662598, "p90": 186.973063659668, "max": 232.2672119140625, "pos_frac": 0.765625, "sample": [35.61433410644531, 10.6279296875, -11.206748962402344, 57.370849609375, 3.0145111083984375, 155.91952514648438, 17.361637115478516, -22.90189552307129, 216.3800048828125, 3.6879501342773438, -43.21057891845703, 185.0928497314453, 148.21290588378906, -46.081207275390625, -70.90415954589844, 192.10598754882812, 151.352294921875, 176.42593383789062, -162.7800750732422, 118.228515625, 190.78614807128906, 83.63861846923828, 12.311290740966797, 172.6324462890625, 45.91400909423828, -66.80101013183594, -124.61190795898438, 127.4127197265625, 145.20497131347656, 67.92575073242188, -16.670455932617188, 164.92662048339844, -21.487060546875, 58.61982727050781, 12.768829345703125, -11.629608154296875, 45.63327407836914, 9.06689453125, 39.16059875488281, 4.470890045166016, 48.03191375732422, 23.03759765625, 200.1667022705078, 2.193845748901367, 187.77886962890625, 232.2672119140625, -204.79019165039062, 110.31288146972656, 66.54859924316406, 46.9630126953125, 23.533096313476562, 102.67326354980469, 193.32534790039062, 175.331787109375, 21.033676147460938, 35.698490142822266, -201.5447998046875, -7.004648208618164, -51.46590042114258, 111.39510345458984, 2.6199493408203125, 86.88778686523438, 101.76838684082031, 27.330337524414062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000577.npy"}
{"epoch": 0.872260015117158, "step": 578, "batch_size": 64, "mean": 49.770347595214844, "std": 100.7588882446289, "min": -184.3854522705078, "p10": -66.90071640014646, "median": 26.79869842529297, "p90": 183.26463317871094, "max": 252.57769775390625, "pos_frac": 0.65625, "sample": [89.40647888183594, -3.203399658203125, 73.03025817871094, 27.74462890625, 46.769798278808594, -74.23439025878906, 78.34298706054688, 13.400520324707031, 145.00717163085938, -0.7625408172607422, 2.48211669921875, 252.57769775390625, -3.65728759765625, 122.32562255859375, 166.87393188476562, -184.3854522705078, 166.68846130371094, 181.31265258789062, 144.87295532226562, 131.7464141845703, 172.44000244140625, 96.58506774902344, 2.6562671661376953, 1.6663398742675781, -0.1709003448486328, 93.79259490966797, 196.6065673828125, -1.590606689453125, 95.83832550048828, -4.110313415527344, 217.20083618164062, 174.01095581054688, 184.1011962890625, -35.578216552734375, -30.060829162597656, 25.852767944335938, 0.15107154846191406, 83.23835754394531, 68.99718475341797, 71.20936584472656, 168.06907653808594, -163.5621337890625, 29.17486572265625, -49.78881072998047, 0.2747650146484375, 9.2655029296875, -35.98377227783203, 180.14939880371094, -88.14556884765625, 201.93569946289062, 84.21390533447266, -113.58861541748047, 249.23927307128906, -29.202234268188477, -36.46998596191406, -75.91049194335938, 18.476181030273438, 53.81591033935547, -38.14997100830078, 12.590787887573242, -31.067256927490234, -4.555793762207031, -137.68746948242188, 193.0341339111328], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000578.npy"}
{"epoch": 0.873771730914588, "step": 579, "batch_size": 64, "mean": 73.43511199951172, "std": 85.46192932128906, "min": -168.42115783691406, "p10": -26.93174667358398, "median": 71.22794342041016, "p90": 189.5637969970703, "max": 212.6222381591797, "pos_frac": 0.78125, "sample": [104.14191436767578, 83.47444152832031, 73.26238250732422, 48.85631561279297, 67.0876235961914, -112.09750366210938, 190.7620391845703, 71.62852478027344, 132.10888671875, 212.6222381591797, 112.75360107421875, 176.22406005859375, 126.26756286621094, 194.1161651611328, 133.72808837890625, 16.77954864501953, 179.38514709472656, -15.398696899414062, -28.490013122558594, 84.53010559082031, 209.8709716796875, -55.792537689208984, 183.14942932128906, -3.2601985931396484, 51.68290710449219, 120.57611083984375, 193.8203582763672, 14.450912475585938, 186.7678985595703, 161.93309020996094, -51.317848205566406, 28.741119384765625, 156.53353881835938, 0.40604591369628906, 69.0174331665039, 102.07571411132812, 50.126556396484375, -3.460203170776367, -43.03486633300781, 183.41958618164062, -23.295791625976562, 6.485139846801758, -8.800689697265625, -0.6156997680664062, 70.82736206054688, 4.568696975708008, -14.033485412597656, -168.42115783691406, 194.07162475585938, 118.0601577758789, 55.01873779296875, 129.58523559570312, 26.65380859375, 202.47242736816406, 121.290771484375, 6.81812858581543, 150.05665588378906, 137.06744384765625, 30.203298568725586, 81.73667907714844, -34.08912658691406, 16.667499542236328, 120.6343765258789, 69.43655395507812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000579.npy"}
{"epoch": 0.8752834467120182, "step": 580, "batch_size": 64, "mean": 58.764244079589844, "std": 105.71414184570312, "min": -180.15823364257812, "p10": -57.91397705078123, "median": 54.98484420776367, "p90": 201.7858657836914, "max": 253.96112060546875, "pos_frac": 0.65625, "sample": [-155.38613891601562, -83.60322570800781, -27.208053588867188, 13.47028923034668, -29.6549072265625, 18.88860511779785, 166.88189697265625, 75.87615966796875, 74.0302734375, 27.13176727294922, 199.7640838623047, 83.30948638916016, -9.740997314453125, -1.5313682556152344, -28.461036682128906, 64.73108673095703, 160.77345275878906, 131.71426391601562, 0.8381099700927734, 40.25575256347656, -1.3474769592285156, 118.51055908203125, 223.21078491210938, 67.12932586669922, 140.74880981445312, -20.492393493652344, 223.61077880859375, -4.7099609375, -0.7713527679443359, 212.7021484375, 202.65234375, 191.0919952392578, -0.6744766235351562, 128.0612335205078, -35.67803955078125, 190.11795043945312, -23.233654022216797, -180.15823364257812, 8.04233169555664, 253.96112060546875, -169.578857421875, 12.1229248046875, -86.96261596679688, 77.7269287109375, 240.4326629638672, -26.67198371887207, 48.84025573730469, 61.129432678222656, 216.4676513671875, 135.735595703125, 150.97079467773438, 148.2377471923828, -34.879295349121094, -153.5320281982422, 23.18576431274414, -28.2100772857666, 85.21839904785156, 139.2065887451172, 114.69778442382812, 188.18820190429688, 27.637466430664062, -67.44366455078125, 125.55154418945312, 117.98687744140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000580.npy"}
{"epoch": 0.8767951625094482, "step": 581, "batch_size": 64, "mean": 60.2906494140625, "std": 95.01850891113281, "min": -154.20050048828125, "p10": -72.7671028137207, "median": 39.44902229309082, "p90": 188.0386962890625, "max": 242.38619995117188, "pos_frac": 0.71875, "sample": [116.40324401855469, 165.90545654296875, 23.96674346923828, 3.7415924072265625, 108.46876525878906, -7.4954986572265625, 3.5341644287109375, -75.74197387695312, 85.322509765625, 99.9852066040039, -63.479827880859375, 15.26108169555664, 19.68592643737793, 46.351470947265625, 141.44491577148438, -74.27312469482422, 185.1060791015625, 68.34021759033203, 126.65090942382812, 18.752538681030273, 180.68711853027344, 27.928787231445312, 194.07315063476562, -62.85784912109375, 215.77293395996094, 189.2955322265625, 146.25856018066406, -25.790664672851562, 34.636940002441406, 97.15205383300781, -0.343170166015625, 159.42288208007812, 31.875621795654297, 116.53067779541016, 138.0757293701172, 133.0832977294922, 39.094268798828125, -154.20050048828125, 231.9709930419922, 192.3931884765625, -22.660598754882812, -69.2530517578125, 91.48797607421875, 11.240798950195312, 95.75277709960938, -98.20579528808594, 242.38619995117188, 20.011810302734375, -6.498241424560547, -87.30657958984375, 173.5283203125, -11.926658630371094, 107.9068603515625, 159.20370483398438, 14.456806182861328, 108.81205749511719, -2.3004531860351562, -82.12776184082031, -85.67140197753906, 39.803775787353516, 18.5377197265625, 174.77703857421875, 193.79554748535156, -20.13909912109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000581.npy"}
{"epoch": 0.8783068783068783, "step": 582, "batch_size": 64, "mean": 60.47076416015625, "std": 95.41392517089844, "min": -168.90879821777344, "p10": -31.966058349609373, "median": 45.021568298339844, "p90": 181.04580078125, "max": 264.7656555175781, "pos_frac": 0.75, "sample": [132.38162231445312, 179.35061645507812, 44.95842742919922, 104.86282348632812, 23.145126342773438, 19.902706146240234, 200.38258361816406, 69.80973815917969, 144.99261474609375, 160.21530151367188, 90.1766357421875, 45.08470916748047, 221.55894470214844, 264.7656555175781, -30.4298095703125, 181.77230834960938, -32.62445068359375, -14.650253295898438, 72.56944274902344, 119.20897674560547, -3.9627532958984375, 137.39031982421875, 20.662860870361328, 209.17889404296875, 142.94015502929688, 203.0951690673828, 4.713266372680664, 18.06516456604004, -131.93389892578125, -168.90879821777344, -5.432790756225586, 18.87115478515625, 115.83679962158203, -19.24401092529297, -15.566030502319336, 135.9739227294922, 138.307373046875, 67.68846130371094, 189.99026489257812, 6.393978118896484, 133.47714233398438, 101.09783935546875, 64.62802124023438, 7.972475051879883, 1.262929916381836, -25.172210693359375, -36.188453674316406, 157.782470703125, 8.174858093261719, -168.1383819580078, 1.6138362884521484, 17.758323669433594, 0.3114891052246094, 0.8075485229492188, -95.77240753173828, 88.23406982421875, 163.06825256347656, -28.70093536376953, 175.63131713867188, -27.574615478515625, -44.13043212890625, 7.2068023681640625, 129.26153564453125, 176.02415466308594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000582.npy"}
{"epoch": 0.8798185941043084, "step": 583, "batch_size": 64, "mean": 67.73499298095703, "std": 106.95381164550781, "min": -193.4546356201172, "p10": -45.41278610229491, "median": 78.05879974365234, "p90": 199.02582550048828, "max": 248.70718383789062, "pos_frac": 0.671875, "sample": [78.44479370117188, -18.07122039794922, -33.012657165527344, -1.0065574645996094, 179.74411010742188, 84.67918395996094, -65.1382064819336, -35.19219970703125, -147.61178588867188, -5.365753173828125, 190.5295867919922, -18.74850082397461, -24.95052719116211, 161.30201721191406, 99.46694946289062, -49.79303741455078, 74.79518127441406, 0.8593769073486328, 197.32789611816406, 167.40948486328125, 59.32490539550781, -155.604248046875, 170.9134063720703, 3.5339889526367188, 109.14256286621094, 134.1119842529297, 223.86392211914062, -193.4546356201172, 26.571979522705078, 171.89120483398438, 44.657527923583984, 49.62049865722656, -10.014389038085938, 25.768508911132812, 132.24298095703125, 207.98158264160156, -28.985153198242188, -26.39453887939453, 186.88235473632812, 183.27027893066406, 77.67280578613281, 206.65005493164062, -190.22662353515625, 248.70718383789062, -9.162328720092773, 168.88021850585938, 140.69265747070312, 142.5198974609375, 89.60335540771484, -22.750526428222656, 205.91265869140625, 89.9328842163086, 190.8286590576172, 10.79887580871582, 94.70481872558594, 128.94195556640625, -68.98786163330078, 96.70221710205078, -0.987030029296875, 178.19638061523438, 200.0592498779297, 199.75350952148438, 7.4349822998046875, -1.8314743041992188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000583.npy"}
{"epoch": 0.8813303099017384, "step": 584, "batch_size": 64, "mean": 61.912025451660156, "std": 109.7228775024414, "min": -171.7078094482422, "p10": -69.9253318786621, "median": 43.139034271240234, "p90": 207.35252380371097, "max": 251.06597900390625, "pos_frac": 0.671875, "sample": [201.16995239257812, 210.002197265625, 76.00669860839844, -3.02813720703125, 70.53256225585938, -8.521615982055664, 216.6376953125, 201.11746215820312, 153.59588623046875, 184.68605041503906, 114.56219482421875, -6.591470718383789, -6.789850234985352, -67.79549407958984, -133.4739227294922, 136.70138549804688, 118.54479217529297, 189.26885986328125, 176.13177490234375, -4.854496002197266, 170.6876220703125, 220.7334442138672, -2.4251327514648438, 164.33349609375, 231.39791870117188, 179.78582763671875, 84.05067443847656, -0.5570526123046875, 24.131004333496094, 21.514175415039062, 187.01278686523438, -102.04353332519531, 142.7969970703125, 14.698974609375, -100.57780456542969, 174.810546875, -19.290512084960938, -36.28460693359375, -21.93901824951172, -45.25731658935547, 4.862926483154297, 91.88306427001953, -70.83811950683594, 55.27619171142578, 95.27521514892578, 232.01177978515625, 2.050912857055664, 68.70680236816406, 89.26041412353516, -67.60124969482422, -167.17198181152344, 22.774608612060547, -43.834068298339844, 177.75765991210938, 2.555328369140625, 251.06597900390625, 226.20147705078125, 7.589235305786133, 13.051511764526367, 31.001876831054688, 7.566886901855469, 115.19900512695312, -116.04901123046875, -171.7078094482422], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000584.npy"}
{"epoch": 0.8828420256991686, "step": 585, "batch_size": 64, "mean": 43.517784118652344, "std": 93.75747680664062, "min": -218.16891479492188, "p10": -30.437088775634756, "median": 30.95962619781494, "p90": 185.05240478515626, "max": 241.97207641601562, "pos_frac": 0.75, "sample": [26.640674591064453, 82.02836608886719, 86.09266662597656, -3.3954010009765625, 72.67493438720703, 97.39588928222656, 32.39775848388672, 197.9599609375, -8.397109985351562, 4.600868225097656, -178.1039581298828, 17.033931732177734, 96.41983032226562, 29.965377807617188, 0.5342826843261719, 241.97207641601562, -142.2378692626953, -18.11785125732422, 43.82868957519531, 96.3221435546875, 126.92227172851562, 18.95948600769043, -4.15461540222168, -125.02882385253906, 0.9915256500244141, 33.175392150878906, -20.596633911132812, 6.269020080566406, 182.84620666503906, 143.845703125, 5.453571319580078, 9.110443115234375, 58.81682205200195, -2.945507049560547, -182.6232452392578, 11.605979919433594, 14.512680053710938, 145.82275390625, 190.19085693359375, 102.31794738769531, 38.09114456176758, 151.28591918945312, 75.76266479492188, -69.0475082397461, -218.16891479492188, 192.48876953125, 196.88128662109375, 186.38772583007812, -34.65442657470703, -8.089897155761719, 31.953874588012695, 29.043983459472656, -11.973114013671875, 5.802896499633789, 46.12457275390625, 57.985862731933594, 22.57391357421875, 184.3700714111328, 185.34483337402344, 85.84384155273438, 71.50240325927734, 4.7166748046875, 71.89398193359375, -2.0892715454101562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000585.npy"}
{"epoch": 0.8843537414965986, "step": 586, "batch_size": 64, "mean": 65.08265686035156, "std": 74.72933959960938, "min": -90.78886413574219, "p10": -9.920569419860838, "median": 42.317054748535156, "p90": 181.33204650878906, "max": 212.7110595703125, "pos_frac": 0.796875, "sample": [125.7086181640625, 14.879920959472656, 147.29702758789062, 65.92450714111328, 60.13836669921875, 155.9729766845703, 31.60859489440918, 8.207061767578125, -2.1038742065429688, -13.004255294799805, 100.37093353271484, -12.742134094238281, 4.9506072998046875, 212.7110595703125, 193.10853576660156, -8.961809158325195, 114.65943908691406, 106.76161193847656, 79.4760513305664, 131.34075927734375, 25.41872787475586, 71.49274444580078, 24.77823257446289, 182.4231719970703, 9.91461181640625, -2.178640365600586, 15.738044738769531, 13.649261474609375, 160.60130310058594, 36.37789535522461, 140.27362060546875, 151.6485595703125, 53.193809509277344, 54.94309616088867, 204.4699249267578, 7.1265716552734375, -90.78886413574219, 211.40216064453125, -3.4110374450683594, 5.839384078979492, 9.17363166809082, 38.51683044433594, -17.097015380859375, 127.36799621582031, 92.80870056152344, -15.007442474365234, 0.22705841064453125, 67.61891174316406, -2.9282913208007812, 121.90480041503906, 199.0740203857422, -10.331466674804688, 12.407882690429688, 46.117279052734375, 181.55886840820312, 32.54114532470703, -70.3404769897461, 180.80279541015625, 31.625167846679688, -7.3238677978515625, 174.82608032226562, 17.663555145263672, 78.609619140625, 86.25752258300781], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000586.npy"}
{"epoch": 0.8858654572940288, "step": 587, "batch_size": 64, "mean": 63.94139862060547, "std": 115.52464294433594, "min": -373.4930114746094, "p10": -55.6704864501953, "median": 75.18317031860352, "p90": 191.10767059326173, "max": 257.2606506347656, "pos_frac": 0.765625, "sample": [17.711566925048828, -117.33927917480469, -0.8199234008789062, 132.7313690185547, 124.12815856933594, -111.02027893066406, 45.30240249633789, -27.750946044921875, 174.517822265625, 150.443603515625, 12.440105438232422, 81.07280731201172, -60.17671203613281, 16.302505493164062, 1.4031829833984375, 86.53998565673828, 146.51690673828125, -4.424018859863281, -45.15596008300781, 207.105224609375, 188.0994415283203, 137.290283203125, 0.25806427001953125, 192.39691162109375, 131.42678833007812, 17.42675018310547, 199.11465454101562, 8.861557006835938, -373.4930114746094, 7.604442596435547, 130.79498291015625, 117.98567199707031, 119.7939453125, 128.98281860351562, 9.735050201416016, 85.44720458984375, 87.75569152832031, -246.9794464111328, -97.4600830078125, 257.2606506347656, 167.6466827392578, -3.1467247009277344, 179.74740600585938, 112.51620483398438, -0.9512653350830078, 162.71859741210938, 7.691619873046875, 180.72088623046875, 167.56884765625, 115.28761291503906, 205.3301544189453, -110.3252182006836, 151.03890991210938, 8.027809143066406, 69.29353332519531, 225.31468200683594, 21.536048889160156, -9.897445678710938, -42.8154411315918, 56.25688934326172, 51.454078674316406, 254.04647827148438, 186.6533966064453, 4.7046966552734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000587.npy"}
{"epoch": 0.8873771730914588, "step": 588, "batch_size": 64, "mean": 71.29679870605469, "std": 104.23907470703125, "min": -202.6585693359375, "p10": -60.99696426391601, "median": 67.86011123657227, "p90": 198.27361602783205, "max": 295.01422119140625, "pos_frac": 0.765625, "sample": [90.27201843261719, -89.78456115722656, -34.50629425048828, 137.928466796875, -0.5968780517578125, 161.4799041748047, 175.42755126953125, 28.895736694335938, 195.995849609375, 174.747802734375, 278.22393798828125, 39.78851318359375, 23.662809371948242, 113.2404556274414, 36.400489807128906, 121.83899688720703, 202.03317260742188, 260.5672912597656, 20.673307418823242, 295.01422119140625, 71.3897933959961, 152.90902709960938, -38.60105514526367, 184.7118682861328, 1.3632450103759766, 83.53985595703125, 34.96307373046875, -4.905220031738281, 3.7776947021484375, -40.174652099609375, 4.187196731567383, 106.6434326171875, -118.16512298583984, 102.71607971191406, 127.61617279052734, 217.4786376953125, 13.47343635559082, 133.2029571533203, 77.97123718261719, -61.58904266357422, -23.03221893310547, -59.615447998046875, 88.98072052001953, 199.2498016357422, 120.25537109375, 93.61308288574219, 172.62286376953125, -5.000888824462891, 23.159744262695312, 167.90423583984375, 20.135940551757812, -97.99221801757812, 45.924591064453125, 37.918128967285156, 120.64924621582031, 64.33042907714844, -67.48815155029297, 21.656044006347656, 40.06916046142578, -95.77871704101562, 170.0106658935547, 249.86181640625, 194.40777587890625, -202.6585693359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000588.npy"}
{"epoch": 0.8888888888888888, "step": 589, "batch_size": 64, "mean": 69.64300537109375, "std": 77.96127319335938, "min": -97.61346435546875, "p10": -27.252326202392574, "median": 68.86134338378906, "p90": 180.92107238769532, "max": 250.51914978027344, "pos_frac": 0.78125, "sample": [43.95452880859375, 50.986366271972656, -97.61346435546875, 188.9652099609375, 96.76822662353516, -0.382049560546875, -50.46082305908203, -4.000640869140625, 36.765296936035156, -5.3488311767578125, -30.117691040039062, 43.23637390136719, 250.51914978027344, 3.579273223876953, 115.87541198730469, 80.6050796508789, 79.6147232055664, 137.76292419433594, 145.68206787109375, 123.90787506103516, 80.17648315429688, 90.35809326171875, 167.91744995117188, -58.80560302734375, 195.3180389404297, 119.60946655273438, -24.49083709716797, 104.83721160888672, 146.06710815429688, 122.08828735351562, 62.122833251953125, -2.119487762451172, 180.95188903808594, 37.6334228515625, -8.923408508300781, 83.75785827636719, 46.87877655029297, 25.037864685058594, 81.92671966552734, 188.41836547851562, 26.788040161132812, 80.56149291992188, 181.85003662109375, 107.92826843261719, 70.6890869140625, -28.435821533203125, 111.84607696533203, -32.6776008605957, -19.459354400634766, 14.794425964355469, -72.6644287109375, 157.24127197265625, 168.53997802734375, 13.859443664550781, 41.31067657470703, 180.8491668701172, 116.07049560546875, 137.469482421875, 13.756950378417969, 16.742233276367188, 67.03359985351562, 28.35385513305664, 221.79324340820312, 3.852579116821289], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000589.npy"}
{"epoch": 0.890400604686319, "step": 590, "batch_size": 64, "mean": 66.54501342773438, "std": 91.16840362548828, "min": -157.06561279296875, "p10": -34.536965942382814, "median": 65.63948822021484, "p90": 189.04491119384767, "max": 223.59957885742188, "pos_frac": 0.75, "sample": [116.5947036743164, 18.650550842285156, 127.3615493774414, 4.663047790527344, 179.08258056640625, 177.45114135742188, 75.65512084960938, -157.06561279296875, 38.7619743347168, -33.48430633544922, 52.50823974609375, 4.017475128173828, 3.755767822265625, 27.28339385986328, -148.66676330566406, 174.3980712890625, 161.00982666015625, 78.61007690429688, -21.874736785888672, -3.0597686767578125, -54.236488342285156, 58.377349853515625, -7.034416198730469, 189.74331665039062, 191.64039611816406, 3.2942123413085938, -31.547714233398438, 93.44197082519531, 222.04638671875, 49.999786376953125, -34.98810577392578, 189.7486572265625, 15.530389785766602, 23.5208740234375, 118.52947998046875, 74.75039672851562, 107.34021759033203, 27.055953979492188, 72.90162658691406, 16.449996948242188, 199.63467407226562, 98.50933837890625, 223.59957885742188, 155.01971435546875, 180.8428955078125, 102.60505676269531, 105.32864379882812, 151.15650939941406, -33.3702392578125, 73.83584594726562, -45.310272216796875, 192.0093231201172, 129.6808624267578, -12.689559936523438, -2.0588531494140625, 183.95989990234375, 9.979207992553711, 78.42452239990234, 187.41529846191406, -3.0114173889160156, 37.686527252197266, -53.21978759765625, 183.2428741455078, -86.60618591308594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000590.npy"}
{"epoch": 0.891912320483749, "step": 591, "batch_size": 64, "mean": 80.5565185546875, "std": 114.67311096191406, "min": -177.04537963867188, "p10": -85.32864913940426, "median": 75.16294860839844, "p90": 204.79521942138675, "max": 329.68865966796875, "pos_frac": 0.78125, "sample": [1.9849109649658203, 191.0511474609375, 160.30470275878906, 170.12860107421875, 154.33175659179688, 61.28205871582031, 187.86436462402344, 79.01637268066406, 71.30952453613281, 106.36833190917969, 44.29640197753906, -116.53857421875, 184.20130920410156, -97.05481719970703, 103.98920440673828, -20.498001098632812, 45.37815856933594, 104.54095458984375, 97.15118408203125, 4.990970611572266, 199.41815185546875, 38.290733337402344, 255.38980102539062, -141.90017700195312, 183.93649291992188, -153.3561553955078, 162.93240356445312, 94.75592041015625, -5.694976806640625, 58.284698486328125, -177.04537963867188, -162.14280700683594, 244.95889282226562, 206.6406707763672, 1.313629150390625, 200.07762145996094, -2.9801979064941406, 49.800262451171875, 180.69891357421875, 54.94524002075195, 5.13385009765625, -8.391708374023438, 50.9554443359375, 178.38357543945312, 0.7255344390869141, 226.88223266601562, -11.303529739379883, -101.6036376953125, -9.164253234863281, 329.68865966796875, -57.96759033203125, 20.40985870361328, 237.18161010742188, 26.50249481201172, 22.89120101928711, 194.95147705078125, 188.09274291992188, 123.831298828125, 162.69007873535156, 134.73291015625, 200.48916625976562, 226.2960968017578, 4.702058792114258, 187.0852813720703], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000591.npy"}
{"epoch": 0.8934240362811792, "step": 592, "batch_size": 64, "mean": 59.65741729736328, "std": 94.12267303466797, "min": -125.47917175292969, "p10": -45.945005035400385, "median": 23.916772842407227, "p90": 187.25852966308594, "max": 281.8871154785156, "pos_frac": 0.75, "sample": [53.62361145019531, 204.53396606445312, 187.20245361328125, 20.937448501586914, 179.5953369140625, 187.0290069580078, 215.09793090820312, 19.63998031616211, 3.3227500915527344, 35.66132354736328, -14.772819519042969, 120.1834716796875, 53.50218200683594, 9.652420043945312, -49.576446533203125, 76.66960906982422, 186.74453735351562, 113.37112426757812, -0.32463836669921875, 37.01581954956055, -104.43208312988281, 173.731201171875, 101.95283508300781, 234.35610961914062, 22.803897857666016, 109.11805725097656, 122.20125579833984, 187.28256225585938, 178.48394775390625, 25.029647827148438, -88.12348175048828, 123.38712310791016, -9.13592529296875, 152.02862548828125, 28.756309509277344, 71.78056335449219, -125.47917175292969, 74.7940673828125, -105.19098663330078, -48.20903778076172, -4.373775482177734, 15.384248733520508, 21.34539031982422, -15.379745483398438, -56.06683349609375, 1.065582275390625, 2.2756195068359375, 249.76502990722656, 216.36819458007812, 17.1451416015625, 281.8871154785156, -0.31109619140625, 19.06851577758789, 0.07418060302734375, 18.539947509765625, 3.7008495330810547, -40.662261962890625, -8.303520202636719, -28.590145111083984, 12.704082489013672, 70.45759582519531, 16.887474060058594, 116.57156372070312, 144.27713012695312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000592.npy"}
{"epoch": 0.8949357520786092, "step": 593, "batch_size": 64, "mean": 63.491966247558594, "std": 98.39710235595703, "min": -141.48031616210938, "p10": -75.72321395874017, "median": 58.4658203125, "p90": 197.41224365234376, "max": 215.96102905273438, "pos_frac": 0.734375, "sample": [147.59738159179688, 21.821239471435547, 179.93589782714844, -4.64825439453125, 171.0029754638672, -17.454254150390625, 79.59542083740234, -126.6011962890625, 202.3942108154297, 64.62265014648438, 191.42819213867188, 28.960208892822266, 206.62696838378906, 154.5870819091797, 34.4483642578125, -0.07686996459960938, 117.27864837646484, 92.79293823242188, 163.51803588867188, 147.97702026367188, 12.043905258178711, 54.31620788574219, 212.200439453125, 199.97683715820312, -4.822784423828125, -3.2306289672851562, 4.865289688110352, 156.57017517089844, 93.35216522216797, -121.112060546875, 77.75727844238281, -100.69562530517578, 10.276901245117188, 62.61543273925781, 99.0964584350586, 159.85345458984375, -1.9855880737304688, -9.788543701171875, 142.43557739257812, -101.67686462402344, -141.48031616210938, -11.804458618164062, 209.0898895263672, -124.90473175048828, 173.24026489257812, 154.7321014404297, 167.2852783203125, 7.958955764770508, -11.950363159179688, 215.96102905273438, 37.734466552734375, -7.866510391235352, 181.41226196289062, 88.54942321777344, 71.10094451904297, -132.96731567382812, 8.998086929321289, 200.6034698486328, 1.5814666748046875, 8.627578735351562, 11.251556396484375, 24.356231689453125, 22.632064819335938, 111.48956298828125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000593.npy"}
{"epoch": 0.8964474678760394, "step": 594, "batch_size": 64, "mean": 58.9074592590332, "std": 86.82299041748047, "min": -227.26185607910156, "p10": -49.50078811645508, "median": 58.37758827209473, "p90": 175.69862518310546, "max": 212.50830078125, "pos_frac": 0.78125, "sample": [62.01633071899414, 115.57881164550781, -74.43690490722656, 8.942420959472656, 76.3607177734375, 127.93224334716797, 33.50952911376953, 45.908782958984375, 54.73884582519531, 1.7975540161132812, -67.53387451171875, 18.799602508544922, -9.287635803222656, 113.48445129394531, 44.12249755859375, 3.232952117919922, -65.71078491210938, 0.4345512390136719, 92.97384643554688, 8.540847778320312, 177.76986694335938, 72.28656005859375, 1.0422496795654297, 179.841796875, 194.3981170654297, 96.64595031738281, 165.17239379882812, -46.587562561035156, 174.3287353515625, -1.829080581665039, 149.0484619140625, 100.51516723632812, 208.17312622070312, 142.64137268066406, 34.67950439453125, -30.527023315429688, -52.18781280517578, 103.55636596679688, 136.1488037109375, -50.74931335449219, 173.21673583984375, 3.9340686798095703, 88.57881164550781, 176.2857208251953, 2.5364761352539062, 31.66841697692871, 77.70674133300781, 186.72784423828125, 2.1171741485595703, 172.66961669921875, 134.60519409179688, -227.26185607910156, 118.01556396484375, 212.50830078125, 52.78326416015625, 113.17759704589844, 1.7449951171875, 77.48944854736328, 71.35508728027344, -32.986839294433594, 98.14677429199219, -40.20658874511719, -19.20043182373047, -51.307273864746094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000594.npy"}
{"epoch": 0.8979591836734694, "step": 595, "batch_size": 64, "mean": 77.47677612304688, "std": 108.33944702148438, "min": -185.6376495361328, "p10": -16.069799804687495, "median": 69.37393951416016, "p90": 206.0196044921875, "max": 306.0214538574219, "pos_frac": 0.796875, "sample": [-3.5304946899414062, 76.71833038330078, 306.0214538574219, 106.79365539550781, 38.46099090576172, 15.87175178527832, 94.37508392333984, 37.842952728271484, 15.120498657226562, 188.25750732421875, 211.37045288085938, 11.109107971191406, 71.68461608886719, -124.89143371582031, 190.7949981689453, 19.18144989013672, -1.4247760772705078, 6.670135498046875, 192.4098663330078, 56.95336151123047, 28.167015075683594, 117.65037536621094, 217.80799865722656, 6.486236572265625, 212.871826171875, 44.99547576904297, 75.8082504272461, 85.44461059570312, 189.60585021972656, 151.92611694335938, 0.23444366455078125, -124.83158874511719, 65.45883178710938, -11.895050048828125, 178.0137939453125, 178.44517517089844, 226.3876495361328, -154.42413330078125, -185.6376495361328, 205.05612182617188, 192.27345275878906, 190.6166229248047, 131.12612915039062, 67.06326293945312, -91.43524169921875, 190.32603454589844, 96.16998291015625, 135.61386108398438, -17.858978271484375, 18.286609649658203, -0.5264797210693359, 206.43252563476562, 140.112548828125, 25.86663055419922, -9.058822631835938, 29.436920166015625, 213.38551330566406, 174.12261962890625, 44.673492431640625, 5.3764801025390625, -0.21694374084472656, -184.23599243164062, 180.9008331298828, 202.70230102539062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000595.npy"}
{"epoch": 0.8994708994708994, "step": 596, "batch_size": 64, "mean": 36.29243850708008, "std": 90.92322540283203, "min": -178.63075256347656, "p10": -59.853485870361325, "median": 13.114490509033203, "p90": 181.36322784423828, "max": 227.2035675048828, "pos_frac": 0.578125, "sample": [-85.85995483398438, 26.764015197753906, 197.86244201660156, 21.509857177734375, 50.44786834716797, -75.07813262939453, -38.22556686401367, 0.33287811279296875, -54.524803161621094, 194.27371215820312, -102.31893920898438, 64.36616516113281, -114.74996948242188, -16.535980224609375, 227.2035675048828, 180.71055603027344, 165.69723510742188, 79.47544860839844, -122.17710876464844, 43.00934600830078, -30.295379638671875, 181.6429443359375, 44.953948974609375, 123.19558715820312, 191.65170288085938, 177.94442749023438, 94.87487030029297, -18.039411544799805, 82.93807220458984, -62.13720703125, -38.310001373291016, 76.41836547851562, 21.425949096679688, -1.731771469116211, 77.82447052001953, 2.285968780517578, 48.46558380126953, 215.3365020751953, 7.496944427490234, -13.222618103027344, 4.481971740722656, -3.4387741088867188, -17.009632110595703, -178.63075256347656, -20.684707641601562, -2.908428192138672, 11.290069580078125, -11.466297149658203, 198.1519775390625, 63.364593505859375, 119.96481323242188, -3.05908203125, -14.770500183105469, -10.783531188964844, -13.615974426269531, 171.22393798828125, 14.938911437988281, 125.58465576171875, -41.164306640625, 43.34881591796875, 17.550762176513672, 91.06709289550781, -7.925994873046875, -37.69508361816406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000596.npy"}
{"epoch": 0.9009826152683296, "step": 597, "batch_size": 64, "mean": 59.17926025390625, "std": 94.24042510986328, "min": -151.5236358642578, "p10": -56.8092803955078, "median": 34.13234519958496, "p90": 190.86874237060547, "max": 226.0319061279297, "pos_frac": 0.71875, "sample": [-4.640277862548828, -41.15837860107422, -74.38719177246094, -10.212844848632812, 196.98995971679688, -10.499435424804688, 36.129703521728516, -103.34801483154297, -45.51972961425781, 193.38572692871094, 3.9690818786621094, 118.94352722167969, 2.183320999145508, 175.7017822265625, -61.64765930175781, 5.272642135620117, 191.1251678466797, 196.53990173339844, 180.7371826171875, -22.20110511779785, 103.70638275146484, 6.03155517578125, -83.01435852050781, 19.549224853515625, 172.09280395507812, 159.67572021484375, 13.737699508666992, -61.76311492919922, 174.28189086914062, 5.976961135864258, 127.49237060546875, 31.296920776367188, 192.24948120117188, 32.134986877441406, 81.9311294555664, 91.00291442871094, -16.564102172851562, 115.36006164550781, 39.29866027832031, 165.472412109375, 2.0614356994628906, -101.07855224609375, 8.400413513183594, -7.496786117553711, -11.317926406860352, -7.057390213012695, 5.783662796020508, 44.58924865722656, 150.0943603515625, 55.18749237060547, 190.27041625976562, 3.0248680114746094, 152.21014404296875, -16.427398681640625, -151.5236358642578, 129.82485961914062, 1.6630821228027344, 137.2214813232422, 226.0319061279297, 211.52203369140625, 59.97843933105469, 178.03262329101562, 72.67371368408203, 156.49118041992188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000597.npy"}
{"epoch": 0.9024943310657596, "step": 598, "batch_size": 64, "mean": 37.21653366088867, "std": 93.70380401611328, "min": -183.5155029296875, "p10": -51.781925964355466, "median": 21.192489624023438, "p90": 172.0341018676758, "max": 253.39138793945312, "pos_frac": 0.640625, "sample": [-2.7992801666259766, -109.37466430664062, -1.9346694946289062, 23.594215393066406, 53.449363708496094, 0.6118659973144531, 6.1163177490234375, 191.73573303222656, 155.72462463378906, -183.5155029296875, 59.342491149902344, 48.69706726074219, 53.36520767211914, -43.71681213378906, 150.27017211914062, -21.31024932861328, 211.53961181640625, -48.59735107421875, 162.64418029785156, 45.408023834228516, -21.900964736938477, -181.38027954101562, -31.394515991210938, 97.40467834472656, 55.86167907714844, 38.99858093261719, 66.06614685058594, 41.908233642578125, 0.10454559326171875, -47.09999465942383, -53.14674377441406, -111.25448608398438, 14.420869827270508, 106.33045959472656, 200.43795776367188, -5.342380523681641, 2.6822357177734375, 1.5252647399902344, 81.88554382324219, 72.35818481445312, 142.73077392578125, 20.074462890625, -3.209169387817383, 189.26712036132812, -13.511947631835938, 141.58558654785156, 29.938156127929688, -2.5977935791015625, 144.570556640625, 22.310516357421875, -28.600830078125, -65.84070587158203, 5.326683044433594, 4.23004150390625, 34.62103271484375, -43.218475341796875, 173.3503875732422, 111.76268005371094, -22.24840545654297, -15.526023864746094, 253.39138793945312, 186.32260131835938, -131.54873657226562, 168.9627685546875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000598.npy"}
{"epoch": 0.9040060468631897, "step": 599, "batch_size": 64, "mean": 45.604652404785156, "std": 95.08106231689453, "min": -171.97402954101562, "p10": -71.7600372314453, "median": 34.66542434692383, "p90": 179.63194427490237, "max": 242.17881774902344, "pos_frac": 0.65625, "sample": [-57.32890319824219, -25.044572830200195, 58.60681915283203, 43.205772399902344, -49.15473175048828, 81.34879302978516, 22.169940948486328, 217.8875732421875, -91.7550048828125, -153.616455078125, -6.8511505126953125, -40.37348175048828, 7.906150817871094, 68.25669860839844, 94.60225677490234, -17.53534698486328, -96.75993347167969, 0.4593620300292969, 1.3375587463378906, -45.77813720703125, 182.39158630371094, 121.41712951660156, 67.15106201171875, -171.97402954101562, -0.10691070556640625, 30.403213500976562, 24.714584350585938, 182.83941650390625, -77.94480895996094, 28.8106689453125, 205.1593475341797, 181.418701171875, -102.75957489013672, -27.155418395996094, 53.06977462768555, 23.470474243164062, 88.91230010986328, -19.6500244140625, 59.705474853515625, 85.88186645507812, 164.44190979003906, -29.268104553222656, 36.23426818847656, -46.85992431640625, 170.34088134765625, 60.37607192993164, 169.22879028320312, -12.480545043945312, -114.33952331542969, 19.066265106201172, 173.91954040527344, 190.73287963867188, 84.18305969238281, 153.59034729003906, 175.4628448486328, 33.096580505371094, 242.17881774902344, 166.00009155273438, 110.26577758789062, -2.7626819610595703, -13.110671997070312, 79.4283676147461, 87.09828186035156, 74.53648376464844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000599.npy"}
{"epoch": 0.9055177626606198, "step": 600, "batch_size": 64, "mean": 53.34266662597656, "std": 94.5609130859375, "min": -183.51849365234375, "p10": -57.3391128540039, "median": 32.191192626953125, "p90": 193.77411193847658, "max": 238.77005004882812, "pos_frac": 0.78125, "sample": [152.46871948242188, 63.793861389160156, 2.3200016021728516, 4.440437316894531, 198.36752319335938, 29.199020385742188, 39.828372955322266, -116.95345306396484, 170.19427490234375, 210.19627380371094, -24.38599395751953, 2.631664276123047, 87.83281707763672, 118.73811340332031, 3.44207763671875, 20.89610481262207, -84.86747741699219, -86.0578384399414, -59.274322509765625, 136.6681671142578, 17.097461700439453, 91.27909851074219, 153.71693420410156, 116.78019714355469, -62.254798889160156, 231.3252716064453, 98.3239517211914, 167.26402282714844, -12.811874389648438, 190.03073120117188, 195.37841796875, 6.143989562988281, -183.51849365234375, -15.838096618652344, 90.12348937988281, -42.61617660522461, 4.442169189453125, 129.51419067382812, 0.9662132263183594, 229.68997192382812, -4.4408416748046875, 114.8319091796875, 10.736320495605469, 180.38525390625, 2.6477298736572266, 9.124256134033203, -52.82362365722656, -30.03154182434082, -139.00543212890625, 65.33444213867188, 2.000030517578125, 86.69282531738281, 59.873382568359375, 24.270736694335938, 101.39329528808594, 9.1527099609375, 35.18336486816406, 57.83732986450195, 104.31350708007812, 199.3192138671875, 2.5164718627929688, 47.235755920410156, 14.098419189453125, 238.77005004882812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000600.npy"}
{"epoch": 0.9070294784580499, "step": 601, "batch_size": 64, "mean": 72.8189697265625, "std": 105.35838317871094, "min": -175.04725646972656, "p10": -50.24929733276367, "median": 68.34732818603516, "p90": 195.4568298339844, "max": 269.8735046386719, "pos_frac": 0.71875, "sample": [153.8072509765625, 12.175193786621094, 179.33612060546875, -108.89277648925781, 195.96817016601562, 123.709716796875, 269.8735046386719, -53.448760986328125, 62.20289611816406, -7.4545440673828125, 206.38002014160156, -149.25750732421875, 182.7582244873047, 9.043550491333008, -25.990259170532227, 77.80225372314453, 164.90872192382812, 240.39984130859375, -58.34618377685547, 147.63449096679688, 189.9669189453125, 2.7823486328125, 0.7589569091796875, 211.05404663085938, 194.26370239257812, 134.40939331054688, -1.4980525970458984, -10.96444320678711, 150.60369873046875, 57.93043518066406, 19.202781677246094, 144.4510498046875, 120.81509399414062, -15.355293273925781, -2.8808231353759766, 118.622802734375, -1.4766311645507812, 114.70962524414062, 74.49176025390625, 8.273305892944336, -54.79963684082031, 96.84803771972656, 14.512126922607422, 171.81765747070312, 53.33099365234375, 134.11314392089844, 51.177066802978516, -42.78388214111328, -37.100608825683594, 7.8675079345703125, -2.0440292358398438, 241.97030639648438, -1.1751956939697266, 4.006866455078125, 171.0749053955078, 177.47119140625, 261.84832763671875, 122.5933837890625, 177.59811401367188, 1.6893768310546875, -148.33395385742188, 167.78929138183594, 133.21971130371094, -175.04725646972656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000601.npy"}
{"epoch": 0.90854119425548, "step": 602, "batch_size": 64, "mean": 70.70862579345703, "std": 90.23670959472656, "min": -177.44125366210938, "p10": -8.162517547607418, "median": 52.3286018371582, "p90": 194.23297271728515, "max": 264.7122497558594, "pos_frac": 0.828125, "sample": [-18.480850219726562, 137.50668334960938, 201.11053466796875, 264.7122497558594, 98.34486389160156, 39.59149169921875, 59.457550048828125, 5.47761344909668, 70.32159423828125, -2.021587371826172, 211.23953247070312, -10.013729095458984, 195.0404815673828, 176.30001831054688, -110.21246337890625, 192.34878540039062, -93.15280151367188, 209.33486938476562, 99.05259704589844, -2.496814727783203, 10.173477172851562, -2.2662734985351562, 11.487899780273438, -3.968669891357422, 24.040206909179688, 156.788818359375, 77.53972625732422, 156.9815673828125, 22.428028106689453, 98.936279296875, 8.89572525024414, 141.05889892578125, 87.45928192138672, 54.07487106323242, 50.582332611083984, 29.90433120727539, 19.02035903930664, 42.147132873535156, 183.27587890625, 93.11243438720703, 12.784557342529297, 148.91957092285156, 172.72174072265625, 168.79794311523438, 8.614402770996094, -177.44125366210938, 103.88147735595703, 49.620948791503906, 4.50250244140625, 16.92291259765625, -9.959880828857422, 9.316986083984375, 173.76483154296875, 5.399436950683594, 16.459754943847656, 187.83006286621094, 85.02242279052734, 83.13365173339844, 204.43408203125, 201.0243682861328, -96.6457748413086, 154.73570251464844, 10.602806091308594, 5.776031494140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000602.npy"}
{"epoch": 0.91005291005291, "step": 603, "batch_size": 64, "mean": 66.09971618652344, "std": 97.51268768310547, "min": -178.69674682617188, "p10": -64.47386093139647, "median": 53.977928161621094, "p90": 183.04022827148438, "max": 259.9698791503906, "pos_frac": 0.78125, "sample": [177.47959899902344, 98.60844421386719, 177.9876708984375, -31.995399475097656, -81.38716125488281, 29.917007446289062, 53.201942443847656, 99.20596313476562, -3.8275985717773438, 7.2579803466796875, 179.3495635986328, 50.86213684082031, 7.394371032714844, -77.19124603271484, 34.51959991455078, 220.5599365234375, -95.86668395996094, 54.75391387939453, 12.792129516601562, 259.9698791503906, 84.65996551513672, 22.00201416015625, 178.1414794921875, 67.2076416015625, -5.876033782958984, 52.18009948730469, 167.9464569091797, -73.7176513671875, 169.0897674560547, -166.29405212402344, 0.5862350463867188, 13.06719970703125, 118.83779907226562, 103.96151733398438, -71.10066986083984, 162.17970275878906, 64.6024169921875, 26.705486297607422, -5.721792221069336, 186.26446533203125, 127.53047180175781, 147.73428344726562, 102.58729553222656, 83.6460952758789, 140.2823486328125, 179.14794921875, 25.723312377929688, -49.01130676269531, 76.53706359863281, 17.23130989074707, 13.379310607910156, 45.07786560058594, 167.79550170898438, 182.20590209960938, 94.3305892944336, -10.940872192382812, 7.696247100830078, -178.69674682617188, 185.23825073242188, 183.39779663085938, 190.96426391601562, 1.5072803497314453, 254.1627655029297, -25.4595947265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000603.npy"}
{"epoch": 0.9115646258503401, "step": 604, "batch_size": 64, "mean": 59.26122283935547, "std": 99.43983459472656, "min": -210.7389678955078, "p10": -58.71613616943359, "median": 49.74794006347656, "p90": 188.41460113525392, "max": 292.14208984375, "pos_frac": 0.734375, "sample": [70.94500732421875, 147.09197998046875, -9.181446075439453, -53.169830322265625, 20.387218475341797, 135.57086181640625, -16.846263885498047, 0.7328586578369141, 185.04554748535156, -36.13533020019531, 23.73333740234375, -12.220130920410156, 1.7142696380615234, 149.41989135742188, -77.05357360839844, -14.252647399902344, 177.5048370361328, 69.407470703125, 63.29540252685547, 185.28012084960938, -139.51272583007812, -42.5064697265625, 14.754776000976562, -82.83761596679688, 111.78730010986328, 5.773080825805664, 12.016054153442383, 77.41197967529297, 71.76565551757812, -210.7389678955078, 59.06381607055664, -50.73090362548828, 66.39005279541016, -61.09312438964844, 180.93707275390625, -15.637397766113281, 53.45610046386719, 0.00885009765625, 196.91348266601562, 151.52597045898438, 101.38713073730469, 194.2101287841797, 292.14208984375, 21.890350341796875, 169.1123046875, 131.212890625, 44.945770263671875, 180.87686157226562, 197.9409637451172, -62.410675048828125, 200.2147216796875, 5.2094268798828125, 82.99694061279297, 34.686729431152344, 139.52391052246094, 46.03977966308594, 143.6203155517578, 36.77739715576172, 1.2874584197998047, 208.51132202148438, 151.1097412109375, -37.46397399902344, -90.87782287597656, 189.75794982910156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000604.npy"}
{"epoch": 0.9130763416477702, "step": 605, "batch_size": 64, "mean": 58.58209228515625, "std": 99.8581771850586, "min": -192.29660034179688, "p10": -52.719414138793944, "median": 50.7572135925293, "p90": 179.02753143310548, "max": 239.72833251953125, "pos_frac": 0.703125, "sample": [80.93143463134766, 92.44119262695312, -142.15011596679688, 239.72833251953125, -8.556495666503906, 187.38668823242188, 48.60247802734375, 174.8364715576172, 112.2363510131836, -53.08568572998047, 150.50979614257812, -27.582839965820312, 49.43549346923828, -25.275550842285156, 8.520374298095703, -83.92527770996094, 41.88526916503906, 162.39279174804688, 140.42588806152344, 67.48045349121094, -32.228759765625, 16.414413452148438, 127.6943130493164, -192.29660034179688, -1.3161087036132812, 136.1337127685547, 187.44570922851562, 169.24319458007812, 63.450950622558594, 48.403656005859375, 135.92803955078125, 24.41687774658203, -181.7374725341797, 169.0795135498047, -3.416748046875, 178.78240966796875, 179.13258361816406, 121.03195190429688, 27.373748779296875, 115.30978393554688, -53.69837951660156, -9.795124053955078, 171.068359375, 52.07893371582031, 192.14547729492188, 29.74353790283203, 182.30923461914062, 172.8321533203125, 173.77261352539062, -20.66778564453125, 213.16180419921875, 67.73951721191406, 89.0555419921875, -12.254981994628906, -34.14857482910156, 15.853067398071289, 15.570344924926758, -169.52105712890625, 29.006423950195312, -51.86478042602539, 7.26182746887207, 74.10136413574219, -10.439743041992188, 120.862060546875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000605.npy"}
{"epoch": 0.9145880574452003, "step": 606, "batch_size": 64, "mean": 70.06275939941406, "std": 99.749267578125, "min": -139.54592895507812, "p10": -63.451563262939445, "median": 57.52324295043945, "p90": 202.41363525390625, "max": 269.6534729003906, "pos_frac": 0.71875, "sample": [-52.923851013183594, 199.71664428710938, 28.169158935546875, 216.78475952148438, -76.57047271728516, 14.971527099609375, 180.456787109375, 25.807662963867188, -2.422475814819336, 148.6140899658203, 182.35299682617188, -25.416610717773438, 105.47647094726562, 203.56948852539062, 178.3759765625, -139.54592895507812, 119.00798034667969, 69.225830078125, 0.3664379119873047, 138.8622283935547, 269.6534729003906, 15.430419921875, 39.70524215698242, 164.72845458984375, -67.96343994140625, -68.5650863647461, 233.619873046875, -34.7991943359375, 95.31893920898438, 144.13339233398438, 107.17735290527344, -0.6438827514648438, -84.9568862915039, 180.30943298339844, 91.61796569824219, 3.095918655395508, 48.24174499511719, -9.116739273071289, 130.1510009765625, 3.4208831787109375, 185.938232421875, 218.75665283203125, 66.80474090576172, 145.41043090820312, 7.30926513671875, 241.48255920410156, -23.213546752929688, 46.2310791015625, -86.45447540283203, -39.73678970336914, 183.2164764404297, 24.00035858154297, -97.78522491455078, 38.172027587890625, -14.292678833007812, 131.30960083007812, 71.25238800048828, -24.56521224975586, 101.9434814453125, 229.54188537597656, 125.58283996582031, 145.3309326171875, 32.66888427734375, -0.32513999938964844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000606.npy"}
{"epoch": 0.9160997732426304, "step": 607, "batch_size": 64, "mean": 62.736549377441406, "std": 99.51309204101562, "min": -201.09912109375, "p10": -57.69485092163086, "median": 55.270408630371094, "p90": 183.94365386962892, "max": 263.617919921875, "pos_frac": 0.703125, "sample": [3.0267868041992188, 23.264892578125, -118.27521514892578, 80.50277709960938, 132.14598083496094, 3.8418426513671875, 132.79397583007812, 54.918312072753906, -201.09912109375, -57.76586151123047, -2.140583038330078, -32.09546661376953, 131.86285400390625, 183.83567810058594, 52.59510803222656, -40.06817626953125, -9.651603698730469, 11.113958358764648, 55.62250518798828, 196.13900756835938, 25.67718505859375, 208.54916381835938, 141.67941284179688, 135.82896423339844, 102.06626892089844, 34.68610382080078, -45.918785095214844, 181.95738220214844, -6.0385589599609375, -13.352119445800781, 196.1751251220703, 91.39371490478516, 120.34555053710938, 140.45993041992188, -97.29745483398438, 200.85958862304688, 263.617919921875, -2.870494842529297, 150.8455352783203, 6.4170989990234375, -57.52915954589844, 8.990486145019531, 25.261903762817383, 142.6503143310547, 98.72892761230469, -16.579368591308594, -75.84314727783203, 153.41815185546875, 183.98992919921875, 167.5148162841797, -0.5003166198730469, 15.330032348632812, 133.64645385742188, 208.09994506835938, 35.7536506652832, 178.18301391601562, -1.1081657409667969, 60.46961975097656, 160.1974639892578, 181.168212890625, 58.976104736328125, -150.03494262695312, -68.0477523803711, 136.7538604736328], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000607.npy"}
{"epoch": 0.9176114890400605, "step": 608, "batch_size": 64, "mean": 61.499488830566406, "std": 89.25361633300781, "min": -151.32168579101562, "p10": -26.675407409667965, "median": 32.45441818237305, "p90": 174.14368286132813, "max": 265.8145751953125, "pos_frac": 0.71875, "sample": [168.47743225097656, -86.01991271972656, -0.4059791564941406, 180.07150268554688, -0.03611946105957031, -55.4000244140625, 145.63482666015625, 7.6678924560546875, 40.73887252807617, -114.02671813964844, 125.48356628417969, 87.15612030029297, 123.66810607910156, 173.5228729248047, 59.01519775390625, 119.94119262695312, 220.36671447753906, -151.32168579101562, -5.950227737426758, 10.187213897705078, 41.60107421875, -16.669330596923828, -0.4722919464111328, 204.43881225585938, 169.766357421875, 10.61843490600586, -0.18630218505859375, 95.08625793457031, 4.364738464355469, 189.8349151611328, 138.48892211914062, -34.46595001220703, -28.771209716796875, 105.62054443359375, 30.20703125, 115.25125122070312, 97.71908569335938, -18.33423614501953, -21.785202026367188, -2.809234619140625, 10.450401306152344, 12.738128662109375, 34.84367370605469, 29.259788513183594, -13.326522827148438, 132.27102661132812, -5.035858154296875, 174.4097442626953, 265.8145751953125, 168.3972625732422, 115.02304077148438, 17.461959838867188, 34.701805114746094, 172.17901611328125, 2.5388450622558594, 158.33712768554688, -78.78483581542969, 187.03509521484375, 19.663787841796875, 13.098609924316406, 23.28221893310547, 169.37901306152344, 17.285934448242188, 146.66900634765625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000608.npy"}
{"epoch": 0.9191232048374905, "step": 609, "batch_size": 64, "mean": 71.68486785888672, "std": 88.19715118408203, "min": -209.37698364257812, "p10": -41.574688720703115, "median": 77.69268798828125, "p90": 181.58990631103515, "max": 282.642578125, "pos_frac": 0.78125, "sample": [197.4067840576172, 163.0342559814453, 85.10648345947266, 9.097343444824219, 70.3072509765625, 24.15839385986328, 177.66456604003906, 107.36039733886719, -14.294097900390625, -10.523399353027344, -33.30657958984375, 124.21812438964844, -12.739852905273438, 24.672252655029297, 282.642578125, 147.32077026367188, 196.20510864257812, 99.82518005371094, 83.81996154785156, 199.52493286132812, -50.738853454589844, 114.83403015136719, 74.25576782226562, 71.14602661132812, -70.681884765625, 194.97384643554688, 138.98831176757812, 96.12128448486328, -89.92891693115234, 133.64923095703125, 48.025550842285156, -0.8020648956298828, 177.4225311279297, 82.5673599243164, -25.62921142578125, 102.74903106689453, 62.75895309448242, -0.5806865692138672, 127.9490966796875, 111.74241638183594, 65.8548355102539, -65.06379699707031, 59.67395782470703, 193.72140502929688, 182.38693237304688, -209.37698364257812, 14.973812103271484, 83.14633178710938, 76.27427673339844, 11.509077072143555, 32.39194107055664, 109.22954559326172, 116.00352478027344, 59.371238708496094, 1.3219356536865234, 9.329658508300781, 179.7301788330078, 108.27568817138672, 168.67398071289062, -51.380821228027344, 169.09182739257812, 79.11109924316406, 18.3778076171875, -45.1181640625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000609.npy"}
{"epoch": 0.9206349206349206, "step": 610, "batch_size": 64, "mean": 69.20388793945312, "std": 99.09683227539062, "min": -126.33935546875, "p10": -38.379602050781244, "median": 62.8764533996582, "p90": 198.8104095458985, "max": 282.52801513671875, "pos_frac": 0.71875, "sample": [67.03800964355469, 282.52801513671875, 165.694580078125, 118.67408752441406, 62.65648651123047, 184.5980224609375, 4.9241943359375, 5.014383316040039, -93.4630126953125, 6.319427490234375, 66.10433197021484, -103.61507415771484, 103.56655883789062, 34.93898010253906, -10.322280883789062, 63.09642028808594, 180.20103454589844, -31.323684692382812, -47.570587158203125, -8.867530822753906, 33.07263946533203, -85.13809204101562, 1.5907363891601562, 134.46612548828125, -35.34831237792969, -39.67872619628906, 13.602676391601562, 186.74444580078125, 61.141448974609375, 212.44677734375, 65.24160766601562, 267.9634704589844, 203.98153686523438, 182.01852416992188, -30.810104370117188, 165.22308349609375, 185.21693420410156, -1.7404537200927734, 175.63211059570312, 20.84161376953125, 13.14813232421875, 143.77703857421875, 76.3250961303711, -126.33935546875, 179.56524658203125, -11.99887466430664, -32.374046325683594, -21.756183624267578, 180.1976776123047, 210.55667114257812, 87.61332702636719, 102.4217529296875, 34.57936096191406, -3.9546070098876953, 0.4058055877685547, -13.837638854980469, -94.80772399902344, 158.99456787109375, 209.1065216064453, 94.20712280273438, 74.32014465332031, 31.630783081054688, 221.23184204101562, 149.3760223388672], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000610.npy"}
{"epoch": 0.9221466364323507, "step": 611, "batch_size": 64, "mean": 54.448604583740234, "std": 85.02355194091797, "min": -158.35885620117188, "p10": -45.432060241699205, "median": 53.655839920043945, "p90": 172.2099411010742, "max": 219.63937377929688, "pos_frac": 0.71875, "sample": [98.39312744140625, 99.20223236083984, 18.622894287109375, 52.31973648071289, 147.50094604492188, -2.3713150024414062, -6.56634521484375, 63.53071594238281, -2.860759735107422, 60.87342834472656, 0.3718681335449219, 157.05638122558594, 68.81475830078125, 109.98942565917969, -4.337554931640625, -1.254302978515625, -92.04666137695312, 171.7992401123047, -33.119441986083984, 219.63937377929688, 167.43081665039062, 85.2502670288086, 108.21879577636719, -72.57966613769531, -50.70889663696289, -8.608039855957031, 172.52005004882812, 65.97029876708984, 109.4820327758789, 204.28811645507812, -0.4875946044921875, -56.96216583251953, 31.640159606933594, 192.1246337890625, -16.814437866210938, 10.9444580078125, -158.35885620117188, 181.2982940673828, 40.18175506591797, 50.36903381347656, 114.72883605957031, 88.02786254882812, -146.156494140625, 81.79832458496094, 21.93022918701172, 62.902687072753906, 96.14645385742188, 32.526763916015625, 142.85482788085938, 158.35464477539062, 87.32505798339844, 149.12200927734375, 2.483449935913086, 67.83377838134766, 4.174934387207031, 26.271743774414062, 189.51654052734375, -103.48011016845703, 172.38595581054688, 23.51134490966797, -25.04897117614746, 6.041069030761719, -4.288995742797852, 54.991943359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000611.npy"}
{"epoch": 0.9236583522297808, "step": 612, "batch_size": 64, "mean": 78.07444763183594, "std": 94.89973449707031, "min": -204.4320831298828, "p10": -17.6381534576416, "median": 76.12776184082031, "p90": 207.82696075439458, "max": 245.463134765625, "pos_frac": 0.796875, "sample": [-1.55572509765625, 2.744476318359375, 85.38552856445312, 7.312553405761719, -33.70329284667969, 7.077537536621094, 158.00685119628906, 128.74209594726562, -7.833343505859375, 28.431861877441406, 132.72055053710938, 77.29960632324219, 72.76461029052734, 74.95591735839844, 126.29454803466797, 48.84844970703125, 45.24193572998047, 8.932594299316406, 89.73097229003906, 137.14569091796875, 159.9180908203125, -58.78649139404297, 140.76348876953125, 44.11015319824219, 70.60206604003906, 26.449726104736328, 232.8408966064453, -18.426380157470703, -32.45455551147461, 17.793785095214844, 68.0204849243164, 186.0123748779297, 2.106006622314453, -5.164510726928711, -15.798957824707031, 193.33807373046875, 28.70520782470703, 129.96932983398438, 239.374755859375, -154.7793731689453, 216.12449645996094, 68.33612060546875, 162.10264587402344, -117.53129577636719, 229.029541015625, 113.95726013183594, 169.15802001953125, -11.08466911315918, 164.0303192138672, 214.03648376464844, 95.67459869384766, -7.364128112792969, 229.38800048828125, 91.73481750488281, 48.72102355957031, 245.463134765625, 124.45067596435547, 67.65094757080078, 91.09898376464844, -204.4320831298828, 83.1037368774414, 151.2318115234375, 152.9672088623047, 175.7792205810547], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000612.npy"}
{"epoch": 0.9251700680272109, "step": 613, "batch_size": 64, "mean": 75.09466552734375, "std": 92.99874877929688, "min": -178.78076171875, "p10": -26.150121116638182, "median": 71.03226852416992, "p90": 192.41020507812502, "max": 218.87841796875, "pos_frac": 0.75, "sample": [-8.552558898925781, 59.06202697753906, -13.148468017578125, 31.779373168945312, 111.2433090209961, 132.82537841796875, 120.33419036865234, 77.440673828125, -39.995521545410156, -4.28282356262207, 47.963356018066406, 59.3626594543457, -178.78076171875, 141.55361938476562, 150.89303588867188, 50.40138244628906, -26.890588760375977, 3.543722152709961, 218.87841796875, 147.91580200195312, 16.109420776367188, -4.907934188842773, -156.32290649414062, 14.971834182739258, 178.78347778320312, 74.0792465209961, 172.20050048828125, 12.231870651245117, 16.36225128173828, 14.051033020019531, 218.74008178710938, 66.40516662597656, 95.25253295898438, -1.4571876525878906, -6.725013732910156, 71.47354125976562, -79.25191497802734, 176.51402282714844, 117.43131256103516, 203.795166015625, 176.65676879882812, 190.7113037109375, 61.479454040527344, 169.2740478515625, 213.71913146972656, 6.4000091552734375, 217.88818359375, -1.925699234008789, -29.473709106445312, 188.47592163085938, -9.042160034179688, -24.42236328125, 175.43206787109375, 212.8493194580078, 125.21122741699219, 143.29263305664062, 20.039127349853516, 193.1383056640625, 122.67156982421875, 187.57550048828125, 70.59099578857422, 75.51902770996094, 90.29813385009766, -51.58294677734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000613.npy"}
{"epoch": 0.926681783824641, "step": 614, "batch_size": 64, "mean": 68.12400817871094, "std": 90.0319595336914, "min": -160.84359741210938, "p10": -45.00386276245117, "median": 76.82067489624023, "p90": 178.36012573242186, "max": 226.88232421875, "pos_frac": 0.78125, "sample": [178.63064575195312, 116.98822784423828, 35.73524856567383, 137.95993041992188, 98.58809661865234, 117.33806610107422, 11.015483856201172, 0.20599746704101562, 79.09597778320312, 65.45965576171875, 205.23907470703125, 135.82931518554688, 188.7357940673828, 181.6598358154297, 113.66183471679688, 144.21852111816406, 136.10023498535156, 89.0179672241211, 122.8396224975586, 111.91050720214844, -114.35846710205078, 7.1872711181640625, -46.70690155029297, -160.84359741210938, 226.88232421875, -12.772789001464844, -9.130300521850586, -54.45916748046875, 165.53250122070312, -18.75909423828125, -65.51973724365234, 9.671001434326172, 4.744388580322266, 168.58404541015625, 66.44070434570312, 15.113763809204102, 147.34361267089844, 80.61322021484375, -150.68736267089844, 140.53103637695312, -16.381752014160156, 177.72891235351562, 137.84323120117188, 129.1129150390625, 102.83646392822266, 142.2683563232422, 30.649559020996094, 64.96771240234375, 139.66085815429688, 52.54719543457031, 19.112747192382812, 66.1413345336914, -41.03010559082031, -27.04967498779297, 10.835453033447266, 202.94442749023438, 88.98020935058594, -16.15850067138672, 171.67477416992188, 36.519683837890625, 74.54537200927734, 51.1346321105957, -98.22624206542969, 189.64248657226562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000614.npy"}
{"epoch": 0.9281934996220711, "step": 615, "batch_size": 64, "mean": 66.5538558959961, "std": 109.13554382324219, "min": -223.55856323242188, "p10": -60.60144958496093, "median": 55.511226654052734, "p90": 195.11482543945314, "max": 286.1209716796875, "pos_frac": 0.78125, "sample": [63.505470275878906, 89.69886779785156, 175.4974365234375, 26.233795166015625, 71.22723388671875, 173.8265380859375, 63.546791076660156, -159.91441345214844, 184.683837890625, 56.84429168701172, -113.98516082763672, 78.4422836303711, -85.06802368164062, 96.95851135253906, 42.483131408691406, -100.72998046875, 143.97625732421875, 218.525390625, 160.83538818359375, 3.1913909912109375, 180.3646240234375, -28.34111785888672, 232.813720703125, -2.4951171875, 166.06976318359375, 35.86114501953125, -28.12079620361328, 50.36344909667969, 11.986846923828125, 197.4496307373047, 187.14341735839844, 64.40031433105469, -12.961997985839844, 37.60039138793945, 6.026924133300781, 184.73397827148438, 166.70498657226562, -57.8519287109375, 9.370254516601562, 189.6669464111328, 211.49856567382812, 218.2298583984375, 182.1608123779297, 5.8237762451171875, 137.42457580566406, 12.800647735595703, 33.20008850097656, -61.779815673828125, 49.456329345703125, 199.1436004638672, -46.65936279296875, 187.07208251953125, 6.811363220214844, 81.44902038574219, -13.081413269042969, 286.1209716796875, 54.17816162109375, 117.167724609375, 186.22300720214844, 8.608192443847656, 13.710273742675781, 25.521156311035156, -192.63894653320312, -223.55856323242188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000615.npy"}
{"epoch": 0.9297052154195011, "step": 616, "batch_size": 64, "mean": 61.14124298095703, "std": 93.16486358642578, "min": -155.77676391601562, "p10": -31.446649932861323, "median": 40.4026985168457, "p90": 189.99268188476563, "max": 225.22061157226562, "pos_frac": 0.703125, "sample": [205.46844482421875, 8.841537475585938, 3.6849365234375, -14.029861450195312, 176.6359100341797, -4.07838249206543, 225.22061157226562, 128.41526794433594, 0.8192024230957031, 107.10201263427734, 45.464454650878906, 9.192855834960938, 64.46675872802734, -155.77676391601562, 28.53731918334961, -49.74047088623047, 33.641021728515625, -32.891178131103516, -8.146942138671875, 205.46817016601562, 47.374046325683594, 190.64852905273438, -8.799951553344727, -21.551652908325195, 193.4862060546875, 0.8472423553466797, -10.804990768432617, 2.6186599731445312, -25.71861457824707, 186.1727752685547, 149.6434326171875, 6.108711242675781, 79.89666748046875, 175.64743041992188, 218.27345275878906, 159.60354614257812, -10.095577239990234, 91.23626708984375, -18.63055419921875, 180.2618408203125, 173.733154296875, 77.71000671386719, 37.542945861816406, 80.01704406738281, 159.20074462890625, 188.46237182617188, -55.43993377685547, 17.842693328857422, -137.32928466796875, 9.089889526367188, -28.07608413696289, -90.15080261230469, 86.5129623413086, 71.90438842773438, 32.856407165527344, 199.50299072265625, 97.73921966552734, -2.5289878845214844, 118.56550598144531, -13.655654907226562, 166.96917724609375, 175.90667724609375, 43.262451171875, -61.11058807373047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000616.npy"}
{"epoch": 0.9312169312169312, "step": 617, "batch_size": 64, "mean": 48.437278747558594, "std": 91.58654022216797, "min": -164.47238159179688, "p10": -51.88406677246093, "median": 37.66682052612305, "p90": 174.34939575195312, "max": 295.2500915527344, "pos_frac": 0.75, "sample": [29.210670471191406, 13.948223114013672, -30.178863525390625, 35.019935607910156, 182.5001983642578, 113.70968627929688, 138.33059692382812, 175.9678497314453, 4.712230682373047, -75.78609466552734, 40.31370544433594, 23.760276794433594, -54.39753723144531, 1.029195785522461, 1.0543479919433594, 155.89471435546875, -22.73015594482422, 172.87973022460938, 34.371856689453125, 3.4284133911132812, 295.2500915527344, -9.02072525024414, 1.5324974060058594, 162.19119262695312, 93.31932067871094, -0.649566650390625, -2.3680152893066406, 127.40353393554688, 49.35475158691406, -0.5187950134277344, 17.29793930053711, 8.866535186767578, 44.69004821777344, 183.29608154296875, 100.067626953125, 212.70745849609375, 42.68207550048828, -159.65444946289062, -37.70508575439453, 128.93638610839844, 95.71045684814453, 5.5177764892578125, -60.42555236816406, -8.535709381103516, 190.92523193359375, 174.74801635742188, -164.47238159179688, -144.797607421875, -132.2093963623047, 64.4288330078125, 106.49185180664062, 120.60183715820312, 59.57478332519531, 9.042272567749023, 173.41928100585938, 98.27970886230469, 34.79115295410156, 8.919807434082031, 46.69292449951172, 78.47158813476562, 42.32673645019531, 81.72757720947266, 64.0579833984375, -46.01930236816406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000617.npy"}
{"epoch": 0.9327286470143613, "step": 618, "batch_size": 64, "mean": 61.68936538696289, "std": 89.90117645263672, "min": -116.60106658935547, "p10": -39.01580352783203, "median": 46.31245040893555, "p90": 195.89368286132813, "max": 278.1098937988281, "pos_frac": 0.78125, "sample": [226.14845275878906, 202.8302001953125, 13.352062225341797, 165.04638671875, -9.912696838378906, -1.387704849243164, -63.66556167602539, 9.381919860839844, 211.86346435546875, 0.7731361389160156, 3.8468093872070312, 195.98822021484375, 142.7122039794922, 78.89781188964844, 191.67681884765625, 4.788688659667969, 64.78182220458984, 1.9422760009765625, 278.1098937988281, 114.5999755859375, 32.348182678222656, 219.70201110839844, 2.250215530395508, 44.52037811279297, 4.316993713378906, 191.99642944335938, 51.40198516845703, 51.63690948486328, 114.89229583740234, 74.4992904663086, 7.2118682861328125, 21.713333129882812, 195.673095703125, -3.6450538635253906, 53.740631103515625, -46.960784912109375, 62.06770324707031, -116.60106658935547, 110.95024108886719, 41.55060577392578, -86.79974365234375, 0.6223640441894531, 1.8549385070800781, -109.65719604492188, 103.56974792480469, -14.484420776367188, 38.892967224121094, -84.79682922363281, 105.9417495727539, 197.93865966796875, 95.64046478271484, 133.67605590820312, 8.194000244140625, -0.40538787841796875, 48.104522705078125, 54.36784362792969, -36.08979797363281, 87.51294708251953, 137.5083465576172, 169.95960998535156, -5.0199432373046875, 162.25851440429688, -40.269805908203125, 34.560386657714844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000618.npy"}
{"epoch": 0.9342403628117913, "step": 619, "batch_size": 64, "mean": 67.08335876464844, "std": 96.25796508789062, "min": -245.692626953125, "p10": -22.826781082153317, "median": 55.7342529296875, "p90": 199.5620880126953, "max": 225.06150817871094, "pos_frac": 0.765625, "sample": [118.6959228515625, 15.994125366210938, 2.5772533416748047, 22.244216918945312, 182.49302673339844, 208.74708557128906, 199.12042236328125, -67.0405502319336, -108.00189208984375, 225.06150817871094, 189.33782958984375, 79.51344299316406, 161.7096710205078, 11.491491317749023, 78.82772827148438, -14.990165710449219, 54.699607849121094, 77.19131469726562, -2.1407337188720703, -7.158203125, -110.49251556396484, 223.57196044921875, 31.6329345703125, -57.06434631347656, -9.014520645141602, 172.4130859375, 146.11192321777344, 167.24571228027344, 0.5806331634521484, -24.7010498046875, 12.664939880371094, -245.692626953125, -17.974639892578125, 199.75137329101562, 140.1741943359375, 136.65057373046875, 12.679462432861328, 187.76873779296875, 202.95556640625, -80.28305053710938, 115.22358703613281, 214.63406372070312, 86.71392822265625, 0.0011844635009765625, 110.44538879394531, 119.16079711914062, 101.74632263183594, 155.37045288085938, 147.67803955078125, 42.245670318603516, 56.768898010253906, 46.7809944152832, 17.69957733154297, 33.97126770019531, 207.3297119140625, 16.738121032714844, 63.15325927734375, 113.8991470336914, -3.6535472869873047, 30.79039192199707, -7.198822021484375, 101.2094497680664, 23.729232788085938, -18.453487396240234], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000619.npy"}
{"epoch": 0.9357520786092215, "step": 620, "batch_size": 64, "mean": 56.845909118652344, "std": 92.92910766601562, "min": -137.77670288085938, "p10": -53.441347503662094, "median": 39.07463836669922, "p90": 185.86132507324223, "max": 223.5019073486328, "pos_frac": 0.71875, "sample": [22.34577178955078, -133.47894287109375, -0.195556640625, 160.20657348632812, -72.71044158935547, 0.285064697265625, 58.98997497558594, 189.72633361816406, 15.836830139160156, 98.51338958740234, 152.48593139648438, 150.5427703857422, 114.24911499023438, 89.49505615234375, -4.9041900634765625, 217.8035888671875, 125.11538696289062, 58.849517822265625, -33.75193786621094, 5.2139739990234375, -137.77670288085938, 21.060405731201172, -98.80743408203125, 218.68563842773438, -59.06231689453125, 164.87049865722656, -28.02759552001953, -15.78533935546875, 196.56634521484375, -40.32575225830078, 165.63417053222656, 190.237060546875, -24.793964385986328, -8.288480758666992, -31.965953826904297, 28.081619262695312, 33.69816970825195, 37.60575866699219, 171.63406372070312, 1.3350067138671875, 157.23141479492188, 25.942916870117188, 113.4267578125, 79.13782501220703, 48.65761184692383, -96.01470947265625, 223.5019073486328, 196.14825439453125, -11.224508285522461, 137.4844970703125, 1.4581737518310547, 176.8429718017578, 40.54351806640625, 3.1645984649658203, -111.32429504394531, 27.1231746673584, 156.089599609375, 19.552993774414062, 78.10414123535156, -2.79638671875, 115.43008422851562, 107.79669952392578, 107.1236572265625, 45.54395294189453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000620.npy"}
{"epoch": 0.9372637944066515, "step": 621, "batch_size": 64, "mean": 53.09856414794922, "std": 98.0359878540039, "min": -172.84811401367188, "p10": -66.99797401428222, "median": 65.67878723144531, "p90": 181.4823211669922, "max": 302.31951904296875, "pos_frac": 0.734375, "sample": [4.97930908203125, 19.492408752441406, 16.101951599121094, -59.21879196166992, -70.3319091796875, -135.97634887695312, 75.59957885742188, 103.33262634277344, -35.765167236328125, 14.140329360961914, 68.12141418457031, 118.42472839355469, 181.3592529296875, -0.41131019592285156, 183.6584014892578, 73.41752624511719, 71.73728942871094, 104.17430114746094, 68.62091064453125, 162.85198974609375, 80.60940551757812, 50.10386657714844, 246.2423553466797, -159.52688598632812, 203.52139282226562, 149.00680541992188, 75.10113525390625, -172.84811401367188, 86.45578002929688, -82.87867736816406, 212.6307373046875, 96.01647186279297, 71.6990966796875, 174.20806884765625, 171.13587951660156, -18.59881591796875, 71.45372009277344, 10.278879165649414, 78.88117980957031, -32.342193603515625, -70.47804260253906, 11.141942977905273, 201.8406219482422, -19.27471923828125, 20.669775009155273, -20.286640167236328, 39.7872314453125, 1.4660186767578125, -16.180953979492188, 4.946552276611328, 3.882383346557617, 159.97650146484375, 63.23616027832031, 10.271799087524414, 132.08749389648438, -14.656116485595703, -24.78044891357422, 302.31951904296875, -149.79794311523438, 73.63386535644531, 101.40591430664062, 22.048812866210938, 181.53506469726562, 108.05450439453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000621.npy"}
{"epoch": 0.9387755102040817, "step": 622, "batch_size": 64, "mean": 60.16947555541992, "std": 93.58633422851562, "min": -168.35618591308594, "p10": -49.41577186584472, "median": 42.33197021484375, "p90": 179.5052444458008, "max": 230.10430908203125, "pos_frac": 0.734375, "sample": [157.46282958984375, 12.258586883544922, 5.622428894042969, -168.35618591308594, 11.349723815917969, 98.6653823852539, -16.941558837890625, 13.010597229003906, -80.71835327148438, -6.9436492919921875, 211.39491271972656, 128.75599670410156, 0.22974205017089844, -52.72183609008789, 184.04710388183594, 72.36907196044922, 179.96607971191406, 177.9972381591797, 140.2890167236328, 180.7186279296875, 169.284423828125, 175.2159423828125, -11.519262313842773, -17.326780319213867, 230.10430908203125, 64.5057144165039, 42.501190185546875, 130.38357543945312, 8.970020294189453, 61.86043167114258, 158.78594970703125, -107.33331298828125, 71.1414566040039, 22.514850616455078, 14.667457580566406, -0.5191860198974609, 153.34201049804688, 153.18017578125, 92.78433227539062, 33.74347686767578, 167.61239624023438, 21.75078582763672, 145.49102783203125, -41.701622009277344, 128.29364013671875, 146.3128662109375, 136.5433349609375, -140.18722534179688, -0.8586692810058594, -5.341728210449219, 178.42996215820312, 4.476921081542969, 2.1075897216796875, -39.086761474609375, -4.109186172485352, -69.01898193359375, 204.72328186035156, 42.64887237548828, -72.96762084960938, 12.106590270996094, 98.779541015625, 5.553258895874023, 192.38272094726562, 42.162750244140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000622.npy"}
{"epoch": 0.9402872260015117, "step": 623, "batch_size": 64, "mean": 63.49437713623047, "std": 94.94392395019531, "min": -156.5010986328125, "p10": -35.30478057861327, "median": 59.56415557861328, "p90": 187.15604248046876, "max": 272.96966552734375, "pos_frac": 0.71875, "sample": [155.65963745117188, 28.22930145263672, -16.149250030517578, -150.35972595214844, 64.91752624511719, 217.4268798828125, 150.85604858398438, 153.46054077148438, 168.65341186523438, 85.64397430419922, -81.00747680664062, 60.528594970703125, 199.7269287109375, 48.339698791503906, -21.038040161132812, 213.828857421875, -14.101932525634766, 163.6806182861328, 56.35553741455078, 111.8024673461914, 40.816619873046875, 58.59971618652344, 172.56886291503906, -9.253036499023438, -18.026695251464844, 91.10179138183594, 147.9704132080078, 187.44915771484375, -156.5010986328125, -42.106842041015625, 112.11956787109375, 146.54425048828125, 89.11347961425781, 93.33273315429688, -15.588958740234375, -114.4101791381836, 178.27566528320312, -14.507369995117188, 136.1348876953125, 186.47210693359375, 202.91458129882812, 193.76583862304688, 76.10000610351562, 3.7964229583740234, -19.339675903320312, -10.587949752807617, -41.419097900390625, 25.798749923706055, 4.799922943115234, 110.90941619873047, 70.0919189453125, 272.96966552734375, 83.71315002441406, 70.45417785644531, 3.5389862060546875, 16.336387634277344, -20.284439086914062, 4.6712799072265625, 54.55848693847656, 10.633687973022461, 162.43748474121094, -3.7111129760742188, 39.381195068359375, -114.4477767944336], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000623.npy"}
{"epoch": 0.9417989417989417, "step": 624, "batch_size": 64, "mean": 70.83955383300781, "std": 101.58707427978516, "min": -188.21270751953125, "p10": -39.775098037719715, "median": 42.28030776977539, "p90": 192.1891067504883, "max": 270.9872131347656, "pos_frac": 0.6875, "sample": [-65.53359985351562, -10.821792602539062, 157.2239990234375, 133.68846130371094, 187.32192993164062, 175.2528533935547, -5.944007873535156, -9.828689575195312, -15.576507568359375, -43.00243377685547, 71.9897689819336, 18.89147186279297, 121.06573486328125, 138.31556701660156, 158.56869506835938, 151.3653564453125, 18.592546463012695, 243.35702514648438, 235.67816162109375, 30.165016174316406, -32.24464797973633, -46.15773391723633, -12.190210342407227, 185.34341430664062, 5.404819488525391, 144.84893798828125, 51.314170837402344, -17.272911071777344, 42.047210693359375, 40.31642150878906, 270.9872131347656, 180.28228759765625, 125.31520080566406, 264.3817443847656, 11.210559844970703, -50.82744216918945, 194.27503967285156, 167.73318481445312, 242.0491180419922, 164.75979614257812, 53.283058166503906, 184.46389770507812, 120.04637908935547, 21.9273681640625, -58.339508056640625, -11.709274291992188, 222.07505798339844, 0.3985443115234375, -15.48546028137207, 126.35045623779297, 133.2210693359375, 0.8564395904541016, -10.153564453125, 8.169031143188477, -188.21270751953125, 135.01046752929688, 12.635238647460938, -27.43683624267578, 135.5640869140625, -13.655101776123047, -117.37891387939453, -0.5266189575195312, 42.513404846191406, 157.76876831054688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000624.npy"}
{"epoch": 0.9433106575963719, "step": 625, "batch_size": 64, "mean": 53.46350860595703, "std": 103.32579040527344, "min": -174.4987335205078, "p10": -69.73916473388671, "median": 29.67624282836914, "p90": 199.31807250976567, "max": 261.5113525390625, "pos_frac": 0.671875, "sample": [51.01542663574219, 155.45193481445312, 46.79966735839844, 53.848907470703125, -0.7921848297119141, 136.578857421875, 84.23780059814453, 109.64488983154297, 4.586950302124023, 247.39486694335938, 77.49177551269531, -65.28974914550781, 180.87606811523438, -8.585132598876953, 238.24801635742188, 28.593215942382812, 220.8196258544922, 28.559661865234375, -73.57334899902344, -29.46320343017578, 152.17388916015625, 151.70408630371094, 17.913658142089844, -106.70968627929688, -4.5560150146484375, 36.67253875732422, 39.025299072265625, 40.772544860839844, 15.419719696044922, 29.514984130859375, -3.496173858642578, 179.3630828857422, 260.53582763671875, -22.034141540527344, 139.74720764160156, -3.6976470947265625, -5.118837356567383, 261.5113525390625, -2.8006820678710938, 13.088905334472656, 86.52987670898438, 157.41323852539062, 212.17776489257812, 111.47603607177734, 29.837501525878906, -13.410629272460938, -171.49301147460938, -71.64605712890625, -110.01200866699219, 149.4346466064453, 29.061935424804688, -153.97293090820312, -174.4987335205078, 202.89794921875, 147.07470703125, -0.076568603515625, 190.96502685546875, 13.74261474609375, 63.531715393066406, -33.99046325683594, 78.59211730957031, 17.416690826416016, 10.457342147827148, -25.31816864013672], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000625.npy"}
{"epoch": 0.9448223733938019, "step": 626, "batch_size": 64, "mean": 71.36672973632812, "std": 108.7412338256836, "min": -132.07736206054688, "p10": -66.91132202148437, "median": 62.75663757324219, "p90": 192.58048553466796, "max": 369.0495300292969, "pos_frac": 0.71875, "sample": [196.66494750976562, 8.710311889648438, 179.55984497070312, -33.573036193847656, 287.4642333984375, 184.65899658203125, 192.83587646484375, -52.6798095703125, 85.72936248779297, 185.764404296875, -59.863800048828125, 15.792251586914062, 53.98738098144531, 3.431690216064453, -89.82047271728516, 178.72557067871094, 121.84943389892578, -95.64924621582031, 219.41058349609375, 43.260276794433594, -103.22626495361328, -6.40606689453125, 102.0160140991211, 179.43399047851562, 10.621513366699219, -1.7092952728271484, 102.48546600341797, 54.740814208984375, -35.06330871582031, 369.0495300292969, 103.49773406982422, -64.5392837524414, 207.47314453125, 178.2823486328125, 58.03436279296875, -111.11538696289062, -4.699554443359375, 67.47891235351562, -0.06612968444824219, 112.91654205322266, 11.944259643554688, 11.286111831665039, 183.82078552246094, 38.03495407104492, -26.985488891601562, 2.6715030670166016, 82.61090087890625, 175.26344299316406, 69.3853759765625, 159.56935119628906, 154.7353515625, 5.367931365966797, -132.07736206054688, 181.961669921875, -95.08653259277344, 179.5765380859375, 21.85970687866211, 191.9845733642578, 156.29302978515625, 98.24618530273438, 222.3150177001953, -67.92790985107422, 132.69061279296875, -35.533287048339844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000626.npy"}
{"epoch": 0.9463340891912321, "step": 627, "batch_size": 64, "mean": 76.5022964477539, "std": 100.973876953125, "min": -183.18832397460938, "p10": -22.579851150512695, "median": 81.706298828125, "p90": 200.9724578857422, "max": 244.0369415283203, "pos_frac": 0.765625, "sample": [120.90066528320312, 5.394855499267578, 95.74708557128906, 97.92660522460938, 121.06834411621094, 113.38377380371094, 170.04632568359375, 233.52439880371094, 238.10400390625, 68.92666625976562, -7.842071533203125, 203.12664794921875, 186.0185546875, 158.21644592285156, 61.01976013183594, 57.12261962890625, -21.170166015625, 29.43688201904297, 201.32955932617188, 126.14395141601562, 163.99887084960938, 131.9071502685547, 24.899612426757812, 11.070068359375, -9.173019409179688, -132.98509216308594, 190.78872680664062, -47.38761901855469, 141.7175750732422, -23.184001922607422, 6.93266487121582, 193.5457305908203, 200.13922119140625, -18.811927795410156, 109.35455322265625, -8.068939208984375, -21.08271026611328, 134.0021514892578, -51.810516357421875, 206.724853515625, 244.0369415283203, 64.2373275756836, -139.7386474609375, -3.7350540161132812, 43.74095916748047, 0.6250457763671875, -16.961776733398438, 118.8185043334961, 20.35406494140625, 209.2036895751953, 53.45806121826172, -183.18832397460938, 15.24631118774414, 94.48593139648438, 191.10736083984375, 133.3634033203125, 35.892494201660156, -175.49276733398438, 54.03564453125, 66.8490982055664, 114.8299789428711, 167.57855224609375, 130.5888671875, 195.80889892578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000627.npy"}
{"epoch": 0.9478458049886621, "step": 628, "batch_size": 64, "mean": 77.04563903808594, "std": 107.29527282714844, "min": -226.4943084716797, "p10": -69.16172256469724, "median": 87.72977066040039, "p90": 198.4499008178711, "max": 242.3599395751953, "pos_frac": 0.796875, "sample": [200.125732421875, 84.40652465820312, 97.78721618652344, 133.54151916503906, -45.362571716308594, -88.77169799804688, 201.9952392578125, 23.2401123046875, 171.12277221679688, 170.4254150390625, 200.57577514648438, 212.93963623046875, 129.03807067871094, 40.37722396850586, 59.33086395263672, 146.67391967773438, -226.4943084716797, -147.52011108398438, 152.6398468017578, 41.90412521362305, 194.5396270751953, 70.88941955566406, 20.374902725219727, -99.44332885742188, 187.17420959472656, 152.71853637695312, 87.82131958007812, 146.61233520507812, 89.04103088378906, 190.10183715820312, 175.2069854736328, 212.0247802734375, 21.435592651367188, 194.09669494628906, 11.98867416381836, 221.34591674804688, -31.660842895507812, 2.7081680297851562, 171.6068115234375, -131.18800354003906, 116.21443176269531, 87.63822174072266, 35.968406677246094, 164.25006103515625, 28.79975128173828, 36.73683166503906, 178.12095642089844, 84.33433532714844, 163.32156372070312, 182.16001892089844, 36.20720672607422, -33.70909118652344, 11.801132202148438, -184.25523376464844, 88.08204650878906, 242.3599395751953, 18.99224853515625, 176.0741424560547, -9.074691772460938, 80.37432861328125, -41.11090087890625, 104.13126373291016, -2.5044517517089844, -79.36135864257812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000628.npy"}
{"epoch": 0.9493575207860923, "step": 629, "batch_size": 64, "mean": 55.74674987792969, "std": 86.6655502319336, "min": -151.28302001953125, "p10": -29.6634407043457, "median": 25.7706298828125, "p90": 190.24560394287113, "max": 240.4757537841797, "pos_frac": 0.71875, "sample": [-31.22559356689453, 99.6111831665039, -151.28302001953125, 102.99105072021484, 83.21751403808594, 8.025199890136719, -10.5947265625, 89.37932586669922, -116.57292175292969, 111.50059509277344, -9.405776977539062, 20.056129455566406, 5.632043838500977, 199.50685119628906, -12.559347152709961, -51.61928176879883, 1.7896785736083984, -8.655303955078125, -39.265838623046875, 14.259979248046875, 22.91754150390625, -7.122856140136719, -3.419191360473633, 50.15024948120117, -43.375667572021484, 47.85724639892578, 240.4757537841797, 174.82427978515625, -13.062368392944336, 183.11468505859375, 106.87713623046875, -33.470481872558594, 82.63304138183594, 89.86003875732422, -5.200237274169922, 60.624237060546875, 130.23362731933594, 16.16107177734375, 4.737491607666016, -0.8005695343017578, 7.6368408203125, 64.6566162109375, 224.0849609375, 0.053188323974609375, -10.247795104980469, 3.1532821655273438, 30.733619689941406, 145.96192932128906, 138.7233428955078, 141.91604614257812, 0.9458770751953125, 69.61138916015625, 214.85952758789062, 14.79918098449707, 90.49609375, 226.85533142089844, 7.952568054199219, 224.78060913085938, 171.02581787109375, 193.3017120361328, 28.62371826171875, 154.72967529296875, -26.018417358398438, 40.353981018066406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000629.npy"}
{"epoch": 0.9508692365835223, "step": 630, "batch_size": 64, "mean": 75.34062194824219, "std": 96.39840698242188, "min": -146.44821166992188, "p10": -33.771115112304685, "median": 56.9765510559082, "p90": 196.6002273559571, "max": 266.8498229980469, "pos_frac": 0.765625, "sample": [-69.09452819824219, 163.62863159179688, 6.684825897216797, 9.531425476074219, 153.67855834960938, 227.58204650878906, 105.0497817993164, 0.07934951782226562, 218.86489868164062, 156.09634399414062, -2.7943496704101562, 172.87554931640625, -9.509559631347656, 182.92623901367188, 13.296989440917969, 172.38125610351562, -101.54887390136719, 83.02352142333984, 147.43496704101562, -93.337890625, 46.663970947265625, -0.8235092163085938, 45.629661560058594, 184.29388427734375, 214.64334106445312, 9.222465515136719, -68.36188507080078, 46.972896575927734, -146.44821166992188, 145.2440185546875, -3.2036056518554688, 34.54606628417969, 5.837009429931641, 226.71644592285156, 90.75973510742188, 53.77546691894531, 201.87437438964844, 21.987579345703125, 60.177635192871094, 102.54314422607422, 266.8498229980469, 29.01801300048828, 183.68869018554688, -67.75474548339844, 184.25692749023438, 127.4823989868164, 153.87350463867188, 38.556365966796875, 102.13127136230469, -31.65887451171875, 44.154090881347656, -29.961898803710938, -16.393596649169922, 226.92782592773438, 162.67086791992188, 8.248655319213867, 82.1713638305664, 12.862430572509766, 177.6555938720703, 183.99136352539062, -34.676361083984375, 104.68801879882812, 104.18280792236328, -0.06457901000976562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000630.npy"}
{"epoch": 0.9523809523809523, "step": 631, "batch_size": 64, "mean": 83.22488403320312, "std": 107.67364501953125, "min": -172.03265380859375, "p10": -46.97983169555663, "median": 68.75063705444336, "p90": 206.21723480224608, "max": 291.82861328125, "pos_frac": 0.703125, "sample": [177.6299591064453, 82.39535522460938, -37.26043701171875, 203.4539794921875, 135.44046020507812, 183.58566284179688, 204.8675537109375, 146.9786376953125, 40.262569427490234, -12.635910034179688, -12.490264892578125, -5.606334686279297, 136.49974060058594, 12.242019653320312, -4.123409271240234, 44.785545349121094, -20.060791015625, 29.760150909423828, 67.67545318603516, 244.68658447265625, 259.91015625, -68.29583740234375, 201.06741333007812, 206.26979064941406, 140.63809204101562, 57.3277587890625, 22.78399658203125, 257.2574768066406, -8.43067741394043, 132.41526794433594, 42.92252731323242, 55.266090393066406, -156.416259765625, 115.08087158203125, 69.82582092285156, 258.67291259765625, 175.59701538085938, -3.96417236328125, 291.82861328125, 194.4989776611328, 96.9444351196289, -59.33099365234375, -63.95082092285156, 149.728759765625, -13.69677734375, -172.03265380859375, -5.05708122253418, 8.912912368774414, -75.41358184814453, 96.458984375, -51.145286560058594, 147.00338745117188, -9.838043212890625, 202.02749633789062, -6.216785430908203, 176.09341430664062, 16.60324478149414, 206.0946044921875, 57.548160552978516, 236.73419189453125, 126.54515075683594, 183.94900512695312, 26.110782623291016, 189.97775268554688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000631.npy"}
{"epoch": 0.9538926681783825, "step": 632, "batch_size": 64, "mean": 81.67279815673828, "std": 94.60446166992188, "min": -182.88131713867188, "p10": -20.829013061523437, "median": 81.33837127685547, "p90": 195.77398376464845, "max": 250.57620239257812, "pos_frac": 0.78125, "sample": [115.2084732055664, 184.28955078125, 101.87580871582031, -0.8908958435058594, 71.33480834960938, 186.63458251953125, 0.24047470092773438, 121.42625427246094, -12.408000946044922, 190.93997192382812, 91.30770874023438, -30.02740478515625, 93.78372192382812, -11.47378921508789, -63.15533447265625, 190.1387481689453, 207.02699279785156, 3.9614486694335938, 177.95729064941406, -78.67715454101562, 17.14666748046875, 33.5792121887207, -121.36738586425781, 19.481613159179688, 60.02752685546875, 165.65469360351562, 40.819000244140625, 160.32925415039062, 76.83979797363281, -2.543964385986328, 171.45240783691406, 250.57620239257812, 85.83694458007812, -21.52802276611328, 197.845703125, 11.723688125610352, 214.6910400390625, 59.82789611816406, 90.03205108642578, 106.11210632324219, 187.0694580078125, 183.2492218017578, 26.020477294921875, -19.19799041748047, 8.886539459228516, 197.8658447265625, 127.59898376464844, 110.10813903808594, 187.8424072265625, 181.6701202392578, 190.51022338867188, -182.88131713867188, 47.829986572265625, 143.83297729492188, -2.1110458374023438, -12.529685974121094, 199.26515197753906, 206.68426513671875, 176.68280029296875, 45.3509521484375, 47.61499786376953, 34.619266510009766, -56.89726638793945, 41.94474411010742], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000632.npy"}
{"epoch": 0.9554043839758125, "step": 633, "batch_size": 64, "mean": 62.541812896728516, "std": 94.36922454833984, "min": -138.94778442382812, "p10": -52.216253662109374, "median": 46.49541664123535, "p90": 191.6567138671875, "max": 238.67849731445312, "pos_frac": 0.71875, "sample": [-3.0526790618896484, 37.44068908691406, 15.051094055175781, 105.14118957519531, 138.84877014160156, -8.541885375976562, 116.6260986328125, 135.86329650878906, 59.74586486816406, 55.738563537597656, -3.3768157958984375, 10.673097610473633, 234.52259826660156, 52.40446090698242, -53.15981674194336, 149.43618774414062, 8.026687622070312, 114.70097351074219, 82.8510513305664, -24.163246154785156, 5.007514953613281, -22.20116424560547, 125.04097747802734, 42.00916290283203, 168.8455352783203, 76.01354217529297, -37.677677154541016, -18.664405822753906, 128.67462158203125, 114.72521209716797, 175.98863220214844, 4.494270324707031, 192.03790283203125, 19.651931762695312, -50.01460647583008, 10.056720733642578, -73.37055969238281, -25.727157592773438, -60.10948944091797, -24.243675231933594, -75.37617492675781, 43.330326080322266, 1.7466773986816406, 226.4227294921875, 55.05542755126953, 54.638526916503906, 37.24723815917969, -69.08907318115234, 49.66050720214844, 188.42637634277344, 183.22657775878906, 177.5243377685547, 190.76727294921875, 184.27447509765625, 4.913795471191406, -97.33731842041016, -6.651744842529297, 16.38882064819336, 130.95965576171875, 205.94200134277344, 238.67849731445312, -138.94778442382812, 226.9286346435547, 198.63272094726562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000633.npy"}
{"epoch": 0.9569160997732427, "step": 634, "batch_size": 64, "mean": 25.533855438232422, "std": 96.52301788330078, "min": -174.43138122558594, "p10": -101.67974624633787, "median": 2.151555061340332, "p90": 175.94853515625002, "max": 231.66270446777344, "pos_frac": 0.546875, "sample": [6.178157806396484, -112.7496337890625, -2.45477294921875, 188.11013793945312, 32.988006591796875, -23.899282455444336, -12.168903350830078, -25.291471481323242, 33.48912811279297, 12.816879272460938, 19.68663215637207, -6.976524353027344, 0.1836395263671875, -21.631263732910156, -21.9593505859375, 57.85895538330078, 119.95881652832031, 66.33526611328125, -88.0362777709961, -49.97076416015625, -44.95832824707031, -0.6388664245605469, 124.40122985839844, -147.65768432617188, 192.1160430908203, 3.403949737548828, 189.61264038085938, 149.6342315673828, 231.66270446777344, -132.9052276611328, -14.034210205078125, 19.259851455688477, -3.353038787841797, -40.85319900512695, 178.9640350341797, 6.780813217163086, -75.14361572265625, -156.0167236328125, 203.5938262939453, 168.91236877441406, 129.30897521972656, -63.467864990234375, -107.52694702148438, 24.016498565673828, -109.65191650390625, -174.43138122558594, 1.9862861633300781, 2.1139793395996094, 182.6971435546875, 62.895450592041016, -31.7044677734375, -40.61869812011719, 139.5814208984375, 31.50145721435547, -0.41191673278808594, -3.7202186584472656, 163.85086059570312, 132.30801391601562, 85.51653289794922, 104.00206756591797, -3.8540267944335938, 2.1891307830810547, -31.097801208496094, 113.43597412109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000634.npy"}
{"epoch": 0.9584278155706727, "step": 635, "batch_size": 64, "mean": 49.46977233886719, "std": 104.21263885498047, "min": -161.67977905273438, "p10": -90.84805908203124, "median": 33.628448486328125, "p90": 183.85444030761718, "max": 228.6767120361328, "pos_frac": 0.671875, "sample": [-75.9705810546875, 173.93472290039062, -123.25713348388672, 6.269641876220703, 41.54547119140625, 18.84685516357422, -97.22412109375, -128.19984436035156, -128.40243530273438, 182.2760009765625, 0.4479522705078125, 171.99774169921875, 6.661506652832031, -15.652137756347656, 185.3619384765625, 205.234619140625, 160.95303344726562, 7.691230773925781, 170.89205932617188, 181.9112091064453, -60.04502868652344, 29.273590087890625, -28.487648010253906, 209.21343994140625, 37.983306884765625, -63.95362854003906, 159.66885375976562, -2.9312496185302734, -9.929630279541016, 90.70425415039062, 42.6551513671875, -48.97106170654297, 184.53091430664062, 28.462120056152344, 39.104286193847656, 8.830692291259766, 181.09800720214844, 9.112018585205078, 177.35768127441406, -149.33609008789062, 140.29928588867188, 227.14776611328125, -32.549198150634766, 197.0919647216797, -19.646804809570312, -39.06443786621094, 75.71253967285156, 96.49727630615234, 103.32429504394531, -42.83879089355469, 114.41854095458984, 228.6767120361328, -4.960601806640625, 62.36005401611328, -161.67977905273438, 118.07588958740234, 125.59684753417969, 17.897422790527344, -4.65032958984375, 133.21133422851562, -117.11863708496094, 102.6280288696289, 65.2585220336914, 0.71990966796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000635.npy"}
{"epoch": 0.9599395313681028, "step": 636, "batch_size": 64, "mean": 68.08009338378906, "std": 91.17893981933594, "min": -177.8330078125, "p10": -16.29277191162109, "median": 71.06044387817383, "p90": 183.85195770263672, "max": 254.328125, "pos_frac": 0.8125, "sample": [183.40541076660156, 7.885917663574219, 35.17688751220703, 32.500709533691406, 98.8995590209961, 13.001289367675781, 74.73765563964844, -6.379768371582031, -23.43750762939453, 176.06692504882812, 97.66899108886719, -12.887468338012695, 184.0433349609375, 149.53627014160156, 41.078765869140625, 37.561492919921875, -36.87477111816406, 162.9233856201172, 69.7433090209961, 114.0193099975586, 135.81170654296875, 4.534675598144531, 2.488515853881836, 147.6117401123047, 3.0982131958007812, 88.6855239868164, 234.13534545898438, 3.985288619995117, -1.2613887786865234, 69.72797393798828, 81.45860290527344, 81.87883758544922, 224.5085906982422, 72.37757873535156, -9.606094360351562, 156.28219604492188, 101.772705078125, 67.36754608154297, -17.279586791992188, 182.49246215820312, 127.6345443725586, 195.997802734375, -13.990203857421875, 254.328125, 6.151622772216797, -177.8330078125, 206.3618927001953, 27.4193115234375, 2.554523468017578, 175.21658325195312, 113.90428161621094, 79.1695327758789, 124.38490295410156, 204.60577392578125, 69.06706237792969, -139.97528076171875, -158.8837890625, 12.230401992797852, 80.70140075683594, 89.96237182617188, 18.790122985839844, -82.87464141845703, 79.74736785888672, 33.71498107910156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000636.npy"}
{"epoch": 0.9614512471655329, "step": 637, "batch_size": 64, "mean": 51.14664840698242, "std": 104.43284606933594, "min": -243.79998779296875, "p10": -87.06835556030272, "median": 25.950054168701172, "p90": 196.69568634033206, "max": 314.6501770019531, "pos_frac": 0.703125, "sample": [11.714248657226562, 210.39645385742188, 16.66120147705078, 314.6501770019531, -19.09880828857422, -7.390838623046875, -12.159553527832031, 129.61846923828125, 203.39089965820312, 36.184146881103516, -38.63475799560547, 27.745338439941406, 122.8205337524414, -94.29557037353516, 140.5457763671875, 55.98479080200195, -11.197715759277344, 114.164306640625, 185.40267944335938, -13.268877029418945, 81.65544128417969, 36.06919860839844, 20.35504150390625, -3.2511367797851562, 5.95576286315918, 59.04241943359375, -92.9326400756836, 172.5205841064453, 225.44654846191406, 191.0599365234375, 190.31004333496094, 21.443214416503906, -119.63966369628906, 149.78213500976562, -29.477447509765625, 134.26583862304688, -103.73399353027344, 1.5241928100585938, 17.75322723388672, 47.227699279785156, 61.30787658691406, 204.607421875, 0.5602054595947266, 20.283939361572266, 182.69500732421875, 19.492271423339844, -11.0452880859375, 50.50732421875, -92.45841979980469, -8.882911682128906, 22.80230712890625, 218.7076873779297, 84.653564453125, 57.99732208251953, 24.154769897460938, -136.27398681640625, 199.1110076904297, 52.02185821533203, -20.89440155029297, -243.79998779296875, 90.49910736083984, 190.99200439453125, -74.49153900146484, 2.2292709350585938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000637.npy"}
{"epoch": 0.9629629629629629, "step": 638, "batch_size": 64, "mean": 60.937583923339844, "std": 93.30282592773438, "min": -157.4529266357422, "p10": -29.906874084472655, "median": 22.53293514251709, "p90": 192.61539001464845, "max": 210.7644805908203, "pos_frac": 0.6875, "sample": [-113.81692504882812, 155.38023376464844, 124.88299560546875, 0.3597145080566406, -157.4529266357422, -7.6418914794921875, 199.0623779296875, 111.23587036132812, 40.88959884643555, -15.100482940673828, 6.035575866699219, -50.44345474243164, -16.013412475585938, -11.48992919921875, -30.13995361328125, 193.71856689453125, -15.481155395507812, -48.192291259765625, 190.8839111328125, 9.50750732421875, 66.86845397949219, 16.491615295410156, 163.298828125, 164.6630859375, -94.17337799072266, 189.33631896972656, 120.78673553466797, 180.46099853515625, 1.14044189453125, 66.5032958984375, 17.515844345092773, 198.04766845703125, 139.27552795410156, 172.33297729492188, 11.0809326171875, -19.236780166625977, 7.397209167480469, 132.84762573242188, 13.873977661132812, 187.1678466796875, 87.0628433227539, 195.1671905517578, 195.83050537109375, -2.522235870361328, 187.76417541503906, -29.363021850585938, -3.120685577392578, 180.17620849609375, -12.971939086914062, 27.550025939941406, -0.1251201629638672, 15.160442352294922, 210.7644805908203, 95.715087890625, -39.80190658569336, 30.053504943847656, 2.1692428588867188, 193.35745239257812, 187.19151306152344, 5.745372772216797, -7.5803375244140625, -4.82763671875, 39.280982971191406, 45.46607208251953], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000638.npy"}
{"epoch": 0.9644746787603931, "step": 639, "batch_size": 64, "mean": 71.1132583618164, "std": 99.14191436767578, "min": -119.62590789794922, "p10": -53.46234321594238, "median": 49.673675537109375, "p90": 206.6626724243164, "max": 264.4286193847656, "pos_frac": 0.6875, "sample": [169.3779296875, 122.06222534179688, 117.22930908203125, -60.229957580566406, 70.06353759765625, 157.12399291992188, 9.688819885253906, -36.5870246887207, -54.317771911621094, 128.80532836914062, 23.827239990234375, 65.19032287597656, 247.51597595214844, 54.95069885253906, 164.2471466064453, -4.536712646484375, 44.39665222167969, 2.169647216796875, -119.62590789794922, 94.1622543334961, 129.85086059570312, 169.01315307617188, -0.5352115631103516, 226.26885986328125, 5.287025451660156, 132.45391845703125, 67.81269836425781, 189.5849151611328, -31.3697509765625, 207.34019470214844, 212.4871368408203, -1.6895084381103516, 0.13396263122558594, 137.48138427734375, -4.20660400390625, 13.984794616699219, 135.14511108398438, -7.8828887939453125, 264.4286193847656, 158.72030639648438, 159.7808837890625, 33.1920166015625, 255.27456665039062, 105.3047866821289, -54.6282958984375, -0.7004890441894531, -21.17229461669922, -51.46634292602539, 42.37400817871094, 30.575542449951172, 206.94583129882812, 167.02235412597656, -3.4097061157226562, 19.39453125, -9.70611572265625, -107.21319580078125, 20.08844757080078, -57.71858215332031, -78.12421417236328, 206.00196838378906, -41.862083435058594, 156.43533325195312, 175.41998291015625, 199.6170654296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000639.npy"}
{"epoch": 0.9659863945578231, "step": 640, "batch_size": 64, "mean": 66.09814453125, "std": 96.15957641601562, "min": -192.50851440429688, "p10": -38.210549926757814, "median": 49.77434730529785, "p90": 190.51590423583988, "max": 265.558837890625, "pos_frac": 0.796875, "sample": [180.09046936035156, 48.53688430786133, 265.558837890625, -30.95429801940918, 14.622810363769531, 107.96444702148438, 122.38970947265625, 4.1230621337890625, 107.09732055664062, 195.1297149658203, 51.011810302734375, 0.8935470581054688, 126.61016845703125, 154.12353515625, -119.67826843261719, 139.0553436279297, 61.927574157714844, 43.92045211791992, -44.67904281616211, 201.04209899902344, -186.1614227294922, 119.40821075439453, 214.7915496826172, 42.70774841308594, -48.60307312011719, -12.48520278930664, 155.54647827148438, 16.936256408691406, -21.32469367980957, 3.303081512451172, 169.95277404785156, 11.971694946289062, 52.015716552734375, 194.05381774902344, 112.88424682617188, -1.811187744140625, -14.730354309082031, 21.095672607421875, -54.01457214355469, 98.2071762084961, 45.219207763671875, 182.26077270507812, 5.666053771972656, 67.98826599121094, 176.728759765625, 118.74800872802734, 88.73526763916016, -192.50851440429688, 21.90336036682129, 180.8854522705078, 133.4090576171875, 219.90164184570312, 27.991317749023438, 248.892822265625, 6.9012908935546875, -38.15577697753906, 175.7098388671875, -38.23402404785156, 76.16683959960938, 42.96232604980469, 16.929035186767578, 30.6915283203125, 37.13648223876953, 91.82218170166016], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000640.npy"}
{"epoch": 0.9674981103552532, "step": 641, "batch_size": 64, "mean": 59.01282501220703, "std": 80.51277923583984, "min": -197.4715118408203, "p10": -15.683829116821288, "median": 40.65852737426758, "p90": 169.13310852050782, "max": 267.6143798828125, "pos_frac": 0.828125, "sample": [-14.988697052001953, -104.0613784790039, 98.74203491210938, 83.70993041992188, 36.4681396484375, -1.1754531860351562, 152.69900512695312, 7.992439270019531, 0.19445037841796875, 4.55975341796875, 37.45249938964844, 17.29583168029785, 132.678466796875, 171.77783203125, 47.89129638671875, 9.072309494018555, -4.08197021484375, 168.0908203125, 188.6064453125, 132.4268035888672, 107.75064849853516, -15.981742858886719, 54.85157012939453, 35.53767395019531, 97.41876983642578, 200.46803283691406, 9.991378784179688, 69.03515625, 26.871658325195312, 117.02204895019531, 9.132080078125, 64.21270751953125, -61.92688751220703, 99.59285736083984, 7.6231689453125, -59.248863220214844, 141.53042602539062, 49.20759582519531, -24.854698181152344, 33.148712158203125, -197.4715118408203, 132.26583862304688, 43.74010467529297, 19.052749633789062, 95.20912170410156, 209.0857391357422, 9.69512939453125, -2.4543285369873047, 148.26385498046875, 37.57695007324219, 267.6143798828125, 115.3638916015625, 199.52162170410156, 169.57980346679688, -21.482940673828125, 66.30096435546875, 94.24392700195312, 20.209548950195312, 35.598270416259766, 26.961105346679688, 3.0335922241210938, 10.185638427734375, 45.649627685546875, 122.34500885009766], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000641.npy"}
{"epoch": 0.9690098261526833, "step": 642, "batch_size": 64, "mean": 42.731483459472656, "std": 99.57953643798828, "min": -167.9807586669922, "p10": -74.67316284179687, "median": 30.677461624145508, "p90": 174.6412017822266, "max": 273.6488037109375, "pos_frac": 0.671875, "sample": [74.5060043334961, 15.929832458496094, 1.3815078735351562, -99.3931884765625, 57.42173767089844, -9.497472763061523, 112.1854248046875, 189.78712463378906, -70.15383911132812, -167.9807586669922, 273.6488037109375, -5.281394958496094, 96.12947082519531, -77.21992492675781, 78.974853515625, 48.2713623046875, 129.04440307617188, 213.86581420898438, -17.449010848999023, -16.881351470947266, 112.7638168334961, 115.418701171875, 156.247314453125, -38.70742416381836, 89.39700317382812, 144.33387756347656, 188.3840789794922, 13.512664794921875, 159.9659423828125, 29.93372344970703, 98.6211929321289, 111.3328857421875, 80.18637084960938, 18.57455825805664, -12.685020446777344, 167.95486450195312, -62.96827697753906, 50.979209899902344, 6.038902282714844, -57.28367614746094, -0.3780860900878906, -13.022216796875, 6.637674331665039, -64.13582611083984, 143.26487731933594, 15.860994338989258, -76.61001586914062, 135.3658905029297, 164.25875854492188, 198.20712280273438, 35.827392578125, 19.23084259033203, 177.50677490234375, 103.12943267822266, 188.88803100585938, 31.421199798583984, -52.45275115966797, 42.807090759277344, -162.69244384765625, -157.538818359375, 5.3917388916015625, -66.70759582519531, -149.78627014160156, 11.051139831542969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000642.npy"}
{"epoch": 0.9705215419501134, "step": 643, "batch_size": 64, "mean": 67.19929504394531, "std": 100.76051330566406, "min": -180.78546142578125, "p10": -58.95417022705078, "median": 70.50037384033203, "p90": 196.3795196533203, "max": 251.14706420898438, "pos_frac": 0.75, "sample": [61.092987060546875, 78.32795715332031, 84.15750122070312, 16.74603271484375, 1.2716093063354492, 220.42721557617188, 1.5479984283447266, 176.00701904296875, 78.73857879638672, 24.26848602294922, 98.94622802734375, 37.119972229003906, 130.67042541503906, 3.9824066162109375, 132.52572631835938, 120.88198852539062, -50.93217468261719, 12.203393936157227, 196.11904907226562, 200.25997924804688, 160.009765625, 33.85407257080078, 171.00860595703125, 56.861572265625, -43.493934631347656, 191.9803466796875, 52.58363342285156, -29.112152099609375, 192.93557739257812, 251.14706420898438, 89.90990447998047, 154.1466827392578, -31.7587890625, 152.1089630126953, 70.83863830566406, -78.36613464355469, 45.98545837402344, -1.3150634765625, 217.288330078125, 171.3663330078125, 141.37393188476562, -60.42180633544922, 133.43028259277344, -9.902305603027344, 157.50625610351562, -125.66163635253906, -120.28267669677734, -180.78546142578125, 1.6386337280273438, 20.710491180419922, -60.891380310058594, 91.88966369628906, 196.49114990234375, 83.66561889648438, 56.392181396484375, -55.529685974121094, -12.8116455078125, 211.071044921875, 70.162109375, 77.13677215576172, -28.18783950805664, 244.86962890625, 141.47894287109375, -124.92880249023438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000643.npy"}
{"epoch": 0.9720332577475435, "step": 644, "batch_size": 64, "mean": 51.205039978027344, "std": 103.28720092773438, "min": -230.21023559570312, "p10": -64.9560104370117, "median": 27.545089721679688, "p90": 186.01185150146486, "max": 214.26626586914062, "pos_frac": 0.75, "sample": [190.60943603515625, -104.47047424316406, -230.21023559570312, 171.16871643066406, -30.921810150146484, -72.62741088867188, 6.593971252441406, 5.612157821655273, 182.69256591796875, 214.26626586914062, 70.08486938476562, 39.78799819946289, 189.4346923828125, 140.60357666015625, 83.93135070800781, 54.01012420654297, 58.5409049987793, 48.779640197753906, 9.493125915527344, 181.71583557128906, 11.592510223388672, 178.46334838867188, -141.3081817626953, -18.419570922851562, -19.8150634765625, 112.8893814086914, 0.6657028198242188, 98.18135070800781, 7.03814697265625, -47.05607604980469, 187.4344024658203, -8.25848388671875, 2.9399261474609375, 137.78289794921875, 158.29360961914062, -0.801025390625, -27.160491943359375, 205.318603515625, 180.98486328125, 22.44464874267578, 87.49401092529297, 191.82772827148438, 165.65635681152344, 32.274131774902344, 11.48834228515625, -197.6531219482422, -12.302261352539062, 135.4509735107422, 12.669441223144531, 165.2790985107422, -44.24418640136719, 5.353424072265625, 3.6132564544677734, -93.8847427368164, 22.81604766845703, 10.252021789550781, -141.0443115234375, 110.59896850585938, 129.38446044921875, 3.9669036865234375, 15.745254516601562, 177.2078857421875, 33.732215881347656, 201.134765625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000644.npy"}
{"epoch": 0.9735449735449735, "step": 645, "batch_size": 64, "mean": 61.90576934814453, "std": 84.98475646972656, "min": -175.219970703125, "p10": -37.81533775329589, "median": 51.52216720581055, "p90": 177.99354095458986, "max": 235.13059997558594, "pos_frac": 0.765625, "sample": [26.385562896728516, 200.79544067382812, 114.66791534423828, -91.39385223388672, 86.24873352050781, 227.39501953125, 144.94369506835938, -1.0299625396728516, -11.220390319824219, 12.096687316894531, 10.65731430053711, -66.61109924316406, 46.26295471191406, -1.5807075500488281, 144.68240356445312, 35.76612091064453, 114.53829956054688, 1.0100288391113281, -4.5111083984375, 17.989439010620117, 189.23379516601562, -39.809181213378906, 23.897544860839844, 113.68939971923828, 56.78137969970703, -2.9845046997070312, 33.65099334716797, 100.17388916015625, 185.9575958251953, 106.48129272460938, 120.28328704833984, -0.5341091156005859, 71.97600555419922, 26.871047973632812, 7.8335418701171875, 137.78213500976562, -92.85481262207031, 9.105461120605469, -60.940704345703125, 190.65223693847656, 88.77473449707031, 175.18844604492188, 235.13059997558594, -4.340232849121094, 29.04946517944336, 106.95784759521484, 136.238037109375, 81.08184051513672, 68.73480224609375, 134.34146118164062, 8.5361328125, 141.37960815429688, 57.86403274536133, 148.96511840820312, 92.46781921386719, 0.2077178955078125, 156.46046447753906, -33.16303634643555, -41.14471435546875, -175.219970703125, 14.543846130371094, 179.1957244873047, 154.23658752441406, 22.144126892089844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000645.npy"}
{"epoch": 0.9750566893424036, "step": 646, "batch_size": 64, "mean": 53.65968704223633, "std": 99.26956176757812, "min": -184.69322204589844, "p10": -53.393206024169906, "median": 31.950693130493164, "p90": 185.62993774414062, "max": 275.8211975097656, "pos_frac": 0.734375, "sample": [-31.22314453125, 0.6618709564208984, 1.5977401733398438, 122.82173156738281, 185.8404998779297, 168.2710418701172, -137.2932586669922, 180.20770263671875, 89.32981872558594, 194.0443878173828, -6.884513854980469, 12.372879028320312, 178.34249877929688, 185.1386260986328, 140.2677001953125, -3.944427490234375, 87.40413665771484, 74.19076538085938, 187.37738037109375, 127.20592498779297, 204.8294677734375, 68.08676147460938, -184.69322204589844, -59.746604919433594, -77.45907592773438, 106.54927825927734, 2.0153770446777344, -9.761123657226562, 14.738784790039062, 32.8251953125, 41.81475830078125, 1.7759170532226562, 44.224510192871094, 190.4583282470703, 66.3402099609375, 11.157463073730469, 43.10515594482422, 3.4594192504882812, 64.00032043457031, 133.80413818359375, -1.3044071197509766, -39.44853973388672, 122.60807800292969, -59.36949157714844, 153.4017333984375, 275.8211975097656, 177.77113342285156, 132.31915283203125, 25.33831024169922, 237.47616577148438, 27.462369918823242, 3.18896484375, 8.1007080078125, -11.116043090820312, -184.31814575195312, 23.909652709960938, 85.76898193359375, -132.39524841308594, -38.93525695800781, 167.0123291015625, 8.79465103149414, -10.594039916992188, -21.602886199951172, 31.076190948486328], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000646.npy"}
{"epoch": 0.9765684051398337, "step": 647, "batch_size": 64, "mean": 86.12971496582031, "std": 90.65769958496094, "min": -111.19020080566406, "p10": -25.021340942382807, "median": 88.32340621948242, "p90": 202.4074234008789, "max": 276.6451721191406, "pos_frac": 0.8125, "sample": [-6.251842498779297, 110.43209075927734, 181.17434692382812, 12.144668579101562, 46.964447021484375, 141.34974670410156, 19.21673583984375, 207.223876953125, 109.51123046875, 241.35665893554688, 86.052001953125, 137.60595703125, 145.45285034179688, 1.2898292541503906, -42.09132385253906, 23.0045166015625, 276.6451721191406, -31.950876235961914, -95.99652099609375, 125.29486083984375, 180.18202209472656, 62.45000457763672, 114.61823272705078, 100.21173095703125, 207.6004638671875, 155.855712890625, 170.84033203125, -0.5953884124755859, 190.2494354248047, 47.905242919921875, 157.10263061523438, 12.7041015625, 142.29150390625, 4.678916931152344, 146.11163330078125, -26.787017822265625, 16.88485336303711, 194.80531311035156, 68.1948471069336, -64.70870208740234, 118.48069763183594, 31.070838928222656, 52.80244827270508, 153.31524658203125, -111.19020080566406, 42.850494384765625, 90.59481048583984, 34.35452651977539, 43.48313903808594, 81.40277862548828, -19.90673828125, 211.71707153320312, 170.3094024658203, -20.90142822265625, -69.77218627929688, 171.39370727539062, 39.058685302734375, 240.622802734375, -20.047035217285156, 198.66061401367188, 204.01319885253906, 41.031272888183594, 132.6072235107422, 127.32568359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000647.npy"}
{"epoch": 0.9780801209372638, "step": 648, "batch_size": 64, "mean": 60.570072174072266, "std": 103.55755615234375, "min": -192.59164428710938, "p10": -52.135807037353516, "median": 72.18083572387695, "p90": 188.68914031982425, "max": 275.26788330078125, "pos_frac": 0.6875, "sample": [149.36993408203125, 33.161521911621094, 12.393775939941406, -184.12342834472656, 167.16619873046875, -45.15177917480469, -192.59164428710938, 4.877494812011719, 174.67803955078125, 37.131629943847656, -52.4859504699707, 96.35066986083984, 36.90959167480469, -5.832611083984375, -98.87933349609375, -128.96697998046875, 53.05922317504883, -12.496543884277344, 275.26788330078125, 102.37664794921875, 191.73695373535156, 110.70665740966797, 106.26290893554688, 145.25717163085938, 181.57757568359375, -12.519767761230469, 104.42385864257812, 200.29318237304688, 61.764564514160156, 83.08714294433594, -22.2052001953125, -2.0574722290039062, 243.14215087890625, 243.52459716796875, 1.7873687744140625, -83.39110565185547, 90.79405212402344, 106.09537506103516, 140.10426330566406, 12.765861511230469, 128.34695434570312, 178.64871215820312, 204.65414428710938, -51.31880569458008, -19.49542999267578, 10.993766784667969, 50.94575500488281, 118.4619369506836, 151.51463317871094, 196.053955078125, 134.3383331298828, 124.34730529785156, 135.04489135742188, -133.49095153808594, -28.12530517578125, 52.70115661621094, 109.23260498046875, -37.50910186767578, 82.59710693359375, -49.12449264526367, -1.0812759399414062, -33.80615234375, 120.12075805664062, 107.06966400146484], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000648.npy"}
{"epoch": 0.9795918367346939, "step": 649, "batch_size": 64, "mean": 61.72749328613281, "std": 103.72412872314453, "min": -172.37205505371094, "p10": -77.693253326416, "median": 58.5422248840332, "p90": 185.83460235595703, "max": 250.89683532714844, "pos_frac": 0.6875, "sample": [161.37582397460938, 140.41183471679688, -36.82229232788086, 168.3775634765625, 176.36302185058594, 179.724365234375, -4.1717376708984375, -2.655527114868164, -11.541976928710938, 176.6748504638672, 14.118490219116211, 250.89683532714844, 123.30418395996094, -3.6424598693847656, 144.02426147460938, 167.68063354492188, 112.36268615722656, 79.54330444335938, 138.8661346435547, 189.04466247558594, 186.5100555419922, -1.0270023345947266, -127.50713348388672, -19.72649383544922, -172.37205505371094, 104.93096923828125, 6.010734558105469, 52.35192108154297, -119.10956573486328, -58.764923095703125, 45.87929153442383, 42.44159698486328, 157.7592010498047, 0.36499977111816406, 101.10299682617188, 146.39718627929688, 44.07067108154297, 150.84521484375, 119.49608612060547, 172.5364227294922, 12.344917297363281, 192.246826171875, 86.7132797241211, -17.49722671508789, 7.5770721435546875, 184.258544921875, 1.4776840209960938, 10.834815979003906, -22.844390869140625, 64.73252868652344, 23.57677459716797, 225.55038452148438, -82.38863372802734, -162.81893920898438, 212.9261016845703, -25.126392364501953, -112.54127502441406, -104.82395935058594, 97.60436248779297, 188.17764282226562, 152.33834838867188, 92.04497528076172, -3.191204071044922, -66.73736572265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000649.npy"}
{"epoch": 0.981103552532124, "step": 650, "batch_size": 64, "mean": 62.324851989746094, "std": 105.07905578613281, "min": -172.54652404785156, "p10": -58.07665481567383, "median": 51.0654239654541, "p90": 187.29666442871095, "max": 234.0707244873047, "pos_frac": 0.71875, "sample": [-31.981006622314453, 216.63548278808594, 114.93707275390625, 144.2812957763672, 10.887191772460938, 175.4964599609375, 170.14242553710938, 67.73576354980469, 5.219093322753906, 34.395084381103516, 167.38369750976562, 178.58633422851562, -138.14047241210938, -2.5172271728515625, 2.777963638305664, 141.03607177734375, 155.5198516845703, 25.423044204711914, 207.8794403076172, -0.8852291107177734, 165.76124572753906, 107.55745697021484, -42.56565856933594, 5.298397064208984, 183.01576232910156, 191.31790161132812, 189.01454162597656, 180.0042724609375, 97.40921783447266, -119.47957611083984, 2.7902488708496094, -153.78944396972656, -4.189647674560547, 129.33596801757812, -166.74972534179688, 22.778396606445312, 183.2882843017578, 80.57243347167969, 15.514015197753906, -18.128944396972656, -32.029876708984375, 76.79450988769531, -172.54652404785156, 172.25164794921875, -55.748809814453125, -13.892631530761719, 234.0707244873047, 201.99668884277344, 101.94668579101562, -59.074302673339844, 146.16127014160156, 5.291473388671875, 174.55795288085938, 0.7158851623535156, 12.917985916137695, 73.42831420898438, -18.949447631835938, 26.084930419921875, 175.11001586914062, 221.18850708007812, -60.223724365234375, -10.61764907836914, 85.94581604003906, 9.843795776367188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000650.npy"}
{"epoch": 0.982615268329554, "step": 651, "batch_size": 64, "mean": 54.04645538330078, "std": 92.57837677001953, "min": -159.03131103515625, "p10": -39.413778686523436, "median": 25.733492851257324, "p90": 190.4406219482422, "max": 252.89242553710938, "pos_frac": 0.734375, "sample": [-0.50677490234375, 35.12217712402344, 91.50883483886719, -50.714019775390625, 33.88097381591797, -79.0011978149414, -35.612281799316406, 95.86542510986328, 16.18017578125, 178.1811981201172, 1.6297607421875, -16.374038696289062, 0.32757568359375, 8.94488525390625, 100.94709777832031, -159.03131103515625, 7.196815490722656, 250.57534790039062, 67.30184173583984, 184.81503295898438, 186.89407348632812, -4.8535308837890625, -65.49359130859375, 113.43437194824219, 10.848167419433594, 24.506046295166016, 42.047607421875, 84.19058227539062, -4.8318939208984375, 221.03912353515625, 8.830989837646484, 204.5496826171875, 8.025188446044922, -10.464241027832031, 33.93891906738281, 0.08746147155761719, 162.05555725097656, 15.601272583007812, -32.502235412597656, 34.84828567504883, 195.2945556640625, 19.655715942382812, 119.42259216308594, -1.4072837829589844, -11.91562271118164, 191.9605712890625, 210.80198669433594, 160.57675170898438, -41.042991638183594, 174.75704956054688, 26.960939407348633, 7.9610595703125, 1.5488357543945312, 123.09476470947266, 122.51174926757812, 104.04212188720703, 174.55958557128906, 0.13674545288085938, 41.2042236328125, -120.69772338867188, 47.895042419433594, -74.56168365478516, 252.89242553710938, -30.6673583984375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000651.npy"}
{"epoch": 0.9841269841269841, "step": 652, "batch_size": 64, "mean": 52.69062423706055, "std": 104.277099609375, "min": -170.8358154296875, "p10": -64.89350280761717, "median": 18.244531631469727, "p90": 187.50529785156252, "max": 315.62408447265625, "pos_frac": 0.6875, "sample": [17.767574310302734, 165.4441680908203, 176.74679565429688, 18.72148895263672, -25.904136657714844, 188.61618041992188, 2.6681079864501953, 96.61199951171875, 175.74996948242188, 6.2064208984375, 191.01181030273438, -148.65753173828125, 4.409931182861328, 16.445161819458008, 84.35177612304688, 184.40966796875, 24.58392333984375, 110.36882781982422, 101.21056365966797, 12.202346801757812, -73.48040771484375, 184.91323852539062, 135.2881622314453, 1.3873519897460938, 10.561420440673828, -30.75756072998047, -132.285888671875, -35.010009765625, 211.15284729003906, 101.53343963623047, 28.729736328125, -35.08003234863281, 90.85063171386719, 3.9162216186523438, 158.59185791015625, -48.87928009033203, 165.17440795898438, 106.36692810058594, -14.232894897460938, -132.67115783691406, -37.56361389160156, 192.1426544189453, -4.776588439941406, 142.06765747070312, -50.15289306640625, -24.68008804321289, 182.63400268554688, 144.43377685546875, 1.9551753997802734, 190.563720703125, 315.62408447265625, 110.76184844970703, 109.22764587402344, -12.533302307128906, -5.231044769287109, 14.577285766601562, -126.13835144042969, 3.908731460571289, -71.21090698242188, 188.86903381347656, 36.38038635253906, -11.980926513671875, -170.8358154296875, 155.12344360351562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000652.npy"}
{"epoch": 0.9856386999244142, "step": 653, "batch_size": 64, "mean": 64.12318420410156, "std": 100.89856719970703, "min": -199.43264770507812, "p10": -53.965723419189445, "median": 59.19746208190918, "p90": 192.22334289550784, "max": 262.6009826660156, "pos_frac": 0.78125, "sample": [11.68571662902832, 165.47665405273438, 9.834083557128906, -18.284568786621094, 85.86862182617188, 9.129730224609375, 72.43797302246094, -48.108985900878906, 194.4261474609375, 7.0962371826171875, -13.749874114990234, 3.6861114501953125, 121.35313415527344, -74.73554992675781, 262.6009826660156, 130.39443969726562, 102.24765014648438, 200.80279541015625, 38.38755416870117, 198.29519653320312, 130.43499755859375, 209.18211364746094, 104.8906021118164, -16.91173553466797, 56.91133117675781, 30.413267135620117, 185.92684936523438, 64.29412078857422, 0.8087806701660156, -33.87736511230469, 207.43353271484375, -56.47575378417969, -57.084068298339844, 31.44717788696289, 4.654359817504883, -89.70596313476562, 122.46817016601562, 86.0697250366211, 163.58277893066406, 57.72129440307617, 15.317039489746094, -162.0376739501953, 229.22012329101562, 60.67362976074219, 156.40524291992188, -175.83787536621094, 182.4593505859375, -14.957046508789062, 142.16323852539062, 65.44366455078125, -7.847953796386719, 187.08346557617188, -199.43264770507812, 4.5395050048828125, 3.8857555389404297, 50.101112365722656, 164.86099243164062, 181.0465850830078, 13.643566131591797, 164.94757080078125, 64.76842498779297, 121.3895263671875, 26.971336364746094, 168.049072265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000653.npy"}
{"epoch": 0.9871504157218443, "step": 654, "batch_size": 64, "mean": 52.50005340576172, "std": 94.35359191894531, "min": -137.69882202148438, "p10": -71.12147216796875, "median": 39.60795021057129, "p90": 185.64632110595704, "max": 289.96075439453125, "pos_frac": 0.75, "sample": [11.248176574707031, 213.4217071533203, -128.47152709960938, 128.57852172851562, 76.60372924804688, -90.37632751464844, 193.7481689453125, 71.6939468383789, 2.3527259826660156, -77.58293151855469, 13.5252685546875, -65.778076171875, 112.7987289428711, 4.600959777832031, -13.204856872558594, 16.04189109802246, 181.7339630126953, -97.27523803710938, 184.23204040527344, 22.548919677734375, 27.329315185546875, 11.71352767944336, 178.8665771484375, 204.55224609375, -10.365461349487305, -10.775115966796875, 7.717525482177734, -11.569770812988281, -38.833839416503906, 7.808742523193359, -1.8406219482421875, 103.03387451171875, 289.96075439453125, 101.99319458007812, 31.113426208496094, 194.2581787109375, 180.23947143554688, -29.37030029296875, 58.94854736328125, 60.9869384765625, 116.77945709228516, 45.64143371582031, 97.20539855957031, -119.55966186523438, 10.991378784179688, 9.958595275878906, 196.0003204345703, 183.273681640625, 42.20042419433594, 100.77943420410156, 52.55482482910156, 186.25244140625, 1.3438224792480469, 49.63599395751953, 174.38009643554688, -73.4114990234375, 0.10703277587890625, 110.140625, 58.62117004394531, 99.79995727539062, 37.01547622680664, 44.68499755859375, -42.90013122558594, -137.69882202148438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000654.npy"}
{"epoch": 0.9886621315192744, "step": 655, "batch_size": 64, "mean": 85.00445556640625, "std": 85.91474914550781, "min": -85.31884002685547, "p10": -9.848941802978516, "median": 79.50939178466797, "p90": 190.62156372070316, "max": 258.98272705078125, "pos_frac": 0.796875, "sample": [148.05250549316406, 140.49974060058594, -6.586647033691406, 5.918453216552734, 37.34814453125, 110.74099731445312, 258.98272705078125, 130.7705078125, 102.22925567626953, 3.928375244140625, -1.823516845703125, 152.79397583007812, 2.098358154296875, -9.240516662597656, -3.39959716796875, -17.450210571289062, 31.711637496948242, 201.90362548828125, 181.56112670898438, 14.445438385009766, 194.2569580078125, -10.109695434570312, 174.12008666992188, 245.8850555419922, 30.608673095703125, -12.124351501464844, 104.41339874267578, -67.15940856933594, 232.31710815429688, -48.51457214355469, 176.74867248535156, 117.32611083984375, 182.13897705078125, 35.58618927001953, 53.32225036621094, 16.338764190673828, 86.83924865722656, 171.84756469726562, 128.27157592773438, 223.9798583984375, 9.382659912109375, 28.167938232421875, -6.969482421875, 208.54611206054688, 151.0158233642578, -60.41618347167969, 47.80126190185547, 126.11921691894531, 175.91259765625, 172.23973083496094, -5.184543609619141, 58.08599853515625, 169.86904907226562, 62.44287872314453, 51.7320556640625, 116.97494506835938, 109.27859497070312, 58.435707092285156, 153.38406372070312, 6.6240081787109375, -85.31884002685547, 72.17953491210938, 149.96780395507812, 149.4375762939453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000655.npy"}
{"epoch": 0.9901738473167044, "step": 656, "batch_size": 64, "mean": 65.92683410644531, "std": 94.83817291259766, "min": -190.53245544433594, "p10": -18.589533233642577, "median": 47.428070068359375, "p90": 195.77488708496097, "max": 220.42205810546875, "pos_frac": 0.765625, "sample": [176.24061584472656, 95.53101348876953, 159.371826171875, 166.594970703125, 18.928058624267578, -4.8386383056640625, 3.101604461669922, -2.01898193359375, 58.274993896484375, -15.014907836914062, -4.560176849365234, 220.42205810546875, -9.685646057128906, 7.686553955078125, 3.0222854614257812, 149.41481018066406, -190.53245544433594, -13.8984375, 129.72412109375, 156.36349487304688, 213.41152954101562, 13.372200012207031, -155.8234100341797, 33.0252571105957, 203.7733917236328, -27.65072250366211, 84.53516387939453, 179.62620544433594, -1.6709136962890625, 146.04205322265625, 159.37261962890625, 1.3614883422851562, 50.28605651855469, 47.87684631347656, 69.2694320678711, 121.41706848144531, 36.279815673828125, 27.135116577148438, 201.30250549316406, 69.3780517578125, 46.97929382324219, -113.43330383300781, 29.33782196044922, 210.4711456298828, 44.666290283203125, 191.47518920898438, 73.05209350585938, 78.34567260742188, 32.58250045776367, 145.6605987548828, 189.5654296875, -23.07967758178711, -18.60626220703125, 205.8994598388672, 172.112060546875, 11.036392211914062, 5.9642486572265625, -125.67327880859375, -18.550498962402344, 33.04512023925781, 52.08744812011719, 37.13545227050781, 185.17971801757812, 197.61761474609375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000656.npy"}
{"epoch": 0.9916855631141346, "step": 657, "batch_size": 64, "mean": 75.00975036621094, "std": 105.50074005126953, "min": -197.13934326171875, "p10": -15.305333518981929, "median": 76.15395736694336, "p90": 206.8331726074219, "max": 270.9088134765625, "pos_frac": 0.796875, "sample": [216.8898468017578, 35.582794189453125, 7.715145111083984, 270.9088134765625, 116.0034408569336, 9.497793197631836, 244.32498168945312, 189.976318359375, 46.96721649169922, -173.51837158203125, 17.490100860595703, -9.548311233520508, 114.08537292480469, 31.227279663085938, 261.3392639160156, 112.21153259277344, 80.3098373413086, 179.9825897216797, -137.80886840820312, 211.462646484375, 6.902862548828125, 142.3746337890625, -155.18463134765625, 33.464195251464844, 80.36103057861328, 175.71780395507812, 71.99807739257812, -9.455169677734375, -2.102354049682617, 58.754981994628906, 95.4967041015625, 169.8781280517578, -17.772628784179688, 175.73851013183594, 91.57986450195312, 21.997802734375, 148.50387573242188, 172.84161376953125, 207.16021728515625, 3.4264469146728516, 242.53561401367188, 3.1157073974609375, -8.055648803710938, 202.38748168945312, 8.314132690429688, 63.805938720703125, 206.070068359375, 0.7562999725341797, -87.49163055419922, 94.7410888671875, 16.210670471191406, -197.13934326171875, 159.54788208007812, -1.7558116912841797, 32.11262512207031, 128.1881103515625, -39.78607940673828, 113.5705795288086, 94.35087585449219, 191.1658172607422, 5.573101043701172, 115.6180191040039, -7.876625061035156, 167.88363647460938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000657.npy"}
{"epoch": 0.9931972789115646, "step": 658, "batch_size": 64, "mean": 59.52809143066406, "std": 111.8115463256836, "min": -184.49642944335938, "p10": -72.10392608642577, "median": 52.59996032714844, "p90": 197.47820587158202, "max": 305.1884765625, "pos_frac": 0.765625, "sample": [75.50949096679688, 18.77591323852539, 196.87022399902344, -154.83010864257812, 175.3262481689453, -1.986886978149414, -174.9417266845703, 69.32130432128906, 183.1865692138672, 195.32733154296875, 0.538360595703125, 152.83908081054688, -62.351959228515625, 96.79000854492188, 192.1834259033203, 73.19209289550781, -58.98005676269531, 151.17393493652344, 305.1884765625, 5.125030517578125, -16.7879581451416, 29.103973388671875, 47.081993103027344, 135.54971313476562, 114.8544921875, 197.73876953125, -168.92977905273438, 139.72946166992188, 102.62249755859375, 24.413040161132812, 6.711355209350586, -1.7228050231933594, 59.17539978027344, 121.30488586425781, 1.7243194580078125, 50.079566955566406, 223.357177734375, -177.3195037841797, 79.67731475830078, 55.106964111328125, 183.8712158203125, 4.140235900878906, 85.66700744628906, 209.420654296875, -3.126913070678711, 97.66368865966797, -76.28334045410156, 194.48147583007812, 25.180747985839844, 59.216182708740234, 17.367340087890625, 1.270620346069336, -44.73126983642578, -128.62869262695312, 28.29730224609375, 137.71304321289062, 3.3359947204589844, -184.49642944335938, 245.0047607421875, -6.000846862792969, 239.39117431640625, 50.09295654296875, 205.54513549804688, 3.678281784057617], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000658.npy"}
{"epoch": 0.9947089947089947, "step": 659, "batch_size": 64, "mean": 47.89827346801758, "std": 103.06155395507812, "min": -182.8984375, "p10": -83.58306198120115, "median": 35.21263885498047, "p90": 188.45124206542968, "max": 228.71542358398438, "pos_frac": 0.6875, "sample": [29.573944091796875, 111.13249206542969, 194.1461639404297, 151.20167541503906, -5.1508026123046875, 228.71542358398438, 120.6956787109375, 107.22185516357422, 10.325029373168945, 197.84451293945312, 165.39266967773438, -4.3268890380859375, 19.61743927001953, 38.54623031616211, -0.33016204833984375, -4.002655029296875, 132.56373596191406, -150.72927856445312, 15.209714889526367, -64.4100570678711, 172.87094116210938, -54.434242248535156, 76.38104248046875, 37.579742431640625, 5.458311080932617, -15.247421264648438, 5.803712844848633, 37.91858673095703, -2.2779407501220703, 44.78947448730469, 32.84553527832031, 41.68273162841797, 188.769287109375, 148.8437042236328, 90.27295684814453, -2.2374801635742188, 191.27484130859375, 116.54632568359375, 187.70913696289062, 14.10744857788086, 48.641815185546875, -182.8984375, -91.80006408691406, 180.6715545654297, -23.982263565063477, 1.69580078125, -3.733917236328125, 149.18014526367188, 152.83084106445312, -59.57794952392578, 19.508743286132812, 173.34873962402344, -181.9185791015625, -112.84610748291016, 195.1371612548828, 21.204421997070312, 10.088783264160156, -182.27403259277344, 212.22201538085938, 66.12460327148438, -32.5245361328125, 143.72227478027344, 64.73663330078125, -113.9615478515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000659.npy"}
{"epoch": 0.9962207105064248, "step": 660, "batch_size": 64, "mean": 72.93806457519531, "std": 94.08099365234375, "min": -159.48915100097656, "p10": -28.92299346923827, "median": 62.757394790649414, "p90": 189.37368774414065, "max": 365.26483154296875, "pos_frac": 0.796875, "sample": [191.99839782714844, 83.6976547241211, 60.81392288208008, -72.4335708618164, 65.2322998046875, -88.64960479736328, -33.050872802734375, 184.59515380859375, -159.48915100097656, -18.624313354492188, 49.05442428588867, 196.1466064453125, 24.163257598876953, 116.64891052246094, 10.74493408203125, 250.13986206054688, 169.9545135498047, 126.54704284667969, 19.51313591003418, -60.009521484375, 115.74952697753906, 37.171142578125, 48.42426300048828, 41.211936950683594, 75.26963806152344, 182.7305908203125, -19.291275024414062, 199.54751586914062, 79.21287536621094, -59.291015625, 56.21592712402344, 134.31378173828125, 365.26483154296875, 120.28318786621094, -3.1289024353027344, -95.59197998046875, 14.351173400878906, -0.272308349609375, 139.423583984375, 77.09872436523438, 35.043495178222656, 4.375158309936523, 191.421630859375, 161.26004028320312, 172.16586303710938, 176.18161010742188, -2.822357177734375, 33.42052459716797, 150.0570526123047, 120.22844696044922, 1.2998771667480469, 215.45565795898438, 18.81787872314453, 3.279062271118164, 144.90567016601562, 64.70086669921875, 66.26565551757812, 143.50083923339844, 2.2667083740234375, -4.2979736328125, 37.1326904296875, 73.42874145507812, 50.43931579589844, 183.82302856445312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000660.npy"}
{"epoch": 0.9977324263038548, "step": 661, "batch_size": 64, "mean": 48.82136535644531, "std": 108.15203857421875, "min": -234.1811065673828, "p10": -76.87365341186523, "median": 34.86649131774902, "p90": 189.27298889160159, "max": 278.22021484375, "pos_frac": 0.6875, "sample": [-18.15130615234375, -3.136444091796875, -74.12451171875, -144.5571746826172, 126.44825744628906, 125.20027160644531, 164.21885681152344, -0.135162353515625, -234.1811065673828, 171.5428466796875, -140.4447784423828, 11.4581298828125, -30.136795043945312, 184.61260986328125, 194.3624267578125, 121.3078384399414, 197.83642578125, 109.19804382324219, 278.22021484375, 60.69566345214844, -184.18145751953125, 57.56230163574219, 147.10479736328125, -86.49929809570312, -3.834949493408203, 22.9735107421875, 28.66614532470703, -67.08876037597656, 1.7544631958007812, 5.506019592285156, 31.47280502319336, 82.87557220458984, -156.13710021972656, 25.60504913330078, 144.598388671875, 0.9146537780761719, 52.28486633300781, 191.27029418945312, 13.20888900756836, 172.58248901367188, 48.501007080078125, 38.26017761230469, -51.14579772949219, 172.717529296875, 177.77719116210938, 79.69940185546875, 206.2474365234375, 28.0831241607666, 5.245359420776367, 110.51018524169922, -78.0518569946289, -1.8636856079101562, -41.82316589355469, -52.367584228515625, 61.7879524230957, 146.67416381835938, 66.49982452392578, 111.33486938476562, 72.4384765625, 192.49496459960938, 15.16827392578125, -7.575836181640625, -23.42296600341797, 266.5052490234375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4/margin_logs/step_0000661.npy"}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:195cbc61d7a3574c77e0bd02b7eb6491c10264f41cddec9ec18d4d366a6a2e0f
size 4886466168

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5a802221363df47f1d9cf6ee716859c00b86ed72fc97960024ac9508926ac103
size 4832007448

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6f52ded9cebb963ccbf14829cf2cbf87b36976a2f972ff0c017ee1c5122c4ee5
size 4999813112

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bdfbe0eec140acf8a36ce554b181faebb8b9c965db22b01a8abaf2dfa5f6b623
size 4999813128

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:62e74369c443cf26b64146704ae1e67bc267678634bce7a83951ebae5c60e700
size 4832007496

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:409ccce20cc2b1c7168f0be8e0c9014d2d4968aa8fa294e05238f2446ce746ac
size 4999813120

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:592d9d3a21538e75c7e162aed99af47795dd8aa392967a78ef8c79a15605fadb
size 2571158184

@@ -0,0 +1,298 @@
{
"metadata": {
"total_size": 32121044992
},
"weight_map": {
"lm_head.weight": "model-00007-of-00007.safetensors",
"model.embed_tokens.weight": "model-00001-of-00007.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.norm.weight": "model-00007-of-00007.safetensors"
}
}
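The index above maps each tensor name to the shard file that stores it, and one layer can straddle two shards (layer 25 is split between shards 5 and 6). The following is a minimal sketch of querying such a map. It assumes the mapping has already been loaded into a plain dict; the helper names `tensors_by_shard` and `shards_for_layer` are hypothetical and are not part of any library.

```python
from collections import defaultdict

# Excerpt copied from the weight map above (not the full index).
WEIGHT_MAP_EXCERPT = {
    "model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.norm.weight": "model-00007-of-00007.safetensors",
}


def tensors_by_shard(weight_map):
    """Invert the map: shard file name -> sorted list of tensor names."""
    grouped = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        grouped[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in grouped.items()}


def shards_for_layer(weight_map, layer):
    """Return the sorted shard files that hold any tensor of one decoder layer."""
    # The trailing dot keeps layer 2 from also matching layers 20 to 29.
    prefix = f"model.layers.{layer}."
    return sorted({shard for name, shard in weight_map.items() if name.startswith(prefix)})


layer_25_shards = shards_for_layer(WEIGHT_MAP_EXCERPT, 25)
grouped = tensors_by_shard(WEIGHT_MAP_EXCERPT)
```

With the full index loaded in place of the excerpt, the same helpers would show which shard files are needed to materialize a given layer.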

23
special_tokens_map.json Normal file

@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3c5cf44023714fb39b05e71e425f8d7b92805ff73f7988b083b8c87f0bf87393
size 17209961

2064
tokenizer_config.json Normal file

File diff suppressed because it is too large

9
train_results.json Normal file

@@ -0,0 +1,9 @@
{
"epoch": 0.999244142101285,
"total_flos": 0.0,
"train_loss": 1.1320684536863204,
"train_runtime": 1756.8176,
"train_samples": 42336,
"train_samples_per_second": 24.098,
"train_steps_per_second": 0.376
}
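The figures in train_results.json can be cross-checked against each other. The sketch below does so under two assumptions that the file itself does not state: an effective batch size of 64 (taken from the "batch-64" part of the model name) and a final partial batch that is dropped. Under those assumptions the run would have taken 661 optimizer steps, which reproduces the reported epoch fraction and both throughput values.

```python
# Values copied from train_results.json above.
train_results = {
    "epoch": 0.999244142101285,
    "train_runtime": 1756.8176,
    "train_samples": 42336,
    "train_samples_per_second": 24.098,
    "train_steps_per_second": 0.376,
}

# Assumption: effective batch size inferred from "batch-64" in the model name.
ASSUMED_BATCH_SIZE = 64

samples = train_results["train_samples"]
runtime = train_results["train_runtime"]

# Assumption: the last incomplete batch is dropped, so only full batches count.
steps = samples // ASSUMED_BATCH_SIZE

# Fraction of the dataset actually consumed by those full batches.
epoch_fraction = steps * ASSUMED_BATCH_SIZE / samples

samples_per_second = samples / runtime
steps_per_second = steps / runtime

print(f"steps={steps}, epoch={epoch_fraction:.12f}")
print(f"samples/s={samples_per_second:.3f}, steps/s={steps_per_second:.3f}")
```

The agreement supports the assumed batch size but does not prove it; the suppressed trainer_state.json would be the place to confirm the actual step count.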

12653
trainer_state.json Normal file

File diff suppressed because it is too large