Initialize the project; model provided by the ModelHub XC community

Model: jackf857/llama-3-8b-base-new-dpo-harmless-4xh200-s_star1.0
Source: Original Platform
ModelHub XC
2026-05-10 13:57:33 +08:00
commit 22d5030199
681 changed files with 7893 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
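
To see which files in a checkout these patterns would route through Git LFS, a rough approximation with Python's `fnmatch` can help. This is only a sketch: real gitattributes matching follows gitignore-style rules, so the hypothetical helper `is_lfs_tracked` handles only slash-free patterns by matching against the basename, and the pattern list is an abbreviated subset of the file above.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Abbreviated subset of the slash-free patterns from .gitattributes.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.npy", "*.tar.*", "*tfevents*", "tokenizer.json"]

def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match the basename against slash-free patterns."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pat) for pat in patterns)

for p in ["margin_logs/step_0000001.npy", "tokenizer.json", "config.json", "README.md"]:
    print(p, is_lfs_tracked(p))
```

Patterns containing a slash, such as `saved_model/**/*`, are deliberately left out because `fnmatch` does not reproduce their directory semantics.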

81
README.md Normal file

@@ -0,0 +1,81 @@
---
library_name: transformers
base_model: W-61/llama-3-8b-base-sft-hh-harmless-4xh200
tags:
- alignment-handbook
- new-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: llama-3-8b-base-new-dpo-ultrafeedback-4xh200
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# llama-3-8b-base-new-dpo-ultrafeedback-4xh200
This model is a fine-tuned version of [W-61/llama-3-8b-base-sft-hh-harmless-4xh200](https://huggingface.co/W-61/llama-3-8b-base-sft-hh-harmless-4xh200) on the Anthropic/hh-rlhf dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5214
- Fcm Dpo/beta: 0.0836
- Fcm Dpo/q T: 0.3380
- Fcm Dpo/delta: -0.0050
- Fcm Dpo/margin: 11.8756
- Margin Dpo/margin Mean: 11.8756
- Margin Dpo/margin Std: 18.3875
- Logps/chosen: -96.2474
- Logps/rejected: -113.1114
- Logps/ref Chosen: -75.8693
- Logps/ref Rejected: -80.8577
- Logits/chosen: 0.3597
- Logits/rejected: 0.3046
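
The reported margin is consistent with the difference between the policy-versus-reference log-probability shifts of the chosen and rejected responses. The check below reproduces it from the numbers above; the function name `implied_margin` is illustrative, and the formula is inferred from the reported values rather than taken from the training code.

```python
def implied_margin(logps_chosen, logps_rejected, ref_chosen, ref_rejected):
    """Log-ratio of the chosen response minus log-ratio of the rejected one,
    each measured relative to the reference model."""
    return (logps_chosen - ref_chosen) - (logps_rejected - ref_rejected)

margin = implied_margin(-96.2474, -113.1114, -75.8693, -80.8577)
print(round(margin, 4))  # 11.8756
```

Both log-probabilities fall relative to the reference model (by about 20.4 and 32.3 respectively); the margin is positive because the rejected response falls further.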
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
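
The total batch sizes follow from the per-device values, the device count, and gradient accumulation. A quick check, assuming the usual Trainer convention that accumulation applies only to training:

```python
train_batch_size = 8             # per device
eval_batch_size = 8              # per device
num_devices = 4
gradient_accumulation_steps = 2

total_train = train_batch_size * num_devices * gradient_accumulation_steps
total_eval = eval_batch_size * num_devices
print(total_train, total_eval)  # 64 32
```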
### Training results
| Training Loss | Epoch | Step | Validation Loss | Fcm Dpo/beta | Fcm Dpo/q T | Fcm Dpo/delta | Fcm Dpo/margin | Margin Dpo/margin Mean | Margin Dpo/margin Std | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:------------:|:-----------:|:-------------:|:--------------:|:----------------------:|:---------------------:|:------------:|:--------------:|:----------------:|:------------------:|:-------------:|:---------------:|
| 1.0502 | 0.3023 | 200 | 0.5717 | 0.0936 | 0.3525 | 0.0185 | 10.3675 | 10.3675 | 18.4204 | -87.9072 | -103.2631 | -75.8693 | -80.8577 | 0.4369 | 0.3875 |
| 1.0126 | 0.6047 | 400 | 0.5364 | 0.1033 | 0.3417 | 0.0098 | 9.4748 | 9.4748 | 15.2862 | -92.9376 | -107.4008 | -75.8693 | -80.8577 | 0.3564 | 0.3008 |
| 1.0944 | 0.9070 | 600 | 0.5214 | 0.0836 | 0.3380 | -0.0050 | 11.8756 | 11.8756 | 18.3875 | -96.2474 | -113.1114 | -75.8693 | -80.8577 | 0.3597 | 0.3046 |
### Framework versions
- Transformers 4.51.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4

26
all_results.json Normal file

@@ -0,0 +1,26 @@
{
"epoch": 0.999244142101285,
"eval_fcm_dpo/beta": 0.08221900463104248,
"eval_fcm_dpo/delta": 0.006471974775195122,
"eval_fcm_dpo/margin": 11.944085121154785,
"eval_fcm_dpo/q_t": 0.33879798650741577,
"eval_logits/chosen": 0.39933323860168457,
"eval_logits/rejected": 0.3423865735530853,
"eval_logps/chosen": -96.3701171875,
"eval_logps/ref_chosen": -75.86933135986328,
"eval_logps/ref_rejected": -80.85771942138672,
"eval_logps/rejected": -113.30257415771484,
"eval_loss": 0.5207385420799255,
"eval_margin_dpo/margin_mean": 11.944085121154785,
"eval_margin_dpo/margin_std": 18.461780548095703,
"eval_runtime": 38.5096,
"eval_samples": 2303,
"eval_samples_per_second": 59.803,
"eval_steps_per_second": 1.87,
"total_flos": 0.0,
"train_loss": 1.0854419020769637,
"train_runtime": 1766.4629,
"train_samples": 42336,
"train_samples_per_second": 23.967,
"train_steps_per_second": 0.374
}
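
The throughput and epoch fields can be cross-checked against each other. The sketch below assumes a global batch of 64 (the `batch_size` recorded in `margin_logs/margins.jsonl`) and 661 optimizer steps (one per line of that log); both are inferences from this commit, not values stated in `all_results.json`.

```python
train_samples = 42336
train_runtime = 1766.4629   # seconds
global_batch = 64           # assumption: "batch_size" in margins.jsonl
steps = 661                 # assumption: one margins.jsonl line per step

samples_per_second = train_samples / train_runtime
steps_per_second = steps / train_runtime
epoch = steps * global_batch / train_samples

print(round(samples_per_second, 3))  # 23.967
print(round(steps_per_second, 3))    # 0.374
print(round(epoch, 6))               # 0.999244
```

Under these assumptions the final epoch value falls just short of 1.0 because 42336 is not a multiple of 64.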

29
config.json Normal file

@@ -0,0 +1,29 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.51.0",
"use_cache": true,
"vocab_size": 128256
}
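
These dimensions are enough to estimate the parameter count. The sketch below assumes the standard Llama layout: bias-free q/k/v/o projections, a gated MLP with three weight matrices, two RMSNorm weights per layer plus a final norm, and untied input and output embeddings (as `tie_word_embeddings: false` indicates). The function name `count_params` is illustrative.

```python
cfg = {
    "hidden_size": 4096, "intermediate_size": 14336, "num_hidden_layers": 32,
    "num_attention_heads": 32, "num_key_value_heads": 8, "head_dim": 128,
    "vocab_size": 128256,
}

def count_params(c):
    h, d = c["hidden_size"], c["head_dim"]
    q_out = c["num_attention_heads"] * d       # query / output projection width
    kv_out = c["num_key_value_heads"] * d      # grouped-query key / value width
    attn = h * q_out + 2 * h * kv_out + q_out * h
    mlp = 3 * h * c["intermediate_size"]       # gate, up, and down projections
    norms = 2 * h                              # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    embeddings = 2 * c["vocab_size"] * h       # embed_tokens + lm_head (untied)
    return c["num_hidden_layers"] * per_layer + embeddings + h  # + final norm

n = count_params(cfg)
print(n)  # 8030261248
print(f"{n * 4 / 2**30:.1f} GiB of weights at 4 bytes per parameter (float32)")
```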

20
eval_results.json Normal file

@@ -0,0 +1,20 @@
{
"epoch": 0.999244142101285,
"eval_fcm_dpo/beta": 0.08221900463104248,
"eval_fcm_dpo/delta": 0.006471974775195122,
"eval_fcm_dpo/margin": 11.944085121154785,
"eval_fcm_dpo/q_t": 0.33879798650741577,
"eval_logits/chosen": 0.39933323860168457,
"eval_logits/rejected": 0.3423865735530853,
"eval_logps/chosen": -96.3701171875,
"eval_logps/ref_chosen": -75.86933135986328,
"eval_logps/ref_rejected": -80.85771942138672,
"eval_logps/rejected": -113.30257415771484,
"eval_loss": 0.5207385420799255,
"eval_margin_dpo/margin_mean": 11.944085121154785,
"eval_margin_dpo/margin_std": 18.461780548095703,
"eval_runtime": 38.5096,
"eval_samples": 2303,
"eval_samples_per_second": 59.803,
"eval_steps_per_second": 1.87
}

9
generation_config.json Normal file

@@ -0,0 +1,9 @@
{
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": 128001,
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9,
"transformers_version": "4.51.0"
}
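
With `do_sample: true`, the `temperature` and `top_p` values shape the next-token distribution before a token is drawn. The pure-Python sketch below illustrates the general technique (temperature scaling followed by nucleus filtering) on a toy logit vector. It is not the Transformers implementation, and `nucleus_probs` is a hypothetical helper name.

```python
import math

def nucleus_probs(logits, temperature=0.6, top_p=0.9):
    """Return renormalised probabilities after temperature scaling and top-p filtering."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]   # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of highest-probability tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered

print(nucleus_probs([2.0, 1.0, 0.0, -1.0]))
```

A temperature below 1 sharpens the distribution, so for this toy input only the two most likely tokens survive the 0.9 cutoff.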

661
margin_logs/margins.jsonl Normal file

@@ -0,0 +1,661 @@
{"epoch": 0.0, "step": 1, "batch_size": 64, "mean": -0.0036170482635498047, "std": 0.25554317235946655, "min": -0.736083984375, "p10": -0.3432229995727539, "median": 0.031360626220703125, "p90": 0.29227676391601565, "max": 0.645111083984375, "pos_frac": 0.578125, "sample": [0.1120758056640625, 0.12518310546875, 0.31621551513671875, 0.13765716552734375, -0.12592506408691406, 0.23141098022460938, -0.21887779235839844, 0.21950721740722656, 0.04480743408203125, 0.020877838134765625, 0.0570220947265625, 0.058269500732421875, -0.4338226318359375, -0.030628204345703125, 0.645111083984375, -0.395477294921875, 0.09050941467285156, 0.0007190704345703125, -0.34615325927734375, 0.016077041625976562, -0.33638572692871094, 0.293853759765625, 0.03119659423828125, 0.22386932373046875, 0.21470260620117188, -0.08536529541015625, 0.0907745361328125, -0.03816986083984375, 0.39190101623535156, 0.16336441040039062, 0.08024787902832031, -0.031158447265625, 0.08477020263671875, 0.002460479736328125, -0.242034912109375, 0.07232666015625, -0.60186767578125, 0.20531463623046875, 0.155731201171875, -0.14299774169921875, -0.25698089599609375, 0.12331962585449219, -0.26497650146484375, 0.15140533447265625, -0.0920257568359375, -0.18599319458007812, 0.19028091430664062, 0.2496490478515625, 0.42162322998046875, 0.17873382568359375, -0.1525421142578125, -0.4972076416015625, 0.32010650634765625, -0.10365867614746094, -0.233795166015625, -0.19828224182128906, -0.4018898010253906, -0.13407135009765625, -0.09596633911132812, 0.031524658203125, 0.28859710693359375, -0.192962646484375, -0.736083984375, 0.3026123046875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000001.npy"}
{"epoch": 0.0015117157974300832, "step": 2, "batch_size": 64, "mean": 0.03744968771934509, "std": 0.2875921130180359, "min": -0.7604827880859375, "p10": -0.2812448501586914, "median": 0.03963661193847656, "p90": 0.3654294967651367, "max": 0.8134727478027344, "pos_frac": 0.5625, "sample": [0.30594635009765625, -0.24289894104003906, -0.11509323120117188, -0.13417816162109375, 0.06942558288574219, 0.36568641662597656, -0.14640045166015625, 0.1497650146484375, 0.30261993408203125, 0.10124588012695312, 0.13028717041015625, -0.0031890869140625, 0.0361480712890625, 0.5662612915039062, 0.09694290161132812, -0.01091766357421875, 0.1128997802734375, 0.0411834716796875, -0.21860504150390625, -0.1236419677734375, -0.08812713623046875, 0.10360527038574219, 0.1790008544921875, -0.5114288330078125, 0.3056755065917969, -0.14553451538085938, 0.28168487548828125, 0.26990509033203125, 0.1686878204345703, 0.038089752197265625, 0.19541168212890625, -0.10783576965332031, -0.2644004821777344, -0.19707489013671875, -0.140472412109375, 0.1349811553955078, 0.19672012329101562, -0.0714111328125, 0.53369140625, 0.1271820068359375, 0.8134727478027344, 0.2990264892578125, -0.7604827880859375, -0.08274078369140625, 0.05890846252441406, 0.029361724853515625, 0.4510040283203125, -0.1599273681640625, -0.29346656799316406, 0.10005569458007812, -0.27509117126464844, -0.1937713623046875, 0.19167327880859375, 0.28173065185546875, -0.09406471252441406, -0.3380699157714844, -0.29186248779296875, 0.36483001708984375, 0.009979248046875, 0.44391632080078125, -0.126708984375, -0.6550216674804688, 0.6160736083984375, -0.28388214111328125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000002.npy"}
{"epoch": 0.0030234315948601664, "step": 3, "batch_size": 64, "mean": -0.036822348833084106, "std": 0.3666442632675171, "min": -1.24493408203125, "p10": -0.5151679992675781, "median": 0.03291606903076172, "p90": 0.3262573242187501, "max": 0.8081741333007812, "pos_frac": 0.53125, "sample": [0.09917449951171875, -0.02935028076171875, -0.31510162353515625, -0.4739227294921875, 0.29925537109375, 0.6008148193359375, -0.18277931213378906, 0.14822006225585938, 0.584197998046875, -0.4706573486328125, -0.368560791015625, -1.24493408203125, -0.0503387451171875, 0.022665023803710938, -0.4677581787109375, 0.13232421875, -0.5328445434570312, 0.10906219482421875, 0.11223983764648438, -0.40293121337890625, 0.4168243408203125, -0.012737274169921875, 0.10643768310546875, -0.001750946044921875, -0.16180038452148438, 0.19319725036621094, 0.8081741333007812, 0.18597412109375, 0.018825531005859375, 0.136077880859375, 0.1293315887451172, -0.5908279418945312, -0.32933807373046875, 0.11566925048828125, -0.13167381286621094, -0.13050270080566406, -0.6655349731445312, -0.1309661865234375, 0.09109878540039062, 0.2708930969238281, 0.5146865844726562, 0.180877685546875, 0.520538330078125, 0.2529754638671875, 0.2991485595703125, -0.168426513671875, 0.0572967529296875, 0.2230701446533203, 0.15476226806640625, -0.6433868408203125, -0.7787399291992188, 0.33782958984375, 0.209136962890625, -0.047473907470703125, 0.0431671142578125, 0.1602020263671875, 0.059906005859375, -0.32929229736328125, -0.35622406005859375, -0.09255218505859375, -0.6499481201171875, 0.25737762451171875, -0.22983551025390625, -0.21787261962890625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000003.npy"}
{"epoch": 0.0045351473922902496, "step": 4, "batch_size": 64, "mean": 0.05772021412849426, "std": 0.26867926120758057, "min": -0.6277923583984375, "p10": -0.27910575866699217, "median": 0.08504962921142578, "p90": 0.37826156616210943, "max": 0.6207351684570312, "pos_frac": 0.625, "sample": [-0.0748138427734375, 0.49536895751953125, 0.21860504150390625, 0.2703094482421875, -0.08898162841796875, -0.6277923583984375, 0.08395957946777344, 0.5272274017333984, 0.3392524719238281, 0.1710662841796875, 0.22943496704101562, 0.186614990234375, -0.1854705810546875, 0.1371002197265625, -0.21223831176757812, -0.310699462890625, 0.14093780517578125, 0.17470932006835938, 0.6207351684570312, -0.5820999145507812, -0.10109710693359375, 0.15488815307617188, -0.23262786865234375, 0.0016326904296875, 0.007846832275390625, 0.17594146728515625, -0.12631988525390625, 0.382781982421875, 0.0789794921875, 0.405242919921875, 0.212158203125, 0.035617828369140625, 0.05321502685546875, 0.29140472412109375, -0.081085205078125, 0.4670867919921875, 0.3056640625, 0.08613967895507812, -0.260101318359375, 0.16109466552734375, 0.536529541015625, -0.10182571411132812, 0.1378173828125, 0.03313636779785156, -0.1820526123046875, 0.1665496826171875, -0.37387847900390625, -0.033359527587890625, -0.4111175537109375, -0.005207061767578125, 0.20421600341796875, -0.23438262939453125, -0.03936767578125, -0.2872505187988281, 0.2180633544921875, 0.26837158203125, -0.4673919677734375, 0.2679176330566406, 0.36771392822265625, -0.21140289306640625, -0.11806488037109375, 0.120330810546875, 0.28122711181640625, 0.0258331298828125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000004.npy"}
{"epoch": 0.006046863189720333, "step": 5, "batch_size": 64, "mean": 0.003126293420791626, "std": 0.30415335297584534, "min": -0.5953788757324219, "p10": -0.38629894256591796, "median": 0.0011138916015625, "p90": 0.4298225402832033, "max": 0.8375778198242188, "pos_frac": 0.515625, "sample": [-0.39072608947753906, -0.06169891357421875, 0.06361198425292969, 0.001007080078125, -0.174102783203125, -0.1037139892578125, 0.05846977233886719, 0.14780044555664062, 0.2871551513671875, 0.07476043701171875, 0.5759429931640625, -0.41363525390625, -0.5953788757324219, -0.20702362060546875, -0.1952972412109375, 0.001220703125, -0.07487869262695312, -0.308441162109375, -0.2206573486328125, 0.09398651123046875, 0.012996673583984375, -0.07400131225585938, 0.06533241271972656, 0.2680549621582031, -0.4396820068359375, 0.48565673828125, -0.37596893310546875, 0.03305816650390625, 0.382659912109375, 0.31743621826171875, 0.18358612060546875, -0.03912353515625, -0.5004730224609375, 0.20655441284179688, -0.0739593505859375, -0.4923973083496094, -0.10964775085449219, -0.254150390625, 0.235992431640625, 0.10361862182617188, 0.19465255737304688, 0.064666748046875, -0.2303466796875, 0.7541580200195312, 0.45003509521484375, -0.050067901611328125, -0.1220550537109375, -0.191009521484375, 0.0725555419921875, -0.1565093994140625, 0.0024261474609375, -0.181976318359375, 0.26445770263671875, 0.8375778198242188, -0.50128173828125, -0.12601852416992188, -0.327362060546875, -0.18565750122070312, 0.12148094177246094, 0.4871978759765625, 0.12407302856445312, 0.47110748291015625, -0.3755817413330078, 0.3096160888671875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000005.npy"}
{"epoch": 0.007558578987150416, "step": 6, "batch_size": 64, "mean": -0.028319865465164185, "std": 0.32732126116752625, "min": -0.8701171875, "p10": -0.4658819198608398, "median": -0.009981155395507812, "p90": 0.3678298950195313, "max": 0.7412261962890625, "pos_frac": 0.484375, "sample": [-0.20590972900390625, -0.1417827606201172, 0.040191650390625, 0.11321258544921875, -0.04990386962890625, -0.1321086883544922, 0.08891487121582031, 0.05756378173828125, -0.228515625, -0.2857780456542969, 0.27082061767578125, -0.50299072265625, -0.19828033447265625, 0.31749725341796875, 0.7412261962890625, -0.5300140380859375, 0.3690338134765625, 0.6697463989257812, 0.007537841796875, -0.4094352722167969, -0.21669769287109375, 0.365020751953125, -0.013439178466796875, 0.25141334533691406, 0.23184967041015625, -0.5727386474609375, 0.0355987548828125, 0.3328857421875, -0.15872764587402344, 0.048221588134765625, 0.30411720275878906, 0.398651123046875, 0.3949432373046875, 0.03951263427734375, -0.5183486938476562, 0.17263031005859375, -0.2745933532714844, -0.16645431518554688, -0.21887588500976562, -0.5267486572265625, 0.14693450927734375, -0.17789268493652344, -0.1490020751953125, -0.3003959655761719, 0.3529510498046875, -0.430328369140625, 0.4783477783203125, 0.12412261962890625, -0.48111915588378906, 0.09046745300292969, -0.00652313232421875, 0.43279266357421875, -0.3943195343017578, -0.18792724609375, -0.32575225830078125, -0.034618377685546875, -0.14197731018066406, -0.4227294921875, -0.8701171875, 0.33050537109375, -0.31219482421875, 0.21851539611816406, 0.16441726684570312, 0.1841259002685547], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000006.npy"}
{"epoch": 0.009070294784580499, "step": 7, "batch_size": 64, "mean": -0.014500677585601807, "std": 0.2555774450302124, "min": -0.5395126342773438, "p10": -0.3222640991210937, "median": -0.007221221923828125, "p90": 0.27245101928710946, "max": 0.6279296875, "pos_frac": 0.484375, "sample": [0.3534126281738281, -0.24993896484375, -0.26628875732421875, -0.03271675109863281, 0.30832862854003906, -0.041217803955078125, 0.13299560546875, -0.0029754638671875, -0.5395126342773438, 0.4010581970214844, -0.33942413330078125, 0.17924118041992188, 0.1531085968017578, 0.159088134765625, 0.08337783813476562, 0.05069923400878906, -0.02205657958984375, 0.02034759521484375, -0.08133316040039062, -0.32749176025390625, 0.039794921875, 0.28041839599609375, -0.09575653076171875, -0.0391845703125, 0.2515716552734375, 0.19449234008789062, -0.17366790771484375, -0.2369384765625, -0.464019775390625, -0.13674163818359375, -0.4797515869140625, 0.19390869140625, -0.30713844299316406, 0.19934463500976562, -0.01146697998046875, 0.48201751708984375, -0.16654586791992188, 0.5711212158203125, 0.1409759521484375, -0.13028335571289062, 0.2010955810546875, -0.1924762725830078, 0.17481231689453125, 0.2538604736328125, -0.273895263671875, 0.0879364013671875, -0.1433563232421875, 0.09644699096679688, 0.09106063842773438, -0.44110107421875, -0.2994804382324219, -0.31006622314453125, 0.09950828552246094, -0.4002532958984375, 0.026031494140625, -0.1045989990234375, -0.21656036376953125, -0.171478271484375, -0.2921905517578125, 0.6279296875, 0.22779083251953125, 0.00077056884765625, 0.14263916015625, -0.16332054138183594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000007.npy"}
{"epoch": 0.010582010582010581, "step": 8, "batch_size": 64, "mean": 0.01682296395301819, "std": 0.3240106999874115, "min": -0.6318588256835938, "p10": -0.34962482452392574, "median": -0.015995025634765625, "p90": 0.3798248291015626, "max": 1.0202102661132812, "pos_frac": 0.453125, "sample": [0.13897323608398438, 0.968475341796875, 0.026275634765625, 0.01996612548828125, -0.60235595703125, 0.23433685302734375, 0.24549102783203125, 0.6424789428710938, -0.3611736297607422, -0.375946044921875, 0.5762786865234375, 0.32834625244140625, -0.14298248291015625, -0.17355728149414062, 0.1560211181640625, 1.0202102661132812, -0.0018157958984375, -0.09606552124023438, 0.2501411437988281, 0.0795135498046875, 0.18323326110839844, -0.272979736328125, -0.6318588256835938, -0.1328125, 0.022830963134765625, -0.18198585510253906, -0.38402366638183594, 0.34130859375, 0.026676177978515625, 0.11028861999511719, 0.39228057861328125, 0.04059600830078125, -0.13467979431152344, -0.13323974609375, 0.10544586181640625, -0.02719879150390625, -0.06641387939453125, 0.335113525390625, 0.1881561279296875, -0.3066368103027344, 0.5645294189453125, 0.07629203796386719, -0.3226776123046875, 0.1039581298828125, -0.0243072509765625, -0.16303253173828125, 0.5154876708984375, -0.17805099487304688, -0.3707447052001953, -0.29106903076171875, -0.05157470703125, -0.0056610107421875, -0.2919349670410156, -0.28513336181640625, -0.21826171875, -0.1433563232421875, -0.20781898498535156, -0.47986602783203125, 0.35076141357421875, -0.040660858154296875, -0.092803955078125, 0.25787353515625, -0.02121734619140625, -0.010772705078125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000008.npy"}
{"epoch": 0.012093726379440665, "step": 9, "batch_size": 64, "mean": 0.015697389841079712, "std": 0.29573947191238403, "min": -0.8074951171875, "p10": -0.28749084472656244, "median": 0.0400543212890625, "p90": 0.26746978759765627, "max": 0.8474884033203125, "pos_frac": 0.546875, "sample": [-0.30181884765625, 0.8474884033203125, 0.15640640258789062, -0.254058837890625, 0.14626121520996094, 0.0312652587890625, 0.11945343017578125, -0.24706649780273438, 0.5627899169921875, 0.0929718017578125, -0.23180389404296875, -0.19451522827148438, 0.7764129638671875, 0.3082542419433594, -0.160125732421875, -0.3919219970703125, 0.1241455078125, -0.3472747802734375, -0.19979476928710938, 0.1792583465576172, -0.10845947265625, -0.03281974792480469, 0.769775390625, 0.067596435546875, -0.09359550476074219, -0.0648193359375, 0.2209758758544922, 0.1672821044921875, 0.050182342529296875, -0.13796234130859375, -0.19071197509765625, -0.070892333984375, 0.20604705810546875, 0.2668609619140625, 0.007541656494140625, -0.10253524780273438, 0.12957763671875, -0.40737152099609375, 0.1620025634765625, 0.251739501953125, 0.247772216796875, -0.13941192626953125, -0.7547454833984375, -0.238677978515625, -0.016143798828125, 0.0510101318359375, 0.22933578491210938, 0.10105514526367188, 0.09231948852539062, -0.18170928955078125, 0.09360504150390625, -0.34561920166015625, -0.06124687194824219, 0.1256866455078125, 0.42803955078125, 0.267730712890625, 0.08079910278320312, -0.136566162109375, 0.08977508544921875, 0.015001296997070312, -0.8074951171875, -0.081512451171875, -0.20995330810546875, 0.0488433837890625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000009.npy"}
{"epoch": 0.013605442176870748, "step": 10, "batch_size": 64, "mean": -0.077696293592453, "std": 0.37100741267204285, "min": -1.2342300415039062, "p10": -0.5758394241333007, "median": -0.038181304931640625, "p90": 0.30521240234375, "max": 0.8332977294921875, "pos_frac": 0.40625, "sample": [-0.386810302734375, -0.43717002868652344, 0.0433197021484375, -0.054225921630859375, -0.18593597412109375, -0.8828582763671875, -0.2773418426513672, 0.070587158203125, -0.8304443359375, 0.096588134765625, -0.3280181884765625, -0.0350799560546875, -0.39292144775390625, 0.3705596923828125, 0.1810302734375, -0.21193695068359375, -0.1325836181640625, 0.24334716796875, 0.2947235107421875, 0.3948936462402344, 0.21739578247070312, -0.02020263671875, -0.0349578857421875, 0.177520751953125, -0.8663902282714844, -0.098388671875, -0.07909393310546875, -0.1771087646484375, -0.199188232421875, -0.16259002685546875, 0.17552947998046875, 0.4901580810546875, -0.049732208251953125, 0.251556396484375, 0.0621337890625, 0.3097076416015625, -0.22117996215820312, 0.2825164794921875, -0.044219970703125, 0.11914443969726562, -0.164703369140625, -0.020673751831054688, -0.7106170654296875, -0.270233154296875, -0.42352294921875, -0.054004669189453125, -1.2342300415039062, -0.04128265380859375, 0.8332977294921875, -0.9519805908203125, -0.6352691650390625, 0.4486503601074219, 0.23890304565429688, -0.0134124755859375, 0.02587127685546875, -0.08213043212890625, 0.290252685546875, 0.11895942687988281, -0.031005859375, 0.040313720703125, 0.15821266174316406, -0.30692291259765625, 0.31360626220703125, -0.1729736328125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000010.npy"}
{"epoch": 0.015117157974300832, "step": 11, "batch_size": 64, "mean": 0.027366310358047485, "std": 0.3115066587924957, "min": -0.98907470703125, "p10": -0.28300933837890624, "median": 0.025812149047851562, "p90": 0.3663818359375, "max": 0.7347564697265625, "pos_frac": 0.546875, "sample": [-0.98907470703125, 0.16292953491210938, 0.22707748413085938, -0.08851242065429688, -0.4705924987792969, 0.24706649780273438, 0.013429641723632812, 0.12882614135742188, -0.0103759765625, -0.24212646484375, 0.095977783203125, 0.07730865478515625, -0.2620849609375, 0.26576995849609375, -0.1769695281982422, -0.121612548828125, 0.6959342956542969, 0.7240753173828125, -0.5143280029296875, -0.1561279296875, 0.22235488891601562, -0.07476806640625, 0.36492919921875, -0.07101821899414062, -0.21778297424316406, -0.04340362548828125, 0.033634185791015625, 0.10605812072753906, 0.010648727416992188, 0.14635467529296875, 0.10009574890136719, 0.18449974060058594, -0.2919769287109375, 0.36700439453125, -0.0165557861328125, -0.24382781982421875, -0.21193695068359375, 0.29259490966796875, 0.19297027587890625, 0.25130462646484375, 0.164398193359375, 0.048004150390625, -0.3878059387207031, 0.12023544311523438, -0.00595855712890625, -0.8471603393554688, 0.22200775146484375, 0.20439720153808594, -0.16644287109375, 0.496734619140625, -0.003818511962890625, -0.260955810546875, -0.16615867614746094, -0.058200836181640625, 0.24013519287109375, -0.29393768310546875, -0.1568126678466797, 0.2617931365966797, 0.3825950622558594, 0.38153076171875, -0.10639190673828125, 0.7347564697265625, 0.0179901123046875, 0.22274017333984375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000011.npy"}
{"epoch": 0.016628873771730914, "step": 12, "batch_size": 64, "mean": -0.03798454999923706, "std": 0.37978437542915344, "min": -1.4480743408203125, "p10": -0.4972343444824219, "median": 0.00013065338134765625, "p90": 0.31829071044921875, "max": 1.1272125244140625, "pos_frac": 0.5, "sample": [-0.08743095397949219, -0.36896514892578125, -0.03627967834472656, 0.07417869567871094, -0.38912391662597656, -0.1821136474609375, 0.036357879638671875, 0.1913909912109375, -0.0157928466796875, -0.291961669921875, -0.8369598388671875, -0.14297866821289062, 0.002788543701171875, -0.5006103515625, 0.2712230682373047, 0.08913421630859375, 0.23329925537109375, -0.0482940673828125, -0.38079833984375, 0.2994728088378906, 0.13592147827148438, 0.21307373046875, -0.6248779296875, 0.3759918212890625, 0.2664775848388672, -0.48935699462890625, -0.2764701843261719, -0.29448699951171875, -0.10470199584960938, 0.31145477294921875, -0.19724273681640625, 0.28277587890625, -0.5905227661132812, -1.4480743408203125, -0.0025272369384765625, -0.12115859985351562, -0.40898895263671875, 0.25603485107421875, 0.0186309814453125, -0.11516761779785156, -0.2969970703125, -0.047908782958984375, 0.13126373291015625, 0.05557823181152344, 0.19631195068359375, 0.2568073272705078, -0.03467559814453125, 0.16987228393554688, 0.39544677734375, 0.37548828125, 0.1879119873046875, 0.18031692504882812, 0.35064697265625, 0.652252197265625, 1.1272125244140625, -0.6826171875, -0.5549144744873047, 0.32122039794921875, -0.3965797424316406, 0.0649566650390625, -0.03852081298828125, -0.1493682861328125, 0.1511859893798828, 0.050777435302734375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000012.npy"}
{"epoch": 0.018140589569160998, "step": 13, "batch_size": 64, "mean": -0.011957705020904541, "std": 0.33256784081459045, "min": -1.153594970703125, "p10": -0.4213642120361328, "median": -0.0011463165283203125, "p90": 0.37932739257812503, "max": 0.8685073852539062, "pos_frac": 0.5, "sample": [-1.153594970703125, -0.04421234130859375, -0.4815521240234375, -0.40912628173828125, -0.0361328125, -0.4266090393066406, -0.26180267333984375, 0.20041465759277344, 0.067169189453125, -0.02857208251953125, 0.36722373962402344, -0.22393035888671875, 0.3805999755859375, 0.033966064453125, -0.14939498901367188, 0.16604042053222656, 0.426788330078125, 0.0215606689453125, 0.2117767333984375, 0.0656890869140625, -0.10540771484375, -0.50799560546875, -0.4646453857421875, -0.3257598876953125, -0.023853302001953125, -0.09712600708007812, 0.15055084228515625, 0.05787086486816406, 0.12570953369140625, -0.16852569580078125, 0.0232391357421875, -0.5704727172851562, 0.4498748779296875, -0.17690277099609375, -0.16623687744140625, 0.3763580322265625, 0.48502349853515625, -0.22917556762695312, 0.30518150329589844, -0.2144927978515625, -0.04245758056640625, 0.5261001586914062, 0.17328643798828125, 0.048641204833984375, 0.14240264892578125, -0.1622142791748047, 0.08490753173828125, -0.10822296142578125, -0.2454071044921875, -0.0829925537109375, -0.34011077880859375, 0.25252532958984375, 0.030628204345703125, -0.034717559814453125, 0.8685073852539062, -0.18396759033203125, -0.820587158203125, 0.2661571502685547, 0.5404510498046875, 0.25696372985839844, 0.2943115234375, 0.0235748291015625, -0.02469635009765625, 0.12210845947265625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000013.npy"}
{"epoch": 0.019652305366591082, "step": 14, "batch_size": 64, "mean": 0.05396169424057007, "std": 0.3372066915035248, "min": -0.781829833984375, "p10": -0.22420616149902345, "median": -0.0037145614624023438, "p90": 0.45091724395751975, "max": 1.145904541015625, "pos_frac": 0.5, "sample": [-0.1744537353515625, -0.781829833984375, -0.06813812255859375, 0.097747802734375, 0.3936595916748047, 0.10340118408203125, 0.07711029052734375, 0.131439208984375, -0.02925872802734375, 0.11159706115722656, -0.079345703125, 0.1421966552734375, 0.37941932678222656, -0.14969635009765625, -0.19230270385742188, 0.008100509643554688, -0.22524642944335938, -0.192352294921875, -0.11156463623046875, 0.0657501220703125, 0.10252952575683594, -0.524627685546875, -0.1464557647705078, 0.06289291381835938, 0.08809661865234375, 0.036834716796875, 0.068267822265625, 0.1416778564453125, -0.08801651000976562, 0.7434844970703125, -0.0446319580078125, 0.8952102661132812, 0.35672760009765625, 1.145904541015625, -0.0374298095703125, -0.2939605712890625, 0.684051513671875, 0.03786468505859375, 0.06343269348144531, 0.1171875, -0.155731201171875, 0.47545623779296875, -0.4523773193359375, 1.102142333984375, 0.20038604736328125, -0.058624267578125, -0.015529632568359375, 0.38581085205078125, -0.2691192626953125, 0.5065822601318359, -0.22177886962890625, -0.17938232421875, 0.07124137878417969, -0.18662261962890625, -0.08669090270996094, -0.3387451171875, -0.109832763671875, -0.08083152770996094, -0.020505905151367188, -0.08108901977539062, -0.08496475219726562, 0.1114654541015625, -0.16876220703125, 0.19577789306640625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000014.npy"}
{"epoch": 0.021164021164021163, "step": 15, "batch_size": 64, "mean": 0.0151863694190979, "std": 0.31055948138237, "min": -0.802947998046875, "p10": -0.2822700500488281, "median": -0.0152587890625, "p90": 0.37488174438476574, "max": 1.0445175170898438, "pos_frac": 0.484375, "sample": [0.13453292846679688, -0.1104278564453125, 0.1557140350341797, -0.06036186218261719, -0.1910991668701172, 0.303955078125, 0.1506500244140625, -0.2864837646484375, -0.029876708984375, -0.1889495849609375, 0.313690185546875, -0.04004478454589844, -0.0491180419921875, -0.10941696166992188, 0.4215545654296875, 0.027252197265625, 1.0445175170898438, 0.17029571533203125, 0.06180572509765625, -0.29572296142578125, 0.38622283935546875, -0.5353279113769531, 0.03354644775390625, -0.12546920776367188, 0.020151138305664062, -0.3064117431640625, 0.4266357421875, -0.0977783203125, -0.201141357421875, -0.27243804931640625, 0.23630523681640625, -0.10159492492675781, -0.1328125, -0.551116943359375, 0.22682952880859375, 0.348419189453125, 0.17533111572265625, -0.089111328125, -0.15027236938476562, -0.18593597412109375, 0.05181884765625, 0.037261962890625, -0.0883331298828125, 0.910125732421875, 0.1881999969482422, -0.802947998046875, -0.628570556640625, 0.00215911865234375, 0.029422760009765625, -0.177337646484375, -0.000640869140625, -0.03791618347167969, -0.1607513427734375, 0.5152664184570312, -0.13337326049804688, -0.1149139404296875, 0.0182952880859375, -0.060077667236328125, 0.24898529052734375, 0.02619171142578125, 0.6646270751953125, -0.201263427734375, 0.00104522705078125, 0.1581573486328125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000015.npy"}
{"epoch": 0.022675736961451247, "step": 16, "batch_size": 64, "mean": 0.0020624548196792603, "std": 0.3215937912464142, "min": -1.0620880126953125, "p10": -0.2913944244384765, "median": 0.00858449935913086, "p90": 0.3218891143798828, "max": 1.1300811767578125, "pos_frac": 0.546875, "sample": [-0.34545135498046875, 0.08951759338378906, -0.14496612548828125, 0.027681350708007812, -0.0640106201171875, -0.1656646728515625, 0.06132698059082031, -0.3004913330078125, 1.1300811767578125, 0.09816360473632812, 0.19893646240234375, 0.4036712646484375, 0.18952178955078125, -0.402984619140625, -0.22912216186523438, 0.08806037902832031, -0.04055023193359375, 0.009599685668945312, -0.19089508056640625, -0.11444091796875, 0.289337158203125, -0.5255126953125, 0.4330291748046875, 0.6124954223632812, 0.09154701232910156, -1.0620880126953125, -0.2701683044433594, -0.0230712890625, -0.15829849243164062, 0.7328872680664062, 0.05130767822265625, 0.1505584716796875, -0.5942230224609375, -0.0308074951171875, 0.215362548828125, -0.2491607666015625, 0.3152923583984375, -0.704071044921875, -0.17239761352539062, 0.3201026916503906, -0.15868759155273438, -0.16669845581054688, 0.18738555908203125, 0.046550750732421875, 0.0015869140625, -0.037689208984375, 0.3763923645019531, -0.14867401123046875, 0.03034210205078125, -0.16765403747558594, 0.0056934356689453125, -0.15953636169433594, 0.007569313049316406, 0.1033935546875, 0.14603805541992188, 0.1071014404296875, -0.2653350830078125, -0.156494140625, -0.058807373046875, 0.2730426788330078, 0.04640960693359375, 0.033428192138671875, 0.32265472412109375, 0.043880462646484375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000016.npy"}
{"epoch": 0.02418745275888133, "step": 17, "batch_size": 64, "mean": 0.005961209535598755, "std": 0.3170897960662842, "min": -0.7611083984375, "p10": -0.34326858520507814, "median": -0.0230560302734375, "p90": 0.36043930053710943, "max": 1.02496337890625, "pos_frac": 0.4375, "sample": [0.03502655029296875, 0.1415557861328125, -0.0059452056884765625, -0.3485527038574219, -0.070404052734375, -0.06592941284179688, -0.23807525634765625, -0.10236740112304688, 0.3691253662109375, -0.02283477783203125, -0.659332275390625, 0.5499916076660156, -0.06851005554199219, 0.09384536743164062, -0.345367431640625, 0.12798309326171875, -0.23537445068359375, -0.6006202697753906, -0.0628814697265625, -0.7611083984375, -0.1566009521484375, 0.0416259765625, 0.10565185546875, -0.02327728271484375, -0.47760009765625, -0.23321533203125, -0.1389617919921875, -0.08210563659667969, -0.09769439697265625, -0.23282241821289062, -0.16194915771484375, -0.1057281494140625, -0.01580047607421875, -0.06488037109375, 0.33490753173828125, 1.02496337890625, 0.0712432861328125, -0.0081024169921875, 0.5341567993164062, -0.16754150390625, -0.054412841796875, 0.142181396484375, 0.08558082580566406, 0.19649505615234375, 0.10129356384277344, -0.03798675537109375, -0.390228271484375, 0.488067626953125, 0.2895774841308594, 0.799468994140625, 0.173004150390625, -0.0467071533203125, -0.33837127685546875, -0.22771263122558594, 0.751007080078125, 0.02239990234375, 0.052043914794921875, 0.3169136047363281, 0.08594512939453125, 0.008968353271484375, 0.34017181396484375, 0.09818267822265625, -0.11514854431152344, -0.23571014404296875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000017.npy"}
{"epoch": 0.025699168556311415, "step": 18, "batch_size": 64, "mean": -0.009387761354446411, "std": 0.31338295340538025, "min": -1.0141754150390625, "p10": -0.3493072509765625, "median": 0.02108478546142578, "p90": 0.327149772644043, "max": 0.6643829345703125, "pos_frac": 0.515625, "sample": [-0.04625701904296875, 0.029201507568359375, -0.13801956176757812, 0.019418716430664062, 0.3264942169189453, 0.26015472412109375, 0.05138587951660156, 0.0227508544921875, 0.46683502197265625, 0.4424324035644531, 0.2288055419921875, -0.3319854736328125, 0.055084228515625, 0.1214599609375, -0.34954071044921875, 0.20611572265625, -0.03990936279296875, -0.7700042724609375, 0.32743072509765625, 0.5859527587890625, 0.2897186279296875, 0.6643829345703125, -0.29346466064453125, 0.03832244873046875, -0.12261199951171875, -0.06823348999023438, -0.34876251220703125, -0.031841278076171875, 0.21796035766601562, 0.1781597137451172, 0.11367988586425781, 0.2605743408203125, -0.1330718994140625, 0.1410369873046875, -0.16032791137695312, -0.38289642333984375, -0.23578643798828125, -0.1349773406982422, -0.0305328369140625, -0.271392822265625, -0.08980560302734375, -1.0141754150390625, -0.2968597412109375, -0.2581787109375, -0.42894744873046875, 0.4416160583496094, 0.07629966735839844, 0.15201377868652344, -0.11510848999023438, -0.764312744140625, -0.156646728515625, -0.0076236724853515625, -0.2265796661376953, -0.07134819030761719, 0.14636993408203125, 0.16878509521484375, 0.12040328979492188, 0.02814483642578125, 0.319915771484375, 0.069610595703125, -0.056171417236328125, -0.46572113037109375, 0.5790290832519531, 0.09073257446289062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000018.npy"}
{"epoch": 0.027210884353741496, "step": 19, "batch_size": 64, "mean": 0.05492427945137024, "std": 0.3257042169570923, "min": -0.89892578125, "p10": -0.2548213958740234, "median": 0.042461395263671875, "p90": 0.41222496032714856, "max": 1.2525177001953125, "pos_frac": 0.609375, "sample": [0.635650634765625, -0.09305572509765625, 0.0896148681640625, -0.8443603515625, -0.18781661987304688, 0.18486785888671875, 0.0233612060546875, 0.027301788330078125, 0.098876953125, 0.375885009765625, 0.08917617797851562, -0.1437225341796875, 0.5239715576171875, 0.09414291381835938, 0.236572265625, 0.026889801025390625, 0.021673202514648438, -0.02419281005859375, 0.04569244384765625, 0.43541908264160156, -0.30808258056640625, -0.89892578125, -0.1010589599609375, 0.4304656982421875, 1.2525177001953125, 0.1055755615234375, 0.05980682373046875, -0.3915252685546875, -0.08306121826171875, -0.01831817626953125, 0.04395294189453125, -0.028438568115234375, 0.0409698486328125, 0.4397697448730469, -0.636077880859375, -0.0259857177734375, -0.21461105346679688, -0.2007293701171875, 0.2196502685546875, 0.02777862548828125, 0.2118663787841797, -0.22660446166992188, 0.2973175048828125, 0.024311065673828125, -0.18601226806640625, 0.3223743438720703, 0.20014572143554688, 0.2058563232421875, -0.08098983764648438, 0.31941986083984375, 0.252105712890625, -0.061496734619140625, -0.08435821533203125, -0.039974212646484375, 0.1336517333984375, 0.4277992248535156, -0.42110443115234375, -0.21707916259765625, -0.26691436767578125, 0.35947418212890625, 0.27340126037597656, 0.1791839599609375, 0.31627655029296875, 0.24688339233398438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000019.npy"}
{"epoch": 0.02872260015117158, "step": 20, "batch_size": 64, "mean": -0.015722736716270447, "std": 0.2768303453922272, "min": -0.59857177734375, "p10": -0.422723388671875, "median": 0.0049343109130859375, "p90": 0.343795394897461, "max": 0.5377578735351562, "pos_frac": 0.546875, "sample": [-0.0155792236328125, 0.0525054931640625, 0.20000457763671875, -0.2082672119140625, -0.30782127380371094, -0.2075653076171875, -0.3052520751953125, -0.132965087890625, 0.049327850341796875, -0.27706146240234375, -0.14870452880859375, 0.15897750854492188, -0.57781982421875, 0.046367645263671875, -0.01666259765625, -0.46721839904785156, -0.59857177734375, 0.0024471282958984375, 0.29244232177734375, -0.430084228515625, 0.05982398986816406, -0.08625602722167969, 0.0006103515625, 0.48504638671875, 0.09054183959960938, -0.128936767578125, 0.0063877105712890625, -0.1707611083984375, 0.26952362060546875, -0.059044837951660156, 0.12701416015625, -0.4738349914550781, 0.10211944580078125, 0.27679443359375, -0.43863677978515625, 0.06396865844726562, 0.20412063598632812, 0.07537841796875, -0.2315216064453125, 0.3905792236328125, 0.11211204528808594, 0.020900726318359375, -0.105804443359375, -0.405548095703125, 0.3224945068359375, 0.514923095703125, 0.36774444580078125, -0.3058624267578125, 0.09355926513671875, 0.1474151611328125, -0.5251922607421875, 0.5377578735351562, 0.3043060302734375, 0.5286178588867188, 0.10457229614257812, -0.03644561767578125, 0.0308685302734375, 0.3529243469238281, -0.06340789794921875, -0.3220558166503906, -0.018049240112304688, 0.011119842529296875, -0.3481025695800781, 0.0034809112548828125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000020.npy"}
{"epoch": 0.030234315948601664, "step": 21, "batch_size": 64, "mean": -0.020741939544677734, "std": 0.27672091126441956, "min": -0.7547760009765625, "p10": -0.32894229888916016, "median": 0.007555961608886719, "p90": 0.23810405731201179, "max": 0.8404998779296875, "pos_frac": 0.515625, "sample": [-0.07534408569335938, -0.24574851989746094, -0.09057807922363281, -0.2381744384765625, 0.12099456787109375, 0.2563018798828125, 0.1700916290283203, -0.0890655517578125, -0.053905487060546875, -0.24866867065429688, 0.18142318725585938, 0.01617431640625, 0.059234619140625, 0.047332763671875, 0.14824676513671875, 0.4356956481933594, -0.6019973754882812, 0.03937530517578125, -0.28469085693359375, -0.13981056213378906, -0.19522476196289062, -0.012559890747070312, 0.21976852416992188, 0.15375518798828125, 0.0793304443359375, -0.664215087890625, 0.24596214294433594, 0.038177490234375, 0.1183624267578125, -0.3161182403564453, 0.196258544921875, -0.22438812255859375, 0.438262939453125, 0.010076522827148438, -0.7547760009765625, 0.5034885406494141, 0.8404998779296875, 0.17798614501953125, -0.153106689453125, -0.35796165466308594, 0.20049285888671875, -0.47142791748046875, 0.103271484375, -0.3344383239746094, 0.38599395751953125, 0.021814346313476562, 0.05930328369140625, 0.1901702880859375, -0.20282363891601562, 0.16538619995117188, -0.2918357849121094, 0.005035400390625, -0.016618728637695312, -0.22534942626953125, -0.16016006469726562, -0.016651153564453125, 0.07508468627929688, -0.09097480773925781, 0.10843467712402344, 0.12420654296875, -0.01905059814453125, -0.036861419677734375, -0.0876617431640625, -0.5632896423339844], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000021.npy"}
{"epoch": 0.031746031746031744, "step": 22, "batch_size": 64, "mean": 0.06918168067932129, "std": 0.28427553176879883, "min": -0.544281005859375, "p10": -0.2864837646484375, "median": 0.04256439208984375, "p90": 0.4443618774414064, "max": 0.84185791015625, "pos_frac": 0.59375, "sample": [-0.348907470703125, -0.46179962158203125, -0.12995147705078125, 0.058345794677734375, 0.1714153289794922, 0.23427581787109375, 0.137542724609375, 0.08294677734375, -0.12541580200195312, 0.3933525085449219, 0.5055999755859375, 0.11563873291015625, -0.04144859313964844, -0.016767501831054688, 0.050945281982421875, 0.0287933349609375, -0.2969322204589844, 0.06408309936523438, 0.02028656005859375, -0.12487030029296875, 0.49494171142578125, 0.1746826171875, 0.35631561279296875, 0.40584564208984375, -0.021963119506835938, -0.544281005859375, 0.15097427368164062, 0.12138748168945312, -0.2069377899169922, 0.00689697265625, 0.1300506591796875, -0.485504150390625, -0.18578338623046875, 0.35997772216796875, 0.22258949279785156, 0.012054443359375, -0.023275375366210938, 0.46086883544921875, -0.16031646728515625, -0.2601165771484375, 0.34790992736816406, -0.4461669921875, 0.49416351318359375, 0.3749542236328125, -0.2621040344238281, 0.66632080078125, -0.13182449340820312, -0.3122100830078125, -0.04361724853515625, -0.0317840576171875, 0.034183502197265625, 0.2216796875, 0.5333404541015625, 0.364013671875, 0.84185791015625, -0.042751312255859375, 0.142578125, 0.029794692993164062, 0.34619140625, -0.1104278564453125, 0.1407623291015625, -0.12503814697265625, 0.17380714416503906, -0.103546142578125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000022.npy"}
{"epoch": 0.03325774754346183, "step": 23, "batch_size": 64, "mean": -0.034706562757492065, "std": 0.35763198137283325, "min": -0.9917755126953125, "p10": -0.5167251586914062, "median": 0.031360626220703125, "p90": 0.3593727111816406, "max": 0.817718505859375, "pos_frac": 0.53125, "sample": [-0.6202545166015625, -0.776641845703125, 0.43634033203125, 0.3133544921875, -0.15824317932128906, 0.09838104248046875, -0.2221832275390625, -0.93609619140625, -0.17073822021484375, 0.180328369140625, -0.10697174072265625, -0.23932838439941406, 0.0317230224609375, -0.14046669006347656, 0.07013320922851562, 0.359100341796875, -0.5341033935546875, 0.3237800598144531, -0.2785835266113281, 0.098358154296875, 0.40459442138671875, 0.06836509704589844, -0.041027069091796875, -0.32501983642578125, 0.09117507934570312, 0.4186553955078125, 0.03099822998046875, -0.24836349487304688, -0.9917755126953125, -0.42081451416015625, -0.501312255859375, 0.32869720458984375, -0.04621124267578125, 0.2134723663330078, 0.6071128845214844, 0.06507110595703125, -0.6451873779296875, -0.360260009765625, 0.055629730224609375, -0.41107749938964844, 0.086639404296875, -0.15283203125, 0.24375534057617188, 0.12056732177734375, 0.22607421875, 0.817718505859375, -0.31574249267578125, 0.3247337341308594, 0.06650543212890625, -0.5233306884765625, -0.073455810546875, -0.0548858642578125, 0.01507568359375, -0.29050636291503906, -0.2882194519042969, 0.2779502868652344, 0.4605255126953125, 0.17800521850585938, 0.1986236572265625, 0.2509765625, -0.04852294921875, -0.19533920288085938, 0.35948944091796875, 0.07436370849609375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000023.npy"}
{"epoch": 0.03476946334089191, "step": 24, "batch_size": 64, "mean": 0.013681381940841675, "std": 0.22192321717739105, "min": -0.642791748046875, "p10": -0.2747182846069336, "median": 0.02778911590576172, "p90": 0.30816078186035156, "max": 0.4399261474609375, "pos_frac": 0.546875, "sample": [-0.11119842529296875, -0.2208251953125, -0.0323486328125, 0.177978515625, 0.0688629150390625, 0.23514556884765625, 0.4399261474609375, 0.19757080078125, 0.10791778564453125, 0.34295654296875, -0.1439037322998047, 0.03966331481933594, -0.2040557861328125, -0.4953460693359375, 0.11444854736328125, -0.151397705078125, -0.031017303466796875, 0.22216415405273438, -0.20470428466796875, 0.1275787353515625, 0.02108001708984375, 0.15717697143554688, 0.20633697509765625, -0.14400482177734375, 0.31360626220703125, -0.026088714599609375, 0.16925048828125, -0.16043853759765625, -0.642791748046875, -0.373687744140625, -0.12292289733886719, -0.08023452758789062, 0.07468414306640625, -0.28362464904785156, 0.3626708984375, 0.3558197021484375, -0.047557830810546875, -0.10956001281738281, -0.05440521240234375, 0.03656768798828125, 0.02286529541015625, 0.38719940185546875, 0.16799354553222656, 0.1051177978515625, -0.294464111328125, 0.3013916015625, -0.0990447998046875, -0.3418426513671875, 0.09312057495117188, -0.253936767578125, -0.13431930541992188, 0.036529541015625, 0.15354156494140625, 0.0082855224609375, -0.30902099609375, 0.13910293579101562, 0.03271293640136719, -0.1343994140625, -0.04854583740234375, -0.08727264404296875, 0.203857421875, 0.29599761962890625, 0.188385009765625, 0.3110618591308594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000024.npy"}
{"epoch": 0.036281179138321996, "step": 25, "batch_size": 64, "mean": 0.07940652966499329, "std": 0.3727625608444214, "min": -0.6801910400390625, "p10": -0.2590000152587891, "median": 0.00030422210693359375, "p90": 0.5170112609863283, "max": 1.3680419921875, "pos_frac": 0.5, "sample": [0.033599853515625, 0.00977325439453125, -0.11278343200683594, 0.1005096435546875, -0.1680583953857422, -0.3506641387939453, 1.0738449096679688, 0.0067119598388671875, -0.292938232421875, 0.24707794189453125, 0.20832061767578125, -0.02222442626953125, -0.6801910400390625, -0.5242156982421875, -0.03582763671875, 0.4044456481933594, -0.1105194091796875, 0.2670440673828125, -0.126495361328125, 0.45831298828125, 0.00576019287109375, 0.010515213012695312, 0.01987457275390625, 1.3680419921875, 0.3743896484375, -0.0051517486572265625, -0.017589569091796875, 0.838409423828125, -0.2584953308105469, -0.059131622314453125, -0.059112548828125, 0.0073184967041015625, 0.0734710693359375, 0.3620147705078125, 0.5421676635742188, 0.02358245849609375, 0.29681396484375, -0.0791168212890625, 0.1493377685546875, -0.405975341796875, -0.230743408203125, -0.09584808349609375, 0.014127731323242188, -0.10430335998535156, -0.1475677490234375, -0.113006591796875, 0.08220291137695312, -0.016307830810546875, -0.06674957275390625, -0.14556884765625, 0.18194961547851562, 0.06052589416503906, 0.7770538330078125, -0.17688369750976562, -0.05069732666015625, -0.0073757171630859375, -0.04390716552734375, 0.2113494873046875, 0.9031219482421875, -0.45477294921875, -0.25921630859375, 0.9923095703125, -0.13214111328125, 0.3316192626953125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000025.npy"}
{"epoch": 0.03779289493575208, "step": 26, "batch_size": 64, "mean": -0.009073078632354736, "std": 0.3293210566043854, "min": -0.55279541015625, "p10": -0.42013549804687494, "median": -0.02959442138671875, "p90": 0.38081111907959, "max": 1.35906982421875, "pos_frac": 0.46875, "sample": [0.4129142761230469, 0.412689208984375, -0.4420166015625, -0.28797149658203125, 0.09289169311523438, 0.11174964904785156, -0.5082931518554688, -0.5160331726074219, -0.1148681640625, 0.461273193359375, -0.13487625122070312, 1.35906982421875, -0.127105712890625, -0.3848075866699219, -0.2103729248046875, -0.34275054931640625, -0.01212310791015625, 0.6898097991943359, -0.03755950927734375, 0.13201141357421875, -0.0230255126953125, -0.1274433135986328, 0.03839874267578125, -0.3168468475341797, 0.18981552124023438, -0.16092681884765625, 0.02285003662109375, 0.04282188415527344, 0.39354896545410156, -0.036163330078125, 0.016778945922851562, 0.6239738464355469, -0.134674072265625, 0.2992973327636719, 0.01674652099609375, -0.1634368896484375, 0.23816680908203125, 0.11921310424804688, -0.07083892822265625, -0.55279541015625, 0.05487632751464844, -0.49408721923828125, -0.11681747436523438, -0.271484375, 0.2837066650390625, -0.3467559814453125, -0.304290771484375, -0.08599662780761719, -0.48687744140625, -0.23751258850097656, 0.16120147705078125, 0.0464324951171875, 0.2779083251953125, 0.3083038330078125, -0.23430824279785156, 0.3510894775390625, 0.15874481201171875, 0.05876731872558594, 0.06610107421875, -0.23905181884765625, -0.4352760314941406, 0.1289806365966797, -0.06427192687988281, -0.129150390625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000026.npy"}
{"epoch": 0.039304610733182165, "step": 27, "batch_size": 64, "mean": 0.0052055418491363525, "std": 0.2946908175945282, "min": -0.6446456909179688, "p10": -0.3540567398071289, "median": -0.007321357727050781, "p90": 0.33377532958984385, "max": 0.8173370361328125, "pos_frac": 0.5, "sample": [-0.10418701171875, 0.27182769775390625, -0.242523193359375, -0.036106109619140625, 0.11991119384765625, 0.2792510986328125, -0.11266708374023438, -0.172698974609375, -0.24016571044921875, -0.2007904052734375, 0.055328369140625, -0.312225341796875, 0.11596488952636719, 0.12757110595703125, -0.2732982635498047, -0.12302017211914062, -0.04574775695800781, 0.10407257080078125, -0.10052108764648438, -0.18376541137695312, 0.34299468994140625, -0.1625823974609375, -0.4419403076171875, -0.382843017578125, -0.0407257080078125, 0.8173370361328125, -0.24442291259765625, 0.08456039428710938, 0.5747451782226562, 0.025621414184570312, 0.1977081298828125, -0.04334449768066406, 0.012502670288085938, -0.5396270751953125, 0.6798439025878906, -0.6446456909179688, 0.29703521728515625, 0.5639495849609375, 0.38817596435546875, -0.13942718505859375, 0.31226348876953125, -0.15287017822265625, 0.00901031494140625, 0.24937057495117188, -0.20955276489257812, 0.21828460693359375, -0.0514373779296875, 0.17016220092773438, -0.4338836669921875, 0.2735176086425781, 0.27452850341796875, 0.019317626953125, -0.023653030395507812, 0.1538543701171875, 0.045658111572265625, -0.3292217254638672, 0.51617431640625, -0.37885284423828125, -0.3647003173828125, 0.1893634796142578, -0.29558563232421875, -0.317138671875, 0.11458969116210938, 0.0728302001953125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000027.npy"}
{"epoch": 0.04081632653061224, "step": 28, "batch_size": 64, "mean": 0.011406183242797852, "std": 0.34519389271736145, "min": -0.7028045654296875, "p10": -0.38037834167480467, "median": 0.01878643035888672, "p90": 0.4475605010986329, "max": 0.90533447265625, "pos_frac": 0.53125, "sample": [0.1005706787109375, 0.525299072265625, 0.212249755859375, 0.06914520263671875, -0.50665283203125, -0.15278244018554688, -0.52301025390625, 0.031597137451171875, 0.3955230712890625, 0.876953125, -0.6175460815429688, -0.23897552490234375, -0.23738861083984375, 0.200164794921875, -0.060821533203125, -0.27960205078125, -0.27987098693847656, -0.3797187805175781, 0.054500579833984375, 0.21093368530273438, 0.2134552001953125, 0.0790557861328125, 0.0137176513671875, -0.25896453857421875, 0.3372039794921875, 0.03169441223144531, -0.13573074340820312, 0.23187255859375, -0.5944061279296875, -0.08460235595703125, -0.254791259765625, -0.4261360168457031, 0.490234375, 0.45253753662109375, 0.6396408081054688, -0.3806610107421875, 0.050777435302734375, 0.02892303466796875, -0.3076133728027344, -0.7028045654296875, 0.010351181030273438, -0.011810302734375, -0.34090423583984375, 0.4359474182128906, 0.06887435913085938, 0.023855209350585938, 0.1780529022216797, -0.2377777099609375, -0.1805896759033203, -0.24904251098632812, -0.29962158203125, 0.38443756103515625, 0.4302024841308594, 0.26389312744140625, 0.90533447265625, 0.1936187744140625, -0.07839775085449219, -0.37776947021484375, -0.0114288330078125, 0.1391773223876953, 0.5191497802734375, 0.3452796936035156, -0.049739837646484375, -0.15506744384765625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000028.npy"}
{"epoch": 0.042328042328042326, "step": 29, "batch_size": 64, "mean": 0.010018914937973022, "std": 0.3390301764011383, "min": -0.9913101196289062, "p10": -0.44877166748046876, "median": 0.040767669677734375, "p90": 0.39383850097656276, "max": 0.7055892944335938, "pos_frac": 0.53125, "sample": [0.224609375, -0.4439544677734375, 0.22290420532226562, -0.642425537109375, 0.3213043212890625, -0.10116386413574219, 0.46987152099609375, 0.011188507080078125, -0.576141357421875, -0.20611572265625, 0.18357276916503906, -0.3105506896972656, -0.020538330078125, -0.08545303344726562, 0.0674591064453125, 0.6932830810546875, -0.14166259765625, -0.09638214111328125, -0.39178466796875, 0.32720947265625, 0.2305469512939453, 0.701690673828125, 0.10254287719726562, -0.3098888397216797, 0.2708892822265625, 0.07630157470703125, -0.10985946655273438, -0.5046348571777344, -0.479034423828125, 0.09918785095214844, 0.2778053283691406, -0.0834197998046875, -0.1651153564453125, 0.24084091186523438, 0.57440185546875, 0.186309814453125, -0.07198333740234375, 0.20024490356445312, 0.15731048583984375, 0.422393798828125, -0.2890777587890625, -0.19343185424804688, -0.29036712646484375, 0.30010986328125, -0.08136558532714844, 0.2346649169921875, 0.01407623291015625, -0.058685302734375, 0.10227584838867188, 0.12670135498046875, 0.2816009521484375, 0.24989700317382812, 0.4342193603515625, 0.09251213073730469, -0.450836181640625, 0.15191268920898438, -0.9913101196289062, -0.31348419189453125, -0.0106048583984375, -0.011077880859375, -0.5315093994140625, -0.3808135986328125, 0.22845458984375, 0.7055892944335938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000029.npy"}
{"epoch": 0.04383975812547241, "step": 30, "batch_size": 64, "mean": 0.004153341054916382, "std": 0.26802217960357666, "min": -0.6184768676757812, "p10": -0.26112995147705076, "median": -0.026697158813476562, "p90": 0.3386745452880861, "max": 0.95550537109375, "pos_frac": 0.4375, "sample": [-0.2499828338623047, 0.11572265625, 0.22652435302734375, -0.6184768676757812, -0.097869873046875, -0.02117919921875, 0.2008953094482422, 0.5027618408203125, -0.18706512451171875, -0.0982666015625, -0.06747055053710938, 0.36600494384765625, 0.20694732666015625, -0.21318817138671875, -0.26590728759765625, -0.31610107421875, -0.018625259399414062, 0.6525650024414062, -0.16915130615234375, -0.11716461181640625, 0.2434234619140625, -0.1021575927734375, -0.16162681579589844, 0.15760231018066406, -0.05847930908203125, -0.02386474609375, -0.19523239135742188, -0.09104537963867188, 0.084136962890625, 0.01593780517578125, 0.6817626953125, 0.14196395874023438, -0.1038970947265625, 0.29976654052734375, 0.95550537109375, 0.09917449951171875, -0.20264434814453125, -0.08719635009765625, 0.0192413330078125, -0.4281654357910156, 0.00604248046875, -0.29882049560546875, 0.3536376953125, -0.1707305908203125, -0.24448776245117188, 0.106903076171875, -0.17313385009765625, 0.05023956298828125, 0.003631591796875, -0.32586669921875, -0.301422119140625, -0.0042572021484375, 0.14186859130859375, -0.22525405883789062, 0.3037605285644531, 0.046878814697265625, -0.172607421875, -0.128631591796875, 0.0311279296875, 0.10336875915527344, -0.029529571533203125, -0.0955047607421875, -0.17217254638671875, 0.38559532165527344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000030.npy"}
{"epoch": 0.045351473922902494, "step": 31, "batch_size": 64, "mean": 0.050546497106552124, "std": 0.3903513550758362, "min": -1.170684814453125, "p10": -0.380601692199707, "median": 0.032196998596191406, "p90": 0.44596862792968756, "max": 1.42388916015625, "pos_frac": 0.546875, "sample": [0.28534698486328125, 0.02524566650390625, -0.5989227294921875, 0.4574623107910156, -0.12528228759765625, 0.10418701171875, 0.13093185424804688, 0.4026451110839844, 0.449127197265625, 0.1404571533203125, 0.099395751953125, 0.4385986328125, 0.38991546630859375, 0.0947265625, 0.5684585571289062, -0.001007080078125, 0.1046905517578125, 0.13885498046875, 0.8261795043945312, -0.4463348388671875, 0.9354248046875, 0.283294677734375, -0.08218193054199219, -0.5997467041015625, 0.20175933837890625, 0.249267578125, 0.28271484375, -0.5310287475585938, -0.03215789794921875, 0.2267913818359375, 0.233245849609375, -0.3213653564453125, -0.24615478515625, -0.38526344299316406, -0.06740188598632812, -0.10001754760742188, -0.085205078125, -0.15141677856445312, 0.007305145263671875, -0.0751495361328125, 0.11629486083984375, -0.5039901733398438, -0.29799652099609375, 0.5653839111328125, 1.42388916015625, 0.13024139404296875, -0.00543975830078125, -0.3697242736816406, 0.15325164794921875, -0.170013427734375, -1.170684814453125, -0.1275482177734375, 0.051605224609375, -0.05208015441894531, -0.15728759765625, -0.34169769287109375, 0.4174022674560547, 0.370941162109375, -0.11355972290039062, -0.09442901611328125, 0.37371826171875, 0.028367996215820312, 0.0360260009765625, -0.2550849914550781], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000031.npy"}
{"epoch": 0.04686318972033258, "step": 32, "batch_size": 64, "mean": 0.031289905309677124, "std": 0.28950613737106323, "min": -0.7948989868164062, "p10": -0.32663116455078123, "median": 0.0168304443359375, "p90": 0.3396747589111329, "max": 0.8708038330078125, "pos_frac": 0.515625, "sample": [-0.08901214599609375, -0.12126922607421875, -0.7948989868164062, -0.3556365966796875, 0.061004638671875, 0.07938385009765625, 0.2694091796875, -0.4450531005859375, 0.8708038330078125, -0.06261444091796875, 0.3487052917480469, 0.6208724975585938, 0.045291900634765625, -0.06990432739257812, 0.2826499938964844, -0.2577056884765625, -0.3215789794921875, 0.2924232482910156, -0.4035301208496094, -0.062328338623046875, 0.12206459045410156, -0.08545303344726562, 0.2318248748779297, 0.368377685546875, 0.5268440246582031, -0.1603870391845703, 0.2635078430175781, 0.2648773193359375, 0.1444549560546875, 0.5660629272460938, 0.036914825439453125, 0.04630279541015625, -0.2909393310546875, 0.02527618408203125, -0.046661376953125, 0.1150970458984375, 0.04209136962890625, -0.4236793518066406, 0.7392120361328125, 0.318603515625, -0.001995086669921875, -0.07853317260742188, -0.08297348022460938, -0.0723419189453125, -0.058528900146484375, -0.20294952392578125, 0.03035736083984375, -0.3044624328613281, -0.0060882568359375, 0.07872581481933594, 0.09881591796875, 0.23915481567382812, -0.1624774932861328, -0.010538101196289062, 0.00838470458984375, -0.3431987762451172, -0.32879638671875, 0.24603271484375, -0.10561752319335938, -0.018520355224609375, 0.20079421997070312, 0.16005706787109375, -0.123382568359375, 0.14923095703125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000032.npy"}
{"epoch": 0.04837490551776266, "step": 33, "batch_size": 64, "mean": 0.08293843269348145, "std": 0.38587385416030884, "min": -1.3079605102539062, "p10": -0.339990234375, "median": 0.05261945724487305, "p90": 0.45156402587890637, "max": 1.351531982421875, "pos_frac": 0.625, "sample": [-0.44744873046875, -0.03668212890625, 0.42201995849609375, 0.01453399658203125, 0.021511077880859375, 0.1385040283203125, 0.39434051513671875, 0.30776405334472656, 0.46422576904296875, 0.7425003051757812, -0.4579925537109375, -0.14835166931152344, -0.1300497055053711, -0.17517471313476562, 1.0506515502929688, -0.5390357971191406, -0.123382568359375, 0.24442291259765625, 0.047809600830078125, 0.2119140625, 0.5982589721679688, 1.351531982421875, 0.582275390625, -0.015241622924804688, -1.3079605102539062, 0.23626708984375, -0.31624603271484375, 0.19124603271484375, 0.01983642578125, -0.234039306640625, 0.2245349884033203, 0.2844047546386719, 0.31180572509765625, -0.17563629150390625, -0.024778366088867188, -0.010656356811523438, 0.10377693176269531, 0.37073516845703125, 0.04375457763671875, -0.35016632080078125, 0.336273193359375, -0.24440765380859375, -0.14586257934570312, 0.5765304565429688, 0.11090087890625, -0.1667633056640625, 0.24011993408203125, 0.40038299560546875, 0.05276775360107422, -0.08964920043945312, 0.03368949890136719, -0.458038330078125, 0.1651611328125, -0.4785614013671875, 0.000759124755859375, 0.10540008544921875, -0.08359527587890625, 0.38311004638671875, -0.248291015625, 0.1236114501953125, 0.42182159423828125, 0.052471160888671875, 0.19194412231445312, 0.1425018310546875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000033.npy"}
{"epoch": 0.049886621315192746, "step": 34, "batch_size": 64, "mean": 0.10567304491996765, "std": 0.28298723697662354, "min": -0.5869255065917969, "p10": -0.2128589630126953, "median": 0.11377811431884766, "p90": 0.43116607666015627, "max": 0.7324485778808594, "pos_frac": 0.671875, "sample": [-0.19421958923339844, 0.03018951416015625, 0.26755523681640625, 0.014978408813476562, 0.1616668701171875, 0.598052978515625, -0.13853073120117188, 0.25731658935546875, 0.2164154052734375, 0.018280029296875, 0.7324485778808594, -0.1831512451171875, -0.29033660888671875, -0.083587646484375, -0.1766204833984375, -0.0311431884765625, 0.333587646484375, 0.41085052490234375, 0.09103202819824219, 0.18042373657226562, -0.16231536865234375, 0.10859107971191406, -0.10200119018554688, -0.2095184326171875, 0.3451271057128906, 0.00164031982421875, 0.4030914306640625, 0.5238456726074219, 0.58392333984375, -0.1341419219970703, 0.13386917114257812, 0.43170166015625, 0.1726207733154297, 0.1617431640625, 0.07875442504882812, 0.6720428466796875, 0.19053268432617188, 0.42925262451171875, 0.14414215087890625, 0.24883651733398438, 0.42380523681640625, 0.11896514892578125, 0.14608001708984375, 0.3088951110839844, 0.4299163818359375, 0.221649169921875, -0.5869255065917969, 0.05499267578125, 0.3591346740722656, 0.09428596496582031, 0.28690338134765625, 0.291259765625, 0.5183486938476562, -0.15544891357421875, -0.21429061889648438, -0.3155937194824219, -0.09755706787109375, -0.07673454284667969, -0.1862163543701172, 0.018222808837890625, -0.5022659301757812, 0.06295013427734375, -0.3547515869140625, -0.31949615478515625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000034.npy"}
{"epoch": 0.05139833711262283, "step": 35, "batch_size": 64, "mean": -0.038157522678375244, "std": 0.33253708481788635, "min": -0.63525390625, "p10": -0.41324195861816404, "median": -0.07739067077636719, "p90": 0.33716011047363287, "max": 0.9582977294921875, "pos_frac": 0.390625, "sample": [-0.3582000732421875, -0.4197044372558594, 0.3723125457763672, -0.1197662353515625, -0.07265853881835938, 0.345062255859375, -0.3447532653808594, 0.09920501708984375, -0.00299835205078125, -0.3810291290283203, -0.237579345703125, 0.1830577850341797, 0.2461223602294922, -0.2115497589111328, -0.26993560791015625, -0.082122802734375, 0.913116455078125, -0.48586273193359375, -0.1334686279296875, -0.2304229736328125, 0.23623275756835938, -0.1475067138671875, 0.2530193328857422, -0.18221664428710938, 0.07604598999023438, -0.4471168518066406, -0.613372802734375, -0.22083282470703125, -0.17519378662109375, -0.2740936279296875, -0.0535430908203125, 0.9582977294921875, 0.1764373779296875, -0.398162841796875, -0.25763702392578125, -0.63525390625, -0.023054122924804688, -0.4435882568359375, -0.024700164794921875, -0.3580818176269531, 0.30019378662109375, 0.39542198181152344, 0.5674285888671875, -0.39678955078125, -0.18065643310546875, 0.1039581298828125, 0.20453643798828125, -0.1969451904296875, -0.0092010498046875, 0.021327972412109375, -0.3889732360839844, 0.09253692626953125, -0.0538787841796875, 0.318023681640625, -0.27562713623046875, 0.10559463500976562, 0.594451904296875, -0.454925537109375, -0.11153030395507812, 0.2708740234375, 0.3187217712402344, 0.2635345458984375, -0.24248313903808594, 0.05782127380371094], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000035.npy"}
{"epoch": 0.05291005291005291, "step": 36, "batch_size": 64, "mean": -0.02209240198135376, "std": 0.3715944290161133, "min": -1.00518798828125, "p10": -0.4371673583984375, "median": -0.03365039825439453, "p90": 0.45314788818359397, "max": 0.9375, "pos_frac": 0.46875, "sample": [-0.27447509765625, 0.209564208984375, 0.1082916259765625, -0.015453338623046875, -0.44220733642578125, -0.4005546569824219, -0.065521240234375, 0.18741607666015625, 0.17514991760253906, -0.5559768676757812, -0.4060516357421875, 0.0809326171875, 0.57257080078125, 0.9375, 0.016994476318359375, -0.0696868896484375, -0.0046539306640625, -0.290283203125, 0.0612945556640625, 0.6346549987792969, -0.09610557556152344, 0.629852294921875, -0.29904937744140625, -0.09104347229003906, -0.6662483215332031, -0.603515625, -0.14727020263671875, 0.474456787109375, 0.013355255126953125, -1.00518798828125, -0.1650390625, 0.05611419677734375, -0.091278076171875, -0.1226959228515625, -0.6428070068359375, 0.15351486206054688, 0.00453948974609375, -0.188018798828125, -0.3235149383544922, 0.2891273498535156, 0.16334915161132812, 0.9105377197265625, 0.8149795532226562, -0.3719615936279297, 0.0040493011474609375, 0.21398544311523438, 0.2044525146484375, 0.08440399169921875, 0.198577880859375, 0.3260383605957031, 0.07308197021484375, -0.20982742309570312, -0.05184745788574219, -0.053314208984375, -0.42540740966796875, -0.5332260131835938, -0.16251373291015625, -0.21239089965820312, -0.190887451171875, 0.16127395629882812, -0.2906818389892578, 0.4034271240234375, -0.19311904907226562, 0.08441543579101562], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000036.npy"}
{"epoch": 0.05442176870748299, "step": 37, "batch_size": 64, "mean": 0.08684796094894409, "std": 0.3997003734111786, "min": -1.2071380615234375, "p10": -0.3051479339599609, "median": 0.03183746337890625, "p90": 0.6341789245605469, "max": 1.3626708984375, "pos_frac": 0.578125, "sample": [-0.2133636474609375, 0.4345703125, 0.3999481201171875, -0.6435165405273438, 0.393585205078125, 0.08182525634765625, -0.0014705657958984375, 0.023614883422851562, 0.0296173095703125, 0.1446533203125, 0.286529541015625, -0.21571731567382812, 0.6817245483398438, 0.1072540283203125, -0.2549018859863281, -0.05950927734375, 0.11760711669921875, 0.557403564453125, 0.46715545654296875, -0.0084075927734375, -0.09273719787597656, -0.023237228393554688, -1.2071380615234375, -0.31549835205078125, -0.3124847412109375, -0.06520843505859375, -0.42726898193359375, -0.1323223114013672, 0.27182769775390625, 0.09035491943359375, -0.2753143310546875, -0.2703094482421875, 0.6933975219726562, 0.17725372314453125, 1.3626708984375, 0.6397476196289062, 0.621185302734375, -0.2880287170410156, -0.0651397705078125, 0.03557586669921875, 0.654754638671875, -0.09807586669921875, -0.09723281860351562, 0.33142852783203125, -0.696258544921875, -0.008056640625, -0.042083740234375, -0.216094970703125, 0.2779998779296875, 0.3068695068359375, 0.2901020050048828, 0.015193939208984375, 0.0002384185791015625, -0.1624431610107422, 0.1659393310546875, 0.44350433349609375, 0.2566070556640625, 0.28397178649902344, 0.803680419921875, 0.12409210205078125, 0.6645736694335938, -0.54644775390625, 0.026021957397460938, 0.0340576171875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000037.npy"}
{"epoch": 0.055933484504913075, "step": 38, "batch_size": 64, "mean": -0.03814077377319336, "std": 0.46023693680763245, "min": -1.25469970703125, "p10": -0.5615966796874999, "median": -0.024938583374023438, "p90": 0.4062835693359377, "max": 1.324432373046875, "pos_frac": 0.453125, "sample": [-0.15534019470214844, 0.4901618957519531, -0.2845916748046875, 1.1839065551757812, -0.299041748046875, 0.4241752624511719, 0.11449813842773438, 0.1255645751953125, -0.7681655883789062, 0.343048095703125, -0.0695037841796875, 0.20015335083007812, 0.20930862426757812, -0.12723350524902344, -0.06806564331054688, -0.18781280517578125, -0.42192840576171875, -0.013711929321289062, 0.0923919677734375, 0.44330596923828125, -0.47045135498046875, -0.15618896484375, -1.25469970703125, -0.11811065673828125, -0.27550506591796875, 0.1479034423828125, 0.0740509033203125, -0.7809600830078125, -0.7275772094726562, 0.1680145263671875, -0.8168048858642578, -0.4825439453125, -0.2083740234375, -0.009777069091796875, 0.8440361022949219, 0.3645362854003906, -0.1992950439453125, -0.57342529296875, 0.12741851806640625, 0.2640876770019531, -0.10485458374023438, -0.878387451171875, 0.19884872436523438, 0.3247261047363281, 0.2659873962402344, -0.339569091796875, -0.033935546875, -0.47502899169921875, 0.18059539794921875, 0.9864501953125, -0.53399658203125, 0.11568450927734375, 1.324432373046875, -0.42372894287109375, -0.29143524169921875, 0.2867088317871094, -0.016941070556640625, 0.11177825927734375, -0.383026123046875, -0.03293609619140625, 0.19813919067382812, 0.108428955078125, 0.06633377075195312, -0.24273681640625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000038.npy"}
{"epoch": 0.05744520030234316, "step": 39, "batch_size": 64, "mean": 0.023186206817626953, "std": 0.32939985394477844, "min": -0.6946487426757812, "p10": -0.4473979949951172, "median": 0.057059288024902344, "p90": 0.41521072387695324, "max": 0.5845947265625, "pos_frac": 0.625, "sample": [0.42694091796875, -0.1050262451171875, -0.28276824951171875, 0.05099678039550781, -0.3549041748046875, -0.07464599609375, 0.0771484375, 0.34834861755371094, 0.294830322265625, 0.25548362731933594, -0.4062042236328125, -0.4033355712890625, 0.4524974822998047, 0.121337890625, 0.15435028076171875, 0.1584911346435547, -0.1397838592529297, 0.5610790252685547, 0.2747325897216797, -0.5157318115234375, -0.6946487426757812, -0.436126708984375, -0.033725738525390625, 0.32636260986328125, 0.2337646484375, 0.04652976989746094, -0.635467529296875, 0.05040740966796875, 0.26641845703125, -0.4522285461425781, 0.12638092041015625, 0.3184967041015625, 0.015625, 0.2420501708984375, 0.15636825561523438, 0.5687484741210938, -0.09139251708984375, 0.38784027099609375, 0.009654998779296875, -0.5961837768554688, 0.5845947265625, -0.22624969482421875, 0.022550582885742188, 0.3768157958984375, -0.31988525390625, -0.2735748291015625, -0.6886138916015625, 0.204315185546875, -0.09744644165039062, 0.17865753173828125, 0.02730560302734375, -0.615447998046875, 0.3133811950683594, 0.06312179565429688, 0.11759185791015625, 0.2149658203125, 0.21972084045410156, 0.5357208251953125, -0.13115692138671875, -0.2609901428222656, 0.4642066955566406, 0.11182022094726562, -0.04513359069824219, 0.00493621826171875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000039.npy"}
{"epoch": 0.05895691609977324, "step": 40, "batch_size": 64, "mean": 0.011313170194625854, "std": 0.431333988904953, "min": -1.938323974609375, "p10": -0.4610448837280273, "median": 0.027543067932128906, "p90": 0.4543273925781252, "max": 0.8914337158203125, "pos_frac": 0.5625, "sample": [-0.09923553466796875, 0.7285385131835938, 0.5056610107421875, 0.0070438385009765625, 0.027507781982421875, 0.19394493103027344, 0.2751922607421875, -0.043949127197265625, -0.13165283203125, 0.23952102661132812, -0.00217437744140625, 0.2916755676269531, -0.21794891357421875, 0.027578353881835938, 0.2135601043701172, 0.05174827575683594, 0.8914337158203125, 0.7180252075195312, 0.3679065704345703, -0.04753875732421875, 0.037952423095703125, 0.763092041015625, 0.3997039794921875, 0.22492218017578125, 0.12378311157226562, 0.4777374267578125, 0.10550689697265625, -0.15776443481445312, 0.39768218994140625, -0.13620567321777344, 0.3171062469482422, -0.8053131103515625, -0.60247802734375, 0.0106658935546875, 0.27449607849121094, 0.2104778289794922, -0.4180011749267578, 0.2793464660644531, -0.7270355224609375, -0.020172119140625, -0.3330707550048828, -1.938323974609375, 0.3512134552001953, 0.4796295166015625, -0.541839599609375, 0.3849601745605469, 0.015359878540039062, -0.21550369262695312, -0.3579673767089844, -0.0846099853515625, 0.07594680786132812, 0.0931396484375, -0.19812393188476562, -0.07740020751953125, -0.1875457763671875, -0.6907577514648438, -0.4794921875, -0.17778778076171875, -0.363311767578125, -0.293060302734375, 0.20039749145507812, -0.031167984008789062, 0.3100433349609375, 0.030975341796875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000040.npy"}
{"epoch": 0.06046863189720333, "step": 41, "batch_size": 64, "mean": 0.02519175410270691, "std": 0.36735108494758606, "min": -0.8316268920898438, "p10": -0.3734272003173828, "median": -0.03300666809082031, "p90": 0.5439056396484375, "max": 1.001739501953125, "pos_frac": 0.484375, "sample": [0.414306640625, -0.0599517822265625, -0.121337890625, -0.3526573181152344, -0.3265228271484375, 0.6605072021484375, -0.37958526611328125, 0.3840751647949219, -0.23279953002929688, -0.5508460998535156, 0.345916748046875, 0.3312721252441406, -0.1699066162109375, 0.574066162109375, 0.197357177734375, 0.34844970703125, -0.006061553955078125, 0.5467071533203125, -0.17267608642578125, -0.07611083984375, 0.22928619384765625, 0.5373687744140625, -0.2012348175048828, 0.15926361083984375, 0.07969474792480469, -0.29090118408203125, -0.1047515869140625, 0.35549354553222656, 0.0742340087890625, -0.2030487060546875, -0.09270477294921875, 0.1987457275390625, 1.001739501953125, 0.35117340087890625, 0.05236053466796875, -0.49430084228515625, 0.044342041015625, -0.31790924072265625, 0.42523956298828125, -0.2208843231201172, 0.007015228271484375, -0.8316268920898438, -0.21498489379882812, -0.1585216522216797, 0.7312545776367188, -0.1875629425048828, -0.101837158203125, -0.061305999755859375, 0.66204833984375, 0.194549560546875, -0.5774917602539062, -0.368804931640625, -0.3754081726074219, 0.20745086669921875, 0.21202468872070312, -0.746185302734375, 0.17195892333984375, -0.31293487548828125, 0.19968032836914062, -0.27271270751953125, 0.09464263916015625, 0.5569686889648438, -0.08206367492675781, -0.0712890625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000041.npy"}
{"epoch": 0.06198034769463341, "step": 42, "batch_size": 64, "mean": 0.0798250287771225, "std": 0.4121474027633667, "min": -0.8512039184570312, "p10": -0.4111968994140625, "median": 0.019516944885253906, "p90": 0.6484256744384767, "max": 0.9875411987304688, "pos_frac": 0.515625, "sample": [0.5900421142578125, 0.77386474609375, -0.384063720703125, -0.35135650634765625, -0.04358482360839844, -0.4126434326171875, 0.0047397613525390625, -0.09321212768554688, -0.2086181640625, 0.4024505615234375, -0.21187591552734375, 0.1356048583984375, 0.08168411254882812, -0.26128387451171875, -0.08432769775390625, 0.4912834167480469, 0.8504791259765625, 0.3690643310546875, -0.0549468994140625, 0.23950576782226562, -0.18294906616210938, 0.9875411987304688, 0.48284912109375, -0.34500885009765625, -0.3752326965332031, -0.14186859130859375, -0.039653778076171875, 0.08074951171875, -0.06743621826171875, -0.237457275390625, 0.06409072875976562, 0.6722946166992188, 0.06616592407226562, -0.409210205078125, 0.5922050476074219, 0.03429412841796875, 0.10578155517578125, 0.217742919921875, -0.8512039184570312, 0.3177642822265625, -0.11268424987792969, -0.25287628173828125, -0.41204833984375, -0.45438385009765625, -0.4876556396484375, -0.02771759033203125, 0.6106796264648438, 0.6328125, -0.09298324584960938, -0.74462890625, 0.6551170349121094, 0.42496490478515625, -0.089263916015625, 0.12743759155273438, 0.46623802185058594, -0.10491943359375, -0.072967529296875, 0.18037128448486328, 0.9570083618164062, 0.7080535888671875, 0.1884479522705078, 0.4183464050292969, 0.30281829833984375, -0.5156288146972656], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000042.npy"}
{"epoch": 0.06349206349206349, "step": 43, "batch_size": 64, "mean": 0.023893296718597412, "std": 0.42230355739593506, "min": -1.1259613037109375, "p10": -0.5588703155517578, "median": 0.06513214111328125, "p90": 0.4812835693359376, "max": 1.3144378662109375, "pos_frac": 0.609375, "sample": [-0.0656280517578125, -0.144683837890625, -0.602325439453125, 0.13573265075683594, 0.0153045654296875, -0.3002586364746094, -0.2965278625488281, 0.08756065368652344, -0.16766357421875, -0.2982902526855469, 0.23725318908691406, 0.32398223876953125, -0.13623046875, -0.5655136108398438, 0.468017578125, 0.154022216796875, 0.2547721862792969, 0.22445297241210938, 0.26279449462890625, 0.0350189208984375, 0.3605003356933594, -0.11797332763671875, -0.6867218017578125, 0.011281967163085938, 0.4294624328613281, -0.40694618225097656, 0.20119285583496094, 0.0939483642578125, 0.16976165771484375, 0.551177978515625, 0.603057861328125, -0.5944976806640625, 0.0452117919921875, 0.1495513916015625, 0.10887336730957031, 0.08172607421875, 0.45650482177734375, -0.1711711883544922, 0.581024169921875, 0.486968994140625, 0.00823211669921875, -0.1682281494140625, 0.22481536865234375, -1.1259613037109375, 0.29816436767578125, 1.3144378662109375, 0.0485382080078125, 0.4990692138671875, -0.3581409454345703, 0.34758949279785156, 0.20732498168945312, 1.0596771240234375, 0.03469085693359375, -0.42162322998046875, 0.138946533203125, -0.0748138427734375, -0.5433692932128906, 0.16014862060546875, -0.08758544921875, -0.1077880859375, -0.935089111328125, -0.40483856201171875, 0.12263870239257812, -0.6823883056640625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000043.npy"}
{"epoch": 0.06500377928949358, "step": 44, "batch_size": 64, "mean": 0.008136510848999023, "std": 0.4705609083175659, "min": -1.1251373291015625, "p10": -0.5976905822753906, "median": 0.05173492431640625, "p90": 0.5559530258178712, "max": 1.4243621826171875, "pos_frac": 0.53125, "sample": [-0.5844192504882812, -0.20783615112304688, -0.34723663330078125, 0.29635047912597656, 0.7394256591796875, 0.76153564453125, 0.0659027099609375, -0.39058685302734375, -0.6964759826660156, -0.44034576416015625, 0.15702438354492188, 0.8184432983398438, -1.1251373291015625, -0.1387939453125, -0.1575927734375, -0.24658203125, 0.22316741943359375, 0.100128173828125, 0.23986434936523438, 0.06102561950683594, 0.1150665283203125, 0.6026344299316406, 0.2168445587158203, -0.6616363525390625, -0.3680095672607422, 0.1541900634765625, -0.2854461669921875, -0.6726455688476562, -0.6033782958984375, 0.05416107177734375, -0.213104248046875, -0.20465850830078125, 0.04930877685546875, -0.14057159423828125, 0.0055828094482421875, 0.3543262481689453, 1.4103164672851562, 0.22272491455078125, 0.18495941162109375, -0.493377685546875, -0.0837554931640625, -0.11959075927734375, 0.4761962890625, -0.043426513671875, -0.17009735107421875, 0.14737701416015625, 0.20957183837890625, -0.5189666748046875, 0.141326904296875, 0.07050132751464844, -0.6105232238769531, 0.4564361572265625, 0.06784820556640625, 0.1606597900390625, -0.3515663146972656, 0.36898040771484375, 0.5172805786132812, 1.4243621826171875, 0.5725269317626953, 0.2502593994140625, -0.16036224365234375, -0.08039093017578125, -0.23712158203125, -0.8219375610351562], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000044.npy"}
{"epoch": 0.06651549508692366, "step": 45, "batch_size": 64, "mean": 0.10993975400924683, "std": 0.3669873774051666, "min": -1.04852294921875, "p10": -0.28884315490722656, "median": 0.17217254638671875, "p90": 0.5208820343017578, "max": 0.7254638671875, "pos_frac": 0.671875, "sample": [-0.12049102783203125, -0.27260589599609375, 0.11293411254882812, 0.0206756591796875, -0.2232227325439453, -0.24401092529296875, -0.3453826904296875, 0.638885498046875, -0.7011871337890625, -0.92279052734375, 0.284393310546875, 0.5063858032226562, 0.21530532836914062, 0.0231781005859375, -0.06561660766601562, 0.057262420654296875, 0.18741226196289062, 0.1535797119140625, -0.0245819091796875, 0.2667198181152344, 0.413055419921875, -0.7305221557617188, 0.43976783752441406, -0.16464996337890625, 0.23128509521484375, 0.503631591796875, 0.7254638671875, 0.3776988983154297, 0.1793212890625, 0.47702789306640625, 0.39398956298828125, 0.2459716796875, -0.267181396484375, 0.204803466796875, 0.5240440368652344, 0.6176300048828125, 0.24361228942871094, 0.11232757568359375, -0.06824493408203125, 0.30542755126953125, -1.04852294921875, 0.421295166015625, -0.37163734436035156, -0.030757904052734375, 0.039398193359375, 0.16344451904296875, -0.290130615234375, 0.1618976593017578, 0.527313232421875, 0.5135040283203125, 0.2508087158203125, 0.3788280487060547, 0.34571075439453125, 0.1650238037109375, -0.0040645599365234375, 0.6046829223632812, 0.5456390380859375, -0.01456451416015625, 0.19882583618164062, -0.2858390808105469, 0.08635330200195312, 0.24066162109375, 0.35101318359375, -0.22404098510742188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000045.npy"}
{"epoch": 0.06802721088435375, "step": 46, "batch_size": 64, "mean": 0.12091103196144104, "std": 0.4480941891670227, "min": -0.8977508544921875, "p10": -0.40353393554687494, "median": 0.10343742370605469, "p90": 0.6621723175048829, "max": 1.1301040649414062, "pos_frac": 0.609375, "sample": [-0.3651885986328125, -0.215728759765625, -0.8977508544921875, 0.1071014404296875, 0.3688507080078125, -0.87646484375, -0.2786140441894531, 0.13487625122070312, -0.3103485107421875, -0.3341064453125, 0.6134605407714844, 0.23361587524414062, 1.083251953125, 1.1301040649414062, -0.4199676513671875, 0.2952156066894531, -0.7531890869140625, 0.03905487060546875, 0.49924468994140625, 0.45896339416503906, 0.3732414245605469, 0.9202880859375, -0.07571983337402344, 0.40245819091796875, 0.2764892578125, 0.27716064453125, 0.0423736572265625, 0.9450607299804688, 0.1619415283203125, 0.1030426025390625, -0.1427764892578125, 0.6249618530273438, 0.30648040771484375, -0.23470306396484375, -0.4266166687011719, 0.68756103515625, 0.25690650939941406, 0.09226417541503906, -0.2296009063720703, 0.4275054931640625, 0.3571929931640625, 0.0256805419921875, 0.3251495361328125, -0.07563018798828125, 0.6781196594238281, -0.12999343872070312, -0.24388885498046875, -0.22842788696289062, -0.057262420654296875, -0.013868331909179688, 0.545135498046875, -0.074005126953125, -0.15188980102539062, 0.006504058837890625, 0.5653076171875, 0.5550765991210938, -0.7313461303710938, -0.4853363037109375, -0.22219085693359375, 0.868316650390625, 0.3880767822265625, 0.09521675109863281, 0.10383224487304688, 0.33783721923828125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000046.npy"}
{"epoch": 0.06953892668178382, "step": 47, "batch_size": 64, "mean": 0.1963297724723816, "std": 0.40259048342704773, "min": -0.7403926849365234, "p10": -0.24058074951171873, "median": 0.22830963134765625, "p90": 0.7196449279785162, "max": 1.2476577758789062, "pos_frac": 0.671875, "sample": [0.4144630432128906, 0.14939117431640625, -0.11449050903320312, 0.208831787109375, -0.023529052734375, 0.10808944702148438, -0.18519020080566406, 1.2476577758789062, -0.1602802276611328, 0.5727310180664062, 0.14905929565429688, -0.22690963745117188, 0.22650909423828125, -0.492889404296875, -0.232879638671875, 0.3081550598144531, 0.3953399658203125, -0.37047576904296875, 1.1485748291015625, -0.06185150146484375, 0.17788314819335938, 0.2716064453125, -0.49039459228515625, 0.251678466796875, 0.05029487609863281, 0.2368621826171875, 0.13399696350097656, 0.42226219177246094, -0.09155654907226562, 0.3228130340576172, 0.014551162719726562, 0.38372802734375, -0.2438812255859375, 0.4865226745605469, -0.30928802490234375, 0.4624786376953125, 0.9366683959960938, 0.43795013427734375, 0.5545558929443359, 0.899078369140625, -0.063140869140625, 0.5195846557617188, -0.12468719482421875, 0.39636993408203125, 0.9531288146972656, 0.23144149780273438, 0.8052978515625, -0.08197784423828125, 0.33519935607910156, 0.27606964111328125, 0.09780120849609375, 0.34674835205078125, -0.6549835205078125, -0.03940582275390625, -0.7403926849365234, 0.42638397216796875, 0.11287498474121094, 0.7826080322265625, 0.4829444885253906, 0.30312347412109375, 0.3973121643066406, 0.23011016845703125, -0.225006103515625, -0.17041587829589844], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000047.npy"}
{"epoch": 0.0710506424792139, "step": 48, "batch_size": 64, "mean": 0.06665191054344177, "std": 0.48112455010414124, "min": -0.9963226318359375, "p10": -0.5225122451782227, "median": 0.06987380981445312, "p90": 0.6417640686035158, "max": 1.4147872924804688, "pos_frac": 0.5625, "sample": [0.4008293151855469, -0.2906036376953125, 0.2426891326904297, -0.24102020263671875, 0.2256011962890625, 1.0649604797363281, -0.9063568115234375, -0.303375244140625, -0.5307464599609375, 0.189605712890625, 0.32091522216796875, 0.036373138427734375, -0.3560066223144531, -0.7522125244140625, 0.6692047119140625, 0.8335113525390625, 0.4181632995605469, 0.11603736877441406, -0.9731903076171875, 0.6538467407226562, 0.075653076171875, 0.2940673828125, -0.119354248046875, 0.47948455810546875, 0.36980438232421875, 0.23274993896484375, -0.080230712890625, 0.21001815795898438, -0.41655731201171875, 0.5784988403320312, -0.06232452392578125, 0.3897552490234375, -0.11825370788574219, 0.3255157470703125, 0.3703765869140625, 0.06409454345703125, 0.029449462890625, 0.007171630859375, 0.9384536743164062, -0.022584915161132812, -0.3002033233642578, 0.4960784912109375, -0.3059539794921875, 0.18382644653320312, -0.9963226318359375, -0.014257431030273438, 0.1392974853515625, 0.34051513671875, -0.2183074951171875, -0.3530082702636719, 0.6135711669921875, -0.14870834350585938, -0.5267848968505859, 0.19001007080078125, -0.05573844909667969, 0.35950469970703125, 1.4147872924804688, -0.05944633483886719, -0.3757476806640625, -0.5446319580078125, 0.16571044921875, 0.9147491455078125, -0.5046882629394531, -0.512542724609375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000048.npy"}
{"epoch": 0.07256235827664399, "step": 49, "batch_size": 64, "mean": 0.07756048440933228, "std": 0.47563436627388, "min": -1.3580322265625, "p10": -0.34002151489257815, "median": 0.11290836334228516, "p90": 0.4589414596557618, "max": 2.0029144287109375, "pos_frac": 0.65625, "sample": [0.09137725830078125, -0.18664169311523438, 0.11716651916503906, 0.450469970703125, 0.1338348388671875, 0.08340263366699219, 0.20541954040527344, 0.4625720977783203, -0.3140144348144531, -0.27021026611328125, -0.0832977294921875, 0.014514923095703125, 0.10865020751953125, 0.28411102294921875, -0.08511734008789062, -0.33660888671875, -0.5653896331787109, 0.09625244140625, 0.09859466552734375, -1.185760498046875, 0.15190887451171875, 0.10789871215820312, 0.04592704772949219, 0.37837982177734375, 0.12858963012695312, 0.5048446655273438, 0.04979705810546875, 0.07243156433105469, 0.36212730407714844, -0.41619873046875, 0.3492908477783203, 0.226715087890625, 0.24533843994140625, 0.3490886688232422, 0.3572254180908203, 0.6396331787109375, -0.21872711181640625, 0.383636474609375, 0.7027435302734375, 0.7410202026367188, 0.364990234375, -0.1733245849609375, -0.008253097534179688, 0.1272125244140625, -0.34148406982421875, 0.14048004150390625, 0.31797027587890625, 0.2933177947998047, 0.12818527221679688, -0.017360687255859375, 0.6535415649414062, 0.4145965576171875, -1.3580322265625, 0.26572418212890625, -0.27338409423828125, -1.009735107421875, -0.057483673095703125, 2.0029144287109375, 0.33417510986328125, 0.1375141143798828, -0.26214599609375, -0.00572967529296875, -0.0160980224609375, -0.9747161865234375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000049.npy"}
{"epoch": 0.07407407407407407, "step": 50, "batch_size": 64, "mean": 0.10448768734931946, "std": 0.4159475564956665, "min": -1.1039505004882812, "p10": -0.37369537353515625, "median": 0.14220809936523438, "p90": 0.5939674377441407, "max": 1.101837158203125, "pos_frac": 0.625, "sample": [0.6097946166992188, 0.5134124755859375, 0.0435638427734375, -0.0997314453125, 0.3477020263671875, 0.1346588134765625, 0.4123115539550781, -0.7647018432617188, 1.0327529907226562, 0.43768310546875, 0.5340251922607422, 0.17986297607421875, -0.37708282470703125, -0.30205535888671875, 0.2566184997558594, -0.3358879089355469, 0.4351959228515625, 0.3349609375, 0.6465740203857422, 0.33658599853515625, 0.22881317138671875, 0.14975738525390625, -0.5466823577880859, 0.11225128173828125, 0.5329437255859375, 1.101837158203125, -0.4584922790527344, 0.326904296875, 0.06605148315429688, 0.31572723388671875, 0.6447696685791016, 0.15394973754882812, 0.32354736328125, 0.2565956115722656, -1.1039505004882812, -0.36579132080078125, 0.3872833251953125, 0.7161865234375, 0.028228759765625, -0.45456695556640625, 0.6950836181640625, 0.557037353515625, -0.0541839599609375, -0.30001068115234375, 0.15353775024414062, 0.07698249816894531, -0.13861083984375, 0.003021240234375, -0.346649169921875, 0.39514923095703125, -0.03306388854980469, -0.18526268005371094, -0.14603042602539062, 0.22471237182617188, 0.4989013671875, -0.0877685546875, -0.65155029296875, 0.15665054321289062, -0.30963897705078125, -0.1861438751220703, -0.1475982666015625, -0.06954193115234375, 0.07979774475097656, -0.289215087890625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000050.npy"}
{"epoch": 0.07558578987150416, "step": 51, "batch_size": 64, "mean": 0.16660377383232117, "std": 0.538969099521637, "min": -1.0838623046875, "p10": -0.49688720703125, "median": 0.13820362091064453, "p90": 0.69995174407959, "max": 1.68212890625, "pos_frac": 0.59375, "sample": [0.441650390625, 0.3230705261230469, 0.03584098815917969, -0.0735931396484375, -0.0447845458984375, 0.545074462890625, -0.02864837646484375, 0.21648406982421875, -0.6049251556396484, -0.45786285400390625, -0.0482330322265625, 0.2812614440917969, -0.5091133117675781, 1.2175216674804688, -0.13558197021484375, 0.481597900390625, 0.21875, 0.375274658203125, 0.5713043212890625, 0.180419921875, 0.34796905517578125, 0.2648487091064453, 0.7177524566650391, 0.458465576171875, 0.17525482177734375, -1.0838623046875, -0.0484771728515625, -0.5714740753173828, -0.18895339965820312, 0.5555362701416016, 0.01805877685546875, 0.658416748046875, 0.10115242004394531, 0.417724609375, -0.6716232299804688, -0.11124038696289062, 0.6109580993652344, -0.4937095642089844, -0.4982490539550781, 0.2196807861328125, -0.16347503662109375, -0.167510986328125, 0.2684783935546875, 1.245758056640625, 0.9202308654785156, -0.19190216064453125, -0.7356491088867188, 1.68212890625, -0.4334259033203125, 1.2646713256835938, 0.2918052673339844, 0.5959014892578125, 0.617095947265625, -0.13525390625, 0.06649017333984375, -0.2721595764160156, -0.1525726318359375, 1.6068763732910156, 0.36896514892578125, -0.08100509643554688, -0.36499786376953125, 0.4474945068359375, 0.07222366333007812, 0.048736572265625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000051.npy"}
{"epoch": 0.07709750566893424, "step": 52, "batch_size": 64, "mean": 0.2083311527967453, "std": 0.5620430707931519, "min": -0.87713623046875, "p10": -0.3866693496704101, "median": 0.12426280975341797, "p90": 0.9470657348632815, "max": 1.9854278564453125, "pos_frac": 0.578125, "sample": [1.4514312744140625, -0.047393798828125, 0.9680633544921875, 0.9695587158203125, 0.5852890014648438, -0.3145904541015625, 0.3542919158935547, 0.0135345458984375, -0.4147071838378906, 1.5355072021484375, 0.6406402587890625, 0.561279296875, -0.0028228759765625, 1.1905517578125, 1.0313720703125, -0.87713623046875, 0.8778152465820312, -0.17343902587890625, -0.1388397216796875, -0.2774200439453125, -0.8420524597167969, -0.1925182342529297, 0.4430694580078125, -0.18724822998046875, -0.0355072021484375, 0.11163330078125, -0.3947906494140625, -0.17212867736816406, 0.462188720703125, 0.30718231201171875, 0.2675056457519531, -0.3134765625, 0.08330535888671875, -0.3677196502685547, 0.2988548278808594, 0.8259811401367188, 0.14686965942382812, 0.21249771118164062, 0.2049407958984375, -0.45111083984375, 0.001178741455078125, 0.5423049926757812, 0.15004539489746094, 0.3815479278564453, 0.04134178161621094, 0.5825424194335938, -0.05496501922607422, -0.19646453857421875, 1.9854278564453125, -0.1743316650390625, 0.7007980346679688, 0.5912322998046875, -0.08316802978515625, 0.4617958068847656, 0.68133544921875, -0.6622085571289062, 0.8980712890625, -0.6419258117675781, -0.187255859375, 0.13689231872558594, 0.21963882446289062, -0.16350173950195312, -0.1663227081298828, -0.05127716064453125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000052.npy"}
{"epoch": 0.07860922146636433, "step": 53, "batch_size": 64, "mean": 0.17890891432762146, "std": 0.4739670157432556, "min": -1.2016067504882812, "p10": -0.36383304595947263, "median": 0.26430320739746094, "p90": 0.5988399505615235, "max": 1.249176025390625, "pos_frac": 0.671875, "sample": [0.4122886657714844, 0.5890579223632812, 0.5793685913085938, 0.1253814697265625, 0.2513389587402344, 0.2083110809326172, 0.19424819946289062, 0.39638519287109375, 0.2368621826171875, -0.00565338134765625, 0.33232879638671875, 0.3077850341796875, -0.25716400146484375, 0.5916061401367188, -0.18514251708984375, 0.18974685668945312, 0.3105964660644531, 1.0729598999023438, -0.15433692932128906, -1.017852783203125, 0.19419097900390625, 0.48450469970703125, 0.5683746337890625, 0.4157218933105469, -1.2016067504882812, 0.4497528076171875, -0.37672996520996094, 0.5172843933105469, 0.2582206726074219, 0.3957061767578125, 0.4386272430419922, -0.973236083984375, -0.2975616455078125, -0.6685638427734375, 0.33957672119140625, 0.3691253662109375, 0.4399433135986328, 0.9369049072265625, -0.05195426940917969, -0.4743232727050781, -0.21266937255859375, -0.333740234375, 0.6411266326904297, -0.2300567626953125, 0.5882034301757812, 0.374114990234375, 0.0738983154296875, 0.89599609375, 0.5376663208007812, 0.508544921875, 0.44072723388671875, 0.39769744873046875, -0.23212242126464844, 0.6019401550292969, 0.810272216796875, -0.3201103210449219, 1.249176025390625, -0.06476974487304688, 0.2703857421875, -0.031345367431640625, -0.2640800476074219, 0.22576904296875, -0.4221820831298828, 0.00365447998046875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000053.npy"}
{"epoch": 0.0801209372637944, "step": 54, "batch_size": 64, "mean": 0.12304225564002991, "std": 0.5162787437438965, "min": -1.3615646362304688, "p10": -0.43496742248535153, "median": 0.1265277862548828, "p90": 0.6194915771484375, "max": 1.6767120361328125, "pos_frac": 0.640625, "sample": [0.11428451538085938, 1.6767120361328125, 0.12705230712890625, 0.20027542114257812, 0.12600326538085938, -0.32448577880859375, 0.8859653472900391, 0.34792327880859375, 0.5024948120117188, -0.29431915283203125, -0.30803680419921875, -0.0019626617431640625, 0.23754119873046875, -1.3615646362304688, 0.009801864624023438, 0.07221794128417969, 0.161285400390625, 0.3175697326660156, 0.14689254760742188, 0.0968475341796875, 0.12204742431640625, 1.5569915771484375, 0.3045368194580078, 0.4338054656982422, -0.530975341796875, 0.5747737884521484, 0.4050731658935547, -0.261444091796875, 0.15064239501953125, 0.036865234375, -0.2205047607421875, -0.25344085693359375, -0.28427886962890625, -0.45944976806640625, 0.40290069580078125, 0.1092681884765625, 0.33516693115234375, 0.46192169189453125, 1.0285758972167969, 0.8047676086425781, 0.21283340454101562, 0.34616851806640625, 0.7046318054199219, 0.48215484619140625, -0.3778419494628906, 0.6132431030273438, 0.5116386413574219, 0.38109588623046875, -0.04437255859375, -0.0441131591796875, -0.312774658203125, -0.568634033203125, 0.1314239501953125, 0.03269195556640625, 0.5890731811523438, -0.3103179931640625, -0.2592601776123047, -0.3466510772705078, 0.6221694946289062, -0.025341033935546875, -0.47711181640625, -1.2011642456054688, 0.3482818603515625, -0.5828609466552734], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000054.npy"}
{"epoch": 0.08163265306122448, "step": 55, "batch_size": 64, "mean": 0.30653250217437744, "std": 0.7119595408439636, "min": -0.9828872680664062, "p10": -0.40602531433105465, "median": 0.23076343536376953, "p90": 1.2391250610351563, "max": 2.562774658203125, "pos_frac": 0.625, "sample": [1.5690689086914062, 0.3108367919921875, 0.11407470703125, -0.100494384765625, 0.16039657592773438, 0.412506103515625, 0.4169654846191406, -0.4103202819824219, -0.5594158172607422, 0.01721954345703125, -0.001697540283203125, 0.5689544677734375, -0.84033203125, 0.31433868408203125, 0.2817516326904297, 1.2347564697265625, 1.247039794921875, 1.0818862915039062, -0.2575225830078125, -0.137176513671875, 0.18239974975585938, -0.198455810546875, 0.48740386962890625, 0.292633056640625, 2.562774658203125, 0.2765483856201172, 0.4418525695800781, 1.0672760009765625, -0.0594940185546875, -0.1775360107421875, 0.5621681213378906, 1.6644439697265625, -0.0668487548828125, -0.7206954956054688, -0.2169189453125, -0.39600372314453125, 0.4886789321899414, 0.12142181396484375, 0.3411216735839844, -0.18567276000976562, -0.00070953369140625, 1.1664657592773438, 0.1999359130859375, 1.240997314453125, 0.3512725830078125, 0.26159095764160156, 1.634490966796875, -0.5146026611328125, 0.19387054443359375, -0.8503875732421875, 0.105865478515625, 0.37413597106933594, -0.3419189453125, -0.28179168701171875, -0.3268852233886719, 0.8294486999511719, 0.6533355712890625, 0.42445945739746094, 0.6244258880615234, 2.33709716796875, -0.2690887451171875, -0.24915313720703125, 1.1481781005859375, -0.9828872680664062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000055.npy"}
{"epoch": 0.08314436885865457, "step": 56, "batch_size": 64, "mean": 0.17014756798744202, "std": 0.731311023235321, "min": -3.0560302734375, "p10": -0.4399681091308593, "median": 0.15215301513671875, "p90": 0.8971805572509766, "max": 1.9299774169921875, "pos_frac": 0.609375, "sample": [-0.19118499755859375, 0.3549995422363281, -0.025102615356445312, -0.29131507873535156, 0.49106597900390625, -0.31043243408203125, 0.8202285766601562, -1.361663818359375, -0.2636985778808594, 0.15377044677734375, 0.2872428894042969, -0.635894775390625, -0.2343597412109375, -0.0378570556640625, 0.5538101196289062, 1.4290847778320312, -0.12531280517578125, -0.7195873260498047, 0.037994384765625, 1.3859443664550781, -0.46804046630859375, 1.9299774169921875, 0.4378337860107422, 0.5247039794921875, 0.4184608459472656, 0.74615478515625, -0.3491973876953125, 0.414154052734375, -0.05850982666015625, 0.3765869140625, -0.15879440307617188, 0.5651016235351562, 1.2811355590820312, 0.8368587493896484, 0.647064208984375, -0.3744659423828125, -3.0560302734375, 0.4405345916748047, 0.0244140625, 0.19527435302734375, 0.6664562225341797, 1.3643646240234375, -0.13580322265625, -0.36217498779296875, 0.1378936767578125, 0.050079345703125, -0.1679229736328125, 0.5124282836914062, -0.07003974914550781, 0.3279743194580078, 0.8950309753417969, 0.41156768798828125, 0.898101806640625, 0.5388984680175781, -1.414520263671875, 0.08610153198242188, -0.5627098083496094, 0.28515625, 0.7913017272949219, -0.1085662841796875, 0.15053558349609375, 1.0317840576171875, 0.08051300048828125, -0.20795440673828125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000056.npy"}
{"epoch": 0.08465608465608465, "step": 57, "batch_size": 64, "mean": 0.29252129793167114, "std": 0.6844032406806946, "min": -0.866485595703125, "p10": -0.6358135223388671, "median": 0.3721466064453125, "p90": 0.8929435729980469, "max": 2.2098846435546875, "pos_frac": 0.703125, "sample": [-0.8174514770507812, -0.6764450073242188, 0.50482177734375, 1.938323974609375, -0.15410804748535156, 0.8105926513671875, 0.445098876953125, -0.42873382568359375, 0.6465263366699219, 0.8824920654296875, 1.5394134521484375, 0.76434326171875, -0.8131866455078125, -0.1558513641357422, 0.6009292602539062, -0.4947471618652344, 0.0722503662109375, 0.37537384033203125, 2.1504440307617188, 0.5117378234863281, 0.4531707763671875, -0.477783203125, 0.5990676879882812, 0.196685791015625, -0.24893951416015625, -0.5771446228027344, 0.36891937255859375, 0.485931396484375, 0.0884552001953125, -0.6609573364257812, 0.8974227905273438, 0.08725929260253906, 0.6060142517089844, 0.08849334716796875, 0.1096649169921875, 0.22837066650390625, -0.29754638671875, 0.47917747497558594, -0.866485595703125, 0.8096923828125, 0.00034332275390625, -0.3771171569824219, -0.8434829711914062, 0.6349678039550781, -0.68597412109375, 0.220611572265625, 0.4443206787109375, 0.4669342041015625, 0.5195770263671875, 0.31501197814941406, 0.6830177307128906, 0.5396804809570312, 0.10028076171875, 2.2098846435546875, 0.4003620147705078, 1.6057052612304688, 0.4907073974609375, 0.48960113525390625, 0.21491432189941406, 0.5950088500976562, -0.11205673217773438, -0.04974555969238281, 1.2889633178710938, -0.5014457702636719], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000057.npy"}
{"epoch": 0.08616780045351474, "step": 58, "batch_size": 64, "mean": 0.22854623198509216, "std": 0.9707141518592834, "min": -1.383636474609375, "p10": -0.6786399841308594, "median": 0.03425788879394531, "p90": 1.244055938720703, "max": 4.233001708984375, "pos_frac": 0.515625, "sample": [-0.9403610229492188, 0.9036407470703125, 0.7563514709472656, -0.15919113159179688, 0.6813201904296875, -0.5499343872070312, -0.0673065185546875, 0.904449462890625, -0.0041351318359375, -0.3764190673828125, 0.6666679382324219, -0.5474376678466797, 1.236328125, -0.6893768310546875, -0.5279731750488281, 0.5992584228515625, 1.2473678588867188, -0.9759521484375, 0.3438720703125, 0.4880523681640625, 0.22544288635253906, -0.13712310791015625, 0.12519454956054688, 0.60333251953125, -0.5282211303710938, 0.49440765380859375, -0.6535873413085938, -0.20258331298828125, -0.6168994903564453, 0.8652267456054688, -0.00984954833984375, -0.9150543212890625, -0.04320526123046875, -0.20672607421875, 4.233001708984375, -0.2864990234375, -1.383636474609375, 3.5257415771484375, 1.7507476806640625, 0.5484542846679688, -0.70062255859375, 0.07631301879882812, 0.019832611083984375, -0.5344696044921875, -0.437164306640625, 2.3728790283203125, -0.31938934326171875, -0.8224945068359375, -0.105804443359375, 1.4137344360351562, 0.059967041015625, 1.3050994873046875, 0.5991134643554688, -0.18241119384765625, 0.09090042114257812, 1.21807861328125, 0.14371299743652344, -0.6139602661132812, -0.2580413818359375, 0.56805419921875, 0.04868316650390625, -0.32193756103515625, 0.05897331237792969, 0.570526123046875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000058.npy"}
{"epoch": 0.08767951625094482, "step": 59, "batch_size": 64, "mean": 0.27108943462371826, "std": 0.9215839505195618, "min": -1.2245101928710938, "p10": -0.7344772338867186, "median": 0.10557842254638672, "p90": 1.4193466186523442, "max": 3.0884246826171875, "pos_frac": 0.53125, "sample": [-0.4832611083984375, 0.44353485107421875, 0.01312255859375, -0.7858314514160156, -0.2868919372558594, -0.6146507263183594, 0.40770721435546875, -1.2245101928710938, 0.3152008056640625, 0.6269378662109375, 2.90985107421875, 0.20425033569335938, 1.6612548828125, 0.8199310302734375, 0.75299072265625, -0.27501487731933594, 1.2388916015625, -0.8075637817382812, -0.3719482421875, 0.3759307861328125, -0.026214599609375, -0.1393585205078125, -0.4099273681640625, -0.05883026123046875, -0.23056793212890625, 0.7676963806152344, -0.4381370544433594, 1.2841339111328125, 2.757537841796875, 0.7371597290039062, -0.01078033447265625, 0.29253387451171875, 0.6605701446533203, -0.14940643310546875, -0.4857025146484375, 0.09461593627929688, -0.10689163208007812, 0.6086578369140625, -0.8027229309082031, -0.2583141326904297, 3.0884246826171875, 0.6987075805664062, 1.6864471435546875, 0.9050464630126953, -0.5276775360107422, 0.4765453338623047, -0.7873306274414062, 0.9552993774414062, 0.1919116973876953, 1.477294921875, -0.4536476135253906, 0.9899711608886719, -0.4549903869628906, -1.139739990234375, 0.493194580078125, -0.2917594909667969, -1.224151611328125, 0.7051620483398438, -0.40875244140625, -0.40172576904296875, 0.11654090881347656, 1.6792449951171875, 1.1068115234375, -0.5370864868164062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000059.npy"}
{"epoch": 0.08919123204837491, "step": 60, "batch_size": 64, "mean": 0.03688898682594299, "std": 0.8526102900505066, "min": -2.22406005859375, "p10": -0.9704170227050781, "median": 0.13277244567871094, "p90": 0.9272668838500978, "max": 2.372039794921875, "pos_frac": 0.59375, "sample": [0.4442176818847656, -0.30126190185546875, 0.5403251647949219, 0.04297637939453125, 0.0609893798828125, 0.9489631652832031, -1.8446197509765625, -0.8925323486328125, 0.2846717834472656, -2.0987091064453125, 0.2977485656738281, 0.4252777099609375, -0.46930885314941406, 0.017908096313476562, -1.067047119140625, 0.22246551513671875, -0.38669586181640625, 1.0399150848388672, 1.5304241180419922, 0.073822021484375, -0.8999786376953125, 0.288787841796875, 0.2292327880859375, -0.9441680908203125, 0.15042877197265625, -0.1861419677734375, 0.4457550048828125, -0.08333778381347656, 0.9731674194335938, -1.1103782653808594, 0.23572921752929688, -0.32497406005859375, 1.1060638427734375, 0.3889923095703125, -0.4461097717285156, 0.6026153564453125, 0.7223491668701172, -0.7204818725585938, 0.026050567626953125, -0.8378047943115234, -0.9816665649414062, 0.759674072265625, 0.405426025390625, -1.7793960571289062, -0.022287368774414062, 1.671051025390625, -0.3105812072753906, -0.432525634765625, 0.7022781372070312, 0.11511611938476562, 0.857208251953125, 0.38671875, 0.6823329925537109, 0.7605743408203125, 2.372039794921875, -0.397491455078125, 0.2959098815917969, 0.7280044555664062, 0.19635772705078125, -0.010650634765625, -2.22406005859375, -0.7215824127197266, 0.8766422271728516, -0.053524017333984375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000060.npy"}
{"epoch": 0.09070294784580499, "step": 61, "batch_size": 64, "mean": 0.03861680626869202, "std": 1.014214038848877, "min": -2.6049346923828125, "p10": -1.4496009826660157, "median": 0.1800079345703125, "p90": 1.1563148498535158, "max": 2.2121047973632812, "pos_frac": 0.5625, "sample": [0.802734375, 0.6501617431640625, 1.4725341796875, 1.8580398559570312, -0.3016204833984375, 0.39454078674316406, 0.209503173828125, 2.2121047973632812, 0.6923828125, -0.16139602661132812, -0.09091949462890625, 0.3124370574951172, -0.6611251831054688, 0.24468994140625, 0.3093414306640625, -0.5214271545410156, 0.376800537109375, -0.33516693115234375, 0.01760101318359375, 0.3458843231201172, 0.12607574462890625, 0.3579425811767578, 0.8468475341796875, -0.10956001281738281, 0.0060405731201171875, -0.27931785583496094, -1.398956298828125, -2.6049346923828125, -0.01348876953125, -0.16005325317382812, 0.8963279724121094, -0.42608642578125, -1.0397529602050781, 1.056732177734375, -0.827606201171875, 0.5423469543457031, -0.0768585205078125, -0.7918720245361328, -0.49469757080078125, 1.174591064453125, 1.64886474609375, -1.8349990844726562, 0.15676116943359375, -1.4713058471679688, 0.4316253662109375, 1.4650611877441406, 0.5154018402099609, 0.33545684814453125, -0.5821571350097656, -0.061992645263671875, 1.1136703491210938, -1.8415756225585938, -0.08472442626953125, -0.426116943359375, 0.6956863403320312, -2.4513015747070312, 1.6465301513671875, 0.533905029296875, 0.3834991455078125, 0.20325469970703125, 1.0685043334960938, 0.5804061889648438, -2.2976722717285156, -1.8661270141601562], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000061.npy"}
{"epoch": 0.09221466364323508, "step": 62, "batch_size": 64, "mean": 0.2759028971195221, "std": 0.920768141746521, "min": -2.23480224609375, "p10": -0.7388206481933594, "median": 0.18893814086914062, "p90": 1.4355678558349618, "max": 2.3271026611328125, "pos_frac": 0.703125, "sample": [1.2047882080078125, -1.550405502319336, -0.4910888671875, 0.15940284729003906, 1.1511077880859375, 0.08715057373046875, 1.2214241027832031, -0.5833053588867188, -1.3973464965820312, 0.010009765625, 2.0555267333984375, -0.08021736145019531, 1.6650161743164062, -1.587005615234375, 1.735809326171875, 0.1554718017578125, 0.7227020263671875, 1.0923919677734375, 0.273529052734375, 0.4456634521484375, 0.0305633544921875, -0.8179969787597656, -0.663116455078125, 0.35002899169921875, 0.4989280700683594, 0.07073974609375, -0.2340240478515625, 0.9600906372070312, 0.7423629760742188, 0.8973846435546875, -0.7446365356445312, -0.03383636474609375, 0.32018089294433594, 0.1334228515625, 1.52734375, -2.23480224609375, -0.725250244140625, 1.828256607055664, 0.16396713256835938, 0.18843841552734375, 0.177703857421875, 0.8349990844726562, 0.1302337646484375, 0.3989715576171875, 0.9687423706054688, 0.1894378662109375, 0.17376708984375, -0.4933738708496094, 0.457733154296875, 1.1069869995117188, -0.1149749755859375, 0.9053440093994141, 0.8710422515869141, 0.3455009460449219, -0.659576416015625, 0.096466064453125, 2.247344970703125, 2.3271026611328125, -0.2723655700683594, 0.3679046630859375, 0.37934112548828125, -0.23680496215820312, -1.6360321044921875, 0.5436210632324219], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000062.npy"}
{"epoch": 0.09372637944066516, "step": 63, "batch_size": 64, "mean": 0.604219913482666, "std": 0.9599465131759644, "min": -2.197723388671875, "p10": -0.3177566528320312, "median": 0.5853366851806641, "p90": 1.5044158935546876, "max": 3.524505615234375, "pos_frac": 0.78125, "sample": [0.8290328979492188, 0.303009033203125, -0.20787811279296875, -0.216766357421875, 3.524505615234375, -0.4519500732421875, -0.3457202911376953, 3.471343994140625, 0.8381729125976562, 0.29416656494140625, -2.197723388671875, -0.41973114013671875, 0.4650230407714844, -1.912384033203125, 0.9619789123535156, 0.51177978515625, 1.5294647216796875, 0.740570068359375, 0.5845375061035156, 0.8599891662597656, 0.11081695556640625, 0.36026954650878906, 1.2525520324707031, 1.8364715576171875, 0.5861358642578125, 1.0801887512207031, 1.2455291748046875, 0.2715568542480469, -0.052349090576171875, 0.9342727661132812, 0.7603950500488281, 0.8454132080078125, 1.3785667419433594, 0.6461257934570312, 0.1427459716796875, -0.2869415283203125, 0.7039451599121094, -0.8812103271484375, 0.48284149169921875, 0.3493194580078125, 0.6736183166503906, 2.67156982421875, 0.111968994140625, 1.4459686279296875, 0.41396141052246094, 0.9389743804931641, 0.74847412109375, 1.37847900390625, 0.7375335693359375, -0.330963134765625, 0.12293815612792969, 0.7379360198974609, -0.003910064697265625, -0.18920135498046875, 0.706451416015625, 0.4485321044921875, 1.0576438903808594, 0.40727806091308594, -0.027345657348632812, 0.07849311828613281, 1.1499519348144531, 2.389434814453125, 1.9814910888671875, 0.0927276611328125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000063.npy"}
{"epoch": 0.09523809523809523, "step": 64, "batch_size": 64, "mean": 0.023882657289505005, "std": 0.9693297147750854, "min": -4.1880035400390625, "p10": -0.7266847610473632, "median": 0.02706146240234375, "p90": 0.7856399536132813, "max": 2.10516357421875, "pos_frac": 0.53125, "sample": [1.5003433227539062, 2.10516357421875, -0.8113498687744141, -1.04510498046875, -0.19224166870117188, 0.40869903564453125, 0.7216358184814453, 0.10200119018554688, 0.4105854034423828, 1.9244308471679688, 0.7584953308105469, 0.31763458251953125, 1.1305084228515625, -0.220733642578125, 0.7206764221191406, -0.10808563232421875, -0.2632904052734375, -0.1782073974609375, -0.0441436767578125, 0.5422344207763672, 0.765472412109375, 0.16546630859375, 0.04412841796875, -0.1629657745361328, -1.302032470703125, -0.4297447204589844, 0.00110626220703125, -4.1880035400390625, -0.52459716796875, 0.7061061859130859, 0.11905479431152344, -0.2145233154296875, 0.1913299560546875, -0.65777587890625, 0.41402626037597656, -0.0584564208984375, 0.7779388427734375, -2.9842376708984375, -0.6648502349853516, 0.7889404296875, 0.7694473266601562, 0.0099945068359375, 0.7334365844726562, -0.41089820861816406, 0.4098663330078125, -2.21954345703125, -0.3337669372558594, 0.2968597412109375, -0.24133682250976562, 1.5166244506835938, -0.7531852722167969, -0.32292747497558594, 0.9590377807617188, 0.4211883544921875, -0.06020164489746094, 0.3584861755371094, 0.4108085632324219, -0.11827468872070312, -0.62298583984375, -0.35536956787109375, 0.3674468994140625, -0.23325347900390625, -0.1790447235107422, 0.5604476928710938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000064.npy"}
{"epoch": 0.09674981103552532, "step": 65, "batch_size": 64, "mean": 0.5417026281356812, "std": 1.2936222553253174, "min": -3.1287460327148438, "p10": -1.1065563201904296, "median": 0.44739437103271484, "p90": 2.046487236022949, "max": 4.73199462890625, "pos_frac": 0.703125, "sample": [0.28802490234375, 1.2452621459960938, 1.83135986328125, 0.2998504638671875, -0.033046722412109375, 1.0035667419433594, -1.4092864990234375, -0.2120513916015625, -0.6873741149902344, 2.0586471557617188, -0.34625244140625, 0.5584754943847656, 1.2770366668701172, 0.8687362670898438, 0.8086395263671875, 0.9958629608154297, 0.4473705291748047, 0.4978179931640625, -1.2210845947265625, 2.796783447265625, 0.1295623779296875, 0.1636962890625, -1.9965667724609375, 2.1656646728515625, 2.0181140899658203, 0.045040130615234375, -3.1287460327148438, 0.3283119201660156, -0.8951950073242188, -0.0968170166015625, 0.1424407958984375, 0.5250091552734375, -1.0721817016601562, -1.1212882995605469, 0.18634796142578125, 2.7785797119140625, 0.4381103515625, 2.6739349365234375, 1.58929443359375, 0.8587722778320312, 1.6779861450195312, 0.29971885681152344, 1.8896102905273438, 1.1280403137207031, -0.7110061645507812, 0.292449951171875, 0.5308361053466797, 0.326446533203125, -1.12164306640625, -0.29109954833984375, 0.724395751953125, 0.659454345703125, -0.22763824462890625, 0.8052825927734375, 0.447418212890625, 1.2536163330078125, 3.2178878784179688, -0.17473220825195312, 1.76055908203125, 1.3391342163085938, 0.870941162109375, -1.5273284912109375, 4.73199462890625, -0.03377723693847656], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000065.npy"}
{"epoch": 0.0982615268329554, "step": 66, "batch_size": 64, "mean": 0.3192029595375061, "std": 1.4191575050354004, "min": -3.4783935546875, "p10": -1.3785087585449216, "median": 0.3373394012451172, "p90": 1.7964023590087892, "max": 4.665740966796875, "pos_frac": 0.609375, "sample": [0.11260414123535156, 3.926605224609375, 0.8100490570068359, 0.6590938568115234, 0.2768821716308594, 1.1494293212890625, 0.5999088287353516, 0.4963874816894531, -0.10321617126464844, 0.4119853973388672, -0.9954833984375, 0.7944526672363281, 4.665740966796875, -0.241668701171875, -1.083740234375, -2.2174453735351562, -0.23488616943359375, 0.6240463256835938, 1.672393798828125, 2.0284500122070312, 0.0794677734375, 1.7878227233886719, -1.98699951171875, 0.3617973327636719, 1.2278900146484375, 0.99578857421875, 0.3996467590332031, 1.1231498718261719, 1.3097686767578125, 0.3128814697265625, 0.9691162109375, 0.9557609558105469, -0.229949951171875, -0.4820690155029297, -0.1634960174560547, -0.46372222900390625, 0.5143146514892578, 0.5012969970703125, -1.468719482421875, 0.0371551513671875, -3.4783935546875, -2.595245361328125, -0.20412445068359375, -0.18210601806640625, -0.6788101196289062, -1.7511367797851562, 2.1091880798339844, -0.12311553955078125, -0.6543426513671875, 1.0651988983154297, -0.2380809783935547, 0.39385223388671875, 0.2618675231933594, 4.0744781494140625, 2.4296798706054688, -1.1785202026367188, 1.2025833129882812, -0.889434814453125, -1.4642181396484375, 1.800079345703125, 1.0124435424804688, -0.38530921936035156, 0.47391510009765625, 0.296051025390625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000066.npy"}
{"epoch": 0.09977324263038549, "step": 67, "batch_size": 64, "mean": 0.3212243616580963, "std": 1.02715003490448, "min": -2.6688385009765625, "p10": -0.7300636291503906, "median": 0.27033424377441406, "p90": 1.5173274993896497, "max": 3.09423828125, "pos_frac": 0.671875, "sample": [0.546356201171875, -1.404144287109375, -0.5467758178710938, -0.4794158935546875, -1.2749252319335938, 0.36299896240234375, 1.6794090270996094, 1.0013008117675781, 0.00048065185546875, -0.5060844421386719, 1.6456260681152344, 0.053638458251953125, -0.7425537109375, -1.4625473022460938, 1.1715545654296875, 0.9991855621337891, -0.2089099884033203, 0.5938453674316406, -0.26584625244140625, 1.0113067626953125, -0.2407684326171875, 0.9797134399414062, -2.6688385009765625, 2.9994354248046875, 0.3759040832519531, 1.183349609375, 0.5921707153320312, 0.43781280517578125, 1.0156402587890625, -0.5155143737792969, 0.26624298095703125, 0.2744255065917969, 0.9590835571289062, 1.66119384765625, 0.795745849609375, 0.12670516967773438, 0.7929306030273438, 0.532745361328125, -0.367828369140625, 0.9071083068847656, 2.0031280517578125, 0.01802825927734375, 1.1700668334960938, 1.2179641723632812, 0.7090797424316406, 0.3770637512207031, -0.2358245849609375, -0.3051414489746094, 0.18636322021484375, -0.7009201049804688, -0.12201309204101562, 3.09423828125, 0.17074203491210938, 0.14898109436035156, -0.195068359375, 0.1511383056640625, 1.0776290893554688, 0.07765579223632812, -1.0622634887695312, -2.0434951782226562, 2.060443878173828, -0.35379791259765625, 0.03570556640625, 0.7968978881835938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000067.npy"}
{"epoch": 0.10128495842781557, "step": 68, "batch_size": 64, "mean": 0.32454001903533936, "std": 1.4058586359024048, "min": -4.546844482421875, "p10": -1.075501251220703, "median": 0.39672183990478516, "p90": 1.5631034851074221, "max": 4.7558746337890625, "pos_frac": 0.6875, "sample": [4.7558746337890625, 1.586090087890625, 0.06525802612304688, 1.4850921630859375, -0.47670745849609375, 3.0285263061523438, 0.627593994140625, -0.6767463684082031, 0.9561824798583984, -0.0579833984375, 1.7646865844726562, -4.546844482421875, 1.5094680786132812, 1.0265426635742188, 0.4933624267578125, 0.7641563415527344, -0.37358856201171875, -0.6802330017089844, 0.39776611328125, 1.1180953979492188, 1.3918304443359375, 0.27422523498535156, 0.4534149169921875, 1.0039024353027344, 0.863037109375, 0.9617767333984375, 0.52899169921875, -0.47495269775390625, 1.3303375244140625, -0.879241943359375, -0.3537425994873047, 0.3699798583984375, 0.3956775665283203, 0.34151458740234375, -1.1279792785644531, 0.19188308715820312, 3.917724609375, 0.4905509948730469, 0.058628082275390625, 0.09305381774902344, -2.2555999755859375, -1.2876396179199219, 0.6167144775390625, 0.6367721557617188, -1.9654617309570312, -2.966461181640625, 0.28661346435546875, 0.1242523193359375, -0.9530525207519531, -0.44603729248046875, 2.121612548828125, 0.82415771484375, -0.31870269775390625, 1.1562271118164062, 1.0306549072265625, 0.3524169921875, 0.573089599609375, -0.02748870849609375, 2.0316619873046875, 0.6601810455322266, 0.6803436279296875, -0.793304443359375, 0.1333160400390625, -2.0409088134765625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000068.npy"}
{"epoch": 0.10279667422524566, "step": 69, "batch_size": 64, "mean": 0.6868542432785034, "std": 1.4825056791305542, "min": -2.5848121643066406, "p10": -0.9115261077880858, "median": 0.6066455841064453, "p90": 2.2110664367675783, "max": 6.534576416015625, "pos_frac": 0.671875, "sample": [-0.06138420104980469, 0.07860183715820312, 3.048492431640625, 0.0050201416015625, 1.0997467041015625, -0.448822021484375, -0.6826515197753906, -0.35132789611816406, 1.9965362548828125, 1.7616424560546875, 0.4768657684326172, 0.3954887390136719, 0.7861957550048828, 1.9411125183105469, 1.1946220397949219, 2.1348724365234375, 0.6853218078613281, 1.7488555908203125, 3.15509033203125, -1.30560302734375, 4.0586090087890625, -2.5842971801757812, 1.398712158203125, -0.02803802490234375, 2.69842529296875, -0.37624359130859375, 0.570709228515625, 1.1215362548828125, 1.0350494384765625, -0.0222625732421875, 0.8418998718261719, -0.516571044921875, 0.6425819396972656, 0.7442398071289062, 0.4856529235839844, -1.62188720703125, 2.2437210083007812, 0.47092628479003906, -0.3158111572265625, 0.808807373046875, -0.1470947265625, 0.4552898406982422, 0.6567535400390625, -0.06403350830078125, 1.620819091796875, -1.0116500854492188, 1.7225532531738281, 0.7117576599121094, -0.5170783996582031, -1.0886077880859375, 0.43645477294921875, 1.116037368774414, -2.5848121643066406, -0.8772716522216797, 1.2251472473144531, -0.4638557434082031, 1.7487335205078125, 0.3355979919433594, 1.51312255859375, -0.9262065887451172, 6.534576416015625, 0.8000144958496094, 3.3155670166015625, 0.13242340087890625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000069.npy"}
{"epoch": 0.10430839002267574, "step": 70, "batch_size": 64, "mean": 0.29693299531936646, "std": 1.4121835231781006, "min": -3.542877197265625, "p10": -1.4288042068481444, "median": 0.4651317596435547, "p90": 1.8926155090332035, "max": 3.741607666015625, "pos_frac": 0.625, "sample": [0.6705551147460938, -1.158050537109375, 1.4063568115234375, -2.342071533203125, 0.18992042541503906, -1.201141357421875, -0.646636962890625, 1.0254745483398438, 1.4859161376953125, -0.523223876953125, 0.659942626953125, 1.942596435546875, -3.542877197265625, 0.8232612609863281, 0.1758708953857422, -0.7065181732177734, 2.3360214233398438, -2.583221435546875, -1.0827178955078125, -0.2847137451171875, 3.6610794067382812, 0.5951709747314453, 1.1224918365478516, 0.4623832702636719, -2.1667747497558594, 0.9453392028808594, 0.5545806884765625, 2.5796051025390625, -1.0457611083984375, 3.741607666015625, 0.21857070922851562, 0.5313301086425781, -0.98370361328125, -0.3940391540527344, 0.11084747314453125, 2.4962310791015625, 1.636383056640625, 0.4678802490234375, 0.5056400299072266, 1.217691421508789, 0.4510936737060547, -1.4793224334716797, -0.4483184814453125, 0.8772506713867188, 1.7354278564453125, 0.7591896057128906, -0.00942230224609375, 2.893096923828125, -0.110992431640625, 0.9539031982421875, 0.7757949829101562, 1.7759933471679688, 0.5428466796875, -1.3109283447265625, 0.3100395202636719, -0.0955657958984375, -0.2063140869140625, -1.8868274688720703, 1.2696456909179688, 0.7498435974121094, -1.4862823486328125, 0.97552490234375, -0.9870948791503906, 0.0538330078125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000070.npy"}
{"epoch": 0.10582010582010581, "step": 71, "batch_size": 64, "mean": 0.22837717831134796, "std": 1.717917561531067, "min": -5.43341064453125, "p10": -1.9464965820312499, "median": 0.2564659118652344, "p90": 2.063004684448243, "max": 3.5396728515625, "pos_frac": 0.578125, "sample": [0.07085227966308594, 1.6370201110839844, 1.3007793426513672, 0.4665679931640625, -0.5026588439941406, -0.5840072631835938, 0.9611148834228516, -1.8604278564453125, 0.32428741455078125, 2.595489501953125, 0.5313701629638672, -2.7830047607421875, 0.15618896484375, 1.0438156127929688, -3.332855224609375, -0.15453529357910156, -0.0916290283203125, 2.9544754028320312, 1.074188232421875, 1.630096435546875, -4.445465087890625, 2.2909927368164062, -0.016841888427734375, -0.5877799987792969, 1.2036857604980469, 3.0598297119140625, 1.0977401733398438, -0.5832881927490234, 0.5580101013183594, 0.1886444091796875, -5.43341064453125, -0.3861236572265625, -2.3262290954589844, -0.3206462860107422, 0.14191818237304688, -0.09659576416015625, 0.4309520721435547, 1.3427581787109375, 1.5927352905273438, -1.3163375854492188, -0.2910614013671875, -3.0516510009765625, 0.786285400390625, 1.844329833984375, -0.4325733184814453, -0.43819141387939453, -1.3978271484375, 2.1332931518554688, -0.22945404052734375, -0.028148651123046875, 1.5739192962646484, 2.862133026123047, -1.9833831787109375, 1.6679000854492188, 1.4740753173828125, 1.786895751953125, 1.3101806640625, 0.04355621337890625, 0.9518051147460938, 0.3373756408691406, 1.8989982604980469, -0.4472827911376953, 3.5396728515625, -1.1263847351074219], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000071.npy"}
{"epoch": 0.1073318216175359, "step": 72, "batch_size": 64, "mean": 0.8340104222297668, "std": 1.546425700187683, "min": -4.0367889404296875, "p10": -0.7047962188720702, "median": 0.876133918762207, "p90": 2.616598510742188, "max": 5.413051605224609, "pos_frac": 0.78125, "sample": [2.6632537841796875, 1.502655029296875, 0.33504486083984375, 1.8413772583007812, -4.0367889404296875, 0.15764617919921875, 0.5312519073486328, 5.413051605224609, -1.53662109375, 0.023370742797851562, 1.2404441833496094, 1.7731857299804688, -0.6371307373046875, 2.1866226196289062, 0.2616539001464844, -0.2595500946044922, -0.3751716613769531, 0.9077281951904297, 2.2855758666992188, 0.0579986572265625, 3.4205474853515625, 1.0863800048828125, 1.8978614807128906, 0.3108692169189453, 1.1496734619140625, 0.3175048828125, 0.49831199645996094, -0.6514892578125, -1.2667694091796875, 0.11264801025390625, 3.0466156005859375, 1.1676654815673828, 1.0623931884765625, 2.5077362060546875, 4.731842041015625, 0.2510395050048828, 0.4145965576171875, 1.3559036254882812, 0.9753379821777344, -0.1153106689453125, 1.4331626892089844, 1.090688705444336, 1.6590118408203125, 0.28180694580078125, 1.3479080200195312, -0.4667510986328125, 1.682647705078125, 2.4176788330078125, -1.3794898986816406, 1.155487060546875, -1.434906005859375, 0.4892425537109375, 3.437774658203125, -0.5076179504394531, 3.2329940795898438, -0.7276420593261719, 1.4722366333007812, 0.8445396423339844, -2.283477783203125, 0.06594276428222656, 0.10694122314453125, 1.3258895874023438, 0.9324798583984375, 0.5911636352539062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000072.npy"}
{"epoch": 0.10884353741496598, "step": 73, "batch_size": 64, "mean": 0.10340967774391174, "std": 1.970245361328125, "min": -4.5959930419921875, "p10": -1.5558073043823242, "median": -0.2813682556152344, "p90": 1.7396986007690445, "max": 9.95928955078125, "pos_frac": 0.4375, "sample": [1.1849441528320312, 0.5660743713378906, -0.614410400390625, -1.1243247985839844, -1.4329338073730469, -0.9711837768554688, -1.6552352905273438, -0.997589111328125, -4.5959930419921875, 1.2688007354736328, -1.8104095458984375, -0.22153472900390625, 0.23691749572753906, -1.0513687133789062, -1.4327030181884766, 0.6916160583496094, -2.27447509765625, -0.11561203002929688, -0.3412017822265625, 1.1423263549804688, -0.6773834228515625, -0.5120201110839844, -1.0747623443603516, 0.03688812255859375, 9.95928955078125, 2.6883544921875, -0.6131820678710938, -1.2790813446044922, -0.3779468536376953, 1.0791130065917969, 0.5320816040039062, -0.41467857360839844, 0.9200458526611328, -0.083648681640625, -1.1166229248046875, 0.33734703063964844, 1.8885822296142578, 2.8118133544921875, -1.541971206665039, -1.100860595703125, -1.561737060546875, 3.3504791259765625, 1.392303466796875, -2.3670806884765625, -0.388763427734375, 1.3635635375976562, 0.43791961669921875, -0.8636074066162109, -0.5671463012695312, -1.19647216796875, -2.4677886962890625, 0.6082534790039062, 0.651702880859375, 0.6274185180664062, 1.3167762756347656, 0.892669677734375, 1.2851219177246094, 0.333892822265625, -0.6224365234375, -0.5645790100097656, -0.002208709716796875, 2.8246002197265625, 5.1314697265625, -0.9091930389404297], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000073.npy"}
{"epoch": 0.11035525321239607, "step": 74, "batch_size": 64, "mean": 0.7449140548706055, "std": 1.5884640216827393, "min": -2.9283294677734375, "p10": -1.0620241165161133, "median": 0.7254848480224609, "p90": 2.4154882431030273, "max": 7.21392822265625, "pos_frac": 0.734375, "sample": [-0.8906326293945312, 2.6066131591796875, -0.209686279296875, 1.2406463623046875, 1.9890670776367188, -0.7683372497558594, 2.4169387817382812, 7.21392822265625, 1.0159072875976562, 2.4121036529541016, 1.3896064758300781, 0.1680908203125, 2.20513916015625, 0.6980667114257812, -0.11754608154296875, 0.6032829284667969, 0.909210205078125, 1.0006561279296875, 0.08148956298828125, -2.9283294677734375, 0.5625457763671875, 2.39410400390625, 0.7748947143554688, 0.7025375366210938, 1.2130012512207031, 0.08010101318359375, -0.6643905639648438, 1.3496284484863281, 2.597576141357422, -1.1471176147460938, 0.1976909637451172, 0.7025032043457031, 1.5395641326904297, -1.109811782836914, -0.9505195617675781, 1.1607780456542969, -0.8273391723632812, 1.3050308227539062, -2.4789581298828125, -0.1343555450439453, 0.4803314208984375, -1.1224098205566406, 2.0269393920898438, 0.5630035400390625, 0.7864837646484375, 3.0648193359375, 1.3478012084960938, -0.5373821258544922, 1.9355545043945312, 1.2979049682617188, -2.6904144287109375, 0.08226394653320312, 0.7484321594238281, -0.4495506286621094, -1.1437911987304688, 0.18872451782226562, 0.5602340698242188, 4.545806884765625, 2.510894775390625, 1.5598602294921875, 1.6038398742675781, 0.8353309631347656, 0.7843723297119141, 0.3917694091796875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000074.npy"}
{"epoch": 0.11186696900982615, "step": 75, "batch_size": 64, "mean": 0.8109314441680908, "std": 1.5494182109832764, "min": -3.133258819580078, "p10": -0.9414316177368164, "median": 0.591339111328125, "p90": 3.2687393188476563, "max": 5.4225616455078125, "pos_frac": 0.6875, "sample": [-0.5800704956054688, -0.9162521362304688, -0.40088462829589844, 1.1728897094726562, 3.5832595825195312, -0.07770538330078125, 0.4850044250488281, 1.1162300109863281, 1.0154876708984375, 3.674854278564453, -1.2074356079101562, -0.1822662353515625, -0.9522228240966797, 0.7170028686523438, 0.48835182189941406, 1.6949195861816406, 3.2292938232421875, -1.4569683074951172, 0.6563310623168945, 0.587738037109375, 4.466796875, 1.095123291015625, 1.2575759887695312, 2.5633010864257812, 0.8602142333984375, 0.31073760986328125, 1.1440505981445312, -1.244110107421875, 2.4078521728515625, 0.4250755310058594, 0.5012855529785156, 3.28564453125, -0.25926971435546875, 2.0270309448242188, 1.5660476684570312, 0.34237098693847656, 5.4225616455078125, 0.842529296875, 1.0562362670898438, 0.5569610595703125, -0.5219497680664062, 1.0156726837158203, -3.133258819580078, 0.6847267150878906, 1.6853294372558594, -0.21667861938476562, -0.4563865661621094, 0.594940185546875, 0.4257526397705078, -0.1944904327392578, 0.7742900848388672, 2.312335968017578, -1.4314441680908203, -0.0460968017578125, 1.0439186096191406, -0.32883453369140625, 3.65380859375, -0.320343017578125, -1.2174911499023438, 0.039112091064453125, 0.35205650329589844, 0.3036155700683594, 3.5710601806640625, 2.034393310546875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000075.npy"}
{"epoch": 0.11337868480725624, "step": 76, "batch_size": 64, "mean": 0.3795369863510132, "std": 1.938381314277649, "min": -5.277496337890625, "p10": -1.8290920257568357, "median": 0.3597888946533203, "p90": 2.8179809570312506, "max": 4.884651184082031, "pos_frac": 0.59375, "sample": [-5.277496337890625, 1.3602485656738281, -2.7831039428710938, 0.395416259765625, 0.5620193481445312, -0.7395896911621094, -0.5685520172119141, 1.2773895263671875, 0.41181373596191406, 4.100555419921875, 2.3769779205322266, 4.884651184082031, -0.3287220001220703, -5.22515869140625, 1.121856689453125, -1.6893959045410156, 1.0963897705078125, 1.3431262969970703, 0.632080078125, 0.3724937438964844, 0.2513885498046875, 3.2475051879882812, -2.1776580810546875, 4.103950500488281, -2.536865234375, 2.3756790161132812, 0.7496109008789062, 1.2760391235351562, -0.09960746765136719, -0.1419086456298828, 0.2517681121826172, -0.7554111480712891, -1.8889617919921875, -0.6014404296875, -0.4383964538574219, -0.31296539306640625, 3.3112411499023438, 0.1322174072265625, -0.4374217987060547, 1.1215133666992188, -0.24123573303222656, 0.7017326354980469, 0.34708404541015625, 0.7383155822753906, 2.1319732666015625, 0.6749401092529297, 0.8556861877441406, -1.168863296508789, -0.17948150634765625, -0.6462326049804688, 0.14464569091796875, 1.1000595092773438, -0.5326652526855469, 2.6527862548828125, 4.752838134765625, 0.6145381927490234, -0.23252105712890625, -1.2800350189208984, -2.9589920043945312, 2.8887786865234375, 1.8196334838867188, -0.04211997985839844, 1.2273406982421875, 0.16888427734375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000076.npy"}
{"epoch": 0.11489040060468632, "step": 77, "batch_size": 64, "mean": 0.9772913455963135, "std": 1.7723809480667114, "min": -3.4438438415527344, "p10": -1.1929000854492187, "median": 0.7798614501953125, "p90": 3.3467357635498054, "max": 6.2279052734375, "pos_frac": 0.71875, "sample": [-0.17682456970214844, 1.9968070983886719, -1.4474945068359375, -1.70526123046875, 2.576038360595703, 1.8517913818359375, -2.4347686767578125, 0.3117504119873047, -3.4438438415527344, 0.1267547607421875, 1.5742645263671875, 1.6627311706542969, -0.23951148986816406, -1.4135055541992188, 0.9707756042480469, 3.180187225341797, -0.414306640625, 1.0214004516601562, 1.1779022216796875, 4.455558776855469, -0.09603691101074219, 0.40186500549316406, 2.5464859008789062, 3.5426979064941406, 0.3790740966796875, 0.7707443237304688, 0.3434886932373047, 3.4181137084960938, 0.4575786590576172, 0.6515579223632812, -0.5345077514648438, -0.9557647705078125, 0.3255729675292969, 0.6425094604492188, 1.5193061828613281, 2.1539306640625, 1.5914535522460938, 1.5298118591308594, 1.2333221435546875, 1.7498588562011719, 2.2876358032226562, 0.540191650390625, 0.9963302612304688, -0.5605487823486328, -0.20632171630859375, 6.2279052734375, 0.10009765625, -0.04822540283203125, 3.9752120971679688, 1.5112266540527344, 5.133018493652344, 0.7889785766601562, 0.41869544982910156, 2.161468505859375, 2.5274581909179688, 0.8945350646972656, -1.2074661254882812, -1.2855682373046875, 2.45849609375, -0.14765357971191406, 0.858367919921875, 4.420173645019531, 0.5600433349609375, -1.1589126586914062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000077.npy"}
{"epoch": 0.1164021164021164, "step": 78, "batch_size": 64, "mean": 0.520514726638794, "std": 1.8253446817398071, "min": -4.18487548828125, "p10": -1.3945594787597655, "median": 0.3686246871948242, "p90": 2.6189659118652346, "max": 5.372837066650391, "pos_frac": 0.625, "sample": [2.9991073608398438, -3.7153472900390625, 0.8932228088378906, 3.1789474487304688, 0.39046478271484375, 2.6446762084960938, 0.71484375, 0.27474212646484375, 0.3757305145263672, 1.1738357543945312, 1.62518310546875, -2.990509033203125, -0.30164146423339844, 2.46844482421875, -0.132965087890625, -0.8555908203125, -1.7942008972167969, 3.7064971923828125, -1.8228530883789062, 0.36151885986328125, -0.7324981689453125, -0.51507568359375, -0.8492279052734375, -0.6571159362792969, 1.705810546875, 0.1915435791015625, -0.5978546142578125, -3.8502426147460938, 2.5589752197265625, -4.18487548828125, 1.582794189453125, 1.72564697265625, -0.21775054931640625, 2.1587905883789062, 0.18762969970703125, -0.6376190185546875, -1.2956466674804688, 0.9455718994140625, 1.663827896118164, 2.4381771087646484, 0.17650222778320312, 0.6849098205566406, 1.0286331176757812, 1.8564414978027344, -0.35594940185546875, 2.0183258056640625, -1.43695068359375, -0.10571479797363281, 0.5149307250976562, 1.3386459350585938, -0.376953125, 3.0759124755859375, -0.0623779296875, -1.2622756958007812, 1.1284027099609375, -0.51898193359375, 0.2778759002685547, 0.8263626098632812, 1.1842117309570312, 4.540618896484375, 0.1050262451171875, 0.29496002197265625, 5.372837066650391, 2.1925811767578125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000078.npy"}
{"epoch": 0.11791383219954649, "step": 79, "batch_size": 64, "mean": 1.4859102964401245, "std": 2.6758534908294678, "min": -3.40045166015625, "p10": -0.8148685455322265, "median": 0.902379035949707, "p90": 4.2097076416015655, "max": 11.02801513671875, "pos_frac": 0.71875, "sample": [-0.15441322326660156, 0.1464996337890625, -1.1993675231933594, 2.0040512084960938, -0.6763839721679688, 1.6434898376464844, -3.40045166015625, -0.2465496063232422, 0.43802642822265625, 0.09174346923828125, -0.6193313598632812, 0.8704910278320312, 1.9153709411621094, 9.919769287109375, -2.6848526000976562, 0.652496337890625, 3.3195953369140625, 0.55206298828125, -0.7436180114746094, -0.6451835632324219, 3.499176025390625, 9.379806518554688, 3.4698638916015625, 4.51422119140625, 1.7008247375488281, -1.7389411926269531, 5.5922393798828125, 1.9808578491210938, 1.9817733764648438, 3.20184326171875, -2.1038436889648438, -0.24364089965820312, 3.1864471435546875, 2.24810791015625, 2.4214229583740234, 0.4168434143066406, 1.3875923156738281, 2.167682647705078, -0.7564239501953125, 3.4609756469726562, -0.8399162292480469, 3.1723403930664062, 0.940948486328125, -0.4004192352294922, 0.6276721954345703, 1.0540294647216797, 3.1912841796875, 0.4253559112548828, 0.9342670440673828, 0.21095657348632812, 1.2261638641357422, -0.8589630126953125, 6.3187408447265625, 0.47540283203125, 0.7515678405761719, 11.02801513671875, 0.40392303466796875, 1.5302810668945312, 0.28148651123046875, 2.446582794189453, -0.32813262939453125, 1.0425071716308594, 4.899517059326172, -0.38562774658203125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000079.npy"}
{"epoch": 0.11942554799697656, "step": 80, "batch_size": 64, "mean": 0.6415961980819702, "std": 2.7611172199249268, "min": -6.234771728515625, "p10": -2.1550689697265626, "median": 0.3012685775756836, "p90": 4.863739013671875, "max": 8.941802978515625, "pos_frac": 0.546875, "sample": [-0.7057476043701172, 3.7372055053710938, 0.6885147094726562, 2.1614761352539062, 1.25018310546875, 0.131622314453125, -0.9764938354492188, -0.2795867919921875, -2.2051849365234375, -2.0381317138671875, 2.1754932403564453, 5.485984802246094, -1.2138595581054688, -1.7632865905761719, -0.2642955780029297, 4.898170471191406, -1.661834716796875, -0.6161117553710938, 1.1872062683105469, -0.025266647338867188, 0.4611091613769531, -2.793731689453125, 0.23988723754882812, 2.14666748046875, 4.8119354248046875, 3.3721065521240234, 3.8385848999023438, -0.38053131103515625, 5.3311920166015625, 6.0112762451171875, -2.459115982055664, 2.3181304931640625, 0.3404083251953125, 0.4213123321533203, 1.6306304931640625, 2.9201583862304688, -0.4237957000732422, -0.5365447998046875, 5.699211120605469, -6.234771728515625, 4.8859405517578125, -0.09926605224609375, -5.358795166015625, 2.098682403564453, -0.4093360900878906, 1.1604270935058594, 1.023895263671875, 0.5515766143798828, -2.8966827392578125, -1.3692378997802734, 2.963165283203125, 0.2621288299560547, -0.561004638671875, -1.1702194213867188, -0.7742843627929688, -0.34386634826660156, 0.9036483764648438, -1.9413318634033203, 1.706735610961914, -4.684932708740234, 0.598785400390625, -1.5212860107421875, 8.941802978515625, 0.415435791015625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000080.npy"}
{"epoch": 0.12093726379440665, "step": 81, "batch_size": 64, "mean": 0.9365310072898865, "std": 2.5791542530059814, "min": -4.465690612792969, "p10": -1.7701393127441405, "median": 0.7796897888183594, "p90": 3.4641338348388677, "max": 10.344253540039062, "pos_frac": 0.65625, "sample": [1.2030296325683594, 2.609630584716797, 1.5999374389648438, -1.7308273315429688, 0.8370399475097656, -4.3469085693359375, -0.2333049774169922, 0.6136703491210938, -0.5041904449462891, -0.7259063720703125, 7.633514404296875, -0.61663818359375, 1.4014053344726562, 3.0822677612304688, 0.7223396301269531, -1.1746902465820312, 0.2982826232910156, 0.14988327026367188, 5.5603790283203125, 2.1644725799560547, 1.2371768951416016, -0.7179069519042969, 1.6910781860351562, 3.374603271484375, 3.084991455078125, 1.2241058349609375, 0.6173095703125, -1.4042434692382812, -0.34763526916503906, -1.265716552734375, 1.1063385009765625, -3.7523956298828125, 1.7357711791992188, 0.012054443359375, -1.9827423095703125, 5.141448974609375, 0.1497650146484375, 10.344253540039062, 2.113067626953125, 6.488555908203125, -1.7869873046875, 1.1724472045898438, 0.2556304931640625, 0.45690155029296875, 1.0137405395507812, -0.027856826782226562, 1.853912353515625, 1.2532119750976562, 3.3868637084960938, 2.4017333984375, 3.4972496032714844, -2.5499343872070312, 3.1364803314208984, -0.43238067626953125, -0.4932384490966797, 1.227630615234375, -2.2653560638427734, 0.39816856384277344, 1.5927581787109375, 3.8013763427734375, -0.9378013610839844, 1.4640312194824219, -4.465690612792969, -1.408172607421875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000081.npy"}
{"epoch": 0.12244897959183673, "step": 82, "batch_size": 64, "mean": 1.2400058507919312, "std": 2.1601433753967285, "min": -6.5361785888671875, "p10": -0.9567022323608397, "median": 1.048914909362793, "p90": 3.946786117553713, "max": 7.343414306640625, "pos_frac": 0.78125, "sample": [0.7540035247802734, 1.4867172241210938, 0.8385772705078125, -1.2432746887207031, 2.472900390625, 0.1307830810546875, 1.1690177917480469, 2.8107376098632812, 4.9694366455078125, 1.2717666625976562, 0.8860759735107422, 0.6582794189453125, 2.942596435546875, 2.8219985961914062, 0.3681793212890625, 1.4259719848632812, 1.399139404296875, 0.4499664306640625, 1.2559070587158203, 0.99188232421875, 0.6856231689453125, -0.8070526123046875, 1.105947494506836, -0.280303955078125, -0.29596710205078125, 5.129570007324219, 1.576528549194336, 0.4509429931640625, 2.086759567260742, 1.2772502899169922, 0.0598602294921875, 0.09034538269042969, 7.343414306640625, 0.27329254150390625, 2.905670166015625, 6.0795135498046875, 3.3915939331054688, 2.8619537353515625, 1.793060302734375, 4.156887054443359, 0.2633514404296875, -2.373798370361328, 3.4565505981445312, -1.7452278137207031, 1.849344253540039, 0.36586761474609375, 2.886503219604492, -0.33589744567871094, 5.267646789550781, -0.541656494140625, 0.4355621337890625, -0.02193450927734375, 1.6388702392578125, 4.815177917480469, -0.153900146484375, -1.0208377838134766, 2.5206947326660156, 1.5472908020019531, 1.5046863555908203, 0.7795333862304688, -2.1138534545898438, -6.5361785888671875, -1.2779045104980469, 0.4049339294433594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000082.npy"}
{"epoch": 0.12396069538926682, "step": 83, "batch_size": 64, "mean": 1.398418664932251, "std": 2.400785207748413, "min": -3.9293689727783203, "p10": -1.4127199172973628, "median": 1.2600975036621094, "p90": 4.181716918945315, "max": 10.38543701171875, "pos_frac": 0.78125, "sample": [2.9012298583984375, 1.6890716552734375, -1.6234359741210938, 0.6992473602294922, 0.08402442932128906, 0.091888427734375, 2.6722946166992188, 2.018169403076172, 2.8768081665039062, 6.844482421875, 1.0618515014648438, 1.8887882232666016, -2.9798316955566406, 1.1874275207519531, 2.143953323364258, 0.8745155334472656, 3.4955596923828125, 3.1489410400390625, 3.0163230895996094, 1.3643035888671875, 2.1343612670898438, -1.6373291015625, 5.0572967529296875, 0.2860889434814453, -1.7053451538085938, 2.9654083251953125, 5.7468414306640625, 0.7797622680664062, 0.48898887634277344, -0.06897544860839844, -2.9690704345703125, 0.8781356811523438, 2.8081398010253906, -3.9293689727783203, -0.9210491180419922, 3.0847930908203125, 1.4931564331054688, 1.4590072631835938, 1.8166656494140625, -3.6610488891601562, 0.5488090515136719, 0.47725677490234375, -0.15710830688476562, 0.4534435272216797, -0.32205772399902344, -0.5379886627197266, 0.027362823486328125, 3.0834197998046875, 5.671600341796875, 1.88641357421875, 10.38543701171875, 1.5456562042236328, 4.4757843017578125, 1.3327674865722656, -0.1922607421875, 1.9956245422363281, 1.1872615814208984, 2.2458972930908203, 2.1721572875976562, 4.615478515625, 0.833953857421875, 0.5846023559570312, 0.5167655944824219, -0.8935546875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000083.npy"}
{"epoch": 0.1254724111866969, "step": 84, "batch_size": 64, "mean": 1.1916708946228027, "std": 2.818288803100586, "min": -4.93988037109375, "p10": -2.5257030487060543, "median": 0.9202251434326172, "p90": 5.2029464721679695, "max": 9.404403686523438, "pos_frac": 0.71875, "sample": [1.861541748046875, 2.3002586364746094, 0.8348865509033203, 1.8105812072753906, 6.8567047119140625, 2.390514373779297, 5.2900543212890625, -0.34148406982421875, -0.8542213439941406, -4.93988037109375, 0.4481201171875, 2.4503707885742188, 4.165660858154297, 0.59478759765625, -0.150848388671875, 0.4236946105957031, 0.9352607727050781, -2.1886825561523438, 5.354927062988281, -1.2625541687011719, 1.4234161376953125, 2.7688827514648438, 1.525320053100586, 3.01495361328125, 0.7948188781738281, 2.202484130859375, -2.302875518798828, 4.99969482421875, 0.354278564453125, 2.0029983520507812, -0.01336669921875, 0.7231292724609375, -0.3488006591796875, 9.404403686523438, -3.1914520263671875, 0.7162055969238281, 1.538370132446289, 2.485109329223633, 2.005840301513672, -3.2269287109375, -4.733367919921875, 7.2295074462890625, 2.3281021118164062, 0.2709312438964844, 6.453666687011719, 1.2524433135986328, 0.2637901306152344, 2.9536380767822266, 1.848785400390625, -3.8979415893554688, 0.6542205810546875, -1.2493133544921875, 2.258970260620117, 2.497039794921875, 2.190357208251953, -2.6212005615234375, 6.1499786376953125, 0.3381462097167969, 0.38214111328125, -2.6473388671875, 3.2494277954101562, 0.9051895141601562, -1.9954338073730469, -0.6709747314453125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000084.npy"}
{"epoch": 0.12698412698412698, "step": 85, "batch_size": 64, "mean": 1.6296786069869995, "std": 2.979201316833496, "min": -5.405784606933594, "p10": -2.2994750976562495, "median": 1.6109466552734375, "p90": 4.868304443359376, "max": 10.178955078125, "pos_frac": 0.75, "sample": [2.7290992736816406, -2.837024688720703, 1.4697704315185547, 3.0125961303710938, 1.2390213012695312, 0.7992706298828125, 2.5689010620117188, 0.9196434020996094, 0.7532176971435547, 4.023242950439453, 3.9622344970703125, 4.0765228271484375, 2.1960601806640625, -2.4342041015625, 1.177865982055664, 2.467449188232422, 1.8191719055175781, 4.5928802490234375, 3.8575363159179688, 2.7445068359375, -3.871429443359375, 3.3009109497070312, -4.1424713134765625, 3.2803192138671875, 1.9568023681640625, 5.639801025390625, -0.1371917724609375, 1.248992919921875, 6.504550933837891, -0.19878005981445312, -1.4348335266113281, 2.1077117919921875, 3.483367919921875, 3.9404144287109375, 1.326009750366211, 1.1277542114257812, 4.9863433837890625, 3.0014572143554688, 0.15036773681640625, -0.010349273681640625, -5.405784606933594, 1.6138458251953125, -1.985107421875, 2.732616424560547, 10.178955078125, 3.352895736694336, 7.188018798828125, -0.09899520874023438, 1.376556396484375, 1.1898384094238281, 1.7966232299804688, -3.8361282348632812, 0.8065338134765625, -1.3093338012695312, 0.9017982482910156, 1.6080474853515625, -3.0803756713867188, -1.5912399291992188, 6.565757751464844, 1.7721366882324219, 8.928924560546875, -1.940399169921875, 0.0032501220703125, 2.1334800720214844], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000085.npy"}
{"epoch": 0.12849584278155707, "step": 86, "batch_size": 64, "mean": 1.182692289352417, "std": 3.049222230911255, "min": -6.262031555175781, "p10": -2.5409679412841792, "median": 1.1709880828857422, "p90": 5.615957450866703, "max": 9.344879150390625, "pos_frac": 0.609375, "sample": [9.344879150390625, -1.08447265625, -2.7211761474609375, -0.26970672607421875, 2.0970077514648438, -0.42072105407714844, -0.14202880859375, 1.1709365844726562, -6.262031555175781, 2.1342220306396484, -0.5371685028076172, -2.3152618408203125, 6.0418548583984375, 6.213233947753906, -1.196990966796875, 3.793121337890625, 1.9846439361572266, -0.6545753479003906, -5.589080810546875, 1.259521484375, 7.22027587890625, -0.735443115234375, -0.3423919677734375, 3.528226852416992, 6.2021636962890625, 0.010076522827148438, 2.86236572265625, 1.1710395812988281, 4.1782379150390625, 0.770050048828125, 4.366100311279297, 6.0008697509765625, 2.8503875732421875, -4.44366455078125, 4.5997467041015625, 2.3340377807617188, -1.6437835693359375, 1.7167797088623047, -2.7896995544433594, 0.7234268188476562, -2.6376991271972656, 3.853015899658203, 1.7832107543945312, -1.3270378112792969, 2.2178726196289062, -0.7550621032714844, 4.717828750610352, -1.8870086669921875, 1.9637069702148438, -0.7728824615478516, -0.559906005859375, 6.0633544921875, 0.08185577392578125, -0.4149646759033203, 2.2380943298339844, 3.6305313110351562, -3.0567684173583984, 3.2033462524414062, 2.504497528076172, -0.7829933166503906, 1.5080757141113281, 0.31134986877441406, 0.6399803161621094, 1.7448959350585938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000086.npy"}
{"epoch": 0.13000755857898716, "step": 87, "batch_size": 64, "mean": 1.5732887983322144, "std": 4.0023322105407715, "min": -11.982025146484375, "p10": -2.2844467163085938, "median": 1.0890827178955078, "p90": 6.017824554443362, "max": 14.625198364257812, "pos_frac": 0.671875, "sample": [1.2338218688964844, 4.487518310546875, 9.557411193847656, -0.3738861083984375, 1.2172775268554688, -0.36412811279296875, 14.625198364257812, -1.0149154663085938, 4.660247802734375, 3.266141891479492, -2.9369544982910156, -1.5354337692260742, 2.6209468841552734, -3.2701187133789062, -11.982025146484375, 7.678245544433594, -4.5570068359375, 6.3215484619140625, 1.3214607238769531, 1.3650398254394531, -2.337799072265625, -1.8664932250976562, 2.083524703979492, -2.1599578857421875, 4.715053558349609, -4.3791351318359375, 0.3994255065917969, -0.11548423767089844, 1.4809951782226562, -1.479288101196289, -0.9988212585449219, 0.4634056091308594, 4.889766693115234, -2.5966110229492188, 2.12518310546875, -1.5977783203125, 0.87445068359375, 2.8961124420166016, 3.780984878540039, 0.6137599945068359, 0.9608879089355469, 4.234771728515625, 0.4324760437011719, -2.1111373901367188, 2.06390380859375, 2.823131561279297, 0.6334056854248047, 3.552825927734375, -1.464447021484375, 4.6538543701171875, 5.309135437011719, 1.7675933837890625, 9.622039794921875, -0.20832061767578125, 0.23154830932617188, 3.4579086303710938, 0.8657016754150391, -0.09264755249023438, 10.723236083984375, 7.967384338378906, 2.4171142578125, 0.9165611267089844, 0.68359375, 2.138275146484375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000087.npy"}
{"epoch": 0.13151927437641722, "step": 88, "batch_size": 64, "mean": 1.0533528327941895, "std": 4.667135238647461, "min": -12.389678955078125, "p10": -3.0415037155151365, "median": 0.5708589553833008, "p90": 6.903244781494143, "max": 18.879837036132812, "pos_frac": 0.609375, "sample": [9.609298706054688, 0.7738208770751953, -2.962160110473633, -3.0755081176757812, 8.301345825195312, 5.3890838623046875, 7.969963073730469, 0.5781364440917969, 0.16896820068359375, -0.13283157348632812, 4.762920379638672, 6.077880859375, 2.056577682495117, 2.0748558044433594, -1.1583061218261719, 2.0800094604492188, 6.355979919433594, -2.489105224609375, -1.9384613037109375, 1.3884048461914062, 3.1092166900634766, 0.8152332305908203, 0.3902130126953125, -0.3209857940673828, -2.393901824951172, 1.7058181762695312, -0.6487503051757812, 18.879837036132812, -1.8344993591308594, -4.007598876953125, -12.212669372558594, -4.8145294189453125, 7.137786865234375, -0.4246978759765625, 0.9753608703613281, 1.1582374572753906, -3.3548736572265625, 2.9057464599609375, -0.0706939697265625, 1.5573863983154297, -0.13709449768066406, 3.6825332641601562, 2.400423049926758, 0.9345359802246094, -3.2664794921875, 0.048828125, 0.3851890563964844, 0.5635814666748047, -1.8411369323730469, -0.38805389404296875, 2.8118343353271484, 1.3554458618164062, 9.5234375, 8.345428466796875, 5.090751647949219, -2.961423873901367, 2.0587005615234375, 0.2175140380859375, 0.3669281005859375, 3.0159034729003906, -1.7260360717773438, -12.389678955078125, -2.3411407470703125, -2.7179203033447266], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000088.npy"}
{"epoch": 0.1330309901738473, "step": 89, "batch_size": 64, "mean": 2.0436043739318848, "std": 3.78635835647583, "min": -6.1244049072265625, "p10": -1.3874830245971677, "median": 1.7210969924926758, "p90": 6.167555618286133, "max": 16.861724853515625, "pos_frac": 0.765625, "sample": [0.34511756896972656, 3.329303741455078, -0.3849620819091797, 1.8143310546875, 4.20068359375, 4.419528961181641, -1.4927520751953125, 0.43117523193359375, 2.4665584564208984, 5.1818084716796875, 2.4344024658203125, 6.23785400390625, 3.2155914306640625, -0.628875732421875, 0.5336227416992188, 0.39247703552246094, -1.5274772644042969, 2.151458740234375, 0.15699386596679688, -3.415027618408203, 4.599395751953125, -1.141855239868164, 0.17925262451171875, 5.093841552734375, -1.1312103271484375, 16.861724853515625, 3.4805831909179688, 2.147541046142578, -0.48215675354003906, 8.131988525390625, 0.3050537109375, 2.6305389404296875, 1.0468063354492188, 0.6626129150390625, 6.1151885986328125, 0.3118152618408203, 2.8203277587890625, 1.1442832946777344, 4.8868408203125, 5.2081451416015625, 3.016988754272461, 3.1922988891601562, 9.589607238769531, 10.398366928100586, 1.5377483367919922, -0.11058998107910156, 0.5918464660644531, 4.3005218505859375, 6.189998626708984, -5.8454437255859375, -0.2013092041015625, 3.8636302947998047, 0.018505096435546875, 0.3415565490722656, 0.21208763122558594, -5.3109893798828125, 2.2828311920166016, -0.5516166687011719, -4.28314208984375, -6.1244049072265625, 8.039291381835938, 2.125701904296875, 1.6278629302978516, 3.156801223754883], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000089.npy"}
{"epoch": 0.1345427059712774, "step": 90, "batch_size": 64, "mean": 1.7633610963821411, "std": 3.9035494327545166, "min": -7.394378662109375, "p10": -3.2669288635253904, "median": 1.477980613708496, "p90": 6.79005279541016, "max": 11.310691833496094, "pos_frac": 0.671875, "sample": [-0.9910564422607422, 3.504148483276367, 1.6895313262939453, 3.2052574157714844, 4.898336410522461, -0.03487968444824219, 4.674835205078125, 11.310691833496094, -7.394378662109375, -4.44696044921875, 4.648017883300781, 5.174625396728516, 4.1318817138671875, 0.8712959289550781, 3.7814407348632812, -1.7369117736816406, 1.798248291015625, 7.7684173583984375, -2.737895965576172, 1.2665061950683594, 5.817939758300781, -4.1304168701171875, 10.877471923828125, 5.078865051269531, 9.118453979492188, 0.6956634521484375, 3.0914382934570312, 4.0692138671875, -0.11532402038574219, 0.780975341796875, -0.39017486572265625, -1.05523681640625, 2.6027584075927734, 11.22900390625, 1.1763648986816406, 7.206672668457031, -4.139183044433594, -0.075714111328125, 8.410697937011719, 1.8310298919677734, -2.2618789672851562, 2.2241973876953125, 1.6902313232421875, -3.120452880859375, 5.310630798339844, 0.3573627471923828, 0.9970741271972656, -0.48850250244140625, 0.21317672729492188, -0.4289588928222656, -4.610820770263672, 1.8005294799804688, 2.7690658569335938, 0.1715545654296875, -0.7232856750488281, 1.5312480926513672, 1.424713134765625, -4.036825180053711, -3.3297042846679688, 3.43890380859375, 3.49664306640625, 4.11767578125, -1.1715564727783203, 0.022441864013671875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000090.npy"}
{"epoch": 0.1360544217687075, "step": 91, "batch_size": 64, "mean": 1.3786134719848633, "std": 4.4945197105407715, "min": -8.399658203125, "p10": -3.8276124954223634, "median": 0.7130641937255859, "p90": 6.024163055419923, "max": 21.25799560546875, "pos_frac": 0.609375, "sample": [5.33447265625, 5.818870544433594, 0.8866996765136719, -2.6937122344970703, 0.2417755126953125, 9.691917419433594, 0.5360832214355469, 2.2061996459960938, 2.8406410217285156, 9.334068298339844, 21.25799560546875, -1.611297607421875, 0.7233467102050781, -2.4398956298828125, 2.45623779296875, -2.5591869354248047, -1.2020111083984375, -1.4761085510253906, 4.563678741455078, -4.1822967529296875, 1.845916748046875, 2.9191665649414062, 1.1744461059570312, 0.2274799346923828, 0.7027816772460938, 2.4434146881103516, 4.581336975097656, -3.8144893646240234, -6.389068603515625, 5.839256286621094, 0.6327438354492188, -2.716663360595703, -4.4674530029296875, 2.1286449432373047, 7.227109909057617, -2.211811065673828, -4.252349853515625, 4.0005645751953125, -1.2839984893798828, -8.399658203125, 3.6180343627929688, 3.8712921142578125, -4.1600341796875, -0.995758056640625, -0.05151557922363281, -0.048046112060546875, 0.8877029418945312, 6.128513336181641, 3.221771240234375, 1.9408721923828125, -0.1268310546875, -2.2253246307373047, -0.016107559204101562, -0.13727378845214844, 6.1034088134765625, 2.970457077026367, 9.931182861328125, 0.07222938537597656, 5.509197235107422, 3.0967063903808594, -3.8332366943359375, -0.03848838806152344, 2.2318191528320312, 0.36583900451660156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000091.npy"}
{"epoch": 0.13756613756613756, "step": 92, "batch_size": 64, "mean": 2.291909694671631, "std": 4.908112049102783, "min": -7.565925598144531, "p10": -2.2236900329589844, "median": 1.3523759841918945, "p90": 8.14270477294922, "max": 18.262863159179688, "pos_frac": 0.640625, "sample": [4.259620666503906, -0.8575077056884766, -3.717987060546875, -0.11948204040527344, -1.3496627807617188, -2.2474822998046875, 0.27030181884765625, 0.5141639709472656, 3.3746871948242188, -1.6762714385986328, -2.0189208984375, 1.4675273895263672, 3.973339080810547, -3.0229949951171875, 6.514871597290039, 2.4262638092041016, 3.2289772033691406, 0.15706634521484375, 7.605829238891602, 1.9118423461914062, 5.04632568359375, -0.8417892456054688, 1.4008159637451172, 0.672637939453125, 6.807342529296875, -0.13188552856445312, -6.343971252441406, 2.3350982666015625, -7.565925598144531, 5.4748992919921875, -2.1681747436523438, 3.0826950073242188, 10.962699890136719, 1.3039360046386719, 5.639984130859375, 1.805328369140625, 0.42140960693359375, 6.2247314453125, -5.6257781982421875, 6.088249206542969, -0.2314605712890625, -0.5741500854492188, 9.169395446777344, 3.4131622314453125, 2.39801025390625, -1.4132308959960938, 5.865135192871094, 9.1856689453125, 3.7376556396484375, -1.193756103515625, -0.707183837890625, 14.78955078125, 0.22640228271484375, -0.07120323181152344, 8.0643310546875, -6.795131683349609, 18.262863159179688, 8.085624694824219, 0.4125194549560547, 12.18145751953125, 8.167167663574219, 0.6783828735351562, -1.1069412231445312, -1.1448631286621094], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000092.npy"}
{"epoch": 0.13907785336356765, "step": 93, "batch_size": 64, "mean": 1.4382685422897339, "std": 5.4266767501831055, "min": -9.681106567382812, "p10": -4.074275779724121, "median": 0.674748420715332, "p90": 7.761073303222657, "max": 24.472076416015625, "pos_frac": 0.546875, "sample": [0.7287139892578125, 4.419075012207031, 10.178497314453125, 5.6563720703125, -2.54522705078125, 7.8481903076171875, -5.651237487792969, -0.9639968872070312, -0.4375324249267578, 2.5786514282226562, -1.1287288665771484, 2.1049461364746094, 24.472076416015625, 4.1705780029296875, -1.2007923126220703, -0.12265396118164062, -0.8359146118164062, 4.265094757080078, -0.8523941040039062, -7.823204040527344, 3.5763626098632812, 13.322158813476562, -2.9858245849609375, -0.420928955078125, 4.885406494140625, -0.5281257629394531, 7.289009094238281, 0.39936065673828125, -1.9647846221923828, 2.9988784790039062, -0.8948211669921875, 0.05041694641113281, -1.9159297943115234, 8.06719970703125, -2.8881454467773438, -2.9085311889648438, 0.6539459228515625, -0.0689544677734375, -9.681106567382812, -0.8533554077148438, 1.0523185729980469, -0.15642166137695312, 2.3120155334472656, 1.6537971496582031, 2.6999282836914062, -5.959815979003906, 0.6955509185791016, 1.1506614685058594, 8.140243530273438, 1.7020111083984375, 4.764596939086914, 1.2507152557373047, 7.144187927246094, 3.3320999145507812, -3.4079360961914062, -3.528299331665039, 7.55780029296875, -8.680313110351562, -4.308265686035156, 10.597801208496094, 5.44940185546875, -5.700927734375, -1.00799560546875, 4.30328369140625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000093.npy"}
{"epoch": 0.14058956916099774, "step": 94, "batch_size": 64, "mean": 2.9438118934631348, "std": 7.05675745010376, "min": -18.21685791015625, "p10": -2.250828170776367, "median": 2.350039482116699, "p90": 10.133105659484864, "max": 28.98077392578125, "pos_frac": 0.734375, "sample": [5.2671966552734375, 0.3174285888671875, 0.8554611206054688, 3.5108776092529297, 2.0237579345703125, -13.549026489257812, 0.5248565673828125, 0.6317901611328125, 1.418212890625, 3.38330078125, 1.0264015197753906, 7.027839660644531, 2.731170654296875, -10.483278274536133, 2.9314136505126953, 8.891403198242188, -1.0051155090332031, -0.5292835235595703, 10.261566162109375, 4.309814453125, 2.8257312774658203, 3.1136245727539062, 4.8189697265625, 2.735248565673828, 2.9911117553710938, 1.192190170288086, 0.016000747680664062, 1.9299087524414062, -4.182380676269531, 6.239654541015625, 4.801939010620117, -2.3561973571777344, -0.09628868103027344, -1.314239501953125, 3.9858360290527344, 5.528858184814453, 6.5449066162109375, 4.252838134765625, 5.581272125244141, -4.9760589599609375, 9.833364486694336, 10.7349853515625, -0.6556739807128906, 28.98077392578125, 6.147136688232422, 6.06561279296875, -1.1823348999023438, 18.148040771484375, 12.8853759765625, 2.7802047729492188, 24.068954467773438, 1.129770278930664, -1.1876602172851562, -0.8389225006103516, 2.676321029663086, 0.8330078125, -18.21685791015625, 1.8443431854248047, 15.230361938476562, 1.5684528350830078, 0.3696460723876953, -2.0049667358398438, -3.0284957885742188, -0.9561786651611328], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000094.npy"}
{"epoch": 0.1421012849584278, "step": 95, "batch_size": 64, "mean": 2.2913341522216797, "std": 5.985300064086914, "min": -11.440948486328125, "p10": -4.02734031677246, "median": 1.5944557189941406, "p90": 10.810770416259766, "max": 15.590011596679688, "pos_frac": 0.65625, "sample": [8.242431640625, -4.099613189697266, 0.9654541015625, -0.9946784973144531, -6.273162841796875, -0.25945472717285156, 13.495834350585938, 1.437448501586914, 4.165283203125, 0.502197265625, -2.275440216064453, 1.7014312744140625, 1.449554443359375, -1.4951896667480469, -0.23595428466796875, 2.8806076049804688, -10.696121215820312, -3.85870361328125, -1.3128318786621094, 1.7880058288574219, 15.590011596679688, 10.567024230957031, 0.20751190185546875, -0.47928619384765625, -9.7200927734375, 0.2705841064453125, 12.2491455078125, 12.132156372070312, -6.704803466796875, -11.440948486328125, 3.7805938720703125, 12.272262573242188, 2.3740463256835938, 3.5783233642578125, -0.40555381774902344, 1.7006378173828125, 9.310508728027344, -3.4639205932617188, -6.805992126464844, 4.4507293701171875, 6.1708984375, 0.17911148071289062, 10.819374084472656, 4.094755172729492, -3.73199462890625, 7.431678771972656, 8.347320556640625, 0.2253131866455078, 3.1530723571777344, 8.257637023925781, -2.7516937255859375, 7.669769287109375, -0.5209007263183594, 5.775177001953125, 8.724685668945312, 11.536827087402344, -2.924285888671875, 1.4882736206054688, 2.287750244140625, 5.2632293701171875, 0.6540699005126953, 2.0950794219970703, 10.790695190429688, -2.9804954528808594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000095.npy"}
{"epoch": 0.1436130007558579, "step": 96, "batch_size": 64, "mean": 2.7446060180664062, "std": 8.178009033203125, "min": -17.014801025390625, "p10": -4.9609832763671875, "median": 1.1171951293945312, "p90": 13.639452362060549, "max": 29.430938720703125, "pos_frac": 0.640625, "sample": [3.6373939514160156, -1.56683349609375, 13.860000610351562, -2.933807373046875, 29.430938720703125, -17.014801025390625, 6.428436279296875, 2.0617904663085938, 3.8473129272460938, 1.2522964477539062, 9.201019287109375, -0.5474433898925781, 2.9119720458984375, 16.47125244140625, 5.457557678222656, 0.9820938110351562, -9.976310729980469, -0.4107208251953125, 0.5084438323974609, -0.4431037902832031, 0.9094505310058594, 1.4421005249023438, 13.124839782714844, -7.788421630859375, 0.58428955078125, 0.6996898651123047, 7.052940368652344, -1.7717819213867188, -4.093498229980469, -0.118560791015625, 1.9694595336914062, 14.241867065429688, 25.101898193359375, 9.963081359863281, -1.689199447631836, -5.116668701171875, -13.125198364257812, 0.6786899566650391, -6.3521728515625, 1.5135383605957031, 4.3741455078125, 3.3435707092285156, 2.480175018310547, -0.19139480590820312, 22.104598999023438, -4.4826812744140625, 4.92207145690918, 0.5489501953125, -0.14852142333984375, -0.0382080078125, 5.155220031738281, -2.6045570373535156, 0.0782012939453125, 10.098808288574219, 17.597412109375, 4.148872375488281, -2.7470321655273438, 0.49123573303222656, 3.3701744079589844, 5.257026672363281, 11.755683898925781, 3.9564781188964844, -4.59771728515625, -9.601547241210938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000096.npy"}
{"epoch": 0.14512471655328799, "step": 97, "batch_size": 64, "mean": 1.3751589059829712, "std": 9.259732246398926, "min": -38.4085693359375, "p10": -5.799114990234375, "median": 0.9465389251708984, "p90": 9.545852851867677, "max": 30.006988525390625, "pos_frac": 0.625, "sample": [-18.66290283203125, -6.3882293701171875, 2.8141002655029297, 9.596860885620117, 0.46956825256347656, 22.29376220703125, -5.7468414306640625, -4.029060363769531, -4.120994567871094, -38.4085693359375, -3.807985305786133, -1.7391738891601562, 2.3469390869140625, 4.453071594238281, -6.962623596191406, -15.583992004394531, 0.7466087341308594, 0.07003021240234375, 4.25030517578125, 0.4711627960205078, 1.331064224243164, 5.670814514160156, 3.0573387145996094, -4.7357330322265625, 6.6646575927734375, -1.2425079345703125, -2.9690818786621094, 9.426834106445312, -4.391216278076172, 21.3094482421875, 2.0668182373046875, -4.8484954833984375, 4.143943786621094, 0.2163372039794922, 0.5652179718017578, 8.794158935546875, 11.672050476074219, 11.84377670288086, -1.2378864288330078, 5.4962158203125, 5.003631591796875, -9.767101287841797, 30.006988525390625, 4.098052978515625, 4.685272216796875, -1.0541934967041016, 6.531129837036133, 6.94207763671875, 16.08868408203125, -4.631492614746094, -0.22151947021484375, 3.6972122192382812, 1.1464691162109375, -3.3566322326660156, -5.8215179443359375, 4.468574523925781, 5.396196365356445, -3.157073974609375, 4.604360580444336, 3.6665878295898438, 0.054347991943359375, 0.49971771240234375, 5.206813812255859, -0.9722156524658203], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000097.npy"}
{"epoch": 0.14663643235071808, "step": 98, "batch_size": 64, "mean": 0.5424197912216187, "std": 7.25172233581543, "min": -36.474029541015625, "p10": -5.236370086669921, "median": 1.3501949310302734, "p90": 7.5913539886474615, "max": 12.21563720703125, "pos_frac": 0.65625, "sample": [0.0510101318359375, 2.0977725982666016, 9.032722473144531, 1.28131103515625, 3.1038970947265625, 4.179592132568359, 5.317588806152344, 3.4894142150878906, -3.5189590454101562, 12.059463500976562, -16.542022705078125, -5.437030792236328, -6.1568145751953125, -1.8198051452636719, -7.8378143310546875, 1.8394622802734375, -0.43943023681640625, 1.0160789489746094, 1.2177715301513672, 1.1660003662109375, 6.239154815673828, 8.86355209350586, -4.3995361328125, 10.802276611328125, 0.6061134338378906, -3.5226669311523438, 7.459957122802734, 1.4189567565917969, 2.809171676635742, 3.221729278564453, 2.8780517578125, -0.32959747314453125, 10.169540405273438, 1.586883544921875, -4.750484466552734, 3.1693267822265625, 1.5476875305175781, 1.9231376647949219, 1.159881591796875, -1.4375534057617188, 2.8670196533203125, -0.8451080322265625, -18.719314575195312, 1.9752922058105469, -0.5442123413085938, -5.739141464233398, 2.151430130004883, 1.7600860595703125, 1.28143310546875, 7.647666931152344, -4.768161773681641, 5.44195556640625, -1.7224807739257812, -2.6520233154296875, -36.474029541015625, 4.928131103515625, 3.0496368408203125, 0.8648872375488281, 1.1606941223144531, 12.21563720703125, 7.013233184814453, 6.718955993652344, -2.0677413940429688, -4.344768524169922], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000098.npy"}
{"epoch": 0.14814814814814814, "step": 99, "batch_size": 64, "mean": 0.9639348387718201, "std": 8.282386779785156, "min": -24.273895263671875, "p10": -7.475201416015624, "median": 0.3954429626464844, "p90": 10.098331069946289, "max": 29.859649658203125, "pos_frac": 0.546875, "sample": [13.8885498046875, 10.050113677978516, 10.118995666503906, 19.77452850341797, 0.7729225158691406, -9.565422058105469, -1.1541748046875, -2.0054779052734375, 3.21148681640625, 7.400440216064453, 4.44989013671875, -0.28388214111328125, 1.2057533264160156, 5.265537261962891, -0.7713413238525391, 4.091064453125, 3.0069732666015625, -0.1566162109375, -3.052379608154297, -0.03815650939941406, 10.483131408691406, 0.5135498046875, -13.227027893066406, -4.682861328125, -12.183761596679688, -6.812580108642578, 29.859649658203125, -18.222625732421875, -4.458930969238281, 12.108161926269531, 1.9995346069335938, -6.51959228515625, -1.609893798828125, 0.27733612060546875, 0.14494705200195312, 0.14825439453125, 1.21923828125, -4.059104919433594, -1.0311946868896484, 7.463268280029297, 1.1657848358154297, -24.273895263671875, 8.789112091064453, 15.774948120117188, -1.2694854736328125, 1.9989395141601562, 9.149391174316406, 1.669992446899414, -11.977676391601562, 1.0897655487060547, -2.659076690673828, 5.716756820678711, 2.2431583404541016, -1.3711166381835938, 6.090606689453125, -0.46698760986328125, -0.8735160827636719, 6.918357849121094, 2.353239059448242, -0.5960769653320312, 1.50445556640625, -5.776481628417969, -3.3674850463867188, -7.759181976318359], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000099.npy"}
{"epoch": 0.14965986394557823, "step": 100, "batch_size": 64, "mean": 2.6439366340637207, "std": 11.918777465820312, "min": -30.096817016601562, "p10": -8.109092712402344, "median": 1.3529253005981445, "p90": 13.937831115722666, "max": 48.87908935546875, "pos_frac": 0.65625, "sample": [11.2586669921875, -3.952972412109375, -8.296707153320312, 3.4353485107421875, 28.37591552734375, -1.6282501220703125, 0.8193416595458984, 5.361427307128906, 8.39141845703125, 7.984184265136719, 2.1882171630859375, -0.5975341796875, 6.518089294433594, 26.053810119628906, 0.9597320556640625, 2.7851409912109375, -1.57916259765625, 3.5210533142089844, 1.0093307495117188, 0.9766845703125, 1.2189960479736328, 1.4868545532226562, -9.191558837890625, 0.6218147277832031, -8.948577880859375, 6.8570404052734375, -3.879253387451172, 15.086044311523438, 1.8836174011230469, -29.845611572265625, -1.0454368591308594, 0.7971343994140625, -3.9563140869140625, 4.2855987548828125, 3.0484962463378906, -1.0074996948242188, 9.540756225585938, 1.1069259643554688, -19.80414581298828, 7.385734558105469, 2.61669921875, 33.88484191894531, -8.840019226074219, 1.2152862548828125, -3.556232452392578, -5.233280181884766, 3.1064300537109375, 3.1449508666992188, -2.9025096893310547, -0.7235603332519531, 8.516677856445312, 18.419448852539062, -4.2997589111328125, 4.995092391967773, 3.8336029052734375, 1.1731700897216797, -7.67132568359375, 21.306365966796875, 6.1851654052734375, 2.7779159545898438, 48.87908935546875, -1.64263916015625, -30.096817016601562, 4.8990020751953125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000100.npy"}
{"epoch": 0.15117157974300832, "step": 101, "batch_size": 64, "mean": 2.801596164703369, "std": 8.829099655151367, "min": -26.787261962890625, "p10": -5.243538475036621, "median": 2.86637020111084, "p90": 11.382475280761719, "max": 31.811859130859375, "pos_frac": 0.765625, "sample": [-3.3893470764160156, -1.7191238403320312, 8.26239013671875, 1.806549072265625, 9.129364013671875, 9.612571716308594, 6.1284637451171875, 0.9218502044677734, 1.2747802734375, -1.0732421875, 2.8965225219726562, 7.710052490234375, -26.787261962890625, 14.539836883544922, 5.1761016845703125, 1.4100074768066406, 5.955677032470703, 3.513418197631836, -1.9693069458007812, -4.769723892211914, 7.051532745361328, 2.892608642578125, 11.445083618164062, -5.446601867675781, 6.411785125732422, -6.114738464355469, 5.972820281982422, 5.431373596191406, 1.2995986938476562, 9.869516372680664, 0.0027618408203125, 17.7972412109375, 7.758661270141602, 0.9395980834960938, -0.1142425537109375, -2.9681529998779297, 0.5743942260742188, -12.000701904296875, -3.8629913330078125, 9.392375946044922, -11.385513305664062, 4.820304870605469, 1.559072494506836, 2.4905929565429688, 12.318893432617188, 4.5738525390625, 0.14733123779296875, -20.5947265625, 0.5527019500732422, 1.527994155883789, 11.23638916015625, 0.7034206390380859, 12.310592651367188, 2.8401317596435547, 7.280006408691406, -19.980255126953125, 31.811859130859375, 3.5947418212890625, 0.2223358154296875, 4.202816009521484, 10.766891479492188, 3.2901573181152344, 17.860977172851562, 2.1900882720947266], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000101.npy"}
{"epoch": 0.15268329554043839, "step": 102, "batch_size": 64, "mean": 0.848842978477478, "std": 10.838068962097168, "min": -24.728530883789062, "p10": -10.251435089111329, "median": -0.9191150665283203, "p90": 11.987300872802734, "max": 41.74359130859375, "pos_frac": 0.4375, "sample": [13.967147827148438, -7.5421905517578125, -4.2222747802734375, 0.7308349609375, -13.269195556640625, -3.0268993377685547, -8.871963500976562, 2.9639434814453125, -11.493049621582031, -1.4763832092285156, 1.8039321899414062, 7.118804931640625, 1.7560501098632812, 2.6074676513671875, 12.031280517578125, -10.272933959960938, -1.6278915405273438, 33.02937316894531, -4.860450744628906, 21.933395385742188, -3.827066421508789, 1.8102664947509766, -8.634559631347656, -1.1394577026367188, 3.243753433227539, 3.226125717163086, -0.6418228149414062, 11.661518096923828, -18.152061462402344, 0.38833045959472656, 12.656822204589844, -0.30532073974609375, -13.717765808105469, -2.1328468322753906, -10.201271057128906, -4.7855377197265625, 6.02154541015625, 11.494529724121094, 24.519210815429688, -5.635738372802734, -0.4116973876953125, -7.05072021484375, -24.728530883789062, -0.94049072265625, -2.067432403564453, -0.8977394104003906, -0.95654296875, 11.884681701660156, -1.8567867279052734, -4.0268402099609375, 1.4002971649169922, 8.619163513183594, 5.842521667480469, -7.427751541137695, -1.5213966369628906, -6.412761688232422, -1.5293540954589844, 41.74359130859375, -5.436328887939453, -10.743026733398438, 4.634525299072266, 8.528587341308594, 9.406044006347656, 1.1462821960449219], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000102.npy"}
{"epoch": 0.15419501133786848, "step": 103, "batch_size": 64, "mean": 2.201383590698242, "std": 7.21010684967041, "min": -21.671005249023438, "p10": -5.471182250976562, "median": 1.9615898132324219, "p90": 10.61044216156006, "max": 19.037857055664062, "pos_frac": 0.640625, "sample": [-5.080924987792969, -4.0556182861328125, 16.93359375, -11.710716247558594, -2.6340579986572266, 6.701744079589844, -5.811439514160156, -1.5012397766113281, 0.6813812255859375, 13.022972106933594, -5.638435363769531, 2.4257583618164062, -3.026092529296875, 3.1960601806640625, -3.6867904663085938, 8.480613708496094, 1.94708251953125, -1.6555557250976562, -0.411865234375, 8.465007781982422, 4.630645751953125, 9.927425384521484, 0.3238525390625, 3.671375274658203, 3.1023178100585938, -1.3246917724609375, -1.4840660095214844, 19.037857055664062, -4.375267028808594, 4.5403289794921875, 9.219085693359375, 8.575607299804688, 14.1478271484375, -4.1784820556640625, 2.0260772705078125, 9.90228271484375, -21.671005249023438, 8.549560546875, 1.2124862670898438, 1.2584514617919922, 6.6488037109375, 0.9663829803466797, 0.7928123474121094, 3.6275253295898438, 5.165142059326172, 5.738983154296875, 10.78226089477539, -6.587474822998047, 10.209531784057617, 1.54541015625, 11.00067138671875, 3.0157241821289062, -8.469551086425781, -0.33687591552734375, -13.100387573242188, 1.0280303955078125, -4.958427429199219, -1.6265449523925781, -2.1897220611572266, 1.9760971069335938, 5.835296630859375, 5.372226715087891, 15.146347045898438, 5.573150634765625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000103.npy"}
{"epoch": 0.15570672713529857, "step": 104, "batch_size": 64, "mean": 4.756073951721191, "std": 14.292997360229492, "min": -24.326705932617188, "p10": -9.155218505859375, "median": 2.065199851989746, "p90": 21.74625701904298, "max": 67.12664794921875, "pos_frac": 0.671875, "sample": [25.438140869140625, 9.362823486328125, 8.383735656738281, -5.186407089233398, -8.873588562011719, 6.8502197265625, -5.591270446777344, -0.6828231811523438, 8.822746276855469, 0.001708984375, 1.5846939086914062, -24.326705932617188, 67.12664794921875, 19.039962768554688, -13.632980346679688, -3.307525634765625, 35.522216796875, 4.2403106689453125, 0.0865631103515625, 17.822914123535156, -13.290939331054688, -20.572067260742188, 15.10076904296875, 15.789627075195312, 3.04278564453125, -1.0746707916259766, 1.0599784851074219, -17.803688049316406, 8.80824089050293, -0.4409942626953125, 0.044795989990234375, 1.811056137084961, -0.7880268096923828, 5.608770370483398, 16.24044418334961, 6.102670669555664, 1.4299392700195312, -0.9140377044677734, 0.999847412109375, 5.38983154296875, -0.2221832275390625, -6.515420913696289, 37.6566162109375, 2.3193435668945312, 5.881317138671875, 22.906097412109375, 4.792205810546875, 2.8898868560791016, -17.93999481201172, 2.8854751586914062, 1.5375804901123047, 5.502260208129883, 11.91933822631836, 6.341697692871094, 1.5714874267578125, -9.275917053222656, -0.22500991821289062, -1.580474853515625, 1.1420669555664062, 6.876495361328125, -6.621040344238281, 9.236335754394531, 24.569976806640625, 29.514862060546875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000104.npy"}
{"epoch": 0.15721844293272866, "step": 105, "batch_size": 64, "mean": 4.009280681610107, "std": 12.912360191345215, "min": -21.166671752929688, "p10": -8.849656677246093, "median": 2.3476076126098633, "p90": 19.510626220703127, "max": 51.422637939453125, "pos_frac": 0.671875, "sample": [2.556833267211914, 12.093559265136719, 7.593902587890625, 5.879798889160156, 1.2372398376464844, 8.659337997436523, -19.697250366210938, 19.793304443359375, -3.5715408325195312, 3.9443187713623047, 1.9297103881835938, 4.6344146728515625, -6.9303131103515625, 16.04828643798828, 0.23812484741210938, 12.607734680175781, -2.8978023529052734, -11.689453125, -0.7945365905761719, 3.31341552734375, 7.512081146240234, -4.091819763183594, 2.7783126831054688, -8.809783935546875, 3.1259307861328125, 11.468841552734375, 0.14201736450195312, -19.469192504882812, 4.077491760253906, -14.229446411132812, -3.5978317260742188, 0.9776210784912109, -3.1724090576171875, 5.944923400878906, -21.166671752929688, 51.422637939453125, 1.414255142211914, -0.20672607421875, 6.417366027832031, -0.6119251251220703, 1.1218299865722656, -0.8688163757324219, 2.1383819580078125, 1.7869739532470703, -19.830184936523438, 7.344139099121094, 32.67097473144531, -3.8672332763671875, 20.435226440429688, -5.392768859863281, -8.866744995117188, 3.879932403564453, 2.6345958709716797, 27.799514770507812, 41.08111572265625, -1.2944412231445312, 7.552457809448242, 18.467361450195312, 0.9801845550537109, 3.9745044708251953, 0.99658203125, 20.376014709472656, 9.748565673828125, 18.851043701171875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000105.npy"}
{"epoch": 0.15873015873015872, "step": 106, "batch_size": 64, "mean": 2.401583671569824, "std": 12.214886665344238, "min": -36.44221496582031, "p10": -11.65270614624023, "median": 1.6942329406738281, "p90": 13.70139617919922, "max": 34.67897033691406, "pos_frac": 0.65625, "sample": [-21.976455688476562, 11.626983642578125, 5.990522384643555, 9.63897705078125, -14.953872680664062, -0.11098861694335938, 12.719535827636719, 1.9842891693115234, -0.9993133544921875, 0.17884063720703125, 8.128448486328125, 11.029132843017578, 20.930709838867188, 8.896461486816406, 0.7958469390869141, -6.640846252441406, 1.4721450805664062, 13.60418701171875, 13.227096557617188, 2.103302001953125, -0.5019493103027344, 0.5145187377929688, -22.0938720703125, 20.2816162109375, -2.2298078536987305, 1.4790382385253906, -0.8898296356201172, -8.084030151367188, 1.6423797607421875, 33.593902587890625, -1.1761703491210938, -2.6201324462890625, -4.528026580810547, -9.099082946777344, 9.351814270019531, 6.630393981933594, -1.8358097076416016, -2.028921127319336, 6.5469818115234375, 34.67897033691406, 8.505355834960938, 3.8840560913085938, 4.168031692504883, 1.208566665649414, -12.747116088867188, 1.929840087890625, 3.4269065856933594, 1.0348014831542969, -36.44221496582031, 1.4201316833496094, -28.682891845703125, 1.7460861206054688, 13.743057250976562, 4.424571990966797, 12.172401428222656, 19.859901428222656, 1.7756462097167969, 2.2205944061279297, -1.6011009216308594, 25.845382690429688, -13.258743286132812, 1.1347007751464844, -2.4961509704589844, 3.1525650024414062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000106.npy"}
{"epoch": 0.1602418745275888, "step": 107, "batch_size": 64, "mean": 1.0161256790161133, "std": 15.541536331176758, "min": -50.968994140625, "p10": -14.493194580078123, "median": 3.182086944580078, "p90": 14.000395202636719, "max": 53.12272644042969, "pos_frac": 0.640625, "sample": [2.6136474609375, -9.129829406738281, 6.8831329345703125, -9.109272003173828, 14.069389343261719, 5.9715728759765625, -50.968994140625, 12.950912475585938, 3.579803466796875, 2.685009002685547, -39.82991027832031, -2.9391937255859375, 2.2893810272216797, 6.870506286621094, 8.072551727294922, -6.358909606933594, -11.472175598144531, 11.020683288574219, 11.953441619873047, 4.60772705078125, 0.1231536865234375, 21.172515869140625, 13.839408874511719, 1.3811531066894531, 2.9172592163085938, -19.214019775390625, 5.504615783691406, -4.290290832519531, -2.8224105834960938, 3.7293701171875, 3.4547386169433594, -17.051910400390625, 2.889453887939453, 13.559989929199219, -0.00664520263671875, 12.825714111328125, 7.176261901855469, 4.513671875, 7.348358154296875, 14.983619689941406, 31.825408935546875, 16.09606170654297, -12.639320373535156, -2.1965408325195312, 3.4537582397460938, -2.5884933471679688, -1.0122261047363281, 9.268196105957031, 5.699666976928711, -34.61988830566406, 16.555213928222656, 4.138275146484375, -6.324945449829102, -8.162765502929688, 0.42998313903808594, 2.4885025024414062, 11.274093627929688, 53.12272644042969, 8.09735107421875, -5.624397277832031, -10.246841430664062, -37.95445251464844, 3.4469146728515625, -15.287712097167969], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000107.npy"}
{"epoch": 0.1617535903250189, "step": 108, "batch_size": 64, "mean": 3.5093448162078857, "std": 13.541446685791016, "min": -32.31172180175781, "p10": -10.018377685546874, "median": 1.64208984375, "p90": 18.199002075195313, "max": 47.895660400390625, "pos_frac": 0.5625, "sample": [2.6591110229492188, 10.681266784667969, 25.2010498046875, -10.810722351074219, -0.025299072265625, 9.861900329589844, 10.648300170898438, 3.1700439453125, 2.248199462890625, -12.040916442871094, 9.235115051269531, 4.766048431396484, 7.831207275390625, -1.961669921875, 18.366424560546875, 4.065498352050781, -7.889533996582031, 0.15262985229492188, 1.636199951171875, -2.268451690673828, 14.593841552734375, -1.5214920043945312, -1.6492996215820312, 17.808349609375, -0.3246307373046875, -19.29767608642578, 5.14923095703125, -32.31172180175781, 23.539031982421875, -0.1488189697265625, -27.969818115234375, -2.806428909301758, 13.383087158203125, -1.6325035095214844, 22.946617126464844, 4.239477157592773, 0.40072059631347656, -0.22336769104003906, -9.440826416015625, 10.79205322265625, -9.424674987792969, 16.589340209960938, 7.256107330322266, -1.9443073272705078, 1.647979736328125, 7.419910430908203, -2.846841812133789, 0.7831287384033203, 10.272567749023438, -10.265899658203125, -16.385944366455078, 12.68292236328125, -8.57525634765625, -0.5593338012695312, 25.587631225585938, -3.1414794921875, 5.044281005859375, 47.895660400390625, 2.6125946044921875, -5.19818115234375, -0.22403717041015625, 10.795280456542969, -2.1710433959960938, 45.69544982910156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000108.npy"}
{"epoch": 0.16326530612244897, "step": 109, "batch_size": 64, "mean": 6.621773719787598, "std": 15.593023300170898, "min": -15.92218017578125, "p10": -6.626133728027343, "median": 2.03043270111084, "p90": 22.426654434204107, "max": 62.161285400390625, "pos_frac": 0.640625, "sample": [-3.2359542846679688, 38.11248016357422, 61.76312255859375, 1.3540382385253906, 5.778020858764648, 7.605657577514648, 11.711345672607422, -4.19610595703125, 2.8025245666503906, 8.026878356933594, -3.9456615447998047, 5.063697814941406, -1.4111671447753906, 10.16159439086914, 2.019287109375, -15.92218017578125, 0.28511619567871094, 1.157938003540039, -4.220981597900391, 0.11748695373535156, 0.4936676025390625, 6.036506652832031, -10.573043823242188, -2.189791679382324, -9.436508178710938, 62.161285400390625, 3.8201446533203125, -2.8502864837646484, 12.153264999389648, 51.54608154296875, 8.162002563476562, 0.6075973510742188, 4.413944244384766, -0.31992340087890625, -0.4902019500732422, 4.7810211181640625, -3.388151168823242, 24.499588012695312, -3.98468017578125, 9.041854858398438, -2.0507354736328125, 9.249349594116211, 3.030977249145508, 21.5072021484375, -0.6613330841064453, -5.4061431884765625, -0.8359909057617188, 2.0242786407470703, -15.514091491699219, -8.976577758789062, 2.0365867614746094, 9.662666320800781, 15.833969116210938, 16.833969116210938, 22.82070541381836, 39.93603515625, 19.81824493408203, 1.14642333984375, 13.366523742675781, 7.2214508056640625, -9.882987976074219, -0.18952560424804688, -7.14898681640625, 12.460006713867188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000109.npy"}
{"epoch": 0.16477702191987906, "step": 110, "batch_size": 64, "mean": -1.6375174522399902, "std": 17.23258399963379, "min": -68.52674865722656, "p10": -14.550450134277341, "median": 1.3110466003417969, "p90": 14.604165649414066, "max": 38.3367919921875, "pos_frac": 0.5625, "sample": [4.121196746826172, -1.5575637817382812, -40.692779541015625, 2.4959945678710938, -3.422962188720703, 17.09302520751953, 6.768646240234375, -11.125991821289062, -39.81585693359375, -5.8342437744140625, 9.138053894042969, 5.903232574462891, 3.5469512939453125, 14.977554321289062, 1.5765457153320312, -10.013648986816406, 4.509178161621094, 25.805252075195312, -1.3912525177001953, -2.1878089904785156, 7.361345291137695, -10.569633483886719, 3.371458053588867, 13.732925415039062, 0.4319610595703125, 6.12469482421875, 1.0455474853515625, 12.932918548583984, 0.6439971923828125, -10.45697021484375, -12.567131042480469, 0.6460304260253906, 4.0123291015625, -3.1738853454589844, -15.400444030761719, 5.246124267578125, -11.427230834960938, 17.45538330078125, -4.0460357666015625, 2.3295669555664062, 28.010498046875, 11.8096923828125, -5.779388427734375, -9.111557006835938, -2.8424034118652344, 1.6675491333007812, 10.925029754638672, -21.93444061279297, -39.774444580078125, -50.28070068359375, -7.695892333984375, -2.8534622192382812, -3.5769577026367188, 2.860076904296875, -4.77294921875, 38.3367919921875, 21.182693481445312, 4.499029159545898, 4.108722686767578, 6.0750579833984375, -9.482114791870117, 1.6362686157226562, 3.132049560546875, -68.52674865722656], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000110.npy"}
{"epoch": 0.16628873771730915, "step": 111, "batch_size": 64, "mean": 4.60113525390625, "std": 16.46465492248535, "min": -24.378677368164062, "p10": -11.081395912170409, "median": 0.4691429138183594, "p90": 20.751941680908203, "max": 89.41409301757812, "pos_frac": 0.53125, "sample": [-2.9189453125, -1.6603145599365234, 0.4163665771484375, 17.101776123046875, 2.147296905517578, 22.338104248046875, 0.5219192504882812, 6.323829650878906, 0.3900260925292969, -3.1167850494384766, 11.9937744140625, -11.411094665527344, -6.953212738037109, 4.76324462890625, -21.267440795898438, -3.7822608947753906, -1.7333488464355469, -3.089385986328125, -3.8407516479492188, -1.732809066772461, 33.86460876464844, 18.146621704101562, 31.457664489746094, 22.233219146728516, 41.03450012207031, 8.90176010131836, 11.22967529296875, 11.885589599609375, 1.8068161010742188, -24.378677368164062, -0.6252174377441406, -4.249856948852539, 15.22320556640625, -14.160377502441406, -5.822151184082031, -10.321075439453125, -3.8761978149414062, 5.406547546386719, -6.429389953613281, 8.2767333984375, 20.439170837402344, 19.96764373779297, 20.885986328125, -1.1278228759765625, -11.407247543334961, -2.379697799682617, 5.127737045288086, 15.984882354736328, -2.8554229736328125, -1.1633014678955078, 8.844139099121094, 1.3770484924316406, 4.027853012084961, -10.263906478881836, -9.930023193359375, 1.5369758605957031, -12.786148071289062, 17.462875366210938, -11.811275482177734, -0.18715667724609375, 89.41409301757812, -3.058624267578125, 6.306331634521484, 5.97454833984375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000111.npy"}
{"epoch": 0.16780045351473924, "step": 112, "batch_size": 64, "mean": 6.437560081481934, "std": 17.384408950805664, "min": -40.00506591796875, "p10": -10.597524261474609, "median": 3.322249412536621, "p90": 24.743493652343755, "max": 74.38388061523438, "pos_frac": 0.671875, "sample": [74.38388061523438, 15.893707275390625, 12.494033813476562, 5.752422332763672, 0.3071403503417969, 1.6257476806640625, 39.22773742675781, 4.421287536621094, 10.814308166503906, -0.0167694091796875, 3.9149627685546875, 23.4437255859375, -14.452529907226562, 5.454839706420898, 3.648845672607422, -14.464874267578125, 0.3465576171875, 12.248451232910156, -4.7626495361328125, -4.586891174316406, 7.405281066894531, 0.483123779296875, 0.34835052490234375, 10.357505798339844, 11.713058471679688, 2.572307586669922, 16.97229766845703, -5.20635986328125, 58.085235595703125, 7.760763168334961, 20.026275634765625, -0.9155654907226562, -5.021871566772461, -0.8270778656005859, 13.746158599853516, 1.425750732421875, 4.325471878051758, -21.799476623535156, 12.312065124511719, 8.674076080322266, -4.9185638427734375, -20.64520263671875, 13.250690460205078, 2.9956531524658203, 22.244468688964844, 33.26866149902344, 11.2108154296875, 9.297782897949219, -0.8603878021240234, 1.8715095520019531, -12.241439819335938, -0.0989990234375, -3.1173057556152344, 2.92144775390625, 38.61857604980469, 0.47137451171875, 6.827522277832031, -4.095495223999023, -10.962226867675781, 25.300537109375, -9.746551513671875, -40.00506591796875, -0.7898826599121094, 33.074623107910156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000112.npy"}
{"epoch": 0.1693121693121693, "step": 113, "batch_size": 64, "mean": 2.223445177078247, "std": 15.374906539916992, "min": -45.92724609375, "p10": -13.419877243041991, "median": 2.229304313659668, "p90": 18.944449996948247, "max": 64.21365356445312, "pos_frac": 0.625, "sample": [-9.290641784667969, -14.238883972167969, -2.8309478759765625, 2.8622913360595703, 1.5623207092285156, 7.298358917236328, 6.075469970703125, -26.016883850097656, -5.167877197265625, 2.4075984954833984, 4.080745697021484, -6.646694183349609, -11.565452575683594, -0.7934188842773438, -6.336879730224609, -12.010528564453125, 2.261302947998047, 14.31558609008789, 10.250022888183594, -4.973608016967773, -14.990753173828125, 5.102874755859375, 19.351398468017578, -12.514999389648438, 3.298879623413086, 6.551948547363281, 6.842266082763672, 5.622486114501953, 22.742286682128906, -1.9182319641113281, 3.640840530395508, 64.21365356445312, 2.5842647552490234, 1.4736137390136719, 9.180381774902344, 9.962108612060547, -45.92724609375, 3.6627063751220703, 24.84228515625, -7.7024078369140625, 2.197305679321289, 4.34062385559082, -30.83892822265625, 20.18651580810547, -5.532367706298828, 37.056060791015625, 1.7473678588867188, -0.5009040832519531, 0.8283729553222656, 4.043462753295898, 9.010360717773438, -15.491203308105469, 1.849029541015625, 33.77301025390625, 13.662353515625, 1.1782341003417969, 5.3288726806640625, -0.06725120544433594, 17.994903564453125, 0.6075439453125, -4.989891052246094, -2.2699851989746094, -13.807682037353516, 4.734447479248047], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000113.npy"}
{"epoch": 0.1708238851095994, "step": 114, "batch_size": 64, "mean": 3.765486717224121, "std": 15.29601001739502, "min": -24.62799072265625, "p10": -12.157267379760743, "median": 1.5765113830566406, "p90": 27.050014877319363, "max": 48.31352233886719, "pos_frac": 0.578125, "sample": [-23.3526611328125, -20.629592895507812, -0.4304466247558594, 20.33615493774414, 3.378337860107422, 5.627655029296875, 0.2457275390625, 32.843833923339844, 2.7441177368164062, -4.11566162109375, -12.052909851074219, -12.20199203491211, 5.8681793212890625, -15.749710083007812, -4.386283874511719, 9.902946472167969, -0.5779151916503906, -1.9696578979492188, 0.8574008941650391, -8.633926391601562, 1.7556533813476562, -9.350212097167969, 37.397125244140625, -10.338115692138672, 0.6397933959960938, 17.052406311035156, 7.958583831787109, 10.858963012695312, 14.022216796875, 10.080806732177734, -7.5921173095703125, 1.3027362823486328, 13.410415649414062, -0.5409393310546875, -1.768524169921875, 2.0122146606445312, 14.823097229003906, -4.194404602050781, 4.994232177734375, -11.739547729492188, 44.753692626953125, 10.102499008178711, 48.31352233886719, 3.549314498901367, 15.043754577636719, 1.5880603790283203, 29.927383422851562, -13.934951782226562, -3.58416748046875, -12.91141128540039, 3.494354248046875, -24.62799072265625, 3.928041458129883, 33.926849365234375, 17.82707977294922, 35.505157470703125, 4.166416168212891, -2.786956787109375, -10.407554626464844, -6.028650283813477, 3.8328781127929688, -7.0136871337890625, -3.7254295349121094, 1.564962387084961], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000114.npy"}
{"epoch": 0.17233560090702948, "step": 115, "batch_size": 64, "mean": 4.1976518630981445, "std": 16.010906219482422, "min": -31.64745330810547, "p10": -14.647732162475585, "median": 2.384697914123535, "p90": 21.09809036254883, "max": 51.6435546875, "pos_frac": 0.578125, "sample": [-6.992664337158203, -15.983444213867188, 15.017189025878906, -0.1652679443359375, 51.6435546875, -3.350126266479492, 7.761623382568359, 9.132770538330078, -3.085662841796875, 51.521148681640625, 2.255617141723633, -1.5792922973632812, 9.006317138671875, -15.035369873046875, -12.364082336425781, -28.873565673828125, -1.0682659149169922, 42.643402099609375, 6.667694091796875, 17.896930694580078, 5.8065032958984375, 16.391700744628906, 9.431877136230469, 7.576850891113281, -5.66242790222168, 1.8400135040283203, 21.58203125, 7.359081268310547, -31.64745330810547, 16.716060638427734, -19.717056274414062, -3.8396987915039062, 0.7786960601806641, 30.982818603515625, -1.4726486206054688, 12.076637268066406, -2.7059249877929688, 3.770559310913086, -19.94525909423828, 4.798210144042969, 2.5137786865234375, -13.743244171142578, 16.033309936523438, 14.513641357421875, 33.75349426269531, 5.027439117431641, -9.008171081542969, -3.4529170989990234, -8.834575653076172, 27.954986572265625, 0.11194419860839844, -1.0868167877197266, 5.570281982421875, -0.33245086669921875, 0.15558624267578125, -2.10089111328125, -20.023948669433594, 3.989377975463867, 8.488243103027344, -0.789398193359375, 8.281509399414062, -1.5965156555175781, 4.08708381652832, 19.968894958496094], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000115.npy"}
{"epoch": 0.17384731670445955, "step": 116, "batch_size": 64, "mean": 8.429168701171875, "std": 17.01601219177246, "min": -23.128578186035156, "p10": -10.420768737792967, "median": 4.517142295837402, "p90": 27.629653930664066, "max": 63.99212646484375, "pos_frac": 0.640625, "sample": [22.3428955078125, -18.492462158203125, -1.1955413818359375, 14.360359191894531, 18.89508056640625, 18.529396057128906, 23.590164184570312, 1.2418231964111328, 12.635007858276367, 23.456016540527344, 26.082290649414062, -7.834400177001953, -18.98138427734375, 45.585174560546875, -1.5130348205566406, -1.844522476196289, -0.30924224853515625, -0.6153430938720703, 3.736927032470703, 28.1656494140625, -8.263442993164062, -3.9297027587890625, 12.808155059814453, 16.285335540771484, -2.1348419189453125, 20.73699951171875, 24.812522888183594, 12.42013168334961, 18.704132080078125, 0.11214447021484375, 5.575183868408203, -6.068378448486328, -8.224939346313477, 0.9477958679199219, 26.378997802734375, 2.381866455078125, 3.6502151489257812, 8.360828399658203, -13.642990112304688, -12.146759033203125, 33.29962158203125, 11.311416625976562, 1.7233238220214844, -3.8024444580078125, 44.56428527832031, 7.43658447265625, 46.235687255859375, -11.3453369140625, -1.6676254272460938, 4.60870361328125, -1.8843860626220703, 13.004798889160156, 2.9282684326171875, -23.128578186035156, -2.2759323120117188, 63.99212646484375, -14.976680755615234, 5.8702392578125, 6.05949592590332, 40.19868469238281, 18.678489685058594, 4.425580978393555, 7.942924499511719, -0.3306007385253906], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000116.npy"}
{"epoch": 0.17535903250188964, "step": 117, "batch_size": 64, "mean": 2.4066035747528076, "std": 19.433807373046875, "min": -91.50906372070312, "p10": -15.863511657714843, "median": 3.5933542251586914, "p90": 18.863187408447274, "max": 55.18534851074219, "pos_frac": 0.625, "sample": [-16.207763671875, 3.635324478149414, 10.187911987304688, 8.712371826171875, 50.27919006347656, 2.11785888671875, 13.990999221801758, 2.8776092529296875, -12.21121597290039, 10.67216682434082, 0.5764122009277344, -1.94989013671875, 29.145904541015625, -21.26165008544922, -15.060256958007812, 0.10169219970703125, -16.552474975585938, 6.220176696777344, 6.087432861328125, 4.824943542480469, 17.159912109375, 8.674896240234375, 12.137298583984375, -5.009223937988281, -4.852468490600586, -20.786842346191406, 14.732452392578125, -14.433624267578125, 13.94485092163086, 16.218955993652344, -8.881660461425781, 19.593162536621094, 17.14417266845703, 4.2294158935546875, 34.43597412109375, -8.363662719726562, 7.875246047973633, 24.996238708496094, -0.15332794189453125, 8.817710876464844, 5.7423858642578125, -0.4605598449707031, 9.539810180664062, 5.677593231201172, -2.7010536193847656, -26.846031188964844, 0.689239501953125, 3.5513839721679688, 3.0862960815429688, 14.243949890136719, 3.7456741333007812, -9.946083068847656, 22.78466033935547, -4.166648864746094, 55.18534851074219, -26.98639678955078, -12.132240295410156, 14.079437255859375, -91.50906372070312, -12.614967346191406, 6.8895721435546875, -3.4585189819335938, 1.6102962493896484, -5.64769172668457], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000117.npy"}
{"epoch": 0.17687074829931973, "step": 118, "batch_size": 64, "mean": 5.825830936431885, "std": 13.668137550354004, "min": -28.23016357421875, "p10": -7.776062393188477, "median": 3.46382999420166, "p90": 26.33895645141603, "max": 40.445709228515625, "pos_frac": 0.640625, "sample": [-2.5857696533203125, 1.3120498657226562, 14.035110473632812, -8.570175170898438, -23.48908805847168, -28.23016357421875, 13.470558166503906, 9.842170715332031, 8.52444076538086, -2.802989959716797, 4.717475891113281, 5.7080078125, -0.4436836242675781, 11.348140716552734, 0.1719646453857422, -0.47129058837890625, 1.9752426147460938, 19.12952423095703, -2.592426300048828, -1.537984848022461, -10.066398620605469, 31.23675537109375, 3.175394058227539, 11.2899169921875, 10.135566711425781, -15.003299713134766, 0.8265762329101562, -8.734184265136719, 5.689626693725586, -0.25253868103027344, 13.365066528320312, 2.505983352661133, 22.661216735839844, -7.200687408447266, 3.7429428100585938, 27.915130615234375, 9.653518676757812, 21.838821411132812, 32.25807189941406, -5.0713043212890625, -1.0250663757324219, 19.777175903320312, 5.091712951660156, 4.945701599121094, -7.7787322998046875, 31.423324584960938, -4.92437744140625, 40.445709228515625, 3.7947540283203125, 28.308258056640625, 22.325546264648438, -7.769832611083984, 22.655258178710938, 34.765445709228516, 4.739315032958984, 2.5379161834716797, -5.2769622802734375, 3.1847171783447266, 6.511741638183594, 16.632036209106445, -5.026477813720703, -6.047702789306641, 1.2289600372314453, -7.142539978027344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000118.npy"}
{"epoch": 0.17838246409674982, "step": 119, "batch_size": 64, "mean": 4.57119083404541, "std": 25.429540634155273, "min": -52.2325439453125, "p10": -16.63108730316162, "median": 0.7449893951416016, "p90": 28.783337402343754, "max": 138.01779174804688, "pos_frac": 0.546875, "sample": [-16.700532913208008, -2.8414077758789062, -11.966451644897461, -9.213825225830078, -9.622299194335938, -9.015640258789062, 4.027151107788086, 12.1591796875, -1.531503677368164, 3.730426788330078, -13.089340209960938, -8.250022888183594, -0.8991317749023438, 45.31914520263672, 15.590164184570312, -52.2325439453125, -6.0581512451171875, -6.60986328125, 3.8979339599609375, 29.823638916015625, 0.7937889099121094, 6.830892562866211, -19.729232788085938, 0.35459327697753906, 20.189056396484375, 14.385360717773438, -12.512531280517578, -3.357799530029297, 29.284652709960938, 44.409423828125, 14.514671325683594, 0.6961898803710938, 3.9074974060058594, 9.344331741333008, 18.591644287109375, 138.01779174804688, 8.106260299682617, -16.46904754638672, -0.6477794647216797, 20.23053741455078, -1.1299629211425781, 27.613601684570312, -6.245733261108398, -0.9789810180664062, -21.289344787597656, -16.05040168762207, 64.6102294921875, 8.2025146484375, -2.4023284912109375, 24.143959045410156, -28.162918090820312, -45.8133544921875, -17.824966430664062, 0.6623744964599609, 1.4946823120117188, 1.1217536926269531, 9.112607955932617, 34.35911560058594, 4.419078826904297, 8.5416259765625, 6.289846420288086, 9.198776245117188, -4.111232757568359, -6.661956787109375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000119.npy"}
{"epoch": 0.17989417989417988, "step": 120, "batch_size": 64, "mean": 7.405904769897461, "std": 15.561226844787598, "min": -20.778709411621094, "p10": -10.097793197631836, "median": 6.372758865356445, "p90": 27.61622390747071, "max": 62.24971008300781, "pos_frac": 0.65625, "sample": [31.5628662109375, -5.28211784362793, -9.139358520507812, 9.057947158813477, 13.7469482421875, -1.1828193664550781, 17.62978744506836, -0.10515594482421875, 10.844993591308594, -9.694694519042969, 62.24971008300781, 5.321144104003906, 13.85329818725586, 35.302734375, 8.435890197753906, 43.02839660644531, 6.4672698974609375, -4.7255096435546875, 10.056846618652344, 13.940040588378906, 13.226203918457031, -19.106773376464844, 12.963577270507812, 4.815387725830078, -0.0127105712890625, 1.7584953308105469, 25.073837280273438, -14.787155151367188, 22.87224578857422, 10.939498901367188, 28.70581817626953, -14.199394226074219, 35.57831573486328, -2.9786033630371094, 22.984786987304688, -2.0266761779785156, -0.38946533203125, 6.739795684814453, 5.841804504394531, -20.778709411621094, 7.389493942260742, 6.278247833251953, 6.4823455810546875, 47.02644348144531, -0.3375968933105469, -11.589149475097656, -8.955961227416992, 9.4232177734375, 14.517730712890625, 9.113945007324219, -2.2911338806152344, 7.573305130004883, 2.593608856201172, 16.220565795898438, 6.978261947631836, -0.9393386840820312, 3.5663070678710938, -9.515060424804688, -10.270549774169922, -12.966682434082031, 12.21160888671875, 5.812282562255859, 4.379722595214844, 2.6877593994140625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000120.npy"}
{"epoch": 0.18140589569160998, "step": 121, "batch_size": 64, "mean": 10.676218032836914, "std": 24.532217025756836, "min": -24.45135498046875, "p10": -8.752853393554688, "median": 3.7574586868286133, "p90": 45.222904205322315, "max": 99.92544555664062, "pos_frac": 0.609375, "sample": [-24.45135498046875, 84.38821411132812, 4.178001403808594, -4.841840744018555, 7.794395446777344, 99.92544555664062, 29.483318328857422, -6.508007049560547, -7.884853363037109, 33.21533966064453, -0.28099822998046875, -6.876380920410156, 29.921310424804688, 12.053688049316406, -10.518630981445312, 0.9486865997314453, 3.465951919555664, 16.6824951171875, 17.272552490234375, -17.281463623046875, 50.36900329589844, -5.990306854248047, 26.190711975097656, 18.549407958984375, 31.58233642578125, -3.9808807373046875, 8.012290954589844, -4.78919792175293, 8.936531066894531, -5.422645568847656, -1.6424903869628906, 72.62530517578125, -0.5259361267089844, 2.8867568969726562, 9.949241638183594, 9.635025024414062, -12.184066772460938, -17.321792602539062, 5.3036041259765625, -6.1101226806640625, -8.306461334228516, -2.187162399291992, 16.82855987548828, 11.345870971679688, -6.150909423828125, 12.438766479492188, -10.58980941772461, 62.03710174560547, 4.0489654541015625, 3.218883514404297, 14.464410781860352, 1.1844253540039062, -4.8287200927734375, 3.3047733306884766, -7.738378524780273, 66.46636962890625, -1.867258071899414, 12.179943084716797, 0.8736572265625, 8.070472717285156, 7.319969177246094, 6.261268615722656, 57.0887451171875, -8.944164276123047], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000121.npy"}
{"epoch": 0.18291761148904007, "step": 122, "batch_size": 64, "mean": 6.72728157043457, "std": 28.778352737426758, "min": -54.95263671875, "p10": -16.715649795532226, "median": 1.7645530700683594, "p90": 36.50267181396485, "max": 124.54904174804688, "pos_frac": 0.59375, "sample": [65.68511962890625, -19.210182189941406, 0.9950485229492188, 9.036125183105469, -6.258544921875, -8.420303344726562, 87.52252197265625, 45.181053161621094, -10.437131881713867, 22.224559783935547, 6.257534027099609, -11.267623901367188, -4.187444686889648, 16.851627349853516, 3.307483673095703, 124.54904174804688, -14.791046142578125, 2.1142196655273438, -16.822376251220703, 7.200750350952148, 0.6825065612792969, 13.135377883911133, 15.210521697998047, 6.9380950927734375, 14.116458892822266, 73.27183532714844, -15.927556991577148, -13.338069915771484, 35.19964599609375, 9.72979736328125, -54.95263671875, 6.1292266845703125, -10.126760482788086, -19.066070556640625, -16.46662139892578, 4.2322235107421875, 1.732025146484375, 7.189323425292969, 1.5452079772949219, -4.3867340087890625, -2.1530189514160156, -3.2168655395507812, -17.965831756591797, 37.06111145019531, -31.014419555664062, 9.670736312866211, 2.441438674926758, 9.35513687133789, -5.002342224121094, -1.8381118774414062, 1.7970809936523438, -7.568365097045898, 0.17333412170410156, 0.7960586547851562, -20.471878051757812, 5.25697135925293, 4.126972198486328, -4.1034698486328125, -4.992179870605469, 14.668830871582031, -14.237762451171875, 2.235332489013672, 8.725555419921875, 92.42347717285156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000122.npy"}
{"epoch": 0.18442932728647016, "step": 123, "batch_size": 64, "mean": 7.2038798332214355, "std": 23.247236251831055, "min": -52.8531494140625, "p10": -12.754828453063965, "median": 2.4826889038085938, "p90": 25.785362243652347, "max": 122.56085205078125, "pos_frac": 0.65625, "sample": [10.08367919921875, 36.36687469482422, 26.10863494873047, 122.56085205078125, -12.827981948852539, -6.507755279541016, 5.899539947509766, 8.109268188476562, 18.26343536376953, 1.1280460357666016, 12.562324523925781, 78.20164489746094, -0.11341094970703125, -2.2787628173828125, 21.856643676757812, 17.895477294921875, 6.392223358154297, 6.979991912841797, 14.442413330078125, 46.67356872558594, 3.9361953735351562, 2.4938201904296875, -8.7705078125, 14.059123992919922, 16.26317596435547, 33.31376647949219, 1.8608684539794922, -3.075347900390625, 17.978851318359375, 5.2112579345703125, 2.3259029388427734, 25.03105926513672, 10.375198364257812, 16.601003646850586, 2.6949729919433594, -17.113174438476562, -2.6416778564453125, -12.188980102539062, -0.3425273895263672, 2.4715576171875, 18.730865478515625, -19.695812225341797, 0.9411506652832031, -19.542701721191406, 0.49253273010253906, 22.290435791015625, 1.5132064819335938, -8.536855697631836, 0.14937782287597656, -1.5060558319091797, -16.72490692138672, -7.466743469238281, 16.070999145507812, -7.726417541503906, -12.584136962890625, 33.096221923828125, 0.29091453552246094, 6.802650451660156, -2.1541748046875, -11.400924682617188, 0.4924278259277344, 14.79018783569336, -52.8531494140625, -16.702011108398438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000123.npy"}
{"epoch": 0.18594104308390022, "step": 124, "batch_size": 64, "mean": 9.415124893188477, "std": 31.236574172973633, "min": -131.99295043945312, "p10": -13.67785835266113, "median": 5.834375381469727, "p90": 48.00597229003907, "max": 92.57099914550781, "pos_frac": 0.71875, "sample": [-3.6588516235351562, 53.62232971191406, 18.861083984375, 4.991264343261719, 8.059698104858398, 8.521957397460938, 46.80934143066406, 7.067054748535156, 56.092308044433594, 29.314697265625, 41.36278533935547, 26.335147857666016, 0.5922603607177734, 57.10609436035156, 2.9968395233154297, 14.57332992553711, 3.4718093872070312, 4.154897689819336, 1.7458114624023438, -3.419424057006836, -131.99295043945312, 5.2269287109375, 0.10193634033203125, 26.197532653808594, 5.790813446044922, 13.617938995361328, 2.0051937103271484, -22.272872924804688, -77.85047912597656, -20.066871643066406, 5.877937316894531, 17.021804809570312, -42.0947265625, 38.23374938964844, 15.694416046142578, 48.51881408691406, 51.93199157714844, -15.389373779296875, -0.5220718383789062, 7.54364013671875, -0.9301071166992188, 6.0594940185546875, 5.928623199462891, -7.089363098144531, 4.562807083129883, 19.48656463623047, 4.67352294921875, 39.70537567138672, -9.684322357177734, -4.423768997192383, 51.497894287109375, 3.937786102294922, 2.494670867919922, -1.1593589782714844, 27.044830322265625, 18.407684326171875, 14.239143371582031, -4.195159912109375, -26.035137176513672, 43.258575439453125, -1.6228599548339844, 25.989704132080078, -8.323348999023438, 92.57099914550781], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000124.npy"}
{"epoch": 0.1874527588813303, "step": 125, "batch_size": 64, "mean": 9.795515060424805, "std": 39.639888763427734, "min": -161.0499267578125, "p10": -15.221532249450682, "median": 6.928645133972168, "p90": 46.838721466064456, "max": 143.96896362304688, "pos_frac": 0.703125, "sample": [-161.0499267578125, 15.034637451171875, 34.37425994873047, 20.33697509765625, -7.352935791015625, 0.12312507629394531, 9.23748779296875, 2.6929054260253906, 88.5621337890625, 5.968891143798828, 143.96896362304688, -34.33361053466797, 20.655479431152344, 2.093475341796875, 7.25932502746582, 20.99692153930664, 45.4659423828125, 23.442672729492188, 19.851966857910156, -8.863426208496094, 8.73745346069336, 9.03976058959961, -9.712532043457031, 5.463630676269531, -11.382759094238281, -2.895160675048828, 9.96234130859375, -15.689435958862305, -9.264533996582031, -4.082359313964844, 50.515472412109375, 12.735267639160156, 16.58551025390625, 115.095947265625, 9.810710906982422, -11.219024658203125, 4.167564392089844, -0.159912109375, 47.42705535888672, 103.37199401855469, -22.130447387695312, 24.49126434326172, 56.59673309326172, 9.94198989868164, 7.926944732666016, -34.229774475097656, 9.044570922851562, 6.02099609375, 7.127323150634766, 5.036937713623047, 35.860595703125, 3.920734405517578, 16.839698791503906, 11.938636779785156, 5.08428955078125, 26.15654754638672, 5.196399688720703, 0.3299560546875, -8.945533752441406, -1.7533226013183594, 6.72996711730957, -14.129756927490234, -39.896156311035156, -67.21794128417969], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000125.npy"}
{"epoch": 0.1889644746787604, "step": 126, "batch_size": 64, "mean": 0.41440293192863464, "std": 19.034666061401367, "min": -61.194053649902344, "p10": -22.904209136962887, "median": 2.041083335876465, "p90": 24.323445129394536, "max": 50.91581726074219, "pos_frac": 0.53125, "sample": [6.432332992553711, 2.819398880004883, 3.362701416015625, 7.03434944152832, 1.8730716705322266, 2.209095001220703, -2.19500732421875, -0.742431640625, 17.557052612304688, -4.902807235717773, 7.797098159790039, 50.91581726074219, -29.916534423828125, -24.1031494140625, 2.7403945922851562, 7.724308013916016, -14.561920166015625, -5.600536346435547, 4.48432731628418, -14.268386840820312, 38.441864013671875, 16.536422729492188, -17.542226791381836, 5.292845726013184, -10.441997528076172, 2.2520599365234375, 16.468490600585938, -26.54308319091797, -61.194053649902344, 5.639135360717773, 4.00665283203125, -13.674545288085938, -4.227657318115234, 3.757680892944336, -11.951189041137695, -0.5072479248046875, 3.5913028717041016, -6.104156494140625, 12.463117599487305, -28.22332763671875, 37.81376647949219, -27.813377380371094, 10.92735481262207, 24.592193603515625, 12.366798400878906, 25.161338806152344, 23.696365356445312, 32.13899230957031, -14.05810546875, -4.363250732421875, -12.88644027709961, 12.445655822753906, 9.600421905517578, 1.8579444885253906, -17.83281707763672, -13.972076416015625, -0.2802886962890625, -2.633035659790039, -20.10668182373047, -30.37183380126953, 36.873443603515625, -10.629890441894531, -11.49557876586914, 18.7916259765625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000126.npy"}
{"epoch": 0.19047619047619047, "step": 127, "batch_size": 64, "mean": 6.133054733276367, "std": 23.205381393432617, "min": -36.752166748046875, "p10": -22.86535949707031, "median": 4.920015335083008, "p90": 29.933678817749023, "max": 100.76911926269531, "pos_frac": 0.625, "sample": [-9.311721801757812, 100.76911926269531, 31.61882781982422, -6.971122741699219, -35.349327087402344, 55.19712829589844, -7.6218109130859375, 19.66925811767578, 5.3118743896484375, 15.282211303710938, 19.274398803710938, -1.9264755249023438, 7.9013214111328125, 5.122318267822266, -26.32427978515625, -2.7872543334960938, 13.168563842773438, 10.838233947753906, 10.527992248535156, -4.676044464111328, 16.89678955078125, 10.590744018554688, 4.71771240234375, 5.398983001708984, 7.080059051513672, -18.094970703125, 12.201072692871094, 1.5639495849609375, 63.654815673828125, -0.1018524169921875, 1.8252601623535156, 20.865325927734375, 0.41730499267578125, 38.72499084472656, -0.5379543304443359, 29.841957092285156, 11.335220336914062, 11.66537857055664, 29.97298812866211, 26.233566284179688, -28.611068725585938, 21.392303466796875, 18.373573303222656, -4.727254867553711, -36.752166748046875, 18.793739318847656, -33.59352111816406, -9.294235229492188, -8.049118041992188, -17.539627075195312, -24.66130828857422, -0.0657806396484375, -4.199832916259766, 8.888755798339844, 1.7335739135742188, -18.67481231689453, -33.878448486328125, 2.028076171875, 43.348655700683594, 2.4738388061523438, 3.442453384399414, 9.375919342041016, 12.135444641113281, -3.3881893157958984], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000127.npy"}
{"epoch": 0.19198790627362056, "step": 128, "batch_size": 64, "mean": 15.04159164428711, "std": 38.120643615722656, "min": -63.1900634765625, "p10": -18.9445478439331, "median": 5.173077583312988, "p90": 53.35512847900394, "max": 143.74221801757812, "pos_frac": 0.65625, "sample": [34.49481201171875, 22.413909912109375, 3.1903076171875, -2.1238327026367188, -5.281558990478516, 1.1404800415039062, 31.148056030273438, 125.04666137695312, 2.66644287109375, 96.29525756835938, 14.345344543457031, -63.1900634765625, 35.628074645996094, -0.842010498046875, 2.2530384063720703, -30.59642791748047, -0.8848247528076172, 27.80420684814453, -34.18018341064453, 56.733978271484375, 124.356201171875, 0.0024623870849609375, 3.8878021240234375, 143.74221801757812, 106.62112426757812, 44.44972229003906, 0.5519561767578125, 1.233123779296875, -20.84383201599121, 6.206157684326172, -16.577239990234375, -10.429328918457031, 15.044857025146484, 41.9476318359375, 25.097026824951172, 32.75736999511719, -6.878719329833984, 11.11456298828125, -28.557018280029297, 30.883514404296875, -7.231777191162109, 6.7826080322265625, 9.83001708984375, 3.685626983642578, -19.959108352661133, 26.82659149169922, -0.6506309509277344, 5.942035675048828, -14.772560119628906, -2.6979236602783203, 4.97210693359375, 11.882537841796875, -5.8142852783203125, -50.375518798828125, 5.374048233032227, -8.580131530761719, 7.010139465332031, 28.77508544921875, 63.97663879394531, -6.121685028076172, -0.6706047058105469, 25.452064514160156, 12.8841552734375, 45.47114562988281], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000128.npy"}
{"epoch": 0.19349962207105065, "step": 129, "batch_size": 64, "mean": 1.2645282745361328, "std": 38.554229736328125, "min": -159.75274658203125, "p10": -29.613021850585934, "median": 3.5480222702026367, "p90": 34.004998016357426, "max": 140.08734130859375, "pos_frac": 0.53125, "sample": [-57.17735290527344, -6.423089981079102, 9.366703033447266, -19.878578186035156, 3.5800228118896484, 21.640209197998047, 55.12458801269531, -0.4741935729980469, 30.207237243652344, 17.34201431274414, 20.271194458007812, 1.7656059265136719, 34.368446350097656, -20.469932556152344, -0.3909149169921875, 11.966445922851562, 30.193008422851562, -4.57383918762207, -2.9319229125976562, 7.051227569580078, 46.179595947265625, -159.75274658203125, -7.875732421875, 28.194679260253906, -0.6492023468017578, 140.08734130859375, 11.531682968139648, -30.568511962890625, 5.485252380371094, -20.690536499023438, 3.6289291381835938, -87.72412109375, -27.383544921875, 9.886526107788086, 5.694921493530273, 39.100196838378906, -25.050777435302734, -19.251060485839844, -21.42462158203125, 16.44940948486328, 4.808418273925781, -11.500045776367188, -16.328746795654297, -3.0235347747802734, 8.26834487915039, 5.0620574951171875, -35.050025939941406, 30.624874114990234, 11.479827880859375, 3.516021728515625, 4.857723236083984, 33.156951904296875, 57.475852966308594, -2.291423797607422, -6.117792129516602, -15.75880241394043, -11.444694519042969, -54.886566162109375, 77.93655395507812, -1.519012451171875, 7.767827987670898, -66.95181274414062, -2.5903244018554688, 27.01361846923828], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000129.npy"}
{"epoch": 0.19501133786848074, "step": 130, "batch_size": 64, "mean": 13.005002975463867, "std": 32.22734451293945, "min": -47.712738037109375, "p10": -21.057899475097656, "median": 5.880515098571777, "p90": 63.300619506835936, "max": 131.70046997070312, "pos_frac": 0.625, "sample": [-18.194988250732422, 15.861614227294922, 24.726959228515625, 62.94844055175781, 1.1157188415527344, -47.712738037109375, 0.65203857421875, 20.87989044189453, 23.261430740356445, 1.1760311126708984, -5.7123565673828125, 29.989822387695312, 36.960670471191406, 2.8446807861328125, -28.231834411621094, 44.94536590576172, -1.249185562133789, -43.83177947998047, -20.541305541992188, 79.43335723876953, 5.390705108642578, -12.645256042480469, -30.566612243652344, 25.78662109375, -21.279296875, -11.970611572265625, -29.36687469482422, 10.789459228515625, 0.6744804382324219, -2.0614089965820312, 66.58421325683594, 25.401824951171875, 24.415390014648438, 12.314079284667969, 12.315658569335938, 26.096454620361328, 47.8236083984375, 17.076053619384766, 83.10673522949219, 19.671092987060547, -12.064884185791016, -23.141971588134766, 27.860122680664062, -6.549064636230469, -8.461540222167969, 6.370325088500977, 26.827728271484375, 4.5839080810546875, 8.319839477539062, -14.498970031738281, -11.572654724121094, -1.421875, 70.47549438476562, 2.499044418334961, -10.83319091796875, 29.927616119384766, 131.70046997070312, -3.2528648376464844, 29.46540069580078, 63.636627197265625, 12.122467041015625, -0.13547134399414062, -1.8660449981689453, 63.45155334472656], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000130.npy"}
{"epoch": 0.1965230536659108, "step": 131, "batch_size": 64, "mean": 17.419395446777344, "std": 38.204376220703125, "min": -63.407928466796875, "p10": -14.921284866333007, "median": 4.84732723236084, "p90": 71.22366561889652, "max": 169.54811096191406, "pos_frac": 0.65625, "sample": [14.510383605957031, 33.139190673828125, -11.54170036315918, -11.569263458251953, -0.34220314025878906, 47.16276550292969, 13.050090789794922, 2.116823196411133, -4.461820602416992, 75.3703842163086, 2.850860595703125, 3.356027603149414, 0.8814582824707031, -7.512237548828125, 10.332748413085938, 6.436285018920898, 48.93376159667969, 36.41095733642578, 14.490966796875, -63.407928466796875, -25.20370101928711, 26.930282592773438, 169.54811096191406, -4.02203369140625, -30.094085693359375, -16.51325225830078, -1.6558265686035156, 9.059524536132812, -17.244884490966797, 15.728668212890625, -0.3833274841308594, 61.54798889160156, 36.73146057128906, -3.380239486694336, 49.47691345214844, 127.59300231933594, -1.1243782043457031, 105.88943481445312, 8.934669494628906, 79.19277954101562, -3.2755603790283203, 45.60688018798828, -2.2345218658447266, 0.035858154296875, -14.322341918945312, -6.745086669921875, -3.732309341430664, -25.68218994140625, 5.132728576660156, 88.627685546875, 3.2716331481933594, 3.4601974487304688, 17.116592407226562, -15.177974700927734, 3.2428207397460938, 32.99175262451172, 4.795690536499023, 4.898963928222656, 82.47749328613281, 28.616737365722656, 22.232040405273438, 0.5564918518066406, 24.636510848999023, 17.09253692626953], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000131.npy"}
{"epoch": 0.1980347694633409, "step": 132, "batch_size": 64, "mean": 10.359777450561523, "std": 32.97364044189453, "min": -64.63531494140625, "p10": -24.615818405151366, "median": 6.883214950561523, "p90": 41.29292449951175, "max": 130.35745239257812, "pos_frac": 0.671875, "sample": [-0.8724403381347656, 5.85894775390625, 14.683509826660156, 9.935497283935547, -8.471967697143555, 2.515665054321289, 22.690078735351562, -1.0763702392578125, -35.61168670654297, -35.09942626953125, -1.6862545013427734, 5.510993957519531, 8.378875732421875, 5.372734069824219, 25.489501953125, -7.198890686035156, -23.857288360595703, 130.35745239257812, 17.220962524414062, 5.6933746337890625, -20.104835510253906, -64.63531494140625, 26.968950271606445, -29.712783813476562, -11.619991302490234, 0.497802734375, 23.373023986816406, 21.761516571044922, 6.621889114379883, 4.028297424316406, 83.25371551513672, 1.9156684875488281, -16.111509323120117, 12.90826416015625, -3.7265777587890625, 74.19511413574219, 44.524932861328125, 7.144540786743164, 18.574684143066406, 14.184532165527344, 6.35308837890625, -30.243186950683594, 74.73960876464844, 8.879981994628906, 9.243490219116211, -4.885322570800781, 30.291763305664062, 14.630859375, 2.721963882446289, -1.1734905242919922, 20.428466796875, 15.403457641601562, -24.940902709960938, 17.340925216674805, 9.953876495361328, 11.900527954101562, 78.4822998046875, -12.95405387878418, 27.34991455078125, 26.309860229492188, -13.800186157226562, 33.75157165527344, -63.425628662109375, 92.79168701171875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000132.npy"}
{"epoch": 0.19954648526077098, "step": 133, "batch_size": 64, "mean": 2.497537136077881, "std": 33.340965270996094, "min": -90.46778869628906, "p10": -38.36894226074219, "median": 1.2266101837158203, "p90": 38.29050235748295, "max": 117.05091857910156, "pos_frac": 0.53125, "sample": [-3.7250900268554688, -33.27027893066406, -0.7654953002929688, 15.077747344970703, -0.8817138671875, -39.694618225097656, 14.994159698486328, 51.45787811279297, 7.437747955322266, 18.087005615234375, -24.5294189453125, 117.05091857910156, 9.107118606567383, 11.677547454833984, -29.627643585205078, 4.1266632080078125, -32.839874267578125, -90.46778869628906, 14.215534210205078, -10.586341857910156, 14.2442626953125, 14.251827239990234, -20.213035583496094, 9.238933563232422, 3.649991989135742, -11.528533935546875, 77.04804992675781, 22.3775634765625, 0.10410308837890625, -0.17772865295410156, -16.720726013183594, -9.091653823852539, -52.566436767578125, 11.558273315429688, -36.909576416015625, -41.06852722167969, 0.3960456848144531, -59.180908203125, 18.938343048095703, 89.56053161621094, -59.04436492919922, 20.00457000732422, 42.46570587158203, -6.081228256225586, 12.542518615722656, 43.028167724609375, 13.103370666503906, 20.762176513671875, -9.066925048828125, 47.512939453125, 21.69762420654297, -5.220680236816406, -3.986940383911133, 23.95916748046875, 17.97393035888672, -2.9929237365722656, 3.387451171875, 28.54836082458496, 2.0571746826171875, -6.070016860961914, -14.312873840332031, -38.994384765625, -2.044109344482422, -0.1411895751953125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000133.npy"}
{"epoch": 0.20105820105820105, "step": 134, "batch_size": 64, "mean": 15.976378440856934, "std": 44.909847259521484, "min": -85.75482177734375, "p10": -25.227430725097648, "median": 8.610097885131836, "p90": 73.45374221801764, "max": 148.1856231689453, "pos_frac": 0.65625, "sample": [-51.869537353515625, 26.163925170898438, -17.632736206054688, 15.436077117919922, 13.44631576538086, 4.370697021484375, 10.484375, 9.490684509277344, 18.890830993652344, 136.37451171875, 57.408958435058594, 11.704971313476562, -12.397109985351562, -10.582637786865234, -4.437509536743164, 1.7823677062988281, 39.79265213012695, 32.82637023925781, -30.258403778076172, 27.314498901367188, 7.979061126708984, 19.533546447753906, 3.8395004272460938, -28.889324188232422, 32.68419647216797, 20.12152862548828, -28.4822998046875, -6.765892028808594, 5.627632141113281, 2.1107311248779297, -0.17264556884765625, -0.9708728790283203, 130.55177307128906, 9.241134643554688, -0.07744789123535156, -9.202516555786133, 16.908782958984375, 1.6411876678466797, -63.83879089355469, 4.035024642944336, 100.946044921875, 12.000083923339844, -85.75482177734375, 50.04447937011719, -1.7901420593261719, 13.643409729003906, 80.330078125, -76.2557373046875, 34.70915222167969, 4.796379089355469, 21.50510025024414, -0.174407958984375, -9.335029602050781, 29.800186157226562, 148.1856231689453, 18.189964294433594, 13.456745147705078, -0.78900146484375, -8.080101013183594, 52.42741012573242, 138.32485961914062, 95.19760131835938, 2.7271385192871094, -5.800445556640625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000134.npy"}
{"epoch": 0.20256991685563114, "step": 135, "batch_size": 64, "mean": 7.776280879974365, "std": 27.85268783569336, "min": -111.04153442382812, "p10": -16.353344345092772, "median": 4.858624458312988, "p90": 41.69580535888673, "max": 88.99128723144531, "pos_frac": 0.671875, "sample": [-15.947559356689453, -5.115978240966797, -27.9100341796875, -4.392362594604492, 8.650005340576172, 2.798685073852539, -47.63233947753906, -10.792503356933594, 2.2592201232910156, 39.09864044189453, 54.58207702636719, -32.67951202392578, -6.632110595703125, 4.136114120483398, 49.94517517089844, -22.301471710205078, 31.18596649169922, 2.2873077392578125, 11.908674240112305, 3.814462661743164, 3.7635498046875, 42.808876037597656, -0.43418121337890625, 29.111366271972656, 13.823898315429688, 0.6310920715332031, 17.194786071777344, 20.82360076904297, 26.179039001464844, 11.900447845458984, 22.974044799804688, -16.527252197265625, 62.06043243408203, 9.75300407409668, 14.384658813476562, -2.1263427734375, 28.139938354492188, -4.900733947753906, 16.83533477783203, 5.595729827880859, 1.1455497741699219, 88.99128723144531, 2.8980140686035156, 28.60546875, 2.881305694580078, -11.061027526855469, 3.291656494140625, -8.127933502197266, 7.573215484619141, 49.23246765136719, -7.340511322021484, -111.04153442382812, -23.037147521972656, 5.6351470947265625, 30.101547241210938, 15.547863006591797, 5.581134796142578, 14.141191482543945, -12.134859085083008, 27.686199188232422, -1.248016357421875, 11.34634780883789, 51.699588775634766, -13.938705444335938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000135.npy"}
{"epoch": 0.20408163265306123, "step": 136, "batch_size": 64, "mean": 17.457881927490234, "std": 50.523677825927734, "min": -85.87568664550781, "p10": -36.85093460083007, "median": 8.833589553833008, "p90": 64.67745971679688, "max": 185.88522338867188, "pos_frac": 0.625, "sample": [-4.1107330322265625, 11.620733261108398, -7.422330856323242, 12.328285217285156, -72.5955810546875, 29.40454864501953, -38.036109924316406, -9.259353637695312, -40.95812225341797, 18.631608963012695, -2.54522705078125, -85.87568664550781, -0.30406761169433594, 3.344066619873047, -4.716651916503906, 8.471817016601562, 19.049528121948242, 14.590755462646484, 3.3217906951904297, -7.2351837158203125, 57.0, 10.715579986572266, 34.316139221191406, 74.5909652709961, -5.778694152832031, 7.1045074462890625, 46.056182861328125, -5.136962890625, 24.976348876953125, 169.57525634765625, 60.270843505859375, -17.35540008544922, 153.2662353515625, 64.96035766601562, 136.7955322265625, 39.505218505859375, 16.434898376464844, 9.195362091064453, 32.00315856933594, -17.93021011352539, -16.585826873779297, -39.000701904296875, -34.08552551269531, 36.96028137207031, 47.53681564331055, 6.413604736328125, -45.52393341064453, 42.36128234863281, -1.02398681640625, 48.39991760253906, -5.253559112548828, -13.926910400390625, 25.92730712890625, 109.27360534667969, 19.571182250976562, 185.88522338867188, -60.26753234863281, 4.813880920410156, 9.598031997680664, 12.378036499023438, 64.01736450195312, 7.005035400390625, 1.1746864318847656, -26.613372802734375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000136.npy"}
{"epoch": 0.20559334845049132, "step": 137, "batch_size": 64, "mean": 11.8425874710083, "std": 43.93686294555664, "min": -170.8138427734375, "p10": -19.065148925781244, "median": 6.9308271408081055, "p90": 51.957321548461934, "max": 178.4902801513672, "pos_frac": 0.71875, "sample": [-8.012222290039062, -76.11203002929688, 3.22406005859375, 43.01243591308594, 6.987998962402344, -0.5420303344726562, 46.97335433959961, 54.09330749511719, 1.4742393493652344, 178.4902801513672, 14.912090301513672, -8.016677856445312, 20.36505889892578, -12.906730651855469, 10.592411041259766, -38.41400909423828, -14.757034301757812, 6.043464660644531, 4.3206787109375, 11.447023391723633, 46.62950134277344, 15.007850646972656, 32.420501708984375, -26.765106201171875, 1.987680435180664, 29.89324188232422, 3.1617507934570312, 57.730857849121094, 14.068193435668945, 26.22539520263672, 29.49272918701172, -51.880226135253906, -9.707786560058594, 19.666290283203125, -9.209976196289062, 4.871669769287109, 3.520465850830078, 11.082962036132812, 2.6546859741210938, 6.873655319213867, 35.60281753540039, 15.077068328857422, 8.938041687011719, 67.80447387695312, 0.8956794738769531, -13.843559265136719, -6.825855255126953, 16.12266731262207, 103.10002136230469, 65.11830139160156, 1.800527572631836, -5.4682769775390625, -1.1166858673095703, -20.911483764648438, 14.991392135620117, 6.434965133666992, 1.4596824645996094, -35.129173278808594, 15.64007568359375, 40.24641418457031, 16.69219970703125, 25.590553283691406, -170.8138427734375, 125.61959838867188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000137.npy"}
{"epoch": 0.20710506424792138, "step": 138, "batch_size": 64, "mean": 9.719563484191895, "std": 52.55023956298828, "min": -145.08987426757812, "p10": -37.696276092529295, "median": 9.804655075073242, "p90": 47.6371337890625, "max": 279.3914794921875, "pos_frac": 0.703125, "sample": [20.592498779296875, -53.14068603515625, 20.489822387695312, 0.8410186767578125, 8.773197174072266, -5.544008255004883, -60.58598327636719, -37.755393981933594, -100.13140869140625, 3.1734275817871094, 4.765527725219727, -145.08987426757812, 94.8280029296875, 26.78485870361328, 88.07286071777344, -83.51376342773438, 9.082466125488281, 8.642000198364258, -3.3696975708007812, -31.895275115966797, 20.36699676513672, 9.266242980957031, 279.3914794921875, 16.872940063476562, 15.17327880859375, 15.92646598815918, 21.660327911376953, 35.90925598144531, 13.927818298339844, -11.554025650024414, 7.178153991699219, -1.6010112762451172, 17.386947631835938, -12.097122192382812, 10.343067169189453, -8.879547119140625, 3.346158981323242, -79.10797882080078, -1.5903034210205078, 19.573970794677734, 59.60577392578125, 47.98872375488281, 24.30603790283203, 29.225379943847656, -3.9992408752441406, -19.332626342773438, 20.820043563842773, 16.627574920654297, 37.143150329589844, 1.1314239501953125, 21.7481689453125, 23.717498779296875, 46.81675720214844, 20.349945068359375, 18.634002685546875, 1.9660835266113281, 11.345893859863281, 11.313207626342773, -13.101297378540039, 9.26116943359375, 94.23670196533203, 1.3665733337402344, 61.92671203613281, -37.55833435058594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000138.npy"}
{"epoch": 0.20861678004535147, "step": 139, "batch_size": 64, "mean": 7.547426700592041, "std": 41.069149017333984, "min": -116.71954345703125, "p10": -34.33690795898437, "median": 7.233088493347168, "p90": 49.27177810668946, "max": 101.99710083007812, "pos_frac": 0.625, "sample": [-7.518016815185547, 14.270898818969727, 4.93548583984375, 4.696868896484375, -36.2603759765625, 46.3415412902832, 42.694793701171875, 8.242172241210938, 44.20633316040039, -27.144922256469727, -19.977046966552734, 27.291048049926758, -0.9930515289306641, -8.777341842651367, 14.645660400390625, -8.660003662109375, 72.75730895996094, 6.224004745483398, 31.74445343017578, 21.741357803344727, -64.53709411621094, -116.71954345703125, -58.69378662109375, 8.775753021240234, 0.8075828552246094, 21.436687469482422, 42.14933776855469, 10.233707427978516, 2.026165008544922, -11.077835083007812, 22.48448944091797, 37.049530029296875, 9.94482421875, 101.99710083007812, 5.396858215332031, 47.21076965332031, -6.651741027832031, -8.572219848632812, 101.43032836914062, 13.145736694335938, 3.7085494995117188, -27.77999496459961, 17.214073181152344, -4.0226593017578125, 24.04524040222168, -51.454219818115234, 54.458587646484375, -2.6310043334960938, 92.25325012207031, 42.92668533325195, -98.65142822265625, -7.539730072021484, -26.382949829101562, -73.32383728027344, -29.84881591796875, 13.574676513671875, -15.3966064453125, 16.775428771972656, -2.994779586791992, 11.691085815429688, 12.754203796386719, 50.155067443847656, 6.002927780151367, 89.20376586914062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000139.npy"}
{"epoch": 0.21012849584278157, "step": 140, "batch_size": 64, "mean": 12.714333534240723, "std": 49.835205078125, "min": -156.94261169433594, "p10": -34.48170166015625, "median": 8.911544799804688, "p90": 69.27404022216798, "max": 134.46017456054688, "pos_frac": 0.59375, "sample": [93.9840087890625, 76.55252075195312, 26.22573471069336, -29.967788696289062, -34.89508056640625, -13.831626892089844, -24.049461364746094, -83.02408599853516, 12.289588928222656, 59.47827911376953, -156.94261169433594, 49.47462463378906, -27.911636352539062, 105.68798828125, 48.62964630126953, -35.514251708984375, -3.251779556274414, 22.008445739746094, 38.659698486328125, 23.99261474609375, -3.295919418334961, 56.9273681640625, 26.599864959716797, 29.82574462890625, -12.307659149169922, 134.218994140625, -8.751943588256836, 28.57298469543457, 44.78492736816406, -33.51715087890625, -11.0255126953125, 5.1325531005859375, 19.963462829589844, -41.995521545410156, -22.239456176757812, 81.95367431640625, 15.640462875366211, 65.30348205566406, 45.8291015625, 134.46017456054688, 55.1766357421875, 60.005043029785156, 44.02522277832031, 26.931278228759766, -0.4364013671875, -4.672395706176758, 70.9757080078125, 0.2626914978027344, 58.83940887451172, -19.041383743286133, -32.154911041259766, 9.536235809326172, 8.286853790283203, 7.248268127441406, 12.004081726074219, 0.35828399658203125, -9.670684814453125, -92.43387603759766, -16.84259033203125, -32.56999969482422, 8.030803680419922, 23.273273468017578, -5.6442718505859375, -61.44440460205078], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000140.npy"}
{"epoch": 0.21164021164021163, "step": 141, "batch_size": 64, "mean": 14.282639503479004, "std": 45.822757720947266, "min": -108.86628723144531, "p10": -33.32124557495117, "median": 16.49039077758789, "p90": 57.814838790893575, "max": 190.14752197265625, "pos_frac": 0.6875, "sample": [-15.046783447265625, -18.585128784179688, 106.52874755859375, 45.06260299682617, 27.57146453857422, 22.96154022216797, 16.76727294921875, 38.20501708984375, 6.005718231201172, -10.382129669189453, -30.0838623046875, -40.29719924926758, 1.324533462524414, -39.54597473144531, 60.16313552856445, 16.025920867919922, 5.314605712890625, 2.571563720703125, -13.043083190917969, -108.86628723144531, -13.3834228515625, -12.045774459838867, -20.343708038330078, 17.300106048583984, 18.281845092773438, 63.130340576171875, 15.503921508789062, -85.07920837402344, 44.792449951171875, -73.15065002441406, 5.984413146972656, -30.56006622314453, 30.325767517089844, 19.86914825439453, 19.197402954101562, 25.87796401977539, 36.69656753540039, 26.23285675048828, 34.892704010009766, 49.777496337890625, -10.0985107421875, 26.437686920166016, 22.68337631225586, 20.6162109375, 44.83208465576172, 15.301719665527344, 0.5512790679931641, 16.21350860595703, 20.916221618652344, 16.770965576171875, -34.504608154296875, -4.884147644042969, 21.591815948486328, 133.39395141601562, 52.335479736328125, 190.14752197265625, -3.5054359436035156, 14.523361206054688, 73.75206756591797, 87.32011413574219, 32.15309143066406, 2.7402496337890625, -56.13621520996094, -15.014640808105469], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000141.npy"}
{"epoch": 0.21315192743764172, "step": 142, "batch_size": 64, "mean": 9.707908630371094, "std": 29.916566848754883, "min": -59.940277099609375, "p10": -18.523544692993163, "median": 5.373954772949219, "p90": 50.58996124267579, "max": 100.67393493652344, "pos_frac": 0.5625, "sample": [17.979583740234375, -46.91265106201172, 0.27036094665527344, 32.93878936767578, -13.258552551269531, 41.46330261230469, 33.73625183105469, 8.476036071777344, -4.376964569091797, -11.863670349121094, 28.325912475585938, -1.0110282897949219, 5.353050231933594, -9.990310668945312, -25.24268341064453, 23.40447235107422, -16.88833236694336, 5.417192459106445, -7.646413803100586, 11.469121932983398, 5.394859313964844, -10.81243896484375, 20.344388961791992, -11.005840301513672, 3.5061187744140625, 65.53607177734375, -16.14892578125, -25.6025390625, -9.293807983398438, 36.00204086303711, -12.045341491699219, 51.997955322265625, 70.65675354003906, 74.79112243652344, -19.224349975585938, 28.541824340820312, 3.4007034301757812, -4.945869445800781, -5.656520843505859, 25.75063705444336, 61.2999267578125, 58.173980712890625, 11.317596435546875, -13.914382934570312, -19.321372985839844, -14.69586181640625, 20.005001068115234, 47.30464172363281, -9.63385009765625, 18.39056396484375, 24.435516357421875, -5.305606842041016, -59.940277099609375, 100.67393493652344, 18.142620086669922, 42.27935791015625, 21.856124877929688, 12.590829849243164, -45.318939208984375, 18.23820686340332, -7.668937683105469, -3.0176734924316406, -14.410194396972656, 16.99456787109375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000142.npy"}
{"epoch": 0.2146636432350718, "step": 143, "batch_size": 64, "mean": 3.3130061626434326, "std": 40.609161376953125, "min": -145.50985717773438, "p10": -40.511706542968746, "median": 6.637371063232422, "p90": 43.04138069152833, "max": 109.24223327636719, "pos_frac": 0.625, "sample": [-8.986915588378906, 98.8731689453125, 17.407180786132812, 40.15004348754883, 27.974395751953125, 16.302993774414062, 0.109375, -0.7529296875, 31.5379638671875, -11.119869232177734, 13.569040298461914, -39.31044006347656, 109.24223327636719, -13.423076629638672, 39.11261749267578, -6.4188385009765625, -3.480304718017578, -74.35478210449219, 53.22053527832031, 1.0079154968261719, -64.47053527832031, 14.736209869384766, 34.614952087402344, 9.166606903076172, 10.854333877563477, 7.997655868530273, 33.503662109375, -127.23721313476562, 2.665578842163086, -28.922527313232422, -41.02653503417969, 5.27708625793457, -58.93370056152344, 4.852447509765625, 44.28052520751953, -13.35495376586914, -18.53143310546875, 2.7099857330322266, 16.673912048339844, 30.825469970703125, 11.283336639404297, -145.50985717773438, 16.528045654296875, 17.544403076171875, -5.031436920166016, -11.816009521484375, 50.741661071777344, 54.16107177734375, 0.160736083984375, 1.6786880493164062, -7.941841125488281, -7.466741561889648, 17.844566345214844, 11.903755187988281, -60.5137939453125, 16.4887752532959, 44.58155059814453, 22.775161743164062, 15.373519897460938, 11.240230560302734, 21.297544479370117, -3.9006729125976562, -13.658275604248047, -2.073881149291992], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000143.npy"}
{"epoch": 0.2161753590325019, "step": 144, "batch_size": 64, "mean": 9.499675750732422, "std": 49.597190856933594, "min": -107.29963684082031, "p10": -40.55690383911132, "median": 2.164348602294922, "p90": 50.90825881958012, "max": 187.8758544921875, "pos_frac": 0.53125, "sample": [22.11209487915039, 21.24887466430664, 19.119461059570312, 0.3380622863769531, -2.744415283203125, 7.10649299621582, 15.497980117797852, 15.041889190673828, 6.733282089233398, -5.927148818969727, 8.436820983886719, 36.69190979003906, -17.622344970703125, -7.417333602905273, -10.804901123046875, -4.4752349853515625, 55.16712951660156, 5.650825500488281, -24.407440185546875, 40.97089385986328, 143.21401977539062, -50.59653854370117, -66.04168701171875, -48.042510986328125, -10.70534896850586, -29.9139404296875, -44.98607635498047, 56.048545837402344, -0.8507595062255859, -2.2396926879882812, 56.41636657714844, -25.249107360839844, -13.636116027832031, -19.463272094726562, 17.690521240234375, 2.2805633544921875, -2.5894393920898438, 36.461341857910156, 165.7413787841797, 33.799957275390625, 187.8758544921875, 25.201988220214844, -35.96825408935547, 154.4060516357422, -7.505760192871094, -42.523468017578125, -5.698211669921875, -4.308340072631836, 25.868324279785156, 7.223197937011719, 2.0481338500976562, -29.606292724609375, 14.776260375976562, 32.239906311035156, -107.29963684082031, 33.84144592285156, 16.43511962890625, 25.343017578125, -8.920303344726562, -56.357643127441406, 22.269821166992188, -10.50118637084961, 3.9019393920898438, -12.81787109375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000144.npy"}
{"epoch": 0.21768707482993196, "step": 145, "batch_size": 64, "mean": 6.844819068908691, "std": 57.966087341308594, "min": -200.13650512695312, "p10": -47.607025909423825, "median": 7.919986724853516, "p90": 49.52648735046387, "max": 272.7742614746094, "pos_frac": 0.6875, "sample": [6.953895568847656, 20.274036407470703, 3.548654556274414, 75.8216552734375, -6.0642547607421875, -81.25325012207031, 126.03819274902344, -42.92223358154297, 49.170230865478516, -92.49339294433594, 10.175102233886719, 12.960807800292969, 7.320194244384766, 23.862289428710938, 29.024078369140625, 13.421463012695312, 3.8541297912597656, 68.17072296142578, -47.07572937011719, 20.197036743164062, 21.100929260253906, 3.0951271057128906, -19.326656341552734, 49.679168701171875, 22.945547103881836, 2.7017822265625, 36.93299865722656, 81.62481689453125, -11.327667236328125, -12.032833099365234, 39.73284912109375, 1.1269149780273438, 15.64996337890625, 40.120384216308594, -200.13650512695312, 19.865291595458984, 4.045400619506836, 2.441774368286133, 28.29224395751953, 13.54471206665039, -64.72216796875, 9.788745880126953, 13.982894897460938, 8.224639892578125, 2.4846324920654297, -23.252685546875, -1.1863555908203125, 10.83420181274414, -10.521278381347656, -47.83472442626953, 22.78008270263672, 13.798141479492188, 14.88979721069336, -24.867572784423828, -30.74553680419922, 272.7742614746094, -10.16729736328125, -35.30888366699219, -103.17379760742188, 40.49073791503906, 88.91848754882812, 6.297920227050781, -64.09103393554688, 7.615333557128906], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000145.npy"}
{"epoch": 0.21919879062736206, "step": 146, "batch_size": 64, "mean": 6.635363578796387, "std": 33.478641510009766, "min": -68.13673400878906, "p10": -38.94672775268554, "median": 7.075336456298828, "p90": 37.40969696044922, "max": 129.8181610107422, "pos_frac": 0.6875, "sample": [34.879600524902344, -12.173961639404297, -22.604598999023438, 4.699470520019531, 10.378044128417969, -13.752593994140625, 12.566017150878906, -20.373435974121094, 4.681587219238281, 18.258079528808594, -42.41069030761719, -41.80419158935547, 104.07221984863281, 6.737161636352539, -32.27931213378906, 12.05804443359375, 33.99702453613281, -28.692710876464844, 20.640438079833984, -16.20569610595703, 23.865741729736328, -7.7400360107421875, -68.13673400878906, -4.200643539428711, -18.787067413330078, 11.410091400146484, 11.30073356628418, -14.417411804199219, -58.069580078125, 19.899566650390625, 21.61704444885254, -44.11546325683594, 129.8181610107422, 5.4173431396484375, 36.96208190917969, 2.1039161682128906, 21.77861785888672, 6.914031982421875, -50.58653259277344, 5.355365753173828, 30.51311492919922, 6.242565155029297, 2.161397933959961, 47.56398010253906, 11.396568298339844, 44.015480041503906, 12.259317398071289, 11.744277954101562, 15.414588928222656, 4.023569107055664, 11.005121231079102, 37.601531982421875, 21.99659538269043, -44.68058776855469, -6.817667007446289, 7.236640930175781, 76.21382904052734, -26.62139892578125, 4.665596008300781, 8.139554977416992, 42.19740295410156, 2.097789764404297, 11.325584411621094, 31.908662796020508], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000146.npy"}
{"epoch": 0.22071050642479215, "step": 147, "batch_size": 64, "mean": 6.0924272537231445, "std": 40.22786331176758, "min": -221.4754638671875, "p10": -25.395024108886716, "median": 6.575949668884277, "p90": 41.321586608886726, "max": 114.57540893554688, "pos_frac": 0.609375, "sample": [1.6501541137695312, 20.43250846862793, 26.08993148803711, -18.369361877441406, -12.642158508300781, 12.773971557617188, -30.57628631591797, 19.44097900390625, -17.127365112304688, -6.0125885009765625, 29.57494354248047, 22.325382232666016, 70.51152038574219, 62.515716552734375, -14.855316162109375, -21.201614379882812, 23.677635192871094, -221.4754638671875, 5.604665756225586, -1.2461261749267578, 0.9487819671630859, 24.33514404296875, -6.849729537963867, 6.651214599609375, 18.777908325195312, 11.011306762695312, -13.292922973632812, 7.901174545288086, 5.393070220947266, 6.5661163330078125, -14.552125930786133, -18.89910125732422, 72.17129516601562, -34.84230041503906, 40.13795471191406, 27.202713012695312, 18.88661003112793, 37.43682098388672, 54.42814636230469, 6.585783004760742, 114.57540893554688, 42.307403564453125, -34.4263916015625, 31.55990982055664, 22.375736236572266, -14.485393524169922, 35.76078796386719, -9.014274597167969, 16.139511108398438, -8.456119537353516, -23.59942626953125, 19.997802734375, -29.551437377929688, 5.202581405639648, -26.164566040039062, 4.411201477050781, -12.835771560668945, -29.606441497802734, 11.012260437011719, 41.828857421875, -16.003650665283203, 23.139617919921875, 25.11747169494629, -0.4587078094482422], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000147.npy"}
{"epoch": 0.2222222222222222, "step": 148, "batch_size": 64, "mean": 14.864524841308594, "std": 34.81605529785156, "min": -107.8143310546875, "p10": -17.31063537597656, "median": 10.28764533996582, "p90": 55.831889343261736, "max": 100.16482543945312, "pos_frac": 0.734375, "sample": [100.16482543945312, 3.335531234741211, -4.697845458984375, -10.758096694946289, 49.80302429199219, -10.414352416992188, 5.432647705078125, 3.5604705810546875, 94.9300537109375, 30.967132568359375, -0.23675918579101562, 20.457691192626953, 13.365394592285156, 15.887969970703125, 31.706340789794922, -32.794044494628906, 3.3084144592285156, 7.4744110107421875, 57.63056182861328, 8.862506866455078, -3.463611602783203, -15.181999206542969, 51.634986877441406, 3.598154067993164, 30.188520431518555, 26.580276489257812, 19.34128189086914, 14.653450012207031, 92.947021484375, -5.714591979980469, -4.256599426269531, 86.49296569824219, 29.250452041625977, -32.56422424316406, 4.401458740234375, 49.84136962890625, 2.575338363647461, 12.731597900390625, -39.97035217285156, 3.856046676635742, 28.29108428955078, 11.535053253173828, 6.672332763671875, 91.23689270019531, -23.555692672729492, 16.698701858520508, 10.239681243896484, 4.37890625, 68.2350082397461, -107.8143310546875, 12.733085632324219, 40.049049377441406, 10.335609436035156, 17.20400619506836, 6.208198547363281, -12.050491333007812, 47.74053955078125, 16.852577209472656, 3.2509918212890625, -36.702781677246094, 12.768150329589844, 40.36412048339844, -10.045578002929688, -18.22290802001953], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000148.npy"}
{"epoch": 0.2237339380196523, "step": 149, "batch_size": 64, "mean": 10.028595924377441, "std": 30.683759689331055, "min": -74.24969482421875, "p10": -22.40891304016113, "median": 7.133670806884766, "p90": 51.22171478271485, "max": 86.3094482421875, "pos_frac": 0.671875, "sample": [-4.382362365722656, 59.0843505859375, 22.172714233398438, -1.169464111328125, -74.24969482421875, 29.796066284179688, -23.753860473632812, 12.902326583862305, 3.5353736877441406, 19.380563735961914, 3.309783935546875, -7.4901123046875, 5.467529296875, 6.670146942138672, -35.76136779785156, -2.0147857666015625, 2.7009506225585938, -14.160713195800781, 8.11225700378418, 34.61985778808594, 8.789127349853516, 13.798355102539062, 25.62507438659668, -19.270702362060547, 34.31953811645508, 10.089834213256836, 3.9639720916748047, 0.35117530822753906, 10.356245040893555, -9.294013977050781, -1.4576873779296875, 39.003265380859375, 59.847808837890625, 35.08164596557617, 35.78313446044922, 28.04612922668457, 26.1304874420166, -50.78063201904297, 86.3094482421875, -10.398809432983398, 50.67079162597656, 72.34465026855469, -13.251176834106445, 79.22305297851562, 7.597194671630859, 9.882768630981445, 0.40900421142578125, 3.0957412719726562, 2.068103790283203, 23.408584594726562, -3.753662109375, 15.744277954101562, -24.882431030273438, -2.086151123046875, 4.218738555908203, -72.628662109375, 66.85313415527344, 51.45782470703125, 18.90642547607422, 10.48538589477539, -1.7589035034179688, 17.818519592285156, -32.460845947265625, -12.595199584960938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000149.npy"}
{"epoch": 0.2252456538170824, "step": 150, "batch_size": 64, "mean": 14.469606399536133, "std": 37.197715759277344, "min": -79.87358856201172, "p10": -32.7424819946289, "median": 13.30837345123291, "p90": 58.777014160156256, "max": 105.55380249023438, "pos_frac": 0.671875, "sample": [-11.017398834228516, 5.522605895996094, 97.92398071289062, 57.385772705078125, -79.87358856201172, -3.1098175048828125, 7.969219207763672, -14.144233703613281, 2.6840667724609375, 15.882688522338867, -20.689414978027344, 40.48851013183594, -36.63447570800781, -18.687606811523438, -9.202016830444336, 7.8964996337890625, 8.120780944824219, 89.24795532226562, -41.03822326660156, -18.063438415527344, 29.290504455566406, 7.660663604736328, 105.55380249023438, -54.66590881347656, 14.587570190429688, 17.528718948364258, 12.10319709777832, 14.943706512451172, -12.68780517578125, 32.4234619140625, 10.783491134643555, 69.72319030761719, 23.88979721069336, 43.453941345214844, 27.773101806640625, -28.817703247070312, -10.429651260375977, 11.959636688232422, 42.76904296875, 52.0777587890625, 55.457427978515625, 14.526290893554688, 26.851394653320312, 14.5135498046875, 61.46208190917969, -52.72288513183594, 9.469911575317383, 21.57977867126465, -26.45702362060547, 89.00865173339844, 48.01654052734375, -1.87445068359375, 26.575332641601562, -36.96485137939453, 1.0733509063720703, 59.373260498046875, -20.129100799560547, -7.4514617919921875, 35.570030212402344, -34.424530029296875, 27.712417602539062, 51.611480712890625, 25.530235290527344, 47.16497802734375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000150.npy"}
{"epoch": 0.22675736961451248, "step": 151, "batch_size": 64, "mean": 18.08395004272461, "std": 35.42637634277344, "min": -38.04130554199219, "p10": -9.73655776977539, "median": 10.401472091674805, "p90": 54.974784088134776, "max": 205.67340087890625, "pos_frac": 0.78125, "sample": [0.4476165771484375, 28.342121124267578, 26.587928771972656, 2.2127647399902344, 12.837261199951172, 1.1071510314941406, 205.67340087890625, -6.611103057861328, 7.0322723388671875, -9.912071228027344, 30.978225708007812, 15.003917694091797, 55.72843933105469, -38.04130554199219, 1.8436927795410156, 19.50354766845703, 25.911787033081055, 1.3519039154052734, -1.8892097473144531, 1.7093372344970703, -18.40497589111328, 5.762636184692383, 25.797203063964844, 5.482540130615234, 63.21812438964844, -32.568328857421875, 3.877063751220703, 14.173236846923828, -0.7339115142822266, 3.1298751831054688, 5.300811767578125, -8.185005187988281, -20.741683959960938, 76.00152587890625, 16.418472290039062, 4.4240570068359375, 43.354576110839844, -12.213607788085938, 4.7478179931640625, 118.15426635742188, 0.3589820861816406, 17.97113037109375, 4.106037139892578, 12.66390609741211, 17.945369720458984, 60.15020751953125, 8.861808776855469, 43.199859619140625, 35.90624237060547, 24.736568450927734, -9.3270263671875, 53.21625518798828, 0.12198257446289062, 12.005538940429688, -13.579261779785156, 17.86806297302246, 11.94113540649414, 13.823429107666016, -6.146419525146484, 75.96286010742188, 32.50035858154297, 30.10696792602539, -0.7751293182373047, 36.941551208496094], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000151.npy"}
{"epoch": 0.22826908541194255, "step": 152, "batch_size": 64, "mean": 13.667675018310547, "std": 32.480262756347656, "min": -57.988182067871094, "p10": -25.536826705932615, "median": 12.456672668457031, "p90": 53.964050292968764, "max": 105.44403076171875, "pos_frac": 0.671875, "sample": [8.538997650146484, 20.857261657714844, 21.06032943725586, -19.94080352783203, -44.34758758544922, 38.4669189453125, 55.43017578125, 22.38751983642578, -9.888172149658203, -50.69889831542969, 5.764856338500977, 24.233108520507812, 11.058738708496094, -35.97607421875, 30.664093017578125, 23.403465270996094, 6.377628326416016, 20.848133087158203, -0.8252887725830078, -8.884441375732422, -57.988182067871094, 28.811279296875, 13.854606628417969, -25.02056121826172, -25.75808334350586, 2.702657699584961, 21.25971221923828, -8.121482849121094, -4.610603332519531, 30.669219970703125, 5.47016716003418, 26.675268173217773, 3.0748157501220703, -12.437946319580078, -12.077308654785156, 105.44403076171875, -1.14044189453125, -48.927955627441406, 42.73841857910156, -4.005218505859375, -3.9707908630371094, 20.393207550048828, 29.574798583984375, 27.32152557373047, 9.511672973632812, 61.463653564453125, 24.420364379882812, 3.6249427795410156, 76.0423812866211, 68.76284790039062, -38.930419921875, 31.131088256835938, 9.719812393188477, 74.48983001708984, 31.882429122924805, 26.290557861328125, 47.354835510253906, 50.5430908203125, 40.4912109375, 80.29558563232422, -5.406471252441406, -16.92556381225586, 0.869781494140625, 26.638511657714844], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000152.npy"}
{"epoch": 0.22978080120937264, "step": 153, "batch_size": 64, "mean": 11.063375473022461, "std": 30.619096755981445, "min": -68.33001708984375, "p10": -19.119148254394528, "median": 5.858341217041016, "p90": 47.50338821411133, "max": 133.0189208984375, "pos_frac": 0.609375, "sample": [14.654806137084961, 22.298263549804688, -7.4246063232421875, -6.8555908203125, 22.73753547668457, 20.24197769165039, -31.757591247558594, -3.6784095764160156, 9.469795227050781, -0.102996826171875, 1.4759044647216797, 20.02375030517578, 9.999969482421875, 71.9471664428711, 47.027099609375, -8.151290893554688, 5.254203796386719, 12.750167846679688, -0.36887359619140625, 35.30796813964844, -2.4952545166015625, 11.99673843383789, -10.781951904296875, 5.121673583984375, 16.706729888916016, -13.642997741699219, 9.423011779785156, 0.05438041687011719, -10.087753295898438, 4.145847320556641, 29.306602478027344, 21.824445724487305, -15.763824462890625, -3.1942405700683594, 27.872695922851562, -3.505491256713867, -33.55388641357422, 26.90296173095703, 133.0189208984375, 17.122249603271484, -68.33001708984375, -0.747802734375, 72.04676818847656, -11.171138763427734, 15.627758026123047, -6.034910202026367, 42.7567138671875, -5.430320739746094, 47.70751190185547, -13.44563102722168, -26.454925537109375, 15.864753723144531, 10.016008377075195, -23.766677856445312, 0.46028900146484375, -24.718994140625, -20.557144165039062, 17.942167282104492, 67.97232055664062, 74.92231750488281, 26.79183578491211, 6.4624786376953125, 2.7803382873535156, 62.04225158691406], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000153.npy"}
{"epoch": 0.23129251700680273, "step": 154, "batch_size": 64, "mean": 8.121039390563965, "std": 36.12432861328125, "min": -76.87837219238281, "p10": -39.63909759521484, "median": 8.555887222290039, "p90": 39.79615440368653, "max": 157.42584228515625, "pos_frac": 0.65625, "sample": [-48.72765350341797, -37.87310791015625, -62.68251037597656, -20.175594329833984, 34.3194580078125, 28.20173454284668, -15.374019622802734, 4.675973892211914, -60.5106201171875, -55.00353240966797, 19.168933868408203, -1.0433826446533203, 15.117147445678711, 71.28997802734375, -7.8560333251953125, -0.7001914978027344, -13.774547576904297, 1.2001609802246094, 2.1856956481933594, -25.325340270996094, 23.958480834960938, 44.968467712402344, 39.13241958618164, 19.527481079101562, -40.39595031738281, 1.9577045440673828, 36.46137237548828, -61.282203674316406, 54.8763427734375, -76.87837219238281, 12.597478866577148, 8.93697738647461, 8.066368103027344, 30.747909545898438, 29.62673568725586, -5.054607391357422, 10.404645919799805, 30.943771362304688, 9.201908111572266, 9.511730194091797, 8.174797058105469, 22.902137756347656, 4.361787796020508, -10.744352340698242, 15.557960510253906, 25.3425235748291, -2.2541122436523438, 3.9380531311035156, 26.710472106933594, 6.588130950927734, 81.35115051269531, 18.216773986816406, -10.23110580444336, 26.65508270263672, 157.42584228515625, 28.604904174804688, 7.490100860595703, -1.2375354766845703, 40.08061218261719, 47.1563720703125, -20.32611846923828, 13.812204360961914, -1.6853656768798828, 17.43499183654785], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000154.npy"}
{"epoch": 0.2328042328042328, "step": 155, "batch_size": 64, "mean": 10.963760375976562, "std": 35.331546783447266, "min": -94.64456176757812, "p10": -18.90573387145996, "median": 7.422859191894531, "p90": 51.8835746765137, "max": 102.22596740722656, "pos_frac": 0.640625, "sample": [7.575897216796875, -6.5602874755859375, 90.18632507324219, 18.836631774902344, 4.360015869140625, 31.90258026123047, 14.187362670898438, 28.153182983398438, -2.591930389404297, 31.907917022705078, 35.09733581542969, 14.758308410644531, -12.692535400390625, 36.65374755859375, -8.677600860595703, 7.2698211669921875, -1.8792724609375, 79.28584289550781, -0.6512374877929688, 41.06903839111328, 35.018123626708984, -13.843652725219727, 3.4631423950195312, -18.82388687133789, 55.111663818359375, 38.516937255859375, -65.00483703613281, 68.84005737304688, -2.2056121826171875, 33.01250457763672, -94.64456176757812, 19.836589813232422, 28.246204376220703, 5.863670349121094, 102.22596740722656, 54.820098876953125, -6.2707061767578125, 45.03168487548828, -74.61883544921875, -13.086532592773438, 2.53741455078125, -9.345966339111328, -18.940811157226562, -0.2996406555175781, 56.151123046875, 14.333484649658203, 5.023036956787109, 11.193397521972656, 28.857879638671875, 9.323722839355469, 36.607826232910156, -63.0528564453125, 7.059715270996094, -21.230194091796875, 0.9951400756835938, -7.584281921386719, 14.541221618652344, -13.602428436279297, -48.97281265258789, 1.54010009765625, 41.88780975341797, 29.97533416748047, 26.910564422607422, -11.907257080078125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000155.npy"}
{"epoch": 0.23431594860166288, "step": 156, "batch_size": 64, "mean": 17.77851676940918, "std": 31.49639129638672, "min": -46.75367736816406, "p10": -17.42402648925781, "median": 11.937289237976074, "p90": 56.645701599121104, "max": 111.17164611816406, "pos_frac": 0.734375, "sample": [1.87274169921875, -20.668224334716797, -5.262434005737305, -29.48299789428711, 12.286483764648438, 57.73472595214844, 27.94182777404785, -5.316520690917969, 51.75505065917969, 13.52553939819336, 6.4827117919921875, 32.07744216918945, -23.266258239746094, 75.9775390625, -19.07073974609375, 34.440338134765625, 54.104644775390625, 40.40324401855469, 2.9135875701904297, 27.78095245361328, 7.650705337524414, 8.214380264282227, 19.124584197998047, 0.9829635620117188, -18.271255493164062, 30.59369659423828, -2.244497299194336, 37.32427215576172, 4.7508544921875, 23.665794372558594, 6.586662292480469, 28.343093872070312, 21.967208862304688, -14.394287109375, 63.6962890625, 101.25655364990234, -26.61773681640625, -46.75367736816406, -15.447158813476562, 42.34629821777344, 28.595443725585938, 52.699737548828125, 9.185073852539062, 7.405281066894531, 11.588094711303711, 9.189260482788086, -14.741600036621094, -0.1403656005859375, 10.33172607421875, -9.284263610839844, 14.707643508911133, 25.113616943359375, 24.633007049560547, 12.641639709472656, 18.996971130371094, 4.0985107421875, -13.120101928710938, 65.03716278076172, 110.35546875, 4.62470817565918, 21.24834442138672, 111.17164611816406, 25.99917221069336, -1.5154876708984375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000156.npy"}
{"epoch": 0.23582766439909297, "step": 157, "batch_size": 64, "mean": 10.016905784606934, "std": 22.46013069152832, "min": -46.40177917480469, "p10": -15.883589172363278, "median": 7.406471252441406, "p90": 38.68110656738281, "max": 77.13015747070312, "pos_frac": 0.703125, "sample": [38.892478942871094, 31.541133880615234, 3.6387863159179688, 27.61937713623047, -6.69537353515625, -39.117958068847656, 45.54673767089844, 22.773529052734375, 12.535743713378906, -3.4536819458007812, -17.73870849609375, 4.224632263183594, 45.39836120605469, 6.597919464111328, -46.40177917480469, 1.0061511993408203, -21.023422241210938, -16.97662353515625, 28.69449806213379, 22.366069793701172, 7.346351623535156, 47.03346633911133, 77.13015747070312, 12.829988479614258, 4.123899459838867, 35.891197204589844, -13.333175659179688, -4.083009719848633, 8.04144287109375, 1.2837600708007812, 21.262937545776367, -4.9756011962890625, 13.846527099609375, 37.246116638183594, 15.763374328613281, 36.852294921875, -10.069755554199219, 26.37874984741211, 44.35108947753906, 18.770660400390625, 12.713478088378906, 7.466590881347656, -2.1983489990234375, 38.187904357910156, -27.186370849609375, -2.3052978515625, -3.194580078125, 19.42974853515625, 13.451408386230469, 1.426065444946289, -12.148796081542969, -12.612506866455078, 7.676261901855469, 2.508930206298828, 6.0001068115234375, 22.20829963684082, 14.660560607910156, 24.23806381225586, -35.07252502441406, 2.918975830078125, 41.393157958984375, -3.16619873046875, 2.613616943359375, 6.9550933837890625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000157.npy"}
{"epoch": 0.23733938019652306, "step": 158, "batch_size": 64, "mean": 13.925542831420898, "std": 41.49396514892578, "min": -77.24762725830078, "p10": -23.793630218505857, "median": 6.047477722167969, "p90": 42.6679672241211, "max": 171.68115234375, "pos_frac": 0.671875, "sample": [-45.354896545410156, 25.113601684570312, -25.336410522460938, 12.205345153808594, -6.3994903564453125, 24.69586181640625, 82.66482543945312, 51.54234313964844, -31.330780029296875, -9.215484619140625, 10.904449462890625, 30.818702697753906, -2.0589599609375, 14.261886596679688, -20.193809509277344, 6.064573287963867, 171.68115234375, 6.03038215637207, 4.074676513671875, 34.098419189453125, 6.829402923583984, 9.17578125, 25.982873916625977, 19.69683074951172, 33.95790100097656, 40.959815979003906, 3.804840087890625, 11.59307861328125, 31.231185913085938, -7.044242858886719, 12.024673461914062, 8.368881225585938, 39.36689758300781, -77.24762725830078, 32.257469177246094, 43.40003204345703, 27.248138427734375, -8.65557861328125, -7.968717575073242, -47.5657958984375, 91.14994812011719, 40.192657470703125, -27.333717346191406, 128.19622802734375, 4.337211608886719, 25.675758361816406, -2.2144527435302734, 5.534477233886719, 155.587158203125, 1.6513690948486328, 1.6028213500976562, 4.227378845214844, -10.553886413574219, -4.429039001464844, 6.413213729858398, -0.15749740600585938, -13.335027694702148, -12.04794692993164, 19.822280883789062, 1.5861091613769531, -2.49371337890625, 4.438362121582031, 1.2074337005615234, -59.50457763671875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000158.npy"}
{"epoch": 0.23885109599395313, "step": 159, "batch_size": 64, "mean": 13.412359237670898, "std": 27.937744140625, "min": -74.68093872070312, "p10": -11.333719635009766, "median": 12.376314163208008, "p90": 49.61166076660158, "max": 95.30204772949219, "pos_frac": 0.671875, "sample": [31.404956817626953, 16.858375549316406, 22.122360229492188, 60.715248107910156, 14.719856262207031, -50.63390350341797, 55.791015625, 33.95153045654297, 22.594196319580078, 95.30204772949219, -16.821044921875, 9.975204467773438, 44.79676818847656, -13.827606201171875, -9.51443099975586, -11.159996032714844, 16.877609252929688, 62.813232421875, 36.112510681152344, -1.4307804107666016, 11.666000366210938, -11.408172607421875, 6.549873352050781, -10.995567321777344, 34.73991394042969, 6.193948745727539, -1.8536624908447266, -39.58708190917969, 12.358963012695312, 35.851966857910156, -22.9344482421875, 76.76351928710938, 22.94394874572754, -7.2340240478515625, 20.65833282470703, 11.476400375366211, 21.19357681274414, 23.27416229248047, 1.2252044677734375, -1.290985107421875, 12.393665313720703, -1.4127044677734375, 51.67518615722656, -7.991241455078125, 20.202381134033203, 34.894203186035156, -4.909505844116211, 19.265592575073242, -4.429168701171875, 22.147720336914062, 8.463844299316406, 52.48522186279297, 35.37329864501953, 8.107460021972656, 3.1585693359375, -74.68093872070312, 21.66001319885254, -6.4024658203125, -10.698814392089844, 6.017665863037109, 40.13996124267578, 13.651565551757812, -4.551445007324219, 13.591911315917969], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000159.npy"}
{"epoch": 0.24036281179138322, "step": 160, "batch_size": 64, "mean": 12.94650650024414, "std": 28.10050392150879, "min": -36.56285858154297, "p10": -18.68336009979248, "median": 11.545494079589844, "p90": 40.36646575927736, "max": 124.45147705078125, "pos_frac": 0.671875, "sample": [2.8148021697998047, -25.367351531982422, 14.767059326171875, 55.019065856933594, -3.8406982421875, 26.943077087402344, 22.956787109375, -8.103286743164062, 16.373802185058594, 16.638702392578125, 82.51225280761719, -14.099685668945312, 36.9271240234375, 23.947967529296875, 9.37857437133789, -5.828826904296875, 5.587535858154297, -36.56285858154297, 13.712413787841797, 18.948654174804688, -15.715354919433594, -21.740890502929688, 16.276885986328125, 29.53302764892578, 29.193334579467773, -0.6778907775878906, 32.116844177246094, 1.0490703582763672, 3.6826858520507812, 5.893577575683594, 33.780052185058594, 18.338571548461914, 79.02462768554688, 45.4329948425293, 2.778043746948242, -11.002195358276367, 33.50457763671875, -4.943355560302734, -0.4826927185058594, 14.837444305419922, 3.8328208923339844, 23.551639556884766, 2.486358642578125, -18.587703704833984, -10.793224334716797, 31.47930145263672, -33.086875915527344, -18.724355697631836, 24.226272583007812, 25.02783203125, 28.69062042236328, -6.165975570678711, 8.840583801269531, 33.32807540893555, -12.167816162109375, 23.7379150390625, -0.6446685791015625, 30.626976013183594, -30.698646545410156, 4.801521301269531, 41.84046936035156, 41.90065002441406, 124.45147705078125, -32.98138427734375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000160.npy"}
{"epoch": 0.2418745275888133, "step": 161, "batch_size": 64, "mean": 13.022612571716309, "std": 24.623151779174805, "min": -51.76533508300781, "p10": -17.618051147460935, "median": 12.898612976074219, "p90": 45.76208915710449, "max": 92.5927734375, "pos_frac": 0.71875, "sample": [10.935142517089844, -25.351123809814453, 10.152458190917969, 45.977813720703125, 13.834365844726562, 5.51971435546875, 92.5927734375, 47.37263488769531, 6.18678092956543, -2.5433225631713867, -24.925376892089844, -17.804855346679688, 13.912670135498047, 21.123214721679688, 45.258731842041016, 28.83719825744629, 15.520883560180664, -51.76533508300781, 20.9495849609375, -25.12957000732422, -5.20574951171875, 14.319717407226562, 11.767990112304688, -17.182174682617188, 37.85826110839844, -5.786884307861328, 17.78491973876953, 11.138582229614258, -4.969535827636719, 61.06250762939453, 26.882080078125, 19.50110626220703, 18.3018741607666, -25.810745239257812, 15.328788757324219, -20.103591918945312, -14.39004898071289, 63.91566467285156, 23.11907196044922, 35.47367477416992, 14.960285186767578, 4.426063537597656, -5.069965362548828, 17.83397674560547, 21.61750030517578, -4.517333984375, -0.45964813232421875, 3.364910125732422, 48.72364807128906, 29.256263732910156, 0.37401580810546875, 21.78106689453125, 38.391319274902344, 23.197093963623047, -5.382530212402344, 5.052118301391602, 11.962860107421875, 0.3128376007080078, 20.770980834960938, 43.35236358642578, 50.06971740722656, -10.137222290039062, 1.8308372497558594, 8.076175689697266], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000161.npy"}
{"epoch": 0.24338624338624337, "step": 162, "batch_size": 64, "mean": 18.73758888244629, "std": 39.44634246826172, "min": -110.53538513183594, "p10": -20.444188690185545, "median": 16.838150024414062, "p90": 47.92800064086916, "max": 135.2976531982422, "pos_frac": 0.78125, "sample": [21.712249755859375, -17.160423278808594, -95.41232299804688, -0.60223388671875, 58.79187774658203, 27.60305404663086, 19.296661376953125, 31.6661376953125, -45.404998779296875, 41.767120361328125, 10.180908203125, 17.046554565429688, 8.45600700378418, 135.2976531982422, 35.49750518798828, 19.949811935424805, 115.2567138671875, 31.697669982910156, 49.640350341796875, 30.58462142944336, 37.162174224853516, 41.13747024536133, 14.710006713867188, -24.464111328125, 16.629745483398438, 17.583999633789062, 105.04965209960938, 39.59581756591797, -33.49561309814453, 113.67535400390625, 16.599266052246094, 13.971023559570312, 14.862907409667969, 27.865215301513672, -110.53538513183594, 15.234748840332031, 25.608840942382812, 28.448659896850586, -5.888830184936523, -8.923965454101562, -30.97161865234375, -3.688690185546875, 85.0042953491211, 28.48173713684082, 29.38371467590332, 10.901756286621094, -3.8644561767578125, 10.503616333007812, 6.282196044921875, 8.588035583496094, 3.8569984436035156, 2.2689208984375, 36.54326629638672, 31.758337020874023, 39.94158935546875, 26.290382385253906, 9.847856521606445, -21.851516723632812, 7.027496337890625, 9.47299575805664, -1.0605754852294922, 21.147178649902344, 43.932518005371094, 8.71771240234375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000162.npy"}
{"epoch": 0.24489795918367346, "step": 163, "batch_size": 64, "mean": 17.277774810791016, "std": 26.37078857421875, "min": -45.746543884277344, "p10": -19.236814880371092, "median": 17.38038444519043, "p90": 48.13553466796875, "max": 85.16439819335938, "pos_frac": 0.796875, "sample": [0.5059652328491211, 33.74237060546875, 3.48651123046875, 22.043807983398438, 11.820358276367188, 12.509868621826172, -21.801513671875, 71.51815795898438, 46.2857666015625, -16.985794067382812, 4.196258544921875, 61.70624542236328, 64.9360580444336, 18.94902801513672, 19.143295288085938, 5.890693664550781, 17.187252044677734, -23.331138610839844, 85.16439819335938, 39.78966522216797, 2.75360107421875, 45.98158264160156, 30.0222225189209, -13.569068908691406, -4.316947937011719, 39.54107666015625, 17.615062713623047, 51.550567626953125, 2.5705127716064453, -20.739837646484375, 21.483293533325195, 12.103126525878906, 39.777549743652344, -6.641380310058594, 37.60789489746094, 40.46531677246094, -7.148502349853516, 56.05450439453125, -20.2015380859375, 0.7162208557128906, 20.776554107666016, 48.074737548828125, 16.070472717285156, 32.11337661743164, 33.460357666015625, -7.383607864379883, 1.631439208984375, 8.3564453125, 5.337074279785156, -31.485191345214844, -45.746543884277344, 35.63032531738281, 22.799232482910156, 2.576549530029297, 7.209163665771484, -29.529006958007812, 25.157058715820312, 48.161590576171875, 45.745452880859375, 29.480133056640625, 26.316246032714844, 7.736574172973633, 3.3331031799316406, 17.573516845703125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000163.npy"}
{"epoch": 0.24640967498110355, "step": 164, "batch_size": 64, "mean": 18.884613037109375, "std": 28.249414443969727, "min": -45.035301208496094, "p10": -10.635531806945801, "median": 16.372516632080078, "p90": 52.73543472290039, "max": 100.66854858398438, "pos_frac": 0.734375, "sample": [1.951629638671875, 2.0622329711914062, -10.105674743652344, 5.764804840087891, 8.503562927246094, 55.819244384765625, 53.22528076171875, 40.607933044433594, 26.29638671875, 21.584186553955078, 11.43310546875, -15.5479736328125, 25.841320037841797, -8.982173919677734, 15.360057830810547, 60.12446594238281, 14.870002746582031, 17.93271827697754, 91.89544677734375, -2.7983055114746094, 8.951484680175781, -37.96382141113281, 13.081100463867188, 46.7926025390625, -1.2922515869140625, 16.981689453125, 39.45490646362305, -2.7334365844726562, 12.973800659179688, 2.0486068725585938, 34.61335754394531, 42.29723358154297, -10.738775253295898, 29.362850189208984, -2.9571380615234375, -17.662063598632812, 59.024749755859375, 33.52775573730469, 40.379478454589844, -45.035301208496094, 21.85240364074707, -10.394630432128906, 6.236896514892578, 42.10620880126953, 27.857601165771484, 17.200796127319336, 50.34913635253906, -2.667287826538086, 100.66854858398438, 24.882343292236328, 48.467498779296875, -1.6292686462402344, 33.21254348754883, -26.64093589782715, 4.67805290222168, -37.558631896972656, 5.7704620361328125, 51.59246063232422, 48.12626647949219, -1.6415748596191406, 55.056182861328125, 37.54930114746094, 20.8323974609375, 15.763343811035156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000164.npy"}
{"epoch": 0.24792139077853365, "step": 165, "batch_size": 64, "mean": 16.14834213256836, "std": 27.616653442382812, "min": -50.91764831542969, "p10": -14.428118896484374, "median": 18.297977447509766, "p90": 44.054727935791014, "max": 94.43523406982422, "pos_frac": 0.75, "sample": [58.867820739746094, 8.98531723022461, 12.750255584716797, 31.33132553100586, 21.93508529663086, -1.080963134765625, 18.75131607055664, -43.85804748535156, 9.617877960205078, 4.774709701538086, 9.634071350097656, 17.84463882446289, 52.86564636230469, -6.841497421264648, -14.443344116210938, 8.072578430175781, 19.116622924804688, 10.674850463867188, -35.69391632080078, -18.96875, -11.915115356445312, -9.5875244140625, 37.614097595214844, -3.0948543548583984, 45.275482177734375, 21.980993270874023, 16.095373153686523, 34.744049072265625, 42.207794189453125, 42.815330505371094, 21.3172607421875, 1.7752361297607422, 61.062896728515625, 43.768463134765625, 19.663864135742188, -47.119232177734375, 6.4598388671875, 18.896900177001953, 38.42759323120117, 30.367904663085938, -14.392593383789062, -11.820465087890625, 39.84117126464844, 28.28810691833496, 22.789342880249023, -11.738838195800781, 6.344226837158203, 10.210273742675781, 29.378244400024414, -7.0003662109375, 44.1417236328125, 43.85173797607422, -50.91764831542969, 34.446102142333984, 9.145792007446289, -35.20415115356445, 37.56147003173828, 13.284318923950195, 29.15227508544922, 28.48297119140625, 67.16419219970703, 33.268341064453125, 94.43523406982422, 17.69055938720703], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000165.npy"}
{"epoch": 0.2494331065759637, "step": 166, "batch_size": 64, "mean": 15.347746849060059, "std": 27.666894912719727, "min": -34.13414001464844, "p10": -16.237847137451173, "median": 9.215160369873047, "p90": 46.99365310668946, "max": 118.55194091796875, "pos_frac": 0.703125, "sample": [36.971717834472656, 6.107908248901367, 8.373184204101562, 17.27069091796875, 35.21723937988281, 8.122848510742188, -17.244369506835938, 60.17572021484375, 29.743515014648438, -5.970306396484375, 10.057136535644531, 7.361017227172852, 31.756908416748047, -16.29875946044922, 38.335693359375, 29.986526489257812, 4.733306884765625, 22.626523971557617, -19.704322814941406, 5.0764923095703125, 0.03612518310546875, 29.45089340209961, -3.1072463989257812, 7.270416259765625, 43.61786651611328, 3.615093231201172, -13.717498779296875, 118.55194091796875, 16.827194213867188, 32.312904357910156, -34.13414001464844, -33.4970817565918, 7.265830993652344, 31.41570281982422, 50.63880157470703, 2.2099761962890625, 47.20475769042969, -5.8747406005859375, 28.333274841308594, 25.285444259643555, 17.262222290039062, 44.076812744140625, -33.74678039550781, -16.095718383789062, 55.043731689453125, -8.502471923828125, -11.852989196777344, -7.546703338623047, 15.392389297485352, 46.501075744628906, 34.22419738769531, -17.953643798828125, 85.26246643066406, 17.29688262939453, -12.053211212158203, -4.53192138671875, 27.080772399902344, -3.1437225341796875, 48.13230895996094, 0.6559829711914062, 31.70867919921875, 27.381790161132812, -0.2692070007324219, 1.5286293029785156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000166.npy"}
{"epoch": 0.2509448223733938, "step": 167, "batch_size": 64, "mean": 16.62277603149414, "std": 23.17474365234375, "min": -43.35185241699219, "p10": -11.87210426330566, "median": 17.700830459594727, "p90": 45.22786254882813, "max": 70.91398620605469, "pos_frac": 0.734375, "sample": [15.758277893066406, 17.281978607177734, 4.614748001098633, 28.30984878540039, 0.29590606689453125, 36.7623291015625, 36.85344696044922, 1.5005264282226562, 9.385168075561523, 40.01374053955078, 32.80408477783203, 1.9636192321777344, 19.160396575927734, -3.4620361328125, -7.140232086181641, -2.202526092529297, 40.86360168457031, 61.42054748535156, 35.45207214355469, -18.844131469726562, -19.924903869628906, -1.2968177795410156, 4.904485702514648, -20.213844299316406, 22.53742790222168, -43.35185241699219, -4.088142395019531, 1.582468032836914, -0.7473220825195312, 33.9698486328125, 23.906539916992188, 37.96034622192383, 5.858251571655273, 27.95730209350586, 19.237335205078125, 18.570526123046875, 51.4354248046875, 70.91398620605469, 18.388992309570312, 18.11968231201172, 4.696022033691406, 57.70603942871094, -6.319602966308594, 15.410865783691406, 34.434959411621094, -18.444259643554688, -7.806358337402344, 45.85724639892578, -4.645790100097656, 43.759300231933594, 24.86081314086914, 4.147335052490234, 18.933488845825195, 54.319461822509766, 16.24667739868164, 38.588462829589844, -14.154521942138672, 33.94175720214844, 25.935142517089844, 57.02722930908203, 31.440574645996094, -0.896514892578125, 5.9228363037109375, -13.614566802978516], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000167.npy"}
{"epoch": 0.25245653817082386, "step": 168, "batch_size": 64, "mean": 15.429210662841797, "std": 30.71924591064453, "min": -38.188446044921875, "p10": -16.283954620361328, "median": 8.612669944763184, "p90": 47.784085845947274, "max": 116.97088623046875, "pos_frac": 0.625, "sample": [11.747200012207031, -11.781791687011719, 45.66042709350586, -11.206512451171875, 4.264806747436523, 34.61741638183594, 36.86164855957031, -7.357414245605469, -34.25343322753906, -38.188446044921875, 0.6779022216796875, -3.0045928955078125, 24.167381286621094, 48.6942253112793, -6.2624664306640625, 2.733583450317383, 23.144725799560547, 16.526573181152344, 24.943801879882812, 31.307998657226562, 43.64488983154297, -0.9290828704833984, -6.895679473876953, 12.231616973876953, -6.61918830871582, 29.697086334228516, 43.29615020751953, 4.442928314208984, -20.219383239746094, 32.831695556640625, -22.463607788085938, -7.7316131591796875, 82.28291320800781, 43.26905822753906, 44.041202545166016, -9.903671264648438, 116.97088623046875, -22.007247924804688, -3.912628173828125, 4.6748809814453125, 3.4177074432373047, 49.692100524902344, -15.449729919433594, 40.355262756347656, 16.81658935546875, -16.6414794921875, 62.02088928222656, -14.798141479492188, 18.45337677001953, 5.067474365234375, 36.74327087402344, 18.26793670654297, 33.86397933959961, -5.253988265991211, -7.506370544433594, 36.14966583251953, -4.666717529296875, -12.110855102539062, -23.978683471679688, 6.969694137573242, 37.21849060058594, 53.538902282714844, 109.05023193359375, 10.255645751953125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000168.npy"}
{"epoch": 0.25396825396825395, "step": 169, "batch_size": 64, "mean": 13.180170059204102, "std": 28.21048355102539, "min": -72.73423767089844, "p10": -12.845477294921874, "median": 10.621862411499023, "p90": 47.60171127319338, "max": 86.16259765625, "pos_frac": 0.703125, "sample": [8.132637023925781, 36.68708038330078, 41.63983917236328, -2.3715972900390625, 18.028383255004883, -1.1754074096679688, 7.587223052978516, 0.21652984619140625, 12.422088623046875, 22.31560516357422, -2.78216552734375, -8.974868774414062, 54.64915466308594, 39.32633972167969, -7.14434814453125, -10.799079895019531, 17.699642181396484, 5.327445983886719, 32.502296447753906, -2.0513153076171875, -3.9010162353515625, 50.15679931640625, 39.159523010253906, 24.1834716796875, -1.5918006896972656, 79.08345794677734, 6.561012268066406, 1.55267333984375, -9.207569122314453, 18.700481414794922, 11.689241409301758, -14.488204956054688, 21.288982391357422, -4.023015975952148, 35.80002975463867, 9.554483413696289, 36.26068115234375, -72.73423767089844, 23.907136917114258, 24.11461639404297, 0.5593147277832031, 24.241470336914062, 3.535888671875, 6.4861907958984375, 6.564855575561523, 19.895668029785156, 22.200469970703125, -12.63616943359375, 61.5174560546875, 34.12107849121094, 7.143096923828125, 19.835119247436523, 19.5538330078125, -66.58285522460938, -45.95832824707031, 86.16259765625, 57.69651794433594, 15.508049011230469, -21.59812355041504, 52.20310592651367, -12.9351806640625, 40.40312194824219, -16.590408325195312, 4.901863098144531], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000169.npy"}
{"epoch": 0.25547996976568405, "step": 170, "batch_size": 64, "mean": 15.015144348144531, "std": 28.27567481994629, "min": -49.672271728515625, "p10": -23.051576423645013, "median": 16.471939086914062, "p90": 46.75365905761719, "max": 81.53713989257812, "pos_frac": 0.78125, "sample": [16.20665740966797, 0.1346435546875, 18.546571731567383, 10.55596923828125, 0.6975574493408203, 36.38042449951172, 10.676542282104492, 21.251977920532227, -1.9489822387695312, 47.404136657714844, -37.35530090332031, 4.389251708984375, 18.141395568847656, 74.30166625976562, 48.04914093017578, -25.809537887573242, -6.823591232299805, 21.18555450439453, 12.249366760253906, -13.889060974121094, 6.5806121826171875, 43.9527587890625, 26.472578048706055, 4.883050918579102, -36.30290985107422, 26.54955291748047, -48.938316345214844, 16.737220764160156, -49.672271728515625, 56.186859130859375, 42.39372634887695, 50.756256103515625, 38.56212615966797, 15.328531265258789, -15.739788055419922, 25.99505615234375, -16.6163330078125, 19.52777862548828, 44.08472442626953, 0.40648651123046875, 28.319168090820312, 1.0700626373291016, -3.1176681518554688, 34.50775146484375, 22.89958953857422, 81.53713989257812, 35.47172546386719, 65.59071350097656, 11.232234954833984, 7.397773742675781, 1.7150039672851562, -11.513936996459961, 8.382041931152344, 38.638999938964844, -35.91545104980469, 19.26512908935547, 34.41410827636719, 33.76914978027344, 0.08864593505859375, 13.256776809692383, 36.758140563964844, 45.235877990722656, 30.33739471435547, -43.863182067871094], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000170.npy"}
{"epoch": 0.25699168556311414, "step": 171, "batch_size": 64, "mean": 17.860342025756836, "std": 30.43794059753418, "min": -67.3489990234375, "p10": -11.139122009277344, "median": 17.260547637939453, "p90": 55.44782028198243, "max": 83.17739868164062, "pos_frac": 0.71875, "sample": [25.307884216308594, 6.3834381103515625, 37.74476623535156, 16.691993713378906, 36.67662811279297, 52.37282943725586, 51.11582565307617, 41.69648742675781, 2.6653308868408203, 6.255743026733398, -1.6331729888916016, -34.15232849121094, 47.876922607421875, 24.07522964477539, 56.47347640991211, -0.6841049194335938, 38.2688102722168, 13.529544830322266, 28.398361206054688, 0.9916610717773438, 1.4617671966552734, 19.07111930847168, 18.56711196899414, 83.17739868164062, 38.76668167114258, 1.4402809143066406, -7.0929107666015625, -6.28759765625, 60.98175811767578, -10.420333862304688, 40.34058380126953, 31.65850067138672, 49.839271545410156, -23.244842529296875, 23.03078842163086, 37.969520568847656, -31.99120330810547, -28.182525634765625, 53.054622650146484, 66.58224487304688, -63.248130798339844, 5.65205192565918, 42.36404037475586, 14.570037841796875, 31.444427490234375, 73.01899719238281, 59.55247497558594, 7.3517913818359375, -2.793670654296875, 30.350936889648438, 12.394062042236328, 41.47662353515625, 38.868804931640625, -0.254364013671875, -7.280479431152344, 17.8291015625, 62.0069580078125, 1.3969745635986328, -8.659534454345703, -67.3489990234375, 9.1512451171875, -9.210624694824219, -11.447174072265625, -2.9012298583984375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000171.npy"}
{"epoch": 0.2585034013605442, "step": 172, "batch_size": 64, "mean": 15.726179122924805, "std": 27.04473876953125, "min": -40.080177307128906, "p10": -14.054086303710937, "median": 9.420821189880371, "p90": 47.958706283569335, "max": 127.01730346679688, "pos_frac": 0.71875, "sample": [-20.36724853515625, 10.750015258789062, -2.6840744018554688, 19.041671752929688, -13.735977172851562, -0.0092620849609375, -0.2820911407470703, 22.131214141845703, 37.35054016113281, 49.75031280517578, 127.01730346679688, 25.0157470703125, 13.50408935546875, 6.571178436279297, 20.84344482421875, 54.46569061279297, 34.18400573730469, -14.128646850585938, -7.501945495605469, 20.100784301757812, 33.772979736328125, 1.0626602172851562, -26.53609275817871, 1.8659820556640625, 10.892158508300781, 8.09162712097168, -13.880111694335938, 38.415435791015625, 28.82183837890625, -11.291671752929688, 28.44482421875, 37.24797821044922, 25.043411254882812, 60.538177490234375, -0.5757293701171875, 4.192136764526367, 46.60212707519531, 29.090187072753906, 48.043819427490234, 0.10584259033203125, -4.768808364868164, 16.873443603515625, 2.1419029235839844, 36.34858703613281, 31.872074127197266, -18.738903045654297, -8.860855102539062, -21.446075439453125, 2.0125732421875, 4.671266555786133, -16.421707153320312, 45.488128662109375, 51.36427307128906, -40.080177307128906, 5.310415267944336, 30.159103393554688, 2.4118576049804688, 5.619453430175781, 3.2765846252441406, 46.228477478027344, 1.929473876953125, -2.676328659057617, 47.760108947753906, 54.036224365234375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000172.npy"}
{"epoch": 0.2600151171579743, "step": 173, "batch_size": 64, "mean": 18.405986785888672, "std": 33.60620880126953, "min": -42.3333740234375, "p10": -20.552362251281735, "median": 14.653036117553711, "p90": 61.298133850097685, "max": 113.93788146972656, "pos_frac": 0.671875, "sample": [-21.65790367126465, -28.26172637939453, -5.322601318359375, 23.886241912841797, 33.040199279785156, 51.803504943847656, 4.884332656860352, -4.3383636474609375, -2.9293289184570312, 30.810630798339844, -0.072265625, 23.578231811523438, 113.93788146972656, -10.98240852355957, -3.9809112548828125, 20.40020751953125, 73.43009948730469, 5.8742218017578125, 27.886240005493164, 0.3263969421386719, 48.69926452636719, 40.57176971435547, -6.348461151123047, 54.720733642578125, -41.84302520751953, -42.3333740234375, 16.350181579589844, 32.16499328613281, 12.955890655517578, -4.9310760498046875, -17.304107666015625, 71.57954406738281, -18.17671775817871, 54.169647216796875, 31.57330322265625, 3.667867660522461, 37.34492492675781, 3.6358108520507812, 29.567398071289062, 64.11701965332031, 21.774169921875, 1.4546890258789062, 3.184904098510742, 54.36724853515625, 24.70197296142578, -32.71216583251953, 6.079280853271484, -21.57049560546875, 97.17933654785156, 33.40093231201172, 52.77593231201172, -38.79871368408203, 24.54998779296875, 69.54106903076172, 42.52944564819336, 5.860908508300781, -4.45526123046875, -10.620634078979492, 21.270933151245117, 94.51824951171875, 4.8392333984375, 38.06454086303711, -3.1222496032714844, -9.3243408203125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000173.npy"}
{"epoch": 0.2615268329554044, "step": 174, "batch_size": 64, "mean": 19.297199249267578, "std": 26.21869659423828, "min": -44.568634033203125, "p10": -13.838979721069332, "median": 19.99315643310547, "p90": 55.14598388671876, "max": 61.808082580566406, "pos_frac": 0.75, "sample": [26.535049438476562, 40.00616455078125, 28.35242462158203, 2.649137496948242, 18.2310733795166, 18.94769287109375, 9.400703430175781, 40.0904541015625, 14.943069458007812, 21.232410430908203, 45.47191619873047, 52.684661865234375, 10.210990905761719, -10.752994537353516, 56.200836181640625, -3.3825550079345703, 40.71568298339844, 37.18672180175781, 38.40544891357422, -44.568634033203125, 42.808616638183594, 11.952354431152344, -3.9261741638183594, 38.21784591674805, 61.808082580566406, -1.352090835571289, 57.351287841796875, 9.2386474609375, 47.96343994140625, 30.333892822265625, 8.334922790527344, 61.79645538330078, 59.22179412841797, -3.8870105743408203, 34.14823532104492, 58.80223083496094, 33.271484375, -16.84326171875, -1.8375568389892578, 51.29193115234375, -15.161544799804688, 61.350250244140625, 12.92184066772461, 1.8812198638916016, 2.8262481689453125, 47.826026916503906, 38.22444152832031, -27.69213104248047, 42.920249938964844, 21.038619995117188, -27.75737762451172, 25.135574340820312, 3.8432083129882812, -0.3325080871582031, 28.189224243164062, 1.3000640869140625, 31.21886444091797, -20.841060638427734, 28.440906524658203, -39.712646484375, 1.1526317596435547, -1.8700408935546875, 0.2490234375, -1.385772705078125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000174.npy"}
{"epoch": 0.26303854875283444, "step": 175, "batch_size": 64, "mean": 19.455673217773438, "std": 30.152952194213867, "min": -44.86204528808594, "p10": -13.44768943786621, "median": 15.985309600830078, "p90": 54.00097923278809, "max": 121.55078125, "pos_frac": 0.71875, "sample": [-31.173479080200195, 33.50708770751953, -5.182037353515625, 1.9969425201416016, -8.773590087890625, 40.72301483154297, 0.71356201171875, 60.501800537109375, 26.937301635742188, -6.725128173828125, 5.152130126953125, 52.751564025878906, 31.12535858154297, 40.4144287109375, -14.792125701904297, 44.15226745605469, 33.47190856933594, 33.347450256347656, 43.79396057128906, -2.0319366455078125, 11.00347900390625, 9.411264419555664, 11.583702087402344, 69.42843627929688, 51.05079650878906, 19.0067138671875, -9.100509643554688, -11.723472595214844, 47.30219268798828, -31.71929931640625, 38.56620788574219, -15.124702453613281, 24.604576110839844, 0.9391021728515625, 8.461456298828125, -0.6671257019042969, 25.34496307373047, 45.062774658203125, -1.148599624633789, -29.380584716796875, 20.191848754882812, -44.86204528808594, 7.193534851074219, 20.412071228027344, -11.746109008789062, 2.1409912109375, 44.77869415283203, -11.758296966552734, 54.490821838378906, 48.733436584472656, 10.17806625366211, 41.95692443847656, 23.145843505859375, -14.171714782714844, 12.963905334472656, 56.94615936279297, -10.20733642578125, 62.99580383300781, 43.34624481201172, 121.55078125, 1.0139846801757812, 69.34040832519531, 0.8591384887695312, 52.85801315307617], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000175.npy"}
{"epoch": 0.26455026455026454, "step": 176, "batch_size": 64, "mean": 13.042362213134766, "std": 32.03752517700195, "min": -67.64651489257812, "p10": -21.38794937133789, "median": 9.29631233215332, "p90": 51.42079582214356, "max": 112.46202087402344, "pos_frac": 0.703125, "sample": [61.92057800292969, -47.21166229248047, 40.09673309326172, 20.47051239013672, 3.253192901611328, 17.379379272460938, -19.192909240722656, 17.497772216796875, 5.120691299438477, 1.4784774780273438, -54.28253173828125, -20.75365447998047, -11.176429748535156, 17.174270629882812, -31.786643981933594, 48.6817512512207, -0.058124542236328125, 28.404422760009766, 5.5023956298828125, -2.7388763427734375, 9.705947875976562, 10.382438659667969, 15.0103759765625, 64.98060607910156, 69.15594482421875, 27.532127380371094, -20.594825744628906, 42.95440673828125, -4.230537414550781, 44.933502197265625, -11.748565673828125, 2.4145240783691406, 4.4173583984375, 48.457313537597656, 27.539575576782227, 49.92900466918945, 4.653373718261719, 65.87644958496094, 2.3725433349609375, -21.6597900390625, -15.935577392578125, 39.392913818359375, 28.54026222229004, 52.06013488769531, 16.272064208984375, 8.886676788330078, -21.965301513671875, -17.318572998046875, 33.424072265625, 57.39054870605469, -6.7864227294921875, 35.28528594970703, 2.425403594970703, 26.430831909179688, 4.7709197998046875, 7.80711555480957, 47.750762939453125, 2.5168914794921875, -12.308341979980469, 112.46202087402344, 10.840866088867188, -67.64651489257812, 15.709697723388672, -37.155609130859375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000176.npy"}
{"epoch": 0.2660619803476946, "step": 177, "batch_size": 64, "mean": 24.429092407226562, "std": 24.260072708129883, "min": -26.284896850585938, "p10": -3.913832092285154, "median": 21.035816192626953, "p90": 54.583567047119146, "max": 84.72547912597656, "pos_frac": 0.859375, "sample": [1.509552001953125, 11.706684112548828, 34.30516052246094, 69.54515075683594, 19.144813537597656, -1.5236740112304688, 32.610572814941406, 33.943878173828125, 78.69007873535156, 31.90644073486328, 64.23638153076172, 52.31807327270508, 27.366182327270508, -4.9067535400390625, 23.820491790771484, -15.869832992553711, 43.805938720703125, 7.113037109375, -5.960151672363281, 9.118494033813477, 21.53533172607422, 13.777664184570312, 7.140312194824219, 41.732582092285156, 23.62769317626953, 14.82427978515625, 18.995351791381836, 44.741844177246094, -7.618833541870117, 6.088048934936523, 33.93206787109375, 9.606651306152344, 36.116485595703125, -1.597015380859375, 15.545267105102539, 50.89359664916992, 47.74425506591797, 0.207427978515625, 16.2476806640625, 37.01869201660156, 12.184974670410156, 24.54606819152832, 44.20235061645508, 51.68092346191406, 35.65083312988281, -7.86126708984375, 3.3042449951171875, 11.51947021484375, 60.82230758666992, 6.0815277099609375, 76.2298583984375, 8.059560775756836, -26.284896850585938, 55.55449295043945, 20.536300659179688, 23.37519073486328, 30.091331481933594, 84.72547912597656, 35.595458984375, 19.984872817993164, 16.111900329589844, 50.01343536376953, -24.496612548828125, 8.394271850585938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000177.npy"}
{"epoch": 0.2675736961451247, "step": 178, "batch_size": 64, "mean": 15.497869491577148, "std": 23.240747451782227, "min": -25.54115104675293, "p10": -6.69599723815918, "median": 13.01646614074707, "p90": 45.48368339538575, "max": 101.81671905517578, "pos_frac": 0.671875, "sample": [20.977571487426758, 40.573822021484375, 10.237056732177734, -5.54144287109375, 28.40308380126953, -0.8303546905517578, 6.220378875732422, 28.98357391357422, 36.98151397705078, 13.72732162475586, -17.052032470703125, 101.81671905517578, 23.537681579589844, -25.54115104675293, 2.9059295654296875, 3.6557960510253906, 49.691688537597656, -0.15653610229492188, -2.31561279296875, -3.9489974975585938, 45.577640533447266, 12.007585525512695, 14.977157592773438, -5.846099853515625, -6.673606872558594, 14.360382080078125, -6.705593109130859, 32.43553161621094, -1.9236831665039062, 10.006608963012695, 36.186614990234375, -0.1707172393798828, 12.003746032714844, 64.47406768798828, 15.634292602539062, 15.381555557250977, 14.002227783203125, 8.452491760253906, 46.12529754638672, 12.305610656738281, -19.043869018554688, 28.534622192382812, 45.26445007324219, 7.2775726318359375, -18.686683654785156, 47.84648895263672, -3.792461395263672, -22.470176696777344, 35.641319274902344, -0.6276512145996094, 36.65299987792969, 39.483123779296875, 15.841602325439453, 38.45726013183594, 16.960205078125, 4.667427062988281, 31.904815673828125, -4.3966522216796875, -22.01525115966797, -2.330108642578125, 21.17477798461914, 48.524681091308594, 23.44286346435547, -1.3848190307617188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000178.npy"}
{"epoch": 0.2690854119425548, "step": 179, "batch_size": 64, "mean": 15.924118041992188, "std": 26.35126304626465, "min": -42.87461853027344, "p10": -16.55665531158447, "median": 12.259526252746582, "p90": 48.848941802978516, "max": 90.22721862792969, "pos_frac": 0.75, "sample": [25.072174072265625, -8.918502807617188, 10.645843505859375, 16.02655029296875, 4.947780609130859, 22.863784790039062, 12.013582229614258, 34.27528381347656, 48.31640625, 25.794071197509766, 48.70103454589844, 39.604148864746094, 46.133033752441406, -25.784648895263672, 6.672798156738281, -27.92669677734375, -42.87461853027344, -15.64747428894043, -0.11243820190429688, 30.838226318359375, -4.619701385498047, 36.95753860473633, 0.7357177734375, 48.912330627441406, 36.64556884765625, -42.07158660888672, -9.585212707519531, 41.852874755859375, 11.545886993408203, 23.339967727661133, 21.215049743652344, 53.24213409423828, 9.69976806640625, -5.108154296875, 2.1846084594726562, -10.457988739013672, 2.6388092041015625, 56.05116271972656, 51.299224853515625, 0.9750289916992188, 20.67841148376465, 37.836341857910156, 33.76966094970703, 16.37479019165039, -19.931800842285156, 37.17034912109375, 71.0887451171875, 2.6689910888671875, 52.33216094970703, 90.22721862792969, 36.36005783081055, 30.360876083374023, 24.415435791015625, 0.1825714111328125, 11.638835906982422, 17.984542846679688, 8.308639526367188, 0.199676513671875, 4.298271179199219, -16.946304321289062, -24.21943473815918, -0.4206409454345703, 12.505470275878906, -3.832735061645508], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000179.npy"}
{"epoch": 0.2705971277399849, "step": 180, "batch_size": 64, "mean": 11.179155349731445, "std": 22.792512893676758, "min": -44.90357208251953, "p10": -11.360383605957031, "median": 6.085732460021973, "p90": 45.05541229248047, "max": 60.056640625, "pos_frac": 0.671875, "sample": [10.41107177734375, 2.4776153564453125, 6.319721221923828, 17.92169952392578, 4.9349822998046875, 12.342987060546875, -10.692031860351562, -1.0190582275390625, -1.6339092254638672, -0.16584396362304688, 45.08697509765625, 4.6263275146484375, -12.563980102539062, -3.6291942596435547, 12.509063720703125, 5.737754821777344, -28.034698486328125, 44.98176574707031, -8.207447052001953, -42.972015380859375, 43.2227783203125, 21.238256454467773, 1.289480209350586, 49.70635986328125, 6.715595245361328, 31.341230392456055, -2.0793113708496094, 0.04497528076171875, 5.851743698120117, 26.42654800415039, 12.0787353515625, 42.530818939208984, -8.351821899414062, 12.644363403320312, 13.136791229248047, 13.358123779296875, 5.007118225097656, 57.294212341308594, -1.1082897186279297, -2.959024429321289, 6.785619735717773, 43.257843017578125, 48.018836975097656, -5.353662490844727, -0.6671543121337891, 0.7645816802978516, 54.899749755859375, -11.646820068359375, 60.056640625, 1.2741870880126953, 31.497703552246094, -17.360000610351562, 18.880523681640625, -6.378387451171875, 16.928054809570312, 6.331012725830078, 52.11676025390625, -44.90357208251953, 40.265438079833984, -15.579086303710938, -2.2754058837890625, 22.327713012695312, 28.207801818847656, 2.197093963623047], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000180.npy"}
{"epoch": 0.272108843537415, "step": 181, "batch_size": 64, "mean": 17.161563873291016, "std": 24.522966384887695, "min": -40.115692138671875, "p10": -6.078120231628418, "median": 9.705455780029297, "p90": 54.4384735107422, "max": 81.13494873046875, "pos_frac": 0.734375, "sample": [19.029014587402344, 10.992557525634766, -3.5474395751953125, -0.6053180694580078, 12.90671157836914, -1.7936840057373047, 16.28441619873047, -11.549285888671875, 5.803989410400391, -1.6507530212402344, 42.23291015625, 6.4281463623046875, 17.19446563720703, -1.129547119140625, 8.177053451538086, 3.5597457885742188, 39.71303939819336, -6.286201477050781, 31.130340576171875, -40.115692138671875, 57.88737487792969, 49.36113739013672, 26.145092010498047, 2.460601806640625, 59.785255432128906, 42.35569763183594, 5.311531066894531, 67.92657470703125, -22.760208129882812, -16.362022399902344, 36.9560546875, 1.8669509887695312, 81.13494873046875, -6.137947082519531, 31.804473876953125, 2.821239471435547, -5.93852424621582, 66.05140686035156, 1.4047775268554688, 8.434432983398438, 43.67549514770508, 3.290895462036133, 55.57807159423828, 51.77941131591797, 43.24504470825195, 2.1590709686279297, 18.23333168029785, 13.162227630615234, 6.946922302246094, -3.9779529571533203, -4.108863830566406, 45.43619155883789, 61.04426574707031, -9.995841979980469, 14.978759765625, 33.72277069091797, 10.550773620605469, 11.0576171875, 8.070331573486328, -2.1690826416015625, 19.475540161132812, -0.3369712829589844, 30.378517150878906, 8.860137939453125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000181.npy"}
{"epoch": 0.273620559334845, "step": 182, "batch_size": 64, "mean": 15.883401870727539, "std": 24.696657180786133, "min": -38.53626251220703, "p10": -14.420494651794433, "median": 10.508243560791016, "p90": 50.33702239990236, "max": 63.39208984375, "pos_frac": 0.75, "sample": [28.595138549804688, -14.906473159790039, -13.286544799804688, 9.792686462402344, 5.053337097167969, 28.553268432617188, 21.155410766601562, -25.564411163330078, -19.461441040039062, 20.968368530273438, 19.748180389404297, 3.05194091796875, 55.342681884765625, 10.916336059570312, 41.21898651123047, 45.120452880859375, 2.2889022827148438, 10.211517333984375, 39.34671401977539, 10.804969787597656, 37.92652130126953, 2.124317169189453, 8.641632080078125, -11.287384033203125, 30.85040283203125, 13.357528686523438, 41.235191345214844, 6.16856575012207, 63.39208984375, 46.34416198730469, 52.048248291015625, 23.247718811035156, 11.469467163085938, 0.5331344604492188, 62.93476104736328, 1.4443206787109375, -21.097305297851562, -1.788482666015625, 59.72076416015625, -0.9016628265380859, 7.8067779541015625, -26.337800979614258, 60.349403381347656, 16.250900268554688, 44.284393310546875, 55.1485595703125, 4.28996467590332, -8.717460632324219, 32.55491638183594, -1.8726730346679688, 30.97580337524414, 38.64622497558594, -11.438529968261719, 33.66905975341797, 43.18817138671875, 4.679389953613281, -6.4655609130859375, 7.532072067260742, 38.57851791381836, 3.122020721435547, 6.191257476806641, -20.70983123779297, -1.9655742645263672, -38.53626251220703], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000182.npy"}
{"epoch": 0.2751322751322751, "step": 183, "batch_size": 64, "mean": 6.714117527008057, "std": 22.29013442993164, "min": -69.10808563232422, "p10": -14.667881584167478, "median": 4.893279075622559, "p90": 35.811424636840826, "max": 57.2879638671875, "pos_frac": 0.625, "sample": [13.883171081542969, 44.67425537109375, -16.084495544433594, -2.1045494079589844, 6.1294403076171875, 0.7921218872070312, -17.375274658203125, 31.49030303955078, -5.288238525390625, -13.043106079101562, -49.865970611572266, -3.9590530395507812, 28.90604591369629, -9.738983154296875, 11.096519470214844, 10.8582763671875, -0.8570442199707031, 18.438720703125, 27.32550048828125, 13.443824768066406, 36.51637268066406, -11.91943359375, 19.63238525390625, -15.364213943481445, 7.743782043457031, -4.7846832275390625, 12.4400634765625, 3.8383522033691406, -28.770095825195312, -4.5643310546875, 39.30003356933594, -10.38116455078125, 8.437744140625, 16.23784637451172, -0.01349639892578125, -0.245697021484375, 12.160377502441406, 51.475677490234375, -69.10808563232422, 35.00480270385742, -7.008907318115234, -2.1479568481445312, 47.25513458251953, -9.496833801269531, 1.7218303680419922, 8.053083419799805, 8.111177444458008, 5.671613693237305, -12.397476196289062, -5.349454879760742, 32.992942810058594, 7.6853179931640625, 2.1559906005859375, 4.1149444580078125, 57.2879638671875, 1.4256153106689453, 30.70185089111328, -29.197952270507812, 0.8740596771240234, 1.1535110473632812, 11.342689514160156, 36.15711975097656, 26.478593826293945, 25.760971069335938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000183.npy"}
{"epoch": 0.2766439909297052, "step": 184, "batch_size": 64, "mean": 12.278566360473633, "std": 19.186283111572266, "min": -18.667373657226562, "p10": -6.928670501708983, "median": 4.955974578857422, "p90": 42.37418975830079, "max": 52.81324005126953, "pos_frac": 0.6875, "sample": [2.3039093017578125, 40.650390625, 37.436031341552734, 2.6193904876708984, 11.541297912597656, 5.2029876708984375, 1.4346809387207031, -5.080787658691406, 4.708961486816406, -9.993629455566406, -1.1724891662597656, 42.690673828125, 1.5000743865966797, 13.844886779785156, 29.226470947265625, -4.9477386474609375, 18.2537841796875, 9.341522216796875, 7.101676940917969, 41.63572692871094, -16.425125122070312, 52.81324005126953, 26.364723205566406, 50.352455139160156, -4.324882507324219, 44.11395263671875, 9.098363876342773, 27.475814819335938, 34.74538040161133, -1.5108718872070312, 47.82512664794922, -5.5082855224609375, 7.1285247802734375, -18.030147552490234, -2.9511375427246094, 18.844192504882812, 50.39086151123047, 30.53993034362793, -9.008125305175781, 28.52074432373047, 3.368135452270508, -3.6734695434570312, 30.324363708496094, 2.0430641174316406, 14.716026306152344, 3.527111053466797, 24.24969482421875, 17.665252685546875, -1.1090679168701172, 36.66791534423828, -18.410303115844727, -4.531349182128906, 22.65770721435547, 4.465614318847656, 45.13337707519531, 15.174468994140625, -1.4675617218017578, -1.4299545288085938, 0.22138023376464844, 2.1273727416992188, -7.537406921386719, 2.633211135864258, -1.0725383758544922, -18.667373657226562], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000184.npy"}
{"epoch": 0.2781557067271353, "step": 185, "batch_size": 64, "mean": 13.579446792602539, "std": 24.175151824951172, "min": -51.77075958251953, "p10": -13.992835998535156, "median": 16.47862720489502, "p90": 41.268027877807626, "max": 69.17318725585938, "pos_frac": 0.671875, "sample": [4.847997665405273, 52.918373107910156, 1.1029891967773438, 18.068811416625977, 30.12921142578125, -21.734378814697266, 1.9747734069824219, 20.60446548461914, -13.543533325195312, 3.1885528564453125, 39.03546142578125, 42.220619201660156, -4.51800537109375, -14.185394287109375, 50.706268310546875, 6.502473831176758, 58.492637634277344, 29.30615997314453, 39.04531478881836, 46.40933609008789, -1.1536407470703125, 27.952346801757812, 33.352439880371094, 14.888442993164062, 0.7484493255615234, 37.04525375366211, 27.01529312133789, 18.711408615112305, 20.740989685058594, 36.4552001953125, -4.5032806396484375, -10.206352233886719, 22.040029525756836, 25.704864501953125, -37.873863220214844, 24.58747100830078, 35.952606201171875, 28.821420669555664, 21.565898895263672, -1.655538558959961, -51.77075958251953, 11.735923767089844, -5.476398468017578, 22.74017333984375, 37.16130828857422, -31.567291259765625, 45.50428009033203, -7.28645133972168, 69.17318725585938, -7.764556884765625, 25.434829711914062, -4.811384201049805, 27.8480224609375, -7.4385223388671875, -8.15362548828125, 13.486652374267578, -11.937347412109375, -24.995126724243164, 9.609031677246094, -2.6999130249023438, -15.717220306396484, 12.39077377319336, 34.46964645385742, 28.387807846069336], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000185.npy"}
{"epoch": 0.2796674225245654, "step": 186, "batch_size": 64, "mean": 14.592811584472656, "std": 23.200944900512695, "min": -50.348899841308594, "p10": -8.677245521545409, "median": 11.336299896240234, "p90": 48.6493865966797, "max": 79.28109741210938, "pos_frac": 0.703125, "sample": [19.541595458984375, 52.754737854003906, -9.873558044433594, 7.511638641357422, 18.812164306640625, 14.805648803710938, 7.8503265380859375, -21.18297576904297, -1.9462890625, 22.704498291015625, 4.511871337890625, -4.850772857666016, 20.031890869140625, 13.314838409423828, 79.28109741210938, 35.5250129699707, 1.8528022766113281, 8.311847686767578, 11.447349548339844, -27.148452758789062, 5.942924499511719, 41.35359191894531, 39.68152618408203, 53.97740173339844, -50.348899841308594, -5.8662109375, -3.9124984741210938, -5.172473907470703, -6.183929443359375, -4.380161285400391, -0.653533935546875, 14.468976974487305, 27.52337646484375, -5.4187164306640625, -4.951921463012695, -6.325654983520508, 20.73291015625, 21.400604248046875, 28.25292205810547, -9.685070037841797, -5.8040924072265625, -16.41735076904297, 0.6917610168457031, 18.535415649414062, 20.3123779296875, 2.6327438354492188, 45.47221374511719, 22.199722290039062, -11.394973754882812, 26.487281799316406, 53.44366455078125, 4.856380462646484, 41.760292053222656, 59.590545654296875, 30.720314025878906, 10.628211975097656, 50.01103210449219, 50.4849853515625, 31.1163330078125, 37.45177459716797, 11.225250244140625, 30.638931274414062, 6.105232238769531, 9.501510620117188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000186.npy"}
{"epoch": 0.2811791383219955, "step": 187, "batch_size": 64, "mean": 13.431703567504883, "std": 24.01099967956543, "min": -28.26563262939453, "p10": -12.681272125244138, "median": 7.925563812255859, "p90": 45.451518249511736, "max": 87.26832580566406, "pos_frac": 0.734375, "sample": [6.560798645019531, -28.129150390625, 55.392791748046875, 54.432167053222656, 87.26832580566406, 21.98676300048828, 38.479759216308594, 12.107925415039062, 6.133514404296875, 26.110877990722656, -17.750564575195312, 24.981719970703125, 12.038204193115234, 39.4407958984375, 15.386077880859375, 4.70252799987793, 4.240472793579102, 11.70965576171875, 22.513168334960938, 3.940471649169922, -22.750701904296875, -24.589481353759766, 40.21577453613281, 0.404876708984375, 1.7908935546875, -6.174537658691406, 2.1753997802734375, 7.996734619140625, -2.849090576171875, 48.578651428222656, 41.060699462890625, 32.8228759765625, -10.510368347167969, -5.996395111083984, 7.854393005371094, 30.316917419433594, 4.305206298828125, 7.516716003417969, -16.926116943359375, 14.606155395507812, -6.943119049072266, -7.4316558837890625, 33.46887969970703, 2.095510482788086, -2.1626434326171875, 12.847328186035156, 56.09010314941406, -9.10845947265625, 16.1988525390625, 27.837326049804688, 39.68316650390625, -28.26563262939453, 25.233375549316406, 75.1007080078125, 2.2696304321289062, 11.764068603515625, 47.33329772949219, 28.090805053710938, 1.0803680419921875, 8.068946838378906, -13.216629028320312, -11.432106018066406, -2.841461181640625, 2.4734039306640625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000187.npy"}
{"epoch": 0.28269085411942557, "step": 188, "batch_size": 64, "mean": 11.838859558105469, "std": 20.99146842956543, "min": -39.39886474609375, "p10": -5.476045608520507, "median": 6.625633239746094, "p90": 43.33048362731934, "max": 66.91380310058594, "pos_frac": 0.71875, "sample": [2.231597900390625, -5.907459259033203, 43.499202728271484, -10.000799179077148, -2.2910919189453125, 36.75306701660156, 43.61244201660156, 4.47679328918457, 27.84375, 20.615234375, -0.35284423828125, 66.91380310058594, 47.973690032958984, 6.2918853759765625, -3.5956954956054688, 2.4014968872070312, 36.572914123535156, 51.13861083984375, -2.214700698852539, -3.5126380920410156, 10.172698974609375, 4.7063446044921875, 4.7696075439453125, -2.891876220703125, 10.6561279296875, -4.469413757324219, -12.274349212646484, 33.732521057128906, 22.577884674072266, 38.21085739135742, 8.49981689453125, 6.065460205078125, 10.47821044921875, 56.16020965576172, 22.746803283691406, -1.130584716796875, 41.113525390625, 6.959381103515625, 13.521171569824219, 10.239418029785156, 24.422443389892578, -2.3864898681640625, 3.6243324279785156, 2.4499588012695312, 8.020439147949219, -0.7352447509765625, 0.8010463714599609, 42.936805725097656, 10.97509765625, 6.181955337524414, 10.25982666015625, -39.39886474609375, -7.19256591796875, 2.4300804138183594, 3.2791290283203125, 10.557273864746094, 4.344938278198242, -30.92164421081543, 12.984115600585938, 34.534034729003906, 43.983680725097656, 14.324455261230469, -2.9926109313964844, -37.08824157714844], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000188.npy"}
{"epoch": 0.2842025699168556, "step": 189, "batch_size": 64, "mean": 9.74410343170166, "std": 19.269197463989258, "min": -34.00193786621094, "p10": -12.212205505371093, "median": 6.696084976196289, "p90": 39.0359432220459, "max": 54.78192138671875, "pos_frac": 0.734375, "sample": [-7.263175964355469, 22.862899780273438, 2.756744384765625, 15.454444885253906, -5.8439483642578125, 39.65943908691406, 47.89232635498047, 13.428850173950195, 10.127603530883789, 2.7643280029296875, -23.207901000976562, 37.581119537353516, 0.0118560791015625, 5.854682922363281, -3.6113967895507812, -2.26007080078125, 46.25560760498047, 8.128124237060547, -7.633884429931641, 9.614639282226562, 3.8263587951660156, 14.648344039916992, 5.6134796142578125, 19.222915649414062, 4.396186828613281, 2.233095169067383, 10.619056701660156, 23.711383819580078, 3.2921619415283203, 18.268653869628906, 27.726539611816406, -33.76292419433594, 11.507431030273438, 42.483028411865234, 34.35321044921875, 29.874008178710938, 0.00415802001953125, -0.6538467407226562, 6.628826141357422, 6.763343811035156, 40.14393997192383, -11.53125, 18.635419845581055, 19.838035583496094, 1.2672042846679688, 3.641376495361328, -12.504043579101562, -17.433759689331055, -4.744140625, -2.8321762084960938, 11.641838073730469, 16.42454719543457, -18.57666015625, 31.73741912841797, 54.78192138671875, 8.281085968017578, -34.00193786621094, -18.212146759033203, 4.321002960205078, 11.22747802734375, 5.95869255065918, -4.3223724365234375, 51.48778533935547, 25.065622329711914], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000189.npy"}
{"epoch": 0.2857142857142857, "step": 190, "batch_size": 64, "mean": 16.18812370300293, "std": 21.000802993774414, "min": -19.440502166748047, "p10": -6.962018585205076, "median": 11.226792335510254, "p90": 52.19712677001954, "max": 62.77972412109375, "pos_frac": 0.8125, "sample": [8.010557174682617, 27.184799194335938, 60.732666015625, -4.08355712890625, 14.536258697509766, 56.26703643798828, 3.1921539306640625, 11.047012329101562, -11.81436538696289, 50.78356170654297, 17.036514282226562, 16.805084228515625, 11.678199768066406, 3.377948760986328, 23.990764617919922, 3.127716064453125, -19.440502166748047, 32.61798095703125, 10.504283905029297, 4.435251235961914, 22.102359771728516, 40.49972152709961, -12.857147216796875, 4.710720062255859, -5.013614654541016, 12.870914459228516, -18.421096801757812, 8.48306655883789, 5.563484191894531, 10.310497283935547, 19.798629760742188, -0.36862945556640625, 57.23920440673828, 37.807525634765625, 13.257957458496094, 2.764862060546875, 24.892669677734375, -5.259223937988281, 0.011434555053710938, 18.217124938964844, -4.775596618652344, 30.742172241210938, 45.42524719238281, 56.737823486328125, 5.982149124145508, -8.97808837890625, 1.596963882446289, 14.256454467773438, -7.6917877197265625, -13.541847229003906, 11.406572341918945, 7.226772308349609, 3.3030471801757812, 58.27854919433594, 10.144317626953125, 49.288978576660156, 62.77972412109375, 5.43670654296875, 29.794780731201172, 52.802940368652344, 4.787330627441406, 27.363616943359375, 22.0579833984375, 15.01527214050293], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000190.npy"}
{"epoch": 0.2872260015117158, "step": 191, "batch_size": 64, "mean": 11.920636177062988, "std": 25.666217803955078, "min": -72.2861328125, "p10": -9.586452102661132, "median": 8.65031623840332, "p90": 49.02509078979492, "max": 60.0333251953125, "pos_frac": 0.6875, "sample": [52.953941345214844, -5.487895965576172, 15.388565063476562, 17.203407287597656, 53.17148971557617, -7.031028747558594, 30.725143432617188, 41.925315856933594, 9.223962783813477, 9.913406372070312, -9.286666870117188, 49.365211486816406, 33.59391784667969, -7.787509918212891, 9.900932312011719, 2.2675018310546875, -42.57209777832031, 5.462440490722656, 60.0333251953125, 8.076669692993164, 13.567977905273438, 31.376556396484375, 11.224105834960938, 41.720252990722656, -39.206573486328125, 24.983436584472656, 2.5077056884765625, 31.78817367553711, -5.2608795166015625, -0.6956119537353516, 6.250524520874023, -3.7094039916992188, 4.725343704223633, 12.03555679321289, 52.816558837890625, -31.755844116210938, -12.737541198730469, -21.283935546875, 3.4348182678222656, 35.61808776855469, 11.188480377197266, 12.245590209960938, 24.85979461669922, 7.908958435058594, 34.84161376953125, 4.124736785888672, 32.10963439941406, 58.565792083740234, -1.87701416015625, 0.951690673828125, -1.4979629516601562, 3.6206798553466797, 57.60784912109375, -6.956977844238281, -9.71493148803711, 47.325958251953125, 48.231475830078125, -72.2861328125, 9.902814865112305, -5.9670562744140625, 3.9915008544921875, 23.726272583007812, -3.058563232421875, -1.3628158569335938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000191.npy"}
{"epoch": 0.2887377173091459, "step": 192, "batch_size": 64, "mean": 6.614563941955566, "std": 16.003372192382812, "min": -39.41828918457031, "p10": -7.003534507751464, "median": 4.11501407623291, "p90": 27.32652797698975, "max": 51.049705505371094, "pos_frac": 0.6875, "sample": [-39.41828918457031, -6.304088592529297, 2.607837677001953, 32.49007797241211, 27.579490661621094, 1.5640907287597656, -5.409427642822266, 22.282394409179688, 50.89940643310547, -28.082778930664062, -4.962608337402344, -0.297149658203125, 2.124969482421875, 2.776508331298828, -2.4978866577148438, 4.4232177734375, 23.548381805419922, 7.5508270263671875, -2.6791038513183594, 5.386116027832031, 14.584455490112305, 15.535850524902344, -0.04017448425292969, -1.826629638671875, 38.37028503417969, 0.6888217926025391, 5.09759521484375, 15.536483764648438, 6.1828460693359375, 3.5697479248046875, 7.2974700927734375, 19.171157836914062, 5.69465446472168, 10.150751113891602, -7.30329704284668, -6.1107635498046875, 4.5921173095703125, -4.644191741943359, -4.3949737548828125, 22.947181701660156, 4.921689987182617, -7.686927795410156, -30.79977035522461, 51.049705505371094, 7.922698974609375, 6.039545059204102, -0.8473701477050781, 1.6765460968017578, 2.3613815307617188, -8.657577514648438, 18.912384033203125, 16.421756744384766, 6.179969787597656, 26.90363311767578, 3.8068103790283203, 13.403221130371094, 0.5742301940917969, -2.5814437866210938, 22.11725616455078, 30.708999633789062, 27.507768630981445, -8.186201095581055, 1.9588241577148438, 0.9436092376708984], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000192.npy"}
{"epoch": 0.29024943310657597, "step": 193, "batch_size": 64, "mean": 12.522571563720703, "std": 20.105934143066406, "min": -37.69419860839844, "p10": -6.734730911254882, "median": 6.428924560546875, "p90": 40.97392730712891, "max": 59.19706726074219, "pos_frac": 0.71875, "sample": [2.6521530151367188, 2.0821590423583984, 36.78141403198242, 4.1011505126953125, 12.52763557434082, -37.69419860839844, 11.563224792480469, 40.53606414794922, -18.978363037109375, -0.5925426483154297, 16.79535675048828, 3.5690155029296875, 10.70285415649414, 2.0936737060546875, 14.283393859863281, -0.6861419677734375, 34.679771423339844, -0.23534011840820312, 16.518856048583984, 36.33525085449219, 1.0800495147705078, 31.025165557861328, 56.335792541503906, -7.124401092529297, 44.07099151611328, 34.3892822265625, -0.03624725341796875, 1.2498550415039062, -3.1008758544921875, 7.867914199829102, 32.71388626098633, 26.428421020507812, -0.1343536376953125, 11.380050659179688, 0.5499801635742188, -8.590278625488281, -7.2791290283203125, 0.11745452880859375, 11.511650085449219, 16.347076416015625, -1.8034515380859375, 6.395519256591797, 24.210918426513672, 59.19706726074219, -1.6466026306152344, -5.8122711181640625, 10.1082763671875, 3.6491527557373047, 6.462329864501953, 47.653953552246094, 41.161582946777344, 2.002683639526367, 51.28041076660156, 49.29960250854492, 33.02332305908203, -7.3256683349609375, 20.766281127929688, -28.06963348388672, 14.605033874511719, 39.69282531738281, -5.82550048828125, 6.260463714599609, -1.1541213989257812, 1.4747257232666016], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000193.npy"}
{"epoch": 0.29176114890400606, "step": 194, "batch_size": 64, "mean": 12.298486709594727, "std": 22.395591735839844, "min": -39.6455078125, "p10": -11.513494873046875, "median": 8.093467712402344, "p90": 42.32634925842286, "max": 71.72985076904297, "pos_frac": 0.71875, "sample": [40.010276794433594, -11.121902465820312, 38.16880798339844, -6.574806213378906, 37.56922149658203, 8.684494018554688, -33.31092834472656, 2.0924453735351562, 3.609527587890625, 13.36126708984375, 8.154251098632812, 26.92535400390625, 5.1664276123046875, 24.283065795898438, -11.681320190429688, -14.298103332519531, 40.51203918457031, -18.95214080810547, 8.482086181640625, -2.4861602783203125, 8.032684326171875, 6.464260101318359, 15.634849548339844, 34.164894104003906, -1.4676570892333984, 24.38544464111328, -10.809661865234375, 25.436843872070312, 3.3071823120117188, 9.590553283691406, 0.11277008056640625, 40.5916633605957, -6.302680969238281, -0.6808700561523438, -12.982864379882812, -2.7147178649902344, 51.095680236816406, 44.511810302734375, 8.252788543701172, 49.24104309082031, -39.6455078125, 4.573020935058594, 2.1771106719970703, 1.7961959838867188, 24.859420776367188, 47.712738037109375, 6.547977447509766, -17.935096740722656, 71.72985076904297, 38.70245361328125, 7.291221618652344, -5.7064971923828125, 18.642471313476562, 43.069786071777344, 21.325271606445312, 70.57901763916016, 6.406648635864258, 1.2061824798583984, 8.53978157043457, 11.774139404296875, 13.639673233032227, -7.019069671630859, -4.118251800537109, 16.49663734436035], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000194.npy"}
{"epoch": 0.29327286470143615, "step": 195, "batch_size": 64, "mean": 12.761885643005371, "std": 18.93498420715332, "min": -42.847198486328125, "p10": -6.546456146240233, "median": 11.348540306091309, "p90": 40.39236221313477, "max": 50.64408874511719, "pos_frac": 0.703125, "sample": [0.2314605712890625, 16.96198272705078, -3.6817092895507812, 28.849594116210938, 48.162330627441406, 19.595306396484375, -2.481914520263672, 11.54205322265625, 14.652191162109375, 19.705215454101562, 35.1072998046875, 30.114477157592773, -13.330642700195312, -12.013816833496094, -13.216583251953125, -42.847198486328125, -2.2194480895996094, 8.901649475097656, 3.0483341217041016, 41.423179626464844, -10.662246704101562, 24.53931427001953, 6.249782562255859, 18.690780639648438, 11.732933044433594, 11.759880065917969, 47.68107604980469, 50.64408874511719, 2.8671035766601562, 40.757362365722656, 2.5658931732177734, -2.00714111328125, 44.46366500854492, 7.5842132568359375, 35.697731018066406, -4.6215362548828125, -4.269048690795898, 20.023880004882812, 32.05897521972656, 13.376235961914062, 12.836633682250977, -5.4672698974609375, 0.7373886108398438, -0.4361724853515625, 0.5137424468994141, 11.155027389526367, -0.87060546875, 36.936492919921875, -14.819808959960938, 3.8040599822998047, -0.7496871948242188, -4.3850250244140625, 46.88615417480469, 35.41426086425781, 19.74371337890625, 15.195413589477539, 29.81689453125, -0.5279693603515625, -7.008964538574219, 24.252716064453125, 6.6109466552734375, 9.513954162597656, 20.431381225585938, 39.54069519042969], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000195.npy"}
{"epoch": 0.2947845804988662, "step": 196, "batch_size": 64, "mean": 10.939878463745117, "std": 17.839073181152344, "min": -49.258609771728516, "p10": -3.7196941375732417, "median": 6.7469482421875, "p90": 36.87261657714844, "max": 49.06329345703125, "pos_frac": 0.796875, "sample": [12.894989013671875, -49.258609771728516, -1.2852725982666016, 6.5697784423828125, 11.912496566772461, -6.020957946777344, -5.3158721923828125, 4.293373107910156, 14.977523803710938, -1.98968505859375, 1.396514892578125, 20.043006896972656, 34.37086486816406, 3.149017333984375, 6.080879211425781, 22.001235961914062, -38.58195877075195, 14.18133544921875, 0.9469375610351562, 13.539878845214844, -1.8625507354736328, 2.0468597412109375, 35.586830139160156, -1.1234722137451172, -12.62814712524414, 6.9241180419921875, 44.7452392578125, 37.423667907714844, 4.7478790283203125, 40.2091064453125, 24.795196533203125, 15.643733978271484, -10.592849731445312, -2.9152145385742188, -3.9702682495117188, 5.848533630371094, 3.3812007904052734, 11.211755752563477, 3.4147796630859375, 49.06329345703125, 10.333442687988281, 15.968017578125, 28.287567138671875, 8.263725280761719, 27.009723663330078, 0.9198760986328125, 2.5391292572021484, 14.922836303710938, 18.037796020507812, 25.193344116210938, 44.81171417236328, 34.39042663574219, 2.1964454650878906, 3.698211669921875, 2.9652328491210938, 40.71533203125, 10.503641128540039, 37.86418914794922, 3.0329742431640625, 2.607959747314453, 3.6776351928710938, -3.135021209716797, 23.96607208251953, 25.526763916015625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000196.npy"}
{"epoch": 0.2962962962962963, "step": 197, "batch_size": 64, "mean": 13.148622512817383, "std": 22.225181579589844, "min": -57.41026306152344, "p10": -9.352568435668946, "median": 12.05118179321289, "p90": 43.63578262329102, "max": 57.692771911621094, "pos_frac": 0.734375, "sample": [-7.6119232177734375, 49.26020050048828, 8.613384246826172, -19.666481018066406, 18.419921875, 18.331741333007812, -9.202014923095703, -57.41026306152344, -0.33135986328125, -5.811256408691406, 14.275405883789062, 46.502227783203125, -9.417091369628906, -6.787410736083984, 4.086460113525391, 57.425132751464844, 10.421798706054688, 12.12307357788086, 54.22332000732422, 28.181610107421875, 14.758949279785156, 5.8695068359375, 18.813636779785156, 10.798759460449219, 7.4486846923828125, 3.3610382080078125, 13.104393005371094, -13.521175384521484, 41.709068298339844, 17.46178436279297, 44.461517333984375, 47.578033447265625, -21.651824951171875, 33.75028991699219, -4.384212493896484, -36.25300979614258, 5.199409484863281, -6.603668212890625, 34.077362060546875, 6.789894104003906, -2.5089168548583984, 31.266075134277344, 27.3525390625, 22.199127197265625, 11.979290008544922, 41.64653015136719, 14.448989868164062, -2.8065719604492188, 14.248607635498047, 29.979902267456055, -8.242738723754883, 11.138099670410156, 11.5355224609375, 22.050281524658203, 4.304228782653809, -21.36785125732422, 57.692771911621094, 17.84619903564453, 3.154329299926758, 25.85049819946289, 33.01325988769531, 35.306060791015625, 4.806884765625, 28.22378158569336], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000197.npy"}
{"epoch": 0.29780801209372637, "step": 198, "batch_size": 64, "mean": 10.667190551757812, "std": 17.305734634399414, "min": -23.76352882385254, "p10": -10.421304893493652, "median": 7.870916366577148, "p90": 33.980554580688484, "max": 56.322853088378906, "pos_frac": 0.78125, "sample": [-5.515380859375, 8.582775115966797, -2.3931636810302734, 7.6863250732421875, 7.248407363891602, 13.161293029785156, -23.76352882385254, 0.335357666015625, 5.446233749389648, 3.650115966796875, 27.938758850097656, 18.088150024414062, 3.189485549926758, -20.612565994262695, 56.322853088378906, 44.67058563232422, 8.61700439453125, 44.12286376953125, 28.056243896484375, -15.45451545715332, 24.149765014648438, 19.350500106811523, 38.254730224609375, -10.135553359985352, 32.66801452636719, -17.970748901367188, 0.53973388671875, 1.4269866943359375, 15.826667785644531, 20.31804084777832, 25.63869857788086, -3.134307861328125, -13.020401000976562, -16.165191650390625, 4.0935211181640625, -0.5551071166992188, 1.0713539123535156, 3.685932159423828, -4.465728759765625, 22.14691162109375, 5.0784149169921875, 31.76311492919922, 39.2096061706543, 2.0204010009765625, 7.978597640991211, 8.889053344726562, 2.8575439453125, 34.47590637207031, 49.20295715332031, 2.4918289184570312, 1.7488021850585938, -10.543769836425781, 15.104827880859375, 32.82473373413086, 9.029129028320312, 8.683853149414062, 8.643905639648438, 7.763235092163086, -3.5320396423339844, 6.390270233154297, 24.55582046508789, 9.654621124267578, 13.04296875, 22.265308380126953], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000198.npy"}
{"epoch": 0.29931972789115646, "step": 199, "batch_size": 64, "mean": 9.621490478515625, "std": 17.642353057861328, "min": -31.49846649169922, "p10": -11.300503540039061, "median": 8.083358764648438, "p90": 32.28212471008301, "max": 48.873809814453125, "pos_frac": 0.734375, "sample": [31.456233978271484, 29.440383911132812, 6.228584289550781, 3.4772281646728516, -18.347566604614258, -6.620174407958984, 37.32813262939453, 42.825103759765625, 11.289070129394531, 15.177078247070312, 25.9805908203125, -3.8685684204101562, 44.375606536865234, 10.978584289550781, 47.524349212646484, -11.509033203125, 13.602344512939453, 22.554553985595703, 3.4705657958984375, 37.57839584350586, 14.970489501953125, -12.713912963867188, 30.491565704345703, 17.062789916992188, 1.6070423126220703, 6.475860595703125, 7.846702575683594, -4.385246276855469, 15.059967041015625, 8.939804077148438, 8.320014953613281, 0.13480186462402344, 8.904640197753906, 1.0692520141601562, -0.6383523941040039, 14.579879760742188, 15.575927734375, -10.813934326171875, -8.870452880859375, 7.0100860595703125, -30.759937286376953, 22.925445556640625, 17.415985107421875, 7.750297546386719, -31.49846649169922, 31.280906677246094, 6.242345809936523, 1.3680648803710938, -16.006492614746094, 7.6914215087890625, -0.2184734344482422, -17.538162231445312, -7.1533203125, 10.76278305053711, -10.630683898925781, 0.21121978759765625, 32.636077880859375, -3.1012802124023438, 13.818466186523438, 5.810649871826172, 22.83423614501953, 14.464897155761719, 48.873809814453125, 25.027179718017578], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000199.npy"}
{"epoch": 0.30083144368858655, "step": 200, "batch_size": 64, "mean": 17.061601638793945, "std": 21.051311492919922, "min": -29.613067626953125, "p10": -4.436742210388183, "median": 12.38326644897461, "p90": 47.12689208984375, "max": 64.37272644042969, "pos_frac": 0.78125, "sample": [4.774776458740234, 36.1922721862793, 64.14907836914062, 16.69080352783203, 13.602676391601562, 1.5792598724365234, 27.800750732421875, 29.97480010986328, 4.899658203125, -6.96405029296875, 11.640884399414062, 36.411712646484375, -4.972434997558594, 42.25816345214844, 4.1988983154296875, -3.5631332397460938, 6.093902587890625, 1.9234428405761719, 38.33128356933594, 14.321014404296875, 3.5276336669921875, -5.036918640136719, 22.951065063476562, 56.16947555541992, 1.4673709869384766, 17.97035789489746, 40.070411682128906, -0.5073471069335938, 39.06755447387695, -1.4503021240234375, 28.03467559814453, 13.125648498535156, 34.67901611328125, 46.27400207519531, 47.913055419921875, 0.17319297790527344, -29.613067626953125, 35.89069366455078, 42.16007995605469, 34.679161071777344, 56.67023849487305, 10.007209777832031, 64.37272644042969, 9.255332946777344, 16.768539428710938, -17.007278442382812, -0.8177947998046875, 17.441558837890625, -0.0384979248046875, 47.49241638183594, -3.3953628540039062, 8.712783813476562, 15.899421691894531, -10.277458190917969, 8.006996154785156, 13.841903686523438, -4.57899284362793, 7.0970916748046875, 5.22918701171875, 0.7711715698242188, 0.5138053894042969, -4.104824066162109, 26.683013916015625, 56.50988006591797], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000200.npy"}
{"epoch": 0.30234315948601664, "step": 201, "batch_size": 64, "mean": 10.7646484375, "std": 19.507442474365234, "min": -30.533615112304688, "p10": -7.947595214843748, "median": 7.838079452514648, "p90": 39.79777984619143, "max": 51.71259307861328, "pos_frac": 0.734375, "sample": [3.7825775146484375, 33.594573974609375, 4.518806457519531, 13.79281997680664, 7.055133819580078, 5.525791168212891, 51.22187423706055, 8.109869003295898, 10.682304382324219, -2.5709381103515625, 14.421195983886719, 22.33952522277832, 17.119741439819336, 42.97279357910156, 4.11121940612793, -30.533615112304688, 51.71259307861328, 18.88599395751953, 5.73681640625, 51.57227325439453, -6.2215423583984375, 0.5785903930664062, 16.08141326904297, -11.967941284179688, 0.227386474609375, 31.262454986572266, 9.246284484863281, 8.054363250732422, -4.475086212158203, 23.668861389160156, 3.8561744689941406, 28.574356079101562, 2.2466793060302734, 2.1957473754882812, 23.212390899658203, 42.20015335083008, 1.3264427185058594, -28.04296875, 19.207275390625, -0.32660675048828125, 15.463485717773438, 19.567520141601562, -1.1834030151367188, 32.79082489013672, -3.3298397064208984, 7.621795654296875, 4.9685211181640625, -28.62103271484375, 6.594236373901367, 28.313385009765625, -1.6825313568115234, 11.322906494140625, 12.087932586669922, 34.19224166870117, 8.622760772705078, 49.36241912841797, 21.126670837402344, -0.8772735595703125, -0.5994949340820312, -8.687332153320312, 45.88016128540039, -4.5998992919921875, -27.076583862304688, -27.275741577148438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000201.npy"}
{"epoch": 0.30385487528344673, "step": 202, "batch_size": 64, "mean": 9.542519569396973, "std": 19.4805850982666, "min": -41.05387496948242, "p10": -6.140293312072752, "median": 7.103456497192383, "p90": 30.419432067871117, "max": 67.64596557617188, "pos_frac": 0.734375, "sample": [-4.363685607910156, 4.859466552734375, 52.291324615478516, 67.64596557617188, 22.29509162902832, -1.7635498046875, 11.697242736816406, 6.266441345214844, 16.23743438720703, 9.513690948486328, 4.42106819152832, 9.11837387084961, 0.6362037658691406, 5.16502571105957, -2.8337554931640625, 5.9018402099609375, -2.6303939819335938, -0.5070953369140625, 11.889801025390625, -19.104286193847656, 5.07855224609375, 8.428974151611328, 17.902084350585938, 12.530387878417969, -2.8268203735351562, -1.1393966674804688, 5.183633804321289, 56.223140716552734, -0.8797225952148438, 38.7269287109375, -4.767129898071289, 5.720529556274414, 6.399219512939453, 7.8855133056640625, 14.379776000976562, -38.34410858154297, 16.883621215820312, 9.614736557006836, 21.307510375976562, -23.107887268066406, 24.242218017578125, 7.8076934814453125, 16.65045928955078, 24.299270629882812, 15.455589294433594, 11.588371276855469, 18.99604034423828, 4.511190414428711, 20.132583618164062, -12.878684997558594, 1.888315200805664, -41.05387496948242, 1.6296653747558594, -6.701656341552734, 3.2113113403320312, 13.015090942382812, 55.52040100097656, 2.035848617553711, -4.830446243286133, 21.493886947631836, -13.285808563232422, 33.0423583984375, 8.08213996887207, 53.933509826660156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000202.npy"}
{"epoch": 0.30536659108087677, "step": 203, "batch_size": 64, "mean": 7.45098876953125, "std": 15.850777626037598, "min": -30.490524291992188, "p10": -9.777874755859374, "median": 5.769950866699219, "p90": 30.338877105712896, "max": 50.406044006347656, "pos_frac": 0.734375, "sample": [2.9642181396484375, 2.2381134033203125, -30.490524291992188, -10.506561279296875, 0.770416259765625, 8.513786315917969, 3.7943191528320312, 35.283172607421875, 3.1389617919921875, -20.280067443847656, 12.961990356445312, 15.490161895751953, 10.28196907043457, 37.61085510253906, 0.7791786193847656, 33.82527160644531, 12.9912109375, 7.026100158691406, 26.273178100585938, 13.987625122070312, -17.184829711914062, -0.08721923828125, 15.453598022460938, -15.518421173095703, -3.8927783966064453, -15.946426391601562, 2.539196014404297, 28.833160400390625, 9.498558044433594, 30.98418426513672, 5.154319763183594, -3.7575759887695312, 42.36316680908203, 15.796504974365234, 17.224884033203125, 3.1223068237304688, 15.197860717773438, 8.27362060546875, 12.989044189453125, 8.402923583984375, 25.698036193847656, -8.05474853515625, -8.077606201171875, 6.385581970214844, 10.422813415527344, 2.6126480102539062, 40.51904296875, 13.537185668945312, -5.401336669921875, 50.406044006347656, 22.27254867553711, -0.7441558837890625, -1.4245147705078125, 2.8363800048828125, 6.970733642578125, -24.34912109375, 0.5529556274414062, 0.8288421630859375, 12.358356475830078, 3.0974178314208984, 9.468437194824219, -4.07965087890625, -5.1432952880859375, 0.07122802734375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000203.npy"}
{"epoch": 0.30687830687830686, "step": 204, "batch_size": 64, "mean": 11.475125312805176, "std": 17.036415100097656, "min": -15.94563102722168, "p10": -8.758574676513668, "median": 8.696934700012207, "p90": 35.07303504943848, "max": 55.101104736328125, "pos_frac": 0.6875, "sample": [55.101104736328125, 25.85784912109375, -1.0734996795654297, 4.496225357055664, 8.482955932617188, 49.09722900390625, 2.8028717041015625, 8.910913467407227, -10.356285095214844, 22.682098388671875, 33.11683654785156, -10.286949157714844, -0.7827873229980469, 21.579486846923828, -0.9341201782226562, 19.282638549804688, 23.76763916015625, -3.3249359130859375, -4.481010437011719, -10.859153747558594, 22.721216201782227, -1.4484481811523438, 6.426654815673828, 36.65052795410156, -3.56304931640625, 2.0708580017089844, 9.807098388671875, 30.016986846923828, 4.486230850219727, 6.51458740234375, 36.402183532714844, -11.709846496582031, 2.6003799438476562, 16.536407470703125, 9.81089973449707, -4.920392990112305, 9.480682373046875, 47.080074310302734, -15.94563102722168, 28.969444274902344, -0.7212791442871094, 26.94723129272461, 27.602249145507812, 35.184993743896484, -5.1923675537109375, 16.177146911621094, 13.670677185058594, 0.27689170837402344, 10.982025146484375, 9.833927154541016, 41.507652282714844, 34.811798095703125, -3.69110107421875, 15.708061218261719, 12.937248229980469, 33.50786209106445, 4.013702392578125, -0.8810348510742188, 1.984375, 5.376411437988281, -15.920646667480469, -3.7839813232421875, 24.806734085083008, -15.7965087890625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000204.npy"}
{"epoch": 0.30839002267573695, "step": 205, "batch_size": 64, "mean": 10.74968433380127, "std": 19.7203369140625, "min": -32.457618713378906, "p10": -8.344012641906739, "median": 6.801570892333984, "p90": 35.88695640563965, "max": 61.19239044189453, "pos_frac": 0.6875, "sample": [58.88922119140625, 1.2034931182861328, -2.6275691986083984, 3.128507614135742, 4.953762054443359, 47.31769943237305, 18.8873348236084, 6.806037902832031, -30.085357666015625, 23.531326293945312, 3.1589431762695312, -0.4779796600341797, 61.19239044189453, 10.878276824951172, 6.403797149658203, -9.549751281738281, -5.5645294189453125, 2.796905517578125, 10.18927001953125, 17.192928314208984, 7.61817741394043, 35.45111083984375, 3.267087936401367, 33.539642333984375, 26.94281005859375, 3.254852294921875, -6.4932708740234375, -1.3187179565429688, -0.7665252685546875, 45.24536895751953, -1.4464492797851562, -8.460245132446289, 23.9742431640625, 1.6659698486328125, -16.604385375976562, 10.926071166992188, 6.231504440307617, 6.7971038818359375, -3.8354339599609375, 27.078697204589844, -31.715484619140625, 31.648880004882812, 15.164344787597656, 16.971481323242188, 42.39697265625, 25.926025390625, 14.59152603149414, 9.905555725097656, 25.228931427001953, -5.2312774658203125, -3.7738037109375, 36.06788635253906, 35.464786529541016, 14.442298889160156, -2.330047607421875, 33.446815490722656, 13.992618560791016, -6.961156845092773, 40.725677490234375, -8.072803497314453, 2.2138595581054688, -32.457618713378906, -9.680458068847656, 8.722465515136719], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000205.npy"}
{"epoch": 0.30990173847316704, "step": 206, "batch_size": 64, "mean": 9.311038970947266, "std": 17.98443031311035, "min": -37.006500244140625, "p10": -5.5769754409790036, "median": 4.781318664550781, "p90": 38.63292541503907, "max": 51.069427490234375, "pos_frac": 0.640625, "sample": [-4.966644287109375, -8.951339721679688, 5.540397644042969, -1.9306488037109375, -5.75071907043457, -2.0367889404296875, 9.890922546386719, 4.406406402587891, 25.10437774658203, 6.90962028503418, -1.190286636352539, 43.39399719238281, 32.00289535522461, -1.9105796813964844, -0.5688705444335938, 43.034698486328125, -1.3686065673828125, 38.884910583496094, -4.817291259765625, 21.472192764282227, 43.38166809082031, 4.1348724365234375, 10.718757629394531, 0.13916015625, 4.533111572265625, 38.044960021972656, 6.213321685791016, -37.006500244140625, 21.791549682617188, 2.3463287353515625, 34.54478454589844, 41.31794738769531, 6.6946868896484375, 5.0295257568359375, 8.935882568359375, 2.2777557373046875, -33.855926513671875, -15.919456481933594, -3.9302501678466797, 3.124725341796875, 9.026674270629883, -2.8549156188964844, 23.922271728515625, -2.5960845947265625, -5.171573638916016, -4.099945068359375, 38.89004135131836, 24.574798583984375, 3.2315826416015625, -6.470062255859375, 51.069427490234375, 8.53472900390625, 36.27459716796875, -0.37963104248046875, 12.114845275878906, 15.983192443847656, -0.2685127258300781, 2.8557586669921875, -0.889984130859375, 18.291763305664062, -13.047906875610352, 11.238349914550781, 11.124565124511719, 24.88697052001953], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000206.npy"}
{"epoch": 0.31141345427059713, "step": 207, "batch_size": 64, "mean": 12.920370101928711, "std": 16.91084861755371, "min": -32.48072052001953, "p10": -5.680182456970213, "median": 10.107718467712402, "p90": 36.288006591796886, "max": 54.44929504394531, "pos_frac": 0.78125, "sample": [54.44929504394531, 0.31723785400390625, -4.364818572998047, 15.589038848876953, 20.428443908691406, 3.99755859375, -0.226593017578125, 20.639724731445312, -0.6904525756835938, 18.633255004882812, 16.33990478515625, 4.443641662597656, 2.5019874572753906, 37.16352081298828, -6.24390983581543, -6.984130859375, 6.652130126953125, 3.363922119140625, 4.784023284912109, 33.795833587646484, 29.306278228759766, 9.904182434082031, 22.47637939453125, -1.00994873046875, -7.832021713256836, 1.2354278564453125, -2.8224430084228516, 26.2548828125, -11.3837890625, 11.832534790039062, 21.761985778808594, 26.78185272216797, 51.65716552734375, 5.245841979980469, 8.042129516601562, 5.620586395263672, -10.7510986328125, 26.1556396484375, 14.76568603515625, 2.057954788208008, 48.33723449707031, 27.98485565185547, 2.6623306274414062, 9.539005279541016, 0.5685462951660156, 10.930328369140625, 19.044708251953125, -1.3214874267578125, 12.571146011352539, 28.751190185546875, -32.48072052001953, 40.07897186279297, -0.242950439453125, 24.079452514648438, 14.199951171875, 34.245140075683594, 41.918609619140625, 13.962562561035156, 23.699752807617188, 10.311254501342773, -6.611106872558594, 1.3406982421875, 3.7254409790039062, 45.71995162963867], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000207.npy"}
{"epoch": 0.3129251700680272, "step": 208, "batch_size": 64, "mean": 6.453763484954834, "std": 15.214516639709473, "min": -28.02045440673828, "p10": -7.6269378662109375, "median": 3.655345916748047, "p90": 26.91632232666016, "max": 43.139198303222656, "pos_frac": 0.625, "sample": [6.317121505737305, 2.8724327087402344, 21.15460777282715, -6.524286270141602, 32.2811279296875, 19.068904876708984, 1.0486927032470703, -7.6986236572265625, 12.294342041015625, -28.02045440673828, 2.2611236572265625, 10.52618408203125, 21.80066680908203, 3.7067642211914062, 36.5507926940918, 3.508230209350586, 3.183330535888672, -4.231109619140625, -5.350503921508789, -16.677833557128906, 9.240867614746094, -5.957118988037109, 1.8621578216552734, -6.512908935546875, 14.866842269897461, 32.633766174316406, 13.35687255859375, -4.704668045043945, 14.433975219726562, 31.54543113708496, -5.288990020751953, 0.5583953857421875, -2.6338653564453125, 3.6039276123046875, -27.269515991210938, 10.706939697265625, 16.663463592529297, 21.443923950195312, 3.8258514404296875, -3.0765018463134766, 43.139198303222656, 10.575477600097656, -7.299522399902344, -5.6123809814453125, 27.38739776611328, -2.949981689453125, 13.3363037109375, -4.2579345703125, 13.36524772644043, 5.315584182739258, -9.125297546386719, 21.259437561035156, -1.9186935424804688, 20.26397705078125, -6.329166412353516, 25.795780181884766, 5.004325866699219, -4.2394866943359375, -12.24449348449707, -7.4596710205078125, -15.025516510009766, 25.81714630126953, 10.98199462890625, 39.89076232910156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000208.npy"}
{"epoch": 0.3144368858654573, "step": 209, "batch_size": 64, "mean": 10.135825157165527, "std": 17.65978240966797, "min": -32.031822204589844, "p10": -8.383819580078123, "median": 8.01480484008789, "p90": 33.05628814697266, "max": 61.338104248046875, "pos_frac": 0.734375, "sample": [20.526424407958984, 34.75566482543945, 7.565521240234375, 4.093738555908203, -5.709049224853516, 29.716064453125, 10.572942733764648, 9.881561279296875, 4.779207229614258, 61.338104248046875, 0.01734161376953125, 4.495536804199219, -7.408843994140625, 8.464088439941406, 9.917587280273438, -3.677642822265625, -10.662506103515625, -1.8137893676757812, -0.6188316345214844, 32.0859375, 1.1616935729980469, -2.9540176391601562, 1.2540740966796875, 8.73175048828125, 4.370155334472656, 9.565948486328125, 30.93744659423828, 15.051544189453125, 26.202194213867188, 17.052581787109375, 14.238363265991211, 25.2138671875, 6.369621276855469, 22.10407257080078, 33.47215270996094, -9.172149658203125, 37.95215606689453, -1.6904830932617188, -32.031822204589844, 9.76556396484375, -6.134674072265625, 2.1588802337646484, -8.801666259765625, -1.8237533569335938, 19.582107543945312, 6.032712936401367, 56.52501678466797, 14.821361541748047, 10.047599792480469, 30.2525634765625, 4.160223007202148, -14.259994506835938, 0.6407051086425781, -6.297210693359375, -12.452728271484375, 1.5743446350097656, 43.59276580810547, -27.917816162109375, 21.945465087890625, 10.042924880981445, 31.85696792602539, 35.33614730834961, 9.861675262451172, 2.035449981689453], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000209.npy"}
{"epoch": 0.31594860166288735, "step": 210, "batch_size": 64, "mean": 14.197968482971191, "std": 17.754886627197266, "min": -26.136444091796875, "p10": -8.551938438415522, "median": 13.183069229125977, "p90": 36.78775177001953, "max": 51.46529006958008, "pos_frac": 0.828125, "sample": [36.929466247558594, 2.0243148803710938, 3.7589111328125, 25.88713836669922, 0.2227039337158203, 30.70496368408203, 51.46529006958008, 16.174896240234375, 44.51911926269531, 36.126312255859375, 7.944690704345703, 34.75563049316406, 18.189476013183594, 42.57893371582031, 43.963836669921875, 6.14031982421875, -12.210538864135742, -13.74209213256836, 27.795921325683594, -11.441848754882812, 38.9761962890625, 9.252548217773438, 15.260000228881836, 3.546875, 29.226173400878906, 3.5976028442382812, -1.4287872314453125, 11.495243072509766, 34.396873474121094, 0.0113372802734375, 27.437339782714844, 2.2565956115722656, 22.135452270507812, 9.530654907226562, 11.30349349975586, -0.396331787109375, 0.125946044921875, -2.672283172607422, 7.132106781005859, 36.13059997558594, -14.038961410522461, 48.1170654296875, 14.903884887695312, 2.7479324340820312, 18.60516357421875, -0.77288818359375, 17.564638137817383, 4.810184478759766, 16.396026611328125, 3.3754920959472656, 0.3978996276855469, 30.18695640563965, 28.342227935791016, -11.07179069519043, 4.441413879394531, 36.45708465576172, 16.819908142089844, 26.918869018554688, 20.40385627746582, -26.136444091796875, 4.64300537109375, -23.236907958984375, 14.870895385742188, 24.819416046142578], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000210.npy"}
{"epoch": 0.31746031746031744, "step": 211, "batch_size": 64, "mean": 10.511463165283203, "std": 15.636152267456055, "min": -33.775238037109375, "p10": -2.6355190277099605, "median": 10.010669708251953, "p90": 28.31274871826172, "max": 62.01849365234375, "pos_frac": 0.796875, "sample": [-1.3973255157470703, -24.679176330566406, 12.769996643066406, 1.497549057006836, 1.9708786010742188, 0.7235355377197266, -10.257904052734375, 25.968597412109375, 28.222503662109375, 1.8216915130615234, 17.779388427734375, 5.4535980224609375, 10.419998168945312, 20.566452026367188, -4.154819488525391, 12.238203048706055, 4.146852493286133, -5.591335296630859, 27.00080108642578, 3.5936622619628906, 6.331727981567383, -0.5155029296875, 11.104789733886719, -33.775238037109375, 17.41585922241211, -23.676137924194336, -1.411041259765625, 22.305618286132812, 33.8535270690918, 14.903621673583984, 20.12957000732422, 2.0975341796875, 17.556509017944336, -2.8354644775390625, 22.950027465820312, 5.166522979736328, 12.260486602783203, 15.927749633789062, -2.1689796447753906, 9.601341247558594, 2.2082366943359375, 62.01849365234375, 40.22503662109375, 2.566488265991211, 0.24109649658203125, 20.526805877685547, 34.91570281982422, 12.818359375, 30.96966552734375, 22.34625244140625, -1.2337417602539062, 17.470539093017578, 31.841415405273438, 26.54907989501953, 13.383995056152344, 25.336637496948242, 5.92431640625, 16.361732482910156, 0.36780738830566406, 28.351425170898438, 2.484416961669922, 1.8530120849609375, 1.6054153442382812, -1.71417236328125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000211.npy"}
{"epoch": 0.31897203325774753, "step": 212, "batch_size": 64, "mean": 11.171043395996094, "std": 16.94175148010254, "min": -22.380569458007812, "p10": -7.2098081588745115, "median": 7.417048454284668, "p90": 36.55585708618164, "max": 56.663360595703125, "pos_frac": 0.703125, "sample": [36.59675598144531, 22.899524688720703, 23.399152755737305, 38.122196197509766, -2.5907020568847656, 2.8489608764648438, 45.56754684448242, 7.037294387817383, -7.956672668457031, 24.196510314941406, -0.6959266662597656, -22.380569458007812, 20.169090270996094, 56.663360595703125, -9.2899169921875, 28.720321655273438, 13.901336669921875, 16.47586441040039, -3.8105926513671875, 20.54157066345215, 9.992103576660156, 2.989461898803711, 1.1499919891357422, 14.214874267578125, -11.6480712890625, -0.055328369140625, 33.8834228515625, -7.295225143432617, 40.917572021484375, 44.09033203125, 21.14193344116211, 10.041606903076172, 13.867027282714844, 2.0220184326171875, 14.720573425292969, -1.60479736328125, -1.8762779235839844, 4.0403594970703125, 1.683462142944336, 7.796802520751953, -2.4959793090820312, 2.3949813842773438, 4.896945953369141, -14.1290283203125, 14.472688674926758, 17.881317138671875, 11.342506408691406, 41.44811248779297, -18.264354705810547, -2.268646240234375, -2.497835159301758, -4.311912536621094, 9.768867492675781, -1.0278396606445312, 27.206893920898438, 36.460426330566406, 18.524917602539062, 29.063941955566406, 32.080291748046875, 0.55474853515625, 3.5705413818359375, 5.970062255859375, -7.010501861572266, 0.8286209106445312], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000212.npy"}
{"epoch": 0.3204837490551776, "step": 213, "batch_size": 64, "mean": 10.391778945922852, "std": 18.025768280029297, "min": -37.269439697265625, "p10": -11.686703109741208, "median": 7.095087051391602, "p90": 38.061676025390625, "max": 51.0595703125, "pos_frac": 0.765625, "sample": [25.655529022216797, -3.2581787109375, 10.211761474609375, 13.32159423828125, 4.721584320068359, 3.6473026275634766, -20.591156005859375, -2.2452774047851562, -26.332061767578125, 23.631317138671875, 21.78021240234375, 2.4385318756103516, 5.605812072753906, -2.2630481719970703, -37.269439697265625, -14.853485107421875, 22.41036033630371, 38.21356201171875, -0.5806388854980469, 9.871543884277344, 25.620891571044922, -14.903854370117188, 40.42267608642578, 39.327430725097656, 30.79278564453125, 10.947505950927734, -4.179872512817383, -4.497169494628906, 3.5101070404052734, 4.828699111938477, -13.101573944091797, -4.71136474609375, 8.673301696777344, 7.851411819458008, 7.7995452880859375, -14.001861572265625, 3.8447113037109375, 5.65618896484375, 31.613903045654297, 4.2727508544921875, 4.354351043701172, 26.45740509033203, 41.8410530090332, 1.8796844482421875, 16.362091064453125, 7.2003021240234375, 0.05992889404296875, 35.54801940917969, 41.98204040527344, 6.608917236328125, -8.385337829589844, 7.050907135009766, 13.984115600585938, 2.2462615966796875, 2.938983917236328, 7.1392669677734375, 23.309051513671875, 51.0595703125, 11.289299011230469, 1.4631805419921875, 13.309318542480469, 39.72869110107422, 36.05743408203125, 37.707275390625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000213.npy"}
{"epoch": 0.3219954648526077, "step": 214, "batch_size": 64, "mean": 11.19051742553711, "std": 19.091140747070312, "min": -24.9932861328125, "p10": -9.514535903930662, "median": 6.596897125244141, "p90": 36.055862045288094, "max": 57.35479736328125, "pos_frac": 0.71875, "sample": [18.244977951049805, 6.303413391113281, 1.80194091796875, 6.890380859375, 20.783363342285156, 21.027944564819336, 5.878211975097656, 2.296649932861328, 26.175010681152344, 5.945808410644531, 40.818328857421875, 24.250627517700195, 26.92265510559082, 2.757659912109375, 23.35039520263672, 32.73780822753906, 23.083877563476562, -22.643844604492188, 11.001602172851562, 25.92804527282715, -3.7210006713867188, 8.390071868896484, 1.9311676025390625, -1.0356597900390625, 1.3529644012451172, 32.58631896972656, -14.399909973144531, 18.128707885742188, 37.47463607788086, 31.365493774414062, 1.2259750366210938, 3.437164306640625, -7.251850128173828, -3.2073135375976562, 19.358802795410156, 34.6632080078125, -22.494140625, -7.465370178222656, 40.702659606933594, 57.35479736328125, -7.461891174316406, -12.413692474365234, 31.547149658203125, 15.105220794677734, 36.652713775634766, 54.93064880371094, -1.5377311706542969, 2.5457077026367188, 0.756561279296875, -6.695865631103516, -24.9932861328125, 33.65019989013672, 4.139245986938477, -2.2872047424316406, 4.4570465087890625, -3.924196243286133, 52.450714111328125, 13.601631164550781, -23.783466339111328, 11.119888305664062, 9.909698486328125, 9.952362060546875, -3.087146759033203, -10.392749786376953], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000214.npy"}
{"epoch": 0.3235071806500378, "step": 215, "batch_size": 64, "mean": 13.450828552246094, "std": 16.861652374267578, "min": -17.35918426513672, "p10": -7.537661170959472, "median": 11.365150451660156, "p90": 35.38478012084961, "max": 51.589447021484375, "pos_frac": 0.703125, "sample": [16.59908676147461, -1.5651779174804688, -17.35918426513672, 27.29632568359375, 11.7342529296875, -7.949836730957031, 10.236396789550781, -1.1613311767578125, 33.73438262939453, 34.27029037475586, -10.018325805664062, -7.058292388916016, -11.185874938964844, 9.812271118164062, 10.996047973632812, 39.83235168457031, -1.1646499633789062, 21.65118408203125, -2.9934310913085938, 14.414581298828125, 16.754104614257812, 12.589668273925781, 2.007343292236328, 19.588645935058594, 35.53803253173828, 2.620403289794922, -1.0175857543945312, 18.307655334472656, 19.965553283691406, -7.743104934692383, 3.8451385498046875, 32.295928955078125, -0.6581573486328125, 4.0124359130859375, -14.161712646484375, -9.07586669921875, 25.059560775756836, 21.41315460205078, 18.32152557373047, 41.59332275390625, 9.499265670776367, 5.2291412353515625, -1.581298828125, 0.8403549194335938, 20.295013427734375, 36.98426055908203, 32.9520263671875, 46.04043197631836, 31.997100830078125, 51.589447021484375, 14.519378662109375, 31.78955078125, 22.545066833496094, -4.9428253173828125, 8.035896301269531, 3.7146034240722656, 27.59588623046875, 47.19166564941406, 26.428863525390625, -1.3400993347167969, 35.027191162109375, -0.8209953308105469, 10.253242492675781, -4.367214202880859], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000215.npy"}
{"epoch": 0.3250188964474679, "step": 216, "batch_size": 64, "mean": 12.751335144042969, "std": 15.914328575134277, "min": -34.485023498535156, "p10": -5.624071121215818, "median": 10.924093246459961, "p90": 35.12236557006836, "max": 44.449371337890625, "pos_frac": 0.8125, "sample": [9.782089233398438, 36.88263702392578, 33.562591552734375, 31.872474670410156, 15.895980834960938, 5.8505096435546875, 13.895011901855469, 6.50799560546875, 17.115142822265625, 32.486351013183594, 6.352874755859375, 18.144454956054688, 3.8494720458984375, -7.9953155517578125, 21.85455322265625, -8.1103515625, 2.9822006225585938, 7.4640045166015625, 5.3462371826171875, 16.3104248046875, 21.063064575195312, 35.35475158691406, 16.85723114013672, 21.174339294433594, 23.419994354248047, 3.988058090209961, 40.80908203125, 1.667236328125, 1.5378665924072266, 29.397384643554688, 19.071205139160156, -34.485023498535156, 2.9095611572265625, 12.066097259521484, 18.336593627929688, 25.87432861328125, -6.743583679199219, 0.9033069610595703, 17.357280731201172, -3.0118751525878906, -0.4141693115234375, 43.19403076171875, 39.831031799316406, 0.5373306274414062, 2.4984397888183594, -8.771934509277344, 7.503292083740234, 3.110105514526367, -1.7695236206054688, 13.177490234375, 43.920589447021484, -12.72281265258789, -7.664886474609375, 1.1613636016845703, 34.58013153076172, -0.5165729522705078, 6.174345016479492, 12.140876770019531, 4.092979431152344, 44.449371337890625, -0.8908481597900391, 28.401031494140625, 20.99907684326172, 25.468399047851562], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000216.npy"}
{"epoch": 0.32653061224489793, "step": 217, "batch_size": 64, "mean": 13.30902099609375, "std": 17.354511260986328, "min": -22.168014526367188, "p10": -6.636759948730467, "median": 8.7569580078125, "p90": 34.12448768615723, "max": 70.359130859375, "pos_frac": 0.8125, "sample": [22.83313751220703, 35.64209747314453, 0.19865798950195312, 9.706710815429688, -0.7313232421875, -7.457752227783203, 4.572649002075195, 70.359130859375, -13.342437744140625, 16.650634765625, 31.370712280273438, 43.6566162109375, 4.559318542480469, -16.18832015991211, 11.948333740234375, 3.7120437622070312, 18.091468811035156, 6.7776947021484375, 2.3043365478515625, 19.832969665527344, 2.7249088287353516, 12.720884323120117, 8.191375732421875, 27.925582885742188, 6.791221618652344, 33.12697219848633, -9.815040588378906, 3.2903060913085938, 1.0010833740234375, 16.978313446044922, 37.2922477722168, 5.574066162109375, 29.737594604492188, -4.721111297607422, 17.618850708007812, -7.819976806640625, 5.651252746582031, 24.2637882232666, 6.130706787109375, -22.168014526367188, -11.461532592773438, 32.83433532714844, 25.739620208740234, -2.4395980834960938, 5.2053375244140625, 17.60814666748047, -2.8408203125, 6.29052734375, 9.322540283203125, 59.1741943359375, 13.805131912231445, 34.55199432373047, 7.529642105102539, 2.7460460662841797, 25.145652770996094, 27.124740600585938, 42.887229919433594, 19.662750244140625, 4.875823974609375, 31.86052703857422, 5.840557098388672, 23.138946533203125, 15.251296997070312, -1.0673599243164062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000217.npy"}
{"epoch": 0.328042328042328, "step": 218, "batch_size": 64, "mean": 7.059081554412842, "std": 18.422460556030273, "min": -50.260292053222656, "p10": -16.353549194335937, "median": 7.223751068115234, "p90": 29.306133270263672, "max": 60.029815673828125, "pos_frac": 0.703125, "sample": [6.3313140869140625, 2.2670841217041016, 4.704559326171875, 11.760013580322266, 7.930694580078125, -12.297836303710938, 7.0573577880859375, 10.125442504882812, 4.356353759765625, 22.019088745117188, 4.37347412109375, 8.16827392578125, 39.311485290527344, -16.57769775390625, -17.65192413330078, -27.100496292114258, 0.12440872192382812, 29.862106323242188, -20.651779174804688, -5.00897216796875, 11.761428833007812, 7.453361511230469, 2.4942245483398438, 10.331472396850586, 13.79910659790039, 60.029815673828125, 29.020118713378906, 45.47062683105469, 13.333263397216797, 17.187210083007812, -0.5655078887939453, -15.830535888671875, 14.999526977539062, -10.122102737426758, -7.840293884277344, -3.2024917602539062, 16.007408142089844, 9.978479385375977, 1.5160942077636719, 29.4287109375, 30.714679718017578, 19.525054931640625, -0.17096710205078125, -1.6399383544921875, 7.324790954589844, -6.433866500854492, 18.157554626464844, 2.8881969451904297, -26.994529724121094, -4.754676818847656, 24.331594467163086, -22.89134979248047, 11.6046142578125, 0.5923233032226562, 22.64978790283203, 12.431045532226562, 2.8905792236328125, 28.668296813964844, -50.260292053222656, -3.9181060791015625, 7.122711181640625, 15.484855651855469, 18.185821533203125, 41.920196533203125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000218.npy"}
{"epoch": 0.3295540438397581, "step": 219, "batch_size": 64, "mean": 7.146683216094971, "std": 13.712711334228516, "min": -21.57415771484375, "p10": -6.631065368652342, "median": 6.2389984130859375, "p90": 29.344605255126954, "max": 43.43712615966797, "pos_frac": 0.703125, "sample": [6.666412353515625, 23.7730712890625, 8.388214111328125, 7.508123397827148, 6.868446350097656, 0.6235427856445312, 1.7100410461425781, 0.8888664245605469, 13.933692932128906, -18.081031799316406, -1.772125244140625, 34.17987823486328, 33.562652587890625, -7.31658935546875, -21.57415771484375, 15.809112548828125, 2.3820457458496094, 2.2744998931884766, 12.446086883544922, 0.9310989379882812, 2.776155471801758, -15.200149536132812, -3.89190673828125, 3.604511260986328, -0.0427703857421875, -2.2966766357421875, 1.760467529296875, -1.9679718017578125, 12.884613037109375, 5.147869110107422, 12.502464294433594, 34.34404754638672, 28.87274932861328, 36.08222961425781, 7.596321105957031, 22.92200469970703, -5.0315093994140625, 4.99104118347168, -7.660316467285156, 7.772941589355469, 6.969776153564453, 2.6988677978515625, -14.705772399902344, 5.9973907470703125, -3.8980636596679688, -3.6149253845214844, 10.588485717773438, -1.5948944091796875, 11.085914611816406, 43.43712615966797, 9.093757629394531, 8.596107482910156, 20.23910140991211, -2.9184341430664062, -16.362613677978516, 10.632122039794922, 8.136619567871094, 7.395820617675781, 29.546829223632812, -0.361572265625, 36.77611541748047, -1.3011474609375, 16.102497100830078, 6.4806060791015625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000219.npy"}
{"epoch": 0.3310657596371882, "step": 220, "batch_size": 64, "mean": 9.979418754577637, "std": 15.631482124328613, "min": -29.789501190185547, "p10": -7.173196029663085, "median": 8.455062866210938, "p90": 28.147477149963386, "max": 49.537872314453125, "pos_frac": 0.71875, "sample": [2.030424118041992, 2.1843185424804688, 12.287200927734375, 11.423046112060547, 3.6516284942626953, 26.291473388671875, 41.6953125, 4.5916290283203125, 5.386981964111328, 4.2860565185546875, -3.952423095703125, 39.75346374511719, 3.9775238037109375, -9.956314086914062, -7.330757141113281, -9.755760192871094, 19.965049743652344, -0.10488128662109375, -3.032703399658203, 2.467803955078125, 25.486534118652344, 23.283599853515625, -1.7058906555175781, -21.380403518676758, 1.25146484375, -13.816858291625977, -1.108224868774414, -13.592758178710938, 14.492279052734375, 12.565322875976562, 17.46245574951172, 5.9948577880859375, 21.045974731445312, 16.304370880126953, 33.754180908203125, 3.3604507446289062, 19.959579467773438, 25.02273941040039, 30.606523513793945, -6.805553436279297, 6.512351989746094, 13.55589485168457, 12.166709899902344, 17.29853057861328, 44.556396484375, -3.1215972900390625, 22.29772186279297, -0.6039810180664062, 18.47991371154785, 24.45825958251953, 17.21059226989746, -4.681800842285156, 28.942907333374023, -2.6729202270507812, 5.029045104980469, 10.397773742675781, -29.789501190185547, 11.33761978149414, 15.16937255859375, 1.664693832397461, 49.537872314453125, -5.782594680786133, 26.186702728271484, 22.49311065673828], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000220.npy"}
{"epoch": 0.3325774754346183, "step": 221, "batch_size": 64, "mean": 11.268268585205078, "std": 17.307373046875, "min": -20.330413818359375, "p10": -5.739407348632811, "median": 6.7921295166015625, "p90": 36.83835830688478, "max": 65.69918823242188, "pos_frac": 0.75, "sample": [10.663671493530273, -18.691150665283203, 5.624542236328125, 7.098480224609375, -0.4689445495605469, 30.486839294433594, 1.8050155639648438, 6.48577880859375, 16.956283569335938, -7.106864929199219, -1.7258567810058594, 11.374359130859375, 8.528522491455078, 12.4097900390625, -8.027305603027344, 5.913366317749023, -3.867706298828125, -1.2619400024414062, 22.951833724975586, 9.649921417236328, 25.068811416625977, 28.429840087890625, 7.422191619873047, 15.401817321777344, 65.69918823242188, 4.786159515380859, 24.165760040283203, 46.36445236206055, -3.3688507080078125, 2.3256969451904297, 24.334304809570312, 34.46025848388672, -0.5800704956054688, 12.496795654296875, 52.269081115722656, 4.468944549560547, 24.50823974609375, -17.26807403564453, 14.579339981079102, 37.8575439453125, 18.983673095703125, -6.2706146240234375, 0.8951263427734375, 0.8397884368896484, -0.8620128631591797, 1.0869293212890625, 1.4308910369873047, 42.06340026855469, 5.011894226074219, -3.261058807373047, 3.2812347412109375, -20.330413818359375, 4.710826873779297, 38.236968994140625, 24.83148193359375, -17.01618766784668, 6.421211242675781, 13.561569213867188, 42.52532958984375, 27.696395874023438, 0.29746246337890625, -4.4999237060546875, 17.591842651367188, 11.723258972167969], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000221.npy"}
{"epoch": 0.3340891912320484, "step": 222, "batch_size": 64, "mean": 13.678936004638672, "std": 15.741147994995117, "min": -18.60120391845703, "p10": -4.456831169128418, "median": 13.548145294189453, "p90": 34.39400863647462, "max": 59.4808349609375, "pos_frac": 0.78125, "sample": [-1.859842300415039, -18.60120391845703, 21.249984741210938, 12.776565551757812, 27.543167114257812, 29.989797592163086, 24.964519500732422, -2.3635940551757812, 17.983535766601562, 40.365142822265625, 12.143768310546875, 5.469148635864258, -9.385711669921875, 2.4255752563476562, 17.58802604675293, -6.44659423828125, 9.547264099121094, 4.716121673583984, 3.7393856048583984, 24.977882385253906, 10.453048706054688, -0.7787075042724609, 16.877883911132812, 26.58446502685547, -0.49799346923828125, 31.111282348632812, 3.2615432739257812, 59.4808349609375, -4.435464859008789, 18.296875, 15.340324401855469, 10.092178344726562, 30.874160766601562, 32.503631591796875, 12.3974609375, 30.626083374023438, 16.94251251220703, -3.0790786743164062, -8.872190475463867, 0.4702606201171875, -12.069465637207031, 23.67113494873047, 0.22576141357421875, 7.7269134521484375, 3.136554718017578, 48.131683349609375, 2.0805282592773438, -7.178428649902344, 5.0758209228515625, 14.319725036621094, 19.24217987060547, -2.4196338653564453, 35.20417022705078, 43.782135009765625, 18.14862060546875, 5.93133544921875, 35.80862045288086, 18.158599853515625, 14.484619140625, 23.244705200195312, 16.048980712890625, 17.242298126220703, 35.448974609375, -4.4659881591796875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000222.npy"}
{"epoch": 0.3356009070294785, "step": 223, "batch_size": 64, "mean": 11.437530517578125, "std": 14.726035118103027, "min": -15.085479736328125, "p10": -5.80578269958496, "median": 9.890127182006836, "p90": 28.1617904663086, "max": 49.992942810058594, "pos_frac": 0.703125, "sample": [26.102272033691406, 9.079294204711914, 15.630821228027344, 6.9495086669921875, -4.5689544677734375, 21.528553009033203, -5.240447998046875, 5.968536376953125, -0.1009674072265625, 19.653064727783203, 2.302877426147461, 24.450942993164062, 16.962772369384766, 23.09821319580078, -9.93997573852539, -4.831905364990234, -3.5843257904052734, 22.261207580566406, 21.98687744140625, -7.251930236816406, 23.597736358642578, 12.145851135253906, -3.3002281188964844, 11.028133392333984, -0.5261001586914062, 7.6370697021484375, -3.425283432006836, 38.31837463378906, -3.2704906463623047, -15.085479736328125, 20.310840606689453, 9.963268280029297, -11.342041015625, 1.6300468444824219, 5.7008056640625, 44.444236755371094, 24.99640655517578, 23.074504852294922, -2.0054931640625, -6.048069000244141, -6.829536437988281, 6.07402229309082, 18.46868896484375, 9.816986083984375, 24.657196044921875, 13.583370208740234, 28.864013671875, 24.721036911010742, 6.57623291015625, 36.63835525512695, 49.992942810058594, -7.0666046142578125, 38.770660400390625, 26.523269653320312, 11.716842651367188, 13.810356140136719, 25.124481201171875, 19.071678161621094, 8.358028411865234, -3.5254573822021484, -4.0742645263671875, 29.664398193359375, 2.5211334228515625, 0.24359130859375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000223.npy"}
{"epoch": 0.3371126228269085, "step": 224, "batch_size": 64, "mean": 7.851241111755371, "std": 14.685498237609863, "min": -18.557201385498047, "p10": -9.69456787109375, "median": 5.215797424316406, "p90": 25.56822624206544, "max": 49.88703918457031, "pos_frac": 0.6875, "sample": [3.344432830810547, 42.78962707519531, 13.065221786499023, 22.536819458007812, 2.042724609375, 30.021732330322266, 15.572921752929688, 18.791507720947266, -3.825847625732422, 5.300140380859375, 28.03549575805664, 7.458196640014648, 20.559894561767578, 26.473468780517578, -13.034175872802734, -2.0829601287841797, 3.914022445678711, 9.136192321777344, 8.889083862304688, 2.6387786865234375, 8.999656677246094, 5.9641876220703125, 11.512466430664062, -11.94676399230957, -1.3667984008789062, -0.4557228088378906, -16.078269958496094, -2.7833099365234375, 10.933563232421875, 5.1314544677734375, 17.354385375976562, -18.557201385498047, -14.114530563354492, -9.342559814453125, -6.521881103515625, -0.0432891845703125, -2.001920700073242, -12.998435974121094, 4.995174407958984, 9.423194885253906, -8.888198852539062, 7.349884033203125, 21.345020294189453, 2.392730712890625, 12.865020751953125, 3.3773345947265625, 13.161502838134766, 4.494415283203125, -5.556068420410156, -1.3970108032226562, -4.762968063354492, 16.153623580932617, 17.580015182495117, 20.064414978027344, 49.07514190673828, 11.243209838867188, 2.2364730834960938, 23.45599365234375, 3.170713424682617, 36.908260345458984, -9.845428466796875, 49.88703918457031, 15.576644897460938, 2.860980987548828], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000224.npy"}
{"epoch": 0.3386243386243386, "step": 225, "batch_size": 64, "mean": 11.233922958374023, "std": 15.94520092010498, "min": -32.870452880859375, "p10": -2.227152252197265, "median": 8.28070068359375, "p90": 33.4034164428711, "max": 50.103912353515625, "pos_frac": 0.796875, "sample": [0.9981632232666016, 3.539398193359375, 1.9296150207519531, 14.713943481445312, 9.97607421875, 24.87152099609375, 0.7915153503417969, -2.444061279296875, 9.970754623413086, 9.211074829101562, 5.02020263671875, 29.849639892578125, 4.3515167236328125, 19.179885864257812, -0.8233070373535156, 3.129444122314453, 3.7829666137695312, 3.943145751953125, 21.379329681396484, 33.66058349609375, 37.90242004394531, 18.340892791748047, 49.228660583496094, 13.088294982910156, 50.103912353515625, 6.647697448730469, 4.604400634765625, 15.223258972167969, 32.326236724853516, -2.983522415161133, 0.8546028137207031, -10.893524169921875, 12.889183044433594, -0.24528884887695312, 28.195640563964844, 20.686264038085938, 3.265596389770508, 15.353729248046875, 15.58392333984375, 2.313915252685547, -0.8165054321289062, 7.3503265380859375, 28.16132354736328, 0.5992717742919922, 39.790977478027344, -32.870452880859375, -22.52984619140625, 2.0241165161132812, 19.104644775390625, 4.15087890625, 17.217453002929688, -0.2450084686279297, 2.923370361328125, 32.80335998535156, 36.110843658447266, -1.7210311889648438, 12.886825561523438, 19.178810119628906, 38.6841926574707, 9.732437133789062, -11.064239501953125, -1.121368408203125, 22.666419982910156, -13.56341552734375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000225.npy"}
{"epoch": 0.3401360544217687, "step": 226, "batch_size": 64, "mean": 9.101963996887207, "std": 15.402286529541016, "min": -21.462360382080078, "p10": -9.70089111328125, "median": 7.884700775146484, "p90": 30.15043563842774, "max": 42.53788757324219, "pos_frac": 0.71875, "sample": [7.7516021728515625, -15.056289672851562, 4.31201171875, 14.103315353393555, 13.582351684570312, 12.406951904296875, 23.197982788085938, 3.8457069396972656, 10.826953887939453, -8.479690551757812, 25.77029037475586, 2.0693626403808594, 8.454780578613281, 28.91608428955078, -10.173439025878906, -3.441204071044922, 4.311267852783203, -14.985206604003906, 34.46462631225586, 26.596633911132812, 36.29328155517578, -4.811065673828125, 26.05315399169922, 3.7583084106445312, 8.017799377441406, 1.3925514221191406, 0.7807540893554688, 25.97167205810547, 27.071434020996094, 19.139183044433594, -6.342325210571289, 19.560379028320312, 10.253002166748047, 13.588287353515625, 33.85249328613281, 10.176513671875, 7.263149261474609, -10.065811157226562, -2.9783573150634766, 0.8725414276123047, 2.7410964965820312, 14.83935546875, -6.226631164550781, -21.462360382080078, 36.06306457519531, 37.806640625, -19.788009643554688, 17.385276794433594, -20.743019104003906, 20.808570861816406, 6.355037689208984, 4.45433235168457, 6.177341461181641, 12.667730331420898, -8.849411010742188, 42.53788757324219, 30.679443359375, 19.151779174804688, -6.30084228515625, -1.590982437133789, 14.866744995117188, -2.203981399536133, -3.1626663208007812, 17.998268127441406], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000226.npy"}
{"epoch": 0.3416477702191988, "step": 227, "batch_size": 64, "mean": 9.104473114013672, "std": 16.31398582458496, "min": -30.02770233154297, "p10": -7.389424896240232, "median": 5.427104949951172, "p90": 33.154656982421876, "max": 44.360816955566406, "pos_frac": 0.671875, "sample": [21.892578125, 8.377304077148438, -1.2032012939453125, 22.311981201171875, -1.027130126953125, -1.4932441711425781, 13.081932067871094, 34.88947296142578, -30.02770233154297, -3.803070068359375, 40.67546081542969, 0.184967041015625, -3.900545120239258, 36.680564880371094, 18.014602661132812, 5.255035400390625, 0.27788352966308594, -5.1695556640625, 30.05400848388672, -3.2225189208984375, 21.051467895507812, 16.683387756347656, 2.5314254760742188, 5.599174499511719, 44.360816955566406, 8.732887268066406, -1.8645248413085938, 20.598876953125, -3.870340347290039, -1.4191017150878906, 18.455154418945312, 39.667030334472656, 12.333328247070312, -12.000625610351562, 1.0655670166015625, 23.451635360717773, 43.287357330322266, 4.710731506347656, 10.938529968261719, 33.331642150878906, 3.1569366455078125, 8.269401550292969, -1.83013916015625, 10.95709228515625, 16.348724365234375, 1.4527130126953125, 3.0038604736328125, 4.425920486450195, -4.6600799560546875, -1.1355819702148438, -18.476924896240234, 24.69363021850586, 15.873138427734375, 16.652923583984375, 1.8336544036865234, -23.593048095703125, 8.812057495117188, -3.326862335205078, 32.06731414794922, -12.86761474609375, -8.340797424316406, -12.635566711425781, 32.74169158935547, 19.770530700683594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000227.npy"}
{"epoch": 0.3431594860166289, "step": 228, "batch_size": 64, "mean": 10.160097122192383, "std": 15.958332061767578, "min": -24.91965103149414, "p10": -8.39897861480713, "median": 8.892483711242676, "p90": 33.794584655761724, "max": 49.80921173095703, "pos_frac": 0.734375, "sample": [-1.0009326934814453, 30.70360565185547, 6.1504669189453125, 41.35669708251953, 16.041351318359375, 7.150382995605469, 4.901193618774414, 39.13726043701172, 10.185699462890625, -12.991050720214844, 11.220987319946289, -1.0651397705078125, -4.3372802734375, -24.91965103149414, 4.128692626953125, 2.432811737060547, 20.190593719482422, -4.79205322265625, 6.158042907714844, 9.450448989868164, 1.4698524475097656, 32.349029541015625, -8.021560668945312, 23.881423950195312, -15.407379150390625, 18.173828125, 19.12596321105957, 49.80921173095703, 5.720792770385742, 18.07247543334961, 1.8200340270996094, -8.43768310546875, 7.5039215087890625, 2.3008956909179688, 14.495538711547852, -3.3582077026367188, 11.214302062988281, -1.6467552185058594, -19.880081176757812, -8.30866813659668, 10.040367126464844, 11.29779052734375, 37.183197021484375, -14.201578140258789, 16.370452880859375, 2.0480308532714844, 35.52002716064453, 29.211456298828125, -8.941291809082031, 1.728851318359375, -5.3726959228515625, 11.33779525756836, 30.92401123046875, 12.619895935058594, -4.715789794921875, 18.43084716796875, 38.095428466796875, 34.41410827636719, 28.578815460205078, 8.334518432617188, 6.861732482910156, 13.113563537597656, 17.44278335571289, 18.944862365722656], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000228.npy"}
{"epoch": 0.34467120181405897, "step": 229, "batch_size": 64, "mean": 8.877264976501465, "std": 14.955461502075195, "min": -31.230121612548828, "p10": -8.776486587524413, "median": 7.521278381347656, "p90": 28.0131914138794, "max": 51.3914794921875, "pos_frac": 0.71875, "sample": [10.967887878417969, 26.8671932220459, 23.14417266845703, 9.334228515625, 6.699123382568359, 14.216346740722656, 25.782798767089844, 10.424158096313477, 13.227706909179688, -9.461002349853516, 12.828704833984375, 15.84393310546875, 15.9815673828125, 1.0322952270507812, 11.080032348632812, 6.861865997314453, -5.0807647705078125, -10.127445220947266, 23.61945343017578, 23.783649444580078, 10.34674072265625, 2.3437652587890625, 6.052177429199219, 25.222274780273438, -16.415390014648438, 8.18069076538086, -11.751884460449219, -2.5484676361083984, 1.8162994384765625, -4.0656585693359375, 11.314888000488281, 51.3914794921875, -31.230121612548828, 29.528841018676758, -7.179283142089844, 2.357198715209961, 21.5863037109375, -11.469493865966797, 5.3705902099609375, 11.281219482421875, 30.626510620117188, 19.781814575195312, 0.43744468688964844, -3.0368309020996094, 17.931865692138672, -0.14048385620117188, 30.680864334106445, 14.416080474853516, 23.467090606689453, 5.82301139831543, -0.47153472900390625, -2.596078872680664, 6.018951416015625, -5.660530090332031, 28.50433349609375, 4.72894287109375, -0.48663330078125, 0.8617782592773438, 31.44705581665039, 33.1029052734375, 26.841625213623047, 3.079315185546875, -0.2668647766113281, -26.103759765625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000229.npy"}
{"epoch": 0.34618291761148906, "step": 230, "batch_size": 64, "mean": 9.253654479980469, "std": 15.361634254455566, "min": -31.77557373046875, "p10": -4.102524185180663, "median": 5.825200080871582, "p90": 29.364384078979505, "max": 49.141845703125, "pos_frac": 0.8125, "sample": [-2.9322166442871094, 12.978866577148438, 16.64366340637207, -0.8477592468261719, 9.469474792480469, -31.77557373046875, 4.006570816040039, 0.6428680419921875, 12.624397277832031, 25.601348876953125, 11.693092346191406, 36.13751983642578, 19.601234436035156, 38.56816101074219, 10.860336303710938, 3.997600555419922, 17.761066436767578, 26.194854736328125, -6.796913146972656, 3.6265029907226562, -26.87268829345703, 5.0389556884765625, 10.382904052734375, 9.36264419555664, 15.697189331054688, -3.036724090576172, 26.473281860351562, 1.662200927734375, 30.60342788696289, 21.255966186523438, 14.936716079711914, 2.1912498474121094, -8.027469635009766, 5.951324462890625, 4.652320861816406, -4.559295654296875, 0.34288787841796875, 26.09650421142578, 49.141845703125, 12.999214172363281, 1.7464179992675781, 9.208694458007812, -1.9284934997558594, 2.6122817993164062, 42.33661651611328, 16.15069580078125, -10.564855575561523, 39.323875427246094, 1.254241943359375, -1.296609878540039, 32.50787353515625, -26.491806030273438, 22.919023513793945, 15.623327255249023, 7.335168838500977, 3.867938995361328, 2.6655826568603516, 0.4245128631591797, 5.699075698852539, 3.455677032470703, 5.582120895385742, 16.40966796875, 0.16942214965820312, 0.8758773803710938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000230.npy"}
{"epoch": 0.3476946334089191, "step": 231, "batch_size": 64, "mean": 13.439750671386719, "std": 14.670259475708008, "min": -21.53338623046875, "p10": -5.027656936645507, "median": 15.014364242553711, "p90": 29.31997356414795, "max": 48.991424560546875, "pos_frac": 0.78125, "sample": [27.934494018554688, 4.9961700439453125, 20.874168395996094, 26.47460174560547, 4.0118560791015625, 6.078407287597656, -2.5893707275390625, 20.282058715820312, -0.0796051025390625, 4.416492462158203, 12.354785919189453, 5.4186553955078125, 10.587478637695312, -0.8458366394042969, -11.239028930664062, 14.220947265625, 15.617362976074219, 19.198867797851562, 15.641403198242188, -1.6193809509277344, 26.987014770507812, -4.081993103027344, 5.914051055908203, 15.563465118408203, 4.575653076171875, 38.84111022949219, -21.53338623046875, -15.072097778320312, 39.90340805053711, 21.453189849853516, 33.09820556640625, -5.480922698974609, 25.127517700195312, 25.759323120117188, 23.91238021850586, -5.426727294921875, 4.000219345092773, 18.564682006835938, 14.377532958984375, -3.19305419921875, 48.991424560546875, 2.6379852294921875, 14.694278717041016, 8.936859130859375, 18.7906494140625, 7.490423202514648, 24.378883361816406, -6.135021209716797, 11.199703216552734, -4.096492767333984, 28.779842376708984, 27.124343872070312, 19.318817138671875, 15.334449768066406, 22.523109436035156, 38.213279724121094, 17.333282470703125, 25.068769454956055, 15.486572265625, 23.730697631835938, -14.9930419921875, 11.605062484741211, 29.55145835876465, 39.154579162597656], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000231.npy"}
{"epoch": 0.3492063492063492, "step": 232, "batch_size": 64, "mean": 9.072556495666504, "std": 17.44285774230957, "min": -53.350921630859375, "p10": -8.34115219116211, "median": 5.4515533447265625, "p90": 33.302481842041026, "max": 63.57177734375, "pos_frac": 0.734375, "sample": [0.48642730712890625, 34.235633850097656, 17.25501251220703, 41.791526794433594, 5.020269393920898, 7.3368988037109375, 4.722267150878906, 31.33887481689453, 0.6248435974121094, -9.528450012207031, -7.927043914794922, 5.794975280761719, -9.29416275024414, 0.1407012939453125, 5.087528228759766, 23.003398895263672, 0.8352184295654297, 1.479684829711914, 16.93341064453125, 1.9561805725097656, -1.5285758972167969, 24.270984649658203, 9.9671630859375, 14.118743896484375, 4.568183898925781, 14.042173385620117, -10.520341873168945, 37.478004455566406, -8.518627166748047, 20.742630004882812, 11.915990829467773, 28.155441284179688, -10.25181770324707, -0.8201389312744141, 4.58563232421875, -0.960784912109375, 0.8724002838134766, -6.4067840576171875, 34.14402770996094, 7.897308349609375, -2.2443618774414062, 20.83062744140625, 5.108131408691406, 49.0611572265625, 39.26397705078125, 63.57177734375, -9.3133544921875, -4.3754119873046875, -3.9181976318359375, 6.923990249633789, 8.036323547363281, 27.145536422729492, 25.8643798828125, 10.156074523925781, 5.105327606201172, -7.77880859375, 1.6109428405761719, 17.774826049804688, 6.718452453613281, 7.475475311279297, -1.3621292114257812, -53.350921630859375, 15.85736083984375, 7.437620162963867], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000232.npy"}
{"epoch": 0.3507180650037793, "step": 233, "batch_size": 64, "mean": 8.477100372314453, "std": 15.685711860656738, "min": -24.17151641845703, "p10": -10.591419219970703, "median": 6.12260627746582, "p90": 31.101454544067398, "max": 47.87628173828125, "pos_frac": 0.671875, "sample": [47.84636688232422, -0.27445220947265625, 33.213645935058594, -5.905706405639648, 7.060646057128906, 19.265491485595703, 12.339157104492188, 14.61685562133789, -2.041248321533203, 8.15350341796875, 16.22547149658203, 47.87628173828125, 1.6919593811035156, -1.6681442260742188, -6.3703765869140625, 19.487777709960938, 19.90633773803711, 2.196382522583008, 0.43706512451171875, 0.1781291961669922, 19.929290771484375, 5.184566497802734, 27.16598129272461, 23.355972290039062, -15.308624267578125, 9.378173828125, -12.4744873046875, 17.252056121826172, -11.174514770507812, -10.532379150390625, 13.279327392578125, 1.4011917114257812, 14.292556762695312, 16.566722869873047, 26.134536743164062, -24.17151641845703, 12.943130493164062, 3.563882827758789, -0.311248779296875, 0.8910408020019531, 9.173660278320312, -14.972694396972656, 32.7880859375, -2.25946044921875, 20.30907440185547, -4.301971435546875, 0.2563629150390625, 13.23883056640625, 17.215972900390625, 23.472108840942383, -11.057640075683594, -5.702936172485352, 2.5787887573242188, 37.66699981689453, 40.55756378173828, 33.44648742675781, 3.6383018493652344, -7.035045623779297, 16.749435424804688, -5.0627288818359375, 10.662467956542969, -8.398323059082031, -10.616722106933594, -1.4130401611328125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000233.npy"}
{"epoch": 0.35222978080120937, "step": 234, "batch_size": 64, "mean": 7.168840408325195, "std": 14.504755020141602, "min": -26.65612030029297, "p10": -10.227708435058593, "median": 4.611747741699219, "p90": 27.55362014770508, "max": 41.35520935058594, "pos_frac": 0.6875, "sample": [10.864875793457031, -2.2809600830078125, 1.40545654296875, -7.798004150390625, 7.940401077270508, -1.8835334777832031, 4.621845245361328, -17.290313720703125, 13.320083618164062, -9.949485778808594, 14.599075317382812, 11.87173843383789, 10.475502014160156, 14.647594451904297, 27.00226593017578, 3.773143768310547, -6.7130889892578125, -10.635540008544922, 14.474533081054688, -4.584051132202148, 6.887655258178711, 2.438608169555664, 29.568538665771484, 41.35520935058594, -1.6077518463134766, -3.78790283203125, 34.902713775634766, -0.7285594940185547, -26.65612030029297, 2.766002655029297, 9.751289367675781, 27.560367584228516, 33.29579162597656, 27.162944793701172, -2.0372066497802734, -18.13164710998535, 4.0818023681640625, 0.62396240234375, 5.213499069213867, -6.802616119384766, 23.836441040039062, 12.9390869140625, 4.361919403076172, 31.95557403564453, 24.295059204101562, 4.601650238037109, 2.0229644775390625, 18.85663604736328, 27.53787612915039, -3.849618911743164, -3.8468017578125, 10.431650161743164, -15.754383087158203, 17.64244842529297, -13.380859375, 3.905170440673828, 22.024505615234375, 6.378166198730469, 4.150154113769531, 4.696895599365234, 0.8280487060546875, 34.274940490722656, 11.527130126953125, -10.346946716308594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000234.npy"}
{"epoch": 0.35374149659863946, "step": 235, "batch_size": 64, "mean": 11.584515571594238, "std": 11.811059951782227, "min": -13.2816162109375, "p10": -1.4352931976318355, "median": 10.900995254516602, "p90": 27.07111968994141, "max": 36.22999572753906, "pos_frac": 0.828125, "sample": [3.872325897216797, 14.633407592773438, 4.377677917480469, 18.14704704284668, 18.597885131835938, 11.630775451660156, 8.081924438476562, 35.61763000488281, 29.173309326171875, 19.969131469726562, -2.985687255859375, 13.838871002197266, 1.9629669189453125, 14.354982376098633, 13.1541748046875, 22.189699172973633, 12.192575454711914, 2.1852073669433594, 21.435348510742188, 9.203010559082031, 8.493721008300781, 5.4016876220703125, -2.406494140625, 26.70916748046875, 22.887115478515625, 17.148269653320312, -1.6434440612792969, -6.200569152832031, 35.1756706237793, -11.213935852050781, 22.209205627441406, 0.5581550598144531, 8.331947326660156, 10.98748779296875, 27.226242065429688, 7.981956481933594, 19.37868881225586, 0.6642646789550781, 30.859466552734375, -13.2816162109375, -0.13385963439941406, 26.389892578125, 1.92645263671875, 8.746757507324219, 19.739845275878906, 1.4127883911132812, -0.9496078491210938, 22.453433990478516, 36.22999572753906, 6.161277770996094, 11.970008850097656, 0.5518035888671875, 22.068893432617188, 10.814502716064453, 3.1381072998046875, -5.507843017578125, 29.790138244628906, 24.807209014892578, 22.89678955078125, 4.769802093505859, 14.233413696289062, -0.22864151000976562, 0.13189697265625, -0.9033107757568359], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000235.npy"}
{"epoch": 0.35525321239606955, "step": 236, "batch_size": 64, "mean": 9.887496948242188, "std": 15.884683609008789, "min": -33.79667663574219, "p10": -6.269968032836914, "median": 8.417221069335938, "p90": 32.243766403198244, "max": 58.61000061035156, "pos_frac": 0.6875, "sample": [13.577423095703125, -1.8217239379882812, -3.6212120056152344, 26.666019439697266, 11.17498779296875, 7.5963897705078125, 19.255481719970703, 28.458145141601562, -2.3030471801757812, 1.6783618927001953, 10.75811767578125, 4.59735107421875, -3.865896224975586, 4.88946533203125, -6.446453094482422, 6.299201965332031, 15.904592514038086, 16.470054626464844, 23.977935791015625, 32.428138732910156, -0.4754295349121094, 1.82415771484375, 7.406005859375, -8.898944854736328, 19.168128967285156, 13.343643188476562, 12.63009262084961, 12.311187744140625, 36.054683685302734, 23.865989685058594, 32.70293426513672, -3.9635162353515625, -1.3777008056640625, 6.488880157470703, 36.8582763671875, -5.8581695556640625, -3.6136932373046875, 58.61000061035156, -1.7784538269042969, 10.368688583374023, -10.101337432861328, 11.41555404663086, 18.500640869140625, 27.78961181640625, 3.5981597900390625, 7.347965240478516, -4.008369445800781, -7.543312072753906, 31.81356430053711, 10.74972915649414, 1.4508018493652344, -23.695343017578125, -33.79667663574219, 34.36921310424805, 35.970970153808594, 15.222908020019531, 20.93426513671875, -10.167757034301758, -1.921539306640625, 2.5608787536621094, 29.074756622314453, -3.1637191772460938, 15.820676803588867, 9.238052368164062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000236.npy"}
{"epoch": 0.35676492819349964, "step": 237, "batch_size": 64, "mean": 9.869712829589844, "std": 15.312529563903809, "min": -27.879127502441406, "p10": -7.288629150390625, "median": 10.624279975891113, "p90": 28.599785614013676, "max": 54.31390380859375, "pos_frac": 0.796875, "sample": [21.8338623046875, 19.81031036376953, 11.360397338867188, 35.06517028808594, -0.7947921752929688, -2.06854248046875, 8.329216003417969, 11.089401245117188, 1.2472610473632812, 18.026979446411133, 14.949508666992188, 22.838165283203125, -8.481170654296875, 12.836181640625, 14.783050537109375, 20.04991912841797, -1.238668441772461, 21.83324432373047, -16.538108825683594, 21.603546142578125, -20.651084899902344, 19.35342025756836, 15.246139526367188, 10.317157745361328, 33.693695068359375, -5.094747543334961, 6.479888916015625, 23.730587005615234, -7.3497772216796875, 9.943931579589844, 19.37281608581543, 4.752189636230469, 16.977157592773438, -26.485061645507812, 10.931402206420898, 0.7577095031738281, 7.6317596435546875, -27.879127502441406, 40.59745788574219, 3.3247013092041016, 11.850028991699219, 10.994071960449219, 0.9158287048339844, 6.339134216308594, 4.065193176269531, 1.7615394592285156, 15.181039810180664, 9.497238159179688, 5.996551513671875, 28.957237243652344, 27.765731811523438, 14.786834716796875, 31.82164764404297, 54.31390380859375, -25.243988037109375, 12.124248504638672, -1.1250190734863281, 16.530031204223633, 4.869312286376953, 9.550758361816406, 0.8897628784179688, 4.99945068359375, -7.1459503173828125, 29.781898498535156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000237.npy"}
{"epoch": 0.35827664399092973, "step": 238, "batch_size": 64, "mean": 10.618122100830078, "std": 15.166804313659668, "min": -18.445438385009766, "p10": -6.760015296936035, "median": 7.5519561767578125, "p90": 31.670222663879404, "max": 42.48077392578125, "pos_frac": 0.703125, "sample": [-2.6427955627441406, 6.8118438720703125, 32.684173583984375, -8.725395202636719, 22.949954986572266, 38.860321044921875, 2.1779518127441406, 4.510795593261719, 26.760948181152344, -0.2718544006347656, 1.4268264770507812, -5.660797119140625, 27.689430236816406, 5.8593597412109375, 15.330049514770508, 12.935047149658203, -2.956296920776367, -6.85923957824707, 9.35369873046875, 32.676780700683594, 21.607376098632812, 41.16658020019531, 24.540061950683594, 21.45660400390625, 1.5395870208740234, 18.562742233276367, 15.752250671386719, -10.218086242675781, -10.477386474609375, 23.620254516601562, 26.17864990234375, 19.307470321655273, 5.249544143676758, 11.873954772949219, -3.5979156494140625, 6.0823974609375, 42.48077392578125, -4.231876373291016, 26.01500701904297, 34.50824737548828, 3.7662887573242188, -2.892475128173828, -0.6350822448730469, -3.8486480712890625, 28.920303344726562, 8.292068481445312, 32.62419128417969, 27.06450653076172, 9.042181015014648, 8.7691650390625, 5.60528564453125, 17.287761688232422, 29.44429588317871, -16.041975021362305, -6.528491973876953, 0.9119224548339844, -5.37872314453125, 6.460235595703125, 25.718368530273438, -9.74822998046875, 18.91510009765625, -5.595664978027344, -18.445438385009766, 1.5258312225341797], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000238.npy"}
{"epoch": 0.35978835978835977, "step": 239, "batch_size": 64, "mean": 10.405271530151367, "std": 15.067278861999512, "min": -25.608121871948242, "p10": -7.8219099044799805, "median": 9.421566009521484, "p90": 31.566462707519534, "max": 45.6103515625, "pos_frac": 0.765625, "sample": [7.854835510253906, 7.5541534423828125, 4.807830810546875, 3.3383827209472656, 36.47998046875, 16.646026611328125, -7.4697723388671875, 5.484138488769531, 2.239044189453125, 7.4844818115234375, -11.224159240722656, 8.42498779296875, -1.8563995361328125, 33.34785461425781, 11.207794189453125, 4.5815887451171875, -5.590950012207031, -16.90315818786621, 11.045501708984375, 30.64947509765625, 0.9382133483886719, 8.528739929199219, 27.315444946289062, 15.005287170410156, 42.33363723754883, 8.784355163574219, 12.102890014648438, 18.844364166259766, 14.914634704589844, 15.43988037109375, 20.351436614990234, 5.779609680175781, -12.48126220703125, -7.97282600402832, 7.556190490722656, 25.249038696289062, -2.8463821411132812, -0.40815162658691406, 0.83782958984375, 11.71258544921875, -2.453092575073242, 10.05877685546875, 13.493934631347656, 21.366104125976562, -20.138572692871094, 14.806529998779297, 38.73320007324219, 45.6103515625, 35.80635070800781, -15.749858856201172, -0.01174163818359375, 4.0878143310546875, -25.608121871948242, 18.451316833496094, 23.62310791015625, 10.581153869628906, 26.63336753845215, -0.6761322021484375, 11.046356201171875, 31.959457397460938, 2.932201385498047, 21.49199676513672, 24.846954345703125, 14.958786010742188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000239.npy"}
{"epoch": 0.36130007558578986, "step": 240, "batch_size": 64, "mean": 12.944878578186035, "std": 15.30936050415039, "min": -28.653724670410156, "p10": -3.3754032135009764, "median": 12.426753997802734, "p90": 32.12065544128418, "max": 44.168670654296875, "pos_frac": 0.75, "sample": [13.513355255126953, -10.974388122558594, 27.400238037109375, -3.1207275390625, 10.52520751953125, 44.168670654296875, 16.23297119140625, 27.343368530273438, -11.139623641967773, -6.10284423828125, 9.492347717285156, -1.009664535522461, 12.470161437988281, 26.052793502807617, 32.28365707397461, 37.677101135253906, 17.88810920715332, -13.268836975097656, 30.022781372070312, 23.656005859375, -3.141429901123047, 20.76099395751953, 9.985824584960938, -17.57434844970703, 15.536537170410156, 39.79853057861328, 18.221817016601562, 12.383346557617188, 35.19537353515625, 24.75147247314453, 25.642257690429688, 20.071136474609375, 3.7050323486328125, 10.3564453125, -1.0643081665039062, 22.1981201171875, 7.259033203125, 18.5848388671875, 30.650054931640625, -0.18428611755371094, 23.521080017089844, 13.470844268798828, -0.9282302856445312, 23.889968872070312, 41.08088684082031, -28.653724670410156, 0.7527904510498047, -0.07208251953125, 12.331045150756836, 2.9077606201171875, 11.126836776733398, 28.071502685546875, -1.1292648315429688, 8.334381103515625, -1.1232643127441406, 3.5779647827148438, 31.740318298339844, 35.265289306640625, 6.437248229980469, 6.061792373657227, 23.528282165527344, 1.818115234375, -3.475677490234375, 13.691192626953125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000240.npy"}
{"epoch": 0.36281179138321995, "step": 241, "batch_size": 64, "mean": 7.120691299438477, "std": 13.356430053710938, "min": -15.619171142578125, "p10": -10.22511291503906, "median": 5.097882270812988, "p90": 25.986890792846687, "max": 39.425025939941406, "pos_frac": 0.640625, "sample": [-15.619171142578125, -10.973960876464844, 21.039426803588867, -10.956232070922852, 21.02043914794922, -10.639389038085938, 2.878875732421875, 18.573196411132812, 11.91877555847168, 17.776397705078125, -4.0959625244140625, 24.036605834960938, -5.883266448974609, 5.624269485473633, 1.6221160888671875, 22.048072814941406, 2.6606483459472656, 3.9777965545654297, -3.2637557983398438, 27.248123168945312, 19.522018432617188, 13.163681030273438, -5.717987060546875, 8.968910217285156, 7.003593444824219, -2.675352096557617, 17.520076751708984, -3.5232105255126953, 4.571495056152344, -8.301338195800781, 14.425756454467773, 29.6790771484375, 35.07054138183594, 1.1274642944335938, -3.561370849609375, -12.620803833007812, -1.4795379638671875, 19.547760009765625, -0.11196136474609375, 6.522602081298828, -12.525100708007812, 0.9740676879882812, -0.30925559997558594, 28.324005126953125, 8.624788284301758, 6.727020263671875, -14.909347534179688, 39.425025939941406, 13.472511291503906, -9.258468627929688, 31.703338623046875, 2.7654495239257812, 21.276100158691406, 9.308135986328125, 26.82272720336914, -6.148349761962891, -3.3338661193847656, 21.71137809753418, 13.420036315917969, 14.078775405883789, 0.6468067169189453, 9.201332092285156, -3.3948326110839844, -1.0024242401123047], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000241.npy"}
{"epoch": 0.36432350718065004, "step": 242, "batch_size": 64, "mean": 11.416751861572266, "std": 16.095232009887695, "min": -25.543479919433594, "p10": -3.069585418701172, "median": 9.825878143310547, "p90": 33.77894363403321, "max": 58.020782470703125, "pos_frac": 0.765625, "sample": [-20.234907150268555, 7.300788879394531, 15.771202087402344, 6.696453094482422, 19.091102600097656, -13.440105438232422, 25.819229125976562, 19.477018356323242, 18.57901382446289, 31.870216369628906, 2.5912094116210938, 14.30337142944336, 0.39931678771972656, 43.022064208984375, 28.697681427001953, -4.3159332275390625, 35.459075927734375, -2.8731918334960938, 1.8555221557617188, -0.4575080871582031, 4.80462646484375, 25.254196166992188, 35.181495666503906, 4.591381072998047, -1.7242279052734375, 13.128128051757812, 7.1199951171875, 4.287544250488281, 18.24053192138672, 26.81915283203125, 13.655586242675781, 0.22846412658691406, 48.01519775390625, 7.089139938354492, 21.839691162109375, 8.418991088867188, -0.4281425476074219, -3.034626007080078, 58.020782470703125, 2.536031723022461, -18.53106689453125, 18.97766876220703, 14.995674133300781, 1.5940628051757812, 21.681922912597656, -0.821563720703125, 21.357032775878906, 15.640892028808594, 14.87945556640625, 25.172374725341797, 39.7059440612793, 12.423328399658203, 34.59696960449219, -2.1462879180908203, -0.5313148498535156, 11.937881469726562, 11.464736938476562, 1.8112907409667969, 11.232765197753906, -25.543479919433594, 8.184211730957031, -3.0845680236816406, -16.16664695739746, 8.185239791870117], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000242.npy"}
{"epoch": 0.36583522297808013, "step": 243, "batch_size": 64, "mean": 9.829358100891113, "std": 13.778255462646484, "min": -16.349143981933594, "p10": -4.439323616027831, "median": 5.78369140625, "p90": 29.704724884033205, "max": 36.00968933105469, "pos_frac": 0.734375, "sample": [3.813446044921875, 27.351905822753906, 17.808509826660156, -14.845245361328125, 29.155229568481445, 4.067705154418945, -14.716140747070312, -0.1736602783203125, 15.198753356933594, -2.6895904541015625, 35.970703125, 7.6103668212890625, 2.1999053955078125, 2.4311981201171875, 4.318935394287109, 34.46886444091797, -1.6485557556152344, 3.8889007568359375, 25.762588500976562, -0.40088653564453125, 5.4303741455078125, 36.00968933105469, 12.549713134765625, -3.759113311767578, 2.8990631103515625, 16.029937744140625, 4.861541748046875, 23.352312088012695, 21.040267944335938, 17.99970245361328, -3.86566162109375, -8.141845703125, 11.421649932861328, -4.685178756713867, 24.586700439453125, 13.351167678833008, 18.93475341796875, 25.109970092773438, -12.62841796875, 1.798126220703125, 34.631324768066406, 6.1370086669921875, 13.898887634277344, 31.98461151123047, 0.849761962890625, 1.1952476501464844, -2.113321304321289, 11.248289108276367, 29.324737548828125, 14.072990417480469, -1.6202392578125, 1.108062744140625, 3.4420700073242188, 29.867576599121094, -5.815151214599609, 24.039642333984375, 8.925434112548828, 4.132152557373047, -3.469278335571289, -0.9306449890136719, -16.349143981933594, 33.553428649902344, 7.347785949707031, 21.75], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000243.npy"}
{"epoch": 0.3673469387755102, "step": 244, "batch_size": 64, "mean": 12.898747444152832, "std": 13.536625862121582, "min": -22.40479278564453, "p10": 0.24006633758544957, "median": 9.752168655395508, "p90": 32.57108993530275, "max": 45.21820068359375, "pos_frac": 0.921875, "sample": [9.232467651367188, 16.937667846679688, 15.261062622070312, 6.638664245605469, 3.3114013671875, 9.51824951171875, 0.03727149963378906, 28.410106658935547, 13.255605697631836, -10.251060485839844, 4.548389434814453, 17.50692367553711, 7.6917266845703125, 24.152313232421875, 9.838031768798828, 2.770904541015625, 10.601604461669922, 34.08820343017578, -22.40479278564453, 25.184635162353516, 15.68634033203125, 6.5257720947265625, -16.305007934570312, 19.190418243408203, 2.9972763061523438, 24.218223571777344, 14.822877883911133, 24.592666625976562, 35.04609680175781, 3.9911327362060547, 36.910072326660156, 5.344062805175781, -0.33996009826660156, 0.5805587768554688, 1.4089736938476562, 4.926401138305664, 0.09414100646972656, 17.468482971191406, 37.3875732421875, 45.21820068359375, 3.216947555541992, 26.54901123046875, 1.7836685180664062, 9.666305541992188, 1.120819091796875, 34.64310073852539, 19.13166046142578, 23.55223274230957, 3.0132522583007812, 9.264209747314453, 4.268449783325195, 22.319244384765625, 26.70471954345703, 12.465873718261719, 40.562110900878906, 15.775604248046875, -5.3021697998046875, 9.305351257324219, 2.0069580078125, 4.7455596923828125, 16.782150268554688, 1.9395637512207031, 26.88036346435547, 29.031158447265625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000244.npy"}
{"epoch": 0.3688586545729403, "step": 245, "batch_size": 64, "mean": 14.382604598999023, "std": 13.5603609085083, "min": -14.644828796386719, "p10": -0.5603921890258773, "median": 13.975183486938477, "p90": 33.265937423706056, "max": 47.49876403808594, "pos_frac": 0.890625, "sample": [16.168128967285156, 16.4661865234375, 13.507930755615234, 35.81571960449219, 10.259748458862305, -1.2569408416748047, 3.7643775939941406, 26.133726119995117, 6.530860900878906, 13.182975769042969, 2.620391845703125, 14.839630126953125, 22.97957420349121, 12.635425567626953, 15.496986389160156, 33.74484634399414, 10.860099792480469, 23.193716049194336, -10.389678955078125, 47.49876403808594, 39.29484558105469, 7.457771301269531, 5.66424560546875, 18.09752082824707, 24.51861572265625, 28.801101684570312, 16.881027221679688, 1.1281661987304688, 15.3525390625, 1.4858646392822266, 32.14848327636719, -10.295032501220703, 1.0648880004882812, 15.068801879882812, 9.074440002441406, 18.754714965820312, -1.94122314453125, 4.315633773803711, 22.590160369873047, 43.61015319824219, 15.014537811279297, 4.1407928466796875, 40.250160217285156, 2.35113525390625, -2.0410709381103516, 13.489606857299805, 37.08235168457031, 31.041351318359375, 1.4059906005859375, 9.514617919921875, 16.070632934570312, 23.12717056274414, -11.681854248046875, 20.37494468688965, 28.13214111328125, 13.952217102050781, -14.644828796386719, 27.332765579223633, 3.5338058471679688, 7.018610000610352, 10.056625366210938, 19.142730712890625, 4.698848724365234, 13.998149871826172], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000245.npy"}
{"epoch": 0.37037037037037035, "step": 246, "batch_size": 64, "mean": 11.16021728515625, "std": 17.50347328186035, "min": -23.109420776367188, "p10": -8.810145378112791, "median": 9.734155654907227, "p90": 32.316321563720706, "max": 60.826995849609375, "pos_frac": 0.734375, "sample": [24.645606994628906, 27.389217376708984, 33.5850830078125, 34.61843490600586, 6.06488037109375, 3.9235000610351562, -20.953502655029297, 21.979507446289062, 10.605537414550781, -3.854818344116211, 23.041854858398438, 32.333709716796875, 6.7151947021484375, -4.012054443359375, 24.139389038085938, 15.85897445678711, 8.798450469970703, -7.842493057250977, -9.224853515625, -3.0460052490234375, 21.07635498046875, 24.678184509277344, 3.409515380859375, 32.27574920654297, 44.215171813964844, -19.458572387695312, -2.69976806640625, 60.826995849609375, -9.836181640625, 11.536239624023438, 2.462766647338867, 27.811691284179688, 3.6954803466796875, 5.49439811706543, 17.667236328125, 16.456680297851562, -23.109420776367188, 15.065446853637695, 36.7120475769043, 0.300567626953125, 12.198833465576172, -22.40656280517578, -6.771324157714844, 20.81513214111328, 8.862773895263672, -6.151641845703125, 13.627685546875, 24.346633911132812, 31.238235473632812, -19.144805908203125, 3.9257354736328125, 5.317665100097656, 12.820411682128906, 50.16999816894531, 7.101894378662109, -2.976531982421875, 21.475017547607422, 32.017539978027344, -3.9315948486328125, 2.324188232421875, 10.777103424072266, 7.764488220214844, 18.93665313720703, -1.429840087890625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000246.npy"}
{"epoch": 0.37188208616780044, "step": 247, "batch_size": 64, "mean": 6.584142684936523, "std": 13.083430290222168, "min": -33.1527099609375, "p10": -10.19871368408203, "median": 7.229846954345703, "p90": 21.501469421386727, "max": 39.81781005859375, "pos_frac": 0.71875, "sample": [-4.6916351318359375, 8.61233901977539, 2.3338546752929688, 9.728492736816406, 6.218700408935547, -10.421836853027344, -4.9991455078125, 24.687469482421875, 31.47686004638672, 6.1858673095703125, 31.513717651367188, 7.4542694091796875, 23.532411575317383, 17.068811416625977, 5.9500732421875, 19.73931884765625, 13.166177749633789, 16.14627456665039, -4.866809844970703, -11.016021728515625, 39.81781005859375, 19.929214477539062, -20.725242614746094, -7.385919570922852, 6.4444122314453125, 7.894844055175781, 13.783000946044922, -0.8919143676757812, -9.678092956542969, 15.233314514160156, 22.17529296875, 13.472442626953125, 6.3879547119140625, 7.777374267578125, 4.9388580322265625, 6.173881530761719, 13.57244873046875, -33.1527099609375, -13.694931030273438, -0.024433135986328125, -2.095623016357422, 3.974637985229492, -6.465431213378906, 27.294273376464844, 14.160858154296875, 11.944416046142578, 0.9073753356933594, -5.5366363525390625, 7.005424499511719, 10.917800903320312, 5.9254608154296875, 11.748668670654297, -3.0756759643554688, 10.768613815307617, -19.614248275756836, 15.025289535522461, 14.729511260986328, 18.39015769958496, 1.1894302368164062, 0.0922393798828125, -12.284225463867188, 15.3692626953125, 11.893592834472656, 9.253135681152344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000247.npy"}
{"epoch": 0.37339380196523053, "step": 248, "batch_size": 64, "mean": 10.43209457397461, "std": 16.868453979492188, "min": -28.645729064941406, "p10": -11.964208984374997, "median": 7.445032119750977, "p90": 32.23051071166992, "max": 51.65503692626953, "pos_frac": 0.75, "sample": [7.333412170410156, 12.820510864257812, -19.454437255859375, -18.735626220703125, 44.285484313964844, 25.592910766601562, 28.841598510742188, 18.194114685058594, 2.1398067474365234, -5.462711334228516, -0.08613014221191406, 27.133527755737305, 18.262725830078125, -5.75370979309082, 28.525833129882812, 11.550498962402344, 36.052520751953125, -0.9240341186523438, -14.169540405273438, 14.00558090209961, -15.343650817871094, 3.2570343017578125, 4.2718353271484375, 7.945991516113281, 32.082550048828125, 11.764911651611328, 24.332494735717773, 7.2058563232421875, 10.035686492919922, 32.293922424316406, -28.645729064941406, 30.978134155273438, 0.7334346771240234, -5.568880081176758, 34.24033737182617, 24.401430130004883, 9.944282531738281, 2.5746383666992188, 3.2809677124023438, 19.475173950195312, 16.139625549316406, 28.981670379638672, 6.749452590942383, 37.456748962402344, 32.964447021484375, 7.556652069091797, 5.550500869750977, 3.2897071838378906, 7.698200225830078, 17.987930297851562, 51.65503692626953, 4.418476104736328, -1.78948974609375, 3.0277786254882812, -16.049964904785156, 6.545106887817383, -1.442169189453125, 30.032699584960938, 31.12108612060547, -7.2681121826171875, 3.281648635864258, 2.055154800415039, -13.619903564453125, -8.100921630859375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000248.npy"}
{"epoch": 0.3749055177626606, "step": 249, "batch_size": 64, "mean": 5.1751298904418945, "std": 14.450884819030762, "min": -26.173477172851562, "p10": -16.417164611816407, "median": 7.342012405395508, "p90": 21.0705135345459, "max": 34.62384033203125, "pos_frac": 0.671875, "sample": [13.144233703613281, -14.35093879699707, 10.266792297363281, 14.95654296875, 3.570089340209961, -8.526565551757812, -18.856353759765625, 14.203788757324219, 21.443878173828125, 14.680130004882812, 8.841667175292969, 16.95447540283203, -1.4127731323242188, 20.70876693725586, -24.81969451904297, -19.2032470703125, 10.026351928710938, -16.793487548828125, -12.594062805175781, -10.705024719238281, 3.569917678833008, -5.4569854736328125, -10.307586669921875, 7.686199188232422, 13.599578857421875, 6.997825622558594, 1.6858692169189453, -2.0453643798828125, 34.62384033203125, -1.8034439086914062, -8.145774841308594, 20.89112091064453, 1.6751079559326172, 10.651426315307617, 19.73873519897461, -19.953369140625, 3.9995803833007812, -2.016357421875, 11.681873321533203, -22.09521484375, 6.676668167114258, 28.751419067382812, 19.212501525878906, 29.396286010742188, 19.067886352539062, 6.536968231201172, -15.539077758789062, 21.147396087646484, 20.32672882080078, 6.469032287597656, 9.330793380737305, 10.445697784423828, 10.12054443359375, 2.3687820434570312, 15.22018051147461, 10.998092651367188, -2.552431106567383, 21.227813720703125, 31.316741943359375, -26.173477172851562, 17.422489166259766, -8.926233291625977, 2.32012939453125, 9.531845092773438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000249.npy"}
{"epoch": 0.3764172335600907, "step": 250, "batch_size": 64, "mean": 12.442645072937012, "std": 15.824583053588867, "min": -35.44358825683594, "p10": -4.6808830261230465, "median": 9.534764289855957, "p90": 32.152955627441415, "max": 52.906951904296875, "pos_frac": 0.796875, "sample": [10.909444808959961, 7.940717697143555, -3.3460330963134766, 7.000431060791016, 2.587310791015625, 16.539291381835938, 17.72620391845703, 39.53038787841797, 30.787925720214844, 4.647529602050781, 20.143901824951172, -9.395095825195312, 30.062828063964844, 2.136272430419922, 17.414581298828125, 3.0912036895751953, 5.669586181640625, -35.44358825683594, 21.792428970336914, 10.034721374511719, 9.034807205200195, -9.170391082763672, 8.080032348632812, -13.781784057617188, 23.75151252746582, 4.271942138671875, 34.46153259277344, 18.533123016357422, 7.725929260253906, 23.613449096679688, -7.923259735107422, 21.510604858398438, 32.73796844482422, -0.46380615234375, 18.6082763671875, -2.5830612182617188, 1.5393104553222656, 18.055721282958984, 49.237342834472656, 17.491897583007812, -5.000438690185547, -9.923641204833984, 14.083232879638672, -0.13869857788085938, 39.89628601074219, 13.701543807983398, 45.36688232421875, 23.70529556274414, 26.818096160888672, 23.40230369567871, 3.7683258056640625, 8.000755310058594, 19.334747314453125, 0.7111721038818359, 4.779876708984375, 7.469764709472656, 52.906951904296875, 24.549697875976562, 8.616519927978516, -0.03679084777832031, -3.935253143310547, 24.544540405273438, 5.7294769287109375, 13.417442321777344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000250.npy"}
{"epoch": 0.3779289493575208, "step": 251, "batch_size": 64, "mean": 7.519090175628662, "std": 14.645113945007324, "min": -25.899734497070312, "p10": -10.332418060302732, "median": 7.669805526733398, "p90": 22.82963333129883, "max": 46.98650360107422, "pos_frac": 0.703125, "sample": [2.1610450744628906, 40.25799560546875, 37.501678466796875, -2.487813949584961, 4.8837890625, 10.225685119628906, -22.631988525390625, 6.718875885009766, 9.61016845703125, 46.98650360107422, -1.3285751342773438, -12.651123046875, 8.698471069335938, -3.6662750244140625, 4.277256011962891, -7.129825592041016, 8.699661254882812, 25.69384765625, 5.296724319458008, -25.899734497070312, -3.376220703125, 40.45355987548828, 5.743730545043945, -20.42596435546875, 3.6505126953125, -0.9793624877929688, 21.90270233154297, 16.557022094726562, -11.354515075683594, -1.9939308166503906, 12.221382141113281, 13.239997863769531, 1.7609329223632812, 8.251632690429688, 19.310375213623047, 7.499767303466797, 18.48126220703125, 3.7184829711914062, -5.549800872802734, -0.2059307098388672, 22.977127075195312, 2.24432373046875, 20.177658081054688, 19.264297485351562, 5.0987701416015625, 7.83984375, 15.982406616210938, 9.97165298461914, 12.041824340820312, 15.669525146484375, 21.926483154296875, 9.39154052734375, 24.90121078491211, -7.9475250244140625, 2.3126602172851562, -3.9103755950927734, 16.185775756835938, -12.00274658203125, 12.247732162475586, -20.105430603027344, -6.907524108886719, 18.830825805664062, 22.48548126220703, 8.42424201965332], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000251.npy"}
{"epoch": 0.3794406651549509, "step": 252, "batch_size": 64, "mean": 7.4249114990234375, "std": 16.089033126831055, "min": -41.01354217529297, "p10": -8.959704399108885, "median": 7.473577499389648, "p90": 29.68311843872072, "max": 45.989349365234375, "pos_frac": 0.734375, "sample": [13.187257766723633, -35.68580627441406, -5.256601333618164, -34.12109375, 22.506507873535156, -2.0681228637695312, 7.674160003662109, 26.213821411132812, 3.295604705810547, 31.169960021972656, 32.17919921875, 15.641847610473633, 6.163002014160156, 13.476699829101562, 36.60102844238281, 0.01749420166015625, 33.24778747558594, -13.802370071411133, 7.808769226074219, 1.8786048889160156, -10.656719207763672, 5.918039321899414, 17.6458740234375, 8.675819396972656, 8.682144165039062, 6.310432434082031, 3.7160797119140625, -6.064567565917969, 25.388273239135742, -0.8988494873046875, 12.080476760864258, 34.18638610839844, 22.71013641357422, 12.237884521484375, 9.316883087158203, 3.5746192932128906, 12.973358154296875, 9.404333114624023, 5.267831802368164, -3.4107093811035156, 11.08883285522461, -3.5046615600585938, -1.0665817260742188, 14.033456802368164, 4.58648681640625, 45.989349365234375, -15.182807922363281, 8.880897521972656, -41.01354217529297, -9.73876953125, -3.326688766479492, -0.2222747802734375, 10.213109970092773, 0.37157249450683594, 7.220611572265625, 18.30176544189453, 7.2729949951171875, 18.792076110839844, 6.569877624511719, 9.542617797851562, 36.0128173828125, 18.568191528320312, -7.141885757446289, 1.7613945007324219], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000252.npy"}
{"epoch": 0.38095238095238093, "step": 253, "batch_size": 64, "mean": 7.885035514831543, "std": 13.262789726257324, "min": -17.62183380126953, "p10": -8.665071487426758, "median": 7.513583183288574, "p90": 24.736445808410647, "max": 38.471153259277344, "pos_frac": 0.71875, "sample": [11.593055725097656, -12.15570068359375, 36.593421936035156, 1.4782562255859375, -0.5261325836181641, 9.25826644897461, 7.329376220703125, 24.936717987060547, 14.336654663085938, 7.697790145874023, 18.00910186767578, -8.505081176757812, 22.382858276367188, 1.729156494140625, 1.578878402709961, -8.733638763427734, -15.415191650390625, 10.923015594482422, -8.074813842773438, 19.396621704101562, 27.837600708007812, 16.78034210205078, -12.54684829711914, 17.509124755859375, 9.907285690307617, 3.7423095703125, 24.26914405822754, 16.811782836914062, 4.310340881347656, 14.820327758789062, 1.364776611328125, 2.049774169921875, 17.212385177612305, -2.618419647216797, 3.65740966796875, 36.370330810546875, 3.595245361328125, -17.62183380126953, 16.652481079101562, 12.260034561157227, 6.440071105957031, -6.996498107910156, 7.9999237060546875, -9.15887451171875, 17.57061767578125, -5.095806121826172, -4.68022346496582, 13.934112548828125, 38.471153259277344, 4.215707778930664, 3.7574005126953125, 30.303787231445312, 31.36231231689453, 4.2527008056640625, 20.40611457824707, -7.996162414550781, 13.209228515625, -5.794910430908203, 21.661415100097656, -10.72711181640625, -3.6674118041992188, -2.6828556060791016, 8.723146438598633, 8.938220977783203], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000253.npy"}
{"epoch": 0.382464096749811, "step": 254, "batch_size": 64, "mean": 8.810222625732422, "std": 14.228848457336426, "min": -20.05933380126953, "p10": -7.018311309814451, "median": 7.6752214431762695, "p90": 27.32472763061524, "max": 44.427650451660156, "pos_frac": 0.75, "sample": [-11.003013610839844, 18.092575073242188, 35.93675231933594, 20.41252899169922, 0.86151123046875, 12.490402221679688, 22.364452362060547, 13.483627319335938, 1.6317081451416016, 0.29099273681640625, -4.415653228759766, 9.762256622314453, 21.566383361816406, 10.369497299194336, 12.829437255859375, 23.11071014404297, 26.14947509765625, 10.30419921875, -8.652076721191406, 16.874176025390625, 0.40459442138671875, 44.427650451660156, -15.99188232421875, 31.477924346923828, -2.9814987182617188, -20.05933380126953, -7.8536224365234375, 22.337854385375977, 21.63039207458496, -3.831501007080078, 38.7805061340332, -4.2079010009765625, 10.728271484375, -1.7132987976074219, -5.069252014160156, 8.89251708984375, -1.7264900207519531, 19.76630401611328, 0.40419960021972656, 10.508743286132812, 2.5788803100585938, 0.21331787109375, 10.482036590576172, -10.056541442871094, 7.637872695922852, 31.555084228515625, -2.4723434448242188, 25.235549926757812, 25.967361450195312, 1.7652587890625, 1.82623291015625, -4.001686096191406, 2.4312896728515625, 7.7125701904296875, 32.20465087890625, 1.730255126953125, 8.704483032226562, 3.239469528198242, 21.675460815429688, 27.828407287597656, 5.696699142456055, 1.972198486328125, 1.102874755859375, -19.55927276611328], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000254.npy"}
{"epoch": 0.3839758125472411, "step": 255, "batch_size": 64, "mean": 10.78951644897461, "std": 15.211450576782227, "min": -20.736122131347656, "p10": -5.938265419006347, "median": 7.362653732299805, "p90": 33.28766555786133, "max": 43.65359115600586, "pos_frac": 0.75, "sample": [32.08395004272461, 1.1059417724609375, 39.013763427734375, 19.058624267578125, 25.382064819335938, 6.1232757568359375, -2.29974365234375, 1.201751708984375, 2.0359420776367188, 17.204734802246094, -10.043621063232422, -0.7302341461181641, 6.098371505737305, 21.385391235351562, -5.797155380249023, 5.784282684326172, 24.613052368164062, 12.351402282714844, 1.8501129150390625, -20.736122131347656, 22.985912322998047, -1.0842437744140625, 9.42840576171875, 4.131134033203125, 32.19258117675781, 18.33995819091797, -5.099922180175781, 1.1933612823486328, 34.679237365722656, -3.499176025390625, 33.55046081542969, 23.54033660888672, -5.998741149902344, 8.457656860351562, 16.113372802734375, 8.972583770751953, 12.939310073852539, 32.674476623535156, 38.16686248779297, 11.225317001342773, 3.220914840698242, 16.865867614746094, 5.766143798828125, -15.476425170898438, 3.0426082611083984, -5.072465896606445, 4.701576232910156, 40.855499267578125, -6.482719421386719, 8.472785949707031, 2.3994407653808594, 27.21843719482422, 10.099876403808594, 43.65359115600586, 6.267650604248047, 22.2191162109375, 4.481746673583984, 24.421672821044922, 38.97999572753906, -12.04763412475586, -4.267265319824219, -8.436897277832031, 14.95526123046875, -3.9044113159179688], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000255.npy"}
{"epoch": 0.3854875283446712, "step": 256, "batch_size": 64, "mean": 9.430547714233398, "std": 14.700714111328125, "min": -28.903358459472656, "p10": -5.179230690002441, "median": 7.454597473144531, "p90": 29.782556915283205, "max": 51.39959716796875, "pos_frac": 0.71875, "sample": [-4.2476043701171875, 5.26129150390625, 17.336484909057617, 11.582893371582031, 26.10028076171875, -4.3176116943359375, -5.057111740112305, 30.1793212890625, 5.990154266357422, 3.8502445220947266, -8.298297882080078, 3.854379653930664, -4.1311187744140625, 33.20000457763672, 7.832550048828125, 42.685523986816406, -3.6720809936523438, 8.822029113769531, -28.903358459472656, 13.923286437988281, -1.9047279357910156, 28.7470703125, 7.345829010009766, 7.537452697753906, -12.204414367675781, -10.417251586914062, 8.992637634277344, -0.25124359130859375, 12.591796875, 12.03668212890625, 0.1392669677734375, 18.70825958251953, 12.978240966796875, 20.65273666381836, 22.601825714111328, 25.68683624267578, 26.699851989746094, 23.846328735351562, 0.470184326171875, 31.142375946044922, 0.7825851440429688, 6.922304153442383, 34.879638671875, -3.3240814208984375, 30.84168815612793, 3.2892684936523438, 14.816360473632812, 0.7737503051757812, 12.774728775024414, 51.39959716796875, -0.4152069091796875, 20.139991760253906, -9.844146728515625, 2.6977882385253906, 10.911041259765625, -4.7385406494140625, 13.50836181640625, -4.953666687011719, 0.5467910766601562, 28.856773376464844, -6.678554534912109, 10.837432861328125, -5.2315673828125, 7.371742248535156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000256.npy"}
{"epoch": 0.3869992441421013, "step": 257, "batch_size": 64, "mean": 13.200648307800293, "std": 17.444412231445312, "min": -29.038267135620117, "p10": -3.7805929183959956, "median": 9.416618347167969, "p90": 37.61430358886719, "max": 63.865814208984375, "pos_frac": 0.8125, "sample": [-0.28377342224121094, 7.210960388183594, 0.9241943359375, 61.95355224609375, 42.80516052246094, 16.608325958251953, 3.604177474975586, 22.776941299438477, 7.030830383300781, 21.778274536132812, 37.36265563964844, 5.85064697265625, 6.114553451538086, -4.90472412109375, 20.597213745117188, 26.842269897460938, 32.215087890625, -1.7977752685546875, -17.556182861328125, -3.9611434936523438, -3.3593082427978516, 8.707542419433594, 17.877899169921875, 2.7992172241210938, 15.799018859863281, 2.927581787109375, 3.931041717529297, 38.23316955566406, 18.26184844970703, -1.3158340454101562, 22.784053802490234, 37.17978286743164, 10.23321533203125, 3.4178543090820312, 21.02826690673828, 9.620140075683594, 1.2950439453125, 27.498947143554688, 49.84218215942383, -7.880672454833984, -29.038267135620117, 63.865814208984375, 10.04986572265625, 8.409461975097656, 8.467041015625, 0.028324127197265625, 4.573448181152344, 1.9336090087890625, 9.451400756835938, 15.133316040039062, 20.926626205444336, 7.713035583496094, 27.129981994628906, 2.216602325439453, 9.3818359375, 44.28950881958008, -3.257719039916992, -11.674285888671875, 37.72215270996094, 12.244743347167969, 13.943061828613281, 15.818248748779297, 15.817123413085938, -4.355628967285156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000257.npy"}
{"epoch": 0.3885109599395314, "step": 258, "batch_size": 64, "mean": 9.160967826843262, "std": 11.397039413452148, "min": -19.82118797302246, "p10": -3.33852024078369, "median": 7.665238380432129, "p90": 23.127277755737303, "max": 35.75286865234375, "pos_frac": 0.84375, "sample": [12.81591796875, 2.8254127502441406, 1.3444671630859375, -3.9888668060302734, 17.052024841308594, 32.745887756347656, 29.511375427246094, 15.590606689453125, 19.22393798828125, 1.757364273071289, 8.33690071105957, 5.2205810546875, -7.0930023193359375, 14.292428970336914, 6.438179016113281, 22.66949462890625, 5.321544647216797, -12.008674621582031, 9.099296569824219, -19.82118797302246, 6.7433624267578125, 4.434600830078125, -14.209091186523438, 10.757766723632812, 6.9935760498046875, 11.648063659667969, 14.781600952148438, 12.251144409179688, 4.881488800048828, 3.9188308715820312, 8.714447021484375, 6.640842437744141, 14.024223327636719, 14.061975479125977, 5.428934097290039, 1.1917800903320312, 6.273307800292969, 1.677001953125, 5.346832275390625, 35.75286865234375, 14.468513488769531, -7.1970977783203125, 27.187637329101562, 15.263641357421875, 20.4051513671875, 23.096067428588867, 18.636856079101562, 13.331180572509766, 4.11346435546875, 23.140653610229492, -1.821044921875, -0.6562652587890625, 11.911033630371094, 0.870086669921875, 6.112945556640625, 5.319639205932617, 27.143218994140625, -18.626266479492188, 10.669609069824219, -0.06449699401855469, 17.509485244750977, 15.736557006835938, 30.95072364807129, 6.153409957885742], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000258.npy"}
{"epoch": 0.3900226757369615, "step": 259, "batch_size": 64, "mean": 14.194517135620117, "std": 15.100879669189453, "min": -23.017837524414062, "p10": -5.379069900512693, "median": 16.2913761138916, "p90": 33.132952499389646, "max": 47.491668701171875, "pos_frac": 0.828125, "sample": [-1.5121383666992188, -22.8992919921875, -23.017837524414062, 20.93549346923828, 20.341339111328125, 19.516433715820312, 32.957725524902344, 34.401275634765625, 15.520736694335938, -8.801589965820312, 24.32532501220703, -2.3096771240234375, 3.539243698120117, 36.815521240234375, 22.327308654785156, 1.213836669921875, -10.850057601928711, 2.0462265014648438, 38.05763244628906, 25.95581817626953, 11.469490051269531, 22.573036193847656, 30.11502456665039, 28.252761840820312, 21.280136108398438, -3.5965003967285156, -6.9946746826171875, 5.838348388671875, 12.369621276855469, 10.13996696472168, 21.602767944335938, 16.967453002929688, -2.654766082763672, 14.202857971191406, 10.925483703613281, 8.386390686035156, 31.243995666503906, 3.1416015625, 20.632476806640625, 27.9149169921875, 30.22393798828125, 16.565532684326172, 2.0213546752929688, 47.491668701171875, 2.89862060546875, 17.624282836914062, -13.890827178955078, 1.9033985137939453, -6.143028259277344, 2.4945945739746094, 34.79594421386719, 33.20804977416992, 19.850967407226562, 10.296051025390625, 18.638275146484375, 30.832443237304688, 14.327186584472656, 16.01721954345703, 2.381572723388672, 14.034912109375, 23.613113403320312, 33.63853454589844, 21.142173767089844, 22.109453201293945], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000259.npy"}
{"epoch": 0.3915343915343915, "step": 260, "batch_size": 64, "mean": 10.64083480834961, "std": 15.243772506713867, "min": -21.47570037841797, "p10": -10.292136764526367, "median": 11.904470443725586, "p90": 30.00277214050293, "max": 41.22105407714844, "pos_frac": 0.71875, "sample": [28.84576416015625, -15.225936889648438, 21.844392776489258, -9.92233657836914, -2.014129638671875, 41.22105407714844, 17.571151733398438, 14.920684814453125, 29.928054809570312, -14.6068115234375, 30.816917419433594, 13.770689010620117, 25.94493865966797, 23.127037048339844, 23.15129852294922, -6.383937835693359, 30.726531982421875, 17.26519775390625, 15.851339340209961, 40.95600128173828, 25.885353088378906, -0.87384033203125, 27.73041534423828, 31.302932739257812, -5.9693450927734375, -2.095287322998047, 23.676021575927734, 26.396682739257812, 0.6342506408691406, 5.0406341552734375, 13.282279968261719, 8.543365478515625, 23.867523193359375, 6.770912170410156, 2.102937698364258, 10.526660919189453, -2.018280029296875, -4.871421813964844, -21.47570037841797, 13.320205688476562, 3.0651092529296875, 17.047183990478516, 3.050220489501953, 0.041351318359375, 8.213844299316406, 14.655941009521484, 36.63850402832031, -3.9350357055664062, 30.034793853759766, -5.006317138671875, -10.45062255859375, 9.854188919067383, 15.169265747070312, 8.369232177734375, 18.753450393676758, 17.214332580566406, 7.69549560546875, 18.522064208984375, 24.741546630859375, -13.486709594726562, 0.9911308288574219, -16.006053924560547, -12.86676025390625, -0.8569107055664062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000260.npy"}
{"epoch": 0.3930461073318216, "step": 261, "batch_size": 64, "mean": 7.826627731323242, "std": 13.994874000549316, "min": -18.5455322265625, "p10": -9.286762237548826, "median": 6.758678436279297, "p90": 29.81776123046876, "max": 41.23096466064453, "pos_frac": 0.71875, "sample": [1.1032981872558594, 9.677291870117188, 9.03407096862793, -2.6565208435058594, -9.826522827148438, 20.804790496826172, -3.063016891479492, 32.15522384643555, 2.949085235595703, 0.4295024871826172, -16.786048889160156, -7.61016845703125, -3.6537933349609375, 32.49002456665039, 15.290433883666992, -6.258392333984375, 6.235481262207031, 31.138198852539062, 3.765962600708008, 35.34224319458008, 6.847450256347656, 7.8865203857421875, -4.693073272705078, -18.5455322265625, 19.9237060546875, 12.645553588867188, 41.23096466064453, 5.380832672119141, -5.644905090332031, 5.190752029418945, 16.64655303955078, 16.072628021240234, 14.430034637451172, 21.84222412109375, 8.80305290222168, -0.6648387908935547, 36.89864730834961, -7.621856689453125, -11.958320617675781, 16.83424949645996, 15.155662536621094, 2.1202220916748047, 17.09255599975586, 15.376102447509766, 4.815212249755859, -17.688491821289062, 0.8364486694335938, -14.845916748046875, -10.206287384033203, 6.928951263427734, 6.6699066162109375, 0.8593368530273438, 25.295578002929688, 32.85952377319336, 5.3030242919921875, -8.027320861816406, 13.58206558227539, 11.16680908203125, -3.205963134765625, 26.736740112304688, 8.241500854492188, 13.845558166503906, 11.287460327148438, 4.639699935913086], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000261.npy"}
{"epoch": 0.3945578231292517, "step": 262, "batch_size": 64, "mean": 10.065711975097656, "std": 13.111735343933105, "min": -11.500045776367188, "p10": -7.118290710449219, "median": 8.292778968811035, "p90": 29.404032135009764, "max": 33.805145263671875, "pos_frac": 0.8125, "sample": [-11.500045776367188, 11.094932556152344, 6.841350555419922, 1.7376041412353516, -6.885986328125, 21.100017547607422, 5.337547302246094, 2.92828369140625, 8.567695617675781, 31.892362594604492, -7.296630859375, 28.472328186035156, -6.133140563964844, 1.467733383178711, 31.144947052001953, -11.480945587158203, 8.449151992797852, 33.447967529296875, 22.606719970703125, 12.587638854980469, -6.7217254638671875, 1.0760269165039062, 3.6421127319335938, 20.727752685546875, 13.015739440917969, 31.506473541259766, 2.060455322265625, 33.805145263671875, -7.2178497314453125, 1.345163345336914, 27.952186584472656, 11.914825439453125, 9.170509338378906, 22.16692352294922, -11.139884948730469, 29.106521606445312, 5.553255081176758, 12.608562469482422, 27.79473876953125, 8.324478149414062, -3.5212936401367188, 4.89300537109375, -6.659049987792969, 18.466812133789062, 30.933746337890625, 12.139135360717773, 2.4720840454101562, 3.895296096801758, -9.039932250976562, 4.145229339599609, 11.912994384765625, 1.6556015014648438, 29.382301330566406, 0.1739978790283203, 16.602338790893555, 8.261079788208008, 21.996871948242188, 29.413345336914062, 2.988494873046875, 24.264739990234375, 7.344776153564453, 2.1151123046875, 16.818382263183594, -7.520500183105469], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000262.npy"}
{"epoch": 0.3960695389266818, "step": 263, "batch_size": 64, "mean": 7.940185546875, "std": 13.123964309692383, "min": -24.8721866607666, "p10": -6.601601409912108, "median": 7.437841415405273, "p90": 27.611352729797364, "max": 35.4509391784668, "pos_frac": 0.703125, "sample": [5.312915802001953, 18.70285415649414, -15.890090942382812, -0.631805419921875, -4.343378067016602, -3.0746688842773438, 10.57147216796875, 4.276786804199219, 19.980712890625, 27.7017879486084, 14.6484375, 0.7312870025634766, -2.6259307861328125, 15.14828109741211, 2.002960205078125, -10.526435852050781, 27.721832275390625, 0.5150604248046875, 10.554031372070312, -5.767814636230469, 13.112472534179688, 18.766372680664062, 30.13903045654297, 4.196815490722656, -1.3256912231445312, 2.667724609375, 27.40033721923828, 12.217147827148438, 20.635452270507812, 10.979843139648438, 35.4509391784668, 12.671356201171875, -8.041641235351562, 4.265953063964844, -6.9589385986328125, 2.3212966918945312, -18.528907775878906, 28.178653717041016, 12.474464416503906, 17.175430297851562, 10.169113159179688, -4.093299865722656, -3.5057373046875, 28.158424377441406, -5.7440032958984375, 4.362541198730469, 15.281906127929688, 25.526447296142578, 13.140907287597656, 2.858001708984375, 0.8737030029296875, 34.2635498046875, -4.81121826171875, -1.2461128234863281, 12.416168212890625, -2.366119384765625, 15.368934631347656, 17.99433708190918, 8.536022186279297, -10.133411407470703, 6.33966064453125, -24.8721866607666, 25.47502899169922, 11.372838973999023], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000263.npy"}
{"epoch": 0.3975812547241119, "step": 264, "batch_size": 64, "mean": 10.378680229187012, "std": 14.567595481872559, "min": -21.270553588867188, "p10": -8.220045089721678, "median": 9.17303466796875, "p90": 26.938806152343762, "max": 53.09114074707031, "pos_frac": 0.75, "sample": [19.5726318359375, -1.3776702880859375, 8.05706787109375, 1.7905635833740234, 4.355989456176758, -4.415567398071289, 3.98095703125, 0.11236572265625, 20.303768157958984, 16.313720703125, -9.634462356567383, 6.2132720947265625, -21.270553588867188, 22.818527221679688, -5.804479598999023, 17.965133666992188, 15.663558959960938, 15.239059448242188, 1.4199295043945312, 24.058761596679688, 13.853591918945312, 15.596799850463867, 21.401168823242188, 20.957618713378906, 12.264251708984375, 42.38112258911133, -2.1521644592285156, 7.6818389892578125, -0.3435821533203125, 15.486194610595703, 12.2418212890625, -0.39163970947265625, 30.37804412841797, 2.538951873779297, -0.2825469970703125, 10.28900146484375, 34.81352996826172, -6.393287658691406, 12.321693420410156, 30.602020263671875, -6.871482849121094, -10.849029541015625, 5.515266418457031, 28.173110961914062, 21.834449768066406, 23.816673278808594, -14.190567016601562, 20.910430908203125, -8.955001831054688, 5.386463165283203, -9.00921630859375, 4.421775817871094, 43.33245086669922, 12.690608978271484, 5.6105194091796875, -8.79800033569336, 4.807807922363281, 23.483642578125, 11.955490112304688, 0.07756423950195312, 2.53375244140625, 53.09114074707031, 24.037437438964844, 22.623245239257812], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000264.npy"}
{"epoch": 0.39909297052154197, "step": 265, "batch_size": 64, "mean": 10.342907905578613, "std": 15.862972259521484, "min": -20.303909301757812, "p10": -12.71596221923828, "median": 10.267279624938965, "p90": 32.08618106842041, "max": 36.23907470703125, "pos_frac": 0.75, "sample": [3.43133544921875, 4.830148696899414, 2.57550048828125, -20.303909301757812, 9.910894393920898, 3.0670547485351562, 11.222345352172852, 16.517127990722656, -10.915878295898438, 33.175018310546875, 5.727821350097656, 2.2543563842773438, 15.998710632324219, 16.37060546875, -15.826187133789062, -13.4874267578125, 12.603940963745117, 4.466562271118164, 14.033700942993164, -19.79721450805664, 5.639017105102539, 29.49554443359375, 30.285171508789062, 11.434019088745117, 0.02204132080078125, -1.3368339538574219, 27.9908390045166, 2.2999420166015625, 10.623664855957031, 0.7242336273193359, 33.67735290527344, 27.6837158203125, -5.502197265625, 36.23907470703125, 16.454696655273438, 35.77570343017578, 5.189687728881836, -2.369213104248047, -1.1307373046875, -3.0224533081054688, 12.735580444335938, 23.665287017822266, 19.19013023376465, 32.260337829589844, 33.143226623535156, 23.864532470703125, 35.63365173339844, 17.038795471191406, 9.28253173828125, -19.035192489624023, 8.790328979492188, 0.8838958740234375, -3.601032257080078, 31.29694366455078, 29.9324951171875, 31.6798152923584, 30.589889526367188, -20.224472045898438, 14.998655319213867, -4.921588897705078, 17.911163330078125, -1.5433807373046875, 18.777809143066406, -16.431060791015625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000265.npy"}
{"epoch": 0.40060468631897206, "step": 266, "batch_size": 64, "mean": 9.787688255310059, "std": 14.978828430175781, "min": -32.23192596435547, "p10": -5.202489471435545, "median": 7.497014999389648, "p90": 29.850639152526856, "max": 53.427276611328125, "pos_frac": 0.75, "sample": [-5.920379638671875, 30.792930603027344, 35.28692626953125, 6.679372787475586, 14.943374633789062, -1.8728179931640625, -0.7436141967773438, -11.93115234375, 3.8186988830566406, 1.3585357666015625, 15.70285415649414, 11.913406372070312, 1.2492866516113281, 13.564821243286133, 16.199851989746094, 14.771427154541016, 1.8831787109375, 53.427276611328125, 3.1442413330078125, 1.4198684692382812, 36.844322204589844, 9.02532958984375, 0.5054397583007812, 29.869508743286133, 19.72919273376465, 32.9124755859375, -2.2869415283203125, 26.376388549804688, 17.039278030395508, -0.00128173828125, 22.22394371032715, 3.0501708984375, 23.251205444335938, -3.5274124145507812, 16.890592575073242, -10.222366333007812, 13.694007873535156, 29.806610107421875, 18.753318786621094, -0.3586463928222656, -3.0276336669921875, -9.195541381835938, -1.8885498046875, 6.852611541748047, -32.23192596435547, 4.008460998535156, 12.748237609863281, 8.963096618652344, 10.835456848144531, 8.14141845703125, -24.68088150024414, 5.502851486206055, -2.141265869140625, 6.59246826171875, 24.716461181640625, 14.718921661376953, 40.65003204345703, -6.966218948364258, 17.491973876953125, 5.2028045654296875, 21.961755752563477, 4.778459548950195, 23.612045288085938, 0.5038185119628906], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000266.npy"}
{"epoch": 0.4021164021164021, "step": 267, "batch_size": 64, "mean": 7.559881210327148, "std": 14.103952407836914, "min": -37.63875961303711, "p10": -6.556801414489746, "median": 5.161884307861328, "p90": 25.73198642730713, "max": 40.65831756591797, "pos_frac": 0.671875, "sample": [14.621231079101562, -0.5194969177246094, 2.5588417053222656, -19.977313995361328, 7.893463134765625, -5.806316375732422, 20.0311279296875, 27.288421630859375, 21.004493713378906, 3.891529083251953, -2.870328903198242, 18.001388549804688, -0.6315879821777344, 40.65831756591797, 2.6998634338378906, -14.789482116699219, 8.655143737792969, -8.472010612487793, -5.8476409912109375, -17.345535278320312, -6.860727310180664, -7.0531005859375, 4.0246734619140625, 13.572372436523438, 26.966156005859375, 30.49296760559082, -2.72930908203125, -0.5852680206298828, 15.611328125, 4.962074279785156, 16.273178100585938, 3.036314010620117, 25.93951988220215, 14.47515869140625, -3.611663818359375, 6.826335906982422, 19.712867736816406, -4.8470001220703125, 5.1500091552734375, 35.45765686035156, 9.844024658203125, -0.19477462768554688, 31.8056640625, -5.487091064453125, 21.7877197265625, 15.602550506591797, 21.32762908935547, 9.912544250488281, -37.63875961303711, 20.199440002441406, 11.950004577636719, -4.2706298828125, -2.8061885833740234, 5.173759460449219, 2.6870880126953125, 1.0041313171386719, 15.430580139160156, 18.20149040222168, 25.24774169921875, 2.198577880859375, 4.573642730712891, 21.726520538330078, 9.422348022460938, -1.7232913970947266], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000267.npy"}
{"epoch": 0.4036281179138322, "step": 268, "batch_size": 64, "mean": 8.792243003845215, "std": 12.458946228027344, "min": -13.464889526367188, "p10": -5.818856048583983, "median": 7.212608337402344, "p90": 27.790007781982425, "max": 37.44691467285156, "pos_frac": 0.75, "sample": [2.8612060546875, 34.944976806640625, 3.8164291381835938, -4.9937591552734375, 12.201156616210938, 25.61871337890625, 7.886634826660156, -0.5336074829101562, -5.222511291503906, 13.549259185791016, 18.619537353515625, 0.508453369140625, -0.18526840209960938, 5.331138610839844, 2.2499465942382812, 0.8540458679199219, 11.299520492553711, -0.939605712890625, -0.11852836608886719, 15.715652465820312, 20.1094913482666, 27.985397338867188, -1.0283832550048828, 1.1054840087890625, -6.074432373046875, 11.218902587890625, 3.989105224609375, 16.395668029785156, 11.491012573242188, 4.01202392578125, 17.98084259033203, 7.771907806396484, 6.653308868408203, 10.52813720703125, -4.8505401611328125, 27.33409881591797, 1.3720626831054688, -12.975711822509766, -13.464889526367188, 8.311708450317383, 13.881912231445312, 0.9804439544677734, 3.1827468872070312, 37.44691467285156, 24.17066192626953, 17.14593505859375, 29.69470977783203, -6.427892684936523, 8.232612609863281, -8.552398681640625, -7.410514831542969, -4.720329284667969, 21.48175048828125, 1.6226844787597656, 31.84173583984375, 16.473344802856445, -8.483673095703125, 18.660301208496094, 33.16679382324219, 12.667886734008789, 0.4874439239501953, 31.96906280517578, 13.57266616821289, 0.29018402099609375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000268.npy"}
{"epoch": 0.4051398337112623, "step": 269, "batch_size": 64, "mean": 10.009855270385742, "std": 13.233062744140625, "min": -22.773056030273438, "p10": -4.005668640136717, "median": 7.160051345825195, "p90": 30.984912109375, "max": 39.64959716796875, "pos_frac": 0.796875, "sample": [0.822357177734375, -14.2838134765625, 6.3454742431640625, 19.76738739013672, 17.457122802734375, -0.370330810546875, 5.9907684326171875, 32.624168395996094, 12.838882446289062, 2.921661376953125, -10.330062866210938, 13.81396484375, 13.830842971801758, 13.220512390136719, 30.737018585205078, 29.18377685546875, 2.264659881591797, 8.875, 31.146156311035156, -15.247711181640625, 2.2504425048828125, -5.646575927734375, 29.44132423400879, 3.305694580078125, -0.5826263427734375, 31.09115219116211, -0.6069850921630859, 39.64959716796875, 10.6878662109375, 23.01007080078125, 5.747251510620117, 7.974628448486328, -2.0463714599609375, 13.460342407226562, 3.8553314208984375, 2.1427783966064453, 3.8556289672851562, 5.6985931396484375, -4.845367431640625, 22.616804122924805, 33.473480224609375, 18.888751983642578, 4.472602844238281, 9.554405212402344, 18.325748443603516, -6.600160598754883, 12.132587432861328, 3.9599685668945312, -0.6657180786132812, -0.5215072631835938, 34.3753662109375, 1.0454597473144531, 33.100860595703125, 1.9170074462890625, 3.0088882446289062, -22.773056030273438, 9.359527587890625, 19.048599243164062, 4.2985992431640625, 13.760223388671875, 13.320663452148438, 15.11724853515625, 25.656906127929688, 3.706928253173828], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000269.npy"}
{"epoch": 0.40665154950869237, "step": 270, "batch_size": 64, "mean": 8.20516586303711, "std": 12.065594673156738, "min": -24.78665542602539, "p10": -4.595746994018555, "median": 8.177506446838379, "p90": 20.55214443206787, "max": 41.926307678222656, "pos_frac": 0.8125, "sample": [19.053796768188477, 23.84568214416504, -5.51922607421875, 30.420883178710938, 2.055419921875, 32.32396697998047, 8.860261917114258, 3.256114959716797, 13.774200439453125, 0.377227783203125, 41.926307678222656, 19.959232330322266, 9.504226684570312, 16.594642639160156, 3.9836044311523438, -7.4519805908203125, 9.290908813476562, 16.02387237548828, 0.0695953369140625, 16.09075927734375, 6.8155975341796875, 11.36285400390625, -0.34343719482421875, -4.088405609130859, 2.73370361328125, 10.380111694335938, 13.58281135559082, -24.692657470703125, 5.061985015869141, 1.6518383026123047, 14.452667236328125, 7.4947509765625, 1.006011962890625, 17.279747009277344, 3.5473365783691406, 11.740386962890625, 19.334548950195312, -4.255992889404297, 6.274150848388672, -24.78665542602539, 34.37469482421875, 20.566696166992188, 21.31842041015625, 0.6510066986083984, 10.03108024597168, 10.085556030273438, -4.741355895996094, 5.288063049316406, 15.628433227539062, 11.058246612548828, 20.518190383911133, -0.5876979827880859, -6.6803741455078125, 15.770919799804688, 5.132122039794922, 0.925933837890625, 5.86517333984375, -17.96166229248047, 14.2384033203125, 5.64373779296875, 8.960456848144531, -0.33783721923828125, 6.7798919677734375, 13.611679077148438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000270.npy"}
{"epoch": 0.40816326530612246, "step": 271, "batch_size": 64, "mean": 7.920632362365723, "std": 15.542607307434082, "min": -19.55592155456543, "p10": -9.80847587585449, "median": 3.2328414916992188, "p90": 29.46967658996582, "max": 60.202972412109375, "pos_frac": 0.640625, "sample": [10.924446105957031, 14.361671447753906, -1.6633987426757812, 1.8698577880859375, 30.038589477539062, 22.443248748779297, 8.691627502441406, 2.934070587158203, 29.08639144897461, -12.2508544921875, 22.909011840820312, -11.102893829345703, 34.668701171875, -3.6788253784179688, 13.856082916259766, 29.633941650390625, 3.580738067626953, 6.7316436767578125, 2.387969970703125, 37.57598876953125, 2.1364212036132812, 5.846073150634766, 22.3293399810791, -5.971870422363281, -3.3921051025390625, 0.2521514892578125, -6.666999816894531, 15.0770263671875, -1.8639678955078125, -3.8196468353271484, 3.738832473754883, 17.42452621459961, 1.0914230346679688, -10.79062271118164, -0.5294189453125, 31.06159210205078, 5.456169128417969, 2.548065185546875, 20.154464721679688, 26.988916397094727, -1.9894561767578125, -14.697029113769531, 60.202972412109375, -10.891548156738281, 1.2604637145996094, -0.45965576171875, -0.07135772705078125, -19.55592155456543, -15.808055877685547, -0.4086761474609375, 3.2196922302246094, -0.561614990234375, 26.745101928710938, 21.141483306884766, 15.15284538269043, 15.277679443359375, 6.226837158203125, 20.403648376464844, -2.1248703002929688, 3.245990753173828, -5.160205841064453, -7.5167999267578125, 44.668975830078125, 4.551582336425781], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000271.npy"}
{"epoch": 0.40967498110355255, "step": 272, "batch_size": 64, "mean": 11.578143119812012, "std": 12.335649490356445, "min": -9.215965270996094, "p10": -1.6563755035400378, "median": 9.850037574768066, "p90": 30.97552108764649, "max": 40.509010314941406, "pos_frac": 0.875, "sample": [1.963409423828125, -2.2262191772460938, -8.851486206054688, 13.19550895690918, 7.119659423828125, 36.744468688964844, 10.8570556640625, 27.797271728515625, 40.509010314941406, 8.450908660888672, 20.952880859375, 12.59749984741211, 2.7304954528808594, 2.4649581909179688, 11.710861206054688, 13.201011657714844, 19.151718139648438, 3.5090179443359375, 22.793678283691406, 20.099323272705078, 21.06096076965332, 4.490478515625, 14.691287994384766, 12.267194747924805, 36.40300750732422, 5.844825744628906, 11.880477905273438, -9.215965270996094, 6.5205841064453125, 28.825912475585938, 8.30096435546875, 40.20732879638672, 7.792762756347656, 0.5371894836425781, 15.810638427734375, 1.4802589416503906, -0.3267402648925781, 1.8817291259765625, 15.293167114257812, -8.441974639892578, 20.525062561035156, 36.592403411865234, 31.89678192138672, 10.216255187988281, 3.1673507690429688, 2.3759384155273438, 17.54986572265625, 6.484764099121094, 16.111427307128906, 16.479827880859375, 9.483819961547852, 17.779502868652344, -7.5770111083984375, 0.5276947021484375, 1.8895645141601562, 7.285919189453125, 11.864471435546875, 18.812103271484375, -3.363262176513672, 0.5548648834228516, -3.460050582885742, 1.2552146911621094, 36.78398132324219, 7.689554214477539], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000272.npy"}
{"epoch": 0.41118669690098264, "step": 273, "batch_size": 64, "mean": 9.984330177307129, "std": 14.890850067138672, "min": -24.174671173095703, "p10": -6.728522491455077, "median": 8.292659759521484, "p90": 26.84910964965821, "max": 52.873870849609375, "pos_frac": 0.765625, "sample": [-1.6099815368652344, 16.1708984375, 17.36834716796875, -9.912368774414062, 30.51904296875, 3.0781326293945312, -2.777435302734375, 24.759048461914062, -11.24874496459961, 9.882637023925781, 4.31634521484375, -2.0249176025390625, 11.618888854980469, 0.09043312072753906, 10.900703430175781, 13.712093353271484, 27.744850158691406, 1.5309333801269531, 27.930505752563477, 12.068527221679688, 24.54473114013672, 6.507167816162109, 0.85302734375, -1.4096031188964844, -17.4583740234375, 19.11937713623047, 12.500358581542969, 7.03582763671875, -6.2432403564453125, 3.0321311950683594, 8.502456665039062, 52.873870849609375, 20.31328582763672, -24.174671173095703, -1.3431549072265625, 13.841106414794922, 17.47417449951172, 21.778995513916016, 2.817535400390625, 13.4171142578125, -14.764892578125, 8.082862854003906, 16.87641143798828, 1.3211288452148438, 6.636905670166016, 51.823150634765625, 39.068702697753906, 20.891721725463867, 6.7266693115234375, 19.325153350830078, -1.1915855407714844, 5.04345703125, 2.5435142517089844, -6.936500549316406, 7.2488250732421875, 23.67060089111328, 4.573726654052734, 20.46472930908203, 9.598072052001953, -4.885643005371094, 24.74665069580078, 36.077392578125, -13.962177276611328, 17.918197631835938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000273.npy"}
{"epoch": 0.4126984126984127, "step": 274, "batch_size": 64, "mean": 9.671546936035156, "std": 13.555678367614746, "min": -22.739158630371094, "p10": -4.429690551757813, "median": 7.849948883056641, "p90": 25.71260261535645, "max": 40.180908203125, "pos_frac": 0.765625, "sample": [16.146011352539062, -4.445545196533203, -2.3436832427978516, 9.841728210449219, 0.062347412109375, -20.117027282714844, 22.90253448486328, 23.91565704345703, 6.870819091796875, 10.561384201049805, 20.64161491394043, 17.643417358398438, 0.669403076171875, -3.1903076171875, 10.688745498657227, 26.131668090820312, 40.180908203125, -17.618722915649414, 5.539295196533203, 10.857154846191406, 23.967681884765625, 24.734783172607422, 14.08675765991211, 23.316072463989258, 13.815902709960938, 1.0963249206542969, 29.093101501464844, 35.171775817871094, 4.188060760498047, -2.3248291015625, 2.5473079681396484, 5.285224914550781, 18.071006774902344, -3.5269088745117188, -0.9893684387207031, 22.56452178955078, -6.784027099609375, -8.68463134765625, 23.063751220703125, 0.22882080078125, -3.7932987213134766, 13.1561279296875, 3.386157989501953, 5.2036285400390625, -3.60809326171875, 14.362747192382812, 5.617244720458984, -22.739158630371094, 10.146652221679688, 7.155475616455078, 31.356338500976562, 22.660228729248047, 5.315467834472656, 3.40936279296875, 7.9501800537109375, -4.392696380615234, -5.009311676025391, 8.50263786315918, 34.46075439453125, 24.230016708374023, 4.645893096923828, 7.749717712402344, 33.54411315917969, 21.81004524230957], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000274.npy"}
{"epoch": 0.41421012849584277, "step": 275, "batch_size": 64, "mean": 10.42381477355957, "std": 13.704242706298828, "min": -28.014923095703125, "p10": -2.521776580810547, "median": 8.51736068725586, "p90": 27.30285987854004, "max": 43.50926208496094, "pos_frac": 0.78125, "sample": [14.872406005859375, 11.91611099243164, -2.6293487548828125, 20.84824562072754, 27.09805679321289, 8.451171875, 7.893337249755859, -28.014923095703125, 39.27519226074219, 23.785797119140625, 3.4668731689453125, 5.4165802001953125, 3.8457260131835938, 36.273948669433594, -1.5945796966552734, 1.1607170104980469, 26.39263916015625, 15.104209899902344, 33.35069274902344, -7.37640380859375, -16.900604248046875, 43.50926208496094, 2.2586746215820312, 4.39654541015625, 16.40184783935547, 0.5274124145507812, -6.531429290771484, 3.5353012084960938, 3.345043182373047, 2.0951690673828125, 15.18182373046875, 38.98370361328125, 3.8517532348632812, 10.769401550292969, 27.39063262939453, 18.71363067626953, 12.132469177246094, 9.782636642456055, -0.4557647705078125, -2.2707748413085938, 18.315872192382812, -0.16544342041015625, 5.191947937011719, -0.8780937194824219, 5.808511734008789, 25.02666473388672, 19.19355010986328, 13.792999267578125, -12.245277404785156, 8.35512924194336, 3.4162254333496094, 0.3851165771484375, -1.6628131866455078, 13.63427734375, 32.62159729003906, 21.50676727294922, -0.6838188171386719, 13.193168640136719, 15.032394409179688, 23.516923904418945, 16.303085327148438, 8.583549499511719, -5.21624755859375, 17.84488296508789], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000275.npy"}
{"epoch": 0.41572184429327286, "step": 276, "batch_size": 64, "mean": 10.148266792297363, "std": 11.551600456237793, "min": -15.013900756835938, "p10": -1.8010986328124998, "median": 6.410602569580078, "p90": 24.38652725219727, "max": 40.4646110534668, "pos_frac": 0.828125, "sample": [23.393495559692383, 3.4422225952148438, -0.1907329559326172, 16.765281677246094, 5.316265106201172, 6.1587371826171875, 40.4646110534668, -11.768867492675781, 5.882801055908203, 7.681419372558594, 9.43539810180664, 6.34943962097168, 13.573326110839844, 11.733116149902344, 12.103431701660156, 22.84368133544922, 4.480602264404297, 2.301067352294922, 8.425750732421875, 4.675853729248047, 4.9456329345703125, 22.43138885498047, 21.24355697631836, 30.06642723083496, 1.0731353759765625, -8.08024787902832, 15.515487670898438, -0.4318389892578125, 21.29109764099121, 27.922161102294922, 5.589138031005859, 10.4295654296875, -1.8555221557617188, -8.591573715209961, 24.662643432617188, -6.1710357666015625, 34.78797912597656, -0.0066986083984375, 11.243896484375, 18.143638610839844, -15.013900756835938, 6.3861083984375, 27.643287658691406, 1.2822322845458984, -1.6741104125976562, 6.0569000244140625, 22.714006423950195, 16.964096069335938, 22.372314453125, 19.132644653320312, -6.633995056152344, 3.1530609130859375, 6.355628967285156, 4.177494049072266, 6.435096740722656, 28.942459106445312, 23.74225616455078, 5.278163909912109, 3.9954605102539062, 21.558876037597656, 2.6445255279541016, 12.783187866210938, 13.769290924072266, 0.14825439453125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000276.npy"}
{"epoch": 0.41723356009070295, "step": 277, "batch_size": 64, "mean": 6.057959079742432, "std": 12.79620361328125, "min": -18.974246978759766, "p10": -8.77488098144531, "median": 5.015989303588867, "p90": 23.51660823822022, "max": 38.60932922363281, "pos_frac": 0.671875, "sample": [17.800338745117188, -9.439956665039062, 16.09918212890625, 1.5323638916015625, 15.241809844970703, -6.977260589599609, 4.825435638427734, 3.6439208984375, 22.81756019592285, 14.93035888671875, -7.170890808105469, -2.6532745361328125, 3.6931304931640625, 7.4255828857421875, -1.695037841796875, 9.008466720581055, 10.826242446899414, -5.7438201904296875, 5.620326995849609, 1.9726295471191406, 7.299678802490234, 15.759193420410156, 3.445220947265625, 5.535621643066406, 1.5449333190917969, -3.18756103515625, -1.3154220581054688, -11.138832092285156, -12.313278198242188, 18.587547302246094, -0.6211624145507812, 8.014892578125, -17.178152084350586, 23.816200256347656, 7.536312103271484, 13.72601318359375, -7.2230377197265625, -13.557182312011719, 14.405677795410156, 3.924468994140625, 5.20654296875, -6.290386199951172, 37.289154052734375, 24.640670776367188, 9.578392028808594, 6.790716171264648, 3.701690673828125, 5.406303405761719, -14.960792541503906, 38.60932922363281, 11.038078308105469, -5.8543701171875, 27.033447265625, 20.21515655517578, -5.045135498046875, 1.0310802459716797, 24.156946182250977, -18.974246978759766, 10.04364013671875, 36.95240783691406, -3.0097923278808594, 1.7484054565429688, -0.12166023254394531, 19.705556869506836], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000277.npy"}
{"epoch": 0.41874527588813304, "step": 278, "batch_size": 64, "mean": 9.540511131286621, "std": 13.132084846496582, "min": -34.545799255371094, "p10": -4.245152282714843, "median": 9.912010192871094, "p90": 25.366626739501957, "max": 35.72327423095703, "pos_frac": 0.796875, "sample": [-4.505558013916016, -7.3822021484375, 17.422607421875, 12.887283325195312, 6.4056549072265625, -1.0804023742675781, -19.638145446777344, 6.774440765380859, 26.862083435058594, 33.885498046875, 10.232177734375, -1.029754638671875, 25.580299377441406, 35.72327423095703, 19.159400939941406, -2.314146041870117, 7.535270690917969, 12.640316009521484, 9.826026916503906, 18.837749481201172, 9.04043960571289, 15.198532104492188, 7.447147369384766, 7.313240051269531, -34.545799255371094, 32.49534606933594, 0.4240875244140625, 21.31378746032715, 14.2078857421875, 7.02484130859375, 6.967048645019531, 12.201316833496094, -10.296722412109375, -1.6830863952636719, 15.633712768554688, 1.9897689819335938, 21.025894165039062, 10.88604736328125, 17.402822494506836, 18.357383728027344, 33.402645111083984, 33.66180419921875, -3.6375389099121094, 15.075759887695312, 9.666971206665039, 15.237220764160156, 10.941635131835938, 0.8136367797851562, -14.897132873535156, 3.8117752075195312, 9.997993469238281, 18.948043823242188, -16.783493041992188, -1.6636390686035156, 2.4117431640625, 20.338905334472656, 15.115631103515625, 24.868057250976562, 11.0404052734375, 7.498237609863281, 5.821048736572266, 4.10261344909668, 17.44072723388672, 7.154075622558594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000278.npy"}
{"epoch": 0.42025699168556313, "step": 279, "batch_size": 64, "mean": 12.44808292388916, "std": 12.346689224243164, "min": -20.90403938293457, "p10": -1.7954460144042967, "median": 10.4071626663208, "p90": 28.71535911560059, "max": 41.97732925415039, "pos_frac": 0.859375, "sample": [-1.8788299560546875, 7.866615295410156, 41.97732925415039, 6.100502014160156, -20.90403938293457, 24.204605102539062, 7.618946075439453, -5.857231140136719, -5.493804931640625, 12.63311767578125, 16.44310760498047, 22.166885375976562, 10.062637329101562, 29.330047607421875, 19.51467514038086, 35.636932373046875, 10.55023193359375, 27.281085968017578, -5.210859298706055, 1.6523704528808594, -10.570009231567383, 20.693817138671875, 4.67437744140625, 7.821052551269531, 9.494346618652344, 12.447616577148438, 7.940866470336914, 0.8290061950683594, 9.312568664550781, 22.97381591796875, 7.847938537597656, 19.636558532714844, 27.24689483642578, 8.268814086914062, -1.4644317626953125, 23.595949172973633, 19.10919952392578, 21.7196044921875, -1.6008834838867188, 34.338111877441406, 30.389991760253906, 10.264093399047852, 26.771102905273438, 4.58447265625, 2.39190673828125, 7.853538513183594, 5.512912750244141, 6.6498260498046875, 31.16656494140625, 22.24352264404297, 17.769001007080078, -3.6224594116210938, 11.494268417358398, 26.883472442626953, 2.664936065673828, 2.8119964599609375, 12.08062744140625, 3.845367431640625, 15.638179779052734, 13.528511047363281, 17.593204498291016, 10.75640869140625, 35.4986457824707, 3.8976821899414062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000279.npy"}
{"epoch": 0.4217687074829932, "step": 280, "batch_size": 64, "mean": 12.247005462646484, "std": 12.157942771911621, "min": -15.257068634033203, "p10": -0.7939300537109375, "median": 8.847073554992676, "p90": 31.996641540527346, "max": 43.28343200683594, "pos_frac": 0.875, "sample": [13.184345245361328, 36.84315872192383, 3.247770309448242, 11.614952087402344, 3.55584716796875, 35.791927337646484, 1.2036209106445312, -0.7750244140625, 5.92127799987793, 15.938209533691406, 11.17608642578125, 14.68878173828125, 15.476425170898438, 9.154251098632812, 26.940277099609375, 23.502334594726562, 6.2724456787109375, 8.798397064208984, 27.38720703125, 1.5583877563476562, 1.4769134521484375, 8.895750045776367, 1.1880645751953125, 2.1604690551757812, 31.069229125976562, 12.046417236328125, 8.696269989013672, 10.605331420898438, 32.39410400390625, 43.28343200683594, 13.116813659667969, -0.802032470703125, 5.0551910400390625, 8.536895751953125, 2.3943519592285156, 18.408981323242188, 38.1942138671875, -6.0038909912109375, 34.29493713378906, 17.178428649902344, -5.0085601806640625, -15.257068634033203, 4.1067352294921875, -2.3090972900390625, 8.94464111328125, 21.52542495727539, 8.325414657592773, -1.3031005859375, 7.7159576416015625, 7.468954086303711, 33.43780517578125, 7.910430908203125, 8.061840057373047, 26.835926055908203, 11.143701553344727, 8.44677734375, 13.681739807128906, 2.7873764038085938, 8.131420135498047, 7.941070556640625, -1.5608444213867188, 22.745628356933594, 21.43756103515625, 24.9281005859375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000280.npy"}
{"epoch": 0.42328042328042326, "step": 281, "batch_size": 64, "mean": 10.838560104370117, "std": 13.07650375366211, "min": -19.82086753845215, "p10": -5.089994812011717, "median": 9.47778034210205, "p90": 26.553071212768554, "max": 38.55250549316406, "pos_frac": 0.796875, "sample": [-5.839992523193359, 23.37165069580078, 4.087810516357422, 8.494781494140625, 26.10222625732422, 18.36479949951172, 9.353876113891602, 20.243927001953125, 19.607177734375, 38.55250549316406, 5.800392150878906, 25.04789161682129, 15.869966506958008, 14.565505981445312, -0.9917373657226562, 6.947229385375977, 9.249914169311523, 30.672988891601562, -3.256956100463867, 19.73663330078125, 25.958873748779297, 5.70720100402832, 13.878707885742188, 17.27033233642578, 16.334243774414062, 9.063411712646484, 9.6016845703125, -3.3400001525878906, 3.6565933227539062, -19.82086753845215, 6.934478759765625, 0.20681190490722656, -1.5860214233398438, 12.534624099731445, 27.69994354248047, -15.024101257324219, 19.832733154296875, 35.132781982421875, 1.268218994140625, -2.9877243041992188, 32.934837341308594, 12.558921813964844, 33.76332092285156, 17.559480667114258, 23.033967971801758, 14.242927551269531, 14.970088958740234, 5.648014068603516, 21.838499069213867, 8.786277770996094, 1.0665435791015625, 1.36419677734375, 26.592208862304688, -16.239364624023438, 5.732582092285156, 26.461750030517578, 18.935749053955078, -6.902679443359375, 9.071002960205078, 16.146835327148438, -1.9968986511230469, -9.185998916625977, 0.4515857696533203, -11.468561172485352], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000281.npy"}
{"epoch": 0.42479213907785335, "step": 282, "batch_size": 64, "mean": 10.019264221191406, "std": 13.962806701660156, "min": -15.132251739501953, "p10": -8.243675804138183, "median": 7.805809020996094, "p90": 29.458973312377932, "max": 41.62222671508789, "pos_frac": 0.765625, "sample": [3.6336212158203125, -8.929481506347656, -7.402122497558594, 6.5675048828125, 2.7430496215820312, 16.93939971923828, 14.427013397216797, 6.624444961547852, -4.626029968261719, 34.80108642578125, 23.441062927246094, 29.04031753540039, 7.860254287719727, 8.180389404296875, 25.808242797851562, 12.125625610351562, 35.089012145996094, 10.439842224121094, -1.649810791015625, 14.002281188964844, 5.837310791015625, 2.5636749267578125, 7.751363754272461, 26.646453857421875, 3.663949966430664, 2.448833465576172, -1.4208564758300781, 16.647741317749023, 41.62222671508789, 13.034687042236328, -4.193286895751953, -8.327920913696289, -15.132251739501953, 25.482892990112305, 0.1593780517578125, -10.475631713867188, 26.357460021972656, 3.507007598876953, -11.0340576171875, 12.138442993164062, 17.309860229492188, 21.571088790893555, 4.572563171386719, 16.102500915527344, 33.37160110473633, 36.610809326171875, 16.717243194580078, 1.3301887512207031, 0.2532806396484375, 12.80438232421875, -8.047103881835938, -7.40283203125, -0.5499114990234375, 4.64971923828125, -9.087593078613281, 40.33660125732422, 29.638397216796875, 13.843481063842773, -10.389619827270508, 8.302215576171875, 13.427610397338867, 6.643913269042969, 26.160564422607422, 6.6708831787109375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000282.npy"}
{"epoch": 0.42630385487528344, "step": 283, "batch_size": 64, "mean": 9.203363418579102, "std": 14.706467628479004, "min": -21.423858642578125, "p10": -8.870197105407714, "median": 6.296611785888672, "p90": 29.597719001770024, "max": 46.4071044921875, "pos_frac": 0.734375, "sample": [20.216842651367188, -7.30919075012207, -13.395805358886719, -11.677696228027344, 8.846649169921875, 7.241752624511719, 4.620372772216797, -1.6244583129882812, 34.769203186035156, -2.1379871368408203, -7.6092376708984375, 15.01629638671875, 9.188819885253906, 1.1479530334472656, 20.95694351196289, 6.978206634521484, 0.9062347412109375, 26.234107971191406, -20.477783203125, 10.650482177734375, 26.136280059814453, 5.775871276855469, 25.04141616821289, 4.988792419433594, 12.496246337890625, 32.21690368652344, 32.038841247558594, 9.05914306640625, 35.23957061767578, 5.9116363525390625, 4.4918975830078125, 21.535503387451172, -5.40216064453125, 10.926383972167969, -6.8479156494140625, 18.2615966796875, -9.570388793945312, -1.116485595703125, 11.836616516113281, 26.803817749023438, 27.602191925048828, -9.410608291625977, 26.15766143798828, 28.547271728515625, 1.5950431823730469, 0.7053184509277344, 6.681587219238281, -21.423858642578125, 1.2839031219482422, 4.04583740234375, 19.377809524536133, 2.6212692260742188, 46.4071044921875, 4.651885986328125, 32.14886474609375, 19.786270141601562, -0.8943405151367188, -1.1605873107910156, 3.662782669067383, 2.5836029052734375, -10.519134521484375, 30.047910690307617, -2.84490966796875, 14.997119903564453], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000283.npy"}
{"epoch": 0.42781557067271353, "step": 284, "batch_size": 64, "mean": 9.81103515625, "std": 13.188302040100098, "min": -22.163219451904297, "p10": -2.3497480392456054, "median": 7.703361511230469, "p90": 25.777886962890637, "max": 46.15095901489258, "pos_frac": 0.828125, "sample": [8.428733825683594, 3.3494720458984375, 7.128324508666992, 9.614776611328125, 38.90570831298828, 11.388534545898438, -1.5657596588134766, -6.9832000732421875, 30.761550903320312, 4.046722412109375, 2.916820526123047, 11.788433074951172, 2.1806869506835938, 10.61077880859375, 19.74158477783203, -4.715969085693359, 4.806058883666992, 19.431827545166016, -0.0989837646484375, 43.91229248046875, 2.1451797485351562, 15.550178527832031, 2.9992523193359375, -6.391963958740234, -3.832244873046875, 13.016342163085938, 2.2603225708007812, 5.316703796386719, 7.481365203857422, 0.5886650085449219, 18.524219512939453, 10.507118225097656, 17.41571807861328, 1.9155654907226562, 9.58837890625, 4.091217041015625, 18.342247009277344, 46.15095901489258, 22.75396728515625, -2.3820953369140625, 16.80939483642578, 38.306976318359375, 5.25030517578125, 4.977970123291016, -2.274271011352539, 0.5324802398681641, 21.48895263671875, 12.905084609985352, 10.8270263671875, 27.0738525390625, 1.8763084411621094, 7.925357818603516, 22.453292846679688, 7.010772705078125, 6.608161926269531, 9.370254516601562, 38.130104064941406, 9.587482452392578, 13.72197151184082, -21.801132202148438, 15.102806091308594, -22.163219451904297, 2.8634719848632812, -0.3666400909423828], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000284.npy"}
{"epoch": 0.4293272864701436, "step": 285, "batch_size": 64, "mean": 8.042612075805664, "std": 12.464913368225098, "min": -16.72644805908203, "p10": -8.287992095947265, "median": 5.647268295288086, "p90": 25.24848785400391, "max": 39.863677978515625, "pos_frac": 0.703125, "sample": [11.096504211425781, 4.909454345703125, 0.608673095703125, 21.266374588012695, -0.073272705078125, -4.790279388427734, -12.298797607421875, 2.6520309448242188, 11.734371185302734, -9.887491226196289, 15.329334259033203, 28.924549102783203, 19.587642669677734, 13.421707153320312, 23.372604370117188, -10.803346633911133, 14.332183837890625, 33.932708740234375, -0.3547019958496094, 5.214626312255859, 4.132289886474609, -0.4881629943847656, -8.701560974121094, -11.355716705322266, -0.3484840393066406, -9.970069885253906, -7.322998046875, 11.577194213867188, 25.6031494140625, -4.1707611083984375, -2.8422698974609375, 18.212661743164062, -16.72644805908203, 23.48314666748047, 24.420944213867188, 7.046581268310547, 10.318016052246094, 9.568359375, 16.295679092407227, 18.626014709472656, 10.646469116210938, 3.6384429931640625, 2.4173851013183594, 30.55803680419922, 21.13495635986328, 6.310661315917969, 5.09620475769043, 39.863677978515625, 1.4794158935546875, 1.9558906555175781, 10.107545852661133, 16.74530029296875, 4.025501251220703, 26.0291748046875, -0.1984710693359375, 0.2933826446533203, 27.99298095703125, 18.8125, -5.873710632324219, 3.34649658203125, -1.119598388671875, 6.0799102783203125, -0.30322265625, 10.155784606933594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000285.npy"}
{"epoch": 0.4308390022675737, "step": 286, "batch_size": 64, "mean": 12.79408073425293, "std": 13.704327583312988, "min": -28.551742553710938, "p10": -2.333849716186523, "median": 11.294827461242676, "p90": 33.57089958190918, "max": 38.874603271484375, "pos_frac": 0.84375, "sample": [15.042068481445312, -0.15089988708496094, 1.065774917602539, 12.715591430664062, 10.878273010253906, 17.850906372070312, 6.4283905029296875, 5.059579849243164, 11.28868293762207, -1.9057464599609375, 24.248897552490234, 14.595794677734375, 9.128837585449219, 25.99302864074707, 20.769126892089844, 37.1241455078125, 22.175006866455078, 11.300971984863281, -16.496131896972656, 8.307365417480469, 13.859840393066406, -2.517322540283203, 4.264263153076172, 33.57313919067383, 4.808086395263672, -7.090576171875, 25.028310775756836, 17.441082000732422, -3.6304473876953125, 19.356735229492188, 25.06354331970215, 20.25249481201172, 26.337757110595703, 22.662063598632812, 19.06484603881836, 38.874603271484375, 3.072711944580078, 24.0631160736084, -4.374500274658203, 20.40496826171875, 34.527618408203125, 33.565673828125, 4.528533935546875, -3.79388427734375, 11.068695068359375, 9.721187591552734, -1.620809555053711, 0.139312744140625, 35.364845275878906, 34.56125259399414, 25.9808349609375, 6.142059326171875, 5.903125762939453, 3.172666549682617, 0.5326728820800781, 14.21551513671875, 35.236915588378906, 6.202968597412109, -28.551742553710938, 30.145477294921875, 9.491083145141602, 2.4964561462402344, 13.514022827148438, 0.34229278564453125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000286.npy"}
{"epoch": 0.4323507180650038, "step": 287, "batch_size": 64, "mean": 13.006895065307617, "std": 14.160866737365723, "min": -18.18593406677246, "p10": -1.174398422241211, "median": 10.022680282592773, "p90": 35.37558517456055, "max": 45.568668365478516, "pos_frac": 0.828125, "sample": [16.788291931152344, 4.5627593994140625, 3.086498260498047, 19.828899383544922, 38.62834167480469, 33.07337188720703, -1.0973358154296875, 9.096668243408203, 12.502754211425781, -1.7235450744628906, 8.267372131347656, 9.760944366455078, 6.109708786010742, 13.125289916992188, 35.646881103515625, 39.72004699707031, 35.73548889160156, 5.348161697387695, 19.926219940185547, 16.692913055419922, 13.924972534179688, 10.46630859375, 20.018268585205078, 4.119861602783203, 6.787731170654297, 21.97802734375, -1.1127548217773438, -9.874687194824219, 18.856170654296875, 5.599277496337891, 25.00945281982422, 6.4041290283203125, -18.18593406677246, 1.4193954467773438, -0.4724292755126953, 25.864578247070312, 4.529087066650391, 2.4004783630371094, 10.284416198730469, 25.536094665527344, 25.14373016357422, -1.2008171081542969, 34.74256134033203, 31.75471305847168, -7.448997497558594, 1.5807571411132812, 45.568668365478516, 6.731380462646484, 24.409412384033203, 21.001747131347656, -0.5769519805908203, 2.0883331298828125, 12.87019157409668, 5.7175750732421875, -1.4629631042480469, 4.010597229003906, 36.689937591552734, 11.736282348632812, 10.716705322265625, 2.2522315979003906, -7.266441345214844, 39.85981750488281, 34.682861328125, 0.20775604248046875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000287.npy"}
{"epoch": 0.43386243386243384, "step": 288, "batch_size": 64, "mean": 8.433120727539062, "std": 14.346867561340332, "min": -22.285598754882812, "p10": -8.6780517578125, "median": 7.530705451965332, "p90": 23.322402954101562, "max": 45.519309997558594, "pos_frac": 0.734375, "sample": [13.152204513549805, 3.6332244873046875, -7.9981689453125, -18.523757934570312, 20.319778442382812, 3.8550338745117188, 9.843391418457031, -7.191780090332031, 22.238800048828125, 7.785058975219727, 6.8049163818359375, 8.83627700805664, 1.9813079833984375, 22.587753295898438, -11.745849609375, 21.661865234375, -15.698104858398438, 11.85831069946289, -6.0735931396484375, 11.088310241699219, -7.659599304199219, 31.312801361083984, 45.519309997558594, 41.14949035644531, -2.6605224609375, 34.13801574707031, 8.688909530639648, 19.623077392578125, 0.9443283081054688, 4.214607238769531, 41.39692687988281, -8.4996337890625, 11.09796142578125, 13.498172760009766, 23.373992919921875, 11.373456954956055, 2.8282012939453125, -2.2841949462890625, 4.8471832275390625, -3.2353363037109375, -9.789581298828125, -22.285598754882812, 27.80699920654297, 3.8767852783203125, 6.524864196777344, -12.148490905761719, 20.368995666503906, -8.7545166015625, 14.161272048950195, 20.70989990234375, 14.891014099121094, 1.248291015625, 19.6348876953125, 4.0146484375, -2.0908336639404297, -2.1328582763671875, 13.268409729003906, 16.07042694091797, 1.8257980346679688, 21.416908264160156, 1.525869369506836, 11.016059875488281, 23.2020263671875, 7.2763519287109375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000288.npy"}
{"epoch": 0.43537414965986393, "step": 289, "batch_size": 64, "mean": 9.283180236816406, "std": 14.750408172607422, "min": -21.408340454101562, "p10": -8.093709564208984, "median": 6.722736358642578, "p90": 28.92584037780762, "max": 52.28057861328125, "pos_frac": 0.734375, "sample": [8.079536437988281, 1.7882766723632812, 19.79669189453125, -7.138250350952148, -2.7048797607421875, -7.2624969482421875, -2.4060707092285156, 6.788299560546875, 17.630996704101562, 9.655067443847656, 6.657173156738281, 6.019695281982422, 14.32298469543457, 15.341869354248047, 15.276199340820312, 3.880064010620117, 15.760986328125, -16.055145263671875, -0.8857231140136719, 26.045692443847656, 5.111824035644531, 37.743080139160156, -0.449798583984375, -9.107986450195312, 18.898681640625, 6.935823440551758, 29.258296966552734, 1.1107292175292969, 1.6103019714355469, 39.028133392333984, 4.46099853515625, 6.8077850341796875, -8.449943542480469, -21.408340454101562, 43.34446716308594, -3.5725059509277344, 0.9075412750244141, 52.28057861328125, 6.368415832519531, 9.342842102050781, 8.441465377807617, 15.628761291503906, -4.16015625, 5.016908645629883, 10.972755432128906, -10.239191055297852, 10.702930450439453, -0.1230316162109375, 1.0438385009765625, -14.906463623046875, 22.851720809936523, 27.673675537109375, 6.2031707763671875, 13.232894897460938, 32.473731994628906, 29.943378448486328, 28.150108337402344, 21.789215087890625, 23.772811889648438, 4.269622802734375, 4.571033477783203, -8.749130249023438, 20.71185302734375, -5.960285186767578], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000289.npy"}
{"epoch": 0.436885865457294, "step": 290, "batch_size": 64, "mean": 12.105278968811035, "std": 14.651632308959961, "min": -29.317119598388672, "p10": -3.861013793945311, "median": 10.183073997497559, "p90": 32.75193328857422, "max": 40.073036193847656, "pos_frac": 0.8125, "sample": [25.82938003540039, 16.487586975097656, 18.991912841796875, 39.249176025390625, 12.345329284667969, 4.846921920776367, -6.06153678894043, 3.6236705780029297, 4.474945068359375, 25.259117126464844, 10.584991455078125, 3.7663707733154297, 1.8334884643554688, -9.785789489746094, 24.141921997070312, 4.4765472412109375, -5.370086669921875, -2.2963104248046875, -0.3289833068847656, 4.200950622558594, 13.21833610534668, 27.651947021484375, 40.073036193847656, 24.528732299804688, 4.074623107910156, 1.0179061889648438, -0.6296348571777344, -4.5316009521484375, 10.116666793823242, -11.373146057128906, 25.269447326660156, 30.79186248779297, 17.32585334777832, 5.8562164306640625, 1.74169921875, 37.68596649169922, 39.10466766357422, 7.843864440917969, 7.002204895019531, -2.124603271484375, 32.15602111816406, 37.24876403808594, 2.394489288330078, -29.317119598388672, 4.544044494628906, -0.11284637451171875, 0.27407264709472656, 36.698211669921875, 20.64209747314453, 27.327497482299805, 5.038379669189453, 10.249481201171875, 4.985382080078125, 6.807350158691406, -9.968700408935547, 29.94598388671875, 15.056861877441406, 33.00732421875, 14.429046630859375, 11.349403381347656, 15.668010711669922, 28.61334228515625, 15.178665161132812, 11.608413696289062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000290.npy"}
{"epoch": 0.4383975812547241, "step": 291, "batch_size": 64, "mean": 11.573490142822266, "std": 16.0311222076416, "min": -27.255556106567383, "p10": -8.454040527343746, "median": 10.644218444824219, "p90": 33.78451423645021, "max": 48.03790283203125, "pos_frac": 0.734375, "sample": [29.9459228515625, 31.657974243164062, -4.8637237548828125, -14.949798583984375, 21.965789794921875, 25.724655151367188, -9.863998413085938, 0.14500808715820312, 38.593963623046875, -0.396514892578125, 17.71066665649414, 26.79168128967285, -11.267719268798828, -0.5941181182861328, 28.170289993286133, 25.360183715820312, -3.3819961547851562, -3.1048965454101562, 20.203048706054688, 12.121147155761719, 8.524452209472656, 0.4934368133544922, 17.4923095703125, 15.533256530761719, 17.598007202148438, 16.700767517089844, 42.76470947265625, 0.7644138336181641, 37.32063674926758, 9.838302612304688, 10.362846374511719, 36.41156005859375, 2.6948318481445312, 5.176174163818359, -5.1641387939453125, -0.9331531524658203, 27.361839294433594, -3.350921630859375, 26.016159057617188, 10.925590515136719, 7.235786437988281, 37.922279357910156, -27.255556106567383, 8.454360961914062, 28.519065856933594, 14.145339965820312, 16.40509033203125, 29.146648406982422, 13.211187362670898, 48.03790283203125, -12.704463958740234, 6.4734344482421875, -11.139644622802734, 12.01397705078125, -12.393617630004883, 2.526531219482422, 3.22979736328125, 34.69588851928711, 7.866127014160156, 11.474742889404297, 18.44823455810547, 7.135049819946289, -4.841644287109375, -4.401771545410156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000291.npy"}
{"epoch": 0.4399092970521542, "step": 292, "batch_size": 64, "mean": 9.444466590881348, "std": 13.717903137207031, "min": -30.515655517578125, "p10": -7.30816478729248, "median": 10.018341064453125, "p90": 24.569301414489747, "max": 50.281036376953125, "pos_frac": 0.78125, "sample": [-30.515655517578125, -5.8685150146484375, 14.396535873413086, 0.949554443359375, 30.26810646057129, 20.10577392578125, 5.9034271240234375, 50.281036376953125, 14.427389144897461, 0.6443634033203125, 2.1437301635742188, 3.243204116821289, 13.139867782592773, 23.51941680908203, -8.396469116210938, -1.600067138671875, 11.3475341796875, 19.106292724609375, 20.16973876953125, -1.5701751708984375, 10.109291076660156, 18.834964752197266, 15.253707885742188, -7.75883674621582, 6.8144073486328125, 2.8350143432617188, 2.7318649291992188, 25.114646911621094, -0.7607574462890625, 24.178388595581055, 2.6604995727539062, 13.974695205688477, 3.635408401489258, 0.8590087890625, -0.6807460784912109, -8.601600646972656, 24.736835479736328, -14.396827697753906, 1.5151596069335938, 12.999618530273438, -3.2565155029296875, 18.659713745117188, 18.176523208618164, -6.878030776977539, 38.193580627441406, 15.581079483032227, 10.41864013671875, 10.40578842163086, 7.945585250854492, 9.927391052246094, 17.83074951171875, 17.627288818359375, 2.6620254516601562, 19.524715423583984, 12.255264282226562, 3.2053565979003906, 14.45583724975586, 13.27761459350586, 39.527099609375, 5.888343811035156, 7.151824951171875, -11.628326416015625, -7.4925079345703125, 35.23699188232422], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000292.npy"}
{"epoch": 0.4414210128495843, "step": 293, "batch_size": 64, "mean": 13.26816463470459, "std": 16.062456130981445, "min": -35.121192932128906, "p10": -6.790967369079588, "median": 14.235130310058594, "p90": 33.65303726196291, "max": 48.178253173828125, "pos_frac": 0.75, "sample": [2.6933326721191406, 8.11578369140625, 12.70151138305664, 35.222862243652344, 21.49665069580078, 16.541263580322266, -8.315568923950195, 8.79606819152832, 11.251045227050781, 24.132431030273438, 30.058578491210938, -1.193338394165039, 26.893203735351562, 10.668952941894531, 22.861038208007812, 3.3127822875976562, 26.040491104125977, 20.418411254882812, 26.03364372253418, 21.9432373046875, 19.920455932617188, -7.245332717895508, -5.730781555175781, -7.89569091796875, 21.91615104675293, 4.390998840332031, 20.450328826904297, -8.29690933227539, 6.3028717041015625, 48.178253173828125, 2.3059940338134766, 39.88092803955078, 40.248016357421875, -11.32590103149414, 11.457412719726562, -2.4780044555664062, 29.909761428833008, -35.121192932128906, -1.9400978088378906, -0.8252792358398438, 15.719451904296875, 41.832305908203125, -15.02633285522461, -4.131740570068359, -5.318450927734375, -2.891326904296875, 15.862785339355469, 19.617971420288086, 44.57305908203125, 26.379295349121094, 25.903717041015625, 12.750808715820312, 17.104890823364258, 6.992527008056641, 20.543006896972656, -5.5258331298828125, 19.776611328125, 8.147705078125, 10.388679504394531, 35.193519592285156, 26.82049560546875, 9.86788558959961, 20.06665802001953, 20.740493774414062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000293.npy"}
{"epoch": 0.4429327286470144, "step": 294, "batch_size": 64, "mean": 11.148218154907227, "std": 14.37961483001709, "min": -12.617134094238281, "p10": -3.38685188293457, "median": 8.0961332321167, "p90": 32.37150535583496, "max": 59.043861389160156, "pos_frac": 0.765625, "sample": [32.87853240966797, 11.250011444091797, 21.318588256835938, 8.591796875, 31.18844223022461, 8.765403747558594, 37.60457992553711, -2.8026885986328125, -8.307769775390625, 6.0760955810546875, 16.66338348388672, -12.617134094238281, -6.257822036743164, 25.056777954101562, 14.103849411010742, 16.31348419189453, 34.347328186035156, 9.962890625, 28.055301666259766, 14.194999694824219, 6.965675354003906, 2.0904693603515625, 5.210762023925781, 16.80864715576172, -2.8806533813476562, 42.48370361328125, -8.349063873291016, 12.645853042602539, -3.8292274475097656, 59.043861389160156, 22.51287841796875, -1.8016700744628906, -0.08331108093261719, 39.1029052734375, -2.238340377807617, 1.0106964111328125, 2.3205413818359375, -2.7063255310058594, 0.421234130859375, 15.788917541503906, 3.9829025268554688, 1.455963134765625, 23.785554885864258, 2.5100460052490234, 13.071271896362305, 14.67474365234375, 15.377386093139648, 24.295196533203125, 2.6237926483154297, 34.67777633666992, -6.678035736083984, 30.19632339477539, 22.058563232421875, 2.186370849609375, 3.5895767211914062, 7.600469589233398, 5.173431396484375, -1.5954952239990234, 6.7207489013671875, 0.7098312377929688, 10.8734130859375, -3.6037940979003906, 9.328933715820312, -0.43267822265625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000294.npy"}
{"epoch": 0.4444444444444444, "step": 295, "batch_size": 64, "mean": 9.331281661987305, "std": 13.942049980163574, "min": -15.514404296875, "p10": -6.784218215942382, "median": 6.2047576904296875, "p90": 28.429844665527344, "max": 47.43260955810547, "pos_frac": 0.75, "sample": [27.493759155273438, -3.0408859252929688, -2.5163726806640625, 0.396514892578125, 10.071643829345703, 18.24420166015625, 24.152374267578125, 2.2924537658691406, 6.18621826171875, 7.7105560302734375, 47.43260955810547, -0.46734619140625, 41.996795654296875, 5.432319641113281, 16.533859252929688, 11.2977294921875, 6.223297119140625, 7.748847961425781, -6.4412384033203125, 15.774364471435547, 25.446773529052734, -7.445249557495117, 1.683624267578125, 9.860555648803711, 29.7130126953125, -4.50885009765625, 5.632051467895508, -8.697174072265625, 28.351242065429688, 5.887336730957031, -8.73162841796875, -2.345632553100586, 46.30670928955078, 15.71435546875, 12.213615417480469, -5.953712463378906, 14.552421569824219, 8.102249145507812, 17.611156463623047, 32.265865325927734, 12.970626831054688, -15.514404296875, 4.461833953857422, 4.942840576171875, 19.677078247070312, 28.463531494140625, -6.931209564208984, -8.589088439941406, -14.151769638061523, 10.953086853027344, 2.2289562225341797, 4.384340286254883, 0.11310577392578125, 4.052825927734375, 3.8117847442626953, 4.658870697021484, 18.57612419128418, 4.1570892333984375, 6.453826904296875, -4.9990692138671875, 16.333541870117188, 21.616012573242188, 31.598045349121094, -4.2463226318359375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000295.npy"}
{"epoch": 0.4459561602418745, "step": 296, "batch_size": 64, "mean": 11.890068054199219, "std": 15.412466049194336, "min": -17.299346923828125, "p10": -8.273748397827148, "median": 10.545616149902344, "p90": 33.449926757812506, "max": 46.93090057373047, "pos_frac": 0.796875, "sample": [24.82421875, 34.637969970703125, 21.57733154296875, 5.859922409057617, -2.2730255126953125, -1.9373016357421875, 3.216306686401367, 39.880584716796875, 2.9081573486328125, 15.021369934082031, 17.079036712646484, 18.091197967529297, -9.944740295410156, 40.07838439941406, 5.47418212890625, 14.0469970703125, 25.239654541015625, 15.46030044555664, 9.92205810546875, 13.870506286621094, 6.095855712890625, 11.591327667236328, 23.662521362304688, -17.299346923828125, -16.223983764648438, -8.184013366699219, 3.511249542236328, 12.029436111450195, 10.778877258300781, 10.312355041503906, 4.709747314453125, -8.312206268310547, 9.718231201171875, 0.942474365234375, 41.48081970214844, 2.7054595947265625, 1.8312454223632812, -4.736042022705078, 23.553281784057617, -9.083648681640625, 10.155052185058594, 9.456100463867188, 45.35237121582031, -14.933868408203125, 28.10089111328125, 17.698654174804688, 18.851749420166016, 20.473608016967773, 18.72987937927246, 26.933135986328125, 21.801589965820312, 2.1107025146484375, 29.80807876586914, 3.146902084350586, 46.93090057373047, -3.2813568115234375, 3.1162948608398438, 32.53228759765625, 0.5522613525390625, -12.531219482421875, -5.347801208496094, 33.84320068359375, 20.212631225585938, 15.135543823242188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000296.npy"}
{"epoch": 0.4474678760393046, "step": 297, "batch_size": 64, "mean": 12.21376895904541, "std": 13.04709243774414, "min": -15.33082389831543, "p10": -1.7261344909667966, "median": 13.53315544128418, "p90": 29.52329730987549, "max": 40.33656311035156, "pos_frac": 0.828125, "sample": [20.42391586303711, -8.96160888671875, 5.299530029296875, 19.224639892578125, 4.929656982421875, 21.10338592529297, 4.246665954589844, 29.663267135620117, 0.4939727783203125, -15.33082389831543, 14.409862518310547, 7.3486175537109375, 14.536178588867188, 13.32497787475586, 4.874616622924805, 34.57769775390625, 14.308258056640625, 13.7413330078125, 3.1418285369873047, -1.466064453125, 21.474227905273438, -0.1272430419921875, -10.25775146484375, -11.250823974609375, -6.079133987426758, 34.23597717285156, 11.925827026367188, -0.9756622314453125, 26.61860466003418, 1.324737548828125, 24.73880958557129, 40.33656311035156, 1.2273712158203125, -1.339212417602539, 1.7201766967773438, 13.742887496948242, 3.652130126953125, 0.12469482421875, 27.933853149414062, 18.1036376953125, 18.145660400390625, 2.3001708984375, 16.888504028320312, 20.575143814086914, 5.309093475341797, 24.52306365966797, 6.9376373291015625, 0.16170215606689453, -1.8375930786132812, 29.196701049804688, 13.87405014038086, 22.608112335205078, 24.796581268310547, 23.406143188476562, -6.273052215576172, 8.6900634765625, 24.51055145263672, 30.680450439453125, 0.08889007568359375, 32.851470947265625, 8.344039916992188, 22.514083862304688, 30.715843200683594, 25.65428924560547], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000297.npy"}
{"epoch": 0.4489795918367347, "step": 298, "batch_size": 64, "mean": 9.47756290435791, "std": 14.639466285705566, "min": -23.159433364868164, "p10": -8.567411804199219, "median": 8.54611587524414, "p90": 31.006621742248544, "max": 46.56217575073242, "pos_frac": 0.734375, "sample": [8.269401550292969, -21.378965377807617, 46.56217575073242, 8.756500244140625, 27.73189926147461, 2.7413482666015625, 19.5430965423584, 2.079681396484375, -8.572891235351562, 16.347736358642578, 34.89351272583008, 32.14873504638672, -8.55462646484375, 1.519989013671875, -1.004669189453125, 15.027347564697266, 13.874725341796875, -0.6073646545410156, 22.2139835357666, 16.95465087890625, 0.9184150695800781, -5.453987121582031, 22.48584747314453, 16.55694580078125, 9.50909423828125, 8.4063720703125, -12.543678283691406, 15.687429428100586, 17.132408142089844, 31.851041793823242, 4.327581405639648, 36.303863525390625, 2.9246673583984375, 18.316635131835938, 9.929271697998047, 16.69387435913086, 6.010366439819336, 35.35218811035156, 6.914857864379883, 3.859588623046875, 3.2195816040039062, -5.9081878662109375, -9.039388656616211, 10.4268798828125, 5.577028274536133, 26.58612060546875, 29.03630828857422, 2.1732559204101562, -3.176616668701172, 10.858711242675781, 8.685859680175781, -2.1876907348632812, -6.75274658203125, -9.730850219726562, -2.9898109436035156, 16.150840759277344, -8.161861419677734, 16.773452758789062, -8.913864135742188, -23.159433364868164, 27.730506896972656, 17.828815460205078, 5.316188812255859, 32.49187469482422], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000298.npy"}
{"epoch": 0.4504913076341648, "step": 299, "batch_size": 64, "mean": 9.262445449829102, "std": 15.870047569274902, "min": -22.29150390625, "p10": -9.687292861938477, "median": 7.332113265991211, "p90": 28.59656982421875, "max": 54.580169677734375, "pos_frac": 0.734375, "sample": [21.111980438232422, 16.88162612915039, -1.7001323699951172, -10.55755615234375, -2.9733219146728516, 26.265357971191406, 8.775344848632812, 6.726665496826172, 42.79896545410156, 0.5515079498291016, -8.608024597167969, 8.042572021484375, -16.966873168945312, 2.0991249084472656, -22.29150390625, 2.6301422119140625, 2.664398193359375, 27.396066665649414, 28.650970458984375, -5.189094543457031, 29.852188110351562, -2.422290802001953, 5.4595794677734375, 22.384693145751953, -5.1057586669921875, -9.764923095703125, -0.37314605712890625, 3.043598175048828, 54.580169677734375, 0.9880218505859375, 25.970123291015625, 11.533729553222656, 18.064498901367188, -19.700130462646484, 4.159271240234375, 7.93756103515625, 11.949356079101562, 35.7249870300293, 10.0869140625, 12.180953979492188, 12.463241577148438, -5.673622131347656, 20.803184509277344, 21.111360549926758, 49.59834289550781, -9.750701904296875, 10.782094955444336, 28.469635009765625, 18.309722900390625, 16.58185577392578, 2.103513717651367, 2.0790748596191406, 15.94676399230957, -3.2178192138671875, 2.695159912109375, 19.644001007080078, -9.724945068359375, 17.341350555419922, 0.025552749633789062, 3.0911941528320312, 0.2032489776611328, 10.077407836914062, -9.599437713623047, 36.57868194580078], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000299.npy"}
{"epoch": 0.4520030234315949, "step": 300, "batch_size": 64, "mean": 10.336711883544922, "std": 13.748811721801758, "min": -18.73551368713379, "p10": -5.263933753967285, "median": 10.663545608520508, "p90": 25.585789489746094, "max": 43.9658088684082, "pos_frac": 0.78125, "sample": [5.740348815917969, 30.750141143798828, 1.6576690673828125, 25.666168212890625, 6.035221099853516, 3.1621780395507812, 12.695671081542969, 21.341705322265625, 0.2896595001220703, -18.06585693359375, 36.536766052246094, 22.787330627441406, 21.31281089782715, 17.230148315429688, 43.9658088684082, 16.03011703491211, 27.622207641601562, -7.2594146728515625, -4.673847198486328, 9.682533264160156, 2.535737991333008, 13.881813049316406, 18.074966430664062, -5.271749496459961, -7.952629089355469, 13.613945007324219, 13.102344512939453, -5.153228759765625, 8.871414184570312, 15.025716781616211, 3.671295166015625, -13.94305419921875, 15.334091186523438, 1.1819744110107422, -5.245697021484375, 11.64455795288086, 5.146467208862305, 43.11628723144531, 5.0316619873046875, 25.398239135742188, 23.50433349609375, 19.34445571899414, -3.6633453369140625, 33.04176330566406, 14.551467895507812, -4.310791015625, 5.321552276611328, 17.326419830322266, 0.197174072265625, 21.383752822875977, 18.507064819335938, 5.430639266967773, 6.6963958740234375, 3.1022872924804688, -2.0866928100585938, -10.46682357788086, 19.447616577148438, 20.926498413085938, 22.05478286743164, -18.73551368713379, 3.1780929565429688, 16.186195373535156, 24.107093811035156, -4.066375732421875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000300.npy"}
{"epoch": 0.45351473922902497, "step": 301, "batch_size": 64, "mean": 9.327051162719727, "std": 16.233137130737305, "min": -27.080886840820312, "p10": -9.1082218170166, "median": 7.304648399353027, "p90": 30.90883369445801, "max": 59.95372009277344, "pos_frac": 0.71875, "sample": [20.71204376220703, 15.230117797851562, 14.542671203613281, 6.309329986572266, 11.827911376953125, -6.433616638183594, 22.95496368408203, 30.556758880615234, 20.289270401000977, 7.09797477722168, 6.566469192504883, 5.271932601928711, 13.308570861816406, -6.417118072509766, -17.386215209960938, 6.50201416015625, 4.6343841552734375, 9.480926513671875, 19.933998107910156, 1.3488655090332031, -10.036605834960938, 8.121917724609375, 38.62705612182617, -7.4091339111328125, 20.08573341369629, -0.1617584228515625, 1.3252315521240234, 31.059722900390625, 14.958602905273438, -8.723831176757812, -9.272960662841797, -9.8341064453125, 24.92194366455078, 30.189056396484375, -7.49395751953125, 5.063079833984375, -8.022109985351562, 59.95372009277344, -17.32319450378418, 6.532077789306641, -0.7456073760986328, 8.542661666870117, 10.74392318725586, -19.299938201904297, 3.066844940185547, 26.96092414855957, 24.060121536254883, 7.511322021484375, -2.6790771484375, 35.63114929199219, 10.661218643188477, 32.06767272949219, 14.791149139404297, -0.9677467346191406, -7.2789459228515625, 16.17514419555664, 26.216629028320312, 34.667137145996094, 7.655876159667969, 1.6350784301757812, 1.9426956176757812, -27.080886840820312, 38.59612274169922, 5.166084289550781], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000301.npy"}
{"epoch": 0.455026455026455, "step": 302, "batch_size": 64, "mean": 6.528234481811523, "std": 13.667444229125977, "min": -26.38482666015625, "p10": -6.886360931396484, "median": 5.217309951782227, "p90": 25.866193962097174, "max": 47.3267822265625, "pos_frac": 0.734375, "sample": [8.774192810058594, 37.25785446166992, -6.5198211669921875, 47.3267822265625, 12.666427612304688, 3.9537601470947266, 33.65108871459961, -7.043449401855469, 30.064727783203125, 8.374824523925781, 7.952262878417969, 9.867603302001953, 3.2698192596435547, 2.7728118896484375, 15.21854019165039, 29.988428115844727, 3.0956573486328125, 1.2591743469238281, 3.4481887817382812, -1.5691375732421875, 6.010395050048828, -11.940361022949219, -18.934066772460938, 12.5753173828125, 19.900558471679688, -10.179601669311523, 6.401924133300781, 10.814109802246094, -1.5390357971191406, -2.099029541015625, -3.72601318359375, 16.468765258789062, 12.605224609375, 2.8002567291259766, -5.822790145874023, 21.185894012451172, 26.3944091796875, -3.75927734375, 5.727142333984375, 7.395652770996094, 10.117820739746094, 24.633691787719727, 0.09263801574707031, 1.166839599609375, 13.69927978515625, 4.988311767578125, -23.433048248291016, -3.0223312377929688, 10.635833740234375, 3.3404312133789062, 18.964502334594727, 0.542205810546875, 0.5503311157226562, 0.2732276916503906, 5.446308135986328, -12.785768508911133, -6.423980712890625, 29.566268920898438, -3.0629425048828125, 0.5620384216308594, -26.38482666015625, 11.222381591796875, 10.165580749511719, 12.862981796264648], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000302.npy"}
{"epoch": 0.4565381708238851, "step": 303, "batch_size": 64, "mean": 13.062989234924316, "std": 16.607994079589844, "min": -16.722122192382812, "p10": -10.434121131896969, "median": 14.75021743774414, "p90": 36.88464393615723, "max": 47.95691680908203, "pos_frac": 0.765625, "sample": [-4.2356109619140625, 20.409446716308594, 15.259689331054688, 13.271570205688477, 47.30712890625, 24.943618774414062, 24.442588806152344, -7.87696647644043, 24.37355613708496, 14.028778076171875, 31.180274963378906, -14.129867553710938, 7.696880340576172, -2.509103775024414, -16.722122192382812, -12.058265686035156, -13.990646362304688, 22.648162841796875, -14.005653381347656, -0.53350830078125, 0.09094810485839844, 8.88160514831543, 37.35748291015625, 19.59537124633789, -11.530044555664062, 14.91815185546875, 29.074533462524414, -11.997444152832031, -4.361995697021484, 16.205535888671875, 14.582283020019531, 5.173065185546875, 7.683979034423828, 1.4964141845703125, 5.134788513183594, 2.398357391357422, 39.08684158325195, 4.144065856933594, 21.615543365478516, 24.857254028320312, 5.1874542236328125, 19.750198364257812, 15.451026916503906, 38.39750671386719, 1.2036285400390625, 27.130950927734375, 45.079490661621094, 9.937705993652344, 17.275665283203125, 31.43688201904297, -5.7079925537109375, 1.275033950805664, -4.352321624755859, 17.110198974609375, 32.78284454345703, 18.1302490234375, 8.960077285766602, 16.150619506835938, 21.392494201660156, 35.78135299682617, 47.95691680908203, -6.593725204467773, 18.171142578125, 40.21723937988281], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000303.npy"}
{"epoch": 0.4580498866213152, "step": 304, "batch_size": 64, "mean": 12.898505210876465, "std": 12.441407203674316, "min": -22.810867309570312, "p10": -1.4864833831787105, "median": 13.423518180847168, "p90": 30.685760116577153, "max": 41.508750915527344, "pos_frac": 0.84375, "sample": [29.687427520751953, 0.7841720581054688, 20.96978759765625, 23.177444458007812, 3.6439285278320312, 13.863636016845703, 14.5938720703125, 14.62784194946289, -0.6310520172119141, 13.059370040893555, 6.7407073974609375, 25.06658935546875, 25.120941162109375, 7.9341888427734375, 23.930042266845703, 10.617889404296875, -2.3504409790039062, 12.469486236572266, 32.648460388183594, -22.810867309570312, 1.3029632568359375, 17.14897918701172, 11.064224243164062, 9.719362258911133, 6.774879455566406, 5.050102233886719, 1.4939899444580078, 13.787666320800781, 17.5914306640625, 27.841100692749023, 22.1812744140625, 3.6751232147216797, 32.872589111328125, 15.791648864746094, 6.325225830078125, -6.79681396484375, 18.748008728027344, 14.999931335449219, 6.912448883056641, 3.201171875, -0.75421142578125, -3.6688976287841797, 41.508750915527344, 15.0450439453125, 4.79454231262207, 3.0126113891601562, 14.19082260131836, 31.80010986328125, 23.913009643554688, 36.71263122558594, -2.3348846435546875, 0.9892768859863281, 31.113616943359375, -1.0187530517578125, 20.70440673828125, -6.423908233642578, 10.617036819458008, 37.44335174560547, 22.062454223632812, 13.033279418945312, 15.457199096679688, -1.6869392395019531, 13.953460693359375, 22.21164321899414], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000304.npy"}
{"epoch": 0.4595616024187453, "step": 305, "batch_size": 64, "mean": 9.658401489257812, "std": 16.37120819091797, "min": -34.64451599121094, "p10": -8.436299133300782, "median": 7.413021087646484, "p90": 31.75583572387696, "max": 43.905853271484375, "pos_frac": 0.703125, "sample": [24.897968292236328, 3.1796188354492188, -2.0630416870117188, -14.007408142089844, -1.7955398559570312, 1.5355415344238281, 41.0738639831543, -11.139408111572266, 15.8865966796875, 30.44611358642578, 5.64996337890625, 33.4590950012207, -18.8248291015625, 14.659626007080078, -3.7246551513671875, -7.008771896362305, 3.472553253173828, 3.46795654296875, -8.456069946289062, 23.022415161132812, 1.3278160095214844, 18.833545684814453, 6.918212890625, 10.968475341796875, 11.386802673339844, 12.496429443359375, 19.361465454101562, -6.181312561035156, 20.933990478515625, -34.64451599121094, 1.942840576171875, 14.71605110168457, 30.355892181396484, 21.154136657714844, -5.1168060302734375, 6.723121643066406, 43.905853271484375, -0.80938720703125, 7.907829284667969, -1.7883453369140625, 2.997669219970703, 6.725166320800781, 32.30242156982422, 41.41069030761719, 28.55376434326172, -0.9654006958007812, 4.4329986572265625, -12.832412719726562, 29.98389434814453, 15.949748992919922, 14.802566528320312, 8.899932861328125, 9.036518096923828, 30.48046875, 24.0701904296875, 33.44342041015625, -8.390167236328125, 13.71099853515625, -22.250999450683594, 21.634511947631836, -4.754932403564453, 4.230806350708008, 33.55659484863281, -3.0144805908203125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000305.npy"}
{"epoch": 0.46107331821617537, "step": 306, "batch_size": 64, "mean": 8.633684158325195, "std": 12.969161987304688, "min": -20.099769592285156, "p10": -4.4783639907836905, "median": 6.0169219970703125, "p90": 25.61190757751465, "max": 56.74153137207031, "pos_frac": 0.796875, "sample": [40.287452697753906, 17.929100036621094, 2.168508529663086, 27.578414916992188, 4.5572509765625, 6.325141906738281, 2.3207359313964844, 11.48153305053711, 5.594110488891602, 18.221263885498047, 25.70635223388672, 10.186965942382812, -14.589553833007812, 7.577201843261719, -1.5066261291503906, -20.099769592285156, 16.215866088867188, 5.615631103515625, 12.312305450439453, 4.215763092041016, 8.116203308105469, 28.625831604003906, 13.456840515136719, 1.9036235809326172, 11.33311653137207, 6.2615814208984375, -1.920858383178711, 17.76140594482422, 3.3395233154296875, -0.624267578125, -15.704452514648438, 26.46453857421875, -3.1121292114257812, 1.0824832916259766, 9.863029479980469, 16.7020263671875, 25.408573150634766, 25.699050903320312, -0.384796142578125, 5.346469879150391, 4.74583625793457, 3.798633575439453, 5.022518157958984, 15.092262268066406, -3.878997802734375, 0.7374954223632812, -4.735235214233398, 22.316299438476562, 2.408233642578125, 18.451370239257812, 18.501358032226562, 17.778135299682617, 3.38983154296875, 5.7722625732421875, 7.4057159423828125, 2.0063953399658203, 4.4398651123046875, -13.387733459472656, 13.691337585449219, 56.74153137207031, -12.598838806152344, 11.556396484375, 16.33673095703125, -4.751070022583008], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000306.npy"}
{"epoch": 0.46258503401360546, "step": 307, "batch_size": 64, "mean": 7.749030113220215, "std": 16.54621696472168, "min": -19.406333923339844, "p10": -10.639440917968749, "median": 4.093682289123535, "p90": 29.41520233154297, "max": 62.29023742675781, "pos_frac": 0.5625, "sample": [28.710128784179688, -2.5128936767578125, 7.2326202392578125, -5.0563507080078125, -18.433006286621094, 14.441017150878906, 62.29023742675781, -1.1768875122070312, 29.717376708984375, 27.53265380859375, -3.7532196044921875, 15.264785766601562, 9.054706573486328, 40.224090576171875, -19.406333923339844, 17.110397338867188, -1.4631500244140625, -1.08056640625, 15.463630676269531, -7.357818603515625, 8.848739624023438, -15.298803329467773, -1.2803573608398438, -14.143325805664062, 8.549880981445312, -14.476181030273438, -9.250495910644531, 13.813304901123047, 26.958999633789062, -3.0500755310058594, 0.9639511108398438, 14.914804458618164, -6.428516387939453, -7.647861480712891, -12.975227355957031, 17.965423583984375, -3.4505233764648438, -0.4735260009765625, 15.197160720825195, -6.81298828125, 6.340341567993164, -1.6512832641601562, 0.9011802673339844, 30.44139862060547, 31.329090118408203, -4.806404113769531, 6.773929595947266, 21.692665100097656, 33.65542984008789, 19.40265655517578, -9.165252685546875, 28.10955810546875, 26.232093811035156, 1.8470230102539062, 20.320587158203125, 23.589256286621094, 0.5292205810546875, 31.990097045898438, -8.869361877441406, -3.9608230590820312, -11.234703063964844, 21.438152313232422, 17.202377319335938, -4.895122528076172], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000307.npy"}
{"epoch": 0.46409674981103555, "step": 308, "batch_size": 64, "mean": 10.280960083007812, "std": 14.93410587310791, "min": -28.048072814941406, "p10": -8.84818229675293, "median": 8.482166290283203, "p90": 32.23186473846436, "max": 45.85162353515625, "pos_frac": 0.734375, "sample": [31.59117317199707, 15.119625091552734, 12.33447265625, -6.807643890380859, 20.337615966796875, 23.175872802734375, -0.458526611328125, 14.1993408203125, 5.06597900390625, -1.8843841552734375, 37.398658752441406, 10.249603271484375, 34.40668487548828, 11.448787689208984, 1.1944122314453125, -10.978069305419922, 6.213935852050781, 19.942489624023438, 23.72332191467285, 9.183826446533203, 19.048736572265625, -9.96035385131836, 34.81957244873047, 22.158666610717773, 17.265884399414062, 25.772445678710938, 4.421640396118164, 5.341514587402344, 15.70794677734375, -11.757461547851562, 33.939796447753906, -2.1603546142578125, 6.618865966796875, -9.195804595947266, 16.951820373535156, 2.1751251220703125, 19.980430603027344, 3.1277847290039062, 8.957809448242188, -28.048072814941406, 5.9192352294921875, 5.951244354248047, 29.699234008789062, 32.71737289428711, 4.969150543212891, 8.006523132324219, 16.101234436035156, -8.037063598632812, 18.497800827026367, -1.2800788879394531, 28.03481101989746, -0.8937530517578125, -1.7011260986328125, 27.01611328125, 45.85162353515625, -1.8775863647460938, 32.506446838378906, -15.603046417236328, -10.33213996887207, 5.365583419799805, 2.6212596893310547, -7.291748046875, 9.522697448730469, 1.5945301055908203], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000308.npy"}
{"epoch": 0.4656084656084656, "step": 309, "batch_size": 64, "mean": 9.116422653198242, "std": 14.70263957977295, "min": -23.667724609375, "p10": -8.36365509033203, "median": 8.9345121383667, "p90": 26.93173065185547, "max": 45.58414840698242, "pos_frac": 0.71875, "sample": [14.784732818603516, 45.58414840698242, 4.083274841308594, 30.265972137451172, 16.55280113220215, 2.6947708129882812, 17.859241485595703, -12.395538330078125, 9.822982788085938, 0.8654632568359375, 26.379547119140625, -1.274728775024414, -10.42922592163086, -5.44154167175293, 8.265090942382812, -18.90871810913086, 8.126928329467773, 27.168380737304688, -5.323005676269531, -3.3244800567626953, 4.227054595947266, 25.53274154663086, 31.327917098999023, -4.148859024047852, 22.47539520263672, 15.799522399902344, 0.20911216735839844, 8.48237419128418, -4.523109436035156, -0.8780899047851562, 4.386228561401367, -11.2293701171875, -0.8775787353515625, 12.820037841796875, 9.386650085449219, 10.68276596069336, 13.827808380126953, 9.753429412841797, -2.318033218383789, 21.18169403076172, -5.97796630859375, 24.02728271484375, 24.733966827392578, 18.500141143798828, 40.32743835449219, -18.009552001953125, 25.719863891601562, 15.38522720336914, 3.775388717651367, 16.01306915283203, -5.0136871337890625, 3.7938995361328125, 10.869522094726562, 14.478620529174805, 38.703285217285156, 4.306018829345703, 11.681068420410156, 19.16817855834961, -9.386093139648438, -23.667724609375, 5.479034423828125, 0.6917572021484375, 9.585718154907227, 36.79277038574219], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000309.npy"}
{"epoch": 0.4671201814058957, "step": 310, "batch_size": 64, "mean": 5.901206970214844, "std": 13.409224510192871, "min": -14.453399658203125, "p10": -10.600378036499023, "median": 3.38204288482666, "p90": 22.718174171447767, "max": 40.30322265625, "pos_frac": 0.609375, "sample": [17.523422241210938, -0.77825927734375, 12.71419906616211, 7.493021011352539, 10.723220825195312, -12.136329650878906, -6.512565612792969, 15.509849548339844, 24.077430725097656, 3.2440662384033203, -4.467275619506836, -13.96141242980957, 13.959159851074219, 34.57173156738281, 1.8644256591796875, -1.9238548278808594, 1.2624778747558594, 4.2098846435546875, 40.151397705078125, -8.301040649414062, 2.446331024169922, -2.6821460723876953, -8.430885314941406, 40.30322265625, -4.066337585449219, -9.863605499267578, 17.998130798339844, -4.549468994140625, 3.52001953125, 16.220914840698242, -13.773597717285156, 13.916152954101562, 4.520294189453125, 14.4837646484375, -1.143463134765625, 6.370635986328125, 1.0682373046875, 12.744071960449219, 11.000045776367188, 11.980056762695312, -1.2937393188476562, 10.977790832519531, -12.084686279296875, -14.453399658203125, 18.091773986816406, 7.550987243652344, -1.3431854248046875, 16.772808074951172, -0.89385986328125, -5.854248046875, -0.9769821166992188, 30.03400230407715, 14.53913688659668, -10.9161376953125, 0.5785083770751953, -5.479639053344727, 17.610427856445312, 33.39711380004883, 19.54657554626465, -9.751029968261719, -12.635856628417969, 26.49193572998047, 1.2240886688232422, 5.2589111328125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000310.npy"}
{"epoch": 0.46863189720332576, "step": 311, "batch_size": 64, "mean": 9.02243423461914, "std": 15.172526359558105, "min": -31.720504760742188, "p10": -7.903351402282713, "median": 7.5512542724609375, "p90": 26.486651611328128, "max": 63.01824951171875, "pos_frac": 0.75, "sample": [17.64234161376953, -0.563079833984375, 2.263763427734375, -9.916336059570312, -4.2718658447265625, -5.826805114746094, 5.660137176513672, 22.42292022705078, 2.9846057891845703, 26.023239135742188, 7.445777893066406, -31.720504760742188, 8.885248184204102, 16.562705993652344, 11.258651733398438, -8.858453750610352, 21.54210662841797, 2.806640625, 27.728424072265625, 7.656730651855469, 2.9496307373046875, 3.6166534423828125, 3.7850818634033203, 4.959327697753906, 20.7083740234375, 3.8335952758789062, -11.939815521240234, -5.873340606689453, 6.3274688720703125, 7.696990966796875, -14.253204345703125, 34.52403259277344, 50.26054382324219, 6.033620834350586, 11.177963256835938, 18.630937576293945, 9.596603393554688, 15.245479583740234, 11.4354248046875, 8.27587890625, 63.01824951171875, 6.5469970703125, -5.948875427246094, 26.685256958007812, -8.740983963012695, 4.9271697998046875, 3.6631927490234375, -2.3407211303710938, 17.20531463623047, 10.817085266113281, 33.07682800292969, 21.9322509765625, 9.184249877929688, -16.5714168548584, 10.462825775146484, 16.147472381591797, 3.9769363403320312, -3.5702133178710938, -2.1316795349121094, 17.6099853515625, -1.06634521484375, 18.674781799316406, 33.52055358886719, 13.639404296875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000311.npy"}
{"epoch": 0.47014361300075586, "step": 312, "batch_size": 64, "mean": 11.816064834594727, "std": 13.647669792175293, "min": -18.74740219116211, "p10": -4.501096725463866, "median": 9.50267219543457, "p90": 30.80719757080078, "max": 48.33055877685547, "pos_frac": 0.875, "sample": [10.17947769165039, 5.5000762939453125, 2.3099365234375, 8.749374389648438, 16.497501373291016, 4.296760559082031, 7.616783142089844, 0.05940818786621094, 4.0456085205078125, 2.752532958984375, 41.33164978027344, 15.597747802734375, 24.507522583007812, 22.45131492614746, 2.0594406127929688, 15.452064514160156, 7.4339752197265625, 2.2104339599609375, 4.839710235595703, 27.556732177734375, 34.85697937011719, 6.167610168457031, -5.224948883056641, 14.74456787109375, 8.529167175292969, 30.437149047851562, 4.877410888671875, -7.8466339111328125, 5.2368011474609375, 25.616044998168945, 38.49768829345703, -18.74740219116211, 18.551036834716797, 30.965789794921875, -16.537086486816406, 10.84341049194336, 27.070083618164062, 11.876594543457031, 12.8936767578125, 39.88377380371094, 48.33055877685547, 4.010669708251953, 12.022964477539062, 7.7780609130859375, 20.79705810546875, 5.502861022949219, 5.7976531982421875, 24.394317626953125, 14.616813659667969, -5.378627777099609, -3.4073867797851562, 13.775821685791016, 33.655731201171875, 8.82586669921875, 0.854888916015625, 10.374435424804688, 11.047256469726562, 18.30085563659668, -12.8846435546875, 19.664825439453125, 13.355552673339844, -4.969829559326172, 3.6169967651367188, 8.005691528320312], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000312.npy"}
{"epoch": 0.47165532879818595, "step": 313, "batch_size": 64, "mean": 7.683980941772461, "std": 15.928030014038086, "min": -25.695335388183594, "p10": -12.329310989379882, "median": 6.335763931274414, "p90": 26.631548690795903, "max": 61.005767822265625, "pos_frac": 0.65625, "sample": [-7.759563446044922, 14.733268737792969, 17.301973342895508, 9.241718292236328, 12.777900695800781, 14.856632232666016, 8.450267791748047, 2.3849220275878906, 24.75799560546875, -12.554859161376953, 0.3485393524169922, 18.090131759643555, 13.75467300415039, 6.741096496582031, 6.774345397949219, 2.017913818359375, -3.8862380981445312, -3.7451705932617188, 5.6299285888671875, -3.76788330078125, 15.342506408691406, 15.10125732421875, -0.2233448028564453, 39.63148498535156, -25.695335388183594, -13.249713897705078, -1.0266189575195312, 3.996002197265625, 25.701904296875, 12.399375915527344, 35.07909393310547, 25.509349822998047, -19.49371337890625, -4.375436782836914, 34.59619903564453, 23.86968994140625, 2.9751129150390625, -1.3429985046386719, -3.1355438232421875, 39.850074768066406, 6.709682464599609, 3.3697471618652344, 15.567581176757812, -7.494071960449219, 61.005767822265625, 3.3284950256347656, -2.7466888427734375, -0.13507080078125, 8.355209350585938, 5.961845397949219, 26.992919921875, 13.0260009765625, 5.620094299316406, -1.541482925415039, -11.803031921386719, -21.0169677734375, 14.701240539550781, -12.76650619506836, 25.788349151611328, -7.603271484375, -15.299846649169922, 12.10784912109375, 10.051078796386719, 27.93890953063965], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000313.npy"}
{"epoch": 0.47316704459561604, "step": 314, "batch_size": 64, "mean": 12.271422386169434, "std": 13.39448356628418, "min": -13.28436279296875, "p10": -3.561219024658203, "median": 9.58973217010498, "p90": 30.966098022460947, "max": 44.017242431640625, "pos_frac": 0.84375, "sample": [19.079246520996094, 5.432243347167969, 8.925148010253906, 37.455665588378906, 16.641815185546875, 16.24909210205078, 5.8580474853515625, 13.722877502441406, -3.3595962524414062, 17.4335994720459, 0.5456619262695312, -5.4803619384765625, 44.017242431640625, 9.001434326171875, -2.3514633178710938, 41.51643371582031, 16.075092315673828, 14.233842849731445, 27.799049377441406, 0.4789276123046875, -0.5202102661132812, 5.5853271484375, 25.094484329223633, 28.850582122802734, 25.854537963867188, -13.28436279296875, 6.459541320800781, 28.96379852294922, 7.260101318359375, 10.229263305664062, 22.891382217407227, 13.1402587890625, 5.6829833984375, 1.4931373596191406, 2.0935630798339844, 7.482852935791016, 25.4754638671875, 1.3968658447265625, 3.4270687103271484, 13.280574798583984, 1.0694541931152344, -8.45654296875, 35.84362030029297, 6.397960662841797, 15.215370178222656, 16.981002807617188, 22.557525634765625, 40.18180847167969, 0.3035736083984375, 26.875877380371094, 8.605194091796875, 32.044715881347656, 17.64898681640625, 7.457923889160156, 19.21185302734375, 8.272979736328125, -4.648048400878906, 11.070777893066406, 5.057605743408203, -3.6476287841796875, 10.178030014038086, -9.282991409301758, -9.523429870605469, 31.82422637939453], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000314.npy"}
{"epoch": 0.47467876039304613, "step": 315, "batch_size": 64, "mean": 9.909273147583008, "std": 12.977156639099121, "min": -19.021991729736328, "p10": -5.997245025634764, "median": 10.737010955810547, "p90": 26.000585937500002, "max": 42.36345672607422, "pos_frac": 0.734375, "sample": [-8.63775634765625, 12.368560791015625, 23.478361129760742, 6.64820671081543, -2.089628219604492, 11.611764907836914, -19.021991729736328, -4.600900650024414, -7.392971038818359, 12.192146301269531, -13.608718872070312, 15.393743515014648, 4.7342681884765625, 25.485794067382812, -0.4537239074707031, 17.039581298828125, 11.622230529785156, -2.3577728271484375, 1.4007072448730469, -0.5265922546386719, 10.842048645019531, 26.152847290039062, 1.7537498474121094, 8.74755859375, 6.7232818603515625, 15.41510009765625, 34.55523681640625, -4.2067718505859375, 5.844287872314453, 20.137914657592773, 18.244644165039062, 10.631973266601562, 25.645309448242188, 6.3204345703125, 3.5977020263671875, 20.414276123046875, 14.329185485839844, -0.512359619140625, -13.904220581054688, 11.482192993164062, 24.684566497802734, 0.8210296630859375, 16.95294189453125, 25.19582748413086, -0.6052780151367188, 12.685073852539062, 21.36502456665039, 42.36345672607422, -17.17782211303711, 8.117866516113281, 17.863475799560547, 4.179378509521484, 17.599517822265625, -1.674966812133789, 19.022296905517578, 8.081863403320312, 26.797090530395508, 11.69970703125, 31.352710723876953, -6.595678329467773, 30.66265106201172, -1.2818660736083984, 28.10437774658203, 8.480522155761719], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000315.npy"}
{"epoch": 0.47619047619047616, "step": 316, "batch_size": 64, "mean": 11.19745922088623, "std": 13.94548225402832, "min": -17.2379150390625, "p10": -4.7341064453124995, "median": 10.300174713134766, "p90": 32.16822509765626, "max": 53.85414123535156, "pos_frac": 0.765625, "sample": [-3.9768829345703125, 4.393669128417969, 10.913299560546875, -7.7936553955078125, -17.2379150390625, -2.3282318115234375, 10.8873291015625, 12.662956237792969, 1.9748115539550781, 19.713722229003906, 27.844348907470703, 35.414306640625, 17.512432098388672, 33.54234313964844, 14.858612060546875, -4.3337554931640625, -8.676319122314453, 7.307605743408203, -8.408208847045898, 25.720308303833008, 13.719919204711914, 33.00224304199219, 2.0764617919921875, 7.794837951660156, 5.773223876953125, 6.9385528564453125, 16.147010803222656, 2.9247817993164062, -5.832553863525391, 12.96185302734375, 25.64374542236328, 35.25726318359375, 17.427547454833984, 53.85414123535156, -1.8110218048095703, 9.8192138671875, -4.0304718017578125, 16.928836822509766, 18.27672576904297, 10.675392150878906, 22.494888305664062, 13.891075134277344, 13.373516082763672, 2.2282638549804688, 28.454864501953125, 0.8976955413818359, -1.2285404205322266, 30.222183227539062, -1.3053512573242188, 24.887222290039062, -4.9056854248046875, -7.280633926391602, 2.146728515625, -3.723613739013672, 6.397315979003906, 3.2234344482421875, 1.4573554992675781, 37.00920486450195, 9.837173461914062, 13.988746643066406, 21.736984252929688, 9.924957275390625, 33.39224624633789, 11.978927612304688], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000316.npy"}
{"epoch": 0.47770219198790626, "step": 317, "batch_size": 64, "mean": 13.624078750610352, "std": 13.75035572052002, "min": -11.354972839355469, "p10": -4.1356258392333975, "median": 14.29769515991211, "p90": 33.20901565551758, "max": 46.18650817871094, "pos_frac": 0.828125, "sample": [18.43939971923828, 20.56930160522461, 14.675724029541016, 31.272354125976562, 9.06982421875, 46.18650817871094, -4.416767120361328, 8.291107177734375, -2.6168575286865234, 9.365943908691406, 2.0038795471191406, 6.60205078125, -7.320930480957031, 1.5478668212890625, 13.633346557617188, 3.4699783325195312, 28.3345947265625, 3.6290740966796875, 30.02667999267578, 33.631805419921875, -6.023658752441406, 1.3084945678710938, 17.45812225341797, -11.354972839355469, 16.659046173095703, 11.9896240234375, 21.847198486328125, 20.100698471069336, 24.9813232421875, -6.673583984375, 39.03472900390625, 2.895263671875, 18.451738357543945, 39.40047836303711, 33.87284851074219, -8.619644165039062, 8.771560668945312, -1.3100166320800781, 16.142986297607422, 33.47406768798828, 5.924434661865234, 14.286331176757812, 40.81287384033203, -6.0756072998046875, 1.83355712890625, 15.048877716064453, 15.119827270507812, 23.861297607421875, 8.497322082519531, 3.6302642822265625, 19.56055450439453, 14.865219116210938, 14.483194351196289, 28.088027954101562, -0.9008369445800781, 14.309059143066406, -3.4796295166015625, 16.289779663085938, 31.370264053344727, 32.59056091308594, 2.6221351623535156, 24.53668975830078, 6.4449310302734375, 9.420692443847656], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000317.npy"}
{"epoch": 0.47921390778533635, "step": 318, "batch_size": 64, "mean": 8.022417068481445, "std": 18.142398834228516, "min": -37.672637939453125, "p10": -15.680245208740235, "median": 7.7129011154174805, "p90": 30.995749664306643, "max": 44.19331359863281, "pos_frac": 0.640625, "sample": [9.593280792236328, 21.87957000732422, 6.2503814697265625, -15.440933227539062, 25.522186279296875, 26.983280181884766, 11.730667114257812, -20.8125, 7.929067611694336, 43.765419006347656, 2.2480812072753906, -15.745506286621094, 21.268428802490234, 22.146224975585938, 29.333980560302734, 9.652420043945312, 4.7010040283203125, -8.688932418823242, 14.412223815917969, 1.1418514251708984, -2.131671905517578, 19.709854125976562, 34.85614776611328, -15.527969360351562, 2.7412948608398438, 20.194561004638672, 31.396697998046875, -37.672637939453125, 13.927932739257812, 16.225616455078125, -3.0289306640625, 14.00754165649414, -11.533329010009766, 4.760196685791016, -0.7979888916015625, 44.078125, 44.19331359863281, -7.446449279785156, 30.060203552246094, -1.826080322265625, -3.04193115234375, 2.912689208984375, 19.22699737548828, -27.42609405517578, -1.4636001586914062, 26.281112670898438, -5.665672302246094, 38.04515075683594, -0.08137893676757812, 21.776424407958984, 3.184051513671875, 36.283203125, 9.645334243774414, 11.479522705078125, -17.965839385986328, -3.9221954345703125, 18.784011840820312, -17.489559173583984, 7.496734619140625, -3.6522750854492188, 21.27490234375, -20.19103240966797, -10.959342956542969, 14.846858978271484], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000318.npy"}
{"epoch": 0.48072562358276644, "step": 319, "batch_size": 64, "mean": 11.609233856201172, "std": 15.71576976776123, "min": -18.22174072265625, "p10": -7.884231567382812, "median": 11.000808715820312, "p90": 30.039986801147464, "max": 52.13079833984375, "pos_frac": 0.75, "sample": [14.041353225708008, 30.18896484375, 6.3495025634765625, 22.320648193359375, 4.291934967041016, 36.84538269042969, 6.723323822021484, 9.563400268554688, -8.174957275390625, 11.802658081054688, 25.39777374267578, 23.072906494140625, -0.8568782806396484, -0.05577850341796875, 10.198959350585938, 25.461807250976562, 0.7967414855957031, 52.13079833984375, 4.39055061340332, 23.99079132080078, 5.519512176513672, -4.859169006347656, 8.4019775390625, 14.805255889892578, 44.99726867675781, -3.2382736206054688, 33.33094024658203, 16.944446563720703, 29.114830017089844, 25.222808837890625, -7.20587158203125, -12.538421630859375, 12.944656372070312, 51.99298095703125, -9.324026107788086, 13.015214920043945, -5.580352783203125, 18.257396697998047, 18.551246643066406, -14.61819076538086, 29.692371368408203, 9.634223937988281, 0.17017745971679688, 4.246391296386719, 0.24488067626953125, 5.56108283996582, -18.22174072265625, -1.3641738891601562, -10.69560432434082, 28.444866180419922, -0.6294403076171875, 7.260810852050781, 7.5419158935546875, -6.670494079589844, 12.802169799804688, 21.834014892578125, 15.687641143798828, 21.315704345703125, 19.181060791015625, 14.290262222290039, 14.1854248046875, -15.777904510498047, 36.26055908203125, 13.782646179199219], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000319.npy"}
{"epoch": 0.48223733938019653, "step": 320, "batch_size": 64, "mean": 9.009833335876465, "std": 16.50977325439453, "min": -27.883071899414062, "p10": -8.661898803710937, "median": 6.081873893737793, "p90": 29.796224212646493, "max": 64.956787109375, "pos_frac": 0.703125, "sample": [18.50474739074707, -8.493526458740234, 2.358461380004883, 11.024707794189453, 7.1884613037109375, 6.318296432495117, 31.27413558959961, -0.43772125244140625, 18.39655303955078, -3.0552940368652344, -7.23175048828125, 21.881187438964844, 0.07684707641601562, -2.232379913330078, -14.606483459472656, 27.5408935546875, 14.291839599609375, 0.10819244384765625, -3.691791534423828, 23.10790252685547, -12.509885787963867, 8.149894714355469, -27.883071899414062, 3.1079578399658203, -1.5459442138671875, 64.956787109375, 52.58330535888672, 23.16307830810547, 13.994758605957031, -8.851142883300781, 3.8140716552734375, -10.903335571289062, 11.396484375, 11.013916015625, 0.08967208862304688, 40.50543212890625, 4.762226104736328, -3.4963760375976562, -8.734058380126953, -3.006084442138672, 14.314460754394531, 34.437618255615234, 4.579433441162109, 9.540573120117188, 3.548858642578125, 4.272893905639648, 39.239418029785156, 12.198593139648438, 2.8031234741210938, -7.029237747192383, 2.1131820678710938, -12.924880981445312, 30.762794494628906, -7.460792541503906, 5.845451354980469, 19.303382873535156, 17.885757446289062, 22.483699798583984, 27.125823974609375, 6.3242034912109375, 10.718500137329102, -7.317054748535156, 21.010650634765625, 19.92190933227539], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000320.npy"}
{"epoch": 0.4837490551776266, "step": 321, "batch_size": 64, "mean": 9.351762771606445, "std": 14.431863784790039, "min": -24.080520629882812, "p10": -10.777281570434567, "median": 9.508888244628906, "p90": 31.13873176574707, "max": 39.32516098022461, "pos_frac": 0.765625, "sample": [10.37942123413086, 18.283004760742188, -0.4832439422607422, 15.196510314941406, 13.653579711914062, -1.7654342651367188, 34.202205657958984, 10.699304580688477, 33.21255874633789, 5.7424774169921875, 34.148868560791016, 5.649135589599609, 7.248424530029297, 23.057044982910156, -17.124160766601562, 0.020816802978515625, 0.27562713623046875, 11.994499206542969, 10.934181213378906, 25.425514221191406, 9.275886535644531, 5.112678527832031, -20.95001220703125, -3.4146652221679688, 9.872634887695312, -11.646060943603516, -12.899660110473633, -14.466836929321289, 19.226808547973633, 13.711532592773438, 6.215087890625, 19.345539093017578, 20.441787719726562, 13.301898956298828, 3.215728759765625, 19.886474609375, 14.296485900878906, -4.192438125610352, 5.184436798095703, 18.635711669921875, 0.965179443359375, -2.621002197265625, 0.4176139831542969, 24.642990112304688, 7.10321044921875, -24.080520629882812, 25.720123291015625, 31.27599334716797, 35.19265365600586, 16.040306091308594, 6.318016052246094, 5.622781753540039, -0.25135040283203125, 30.81845474243164, 34.21067810058594, 9.897346496582031, 8.003440856933594, 6.038810729980469, 39.32516098022461, -18.08045196533203, 9.741889953613281, -0.1161956787109375, 10.174446105957031, -8.750129699707031], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000321.npy"}
{"epoch": 0.4852607709750567, "step": 322, "batch_size": 64, "mean": 13.090594291687012, "std": 13.078771591186523, "min": -24.09939193725586, "p10": -1.2362201690673813, "median": 10.194997787475586, "p90": 30.714234542846683, "max": 44.43004608154297, "pos_frac": 0.890625, "sample": [-2.373258590698242, 16.293724060058594, 30.9215087890625, 1.995849609375, 20.75539779663086, 6.554821014404297, 25.185386657714844, 19.027860641479492, 19.605998992919922, 9.826208114624023, 5.8842010498046875, 15.710350036621094, 34.2673225402832, 35.56089401245117, 17.029205322265625, 10.169116973876953, 8.150978088378906, 8.241531372070312, 1.6738872528076172, 6.918853759765625, 11.225578308105469, 2.6016006469726562, -10.695695877075195, 30.975736618041992, 26.813236236572266, 17.777633666992188, 5.740699768066406, 11.706048965454102, 0.17647171020507812, 10.182098388671875, 3.8123626708984375, 9.88104248046875, 15.443939208984375, 23.728607177734375, 30.230594635009766, 9.787845611572266, 8.81512451171875, 36.97138214111328, 5.889793395996094, -4.1720733642578125, 19.136489868164062, 22.222824096679688, 2.717020034790039, 28.262943267822266, 10.207897186279297, 34.74018478393555, 29.735538482666016, 17.460050582885742, -11.70199966430664, -24.09939193725586, 9.67447280883789, 4.884208679199219, 5.465118408203125, 14.071674346923828, 29.787086486816406, 8.92791748046875, 12.945541381835938, 4.1752471923828125, 17.701904296875, -11.907587051391602, 44.43004608154297, -1.8416595458984375, 25.08167839050293, 7.428972244262695], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000322.npy"}
{"epoch": 0.48677248677248675, "step": 323, "batch_size": 64, "mean": 9.217557907104492, "std": 17.676233291625977, "min": -25.942413330078125, "p10": -8.275540542602538, "median": 4.6636199951171875, "p90": 37.13615646362305, "max": 57.2752685546875, "pos_frac": 0.640625, "sample": [0.7388992309570312, -8.46142578125, 6.8994140625, -0.8299102783203125, -0.45203399658203125, 23.215469360351562, 14.7523193359375, 9.527267456054688, 0.5511245727539062, 43.15562438964844, 7.150115966796875, -3.818540573120117, 32.72492218017578, 7.982818603515625, 38.00011444091797, -7.841808319091797, -17.819229125976562, 25.5224609375, 18.93035125732422, 7.5826873779296875, -3.5150604248046875, 18.939834594726562, 3.2928466796875, 8.572120666503906, 45.372657775878906, -3.3823890686035156, -13.45947265625, 13.332845687866211, -0.10490798950195312, 4.1934814453125, 57.2752685546875, -5.531341552734375, -0.7711334228515625, -7.367343902587891, 27.975250244140625, -0.13375091552734375, -9.3890380859375, 22.203746795654297, 15.888870239257812, 1.7146129608154297, 10.721336364746094, 10.081100463867188, -3.15777587890625, 0.17359161376953125, 35.12025451660156, 53.434749603271484, 44.2142333984375, -22.80345916748047, 40.20478820800781, -1.5489959716796875, -25.942413330078125, -1.8890533447265625, 15.341522216796875, 5.133758544921875, 1.66717529296875, 22.9139461517334, 19.124725341796875, 14.444358825683594, 3.83837890625, -9.534841537475586, 7.883525848388672, 1.1207427978515625, -0.5897216796875, -2.64599609375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000323.npy"}
{"epoch": 0.48828420256991684, "step": 324, "batch_size": 64, "mean": 10.665374755859375, "std": 12.177136421203613, "min": -20.422801971435547, "p10": -2.6014873504638665, "median": 9.609149932861328, "p90": 25.01709060668946, "max": 41.6529541015625, "pos_frac": 0.828125, "sample": [22.41668701171875, 7.624015808105469, 6.884864807128906, 1.6534652709960938, -6.774532318115234, 29.55426025390625, 14.269161224365234, 12.917869567871094, -1.9196853637695312, -0.7865066528320312, 19.84661102294922, 21.15958595275879, 10.586563110351562, 15.54885482788086, 16.950462341308594, 2.9770240783691406, 23.078125, -12.781036376953125, 13.670616149902344, -0.753265380859375, 13.810874938964844, 10.282958984375, 4.642662048339844, 15.778648376464844, 9.965572357177734, 5.796804428100586, -8.069938659667969, 0.5203094482421875, 40.622772216796875, 9.753475189208984, 3.302501678466797, -20.422801971435547, 17.546281814575195, 22.612316131591797, 7.809814453125, 9.464824676513672, 6.586902618408203, -2.893688201904297, 15.443984985351562, 15.989028930664062, 13.794364929199219, -4.50616455078125, 41.6529541015625, 1.6099395751953125, 37.680946350097656, -0.20629501342773438, 2.2809600830078125, 17.076644897460938, 11.684608459472656, 28.618473052978516, 37.29255676269531, 4.467838287353516, 4.559967041015625, 14.674545288085938, 17.66443634033203, 5.6623382568359375, 3.257781982421875, -4.0223846435546875, 5.43890380859375, 8.83127212524414, 25.762428283691406, 23.277969360351562, 8.842803955078125, 2.5206222534179688], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000324.npy"}
{"epoch": 0.4897959183673469, "step": 325, "batch_size": 64, "mean": 9.898791313171387, "std": 15.229973793029785, "min": -40.84617614746094, "p10": -6.963713455200195, "median": 8.89697265625, "p90": 32.33219146728516, "max": 49.07579040527344, "pos_frac": 0.8125, "sample": [8.308368682861328, 29.498207092285156, 2.1285037994384766, 18.94771385192871, 3.6097412109375, 9.052490234375, 20.345317840576172, 14.777725219726562, 9.321052551269531, 9.935966491699219, 17.112594604492188, 19.3336124420166, 1.1292800903320312, -6.723384857177734, 4.427032470703125, -4.46630859375, 30.91656494140625, 11.407997131347656, 9.845703125, -7.630498886108398, 49.07579040527344, -10.34033203125, 24.155065536499023, 1.5812015533447266, 0.2987537384033203, 40.5252685546875, 1.6205940246582031, 16.096885681152344, -7.06671142578125, 10.411783218383789, 4.4733734130859375, 10.101215362548828, 17.22441864013672, 8.741455078125, -12.290626525878906, 25.6490478515625, -8.742900848388672, 23.916900634765625, 10.596452713012695, 39.970184326171875, 5.914573669433594, 1.0423336029052734, 5.261444091796875, 35.54901123046875, -40.84617614746094, 6.356834411621094, 1.244232177734375, 2.2304534912109375, 32.93888854980469, 12.721334457397461, 43.600379943847656, -5.954540252685547, 6.946935653686523, -9.088523864746094, 3.8723068237304688, -1.8971099853515625, 10.624286651611328, 14.07271957397461, 33.70115661621094, 5.7392425537109375, 10.859939575195312, 3.6753463745117188, 9.511329650878906, -1.8292388916015625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000325.npy"}
{"epoch": 0.491307634164777, "step": 326, "batch_size": 64, "mean": 5.465770721435547, "std": 13.86809253692627, "min": -21.906005859375, "p10": -12.571469116210936, "median": 5.88720703125, "p90": 22.31646823883057, "max": 39.48712158203125, "pos_frac": 0.625, "sample": [-11.121406555175781, -1.3752403259277344, 9.124797821044922, 3.778003692626953, 15.7442626953125, -21.906005859375, 32.8426513671875, 12.181976318359375, -3.7889556884765625, 24.84130096435547, -3.271495819091797, 9.547008514404297, 6.815093994140625, -3.1439590454101562, 7.750659942626953, 15.883522033691406, 2.802335739135742, 11.323341369628906, -3.501373291015625, 14.223220825195312, 21.414451599121094, 1.3772659301757812, 9.955352783203125, 19.857627868652344, -9.968269348144531, 1.4543304443359375, -19.346450805664062, -5.1388092041015625, 4.5635986328125, -1.0335922241210938, 31.548290252685547, -18.024093627929688, 11.907768249511719, -4.502536773681641, 19.46300506591797, 10.472801208496094, 6.044502258300781, 4.7513580322265625, -7.788707733154297, -9.87713623046875, 22.703046798706055, 30.05103302001953, -13.192924499511719, -0.1630859375, 18.56924057006836, 5.310859680175781, 10.620845794677734, -14.487380981445312, 12.979635238647461, 27.028709411621094, -1.0354347229003906, -4.664009094238281, -20.567138671875, 18.93659210205078, -9.860282897949219, 11.315372467041016, -18.592710494995117, 7.4243927001953125, 11.961185455322266, 39.48712158203125, 17.456628799438477, -7.554225921630859, 5.729911804199219, 14.471450805664062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000326.npy"}
{"epoch": 0.4928193499622071, "step": 327, "batch_size": 64, "mean": 8.68751335144043, "std": 15.975384712219238, "min": -31.980361938476562, "p10": -10.461979484558105, "median": 8.356039047241211, "p90": 26.287033081054695, "max": 55.13848876953125, "pos_frac": 0.703125, "sample": [17.886131286621094, -11.806900024414062, -7.8963775634765625, 55.13848876953125, 13.038284301757812, 15.83319091796875, -16.351884841918945, 13.952629089355469, -10.944650650024414, 18.629364013671875, 31.61696434020996, 5.721752166748047, -13.057411193847656, 24.063716888427734, 8.812690734863281, 22.238494873046875, 22.244247436523438, 1.1104240417480469, -0.39971923828125, -9.335746765136719, 1.2364883422851562, 7.899387359619141, 13.072774887084961, 24.436416625976562, 27.080154418945312, 1.3507232666015625, 11.669395446777344, 12.341270446777344, 17.58631134033203, -4.123664855957031, 17.68604278564453, 3.533794403076172, -9.331024169921875, 16.324783325195312, 17.362014770507812, -0.12702178955078125, 10.4385986328125, 7.756805419921875, -7.381072998046875, -1.8196449279785156, 33.920623779296875, 1.8687362670898438, -31.980361938476562, 6.33837890625, 16.04314422607422, 53.97969055175781, -3.4360733032226562, 16.472183227539062, 40.702781677246094, 12.153818130493164, 30.04852294921875, 1.1972007751464844, 7.0641326904296875, 4.125804901123047, -12.193614959716797, -3.0178470611572266, 18.603836059570312, 12.138389587402344, 11.827011108398438, -3.13641357421875, 1.0674285888671875, -16.22667121887207, 18.768447875976562, -7.814521789550781], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000327.npy"}
{"epoch": 0.4943310657596372, "step": 328, "batch_size": 64, "mean": 9.735434532165527, "std": 12.533082008361816, "min": -12.6773681640625, "p10": -4.253986358642578, "median": 8.164691925048828, "p90": 30.041271781921388, "max": 40.072113037109375, "pos_frac": 0.75, "sample": [36.37803649902344, 3.3440189361572266, 9.682291030883789, 7.789497375488281, 16.698711395263672, 4.312351226806641, -3.944580078125, 20.068710327148438, 10.476470947265625, 10.604789733886719, -3.1324844360351562, 20.687088012695312, 31.50786590576172, 8.986125946044922, 7.14251708984375, -9.563644409179688, 34.14064025878906, 11.985345840454102, 17.643722534179688, -5.629673004150391, 8.479660034179688, -1.4677581787109375, 1.5489501953125, 12.65304183959961, -2.5133132934570312, 0.24448394775390625, 33.98277282714844, 7.849723815917969, -12.6773681640625, 10.552558898925781, -11.169364929199219, 18.330564498901367, 24.308868408203125, 7.6390380859375, 20.4320068359375, -0.6082115173339844, 1.0984039306640625, -2.4002151489257812, 40.072113037109375, 0.6375350952148438, 22.590797424316406, 19.883026123046875, -5.943647384643555, -0.32727622985839844, 4.2687835693359375, 15.1820068359375, 1.75592041015625, -1.43109130859375, 5.6223907470703125, 4.643333435058594, 12.544923782348633, -4.386589050292969, 12.48868179321289, 17.248031616210938, -7.235710144042969, 9.110218048095703, 16.559486389160156, 3.188394546508789, 29.413942337036133, 4.3865814208984375, 17.159032821655273, -3.397480010986328, 33.26262283325195, 30.31012725830078], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000328.npy"}
{"epoch": 0.4958427815570673, "step": 329, "batch_size": 64, "mean": 10.14764404296875, "std": 14.920702934265137, "min": -17.5970458984375, "p10": -5.944507789611816, "median": 6.965496063232422, "p90": 28.265579986572266, "max": 48.1654052734375, "pos_frac": 0.703125, "sample": [7.498716354370117, 2.3188209533691406, 7.4937896728515625, 5.526123046875, -6.2425079345703125, 10.701761245727539, 37.20306396484375, -0.38045501708984375, 28.46826171875, 4.168769836425781, 16.011154174804688, 9.657554626464844, -5.249174118041992, 18.027259826660156, -0.8491840362548828, 1.3446044921875, 7.404857635498047, 26.032760620117188, -7.0872650146484375, 8.782936096191406, 6.880027770996094, 4.73826789855957, 42.52043914794922, 18.648239135742188, -15.041419982910156, 30.231727600097656, 47.25078582763672, 14.082481384277344, -1.5246429443359375, -0.2867889404296875, 20.531410217285156, 21.258529663085938, -11.720649719238281, 16.596405029296875, 16.039464950561523, -1.7229766845703125, 42.34236145019531, -2.799104690551758, 24.83759880065918, 19.09406280517578, 26.803314208984375, 22.844528198242188, -0.6106605529785156, -8.99267578125, 1.0298652648925781, 21.27747344970703, -1.0619525909423828, 4.923097610473633, 6.392169952392578, 10.987533569335938, 0.6135749816894531, 19.13397216796875, 5.8852691650390625, -4.823326110839844, 48.1654052734375, 26.006942749023438, -2.3524417877197266, 27.79265594482422, 2.6627731323242188, -17.5970458984375, 7.05096435546875, 1.01885986328125, -0.3002643585205078, -10.188949584960938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000329.npy"}
{"epoch": 0.4973544973544973, "step": 330, "batch_size": 64, "mean": 10.122156143188477, "std": 15.604010581970215, "min": -34.716575622558594, "p10": -7.577618598937987, "median": 7.394935607910156, "p90": 28.77122383117676, "max": 43.75753402709961, "pos_frac": 0.796875, "sample": [3.9997329711914062, -6.511499404907227, -3.9636192321777344, -21.61380958557129, 4.5985260009765625, -34.716575622558594, 25.275070190429688, 5.465179443359375, -5.074058532714844, 28.932220458984375, 24.388320922851562, 8.143653869628906, -15.399465560913086, 4.4588165283203125, 14.188648223876953, 23.931396484375, 34.191078186035156, -11.039176940917969, 41.99778747558594, 28.395565032958984, 20.943279266357422, -5.239435195922852, 16.470443725585938, -1.3875179290771484, 24.096477508544922, 28.952194213867188, 5.215538024902344, 1.6706466674804688, -8.034526824951172, 5.2588043212890625, 3.7624568939208984, 17.50408172607422, 3.3994216918945312, 6.532817840576172, -14.094928741455078, 25.85803985595703, 19.82640838623047, 19.58868408203125, 18.98284149169922, 7.084041595458984, 0.7445220947265625, 15.032562255859375, 4.039833068847656, 18.762802124023438, 26.124954223632812, 26.618507385253906, 5.096508026123047, 7.705829620361328, 2.2826995849609375, 1.83563232421875, 1.1581802368164062, 12.560348510742188, 33.951438903808594, 19.881866455078125, 8.584213256835938, 0.178497314453125, 5.55255126953125, -6.376054763793945, 39.793487548828125, -15.1448974609375, 21.0214786529541, 20.099334716796875, 8.518562316894531, 43.75753402709961], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000330.npy"}
{"epoch": 0.4988662131519274, "step": 331, "batch_size": 64, "mean": 13.488290786743164, "std": 15.841630935668945, "min": -29.09783935546875, "p10": -5.392560482025142, "median": 12.402892112731934, "p90": 34.81804580688477, "max": 51.58869171142578, "pos_frac": 0.8125, "sample": [39.470096588134766, 4.566688537597656, 7.343082427978516, 18.946155548095703, 18.78054428100586, 25.07330322265625, 14.2010498046875, 26.1075439453125, 11.583709716796875, 0.715728759765625, 2.664155960083008, 39.31971740722656, 34.93506622314453, 34.54499816894531, 22.437118530273438, -0.5781173706054688, 51.58869171142578, 14.988914489746094, 12.95321273803711, 43.265830993652344, 17.849239349365234, 6.603334426879883, 5.940319061279297, 4.388023376464844, -0.7905044555664062, 28.430625915527344, 17.829313278198242, 39.451995849609375, 3.9504241943359375, 9.538116455078125, 20.426513671875, 11.20628547668457, 28.72693634033203, 23.179855346679688, 1.9594268798828125, -0.0967864990234375, 16.161048889160156, -16.57709503173828, -9.784292221069336, 27.1318359375, -7.2730865478515625, 33.699462890625, 11.852571487426758, -0.8039398193359375, 9.734918594360352, -29.09783935546875, 9.092849731445312, -9.048858642578125, -14.20489501953125, 2.8288002014160156, 0.592376708984375, 17.62176513671875, 30.873672485351562, 11.757560729980469, 6.954975128173828, 2.3274078369140625, -1.0046663284301758, 22.76508331298828, 13.897384643554688, -11.15768051147461, 27.29206657409668, 14.43728256225586, 26.3748779296875, 35.306427001953125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000331.npy"}
{"epoch": 0.5003779289493575, "step": 332, "batch_size": 64, "mean": 10.560502052307129, "std": 13.984611511230469, "min": -14.351795196533203, "p10": -2.0376190185546874, "median": 6.92222785949707, "p90": 33.5091049194336, "max": 44.732627868652344, "pos_frac": 0.828125, "sample": [-0.8596229553222656, 3.9026641845703125, 5.029350280761719, -0.92645263671875, 8.002595901489258, 16.08563995361328, 9.747085571289062, 6.488164901733398, 2.5503997802734375, 7.230480194091797, 3.7640838623046875, 44.732627868652344, 34.120452880859375, 14.951248168945312, 36.39897155761719, 8.023475646972656, 10.198783874511719, 1.4914989471435547, 14.119392395019531, 6.3522186279296875, 29.13274383544922, 5.19781494140625, -10.379268646240234, 37.883541107177734, 15.356842041015625, 17.183502197265625, -10.16206169128418, 5.0837554931640625, 7.2752838134765625, 8.920944213867188, 0.02317047119140625, 30.346479415893555, 3.3956527709960938, 0.6711349487304688, 4.7679443359375, 5.441871643066406, 0.2932319641113281, 40.25116729736328, -12.088550567626953, -0.6301116943359375, 12.488388061523438, 26.8162841796875, 11.467531204223633, 4.589315414428711, 5.737464904785156, -1.965179443359375, 12.004844665527344, 22.593257904052734, 14.238189697265625, 8.182640075683594, 32.08262634277344, 6.613975524902344, 7.536319732666016, -2.06866455078125, 4.433082580566406, -11.620758056640625, 39.345703125, 5.142120361328125, 4.677398681640625, 30.134077072143555, 42.78460693359375, 11.690628051757812, -14.351795196533203, -6.0480499267578125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000332.npy"}
{"epoch": 0.5018896447467877, "step": 333, "batch_size": 64, "mean": 13.291183471679688, "std": 14.174331665039062, "min": -17.71136474609375, "p10": -1.526540184020996, "median": 9.793441772460938, "p90": 32.97906551361085, "max": 57.13300323486328, "pos_frac": 0.828125, "sample": [3.2563629150390625, 1.4097480773925781, 25.393951416015625, 34.03703308105469, 9.924148559570312, 1.5726909637451172, 17.22396469116211, 18.941986083984375, 4.558704376220703, 24.86785888671875, 28.6788330078125, 23.734020233154297, 13.93548583984375, 18.45524024963379, 26.688100814819336, 9.662734985351562, 9.457183837890625, 41.49443054199219, 15.432144165039062, 37.78352355957031, -1.1587982177734375, 4.847038269042969, 6.760730743408203, 5.045867919921875, 14.199195861816406, -8.741386413574219, -3.7688255310058594, 12.431804656982422, 8.547836303710938, 0.3601531982421875, 21.232711791992188, 33.5399169921875, -1.573638916015625, 7.085838317871094, 17.176963806152344, 19.75168228149414, 57.13300323486328, -0.34842681884765625, 0.9462013244628906, 7.268257141113281, -1.4166431427001953, 6.4904937744140625, -17.71136474609375, 6.722923278808594, 5.779045104980469, -4.5684967041015625, 36.46820068359375, 4.761619567871094, -0.9841384887695312, 13.310256958007812, 7.1279144287109375, -2.5370864868164062, 16.300878524780273, 38.165184020996094, 3.631927490234375, -8.56719970703125, 18.415191650390625, 2.8872756958007812, 31.670412063598633, 21.49828338623047, 27.553558349609375, 21.966087341308594, 29.0517578125, 27.375381469726562], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000333.npy"}
{"epoch": 0.5034013605442177, "step": 334, "batch_size": 64, "mean": 8.708377838134766, "std": 15.52997875213623, "min": -31.7784423828125, "p10": -7.723351669311523, "median": 6.26947021484375, "p90": 30.20019226074219, "max": 45.58856201171875, "pos_frac": 0.75, "sample": [-12.871200561523438, -31.7784423828125, 10.694599151611328, 13.033000946044922, 8.141021728515625, 0.6678924560546875, -21.464859008789062, 29.519920349121094, 3.998046875, -23.469810485839844, 0.5175704956054688, 45.58856201171875, -7.89097785949707, 4.4192962646484375, 4.564460754394531, 14.614583969116211, 3.7539215087890625, -5.869295120239258, 7.320404052734375, 25.644481658935547, 27.590591430664062, 38.08807373046875, 4.817718505859375, 14.702981948852539, 22.24814224243164, -15.28024673461914, 6.497222900390625, 3.0547828674316406, 34.502933502197266, 3.9992904663085938, -0.8432254791259766, -2.324949264526367, 30.491737365722656, -6.63311767578125, 20.367645263671875, 32.614234924316406, 20.3447265625, 5.8241119384765625, 8.335273742675781, 3.9381256103515625, 6.534252166748047, -7.007598876953125, -1.2309722900390625, 3.4943084716796875, 31.834091186523438, -0.3833580017089844, 2.1159744262695312, 3.1479358673095703, -14.841873168945312, 26.162086486816406, 19.969009399414062, 9.275688171386719, 6.041717529296875, 10.009233474731445, -7.332223892211914, 12.071823120117188, 15.1705322265625, 31.047584533691406, -5.787506103515625, 3.7575035095214844, 20.467041015625, 17.798187255859375, 28.94829559326172, 24.60517120361328], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000334.npy"}
{"epoch": 0.5049130763416477, "step": 335, "batch_size": 64, "mean": 10.38405990600586, "std": 16.134397506713867, "min": -17.946819305419922, "p10": -9.340832519531249, "median": 6.035091400146484, "p90": 33.408789443969724, "max": 48.362457275390625, "pos_frac": 0.75, "sample": [17.051132202148438, 26.449615478515625, -8.469045639038086, 43.557952880859375, 4.519554138183594, 30.762847900390625, 19.895469665527344, -0.07355499267578125, 23.199493408203125, 12.5806884765625, 33.24189376831055, 27.90612030029297, 5.095794677734375, 6.929618835449219, 39.42230987548828, 28.802021026611328, 1.6210098266601562, 8.65013313293457, 3.3136444091796875, 41.33216857910156, -13.025516510009766, 6.1180572509765625, 12.188850402832031, 35.232276916503906, -8.945159912109375, 11.3148193359375, 22.427093505859375, -12.9600830078125, -10.63916015625, 23.033775329589844, 24.968414306640625, 2.35870361328125, 4.2277984619140625, 10.844169616699219, 20.23978042602539, -6.513553619384766, 37.18103790283203, -4.213840484619141, 3.33209228515625, 48.362457275390625, 2.190216064453125, 2.0286617279052734, -3.216064453125, 17.35950469970703, 9.295917510986328, 33.480316162109375, 1.0712203979492188, -12.63519287109375, 5.166461944580078, 0.2668304443359375, 4.2836151123046875, -7.398685455322266, -1.4458541870117188, -17.488937377929688, -17.946819305419922, -9.510406494140625, 24.940155029296875, 17.857242584228516, 5.952125549316406, 18.810497283935547, -5.663330078125, 1.0429763793945312, 4.104606628417969, 20.713897705078125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000335.npy"}
{"epoch": 0.5064247921390779, "step": 336, "batch_size": 64, "mean": 9.697211265563965, "std": 16.83255958557129, "min": -20.8839111328125, "p10": -7.7735038757324215, "median": 6.286388397216797, "p90": 31.92379989624024, "max": 67.95567321777344, "pos_frac": 0.640625, "sample": [-10.022781372070312, 16.965835571289062, 25.40607452392578, -2.0546493530273438, 46.713134765625, 19.435134887695312, 14.289539337158203, -6.7313995361328125, 47.31988525390625, 1.29876708984375, 2.6150455474853516, -2.4157962799072266, 42.76487731933594, -4.805244445800781, 24.966079711914062, 5.59808349609375, -0.6175537109375, 15.625720977783203, 40.973846435546875, -11.39529037475586, -3.7054672241210938, 7.2628631591796875, 32.687782287597656, 1.3361282348632812, -2.658294677734375, 20.67117691040039, 7.245159149169922, -20.8839111328125, -7.4073028564453125, -15.428062438964844, 36.77058410644531, -7.808113098144531, 4.929296493530273, 9.968017578125, -0.2575645446777344, -1.283792495727539, 7.817832946777344, -7.6927490234375, 23.310720443725586, 18.028522491455078, 7.803718566894531, -3.5805740356445312, 3.5283050537109375, 5.891197204589844, 21.601383209228516, 9.034233093261719, 11.024459838867188, -3.9967041015625, 5.026060104370117, 30.14117431640625, 6.68157958984375, -0.168365478515625, 16.375869750976562, 25.4534912109375, 67.95567321777344, 15.47372055053711, -0.00811767578125, -1.759674072265625, 20.275367736816406, -9.008377075195312, 3.44085693359375, 16.258983612060547, 13.830337524414062, -9.485214233398438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000336.npy"}
{"epoch": 0.5079365079365079, "step": 337, "batch_size": 64, "mean": 8.650962829589844, "std": 14.847637176513672, "min": -23.071346282958984, "p10": -7.006930923461913, "median": 7.449005126953125, "p90": 28.05248012542725, "max": 52.648651123046875, "pos_frac": 0.6875, "sample": [2.8747634887695312, -2.3938140869140625, -18.186660766601562, -7.401222229003906, 9.8914794921875, 19.954086303710938, -3.9985389709472656, 27.30232810974121, 0.2774658203125, 2.7656707763671875, 36.892791748046875, 5.309700012207031, 13.780784606933594, -18.738927841186523, -3.331298828125, -2.8726577758789062, 9.759796142578125, 30.160125732421875, 4.7455902099609375, -3.4652481079101562, 5.427406311035156, -1.683908462524414, -12.50143051147461, -23.071346282958984, 19.348024368286133, 9.847434997558594, -3.47369384765625, 13.237377166748047, 11.051887512207031, 41.36412048339844, 9.549049377441406, -0.9339065551757812, 25.012537002563477, -6.086917877197266, 28.696304321289062, 28.373973846435547, 17.953636169433594, -1.3765106201171875, 2.794719696044922, 17.24967384338379, 25.371116638183594, 7.854820251464844, 15.045318603515625, 13.006759643554688, 3.7734222412109375, -3.613372802734375, 7.950136184692383, 7.5146942138671875, 18.775634765625, 3.4678878784179688, 17.22216796875, 52.648651123046875, -0.845733642578125, 1.849884033203125, 7.3833160400390625, 25.092391967773438, -0.8317546844482422, -8.793182373046875, 19.383668899536133, 17.01227378845215, -15.3575439453125, 4.00146484375, 38.50246047973633, 13.142524719238281], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000337.npy"}
{"epoch": 0.509448223733938, "step": 338, "batch_size": 64, "mean": 8.576506614685059, "std": 14.037274360656738, "min": -26.403854370117188, "p10": -6.383086967468261, "median": 5.565528869628906, "p90": 26.82720279693604, "max": 39.9937629699707, "pos_frac": 0.71875, "sample": [-12.412384033203125, 2.4966392517089844, 16.436614990234375, 20.39287567138672, 22.650970458984375, 13.947990417480469, -19.848533630371094, 15.799049377441406, 4.447265625, 33.78767395019531, 39.9937629699707, 8.841423034667969, -3.7322311401367188, 7.2434539794921875, 4.267604827880859, -9.276268005371094, -0.20061492919921875, -17.01188087463379, 1.4951019287109375, 1.10369873046875, 5.663665771484375, 24.65583038330078, 5.360084533691406, -2.1962947845458984, 13.48553466796875, 8.761817932128906, -26.403854370117188, 27.381614685058594, 12.668350219726562, 19.924592971801758, 14.119636535644531, 25.339447021484375, 0.4026527404785156, 4.015933990478516, 7.505411148071289, 13.368751525878906, 15.06658935546875, 5.4673919677734375, 28.247230529785156, 1.378082275390625, -1.9366683959960938, 25.5335750579834, -3.2664413452148438, 17.46764373779297, 1.7876319885253906, 12.794355392456055, -7.49072265625, -5.695531845092773, 3.6164779663085938, -2.1616783142089844, 2.6990890502929688, 38.35833740234375, 16.951168060302734, -1.1663017272949219, 9.395938873291016, 20.545726776123047, -0.2310791015625, 34.340553283691406, 20.13391876220703, -6.677753448486328, -4.251836776733398, 4.996086120605469, 39.20520782470703, -0.6859550476074219], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000338.npy"}
{"epoch": 0.5109599395313681, "step": 339, "batch_size": 64, "mean": 9.925113677978516, "std": 13.563788414001465, "min": -14.550186157226562, "p10": -7.21299705505371, "median": 5.8915557861328125, "p90": 29.132088851928714, "max": 43.846187591552734, "pos_frac": 0.78125, "sample": [17.066194534301758, 9.188529968261719, 3.1779022216796875, -2.3231658935546875, -3.2311935424804688, 29.398128509521484, 4.59539794921875, 8.745170593261719, 5.089328765869141, -7.891366958618164, 16.474090576171875, -6.2122344970703125, 5.724632263183594, 5.5308685302734375, 4.717319488525391, 5.944091796875, -12.339897155761719, 13.581039428710938, 29.589859008789062, -1.438241958618164, 27.913894653320312, 3.4227981567382812, 16.278690338134766, 16.43670654296875, 1.5353240966796875, 23.6417236328125, 4.244363784790039, 4.814762115478516, -9.485740661621094, 17.269458770751953, 21.243480682373047, 28.511329650878906, 40.1083984375, 43.846187591552734, 17.723114013671875, 2.4994239807128906, 16.07903289794922, -3.8867034912109375, 18.28704833984375, 14.326683044433594, 36.041656494140625, 5.839019775390625, -7.584377288818359, -6.346443176269531, 10.615875244140625, -1.0169296264648438, 3.7359771728515625, 14.164306640625, -14.550186157226562, -10.403350830078125, 2.4751148223876953, 2.5570449829101562, 14.825302124023438, 21.869482040405273, -11.286224365234375, 43.09339141845703, 4.757568359375, 8.488958358764648, 4.290771484375, 17.86957550048828, 14.263751983642578, 17.20999526977539, 2.075307846069336, 32.02522277832031], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000339.npy"}
{"epoch": 0.5124716553287982, "step": 340, "batch_size": 64, "mean": 10.43388843536377, "std": 17.406991958618164, "min": -26.486957550048828, "p10": -9.686385345458984, "median": 8.294628143310547, "p90": 35.445206832885745, "max": 43.592132568359375, "pos_frac": 0.65625, "sample": [13.411788940429688, 31.858739852905273, 12.518909454345703, 14.578357696533203, -1.8151473999023438, 43.592132568359375, -2.7370052337646484, -1.5037822723388672, -2.842418670654297, 31.29583740234375, 0.13465118408203125, 20.076133728027344, 8.5516357421875, 19.117929458618164, 38.41168975830078, 22.79201889038086, 0.4927654266357422, -24.26959228515625, 22.474822998046875, 22.4869384765625, -7.573419570922852, -4.758113861083984, -17.562671661376953, 12.299005508422852, 0.43475341796875, 37.893035888671875, -3.0027122497558594, 29.62933349609375, 4.0790557861328125, 31.771224975585938, -12.345352172851562, 24.46289825439453, 1.725677490234375, 12.55316162109375, 34.62995910644531, 8.037620544433594, 34.92599868774414, -1.1567153930664062, -0.092254638671875, 36.21903610229492, 29.040864944458008, -3.3236751556396484, -2.0269775390625, 36.66839599609375, 26.777629852294922, -8.286649703979492, -16.581253051757812, -4.134700775146484, 5.8527679443359375, 32.46757507324219, 6.6339874267578125, 9.69343376159668, 17.306602478027344, -10.286272048950195, -26.486957550048828, -7.9994354248046875, 35.667724609375, -0.6901016235351562, 14.085920333862305, 37.26824188232422, 9.563167572021484, 2.0115890502929688, -13.321378707885742, 7.072418212890625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000340.npy"}
{"epoch": 0.5139833711262283, "step": 341, "batch_size": 64, "mean": 10.442707061767578, "std": 15.658528327941895, "min": -23.551116943359375, "p10": -8.507427978515624, "median": 7.256977081298828, "p90": 30.096218872070313, "max": 42.40635681152344, "pos_frac": 0.734375, "sample": [-13.93295669555664, 23.577709197998047, 21.290122985839844, 17.513307571411133, 24.21720314025879, 21.511743545532227, 29.60088348388672, 37.120391845703125, 29.656646728515625, -10.84942626953125, 31.906524658203125, 3.9052391052246094, -13.984073638916016, -3.9806766510009766, 4.539649963378906, 2.5319557189941406, 20.71587371826172, 5.777153015136719, 27.123022079467773, 40.24230194091797, 25.033432006835938, 30.28460693359375, 30.8436279296875, 11.703699111938477, -23.551116943359375, 42.40635681152344, 26.056869506835938, -2.971874237060547, -1.5891189575195312, -4.493194580078125, 9.969284057617188, 6.8871917724609375, -4.739738464355469, -1.3090286254882812, 7.994354248046875, 25.777915954589844, 6.173908233642578, 6.439596176147461, 7.626762390136719, -17.518943786621094, -6.444496154785156, -5.641448974609375, 5.640342712402344, 20.371246337890625, 3.592395782470703, 37.947486877441406, 23.609481811523438, 12.326339721679688, 18.021591186523438, 23.298507690429688, 11.85870361328125, 6.742088317871094, 23.32787322998047, -6.9565887451171875, 3.0826034545898438, 21.135887145996094, 15.681427001953125, 5.5365447998046875, 1.9969005584716797, -9.172073364257812, -2.0392532348632812, 3.432098388671875, 4.787006378173828, -23.308555603027344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000341.npy"}
{"epoch": 0.5154950869236583, "step": 342, "batch_size": 64, "mean": 12.369542121887207, "std": 15.05530834197998, "min": -18.593305587768555, "p10": -4.6071434020996085, "median": 10.062614440917969, "p90": 33.15767612457276, "max": 49.13829803466797, "pos_frac": 0.765625, "sample": [10.412178039550781, 6.404266357421875, -4.890796661376953, -4.8622894287109375, 2.9932498931884766, 29.819454193115234, 4.037696838378906, 25.73431396484375, -6.9655609130859375, 24.524673461914062, 1.9398880004882812, 11.485610961914062, 0.19694900512695312, 30.713401794433594, 9.713050842285156, -5.4582061767578125, 2.8754329681396484, -18.593305587768555, -1.8060855865478516, -4.011802673339844, 7.130157470703125, 24.245426177978516, 26.563735961914062, 31.815481185913086, -6.406959533691406, 17.679969787597656, 10.886482238769531, 12.201608657836914, 22.56708526611328, 27.42502212524414, 11.24261474609375, 9.551841735839844, 3.76483154296875, 21.3404541015625, 19.743263244628906, -3.7729358673095703, 24.64842987060547, -1.5700531005859375, 3.145599365234375, 31.656906127929688, -2.4567489624023438, 37.95526123046875, 22.7589111328125, 11.384552001953125, -7.226015090942383, 40.87378692626953, 2.769207000732422, -2.420461654663086, 45.352569580078125, 36.952667236328125, 10.968948364257812, -0.039867401123046875, 8.348167419433594, 49.13829803466797, -3.8698959350585938, 33.73290252685547, 18.31879997253418, 1.2481155395507812, 4.025123596191406, 12.309452056884766, 18.856884002685547, 6.47056770324707, 0.21728515625, 37.861083984375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000342.npy"}
{"epoch": 0.5170068027210885, "step": 343, "batch_size": 64, "mean": 7.1204609870910645, "std": 13.713807106018066, "min": -24.348037719726562, "p10": -9.125332450866697, "median": 5.944820404052734, "p90": 22.711157226562506, "max": 41.11137390136719, "pos_frac": 0.671875, "sample": [10.886932373046875, 5.0048370361328125, -1.932525634765625, -0.809722900390625, -1.868316650390625, 20.787439346313477, 5.943183898925781, -1.0333194732666016, -2.6791229248046875, -3.479827880859375, -20.527725219726562, -11.0074462890625, 13.8695068359375, 9.633293151855469, 37.439544677734375, -3.794017791748047, 23.213973999023438, 41.11137390136719, 30.088485717773438, 21.537918090820312, 9.085556030273438, 8.75851058959961, 17.492507934570312, 10.104930877685547, 1.2837486267089844, 12.372940063476562, 3.5371932983398438, -10.402732849121094, 13.53924560546875, -2.371957778930664, -15.690345764160156, 4.066566467285156, 19.357513427734375, 5.955810546875, 0.6474609375, 2.408843994140625, 19.594581604003906, 16.93915557861328, -22.536041259765625, 35.830322265625, -0.86639404296875, 6.2816925048828125, 16.083175659179688, 25.1214599609375, 7.0372467041015625, 5.9464569091796875, 30.958908081054688, 5.6394195556640625, 2.6976776123046875, 19.029006958007812, 19.391921997070312, -1.8648757934570312, 4.6275787353515625, 20.286102294921875, 14.051406860351562, 10.69757080078125, -5.20416259765625, -2.70574951171875, 16.78985595703125, -1.61962890625, -24.348037719726562, -12.685028076171875, -6.144731521606445, 4.150363922119141], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000343.npy"}
{"epoch": 0.5185185185185185, "step": 344, "batch_size": 64, "mean": 10.912022590637207, "std": 13.634291648864746, "min": -18.673208236694336, "p10": -5.115625381469727, "median": 10.452095031738281, "p90": 27.405974769592287, "max": 45.666717529296875, "pos_frac": 0.765625, "sample": [-7.6732177734375, 26.612390518188477, 5.817420959472656, 9.396627426147461, 26.284889221191406, -0.22146034240722656, 11.363967895507812, 11.828128814697266, 8.653648376464844, 16.398880004882812, 8.961883544921875, 3.877227783203125, 11.371347427368164, 18.541969299316406, 18.414039611816406, 22.54693603515625, 11.493026733398438, 45.666717529296875, 2.0511322021484375, 20.182815551757812, -4.724094390869141, -1.9015350341796875, -11.470359802246094, 36.032508850097656, -0.6155834197998047, -7.186302185058594, -10.744644165039062, 1.2955245971679688, 29.917068481445312, 15.765918731689453, -5.283424377441406, 34.952491760253906, 1.022216796875, 8.733650207519531, 18.175125122070312, 30.368995666503906, 27.746082305908203, 4.194061279296875, 10.575393676757812, -16.29355239868164, 4.149990081787109, 1.8804206848144531, -4.2851104736328125, -2.5575504302978516, 0.6908035278320312, -2.8016223907470703, 35.18315887451172, 21.363746643066406, 17.029403686523438, 11.492755889892578, 18.41566276550293, 4.568611145019531, 25.025360107421875, 10.191106796264648, 23.173887252807617, 0.5683689117431641, 10.32879638671875, 21.57786750793457, -0.7644805908203125, 21.08386993408203, -18.673208236694336, 25.93353271484375, 18.14905548095703, 24.517105102539062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000344.npy"}
{"epoch": 0.5200302343159486, "step": 345, "batch_size": 64, "mean": 10.031785011291504, "std": 14.879952430725098, "min": -24.07504653930664, "p10": -8.894298744201658, "median": 7.73614501953125, "p90": 30.49574966430664, "max": 46.38343811035156, "pos_frac": 0.734375, "sample": [9.578292846679688, 3.770692825317383, 43.39702606201172, -19.556690216064453, 3.7094192504882812, -1.348602294921875, 1.1209487915039062, -14.839424133300781, -0.9879608154296875, 2.326417922973633, 16.467472076416016, 30.996089935302734, 22.60507583618164, 18.0294246673584, -9.969085693359375, 11.379997253417969, -11.706428527832031, 15.326740264892578, 4.966453552246094, 23.813705444335938, 46.38343811035156, 4.910619735717773, 14.82159423828125, 18.4921875, 24.210241317749023, 0.4313812255859375, -0.649993896484375, 5.503995895385742, 0.8274078369140625, 5.45513916015625, 18.288414001464844, 30.958999633789062, 22.720855712890625, 13.539201736450195, 14.755462646484375, -11.729433059692383, 36.38819122314453, 30.244918823242188, 24.71038055419922, -2.1053123474121094, 25.350067138671875, -1.5880355834960938, -9.889793395996094, 33.33760070800781, 14.673095703125, -6.571477890014648, 5.7957305908203125, -5.887260437011719, 7.5171356201171875, 4.233894348144531, 20.26483154296875, 7.9551544189453125, -1.3255882263183594, 12.678741455078125, 2.3272933959960938, 22.276199340820312, -0.9519290924072266, -24.07504653930664, -3.011058807373047, 2.7410812377929688, 22.773481369018555, 11.842926025390625, 23.726665496826172, 30.603248596191406], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000345.npy"}
{"epoch": 0.5215419501133787, "step": 346, "batch_size": 64, "mean": 11.793712615966797, "std": 15.599407196044922, "min": -25.31751251220703, "p10": -4.655356979370117, "median": 8.195062637329102, "p90": 33.162557983398436, "max": 41.84022903442383, "pos_frac": 0.78125, "sample": [27.551084518432617, 35.59646987915039, 19.618194580078125, 31.00397491455078, 32.5611572265625, 13.256942749023438, 6.739044189453125, 15.70241928100586, 31.92461395263672, 31.744991302490234, 32.05950164794922, 2.235809326171875, 13.76824951171875, 26.66552734375, -4.687202453613281, 41.84022903442383, -13.048599243164062, 2.6233253479003906, -2.1771812438964844, 8.49554443359375, 12.754085540771484, 6.630058288574219, 16.318740844726562, 6.215740203857422, 19.799013137817383, 38.173126220703125, -25.31751251220703, 13.064680099487305, 0.7463455200195312, -18.716796875, 1.885650634765625, -12.108085632324219, 24.466846466064453, -4.494373321533203, 8.148731231689453, 5.430458068847656, 3.2866477966308594, -3.9878387451171875, 32.6639404296875, 33.376251220703125, 35.775360107421875, 28.65353775024414, 1.7744140625, 27.502059936523438, 5.655467987060547, 7.138114929199219, 33.715721130371094, 8.24139404296875, -6.896533966064453, 13.037918090820312, 3.9226531982421875, 0.87908935546875, 8.459779739379883, -2.020721435546875, -4.581050872802734, 2.35430908203125, 7.6558837890625, 20.683837890625, -11.631500244140625, 36.698951721191406, -2.972515106201172, -0.4903411865234375, 7.713768005371094, 21.71820068359375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000346.npy"}
{"epoch": 0.5230536659108088, "step": 347, "batch_size": 64, "mean": 8.804999351501465, "std": 14.882699012756348, "min": -21.9727783203125, "p10": -8.848452377319335, "median": 6.888058662414551, "p90": 31.52052574157715, "max": 49.897605895996094, "pos_frac": 0.703125, "sample": [6.883138656616211, 4.9456939697265625, 10.495841979980469, 18.859848022460938, -21.9727783203125, 3.6394119262695312, 15.93096923828125, 9.78333854675293, 17.46034812927246, 23.278797149658203, -3.842071533203125, 31.754398345947266, 10.3536376953125, -9.52325439453125, -12.506515502929688, 49.897605895996094, 8.608261108398438, 12.43764877319336, -9.951446533203125, 1.5543212890625, 7.7604217529296875, 5.75556755065918, 22.61431121826172, 1.0286636352539062, -13.480476379394531, 2.6519126892089844, 18.346054077148438, 24.311241149902344, 34.90242004394531, -6.602546691894531, -0.29038238525390625, -2.6217708587646484, -11.128318786621094, 11.727134704589844, -6.569099426269531, 7.033840179443359, -7.273914337158203, 12.902023315429688, -2.869140625, 5.932929992675781, 15.600141525268555, 0.36977386474609375, -1.9446868896484375, 12.518318176269531, 7.787906646728516, 38.50202941894531, -5.038204193115234, -11.053993225097656, 42.777809143066406, 39.734825134277344, 30.974822998046875, 16.988574981689453, 13.761260986328125, 6.2448272705078125, -7.077608108520508, 19.871990203857422, 6.892978668212891, 5.830631256103516, 3.897613525390625, -3.3577194213867188, -0.07061386108398438, 2.895641326904297, 31.946983337402344, 23.248579025268555], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000347.npy"}
{"epoch": 0.5245653817082389, "step": 348, "batch_size": 64, "mean": 12.487504959106445, "std": 17.5072021484375, "min": -13.235458374023438, "p10": -5.917170524597168, "median": 6.8354644775390625, "p90": 34.247301483154295, "max": 56.64299011230469, "pos_frac": 0.71875, "sample": [34.359954833984375, 33.98444366455078, 7.429473876953125, 2.3798980712890625, 31.03436279296875, 14.609405517578125, 3.02392578125, -3.8832626342773438, -13.235458374023438, 0.5294189453125, 47.15410614013672, -4.2144317626953125, 10.925247192382812, 21.14118194580078, 27.232452392578125, 30.123655319213867, 6.723804473876953, 48.35131072998047, 6.947124481201172, 43.6494255065918, 2.277721405029297, 24.503711700439453, 30.781822204589844, 1.5380630493164062, 9.545875549316406, 28.092586517333984, -1.9677047729492188, 29.756786346435547, -7.354471206665039, -11.728706359863281, 15.527320861816406, 0.7264022827148438, 3.558847427368164, 46.248260498046875, 5.808738708496094, -7.468114852905273, 14.395687103271484, -6.2524261474609375, 0.6227645874023438, 10.409051895141602, 31.575424194335938, 55.96678924560547, 2.065765380859375, -2.674173355102539, 26.685157775878906, -1.1824092864990234, -6.121213912963867, -5.017833709716797, -2.1709136962890625, 13.353534698486328, -1.6484642028808594, 56.64299011230469, -6.076532363891602, 18.2590389251709, 27.1376895904541, 0.9635391235351562, 22.44093132019043, 4.471000671386719, -3.1383590698242188, 2.658384323120117, -5.545326232910156, 10.939666748046875, 23.749061584472656, -1.4217033386230469], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000348.npy"}
{"epoch": 0.5260770975056689, "step": 349, "batch_size": 64, "mean": 11.893203735351562, "std": 14.931480407714844, "min": -30.36003875732422, "p10": -4.9863695144653315, "median": 11.667811393737793, "p90": 28.98414344787598, "max": 49.10261535644531, "pos_frac": 0.765625, "sample": [-5.771640777587891, 17.8267822265625, 27.911773681640625, 11.462120056152344, 7.897096633911133, 13.288114547729492, 13.533084869384766, 27.976665496826172, 11.802801132202148, 12.392194747924805, -5.073453903198242, 3.2660694122314453, 21.874996185302734, 5.963569641113281, 13.648048400878906, 23.9975643157959, 25.151384353637695, -11.881179809570312, -30.36003875732422, 26.340574264526367, 9.935600280761719, 27.8294677734375, -2.5930633544921875, 6.8198699951171875, 21.716991424560547, 1.4950485229492188, 17.142120361328125, 17.2750244140625, 1.999542236328125, 28.34552001953125, 23.817901611328125, 18.572620391845703, -2.7634429931640625, 15.35584831237793, -0.3708953857421875, 33.67870330810547, 7.441598892211914, 47.82598876953125, -8.401817321777344, 13.08053207397461, 39.74315643310547, -1.4448471069335938, 3.3249053955078125, 0.41168212890625, 12.592350006103516, 35.2977294921875, 16.92938232421875, -0.004795074462890625, -4.783172607421875, -3.6142120361328125, 18.31841278076172, 9.533798217773438, -5.739521026611328, -0.8556137084960938, 9.385862350463867, 39.92726135253906, 1.48052978515625, 6.242408752441406, -9.194297790527344, 29.25783920288086, 0.21176528930664062, 11.532821655273438, 14.059310913085938, 49.10261535644531], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000349.npy"}
{"epoch": 0.527588813303099, "step": 350, "batch_size": 64, "mean": 13.733760833740234, "std": 15.550896644592285, "min": -20.499427795410156, "p10": -5.132436370849609, "median": 13.127620697021484, "p90": 32.24853591918946, "max": 48.68812561035156, "pos_frac": 0.796875, "sample": [41.271888732910156, 3.1802139282226562, 15.257633209228516, -4.143867492675781, 18.475234985351562, 7.489198684692383, 32.53009033203125, 30.077499389648438, 15.220573425292969, 28.14552116394043, 8.687850952148438, -3.4698028564453125, -1.788238525390625, 20.201675415039062, 13.238319396972656, 20.4830322265625, 26.45629119873047, -11.515083312988281, -5.418083190917969, -4.4659271240234375, 23.08587646484375, 26.162979125976562, -11.823585510253906, 12.163238525390625, 13.620214462280273, -3.7882156372070312, -20.499427795410156, 26.810623168945312, 0.8768138885498047, -0.014469146728515625, 23.085411071777344, -8.153968811035156, 22.0740966796875, 4.061737060546875, 23.278018951416016, 31.591575622558594, 11.607307434082031, 29.893814086914062, 35.63591003417969, 48.68812561035156, 11.795438766479492, 48.289947509765625, 45.63447952270508, 1.8186416625976562, 5.771522521972656, 3.9486732482910156, 13.016921997070312, 2.0755615234375, -6.260204315185547, 24.65428924560547, 10.97161865234375, 18.94870376586914, 1.336700439453125, 13.97674560546875, 7.129890441894531, -9.887481689453125, 11.1519775390625, 10.576766967773438, 2.2608718872070312, 16.7421875, 43.16986846923828, 14.260448455810547, 22.120080947875977, 27.186954498291016], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000350.npy"}
{"epoch": 0.5291005291005291, "step": 351, "batch_size": 64, "mean": 10.779518127441406, "std": 15.52985954284668, "min": -38.233116149902344, "p10": -6.989676666259764, "median": 12.739622116088867, "p90": 30.342960166931157, "max": 34.57356262207031, "pos_frac": 0.765625, "sample": [-2.5571537017822266, 14.376670837402344, 31.26099395751953, 26.58342742919922, -1.3295097351074219, 23.01132583618164, -5.402568817138672, 15.918533325195312, 32.157554626464844, 6.85211181640625, 15.939788818359375, 29.53594970703125, 8.358888626098633, 6.2625274658203125, 7.139965057373047, -9.698966979980469, 7.5373382568359375, 2.5802993774414062, 24.940410614013672, 10.399307250976562, 15.559814453125, -8.558113098144531, 33.552738189697266, 10.103790283203125, 25.037979125976562, 24.967361450195312, 17.361671447753906, 22.012107849121094, -28.32567596435547, 7.918365478515625, 26.187458038330078, -7.512596130371094, 27.117706298828125, 1.9172134399414062, 17.141178131103516, -15.55501937866211, -5.76953125, 31.725662231445312, 10.763420104980469, 13.033348083496094, 17.92139434814453, 34.57356262207031, 8.479873657226562, -5.0171661376953125, 24.809127807617188, 8.326866149902344, 16.16912078857422, 14.52130126953125, 1.2346343994140625, 13.724899291992188, -4.922996520996094, -1.4938888549804688, 24.23584747314453, 17.544057846069336, 30.68882179260254, 5.1696319580078125, 12.44589614868164, 33.84571838378906, -38.233116149902344, 10.900365829467773, -29.5289306640625, 14.740036010742188, 17.249479293823242, -0.0411224365234375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000351.npy"}
{"epoch": 0.5306122448979592, "step": 352, "batch_size": 64, "mean": 10.232229232788086, "std": 14.304128646850586, "min": -18.479751586914062, "p10": -5.451117706298827, "median": 6.882170677185059, "p90": 28.757390785217286, "max": 53.95280456542969, "pos_frac": 0.78125, "sample": [18.366348266601562, 53.95280456542969, 38.64423370361328, 3.5916900634765625, 24.930015563964844, -18.479751586914062, -4.313377380371094, 4.075780868530273, 3.8841209411621094, 10.478206634521484, 11.71674919128418, 17.994277954101562, -5.938720703125, -0.39066505432128906, 15.851943969726562, 14.74566650390625, -3.25360107421875, -1.53924560546875, -8.1612548828125, 6.242500305175781, 25.826171875, 11.37152099609375, 7.8645782470703125, 34.071468353271484, 28.91473388671875, 16.372650146484375, 9.233871459960938, -13.90681266784668, 6.291166305541992, -3.7940845489501953, 2.405567169189453, 4.475917816162109, 5.973762512207031, 7.5598602294921875, 6.1151580810546875, 17.560718536376953, 4.249334335327148, 20.66619873046875, 26.967803955078125, 10.34796142578125, 24.14598846435547, -0.5086746215820312, 53.135406494140625, 2.3375186920166016, 4.185600280761719, 28.390256881713867, -6.645790100097656, 4.1605224609375, -2.5560131072998047, 19.84573745727539, 2.34515380859375, 7.473175048828125, 14.736953735351562, 29.57210922241211, 20.834060668945312, 0.682342529296875, -8.986183166503906, 29.260879516601562, -10.149646759033203, 2.3100128173828125, 0.7341957092285156, 13.073633193969727, 5.280797958374023, 10.239358901977539], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000352.npy"}
{"epoch": 0.5321239606953893, "step": 353, "batch_size": 64, "mean": 11.329439163208008, "std": 15.92221450805664, "min": -45.30358123779297, "p10": -4.100150299072265, "median": 9.025703430175781, "p90": 32.63644828796387, "max": 55.123268127441406, "pos_frac": 0.78125, "sample": [20.652074813842773, 29.493301391601562, -12.87057876586914, 13.582210540771484, -10.733318328857422, 7.607212066650391, 5.908760070800781, -9.903121948242188, 34.60472106933594, 11.722396850585938, 0.5248146057128906, 16.414527893066406, 22.61188507080078, 2.70855712890625, 37.9085807800293, -0.9102306365966797, 10.134811401367188, 15.748329162597656, 24.858642578125, 34.546287536621094, 31.585174560546875, 31.36901092529297, 1.7903175354003906, 23.560630798339844, -0.6504974365234375, -1.7270393371582031, 35.21087646484375, 19.40485382080078, 33.440673828125, 10.69630241394043, 12.942207336425781, 2.6366539001464844, 15.56589126586914, -1.6691474914550781, -45.30358123779297, 24.3123722076416, -11.301876068115234, -7.580436706542969, 21.495750427246094, 31.034433364868164, 8.666072845458984, 31.052772521972656, 55.123268127441406, 6.355735778808594, 8.7745361328125, 6.685649871826172, 12.701950073242188, 5.512277603149414, 2.2542800903320312, 5.1985015869140625, 4.557485580444336, 1.852691650390625, -0.9943313598632812, 13.040565490722656, 4.160005569458008, 6.709419250488281, 9.276870727539062, 0.7321014404296875, -2.3382720947265625, -2.9823837280273438, 21.255462646484375, -4.579193115234375, 33.08699417114258, 17.559188842773438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000353.npy"}
{"epoch": 0.5336356764928194, "step": 354, "batch_size": 64, "mean": 11.625505447387695, "std": 15.191024780273438, "min": -22.26590919494629, "p10": -9.013462448120118, "median": 10.816919326782227, "p90": 30.290282058715825, "max": 48.2458381652832, "pos_frac": 0.78125, "sample": [44.79444885253906, 21.630550384521484, -8.750911712646484, 15.266010284423828, -14.081886291503906, 5.333307266235352, 12.4176025390625, 11.6051025390625, 26.110336303710938, 0.6605148315429688, 18.408109664916992, 28.341487884521484, 19.86690902709961, 2.2360687255859375, 4.267356872558594, 0.21722793579101562, 17.478721618652344, -6.984897613525391, 34.94482421875, 48.2458381652832, 29.12705421447754, 23.46001434326172, 8.734054565429688, 14.683769226074219, -22.26590919494629, 0.24757766723632812, -11.44873046875, 26.375873565673828, 10.932029724121094, 11.191612243652344, 19.577552795410156, 27.18069839477539, 26.157012939453125, 14.859649658203125, 10.198745727539062, 3.73126220703125, 10.70180892944336, -11.388742446899414, 35.25492477416992, -0.3778228759765625, 17.57147979736328, 34.53451156616211, -9.125984191894531, -10.995948791503906, 29.540863037109375, -13.831764221191406, 19.576553344726562, 30.669830322265625, 6.807491302490234, 5.410980224609375, 5.772834777832031, 9.037271499633789, 24.475971221923828, 30.611461639404297, 9.890979766845703, 7.201923370361328, 1.7714462280273438, -0.9694004058837891, -3.594940185546875, 14.472942352294922, 3.858724594116211, 28.832469940185547, -6.402191162109375, -0.024280548095703125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000354.npy"}
{"epoch": 0.5351473922902494, "step": 355, "batch_size": 64, "mean": 12.20285701751709, "std": 16.20931625366211, "min": -25.2384033203125, "p10": -8.598960876464842, "median": 12.527390480041504, "p90": 33.788129806518555, "max": 46.22613525390625, "pos_frac": 0.71875, "sample": [12.791465759277344, 40.627540588378906, -1.0766983032226562, 15.460136413574219, 23.6478271484375, 18.237533569335938, 5.9995269775390625, -10.845182418823242, -5.944419860839844, 6.794059753417969, 21.64185333251953, -6.2330322265625, 17.009809494018555, -9.835235595703125, 20.322853088378906, 24.107547760009766, -3.5471763610839844, 16.61624526977539, -15.298171997070312, 10.850410461425781, -15.566940307617188, 13.988910675048828, 20.003372192382812, 27.83526611328125, 30.407838821411133, 9.317960739135742, 12.263315200805664, 14.531837463378906, 28.988487243652344, 46.22613525390625, -9.612930297851562, 11.39181137084961, -1.7152423858642578, 31.304107666015625, 9.945594787597656, -16.796531677246094, 19.742599487304688, -4.395877838134766, 9.094520568847656, 27.46091651916504, 9.787651062011719, 4.074676513671875, 39.56578826904297, 0.7735443115234375, 4.875020980834961, 33.81513214111328, 25.801788330078125, 3.3462753295898438, 43.07863998413086, -25.2384033203125, 33.72512435913086, 18.955917358398438, 25.074203491210938, -3.2930221557617188, 38.5482063293457, -2.396577835083008, -5.220161437988281, -0.9951496124267578, 35.67790985107422, -0.46660614013671875, 20.250289916992188, 5.6500244140625, 12.799530029296875, 17.051002502441406], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000355.npy"}
{"epoch": 0.5366591080876795, "step": 356, "batch_size": 64, "mean": 10.045708656311035, "std": 18.19095802307129, "min": -38.16255569458008, "p10": -12.002886772155756, "median": 8.651374816894531, "p90": 33.507154846191405, "max": 45.05412292480469, "pos_frac": 0.75, "sample": [45.05412292480469, 31.049394607543945, 6.290336608886719, -25.42646026611328, 26.79969596862793, 3.8453445434570312, 3.0433349609375, 25.609085083007812, 15.647424697875977, 2.8578262329101562, -23.55950927734375, -35.01007843017578, 8.104873657226562, -14.07769775390625, 34.07147216796875, 35.893699645996094, 6.017616271972656, 9.115997314453125, 33.585601806640625, -0.92559814453125, 22.464523315429688, -38.16255569458008, 21.30406951904297, 8.963516235351562, 8.3392333984375, 33.33287048339844, -6.430202484130859, -4.99737548828125, 7.853481292724609, 10.695365905761719, 7.548614501953125, 6.223014831542969, -5.001535415649414, 0.7932281494140625, -15.174888610839844, 18.026718139648438, 20.173503875732422, -6.626882553100586, 18.5592041015625, 28.72185516357422, 5.664482116699219, 2.602142333984375, 31.05411148071289, -2.867462158203125, 9.838874816894531, 12.33381462097168, 32.783390045166016, 9.511634826660156, -7.161661148071289, 17.797500610351562, -0.6722450256347656, 13.477933883666992, 39.14653015136719, 29.963668823242188, 0.47756195068359375, 19.937789916992188, 7.671945571899414, 2.7203750610351562, 33.58184814453125, -1.98919677734375, -23.124134063720703, 43.7529296875, 17.523658752441406, 24.3076171875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000356.npy"}
{"epoch": 0.5381708238851096, "step": 357, "batch_size": 64, "mean": 8.766051292419434, "std": 15.397314071655273, "min": -26.511375427246094, "p10": -11.499377059936522, "median": 6.9888715744018555, "p90": 28.692305755615237, "max": 45.802215576171875, "pos_frac": 0.765625, "sample": [27.980422973632812, -2.9630813598632812, 24.09803009033203, -4.0431976318359375, -22.166221618652344, 43.430328369140625, 5.191986083984375, 9.826690673828125, 7.894317626953125, 7.130424499511719, -1.72686767578125, 24.75641632080078, -7.591255187988281, 9.048355102539062, 45.802215576171875, 10.068817138671875, 32.06159210205078, 13.148101806640625, 19.9665584564209, 5.284687042236328, 15.204261779785156, 5.8373565673828125, 9.170970916748047, 3.9582691192626953, 3.902008056640625, 9.25146484375, 2.662168502807617, 0.6080150604248047, -11.861846923828125, 9.750431060791016, -22.123817443847656, 3.323240280151367, -10.653614044189453, 11.274818420410156, 18.179885864257812, 34.88414001464844, 25.727340698242188, 17.905487060546875, 24.369247436523438, 36.233421325683594, -1.7337646484375, 11.872787475585938, 6.109718322753906, -1.1882705688476562, 32.2275390625, 10.988082885742188, 0.19692230224609375, -1.9621505737304688, -13.425792694091797, 5.78125, 6.847318649291992, 21.407882690429688, 23.20367431640625, 4.245647430419922, 23.287521362304688, 4.064239501953125, 1.500650405883789, 5.30804443359375, 28.997398376464844, -18.28731918334961, -14.819610595703125, -26.511375427246094, 17.6256103515625, 0.48969459533691406], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000357.npy"}
{"epoch": 0.5396825396825397, "step": 358, "batch_size": 64, "mean": 12.789813995361328, "std": 15.90260124206543, "min": -18.72807502746582, "p10": -5.162638854980469, "median": 11.107403755187988, "p90": 31.65370750427246, "max": 47.25343704223633, "pos_frac": 0.78125, "sample": [27.47259521484375, 16.980865478515625, 23.795658111572266, 31.29277801513672, -3.041788101196289, 33.942543029785156, 30.868274688720703, 12.025177001953125, 4.161083221435547, 4.9173126220703125, -4.651252746582031, 16.07905387878418, 7.976936340332031, 19.377655029296875, 31.808391571044922, 5.95123291015625, -12.974075317382812, 15.300399780273438, 22.175811767578125, 11.94493293762207, 20.049652099609375, 3.1547317504882812, -2.256561279296875, 28.84077262878418, 23.146894454956055, -5.1743621826171875, 3.5289154052734375, -0.469146728515625, -5.135284423828125, 0.22343826293945312, -14.606971740722656, 7.8801727294921875, -6.039337158203125, 2.922433853149414, -4.273124694824219, 17.7403564453125, 17.489803314208984, 22.21862030029297, 2.9673690795898438, -1.6425495147705078, 41.12535095214844, 31.158782958984375, 40.03570556640625, 2.513774871826172, 46.786582946777344, 47.25343704223633, 26.486244201660156, -7.118724822998047, 5.49273681640625, 29.58990478515625, 3.354665756225586, -18.72807502746582, -17.101924896240234, 26.42333984375, 1.611663818359375, 20.936893463134766, 4.910732269287109, 9.443031311035156, 3.28387451171875, 17.006587982177734, 15.124006271362305, 28.92284393310547, 10.269874572753906, 43.79737091064453], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000358.npy"}
{"epoch": 0.5411942554799698, "step": 359, "batch_size": 64, "mean": 6.965500831604004, "std": 14.246461868286133, "min": -27.483306884765625, "p10": -11.063537216186521, "median": 5.962716102600098, "p90": 24.989355468750006, "max": 38.56907653808594, "pos_frac": 0.703125, "sample": [-9.620735168457031, 18.983383178710938, -4.957366943359375, 12.206924438476562, 14.10226821899414, 4.370521545410156, -11.681880950927734, 3.7048568725585938, -6.092754364013672, 1.961761474609375, 35.188140869140625, 6.553232192993164, 1.940826416015625, 16.434791564941406, -4.64100456237793, 13.428497314453125, 27.50569725036621, -6.157794952392578, 11.420387268066406, -23.842811584472656, 2.0999298095703125, 3.8802223205566406, 3.37493896484375, -15.357177734375, 6.68475341796875, 27.0626220703125, 9.463005065917969, 7.584133148193359, 12.08477783203125, -20.888168334960938, 23.495574951171875, 38.56907653808594, -12.23199462890625, 23.194015502929688, 2.7686691284179688, 4.772064208984375, 9.340545654296875, 12.66360092163086, 7.0195770263671875, 19.724700927734375, -6.826652526855469, 22.58588981628418, 28.312631607055664, 23.00996208190918, 5.372200012207031, 3.840423583984375, -4.276817321777344, -27.483306884765625, -4.916038513183594, 11.292137145996094, 5.116855621337891, -5.094915390014648, -13.613357543945312, -1.6590690612792969, 21.064346313476562, -0.5039405822753906, 21.820777893066406, 11.369171142578125, 9.274551391601562, 25.629547119140625, 17.403736114501953, 37.400150299072266, -0.4319267272949219, 0.993927001953125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000359.npy"}
{"epoch": 0.5427059712773998, "step": 360, "batch_size": 64, "mean": 10.797454833984375, "std": 15.338611602783203, "min": -17.943161010742188, "p10": -6.113893508911133, "median": 8.454153060913086, "p90": 32.89033508300781, "max": 54.82928466796875, "pos_frac": 0.765625, "sample": [7.625091552734375, 50.96305847167969, -6.0036773681640625, 8.976188659667969, 8.448295593261719, 23.74536895751953, -6.998756408691406, 10.001399993896484, -1.9107398986816406, 17.752243041992188, -3.101633071899414, -5.9038543701171875, 0.49813079833984375, -6.161128997802734, 25.50103759765625, -3.5757904052734375, 41.09808349609375, -12.06576919555664, 8.460010528564453, 24.89483642578125, 8.448062896728516, 2.294790267944336, -17.943161010742188, 2.6011581420898438, 8.694259643554688, 6.652313232421875, 21.574951171875, 9.947257995605469, 10.852970123291016, 11.473129272460938, 11.113433837890625, 6.350002288818359, 11.911136627197266, -9.671844482421875, 5.080780029296875, 16.04627227783203, -9.426383972167969, 14.319091796875, 26.949249267578125, 3.7276763916015625, 54.82928466796875, 0.8075408935546875, 16.41667938232422, -2.9828548431396484, 33.743743896484375, 15.365425109863281, 33.08751678466797, 7.679508209228516, 5.8921661376953125, 47.036373138427734, 32.43024444580078, 18.3184814453125, 34.428653717041016, -3.5965137481689453, 1.3054561614990234, 13.216865539550781, 3.4657764434814453, -9.622028350830078, 3.247671127319336, 26.765037536621094, 4.9732208251953125, -2.9013214111328125, 11.720291137695312, 22.17236328125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000360.npy"}
{"epoch": 0.54421768707483, "step": 361, "batch_size": 64, "mean": 12.380705833435059, "std": 15.359463691711426, "min": -31.708595275878906, "p10": -4.4069084167480455, "median": 13.282328605651855, "p90": 29.80356025695801, "max": 57.76325225830078, "pos_frac": 0.8125, "sample": [12.499275207519531, 17.47576904296875, 0.4517230987548828, -13.228504180908203, 33.47956848144531, 15.361478805541992, -6.2486724853515625, -13.523433685302734, 7.646198272705078, 9.125297546386719, 7.7874908447265625, -17.496456146240234, -0.5900650024414062, 26.733383178710938, 27.1446533203125, 19.097023010253906, -3.0623703002929688, 4.419071197509766, 3.587188720703125, 5.6034393310546875, 17.020172119140625, 22.22583770751953, 12.533348083496094, 31.588096618652344, 18.99274444580078, 15.08636474609375, 17.45421600341797, -31.708595275878906, 31.778846740722656, 14.55105972290039, 19.27303695678711, 1.9959259033203125, 23.164714813232422, 55.40123748779297, 25.49615478515625, 25.1510009765625, 27.93197250366211, 14.481756210327148, -4.9831390380859375, -2.5693283081054688, -2.124370574951172, 1.9277114868164062, 30.03675079345703, 23.156658172607422, 16.232524871826172, 29.259449005126953, 17.735849380493164, 6.855796813964844, -7.711250305175781, 5.7584686279296875, 23.99542236328125, 7.861480712890625, 17.038658142089844, 12.02181625366211, 57.76325225830078, 7.562772750854492, 30.439605712890625, 9.208305358886719, -2.858489990234375, 14.107088088989258, 6.611202239990234, 14.031309127807617, 0.11387252807617188, 2.213775634765625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000361.npy"}
{"epoch": 0.54572940287226, "step": 362, "batch_size": 64, "mean": 12.08340835571289, "std": 15.929220199584961, "min": -21.452289581298828, "p10": -3.7140289306640617, "median": 10.333637237548828, "p90": 31.890613555908207, "max": 71.61639404296875, "pos_frac": 0.796875, "sample": [-4.102363586425781, 10.801481246948242, 2.9862823486328125, 0.8026504516601562, 7.920330047607422, -21.452289581298828, 18.910850524902344, 3.5459442138671875, 71.61639404296875, 11.75128173828125, 0.8512496948242188, -2.2234039306640625, 16.830612182617188, 3.59783935546875, -1.2907600402832031, -1.4098663330078125, 10.82049560546875, 18.89947509765625, 8.684844970703125, 28.87530517578125, -1.4078960418701172, 7.5869293212890625, 9.865793228149414, 7.434295654296875, 28.638885498046875, 8.248130798339844, 30.019641876220703, 2.92718505859375, -1.2564449310302734, 20.746692657470703, 44.4478759765625, 6.450538635253906, 0.31792449951171875, 19.990314483642578, 4.632881164550781, 23.72027587890625, 13.663764953613281, 5.6321868896484375, 14.168960571289062, -2.8079147338867188, 3.8895397186279297, 39.99988555908203, 30.66150665283203, 34.64773941040039, 12.975875854492188, 3.01385498046875, 26.62527847290039, 32.41737365722656, 21.02448272705078, 16.712570190429688, -12.463930130004883, 16.762115478515625, 20.912994384765625, -13.203174591064453, 8.558685302734375, 13.468742370605469, 32.978363037109375, -11.08148193359375, -9.045478820800781, 11.820159912109375, 40.88341522216797, -14.11130142211914, 22.56163787841797, 13.892929077148438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000362.npy"}
{"epoch": 0.54724111866969, "step": 363, "batch_size": 64, "mean": 12.225383758544922, "std": 14.739978790283203, "min": -19.873371124267578, "p10": -5.703240394592284, "median": 10.550129890441895, "p90": 31.50690097808838, "max": 51.2327880859375, "pos_frac": 0.796875, "sample": [19.147884368896484, -15.062995910644531, 34.917686462402344, 10.833137512207031, 10.609283447265625, 8.152923583984375, 4.764690399169922, 42.57661437988281, 3.1005859375, -8.151609420776367, -4.913642883300781, 21.032440185546875, 51.2327880859375, 29.697357177734375, 2.5944442749023438, 38.52104949951172, 5.500789642333984, 3.570779800415039, 11.231193542480469, 30.82938003540039, 2.9639739990234375, 14.505548477172852, 14.923059463500977, 36.37739562988281, 29.926918029785156, 10.490976333618164, 4.928955078125, 7.529815673828125, -4.783119201660156, 3.0896682739257812, -6.04163932800293, -19.873371124267578, 15.34930419921875, -10.261882781982422, 31.177730560302734, 16.414304733276367, 5.530656814575195, 20.13079833984375, 8.160493850708008, -2.134449005126953, 6.404972076416016, 4.812835693359375, 18.37639617919922, 25.649200439453125, 12.530584335327148, 35.021202087402344, 22.329036712646484, 8.222152709960938, 27.17120361328125, -1.9138870239257812, 31.647974014282227, 3.071533203125, 16.46541404724121, 12.516927719116211, -9.542604446411133, -6.168426513671875, 27.603073120117188, -1.1220741271972656, 29.09307861328125, 8.485965728759766, -1.5632648468017578, 5.979194641113281, 16.43805694580078, 12.326051712036133], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000363.npy"}
{"epoch": 0.5487528344671202, "step": 364, "batch_size": 64, "mean": 12.851823806762695, "std": 18.047710418701172, "min": -31.38147735595703, "p10": -5.436578369140625, "median": 8.863838195800781, "p90": 37.105630493164064, "max": 53.361846923828125, "pos_frac": 0.765625, "sample": [35.94354248046875, -1.7165870666503906, 3.0910110473632812, 47.97389221191406, 37.603668212890625, 4.356025695800781, 15.832027435302734, 17.81090545654297, -4.196907043457031, 33.75260925292969, 53.361846923828125, 13.625198364257812, -5.508209228515625, -8.041595458984375, 26.966720581054688, 44.45586395263672, 8.465469360351562, 0.7026100158691406, 52.55378723144531, 4.0254974365234375, 38.662139892578125, 7.5038604736328125, 15.561813354492188, 24.558746337890625, -20.659637451171875, -18.09515953063965, 41.689170837402344, -1.5607986450195312, 1.027170181274414, 15.586830139160156, 30.214872360229492, 19.530357360839844, -0.5705642700195312, 8.264945983886719, 8.836990356445312, 1.9633941650390625, 22.20492172241211, 24.6585693359375, -2.7881011962890625, 3.5409412384033203, 14.171684265136719, 17.764930725097656, -3.6841812133789062, -1.854461669921875, 29.43017578125, 4.495319366455078, 1.8830032348632812, 20.044570922851562, 4.506046295166016, 34.39493942260742, -5.269439697265625, 25.487356185913086, -17.91686248779297, 25.768211364746094, -31.38147735595703, 1.8132781982421875, 19.68187713623047, 20.171768188476562, 0.03351593017578125, 8.89068603515625, 4.622676849365234, -6.997711181640625, 35.37474822998047, 19.898269653320312], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000364.npy"}
{"epoch": 0.5502645502645502, "step": 365, "batch_size": 64, "mean": 12.087200164794922, "std": 13.139313697814941, "min": -20.206756591796875, "p10": -2.215242385864258, "median": 10.239275932312012, "p90": 28.60113506317139, "max": 44.67147445678711, "pos_frac": 0.828125, "sample": [1.5454883575439453, 13.720010757446289, 13.314247131347656, 7.330028533935547, 9.778560638427734, 5.594738006591797, 19.47820281982422, 22.694622039794922, 30.002761840820312, 17.9052734375, 15.461181640625, 11.963157653808594, -20.206756591796875, 10.162595748901367, 9.624580383300781, 18.779605865478516, -2.8791427612304688, 8.085033416748047, -0.32271575927734375, -1.5043087005615234, 15.43267822265625, 10.315956115722656, 19.453994750976562, 3.9485034942626953, -0.15306663513183594, -10.984931945800781, 9.374717712402344, 43.68694305419922, 3.5654525756835938, 21.66124725341797, 15.17608642578125, 8.950942993164062, 21.10965347290039, 5.884639739990234, 26.077224731445312, 14.275810241699219, 29.735671997070312, -2.002513885498047, 3.2248306274414062, 13.880813598632812, 28.874086380004883, 20.78365707397461, 19.213104248046875, 6.117034912109375, 18.198287963867188, 39.259910583496094, 5.8299102783203125, -7.807094573974609, 44.4976806640625, 5.196388244628906, 27.964248657226562, -4.474830627441406, 17.96588897705078, 4.372711181640625, 2.663951873779297, 18.885635375976562, 1.0161056518554688, -13.333576202392578, 9.678062438964844, 5.6733856201171875, 44.67147445678711, 23.95124053955078, -2.3064117431640625, 13.548126220703125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000365.npy"}
{"epoch": 0.5517762660619804, "step": 366, "batch_size": 64, "mean": 10.048315048217773, "std": 16.483610153198242, "min": -21.11325454711914, "p10": -9.166275405883788, "median": 10.272449493408203, "p90": 32.99634704589844, "max": 51.4150390625, "pos_frac": 0.703125, "sample": [-0.25510597229003906, 26.525726318359375, 25.15386199951172, 6.962863922119141, -6.369686126708984, 29.171293258666992, -18.327743530273438, -8.162052154541016, 6.028564453125, -9.596656799316406, -3.438220977783203, 17.127517700195312, 10.754179000854492, 14.89486312866211, 23.60870361328125, 13.32513427734375, -17.737503051757812, 12.590621948242188, 4.357212066650391, 16.817123413085938, 14.784027099609375, 11.285072326660156, 2.632843017578125, -3.107196807861328, 6.457090377807617, 14.676740646362305, 10.145820617675781, 33.2625846862793, -16.890304565429688, -4.572429656982422, 51.4150390625, -2.7990951538085938, -5.800580978393555, 36.85545349121094, -20.71440315246582, 2.4000988006591797, -1.4790687561035156, -13.484130859375, 8.086502075195312, 10.426555633544922, 3.0841598510742188, -5.764293670654297, 5.165870666503906, 11.933868408203125, -4.149127960205078, 0.8442611694335938, 1.615427017211914, 33.2210693359375, 45.23735046386719, 10.278762817382812, 10.432441711425781, 29.41706085205078, 18.46587371826172, -21.11325454711914, 22.262975692749023, -3.653076171875, 42.664146423339844, 20.14373779296875, 32.847686767578125, 26.986068725585938, 33.06005859375, 18.859756469726562, 23.973899841308594, 10.266136169433594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000366.npy"}
{"epoch": 0.5532879818594104, "step": 367, "batch_size": 64, "mean": 9.034981727600098, "std": 13.245285987854004, "min": -29.422306060791016, "p10": -7.254430961608887, "median": 7.876370429992676, "p90": 27.361524963378905, "max": 35.91260528564453, "pos_frac": 0.8125, "sample": [5.156364440917969, 33.577667236328125, 9.37020492553711, -1.4038848876953125, 14.5147705078125, 1.432525634765625, 30.135055541992188, 27.387649536132812, 8.055557250976562, 2.081583023071289, -0.9195709228515625, 3.969268798828125, 4.759246826171875, 3.8369979858398438, 35.91260528564453, 8.048233032226562, 13.627891540527344, 0.2821807861328125, -7.167856216430664, 19.655532836914062, 6.4412078857421875, 11.2855224609375, -6.375764846801758, -9.55323600769043, 14.384902954101562, -29.422306060791016, -10.021987915039062, 26.397987365722656, 7.81498908996582, 15.107086181640625, 27.73309326171875, 16.403459548950195, 8.837493896484375, 18.1318359375, 7.548343658447266, 5.0569610595703125, 5.0466766357421875, -7.291534423828125, 7.937751770019531, 24.08489227294922, 4.486583709716797, -25.011131286621094, 3.4785404205322266, 32.71568298339844, 14.987228393554688, -7.7459259033203125, 16.55762481689453, 6.636333465576172, 14.235416412353516, 1.0760841369628906, 11.072311401367188, 4.610067367553711, -10.692459106445312, 21.418716430664062, 23.891448974609375, 27.300567626953125, 4.063083648681641, -2.1083831787109375, 13.942138671875, 30.210121154785156, 1.340667724609375, 11.832590103149414, 1.8909835815429688, 26.191120147705078], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000367.npy"}
{"epoch": 0.5547996976568406, "step": 368, "batch_size": 64, "mean": 8.893351554870605, "std": 14.281230926513672, "min": -22.405029296875, "p10": -10.832105636596678, "median": 9.593652725219727, "p90": 28.27000503540039, "max": 39.83582305908203, "pos_frac": 0.703125, "sample": [-22.405029296875, 13.63775634765625, -3.43267822265625, 1.7109012603759766, 10.537872314453125, 20.00861358642578, 24.68575668334961, 9.103492736816406, -1.2108230590820312, -4.404304504394531, 25.913604736328125, 3.678375244140625, 7.1726837158203125, 14.846061706542969, 35.763328552246094, 16.785263061523438, -3.342498779296875, -11.856552124023438, -8.36883544921875, 27.858016967773438, 4.648807525634766, 14.029668807983398, 15.844635009765625, -12.005172729492188, 30.558713912963867, 7.738430023193359, -11.455642700195312, 18.942901611328125, 39.83582305908203, 22.196514129638672, -3.466510772705078, 11.2491455078125, 23.376907348632812, -3.1528053283691406, 2.4304542541503906, 15.564632415771484, 5.18231201171875, 15.678367614746094, 2.1185989379882812, 15.130928039550781, 6.412345886230469, -1.2831554412841797, 30.443199157714844, 21.346710205078125, -15.26092529296875, -6.224021911621094, 10.083812713623047, -3.5958786010742188, 28.446571350097656, -12.819053649902344, 32.33235168457031, 32.57404708862305, 21.915924072265625, 17.748828887939453, 1.2596893310546875, 15.705078125, 11.027748107910156, -6.3659210205078125, -9.377185821533203, 11.595542907714844, 7.49761962890625, 16.457246780395508, -18.759674072265625, 6.885894775390625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000368.npy"}
{"epoch": 0.5563114134542706, "step": 369, "batch_size": 64, "mean": 8.93631362915039, "std": 14.390641212463379, "min": -21.483545303344727, "p10": -8.36415214538574, "median": 9.644807815551758, "p90": 26.339188766479495, "max": 42.81827926635742, "pos_frac": 0.734375, "sample": [21.878829956054688, -16.59418487548828, 11.52900505065918, 23.520751953125, -21.483545303344727, 29.683273315429688, 20.054157257080078, 4.4072265625, 14.620597839355469, 9.821399688720703, 5.577278137207031, 21.099014282226562, 23.12958526611328, 24.439804077148438, 28.62043571472168, 2.9570541381835938, 10.709049224853516, 11.55712890625, 6.1374969482421875, 6.690120697021484, 9.468215942382812, -19.106342315673828, 4.81072998046875, -18.98984718322754, 10.154922485351562, 24.884300231933594, -5.403728485107422, 12.583213806152344, 13.905879974365234, -5.982418060302734, 27.26910400390625, -1.89471435546875, 9.941228866577148, 23.552818298339844, 17.851594924926758, 26.811222076416016, 38.208961486816406, -3.2513885498046875, -1.8626251220703125, -13.576248168945312, 1.56756591796875, 7.9480438232421875, -3.2941741943359375, 10.999574661254883, 1.4595527648925781, 6.4312591552734375, 37.951995849609375, 11.395057678222656, 24.258575439453125, -2.9374160766601562, -1.6617202758789062, 3.36004638671875, -0.13519668579101562, 25.237777709960938, 0.7227134704589844, -5.9227142333984375, 3.2614917755126953, 42.81827926635742, 0.9624252319335938, 13.759803771972656, -9.384895324707031, 15.052177429199219, -16.635114669799805, 16.979616165161133], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000369.npy"}
{"epoch": 0.5578231292517006, "step": 370, "batch_size": 64, "mean": 12.476125717163086, "std": 17.841106414794922, "min": -36.66775894165039, "p10": -6.727288818359375, "median": 10.68148136138916, "p90": 38.23002395629883, "max": 49.04960632324219, "pos_frac": 0.75, "sample": [10.923944473266602, 37.96715545654297, 18.491857528686523, 5.522571563720703, 39.100006103515625, 3.1213951110839844, 49.04960632324219, 3.4116363525390625, 42.725074768066406, 5.285358428955078, 3.144561767578125, 0.126800537109375, -0.91656494140625, 26.228713989257812, 49.03105163574219, 20.84018325805664, 13.898605346679688, -10.118629455566406, -11.090972900390625, 20.451889038085938, 15.131805419921875, 16.891159057617188, 8.041988372802734, 17.847396850585938, 28.88022232055664, 13.267059326171875, 30.840694427490234, 2.6888694763183594, 5.289680480957031, -2.6003570556640625, -2.033782958984375, 1.9962959289550781, -1.8226852416992188, -15.657936096191406, -2.632110595703125, 26.103111267089844, 40.37867736816406, 15.530609130859375, 39.236019134521484, 27.678943634033203, 38.342681884765625, -31.050765991210938, 20.450929641723633, 15.874740600585938, 2.562389373779297, 34.00469970703125, 18.7840576171875, 24.13786506652832, 24.947126388549805, 27.93761444091797, -36.66775894165039, 6.676815032958984, 27.704444885253906, -5.006196975708008, -6.612007141113281, 6.9431915283203125, 8.158172607421875, -0.7095870971679688, -8.650753021240234, -0.0025577545166015625, 10.439018249511719, 31.76294708251953, -6.776695251464844, 2.971799850463867], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000370.npy"}
{"epoch": 0.5593348450491308, "step": 371, "batch_size": 64, "mean": 11.916627883911133, "std": 16.535385131835938, "min": -22.219057083129883, "p10": -9.126784896850586, "median": 12.065601348876953, "p90": 30.57211170196534, "max": 52.13951110839844, "pos_frac": 0.703125, "sample": [22.61572265625, -22.219057083129883, 42.66425323486328, 13.067375183105469, 28.16100311279297, -10.73541259765625, 24.480979919433594, -0.9983272552490234, -1.1176376342773438, 3.798360824584961, 7.436859130859375, 21.001068115234375, -8.612041473388672, 47.05058288574219, 4.778343200683594, 40.0684814453125, -0.7868423461914062, 31.605443954467773, 9.788749694824219, 42.94911575317383, 12.87890625, -6.792121887207031, -2.3866748809814453, 5.93389892578125, 21.66687774658203, 25.71723747253418, 19.55937385559082, 52.13951110839844, 20.41619873046875, 24.296424865722656, 25.68292236328125, 20.336265563964844, -13.86572265625, -2.1638145446777344, 15.478479385375977, 11.3245849609375, -2.481048583984375, -11.32101821899414, -9.44232177734375, 40.774383544921875, -7.7567138671875, 20.650020599365234, 3.1581382751464844, 1.284881591796875, 18.798233032226562, -7.633094787597656, 10.168241500854492, 18.608108520507812, -4.894172668457031, 12.806617736816406, 10.48537826538086, -11.033233642578125, 1.3218421936035156, 17.900802612304688, 27.13800048828125, 0.5367240905761719, 17.079803466796875, 8.473480224609375, 26.492416381835938, 18.85877227783203, 27.37364959716797, 24.09942626953125, -9.347389221191406, -4.655132293701172], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000371.npy"}
{"epoch": 0.5608465608465608, "step": 372, "batch_size": 64, "mean": 7.255866050720215, "std": 17.676557540893555, "min": -28.829811096191406, "p10": -14.040608978271484, "median": 7.434841156005859, "p90": 27.933262252807616, "max": 48.693458557128906, "pos_frac": 0.734375, "sample": [-14.162376403808594, 13.69314193725586, 15.162277221679688, 9.042655944824219, 15.143310546875, 11.543636322021484, 35.15983581542969, -6.6174774169921875, 24.315818786621094, 20.805503845214844, 37.87346649169922, 0.4430389404296875, 12.728424072265625, 48.693458557128906, 14.913627624511719, 2.9654159545898438, 16.114776611328125, -28.829811096191406, 3.0868453979492188, 17.05506134033203, 3.789571762084961, 4.57017707824707, 8.15472412109375, 27.969165802001953, 2.9749755859375, -2.7695693969726562, -27.032730102539062, 4.4293975830078125, 46.74657440185547, -12.586893081665039, 23.398536682128906, 19.354827880859375, 2.045684814453125, -7.579448699951172, -13.756484985351562, 11.24867057800293, -25.171527862548828, 8.236442565917969, 25.29461669921875, 12.599775314331055, 27.8494873046875, 11.780498504638672, 2.6748046875, 24.463356018066406, -7.750274658203125, 12.813539505004883, -25.775741577148438, 0.28003692626953125, 4.909786224365234, 10.829902648925781, 2.9018478393554688, -28.75409698486328, 17.11042022705078, 2.084379196166992, -5.0484619140625, -11.500020980834961, -21.287120819091797, 8.104591369628906, 31.207887649536133, 46.109405517578125, -6.591648101806641, -5.7560272216796875, 5.906669616699219, 6.7650909423828125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000372.npy"}
{"epoch": 0.562358276643991, "step": 373, "batch_size": 64, "mean": 9.709882736206055, "std": 16.865154266357422, "min": -23.853439331054688, "p10": -10.764100646972654, "median": 7.857208251953125, "p90": 29.138246917724615, "max": 55.61586380004883, "pos_frac": 0.75, "sample": [-20.129104614257812, 23.62483024597168, 34.033042907714844, 19.709075927734375, 23.225746154785156, -11.562728881835938, -3.754253387451172, 22.100933074951172, 12.065376281738281, 7.002923965454102, 18.69361114501953, -18.114479064941406, 14.98712158203125, -4.3057861328125, -19.026201248168945, -7.725921630859375, 7.071508407592773, 15.689598083496094, -23.853439331054688, -4.0193023681640625, 11.472454071044922, 0.46722412109375, -2.652374267578125, -8.152076721191406, 21.735740661621094, 27.338478088378906, 2.3205413818359375, 8.960136413574219, 18.72754669189453, 13.809774398803711, 45.03034973144531, 0.49868011474609375, 1.5442352294921875, 25.98297882080078, -8.900634765625, 0.9302215576171875, 1.5175628662109375, 24.9366455078125, 0.50518798828125, 27.29672622680664, 16.743907928466797, -7.5877685546875, -13.465690612792969, -0.22220993041992188, 1.49884033203125, 21.252765655517578, -17.294523239135742, 33.31790542602539, 7.319465637207031, 8.394950866699219, 0.94537353515625, 41.00025939941406, 3.600046157836914, 55.61586380004883, 9.663736343383789, 22.651081085205078, 12.51565933227539, 19.328094482421875, 6.662754058837891, 48.97947692871094, 8.410196304321289, 5.999271392822266, 7.111518859863281, 29.909576416015625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000373.npy"}
{"epoch": 0.563869992441421, "step": 374, "batch_size": 64, "mean": 13.459157943725586, "std": 14.234453201293945, "min": -20.917648315429688, "p10": -3.536573028564451, "median": 13.15485954284668, "p90": 33.21643104553224, "max": 43.586273193359375, "pos_frac": 0.84375, "sample": [43.586273193359375, 20.828018188476562, 13.232933044433594, -1.3690147399902344, 8.731746673583984, 23.857440948486328, 34.45644760131836, 25.43183135986328, 37.47089385986328, 14.000371932983398, 21.453060150146484, 4.041595458984375, 22.046646118164062, 27.81878662109375, 10.146171569824219, -5.772735595703125, 25.831825256347656, 9.07647705078125, 1.696502685546875, 8.46356201171875, 12.066848754882812, 17.831241607666016, 4.503086090087891, -4.440544128417969, 10.218940734863281, 14.294418334960938, 6.520538330078125, 40.048057556152344, 0.522918701171875, 19.34014129638672, 6.0684051513671875, 18.946487426757812, 5.248752593994141, 2.4465179443359375, 16.455734252929688, 22.040760040283203, 18.14020538330078, -13.601776123046875, 30.32305908203125, 18.358631134033203, 12.92047119140625, -9.823373794555664, -0.9423351287841797, 39.86961364746094, 35.03111267089844, 5.620994567871094, -20.917648315429688, 8.969161987304688, 11.488748550415039, 14.742286682128906, 43.37059783935547, 25.072586059570312, 26.33599853515625, 9.498367309570312, 13.076786041259766, 2.3009109497070312, 19.495208740234375, 19.532485961914062, 15.314697265625, 24.562294006347656, -12.562746047973633, -1.42730712890625, 5.2465972900390625, -15.750640869140625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000374.npy"}
{"epoch": 0.5653817082388511, "step": 375, "batch_size": 64, "mean": 8.110240936279297, "std": 14.316776275634766, "min": -24.41487693786621, "p10": -9.502089691162109, "median": 7.551028251647949, "p90": 27.095930480957033, "max": 45.67524719238281, "pos_frac": 0.703125, "sample": [-10.626472473144531, -12.739372253417969, 3.4488449096679688, 10.942462921142578, 13.635061264038086, 5.573398590087891, 31.668548583984375, 4.430805206298828, -4.278739929199219, 12.885540008544922, -4.425254821777344, 20.774749755859375, -9.631790161132812, 37.31793212890625, 7.314352035522461, -5.153888702392578, 9.018789291381836, -6.8507232666015625, 13.096553802490234, -5.180419921875, 8.27496337890625, 27.253684997558594, 27.154006958007812, -12.572616577148438, 8.838386535644531, 24.775779724121094, -0.3695716857910156, -1.941253662109375, 5.91900634765625, 20.290210723876953, 4.244194030761719, -5.41358757019043, 1.7653656005859375, -14.461250305175781, 8.463882446289062, 9.087575912475586, 12.933830261230469, 5.24609375, 45.67524719238281, 7.7877044677734375, 21.436676025390625, 17.380762100219727, 24.866371154785156, -24.41487693786621, 4.8837432861328125, 2.979278564453125, 32.81327819824219, 4.4511871337890625, 26.960418701171875, 10.022029876708984, 20.89838409423828, 25.078628540039062, -4.451148986816406, 20.216205596923828, 3.2561302185058594, 6.604499816894531, 11.305610656738281, -20.060510635375977, -3.8348236083984375, 23.665802001953125, -5.928062438964844, 27.900514602661133, 8.052726745605469, -9.199455261230469], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000375.npy"}
{"epoch": 0.5668934240362812, "step": 376, "batch_size": 64, "mean": 14.881567001342773, "std": 18.069625854492188, "min": -17.83544921875, "p10": -8.224699401855467, "median": 12.03911304473877, "p90": 41.120146942138675, "max": 64.44619750976562, "pos_frac": 0.796875, "sample": [16.347198486328125, 4.9990234375, 3.179849624633789, 7.919830322265625, 10.586036682128906, -7.381694793701172, -1.8034896850585938, 30.444244384765625, 5.0615234375, 10.443309783935547, 18.5460205078125, 24.632274627685547, -16.157745361328125, -1.927682876586914, -15.41204833984375, 2.220661163330078, 41.70092010498047, 19.628692626953125, 10.372535705566406, 64.44619750976562, 13.992080688476562, 36.975494384765625, 3.1277313232421875, 25.902877807617188, 40.787017822265625, -4.1778106689453125, 34.49004364013672, 39.013099670410156, 21.749065399169922, 21.111892700195312, 31.6252384185791, 13.135921478271484, 43.119606018066406, 18.267807006835938, 11.673383712768555, 29.957809448242188, 36.657623291015625, -8.585987091064453, 41.38172149658203, -13.801734924316406, 8.680583953857422, -9.86553955078125, 33.098915100097656, 11.019538879394531, 14.337081909179688, 44.36468505859375, -3.0207691192626953, 17.126495361328125, -2.1877212524414062, 36.15648651123047, -14.420440673828125, 12.404842376708984, 7.712165832519531, 8.36624526977539, 41.262916564941406, 2.691404342651367, 21.54510498046875, 3.2124900817871094, 41.38929748535156, 20.528575897216797, 4.360260009765625, 5.086481094360352, 2.1581268310546875, -17.83544921875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000376.npy"}
{"epoch": 0.5684051398337112, "step": 377, "batch_size": 64, "mean": 10.194432258605957, "std": 14.154486656188965, "min": -18.57794952392578, "p10": -6.207121086120605, "median": 10.577949523925781, "p90": 27.852041053771977, "max": 40.51133728027344, "pos_frac": 0.71875, "sample": [19.250511169433594, -7.773490905761719, 21.197017669677734, 19.105140686035156, 8.236244201660156, -2.9382781982421875, -18.57794952392578, -12.467903137207031, 7.847434997558594, 19.242332458496094, 34.84809875488281, 1.6319808959960938, 24.99658966064453, -1.6681900024414062, 18.21363067626953, 19.89464569091797, -5.526206970214844, 20.52349090576172, -8.117712020874023, 2.9051170349121094, 26.911128997802734, -2.1835880279541016, 19.38174819946289, 12.899124145507812, 12.791147232055664, -18.030258178710938, 9.23583984375, 39.36183166503906, 9.640434265136719, 3.0978145599365234, 20.97547149658203, -5.420234680175781, 16.483232498168945, 2.133890151977539, 7.286537170410156, -5.370454788208008, 17.278152465820312, 19.0460205078125, 21.341167449951172, 11.515464782714844, -4.867259979248047, 33.240089416503906, 6.618461608886719, 12.458732604980469, 8.128952026367188, 36.483787536621094, 13.102901458740234, -5.244575500488281, -4.728828430175781, 12.504129409790039, 23.998794555664062, 3.43975830078125, 1.9936084747314453, 40.51133728027344, -2.8938140869140625, -6.498941421508789, 28.25528907775879, 19.557239532470703, -17.21109962463379, 7.730705261230469, 19.266202926635742, -1.9280471801757812, 28.709991455078125, 20.619304656982422], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000377.npy"}
{"epoch": 0.5699168556311414, "step": 378, "batch_size": 64, "mean": 8.545565605163574, "std": 15.575573921203613, "min": -30.558311462402344, "p10": -7.0109535217285135, "median": 5.0663604736328125, "p90": 28.827616119384782, "max": 53.2481689453125, "pos_frac": 0.75, "sample": [15.829154968261719, 7.086296081542969, 10.152008056640625, 4.618661880493164, 15.781593322753906, 17.816978454589844, -9.113693237304688, 0.4695472717285156, 11.857986450195312, 31.968475341796875, -30.558311462402344, 4.590541839599609, 46.56575012207031, 1.9162940979003906, 17.639022827148438, 11.900947570800781, 2.930675506591797, 8.87136459350586, 40.22927474975586, -2.86383056640625, 21.193225860595703, 9.809951782226562, -0.2931709289550781, 11.51092529296875, 25.16769790649414, 4.7278594970703125, 2.7271041870117188, 23.196380615234375, -16.938827514648438, 2.50177001953125, 21.8331298828125, 6.661834716796875, -15.019630432128906, 1.131082534790039, 21.8297119140625, 1.6821136474609375, -0.1744232177734375, 0.7157363891601562, 21.90850830078125, -7.871362686157227, 53.2481689453125, 5.4048614501953125, 6.699378967285156, 30.39615249633789, 4.61676025390625, 15.72308349609375, -0.8659286499023438, 37.599430084228516, 12.402145385742188, 1.9218311309814453, 4.690792083740234, -4.3668365478515625, 3.8716888427734375, -19.6822509765625, -2.021228790283203, 21.008956909179688, -5.003332138061523, 20.234039306640625, 3.6338424682617188, 7.0773162841796875, -4.670520782470703, -21.552005767822266, -1.7426776885986328, 34.304161071777344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000378.npy"}
{"epoch": 0.5714285714285714, "step": 379, "batch_size": 64, "mean": 11.91736888885498, "std": 15.442754745483398, "min": -18.693603515625, "p10": -4.782092285156248, "median": 9.901277542114258, "p90": 30.10480117797852, "max": 56.737060546875, "pos_frac": 0.796875, "sample": [15.398008346557617, 1.6506500244140625, 29.47869873046875, -3.0051116943359375, 7.2966156005859375, 12.723953247070312, 18.35700035095215, -13.337579727172852, 6.495342254638672, 6.1754913330078125, -0.3227386474609375, 4.671783447265625, -11.089080810546875, 22.231979370117188, 2.637279510498047, 56.737060546875, 20.28949737548828, -2.6617355346679688, 23.561668395996094, 24.178756713867188, 50.415435791015625, 27.743194580078125, 34.117427825927734, 22.72509002685547, 4.76287841796875, -18.693603515625, 20.69776153564453, -5.5436553955078125, 18.28368377685547, -12.68017578125, 41.044654846191406, 16.406484603881836, 7.572977066040039, 8.257488250732422, -12.876174926757812, 22.371780395507812, 30.373130798339844, 4.835197448730469, 9.080524444580078, -15.850337982177734, -2.3592987060546875, 14.154144287109375, 5.631538391113281, 37.405677795410156, 18.927837371826172, 3.344329833984375, 43.73072814941406, 18.462799072265625, 0.9495506286621094, 10.722030639648438, 16.243942260742188, -0.2301483154296875, -0.985198974609375, 16.647417068481445, 24.805509567260742, 1.0853614807128906, 8.219932556152344, 19.4735107421875, 1.3291358947753906, 2.2729129791259766, 11.518081665039062, 18.294998168945312, 13.130081176757812, 5.4253997802734375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000379.npy"}
{"epoch": 0.5729402872260015, "step": 380, "batch_size": 64, "mean": 12.620878219604492, "std": 15.821794509887695, "min": -27.817466735839844, "p10": -8.719435119628905, "median": 12.348453521728516, "p90": 32.84361953735352, "max": 45.393707275390625, "pos_frac": 0.78125, "sample": [8.049711227416992, 27.001197814941406, 7.7386474609375, 9.610870361328125, -7.952720642089844, -13.023429870605469, 26.187612533569336, 29.045074462890625, -10.207874298095703, 26.664833068847656, 14.244979858398438, 2.2452659606933594, -2.1280784606933594, 22.69778823852539, 5.273792266845703, -9.048027038574219, -0.4383888244628906, 12.756004333496094, 31.675918579101562, 33.34406280517578, 19.103225708007812, 10.804779052734375, -4.29595947265625, -27.817466735839844, 34.43523406982422, 4.673072814941406, 14.765892028808594, 20.88097381591797, -2.953329086303711, 7.660577774047852, 21.591949462890625, 29.30035400390625, 11.940902709960938, 8.302345275878906, 37.54130554199219, 5.7352447509765625, 37.18560791015625, 17.75128936767578, -2.3878936767578125, 6.033103942871094, 28.019695281982422, -26.86412811279297, 35.331153869628906, 45.393707275390625, 13.643407821655273, 18.334325790405273, 15.235099792480469, 11.804367065429688, -3.67303466796875, 29.091976165771484, 3.9598159790039062, 2.767049789428711, -9.90704345703125, 42.866905212402344, 25.272293090820312, 18.742752075195312, 3.425996780395508, 23.388763427734375, 24.25141143798828, 20.894439697265625, 11.32180404663086, 16.210290908813477, 5.649446487426758, -11.412704467773438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000380.npy"}
{"epoch": 0.5744520030234316, "step": 381, "batch_size": 64, "mean": 5.450328350067139, "std": 14.87196159362793, "min": -40.729705810546875, "p10": -8.82815475463867, "median": 3.5702476501464844, "p90": 22.38019027709961, "max": 48.938385009765625, "pos_frac": 0.625, "sample": [-4.9007415771484375, 0.8922500610351562, 1.508279800415039, -13.011106491088867, -40.729705810546875, 5.288372039794922, 5.05462646484375, 6.9816741943359375, 45.244529724121094, 14.12982177734375, 7.358161926269531, -3.1240196228027344, -2.5048484802246094, 10.241085052490234, -3.2225284576416016, 48.938385009765625, 16.875778198242188, -1.0916156768798828, 3.5493240356445312, -1.4983463287353516, 2.5711421966552734, -6.900062561035156, 8.909051895141602, 15.873764038085938, -14.621612548828125, 12.233924865722656, -5.900421142578125, -14.877403259277344, 14.420608520507812, -2.0784683227539062, 9.510211944580078, 16.3709716796875, 9.347679138183594, 2.540313720703125, 0.1510772705078125, 22.630599975585938, 9.026752471923828, -3.7897300720214844, -18.20259666442871, 3.5911712646484375, 22.614173889160156, -9.65447998046875, 17.993785858154297, 6.570474624633789, 7.456001281738281, 40.066383361816406, 17.81024742126465, 10.673088073730469, -1.3243846893310547, 1.1812210083007812, -2.010986328125, 21.12447166442871, 21.834228515625, -1.6751785278320312, 5.884067535400391, -1.24725341796875, 27.76732635498047, -22.60882568359375, -4.527015686035156, 4.75145149230957, 2.2605361938476562, -3.248870849609375, -0.8728561401367188, 31.217052459716797], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000381.npy"}
{"epoch": 0.5759637188208617, "step": 382, "batch_size": 64, "mean": 8.905851364135742, "std": 13.266440391540527, "min": -20.086875915527344, "p10": -5.627150726318359, "median": 7.691509246826172, "p90": 25.97155666351319, "max": 51.31357955932617, "pos_frac": 0.703125, "sample": [-4.415618896484375, 0.9214954376220703, 31.748794555664062, 14.003679275512695, 29.206626892089844, -4.3414306640625, 3.2349700927734375, 3.035350799560547, 22.968780517578125, 8.393770217895508, 6.361358642578125, 4.14044189453125, -3.0167922973632812, 17.556350708007812, 19.362895965576172, 2.6704864501953125, -7.5808868408203125, -2.8184356689453125, -20.086875915527344, 14.672462463378906, -2.0226058959960938, 7.301246643066406, 11.737197875976562, -1.0191650390625, -6.3320465087890625, 9.688613891601562, 14.835630416870117, 6.8960418701171875, -7.696372985839844, 51.31357955932617, 10.715049743652344, 12.046005249023438, -10.990768432617188, -11.097261428833008, 28.471744537353516, -3.225231170654297, -4.475284576416016, -0.424591064453125, 6.558074951171875, 26.548124313354492, 6.300201416015625, 44.94023895263672, 8.081771850585938, 18.620140075683594, 4.6802825927734375, 14.850151062011719, -6.120807647705078, -2.450572967529297, -1.915557861328125, 11.963966369628906, 9.34014892578125, 19.79212188720703, -1.9880867004394531, 10.409595489501953, 19.024646759033203, 1.0460166931152344, 19.04071044921875, 6.92506217956543, 24.626232147216797, 34.70464324951172, 15.860748291015625, 14.155685424804688, 11.684013366699219, 11.557662963867188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000382.npy"}
{"epoch": 0.5774754346182918, "step": 383, "batch_size": 64, "mean": 11.239961624145508, "std": 16.82264518737793, "min": -24.876251220703125, "p10": -7.586315536499023, "median": 9.803777694702148, "p90": 32.866686248779295, "max": 53.760955810546875, "pos_frac": 0.703125, "sample": [0.8833236694335938, 21.440330505371094, 17.51580810546875, 6.411933898925781, 11.183334350585938, -7.645908355712891, 20.616100311279297, 10.18289566040039, -5.1396484375, -11.395700454711914, 11.262134552001953, -7.447265625, 10.967754364013672, 17.45363998413086, 21.610206604003906, 9.394508361816406, 23.67493438720703, -11.033027648925781, 36.49055480957031, 12.25027847290039, 14.966278076171875, -1.947723388671875, 28.820823669433594, 5.824676513671875, 32.92262268066406, -3.408111572265625, 6.926704406738281, -0.9800872802734375, 17.36742401123047, 38.35888671875, 30.18321418762207, -5.062156677246094, -0.7352523803710938, 53.760955810546875, -3.645153045654297, 15.718631744384766, 32.54412841796875, -24.876251220703125, 3.1530838012695312, 14.813301086425781, -14.338127136230469, 1.7860183715820312, 17.693626403808594, 8.09385871887207, 28.580581665039062, -5.774394989013672, 5.713050842285156, 9.424659729003906, -13.055805206298828, 43.52452850341797, 3.7410144805908203, 48.78556823730469, 25.31879425048828, 2.1389312744140625, -0.9587326049804688, 32.736167907714844, 23.701644897460938, 3.526519775390625, -1.4702224731445312, -19.276222229003906, 29.94485855102539, -6.62261962890625, 35.68122863769531, 17.080429077148438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000383.npy"}
{"epoch": 0.5789871504157218, "step": 384, "batch_size": 64, "mean": 10.718770980834961, "std": 13.6395845413208, "min": -24.004852294921875, "p10": -6.591470718383786, "median": 10.361050605773926, "p90": 28.554676818847657, "max": 45.71930694580078, "pos_frac": 0.796875, "sample": [-9.707763671875, 28.002182006835938, 11.261848449707031, 8.346664428710938, 3.7309303283691406, 17.869243621826172, 10.806221008300781, 17.532859802246094, 24.677841186523438, 0.04578399658203125, 15.2620849609375, 18.9658203125, 6.014617919921875, 3.5081100463867188, -1.6843719482421875, 4.4385223388671875, 1.8192024230957031, 35.090065002441406, -1.1469879150390625, 30.640872955322266, -3.21673583984375, 17.09930419921875, 7.848888397216797, -13.028602600097656, -7.803783416748047, 12.508768081665039, 37.29108428955078, 18.237525939941406, -9.359977722167969, 19.364212036132812, 37.67030715942383, 18.493026733398438, 2.3903961181640625, 8.187650680541992, 12.4654541015625, 1.5724945068359375, 45.71930694580078, 4.5004119873046875, -24.004852294921875, 19.992874145507812, -3.7627410888671875, 17.288848876953125, 28.7353515625, -11.395309448242188, 31.78540802001953, 24.70392608642578, 10.492862701416016, 4.153772354125977, 1.8599281311035156, 10.229238510131836, 12.616973876953125, 2.5479354858398438, 7.808198928833008, 14.720184326171875, 6.878303527832031, -0.5274715423583984, -8.250785827636719, 24.270889282226562, -3.3828964233398438, 5.753078460693359, 28.133102416992188, 16.79718780517578, 20.055715560913086, 13.088127136230469], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000384.npy"}
{"epoch": 0.5804988662131519, "step": 385, "batch_size": 64, "mean": 11.885599136352539, "std": 13.909668922424316, "min": -17.739295959472656, "p10": -5.4973815917968745, "median": 12.161720275878906, "p90": 28.71081428527832, "max": 53.231685638427734, "pos_frac": 0.78125, "sample": [-7.346099853515625, 1.4074668884277344, 29.889314651489258, 13.777694702148438, 17.480575561523438, 10.864395141601562, 16.33453369140625, 12.122711181640625, 5.693977355957031, 10.543716430664062, -6.939144134521484, 13.364494323730469, 10.983348846435547, 20.943775177001953, 13.692649841308594, 24.789112091064453, 34.240516662597656, 26.039581298828125, -0.3564434051513672, 10.761119842529297, -2.9741363525390625, -12.961471557617188, 1.6633377075195312, 28.480018615722656, 28.80972671508789, 22.722396850585938, 13.292644500732422, 12.200729370117188, 1.2688674926757812, 10.955326080322266, 7.736785888671875, -17.739295959472656, 23.80303192138672, 0.7233505249023438, 12.348926544189453, 27.784652709960938, 19.83746337890625, 13.739425659179688, 11.050073623657227, 17.8994197845459, -7.485828399658203, -1.5857772827148438, 17.452301025390625, 19.984153747558594, -3.72021484375, 36.927001953125, 30.183944702148438, -11.150924682617188, -2.1117820739746094, -5.293670654296875, 2.24310302734375, 13.785232543945312, 0.10776519775390625, 6.7928619384765625, 19.342391967773438, 24.277175903320312, -1.145233154296875, 23.825437545776367, -5.584686279296875, 53.231685638427734, 1.4011383056640625, 4.7122802734375, 37.9979133605957, 27.563507080078125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000385.npy"}
{"epoch": 0.582010582010582, "step": 386, "batch_size": 64, "mean": 8.834293365478516, "std": 13.830795288085938, "min": -24.755508422851562, "p10": -7.041922950744628, "median": 7.379032135009766, "p90": 26.024497985839844, "max": 49.582275390625, "pos_frac": 0.71875, "sample": [49.582275390625, -6.333200454711914, -0.6411628723144531, 16.82671546936035, 22.435195922851562, 6.05084228515625, 1.532470703125, 19.049285888671875, 6.991661071777344, 17.067108154296875, 4.166168212890625, -1.2503814697265625, -21.321020126342773, 6.730796813964844, -4.227943420410156, 26.07347869873047, 12.916473388671875, 26.841964721679688, 19.17627716064453, 7.0324859619140625, 13.13212776184082, 27.764076232910156, 7.554588317871094, -2.3168411254882812, -7.345661163330078, 18.66403579711914, 23.62461280822754, -7.565834045410156, 15.82241439819336, 9.319938659667969, 8.045501708984375, 14.317733764648438, -0.143341064453125, 5.01300048828125, -13.529144287109375, -2.3258743286132812, 2.46221923828125, 7.9296875, 10.443138122558594, 25.91020965576172, 6.1778106689453125, -8.009639739990234, 14.163509368896484, 21.286956787109375, 29.227943420410156, -5.802333831787109, -0.192108154296875, 12.162742614746094, -0.20006942749023438, 38.54667663574219, -0.22177886962890625, 1.426177978515625, -18.057952880859375, 23.538925170898438, 31.7166748046875, 3.0843353271484375, 3.3885040283203125, 8.656341552734375, 24.9208984375, 17.63678741455078, 11.40728759765625, 2.613018035888672, 7.2034759521484375, -24.755508422851562], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000386.npy"}
{"epoch": 0.5835222978080121, "step": 387, "batch_size": 64, "mean": 12.292238235473633, "std": 13.04440689086914, "min": -14.876174926757812, "p10": -1.8336782455444331, "median": 10.659656524658203, "p90": 29.975544929504398, "max": 55.22093200683594, "pos_frac": 0.84375, "sample": [18.47393798828125, 3.2219314575195312, 23.119029998779297, -4.237548828125, 2.7601852416992188, 30.5009765625, 10.517532348632812, 0.2770652770996094, 11.152315139770508, 55.22093200683594, 8.133543014526367, -1.3743152618408203, 2.3236083984375, 10.801780700683594, 30.947364807128906, 5.811225891113281, 12.933563232421875, 23.345138549804688, 15.653667449951172, 23.1683349609375, 20.108856201171875, 30.29302406311035, 18.686752319335938, -3.7919883728027344, -3.906200408935547, 18.55803680419922, 1.7918777465820312, 4.7865447998046875, 29.234760284423828, 15.236425399780273, 2.516551971435547, -2.030548095703125, 7.5881195068359375, 33.400421142578125, -0.3116455078125, 20.66497039794922, 9.83266830444336, 2.0631351470947266, 5.7557373046875, -0.032958984375, 2.027538299560547, 16.915573120117188, 24.406822204589844, -14.876174926757812, -2.8712615966796875, 16.422748565673828, 19.894859313964844, 39.16333770751953, 3.539590835571289, 25.071367263793945, -7.830348968505859, 12.369758605957031, 9.456710815429688, 11.477767944335938, 3.04638671875, 11.521671295166016, 4.342231750488281, 5.320959091186523, 7.0632781982421875, 28.395614624023438, 43.754638671875, 3.7868785858154297, 13.863813400268555, 17.244670867919922], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000387.npy"}
{"epoch": 0.5850340136054422, "step": 388, "batch_size": 64, "mean": 9.1153564453125, "std": 16.19068145751953, "min": -20.588607788085938, "p10": -8.587938690185547, "median": 7.578640937805176, "p90": 29.89310455322266, "max": 54.608306884765625, "pos_frac": 0.65625, "sample": [15.328216552734375, 26.49213409423828, 25.815811157226562, 47.218536376953125, 3.7330474853515625, 26.860610961914062, 10.007301330566406, 7.6064300537109375, -5.215240478515625, -3.8819656372070312, 13.279399871826172, 24.84461212158203, -3.8685340881347656, 14.858444213867188, -8.735359191894531, 54.608306884765625, -2.400135040283203, 42.70726013183594, 3.4852294921875, 17.83188819885254, 22.659652709960938, -0.3159637451171875, 27.301183700561523, -1.1627273559570312, 26.19640350341797, -20.588607788085938, 5.97247314453125, 10.998428344726562, 5.365842819213867, 5.56785774230957, -3.4237842559814453, -6.932044982910156, -0.9854621887207031, 32.83510208129883, -5.3259429931640625, 8.183013916015625, 2.6844215393066406, -11.841552734375, 20.718759536743164, -13.559883117675781, 15.227882385253906, 30.28076171875, -14.311027526855469, 28.988571166992188, 21.93971824645996, 12.620697021484375, -0.5049667358398438, 8.844749450683594, 12.2406005859375, 32.29987335205078, 2.870067596435547, 36.319305419921875, -7.409820556640625, 2.107879638671875, -17.735870361328125, -14.720176696777344, 8.673004150390625, 10.06982421875, -3.6019210815429688, 0.7429122924804688, 10.731678009033203, -6.520942687988281, -8.24395751953125, 7.550851821899414], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000388.npy"}
{"epoch": 0.5865457294028723, "step": 389, "batch_size": 64, "mean": 10.459827423095703, "std": 15.288549423217773, "min": -35.264530181884766, "p10": -5.8985874176025375, "median": 10.617744445800781, "p90": 31.053616333007827, "max": 43.8043212890625, "pos_frac": 0.734375, "sample": [-9.832572937011719, -4.378625869750977, 17.561513900756836, 24.019744873046875, 12.459135055541992, 9.580825805664062, 19.473979949951172, 27.674697875976562, 43.8043212890625, 19.606719970703125, -2.263916015625, -11.668960571289062, 19.629928588867188, 13.68800163269043, 9.947723388671875, 21.455385208129883, 15.273849487304688, 21.315658569335938, 11.287765502929688, 8.510860443115234, 27.04248809814453, 16.198631286621094, 4.958957672119141, 11.535415649414062, 23.593475341796875, -3.9233856201171875, 22.297252655029297, -4.152740478515625, 3.4393768310546875, 17.8033447265625, -1.6752281188964844, 20.006610870361328, -6.519805908203125, 35.257965087890625, 32.50172424316406, 8.419036865234375, 36.561668395996094, 14.739669799804688, -4.449077606201172, -27.204002380371094, 24.952362060546875, -3.6495437622070312, 5.250703811645508, 0.006290435791015625, 15.067895889282227, 1.977294921875, 21.666458129882812, -8.214675903320312, -35.264530181884766, 8.02532958984375, 0.01679229736328125, 7.505228042602539, 0.321441650390625, -3.038471221923828, -6.771121978759766, 36.75391387939453, 34.78762435913086, 12.2860107421875, 9.873550415039062, 36.905426025390625, -1.8664836883544922, 18.56195831298828, 4.653390884399414, -3.955242156982422], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000389.npy"}
{"epoch": 0.5880574452003023, "step": 390, "batch_size": 64, "mean": 10.419500350952148, "std": 13.21918773651123, "min": -19.257831573486328, "p10": -6.19992389678955, "median": 9.916037559509277, "p90": 27.318952178955087, "max": 50.40367889404297, "pos_frac": 0.78125, "sample": [-6.622062683105469, 10.512786865234375, 12.085380554199219, 3.4272232055664062, 31.038654327392578, 5.0955352783203125, 12.291664123535156, 18.15605354309082, -3.2330265045166016, 11.287956237792969, 0.36481475830078125, -4.172615051269531, 17.67241668701172, 0.7131481170654297, -5.751434326171875, 50.40367889404297, 18.867977142333984, -4.806098937988281, 11.771949768066406, -6.905281066894531, 19.599639892578125, 3.8964614868164062, 14.460309982299805, 34.15269470214844, 25.30389404296875, 23.843807220458984, 9.04547119140625, -15.650856018066406, 6.119335174560547, 32.415733337402344, -3.7505111694335938, 7.287774085998535, 9.31928825378418, 16.86084747314453, -6.865301132202148, -0.13555908203125, -19.257831573486328, 24.15826416015625, 28.18254852294922, 0.6762847900390625, 34.50360870361328, 4.734716415405273, 9.287498474121094, 10.959619522094727, -0.05921363830566406, 19.409475326538086, 6.042530059814453, 10.525077819824219, 24.385894775390625, 6.815906524658203, 15.804229736328125, 13.33407211303711, 7.348285675048828, 21.131227493286133, 3.9477615356445312, 12.626922607421875, 6.014978408813477, -8.004730224609375, 11.641389846801758, -6.392133712768555, 21.975173950195312, 17.565704345703125, 4.929618835449219, 36.45940399169922], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000390.npy"}
{"epoch": 0.5895691609977324, "step": 391, "batch_size": 64, "mean": 12.077947616577148, "std": 16.98453140258789, "min": -29.62879180908203, "p10": -9.573927688598632, "median": 13.34140682220459, "p90": 33.86553649902344, "max": 62.88932800292969, "pos_frac": 0.765625, "sample": [4.1363677978515625, -29.62879180908203, 35.09039306640625, 17.76568603515625, 14.619834899902344, 22.68700408935547, -0.7274875640869141, 26.79669952392578, 35.292884826660156, 43.578704833984375, 11.297897338867188, -17.3021240234375, 20.95703125, -2.2732620239257812, 23.583110809326172, 44.452171325683594, -16.60014533996582, 12.265151977539062, -7.999176025390625, 0.9249343872070312, 0.19815826416015625, 18.086761474609375, 18.341522216796875, 18.50402069091797, 3.8032684326171875, -2.6373538970947266, -15.839399337768555, -6.4328460693359375, 18.300453186035156, 3.4715118408203125, 62.88932800292969, 15.746505737304688, 8.42808723449707, -10.248821258544922, 10.326812744140625, -1.9203300476074219, 2.176473617553711, 19.667465209960938, 24.407302856445312, 16.92314910888672, 15.751800537109375, 23.907318115234375, -16.293964385986328, -11.535526275634766, 15.108734130859375, 13.622074127197266, 34.07843780517578, 11.46856689453125, 28.981842041015625, 29.33136749267578, -2.43646240234375, -3.75201416015625, 35.2946891784668, 32.86162185668945, 33.36876678466797, 22.971454620361328, 14.960485458374023, 13.060739517211914, 0.2134552001953125, 4.572589874267578, 5.60302734375, 15.634204864501953, 11.636697769165039, 1.4697494506835938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000391.npy"}
{"epoch": 0.5910808767951625, "step": 392, "batch_size": 64, "mean": 13.833769798278809, "std": 20.104711532592773, "min": -28.49072265625, "p10": -7.9133104324340815, "median": 11.095420837402344, "p90": 39.37306518554688, "max": 84.427734375, "pos_frac": 0.78125, "sample": [-19.293594360351562, 2.7915115356445312, 15.516029357910156, 8.374359130859375, 6.674064636230469, 1.3935394287109375, 40.96393966674805, -4.396965026855469, 26.7852783203125, 28.88372230529785, 4.6112213134765625, -18.20703125, 15.590408325195312, 28.087100982666016, 37.86857604980469, 16.361671447753906, -2.5973968505859375, 22.937667846679688, 31.569976806640625, 18.976234436035156, 13.925918579101562, 11.797439575195312, 9.077201843261719, -20.291109085083008, 31.487136840820312, 17.702438354492188, 40.17864227294922, 16.2796630859375, 21.911697387695312, -28.49072265625, 6.84637451171875, 38.045867919921875, 25.893569946289062, -1.578155517578125, -15.448509216308594, 16.48587417602539, 6.948249816894531, 13.953365325927734, 9.670894622802734, 29.78406524658203, 52.11054229736328, 48.08325958251953, -8.23151969909668, 56.230072021484375, 1.7219409942626953, 4.793340682983398, 0.8731403350830078, 1.5432968139648438, 39.941864013671875, 8.822975158691406, -7.1708221435546875, 84.427734375, -1.8647651672363281, -8.41375732421875, -7.168182373046875, 2.2532615661621094, 36.010650634765625, -1.5692138671875, 10.393402099609375, 28.901840209960938, 5.216005325317383, 15.258642196655273, 3.2030029296875, 12.924339294433594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000392.npy"}
{"epoch": 0.5925925925925926, "step": 393, "batch_size": 64, "mean": 13.065959930419922, "std": 14.185859680175781, "min": -10.907867431640625, "p10": -3.995933532714842, "median": 11.277515411376953, "p90": 34.626580810546876, "max": 41.14543914794922, "pos_frac": 0.796875, "sample": [12.6473388671875, -2.227649688720703, 29.193572998046875, 34.8482666015625, 37.07413864135742, 20.42731475830078, 6.315254211425781, 11.8465576171875, 4.800544738769531, 5.346057891845703, -0.31665611267089844, 3.2626113891601562, 38.07703399658203, 21.221778869628906, 18.556873321533203, -4.714099884033203, 3.34814453125, 20.659873962402344, -2.4103431701660156, 13.023433685302734, 1.0806808471679688, 7.662433624267578, 4.070953369140625, 41.02497100830078, 12.779136657714844, 32.05079650878906, -1.5918445587158203, 34.10931396484375, 23.87635040283203, 28.367603302001953, 10.708473205566406, -5.925621032714844, 34.92036437988281, 33.170654296875, 2.7291927337646484, 17.249122619628906, -6.641595840454102, 35.645713806152344, 3.9640121459960938, -1.0936145782470703, 12.803508758544922, 7.4990234375, 13.790725708007812, -4.675472259521484, 7.489084243774414, 16.902786254882812, 27.213428497314453, 6.286067962646484, 4.614543914794922, 32.9420166015625, 25.38270378112793, 41.14543914794922, 15.568391799926758, -10.046043395996094, 7.559593200683594, 23.697898864746094, 2.600189208984375, -0.11297607421875, -10.907867431640625, -9.144241333007812, 12.257099151611328, 26.7049617767334, 2.9444122314453125, 4.569023132324219], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000393.npy"}
{"epoch": 0.5941043083900227, "step": 394, "batch_size": 64, "mean": 9.990022659301758, "std": 13.375109672546387, "min": -19.564594268798828, "p10": -8.745657730102538, "median": 10.147100448608398, "p90": 24.667200660705568, "max": 45.084686279296875, "pos_frac": 0.796875, "sample": [2.2817230224609375, 34.524200439453125, 36.11469268798828, 3.1366043090820312, 1.9069404602050781, 9.491500854492188, -7.296730041503906, 7.5417327880859375, 24.98619842529297, 9.620019912719727, 29.80194091796875, 15.71019172668457, 2.418651580810547, -9.366626739501953, -12.926395416259766, 12.142045974731445, 3.3378982543945312, -11.534704208374023, 3.2101898193359375, 10.082210540771484, 13.98434066772461, -6.30712890625, 18.917984008789062, 20.52410888671875, 20.581972122192383, 45.084686279296875, 23.92287254333496, 20.20777130126953, 10.355600357055664, -0.658721923828125, -19.564594268798828, 3.6703643798828125, 37.82792663574219, 14.309738159179688, 10.027505874633789, 8.247756958007812, -1.1844730377197266, 3.7534332275390625, 20.567901611328125, 0.2654457092285156, 19.053306579589844, -10.2828369140625, -5.41522216796875, 13.947906494140625, 9.924844741821289, 35.78443145751953, -18.067092895507812, 18.248249053955078, -3.60296630859375, 9.179458618164062, -11.085681915283203, 16.450889587402344, 12.756614685058594, 13.713943481445312, 10.211990356445312, 8.045095443725586, 12.328451156616211, 16.403413772583008, 11.020547866821289, 8.949981689453125, 10.660636901855469, 22.355783462524414, 12.654632568359375, 16.408283233642578], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000394.npy"}
{"epoch": 0.5956160241874527, "step": 395, "batch_size": 64, "mean": 9.96342658996582, "std": 12.896591186523438, "min": -14.561698913574219, "p10": -4.861053276062011, "median": 7.128108978271484, "p90": 27.156201934814455, "max": 41.13576126098633, "pos_frac": 0.765625, "sample": [17.64678955078125, 22.151336669921875, 15.465057373046875, 1.8907623291015625, 25.65007781982422, 10.354934692382812, -2.5813465118408203, 21.47698211669922, 33.767112731933594, -3.1439170837402344, 4.001045227050781, 17.091552734375, 27.045700073242188, 30.71440887451172, 6.011749267578125, 41.13576126098633, -0.7249259948730469, 13.298017501831055, 0.92474365234375, 7.2369537353515625, -1.2008895874023438, 6.4791412353515625, 0.7818069458007812, 10.26296615600586, 15.58084487915039, -10.98231315612793, 14.149185180664062, 23.91356658935547, 8.656869888305664, -2.7193069458007812, 20.64568519592285, 36.84111022949219, 7.8604278564453125, 6.277442932128906, -8.829757690429688, 14.309158325195312, -6.423759460449219, 4.675495147705078, 27.20355987548828, -5.341165542602539, 2.385467529296875, 3.620990753173828, 37.414039611816406, 2.553447723388672, 17.189411163330078, 16.91057586669922, 5.309747695922852, 21.826705932617188, -3.7407913208007812, -1.1753101348876953, 4.136655807495117, 39.42247009277344, -7.54156494140625, 1.1898078918457031, 6.254859924316406, 21.355438232421875, -6.449916839599609, 7.019264221191406, 11.197643280029297, -14.561698913574219, 7.576822280883789, -0.8564910888671875, 5.238990783691406, 9.829902648925781], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000395.npy"}
{"epoch": 0.5971277399848829, "step": 396, "batch_size": 64, "mean": 11.890352249145508, "std": 15.09028148651123, "min": -18.086624145507812, "p10": -4.816613388061522, "median": 9.13779067993164, "p90": 32.97190856933594, "max": 53.1544189453125, "pos_frac": 0.78125, "sample": [39.916603088378906, 53.1544189453125, 28.24859619140625, 37.67253875732422, 32.66887664794922, 33.03478240966797, 11.657482147216797, 18.727357864379883, 6.481559753417969, 4.362466812133789, 9.018123626708984, 6.6147003173828125, 21.65699005126953, -10.40609359741211, 6.180961608886719, -0.7212677001953125, 37.18939208984375, 25.850521087646484, -0.5490913391113281, 20.009553909301758, 6.4303131103515625, -1.1986083984375, 39.01305389404297, 8.318634033203125, 8.891921997070312, 14.545257568359375, 12.618457794189453, -16.48015594482422, 16.954559326171875, 12.852596282958984, 8.319782257080078, -8.837085723876953, 2.2812366485595703, 39.90113830566406, 15.6837158203125, 18.011550903320312, 12.80160903930664, 8.5255126953125, 32.82520294189453, 26.617340087890625, -3.237140655517578, 12.954935073852539, -1.4357662200927734, 9.434814453125, 7.423789978027344, 5.243995666503906, 9.257457733154297, -0.9054183959960938, 17.868850708007812, -5.4935302734375, 5.814231872558594, -7.976451873779297, 11.838546752929688, 1.0704269409179688, -2.038990020751953, 26.122882843017578, 10.285356521606445, 21.301576614379883, 3.045896530151367, 3.274839401245117, -18.086624145507812, -16.765254974365234, 0.8986129760742188, 32.24098587036133], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000396.npy"}
{"epoch": 0.5986394557823129, "step": 397, "batch_size": 64, "mean": 5.964630126953125, "std": 16.86263656616211, "min": -60.16386795043945, "p10": -11.421383666992186, "median": 7.83856201171875, "p90": 23.860879516601564, "max": 43.50834274291992, "pos_frac": 0.703125, "sample": [0.1676769256591797, 3.3703994750976562, -17.855775833129883, 7.044197082519531, -3.766246795654297, 15.586105346679688, 17.93761444091797, 3.279266357421875, 10.286262512207031, 11.65435791015625, 4.836158752441406, 13.78542709350586, 9.598075866699219, 21.394058227539062, 15.638572692871094, 20.260887145996094, -26.557281494140625, -14.544464111328125, 5.755195617675781, -5.909833908081055, 18.841604232788086, -4.417205810546875, 10.421857833862305, 13.127288818359375, 8.439414978027344, -33.2208251953125, -0.85614013671875, 5.765911102294922, 29.786651611328125, 21.528350830078125, 7.237709045410156, 3.6842308044433594, 23.896827697753906, 13.580276489257812, 31.492996215820312, 10.60394287109375, 10.780258178710938, 27.074050903320312, 22.27828025817871, 2.934305191040039, -2.6107635498046875, 23.777000427246094, 25.069931030273438, 8.898635864257812, -60.16386795043945, 10.531181335449219, 31.6646728515625, -9.71990966796875, -11.917617797851562, -3.8145904541015625, 9.87203598022461, 6.1173095703125, 2.1117782592773438, 4.663415908813477, -0.3153419494628906, -4.394035339355469, 17.074588775634766, 9.417709350585938, -6.342010498046875, 19.457138061523438, -33.44255447387695, -10.263504028320312, -2.3836822509765625, 43.50834274291992], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000397.npy"}
{"epoch": 0.600151171579743, "step": 398, "batch_size": 64, "mean": 12.49331283569336, "std": 15.367023468017578, "min": -13.46170425415039, "p10": -5.095413208007812, "median": 9.489025115966797, "p90": 36.99064483642578, "max": 58.393157958984375, "pos_frac": 0.796875, "sample": [15.832038879394531, 10.661266326904297, 16.036128997802734, 4.6263885498046875, 26.553131103515625, 2.4733314514160156, 40.696929931640625, 42.18817138671875, -7.099266052246094, 41.81342315673828, 18.948806762695312, 9.541389465332031, -0.9613189697265625, -3.4917068481445312, 13.797286987304688, -4.0837554931640625, 24.308792114257812, -8.204521179199219, 10.31024169921875, -13.46170425415039, 21.384185791015625, 4.534177780151367, 36.70698547363281, 40.60874938964844, -1.7518997192382812, 34.49610900878906, 32.62220764160156, 17.576457977294922, 11.326425552368164, 7.705848693847656, -10.893253326416016, 7.5996856689453125, 1.6878509521484375, 16.86510467529297, 10.103286743164062, -5.945501327514648, 2.717212677001953, 25.588775634765625, 4.276145935058594, 9.436660766601562, -4.405517578125, -5.391082763671875, 20.276519775390625, 8.532844543457031, 19.671375274658203, 19.594573974609375, 4.167974472045898, 0.0280609130859375, 4.2141265869140625, 20.40849494934082, 6.1776885986328125, 58.393157958984375, 13.461097717285156, 2.3752975463867188, 16.69927215576172, -0.3753509521484375, 37.112213134765625, -9.246627807617188, 8.769157409667969, 0.4622783660888672, 8.763671875, 21.69594383239746, 39.235084533691406, 1.821493148803711], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000398.npy"}
{"epoch": 0.6016628873771731, "step": 399, "batch_size": 64, "mean": 14.145965576171875, "std": 17.904497146606445, "min": -14.681175231933594, "p10": -2.0031188964843745, "median": 9.798608779907227, "p90": 37.221722793579104, "max": 92.498046875, "pos_frac": 0.78125, "sample": [8.305130004882812, 7.175193786621094, 14.19915771484375, 24.17822265625, -1.3883438110351562, 26.9293212890625, 21.128555297851562, 8.143993377685547, -3.520893096923828, -2.2665939331054688, -8.378547668457031, 38.89183044433594, 11.599130630493164, -10.321338653564453, 6.5946807861328125, 7.853546142578125, 10.42437744140625, 18.032623291015625, 7.981388092041016, 41.931793212890625, -13.213577270507812, 2.55377197265625, -14.681175231933594, 20.367958068847656, 52.897735595703125, 7.272491455078125, 9.97073745727539, 92.498046875, -0.9563751220703125, 4.653863906860352, 6.002845764160156, 0.8192596435546875, 6.7510986328125, 3.5705718994140625, 7.3373870849609375, -5.841575622558594, 9.919349670410156, 30.375564575195312, 9.664939880371094, 28.639755249023438, 24.305007934570312, -1.1847457885742188, 31.2406005859375, 5.280845642089844, 31.63654327392578, 14.080673217773438, 11.754493713378906, 9.677867889404297, -1.1384124755859375, 2.2088546752929688, 33.779319763183594, 37.914207458496094, 35.648258209228516, 12.991859436035156, 23.0445499420166, 26.584701538085938, 14.954879760742188, 47.75492858886719, -0.9921112060546875, -0.7703361511230469, 37.89606475830078, -1.2480354309082031, 11.668449401855469, 12.157356262207031], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000399.npy"}
{"epoch": 0.6031746031746031, "step": 400, "batch_size": 64, "mean": 8.411383628845215, "std": 15.084396362304688, "min": -21.63813018798828, "p10": -6.690164184570311, "median": 3.9702835083007812, "p90": 31.525629043579105, "max": 47.77831268310547, "pos_frac": 0.734375, "sample": [3.2131004333496094, 12.085166931152344, 2.5192031860351562, 3.0753173828125, 22.582290649414062, 31.98904037475586, 2.9330062866210938, 17.376611709594727, 19.00885009765625, 36.58515167236328, 4.180408477783203, -11.042703628540039, 4.690036773681641, 3.644338607788086, -0.019687652587890625, 1.294830322265625, 0.26080322265625, 30.4443359375, 11.808021545410156, 26.553504943847656, -3.631103515625, 0.8541297912597656, 13.240219116210938, -2.3101673126220703, -7.12310791015625, 21.22088050842285, 0.7908153533935547, 13.779159545898438, 3.7601585388183594, -2.29913330078125, -18.636459350585938, -2.721792221069336, 15.25799560546875, 9.273002624511719, 47.77831268310547, 3.1078834533691406, -14.359001159667969, -2.16424560546875, 25.205718994140625, 18.389240264892578, 33.42729187011719, 23.72345733642578, -21.63813018798828, 15.22125244140625, 3.6480484008789062, 9.31573486328125, 0.9921417236328125, -4.39300537109375, 8.137123107910156, 45.626495361328125, 5.339271545410156, 9.247749328613281, -5.679962158203125, 6.587501525878906, -14.671607971191406, 38.26612854003906, 37.40495681762695, 1.56207275390625, 5.708808898925781, 2.37628173828125, -14.843772888183594, -2.503520965576172, -0.7741317749023438, 13.654190063476562], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000400.npy"}
{"epoch": 0.6046863189720333, "step": 401, "batch_size": 64, "mean": 14.348001480102539, "std": 15.80478572845459, "min": -13.358474731445312, "p10": -2.6179342269897456, "median": 12.466752052307129, "p90": 36.148152923583986, "max": 54.47932434082031, "pos_frac": 0.8125, "sample": [-0.7584915161132812, 20.602401733398438, 23.972957611083984, 14.611778259277344, -2.817842483520508, 0.9317703247070312, 9.0634765625, 14.168113708496094, 0.5905799865722656, -6.5105743408203125, 36.399559020996094, 0.037841796875, -11.43838119506836, -2.1514816284179688, 10.315467834472656, -1.601318359375, 33.290802001953125, 30.270721435546875, 32.26320266723633, 42.86067199707031, 9.808639526367188, 10.614875793457031, 2.89794921875, 19.939342498779297, -13.358474731445312, -7.171302795410156, 11.905502319335938, 16.305770874023438, 9.299674987792969, 13.495552062988281, 23.324073791503906, 12.778053283691406, 10.60136604309082, 30.54462432861328, 0.30611419677734375, 35.56153869628906, 0.46898460388183594, 14.358757019042969, -0.05973625183105469, 20.76605224609375, -0.317108154296875, 44.253944396972656, 24.952896118164062, -6.882081985473633, 25.656173706054688, 3.9184303283691406, 5.149993896484375, 46.355072021484375, 31.075836181640625, 29.036840438842773, 12.250221252441406, 3.9354324340820312, 8.675220489501953, 5.5861968994140625, -6.689422607421875, 13.024511337280273, 19.302242279052734, 12.683282852172852, 42.37275695800781, 47.481895446777344, 12.981943130493164, 0.7450675964355469, 21.754825592041016, 54.47932434082031], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000401.npy"}
{"epoch": 0.6061980347694633, "step": 402, "batch_size": 64, "mean": 10.721388816833496, "std": 14.891109466552734, "min": -19.89063262939453, "p10": -7.657922172546386, "median": 9.842235565185547, "p90": 30.89510955810547, "max": 42.656715393066406, "pos_frac": 0.75, "sample": [26.751970291137695, 30.794044494628906, 17.538524627685547, 25.069215774536133, 14.920913696289062, 17.267602920532227, 40.84864807128906, -0.431121826171875, 10.095794677734375, 30.93842315673828, -0.3546295166015625, 37.201011657714844, 10.766580581665039, -19.89063262939453, 19.00531005859375, 5.896335601806641, 1.4464797973632812, 11.575590133666992, 5.323699951171875, 9.260055541992188, 33.64408874511719, 0.5847129821777344, 4.584800720214844, -15.543182373046875, 33.09812927246094, -0.6502189636230469, 9.588676452636719, 19.672073364257812, 12.406665802001953, 6.729560852050781, 38.21989440917969, 42.656715393066406, 4.4741668701171875, 4.917304992675781, 5.235679626464844, 6.511293411254883, 28.557655334472656, 30.544963836669922, 13.418998718261719, -3.6797752380371094, -7.06334114074707, -12.937248229980469, -18.259521484375, 21.878442764282227, 14.867462158203125, 2.9209423065185547, 10.259622573852539, -1.0191802978515625, 3.1259994506835938, -0.7959938049316406, 19.538488388061523, 14.777999877929688, 21.62725830078125, 26.254562377929688, -1.723876953125, 0.1432781219482422, -1.3296585083007812, -11.842369079589844, -15.787155151367188, 17.95490264892578, 10.845916748046875, 23.634132385253906, -7.912742614746094, 8.014991760253906], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000402.npy"}
{"epoch": 0.6077097505668935, "step": 403, "batch_size": 64, "mean": 14.987966537475586, "std": 17.21675682067871, "min": -23.769107818603516, "p10": -6.111487960815428, "median": 13.233299255371094, "p90": 37.065213775634774, "max": 61.73895263671875, "pos_frac": 0.8125, "sample": [-2.362152099609375, 20.1710205078125, 35.853485107421875, 16.682769775390625, 33.367706298828125, 5.3881378173828125, 55.46699523925781, 16.46469497680664, 6.061897277832031, 15.201766967773438, 26.5371036529541, -4.211883544921875, 37.58452606201172, 1.6055412292480469, 42.406028747558594, 8.990997314453125, 8.06617546081543, 1.3789749145507812, 10.256568908691406, 7.750373840332031, -9.364645004272461, -8.743045806884766, 61.73895263671875, 43.410118103027344, -0.3756103515625, 24.57962417602539, 30.96021270751953, 2.8484859466552734, 29.005115509033203, 13.762397766113281, 2.367717742919922, 9.890419006347656, 12.704200744628906, 10.384761810302734, 13.881532669067383, -6.535327911376953, 5.797739028930664, 46.25004577636719, 10.481468200683594, 17.67156982421875, 6.98773193359375, -0.09087181091308594, 39.38307571411133, 10.29470443725586, 18.501129150390625, -5.122528076171875, 18.939937591552734, 21.614303588867188, 24.357738494873047, 9.5528564453125, 25.145050048828125, 33.34294509887695, -23.769107818603516, -9.025680541992188, 34.69000244140625, -17.291168212890625, 34.81720733642578, 18.94647979736328, 1.9013481140136719, 29.97881317138672, -7.5245361328125, 3.311920166015625, 18.58161163330078, 18.33045768737793], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000403.npy"}
{"epoch": 0.6092214663643235, "step": 404, "batch_size": 64, "mean": 12.050149917602539, "std": 15.857672691345215, "min": -24.05799102783203, "p10": -6.613456726074219, "median": 11.869741439819336, "p90": 32.54161834716797, "max": 49.188987731933594, "pos_frac": 0.75, "sample": [5.2265472412109375, 4.663215637207031, 18.403182983398438, 7.7696075439453125, 11.504518508911133, -2.811248779296875, 26.219154357910156, 3.8788375854492188, -5.282268524169922, -1.46319580078125, 32.29066467285156, -1.77935791015625, -18.86233139038086, -18.89434242248535, -0.5136318206787109, 18.521881103515625, -7.385700225830078, -9.141124725341797, 32.649169921875, 35.2784423828125, 19.40237045288086, 29.080039978027344, 18.811676025390625, 49.188987731933594, -6.636962890625, 26.68244171142578, -5.873970031738281, 23.73712158203125, 15.488872528076172, 24.915363311767578, 3.5398101806640625, 24.42971420288086, -2.243825912475586, 27.625041961669922, -2.053863525390625, 2.1355056762695312, 2.6519012451171875, 22.901649475097656, 9.893417358398438, 3.103374481201172, 5.9818878173828125, 7.5071258544921875, 19.17083740234375, 11.563339233398438, 37.01782989501953, 17.528766632080078, 15.076416015625, 31.4361572265625, 42.02103042602539, 12.176143646240234, 2.0730819702148438, 45.636932373046875, 36.189697265625, 12.342742919921875, 15.56619644165039, 18.27533721923828, 23.071754455566406, -6.5586090087890625, -10.150314331054688, -24.05799102783203, 12.63214111328125, 15.309215545654297, 7.404151916503906, 6.945030212402344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000404.npy"}
{"epoch": 0.6107331821617535, "step": 405, "batch_size": 64, "mean": 10.784950256347656, "std": 15.158881187438965, "min": -21.749420166015625, "p10": -6.3014032363891594, "median": 6.893874168395996, "p90": 32.99609031677247, "max": 49.307838439941406, "pos_frac": 0.734375, "sample": [29.791717529296875, -2.48779296875, -6.553741455078125, 41.48748779296875, 17.77251434326172, 3.08758544921875, 30.809547424316406, -0.4932098388671875, 0.8098564147949219, -21.749420166015625, -1.0053787231445312, 6.605123519897461, 1.0521068572998047, -5.712614059448242, 15.230499267578125, 5.169986724853516, 38.52394104003906, 0.1383533477783203, -3.3038711547851562, 22.915679931640625, 33.79719161987305, 29.4429931640625, 8.207881927490234, 21.534332275390625, -3.4307193756103516, 6.441986083984375, 15.594818115234375, 35.587989807128906, -8.539485931396484, 15.17558479309082, 3.812946319580078, 2.0990943908691406, 15.850173950195312, 15.968460083007812, -4.675407409667969, 22.485092163085938, 6.01446533203125, -13.887216567993164, 11.847820281982422, 27.333602905273438, 7.182624816894531, -7.756999969482422, 6.456184387207031, 5.753129959106445, 3.4113388061523438, 3.5807571411132812, 4.1731414794921875, 16.768951416015625, 36.05535888671875, 23.141971588134766, -9.17642593383789, 7.438720703125, 17.441078186035156, 7.3421173095703125, 38.67108917236328, 49.307838439941406, -4.2959442138671875, 13.967479705810547, 21.083267211914062, -7.9287872314453125, -1.3127975463867188, 31.126853942871094, 18.418739318847656, -3.3628082275390625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000405.npy"}
{"epoch": 0.6122448979591837, "step": 406, "batch_size": 64, "mean": 12.849464416503906, "std": 16.11313819885254, "min": -28.793825149536133, "p10": -5.278155517578124, "median": 10.882343292236328, "p90": 36.62033462524414, "max": 44.063209533691406, "pos_frac": 0.765625, "sample": [25.416641235351562, 3.699951171875, -4.206781387329102, -7.709495544433594, 13.323532104492188, 24.820068359375, 36.465003967285156, 40.154052734375, -9.369813919067383, 40.28260803222656, 18.672021865844727, 19.308982849121094, 4.909839630126953, -1.360626220703125, 24.38951873779297, 23.500823974609375, 33.8067626953125, 8.610116958618164, 10.172691345214844, 42.48664474487305, 20.581527709960938, 10.926521301269531, 23.381919860839844, 23.788772583007812, 10.836219787597656, 20.28521728515625, -3.9900894165039062, 14.276313781738281, 0.8744125366210938, 4.977210998535156, -1.6026535034179688, -0.6378288269042969, -19.519683837890625, -2.6276779174804688, 15.97607421875, 19.207313537597656, 6.290849685668945, 38.6397705078125, -9.247760772705078, 40.261619567871094, 14.4329833984375, 2.3245391845703125, 7.681373596191406, 44.063209533691406, 4.0457000732421875, 33.41874313354492, -13.499549865722656, 3.974893569946289, 25.60480499267578, 36.68690490722656, 15.56962776184082, -4.5944976806640625, 20.515243530273438, -5.5711517333984375, 4.306596755981445, -28.793825149536133, 7.709228515625, 17.715667724609375, -3.209930419921875, 3.0776138305664062, 27.115615844726562, 9.8653564453125, 29.03778076171875, 10.838165283203125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000406.npy"}
{"epoch": 0.6137566137566137, "step": 407, "batch_size": 64, "mean": 9.644933700561523, "std": 15.695564270019531, "min": -38.8583984375, "p10": -6.142187118530273, "median": 7.381135940551758, "p90": 29.471240615844728, "max": 62.517921447753906, "pos_frac": 0.71875, "sample": [1.9200286865234375, 1.8058795928955078, -0.9289398193359375, 23.55572509765625, -2.26434326171875, 26.45046615600586, 30.779647827148438, -0.942535400390625, -22.014362335205078, 20.66053581237793, 6.48919677734375, 6.7994232177734375, 0.236328125, -5.496910095214844, -8.070693969726562, 2.6954803466796875, 11.330314636230469, 20.13115692138672, 35.57840347290039, 30.775257110595703, 7.76483154296875, 2.0641231536865234, 17.78089141845703, 10.491828918457031, 12.73223876953125, 11.796356201171875, 3.021697998046875, -5.834236145019531, 7.8937530517578125, 62.517921447753906, 14.740455627441406, 1.6707000732421875, 26.6370849609375, 23.036842346191406, 26.856048583984375, 3.7782211303710938, -4.01513671875, 34.284637451171875, -7.84904670715332, 8.31243896484375, -4.448822021484375, 29.2364501953125, 19.61829376220703, 10.509017944335938, 21.026153564453125, 34.81880187988281, 18.720109939575195, 1.75421142578125, 23.24774169921875, -1.3576431274414062, 5.8317413330078125, -38.8583984375, -9.17071533203125, 3.1515350341796875, -6.274166107177734, 6.997440338134766, -0.9195041656494141, 29.57186508178711, -7.44856071472168, -1.7879409790039062, -1.4410934448242188, 14.8927001953125, 16.82135772705078, 15.613494873046875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000407.npy"}
{"epoch": 0.6152683295540439, "step": 408, "batch_size": 64, "mean": 14.132817268371582, "std": 15.26633358001709, "min": -15.222814559936523, "p10": -3.8397251129150383, "median": 11.799190521240234, "p90": 34.11887092590332, "max": 50.88702392578125, "pos_frac": 0.8125, "sample": [5.037443161010742, 34.2929573059082, 26.472618103027344, 18.527606964111328, 0.040111541748046875, -15.222814559936523, -12.543975830078125, 2.2511024475097656, 20.112144470214844, 8.696805953979492, 9.628734588623047, 32.26408386230469, -0.48871421813964844, 2.53179931640625, 37.038818359375, 22.943893432617188, 7.472629547119141, -4.194923400878906, 21.17109489440918, 7.611625671386719, 13.171775817871094, -1.603973388671875, -2.140871047973633, 21.017961502075195, 42.33177185058594, -11.287765502929688, 8.231803894042969, 46.90791320800781, 26.152667999267578, 22.395143508911133, 37.606868743896484, -3.0109291076660156, 22.90965461730957, 1.8163833618164062, 20.89038848876953, 31.337257385253906, 9.72100830078125, -7.050424575805664, 50.88702392578125, 12.392730712890625, 6.059967041015625, 19.48950958251953, 33.679351806640625, 44.58977127075195, 21.108551025390625, 10.820615768432617, 25.751609802246094, -4.448963165283203, 5.435691833496094, 5.98602294921875, 22.94708251953125, 21.729106903076172, 33.712669372558594, -11.301929473876953, 12.1962890625, 17.503829956054688, 6.830959320068359, -2.8426246643066406, 13.22189712524414, 6.5874786376953125, 11.402091979980469, 7.480152130126953, 4.536903381347656, 25.704824447631836], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000408.npy"}
{"epoch": 0.6167800453514739, "step": 409, "batch_size": 64, "mean": 8.090204238891602, "std": 15.668961524963379, "min": -24.50537109375, "p10": -9.239849281311034, "median": 5.025417327880859, "p90": 31.995300292968754, "max": 49.86456298828125, "pos_frac": 0.6875, "sample": [0.6339912414550781, 0.9758987426757812, -5.880546569824219, -0.01320648193359375, 2.1698646545410156, -10.366531372070312, 13.056795120239258, -4.918628692626953, 4.968715667724609, -9.4801025390625, -1.2211952209472656, 34.44508361816406, 0.8108463287353516, 49.86456298828125, 15.56655502319336, 2.83953857421875, 24.497955322265625, -4.152345657348633, -7.042484283447266, 32.44164276123047, 30.953834533691406, -7.384189605712891, 21.161102294921875, 21.39521026611328, 7.607185363769531, -7.5947113037109375, 3.7484512329101562, 14.594802856445312, 14.725397109985352, -11.080902099609375, -5.155418395996094, 7.4530487060546875, 21.704856872558594, -9.619945526123047, 18.98479461669922, -0.6355743408203125, 26.968894958496094, 32.649688720703125, 10.25551986694336, 5.887393951416016, -6.7641143798828125, -7.312721252441406, 2.8625411987304688, -13.460500717163086, 10.057846069335938, 6.120246887207031, 3.5714950561523438, 6.163150787353516, -24.50537109375, 35.129974365234375, 2.3568878173828125, 24.041622161865234, 0.6142387390136719, 39.304359436035156, -22.596891403198242, 4.353546142578125, 19.071319580078125, 5.082118988037109, 17.301170349121094, 25.830947875976562, 37.85169982910156, -8.679258346557617, 19.76788330078125, 5.795036315917969], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000409.npy"}
{"epoch": 0.618291761148904, "step": 410, "batch_size": 64, "mean": 10.480886459350586, "std": 12.721647262573242, "min": -11.27947998046875, "p10": -5.874905395507812, "median": 9.06756591796875, "p90": 25.54590873718262, "max": 49.692108154296875, "pos_frac": 0.796875, "sample": [6.707805633544922, 7.11680793762207, 8.164947509765625, -2.177522659301758, 40.032806396484375, -11.27947998046875, 19.250736236572266, 11.91335678100586, -6.221099853515625, -10.478294372558594, 18.152904510498047, 13.722782135009766, 0.7690696716308594, -6.714694976806641, 19.70196533203125, 19.72673225402832, 24.190547943115234, -5.881263732910156, 19.557764053344727, 32.58826446533203, 37.62577819824219, 9.487712860107422, -3.1534194946289062, 19.607646942138672, 3.331787109375, 12.414985656738281, -7.44415283203125, 11.972297668457031, 11.555381774902344, -3.5993270874023438, 19.40064239501953, -1.5091934204101562, 5.626424789428711, 26.12677764892578, 4.055793762207031, 37.12639617919922, 16.238143920898438, 1.4424018859863281, 1.6165523529052734, 18.80938720703125, 2.5879592895507812, -7.7918853759765625, 11.225133895874023, 5.7239227294921875, 9.134132385253906, 17.477088928222656, 20.898059844970703, -0.14245223999023438, 5.664226531982422, 11.262069702148438, 7.055755615234375, 8.8277587890625, 49.692108154296875, 17.731582641601562, 3.3397750854492188, 9.000999450683594, 5.728534698486328, 1.3273677825927734, 3.4518203735351562, 21.941715240478516, 28.900970458984375, 13.452865600585938, -5.860069274902344, 10.571144104003906], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000410.npy"}
{"epoch": 0.6198034769463341, "step": 411, "batch_size": 64, "mean": 11.987203598022461, "std": 19.1330509185791, "min": -27.536354064941406, "p10": -10.154494285583496, "median": 12.138092041015625, "p90": 37.88242454528809, "max": 65.95761108398438, "pos_frac": 0.65625, "sample": [-7.368186950683594, 30.12808609008789, 40.59687805175781, 38.84012985229492, 12.19537353515625, -0.9005966186523438, -22.301681518554688, -27.536354064941406, 13.308189392089844, -0.14007568359375, 18.95844268798828, 35.19743347167969, 18.948577880859375, 12.64339828491211, 37.62007141113281, 7.626945495605469, 20.900745391845703, -6.982551574707031, 20.954200744628906, 10.576118469238281, 18.228919982910156, 31.07605743408203, 28.775894165039062, 9.929712295532227, -6.966583251953125, 13.603073120117188, 1.3716144561767578, 26.185108184814453, -2.843128204345703, -22.144363403320312, -10.2432861328125, -9.94731330871582, -3.1624526977539062, 20.84222412109375, 11.618793487548828, 16.843486785888672, -0.9984760284423828, 46.5042724609375, -2.58673095703125, 65.95761108398438, 16.860504150390625, 12.080810546875, -7.4898529052734375, 29.037683486938477, 30.831092834472656, 40.096290588378906, -15.754112243652344, 14.968070983886719, -1.600067138671875, -10.876815795898438, -6.0198822021484375, -2.8598785400390625, 37.9948616027832, 29.715362548828125, 10.065057754516602, 22.405302047729492, 13.922365188598633, 7.689950942993164, 52.544803619384766, 10.266441345214844, -3.25213623046875, 0.5054988861083984, -15.11187744140625, 15.851974487304688], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000411.npy"}
{"epoch": 0.6213151927437641, "step": 412, "batch_size": 64, "mean": 11.49658203125, "std": 15.10114574432373, "min": -9.475967407226562, "p10": -5.4015647888183596, "median": 9.063613891601562, "p90": 30.545468711853026, "max": 50.94059753417969, "pos_frac": 0.71875, "sample": [-8.244194030761719, 32.421051025390625, 9.172233581542969, 50.5350341796875, 10.286243438720703, 17.23000717163086, 14.605575561523438, 18.09671974182129, -6.032924652099609, 2.7171249389648438, 12.192319869995117, 1.335296630859375, 13.629714965820312, 49.07273864746094, -2.0211734771728516, -2.472412109375, 21.36614418029785, -9.475967407226562, 0.13344573974609375, 1.872222900390625, 27.193389892578125, 15.762863159179688, 4.688575744628906, 4.339164733886719, 3.7593841552734375, -4.9344329833984375, 15.231552124023438, 8.729557037353516, 20.35177993774414, 5.043701171875, 5.0836181640625, 29.190750122070312, 8.064388275146484, 50.94059753417969, -5.414054870605469, -2.1175079345703125, 12.52398681640625, -1.8761825561523438, 20.381269454956055, -3.1940841674804688, -5.106386184692383, -7.2337188720703125, 24.534698486328125, -6.607677459716797, 4.81591796875, 24.32781982421875, 23.787521362304688, 25.29853057861328, 46.73095703125, -2.284435272216797, 9.346000671386719, 18.591123580932617, 30.556255340576172, -5.3724212646484375, 8.954994201660156, 24.33453369140625, 9.351964950561523, 7.518001556396484, -6.766513824462891, 10.930099487304688, 30.520299911499023, -0.23802757263183594, 31.410804748535156, -1.7866058349609375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000412.npy"}
{"epoch": 0.6228269085411943, "step": 413, "batch_size": 64, "mean": 11.076757431030273, "std": 16.961618423461914, "min": -23.9927978515625, "p10": -9.485821723937985, "median": 9.792779922485352, "p90": 30.850008392333994, "max": 60.22312927246094, "pos_frac": 0.78125, "sample": [1.3671035766601562, -11.194107055664062, 21.703262329101562, 8.868684768676758, 0.30666351318359375, 1.9647293090820312, 9.423820495605469, 45.01205825805664, -12.65492057800293, 12.877182006835938, -23.9927978515625, 18.68854522705078, 14.308082580566406, 6.365333557128906, -5.73956298828125, 7.3101654052734375, -3.546295166015625, 41.470428466796875, 0.18177032470703125, 11.370981216430664, 53.165679931640625, 11.322105407714844, 11.756324768066406, 4.752506256103516, 19.62176513671875, 8.141744613647461, 60.22312927246094, 33.23175048828125, 3.798023223876953, 12.029495239257812, 12.24566650390625, 14.389169692993164, 16.259307861328125, 14.666717529296875, 43.632484436035156, -10.85430908203125, 13.717857360839844, 6.7816314697265625, 19.55951690673828, 3.0399703979492188, -19.96196746826172, 22.89783477783203, 18.084609985351562, -0.5465774536132812, 19.4091796875, -6.292684555053711, -23.657546997070312, 3.2009201049804688, 8.60488510131836, 4.266262054443359, 2.42047119140625, -16.348114013671875, 27.1553955078125, -1.780984878540039, -1.5222549438476562, 28.18047332763672, 6.990734100341797, 28.173248291015625, 10.161739349365234, 23.989486694335938, -4.5932159423828125, 26.073898315429688, 31.994094848632812, 26.440967559814453], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000413.npy"}
{"epoch": 0.6243386243386243, "step": 414, "batch_size": 64, "mean": 12.887630462646484, "std": 16.953462600708008, "min": -22.582046508789062, "p10": -5.581671524047851, "median": 12.043251991271973, "p90": 36.59991226196289, "max": 55.651466369628906, "pos_frac": 0.765625, "sample": [27.432748794555664, -8.354118347167969, 16.543243408203125, 22.902618408203125, -0.5449981689453125, 16.230005264282227, -6.064910888671875, 32.43941879272461, 9.018611907958984, 6.0344696044921875, 19.002365112304688, 14.621929168701172, 22.89447784423828, -10.64510726928711, 36.63697052001953, 23.006488800048828, -21.042007446289062, -17.19725799560547, 8.448806762695312, 14.470531463623047, 16.70616340637207, 8.476511001586914, 55.651466369628906, 5.605384826660156, 36.51344299316406, -2.78790283203125, 18.846233367919922, 16.197715759277344, 11.742362976074219, 25.567237854003906, 42.581905364990234, -1.754791259765625, 22.74382209777832, 0.185546875, 30.869903564453125, 12.480043411254883, 26.155601501464844, 16.37702178955078, 14.9356689453125, 23.56240463256836, 8.332672119140625, -5.7424163818359375, 2.6729736328125, 0.3706951141357422, 30.86907958984375, -2.5954742431640625, 3.8311233520507812, 18.456436157226562, -22.582046508789062, 3.7806320190429688, 42.5069580078125, -1.2846603393554688, -5.206600189208984, 4.360103607177734, 41.92048263549805, 55.236907958984375, 3.7424240112304688, 0.890869140625, 10.114124298095703, 12.344141006469727, -1.498260498046875, -1.7957096099853516, 0.3396644592285156, 39.25419616699219], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000414.npy"}
{"epoch": 0.6258503401360545, "step": 415, "batch_size": 64, "mean": 11.723310470581055, "std": 18.46110725402832, "min": -31.868797302246094, "p10": -10.872764587402344, "median": 11.124650955200195, "p90": 31.189375686645516, "max": 64.3990478515625, "pos_frac": 0.765625, "sample": [-2.140819549560547, 32.00761795043945, 17.07274055480957, 22.80328369140625, 4.019184112548828, 12.426895141601562, 22.043914794921875, 43.91962432861328, -4.763641357421875, 29.28014373779297, 9.631370544433594, -14.545600891113281, 28.318077087402344, 28.183218002319336, -1.1328887939453125, 42.82695007324219, 59.1278076171875, 7.761440277099609, 22.893234252929688, -2.045063018798828, 1.78466796875, 13.853691101074219, 11.613555908203125, -0.4415740966796875, 15.473541259765625, 48.18134307861328, -17.497028350830078, 2.7959747314453125, 21.843528747558594, 24.28466796875, -11.125518798828125, 13.869255065917969, 12.606756210327148, -3.7185134887695312, -22.307403564453125, 64.3990478515625, 21.08275604248047, 28.72749137878418, 25.065773010253906, -10.283004760742188, 9.515464782714844, -21.541259765625, -12.112510681152344, -31.868797302246094, 12.66412353515625, 17.924041748046875, 1.7700691223144531, 5.139610290527344, 10.603874206542969, 3.0169239044189453, 0.3282661437988281, 10.860076904296875, 3.7672958374023438, 1.9480094909667969, 6.3577117919921875, 11.389225006103516, -0.6603031158447266, 17.23302459716797, 12.293838500976562, 0.11568450927734375, 4.394500732421875, 18.03113555908203, 42.19017791748047, 29.03516387939453], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000415.npy"}
{"epoch": 0.6273620559334845, "step": 416, "batch_size": 64, "mean": 9.448211669921875, "std": 17.125043869018555, "min": -24.16961669921875, "p10": -9.550820541381833, "median": 8.237954139709473, "p90": 30.79271183013917, "max": 55.044708251953125, "pos_frac": 0.734375, "sample": [2.915975570678711, -24.16961669921875, -3.632373809814453, 20.336868286132812, 3.7224044799804688, 33.24982452392578, -4.8912200927734375, -7.4537353515625, 28.32430648803711, 42.09461212158203, 7.8814697265625, 44.920074462890625, 55.044708251953125, 4.701999664306641, 2.1406288146972656, 8.558040618896484, 4.552768707275391, 11.364107131958008, -21.818531036376953, -16.91973876953125, 11.065328598022461, 16.303945541381836, 15.514490127563477, 19.32726287841797, -2.170930862426758, 22.104278564453125, 19.074798583984375, 54.17775344848633, 40.45465087890625, 14.3250732421875, 7.917867660522461, 15.151863098144531, 4.0783843994140625, -18.490205764770508, 9.997499465942383, -1.70477294921875, -10.339454650878906, -17.314638137817383, 13.308462142944336, 5.169471740722656, -7.710674285888672, 14.514518737792969, -1.2068023681640625, 0.00945281982421875, 5.360694885253906, -4.462810516357422, 28.985422134399414, -23.157470703125, 0.2677745819091797, 18.951507568359375, 8.678764343261719, 7.371294021606445, -4.6951751708984375, 22.307113647460938, -6.192340850830078, 7.012842178344727, 2.2534332275390625, 9.444717407226562, 13.068161010742188, 13.676422119140625, 23.799392700195312, 25.477218627929688, 31.567264556884766, 10.49110221862793], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000416.npy"}
{"epoch": 0.6288737717309146, "step": 417, "batch_size": 64, "mean": 11.858535766601562, "std": 16.980268478393555, "min": -33.60475158691406, "p10": -5.647272109985352, "median": 12.870564460754395, "p90": 32.51467781066895, "max": 59.510826110839844, "pos_frac": 0.765625, "sample": [8.761701583862305, -3.9595870971679688, -2.1814498901367188, -33.60475158691406, 29.556396484375, 23.782005310058594, -29.404983520507812, 33.27330780029297, 13.226959228515625, 13.417747497558594, 57.48030090332031, 6.3263092041015625, 16.348480224609375, 15.576482772827148, 14.197433471679688, -25.035995483398438, 35.53620147705078, 9.790756225585938, -0.25276947021484375, 23.253639221191406, 16.47802734375, 15.752227783203125, 10.890874862670898, 5.3840484619140625, 12.377090454101562, -3.7654647827148438, 26.479610443115234, 9.497596740722656, 6.5316162109375, 17.757484436035156, -0.29739952087402344, 39.730796813964844, 39.39168930053711, -3.910919189453125, 10.312591552734375, -6.853267669677734, 0.3576812744140625, 3.1959915161132812, 21.77820587158203, 14.070655822753906, 33.906463623046875, 6.698417663574219, 23.361278533935547, 13.227523803710938, -6.578071594238281, 18.322402954101562, 12.514169692993164, 14.237785339355469, -5.830806732177734, 13.705673217773438, 5.830650329589844, 16.978382110595703, 1.521087646484375, -14.103057861328125, 16.25564956665039, -5.219024658203125, 15.192073822021484, 24.625003814697266, 5.775581359863281, -1.9202194213867188, 26.67478370666504, 12.267875671386719, 30.74454116821289, 59.510826110839844], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000417.npy"}
{"epoch": 0.6303854875283447, "step": 418, "batch_size": 64, "mean": 8.067527770996094, "std": 18.240373611450195, "min": -34.87109375, "p10": -12.536349105834958, "median": 7.923885345458984, "p90": 28.53851661682129, "max": 75.92607116699219, "pos_frac": 0.71875, "sample": [13.678825378417969, 25.62146759033203, 17.650314331054688, 8.6973876953125, 2.804708480834961, 16.154296875, 4.463382720947266, 18.898109436035156, 45.329769134521484, -10.424308776855469, 29.112289428710938, 43.72405242919922, -4.987495422363281, 15.771591186523438, 7.9232177734375, -13.441509246826172, 14.941879272460938, 1.1182479858398438, 33.7939338684082, 0.5952224731445312, 0.3386726379394531, 75.92607116699219, 8.415786743164062, 5.582923889160156, 23.678848266601562, 2.5441055297851562, -6.8311004638671875, 9.109073638916016, 7.811000823974609, 7.924552917480469, -34.87109375, -22.489959716796875, 28.81694793701172, 10.998579025268555, 40.761436462402344, -3.4904098510742188, 2.3472938537597656, -5.5401153564453125, 9.906951904296875, 0.6838607788085938, -27.675994873046875, -20.419105529785156, -21.37544059753418, -0.39841461181640625, 10.144552230834961, -9.28216552734375, 10.144065856933594, 11.521615982055664, -2.8406314849853516, 3.0601882934570312, 5.480018615722656, -2.298004150390625, 23.777305603027344, 26.02214813232422, 17.19582748413086, -7.712005615234375, -16.427406311035156, 17.45539093017578, 11.916166305541992, 27.888843536376953, 12.399978637695312, -3.7526473999023438, 14.653778076171875, 3.7949066162109375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000418.npy"}
{"epoch": 0.6318972033257747, "step": 419, "batch_size": 64, "mean": 13.080909729003906, "std": 15.46658992767334, "min": -15.682472229003906, "p10": -4.688501358032226, "median": 10.90286636352539, "p90": 35.81113929748536, "max": 50.26713562011719, "pos_frac": 0.78125, "sample": [-0.5478973388671875, 37.10215759277344, 20.02172088623047, 7.172706604003906, -12.87835693359375, 7.709308624267578, 36.93370056152344, 11.031402587890625, 0.8316268920898438, 6.201316833496094, -14.765495300292969, 32.72304916381836, 19.978683471679688, 16.894149780273438, -4.145057678222656, 2.218475341796875, 19.867355346679688, 10.774330139160156, 30.99646759033203, -0.47829437255859375, 8.073524475097656, 15.974861145019531, -4.921405792236328, 19.55279541015625, 5.3981170654296875, 2.8919296264648438, 29.37725830078125, -10.545791625976562, 21.739974975585938, 29.135147094726562, -1.7593765258789062, -10.718910217285156, -7.933589935302734, 37.095558166503906, 46.25990295410156, 19.821102142333984, 20.561450958251953, 9.75794792175293, 7.626155853271484, 18.782089233398438, 13.132123947143555, 37.67655944824219, 5.246150970458984, -3.6744155883789062, 33.86068344116211, 0.279449462890625, 27.807144165039062, 10.535343170166016, -0.8209152221679688, 50.26713562011719, 32.73953628540039, 1.0002269744873047, 14.886512756347656, 5.500968933105469, 19.055498123168945, 2.4937496185302734, 16.412094116210938, 18.475818634033203, 36.64704895019531, -2.469970703125, 11.401300430297852, 10.102615356445312, -15.682472229003906, 28.495956420898438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000419.npy"}
{"epoch": 0.6334089191232048, "step": 420, "batch_size": 64, "mean": 8.08056640625, "std": 18.125041961669922, "min": -41.719573974609375, "p10": -11.230328369140624, "median": 6.710079193115234, "p90": 33.3236255645752, "max": 53.75457000732422, "pos_frac": 0.671875, "sample": [-0.47197723388671875, -15.433162689208984, 33.94646453857422, -2.4924087524414062, 38.743309020996094, -19.583812713623047, 12.59808349609375, 4.130123138427734, 6.620704650878906, -35.663761138916016, -1.58697509765625, 11.4161376953125, 4.41082763671875, 19.30420684814453, -2.4922943115234375, 21.665939331054688, 44.11784362792969, 26.18383026123047, 45.772037506103516, 26.227455139160156, 3.635801315307617, -7.061305999755859, 0.9646549224853516, 13.367801666259766, 8.938383102416992, 31.87033462524414, -12.422651290893555, -22.83368492126465, 5.648061752319336, 6.255100250244141, -11.835784912109375, 9.229423522949219, 1.8178482055664062, 3.372570037841797, -6.3857269287109375, -6.001806259155273, 9.299934387207031, -8.076519012451172, 18.978294372558594, 13.777580261230469, -3.0898284912109375, -41.719573974609375, -6.932647705078125, 9.807662963867188, 34.81449890136719, 10.720827102661133, 10.810531616210938, 35.5067138671875, 6.7994537353515625, 9.7386474609375, 28.126304626464844, 9.886749267578125, 15.301753997802734, -1.6789169311523438, 2.226360321044922, 19.35354232788086, 28.37939453125, -0.7842330932617188, 18.443050384521484, 21.481521606445312, 3.7565078735351562, 53.75457000732422, -9.817596435546875, -7.679912567138672], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000420.npy"}
{"epoch": 0.6349206349206349, "step": 421, "batch_size": 64, "mean": 9.154892921447754, "std": 19.20836639404297, "min": -41.81043243408203, "p10": -15.624019813537595, "median": 8.380971908569336, "p90": 32.58107833862306, "max": 65.13922119140625, "pos_frac": 0.703125, "sample": [-22.760177612304688, 22.04193115234375, 0.04358100891113281, 19.609737396240234, 18.964279174804688, 20.17340087890625, 21.86145782470703, 0.1934661865234375, -1.7650833129882812, 65.13922119140625, 45.51246643066406, -12.024162292480469, 33.40153503417969, 14.409133911132812, -5.811038970947266, 6.195308685302734, 6.5667877197265625, -3.8430557250976562, 26.545372009277344, 11.792083740234375, 30.66667938232422, 3.6552886962890625, 2.135988235473633, -1.9578628540039062, 13.594545364379883, -2.9107093811035156, 25.11498260498047, -41.81043243408203, -24.795852661132812, -7.304584503173828, 11.893793106079102, 45.884857177734375, 34.703033447265625, 0.4456157684326172, 7.944694519042969, 24.49199676513672, 40.97572326660156, 26.38465118408203, 2.323894500732422, 2.412311553955078, 34.15391540527344, -1.0931892395019531, 5.7744293212890625, -16.874557495117188, -20.93950653076172, 16.433467864990234, -5.1055908203125, 11.956703186035156, -20.875579833984375, -1.5765457153320312, 21.8135986328125, 30.09003448486328, 1.5567207336425781, 11.913978576660156, 1.928802490234375, -22.8123779296875, 13.970954895019531, 13.893854141235352, 10.010974884033203, -12.706098556518555, 22.964805603027344, 27.416664123535156, 8.817249298095703, -4.894405364990234], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000421.npy"}
{"epoch": 0.636432350718065, "step": 422, "batch_size": 64, "mean": 11.708234786987305, "std": 16.072280883789062, "min": -17.860000610351562, "p10": -6.376158905029296, "median": 10.925969123840332, "p90": 27.69589576721192, "max": 65.04898071289062, "pos_frac": 0.734375, "sample": [25.835662841796875, 51.89009094238281, -12.464960098266602, -0.5429515838623047, 18.81884765625, -4.9364471435546875, 21.530502319335938, -17.860000610351562, 8.420295715332031, 65.04898071289062, 12.134902954101562, 13.624441146850586, -4.1099700927734375, 19.659927368164062, 30.56366729736328, 5.393596649169922, 20.362838745117188, 28.281360626220703, 5.0696868896484375, 16.261444091796875, 16.54092788696289, -2.498443603515625, -6.748565673828125, 10.284992218017578, 8.502487182617188, 13.173095703125, 14.240394592285156, 23.249574661254883, 42.86763000488281, 12.525321960449219, 32.489044189453125, 20.62713623046875, 10.92348861694336, -7.034238815307617, 4.285865783691406, 9.485431671142578, 24.22681427001953, 3.1432552337646484, 26.329811096191406, -1.9040298461914062, -0.49161529541015625, -11.552745819091797, 21.666645050048828, 19.47793197631836, 17.9454345703125, -10.960441589355469, -5.507209777832031, 16.47937774658203, 0.590576171875, 17.319564819335938, -1.2610588073730469, 51.619903564453125, 6.577478408813477, -3.4806289672851562, 3.5717735290527344, 7.4709930419921875, 1.345327377319336, -16.651473999023438, 8.74539566040039, 22.753433227539062, 25.40777587890625, 10.928449630737305, -3.9533233642578125, 13.593591690063477], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000422.npy"}
{"epoch": 0.6379440665154951, "step": 423, "batch_size": 64, "mean": 12.502666473388672, "std": 17.169479370117188, "min": -22.27638053894043, "p10": -5.0023757934570305, "median": 9.361896514892578, "p90": 36.20961456298829, "max": 56.98857879638672, "pos_frac": 0.8125, "sample": [4.0301971435546875, -5.166648864746094, 21.798965454101562, -2.4889087677001953, 56.389705657958984, 0.4575786590576172, 2.9399147033691406, -20.788101196289062, 49.167327880859375, 3.64898681640625, -4.5068359375, 29.16948890686035, 9.340728759765625, -6.821708679199219, 4.783233642578125, 0.9941253662109375, -0.2237548828125, -15.52569580078125, 37.914093017578125, -9.228248596191406, 11.150623321533203, 11.092918395996094, -16.6199951171875, 1.1135883331298828, 7.237388610839844, 33.69218444824219, 31.45661163330078, 0.6071147918701172, -22.27638053894043, 5.92254638671875, 29.6273193359375, 38.382476806640625, 15.602813720703125, 15.138931274414062, 22.885011672973633, 19.915626525878906, 21.012466430664062, 22.017669677734375, 56.98857879638672, 30.572296142578125, 9.383064270019531, 6.501585006713867, 4.074989318847656, 15.35345458984375, 18.167808532714844, 30.655784606933594, 30.08114242553711, 7.934476852416992, 0.836273193359375, 14.561168670654297, 37.28851318359375, 12.862136840820312, 2.7028369903564453, 11.562925338745117, 15.722618103027344, 40.43836975097656, 7.140773773193359, 2.6375732421875, -4.619071960449219, 18.269882202148438, 23.977447509765625, -1.1290817260742188, 3.771503448486328, 0.5902385711669922], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000423.npy"}
{"epoch": 0.6394557823129252, "step": 424, "batch_size": 64, "mean": 13.203311920166016, "std": 16.543212890625, "min": -16.250404357910156, "p10": -4.294277191162109, "median": 8.037353515625, "p90": 36.907412719726565, "max": 49.85235595703125, "pos_frac": 0.8125, "sample": [-4.270111083984375, 33.170494079589844, 4.26568603515625, 28.11347007751465, -0.5983734130859375, 25.686384201049805, 49.85235595703125, 16.13665771484375, -11.996231079101562, -9.989105224609375, -4.304634094238281, 7.69117546081543, 2.4195003509521484, -15.317642211914062, 36.725929260253906, 7.4951171875, 23.29517364501953, -6.566352844238281, 10.477462768554688, 2.8190078735351562, -0.6330471038818359, 18.700450897216797, 4.4656982421875, 7.266090393066406, -3.54949951171875, 16.662395477294922, 40.1959228515625, 4.049285888671875, 48.855255126953125, 8.555398941040039, 20.325454711914062, 14.969970703125, 4.385536193847656, 6.608329772949219, 19.71344757080078, 1.529449462890625, 18.451583862304688, 47.046630859375, 19.356346130371094, 21.346328735351562, 4.5704345703125, 18.616649627685547, 8.159454345703125, 43.16124725341797, 10.101478576660156, 46.94733428955078, 30.902542114257812, 0.6239166259765625, -11.000297546386719, 0.4914836883544922, 31.914775848388672, 7.915252685546875, 14.305389404296875, 36.985191345214844, 1.9500808715820312, 4.231620788574219, 5.7168426513671875, -16.250404357910156, 35.824180603027344, 32.827571868896484, 1.4027023315429688, 18.18602752685547, -2.1715087890625, 6.1929473876953125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000424.npy"}
{"epoch": 0.6409674981103552, "step": 425, "batch_size": 64, "mean": 10.847334861755371, "std": 16.557527542114258, "min": -20.427207946777344, "p10": -8.985363578796386, "median": 7.341073989868164, "p90": 35.715833282470705, "max": 58.137939453125, "pos_frac": 0.75, "sample": [23.689960479736328, 24.361305236816406, 10.476982116699219, 4.881553649902344, 34.778770446777344, -2.3165740966796875, 37.048614501953125, 0.8449478149414062, 13.999561309814453, 7.512378692626953, 15.201202392578125, -9.331254959106445, 14.071083068847656, -0.2763519287109375, -13.047801971435547, 17.575454711914062, -5.4857330322265625, 40.525970458984375, 7.169769287109375, 5.239994049072266, -11.818283081054688, 12.692520141601562, 0.864227294921875, 6.734933853149414, 20.823455810546875, -1.2761077880859375, 5.10321044921875, 36.556427001953125, -0.5106048583984375, 50.09889221191406, 32.0518798828125, -2.3153648376464844, 13.249435424804688, 4.7810516357421875, 16.219436645507812, 6.699897766113281, 0.8908233642578125, 46.89473342895508, 14.043148040771484, 7.078857421875, 18.88471221923828, -20.427207946777344, 11.430038452148438, 11.075897216796875, 13.618511199951172, -12.771188735961914, 7.566131591796875, 6.477504730224609, 8.536056518554688, 3.7351608276367188, 26.232097625732422, 5.173229217529297, -11.64207649230957, 31.97967529296875, -6.167091369628906, 4.820415496826172, 27.53885269165039, 12.919143676757812, -8.17828369140625, 58.137939453125, -6.204860687255859, 6.924823760986328, 36.117431640625, -17.329879760742188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000425.npy"}
{"epoch": 0.6424792139077853, "step": 426, "batch_size": 64, "mean": 6.944094657897949, "std": 17.232027053833008, "min": -50.43446350097656, "p10": -9.721532630920409, "median": 5.6012372970581055, "p90": 28.903417968750006, "max": 56.01939010620117, "pos_frac": 0.65625, "sample": [3.1590499877929688, -1.9842605590820312, 15.757888793945312, 27.890396118164062, 11.863693237304688, 23.435134887695312, 2.8203887939453125, 56.01939010620117, 6.981437683105469, 10.902229309082031, -14.257560729980469, -7.2064361572265625, 13.856632232666016, -16.04450225830078, 7.753082275390625, 4.699485778808594, 15.7755126953125, -5.93853759765625, 7.861726760864258, -8.353996276855469, -19.423866271972656, 4.2207794189453125, 2.9589385986328125, 3.85821533203125, -5.661186218261719, -1.5407257080078125, 7.90837287902832, -50.43446350097656, 19.593914031982422, 13.535881042480469, 16.638671875, 20.285673141479492, 15.74337387084961, -4.940412521362305, -18.579696655273438, 14.653751373291016, -5.8656005859375, 4.302377700805664, 3.2053680419921875, 40.92559814453125, 39.70745086669922, 40.36933898925781, -15.210498809814453, -10.1402587890625, -6.386039733886719, 44.7977180480957, -5.7670440673828125, 2.0249385833740234, -1.1955947875976562, -5.2808074951171875, 6.502988815307617, 29.337570190429688, 4.169090270996094, 25.19970703125, 7.4217529296875, -4.926811218261719, 10.881298065185547, 13.262367248535156, 11.3626708984375, -2.225128173828125, 12.0867919921875, 10.751483917236328, 30.047847747802734, -8.744504928588867], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000426.npy"}
{"epoch": 0.6439909297052154, "step": 427, "batch_size": 64, "mean": 11.549413681030273, "std": 17.87139129638672, "min": -28.024999618530273, "p10": -8.213608932495116, "median": 8.769298553466797, "p90": 35.24592514038086, "max": 73.82235717773438, "pos_frac": 0.71875, "sample": [17.52585220336914, 5.572744369506836, -4.538608551025391, -17.420875549316406, -2.4894256591796875, 18.492965698242188, 37.69731903076172, -3.586345672607422, 19.125167846679688, -8.5648193359375, 29.553749084472656, -5.500133514404297, -16.52776527404785, -7.723106384277344, 17.284698486328125, 21.402984619140625, 14.800041198730469, 7.235992431640625, -0.7053451538085938, 4.453529357910156, -0.6519947052001953, 5.222530364990234, 27.791122436523438, 18.013092041015625, 10.891998291015625, 25.34605598449707, 7.86883544921875, 24.490631103515625, 4.461067199707031, 7.088554382324219, 37.66926574707031, 7.55828857421875, 22.561248779296875, 6.307487487792969, 10.111442565917969, 35.73059844970703, -0.47011566162109375, -28.024999618530273, 49.209327697753906, 33.066749572753906, 8.294021606445312, -0.16021728515625, 13.55108642578125, -5.0067138671875, -8.423824310302734, -13.39616584777832, 21.87470245361328, 27.438217163085938, 9.244575500488281, -17.271968841552734, 41.07356262207031, 34.115020751953125, 5.703338623046875, 10.160232543945312, 14.54638671875, 7.6752471923828125, 7.3360443115234375, 13.32010269165039, 9.382186889648438, 4.5471954345703125, 10.142257690429688, -5.638847351074219, 73.82235717773438, 46.503868103027344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000427.npy"}
{"epoch": 0.6455026455026455, "step": 428, "batch_size": 64, "mean": 13.410066604614258, "std": 18.418210983276367, "min": -26.226043701171875, "p10": -6.100252532958984, "median": 9.352357864379883, "p90": 40.09566078186036, "max": 65.85110473632812, "pos_frac": 0.765625, "sample": [-26.226043701171875, 28.04662322998047, 29.59136199951172, -2.1073341369628906, 1.0373878479003906, 6.169059753417969, 0.4733924865722656, -14.611312866210938, -17.12322235107422, 7.514884948730469, 42.28791809082031, 0.02317047119140625, 10.920875549316406, -6.3251495361328125, 30.643829345703125, 12.590179443359375, 36.130523681640625, 5.883415222167969, -5.052825927734375, 6.116279602050781, 21.133453369140625, 9.06026840209961, -2.1954727172851562, 12.624946594238281, 22.359962463378906, 13.432727813720703, 8.838607788085938, -15.764663696289062, 32.03831481933594, -16.118881225585938, 29.080718994140625, -5.575492858886719, 6.37443733215332, 8.712779998779297, 19.30347442626953, 46.98204803466797, 44.71068572998047, 41.83728790283203, 10.192176818847656, -3.289041519165039, 28.048690795898438, 65.85110473632812, 14.224506378173828, 32.95738983154297, 4.848480224609375, 0.5025634765625, 26.88593292236328, 9.644447326660156, 2.9683570861816406, -6.683197021484375, 4.316864013671875, 7.060455322265625, -1.5175437927246094, 1.4292793273925781, 38.01994705200195, 40.985252380371094, 41.42787551879883, 29.1357364654541, 23.892929077148438, 20.66318130493164, 30.46319580078125, 19.382478713989258, -1.6326675415039062, -4.352312088012695], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000428.npy"}
{"epoch": 0.6470143613000756, "step": 429, "batch_size": 64, "mean": 11.226228713989258, "std": 16.778263092041016, "min": -29.32508087158203, "p10": -9.63193130493164, "median": 11.70235824584961, "p90": 33.374018096923834, "max": 57.3868408203125, "pos_frac": 0.78125, "sample": [24.798904418945312, -8.229354858398438, 0.7034282684326172, 12.853858947753906, 24.616371154785156, 13.056608200073242, 13.464191436767578, 23.32221221923828, -7.610897064208984, 5.7155914306640625, 6.511650085449219, -17.148361206054688, 5.667327880859375, 12.523735046386719, 5.005393981933594, 12.556961059570312, -29.32508087158203, 9.927459716796875, 6.84814453125, 14.521631240844727, 17.495460510253906, 5.502727508544922, 13.90771484375, 30.55181121826172, 34.31194305419922, 35.446380615234375, 39.50117492675781, 4.852710723876953, 16.452232360839844, -18.036720275878906, -9.256179809570312, -16.68624496459961, -0.4301300048828125, -11.596626281738281, 28.77156639099121, 2.8828582763671875, -9.361724853515625, 23.381813049316406, -3.5073928833007812, 1.1199226379394531, 31.867233276367188, 34.21483612060547, 5.5993804931640625, 28.32440757751465, 34.01978302001953, -16.927841186523438, -9.747734069824219, 9.474750518798828, 2.9486541748046875, 16.623340606689453, 1.596527099609375, 43.59455871582031, 18.038864135742188, -7.188785552978516, 10.8809814453125, 13.072463989257812, 31.547786712646484, 9.222759246826172, 57.3868408203125, 22.8922119140625, 8.360969543457031, 17.390228271484375, 13.910530090332031, 26.2928466796875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000429.npy"}
{"epoch": 0.6485260770975056, "step": 430, "batch_size": 64, "mean": 13.269466400146484, "std": 16.117151260375977, "min": -26.943878173828125, "p10": -2.080908393859862, "median": 10.841221809387207, "p90": 32.27755813598633, "max": 55.41780090332031, "pos_frac": 0.84375, "sample": [3.491058349609375, 6.10951042175293, -3.4630203247070312, 8.718711853027344, -8.723648071289062, 29.423667907714844, 0.4205970764160156, 48.32991409301758, 6.234048843383789, 14.201896667480469, -10.819091796875, 28.63507080078125, 32.57572937011719, 5.959201812744141, 45.139713287353516, 11.868919372558594, 21.626449584960938, -0.2646636962890625, 40.40906524658203, 2.479555130004883, 4.4372100830078125, 55.41780090332031, 8.994613647460938, 6.9195556640625, -0.7651576995849609, 40.246986389160156, 11.063343048095703, 28.79143524169922, 28.441680908203125, 18.7879638671875, 10.619100570678711, 24.617767333984375, 13.463882446289062, 16.516189575195312, -2.540433883666992, 51.535518646240234, 31.581825256347656, 17.042823791503906, -13.414997100830078, 24.25188636779785, 12.717269897460938, 17.80315399169922, 5.700105667114258, 3.795337677001953, 10.608322143554688, 18.986621856689453, 0.5706634521484375, 0.7441692352294922, -26.943878173828125, 1.7661361694335938, 0.502288818359375, 17.447187423706055, 0.7747955322265625, 20.461402893066406, -1.0086822509765625, 25.96564483642578, 23.7132568359375, 17.92278480529785, 17.382598876953125, 18.884872436523438, 8.095565795898438, -16.409597396850586, 4.1906890869140625, 7.2134246826171875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000430.npy"}
{"epoch": 0.6500377928949358, "step": 431, "batch_size": 64, "mean": 11.012140274047852, "std": 16.39493751525879, "min": -26.756683349609375, "p10": -4.931252861022948, "median": 8.405977249145508, "p90": 34.0040512084961, "max": 58.003475189208984, "pos_frac": 0.765625, "sample": [-26.756683349609375, 16.968534469604492, 8.855880737304688, 4.770870208740234, 2.131296157836914, 27.010406494140625, 41.05596923828125, 13.517013549804688, 18.353424072265625, -2.3168468475341797, 0.5695018768310547, 51.20918655395508, 17.526756286621094, -1.960723876953125, 7.383964538574219, 12.364896774291992, 50.31908416748047, -3.4921131134033203, 27.96002197265625, 8.365524291992188, 8.422584533691406, -12.918861389160156, -1.2600784301757812, 34.94684600830078, -9.609146118164062, 42.8948974609375, 32.44812774658203, -7.040288925170898, 5.714637756347656, -2.555004119873047, 3.8301315307617188, 9.228780746459961, 18.373428344726562, 25.756988525390625, -5.548027038574219, 3.5324325561523438, 34.670875549316406, 6.987152099609375, -19.528793334960938, -6.092765808105469, 0.69793701171875, 4.757942199707031, -2.47381591796875, 17.526992797851562, 0.4304618835449219, 4.927104949951172, 17.18740463256836, 10.082931518554688, 10.647146224975586, 58.003475189208984, 2.7285614013671875, 13.874378204345703, -2.91644287109375, 14.583271026611328, 12.591751098632812, 8.38936996459961, 7.547275543212891, -0.4850883483886719, 8.784503936767578, 28.061058044433594, 31.323272705078125, 9.850618362426758, 11.953384399414062, 0.6136474609375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000431.npy"}
{"epoch": 0.6515495086923658, "step": 432, "batch_size": 64, "mean": 11.383286476135254, "std": 15.239411354064941, "min": -23.849594116210938, "p10": -4.345723724365234, "median": 9.97502326965332, "p90": 32.57412109375, "max": 48.34893798828125, "pos_frac": 0.71875, "sample": [-3.239410400390625, -7.572305679321289, 0.970458984375, -23.849594116210938, 2.670513153076172, 7.420867919921875, -4.701042175292969, 13.398666381835938, -4.050224304199219, 35.14894104003906, 10.71120834350586, -21.53527069091797, 0.948822021484375, 11.834197998046875, 23.1707763671875, -9.365814208984375, 10.210731506347656, -4.461700439453125, 34.15484619140625, 33.209381103515625, -1.4159393310546875, 19.42388153076172, 31.13349151611328, 5.63458251953125, -2.0162715911865234, 26.3739013671875, 7.495033264160156, 32.39833068847656, 21.48321533203125, -2.022672653198242, -2.8791275024414062, 5.776950836181641, 3.9478797912597656, -5.882133483886719, 21.10186767578125, 8.350421905517578, 36.71379852294922, 20.713706970214844, 11.806800842285156, 26.674774169921875, -0.7827110290527344, 9.739315032958984, 21.58667755126953, 2.5420074462890625, 22.278839111328125, 5.5210113525390625, -0.030332565307617188, 31.519311904907227, 0.19836807250976562, 48.34893798828125, 18.381216049194336, 14.46893310546875, 22.72174072265625, 26.415878295898438, 21.583541870117188, 43.163841247558594, -4.075111389160156, 17.620441436767578, 16.99462890625, -2.7589569091796875, -2.4335708618164062, 32.64945983886719, 0.038055419921875, 12.952281951904297], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000432.npy"}
{"epoch": 0.6530612244897959, "step": 433, "batch_size": 64, "mean": 14.394730567932129, "std": 18.94524574279785, "min": -28.216812133789062, "p10": -5.860817909240723, "median": 10.149866104125977, "p90": 43.65784721374512, "max": 60.39257049560547, "pos_frac": 0.78125, "sample": [6.016834259033203, 20.685623168945312, -1.4010200500488281, 10.667469024658203, 7.907806396484375, 5.779388427734375, -0.12555503845214844, 28.3218994140625, -2.031932830810547, 13.615478515625, 5.754219055175781, 7.3936309814453125, 9.63226318359375, -4.62994384765625, 51.29206085205078, -18.28453826904297, 13.043235778808594, 30.423328399658203, 23.61022186279297, -8.072620391845703, 30.424888610839844, 4.453826904296875, 4.657554626464844, 21.649324417114258, -28.216812133789062, 3.8635330200195312, 24.782684326171875, 1.2245330810546875, 29.46455192565918, 23.331249237060547, 2.3284873962402344, 20.672080993652344, -5.509828567504883, 6.9784393310546875, 15.723915100097656, 1.6807708740234375, -7.934856414794922, 10.700202941894531, 14.38677978515625, 60.39257049560547, 47.253662109375, 43.982688903808594, 46.51244354248047, 6.0078125, 31.549148559570312, -19.170166015625, 3.5386486053466797, 27.61407470703125, 3.1534042358398438, 42.89988327026367, -0.8043479919433594, 16.481666564941406, 12.938613891601562, -8.436145782470703, -0.7131576538085938, 33.808433532714844, 25.795862197875977, 4.4781494140625, 35.42908477783203, 41.57887268066406, 44.8763427734375, 0.16718292236328125, 53.68010711669922, -6.011241912841797], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000433.npy"}
{"epoch": 0.654572940287226, "step": 434, "batch_size": 64, "mean": 13.04580307006836, "std": 18.712421417236328, "min": -35.261566162109375, "p10": -9.313558959960938, "median": 13.664172172546387, "p90": 37.54263305664063, "max": 58.63273620605469, "pos_frac": 0.78125, "sample": [-29.57030487060547, 12.7608642578125, 19.729555130004883, 1.2776947021484375, 16.852386474609375, 38.324764251708984, -9.373786926269531, 29.437294006347656, -11.219749450683594, 8.698898315429688, -0.4730510711669922, -8.665390014648438, -35.261566162109375, 16.095962524414062, 14.310188293457031, 0.97149658203125, 29.171207427978516, 15.048614501953125, 36.453155517578125, 40.04977035522461, 10.533843994140625, -0.9385738372802734, 43.51521301269531, 2.7090225219726562, -14.874711990356445, 19.49486541748047, 41.64569854736328, 17.387313842773438, 4.1551055908203125, 17.21103858947754, 3.8936691284179688, 32.019779205322266, 11.505165100097656, 57.570499420166016, 58.63273620605469, 27.468605041503906, -9.173027038574219, 20.10735321044922, 26.994873046875, -2.6086502075195312, 35.971431732177734, -15.033477783203125, 9.100992202758789, 14.74935531616211, 6.040885925292969, 3.272491455078125, 2.1979122161865234, 28.12049102783203, 21.655059814453125, 15.40379524230957, 1.7376384735107422, 38.009552001953125, -18.961101531982422, 1.1732597351074219, -0.5794143676757812, 10.898811340332031, 26.869915008544922, 29.633506774902344, 13.018156051635742, -6.420619964599609, 15.346342086791992, 22.93596649169922, 18.683435440063477, 9.239192962646484], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000434.npy"}
{"epoch": 0.656084656084656, "step": 435, "batch_size": 64, "mean": 11.029098510742188, "std": 15.44383430480957, "min": -23.396583557128906, "p10": -6.244901657104491, "median": 7.88001823425293, "p90": 32.27672271728515, "max": 51.1470832824707, "pos_frac": 0.78125, "sample": [37.284324645996094, 8.15765380859375, 0.8006782531738281, 6.925825119018555, -14.067075729370117, 1.0563278198242188, 4.7718658447265625, 2.8852615356445312, 25.51648712158203, -1.8198089599609375, 14.71500015258789, 32.12153625488281, -9.301025390625, 23.271072387695312, 29.73334312438965, 24.470176696777344, 4.214836120605469, -6.6353912353515625, -1.2589187622070312, 32.81626892089844, 14.737518310546875, 3.6597824096679688, 24.101547241210938, 37.03485107421875, 5.3569488525390625, 12.832862854003906, 0.2902984619140625, 4.6863250732421875, 39.627445220947266, 27.905479431152344, 2.7690277099609375, 30.125823974609375, -20.309036254882812, 14.117725372314453, 14.282062530517578, 22.412395477294922, 5.179969787597656, -3.9599609375, 14.875509262084961, 23.48553466796875, 17.426986694335938, 34.71864318847656, 51.1470832824707, 23.250015258789062, 0.019023895263671875, 6.721181869506836, 7.808071136474609, -1.8939838409423828, -5.333759307861328, 32.343231201171875, -8.976375579833984, 7.95196533203125, -3.091329574584961, 8.443756103515625, 10.45611572265625, 11.68533706665039, 26.61236572265625, -13.789421081542969, -23.396583557128906, 1.5480804443359375, -1.81109619140625, 5.999969482421875, 22.68375015258789, 6.4687347412109375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000435.npy"}
{"epoch": 0.6575963718820862, "step": 436, "batch_size": 64, "mean": 11.414917945861816, "std": 16.167926788330078, "min": -21.5560359954834, "p10": -10.671215057373047, "median": 13.740028381347656, "p90": 32.4072982788086, "max": 44.288970947265625, "pos_frac": 0.75, "sample": [15.756332397460938, 3.1001853942871094, 42.51878356933594, 4.555870056152344, 15.790822982788086, 2.9926834106445312, 11.24361801147461, 26.40789794921875, -9.885322570800781, 10.00393295288086, 17.982391357421875, 34.13410949707031, 5.474037170410156, 21.029815673828125, 1.5248565673828125, 33.175819396972656, 32.79484558105469, 38.2874755859375, -14.502143859863281, 21.51610565185547, 13.300457000732422, 15.388418197631836, -11.008026123046875, 3.5285263061523438, 10.535921096801758, 1.3379135131835938, 31.503021240234375, -2.119840621948242, 23.650238037109375, 23.38390350341797, 20.82467269897461, -6.701019287109375, -21.5560359954834, 3.9424285888671875, -3.675050735473633, 7.8565826416015625, 22.235183715820312, -3.1534175872802734, -0.81707763671875, -6.1318359375, 23.487548828125, 16.139442443847656, 14.17959976196289, 0.3970184326171875, -13.141998291015625, 15.094383239746094, -8.489166259765625, 44.288970947265625, 15.330810546875, 31.01239013671875, 17.20220184326172, 1.4660110473632812, 31.322547912597656, 15.189220428466797, -3.1739883422851562, 21.90548324584961, 31.39177703857422, -11.295799255371094, 16.842933654785156, 1.3690872192382812, -21.337448120117188, 40.09352111816406, -13.277229309082031, 28.330368041992188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000436.npy"}
{"epoch": 0.6591080876795162, "step": 437, "batch_size": 64, "mean": 11.10509204864502, "std": 20.066211700439453, "min": -42.04798889160156, "p10": -11.179859924316407, "median": 9.052999496459961, "p90": 38.693909072876004, "max": 67.34501647949219, "pos_frac": 0.71875, "sample": [52.77838134765625, 31.93011474609375, 27.170881271362305, 14.097000122070312, 4.719566345214844, -13.815563201904297, 28.057342529296875, 20.222305297851562, 67.34501647949219, 6.313488006591797, 25.871261596679688, 2.6491546630859375, -42.04798889160156, 15.496566772460938, -7.882453918457031, -8.1517333984375, 14.15960693359375, 10.108997344970703, 13.963115692138672, -11.716865539550781, 53.027008056640625, 52.140411376953125, 41.35761642456055, -5.949047088623047, -11.193805694580078, 10.034210205078125, 26.68609619140625, -11.147319793701172, -19.198280334472656, -12.69708251953125, -4.3851776123046875, -6.030647277832031, 0.6250534057617188, -2.0047836303710938, 12.086219787597656, 44.250022888183594, 14.00970458984375, -8.340606689453125, 11.410728454589844, 7.645578384399414, 6.016838073730469, 0.9697017669677734, -6.604190826416016, 10.917991638183594, 11.215995788574219, 30.613895416259766, 7.388542175292969, 23.346986770629883, 24.115081787109375, -10.59027099609375, 6.625165939331055, 30.249645233154297, 15.50531005859375, 32.47859191894531, 18.643325805664062, 8.071788787841797, 3.35064697265625, 19.790313720703125, -18.60662841796875, 1.0149040222167969, -1.0045700073242188, 7.4629364013671875, 1.8053359985351562, 44.354461669921875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000437.npy"}
{"epoch": 0.6606198034769464, "step": 438, "batch_size": 64, "mean": 13.601827621459961, "std": 17.554845809936523, "min": -21.661149978637695, "p10": -7.694459152221679, "median": 11.492088317871094, "p90": 36.54086799621582, "max": 58.2178955078125, "pos_frac": 0.78125, "sample": [46.76048278808594, -4.6899566650390625, 58.2178955078125, -19.717926025390625, 0.8556671142578125, -7.923286437988281, 0.037281036376953125, 16.281360626220703, 1.4948062896728516, 39.78099060058594, 13.615528106689453, 44.57587432861328, -7.362651824951172, 1.0260086059570312, 24.43738555908203, 0.9690132141113281, 24.968534469604492, -21.661149978637695, 19.379234313964844, 14.250459671020508, 33.42610168457031, 24.797195434570312, -7.836662292480469, 14.576034545898438, 15.295997619628906, 22.62921714782715, 15.593696594238281, 2.202770233154297, 36.429813385009766, 35.96659851074219, 22.249000549316406, -1.171661376953125, 36.588462829589844, 2.558431625366211, -3.3246726989746094, 23.94274139404297, 25.504928588867188, -9.777708053588867, 11.561729431152344, 20.51698112487793, 32.16093444824219, -1.4499664306640625, 8.869512557983398, 39.775115966796875, 32.47743225097656, 6.9246063232421875, 21.00428009033203, 6.5746307373046875, -2.6761856079101562, 5.550434112548828, -0.7988395690917969, 1.5733070373535156, 8.955377578735352, 30.954307556152344, 30.520309448242188, -16.33063507080078, 39.32200622558594, 5.0615997314453125, -15.090560913085938, 32.77965545654297, 11.367351531982422, 4.30705451965332, 11.422447204589844, 10.238243103027344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000438.npy"}
{"epoch": 0.6621315192743764, "step": 439, "batch_size": 64, "mean": 12.401932716369629, "std": 15.488285064697266, "min": -29.616058349609375, "p10": -4.942845726013183, "median": 10.18343734741211, "p90": 36.889393997192386, "max": 43.70783996582031, "pos_frac": 0.828125, "sample": [12.012222290039062, -0.5825843811035156, 4.376121520996094, -15.52154541015625, 17.852508544921875, 20.770999908447266, 8.897375106811523, 1.7878570556640625, 5.719146728515625, 30.869041442871094, 41.82189178466797, 2.932455062866211, 37.56654357910156, 11.490432739257812, 15.704238891601562, 31.099761962890625, 43.102386474609375, 0.9175872802734375, 23.809188842773438, 17.277679443359375, 1.1235237121582031, 20.23259162902832, 14.124870300292969, -6.490898132324219, 23.414798736572266, 10.28192138671875, 43.70783996582031, -5.981441497802734, 3.698587417602539, 36.393123626708984, -5.140886306762695, 37.102081298828125, -4.480751037597656, -0.48883056640625, 20.653648376464844, 12.059669494628906, 3.0086593627929688, 8.237724304199219, -29.616058349609375, 1.3973674774169922, -14.907333374023438, 8.458450317382812, 26.4931640625, 40.65458679199219, 12.907892227172852, 29.185653686523438, 9.969345092773438, 42.26957702636719, 5.731407165527344, 11.758892059326172, 7.846088409423828, 2.5785980224609375, 8.659805297851562, 19.023849487304688, 13.309045791625977, 32.061553955078125, 8.5924072265625, 5.6296844482421875, -10.390419006347656, -3.0739364624023438, 10.17434310913086, 17.735549926757812, 10.19253158569336, 3.7221450805664062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000439.npy"}
{"epoch": 0.6636432350718064, "step": 440, "batch_size": 64, "mean": 13.061777114868164, "std": 17.0676326751709, "min": -28.62900733947754, "p10": -7.474452400207518, "median": 10.597187042236328, "p90": 38.257954406738286, "max": 53.06910705566406, "pos_frac": 0.796875, "sample": [-3.7294559478759766, 13.661666870117188, 31.538307189941406, 44.749267578125, 38.71324157714844, 53.06910705566406, 40.2156867980957, 0.5083236694335938, 35.007102966308594, 8.220108032226562, 15.180122375488281, 45.9559326171875, 5.955909729003906, 24.641067504882812, 16.486461639404297, 1.2397079467773438, 26.595380783081055, -9.329437255859375, -8.399223327636719, 9.37973403930664, 10.481689453125, -0.7866420745849609, 3.2370147705078125, 14.608993530273438, 13.6990966796875, 19.73600196838379, 9.082679748535156, 14.042739868164062, 3.1782054901123047, 6.424507141113281, 5.193996429443359, 12.395606994628906, 18.8724365234375, 4.839389801025391, 2.048788070678711, 4.872062683105469, 7.193037033081055, -28.62900733947754, 11.792516708374023, 37.19561767578125, 24.012409210205078, 34.18448257446289, 3.0284271240234375, 33.63288116455078, -21.383502960205078, 2.3494186401367188, 36.16630554199219, 10.393257141113281, 20.07439422607422, -1.8569412231445312, -2.3901710510253906, -7.913625717163086, 19.530670166015625, -6.449714660644531, 11.840116500854492, 5.605503082275391, -0.9778518676757812, -9.128763198852539, 22.54672622680664, -10.823272705078125, 10.712684631347656, 42.20491409301758, 44.42336654663086, 17.034278869628906], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000440.npy"}
{"epoch": 0.6651549508692366, "step": 441, "batch_size": 64, "mean": 14.157617568969727, "std": 14.425634384155273, "min": -14.146900177001953, "p10": -2.6059970855712873, "median": 12.478530883789062, "p90": 33.784091567993165, "max": 49.34771728515625, "pos_frac": 0.859375, "sample": [16.338979721069336, 15.197210311889648, 33.36603927612305, 25.454055786132812, -1.0112876892089844, 9.441457748413086, 26.0125789642334, 2.4467849731445312, 16.210601806640625, 2.081634521484375, -8.90347671508789, 30.74274444580078, 4.9161529541015625, 38.073829650878906, 4.500265121459961, 6.40283203125, 15.472938537597656, 9.994598388671875, -14.146900177001953, 43.83152770996094, 5.0707550048828125, 20.176124572753906, 33.9632568359375, 41.68791198730469, 7.1102294921875, 49.34771728515625, 5.740226745605469, 7.88226318359375, 43.87223815917969, 1.91473388671875, 2.8407058715820312, 13.76690673828125, -5.882242202758789, 24.124210357666016, 22.077991485595703, -3.2894439697265625, -0.519775390625, 32.196807861328125, 24.831388473510742, 6.984230041503906, 18.27166748046875, 30.55487823486328, 3.207977294921875, 9.028148651123047, 23.738327026367188, 14.1641845703125, 18.16973876953125, 3.54681396484375, 11.190155029296875, -4.275299072265625, 23.759902954101562, 2.3842411041259766, 31.36060333251953, 36.31150817871094, -7.459936141967773, 15.565727233886719, 6.521766662597656, 14.253250122070312, 4.617988586425781, 10.616714477539062, -10.921775817871094, 16.74932098388672, 20.0097599029541, 4.403036117553711], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000441.npy"}
{"epoch": 0.6666666666666666, "step": 442, "batch_size": 64, "mean": 12.878868103027344, "std": 15.978835105895996, "min": -22.074302673339844, "p10": -5.302020454406738, "median": 11.335844039916992, "p90": 30.340856933593752, "max": 50.956642150878906, "pos_frac": 0.78125, "sample": [4.1454010009765625, 29.486312866210938, 13.294540405273438, -5.5537109375, 10.003904342651367, 49.589263916015625, 43.863739013671875, 39.63399124145508, 17.115947723388672, 26.72655487060547, 18.078411102294922, 20.621780395507812, 29.574249267578125, -1.643035888671875, 22.5863037109375, 17.451095581054688, -1.4158401489257812, 2.14410400390625, 6.4385223388671875, -6.127992630004883, -3.8359222412109375, 24.906814575195312, 16.572952270507812, 1.6919326782226562, 30.669403076171875, 24.254470825195312, -22.074302673339844, 15.48828125, 0.6107463836669922, 25.256179809570312, 11.125350952148438, 12.442161560058594, 16.573699951171875, 29.24773406982422, 18.371551513671875, 6.433324813842773, -8.994094848632812, 6.666744232177734, 50.956642150878906, 27.77185821533203, 8.488580703735352, -3.2876319885253906, 5.60826301574707, 10.195880889892578, -2.455354690551758, -16.103408813476562, -3.1267852783203125, 44.127723693847656, -13.445526123046875, 4.7369232177734375, 26.270217895507812, 19.20318603515625, -19.057327270507812, 8.447906494140625, 6.5494537353515625, -4.714742660522461, 9.1016845703125, 5.826446533203125, 6.69427490234375, 28.705768585205078, 23.289627075195312, 33.53019332885742, 11.546337127685547, 13.966733932495117], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000442.npy"}
{"epoch": 0.6681783824640968, "step": 443, "batch_size": 64, "mean": 17.049985885620117, "std": 19.538246154785156, "min": -25.432464599609375, "p10": -1.7225959777832032, "median": 14.475960731506348, "p90": 40.7094955444336, "max": 84.18496704101562, "pos_frac": 0.84375, "sample": [0.7293128967285156, 28.894989013671875, 21.437850952148438, 32.084228515625, 7.305961608886719, 26.0338134765625, 17.74053955078125, -20.179241180419922, 12.307220458984375, 16.183090209960938, 2.51641845703125, -5.516763687133789, 6.955318450927734, 26.196081161499023, 4.913066864013672, 29.24103546142578, 1.926473617553711, 16.70185089111328, 12.343856811523438, 24.45789337158203, 84.18496704101562, -24.685962677001953, 8.621711730957031, 31.275299072265625, -1.1486797332763672, 53.882713317871094, 43.59795379638672, 8.980951309204102, 50.962974548339844, 39.444820404052734, 22.180809020996094, 12.583690643310547, 13.57322883605957, 24.4950008392334, 29.214752197265625, 33.50531768798828, 30.946773529052734, 43.876792907714844, 34.31661605834961, 35.76667785644531, 5.282161712646484, 29.443674087524414, 0.34145164489746094, 19.1419677734375, 41.25149917602539, 36.21918487548828, -3.5565872192382812, 8.969284057617188, 7.8777313232421875, 1.895151138305664, -0.08500289916992188, -13.770538330078125, 18.27813720703125, 15.378692626953125, 6.262676239013672, 2.116943359375, -25.432464599609375, -1.73077392578125, 43.71369171142578, 10.945459365844727, 0.5816440582275391, -1.7035140991210938, 12.668289184570312, 39.26091003417969], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000443.npy"}
{"epoch": 0.6696900982615268, "step": 444, "batch_size": 64, "mean": 6.2789082527160645, "std": 16.91016387939453, "min": -30.369892120361328, "p10": -14.902021598815917, "median": 4.391252517700195, "p90": 28.418047904968265, "max": 46.60418701171875, "pos_frac": 0.640625, "sample": [23.750938415527344, 2.5734596252441406, -6.321388244628906, 1.9106464385986328, 26.9517822265625, 16.009075164794922, 19.273893356323242, -13.685958862304688, -3.4159584045410156, 28.972768783569336, -4.431068420410156, 26.699234008789062, -10.282890319824219, -20.52233123779297, 10.713577270507812, 20.410934448242188, 7.96807861328125, 13.516780853271484, 4.971044540405273, -18.92572021484375, 35.901458740234375, 22.896011352539062, 3.051401138305664, 1.3102188110351562, -5.30401611328125, 5.186500549316406, 1.0289344787597656, -5.639654159545898, -14.913925170898438, -7.9298858642578125, 27.749481201171875, 3.978900909423828, 14.797096252441406, 10.655250549316406, 4.8036041259765625, 23.998748779296875, 0.15960311889648438, 20.80063247680664, -3.936901092529297, -30.369892120361328, 7.8355865478515625, 46.60418701171875, -28.68378448486328, -16.347084045410156, -3.436056137084961, 28.70457649230957, 14.626129150390625, 38.045166015625, 30.062793731689453, 8.775810241699219, 11.523500442504883, 5.589179992675781, 0.6418972015380859, -0.7961654663085938, 18.164764404296875, -6.282007217407227, -6.773593902587891, 2.158048629760742, 13.306983947753906, -14.874246597290039, -21.21465301513672, 43.25897216796875, -2.149158477783203, -1.2512016296386719], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000444.npy"}
{"epoch": 0.671201814058957, "step": 445, "batch_size": 64, "mean": 12.55510425567627, "std": 15.95542049407959, "min": -22.917678833007812, "p10": -9.037141609191893, "median": 8.600522994995117, "p90": 35.99800109863281, "max": 45.24876403808594, "pos_frac": 0.8125, "sample": [21.492084503173828, 42.60015869140625, 8.257530212402344, 3.4337921142578125, -12.637886047363281, 6.29693603515625, 32.492835998535156, 8.861053466796875, 9.155258178710938, 8.726787567138672, -0.5772247314453125, -14.017074584960938, -2.8195648193359375, 8.3658447265625, -13.583419799804688, 25.850801467895508, -7.6734161376953125, 1.0922622680664062, 2.2404327392578125, 8.402965545654297, 0.29609107971191406, 30.988067626953125, -16.155662536621094, 9.356193542480469, -22.917678833007812, 34.65052795410156, 11.374099731445312, 36.198524475097656, -0.02483367919921875, 3.7549896240234375, -13.36800765991211, -9.62159538269043, 4.328361511230469, 36.030799865722656, 45.24876403808594, 12.873260498046875, 22.256378173828125, 40.38774108886719, 8.119132995605469, 5.641944885253906, 26.31909942626953, 26.70086669921875, 19.389873504638672, 36.58473205566406, 21.255781173706055, 29.37765884399414, 29.920455932617188, 4.556800842285156, 19.117280960083008, -2.951059341430664, 7.259746551513672, 6.012474060058594, 5.883697509765625, 7.681205749511719, 19.13617706298828, 8.474258422851562, 35.921470642089844, 26.276443481445312, 6.978208541870117, 11.576934814453125, 18.412086486816406, 18.88821792602539, 38.79795455932617, 6.579093933105469], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000445.npy"}
{"epoch": 0.672713529856387, "step": 446, "batch_size": 64, "mean": 16.696086883544922, "std": 15.527929306030273, "min": -10.554668426513672, "p10": -2.6996234893798827, "median": 17.978670120239258, "p90": 36.81727867126465, "max": 58.93597412109375, "pos_frac": 0.8125, "sample": [2.2041587829589844, -1.8650741577148438, -10.554668426513672, 24.5592041015625, 12.858879089355469, 24.287460327148438, 8.396675109863281, -2.8090438842773438, 58.93597412109375, 34.55467987060547, 13.098129272460938, 18.564361572265625, 19.5791015625, 22.4068603515625, 24.45441436767578, 48.39747619628906, 18.46184539794922, 21.55756378173828, 31.3465576171875, 18.248390197753906, -0.5200920104980469, 20.525238037109375, -3.357513427734375, 20.875137329101562, 42.36906814575195, 17.70895004272461, 4.262348175048828, 25.71436309814453, 37.20286560058594, -5.056159973144531, 21.31231117248535, 3.0799503326416016, -3.2431716918945312, -0.6988754272460938, -2.358671188354492, 2.9902706146240234, 2.390787124633789, -4.229267120361328, 19.105667114257812, 13.002235412597656, 12.139060974121094, 47.447425842285156, 35.91757583618164, 20.10896110534668, 40.66304016113281, 0.4528827667236328, 16.489700317382812, 34.634620666503906, -2.4443092346191406, 5.818183898925781, 24.62334442138672, 5.824455261230469, 21.053558349609375, 35.10395812988281, 33.18419647216797, 43.38357162475586, 6.910869598388672, 14.306610107421875, 2.4352035522460938, 28.75262451171875, -3.3809375762939453, 9.219131469726562, 20.721595764160156, 17.425750732421875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000446.npy"}
{"epoch": 0.674225245653817, "step": 447, "batch_size": 64, "mean": 15.832550048828125, "std": 17.726058959960938, "min": -45.367942810058594, "p10": -7.847317504882812, "median": 16.876541137695312, "p90": 40.59753341674806, "max": 49.48729705810547, "pos_frac": 0.828125, "sample": [43.657012939453125, 26.662368774414062, 23.5294246673584, 6.0023040771484375, 2.9374923706054688, -4.919677734375, 36.19457244873047, -8.042434692382812, -14.774024963378906, 4.736968994140625, 15.770263671875, 20.921791076660156, 7.092201232910156, 15.67840576171875, 20.834617614746094, 45.092376708984375, 10.440628051757812, 19.049999237060547, 12.39617919921875, 14.498416900634766, 8.782032012939453, -11.495529174804688, 7.721832275390625, -7.3920440673828125, 5.753692626953125, 33.729217529296875, 38.11396789550781, 24.097137451171875, 19.917724609375, 7.279541015625, 24.352706909179688, 27.05352210998535, 20.200410842895508, 35.674102783203125, 22.934528350830078, 20.67523956298828, 41.66191864013672, 17.774734497070312, 44.309295654296875, -0.12460517883300781, 21.496498107910156, 9.238525390625, 29.042800903320312, -9.705574035644531, -1.4793586730957031, -12.479644775390625, 3.9243545532226562, 46.130218505859375, 15.978347778320312, 21.055421829223633, 0.8302154541015625, 34.4158935546875, -8.620716094970703, 35.865989685058594, 20.25518798828125, 14.464157104492188, 19.24944305419922, -45.367942810058594, 10.534936904907227, 49.48729705810547, 42.20320129394531, 2.6875076293945312, 15.402862548828125, 19.895160675048828], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000447.npy"}
{"epoch": 0.6757369614512472, "step": 448, "batch_size": 64, "mean": 12.858972549438477, "std": 16.862838745117188, "min": -35.53771209716797, "p10": -5.737940216064453, "median": 13.138538360595703, "p90": 31.891468811035157, "max": 56.36105728149414, "pos_frac": 0.765625, "sample": [24.791419982910156, -21.7547550201416, 36.46985626220703, 6.716407775878906, 20.778221130371094, 13.761032104492188, -0.8535346984863281, 18.116455078125, 18.800949096679688, -0.408782958984375, 30.713104248046875, 31.979263305664062, 3.624237060546875, 3.644012451171875, 18.70465850830078, 26.56304931640625, -5.571252822875977, 39.74138641357422, 20.654312133789062, 6.403125762939453, -1.464324951171875, 56.36105728149414, 11.499839782714844, 14.640487670898438, 16.735700607299805, 0.9698295593261719, 30.994735717773438, 9.394756317138672, 12.516044616699219, -4.095306396484375, -5.809377670288086, -0.57244873046875, 7.920768737792969, 3.4710769653320312, 3.4046096801757812, 19.763614654541016, -7.1642608642578125, 19.858049392700195, 31.686614990234375, -7.048654556274414, 17.390310287475586, -35.53771209716797, 16.781906127929688, 5.111328125, 16.993179321289062, 20.32581329345703, 20.649337768554688, 30.707672119140625, 22.89272117614746, 3.7277984619140625, -5.891059875488281, 6.6469268798828125, 10.969985961914062, 15.3206787109375, -1.3056793212890625, 38.835357666015625, -1.8406600952148438, -20.03148651123047, 41.35209274291992, 28.955848693847656, 23.712615966796875, 4.855541229248047, 2.7802810668945312, 53.635459899902344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000448.npy"}
{"epoch": 0.6772486772486772, "step": 449, "batch_size": 64, "mean": 8.030338287353516, "std": 14.956350326538086, "min": -18.305957794189453, "p10": -9.935299682617186, "median": 5.740688323974609, "p90": 28.77194519042971, "max": 52.849422454833984, "pos_frac": 0.6875, "sample": [1.0507545471191406, 15.60382080078125, -14.734527587890625, 7.1412811279296875, -4.629554748535156, -5.077705383300781, 3.7153854370117188, 13.139671325683594, 1.620361328125, 31.24498748779297, 12.819953918457031, 7.82635498046875, -0.1928253173828125, -10.934652328491211, -4.036888122558594, 11.7957763671875, 52.849422454833984, 6.001930236816406, 8.642776489257812, 12.382816314697266, -1.9302940368652344, 11.62440299987793, 38.79960632324219, 46.69021224975586, -16.748859405517578, 10.00674819946289, 15.844703674316406, 3.6879119873046875, 14.562904357910156, 1.68585205078125, 12.502700805664062, -4.6860809326171875, 10.083686828613281, -18.305957794189453, 32.776084899902344, 5.4794464111328125, 44.80424499511719, -1.3290138244628906, 2.336772918701172, 14.038692474365234, 4.449934005737305, -2.8233489990234375, -1.3042182922363281, 14.769271850585938, -5.785640716552734, 22.514507293701172, -10.032180786132812, 18.3624267578125, 4.335796356201172, 7.862907409667969, 20.295516967773438, -1.5877342224121094, 4.8076171875, 8.282127380371094, -10.06134033203125, 31.23638916015625, 22.017135620117188, -13.853561401367188, 18.266407012939453, 0.22826766967773438, 23.021575927734375, -9.709243774414062, 3.4572525024414062, -2.9610595703125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000449.npy"}
{"epoch": 0.6787603930461074, "step": 450, "batch_size": 64, "mean": 9.53365421295166, "std": 16.06702995300293, "min": -22.89324378967285, "p10": -8.458848571777343, "median": 8.2041015625, "p90": 31.77786712646485, "max": 49.610313415527344, "pos_frac": 0.703125, "sample": [-0.18372726440429688, -1.5964279174804688, -8.152053833007812, -4.080707550048828, 32.433197021484375, 33.79009246826172, 11.683601379394531, 7.935394287109375, 23.239139556884766, 49.610313415527344, 22.15064239501953, 5.602405548095703, 18.611818313598633, 1.8587379455566406, 0.4835662841796875, 39.391273498535156, -0.6343307495117188, 5.072414398193359, 16.957298278808594, 47.87823486328125, -1.5916175842285156, 14.73843002319336, 3.1855545043945312, -7.5055694580078125, 7.253936767578125, -9.447185516357422, 19.357948303222656, 25.134384155273438, 10.150009155273438, -12.147802352905273, 3.066558837890625, 25.745025634765625, 4.0147552490234375, 14.702585220336914, -22.4793643951416, 30.248764038085938, 21.32935333251953, -0.47777748107910156, 22.0382080078125, 10.073688507080078, 9.412662506103516, 24.82524871826172, 9.307306289672852, -22.89324378967285, 10.377433776855469, 4.472620010375977, 15.472156524658203, 8.472808837890625, 12.9630126953125, -6.703887939453125, 3.4336490631103516, -11.896537780761719, 2.8691158294677734, 11.108732223510742, 41.568634033203125, 10.735824584960938, -8.59033203125, 16.616472244262695, -6.077247619628906, 44.12630081176758, -7.037040710449219, 6.119117736816406, -15.997390747070312, -1.9722824096679688], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000450.npy"}
{"epoch": 0.6802721088435374, "step": 451, "batch_size": 64, "mean": 10.744665145874023, "std": 16.462692260742188, "min": -21.25342559814453, "p10": -9.477867889404296, "median": 8.777328491210938, "p90": 32.60694885253907, "max": 53.64723205566406, "pos_frac": 0.71875, "sample": [28.50650405883789, 5.6722869873046875, 11.918670654296875, 5.7444000244140625, 31.470260620117188, 8.238494873046875, -18.192481994628906, -1.7954254150390625, 9.846742630004883, 20.614219665527344, -0.6609954833984375, 9.289688110351562, 26.812667846679688, 10.046356201171875, 26.938899993896484, 8.264968872070312, 17.39557647705078, 15.3472900390625, 12.92359733581543, 21.097679138183594, 2.6152572631835938, -21.25342559814453, -6.478008270263672, 44.81071472167969, 24.50928497314453, -13.506446838378906, 33.09410095214844, -19.389572143554688, -8.162467956542969, -0.22145652770996094, 16.007551193237305, 33.2176513671875, -2.3386001586914062, 3.0024852752685547, 16.77138900756836, 8.242057800292969, 40.14393997192383, 16.337196350097656, 0.5333232879638672, 12.498374938964844, 0.8682670593261719, 26.529220581054688, 2.9231796264648438, -0.17418670654296875, 19.572656631469727, -1.488983154296875, 7.0521240234375, 46.25605010986328, 42.92561340332031, 7.87689208984375, 17.214126586914062, 11.898368835449219, 10.317230224609375, -10.041610717773438, -2.831867218017578, -2.5084762573242188, 27.557897567749023, 7.581993103027344, -2.593647003173828, 20.912105560302734, 53.64723205566406, -14.46786880493164, -12.613319396972656, 1.3327560424804688], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000451.npy"}
{"epoch": 0.6817838246409675, "step": 452, "batch_size": 64, "mean": 12.153406143188477, "std": 17.451637268066406, "min": -31.9078369140625, "p10": -9.15684356689453, "median": 12.443544387817383, "p90": 35.82003402709962, "max": 52.74658966064453, "pos_frac": 0.734375, "sample": [11.666984558105469, -9.837867736816406, 6.453863143920898, 29.441818237304688, 3.2001495361328125, 21.17247772216797, 12.799854278564453, 32.30372619628906, 28.801555633544922, 36.30573272705078, 11.48178482055664, 36.23915100097656, -27.10693359375, 17.35675048828125, 37.248558044433594, 15.41583251953125, 45.44905090332031, 15.744094848632812, 20.862668991088867, 25.890411376953125, 3.6534576416015625, -3.3828582763671875, 26.298614501953125, 15.403144836425781, 1.4434089660644531, -4.49250602722168, 21.93462371826172, 8.651748657226562, -0.0140533447265625, 3.4441146850585938, -1.0413036346435547, 52.74658966064453, -13.886337280273438, 27.392803192138672, -7.567787170410156, 25.25274658203125, 10.775993347167969, 13.224006652832031, -14.255149841308594, 5.873687744140625, 14.280815124511719, 0.618988037109375, 31.158119201660156, 12.087234497070312, 21.01995849609375, -16.606435775756836, 43.48339080810547, 38.704307556152344, 19.695846557617188, -0.7653007507324219, 15.616592407226562, -4.295694351196289, 10.306529998779297, 17.985591888427734, 1.1343727111816406, -11.155036926269531, -3.4440078735351562, 19.22217559814453, 25.643033981323242, -3.0871734619140625, -4.817924499511719, 34.84209442138672, -31.9078369140625, 5.7537384033203125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000452.npy"}
{"epoch": 0.6832955404383976, "step": 453, "batch_size": 64, "mean": 12.20389175415039, "std": 18.58200454711914, "min": -24.44535255432129, "p10": -10.56380157470703, "median": 8.189815521240234, "p90": 38.181015014648445, "max": 56.866668701171875, "pos_frac": 0.734375, "sample": [14.511390686035156, 19.530141830444336, -6.6640472412109375, 0.095062255859375, 47.87858581542969, 7.889518737792969, 2.9510269165039062, 18.31037139892578, 24.914016723632812, -0.6160736083984375, -24.44535255432129, 3.9940948486328125, 25.549026489257812, -3.8666229248046875, 35.58916473388672, 36.990013122558594, 38.997894287109375, 4.5770111083984375, 32.46283721923828, 15.14522933959961, 2.2037391662597656, -7.347412109375, 12.6195068359375, 7.576194763183594, 0.46538543701171875, -16.313392639160156, -11.335151672363281, 34.011322021484375, -10.64938735961914, 23.41911506652832, 50.790618896484375, 5.411327362060547, 22.70428466796875, -19.98217010498047, -3.5785179138183594, 8.4901123046875, 40.123748779296875, 3.598846435546875, -10.172286987304688, 56.866668701171875, 6.001502990722656, 44.89472961425781, -6.113033294677734, -0.8199043273925781, 0.269287109375, 18.346084594726562, -10.36410140991211, 7.858997344970703, 17.00739288330078, 13.382369995117188, 16.371166229248047, -1.8582572937011719, 6.541654586791992, 19.577892303466797, 16.063262939453125, -12.10848617553711, -13.747013092041016, 33.93046569824219, 26.3548583984375, 9.3385009765625, 38.691444396972656, 27.939573287963867, 35.853355407714844, 4.9415435791015625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000453.npy"}
{"epoch": 0.6848072562358276, "step": 454, "batch_size": 64, "mean": 13.397918701171875, "std": 19.262699127197266, "min": -53.036590576171875, "p10": -7.674168777465817, "median": 10.385528564453125, "p90": 38.50834579467774, "max": 52.23622131347656, "pos_frac": 0.796875, "sample": [38.71625518798828, -13.142036437988281, -2.9844932556152344, 32.967872619628906, -53.036590576171875, -4.255334854125977, 28.247081756591797, 2.0762405395507812, 9.377204895019531, 6.2975311279296875, 18.365459442138672, 39.59027862548828, 10.102554321289062, -24.01190948486328, 32.644859313964844, 9.950691223144531, 40.03197479248047, 30.743738174438477, 19.16008758544922, 10.668502807617188, -1.6903724670410156, -20.11199951171875, -0.7278976440429688, 32.689788818359375, 36.794227600097656, 50.3155517578125, -9.139383316040039, 6.7112274169921875, 6.312156677246094, -0.6044540405273438, 1.61053466796875, 13.146995544433594, 38.95564270019531, 3.4579315185546875, 12.628204345703125, 7.490955352783203, 52.23622131347656, -12.628547668457031, 50.44524383544922, 6.410268783569336, 9.621585845947266, 15.839469909667969, 0.8063507080078125, 1.0441408157348633, 4.536895751953125, 5.484893798828125, 33.936519622802734, 32.7393798828125, 4.753023147583008, 15.628814697265625, 17.997787475585938, 26.668609619140625, 22.539730072021484, 38.023223876953125, 16.974903106689453, 3.3459548950195312, 14.834196090698242, 28.704971313476562, 12.228721618652344, 33.414154052734375, 20.7222900390625, 2.331829071044922, -1.3491859436035156, -9.173795700073242], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000454.npy"}
{"epoch": 0.6863189720332578, "step": 455, "batch_size": 64, "mean": 13.600913047790527, "std": 16.987104415893555, "min": -23.19029998779297, "p10": -4.526636505126952, "median": 12.33854866027832, "p90": 37.415682601928715, "max": 61.3260498046875, "pos_frac": 0.75, "sample": [25.883880615234375, 8.57135009765625, 18.21251678466797, 31.303466796875, 18.00603485107422, -2.637643814086914, 41.40617370605469, 43.60920333862305, -8.034980773925781, 6.151878356933594, 8.126007080078125, -12.147392272949219, 13.795158386230469, 12.44412612915039, 21.303543090820312, 61.3260498046875, -2.9663619995117188, -2.9577178955078125, 10.335182189941406, -5.970634460449219, 36.8634033203125, 2.4088306427001953, 38.58914566040039, -0.5816383361816406, -4.960319519042969, 27.9754638671875, 7.5797119140625, 19.702911376953125, 25.30352783203125, 0.26377105712890625, 13.636577606201172, 4.668962478637695, 31.765235900878906, 6.446800231933594, 17.549108505249023, 4.10577392578125, -23.19029998779297, 17.239105224609375, 26.91009521484375, 37.56488037109375, -6.628139495849609, 1.6800765991210938, 16.636154174804688, -11.892303466796875, 37.06755447387695, 25.396896362304688, 13.602607727050781, 4.657745361328125, 12.23297119140625, -3.51470947265625, 57.22378921508789, 19.58716583251953, -0.2959480285644531, 20.829471588134766, -2.6418304443359375, 6.810020446777344, 22.04836654663086, -1.6059188842773438, -3.4893341064453125, 7.902732849121094, 5.077526092529297, 17.33053970336914, 13.179306030273438, 43.66278076171875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000455.npy"}
{"epoch": 0.6878306878306878, "step": 456, "batch_size": 64, "mean": 11.877501487731934, "std": 15.861017227172852, "min": -25.70849609375, "p10": -7.563338470458984, "median": 10.247974395751953, "p90": 34.05810546875001, "max": 50.8736572265625, "pos_frac": 0.75, "sample": [22.998458862304688, -7.044624328613281, -0.4616222381591797, 19.42572021484375, 50.8736572265625, -4.694103240966797, -1.0065155029296875, 17.20376205444336, -0.5047111511230469, 37.715576171875, 16.08147430419922, 11.916767120361328, 31.598312377929688, 39.352256774902344, 16.878067016601562, 10.092887878417969, -0.4324798583984375, 15.381172180175781, 8.99268913269043, 19.868980407714844, -7.78564453125, 6.8877716064453125, 8.895606994628906, -1.7205810546875, 24.857112884521484, 0.8252964019775391, -17.061248779296875, -13.007034301757812, 15.922897338867188, -8.060134887695312, 6.261569976806641, 8.958541870117188, 9.295488357543945, 15.886940002441406, 16.413848876953125, 21.173030853271484, 6.3263702392578125, 4.496368408203125, 16.546714782714844, 2.267040252685547, -9.235677719116211, 1.5525169372558594, 4.495929718017578, 10.403060913085938, -9.80229377746582, -25.70849609375, 32.78277587890625, 17.02678680419922, -2.4580917358398438, -2.1193008422851562, 31.46227264404297, 50.24663162231445, 1.4809837341308594, 34.60467529296875, 12.773269653320312, 30.160377502441406, 26.793075561523438, 6.6122283935546875, 14.779136657714844, 41.782005310058594, 17.078001022338867, 3.4163970947265625, 13.840484619140625, 36.57765197753906], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000456.npy"}
{"epoch": 0.6893424036281179, "step": 457, "batch_size": 64, "mean": 11.140571594238281, "std": 15.522226333618164, "min": -32.09644317626953, "p10": -5.693784141540526, "median": 8.396897315979004, "p90": 33.29672393798829, "max": 46.18226623535156, "pos_frac": 0.78125, "sample": [9.60565185546875, -11.330299377441406, 1.9784259796142578, 17.031734466552734, 17.65410804748535, -10.805171966552734, 12.715930938720703, 7.0503997802734375, -0.9062576293945312, 3.5978164672851562, 3.8071441650390625, 34.04779052734375, 17.18332290649414, 31.421714782714844, -8.56304931640625, -4.0187835693359375, 29.6912841796875, 7.143455505371094, 1.1890487670898438, 20.606788635253906, 43.663604736328125, 19.03960609436035, 46.18226623535156, 0.0826873779296875, -2.779897689819336, 13.642248153686523, -6.273736953735352, 28.014354705810547, -32.09644317626953, 5.947299957275391, -4.3405609130859375, -16.090660095214844, 3.6402969360351562, 19.932266235351562, 8.18661880493164, -2.8874435424804688, -12.357025146484375, 19.18292236328125, 4.305397033691406, 9.954994201660156, 26.313461303710938, 1.311126708984375, 30.776613235473633, 34.8692512512207, -2.5479202270507812, 8.305274963378906, 21.491317749023438, -1.8078651428222656, 13.825355529785156, 35.6877555847168, 26.410926818847656, 8.488519668579102, 36.563812255859375, 15.642885208129883, 14.671792984008789, 39.48700714111328, 31.544235229492188, 4.6993865966796875, 10.210630416870117, 2.6655616760253906, 13.769378662109375, 4.536102294921875, 7.427450180053711, 4.604681015014648], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000457.npy"}
{"epoch": 0.690854119425548, "step": 458, "batch_size": 64, "mean": 12.686552047729492, "std": 17.763362884521484, "min": -36.2091064453125, "p10": -8.234415626525875, "median": 10.957904815673828, "p90": 37.83159942626954, "max": 49.607093811035156, "pos_frac": 0.78125, "sample": [7.854877471923828, 23.45562744140625, 3.9692134857177734, 28.964828491210938, 15.774435043334961, 26.83358383178711, 9.823188781738281, -36.2091064453125, -0.9505825042724609, 3.5275650024414062, -4.153709411621094, -12.291748046875, -26.25018310546875, 24.8330078125, 32.45854187011719, 2.775632858276367, 17.05449676513672, 25.114274978637695, 8.621776580810547, 12.558395385742188, 9.644647598266602, 18.866783142089844, 20.575702667236328, -3.5125732421875, 30.744293212890625, 1.0343551635742188, 22.658828735351562, 17.251371383666992, 6.1584930419921875, 27.41252899169922, 29.47577667236328, 5.667333602905273, 5.750179290771484, -11.549125671386719, 12.434402465820312, -1.3245086669921875, 7.418083190917969, 7.264488220214844, 18.77191162109375, 46.4784049987793, 18.327314376831055, 5.687908172607422, -23.542022705078125, -1.485626220703125, 38.690185546875, 4.054450988769531, 12.817695617675781, -3.3302230834960938, 49.607093811035156, 41.483543395996094, -9.98328971862793, 41.13983154296875, 26.478302001953125, 17.33738136291504, -11.272314071655273, 47.748779296875, 5.582542419433594, 35.82823181152344, 1.99560546875, -1.28045654296875, 8.76174545288086, 12.092620849609375, 42.79169464111328, 17.4228515625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000458.npy"}
{"epoch": 0.6923658352229781, "step": 459, "batch_size": 64, "mean": 14.696706771850586, "std": 17.60372543334961, "min": -17.558456420898438, "p10": -5.820374298095703, "median": 14.806396484375, "p90": 41.43838424682618, "max": 55.730552673339844, "pos_frac": 0.8125, "sample": [15.834831237792969, 28.952138900756836, -13.787094116210938, 13.556442260742188, -3.8846435546875, -11.369110107421875, 10.136758804321289, 34.49786376953125, 10.942550659179688, 18.93918800354004, 21.388641357421875, -5.194248199462891, 22.034351348876953, -5.477653503417969, 49.209102630615234, 55.33085632324219, 15.109146118164062, 12.171722412109375, 10.702217102050781, 18.031909942626953, 21.789146423339844, 19.977798461914062, 7.6782989501953125, 3.9280624389648438, 5.407489776611328, 16.98175811767578, 5.308441162109375, 20.220657348632812, 17.835002899169922, 32.21990966796875, 46.98588562011719, 2.9495391845703125, -5.967254638671875, 8.369230270385742, 3.0914077758789062, 16.06504249572754, 17.29058074951172, -13.686561584472656, 29.472763061523438, -7.575143814086914, 2.489412307739258, 24.494705200195312, 28.182353973388672, 4.598968505859375, 39.07060241699219, -4.5226593017578125, 22.013404846191406, 0.39871978759765625, -17.558456420898438, 42.453147888183594, 1.085968017578125, 5.768894195556641, -15.597387313842773, 24.185272216796875, 15.117176055908203, -5.074981689453125, 12.584281921386719, 14.503646850585938, 49.329689025878906, 31.069839477539062, 45.51393127441406, 2.3554229736328125, 55.730552673339844, 16.92969512939453], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000459.npy"}
{"epoch": 0.6938775510204082, "step": 460, "batch_size": 64, "mean": 16.898448944091797, "std": 18.722877502441406, "min": -23.386581420898438, "p10": -4.125006103515624, "median": 11.13183307647705, "p90": 45.404123306274414, "max": 56.13604736328125, "pos_frac": 0.828125, "sample": [35.2774658203125, 27.19061279296875, 56.13604736328125, 5.7187652587890625, 9.898414611816406, 52.841285705566406, 48.454681396484375, 8.786445617675781, 45.57707214355469, 0.119537353515625, -2.5232086181640625, 6.384143829345703, 12.611831665039062, 9.237586975097656, 4.393684387207031, 1.7942581176757812, 13.87452507019043, 27.833236694335938, -4.526679992675781, 49.061546325683594, 4.349443435668945, 49.260292053222656, 16.56932830810547, 31.427566528320312, 40.871856689453125, 23.61469268798828, 17.985851287841797, 7.1648712158203125, 31.101360321044922, 10.796411514282227, 16.677825927734375, 44.27168273925781, 28.488666534423828, 9.795570373535156, 1.8188095092773438, 11.467254638671875, 8.373678207397461, 1.56402587890625, 8.25152587890625, 5.23785400390625, 36.3357048034668, 26.306365966796875, -3.1877670288085938, 0.351470947265625, 18.309246063232422, 45.00057601928711, -7.572700500488281, -10.34539794921875, 17.55694580078125, -23.386581420898438, -2.8785972595214844, 10.621917724609375, 32.32268524169922, 3.8558483123779297, -8.426498413085938, 32.765472412109375, 4.826530456542969, 42.70050048828125, -5.426860809326172, -9.271575927734375, 26.698272705078125, 29.581329345703125, -0.6361770629882812, 48.17024230957031], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000460.npy"}
{"epoch": 0.6953892668178382, "step": 461, "batch_size": 64, "mean": 14.643134117126465, "std": 17.512413024902344, "min": -18.684432983398438, "p10": -6.502378654479981, "median": 13.231189727783203, "p90": 40.13083724975587, "max": 56.26780700683594, "pos_frac": 0.765625, "sample": [26.32135009765625, 12.422134399414062, 6.397087097167969, 11.948026657104492, 42.10901641845703, 41.43617248535156, 15.632904052734375, 23.292190551757812, 25.054542541503906, -6.410396575927734, -18.684432983398438, 12.960845947265625, 19.228897094726562, 5.979526519775391, 23.739646911621094, 5.824241638183594, -9.262954711914062, 41.034400939941406, 3.5898685455322266, 6.470754623413086, 14.567756652832031, 35.12813949584961, 24.975730895996094, 19.120330810546875, 7.513263702392578, 21.845365524291992, -13.175590515136719, -1.033029556274414, -4.905082702636719, 12.21949577331543, 19.155540466308594, -5.548454284667969, -13.380136489868164, 14.070573806762695, 16.792072296142578, 28.6077880859375, 48.611724853515625, 15.0498046875, 4.9336700439453125, 56.26780700683594, 21.380661010742188, 38.02252197265625, 28.981372833251953, 11.073196411132812, 3.9495792388916016, -2.7640228271484375, 7.254535675048828, -6.541799545288086, 26.436384201049805, 13.501533508300781, 24.380462646484375, 23.416553497314453, 4.433368682861328, 2.878406524658203, 31.98260498046875, -4.4539947509765625, 50.632545471191406, -9.586109161376953, -3.246530532836914, -9.603050231933594, 36.58564758300781, 6.6411590576171875, 53.253028869628906, -1.3480911254882812], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000461.npy"}
{"epoch": 0.6969009826152683, "step": 462, "batch_size": 64, "mean": 16.53244400024414, "std": 17.539810180664062, "min": -24.430130004882812, "p10": -3.586432456970214, "median": 17.992252349853516, "p90": 39.25400314331055, "max": 72.51116943359375, "pos_frac": 0.828125, "sample": [39.553680419921875, 45.58578872680664, 41.941776275634766, 22.810523986816406, -2.3684463500976562, 11.453983306884766, 25.556312561035156, -0.8287315368652344, 3.4370193481445312, 1.1535186767578125, -4.002593994140625, 46.85776901245117, 35.35126495361328, 13.997695922851562, 26.083160400390625, 40.68971252441406, 1.253509521484375, 32.11985778808594, 10.227745056152344, 25.46131134033203, 18.336139678955078, 11.009696960449219, 17.89947509765625, 18.08502960205078, 19.4443359375, 3.8698196411132812, 20.06692886352539, 33.334388732910156, 3.9651222229003906, 22.278038024902344, 6.355255126953125, -4.2613525390625, 19.030548095703125, 26.17891502380371, 38.55475616455078, -13.86767578125, -4.520353317260742, 26.94945526123047, 22.395105361938477, 8.276348114013672, 33.32604217529297, 29.348175048828125, 20.59777069091797, 6.8105926513671875, 14.853530883789062, 30.752086639404297, 5.154945373535156, 14.59286880493164, 26.969738006591797, 23.272647857666016, 9.101310729980469, -24.430130004882812, 26.92082977294922, 41.39033508300781, 29.44428253173828, -0.01422882080078125, 4.438749313354492, 72.51116943359375, 10.901657104492188, 1.5469303131103516, -22.70849609375, -5.1230010986328125, -2.615388870239258, 1.3190498352050781], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000462.npy"}
{"epoch": 0.6984126984126984, "step": 463, "batch_size": 64, "mean": 11.770732879638672, "std": 14.391351699829102, "min": -24.952316284179688, "p10": -4.081017875671386, "median": 9.097328186035156, "p90": 30.16078853607178, "max": 49.704193115234375, "pos_frac": 0.8125, "sample": [1.8134269714355469, 25.399246215820312, 28.46668243408203, -24.952316284179688, 30.646072387695312, 18.117176055908203, -3.1856021881103516, 42.23833084106445, 15.601371765136719, 12.090164184570312, -7.21649169921875, 0.1996002197265625, 6.311576843261719, 26.322898864746094, 29.013092041015625, 0.0319366455078125, 0.48586273193359375, 5.103118896484375, -0.4387798309326172, -2.9287261962890625, 4.15955924987793, 3.015838623046875, 4.1494140625, 40.02538299560547, 17.966020584106445, 15.543792724609375, 8.3427734375, 9.328651428222656, -8.596717834472656, -2.702411651611328, 8.656593322753906, 7.720512390136719, 29.3756103515625, 30.49729347229004, 5.189563751220703, 26.92292022705078, 22.16490936279297, 5.908073425292969, 4.478446960449219, 34.12355041503906, 14.997020721435547, 8.866004943847656, 5.7989501953125, 1.8545989990234375, -7.078998565673828, -4.4647674560546875, 49.704193115234375, 14.571990966796875, 31.1754150390625, 24.069961547851562, 14.607379913330078, -16.87151336669922, 4.425315856933594, -2.588644027709961, 11.976852416992188, 23.865264892578125, 21.490188598632812, 14.282974243164062, 20.653656005859375, 17.148948669433594, -4.845817565917969, 16.409461975097656, 0.9100360870361328, 22.980022430419922], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000463.npy"}
{"epoch": 0.6999244142101285, "step": 464, "batch_size": 64, "mean": 13.240883827209473, "std": 18.812244415283203, "min": -34.21308135986328, "p10": -7.881236267089844, "median": 10.60164213180542, "p90": 37.60732421875, "max": 51.131927490234375, "pos_frac": 0.765625, "sample": [-2.5330276489257812, 6.4748382568359375, 32.293331146240234, -14.896232604980469, 32.906166076660156, 1.4949111938476562, -4.0273895263671875, 33.88422393798828, 23.31576156616211, 15.172271728515625, 28.627777099609375, -7.397651672363281, 30.882301330566406, 17.09919548034668, 9.502342224121094, -5.0872802734375, 37.645835876464844, 8.836071014404297, 12.703865051269531, 19.612037658691406, 7.7152252197265625, 36.15559387207031, -7.929414749145508, 15.631309509277344, 3.8523941040039062, 27.998046875, 7.209831237792969, 44.5116081237793, 27.14642906188965, 22.89830780029297, 6.5898284912109375, 3.7815933227539062, 38.035701751708984, 37.90119934082031, -21.565811157226562, 7.78948974609375, 37.51746368408203, 18.382705688476562, -15.7254638671875, -7.971103668212891, 44.40215301513672, 11.652420043945312, 8.339717864990234, 34.10900115966797, -28.309234619140625, 2.9136276245117188, 3.6740798950195312, 9.550864219665527, -5.156158447265625, 41.93000793457031, 16.868877410888672, 35.84144592285156, -34.21308135986328, -3.7685546875, 6.155494689941406, -7.768819808959961, 8.41619873046875, 37.320831298828125, 15.199970245361328, 51.131927490234375, -6.1196136474609375, 18.20673942565918, 19.88287353515625, 0.7215423583984375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000464.npy"}
{"epoch": 0.7014361300075586, "step": 465, "batch_size": 64, "mean": 12.024148941040039, "std": 18.360727310180664, "min": -19.731067657470703, "p10": -10.317832946777344, "median": 8.513896942138672, "p90": 40.18538208007813, "max": 48.08074951171875, "pos_frac": 0.71875, "sample": [-16.461463928222656, 31.518661499023438, 19.59082794189453, -7.4391937255859375, 29.576431274414062, 19.34563446044922, -2.935546875, 45.43516540527344, -11.514022827148438, 19.699386596679688, 4.070671081542969, 34.20246124267578, 30.64322853088379, 43.84779357910156, -12.171098709106445, 39.5599365234375, -10.511421203613281, 41.05817413330078, 28.7711181640625, 6.933448791503906, 3.0340576171875, -14.814826965332031, 26.59684944152832, 45.52876281738281, 27.712570190429688, 40.45343017578125, -5.726024627685547, -9.866127014160156, -6.044471740722656, 4.82530403137207, 3.633819580078125, -9.583549499511719, 10.519500732421875, 18.11688995361328, 17.38111114501953, 3.1961669921875, -9.039060592651367, -3.130687713623047, -3.344165802001953, 18.94406509399414, -19.542850494384766, 6.309577941894531, 8.247169494628906, 8.780624389648438, -7.210868835449219, -19.731067657470703, 1.0611000061035156, 6.915922164916992, 13.460960388183594, 46.929203033447266, 1.1668472290039062, 30.92803955078125, 5.008995056152344, 31.654502868652344, 5.732666015625, -4.091669082641602, 11.029983520507812, 7.987390518188477, 17.751708984375, 14.789833068847656, 48.08074951171875, 12.257364273071289, 25.712814331054688, 24.702789306640625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000465.npy"}
{"epoch": 0.7029478458049887, "step": 466, "batch_size": 64, "mean": 7.929244518280029, "std": 15.411398887634277, "min": -29.728759765625, "p10": -9.6297945022583, "median": 8.171971321105957, "p90": 28.06944732666016, "max": 45.8765869140625, "pos_frac": 0.734375, "sample": [-4.788482666015625, 26.753860473632812, -23.778411865234375, -22.336288452148438, 13.80482292175293, 0.20794105529785156, 2.4234962463378906, 4.571052551269531, 8.526420593261719, -29.728759765625, 11.031303405761719, 10.386268615722656, 30.54778289794922, -11.93988037109375, 9.022041320800781, -3.546600341796875, 18.237030029296875, 45.8765869140625, 10.604171752929688, 38.120574951171875, 24.039684295654297, 13.065370559692383, 31.035194396972656, -0.27146148681640625, 8.930511474609375, 4.7984161376953125, 23.231292724609375, -10.166912078857422, 37.95068359375, 4.478431701660156, -0.9310264587402344, 4.3763885498046875, -5.6671905517578125, 14.06719970703125, 16.827098846435547, 10.187978744506836, 24.26280975341797, 8.587074279785156, -8.364978790283203, 11.44183349609375, 5.711906433105469, -7.18328857421875, -8.376520156860352, -16.65576171875, 18.081748962402344, 4.945793151855469, -3.3498382568359375, 13.958492279052734, 44.10527801513672, 3.4504356384277344, -20.898696899414062, 12.70428466796875, -3.1221160888671875, 17.848854064941406, 4.474617004394531, 2.267009735107422, 7.817522048950195, 4.68562126159668, 7.486310958862305, 17.001237869262695, 28.633270263671875, 10.287633895874023, 11.879005432128906, 5.845527648925781], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000466.npy"}
{"epoch": 0.7044595616024187, "step": 467, "batch_size": 64, "mean": 12.323994636535645, "std": 15.935688972473145, "min": -19.28532600402832, "p10": -3.7696094512939453, "median": 10.222883224487305, "p90": 33.77925262451172, "max": 71.4908447265625, "pos_frac": 0.78125, "sample": [2.0980987548828125, 46.6590461730957, 15.920738220214844, 19.001510620117188, -2.0385055541992188, 8.22726821899414, -3.0529327392578125, -5.618457794189453, 10.473506927490234, 15.655166625976562, 18.876922607421875, -6.413005828857422, 35.24790954589844, -3.11383056640625, 14.2615966796875, 34.481849670410156, 7.88128662109375, 4.752895355224609, 3.433319091796875, 3.5377769470214844, 15.451919555664062, -11.441238403320312, -0.2877082824707031, 39.745079040527344, 29.930999755859375, 5.0877227783203125, 18.174156188964844, 23.212875366210938, -19.28532600402832, 22.18359375, -3.714214324951172, 22.741661071777344, 5.478292465209961, 9.617156982421875, -3.5541458129882812, 6.810699462890625, 18.089324951171875, 34.1611328125, 1.5583343505859375, 20.1671142578125, 0.4734611511230469, 0.5283012390136719, -6.910808563232422, 11.502191543579102, 5.62211799621582, -3.7933502197265625, 9.972259521484375, -0.15256500244140625, 19.494388580322266, 32.88819885253906, 71.4908447265625, 9.710018157958984, 11.547134399414062, 2.056262969970703, 13.409042358398438, 10.8275146484375, -12.500686645507812, 44.75439453125, 10.587690353393555, 32.836700439453125, 3.996917724609375, 21.980850219726562, 28.84015655517578, 15.175018310546875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000467.npy"}
{"epoch": 0.7059712773998488, "step": 468, "batch_size": 64, "mean": 18.892831802368164, "std": 19.135848999023438, "min": -15.101921081542969, "p10": -6.392640686035156, "median": 15.22341537475586, "p90": 43.91009712219239, "max": 74.98284912109375, "pos_frac": 0.828125, "sample": [10.679449081420898, 3.4806995391845703, 12.333667755126953, 45.43115234375, 12.478153228759766, -3.1056365966796875, 3.8306846618652344, 37.46399688720703, 19.929595947265625, 12.17095947265625, 28.099544525146484, 34.61522674560547, 30.546493530273438, -11.881410598754883, 10.631004333496094, 33.94978332519531, 16.006668090820312, -10.847854614257812, -6.4901123046875, 19.70203399658203, 51.31636047363281, 12.827285766601562, 35.78141784667969, 46.7666015625, 12.228092193603516, 18.748823165893555, 47.371337890625, 8.215177536010742, -2.7616195678710938, -15.101921081542969, 44.73420333862305, -7.401763916015625, 2.366363525390625, -7.393476486206055, 5.2930755615234375, -8.722034454345703, 19.526756286621094, 21.913421630859375, -6.1652069091796875, 32.55904769897461, 14.440162658691406, 12.853530883789062, 30.8416748046875, 40.6856689453125, 36.43071365356445, 6.156843185424805, 12.366033554077148, 26.289566040039062, 12.535259246826172, 16.689001083374023, 37.68603515625, 28.222877502441406, 22.901935577392578, 12.528411865234375, 65.26570129394531, 6.92352294921875, 2.3993759155273438, 74.98284912109375, 41.9871826171875, 25.76156234741211, 37.88893508911133, 28.403587341308594, 6.790718078613281, -3.0159835815429688], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000468.npy"}
{"epoch": 0.7074829931972789, "step": 469, "batch_size": 64, "mean": 14.13906478881836, "std": 18.209402084350586, "min": -19.09023094177246, "p10": -4.033586502075194, "median": 10.064332008361816, "p90": 40.32417144775391, "max": 63.8272819519043, "pos_frac": 0.765625, "sample": [6.7783355712890625, 27.852508544921875, 28.65766143798828, 12.780021667480469, 5.972267150878906, 4.350318908691406, 3.1227264404296875, 15.931135177612305, 20.635379791259766, -12.922454833984375, 32.209808349609375, 17.97054672241211, 11.430648803710938, -8.631309509277344, 1.362457275390625, 21.681699752807617, 22.440818786621094, 8.698015213012695, 21.03081703186035, 41.44249725341797, 39.43152618408203, 4.210136413574219, 17.30748748779297, 3.6496734619140625, 27.363067626953125, -14.371429443359375, 6.243946075439453, 3.957752227783203, -2.635843276977539, 7.775215148925781, -4.824577331542969, -0.8676357269287109, 26.015838623046875, 63.8272819519043, 35.055580139160156, -4.4796142578125, -0.3613872528076172, -0.8349399566650391, 40.70673370361328, 49.162841796875, 33.45648193359375, 13.89801025390625, 6.816434860229492, 18.53974151611328, 3.0019149780273438, -0.265625, 4.088134765625, -2.9928550720214844, 62.826141357421875, 13.247657775878906, -4.504150390625, 1.9610443115234375, 57.720584869384766, 2.2556800842285156, 47.2530517578125, -1.1928482055664062, 16.503124237060547, 16.15594482421875, 16.727741241455078, -19.09023094177246, 16.89678955078125, 1.3529720306396484, -2.037933349609375, 23.156829833984375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000469.npy"}
{"epoch": 0.708994708994709, "step": 470, "batch_size": 64, "mean": 13.090213775634766, "std": 17.207826614379883, "min": -44.357784271240234, "p10": -7.7576196670532225, "median": 13.388751029968262, "p90": 34.49457550048828, "max": 52.1209602355957, "pos_frac": 0.765625, "sample": [24.40545654296875, 23.19184112548828, 52.1209602355957, -44.357784271240234, 11.318763732910156, 17.783843994140625, 24.546531677246094, 6.985054016113281, 34.83631134033203, 28.3505859375, 21.842945098876953, -13.229988098144531, 34.51776123046875, 13.215290069580078, 29.482681274414062, 2.0419692993164062, 29.783950805664062, -5.046777725219727, 6.579658508300781, 26.3566837310791, -10.887836456298828, 34.44047546386719, 8.541208267211914, -3.8658447265625, 36.041419982910156, 14.905084609985352, 47.568138122558594, 22.626708984375, 3.853130340576172, -9.229246139526367, 11.324848175048828, -2.2673892974853516, 27.97351837158203, 3.7722320556640625, -7.803777694702148, -3.1010665893554688, 10.543285369873047, 1.5168704986572266, 9.928138732910156, -10.32186508178711, 30.271224975585938, 23.121212005615234, 30.642127990722656, 3.2497482299804688, -8.717750549316406, -5.1275482177734375, 13.562211990356445, 27.60101318359375, 14.564697265625, 39.41657257080078, 15.257038116455078, 30.94666290283203, -5.426273345947266, 13.568191528320312, 5.476570129394531, 15.724647521972656, 9.128639221191406, 23.68218994140625, -7.6499176025390625, 1.9766483306884766, 38.331817626953125, -1.3665828704833984, 15.884693145751953, 3.3720855712890625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000470.npy"}
{"epoch": 0.7105064247921391, "step": 471, "batch_size": 64, "mean": 16.298080444335938, "std": 19.196636199951172, "min": -45.17835998535156, "p10": -4.419670104980467, "median": 15.872608184814453, "p90": 42.003295898437514, "max": 52.84307861328125, "pos_frac": 0.828125, "sample": [1.7340068817138672, 37.64134979248047, 32.87406921386719, 26.141389846801758, 17.37256622314453, -2.866180419921875, 31.624290466308594, 24.551918029785156, 37.56536865234375, 34.35978698730469, 25.63939666748047, 9.01129150390625, 46.982765197753906, 2.8542404174804688, -20.627662658691406, -2.0093212127685547, 8.745943069458008, 48.916297912597656, 37.780555725097656, -14.213729858398438, 4.895698547363281, 22.48672103881836, 27.757675170898438, 15.90765380859375, 25.77788543701172, -45.17835998535156, 25.435436248779297, 25.44659423828125, 44.85948944091797, 4.618923187255859, 27.12442398071289, 15.91219711303711, 7.044349670410156, 2.7872085571289062, 12.100540161132812, 13.644943237304688, 44.047523498535156, 5.9708099365234375, 4.279365539550781, 37.8197021484375, 8.991291046142578, -4.964935302734375, 43.7962646484375, 32.39263916015625, 0.8747005462646484, 4.6708526611328125, 17.449996948242188, 29.25204849243164, -11.606475830078125, 22.869129180908203, -12.341392517089844, 35.055450439453125, 4.423191070556641, 2.9200439453125, 15.837562561035156, -3.1473846435546875, 12.65869140625, 30.33509063720703, -5.4761505126953125, 52.84307861328125, 8.89996337890625, 2.344940185546875, -1.3246345520019531, 49.50602722167969], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000471.npy"}
{"epoch": 0.7120181405895691, "step": 472, "batch_size": 64, "mean": 14.231185913085938, "std": 20.0870361328125, "min": -30.501022338867188, "p10": -10.113227462768554, "median": 10.662357330322266, "p90": 45.64756851196289, "max": 60.01055145263672, "pos_frac": 0.796875, "sample": [10.656806945800781, 11.395515441894531, 9.942604064941406, 5.3718414306640625, 32.48779296875, 37.443660736083984, -6.4571990966796875, 34.31846618652344, 33.218170166015625, 47.81034851074219, 4.300731658935547, 53.937652587890625, 10.764055252075195, -10.597824096679688, 2.385354995727539, 45.5208740234375, 7.468873977661133, 15.918586730957031, 60.01055145263672, 8.8797607421875, -11.804527282714844, 15.11993408203125, -5.260822296142578, 36.68180847167969, -4.3151092529296875, 15.277435302734375, 7.2657318115234375, 45.701866149902344, 45.82207489013672, -8.030170440673828, 6.22607421875, 13.870071411132812, 7.393587112426758, -16.955116271972656, -8.982501983642578, 14.69390869140625, 28.452362060546875, 46.534912109375, 3.375102996826172, 5.5668792724609375, 30.37761878967285, 46.75300979614258, -30.501022338867188, 40.534088134765625, -18.614715576171875, -14.094413757324219, 33.200401306152344, 1.7201213836669922, -19.9608154296875, 13.634078979492188, 10.66790771484375, 9.790838241577148, 12.287757873535156, -1.3523788452148438, 7.3026580810546875, 1.1501235961914062, 20.186172485351562, 9.382492065429688, 39.77348327636719, 2.3488311767578125, 20.03136444091797, 11.998100280761719, 1.5603752136230469, 31.209732055664062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000472.npy"}
{"epoch": 0.7135298563869993, "step": 473, "batch_size": 64, "mean": 14.437677383422852, "std": 21.986974716186523, "min": -27.620372772216797, "p10": -12.089888763427734, "median": 9.56967544555664, "p90": 46.05936889648438, "max": 62.36126708984375, "pos_frac": 0.765625, "sample": [46.59918212890625, 44.3223876953125, 3.4508190155029297, -14.848518371582031, -22.931650161743164, 38.03157043457031, 8.156234741210938, 17.921165466308594, 5.860849380493164, 47.5770263671875, 44.7998046875, 1.7977886199951172, -2.0306472778320312, 10.895706176757812, 8.879425048828125, -12.665428161621094, 56.45347595214844, 48.385963439941406, 12.760177612304688, 35.972145080566406, -27.620372772216797, 5.1980438232421875, 32.241729736328125, -14.225082397460938, -6.67266845703125, 26.444290161132812, 14.13119888305664, 28.113189697265625, 4.011081695556641, 18.091896057128906, 4.717132568359375, 0.6404800415039062, -20.76879119873047, -10.304756164550781, -10.392396926879883, 29.90020751953125, 62.36126708984375, 44.416969299316406, 6.4625701904296875, 2.0064544677734375, 40.995697021484375, 15.846378326416016, 56.15240478515625, 48.314056396484375, 7.610996246337891, -2.2036590576171875, 7.379276275634766, 5.2947540283203125, 10.2950439453125, 33.05878448486328, 17.56454086303711, 3.1322669982910156, 25.975130081176758, 10.259925842285156, -4.890800476074219, 14.884256362915039, 41.377052307128906, -10.746963500976562, 2.2683868408203125, -4.501678466796875, 3.4742202758789062, 36.357765197753906, 21.58647918701172, -23.612781524658203], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000473.npy"}
{"epoch": 0.7150415721844293, "step": 474, "batch_size": 64, "mean": 10.789448738098145, "std": 19.142515182495117, "min": -30.573348999023438, "p10": -10.01886444091797, "median": 10.051937103271484, "p90": 37.46819725036622, "max": 47.966346740722656, "pos_frac": 0.640625, "sample": [41.37169647216797, 28.663768768310547, 13.540782928466797, -16.22897720336914, -9.534717559814453, -2.347209930419922, 14.856536865234375, 47.966346740722656, 27.75433349609375, -2.7884902954101562, 3.7315292358398438, 23.847091674804688, -23.235980987548828, 19.116897583007812, 11.339500427246094, 38.00809860229492, -4.285865783691406, -0.3356590270996094, -4.9149322509765625, 11.044425964355469, -11.240449905395508, -3.479930877685547, 33.1395378112793, 17.20108413696289, 22.63983917236328, 3.1259822845458984, -0.7961959838867188, -30.573348999023438, -1.2732276916503906, -2.8227005004882812, 43.90296173095703, 11.620494842529297, -23.067657470703125, 6.06927490234375, -9.911502838134766, 33.51959228515625, 3.6424942016601562, 32.73473358154297, 25.560081481933594, -3.9903488159179688, 5.639820098876953, 9.447509765625, 36.20842742919922, 6.2153778076171875, -6.274669647216797, 31.225509643554688, 34.786895751953125, 11.664901733398438, 32.18701171875, -1.9387226104736328, 42.29594421386719, 40.913108825683594, -2.4654388427734375, 10.817535400390625, 10.656364440917969, -10.064876556396484, 43.23986053466797, -8.764968872070312, 8.98208236694336, 12.915176391601562, 30.924053192138672, -27.09417724609375, 4.726261138916016, 10.71182632446289], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000474.npy"}
{"epoch": 0.7165532879818595, "step": 475, "batch_size": 64, "mean": 19.662551879882812, "std": 18.842714309692383, "min": -13.125274658203125, "p10": -2.8851123809814445, "median": 16.674396514892578, "p90": 47.146312713623054, "max": 61.577911376953125, "pos_frac": 0.84375, "sample": [1.6366901397705078, 6.604907989501953, 18.196510314941406, 9.228530883789062, 30.732389450073242, 10.189476013183594, 39.79511260986328, 47.64225769042969, 8.750192642211914, -2.1801376342773438, 16.726844787597656, -6.478240966796875, 36.33343505859375, 46.060546875, 16.821535110473633, 28.009742736816406, 23.77528190612793, 17.508014678955078, 9.939172744750977, 28.57709503173828, 12.1292724609375, 6.164794921875, 0.25900840759277344, 10.305809020996094, 16.541885375976562, -0.7275657653808594, 8.980430603027344, 61.577911376953125, 16.6219482421875, -6.8338470458984375, 60.882850646972656, 47.61164093017578, 37.91181945800781, 59.22814178466797, 10.00105094909668, -1.7932548522949219, 18.78386878967285, -3.649456024169922, 24.50566864013672, 6.497566223144531, 40.696441650390625, 24.67755126953125, 5.014957427978516, 14.223190307617188, 22.240137100219727, -10.15485954284668, -3.187244415283203, 14.325212478637695, 50.487823486328125, 5.1498260498046875, 44.182708740234375, 58.13812255859375, 32.960205078125, 26.36455535888672, 35.245643615722656, 0.7367019653320312, -8.609992980957031, 30.295806884765625, 38.934730529785156, 27.467952728271484, 11.87551498413086, 12.068504333496094, -13.125274658203125, 25.526145935058594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000475.npy"}
{"epoch": 0.7180650037792895, "step": 476, "batch_size": 64, "mean": 11.747025489807129, "std": 19.33905792236328, "min": -27.261383056640625, "p10": -9.566496086120603, "median": 9.230640411376953, "p90": 36.83492088317871, "max": 60.628761291503906, "pos_frac": 0.703125, "sample": [44.999595642089844, 9.523929595947266, -4.617887496948242, 9.314865112304688, -12.848281860351562, 20.472732543945312, -5.8746185302734375, 45.06031036376953, 50.791534423828125, 3.98626708984375, 36.96551513671875, -0.3791637420654297, 56.19537353515625, 22.552255630493164, 22.34218978881836, 1.5726547241210938, 11.9134521484375, 8.054359436035156, 19.11046600341797, 7.680206298828125, -10.335786819458008, 2.8419666290283203, 31.133901596069336, 28.775630950927734, 19.835662841796875, -21.822050094604492, -3.0106887817382812, 10.237220764160156, -0.8287506103515625, 4.75794792175293, 51.61703872680664, -17.854598999023438, 18.872467041015625, 1.0770225524902344, 12.49917984008789, -13.082008361816406, -7.771484375, -16.437301635742188, 30.7786865234375, 2.371723175048828, -0.432098388671875, 12.245010375976562, 6.4954071044921875, 13.061798095703125, 17.056957244873047, 9.146415710449219, 12.655357360839844, -1.8935546875, -2.547454833984375, 33.08881378173828, 17.287628173828125, -27.261383056640625, 26.45452880859375, -7.0716552734375, 27.86324691772461, 60.628761291503906, 0.26947784423828125, 4.4799346923828125, 36.53020095825195, 2.445037841796875, -7.526548385620117, 31.793119430541992, 23.295503616333008, -6.726388931274414], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000476.npy"}
{"epoch": 0.7195767195767195, "step": 477, "batch_size": 64, "mean": 13.353883743286133, "std": 20.725757598876953, "min": -32.64873123168945, "p10": -13.423421478271482, "median": 14.84890365600586, "p90": 39.81270523071289, "max": 53.36164093017578, "pos_frac": 0.71875, "sample": [-14.655998229980469, -10.547409057617188, 34.15251159667969, 6.256690979003906, 19.89520263671875, 23.915782928466797, 1.0876541137695312, 3.6374568939208984, 5.4930572509765625, -2.1506919860839844, 31.606422424316406, 18.05318832397461, -25.929821014404297, 1.1774215698242188, 13.040399551391602, 4.028608322143555, -4.991111755371094, 51.559539794921875, 15.744796752929688, -9.472620010375977, 24.04364013671875, 44.348777770996094, 17.27288055419922, 18.463157653808594, 32.09880828857422, -6.417457580566406, -15.488349914550781, 14.205558776855469, 33.375022888183594, 4.527935028076172, -18.121078491210938, 6.898246765136719, 8.870355606079102, -2.122417449951172, -32.64873123168945, -8.094196319580078, -10.002021789550781, 15.49224853515625, 16.836807250976562, 12.859634399414062, 47.27199172973633, -3.567047119140625, 40.18202209472656, 53.36164093017578, 18.319547653198242, 38.950965881347656, -22.999462127685547, 31.216552734375, 35.34387969970703, -2.616729736328125, 43.719268798828125, 10.702692031860352, 16.64256477355957, 35.55710220336914, 38.5595703125, 24.508331298828125, -28.123504638671875, 29.943695068359375, 21.331626892089844, 41.35063171386719, 3.192880630493164, 37.617431640625, -0.2827606201171875, 26.165775299072266], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000477.npy"}
{"epoch": 0.7210884353741497, "step": 478, "batch_size": 64, "mean": 11.457244873046875, "std": 21.059701919555664, "min": -31.84814453125, "p10": -12.189144897460936, "median": 10.109420776367188, "p90": 44.67073211669923, "max": 58.416053771972656, "pos_frac": 0.65625, "sample": [-3.259920120239258, 58.416053771972656, 6.7234649658203125, 28.104337692260742, 13.34585952758789, -0.6974029541015625, 45.35986328125, 38.696434020996094, -3.5524234771728516, 16.879058837890625, 1.0247611999511719, 2.8942108154296875, 18.462860107421875, 50.30864715576172, -9.280998229980469, 6.9611663818359375, -12.492889404296875, 5.5717620849609375, -7.146373748779297, 12.85430908203125, -6.356374740600586, -6.487844467163086, 25.227508544921875, 24.86798095703125, 36.515953063964844, -25.621658325195312, 23.896209716796875, 19.810758590698242, 32.043914794921875, -24.870254516601562, 14.38653564453125, 1.6391944885253906, -31.84814453125, -25.07238006591797, 10.33975601196289, 43.06275939941406, 37.28929901123047, -4.542236328125, 24.430137634277344, 11.814132690429688, 46.77849578857422, -0.74798583984375, -16.04473876953125, 10.61821174621582, 46.45850372314453, 9.043594360351562, -0.23242568969726562, 33.529876708984375, 10.174148559570312, -4.394618988037109, 11.681766510009766, 33.033546447753906, -11.48040771484375, 19.025951385498047, -3.4499664306640625, 49.41539764404297, 48.30768966674805, -9.230674743652344, 2.1463241577148438, 10.044692993164062, -2.2197113037109375, -14.883869171142578, 14.31048583984375, 1.6812801361083984], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000478.npy"}
{"epoch": 0.7226001511715797, "step": 479, "batch_size": 64, "mean": 10.658845901489258, "std": 18.97064208984375, "min": -59.89551544189453, "p10": -7.612274551391599, "median": 8.271437644958496, "p90": 36.54982414245607, "max": 51.90971755981445, "pos_frac": 0.703125, "sample": [9.572338104248047, 34.180362701416016, 1.127349853515625, -5.553394317626953, -3.3300533294677734, 8.347122192382812, 51.90971755981445, 20.473529815673828, 25.15267562866211, -2.0919723510742188, 9.194976806640625, -10.319942474365234, -3.6296157836914062, -2.40826416015625, 1.4327545166015625, 15.867301940917969, 38.46391296386719, 25.401153564453125, -1.2739524841308594, 0.8331508636474609, -12.439048767089844, 4.231220245361328, 49.9510498046875, 34.15285110473633, 7.257051467895508, -14.009136199951172, 8.844501495361328, 4.571533203125, -12.201171875, -8.494651794433594, 7.92364501953125, 18.115087509155273, 1.6302680969238281, 40.95857238769531, 23.585235595703125, 6.144924163818359, -22.138465881347656, -3.8871536254882812, 11.228218078613281, 31.72534942626953, 37.62489318847656, -59.89551544189453, -0.5680389404296875, 8.438318252563477, 37.5653076171875, 51.5619010925293, 23.274139404296875, -3.3409423828125, 4.449283599853516, 6.8624420166015625, 20.282272338867188, 33.45487976074219, -4.443368911743164, 16.023101806640625, 4.304319381713867, 22.24551010131836, 19.77256965637207, 8.19575309753418, 14.394180297851562, 20.031333923339844, -1.4742069244384766, -4.605621337890625, 17.210716247558594, 20.3038272857666], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000479.npy"}
{"epoch": 0.7241118669690099, "step": 480, "batch_size": 64, "mean": 14.492194175720215, "std": 20.691307067871094, "min": -47.69881820678711, "p10": -6.258784484863281, "median": 15.898250579833984, "p90": 41.27041397094727, "max": 56.595977783203125, "pos_frac": 0.734375, "sample": [16.028053283691406, -3.349578857421875, 24.96331024169922, -6.3250274658203125, 24.291135787963867, 5.488630294799805, 18.562437057495117, -47.69881820678711, 4.194145202636719, 56.595977783203125, -1.6741046905517578, 10.780517578125, 28.60040283203125, -1.7956314086914062, 10.790878295898438, -6.104217529296875, -2.7459259033203125, 19.575538635253906, -3.64752197265625, 3.9916000366210938, -14.380374908447266, 30.997512817382812, 26.358741760253906, -16.83934783935547, 41.338096618652344, 40.7840576171875, 48.504913330078125, 16.613906860351562, 37.9564208984375, 14.203987121582031, 22.598026275634766, -1.2377147674560547, 5.026268005371094, 15.33755874633789, 22.263330459594727, 22.489700317382812, 30.166213989257812, 19.725311279296875, -46.633583068847656, 19.28148651123047, 43.3853759765625, 34.58631134033203, 20.007863998413086, 43.55615997314453, 15.768447875976562, 36.15483474731445, 3.6868667602539062, -12.269851684570312, 11.380474090576172, 14.017208099365234, 11.752706527709961, -5.68011474609375, 47.105003356933594, 20.182159423828125, 20.553367614746094, 29.425907135009766, 46.93817138671875, 30.324495315551758, 5.747547149658203, -4.986572265625, 4.707950592041016, 41.11248779296875, -4.8639373779296875, -10.168777465820312], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000480.npy"}
{"epoch": 0.7256235827664399, "step": 481, "batch_size": 64, "mean": 15.902488708496094, "std": 20.97294044494629, "min": -29.44443130493164, "p10": -7.960577011108398, "median": 13.02783203125, "p90": 45.98787002563478, "max": 62.89056396484375, "pos_frac": 0.765625, "sample": [27.990999221801758, 6.208503723144531, 55.563072204589844, 4.751579284667969, -7.4199066162109375, 15.267129898071289, 30.632755279541016, 14.411174774169922, -11.746749877929688, -11.835750579833984, 28.870346069335938, 12.387786865234375, 32.73139953613281, 33.90416717529297, -8.018913269042969, 47.29408264160156, 22.93422508239746, 16.01020050048828, 18.264053344726562, 40.905601501464844, -29.44443130493164, -0.48357391357421875, 42.940040588378906, 14.392707824707031, 0.8716545104980469, -4.805795669555664, 0.407135009765625, 29.239084243774414, 9.22604751586914, 27.60064697265625, -0.8725357055664062, 32.70433807373047, 13.667877197265625, 0.50592041015625, 11.565864562988281, 17.956298828125, 24.24335479736328, 9.068317413330078, 6.287044525146484, 7.719829559326172, -8.789764404296875, 9.384590148925781, 40.956398010253906, 61.8720588684082, -1.1211872100830078, 47.789329528808594, -3.5214157104492188, 1.6180496215820312, 60.10528564453125, 30.948244094848633, 27.91413116455078, 10.561065673828125, 5.235820770263672, 8.221755981445312, -7.007989883422852, 55.10247802734375, -26.1071834564209, 25.232223510742188, 20.485944747924805, 62.89056396484375, 26.54908561706543, 7.29730224609375, -11.9285888671875, -7.824459075927734], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000481.npy"}
{"epoch": 0.72713529856387, "step": 482, "batch_size": 64, "mean": 12.837133407592773, "std": 23.382366180419922, "min": -40.4056396484375, "p10": -12.327579498291016, "median": 10.347612380981445, "p90": 45.399198913574224, "max": 87.19644165039062, "pos_frac": 0.734375, "sample": [-8.8837890625, -40.4056396484375, -2.5903701782226562, 15.341712951660156, 4.6656646728515625, 14.160804748535156, 7.710113525390625, -12.356773376464844, 2.4246559143066406, 43.40345764160156, 14.12021255493164, 40.5748291015625, 29.62596893310547, 9.007469177246094, 21.255878448486328, 17.224197387695312, 46.2545166015625, 22.44140625, -31.932849884033203, 15.301780700683594, 47.15367889404297, -1.79254150390625, 66.51840209960938, -23.789276123046875, -0.6085128784179688, 18.93876075744629, 49.16559600830078, 21.8599853515625, -4.019065856933594, 1.5229110717773438, -0.10112762451171875, 24.194311141967773, -16.48128318786621, 11.218997955322266, 54.4552001953125, 30.78278350830078, 22.73621368408203, 1.4283599853515625, 7.755199432373047, -8.718826293945312, 87.19644165039062, 12.255340576171875, 12.424201965332031, 3.47991943359375, 9.476226806640625, 32.9183235168457, 11.261314392089844, 6.857580184936523, 54.86344909667969, -28.257606506347656, 22.976348876953125, 8.486328125, 21.71982192993164, 8.550437927246094, 22.446029663085938, 0.8941650390625, -19.647872924804688, 42.688446044921875, -12.115104675292969, -12.25946044921875, 23.0419921875, -0.56390380859375, 2.622711181640625, 0.6984157562255859], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000482.npy"}
{"epoch": 0.7286470143613001, "step": 483, "batch_size": 64, "mean": 15.584157943725586, "std": 20.28885269165039, "min": -47.06306457519531, "p10": -4.66779556274414, "median": 13.96210765838623, "p90": 42.96399459838867, "max": 66.19740295410156, "pos_frac": 0.765625, "sample": [-5.342506408691406, 40.995758056640625, -4.892669677734375, 9.888093948364258, 27.491676330566406, 54.59906768798828, 26.862329483032227, 11.8497314453125, -3.6781539916992188, -4.956356048583984, 8.40890121459961, 24.442279815673828, 44.107086181640625, 47.22142028808594, -3.4944190979003906, 36.03891372680664, 13.87091064453125, -17.950469970703125, 2.9113082885742188, 32.05841064453125, 19.371505737304688, 44.32160186767578, 42.693115234375, 2.454193115234375, 31.719619750976562, 15.00823974609375, 29.166046142578125, 0.8509445190429688, -0.516876220703125, 7.250732421875, 11.062950134277344, -21.997299194335938, 49.346702575683594, 2.6066646575927734, 2.422149658203125, 31.45526123046875, 17.713058471679688, -47.06306457519531, 14.053304672241211, -9.715675354003906, 9.590133666992188, 18.31446075439453, 23.345199584960938, 3.0897789001464844, -4.143089294433594, 16.578174591064453, 66.19740295410156, 38.075347900390625, 22.32384490966797, 37.02736282348633, 11.370437622070312, 21.245162963867188, 9.387046813964844, 43.08008575439453, -1.3593616485595703, 2.4761714935302734, 0.29138946533203125, 15.503541946411133, 36.05601501464844, 21.570140838623047, -1.7254714965820312, -3.406982421875, -1.993783950805664, 31.858604431152344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000483.npy"}
{"epoch": 0.7301587301587301, "step": 484, "batch_size": 64, "mean": 12.458930969238281, "std": 15.536128044128418, "min": -15.23846435546875, "p10": -7.665396118164062, "median": 13.718072891235352, "p90": 32.556966400146486, "max": 52.63865661621094, "pos_frac": 0.71875, "sample": [13.381763458251953, 20.94025421142578, -15.23846435546875, 9.538997650146484, 52.63865661621094, 3.655935287475586, 8.102603912353516, -14.066970825195312, 33.76422119140625, -9.927505493164062, 11.600547790527344, -13.408462524414062, 6.523471832275391, 5.682384490966797, 19.488386154174805, 24.268428802490234, -1.1707763671875, -2.7102012634277344, 13.854263305664062, 20.33031463623047, -3.482511520385742, 23.459571838378906, 28.437149047851562, 14.685985565185547, 21.754976272583008, 23.311311721801758, 37.38798522949219, 2.3258819580078125, 25.153560638427734, -0.5076503753662109, 5.969738006591797, 21.355125427246094, -3.700714111328125, -12.201431274414062, 36.05030822753906, -1.005462646484375, 19.53497886657715, 18.603071212768555, 15.999870300292969, 2.553913116455078, 17.098482131958008, -1.9272232055664062, 49.99834060668945, -0.8749237060546875, 32.34718322753906, 10.77717399597168, 32.646873474121094, -2.806720733642578, 18.0047607421875, 15.499702453613281, 47.231346130371094, 14.528173446655273, 15.058456420898438, -1.1244964599609375, -12.761909484863281, -7.832855224609375, 20.661365509033203, 9.060333251953125, 12.937213897705078, 19.097900390625, 18.274131774902344, 22.237548828125, 13.58188247680664, -7.274658203125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000484.npy"}
{"epoch": 0.7316704459561603, "step": 485, "batch_size": 64, "mean": 14.170873641967773, "std": 19.62021255493164, "min": -31.931732177734375, "p10": -5.337160301208495, "median": 9.663253784179688, "p90": 46.345185470581065, "max": 56.99281311035156, "pos_frac": 0.78125, "sample": [9.73419189453125, 14.397882461547852, 22.011409759521484, 17.091781616210938, 13.900382995605469, 13.753028869628906, 8.574016571044922, -1.6158885955810547, 31.890830993652344, -8.05003547668457, -22.80817413330078, 3.8813323974609375, 56.99281311035156, 31.484344482421875, -1.2653732299804688, 44.32380676269531, 11.489704132080078, 47.211490631103516, 51.3817138671875, 8.11250114440918, 1.7957515716552734, 9.4600830078125, 0.3916778564453125, 48.2039794921875, -1.24273681640625, 0.3545494079589844, -31.931732177734375, 13.582382202148438, 10.901847839355469, 17.109676361083984, -8.308860778808594, -1.4817161560058594, 40.25208282470703, 0.7511062622070312, 15.059959411621094, -8.752120971679688, 42.724273681640625, 42.982200622558594, 3.5615081787109375, 19.300960540771484, 15.358951568603516, -8.630373001098633, 14.210929870605469, 3.365753173828125, 8.733207702636719, -4.76197624206543, -2.8793563842773438, 32.031227111816406, 40.19621276855469, 4.7897796630859375, 2.8257102966308594, 13.519948959350586, 20.17559051513672, 52.29057312011719, 9.592315673828125, 56.215972900390625, 15.565605163574219, 51.144073486328125, -4.7224578857421875, -5.583667755126953, 6.7969207763671875, 5.002010345458984, 9.153488159179688, 5.338775634765625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000485.npy"}
{"epoch": 0.7331821617535903, "step": 486, "batch_size": 64, "mean": 12.634367942810059, "std": 21.689088821411133, "min": -33.969482421875, "p10": -11.401361083984373, "median": 9.143661499023438, "p90": 44.98264694213867, "max": 75.49658203125, "pos_frac": 0.734375, "sample": [-3.1455459594726562, 10.04547119140625, 32.164154052734375, 15.199357986450195, 6.84417724609375, -3.0764923095703125, 10.431442260742188, 75.49658203125, 19.968704223632812, -3.47894287109375, -4.830671310424805, 20.608802795410156, 15.778266906738281, 11.457719802856445, 30.397483825683594, 1.2421875, -12.084403991699219, 50.57275390625, -1.4933948516845703, 23.035659790039062, 12.967878341674805, 7.723457336425781, -16.177886962890625, 6.6464385986328125, 51.828453063964844, -15.378860473632812, 29.304046630859375, -17.677682876586914, 27.634157180786133, -21.896316528320312, 35.428985595703125, 13.341781616210938, 4.342498779296875, -27.343156814575195, 11.148134231567383, 9.983566284179688, 14.211677551269531, 27.591381072998047, -6.520956039428711, 4.852352142333984, 2.057107925415039, 45.23702621459961, 14.422798156738281, 59.01983642578125, 44.389095306396484, -4.615501403808594, 7.41522216796875, 18.225793838500977, 18.081321716308594, 54.67424011230469, 5.6675872802734375, 1.793182373046875, 8.303756713867188, 35.34495544433594, -4.210205078125, 5.0021820068359375, 1.4990768432617188, 7.77589225769043, 61.14723205566406, -2.6877593994140625, 20.696304321289062, -33.969482421875, 5.994228363037109, -9.807594299316406], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000486.npy"}
{"epoch": 0.7346938775510204, "step": 487, "batch_size": 64, "mean": 12.919859886169434, "std": 18.973241806030273, "min": -33.65876770019531, "p10": -7.746450424194336, "median": 9.121465682983398, "p90": 37.62743568420411, "max": 72.97607421875, "pos_frac": 0.765625, "sample": [19.882904052734375, -10.71457290649414, 11.288894653320312, 14.084465026855469, 18.496315002441406, 7.208097457885742, -2.001678466796875, 20.684585571289062, 38.68140411376953, 29.213119506835938, 20.689788818359375, -1.4716262817382812, 3.9294586181640625, -5.532035827636719, -10.973121643066406, 52.87440872192383, 35.290313720703125, 30.46145248413086, 45.62908935546875, 17.159469604492188, -7.641868591308594, 2.35076904296875, 3.9997196197509766, 38.20393753051758, 11.408306121826172, -7.791271209716797, 0.42205047607421875, -23.473785400390625, 2.5245742797851562, 5.01947021484375, 52.587913513183594, 10.400733947753906, 19.886472702026367, -8.588481903076172, 36.282264709472656, -0.2321929931640625, 10.024908065795898, 9.417282104492188, 4.698022842407227, 16.796091079711914, 30.953094482421875, 6.266395568847656, -0.3785552978515625, 22.747249603271484, -15.383129119873047, 7.438907623291016, 7.649751663208008, 1.9112701416015625, 2.432832717895508, -33.65876770019531, -3.1488494873046875, 43.12001037597656, 8.82564926147461, 7.1418914794921875, 7.181671142578125, 29.398441314697266, 4.096500396728516, 19.730453491210938, -0.508636474609375, 32.036781311035156, 30.905670166015625, 23.7110652923584, 72.97607421875, 10.24966049194336], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000487.npy"}
{"epoch": 0.7362055933484505, "step": 488, "batch_size": 64, "mean": 15.71593952178955, "std": 19.504480361938477, "min": -19.117738723754883, "p10": -6.665038681030272, "median": 13.409671783447266, "p90": 42.422532653808595, "max": 72.3863525390625, "pos_frac": 0.796875, "sample": [1.1661529541015625, -5.7923431396484375, -3.143350601196289, -12.967306137084961, 4.4307861328125, -7.039051055908203, -7.883083343505859, 32.48236083984375, 42.37187194824219, 16.25168228149414, 9.678386688232422, 7.14324951171875, 59.780372619628906, 7.973136901855469, 0.5171737670898438, 72.3863525390625, 21.354080200195312, 42.444244384765625, 3.0386581420898438, 1.2418861389160156, 15.023635864257812, 13.237113952636719, 6.7531585693359375, 5.147247314453125, 2.406646728515625, 10.424766540527344, -13.872917175292969, 38.22479248046875, 7.296699523925781, 35.98064041137695, 16.06322479248047, -13.8189697265625, -0.8155651092529297, 54.6539306640625, 22.483001708984375, 7.536983489990234, 36.56157302856445, -1.0565376281738281, 45.37505340576172, 38.195526123046875, 34.88727569580078, 15.97524642944336, 15.512992858886719, 13.068572998046875, 34.695526123046875, 14.156513214111328, -2.1998138427734375, 17.06085968017578, -4.0698394775390625, 13.582229614257812, 7.846466064453125, 17.038427352905273, -19.117738723754883, 23.19097900390625, 21.24005126953125, 25.26837921142578, 24.935546875, 32.67580032348633, 51.14357376098633, 14.533031463623047, 9.298599243164062, 2.3356475830078125, -13.620948791503906, 45.14745330810547], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000488.npy"}
{"epoch": 0.7377173091458806, "step": 489, "batch_size": 64, "mean": 8.13943099975586, "std": 17.3324031829834, "min": -27.884567260742188, "p10": -13.511225128173827, "median": 5.667657852172852, "p90": 31.56157684326172, "max": 50.19947052001953, "pos_frac": 0.671875, "sample": [-27.884567260742188, 17.013259887695312, 17.106163024902344, 22.9453125, 33.61934280395508, 31.684326171875, -5.53564453125, 23.31163787841797, 18.71133041381836, 48.14093017578125, 22.269447326660156, 1.4834976196289062, 10.287139892578125, -15.900283813476562, -3.697397232055664, 2.4209747314453125, -12.168476104736328, -0.2541236877441406, 26.868850708007812, 12.203958511352539, 50.19947052001953, 22.767059326171875, 14.20083999633789, 7.048259735107422, 24.216751098632812, 31.275161743164062, -6.429786682128906, -7.624473571777344, 20.466293334960938, 5.628940582275391, -15.902717590332031, -22.44388198852539, 41.24794006347656, 5.0838165283203125, -2.9839706420898438, -6.127420425415039, -1.2554931640625, 1.8814353942871094, 27.980976104736328, 6.217678070068359, 7.422513961791992, 4.97760009765625, 4.52801513671875, 20.262298583984375, 11.281044006347656, -14.086688995361328, 33.561363220214844, 4.027931213378906, 14.343616485595703, 18.734573364257812, -21.731101989746094, -0.7634506225585938, 6.609340667724609, 2.9216575622558594, -5.938968658447266, -21.194366455078125, 5.2969970703125, 4.06988525390625, -7.5565032958984375, 5.7063751220703125, -9.919042587280273, 41.11711120605469, -6.682504653930664, 5.8633575439453125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000489.npy"}
{"epoch": 0.7392290249433107, "step": 490, "batch_size": 64, "mean": 12.111776351928711, "std": 22.826099395751953, "min": -37.65559387207031, "p10": -15.144539070129392, "median": 12.145278930664062, "p90": 43.429423522949236, "max": 59.19404602050781, "pos_frac": 0.671875, "sample": [-3.980365753173828, -13.396591186523438, -37.65559387207031, 26.188331604003906, -7.472620010375977, 18.378753662109375, -1.3034744262695312, 21.159271240234375, 59.19404602050781, 49.082366943359375, -37.43041229248047, 46.10541915893555, -35.570823669433594, 6.948150634765625, 23.950828552246094, 23.005727767944336, 34.770057678222656, 20.53851318359375, 33.30462646484375, 4.279815673828125, 0.8462600708007812, 40.091278076171875, 25.822301864624023, 32.71669006347656, -11.263961791992188, 5.4333038330078125, 55.05738830566406, 0.8203163146972656, 9.998729705810547, 8.721473693847656, -24.196319580078125, 19.295547485351562, 12.972488403320312, 25.883384704589844, -1.4517841339111328, 2.3782958984375, -1.2160110473632812, 53.15062713623047, -15.893659591674805, -2.9774169921875, 24.616065979003906, -5.028472900390625, 13.324005126953125, 27.458784103393555, -3.3756351470947266, 19.36621856689453, -1.2042770385742188, -4.048032760620117, -17.197311401367188, -4.347198486328125, 16.73687744140625, 14.450536727905273, 1.5897445678710938, 15.778446197509766, 57.21531677246094, 10.143013000488281, 40.12621307373047, 15.507179260253906, 11.318069458007812, 44.84508514404297, -23.36319351196289, -11.598079681396484, 40.09546661376953, 26.459930419921875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000490.npy"}
{"epoch": 0.7407407407407407, "step": 491, "batch_size": 64, "mean": 12.546684265136719, "std": 17.152666091918945, "min": -28.49146270751953, "p10": -8.597439956665037, "median": 8.590766906738281, "p90": 36.734786987304695, "max": 43.23577117919922, "pos_frac": 0.78125, "sample": [34.67090606689453, -4.846504211425781, 39.667579650878906, 7.093315124511719, 20.068862915039062, 20.430145263671875, -1.8595695495605469, 22.019424438476562, 23.357704162597656, 2.6380958557128906, 13.113204956054688, -18.003646850585938, 31.668773651123047, 1.660146713256836, 37.548057556152344, -10.180381774902344, 1.2708702087402344, 24.530181884765625, 6.342937469482422, 16.23497772216797, 5.0411529541015625, 19.172637939453125, 37.38597869873047, 7.4393463134765625, 14.72835922241211, 35.08439636230469, 4.916595458984375, -2.8208236694335938, 10.775535583496094, 6.198459625244141, 38.58802795410156, 23.339759826660156, -4.8121185302734375, 6.332115173339844, 42.24736785888672, 18.407611846923828, -9.423072814941406, 42.17939758300781, 5.3439178466796875, -9.716636657714844, 43.23577117919922, 1.8456954956054688, 4.203691482543945, 1.3316802978515625, 2.267223358154297, 25.887493133544922, -4.6041259765625, -19.250484466552734, 31.579994201660156, 35.21533966064453, 14.305625915527344, 26.42202377319336, 30.18944549560547, 31.764812469482422, 5.142332077026367, -3.796478271484375, 9.7421875, 2.00152587890625, 24.657588958740234, -28.49146270751953, -14.68214225769043, 6.4214019775390625, -6.670963287353516, 26.436538696289062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000491.npy"}
{"epoch": 0.7422524565381708, "step": 492, "batch_size": 64, "mean": 11.051876068115234, "std": 21.436983108520508, "min": -39.775028228759766, "p10": -13.156951904296875, "median": 10.704054832458496, "p90": 37.46866989135742, "max": 63.916831970214844, "pos_frac": 0.640625, "sample": [45.669677734375, 12.1358642578125, -11.655130386352539, 7.060361862182617, 32.312843322753906, -14.454668045043945, 30.67847442626953, 10.877025604248047, 14.41569709777832, -2.7139511108398438, 21.054428100585938, 21.820465087890625, 11.077667236328125, -2.7842483520507812, 21.204627990722656, 9.294219970703125, -34.08490753173828, 10.273216247558594, 1.9970283508300781, 37.28984069824219, 22.252605438232422, -6.451271057128906, 0.2740936279296875, -5.5902862548828125, 13.782676696777344, 11.470600128173828, 20.530296325683594, 61.665985107421875, -9.3541259765625, -15.876213073730469, 47.59790802001953, 27.82512092590332, -3.47772216796875, 21.234298706054688, -4.001991271972656, -1.3207550048828125, -13.68023681640625, 63.916831970214844, -24.89715576171875, -9.453437805175781, 36.95191192626953, 19.221099853515625, 21.58557891845703, -1.54461669921875, -4.252922058105469, 31.808250427246094, 36.30482864379883, 10.531084060668945, -8.225326538085938, 37.545310974121094, -39.775028228759766, -13.2066650390625, 4.9697265625, 28.971466064453125, 27.490081787109375, 6.694072723388672, 44.75700759887695, 14.068923950195312, 17.292247772216797, -1.080810546875, 4.004608154296875, -13.04095458984375, -12.56304931640625, 40.89750671386719], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000492.npy"}
{"epoch": 0.7437641723356009, "step": 493, "batch_size": 64, "mean": 15.557969093322754, "std": 19.163637161254883, "min": -43.67533493041992, "p10": -4.719852066040039, "median": 12.133949279785156, "p90": 41.02578353881838, "max": 55.356712341308594, "pos_frac": 0.8125, "sample": [11.679130554199219, 5.081523895263672, -5.5644989013671875, 28.368927001953125, 9.262924194335938, 6.019195556640625, -34.911155700683594, 12.120231628417969, 9.264665603637695, 8.069343566894531, 4.6989288330078125, 27.797698974609375, 29.973297119140625, 47.73827362060547, 4.253360748291016, -4.672916412353516, 43.04386901855469, 3.3004417419433594, 36.316917419433594, 34.05841064453125, 5.092023849487305, -0.36360931396484375, -5.213354110717773, 16.35977554321289, -1.4242172241210938, -4.739967346191406, 28.42638397216797, 11.743206024169922, 52.1934814453125, 28.042322158813477, 10.304939270019531, 28.37779998779297, 0.025543212890625, 22.388904571533203, -9.522171020507812, 16.276214599609375, 52.69290542602539, -6.6451263427734375, 33.95848083496094, 9.805715560913086, 34.617897033691406, 21.966659545898438, 29.325828552246094, 26.461395263671875, 4.692920684814453, 51.1593017578125, -1.3961639404296875, 27.782760620117188, 31.792570114135742, 4.552742004394531, 1.21319580078125, 28.838680267333984, 12.147666931152344, -0.230621337890625, 55.356712341308594, 19.78094482421875, 18.248443603515625, 16.93706512451172, 16.45831298828125, 45.61341857910156, 10.900321960449219, 4.564720153808594, 14.922737121582031, -43.67533493041992], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000493.npy"}
{"epoch": 0.745275888133031, "step": 494, "batch_size": 64, "mean": 11.528377532958984, "std": 22.113391876220703, "min": -44.522666931152344, "p10": -11.593209457397458, "median": 10.145687103271484, "p90": 40.34465942382815, "max": 61.14497375488281, "pos_frac": 0.71875, "sample": [27.61224365234375, 3.5045852661132812, 9.567863464355469, 11.029693603515625, 29.880393981933594, -1.0919342041015625, -12.296802520751953, 28.78192710876465, 19.34113311767578, 1.8639755249023438, 5.627185821533203, -13.781982421875, 6.9065399169921875, 34.54924774169922, 3.4861183166503906, 12.238418579101562, 33.2154541015625, 1.3031387329101562, 20.384906768798828, -14.631622314453125, 1.3571586608886719, 59.81935119628906, 20.68716049194336, 4.0393829345703125, 23.83214569091797, 59.2366943359375, 21.16699981689453, -9.951492309570312, 15.333831787109375, 42.828407287597656, 12.63681411743164, -36.06291198730469, -37.05530548095703, 31.53130531311035, -7.083278656005859, -9.394943237304688, -23.44281005859375, 26.748016357421875, 13.082382202148438, 15.444564819335938, 15.782089233398438, -0.08616447448730469, -44.522666931152344, -1.9929313659667969, 19.021526336669922, 9.072311401367188, 61.14497375488281, 7.6300201416015625, -3.066070556640625, 12.609878540039062, 47.438812255859375, 10.7235107421875, 21.724716186523438, 47.015533447265625, 7.49847412109375, 8.679801940917969, -5.336700439453125, -9.526214599609375, 32.928985595703125, -9.278793334960938, 29.028640747070312, 0.2792205810546875, -4.730068206787109, 53.533302307128906], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000494.npy"}
{"epoch": 0.7467876039304611, "step": 495, "batch_size": 64, "mean": 10.122034072875977, "std": 17.903690338134766, "min": -25.77560806274414, "p10": -7.254696655273436, "median": 6.834695816040039, "p90": 35.667342376709, "max": 60.5380859375, "pos_frac": 0.703125, "sample": [-15.621490478515625, -6.649406433105469, -14.493171691894531, -25.77560806274414, 40.98748779296875, -6.549211502075195, -10.747922897338867, 5.7093658447265625, 15.970325469970703, -2.356597900390625, -4.559131622314453, -25.577857971191406, -2.41998291015625, 11.882518768310547, 11.633003234863281, 53.249534606933594, -4.296478271484375, 21.088058471679688, 14.680747985839844, 2.1853809356689453, 7.976696014404297, 17.505186080932617, 17.49974822998047, 3.208620071411133, 24.593280792236328, 37.821205139160156, -4.135723114013672, 18.737167358398438, 6.257255554199219, 2.8899574279785156, 60.5380859375, 20.94205665588379, 1.9816131591796875, -10.009668350219727, 24.894895553588867, 2.2088851928710938, 0.64385986328125, 13.465194702148438, 26.7266845703125, 22.50572967529297, -6.007049560546875, 6.804203033447266, 25.357257843017578, 17.546340942382812, 46.74911117553711, 2.282794952392578, 12.827444076538086, 0.5307998657226562, -1.4608612060546875, 48.5, -4.03447151184082, 30.64166259765625, 12.316459655761719, -3.5314788818359375, -7.514106750488281, 43.651031494140625, 6.8651885986328125, -1.0025100708007812, 5.47808837890625, 7.699676513671875, 11.002866744995117, 6.872093200683594, 28.031585693359375, 3.613750457763672], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000495.npy"}
{"epoch": 0.7482993197278912, "step": 496, "batch_size": 64, "mean": 13.190352439880371, "std": 22.546371459960938, "min": -19.14202880859375, "p10": -10.427949523925781, "median": 9.795685768127441, "p90": 36.964616775512695, "max": 116.109130859375, "pos_frac": 0.6875, "sample": [18.66522216796875, 15.609619140625, 63.41685485839844, 19.139114379882812, -10.083503723144531, 18.148963928222656, 6.739173889160156, 28.747467041015625, 27.007265090942383, 37.249019622802734, -9.430824279785156, 1.8464889526367188, 19.352643966674805, 8.266891479492188, -0.016626358032226562, -1.2157974243164062, -2.7268524169921875, 11.773086547851562, 21.67624282836914, 38.07408905029297, -17.459548950195312, 116.109130859375, -0.2473773956298828, 36.30101013183594, 18.46294403076172, -19.14202880859375, 54.263885498046875, 26.7447509765625, -1.7260932922363281, 26.484619140625, -7.7335052490234375, 4.789356231689453, 47.85932159423828, 29.632583618164062, -3.9157257080078125, 7.260461807250977, -13.888399124145508, -5.42680549621582, 9.014984130859375, 13.811996459960938, 12.3323974609375, 11.56317138671875, 27.070220947265625, 20.10123062133789, 15.402511596679688, 8.598403930664062, -10.638923645019531, 10.576387405395508, -4.489780426025391, 6.312694549560547, 61.954254150390625, -14.930580139160156, -14.997039794921875, 2.2570247650146484, 28.26739501953125, 18.397197723388672, 1.8640594482421875, 28.496902465820312, -7.805900573730469, 6.1530303955078125, -2.52386474609375, 13.499942779541016, -10.575569152832031, 3.86328125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000496.npy"}
{"epoch": 0.7498110355253212, "step": 497, "batch_size": 64, "mean": 16.72954559326172, "std": 21.016551971435547, "min": -30.090980529785156, "p10": -7.80884780883789, "median": 16.558731079101562, "p90": 41.565964508056645, "max": 71.39425659179688, "pos_frac": 0.8125, "sample": [66.36277770996094, 13.727958679199219, 71.39425659179688, 37.58906555175781, 5.569976806640625, -3.9605026245117188, 9.403060913085938, 2.7519302368164062, 14.789176940917969, 13.865043640136719, 33.793338775634766, 11.169883728027344, 31.281972885131836, 13.874954223632812, 19.734176635742188, -0.45542144775390625, 30.15386962890625, -30.090980529785156, 32.99433517456055, 28.522926330566406, -2.1951656341552734, 17.596904754638672, 1.3114395141601562, -3.15362548828125, 9.210479736328125, -8.083206176757812, -13.219085693359375, 20.69771957397461, 33.62150573730469, 45.32097625732422, 1.0613231658935547, 19.308578491210938, 18.064117431640625, 21.094680786132812, -21.273460388183594, 20.190547943115234, 40.183753967285156, 29.011964797973633, 27.306049346923828, -7.168678283691406, 42.15834045410156, 9.437746047973633, 0.7449760437011719, 21.291419982910156, 21.549270629882812, -10.942672729492188, 60.03607177734375, 52.776756286621094, 3.235321044921875, 4.351879119873047, 19.5469970703125, 31.2738037109375, 28.319250106811523, 15.520557403564453, 5.838020324707031, 65.94296264648438, -24.52960205078125, 4.354362487792969, 1.5802497863769531, 22.090740203857422, 27.5594482421875, 7.671600341796875, -8.900924682617188, 18.42572021484375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000497.npy"}
{"epoch": 0.7513227513227513, "step": 498, "batch_size": 64, "mean": 15.366159439086914, "std": 19.70924949645996, "min": -38.998199462890625, "p10": -3.6587347030639648, "median": 10.892261505126953, "p90": 44.6414909362793, "max": 64.47401428222656, "pos_frac": 0.8125, "sample": [-2.1915130615234375, 15.940753936767578, 54.28913879394531, 17.260608673095703, -8.71575927734375, -10.31015396118164, 28.833969116210938, 44.74781799316406, -4.584651947021484, 0.292083740234375, 24.927413940429688, 12.046302795410156, 13.026237487792969, 7.633583068847656, 44.393394470214844, 5.8689422607421875, 3.119504928588867, -3.527956008911133, 0.6661033630371094, 24.006149291992188, 52.93431091308594, 3.9303817749023438, 35.53289794921875, 3.860036849975586, 7.3948822021484375, 9.405044555664062, -8.400375366210938, 46.13610076904297, 37.15818405151367, 0.07295989990234375, 33.16119384765625, 39.44694900512695, 3.7690048217773438, 18.40350341796875, 11.592811584472656, -5.41650390625, -3.71478271484375, 20.018442153930664, 1.8831424713134766, 2.056276321411133, 25.81108856201172, 38.70071792602539, -38.998199462890625, 19.10464859008789, 10.549379348754883, 7.330390930175781, 1.6795616149902344, 13.202117919921875, 10.664604187011719, 11.119918823242188, 40.12993621826172, 57.22515106201172, 28.913124084472656, 2.3503341674804688, 47.79510498046875, 2.131195068359375, 34.6854248046875, 13.826416015625, -0.1298351287841797, -2.1814727783203125, 4.025787353515625, -0.17588043212890625, 14.254280090332031, 64.47401428222656], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000498.npy"}
{"epoch": 0.7528344671201814, "step": 499, "batch_size": 64, "mean": 14.025482177734375, "std": 21.369157791137695, "min": -30.689483642578125, "p10": -12.989233016967773, "median": 13.501266479492188, "p90": 41.80916404724122, "max": 66.572998046875, "pos_frac": 0.71875, "sample": [26.064659118652344, 43.11522674560547, -30.689483642578125, 37.1185302734375, 11.005287170410156, 11.436397552490234, 10.235712051391602, -27.199241638183594, 4.361522674560547, 14.866920471191406, 29.15772247314453, 20.6265869140625, 44.496063232421875, -12.336627960205078, 27.64349365234375, -1.5069503784179688, 66.572998046875, -23.4088134765625, 40.858341217041016, 45.94807434082031, 17.017288208007812, -1.849639892578125, -7.346574783325195, -13.2689208984375, -7.705411911010742, 23.200668334960938, 14.097030639648438, 35.052001953125, -8.535520553588867, 5.5815887451171875, -0.9937477111816406, -0.9482460021972656, 22.077404022216797, 15.314498901367188, 18.9845027923584, -2.675384521484375, 3.0646896362304688, 23.824443817138672, -9.499486923217773, 11.090763092041016, 38.729736328125, 3.5561904907226562, 33.05530548095703, 11.279052734375, -24.628265380859375, 20.41155242919922, 5.999267578125, 39.996429443359375, 30.984619140625, 60.762481689453125, 27.132713317871094, 12.905502319335938, 22.483421325683594, -14.741043090820312, -17.599151611328125, 30.978118896484375, 27.939970016479492, 16.11267852783203, -7.0886688232421875, 42.21665954589844, 4.742301940917969, 47.40904998779297, 8.162040710449219, 1.9825325012207031], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000499.npy"}
{"epoch": 0.7543461829176115, "step": 500, "batch_size": 64, "mean": 12.87067985534668, "std": 19.211130142211914, "min": -24.736862182617188, "p10": -12.412585830688474, "median": 9.97722053527832, "p90": 38.649199676513675, "max": 58.931392669677734, "pos_frac": 0.8125, "sample": [-5.3108367919921875, 12.041282653808594, 5.642173767089844, 2.076364517211914, 19.404159545898438, 6.668449401855469, 20.167434692382812, 40.74171447753906, 6.27044677734375, 47.21443557739258, 11.994468688964844, 42.85797119140625, 34.486595153808594, -2.102142333984375, 38.09614562988281, 25.832351684570312, 58.931392669677734, -0.4132232666015625, 9.026718139648438, -13.069656372070312, 34.698097229003906, 19.662803649902344, 22.262855529785156, 33.149696350097656, 2.6101951599121094, 46.54249954223633, 23.424392700195312, 2.178213119506836, 1.9059906005859375, -24.186439514160156, -10.87942123413086, 2.859813690185547, 2.162487030029297, 9.590911865234375, -20.387784957885742, -10.430864334106445, 13.891159057617188, 10.231948852539062, 30.35572052001953, 0.9401321411132812, 20.778125762939453, 3.935964584350586, 18.190147399902344, 8.990581512451172, -13.79806137084961, 28.278596878051758, 3.78778076171875, -22.49932861328125, 36.32804489135742, -24.358802795410156, 43.922264099121094, 1.917022705078125, 1.3047504425048828, 12.045661926269531, 23.124980926513672, 9.722492218017578, 6.480564117431641, 20.697723388671875, 22.08211326599121, 0.717041015625, -24.736862182617188, 21.91898536682129, 34.86888885498047, 38.88622283935547], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000500.npy"}
{"epoch": 0.7558578987150416, "step": 501, "batch_size": 64, "mean": 12.492513656616211, "std": 15.124780654907227, "min": -25.57807159423828, "p10": -1.694429206848144, "median": 7.785274505615234, "p90": 30.335773086547853, "max": 56.67352294921875, "pos_frac": 0.84375, "sample": [9.246330261230469, 24.172557830810547, 13.942703247070312, 18.60710906982422, 18.329509735107422, 21.20998191833496, 7.0361785888671875, 13.564918518066406, 13.791240692138672, 48.15239715576172, 44.83909606933594, 2.3015003204345703, 21.97595977783203, 2.932769775390625, 1.545999526977539, 1.1142711639404297, 20.85881805419922, 28.241836547851562, -14.858528137207031, -5.863710403442383, 5.126548767089844, -2.507659912109375, -0.061336517333984375, 25.617225646972656, 56.67352294921875, 35.44480895996094, 8.191314697265625, 42.17596435546875, 10.514533996582031, 0.035587310791015625, 2.49066162109375, 6.43658447265625, -1.9362754821777344, 6.283710479736328, 18.354934692382812, 7.03216552734375, 24.798940658569336, -2.8764305114746094, 4.748989105224609, 2.6665477752685547, 4.839513778686523, 29.72479248046875, 1.8609695434570312, 4.677703857421875, 5.235443115234375, -25.57807159423828, -0.25045013427734375, 19.63067626953125, -10.153858184814453, 19.001487731933594, 16.850387573242188, 29.00572967529297, 5.483665466308594, 0.5635452270507812, 7.379234313964844, -1.1301212310791016, 6.1690673828125, 6.492879867553711, 29.56329345703125, 30.59762191772461, 38.684410095214844, 17.262596130371094, 14.554819107055664, 8.704254150390625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000501.npy"}
{"epoch": 0.7573696145124716, "step": 502, "batch_size": 64, "mean": 12.687867164611816, "std": 16.293888092041016, "min": -17.605377197265625, "p10": -4.2254077911376955, "median": 9.13615608215332, "p90": 31.324995040893562, "max": 65.4815673828125, "pos_frac": 0.84375, "sample": [-4.0631103515625, 5.865224838256836, 4.754161834716797, -8.090347290039062, 20.44989776611328, 21.837890625, 5.663993835449219, 1.8739776611328125, -13.4024658203125, 17.741622924804688, 16.280242919921875, 48.89677429199219, -2.022815704345703, -12.298011779785156, 20.549163818359375, 53.87501525878906, 7.536773681640625, 5.510833740234375, 7.68377685546875, 7.3797454833984375, 10.506416320800781, 2.1378440856933594, 21.773056030273438, 15.532760620117188, 6.34881591796875, 4.807111740112305, 29.197429656982422, -4.294963836669922, 1.2780570983886719, 18.031837463378906, 3.9236984252929688, 1.7524662017822266, 14.8204345703125, 7.194631576538086, 10.04226303100586, 6.3687744140625, 51.900508880615234, 65.4815673828125, -10.152359008789062, -0.5836830139160156, 20.130882263183594, 6.4163818359375, 19.061302185058594, 2.6948471069335938, 9.855579376220703, 2.3642349243164062, 8.416732788085938, 23.07196044921875, 28.63669204711914, -15.901618957519531, 15.051082611083984, 22.624099731445312, 25.204017639160156, 0.229827880859375, 17.392623901367188, -17.605377197265625, 18.003341674804688, 32.23680877685547, 2.010345458984375, 37.868995666503906, 11.482425689697266, 22.522178649902344, 35.070281982421875, 23.09685707092285], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000502.npy"}
{"epoch": 0.7588813303099018, "step": 503, "batch_size": 64, "mean": 13.057201385498047, "std": 19.14435386657715, "min": -31.033023834228516, "p10": -8.450875091552733, "median": 10.109949111938477, "p90": 38.1844123840332, "max": 49.40400695800781, "pos_frac": 0.796875, "sample": [8.180084228515625, -3.402477264404297, -8.633026123046875, 2.462167739868164, 26.202430725097656, 9.358161926269531, 39.52362060546875, -30.126861572265625, 5.061546325683594, 25.345626831054688, 20.624679565429688, 9.417137145996094, 11.089805603027344, 49.40400695800781, -2.385528564453125, 17.473217010498047, -5.56501579284668, 0.4903717041015625, 45.281951904296875, -13.332122802734375, -0.15538787841796875, 10.06967544555664, 45.669525146484375, 15.240631103515625, 16.794330596923828, 10.150222778320312, 20.18053436279297, 34.3841552734375, 8.932880401611328, 30.136295318603516, -3.3212738037109375, 4.317132949829102, 45.57795333862305, 36.826019287109375, 38.13698196411133, 0.2510223388671875, 17.98896026611328, -8.025856018066406, -21.018112182617188, 3.226043701171875, 6.290065765380859, 10.32427978515625, 2.255922317504883, 15.714807510375977, 5.073455810546875, 5.500679016113281, 9.03179931640625, 38.19606018066406, 38.15723419189453, 13.329765319824219, 34.080501556396484, 35.64643096923828, -31.033023834228516, 3.4241409301757812, 44.83605194091797, 37.55998611450195, 4.105855941772461, -17.02433204650879, 15.632286071777344, 20.651344299316406, 8.939979553222656, 18.140960693359375, 31.249982833862305, -26.254837036132812], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000503.npy"}
{"epoch": 0.7603930461073318, "step": 504, "batch_size": 64, "mean": 17.38503646850586, "std": 17.525440216064453, "min": -11.795877456665039, "p10": -2.1272602081298815, "median": 15.157466888427734, "p90": 39.5944465637207, "max": 68.14603424072266, "pos_frac": 0.84375, "sample": [-7.971588134765625, 39.673545837402344, -3.8366470336914062, 30.98995018005371, 22.296585083007812, -2.62451171875, 24.81158447265625, 14.317689895629883, 57.05986785888672, -0.6142559051513672, 1.68695068359375, 20.162559509277344, 8.353797912597656, 53.57975387573242, 8.106334686279297, 35.80012130737305, 22.614501953125, -7.433013916015625, 21.508575439453125, 50.589927673339844, 21.202259063720703, 10.851940155029297, 3.785472869873047, 37.248619079589844, 39.409881591796875, -5.8162841796875, 9.195745468139648, 43.22482681274414, 6.551948547363281, 68.14603424072266, 20.936981201171875, 24.269699096679688, -0.7683658599853516, 25.231693267822266, 22.599807739257812, 0.1461944580078125, 6.383216857910156, -10.944168090820312, 33.207916259765625, 11.722137451171875, 32.572410583496094, 1.4317779541015625, -0.9670066833496094, 16.79437255859375, 3.1461181640625, 24.873371124267578, 16.391921997070312, 10.466344833374023, 14.193572998046875, 26.004364013671875, 32.355857849121094, 3.760852813720703, -11.795877456665039, 6.378448486328125, 15.997243881225586, 11.625938415527344, 5.301023483276367, 29.452835083007812, 22.18431854248047, 5.277713775634766, 8.419313430786133, 54.35613250732422, 18.374130249023438, 10.389768600463867], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000504.npy"}
{"epoch": 0.7619047619047619, "step": 505, "batch_size": 64, "mean": 11.885502815246582, "std": 20.77714729309082, "min": -33.213401794433594, "p10": -9.179886627197265, "median": 7.775851249694824, "p90": 38.494275665283205, "max": 67.04454040527344, "pos_frac": 0.6875, "sample": [25.500911712646484, -0.7040443420410156, -9.729644775390625, 4.318401336669922, -2.3156967163085938, 38.112335205078125, -6.127685546875, 67.04454040527344, 24.98626136779785, 32.95319747924805, 43.32341003417969, -7.897117614746094, 4.824867248535156, 3.460765838623047, 11.856414794921875, -2.6446533203125, 8.4244384765625, 1.9885063171386719, 18.882741928100586, 10.046371459960938, -7.887550354003906, 23.86463165283203, 29.82404327392578, 22.17029571533203, 26.3699951171875, 65.10408782958984, 17.20612335205078, -7.4146575927734375, 11.977272033691406, -1.0082130432128906, 18.43274688720703, -3.6000518798828125, 42.653846740722656, -25.553977966308594, 54.339515686035156, 34.08910369873047, 15.553028106689453, 24.815168380737305, -3.6840343475341797, -4.467071533203125, 12.4732666015625, 4.9361572265625, 3.0405807495117188, -33.213401794433594, 7.75927734375, 6.48399543762207, 17.99732208251953, -11.362741470336914, 38.59489440917969, -26.649600982666016, -3.2123794555664062, 25.514598846435547, 41.346858978271484, 7.6171417236328125, 34.20015335083008, 2.2159576416015625, 21.43170166015625, -12.894813537597656, 6.0708465576171875, 38.259498596191406, 7.792425155639648, 0.23276519775390625, -7.490898132324219, -19.560073852539062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000505.npy"}
{"epoch": 0.763416477702192, "step": 506, "batch_size": 64, "mean": 12.035234451293945, "std": 20.784624099731445, "min": -36.42070388793945, "p10": -10.227194213867188, "median": 12.058131217956543, "p90": 35.5965503692627, "max": 80.6939697265625, "pos_frac": 0.75, "sample": [11.97706413269043, 34.84151840209961, 0.4607887268066406, -3.4585723876953125, 15.812088012695312, 15.174774169921875, 16.497394561767578, -5.214824676513672, 16.77402114868164, -9.903106689453125, 1.7998847961425781, 23.511505126953125, 11.087242126464844, 10.719108581542969, 6.1534423828125, 13.380241394042969, -25.475017547607422, 56.21351623535156, 80.6939697265625, 1.2815475463867188, 34.4171142578125, -19.819610595703125, 1.0072021484375, -29.434539794921875, -18.521240234375, -36.42070388793945, 5.7552032470703125, 50.590301513671875, 35.920135498046875, 33.88341522216797, 56.89641189575195, -12.218238830566406, 41.483619689941406, -4.755523681640625, 34.644569396972656, 5.617683410644531, 36.305850982666016, 2.6171340942382812, 22.01207733154297, -10.3660888671875, -7.28228759765625, -0.4756317138671875, 3.9476547241210938, 19.611900329589844, 17.353744506835938, 12.139198303222656, -0.6422309875488281, 9.861503601074219, 12.392427444458008, 31.34093475341797, 27.13347053527832, 12.311553955078125, 4.028589248657227, 24.159515380859375, 17.860485076904297, 16.315635681152344, 3.2144851684570312, 16.563640594482422, -5.7957763671875, -7.196769714355469, 9.958358764648438, 14.359039306640625, 24.00226593017578, 13.151885986328125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000506.npy"}
{"epoch": 0.764928193499622, "step": 507, "batch_size": 64, "mean": 13.82800579071045, "std": 20.1926212310791, "min": -21.96338653564453, "p10": -8.99600830078125, "median": 12.300506591796875, "p90": 38.64636840820313, "max": 71.83468627929688, "pos_frac": 0.703125, "sample": [27.43414306640625, 0.955657958984375, -6.085319519042969, 28.244842529296875, 71.83468627929688, 10.350860595703125, 29.023651123046875, -18.205322265625, -16.730392456054688, 18.412914276123047, 5.241615295410156, 17.52469253540039, -8.936172485351562, 28.494720458984375, -4.121856689453125, 14.250152587890625, -0.04534912109375, -9.698043823242188, 30.514080047607422, 5.10856819152832, 7.6012115478515625, 25.18421173095703, 29.763507843017578, -7.82402229309082, 39.25312042236328, 0.06462669372558594, 9.22989273071289, 24.894424438476562, 15.139396667480469, 16.140487670898438, 6.6333770751953125, -2.675018310546875, 63.147430419921875, -8.047592163085938, 20.073970794677734, 21.385915756225586, -6.120086669921875, 26.978500366210938, -13.879289627075195, -3.5006790161132812, -2.2366104125976562, 21.095489501953125, 2.7969188690185547, 30.0310001373291, 20.148719787597656, 6.8078460693359375, 37.230613708496094, 7.7139739990234375, 50.7896728515625, 54.89227294921875, -21.96338653564453, 22.8948974609375, -4.7693634033203125, 10.111686706542969, 27.36376953125, -0.7868270874023438, 25.73184585571289, -9.021652221679688, 17.510330200195312, -17.233158111572266, 17.58733367919922, 44.22997283935547, 7.236003875732422, 49.81953430175781], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000507.npy"}
{"epoch": 0.7664399092970522, "step": 508, "batch_size": 64, "mean": 14.780179977416992, "std": 19.55887222290039, "min": -20.737234115600586, "p10": -7.962543296813964, "median": 12.913841247558594, "p90": 36.35453338623047, "max": 69.60758972167969, "pos_frac": 0.75, "sample": [25.822879791259766, 1.1425247192382812, 34.78253173828125, 23.61071014404297, 5.2867431640625, -11.141159057617188, 68.74690246582031, -8.381263732910156, 29.365968704223633, 38.88066864013672, 69.60758972167969, 35.03742980957031, 3.8677902221679688, 5.989677429199219, -3.40228271484375, 29.764923095703125, -20.737234115600586, 20.33277130126953, 7.143440246582031, -11.028926849365234, 11.91961669921875, -2.1278533935546875, 23.65117073059082, -1.70501708984375, -5.359016418457031, -18.228958129882812, 17.383087158203125, 0.42525672912597656, 24.56427001953125, 23.426921844482422, -0.98419189453125, 32.18080139160156, 10.327239990234375, 18.157424926757812, 29.616134643554688, 27.897064208984375, 5.355983734130859, 17.145172119140625, 45.07426452636719, 4.790496826171875, -3.4794921875, -9.098358154296875, 34.57447052001953, 25.183059692382812, -4.471702575683594, 1.0845718383789062, 24.858165740966797, 29.7447509765625, 45.88182830810547, 4.330738067626953, 31.255634307861328, 0.7056598663330078, 4.586490631103516, -6.985528945922852, 8.388442993164062, -19.423294067382812, -6.956878662109375, 47.57518005371094, 19.33776092529297, 28.54766845703125, 19.34405517578125, 12.492111206054688, 13.3355712890625, 36.91900634765625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000508.npy"}
{"epoch": 0.7679516250944822, "step": 509, "batch_size": 64, "mean": 12.687555313110352, "std": 16.86067008972168, "min": -24.855690002441406, "p10": -8.969410705566405, "median": 13.883235931396484, "p90": 32.763665008544926, "max": 50.49658203125, "pos_frac": 0.765625, "sample": [20.40149688720703, -7.227313995361328, 8.60519790649414, 28.080955505371094, 5.437797546386719, 0.0007190704345703125, 7.432365417480469, -2.3378524780273438, -20.889305114746094, 3.7727127075195312, 23.84258270263672, 40.411903381347656, -5.947662353515625, 8.917701721191406, 34.39153289794922, -24.855690002441406, 30.791717529296875, 25.097450256347656, 2.330047607421875, 7.954689025878906, 21.040496826171875, -8.070388793945312, 29.531158447265625, -14.562744140625, 27.362388610839844, 10.803054809570312, -1.3797683715820312, 2.021717071533203, 22.096046447753906, 4.5125732421875, 23.64190673828125, 31.16794204711914, 16.185104370117188, 13.321189880371094, 10.226089477539062, -0.29308128356933594, 26.582660675048828, 15.541084289550781, 20.776535034179688, 43.24358367919922, 17.645652770996094, -1.5037612915039062, 33.72730255126953, -16.453235626220703, 32.947509765625, 21.634811401367188, 9.387619018554688, 14.445281982421875, 50.49658203125, 33.28376007080078, 31.985923767089844, 1.8492584228515625, -7.769683837890625, -18.793731689453125, 4.254974365234375, 23.53530502319336, 26.470563888549805, -9.354705810546875, -15.810699462890625, 8.83953857421875, 18.515117645263672, 21.311141967773438, 32.334693908691406, 19.065719604492188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000509.npy"}
{"epoch": 0.7694633408919124, "step": 510, "batch_size": 64, "mean": 14.474861145019531, "std": 19.47356414794922, "min": -21.25814437866211, "p10": -12.965035247802733, "median": 12.52033805847168, "p90": 39.883727645874025, "max": 59.64447021484375, "pos_frac": 0.78125, "sample": [5.991386413574219, 40.19681167602539, 28.010231018066406, 11.016746520996094, -17.065460205078125, 1.4138031005859375, 18.65945816040039, 59.64447021484375, 10.966346740722656, 6.767730712890625, -4.509063720703125, 3.7727584838867188, 5.484477996826172, 10.431459426879883, -21.25814437866211, -1.0103912353515625, 2.6229248046875, 12.252059936523438, 2.7582244873046875, 10.122861862182617, 13.782867431640625, 28.621381759643555, 38.93940734863281, -12.25106430053711, 2.3383102416992188, 34.922054290771484, 25.318260192871094, -0.2334747314453125, 0.4649219512939453, 29.72879409790039, 24.860137939453125, 21.626121520996094, 46.15498352050781, 21.88993263244629, 12.765789031982422, 15.08953857421875, 17.75659942626953, 59.248924255371094, -4.474090576171875, -15.918365478515625, 5.66796875, -16.306453704833984, 24.621810913085938, 23.45665740966797, 29.1075439453125, 7.300849914550781, 12.274887084960938, 55.85187530517578, 26.832733154296875, -5.5455780029296875, 20.88060760498047, 13.708911895751953, 54.2186393737793, 15.724544525146484, -14.564468383789062, 14.03177261352539, 39.1531982421875, 45.38835144042969, -0.2773933410644531, 19.458633422851562, 32.290740966796875, 9.082403182983398, -13.27102279663086, -19.595802307128906], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000510.npy"}
{"epoch": 0.7709750566893424, "step": 511, "batch_size": 64, "mean": 17.618114471435547, "std": 21.088951110839844, "min": -19.307723999023438, "p10": -7.47267112731933, "median": 14.985618591308594, "p90": 47.72174224853516, "max": 70.99176025390625, "pos_frac": 0.828125, "sample": [70.99176025390625, 34.332191467285156, 18.445322036743164, 53.31866455078125, 11.229793548583984, 5.0786895751953125, 6.0828094482421875, 1.0598526000976562, 41.608367919921875, 43.50790786743164, 17.435577392578125, -17.980377197265625, 16.53003692626953, 9.427818298339844, 44.102745056152344, 24.092514038085938, 27.98389434814453, 0.47563934326171875, 47.822509765625, 24.432205200195312, 45.469200134277344, -2.21368408203125, 6.76458740234375, -18.889225006103516, -13.728813171386719, -0.735107421875, 52.75760269165039, 22.501022338867188, 15.857688903808594, 13.21710205078125, 1.4581050872802734, 9.95086669921875, 18.015213012695312, 9.007614135742188, 20.67841339111328, 19.747039794921875, 39.095306396484375, 3.395526885986328, -9.636940002441406, 9.648382186889648, 3.6872024536132812, 17.892414093017578, 34.38184356689453, 49.00807189941406, 47.163818359375, 27.37445831298828, -16.023239135742188, 5.82989501953125, 1.1214122772216797, 52.316368103027344, 7.635902404785156, 3.0474891662597656, 26.97878646850586, 14.876121520996094, 1.2480430603027344, 6.324254989624023, -10.214618682861328, 57.33734893798828, 15.095115661621094, 35.60039138793945, 47.48661804199219, -2.422710418701172, -19.307723999023438, -1.2157859802246094], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000511.npy"}
{"epoch": 0.7724867724867724, "step": 512, "batch_size": 64, "mean": 13.19101333618164, "std": 21.590417861938477, "min": -23.55841064453125, "p10": -11.36330757141113, "median": 6.601554870605469, "p90": 42.2333854675293, "max": 83.42626953125, "pos_frac": 0.75, "sample": [22.97260284423828, 10.520294189453125, 21.34623146057129, -17.46772003173828, -3.1061744689941406, 46.848838806152344, 3.362812042236328, 0.8252792358398438, 2.8103904724121094, 16.009071350097656, 21.033527374267578, 35.90765380859375, -23.435546875, -9.783893585205078, 19.644329071044922, 1.5013885498046875, 1.1669769287109375, -1.4602699279785156, 40.718482971191406, 0.7581844329833984, 55.95697021484375, 28.533018112182617, 1.1322708129882812, 20.129497528076172, 7.8602752685546875, 42.88262939453125, -23.523391723632812, 25.660545349121094, 19.74547576904297, 44.81523132324219, 2.1891403198242188, 18.228988647460938, 46.368141174316406, 0.6013050079345703, 17.136764526367188, 5.0213623046875, 1.2954998016357422, 23.960968017578125, -5.886205673217773, 34.541748046875, -1.5536651611328125, -16.64735984802246, -1.9466400146484375, -2.1304244995117188, 10.411087036132812, 38.46977233886719, -0.9643783569335938, 5.34283447265625, 0.416473388671875, 0.7088737487792969, 40.55333709716797, -2.3143997192382812, 5.034431457519531, -15.844436645507812, 3.3295135498046875, 19.439620971679688, 83.42626953125, 39.435020446777344, 57.508201599121094, 31.95598602294922, 10.813882827758789, -23.55841064453125, 17.55675506591797, -12.040199279785156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000512.npy"}
{"epoch": 0.7739984882842026, "step": 513, "batch_size": 64, "mean": 14.092215538024902, "std": 17.43815040588379, "min": -20.214920043945312, "p10": -4.860563659667968, "median": 10.567842483520508, "p90": 44.2560604095459, "max": 60.03768539428711, "pos_frac": 0.84375, "sample": [44.856956481933594, 29.535614013671875, 51.188236236572266, 14.916301727294922, 15.67987060546875, 8.73562240600586, 10.759063720703125, 13.449359893798828, 4.680248260498047, -8.44504165649414, 8.071464538574219, -20.214920043945312, 28.577743530273438, 8.123907089233398, 9.031866073608398, 20.978769302368164, -1.6156044006347656, 8.139572143554688, 60.03768539428711, 22.80396270751953, 19.777244567871094, 51.99874496459961, 18.99665069580078, 9.104270935058594, 46.470497131347656, 3.0843887329101562, 8.041458129882812, 18.05319595336914, 3.2970752716064453, 12.647884368896484, 10.546920776367188, 3.283344268798828, 25.733848571777344, -18.579906463623047, 11.33740234375, 11.224908828735352, -11.624908447265625, 5.327850341796875, 47.964813232421875, 8.843685150146484, -5.029138565063477, -9.939298629760742, 53.538612365722656, -0.3945770263671875, 26.660003662109375, 42.85396957397461, 39.69862747192383, 0.48990631103515625, 21.712692260742188, 10.588764190673828, 9.412490844726562, 13.030948638916016, 18.71076202392578, 8.033279418945312, 16.011627197265625, -4.467222213745117, 5.7109527587890625, 8.56842041015625, 2.515674591064453, 3.8024425506591797, 21.41855239868164, -9.090324401855469, 1.0455322265625, 12.199024200439453], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000513.npy"}
{"epoch": 0.7755102040816326, "step": 514, "batch_size": 64, "mean": 14.38492488861084, "std": 18.40530014038086, "min": -36.9566764831543, "p10": -7.667620086669921, "median": 15.058372497558594, "p90": 40.23497314453125, "max": 58.28936767578125, "pos_frac": 0.78125, "sample": [3.042276382446289, 4.462287902832031, 17.675392150878906, 27.74773406982422, 20.867835998535156, 14.724203109741211, 40.64579772949219, -3.0373153686523438, 12.5465087890625, 2.028545379638672, 16.692745208740234, 30.378833770751953, 18.138267517089844, 19.447845458984375, 0.06527519226074219, 9.477569580078125, -22.15008544921875, 18.984071731567383, 2.6751022338867188, 14.852210998535156, -36.9566764831543, 2.7812347412109375, 26.777040481567383, 21.240827560424805, 51.96617889404297, -3.0602264404296875, -2.6973114013671875, 27.58626937866211, 24.97364044189453, 33.732826232910156, -3.9362411499023438, -10.402322769165039, 26.608505249023438, 16.63622283935547, 15.264533996582031, 17.821781158447266, 20.797149658203125, 37.546630859375, 1.2576637268066406, 11.96728515625, 4.212089538574219, -7.7761993408203125, -9.611408233642578, 6.669670104980469, 23.680496215820312, 22.22137451171875, 12.728073120117188, -2.5696182250976562, 14.432478904724121, 22.290504455566406, -7.414268493652344, 8.572479248046875, 8.92892074584961, 41.57463836669922, -13.611358642578125, 33.24492645263672, 43.28711700439453, 17.563796997070312, 42.163291931152344, -6.690385818481445, 58.28936767578125, 49.295982360839844, -9.293319702148438, 39.27638244628906], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000514.npy"}
{"epoch": 0.7770219198790628, "step": 515, "batch_size": 64, "mean": 14.396139144897461, "std": 18.014543533325195, "min": -17.981475830078125, "p10": -7.680984306335449, "median": 12.647506713867188, "p90": 38.54249420166016, "max": 54.058738708496094, "pos_frac": 0.78125, "sample": [3.052165985107422, 14.677129745483398, 10.675308227539062, 22.26683807373047, 46.6072998046875, 33.133522033691406, 16.843120574951172, 30.853168487548828, 5.800071716308594, 30.24072265625, 16.68798828125, 19.23542022705078, 25.938745498657227, -6.316144943237305, 19.978254318237305, 8.1080322265625, -13.326530456542969, 37.01972961425781, -13.224845886230469, 5.1992950439453125, 15.016242980957031, 0.9112434387207031, 5.3193206787109375, -2.0988616943359375, -15.900901794433594, 26.97466278076172, 54.058738708496094, 24.303916931152344, 11.852104187011719, -1.501190185546875, 13.442909240722656, 19.22479248046875, 2.717113494873047, 2.2275257110595703, 19.713790893554688, 37.261600494384766, -17.310745239257812, 23.182022094726562, 3.522531509399414, 16.773056030273438, -7.661922454833984, 7.791282653808594, 43.049591064453125, -17.981475830078125, 11.312164306640625, 52.93231964111328, 8.983938217163086, -7.689153671264648, 35.78949737548828, 37.4439697265625, 17.069236755371094, 11.340042114257812, 7.44976806640625, 43.10881042480469, 39.01329040527344, -5.330829620361328, 32.646888732910156, 1.6787147521972656, 43.98867416381836, -12.926910400390625, 7.2273406982421875, 25.075653076171875, -2.8489608764648438, -3.2482070922851562], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000515.npy"}
{"epoch": 0.7785336356764928, "step": 516, "batch_size": 64, "mean": 11.97529411315918, "std": 18.11087989807129, "min": -36.093711853027344, "p10": -5.454682159423827, "median": 12.959644317626953, "p90": 32.708530426025405, "max": 53.442962646484375, "pos_frac": 0.734375, "sample": [48.66239929199219, 53.442962646484375, 22.71957778930664, -7.432868957519531, -5.85552978515625, 29.635135650634766, -3.6576004028320312, 11.51156234741211, 17.598342895507812, 18.094890594482422, 6.77398681640625, 17.856191635131836, 17.764739990234375, 48.034034729003906, 9.73513412475586, 0.8934421539306641, 20.531280517578125, 26.670833587646484, 27.660110473632812, 18.159957885742188, -7.577789306640625, -36.093711853027344, 9.424560546875, 19.771621704101562, 50.74620819091797, 18.17046356201172, 20.981430053710938, 1.1929817199707031, 38.837276458740234, 22.875762939453125, -29.650131225585938, -4.040168762207031, 1.2149429321289062, -0.9771041870117188, 14.584815979003906, 30.01514434814453, 16.92186737060547, -1.0568370819091797, 7.814567565917969, -1.3545799255371094, 21.773345947265625, -30.754127502441406, 18.374195098876953, 23.001249313354492, 18.166282653808594, 4.764251708984375, 25.338424682617188, 8.789714813232422, 4.51861572265625, 13.633773803710938, 2.5556182861328125, 12.285514831542969, 5.5621795654296875, -3.5619583129882812, -16.611526489257812, 29.039146423339844, -4.054195404052734, 18.027799606323242, -3.68218994140625, -0.5199928283691406, 33.86283874511719, 2.7808303833007812, -4.519371032714844, 37.04850387573242], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000516.npy"}
{"epoch": 0.780045351473923, "step": 517, "batch_size": 64, "mean": 12.095857620239258, "std": 19.399662017822266, "min": -23.79590606689453, "p10": -7.958620834350586, "median": 7.973886489868164, "p90": 41.61677703857422, "max": 64.31343841552734, "pos_frac": 0.703125, "sample": [18.839313507080078, -5.595741271972656, -2.355548858642578, 17.15131378173828, -12.26556396484375, 12.670330047607422, 0.6901397705078125, 11.683843612670898, 49.55792236328125, 23.330097198486328, 12.735000610351562, -12.922918319702148, -6.915779113769531, 7.573394775390625, 5.6080322265625, -2.6193008422851562, 2.7498531341552734, 14.330207824707031, 19.975473403930664, -3.5261077880859375, 17.993999481201172, -16.375320434570312, 41.9254150390625, 37.657623291015625, -7.822540283203125, 10.357461929321289, 6.3609466552734375, 7.487571716308594, 6.821136474609375, 31.05874252319336, 24.926589965820312, 40.89662170410156, 31.809534072875977, 7.965545654296875, -7.1926422119140625, -15.142007827758789, 4.405487060546875, 42.73956298828125, 1.122457504272461, -5.251106262207031, 16.906578063964844, 33.11418151855469, 57.858551025390625, -2.5538711547851562, 14.41424560546875, 42.211647033691406, 13.635692596435547, 15.29696273803711, -13.227394104003906, 2.3120193481445312, -6.826276779174805, 18.308975219726562, -5.081321716308594, 64.31343841552734, 56.49549865722656, 32.229087829589844, 6.764007568359375, -8.01694107055664, 6.190418243408203, 7.982227325439453, 22.771791458129883, -23.79590606689453, 15.151634216308594, -4.7594451904296875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000517.npy"}
{"epoch": 0.781557067271353, "step": 518, "batch_size": 64, "mean": 11.2738676071167, "std": 18.752334594726562, "min": -20.561752319335938, "p10": -10.269843101501463, "median": 9.828460693359375, "p90": 35.67737503051759, "max": 68.00520324707031, "pos_frac": 0.671875, "sample": [12.639762878417969, 0.6035614013671875, 2.5340652465820312, 31.64659881591797, -2.31268310546875, -1.6739425659179688, 7.584861755371094, 21.660404205322266, -20.351581573486328, 12.357902526855469, 5.541339874267578, -5.583732604980469, 9.666389465332031, 33.922264099121094, 15.53759765625, -12.7313232421875, 19.048194885253906, 43.32771301269531, -15.740701675415039, 21.95294189453125, 39.25054931640625, -20.561752319335938, 3.6144027709960938, 11.510807037353516, 21.64593505859375, 13.798038482666016, -5.490379333496094, 13.898826599121094, -16.608489990234375, 32.59685516357422, -0.005825042724609375, -8.322809219360352, -1.0881481170654297, 23.75813102722168, 28.914535522460938, -2.3872528076171875, 16.543771743774414, -11.104286193847656, 15.681663513183594, 68.00520324707031, 14.12518310546875, -3.186798095703125, 5.893310546875, 61.08843994140625, 36.4295654296875, 27.019134521484375, -3.3305816650390625, 43.1864013671875, 9.769088745117188, -15.618034362792969, -5.7408599853515625, -1.6289691925048828, 9.887832641601562, 0.6787662506103516, 9.477561950683594, 25.42571258544922, 14.800933837890625, 12.164100646972656, -0.6274852752685547, -2.2702407836914062, 11.53518295288086, 1.4261665344238281, 52.522064208984375, 15.221611022949219], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000518.npy"}
{"epoch": 0.783068783068783, "step": 519, "batch_size": 64, "mean": 7.926634788513184, "std": 21.6678524017334, "min": -54.6053466796875, "p10": -16.509224700927735, "median": 6.448005676269531, "p90": 39.01859283447267, "max": 50.38050079345703, "pos_frac": 0.65625, "sample": [-23.652387619018555, 26.714004516601562, -10.516815185546875, -35.34185028076172, -7.323848724365234, 21.092063903808594, -2.627643585205078, 50.38050079345703, 10.647735595703125, 15.842926025390625, 1.1420745849609375, 5.58616828918457, 27.730140686035156, 11.398506164550781, 10.069313049316406, 4.343294143676758, 29.273296356201172, 4.053213119506836, -17.953950881958008, -24.4578857421875, 31.600740432739258, 43.42765808105469, -16.591949462890625, 14.835418701171875, 26.88641357421875, 47.10468673706055, -12.491527557373047, 16.71210479736328, 20.58868408203125, 7.8953857421875, -8.653345108032227, 35.62727355957031, 6.75250244140625, -7.8468017578125, 13.294296264648438, -12.078598022460938, 4.3798065185546875, 8.52933120727539, 43.60699462890625, -10.66455078125, 1.4764404296875, 10.071699142456055, 30.163238525390625, 6.826395034790039, 6.1435089111328125, -54.6053466796875, 4.0200042724609375, -10.670272827148438, 21.86577606201172, -10.889802932739258, 40.472015380859375, -13.662887573242188, -14.934131622314453, 45.530120849609375, 4.0381622314453125, 13.073043823242188, 28.539382934570312, 35.081520080566406, 3.408641815185547, -16.316200256347656, -0.7000503540039062, -1.003091812133789, 47.953208923339844, -17.890125274658203], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000519.npy"}
{"epoch": 0.7845804988662132, "step": 520, "batch_size": 64, "mean": 11.486869812011719, "std": 15.860755920410156, "min": -19.287818908691406, "p10": -7.320570373535155, "median": 9.51639175415039, "p90": 37.51914710998536, "max": 43.8303108215332, "pos_frac": 0.8125, "sample": [14.434713363647461, -2.7079715728759766, -2.642454147338867, 4.875007629394531, 35.54684829711914, -4.014923095703125, 17.41909408569336, 0.7922134399414062, 7.752960205078125, 34.97026062011719, 23.34760284423828, 14.152505874633789, 0.02172088623046875, 2.9467315673828125, 19.098770141601562, 9.61529541015625, 0.07695770263671875, -8.124961853027344, 2.2919158935546875, 43.8303108215332, 24.201507568359375, 12.391803741455078, 13.308309555053711, 41.51080322265625, 9.6270751953125, 15.622848510742188, 39.85289001464844, -19.287818908691406, -15.416465759277344, 34.04730224609375, 14.252647399902344, 5.805385589599609, 3.8923568725585938, 14.344619750976562, 8.183174133300781, 3.381397247314453, 41.79680633544922, 16.06726837158203, -5.443656921386719, -15.962310791015625, -15.528778076171875, 5.4027862548828125, 4.576812744140625, -15.741912841796875, 8.473091125488281, 6.782127380371094, 6.007301330566406, 10.787403106689453, 15.459259033203125, 10.855644226074219, 6.140754699707031, 40.89924621582031, 38.364418029785156, 8.054040908813477, 42.52685546875, 24.2858943939209, 22.399444580078125, 15.070697784423828, 0.8010025024414062, -2.7532825469970703, 9.417488098144531, 23.990392684936523, 26.147253036499023, -13.116806030273438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000520.npy"}
{"epoch": 0.7860922146636432, "step": 521, "batch_size": 64, "mean": 13.880088806152344, "std": 17.458477020263672, "min": -28.530685424804688, "p10": -3.292343330383301, "median": 12.227657318115234, "p90": 37.9105567932129, "max": 60.447509765625, "pos_frac": 0.8125, "sample": [19.834182739257812, 8.082489013671875, 38.391998291015625, 11.988800048828125, 36.69878387451172, 6.663734436035156, 43.528587341308594, -0.33885955810546875, 1.3915252685546875, 25.57672119140625, 21.20073699951172, -1.8932456970214844, 21.368820190429688, 13.042457580566406, 30.651138305664062, 19.320938110351562, 17.18675422668457, 9.303115844726562, 5.025115966796875, 48.58308029174805, 1.7392959594726562, 31.750595092773438, 5.6229248046875, 0.39078521728515625, -0.6673908233642578, 23.342422485351562, -14.792922973632812, 24.98686981201172, -0.0981597900390625, 18.702713012695312, 60.447509765625, 54.527374267578125, -17.332839965820312, 6.113319396972656, 1.225738525390625, -3.2501754760742188, 1.8747406005859375, 18.37710952758789, 4.463279724121094, 40.71611785888672, 29.742950439453125, 1.9659233093261719, -9.308216094970703, 12.466514587402344, -3.310415267944336, 14.173612594604492, 22.108413696289062, 23.058486938476562, 15.000232696533203, 36.787193298339844, 9.734264373779297, 11.150833129882812, -28.530685424804688, 45.766754150390625, -17.765487670898438, 5.512697219848633, 7.712717056274414, 3.78204345703125, 19.879974365234375, -3.3805465698242188, 13.113685607910156, 6.110176086425781, 15.962936401367188, 22.845428466796875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000521.npy"}
{"epoch": 0.7876039304610734, "step": 522, "batch_size": 64, "mean": 12.680675506591797, "std": 22.77235221862793, "min": -52.365966796875, "p10": -15.136979866027831, "median": 10.278438568115234, "p90": 39.75074577331544, "max": 63.982513427734375, "pos_frac": 0.75, "sample": [-7.704620361328125, 23.002838134765625, 3.622844696044922, 13.381420135498047, 13.615997314453125, 34.04723358154297, 10.535202026367188, 37.14192199707031, 36.21296691894531, 8.563833236694336, -28.254924774169922, -15.425031661987305, 12.651138305664062, 52.530250549316406, 6.902191162109375, 33.471824645996094, -16.96747589111328, 15.342472076416016, 26.129337310791016, 3.866365432739258, 37.67191696166992, 6.003887176513672, 8.187294006347656, -52.365966796875, 51.0233154296875, 2.463459014892578, 14.643321990966797, -7.6857147216796875, -14.464859008789062, 1.907571792602539, -3.6864700317382812, 47.297340393066406, 0.9780807495117188, 37.77544021606445, 31.86753273010254, 13.903228759765625, -4.177005767822266, 15.548904418945312, -5.65777587890625, 10.021675109863281, 16.9013614654541, 9.16696548461914, 5.822074890136719, 8.581581115722656, 20.060535430908203, -2.4097728729248047, 45.54173278808594, -23.529857635498047, 61.861663818359375, -4.094343185424805, 32.298118591308594, -25.310379028320312, 17.430213928222656, 63.982513427734375, 6.4469146728515625, 5.027748107910156, -28.855308532714844, 40.59730529785156, 27.919532775878906, 27.394432067871094, 5.34715461730957, -12.452945709228516, 29.115829467773438, 30.799232482910156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000522.npy"}
{"epoch": 0.7891156462585034, "step": 523, "batch_size": 64, "mean": 14.946388244628906, "std": 19.889509201049805, "min": -39.0045280456543, "p10": -11.031246948242186, "median": 15.686298370361328, "p90": 39.6226692199707, "max": 59.47963333129883, "pos_frac": 0.796875, "sample": [-3.0119285583496094, 43.77033996582031, 10.902698516845703, 9.211021423339844, 31.705673217773438, 28.909265518188477, 4.336181640625, 30.218284606933594, 31.626571655273438, 8.812698364257812, 6.017646789550781, 32.56072998046875, 37.621437072753906, -17.191082000732422, 32.24840545654297, 18.649734497070312, 16.329429626464844, 1.2521629333496094, 13.100875854492188, -23.99217987060547, 12.830345153808594, 26.297348022460938, -5.509376525878906, 6.0689544677734375, -23.461517333984375, 21.616653442382812, -6.4233551025390625, -0.6260566711425781, 18.235397338867188, 29.848861694335938, 17.831953048706055, 22.1212158203125, 29.58953857421875, 19.54793357849121, 24.44023895263672, 40.961666107177734, 42.20615005493164, 11.176254272460938, 16.81110382080078, -25.69927978515625, 13.465606689453125, -10.156417846679688, 59.47963333129883, 24.639320373535156, 38.94129180908203, 8.113243103027344, 4.969020843505859, 8.9759521484375, 45.1724853515625, -21.200233459472656, 48.71638488769531, 15.043167114257812, 20.402244567871094, -9.759651184082031, 39.91468811035156, 21.606834411621094, 12.933042526245117, 14.2056884765625, 35.78441619873047, -39.0045280456543, -11.406173706054688, 14.902894973754883, 3.07806396484375, 26.809844970703125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000523.npy"}
{"epoch": 0.7906273620559335, "step": 524, "batch_size": 64, "mean": 10.720426559448242, "std": 16.55256462097168, "min": -43.57410430908203, "p10": -5.24561767578125, "median": 10.364910125732422, "p90": 30.42152328491212, "max": 54.22987365722656, "pos_frac": 0.703125, "sample": [27.096946716308594, 10.322189331054688, 52.59807586669922, 9.279048919677734, 6.1451416015625, -9.406005859375, -3.6296615600585938, -0.0035247802734375, 23.314407348632812, 5.786540985107422, 26.462081909179688, -4.007568359375, -15.975286483764648, 11.050788879394531, 4.235603332519531, 26.423004150390625, -21.815811157226562, -10.154083251953125, 9.083576202392578, 32.058372497558594, 20.449134826660156, -43.57410430908203, 25.208328247070312, 3.3857421875, -4.276702880859375, 11.141866683959961, 54.22987365722656, 15.783988952636719, 19.021512985229492, -4.92218017578125, -2.6549758911132812, 18.634056091308594, -2.2879791259765625, 8.605274200439453, 25.750244140625, 17.871288299560547, 21.00324821472168, 13.998794555664062, 13.764572143554688, -5.965740203857422, -5.246002197265625, -5.244720458984375, 2.8046188354492188, 33.58882141113281, 10.615045547485352, 3.8995285034179688, 35.84928894042969, -2.0617103576660156, 18.43718719482422, 21.234130859375, 8.132499694824219, 10.407630920410156, 17.12805938720703, 35.92820739746094, 11.896709442138672, 16.751981735229492, 27.556987762451172, 10.117347717285156, -3.2662811279296875, 0.08126068115234375, 31.649181365966797, -1.8797454833984375, 23.737701416015625, -0.04051399230957031], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000524.npy"}
{"epoch": 0.7921390778533636, "step": 525, "batch_size": 64, "mean": 9.85859203338623, "std": 16.185781478881836, "min": -19.132583618164062, "p10": -9.065779495239257, "median": 8.431199073791504, "p90": 32.245018386840826, "max": 52.875736236572266, "pos_frac": 0.75, "sample": [13.916595458984375, -18.72412109375, -12.614616394042969, -7.217617034912109, -10.270843505859375, 8.06781005859375, 11.51446533203125, 0.57794189453125, 34.06874084472656, -6.967437744140625, 11.050212860107422, 6.582237243652344, 8.794588088989258, 7.2094879150390625, 5.009147644042969, 26.04937744140625, 7.553552627563477, -19.132583618164062, 20.466690063476562, 10.010284423828125, 8.91326904296875, -3.250843048095703, 41.017723083496094, 32.937950134277344, 0.3295135498046875, 17.693458557128906, 1.4813766479492188, 10.646636962890625, 11.886978149414062, 21.357696533203125, 2.4696197509765625, 28.533035278320312, 6.516881942749023, 20.210437774658203, 16.628604888916016, -4.687164306640625, 6.884498596191406, 4.688018798828125, 51.99072265625, -5.144805908203125, -9.85784912109375, 9.833477020263672, 15.952131271362305, -18.030113220214844, 3.238971710205078, 40.91603088378906, 2.453279495239258, 1.8187026977539062, 25.88690185546875, 30.628177642822266, -0.5989303588867188, 39.606719970703125, 9.299430847167969, 2.124053955078125, 17.106117248535156, 30.048492431640625, -4.048431396484375, 13.350997924804688, 12.83481216430664, 11.2772216796875, -2.4114837646484375, 52.875736236572266, -3.180025100708008, -17.2220458984375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000525.npy"}
{"epoch": 0.7936507936507936, "step": 526, "batch_size": 64, "mean": 10.031301498413086, "std": 17.01561737060547, "min": -34.65260314941406, "p10": -10.085727882385253, "median": 8.581008911132812, "p90": 34.4809066772461, "max": 47.149566650390625, "pos_frac": 0.765625, "sample": [43.40773010253906, 16.834880828857422, -0.8616275787353516, 7.993997573852539, -13.68096923828125, 11.216182708740234, 5.035835266113281, 33.470245361328125, -26.304412841796875, -10.602996826171875, 8.322296142578125, 15.975475311279297, -18.091426849365234, 6.359893798828125, 16.385841369628906, -4.655178070068359, 13.250324249267578, 47.149566650390625, 6.113983154296875, 23.381086349487305, 12.623855590820312, 23.962738037109375, 10.520751953125, 34.91404724121094, 44.995361328125, 5.988750457763672, 6.071784973144531, 6.5138092041015625, 4.052791595458984, 13.181838989257812, -1.968069076538086, 3.7160778045654297, 27.54962921142578, -8.67735481262207, 21.826908111572266, -4.99394416809082, 18.43889617919922, 0.6373977661132812, 21.678977966308594, -2.2425537109375, 41.292686462402344, 16.66433334350586, 7.48260498046875, 9.399980545043945, 17.608638763427734, -24.39432144165039, -8.878767013549805, 3.0444107055664062, 31.60572052001953, -2.126922607421875, 13.041107177734375, 1.01123046875, 38.155670166015625, 10.72756576538086, 40.01142120361328, 5.3760528564453125, -13.781307220458984, 17.70522689819336, 10.991127014160156, 6.321037292480469, 8.8397216796875, 5.082275390625, 21.984024047851562, -34.65260314941406], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000526.npy"}
{"epoch": 0.7951625094482238, "step": 527, "batch_size": 64, "mean": 8.345399856567383, "std": 18.370363235473633, "min": -38.24043655395508, "p10": -11.38525581359863, "median": 6.277767181396484, "p90": 35.99705867767336, "max": 59.00883865356445, "pos_frac": 0.625, "sample": [-4.55072021484375, 8.39306640625, 6.751834869384766, 22.93354034423828, 8.071060180664062, 3.6030101776123047, 41.116180419921875, 9.95419692993164, 31.78261375427246, 31.230255126953125, 41.48093795776367, -12.527862548828125, 20.815120697021484, -4.528083801269531, 16.598922729492188, -7.985607147216797, 39.7593994140625, -1.193380355834961, 6.756061553955078, -7.206499099731445, -8.578132629394531, -15.566917419433594, -16.115188598632812, -6.37176513671875, 20.888240814208984, 4.275154113769531, 46.32537078857422, -5.54608154296875, 21.264141082763672, 2.6388397216796875, 7.818031311035156, 3.6964797973632812, -4.51947021484375, 16.93905258178711, 5.803699493408203, 59.00883865356445, 20.331642150878906, -0.5530853271484375, -4.664892196655273, 22.677066802978516, 22.080039978027344, -0.8376388549804688, 23.428436279296875, -8.282337188720703, 4.11981201171875, -3.361337661743164, -8.719173431396484, 37.80324935913086, 3.7711181640625, -6.137632369995117, -13.913421630859375, 7.710447311401367, 14.4361572265625, 14.978109359741211, -14.162849426269531, -3.3637237548828125, 7.507598876953125, 16.72509002685547, 27.339622497558594, 42.03325653076172, -38.24043655395508, -24.600448608398438, 11.212760925292969, 1.5738601684570312], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000527.npy"}
{"epoch": 0.7966742252456538, "step": 528, "batch_size": 64, "mean": 8.864951133728027, "std": 18.634428024291992, "min": -36.834014892578125, "p10": -13.339885139465329, "median": 10.711821556091309, "p90": 31.542060089111338, "max": 51.63264465332031, "pos_frac": 0.703125, "sample": [38.14057922363281, -36.834014892578125, 44.08307647705078, -0.20446205139160156, 20.27545928955078, 8.563133239746094, 22.365177154541016, 0.6914138793945312, 1.8201332092285156, 10.72637939453125, -6.057348251342773, -7.618133544921875, 19.20012664794922, -33.691017150878906, -7.163064956665039, -15.008949279785156, 35.213260650634766, 4.131010055541992, 10.697263717651367, 2.0757675170898438, -4.817375183105469, 35.562042236328125, 2.483957290649414, -32.56706237792969, -0.9222526550292969, 13.712501525878906, -3.6924209594726562, -5.7535552978515625, 12.866472244262695, 28.57720947265625, 22.581735610961914, -1.354217529296875, 18.146724700927734, 43.361534118652344, 7.6500396728515625, -2.791353225708008, 51.63264465332031, 27.48668670654297, 29.266082763671875, 6.754058837890625, 19.807296752929688, 21.467979431152344, 1.8570308685302734, 21.11236000061035, -18.862913131713867, 12.40765380859375, 13.867488861083984, 0.4792938232421875, 28.19971466064453, 4.487585067749023, 13.909744262695312, -28.682374954223633, -20.325729370117188, 24.541656494140625, 19.26495361328125, 11.50753402709961, -2.710437774658203, 4.224388122558594, 32.517478942871094, -9.445402145385742, 11.588935852050781, 11.617256164550781, 22.655174255371094, 12.280960083007812], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000528.npy"}
{"epoch": 0.7981859410430839, "step": 529, "batch_size": 64, "mean": 19.338348388671875, "std": 19.300355911254883, "min": -34.10463333129883, "p10": -4.816997909545898, "median": 19.239049911499023, "p90": 43.27121047973633, "max": 57.482574462890625, "pos_frac": 0.8125, "sample": [34.77664566040039, -5.033397674560547, 21.182937622070312, 7.3144683837890625, 31.69607925415039, 4.208808898925781, 54.955718994140625, 50.449073791503906, 3.898937225341797, 2.6074371337890625, -4.312065124511719, -7.01231575012207, 11.612548828125, 11.587615966796875, 16.47760772705078, 12.66839599609375, -12.432571411132812, 35.926963806152344, -2.3172683715820312, 23.369300842285156, 37.519927978515625, 3.4805145263671875, -9.216506958007812, 19.325279235839844, 27.687667846679688, 39.766448974609375, -7.482337951660156, 0.03780364990234375, 19.152820587158203, 9.84738540649414, 41.536415100097656, 29.688282012939453, 36.53022003173828, 51.511940002441406, 21.394996643066406, 36.86064910888672, 36.65949249267578, 50.145721435546875, -1.196685791015625, 33.18696594238281, 17.9501953125, 25.185646057128906, 57.482574462890625, 10.958984375, 14.938377380371094, 29.57830047607422, 15.4193115234375, -1.562917709350586, 24.961959838867188, 34.31951904296875, 5.812202453613281, 29.7845458984375, 4.511556625366211, 40.92677307128906, -10.913684844970703, 49.1358642578125, 23.350051879882812, 32.68373107910156, 17.117889404296875, 25.284210205078125, -34.10463333129883, 14.936286926269531, -2.179096221923828, 44.01469421386719], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000529.npy"}
{"epoch": 0.799697656840514, "step": 530, "batch_size": 64, "mean": 11.604594230651855, "std": 15.85655689239502, "min": -20.949918746948242, "p10": -5.574937438964843, "median": 8.600675582885742, "p90": 28.586902618408207, "max": 52.265380859375, "pos_frac": 0.71875, "sample": [52.265380859375, 16.196941375732422, 18.14653778076172, -1.0642509460449219, 11.135734558105469, -5.217193603515625, 47.22850036621094, -2.8588409423828125, 22.285934448242188, 19.78963851928711, 6.240360260009766, 22.87664031982422, -2.7052669525146484, 8.61916732788086, 20.02631187438965, 23.583818435668945, 4.94007682800293, 11.081733703613281, 47.16455078125, -11.168060302734375, 0.07929039001464844, -7.332603454589844, -1.3828277587890625, 18.315181732177734, 27.468276977539062, 0.6361179351806641, 28.80469512939453, 18.994932174682617, 8.582183837890625, 45.994972229003906, 11.725341796875, 8.481880187988281, 27.753143310546875, 36.89459228515625, -5.3775177001953125, 2.075620651245117, 13.811386108398438, 20.8769474029541, -12.121845245361328, 7.685478210449219, 15.118804931640625, 23.190078735351562, 2.2328567504882812, 7.126251220703125, -3.4692344665527344, 4.855064392089844, 28.001483917236328, -5.255439758300781, -20.949918746948242, -1.8214874267578125, -6.623237609863281, -7.742099761962891, 27.130640029907227, -0.9063835144042969, 38.14769744873047, 4.7513275146484375, 23.244670867919922, -5.009712219238281, 7.5486297607421875, -5.6595458984375, 28.078720092773438, 3.628753662109375, 8.85189437866211, 17.69123077392578], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000530.npy"}
{"epoch": 0.8012093726379441, "step": 531, "batch_size": 64, "mean": 10.26496696472168, "std": 18.273088455200195, "min": -29.841163635253906, "p10": -9.675741577148438, "median": 6.537937164306641, "p90": 35.70101318359375, "max": 56.80995178222656, "pos_frac": 0.765625, "sample": [-22.970535278320312, 30.517013549804688, 30.2584228515625, 1.9837417602539062, 7.265268325805664, 3.200714111328125, 0.4251060485839844, 33.20637512207031, 8.352264404296875, 5.642978668212891, 6.7161407470703125, 4.914030075073242, 36.54484558105469, -4.839813232421875, 41.18762969970703, 31.74883270263672, -13.929454803466797, -10.535140991210938, 4.416114807128906, 47.44398880004883, 0.9951324462890625, 21.88232421875, -9.841529846191406, -9.288902282714844, -4.254508972167969, 9.346456527709961, 7.751243591308594, 8.918601989746094, 16.920082092285156, 2.8173179626464844, 6.359733581542969, -3.4490623474121094, -0.4731597900390625, 0.7232284545898438, -29.841163635253906, -2.4847412109375, 3.1007080078125, 19.53697967529297, 17.632034301757812, -29.814605712890625, 22.030319213867188, 13.486953735351562, 34.96916198730469, 33.213722229003906, -20.620315551757812, 0.19655609130859375, 47.06888961791992, 36.01466369628906, 13.624778747558594, 7.07049560546875, -0.48827362060546875, 4.330463409423828, 10.583854675292969, 25.109100341796875, 6.748626708984375, 1.1087455749511719, 23.338516235351562, -0.6515846252441406, 30.138042449951172, 56.80995178222656, 4.392019271850586, 2.1956539154052734, 36.52264404296875, 1.6801681518554688], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000531.npy"}
{"epoch": 0.8027210884353742, "step": 532, "batch_size": 64, "mean": 13.579508781433105, "std": 19.39522933959961, "min": -23.085166931152344, "p10": -10.845379638671874, "median": 12.856029510498047, "p90": 42.942562866210956, "max": 64.38003540039062, "pos_frac": 0.75, "sample": [19.395187377929688, 20.267526626586914, 30.094993591308594, 48.88914489746094, -10.212265014648438, -1.118896484375, 10.183555603027344, 59.045196533203125, 2.469951629638672, 20.327468872070312, -11.116714477539062, 23.007122039794922, 37.78839874267578, 3.659637451171875, 31.479705810546875, 9.081016540527344, 15.15005874633789, 17.35198974609375, 4.243940353393555, -16.562515258789062, -2.3270206451416016, 4.261251449584961, 27.0587100982666, 8.088541030883789, 11.024398803710938, 17.809730529785156, 12.533939361572266, 38.410400390625, 21.63005828857422, 27.55925750732422, 26.279022216796875, 4.493206024169922, 25.799789428710938, 64.38003540039062, 7.730857849121094, 18.499801635742188, 16.85303497314453, -12.913810729980469, -0.5201416015625, 3.5516719818115234, -7.791749954223633, -13.98724365234375, 0.56817626953125, -5.118459701538086, 49.19501495361328, 7.853446960449219, 46.677040100097656, -2.7503204345703125, 15.422889709472656, 13.259796142578125, 13.178119659423828, 15.927162170410156, -1.9062080383300781, 46.788509368896484, -14.502601623535156, 27.688716888427734, -23.040771484375, 3.77056884765625, 44.884918212890625, -23.085166931152344, 2.8403263092041016, -6.464221954345703, 25.962249755859375, 20.091129302978516], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000532.npy"}
{"epoch": 0.8042328042328042, "step": 533, "batch_size": 64, "mean": 12.413052558898926, "std": 18.289213180541992, "min": -43.235328674316406, "p10": -7.243902587890624, "median": 9.077582359313965, "p90": 37.06760559082032, "max": 54.82605743408203, "pos_frac": 0.75, "sample": [9.794151306152344, 31.48186683654785, -2.216339111328125, 19.367828369140625, 18.43569564819336, 27.085256576538086, 2.1675949096679688, 1.0126800537109375, 29.778987884521484, 21.729116439819336, 5.351318359375, 4.234027862548828, 6.446739196777344, -4.745670318603516, 28.941287994384766, 35.4697265625, 5.974908828735352, 31.04595184326172, -2.1671600341796875, -7.778144836425781, 1.5562210083007812, 8.361013412475586, 16.48242950439453, 4.4900054931640625, -13.485267639160156, 16.574743270874023, 29.730976104736328, -2.7860565185546875, 6.507389068603516, -23.45851707458496, 2.1486358642578125, 0.7707633972167969, 37.752410888671875, 31.00322723388672, -8.594209671020508, 34.19334411621094, -8.206451416015625, 46.37712097167969, 7.414703369140625, -4.1837158203125, 11.118537902832031, 16.513778686523438, 15.29543685913086, 21.21605682373047, 38.03868865966797, 54.82605743408203, 8.10162353515625, -12.1290283203125, 6.1169891357421875, -3.1256484985351562, -3.2967376708984375, 33.570098876953125, 17.42369842529297, -0.878753662109375, 3.426839828491211, -5.997337341308594, 14.207267761230469, 14.50299072265625, 14.472808837890625, 50.55995559692383, 38.38017272949219, 40.17518615722656, -43.235328674316406, 21.093421936035156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000533.npy"}
{"epoch": 0.8057445200302343, "step": 534, "batch_size": 64, "mean": 12.566951751708984, "std": 18.786352157592773, "min": -38.197120666503906, "p10": -8.974316596984863, "median": 12.916175842285156, "p90": 35.42811813354493, "max": 60.596649169921875, "pos_frac": 0.75, "sample": [5.184181213378906, -9.147588729858398, 10.172637939453125, 14.834423065185547, 2.782613754272461, 21.56505584716797, 6.2005462646484375, -38.197120666503906, 12.797351837158203, -15.100784301757812, -2.537494659423828, 6.00714111328125, 33.12443161010742, 17.623779296875, -1.9509696960449219, 60.596649169921875, 19.92731475830078, 17.523975372314453, 25.771888732910156, -16.038803100585938, 33.40452575683594, 36.10748291015625, 13.03499984741211, 16.849472045898438, -9.841514587402344, 15.817901611328125, -31.158065795898438, 8.68809700012207, 37.02252960205078, 18.285125732421875, 8.830886840820312, -6.838344573974609, 33.842933654785156, 42.44853973388672, 42.89746856689453, 26.32171630859375, -3.5532379150390625, 6.1951751708984375, 51.291748046875, 29.179182052612305, 28.30091094970703, 30.982421875, -8.570014953613281, 8.972900390625, 22.65489387512207, 18.372913360595703, -3.6895828247070312, 17.25152587890625, 2.0748252868652344, -3.1123809814453125, 7.3166046142578125, 36.107845306396484, 17.831897735595703, -7.077976226806641, 0.16564178466796875, 5.940925598144531, 24.597320556640625, -12.766563415527344, 14.269309997558594, 0.5126876831054688, 31.743568420410156, 8.29840087890625, 32.630340576171875, -8.489418029785156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000534.npy"}
{"epoch": 0.8072562358276644, "step": 535, "batch_size": 64, "mean": 11.72776985168457, "std": 20.54655647277832, "min": -46.933753967285156, "p10": -13.162273406982422, "median": 11.404598236083984, "p90": 37.0302906036377, "max": 66.01333618164062, "pos_frac": 0.75, "sample": [4.451221466064453, -11.16641616821289, 36.12689208984375, 22.754100799560547, 5.4936981201171875, 19.857067108154297, -25.805404663085938, -5.6043548583984375, -12.606986999511719, 23.0638427734375, 25.05670166015625, -15.634664535522461, 13.497541427612305, 22.459014892578125, -5.596702575683594, 7.895866394042969, -12.342636108398438, 9.138381958007812, -0.21221923828125, -1.5402450561523438, 16.990814208984375, 17.420867919921875, 21.33473777770996, 5.0518798828125, 18.694721221923828, 0.9649276733398438, 49.79179382324219, -17.338088989257812, -14.814292907714844, 8.477710723876953, 25.293548583984375, 54.02703857421875, 19.638931274414062, -2.2804412841796875, 13.066858291625977, 16.757028579711914, 4.6013641357421875, 66.01333618164062, -19.922203063964844, 19.875350952148438, 11.418678283691406, 0.9069137573242188, 64.77550506591797, 15.36337661743164, 3.4819717407226562, -46.933753967285156, -4.49658203125, 23.74616241455078, 3.160778045654297, 23.958023071289062, 29.724143981933594, -13.400253295898438, 5.508327484130859, 42.68504333496094, 19.5631103515625, 12.352413177490234, 11.390518188476562, 27.509628295898438, 37.41746139526367, 6.482666015625, 0.5360145568847656, 5.136634826660156, 39.282440185546875, 28.077499389648438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000535.npy"}
{"epoch": 0.8087679516250945, "step": 536, "batch_size": 64, "mean": 14.10908317565918, "std": 15.841023445129395, "min": -10.113235473632812, "p10": -4.584517860412597, "median": 11.101247787475586, "p90": 36.49355773925782, "max": 56.011932373046875, "pos_frac": 0.8125, "sample": [15.22872543334961, 1.16748046875, -8.814811706542969, 49.228240966796875, 37.416290283203125, 12.122100830078125, 2.953094482421875, -4.9914703369140625, 7.08258056640625, 4.020509719848633, -10.113235473632812, 8.272811889648438, 34.631080627441406, 23.784523010253906, 14.351268768310547, 19.619911193847656, 17.3076171875, 38.830528259277344, 56.011932373046875, 14.493783950805664, 15.37643814086914, 0.2338886260986328, 32.12330627441406, 20.499923706054688, -3.5931396484375, 47.6866455078125, 10.584068298339844, -3.780118942260742, 20.657638549804688, 11.972522735595703, 36.91002655029297, -7.251373291015625, 1.8545951843261719, 11.40420150756836, -3.3666229248046875, 4.9769744873046875, 25.518301010131836, 1.4322128295898438, 3.5007171630859375, 2.7893409729003906, 25.72283172607422, 30.91746711730957, 28.02186393737793, -5.242511749267578, 3.6460113525390625, 11.025978088378906, 8.393867492675781, 19.81662940979004, 11.176517486572266, 31.87725830078125, -5.848470687866211, 33.048240661621094, 6.840873718261719, 35.52179718017578, 24.542892456054688, 9.005428314208984, 6.382518768310547, 44.43516540527344, -2.77337646484375, -4.92926025390625, 5.92772102355957, 19.457815170288086, -3.1356048583984375, 7.017173767089844], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000536.npy"}
{"epoch": 0.8102796674225246, "step": 537, "batch_size": 64, "mean": 12.6707124710083, "std": 19.290462493896484, "min": -32.353111267089844, "p10": -11.998162460327148, "median": 11.317827224731445, "p90": 39.718339920043945, "max": 57.20002746582031, "pos_frac": 0.71875, "sample": [11.554088592529297, 18.89673614501953, 10.307823181152344, 43.36444091796875, 37.34808349609375, 39.548763275146484, 23.872821807861328, -0.3328266143798828, 17.234664916992188, 14.701330184936523, 5.354166030883789, 12.501359939575195, 37.203453063964844, 29.853424072265625, 35.841033935546875, 28.86492919921875, 29.045106887817383, 10.36646842956543, -1.4378032684326172, -8.82181167602539, -5.2366943359375, 6.044403076171875, 12.927688598632812, 3.5577430725097656, -3.509096145629883, 34.71783447265625, 10.48919677734375, 3.9569320678710938, -32.353111267089844, -4.037946701049805, 19.643966674804688, 6.337337493896484, 8.004369735717773, -6.9149169921875, -10.776374816894531, -1.1590995788574219, 39.791015625, 23.304637908935547, 16.526832580566406, 3.358348846435547, 18.001964569091797, 39.96607971191406, 0.02630615234375, 8.061553955078125, -6.293569564819336, -4.953351974487305, 19.927764892578125, -12.773597717285156, 6.9229888916015625, 27.698348999023438, 11.081565856933594, 24.423139572143555, -22.402000427246094, 42.36976623535156, 46.9766845703125, 57.20002746582031, -12.521785736083984, 22.251358032226562, -13.423852920532227, 15.544563293457031, 42.21061706542969, 22.89038848876953, -26.355960845947266, -15.842727661132812], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000537.npy"}
{"epoch": 0.8117913832199547, "step": 538, "batch_size": 64, "mean": 15.42237377166748, "std": 17.996015548706055, "min": -25.396316528320312, "p10": -6.9209501266479485, "median": 15.629661560058594, "p90": 37.01968002319337, "max": 67.93342590332031, "pos_frac": 0.8125, "sample": [-3.594329833984375, 29.759958267211914, 8.610298156738281, 6.779010772705078, 20.007938385009766, -25.396316528320312, 18.737049102783203, -2.512786865234375, 15.029407501220703, 20.54314613342285, 12.723381042480469, 35.2435417175293, 1.4821929931640625, 28.537582397460938, 21.235567092895508, 17.1964111328125, 34.560081481933594, 45.54862976074219, 15.8626708984375, 22.083934783935547, 6.6249542236328125, -1.141632080078125, 45.83222198486328, 21.04541015625, 8.277667999267578, -8.899696350097656, 11.50927734375, 13.389244079589844, 51.79230499267578, 25.628097534179688, -2.363494873046875, 17.256362915039062, -12.475616455078125, 1.223907470703125, -11.731742858886719, 9.131114959716797, 12.91180419921875, 17.257238388061523, 3.4054412841796875, 11.915451049804688, 35.771827697753906, -16.459060668945312, 37.554473876953125, 19.755111694335938, -7.342157363891602, 21.237293243408203, -5.938133239746094, 32.989227294921875, 15.396652221679688, 67.93342590332031, 20.945817947387695, 19.64636993408203, 46.0975341796875, -8.593246459960938, 18.847000122070312, 6.038215637207031, 55.45148849487305, 2.8764610290527344, 22.021419525146484, 1.621795654296875, 5.5983428955078125, 25.150693893432617, 27.12505340576172, 0.2806282043457031], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000538.npy"}
{"epoch": 0.8133030990173847, "step": 539, "batch_size": 64, "mean": 12.703271865844727, "std": 16.918540954589844, "min": -17.780250549316406, "p10": -6.71562042236328, "median": 9.131767272949219, "p90": 34.54718017578125, "max": 54.841949462890625, "pos_frac": 0.78125, "sample": [12.132198333740234, 24.922855377197266, 3.0514087677001953, 54.841949462890625, 24.06941795349121, -4.980926513671875, 31.502670288085938, 22.574188232421875, 27.2611083984375, 8.896995544433594, 34.579246520996094, 2.227447509765625, 4.9067230224609375, 9.366539001464844, -1.05657958984375, -2.8333187103271484, 19.056983947753906, -0.40946197509765625, 53.62797546386719, 10.72589111328125, 11.56047248840332, -11.089405059814453, 13.016998291015625, -13.371856689453125, 2.0062484741210938, -17.780250549316406, -1.3279743194580078, 46.38153076171875, 47.37315368652344, 12.282020568847656, 8.818580627441406, 3.8223114013671875, 4.607513427734375, 0.9189224243164062, 25.000442504882812, 13.751123428344727, 3.2994384765625, -1.915313720703125, 34.47235870361328, 2.3498973846435547, -7.4590606689453125, 5.93060302734375, 9.903200149536133, 0.0414276123046875, 28.599990844726562, 6.2033233642578125, -11.668060302734375, 38.22325897216797, 9.850257873535156, -8.28924560546875, 26.247028350830078, 27.999343872070312, 4.607288360595703, 4.247749328613281, 12.07781982421875, 31.817893981933594, -2.0745697021484375, 7.328413009643555, -9.987911224365234, 33.081451416015625, 12.522211074829102, 33.81724548339844, 40.18284606933594, 1.1673736572265625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000539.npy"}
{"epoch": 0.8148148148148148, "step": 540, "batch_size": 64, "mean": 15.910892486572266, "std": 17.490629196166992, "min": -21.514240264892578, "p10": -4.905033111572263, "median": 12.823604583740234, "p90": 40.79962921142579, "max": 57.656768798828125, "pos_frac": 0.859375, "sample": [-9.913246154785156, 8.77899169921875, 57.656768798828125, 30.63527488708496, 6.398536682128906, -21.514240264892578, 22.751632690429688, 4.6554107666015625, 9.542884826660156, 23.830432891845703, 5.352569580078125, 28.957473754882812, 33.07398986816406, 17.78093719482422, -9.154199600219727, 9.19512939453125, 42.11974334716797, 12.904403686523438, 0.4104881286621094, 17.68285369873047, 22.04761505126953, -12.057708740234375, 25.1168212890625, 7.872812271118164, 44.7535400390625, 38.17613983154297, 5.232816696166992, 12.775390625, 0.10076904296875, 17.1361083984375, -7.307647705078125, 38.861053466796875, 29.161041259765625, 13.622039794921875, 36.89031982421875, 0.21022415161132812, -0.39501190185546875, 22.85137176513672, -5.911529541015625, 1.7158012390136719, 1.0542068481445312, 31.09294891357422, 23.598960876464844, 50.23094940185547, 51.93782043457031, 10.196922302246094, 37.371429443359375, 11.909099578857422, 4.741546630859375, -7.99395751953125, 12.871818542480469, 26.803680419921875, 2.2170028686523438, 0.2500457763671875, 4.777385711669922, 31.972885131835938, -2.5565414428710938, 45.06001281738281, 8.425674438476562, 19.213233947753906, 21.853179931640625, 2.637990951538086, 7.002557754516602, 41.63044738769531], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000540.npy"}
{"epoch": 0.8163265306122449, "step": 541, "batch_size": 64, "mean": 14.419329643249512, "std": 18.02631378173828, "min": -21.18829345703125, "p10": -7.809558868408201, "median": 13.138201713562012, "p90": 41.40512466430664, "max": 48.70460510253906, "pos_frac": 0.765625, "sample": [42.7700080871582, 17.40917205810547, 28.479991912841797, 43.89397430419922, -1.2941131591796875, 41.37705993652344, 4.989082336425781, 27.538394927978516, 34.49427032470703, -3.7277374267578125, -12.160377502441406, 4.878143310546875, 11.079338073730469, 14.148870468139648, 3.6723556518554688, -14.956802368164062, 15.872371673583984, 42.91423034667969, 5.892974853515625, 34.75800704956055, -6.395027160644531, 19.110044479370117, 47.2372932434082, 30.07294464111328, 41.417152404785156, 17.988710403442383, 29.60785675048828, -1.1912841796875, -18.014802932739258, -2.115201950073242, 17.318157196044922, 11.1654052734375, 48.70460510253906, 29.347930908203125, 6.3379669189453125, 29.88655662536621, 16.496572494506836, -21.18829345703125, 21.07933807373047, 1.3654613494873047, 3.438386917114258, -5.486499786376953, 31.18426513671875, 31.245677947998047, 5.472414016723633, 11.900379180908203, 24.43185043334961, 11.85748291015625, -19.70166015625, -1.9621124267578125, 0.5620155334472656, 25.460208892822266, 22.31104278564453, 12.127532958984375, -0.6671600341796875, -13.0220947265625, 10.710430145263672, 8.093273162841797, 0.2376232147216797, 15.741859436035156, 15.014276504516602, -8.415786743164062, 47.049537658691406, 34.99359130859375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000541.npy"}
{"epoch": 0.817838246409675, "step": 542, "batch_size": 64, "mean": 15.396230697631836, "std": 19.341135025024414, "min": -25.310264587402344, "p10": -5.916003036499023, "median": 11.00656509399414, "p90": 42.15763092041016, "max": 58.81178665161133, "pos_frac": 0.796875, "sample": [30.088237762451172, 24.97787094116211, 6.820991516113281, 37.934661865234375, 9.195035934448242, -0.16632080078125, 9.617584228515625, 18.668800354003906, 8.77703857421875, 9.757339477539062, 14.164989471435547, 1.2869415283203125, -2.6910171508789062, 22.198837280273438, 8.617353439331055, 19.76615333557129, 5.245452880859375, 40.285003662109375, -9.551376342773438, 37.75513458251953, 40.64678192138672, 42.805137634277344, 5.56141471862793, -0.21985626220703125, -25.310264587402344, -23.615554809570312, 18.76360321044922, 5.264213562011719, 9.859415054321289, -17.595108032226562, 47.31963348388672, 38.61491012573242, 1.8349533081054688, 27.504547119140625, -4.6183013916015625, -5.974414825439453, -8.8131103515625, 14.130294799804688, -5.7797088623046875, 30.384628295898438, 16.755538940429688, 4.4632568359375, 47.17619705200195, -5.396640777587891, 23.288463592529297, 7.808891296386719, 31.191349029541016, 11.781822204589844, 2.3975753784179688, 48.08253479003906, 38.39598083496094, 53.5822639465332, -18.939327239990234, 13.27609634399414, 8.554866790771484, 15.660259246826172, 1.5059890747070312, 58.81178665161133, 30.24138641357422, 14.643745422363281, 10.231307983398438, 43.843170166015625, 8.280311584472656, 36.210052490234375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000542.npy"}
{"epoch": 0.8193499622071051, "step": 543, "batch_size": 64, "mean": 13.19050407409668, "std": 17.765825271606445, "min": -47.59281921386719, "p10": -4.012855911254882, "median": 11.433891296386719, "p90": 36.43405227661133, "max": 60.516841888427734, "pos_frac": 0.796875, "sample": [8.379287719726562, 2.923492431640625, 10.20992660522461, -0.038501739501953125, 15.193977355957031, 12.776012420654297, 22.867630004882812, 39.571800231933594, -25.70046615600586, 26.56256103515625, 1.731842041015625, 26.805856704711914, 16.903987884521484, 32.37714767456055, 47.69695281982422, 15.648628234863281, 60.516841888427734, -13.703784942626953, 9.41754150390625, 27.212425231933594, 6.127302169799805, 3.1998519897460938, 1.5150413513183594, 8.47674560546875, 3.872936248779297, 27.34354019165039, -47.59281921386719, 0.47890281677246094, 6.052249908447266, -6.634668350219727, 15.646331787109375, 34.59260559082031, 45.450828552246094, 16.791576385498047, 26.519697189331055, 11.636154174804688, 48.868019104003906, -1.0875530242919922, -4.706413269042969, -4.189413070678711, 21.577640533447266, 11.971792221069336, 20.497711181640625, 17.121002197265625, 26.114395141601562, 20.73505401611328, 27.217018127441406, 37.223243713378906, -0.6121635437011719, 7.361106872558594, -0.4846534729003906, 8.06275749206543, 4.261878967285156, 7.505714416503906, 11.23162841796875, -0.802001953125, 15.462326049804688, 4.532835006713867, 5.515693664550781, -3.600889205932617, 18.558822631835938, 42.608299255371094, -6.820171356201172, 19.23912811279297], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000543.npy"}
{"epoch": 0.8208616780045351, "step": 544, "batch_size": 64, "mean": 18.164968490600586, "std": 20.743682861328125, "min": -19.464691162109375, "p10": -7.168553161621093, "median": 13.750118255615234, "p90": 50.119190597534185, "max": 72.83369445800781, "pos_frac": 0.8125, "sample": [31.24016571044922, -16.539413452148438, 13.608802795410156, 8.112966537475586, 26.94037628173828, 22.899749755859375, -19.464691162109375, 14.529417037963867, 27.145645141601562, 12.533546447753906, 7.806640625, 11.393062591552734, 8.297943115234375, 38.035789489746094, 16.279708862304688, 4.499635696411133, 3.019634246826172, 11.428050994873047, 30.01647186279297, 18.093902587890625, 9.78515625, 10.556846618652344, 23.561569213867188, 33.84996795654297, 39.34283447265625, 10.588502883911133, 9.38339614868164, -6.5368804931640625, -4.494203567504883, -7.440338134765625, 59.824249267578125, -13.242904663085938, 23.759445190429688, -1.189798355102539, 48.64619827270508, 13.891433715820312, 62.37615966796875, -15.196342468261719, 8.882698059082031, 8.016532897949219, 21.054855346679688, 15.158592224121094, 52.71038818359375, 3.203084945678711, 5.354896545410156, 7.822601318359375, 34.40260696411133, 12.067941665649414, 54.58441162109375, -11.757293701171875, -0.3655738830566406, 48.050750732421875, 35.88136291503906, 72.83369445800781, -7.43927001953125, 15.600204467773438, 50.75047302246094, 5.32257080078125, 52.26862716674805, 45.78315734863281, -1.8413429260253906, 22.0474853515625, 25.512786865234375, 19.30902099609375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000544.npy"}
{"epoch": 0.8223733938019653, "step": 545, "batch_size": 64, "mean": 12.36419677734375, "std": 18.64238166809082, "min": -33.160850524902344, "p10": -7.505792999267578, "median": 10.136519432067871, "p90": 40.04142570495607, "max": 63.02119445800781, "pos_frac": 0.734375, "sample": [17.398134231567383, 18.82793426513672, -7.325340270996094, 41.620487213134766, 7.245449066162109, 4.987007141113281, -5.804389953613281, -4.775564193725586, -4.686063766479492, 9.906414031982422, 13.2236328125, -7.5831298828125, 42.91661071777344, 6.947389602661133, 6.64539909362793, 27.944717407226562, 9.672378540039062, 18.602035522460938, -6.589090347290039, 19.936004638671875, 8.491355895996094, 32.024871826171875, -1.6678733825683594, 21.121719360351562, 46.43658447265625, 23.83405303955078, 4.59698486328125, 4.849954605102539, 20.625762939453125, 10.36662483215332, 1.366159439086914, -12.88875961303711, -15.9412841796875, 14.191547393798828, -7.6451873779296875, -6.294271469116211, 42.85259246826172, 52.55659484863281, 36.35694885253906, -1.7563705444335938, -1.4901084899902344, 32.2381591796875, 21.677852630615234, 18.443397521972656, 3.1167831420898438, 26.12810516357422, -6.014373779296875, 15.637130737304688, 12.530731201171875, -33.160850524902344, 3.279876708984375, 22.98737335205078, 18.832550048828125, 63.02119445800781, 28.82794952392578, -16.688217163085938, 0.7463645935058594, 14.881387710571289, -16.957427978515625, 0.7091865539550781, 22.79058837890625, 47.0572509765625, 26.32421875, 3.8014678955078125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000545.npy"}
{"epoch": 0.8238851095993953, "step": 546, "batch_size": 64, "mean": 13.086883544921875, "std": 17.275997161865234, "min": -29.01404571533203, "p10": -9.531906509399413, "median": 12.649576187133789, "p90": 33.5784912109375, "max": 52.419456481933594, "pos_frac": 0.78125, "sample": [39.96446228027344, -6.456512451171875, 33.97356414794922, 13.290168762207031, 20.27837371826172, 29.490251541137695, 15.524398803710938, -8.586776733398438, -1.2642364501953125, -5.086149215698242, -21.472381591796875, 16.41875457763672, 7.7646942138671875, 14.095199584960938, -29.01404571533203, 22.451126098632812, 6.304830551147461, 9.782516479492188, -9.936962127685547, 7.83648681640625, -7.0764617919921875, 23.916641235351562, 10.466094970703125, 20.758880615234375, 18.870376586914062, 10.467521667480469, 27.52503776550293, 11.454986572265625, 43.2439079284668, 52.419456481933594, 11.15614128112793, 6.530067443847656, 33.773292541503906, 14.50628662109375, 25.818302154541016, -10.698211669921875, 10.308937072753906, 27.2509765625, 9.489599227905273, 8.9913330078125, 12.032468795776367, 6.425405502319336, -17.729534149169922, 12.8602294921875, 17.565034866333008, 27.129051208496094, -7.087394714355469, 12.438922882080078, 23.697710037231445, 27.361656188964844, 19.631431579589844, 6.831058502197266, 42.413673400878906, -16.830780029296875, -1.9013214111328125, 12.082395553588867, 24.817352294921875, 17.876014709472656, 52.3907585144043, 19.83759117126465, 28.598587036132812, -22.427425384521484, 33.12395477294922, 1.8928184509277344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000546.npy"}
{"epoch": 0.8253968253968254, "step": 547, "batch_size": 64, "mean": 14.594972610473633, "std": 18.088096618652344, "min": -33.131187438964844, "p10": -4.568412780761719, "median": 15.99360179901123, "p90": 34.89480590820313, "max": 69.52253723144531, "pos_frac": 0.8125, "sample": [4.879859924316406, -2.137542724609375, 11.096494674682617, 6.2454986572265625, 35.874420166015625, 6.141124725341797, -14.718229293823242, 69.52253723144531, 19.065135955810547, 24.391319274902344, 16.81787109375, 18.79784393310547, 32.31658935546875, 13.521577835083008, 6.7797698974609375, 0.786773681640625, 25.582138061523438, 49.29608917236328, 3.086597442626953, 18.565078735351562, 22.642799377441406, 30.334014892578125, 46.07559585571289, -0.654083251953125, 32.609039306640625, 15.132064819335938, -3.972259521484375, 15.406913757324219, 26.678543090820312, 8.186141967773438, 13.829254150390625, -4.5816192626953125, -33.131187438964844, 25.96446990966797, -9.90625, 18.690357208251953, 28.660423278808594, 4.454416275024414, 25.85955810546875, 22.26201057434082, 17.441932678222656, 48.60393524169922, 16.191650390625, 21.54056167602539, -15.454826354980469, 21.223657608032227, 1.1999588012695312, -31.782073974609375, -4.53759765625, 15.991668701171875, 40.555328369140625, 16.778039932250977, 36.71156311035156, -4.7637786865234375, 15.995534896850586, -3.7719573974609375, 11.793880462646484, 30.11520004272461, 2.339010238647461, 14.561103820800781, 23.219675064086914, 4.188728332519531, 25.00592803955078, 0.4799308776855469], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000547.npy"}
{"epoch": 0.8269085411942555, "step": 548, "batch_size": 64, "mean": 10.308873176574707, "std": 18.206552505493164, "min": -23.986591339111328, "p10": -9.123216438293458, "median": 6.781306266784668, "p90": 38.83436698913576, "max": 66.16305541992188, "pos_frac": 0.71875, "sample": [9.983901977539062, 20.379425048828125, 9.92327880859375, -9.16656494140625, -9.022069931030273, 22.45477294921875, -9.423423767089844, 58.88776779174805, 31.471235275268555, 11.129592895507812, -11.311355590820312, 44.153770446777344, 2.4651565551757812, -0.29958343505859375, 3.1316680908203125, 5.013917922973633, -15.406051635742188, 5.833921432495117, 3.999755859375, 13.687164306640625, 34.29063415527344, 13.334165573120117, -17.46627426147461, 6.06781005859375, 14.879592895507812, 6.152797698974609, 49.15258026123047, 22.845962524414062, 10.796112060546875, 40.781681060791016, -2.2774314880371094, 14.853580474853516, -4.667747497558594, 15.143501281738281, -5.144447326660156, 0.67816162109375, 44.10829162597656, 9.698749542236328, -6.180915832519531, 6.9971923828125, 14.065414428710938, 6.565420150756836, 21.151763916015625, 3.224945068359375, 48.073028564453125, 0.27651214599609375, -7.425621032714844, 13.578750610351562, -10.428564071655273, 10.449295043945312, 9.145843505859375, -23.986591339111328, 11.228532791137695, 21.904739379882812, 4.317878723144531, -1.6361923217773438, 21.12364387512207, -6.4884490966796875, 6.0823516845703125, -4.17059326171875, 66.16305541992188, 20.491138458251953, -8.171792984008789, 2.3030967712402344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000548.npy"}
{"epoch": 0.8284202569916855, "step": 549, "batch_size": 64, "mean": 11.184266090393066, "std": 17.400798797607422, "min": -27.908981323242188, "p10": -11.711963272094726, "median": 12.30699634552002, "p90": 35.9406005859375, "max": 42.88882064819336, "pos_frac": 0.71875, "sample": [3.739959716796875, 25.522422790527344, -7.2161407470703125, 14.932662963867188, -5.046302795410156, -18.322372436523438, 23.90454864501953, 5.779386520385742, 8.010269165039062, 37.877437591552734, 37.8402099609375, 8.828876495361328, 11.931886672973633, 35.7618408203125, -2.7522430419921875, 1.908721923828125, -18.9747314453125, 18.53619384765625, 18.895370483398438, 20.127254486083984, 28.86705780029297, 26.294719696044922, 24.777191162109375, -11.459789276123047, 18.561676025390625, 2.692169189453125, 5.235317230224609, 32.431297302246094, 12.682106018066406, -16.767715454101562, 30.444515228271484, 41.092979431152344, 20.041748046875, -4.461830139160156, 14.403169631958008, 42.742835998535156, -1.8909988403320312, -11.097610473632812, -6.787078857421875, 20.251510620117188, -18.195159912109375, 10.701171875, 1.2127437591552734, 2.7544479370117188, 28.53085708618164, 22.69707679748535, -13.530113220214844, 18.72089385986328, 21.294464111328125, 1.9468154907226562, -3.009817123413086, 36.0172119140625, 19.63555908203125, 20.103256225585938, 14.456008911132812, 0.6365203857421875, 15.991395950317383, 8.326801300048828, -0.22140121459960938, 39.85845184326172, 42.88882064819336, -27.908981323242188, -11.820037841796875, -4.632476806640625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000549.npy"}
{"epoch": 0.8299319727891157, "step": 550, "batch_size": 64, "mean": 9.223260879516602, "std": 17.17941665649414, "min": -26.332672119140625, "p10": -9.87153625488281, "median": 6.573736190795898, "p90": 33.12846908569338, "max": 49.67842102050781, "pos_frac": 0.671875, "sample": [8.17462158203125, 45.885013580322266, 44.746517181396484, 11.618408203125, -20.29878044128418, -1.6019287109375, -0.2861595153808594, -3.471893310546875, -1.5458831787109375, 9.099327087402344, 23.116304397583008, 11.608978271484375, 20.220829010009766, 7.222043991088867, 4.47913932800293, 35.849098205566406, 49.67842102050781, 24.491127014160156, 15.514022827148438, -2.4641189575195312, 26.78033447265625, -3.8864173889160156, 20.977880477905273, 22.125869750976562, 24.813026428222656, -14.153556823730469, 24.73095703125, -0.07596206665039062, 5.786643981933594, -11.355316162109375, -18.46416664123535, 1.9910812377929688, -5.85205078125, 3.1297225952148438, 3.2210006713867188, 7.897575378417969, 5.30841064453125, -10.521331787109375, -3.298492431640625, 1.4319114685058594, -7.149116516113281, 17.13580322265625, 6.903697967529297, 5.284149169921875, 6.2437744140625, 15.242023468017578, 16.656723022460938, 22.287378311157227, 8.2559814453125, 26.23050308227539, -8.3553466796875, 19.23270034790039, 41.051116943359375, 9.609415054321289, 40.79178237915039, 8.893732070922852, -26.332672119140625, 0.75, 48.85362243652344, -24.215015411376953, 5.9386749267578125, -1.4793777465820312, -2.6294898986816406, -1.5335884094238281], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000550.npy"}
{"epoch": 0.8314436885865457, "step": 551, "batch_size": 64, "mean": 11.889625549316406, "std": 17.515260696411133, "min": -19.965667724609375, "p10": -4.792823410034178, "median": 9.25485610961914, "p90": 33.28324737548829, "max": 88.11126708984375, "pos_frac": 0.75, "sample": [15.377670288085938, 11.517745971679688, 19.632694244384766, 5.1515960693359375, 17.314085006713867, 22.96366310119629, 88.11126708984375, 0.9664421081542969, 1.5643939971923828, 25.957778930664062, -5.900905609130859, 8.640945434570312, 39.47620391845703, 3.377138137817383, 24.017539978027344, 10.860980987548828, 17.8157958984375, 15.560310363769531, -19.965667724609375, -2.7182750701904297, 6.308370590209961, 9.868766784667969, 17.86456298828125, 1.5125732421875, 5.899223327636719, 52.960716247558594, 0.23028564453125, 14.305641174316406, 18.65326690673828, 27.623424530029297, 12.065607070922852, -0.9637374877929688, 3.0099048614501953, 10.697784423828125, -1.2593536376953125, 39.914947509765625, -2.065338134765625, 38.124671936035156, 2.0884437561035156, 12.89013671875, -3.7175331115722656, 11.76763916015625, -1.7594642639160156, 1.1236610412597656, 30.648910522460938, 26.7552490234375, 34.06816101074219, 13.209156036376953, -3.3430747985839844, -2.208284378051758, 6.440895080566406, 14.168861389160156, -1.909820556640625, -11.105278015136719, -7.892757415771484, 3.546234130859375, -6.535270690917969, 31.4517822265625, -5.253662109375, 3.906177520751953, -10.6427001953125, 1.2847862243652344, 26.252593994140625, 41.228424072265625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000551.npy"}
{"epoch": 0.8329554043839759, "step": 552, "batch_size": 64, "mean": 13.42190170288086, "std": 16.21704864501953, "min": -29.8753662109375, "p10": -5.673964118957519, "median": 12.669205665588379, "p90": 32.90180435180664, "max": 50.75901794433594, "pos_frac": 0.84375, "sample": [-5.789823532104492, 7.2933197021484375, 11.382148742675781, 3.3692398071289062, 29.701807022094727, 18.095062255859375, 1.5017318725585938, 2.9364776611328125, 2.8473663330078125, 27.677536010742188, 22.607574462890625, 50.75901794433594, 18.361679077148438, 15.938804626464844, 2.3376998901367188, -9.805658340454102, -5.40362548828125, 25.611907958984375, 10.563133239746094, -7.1825103759765625, 19.677513122558594, 23.741355895996094, 9.774642944335938, 15.909994125366211, 27.73321533203125, 5.741458892822266, 32.406578063964844, 34.687721252441406, 18.794692993164062, -0.5520782470703125, 8.74312973022461, 23.740833282470703, 33.114044189453125, -29.8753662109375, 22.081504821777344, -3.5763397216796875, 34.31317138671875, 18.772964477539062, 1.7786178588867188, 4.8518829345703125, 29.376060485839844, 10.851593017578125, -15.997833251953125, 42.54315185546875, 9.493545532226562, 8.948394775390625, 24.96332550048828, 10.992195129394531, -22.840927124023438, 16.729446411132812, 50.398067474365234, 14.686027526855469, 9.231571197509766, -25.557411193847656, 30.391124725341797, 13.203636169433594, 5.5477142333984375, 12.134775161743164, 16.148635864257812, 11.596389770507812, 39.09754943847656, 19.23766326904297, 14.797561645507812, 8.36701774597168], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000552.npy"}
{"epoch": 0.8344671201814059, "step": 553, "batch_size": 64, "mean": 12.328357696533203, "std": 21.884132385253906, "min": -39.17766571044922, "p10": -7.5209856033325195, "median": 6.035919189453125, "p90": 45.947187042236344, "max": 63.513214111328125, "pos_frac": 0.65625, "sample": [53.11968231201172, -4.555023193359375, -7.39154052734375, 22.462875366210938, 1.1225509643554688, 0.10283470153808594, 36.21668243408203, 18.936859130859375, -11.288124084472656, 1.7728691101074219, 7.092964172363281, 3.394287109375, 32.7098274230957, 31.592849731445312, -6.948661804199219, 36.17710876464844, 2.3599720001220703, 21.978485107421875, 5.926216125488281, -39.17766571044922, -7.496973037719727, 36.71049499511719, -1.2373275756835938, 21.01390838623047, 54.7930908203125, 19.858863830566406, -0.33020782470703125, 5.834930419921875, -13.1378173828125, -7.163612365722656, 62.17425537109375, 39.63151550292969, 24.356895446777344, 63.513214111328125, 7.886199951171875, 22.332242965698242, 48.2044792175293, 15.213516235351562, 14.127655029296875, 42.18944549560547, 5.6177978515625, 6.6505889892578125, -0.9790401458740234, -1.0618171691894531, -7.3269500732421875, 0.7538299560546875, -7.531276702880859, 56.22374725341797, -31.432861328125, -14.968290328979492, 6.145622253417969, 17.586658477783203, 23.91582489013672, 2.5531082153320312, -2.3754425048828125, 15.537429809570312, -3.781452178955078, -10.273468017578125, 16.1959228515625, 23.688140869140625, -1.7539787292480469, -4.870990753173828, 47.557647705078125, -1.135711669921875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000553.npy"}
{"epoch": 0.8359788359788359, "step": 554, "batch_size": 64, "mean": 12.703022003173828, "std": 19.930063247680664, "min": -36.80122375488281, "p10": -5.508180809020995, "median": 8.535173416137695, "p90": 42.26079406738282, "max": 61.124481201171875, "pos_frac": 0.75, "sample": [8.124298095703125, 38.00083541870117, 55.620849609375, 41.27458190917969, 6.0343780517578125, 7.936370849609375, 9.85035514831543, 10.074237823486328, 8.214973449707031, 22.110450744628906, 2.84344482421875, -4.858589172363281, -14.676284790039062, 27.777755737304688, -33.51292419433594, 3.6951675415039062, 26.42597198486328, -11.6839599609375, 8.958122253417969, 5.657094955444336, 0.18869400024414062, 21.58820343017578, 11.648784637451172, -4.667816162109375, 16.03533935546875, 21.8614501953125, 44.449188232421875, 6.501373291015625, -9.788314819335938, 19.20677947998047, 61.124481201171875, 1.46087646484375, 20.205595016479492, -0.3778800964355469, -0.9654350280761719, -5.786577224731445, 21.784595489501953, -0.0035991668701171875, 27.59972381591797, -0.437347412109375, -36.80122375488281, 42.68345642089844, 0.0891265869140625, 8.264812469482422, 13.200263977050781, 5.561428070068359, -0.9408740997314453, 14.328788757324219, 4.709890365600586, -1.9219970703125, 27.1009521484375, 8.805534362792969, -4.107219696044922, 40.67535400390625, 56.11363220214844, 5.985801696777344, 12.127838134765625, 13.040618896484375, 10.446586608886719, 34.45713424682617, 48.00726318359375, -16.05223846435547, 51.59880065917969, 6.124446868896484], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000554.npy"}
{"epoch": 0.8374905517762661, "step": 555, "batch_size": 64, "mean": 12.547405242919922, "std": 18.280858993530273, "min": -44.583526611328125, "p10": -8.50691909790039, "median": 12.196426391601562, "p90": 33.65154762268067, "max": 51.78119659423828, "pos_frac": 0.8125, "sample": [3.138517379760742, -37.522987365722656, 2.9390220642089844, 9.051971435546875, 25.18921661376953, 3.8997268676757812, -4.254364013671875, 4.864448547363281, -9.624225616455078, 6.2633819580078125, 28.536209106445312, -8.953987121582031, 16.90496826171875, 6.440708160400391, 28.805191040039062, 36.143157958984375, 10.668807983398438, 2.3621063232421875, 7.140052795410156, 38.26409149169922, 13.078712463378906, -5.697784423828125, 51.78119659423828, 5.434806823730469, 27.20852279663086, 15.620269775390625, 26.236003875732422, 33.872222900390625, 24.302932739257812, 30.737464904785156, 26.9975528717041, 15.847480773925781, 18.00836181640625, 27.142372131347656, -16.77385711669922, 28.238082885742188, -44.583526611328125, 11.314140319824219, 18.79632568359375, 9.520652770996094, 14.8843994140625, 17.227920532226562, 4.421974182128906, 5.001899719238281, -6.016532897949219, 10.11346435546875, 28.860076904296875, 14.467544555664062, 39.778255462646484, -7.4637603759765625, 33.13663864135742, 0.188568115234375, -19.507888793945312, -2.5088882446289062, 30.903995513916016, 6.60516357421875, 25.017658233642578, 9.376083374023438, 14.041242599487305, -10.947513580322266, 50.6638069152832, 18.73759651184082, 37.07123565673828, 1.6430339813232422], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000555.npy"}
{"epoch": 0.8390022675736961, "step": 556, "batch_size": 64, "mean": 11.086524963378906, "std": 20.665014266967773, "min": -28.288406372070312, "p10": -9.34011116027832, "median": 7.351430892944336, "p90": 38.997848510742195, "max": 77.00030517578125, "pos_frac": 0.671875, "sample": [14.231414794921875, 8.5987548828125, 11.721145629882812, -9.517589569091797, -13.61529541015625, 24.68627166748047, 44.58597183227539, 9.280570983886719, -4.607294082641602, -8.925994873046875, -7.111974716186523, 2.0862693786621094, 35.744293212890625, 34.231292724609375, 4.364665985107422, -7.998748779296875, 8.773300170898438, 2.6300735473632812, 11.85354232788086, 35.502525329589844, -4.371299743652344, 28.905426025390625, 37.342803955078125, 2.7712478637695312, -4.348493576049805, 5.182289123535156, -22.01895523071289, 12.531499862670898, 20.89972686767578, -8.895034790039062, -1.7467708587646484, -28.288406372070312, 39.7071533203125, 4.194145202636719, -7.0800018310546875, 4.181056976318359, 23.441375732421875, 35.38597869873047, 2.378662109375, 0.8355312347412109, 37.252525329589844, 77.00030517578125, 4.275508880615234, 19.967323303222656, 29.503150939941406, 25.692359924316406, -3.6981048583984375, 10.597076416015625, -21.24437141418457, -2.1114959716796875, 48.11280059814453, -11.442840576171875, -5.059314727783203, 40.75666809082031, 6.644001007080078, -8.638046264648438, 9.634414672851562, -22.7838134765625, 8.058860778808594, 34.641029357910156, 43.805458068847656, 42.68562698364258, 13.320993423461914, -4.9535980224609375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000556.npy"}
{"epoch": 0.8405139833711263, "step": 557, "batch_size": 64, "mean": 15.483181953430176, "std": 18.428983688354492, "min": -15.6063232421875, "p10": -6.695916748046874, "median": 12.33874225616455, "p90": 42.57179412841798, "max": 61.29499816894531, "pos_frac": 0.78125, "sample": [43.21427917480469, 18.264144897460938, 14.897499084472656, 3.4227447509765625, -2.584117889404297, -7.444334030151367, 59.71055603027344, 6.06683349609375, 23.340808868408203, 10.21881103515625, 3.7768478393554688, -7.341514587402344, -4.3392486572265625, 25.605453491210938, 2.908222198486328, 6.7247161865234375, 23.805736541748047, 10.167232513427734, 40.619140625, 19.670188903808594, 2.523090362548828, -2.4797592163085938, 39.61474609375, -12.996734619140625, -2.1436080932617188, -15.6063232421875, 46.65355682373047, 47.27922058105469, 3.7800559997558594, -0.7280502319335938, 46.471893310546875, 10.970796585083008, 21.34638214111328, -10.08333969116211, 8.3985595703125, 40.31898880004883, 16.315467834472656, 36.71636962890625, 35.16058349609375, 61.29499816894531, 3.2314453125, 3.470094680786133, -7.268363952636719, -5.360206604003906, 0.19819259643554688, 8.0076904296875, 26.6837158203125, 24.305755615234375, 24.32121467590332, 13.588909149169922, 0.9657058715820312, 11.08857536315918, 18.979541778564453, 21.625349044799805, 21.414043426513672, 19.119384765625, 41.072662353515625, 10.179365158081055, 16.583423614501953, 16.246627807617188, -3.581146240234375, -10.202133178710938, 44.89778137207031, 27.845081329345703], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000557.npy"}
{"epoch": 0.8420256991685563, "step": 558, "batch_size": 64, "mean": 14.951372146606445, "std": 18.451595306396484, "min": -24.934837341308594, "p10": -4.36944522857666, "median": 11.581218719482422, "p90": 38.82628211975098, "max": 62.87098693847656, "pos_frac": 0.734375, "sample": [19.31841278076172, 36.23582458496094, 28.965438842773438, -1.0327606201171875, 36.33575439453125, 5.197668075561523, 40.686317443847656, -4.409688949584961, 1.6607093811035156, 22.696762084960938, -1.2967987060546875, 25.479042053222656, 19.371170043945312, 9.256637573242188, -5.282310485839844, 31.968303680419922, 22.966381072998047, 8.555160522460938, 9.785381317138672, 13.148326873779297, 38.127716064453125, 24.773712158203125, 11.310470581054688, 36.18348693847656, 16.97945785522461, -4.275543212890625, 54.31939697265625, 11.851966857910156, 23.75591278076172, 12.706268310546875, 39.125667572021484, 6.356956481933594, 7.984375, 57.597320556640625, -2.7223892211914062, 0.1814556121826172, 13.79954719543457, 6.550773620605469, 8.169334411621094, 62.87098693847656, 20.496417999267578, -11.425338745117188, 47.537109375, -3.690399169921875, 35.37738800048828, 50.618492126464844, -9.34646987915039, -3.15924072265625, 31.355941772460938, 4.313896179199219, 9.518661499023438, -24.934837341308594, 14.540672302246094, -2.3120269775390625, -6.030120849609375, -0.8736343383789062, 20.720104217529297, -1.227142333984375, -10.05172348022461, -0.27622222900390625, 24.19347381591797, 5.454473495483398, 17.161399841308594, 3.67431640625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000558.npy"}
{"epoch": 0.8435374149659864, "step": 559, "batch_size": 64, "mean": 13.690475463867188, "std": 21.329998016357422, "min": -28.271636962890625, "p10": -8.67733917236328, "median": 8.511332511901855, "p90": 39.42909469604493, "max": 83.88470458984375, "pos_frac": 0.703125, "sample": [-4.5320587158203125, -13.410362243652344, 7.347202301025391, -5.034553527832031, 19.707313537597656, 83.88470458984375, 36.29747772216797, 38.249847412109375, 7.211906433105469, 2.7965736389160156, -28.271636962890625, 31.994171142578125, 8.140480041503906, 15.798660278320312, 0.46036338806152344, 43.219482421875, 22.482215881347656, -0.4719696044921875, 32.188438415527344, 23.06683349609375, 34.74523162841797, 8.744331359863281, -1.5289306640625, 25.723556518554688, -1.485443115234375, 4.2537384033203125, -9.496566772460938, 27.925376892089844, 3.925811767578125, 54.89128875732422, 8.879859924316406, 10.262741088867188, 12.766319274902344, 34.347900390625, 35.87555694580078, -3.2078495025634766, 19.71684455871582, -2.1433048248291016, 0.01515960693359375, -11.161979675292969, -0.23990249633789062, -6.76580810546875, -3.29400634765625, 8.352386474609375, 7.748115539550781, 44.338775634765625, 27.960508346557617, -9.88751220703125, 26.647174835205078, 8.15582275390625, 8.410367965698242, 51.88630676269531, -28.166759490966797, -6.2226104736328125, 23.44202423095703, 52.78884506225586, 8.612297058105469, 39.934486389160156, 9.205394744873047, 7.352941513061523, 37.3032341003418, 24.21929931640625, -2.4275283813476562, -27.338153839111328], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000559.npy"}
{"epoch": 0.8450491307634165, "step": 560, "batch_size": 64, "mean": 13.26607608795166, "std": 18.4219970703125, "min": -16.004302978515625, "p10": -10.35327911376953, "median": 8.30820083618164, "p90": 39.42190246582032, "max": 55.994930267333984, "pos_frac": 0.734375, "sample": [13.815223693847656, 8.026634216308594, 38.65430450439453, 13.599845886230469, -16.004302978515625, -14.582527160644531, 39.72987365722656, 8.589767456054688, 46.212158203125, 0.7570934295654297, 24.161636352539062, -2.9327774047851562, 3.9759368896484375, 13.38592529296875, 0.12635040283203125, -7.62738037109375, -6.1844329833984375, -2.208829879760742, 16.791969299316406, 22.6834716796875, -11.035964965820312, 29.717575073242188, 6.974224090576172, 2.486703872680664, 39.14356231689453, 20.826095581054688, -1.7059478759765625, 37.32167053222656, 46.620460510253906, 5.017845153808594, 26.40460205078125, -6.198097229003906, -11.748062133789062, 36.72248840332031, 39.54119110107422, -11.50689697265625, 3.8794784545898438, -8.760345458984375, 55.994930267333984, -12.596900939941406, 49.24042510986328, 26.05658721923828, 15.740432739257812, 22.624052047729492, 6.8626708984375, 14.075691223144531, -1.3971939086914062, 35.24166488647461, 6.029232025146484, 2.2822952270507812, -2.384735107421875, 25.552978515625, 15.264881134033203, -1.3022613525390625, 12.261016845703125, 21.51837921142578, -14.536712646484375, 5.613803863525391, 37.979103088378906, 7.301311492919922, 45.027488708496094, 21.411151885986328, 4.2879638671875, 6.210041046142578], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000560.npy"}
{"epoch": 0.8465608465608465, "step": 561, "batch_size": 64, "mean": 14.09537410736084, "std": 19.444421768188477, "min": -15.611282348632812, "p10": -8.63303985595703, "median": 8.184186935424805, "p90": 46.61900177001954, "max": 63.43561553955078, "pos_frac": 0.78125, "sample": [2.3548049926757812, 3.295185089111328, 2.1664886474609375, 6.802648544311523, 6.173501968383789, 25.640235900878906, 11.207275390625, -9.674781799316406, 29.069679260253906, 13.815017700195312, 10.3890380859375, 16.18365478515625, 2.6979942321777344, 25.586639404296875, 2.9091949462890625, -11.597850799560547, 4.9699249267578125, 45.56731414794922, 63.43561553955078, 3.4391632080078125, 0.0001678466796875, 27.2752685546875, 18.590087890625, -8.94586181640625, 18.99938201904297, -3.2459449768066406, 17.944503784179688, -10.487960815429688, -15.611282348632812, 56.5625, -9.509532928466797, 9.565725326538086, 38.78440856933594, 47.822654724121094, 5.85491943359375, 4.578178405761719, 6.704597473144531, 12.305471420288086, 2.5722312927246094, 59.08091735839844, 55.04368591308594, 41.92869567871094, 34.49151611328125, 47.069725036621094, 25.054241180419922, -7.8284759521484375, -14.347419738769531, 16.19849395751953, 29.66680908203125, 9.816595077514648, 2.2030792236328125, 5.688327789306641, -1.6088485717773438, -0.1945476531982422, 19.95367431640625, 20.865421295166016, 5.562967300415039, 17.303085327148438, 23.483596801757812, -4.3063507080078125, -2.427705764770508, 5.75750732421875, 47.3618049621582, -7.9031219482421875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000561.npy"}
{"epoch": 0.8480725623582767, "step": 562, "batch_size": 64, "mean": 15.395806312561035, "std": 18.279935836791992, "min": -22.457244873046875, "p10": -6.026060485839844, "median": 15.472648620605469, "p90": 42.47228431701661, "max": 49.745567321777344, "pos_frac": 0.78125, "sample": [4.6994171142578125, 38.240386962890625, -14.975078582763672, 25.646270751953125, 19.65679931640625, 20.866539001464844, -12.8671875, 49.715423583984375, 47.678932189941406, -5.678596496582031, 21.956192016601562, 22.88385009765625, -8.95646858215332, 2.7474403381347656, 34.29522705078125, 1.0689048767089844, -5.9753570556640625, 15.953521728515625, 21.288406372070312, -6.04779052734375, 31.39319610595703, 31.54019546508789, -4.5658416748046875, 0.9967880249023438, -2.3285140991210938, 40.2982292175293, -2.531461715698242, 1.881256103515625, 28.209564208984375, 14.187620162963867, 28.80769157409668, -7.058540344238281, 43.404022216796875, 37.34477233886719, 9.175270080566406, 33.526458740234375, 1.10369873046875, 6.529228210449219, -2.1661033630371094, 46.922645568847656, 0.5694427490234375, 1.385162353515625, 25.95121955871582, 16.272605895996094, 17.883033752441406, 47.79686737060547, 26.076133728027344, 14.991775512695312, 27.315921783447266, 13.115341186523438, 45.36546325683594, 26.548065185546875, -1.5045318603515625, 9.437591552734375, 22.260040283203125, 20.09239959716797, 26.15101432800293, 49.745567321777344, 9.408340454101562, -22.457244873046875, 1.2314453125, -12.01654052734375, 0.8515892028808594, 9.993919372558594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000562.npy"}
{"epoch": 0.8495842781557067, "step": 563, "batch_size": 64, "mean": 13.438268661499023, "std": 20.196935653686523, "min": -33.25138854980469, "p10": -12.190408706665037, "median": 12.865148544311523, "p90": 37.11482391357422, "max": 69.79194641113281, "pos_frac": 0.71875, "sample": [-0.16452789306640625, 17.619409561157227, 12.81148910522461, 1.2278385162353516, 30.91228485107422, 35.88287353515625, -33.25138854980469, 36.974334716796875, 12.030807495117188, 0.0669708251953125, -0.5825881958007812, 12.918807983398438, 1.9377365112304688, -6.6937103271484375, 37.17503356933594, 13.77195930480957, -12.718914031982422, 26.72687530517578, -13.432186126708984, 39.1658935546875, 14.52447509765625, 52.36122131347656, -5.188581466674805, 17.912330627441406, 26.81763458251953, 9.529579162597656, 27.859451293945312, 7.101432800292969, 18.020095825195312, -20.943870544433594, -5.0880584716796875, -5.104057312011719, -6.918327331542969, 6.249786376953125, -2.8309669494628906, 43.049217224121094, 9.45926284790039, -24.318309783935547, 20.02207374572754, 8.089481353759766, 48.473785400390625, 38.451507568359375, 30.6776180267334, -10.957229614257812, -16.617919921875, 12.134895324707031, 32.036293029785156, -1.5942230224609375, 33.351341247558594, 14.158565521240234, -25.282440185546875, 16.992530822753906, 21.091365814208984, 34.262733459472656, 21.146442413330078, 11.231378555297852, 69.79194641113281, 35.01374816894531, 7.727783203125, 35.01374053955078, 17.64812469482422, -0.9095115661621094, 22.959484100341797, 10.264350891113281], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000563.npy"}
{"epoch": 0.8510959939531368, "step": 564, "batch_size": 64, "mean": 15.714042663574219, "std": 18.593219757080078, "min": -21.843547821044922, "p10": -8.592750930786131, "median": 13.61009693145752, "p90": 38.384133911132814, "max": 57.36170196533203, "pos_frac": 0.8125, "sample": [32.861080169677734, -7.2517242431640625, 35.10249328613281, 23.801551818847656, 37.39478302001953, 42.64203643798828, 26.973861694335938, -21.843547821044922, -0.4917335510253906, 3.30706787109375, 44.255149841308594, 37.315452575683594, 16.779178619384766, -12.060766220092773, 7.97332763671875, 4.3776092529296875, -4.204402923583984, 15.374252319335938, 19.37457275390625, 6.2371368408203125, 8.435230255126953, 0.8295536041259766, 29.62171173095703, 15.4244384765625, 28.313644409179688, 16.396224975585938, 38.50099182128906, -10.822956085205078, 7.680784225463867, 15.375946044921875, 37.78485870361328, 48.63114929199219, 55.15143585205078, 12.870281219482422, 1.95318603515625, -15.790565490722656, 14.216711044311523, 4.328624725341797, 6.681938171386719, 7.939369201660156, 44.86046600341797, 24.217994689941406, 17.301109313964844, 0.7784023284912109, -9.167476654052734, 6.6255035400390625, 11.857837677001953, 19.61810302734375, 10.148223876953125, -19.271163940429688, 25.407695770263672, 18.479639053344727, 37.94171142578125, -1.4688358306884766, 38.11146545410156, 7.747871398925781, 34.413028717041016, 57.36170196533203, -15.330841064453125, 35.34339904785156, -3.3409881591796875, 13.003482818603516, 10.88499641418457, 8.7354736328125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000564.npy"}
{"epoch": 0.8526077097505669, "step": 565, "batch_size": 64, "mean": 11.793862342834473, "std": 17.952463150024414, "min": -25.697052001953125, "p10": -10.525159454345703, "median": 11.20870590209961, "p90": 31.87188949584961, "max": 59.08894348144531, "pos_frac": 0.734375, "sample": [6.572216033935547, -10.336925506591797, -13.324600219726562, 31.16394805908203, -6.201728820800781, 6.93072509765625, 15.033447265625, 11.297508239746094, -5.367561340332031, 4.092750549316406, 19.87596893310547, 19.01031494140625, 36.85297393798828, -25.697052001953125, 0.213836669921875, 16.148406982421875, 14.63461685180664, 27.24040985107422, 50.7020263671875, 48.44544982910156, 34.78642272949219, -19.602630615234375, -3.7241973876953125, -4.624824523925781, 19.600088119506836, 22.78937530517578, 9.436223983764648, 17.643280029296875, 28.58069610595703, 21.982582092285156, 30.668807983398438, 2.3568572998046875, -3.39630126953125, -10.605831146240234, 24.865253448486328, -8.157047271728516, 11.019731521606445, 19.96249771118164, -14.567193984985352, 12.59002685546875, 14.657333374023438, -12.011669158935547, -7.13616943359375, 32.17529296875, 11.119903564453125, 25.88968276977539, 3.6981582641601562, 23.17563247680664, 42.2703971862793, 7.888038635253906, 21.41326904296875, -9.532844543457031, -23.101043701171875, -3.3664207458496094, 9.676587104797363, 5.8066864013671875, 0.524505615234375, 24.592586517333984, 8.806648254394531, 13.189895629882812, 59.08894348144531, 27.09857749938965, 9.687446594238281, 30.30519676208496], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000565.npy"}
{"epoch": 0.854119425547997, "step": 566, "batch_size": 64, "mean": 12.47337818145752, "std": 21.191822052001953, "min": -34.519386291503906, "p10": -12.42808151245117, "median": 11.109575271606445, "p90": 37.74180030822754, "max": 71.36000061035156, "pos_frac": 0.6875, "sample": [19.11385726928711, 6.732109069824219, -5.7276153564453125, 11.423744201660156, 3.298553466796875, 28.32095718383789, 47.310890197753906, 17.486289978027344, 6.551126480102539, 32.214324951171875, 18.009456634521484, 16.882278442382812, 10.792840957641602, 11.008289337158203, 27.369163513183594, 46.14886474609375, -30.265731811523438, 21.107925415039062, 24.944042205810547, 23.984088897705078, 35.156490325927734, -1.7781982421875, -11.517593383789062, 35.32269287109375, -0.2864418029785156, -9.600723266601562, 23.37900161743164, 31.36184310913086, 33.795143127441406, -10.545326232910156, 37.81821823120117, 28.652076721191406, 32.15945816040039, 11.67279052734375, -6.7275390625, -34.519386291503906, -9.583503723144531, 12.686908721923828, 37.56349182128906, -9.716812133789062, 6.4952850341796875, -14.378730773925781, 71.36000061035156, -12.818290710449219, 40.15576934814453, 6.743492126464844, -8.256423950195312, -18.114883422851562, 17.709854125976562, 2.625518798828125, -2.906646728515625, -1.4525909423828125, 10.860931396484375, 25.02940559387207, -4.563648223876953, 6.706668853759766, -21.259925842285156, 38.95063781738281, 8.351081848144531, -18.542760848999023, 23.214641571044922, 11.210861206054688, 8.610260009765625, 60.567626953125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000566.npy"}
{"epoch": 0.8556311413454271, "step": 567, "batch_size": 64, "mean": 13.849380493164062, "std": 17.00166130065918, "min": -17.095115661621094, "p10": -8.66380786895752, "median": 13.988676071166992, "p90": 38.81665649414063, "max": 50.884368896484375, "pos_frac": 0.765625, "sample": [15.5931396484375, 6.108226776123047, -3.652801513671875, 42.160560607910156, 17.142318725585938, 16.830657958984375, 40.31867980957031, 19.893489837646484, 43.95576477050781, -3.2489700317382812, 27.202720642089844, 17.633512496948242, 9.717277526855469, -14.326324462890625, 11.389862060546875, 2.6254119873046875, -4.788688659667969, 15.983230590820312, 5.3155517578125, 18.35775375366211, 23.585296630859375, 0.5444869995117188, -1.2471866607666016, 26.103111267089844, 0.14098358154296875, 1.5602264404296875, -10.615425109863281, -8.830442428588867, 18.024574279785156, -17.095115661621094, 48.974037170410156, 13.664970397949219, 37.54386901855469, 47.186248779296875, 37.42604064941406, 1.9040107727050781, 26.69660758972168, -1.6866378784179688, 16.192855834960938, 50.884368896484375, -15.218667984008789, 14.648971557617188, 13.430549621582031, 11.401504516601562, 26.03066635131836, 29.70611572265625, -4.0931243896484375, 8.238449096679688, 27.893592834472656, 14.223243713378906, 25.039379119873047, -1.3108177185058594, 35.679786682128906, 7.0377655029296875, 2.7237319946289062, -9.350227355957031, -12.26613998413086, 19.53169059753418, 39.36213684082031, -8.274993896484375, 13.754108428955078, 10.195770263671875, 22.53883171081543, 20.26982879638672], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000567.npy"}
{"epoch": 0.8571428571428571, "step": 568, "batch_size": 64, "mean": 15.59406852722168, "std": 17.199054718017578, "min": -14.313194274902344, "p10": -3.710159683227539, "median": 15.176739692687988, "p90": 40.300730133056646, "max": 71.19049072265625, "pos_frac": 0.828125, "sample": [15.095108032226562, 1.351144790649414, 19.087627410888672, 4.970710754394531, 15.333133697509766, 21.757827758789062, 18.60870361328125, -14.313194274902344, 11.406322479248047, -3.9006729125976562, -13.21827507019043, 33.483177185058594, 5.180429458618164, 35.04193115234375, -3.528491973876953, -6.53875732421875, 5.648609161376953, 8.410598754882812, 16.477989196777344, 15.258371353149414, -3.7880172729492188, 6.39569091796875, 2.499363899230957, 21.105369567871094, 29.218467712402344, 48.32355499267578, 26.792369842529297, 44.56889343261719, -11.51980972290039, 6.3173065185546875, 19.325942993164062, 6.872323989868164, 19.00217628479004, -2.8167800903320312, 21.181543350219727, 8.889293670654297, -2.1326675415039062, 32.622840881347656, -0.2005634307861328, 20.246002197265625, 15.760848999023438, 32.847999572753906, 6.929374694824219, 19.767539978027344, 28.799888610839844, 2.8124542236328125, 19.73082733154297, -6.6748199462890625, 11.563674926757812, 1.6134185791015625, 15.527618408203125, 22.53369140625, 11.460960388183594, 38.857177734375, 40.919395446777344, 9.515363693237305, 0.7090034484863281, 44.212310791015625, 41.774314880371094, 35.45521545410156, 2.5895309448242188, 1.341094970703125, 50.267494201660156, 71.19049072265625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000568.npy"}
{"epoch": 0.8586545729402872, "step": 569, "batch_size": 64, "mean": 11.926674842834473, "std": 18.010759353637695, "min": -28.45136260986328, "p10": -9.25606632232666, "median": 10.992255210876465, "p90": 39.34028625488282, "max": 52.654144287109375, "pos_frac": 0.78125, "sample": [22.392807006835938, 13.218276977539062, 9.863401412963867, 38.061805725097656, 12.6695556640625, 44.21675109863281, -18.59038543701172, 41.00714111328125, 23.9493408203125, 43.86517333984375, 23.23590087890625, 10.570930480957031, 1.0737648010253906, 13.968851089477539, -28.11748504638672, 16.40161895751953, 17.16024398803711, 33.64265441894531, 0.7241630554199219, 17.95448112487793, 41.4647331237793, 21.78839111328125, 9.126934051513672, 39.888206481933594, -3.4903411865234375, 27.738765716552734, 21.544113159179688, 8.778694152832031, 7.10906982421875, -9.91180419921875, -27.904573440551758, 7.861591339111328, 1.1369972229003906, 7.784534454345703, 2.10113525390625, -1.3186187744140625, 52.654144287109375, 5.434404373168945, 12.550941467285156, 3.130359649658203, 11.323606491088867, -5.472414016723633, -1.2639617919921875, 0.23120880126953125, 9.7078857421875, -28.45136260986328, 20.866744995117188, -17.560638427734375, 21.855030059814453, 17.3520565032959, 13.583738327026367, 10.660903930664062, 45.30138397216797, -9.631584167480469, -8.379858016967773, 8.383172988891602, 21.602645874023438, 20.38409423828125, 27.951995849609375, -5.5571441650390625, -5.733161926269531, 13.354248046875, 31.82802963256836, 6.233924865722656], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000569.npy"}
{"epoch": 0.8601662887377173, "step": 570, "batch_size": 64, "mean": 18.10549545288086, "std": 18.96437644958496, "min": -14.61810302734375, "p10": -3.8001451492309566, "median": 16.563549041748047, "p90": 43.33313713073731, "max": 79.84603881835938, "pos_frac": 0.84375, "sample": [31.21923828125, -14.61810302734375, 18.660552978515625, 14.605701446533203, 16.003997802734375, 19.409889221191406, -0.6786117553710938, -7.323036193847656, 9.66015625, 0.7722625732421875, 7.314811706542969, 5.918848037719727, 9.896650314331055, 19.814311981201172, 8.802314758300781, 29.360492706298828, -2.4859561920166016, -3.2313804626464844, 5.200767517089844, 21.723651885986328, 17.447662353515625, -11.580867767333984, 2.8789291381835938, 40.93202209472656, 57.74062728881836, 0.6709632873535156, 21.285266876220703, 19.464263916015625, 36.554954528808594, 14.357635498046875, 22.996444702148438, 44.933021545410156, 26.169567108154297, -12.482498168945312, 15.707279205322266, 14.998695373535156, 28.09258270263672, 15.156396865844727, 44.362186431884766, 16.277114868164062, 33.7083740234375, 37.66319274902344, 36.821746826171875, -4.043901443481445, 7.4484100341796875, 16.84998321533203, 46.99658203125, 34.75212860107422, 17.364593505859375, -14.511173248291016, 28.272428512573242, -8.201179504394531, 4.6400604248046875, 33.67884826660156, 58.241668701171875, 3.2766799926757812, 19.350229263305664, 6.70048713684082, 18.137191772460938, 39.790870666503906, 79.84603881835938, 3.589813232421875, 45.90483856201172, 6.4850616455078125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000570.npy"}
{"epoch": 0.8616780045351474, "step": 571, "batch_size": 64, "mean": 10.103891372680664, "std": 16.616308212280273, "min": -27.649337768554688, "p10": -11.800486946105956, "median": 11.917255401611328, "p90": 28.35338478088379, "max": 46.8481330871582, "pos_frac": 0.75, "sample": [17.32280731201172, 3.1742401123046875, 8.892784118652344, 6.306346893310547, -4.9577484130859375, 44.03953552246094, 16.364212036132812, 12.678726196289062, 24.951684951782227, 33.1185302734375, 14.272222518920898, 28.032119750976562, -4.5548553466796875, 7.264955520629883, -4.277828216552734, 13.8494873046875, 8.4700927734375, 19.205463409423828, 0.5429229736328125, 23.049575805664062, 31.95272445678711, -1.3641738891601562, 24.40380096435547, 8.894920349121094, 18.145164489746094, 28.491069793701172, 14.745346069335938, 20.85736656188965, -20.910263061523438, 16.290740966796875, -0.10818672180175781, -16.484487533569336, 6.494655609130859, -10.699365615844727, 18.637435913085938, 3.6287593841552734, -12.272396087646484, -1.8340835571289062, -7.4273681640625, 0.5837860107421875, 18.766616821289062, 15.522964477539062, 18.36486053466797, 17.85802459716797, 17.29471206665039, 43.35284423828125, 27.560272216796875, -20.170072555541992, 6.143852233886719, 1.5389785766601562, -22.456497192382812, -26.45378875732422, 19.49170684814453, 20.419349670410156, 20.428760528564453, 5.7301025390625, 11.155784606933594, 39.65098190307617, 9.733421325683594, 18.286453247070312, -7.012563705444336, 2.4727859497070312, -27.649337768554688, 46.8481330871582], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000571.npy"}
{"epoch": 0.8631897203325775, "step": 572, "batch_size": 64, "mean": 12.30316162109375, "std": 16.857973098754883, "min": -16.673355102539062, "p10": -9.362281799316404, "median": 10.094514846801758, "p90": 32.978736114501956, "max": 61.973388671875, "pos_frac": 0.765625, "sample": [1.336639404296875, -16.673355102539062, 2.6296844482421875, 19.554298400878906, 30.99408721923828, 13.434406280517578, 40.780487060546875, 9.386489868164062, -7.169624328613281, 61.973388671875, 10.776992797851562, 19.029081344604492, 24.326324462890625, 29.42792510986328, -6.419635772705078, 23.51264190673828, -10.103607177734375, 5.484342575073242, 2.9479618072509766, 31.685531616210938, 28.102840423583984, 22.633087158203125, 16.076156616210938, 37.030128479003906, 17.780487060546875, 6.842998504638672, -0.7543811798095703, 8.069908142089844, -10.03466796875, -4.203514099121094, 36.1124267578125, 33.53296661376953, 50.30661392211914, -13.5113525390625, 20.4599552154541, 10.402835845947266, 9.78619384765625, -10.742691040039062, 5.591192245483398, 7.051305770874023, 16.04566192626953, 19.319448471069336, 20.591323852539062, 25.294872283935547, 8.608814239501953, 27.873214721679688, 0.3862800598144531, -10.396987915039062, 45.23468017578125, 22.967269897460938, 13.546134948730469, 27.202880859375, 0.9975433349609375, 15.92047119140625, -15.632904052734375, -7.769866943359375, 1.251708984375, 5.8830413818359375, -7.29638671875, -7.7933807373046875, 18.47216796875, -3.6797122955322266, 8.602499008178711, 4.326961517333984], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000572.npy"}
{"epoch": 0.8647014361300076, "step": 573, "batch_size": 64, "mean": 13.541420936584473, "std": 22.88884735107422, "min": -35.1421012878418, "p10": -12.795541381835937, "median": 10.292484283447266, "p90": 48.7483268737793, "max": 72.0038070678711, "pos_frac": 0.765625, "sample": [10.136138916015625, 15.427816390991211, -35.1421012878418, 22.03961181640625, -7.220252990722656, 3.239776611328125, -8.248720169067383, 31.24566650390625, 14.378631591796875, -11.641036987304688, -33.72967529296875, -15.9111328125, -4.81456184387207, 4.0362091064453125, 3.8066940307617188, 11.063385009765625, 9.286483764648438, 16.896774291992188, 47.86151885986328, -24.116012573242188, 35.795494079589844, 10.733976364135742, 39.70355987548828, 10.448829650878906, -13.290328979492188, -1.9531517028808594, 64.11616516113281, 8.131824493408203, 35.780296325683594, 5.571830749511719, 28.377487182617188, 0.828277587890625, 9.597816467285156, 14.641593933105469, 59.112945556640625, 5.210536956787109, 11.314422607421875, 5.865568161010742, 9.868339538574219, 72.0038070678711, 34.503700256347656, 16.550500869750977, 18.505767822265625, 0.5093879699707031, -8.344696044921875, 9.632293701171875, 60.99644470214844, 11.08380126953125, 3.6363067626953125, 17.101234436035156, 52.605712890625, 41.35307312011719, 5.434112548828125, 22.814224243164062, -0.7656135559082031, -15.739105224609375, 54.983154296875, -5.033546447753906, 17.300945281982422, 19.776094436645508, 16.477218627929688, 0.8628807067871094, -17.17584991455078, 49.128387451171875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000573.npy"}
{"epoch": 0.8662131519274376, "step": 574, "batch_size": 64, "mean": 15.25147819519043, "std": 15.979156494140625, "min": -12.857315063476562, "p10": -3.135032653808593, "median": 14.89686393737793, "p90": 40.467927932739265, "max": 55.072601318359375, "pos_frac": 0.8125, "sample": [14.576026916503906, -5.391578674316406, -3.4088211059570312, -11.303276062011719, 5.28082275390625, 25.98812484741211, 4.1817626953125, -0.63818359375, 4.100152969360352, -1.7168655395507812, -7.991981506347656, 16.880538940429688, 8.133041381835938, 3.5882205963134766, -2.4961929321289062, -4.351581573486328, -4.149873733520508, 18.733444213867188, 14.296306610107422, 12.348747253417969, 37.171905517578125, 15.933380126953125, 11.375808715820312, 18.156402587890625, 1.4130706787109375, 47.27412414550781, 13.938491821289062, 46.49620056152344, 1.2047061920166016, 34.48412322998047, 18.573486328125, -0.641082763671875, 15.217700958251953, 18.214248657226562, 17.986907958984375, 20.25555419921875, -12.857315063476562, 17.986717224121094, 35.28044891357422, 1.6154651641845703, 15.818634033203125, 21.754287719726562, 47.932289123535156, 22.156753540039062, 22.9702205657959, 41.311683654785156, 18.500534057617188, 24.178619384765625, 11.260421752929688, 12.082210540771484, 3.2191390991210938, 46.595645904541016, 7.3505706787109375, 17.151182174682617, 23.983306884765625, 1.5373954772949219, 8.922491073608398, 38.49916458129883, 26.305267333984375, 43.44488525390625, -0.8519287109375, 21.053939819335938, 55.072601318359375, 0.10611724853515625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000574.npy"}
{"epoch": 0.8677248677248677, "step": 575, "batch_size": 64, "mean": 10.309743881225586, "std": 20.246400833129883, "min": -36.20650100708008, "p10": -13.285366630554199, "median": 8.133689880371094, "p90": 40.80767059326172, "max": 61.450469970703125, "pos_frac": 0.734375, "sample": [1.9834671020507812, -16.458667755126953, -31.143566131591797, -5.75358772277832, 30.4029541015625, 2.8741455078125, -0.073455810546875, 14.885292053222656, 1.5050430297851562, -8.60336685180664, 25.489646911621094, -20.429656982421875, 40.123329162597656, -12.477739334106445, -14.450538635253906, 21.702774047851562, -13.631492614746094, 13.841392517089844, 13.717384338378906, 9.957069396972656, 18.340438842773438, 43.37993621826172, 13.890785217285156, 9.048023223876953, 2.0924606323242188, 1.5813446044921875, -11.54400634765625, 18.772531509399414, 33.22528076171875, 2.3363418579101562, 2.685476303100586, 6.712059020996094, 44.0761833190918, 45.94538879394531, 32.641380310058594, -22.441940307617188, 57.843971252441406, 0.8194198608398438, 43.425071716308594, 35.085533142089844, -1.4949188232421875, 11.013397216796875, 21.089279174804688, 8.37646484375, 3.6018218994140625, 7.8909149169921875, 3.5723628997802734, -36.20650100708008, 6.5845947265625, 61.450469970703125, 23.932388305664062, 11.185348510742188, 14.838027954101562, 13.285808563232422, -8.088115692138672, 12.520408630371094, 6.220132827758789, -6.799083709716797, 7.600297927856445, 26.022235870361328, -9.799888610839844, 41.10095977783203, 17.19213104248047, -6.641002655029297], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000575.npy"}
{"epoch": 0.8692365835222978, "step": 576, "batch_size": 64, "mean": 14.680505752563477, "std": 17.66414451599121, "min": -19.980743408203125, "p10": -5.482883834838867, "median": 13.980998992919922, "p90": 37.627422332763686, "max": 53.53742599487305, "pos_frac": 0.734375, "sample": [33.4505615234375, 46.652374267578125, -2.890167236328125, 14.511856079101562, 15.171630859375, 19.514358520507812, 51.64175796508789, 9.359537124633789, 19.63623046875, 11.220382690429688, -5.528818130493164, -10.357633590698242, 27.033336639404297, -13.354598999023438, 13.693214416503906, -9.785102844238281, -4.52618408203125, 32.56431579589844, 32.326507568359375, 10.26171875, 22.339609146118164, 51.604454040527344, 15.924091339111328, 0.438568115234375, 26.117828369140625, 20.578744888305664, 18.414859771728516, -5.984165191650391, 43.1356201171875, -4.432697296142578, 10.681938171386719, 2.9196701049804688, 3.214508056640625, 29.969011306762695, 14.268783569335938, 7.204803466796875, 40.130950927734375, -2.93365478515625, 2.716033935546875, 3.9973297119140625, 28.889087677001953, -0.17919540405273438, 39.097198486328125, -7.676029205322266, 28.54924774169922, 6.1732940673828125, 53.53742599487305, 32.95741271972656, -2.4476051330566406, 29.985801696777344, 29.730545043945312, -19.980743408203125, 3.3927459716796875, -0.2321014404296875, 21.476913452148438, 14.355537414550781, -5.375703811645508, -3.350992202758789, 31.144454956054688, 27.98794174194336, 3.559354782104492, 5.261991500854492, -2.4036636352539062, 34.19794464111328], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000576.npy"}
{"epoch": 0.8707482993197279, "step": 577, "batch_size": 64, "mean": 10.352388381958008, "std": 19.308244705200195, "min": -30.366912841796875, "p10": -13.729451370239257, "median": 8.763425827026367, "p90": 36.097510528564456, "max": 60.82575225830078, "pos_frac": 0.71875, "sample": [16.592227935791016, 2.724672317504883, -1.3880462646484375, 12.165584564208984, -2.3125228881835938, 7.692718505859375, 2.1190834045410156, -6.091827392578125, 35.073951721191406, -4.151664733886719, -5.786102294921875, 25.845115661621094, 21.284526824951172, 2.889434814453125, -13.60061264038086, 42.733367919921875, 32.31903076171875, 37.88819122314453, -30.366912841796875, 17.6290340423584, 33.64555740356445, 14.622978210449219, 3.333494186401367, 39.269630432128906, -2.6730270385742188, -22.18653106689453, -20.402629852294922, 29.232383728027344, 17.182771682739258, -3.435760498046875, 8.054283142089844, 15.94028091430664, 2.4883041381835938, 3.8473777770996094, -13.78466796875, 8.938987731933594, 10.242938995361328, 11.033843994140625, 8.912269592285156, -3.70361328125, 11.591682434082031, 18.547622680664062, 39.425933837890625, 5.92851448059082, 60.03451156616211, 36.53617858886719, -22.929893493652344, 13.101593017578125, 8.614582061767578, 29.881744384765625, -24.675155639648438, 17.287620544433594, 60.82575225830078, 31.560619354248047, 5.8566131591796875, 1.827972412109375, -20.121612548828125, -7.64320182800293, -5.2057952880859375, 27.012605667114258, 7.956417083740234, 18.74188995361328, 13.84576416015625, 0.7328033447265625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000577.npy"}
{"epoch": 0.872260015117158, "step": 578, "batch_size": 64, "mean": 9.41856861114502, "std": 17.36763572692871, "min": -26.494003295898438, "p10": -9.983119773864745, "median": 5.416887283325195, "p90": 34.980966567993164, "max": 52.970611572265625, "pos_frac": 0.640625, "sample": [9.369400024414062, -16.948463439941406, 2.4988250732421875, 3.6674957275390625, 17.79156494140625, -0.7805728912353516, 9.778850555419922, 9.862674713134766, 16.798446655273438, 4.047380447387695, -2.4386062622070312, 52.970611572265625, -3.4211502075195312, 13.678958892822266, 29.277828216552734, -26.494003295898438, 39.63047790527344, 21.88947105407715, 17.260597229003906, 31.228416442871094, 37.69548034667969, 12.854576110839844, 4.015342712402344, 6.033061981201172, -3.8661651611328125, 12.442270278930664, 26.12395477294922, -1.7859649658203125, 5.574346542358398, -0.9330101013183594, 48.771522521972656, 17.310836791992188, 52.29591751098633, 8.02728271484375, -5.598388671875, 4.9383087158203125, 8.069084167480469, 22.261146545410156, 7.215309143066406, 5.259428024291992, 31.941009521484375, -16.015106201171875, 13.210647583007812, -8.437921524047852, -5.6025390625, -10.968366622924805, -0.12145233154296875, 35.74275207519531, -6.727886199951172, 34.98464584350586, 13.474945068359375, -12.391525268554688, 34.972381591796875, 0.08561515808105469, 3.5590057373046875, -10.645347595214844, 2.086090087890625, -0.30205535888671875, -4.740753173828125, -0.6550121307373047, -5.666614532470703, -2.0837860107421875, -12.0330810546875, 32.75022888183594], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000578.npy"}
{"epoch": 0.873771730914588, "step": 579, "batch_size": 64, "mean": 17.34226417541504, "std": 19.19150733947754, "min": -26.318374633789062, "p10": -5.101691818237303, "median": 15.019756317138672, "p90": 45.8015075683594, "max": 63.3927001953125, "pos_frac": 0.796875, "sample": [23.73760986328125, 10.72222900390625, 14.386760711669922, 6.9561767578125, 4.0780029296875, -11.5369873046875, 24.528732299804688, 5.4134368896484375, 19.84405517578125, 55.13481140136719, 36.41325378417969, 39.94915008544922, 17.957698822021484, 54.97404861450195, 22.315820693969727, 11.041618347167969, 40.80290985107422, -1.4707107543945312, -5.633094787597656, 29.678756713867188, 63.3927001953125, -7.280632019042969, 52.31352615356445, -0.5084648132324219, 14.173210144042969, 13.620677947998047, 57.31404113769531, 31.772483825683594, 38.287498474121094, 18.447288513183594, -3.8617515563964844, 13.254255294799805, 37.94464874267578, 13.772768020629883, 24.529281616210938, 22.611785888671875, 25.100982666015625, 17.69944953918457, 9.727188110351562, 48.96333694458008, -1.5723724365234375, 22.406940460205078, -1.3930206298828125, -19.39928436279297, 15.652751922607422, 5.105470657348633, -6.796012878417969, -26.318374633789062, 47.943763732910156, 10.633468627929688, 19.944473266601562, 8.266372680664062, -3.4765472412109375, 28.6257381439209, 20.404949188232422, 6.625072479248047, 30.724899291992188, 28.38604736328125, 6.641050338745117, 7.2996826171875, -10.444931030273438, 7.597171783447266, 21.565185546875, 0.9138870239257812], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000579.npy"}
{"epoch": 0.8752834467120182, "step": 580, "batch_size": 64, "mean": 12.421806335449219, "std": 19.715896606445312, "min": -30.237916946411133, "p10": -11.872209167480468, "median": 11.76479721069336, "p90": 41.1834083557129, "max": 54.82041931152344, "pos_frac": 0.671875, "sample": [-15.211479187011719, -8.0517578125, -0.263092041015625, 9.45924186706543, -6.143058776855469, 28.88727378845215, 44.13758850097656, 16.31194305419922, 13.459762573242188, 0.19793319702148438, 24.388534545898438, 20.93872833251953, 4.9884185791015625, 4.9192047119140625, -12.263679504394531, 9.24915885925293, 25.246021270751953, 14.394248962402344, -2.6386642456054688, 11.640113830566406, -2.5014076232910156, 22.680381774902344, 38.448158264160156, 9.437088012695312, 31.05803108215332, -4.676841735839844, 42.064178466796875, -0.29979705810546875, -0.7490043640136719, 53.48468780517578, 39.20397186279297, 38.271236419677734, 13.371726989746094, 22.27069664001465, -10.958778381347656, 54.82041931152344, -21.881336212158203, -30.237916946411133, 4.953296661376953, 45.686553955078125, -25.354671478271484, 14.8984375, -13.057167053222656, 13.19537353515625, 42.03173828125, -3.175821304321289, 5.421934127807617, 3.316242218017578, 22.478683471679688, 13.384140014648438, 26.033710479736328, 32.49821472167969, -0.04050445556640625, -15.932304382324219, -0.029544830322265625, -4.409952163696289, 17.639450073242188, 28.387924194335938, 15.901485443115234, 52.29925537109375, 11.889480590820312, -3.1562652587890625, 4.638629913330078, 28.045364379882812], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000580.npy"}
{"epoch": 0.8767951625094482, "step": 581, "batch_size": 64, "mean": 11.059354782104492, "std": 18.647050857543945, "min": -27.421098709106445, "p10": -9.225181007385254, "median": 9.967069625854492, "p90": 31.661922454833995, "max": 62.71543884277344, "pos_frac": 0.734375, "sample": [27.465328216552734, 23.64476776123047, 1.8566436767578125, 9.888385772705078, 12.151687622070312, 5.2154083251953125, 1.46624755859375, -8.707366943359375, 24.04802703857422, 18.895545959472656, -23.91913604736328, 4.403186798095703, 3.809030532836914, 12.02412223815918, 22.168106079101562, -9.38859748840332, 21.132308959960938, 6.9525299072265625, 20.43951416015625, 0.49739837646484375, 45.92913818359375, 5.487884521484375, 48.7122802734375, -3.65130615234375, 37.43895721435547, 61.04145812988281, 23.670495986938477, 5.056243896484375, 29.067176818847656, 22.73006820678711, -10.191619873046875, 18.025909423828125, 4.3870697021484375, 29.019622802734375, 10.045753479003906, 18.734882354736328, 9.40447998046875, -19.91358184814453, 62.71543884277344, 34.766380310058594, 3.8751144409179688, -21.375213623046875, 11.144439697265625, -1.6131362915039062, 14.258052825927734, -27.421098709106445, 32.773956298828125, -0.7830657958984375, -2.8456153869628906, -23.61996841430664, 22.430797576904297, -3.2971572875976562, 11.308784484863281, 22.368682861328125, 2.958332061767578, 22.175582885742188, -0.6600780487060547, -8.843875885009766, -5.1028289794921875, 10.953861236572266, 1.6318130493164062, 24.719722747802734, 23.24918556213379, -7.007427215576172], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000581.npy"}
{"epoch": 0.8783068783068783, "step": 582, "batch_size": 64, "mean": 12.679601669311523, "std": 16.631690979003906, "min": -19.520883560180664, "p10": -8.974137115478516, "median": 11.424933433532715, "p90": 32.44759941101074, "max": 56.8824462890625, "pos_frac": 0.796875, "sample": [27.851043701171875, 55.81914520263672, 7.468841552734375, 24.673866271972656, 2.257831573486328, 0.8060951232910156, 25.037551879882812, 11.795269012451172, 26.041404724121094, 30.872928619384766, 10.310646057128906, 13.160018920898438, 20.014419555664062, 56.8824462890625, -6.051612854003906, 31.142290115356445, 8.395889282226562, -2.2173690795898438, 14.23305892944336, 27.339004516601562, 7.044410705566406, 15.508613586425781, 12.020896911621094, 34.427215576171875, 22.91492462158203, 17.497146606445312, 9.017396926879883, 10.928590774536133, -17.37865447998047, -19.520883560180664, 1.2479114532470703, 16.20526123046875, 27.184669494628906, -8.869285583496094, -2.071195602416992, 31.801029205322266, 38.686546325683594, 20.725658416748047, 27.984310150146484, 5.006782531738281, 32.724700927734375, 14.280826568603516, 8.117574691772461, 11.054597854614258, -6.123725891113281, 1.2285385131835938, -10.128425598144531, 13.027687072753906, 9.662208557128906, -19.227298736572266, 4.481075286865234, 5.465782165527344, -1.0390701293945312, 0.30737876892089844, -12.75419807434082, 14.579734802246094, 28.378170013427734, -12.189422607421875, 40.75579071044922, -9.019073486328125, 4.5941009521484375, 3.5718154907226562, 20.01042938232422, 33.54115295410156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000582.npy"}
{"epoch": 0.8798185941043084, "step": 583, "batch_size": 64, "mean": 12.673837661743164, "std": 20.168251037597656, "min": -36.92516326904297, "p10": -10.073011779785157, "median": 12.582122802734375, "p90": 40.568067932128926, "max": 61.34017562866211, "pos_frac": 0.71875, "sample": [14.773990631103516, 2.1722755432128906, -4.4266357421875, -8.293407440185547, 56.78964614868164, 10.984981536865234, -10.352783203125, -9.89146614074707, -16.220626831054688, 4.204368591308594, 26.248199462890625, -15.80386734008789, -1.8036422729492188, 24.655563354492188, 6.837900161743164, -10.10595703125, 19.58016586303711, -2.617982864379883, 61.34017562866211, 42.282493591308594, 20.36659049987793, -25.852291107177734, 13.0263671875, -1.2518386840820312, 17.095382690429688, 21.56109046936035, 17.890380859375, -19.122352600097656, 10.563972473144531, 28.593467712402344, 19.345867156982422, 13.110382080078125, 18.228073120117188, 6.995330810546875, 16.335540771484375, 46.75074768066406, -9.996139526367188, -5.647186279296875, 22.43547821044922, 46.906333923339844, 7.486257553100586, 49.13221740722656, -36.92516326904297, 33.79185485839844, 6.525026321411133, 28.94891357421875, 23.855751037597656, 0.7086715698242188, 16.927215576171875, -5.892696380615234, 22.662261962890625, 9.212600708007812, 20.27262306213379, 3.908781051635742, 18.2965030670166, 12.13787841796875, 3.6115970611572266, 24.443252563476562, -0.12556076049804688, 36.56774139404297, 56.462806701660156, 34.95378875732422, -7.730743408203125, 4.2054443359375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000583.npy"}
{"epoch": 0.8813303099017384, "step": 584, "batch_size": 64, "mean": 12.968864440917969, "std": 20.723968505859375, "min": -31.7839298248291, "p10": -10.653466415405273, "median": 12.89739990234375, "p90": 39.937068176269534, "max": 67.04556274414062, "pos_frac": 0.71875, "sample": [52.7431640625, 12.760772705078125, 18.603443145751953, -6.345247268676758, 13.9447021484375, -0.5811939239501953, 45.6739501953125, 25.648277282714844, 24.092201232910156, 20.980667114257812, 13.003559112548828, -4.437875747680664, -5.847177505493164, -5.913610458374023, -12.540592193603516, 23.67906951904297, 25.10897445678711, 40.21825408935547, 28.17518424987793, -9.905948638916016, 31.027076721191406, 37.66375732421875, -4.439903259277344, 33.856414794921875, 24.633636474609375, 22.456512451171875, 7.83119010925293, 12.582977294921875, 5.614654541015625, 4.970100402832031, 25.4210205078125, 2.5292816162109375, 21.78118896484375, 6.1142578125, -10.973831176757812, 40.984169006347656, -9.652793884277344, -6.9352569580078125, -1.4210281372070312, 1.6417179107666016, 13.91324234008789, 41.770469665527344, -11.327606201171875, 15.100061416625977, 30.376876831054688, 27.566680908203125, 0.020503997802734375, 13.661897659301758, 13.133941650390625, -12.709526062011719, -30.017333984375, 12.791240692138672, -6.521270751953125, 39.280967712402344, 1.8337345123291016, 67.04556274414062, 66.31388092041016, 2.061124801635742, 17.492952346801758, 7.6156158447265625, 6.718788146972656, 26.268123626708984, -25.34435272216797, -31.7839298248291], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000584.npy"}
{"epoch": 0.8828420256991686, "step": 585, "batch_size": 64, "mean": 10.476306915283203, "std": 17.99974822998047, "min": -46.34447479248047, "p10": -3.315363311767578, "median": 9.751204490661621, "p90": 31.100186538696292, "max": 57.28883361816406, "pos_frac": 0.796875, "sample": [14.06899642944336, 30.069351196289062, 4.221549987792969, 12.122138977050781, 14.580734252929688, 7.81829833984375, 4.226287841796875, 28.28436279296875, 2.5112686157226562, 2.8490123748779297, -37.1390495300293, 4.389545440673828, 29.750457763671875, -2.4871673583984375, 1.2069683074951172, 26.95306396484375, -19.884363174438477, 6.260871887207031, 20.822837829589844, 16.221778869628906, 34.949337005615234, 3.1720619201660156, 3.9968738555908203, -17.825698852539062, 10.46796989440918, 12.147481918334961, 3.5384597778320312, -1.3898162841796875, 14.867843627929688, 8.168716430664062, -2.986316680908203, 0.20232391357421875, 28.77352523803711, -2.295318603515625, -14.506500244140625, 5.90185546875, 4.64373779296875, 27.289100646972656, 31.541973114013672, 25.928936004638672, 8.13345718383789, 19.090717315673828, 10.677047729492188, -15.96879768371582, -46.34447479248047, 51.21880340576172, 47.61406326293945, 33.91334915161133, -3.0697021484375, -1.9367446899414062, 11.578218460083008, 11.243331909179688, 16.23188018798828, 12.635334014892578, 3.8590431213378906, -3.4206466674804688, 16.093475341796875, 39.05115509033203, 57.28883361816406, 16.047607421875, 3.9750900268554688, 9.034439086914062, 13.226337432861328, 16.878379821777344], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000585.npy"}
{"epoch": 0.8843537414965986, "step": 586, "batch_size": 64, "mean": 14.152345657348633, "std": 16.359663009643555, "min": -22.058677673339844, "p10": -3.435439300537109, "median": 14.751985549926758, "p90": 35.51633758544922, "max": 60.70220947265625, "pos_frac": 0.828125, "sample": [34.53631591796875, 9.168647766113281, 23.490840911865234, 14.737136840820312, 5.609344482421875, 34.99497985839844, 14.932588577270508, 1.0113449096679688, 6.7838897705078125, 12.782449722290039, 22.37394905090332, -7.578620910644531, 0.028247833251953125, 48.307159423828125, 35.739776611328125, -4.276195526123047, 12.454431533813477, 29.907821655273438, 7.041013717651367, 23.38399314880371, 4.290611267089844, 26.248943328857422, 14.766834259033203, 30.384227752685547, 4.820220947265625, -0.7680091857910156, -2.4837570190429688, 6.247711181640625, 25.370018005371094, 20.18549346923828, 38.95439147949219, 18.844341278076172, 2.9721450805664062, 22.560871124267578, 20.572250366210938, 5.543178558349609, -20.322261810302734, 60.70220947265625, -3.109333038330078, 5.258136749267578, 4.362089157104492, 4.9941864013671875, 5.9159393310546875, 22.8826904296875, 14.976516723632812, -3.5751991271972656, -5.558525085449219, 15.200023651123047, -0.43988609313964844, 20.78131866455078, 36.72419738769531, 1.4786224365234375, 14.986061096191406, 4.752704620361328, 45.37501907348633, 2.2550888061523438, -22.058677673339844, 46.200828552246094, -16.602996826171875, 20.368698120117188, 23.56833267211914, 20.64615249633789, 26.386093139648438, 15.66351318359375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000586.npy"}
{"epoch": 0.8858654572940288, "step": 587, "batch_size": 64, "mean": 13.500299453735352, "std": 21.089008331298828, "min": -50.852386474609375, "p10": -9.835405731201169, "median": 12.85300064086914, "p90": 41.41002159118653, "max": 61.26488494873047, "pos_frac": 0.75, "sample": [4.741230010986328, -28.004531860351562, 2.5281982421875, 14.376420974731445, 23.96246337890625, -11.635723114013672, 3.699207305908203, -6.739959716796875, 36.946868896484375, 14.5501708984375, 6.210491180419922, 18.47608184814453, -11.162025451660156, -6.6956787109375, 0.0078125, 15.762100219726562, 20.28684425354004, -1.5382976531982422, 11.291702270507812, 61.26488494873047, 27.206436157226562, 12.934165954589844, -0.9222946166992188, 46.339691162109375, 14.8050537109375, 22.219390869140625, 55.68715286254883, -0.7422142028808594, -50.852386474609375, -1.6556282043457031, 23.707351684570312, 13.871097564697266, 27.844879150390625, 40.25626754760742, 7.5108795166015625, 9.873062133789062, 19.328182220458984, -30.030303955078125, -4.8150787353515625, 34.277313232421875, 37.742713928222656, -14.249217987060547, 31.346351623535156, 17.029401779174805, 0.8263912200927734, 17.701129913330078, 9.736488342285156, 31.787864685058594, 26.078792572021484, 12.452251434326172, 41.90448760986328, -16.965070724487305, 31.63262176513672, 2.79803466796875, 12.771835327148438, 50.43902587890625, 4.379875183105469, -4.485954284667969, -0.5800285339355469, 10.947479248046875, 22.83202362060547, 53.57086181640625, 44.08352279663086, 5.0670013427734375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000587.npy"}
{"epoch": 0.8873771730914588, "step": 588, "batch_size": 64, "mean": 12.990303039550781, "std": 17.81580924987793, "min": -23.47574234008789, "p10": -7.070219421386717, "median": 12.584022521972656, "p90": 32.30627479553223, "max": 59.203094482421875, "pos_frac": 0.75, "sample": [12.792678833007812, -8.935165405273438, -0.27548980712890625, 16.78893280029297, -4.068565368652344, 26.68377685546875, 32.94824981689453, 15.536865234375, 26.045700073242188, 30.48663330078125, 57.56024169921875, 7.813074111938477, 10.353364944458008, 8.280771255493164, 6.708457946777344, 18.77004051208496, 55.65747833251953, 59.203094482421875, 10.225751876831055, 44.52641296386719, 13.56182861328125, 19.19441032409668, -12.707710266113281, 50.165428161621094, 18.199098587036133, 31.523391723632812, 12.897659301757812, -3.5954971313476562, -2.0190486907958984, -4.72796630859375, 3.0936717987060547, 23.822650909423828, -23.47574234008789, 7.8220062255859375, 20.844505310058594, 25.698638916015625, -2.678903579711914, 14.19192886352539, 15.581815719604492, -1.4776840209960938, -12.761116027832031, -5.426155090332031, 12.3753662109375, 32.64179611206055, 16.68682098388672, 10.500900268554688, 31.446287155151367, 2.617290496826172, 1.5019378662109375, 22.15593910217285, 9.522613525390625, 2.4046630859375, 1.5930900573730469, 12.803321838378906, -0.8010787963867188, 7.4412384033203125, -7.774818420410156, 18.088821411132812, 7.6799774169921875, -22.172393798828125, 19.98908805847168, 24.0927734375, 31.048171997070312, -17.291915893554688], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000588.npy"}
{"epoch": 0.8888888888888888, "step": 589, "batch_size": 64, "mean": 16.189313888549805, "std": 18.31147575378418, "min": -25.1417236328125, "p10": -2.8202442169189452, "median": 11.868873596191406, "p90": 43.525681304931645, "max": 76.95419311523438, "pos_frac": 0.84375, "sample": [0.7364902496337891, 25.24396514892578, -13.427003860473633, 45.19812774658203, 14.407844543457031, 5.554439544677734, 5.4954833984375, 9.798812866210938, 1.3662185668945312, 2.9729976654052734, -0.1196136474609375, 3.2212772369384766, 56.51795959472656, 13.72232437133789, 30.046218872070312, 11.326393127441406, 11.4591064453125, 33.84005355834961, 45.97050857543945, 28.44483184814453, 7.2202911376953125, 23.539825439453125, 24.850875854492188, -2.93609619140625, 39.98240661621094, 12.278640747070312, -6.483131408691406, 22.435684204101562, 34.826324462890625, 21.43840980529785, 19.464616775512695, -2.5499229431152344, 44.13304901123047, 6.00531005859375, -5.15667724609375, 17.93902587890625, 10.631561279296875, 18.39720916748047, 12.281295776367188, 54.14176940917969, 17.5596923828125, 21.991350173950195, 27.68224334716797, 9.289226531982422, 17.79091453552246, 8.305313110351562, 6.115291595458984, -8.034542083740234, -0.9503936767578125, 8.748947143554688, -25.1417236328125, 15.790351867675781, 50.68171691894531, -10.772575378417969, 8.695762634277344, 28.053152084350586, 42.108489990234375, 11.343002319335938, 10.137840270996094, 5.574790954589844, 6.3529510498046875, 9.736177444458008, 76.95419311523438, 13.887041091918945], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000589.npy"}
{"epoch": 0.890400604686319, "step": 590, "batch_size": 64, "mean": 15.812767028808594, "std": 19.534013748168945, "min": -24.894561767578125, "p10": -4.975117111206054, "median": 15.398816108703613, "p90": 44.336062622070315, "max": 56.4111328125, "pos_frac": 0.765625, "sample": [31.65374755859375, 6.0955352783203125, 14.164421081542969, -3.811309814453125, 24.65235137939453, 43.406150817871094, 9.850601196289062, -24.894561767578125, 3.653533935546875, -0.6724777221679688, 6.71099853515625, 2.806638717651367, 0.96826171875, 5.479591369628906, -20.02642822265625, 12.445253372192383, 24.80276107788086, 27.160675048828125, -19.030193328857422, -0.38808441162109375, -4.324520111083984, 20.822967529296875, 4.3196563720703125, 42.880043029785156, 48.4852294921875, -1.8729629516601562, 2.0078048706054688, 7.62176513671875, 52.02058410644531, 21.00848388671875, -2.066162109375, 29.900970458984375, 5.419313430786133, 22.424072265625, 35.8594970703125, 27.711265563964844, 22.760786056518555, 5.4042205810546875, 3.38330078125, 18.323455810546875, 47.474361419677734, 17.7393741607666, 51.7857666015625, 26.711639404296875, 42.11517333984375, 19.046096801757812, 20.472564697265625, 24.441452026367188, -7.495880126953125, 16.651939392089844, -10.115131378173828, 40.430458068847656, 31.208919525146484, -2.59869384765625, 5.1444854736328125, 44.734596252441406, 16.633211135864258, 12.689651489257812, 50.411415100097656, -3.3937950134277344, 26.007427215576172, -5.253944396972656, 56.4111328125, -16.352375030517578], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000590.npy"}
{"epoch": 0.891912320483749, "step": 591, "batch_size": 64, "mean": 15.290372848510742, "std": 21.269859313964844, "min": -35.45611572265625, "p10": -10.10684814453125, "median": 15.812419891357422, "p90": 40.392334365844725, "max": 59.46043395996094, "pos_frac": 0.734375, "sample": [-1.2649555206298828, 54.46071243286133, 21.449377059936523, 32.44940948486328, 20.38225555419922, 23.75139617919922, 40.37788009643555, 13.921615600585938, 26.748729705810547, 31.44995880126953, 6.2657623291015625, -2.191497802734375, 24.672607421875, -21.085426330566406, 13.517890930175781, -7.2726287841796875, 20.196483612060547, 5.6513519287109375, 14.34326171875, 5.071891784667969, 23.76708984375, 19.2954158782959, 38.546051025390625, -10.163665771484375, 34.39757537841797, -35.45611572265625, 27.676010131835938, 17.12413787841797, 5.720588684082031, 9.304290771484375, -20.10541534423828, -13.428594589233398, 46.69195556640625, 59.46043395996094, 1.469970703125, 32.84556579589844, -17.605953216552734, 4.579261779785156, 38.4373779296875, 0.257904052734375, -1.5805892944335938, -8.244729995727539, 15.187187194824219, 30.586875915527344, -2.258279800415039, 55.77105712890625, -8.14559555053711, -25.97631072998047, -7.628974914550781, 42.7073974609375, -9.974273681640625, -0.8717498779296875, 39.17488098144531, 21.97972869873047, 9.11655044555664, 36.56907653808594, 35.143714904785156, 21.92220687866211, 16.437652587890625, 21.803417205810547, 52.715110778808594, 13.500350952148438, 4.540628433227539, 40.398529052734375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000591.npy"}
{"epoch": 0.8934240362811792, "step": 592, "batch_size": 64, "mean": 13.944801330566406, "std": 19.002050399780273, "min": -14.027782440185547, "p10": -7.243048667907712, "median": 9.40390396118164, "p90": 46.740038299560545, "max": 68.58969116210938, "pos_frac": 0.796875, "sample": [6.2046966552734375, 52.53004455566406, 36.40571975708008, 10.382608413696289, 26.958194732666016, 54.480438232421875, 34.883575439453125, 5.381038665771484, 4.7774505615234375, 9.511146545410156, -8.433082580566406, 17.10384178161621, -0.21771240234375, 2.5932693481445312, 1.6777267456054688, 30.632240295410156, 54.50862503051758, 18.14698028564453, 1.0076217651367188, 3.9178123474121094, -13.499374389648438, 32.33612823486328, 13.189136505126953, 46.70794677734375, 4.675121307373047, 16.943191528320312, 11.858299255371094, 46.75379180908203, 25.33330535888672, 18.726791381835938, -8.218849182128906, 19.01255226135254, 2.0666732788085938, 11.979667663574219, 9.296661376953125, 6.988201141357422, -9.178756713867188, 28.51526641845703, -14.027782440185547, -4.966180801391602, -0.7988338470458984, 20.420679092407227, -1.2812881469726562, 0.20674896240234375, -9.59979248046875, 21.665191650390625, 10.871261596679688, 50.56536865234375, 55.83142852783203, -2.6375045776367188, 68.58969116210938, -1.0218353271484375, 2.0488929748535156, 3.6291122436523438, 6.019859313964844, 3.8004589080810547, 6.4091796875, 9.824073791503906, -13.549663543701172, 11.642009735107422, 5.912010192871094, 5.9317626953125, 11.272369384765625, 19.772098541259766], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000592.npy"}
{"epoch": 0.8949357520786092, "step": 593, "batch_size": 64, "mean": 13.544509887695312, "std": 21.059499740600586, "min": -33.15047836303711, "p10": -12.47812995910644, "median": 12.972902297973633, "p90": 42.531643676757824, "max": 58.58385467529297, "pos_frac": 0.734375, "sample": [23.696151733398438, 4.167270660400391, 28.94615936279297, -4.837623596191406, 35.57591247558594, 0.01604461669921875, 8.582427978515625, -22.243362426757812, 53.2815055847168, 10.285820007324219, 48.54935836791992, 1.4618072509765625, 39.172149658203125, 34.0191650390625, 9.1651611328125, -14.586322784423828, 16.463706970214844, 21.148284912109375, 43.55165100097656, 15.257637023925781, 5.726789474487305, 25.581466674804688, 43.76898956298828, 34.99470520019531, 12.929306030273438, 11.582588195800781, -6.465414047241211, 29.571996688842773, 13.016498565673828, -33.15047836303711, 26.243202209472656, -18.05010223388672, -3.1116104125976562, 24.183650970458984, 18.643686294555664, 30.191925048828125, -2.8909645080566406, -7.2326812744140625, 28.056556701660156, -7.559013366699219, -17.701740264892578, -5.375707626342773, 40.15162658691406, -26.781639099121094, 51.944908142089844, 24.567150115966797, 53.181556701660156, 13.122739791870117, -7.472133636474609, 58.58385467529297, 14.430419921875, -1.0631275177001953, 12.035659790039062, 13.959705352783203, 14.280426025390625, -26.412078857421875, 8.044872283935547, 31.101089477539062, 0.9052009582519531, -5.378902435302734, 3.9175491333007812, 11.336151123046875, 6.1252288818359375, 21.641870498657227], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000593.npy"}
{"epoch": 0.8964474678760394, "step": 594, "batch_size": 64, "mean": 11.02804183959961, "std": 16.214162826538086, "min": -27.192703247070312, "p10": -6.989821624755858, "median": 8.231988906860352, "p90": 31.013030624389657, "max": 50.01426696777344, "pos_frac": 0.71875, "sample": [5.85960578918457, 21.445602416992188, -5.085437774658203, 25.561012268066406, 25.189167022705078, 26.225357055664062, 8.000019073486328, 8.463958740234375, 21.449676513671875, -9.853614807128906, -3.6502227783203125, -3.3493919372558594, -10.633962631225586, 10.040679931640625, 6.647485733032227, -1.5507125854492188, -3.5157470703125, -1.9964866638183594, 28.764816284179688, 2.988983154296875, 24.487857818603516, 8.958877563476562, 3.421649932861328, 16.898792266845703, 50.01426696777344, 16.273460388183594, 17.010093688964844, -9.793617248535156, 20.281230926513672, -1.9472637176513672, 47.48474884033203, 21.8621826171875, 40.85516357421875, 32.750675201416016, 0.38092041015625, -5.6349334716796875, -7.570487976074219, 15.866687774658203, 31.976551055908203, -11.157814025878906, 19.865806579589844, 1.7076587677001953, 28.39072608947754, 38.77165985107422, 1.3030319213867188, 9.885740280151367, 6.993417739868164, 28.545791625976562, -1.7071094512939453, 28.190475463867188, 5.019012451171875, -27.192703247070312, 17.46820068359375, 44.159339904785156, 20.36541748046875, 19.06588363647461, 3.5854759216308594, 5.746315002441406, 6.1078338623046875, -3.90087890625, 10.233413696289062, -17.116661071777344, 0.6590576171875, -3.772064208984375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000594.npy"}
{"epoch": 0.8979591836734694, "step": 595, "batch_size": 64, "mean": 17.264110565185547, "std": 21.903024673461914, "min": -47.3575439453125, "p10": -9.603495025634759, "median": 16.45486068725586, "p90": 44.883846282958984, "max": 69.65943908691406, "pos_frac": 0.78125, "sample": [-2.0839004516601562, 27.083084106445312, 47.04034423828125, 18.178260803222656, 18.00640869140625, 12.917570114135742, 21.180381774902344, 9.583961486816406, -2.9923858642578125, 39.37491226196289, 28.668960571289062, 9.878700256347656, 7.635402679443359, -18.098432540893555, 45.00818634033203, 6.082855224609375, 8.016407012939453, -3.5644302368164062, 32.8932991027832, 2.0851993560791016, 36.16503143310547, 15.543010711669922, 42.2107048034668, 15.584602355957031, 53.2149658203125, 2.6790924072265625, 10.279701232910156, 16.968284606933594, 18.144699096679688, 23.092300415039062, 7.102516174316406, -14.240638732910156, 10.781414031982422, -13.01605224609375, 33.08208465576172, 44.593719482421875, 48.17213439941406, -12.191665649414062, -47.3575439453125, 67.29985809326172, 69.65943908691406, 41.74696731567383, 19.918365478515625, 15.941436767578125, -14.813060760498047, 44.45458984375, 18.759384155273438, 25.799972534179688, 14.933181762695312, 18.32596206665039, -2.528776168823242, 30.568382263183594, 6.37462043762207, 5.169158935546875, 6.2446441650390625, 20.844722747802734, 51.98857116699219, 27.879966735839844, -3.0074615478515625, -0.23699951171875, -0.2650108337402344, -24.28844451904297, 27.853919982910156, 38.57646179199219], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000595.npy"}
{"epoch": 0.8994708994708994, "step": 596, "batch_size": 64, "mean": 8.666291236877441, "std": 18.67624855041504, "min": -30.211633682250977, "p10": -10.220695877075194, "median": 5.3568620681762695, "p90": 40.09250221252444, "max": 55.586181640625, "pos_frac": 0.640625, "sample": [-12.434486389160156, 12.037460327148438, 44.03736877441406, 9.964126586914062, 16.477943420410156, -8.500045776367188, -3.1111488342285156, 5.637733459472656, -0.1348419189453125, 31.10462188720703, -14.775619506835938, 11.369449615478516, -13.197608947753906, -3.4495162963867188, 42.26014709472656, 49.08924865722656, 21.572715759277344, 4.764125823974609, -17.061317443847656, 8.739316940307617, -8.611572265625, 25.258987426757812, 5.455223083496094, 13.722929000854492, 54.416343688964844, 35.034664154052734, 20.392921447753906, -10.910320281982422, 2.83758544921875, 2.48248291015625, -0.003765106201171875, 6.718788146972656, 7.451366424560547, -0.05720710754394531, 8.124336242675781, -6.8300933837890625, -6.063270568847656, 55.586181640625, -2.736530303955078, -20.14783477783203, 11.45571517944336, 10.875602722167969, 0.3696556091308594, -30.211633682250977, -1.90936279296875, 6.229282379150391, 0.3081817626953125, -7.652044296264648, 45.20043182373047, 29.08282470703125, 34.15495300292969, 2.19195556640625, -2.91680908203125, 5.249992370605469, -3.3204421997070312, 48.68181610107422, 1.95904541015625, 15.268890380859375, -8.556146621704102, 5.258501052856445, 11.931842803955078, 10.719131469726562, 8.59246826171875, -4.832101821899414], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000596.npy"}
{"epoch": 0.9009826152683296, "step": 597, "batch_size": 64, "mean": 13.03089714050293, "std": 18.5463924407959, "min": -23.82274627685547, "p10": -7.56710681915283, "median": 7.248044967651367, "p90": 40.3780258178711, "max": 60.24083709716797, "pos_frac": 0.75, "sample": [-8.57387924194336, -23.82274627685547, -3.3902816772460938, -0.9561614990234375, 40.70848083496094, 5.812526702880859, 12.117961883544922, -12.078216552734375, -5.461065292358398, 50.58644104003906, 3.048980712890625, 28.325721740722656, 3.926227569580078, 15.455841064453125, -5.03863525390625, 9.327333450317383, 60.24083709716797, 31.842510223388672, 24.920860290527344, -9.152650833129883, 17.533857345581055, 6.138557434082031, -11.774658203125, 2.5286178588867188, 29.236106872558594, 17.734630584716797, 6.709470748901367, -4.212673187255859, 23.46723175048828, 1.72369384765625, 15.611061096191406, 12.675285339355469, 48.619972229003906, 19.73175048828125, 27.945602416992188, 6.533634185791016, -1.8662567138671875, 7.786619186401367, 11.527250289916992, 46.90052795410156, -1.5993881225585938, -8.469696044921875, 3.617828369140625, 1.2737789154052734, -4.07209587097168, 3.127483367919922, 0.2224292755126953, 3.3636722564697266, 25.972488403320312, 11.255279541015625, 58.220542907714844, 13.601179122924805, 30.91912078857422, -3.294464111328125, -12.882804870605469, 21.99718475341797, 4.188665390014648, 4.6317291259765625, 41.412994384765625, 39.606964111328125, 4.020597457885742, 34.722694396972656, 33.53193664550781, 26.218908309936523], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000597.npy"}
{"epoch": 0.9024943310657596, "step": 598, "batch_size": 64, "mean": 9.416687965393066, "std": 18.442407608032227, "min": -29.093238830566406, "p10": -14.282986450195311, "median": 7.582484245300293, "p90": 34.009088897705084, "max": 52.63227844238281, "pos_frac": 0.703125, "sample": [0.5130214691162109, -12.578659057617188, 8.381721496582031, 8.629859924316406, 21.3736572265625, -1.3095512390136719, 8.987503051757812, 38.08112716674805, 21.581239700317383, -29.093238830566406, 0.6480770111083984, 5.297353744506836, -1.1756477355957031, -17.544382095336914, 52.63227844238281, -15.013412475585938, 48.14200973510742, -12.065597534179688, 29.147239685058594, 20.375743865966797, -5.778757095336914, -20.224899291992188, -15.174148559570312, 22.885765075683594, 0.8492050170898438, 7.910560607910156, 7.684558868408203, 21.94427490234375, -5.683124542236328, 0.19267654418945312, -7.523220062255859, -0.08713912963867188, 13.169744491577148, 10.574897766113281, 34.44732666015625, 0.4389381408691406, 27.796066284179688, 0.7811088562011719, 31.208580017089844, 0.221405029296875, 30.737823486328125, 17.105690002441406, 0.03946876525878906, 52.194793701171875, 0.7251739501953125, 16.153446197509766, 13.232666015625, -1.7122783660888672, 20.18995475769043, 17.53423309326172, -1.8607177734375, -19.22411346435547, 1.3149185180664062, 11.849029541015625, 7.480409622192383, -16.87811279296875, 45.3919677734375, 17.1204833984375, -0.16289520263671875, 3.5669708251953125, 30.44232177734375, 32.986534118652344, -10.81454849243164, 34.61060333251953], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000598.npy"}
{"epoch": 0.9040060468631897, "step": 599, "batch_size": 64, "mean": 11.335319519042969, "std": 19.499431610107422, "min": -29.465065002441406, "p10": -8.699538421630859, "median": 6.889835357666016, "p90": 35.32434692382813, "max": 67.21540832519531, "pos_frac": 0.6875, "sample": [-0.8464126586914062, -2.483652114868164, 7.887590408325195, 11.585075378417969, -4.288368225097656, 15.564794540405273, -0.36722564697265625, 44.02642822265625, -13.44704818725586, -5.65576171875, -9.248298645019531, -5.973978042602539, 1.91119384765625, 16.674999237060547, 29.77165985107422, -24.467926025390625, -20.810546875, -0.6309890747070312, 6.167131423950195, -1.717254638671875, 34.75005340576172, 18.208139419555664, 4.717140197753906, -29.465065002441406, 5.433738708496094, 3.4357452392578125, -0.3179168701171875, 45.79214096069336, -14.076736450195312, 4.255805969238281, 50.616668701171875, 19.274463653564453, -7.419097900390625, -5.452705383300781, 7.174568176269531, 8.354095458984375, 4.586097717285156, 9.498619079589844, 6.6051025390625, 35.984466552734375, 32.61808395385742, 2.646076202392578, 2.8837051391601562, 7.876298904418945, 35.570472717285156, 31.861522674560547, 34.54053497314453, -0.41744232177734375, -7.301113128662109, 15.191761016845703, 58.432586669921875, 27.7567138671875, 14.910629272460938, 15.634468078613281, 33.953399658203125, 18.69664192199707, 67.21540832519531, 5.2236175537109375, 28.302810668945312, 5.838186264038086, -16.306533813476562, 26.701448440551758, 24.995986938476562, 13.028450012207031], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000599.npy"}
{"epoch": 0.9055177626606198, "step": 600, "batch_size": 64, "mean": 11.66678237915039, "std": 18.770986557006836, "min": -38.248233795166016, "p10": -7.329390907287598, "median": 8.742843627929688, "p90": 40.282289123535165, "max": 53.46768569946289, "pos_frac": 0.734375, "sample": [19.9871826171875, 12.939926147460938, -7.33820915222168, 3.6650543212890625, 38.38544464111328, 6.386192321777344, 9.877243041992188, -10.603002548217773, 41.09522247314453, 44.39143371582031, -3.5180130004882812, 16.200550079345703, 31.45612335205078, 43.08836364746094, -4.271385192871094, -2.821260452270508, -7.308815002441406, -6.236063003540039, -1.612091064453125, 31.86614990234375, 4.897716522216797, 7.722236633300781, 31.322738647460938, 22.34600830078125, -5.163368225097656, 30.036178588867188, 10.476882934570312, 47.48453140258789, -2.3815040588378906, 22.946731567382812, 42.09050750732422, 8.40667724609375, -24.959423065185547, 1.0476150512695312, 15.125350952148438, -38.248233795166016, -0.21710205078125, 13.166656494140625, 1.50750732421875, 36.148834228515625, 1.0826339721679688, 13.660308837890625, 14.239501953125, 24.622360229492188, -3.6080703735351562, 15.202016830444336, -16.067642211914062, -8.225622177124023, -20.67646026611328, 32.02890396118164, 2.900014877319336, 8.406303405761719, 5.363500595092773, 5.75921630859375, 12.063987731933594, 7.372406005859375, 20.655868530273438, 4.318340301513672, 16.25106430053711, 53.46768569946289, 9.079010009765625, 4.799957275390625, 11.8580322265625, 52.734130859375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000600.npy"}
{"epoch": 0.9070294784580499, "step": 601, "batch_size": 64, "mean": 15.290576934814453, "std": 19.772794723510742, "min": -26.368736267089844, "p10": -6.785726165771483, "median": 10.459264755249023, "p90": 41.902482223510745, "max": 65.6185302734375, "pos_frac": 0.75, "sample": [25.632431030273438, 9.942657470703125, 28.70783233642578, -10.503362655639648, 33.77848815917969, 30.678852081298828, 65.6185302734375, -0.555755615234375, 5.304466247558594, -5.9032440185546875, 62.572662353515625, -26.368736267089844, 33.954063415527344, 14.341320037841797, -0.38342857360839844, 8.832427978515625, 32.440773010253906, 24.687286376953125, -10.315841674804688, 24.81531524658203, 44.181419372558594, 8.142208099365234, 1.2615203857421875, 35.58197021484375, 59.34886932373047, 6.504062652587891, -1.9658889770507812, -1.1902313232421875, 26.513587951660156, -5.392242431640625, 4.419273376464844, 25.561607360839844, 10.856964111328125, 5.0705108642578125, 3.419984817504883, 42.198001861572266, 6.695747375488281, 15.532669067382812, 6.9474029541015625, 9.388347625732422, -11.828887939453125, 15.706130981445312, 4.951160430908203, 37.49104690551758, 4.921417236328125, 10.681020736694336, 11.660221099853516, -7.163932800292969, -3.4924564361572266, 10.237508773803711, 21.58277130126953, 42.49314880371094, -4.757196426391602, 0.28879547119140625, 19.38296890258789, 30.8814697265625, 43.41461181640625, 26.366165161132812, 31.656585693359375, -0.49504852294921875, -11.623443603515625, 33.664588928222656, 41.21293640136719, -18.987106323242188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000601.npy"}
{"epoch": 0.90854119425548, "step": 602, "batch_size": 64, "mean": 13.789787292480469, "std": 16.715831756591797, "min": -21.66394805908203, "p10": -5.119337463378907, "median": 11.03957748413086, "p90": 38.117384338378905, "max": 50.83405303955078, "pos_frac": 0.8125, "sample": [-5.969474792480469, 27.308128356933594, 27.358802795410156, 50.19830322265625, 14.704055786132812, 12.10992431640625, 12.684066772460938, 0.27084922790527344, 19.396583557128906, 10.222793579101562, 50.83405303955078, -4.292270660400391, 30.059112548828125, 37.916770935058594, -12.156497955322266, 45.870819091796875, -15.19354248046875, 32.03141784667969, 18.249114990234375, 3.3192615509033203, 2.533050537109375, -2.4110870361328125, 0.027873992919921875, -5.139991760253906, 2.576812744140625, 16.88318634033203, 5.606977462768555, 23.242881774902344, 11.267524719238281, 12.612150192260742, -5.632778167724609, 11.330581665039062, 14.351615905761719, 7.401027679443359, 8.919120788574219, 2.848651885986328, 8.91293716430664, 4.797946929931641, 35.525184631347656, 2.1471023559570312, -0.2472991943359375, 17.85367202758789, 22.326623916625977, 41.76694869995117, -5.071144104003906, -21.66394805908203, 21.855079650878906, 4.713787078857422, 13.088653564453125, 3.644977569580078, -0.4823493957519531, 8.3009033203125, 44.93621826171875, 26.385765075683594, 5.332145690917969, 34.29292678833008, 21.38202667236328, 10.811630249023438, 50.263031005859375, 38.20336151123047, -6.340873718261719, 26.641788482666016, 9.533203125, 2.2961959838867188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000602.npy"}
{"epoch": 0.91005291005291, "step": 603, "batch_size": 64, "mean": 13.843368530273438, "std": 18.024343490600586, "min": -22.060272216796875, "p10": -6.895237350463867, "median": 11.222362518310547, "p90": 39.42124824523926, "max": 66.7457275390625, "pos_frac": 0.78125, "sample": [19.787811279296875, 41.25745391845703, 11.468719482421875, -6.326484680175781, -4.463294982910156, 18.799758911132812, 7.0071868896484375, 19.097206115722656, -5.782222747802734, -0.3228912353515625, 39.04890060424805, 10.459770202636719, 18.141658782958984, -3.899679183959961, 4.354164123535156, 30.11457061767578, -7.138988494873047, 7.357475280761719, 4.32440185546875, 66.7457275390625, 13.335548400878906, 13.549652099609375, 39.58082580566406, 9.154167175292969, -1.0800743103027344, 10.593536376953125, 22.78868293762207, -7.4890594482421875, 22.011734008789062, -15.078704833984375, 4.140052795410156, 7.451652526855469, 14.138711929321289, 13.295978546142578, -16.391691207885742, 20.667556762695312, 22.3204345703125, 6.034450531005859, 0.028057098388671875, 50.518775939941406, 10.752641677856445, 19.231056213378906, 11.417160034179688, 11.027565002441406, 32.98174285888672, 32.327125549316406, 17.151290893554688, 1.5360774993896484, 8.376228332519531, 3.0226993560791016, -10.091743469238281, 3.064889907836914, 19.764074325561523, 31.946941375732422, 43.088783264160156, -1.4064865112304688, 16.071901321411133, -11.228057861328125, 38.33810806274414, 32.68586730957031, 41.346065521240234, 1.3928451538085938, 55.63746643066406, -22.060272216796875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000603.npy"}
{"epoch": 0.9115646258503401, "step": 604, "batch_size": 64, "mean": 12.347269058227539, "std": 20.784305572509766, "min": -46.62132263183594, "p10": -7.750598144531249, "median": 10.341484069824219, "p90": 45.288664245605474, "max": 59.82374572753906, "pos_frac": 0.71875, "sample": [18.369873046875, 26.60871124267578, -0.3081626892089844, -22.800003051757812, 15.43271255493164, 17.601903915405273, 0.18530654907226562, 11.274288177490234, 50.049041748046875, -10.337196350097656, -12.307815551757812, -2.0661087036132812, 3.4688186645507812, 28.87987518310547, -5.590423583984375, 7.802101135253906, 28.41759490966797, 6.282159805297852, 12.570701599121094, 19.05255889892578, -29.092979431152344, -7.38128662109375, 4.0396881103515625, -7.90887451171875, 16.01931381225586, -0.1919269561767578, 2.503072738647461, 5.300178527832031, 11.36212158203125, -46.62132263183594, 13.76803970336914, -5.312461853027344, 11.174652099609375, -7.3533172607421875, 46.12237548828125, -5.1438140869140625, -2.109710693359375, 26.753265380859375, 56.71534729003906, 22.782470703125, 17.1151065826416, 50.79314422607422, 54.39228820800781, 7.664806365966797, 21.644699096679688, 32.05601501464844, 5.292274475097656, 37.630401611328125, 59.82374572753906, -5.830299377441406, 43.34333801269531, 9.508316040039062, 5.4550323486328125, 5.311256408691406, 18.12946319580078, 22.155223846435547, 20.30392837524414, 1.2726154327392578, 3.2982959747314453, 48.742027282714844, 25.507980346679688, -5.2633514404296875, -11.031299591064453, 24.899391174316406], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000604.npy"}
{"epoch": 0.9130763416477702, "step": 605, "batch_size": 64, "mean": 11.557830810546875, "std": 18.339786529541016, "min": -27.241844177246094, "p10": -11.00699577331543, "median": 12.768709182739258, "p90": 37.231601333618165, "max": 59.0194091796875, "pos_frac": 0.71875, "sample": [16.47476577758789, 3.460205078125, -15.325630187988281, 59.0194091796875, 23.14006805419922, 34.59715270996094, -1.2460784912109375, 16.9793701171875, 17.765228271484375, -12.978458404541016, 25.051010131835938, -11.141036987304688, 1.8212356567382812, -1.1672286987304688, 3.6236228942871094, -10.694232940673828, 14.487442016601562, 29.67314910888672, 21.545055389404297, 13.469379425048828, -3.3477783203125, -1.494537353515625, 15.824838638305664, -23.61126708984375, -5.207420349121094, 12.068038940429688, 44.49205780029297, 20.01422882080078, 7.14793586730957, 11.130348205566406, 18.34699249267578, 7.971046447753906, -23.72010040283203, 44.995643615722656, -0.98291015625, 26.5762939453125, 37.47749710083008, 23.42713165283203, 7.131431579589844, 37.732086181640625, -5.0675048828125, 5.135280609130859, 20.510299682617188, 17.588232040405273, 37.49705505371094, 5.711952209472656, 23.7017822265625, 36.65784454345703, 18.654335021972656, -7.246124267578125, 44.370506286621094, 14.551475524902344, 15.203144073486328, 3.2688217163085938, -25.70684814453125, 4.569002151489258, 10.27458381652832, -27.241844177246094, 15.2034912109375, -4.417888641357422, 6.373195648193359, 34.59417724609375, -5.6653900146484375, 16.655595779418945], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000605.npy"}
{"epoch": 0.9145880574452003, "step": 606, "batch_size": 64, "mean": 12.466194152832031, "std": 16.894142150878906, "min": -17.20142364501953, "p10": -7.051195526123047, "median": 10.323249816894531, "p90": 34.397381973266604, "max": 57.323974609375, "pos_frac": 0.703125, "sample": [-7.069938659667969, 34.768943786621094, 5.6707000732421875, 34.150115966796875, -3.3079605102539062, 0.5248985290527344, 29.038837432861328, -12.163833618164062, 0.39554595947265625, 29.21837043762207, 42.40296936035156, -7.884635925292969, 27.3011474609375, 23.409927368164062, 22.544822692871094, -17.20142364501953, 15.837934494018555, 7.759613037109375, -1.341970443725586, 8.937873840332031, 57.323974609375, 4.1497344970703125, 3.303546905517578, 11.908248901367188, -7.8463592529296875, -1.421844482421875, 18.895599365234375, -2.5903186798095703, 13.015132904052734, 23.35779571533203, 11.073295593261719, -2.5076217651367188, -16.12006378173828, 34.01052474975586, 6.398414611816406, 4.039398193359375, 9.573204040527344, -1.2681560516357422, 15.3660888671875, 21.931854248046875, 29.993335723876953, 48.41184997558594, 15.694633483886719, 24.512439727783203, 7.924163818359375, 46.76554870605469, 1.8162918090820312, 13.397125244140625, -7.390905380249023, -3.717967987060547, 34.503353118896484, -6.271095275878906, -7.0074615478515625, 25.76543426513672, -0.85101318359375, 20.524288177490234, 9.172542572021484, -5.196086883544922, 13.526145935058594, 42.757965087890625, 12.212875366210938, 32.5289306640625, 15.650619506835938, -2.4709949493408203], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000606.npy"}
{"epoch": 0.9160997732426304, "step": 607, "batch_size": 64, "mean": 13.672164916992188, "std": 18.866249084472656, "min": -25.64727783203125, "p10": -8.116789817810059, "median": 14.507322311401367, "p90": 36.913484954834, "max": 68.60531616210938, "pos_frac": 0.71875, "sample": [16.69988250732422, 4.18853759765625, -7.558786392211914, 7.1502685546875, 19.52404022216797, 18.074119567871094, 26.488906860351562, 21.00774383544922, -25.64727783203125, -8.355934143066406, 10.579219818115234, 15.167762756347656, 54.81806945800781, 40.53313446044922, 7.2687835693359375, -5.340362548828125, -7.3860626220703125, 3.689359664916992, 19.943984985351562, 51.74476623535156, 10.903488159179688, 29.588924407958984, 29.513526916503906, 24.68825912475586, 33.593467712402344, 20.13788604736328, 7.492668151855469, 17.452377319335938, -4.797626495361328, -3.89349365234375, 29.174789428710938, 22.656204223632812, 29.722129821777344, 16.525856018066406, -6.732429504394531, 44.560935974121094, 42.87535095214844, -1.927947998046875, 29.759628295898438, -3.42572021484375, -10.629150390625, 5.675727844238281, 2.3834667205810547, 17.214847564697266, 13.846881866455078, -10.341434478759766, -4.502897262573242, 26.36852264404297, 68.60531616210938, 38.26118469238281, -0.971710205078125, -8.856582641601562, 12.992317199707031, 30.799636840820312, -6.455013275146484, 22.415794372558594, 0.15472793579101562, 1.3034515380859375, 16.62615203857422, 33.76885223388672, 8.769775390625, -20.42969512939453, -12.525421142578125, 20.085365295410156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000607.npy"}
{"epoch": 0.9176114890400605, "step": 608, "batch_size": 64, "mean": 10.587484359741211, "std": 17.0660457611084, "min": -29.60882568359375, "p10": -6.868362426757812, "median": 8.038070678710938, "p90": 30.002250671386722, "max": 60.716949462890625, "pos_frac": 0.78125, "sample": [17.374296188354492, -3.3382720947265625, 3.8284339904785156, 30.214309692382812, -0.5716896057128906, 14.541717529296875, 27.116676330566406, -5.16070556640625, 7.454135894775391, -9.673683166503906, 9.947982788085938, 12.229728698730469, 23.112030029296875, 17.94769287109375, 8.38880729675293, 17.277687072753906, 46.47917175292969, -24.890716552734375, -1.911447525024414, 1.1073188781738281, 5.4570465087890625, -5.479793548583984, 0.08208084106445312, 57.78166580200195, 21.140708923339844, -1.9684066772460938, -9.285812377929688, 9.563423156738281, 4.312828063964844, 41.951847076416016, 8.36114501953125, -14.480125427246094, 7.084136962890625, 15.961891174316406, 1.5503082275390625, 18.226760864257812, 21.11822509765625, 0.42066192626953125, 6.9629364013671875, 4.90203857421875, 3.7985000610351562, 8.89114761352539, 5.353172302246094, 0.052093505859375, -7.32708740234375, 19.13665771484375, -5.798004150390625, 28.963165283203125, 60.716949462890625, 21.26838493347168, 11.726043701171875, 8.456398010253906, 7.714996337890625, 17.4901123046875, 2.8965606689453125, 25.275009155273438, -29.60882568359375, 43.76438903808594, 11.3299560546875, -11.310768127441406, 2.24249267578125, 29.5074462890625, 2.6739044189453125, 35.249267578125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000608.npy"}
{"epoch": 0.9191232048374905, "step": 609, "batch_size": 64, "mean": 15.568743705749512, "std": 16.52623176574707, "min": -27.168540954589844, "p10": -3.278133392333984, "median": 14.992794036865234, "p90": 32.51374435424805, "max": 59.01184844970703, "pos_frac": 0.8125, "sample": [39.10383605957031, 16.855148315429688, 29.123695373535156, 7.959625244140625, 8.705169677734375, 2.5863494873046875, 22.818695068359375, 19.439743041992188, -0.5554885864257812, -1.6110687255859375, -5.225059509277344, 13.146774291992188, -3.41229248046875, 6.679950714111328, 55.57875061035156, 32.438201904296875, 39.683876037597656, 25.826557159423828, 8.024141311645508, 59.01184844970703, -12.098175048828125, 25.980865478515625, 16.7347412109375, 26.833038330078125, -2.9650955200195312, 27.929298400878906, 19.90587043762207, 17.018478393554688, -9.068473815917969, 18.922317504882812, 32.546119689941406, -0.4352283477783203, 52.11811065673828, 24.072662353515625, 23.236122131347656, 24.775890350341797, 12.672809600830078, 1.2381954193115234, 51.15721130371094, 15.067611694335938, 9.368289947509766, -9.091693878173828, 5.804958343505859, 30.80999755859375, 28.27840805053711, -27.168540954589844, 13.890724182128906, 14.917976379394531, 20.60590362548828, 2.207609176635742, 12.231407165527344, 12.388160705566406, 10.486663818359375, 9.646917343139648, 18.06318473815918, 3.221141815185547, 20.15560531616211, 23.593639373779297, 24.279911041259766, -10.277153015136719, 25.603012084960938, 7.285270690917969, 9.870857238769531, -1.5934295654296875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000609.npy"}
{"epoch": 0.9206349206349206, "step": 610, "batch_size": 64, "mean": 14.658071517944336, "std": 21.793006896972656, "min": -35.415252685546875, "p10": -8.208598327636718, "median": 10.327948570251465, "p90": 42.61370620727539, "max": 81.90304565429688, "pos_frac": 0.71875, "sample": [10.426729202270508, 38.59538269042969, 36.54083251953125, 37.3260498046875, 3.5846328735351562, 26.620746612548828, 5.6756439208984375, 6.947757720947266, -13.782089233398438, 13.145538330078125, 8.875991821289062, -18.678903579711914, 9.654216766357422, 42.93721008300781, 1.0596466064453125, 10.809410095214844, 45.381961822509766, -4.932159423828125, -35.415252685546875, 1.1089630126953125, -0.40345001220703125, -5.51153564453125, 9.250900268554688, 16.852977752685547, -8.279373168945312, -3.49578857421875, -10.795211791992188, 58.70636749267578, 25.231340408325195, 41.858863830566406, 4.93342399597168, 63.44677734375, 31.838851928710938, 32.941993713378906, -0.275787353515625, 24.83738136291504, 33.96534729003906, 2.0829715728759766, 21.80279541015625, 10.9559326171875, 13.611648559570312, 10.890830993652344, 11.519767761230469, -8.04345703125, 52.428077697753906, -3.0337562561035156, 2.304004669189453, -2.4532432556152344, 32.86219787597656, 41.070045471191406, 3.0040130615234375, 12.52491569519043, 10.114334106445312, -0.12278938293457031, -3.7019710540771484, -1.0555191040039062, -10.916770935058594, 24.551475524902344, 81.90304565429688, 10.229167938232422, 29.901138305664062, -10.3382568359375, 53.85992431640625, 11.180618286132812], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000610.npy"}
{"epoch": 0.9221466364323507, "step": 611, "batch_size": 64, "mean": 10.158263206481934, "std": 16.07805061340332, "min": -21.724594116210938, "p10": -8.010523223876952, "median": 10.055909156799316, "p90": 29.236010360717774, "max": 63.63716125488281, "pos_frac": 0.703125, "sample": [24.056304931640625, 15.511198043823242, -13.518417358398438, 9.453113555908203, 39.27693176269531, 4.41693115234375, -1.829498291015625, 16.225906372070312, -3.300395965576172, 10.409255981445312, -7.333103179931641, 35.7090950012207, 13.190305709838867, 14.612167358398438, -2.7459449768066406, -0.07322883605957031, -8.300846099853516, 27.450668334960938, -19.34640121459961, 63.63716125488281, 15.84912109375, 25.488086700439453, 12.83123779296875, -6.708576202392578, -7.148761749267578, -0.7384414672851562, 28.661361694335938, 10.863067626953125, 4.0403900146484375, 24.9891357421875, 7.456172943115234, -11.0643310546875, -5.178260803222656, 38.966041564941406, 4.7151641845703125, 1.98651123046875, -21.35061264038086, 39.60438537597656, 9.844566345214844, 9.581954956054688, 21.159866333007812, 10.267251968383789, -5.29803466796875, 24.108779907226562, 4.4233856201171875, 6.193321228027344, 12.729145050048828, 16.18524169921875, 13.274311065673828, 17.848388671875, 21.65601348876953, 37.42449188232422, 5.504682540893555, 15.407379150390625, 7.6700592041015625, 14.031044006347656, 29.482288360595703, -21.724594116210938, 16.12261199951172, 7.7939453125, -9.196847915649414, -3.5346336364746094, -3.1196537017822266, 11.531013488769531], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000611.npy"}
{"epoch": 0.9236583522297808, "step": 612, "batch_size": 64, "mean": 14.776636123657227, "std": 20.895933151245117, "min": -47.77754592895508, "p10": -3.773303031921386, "median": 12.64682388305664, "p90": 41.92111511230469, "max": 73.46232604980469, "pos_frac": 0.78125, "sample": [-1.0071487426757812, -20.777267456054688, 20.3109130859375, -1.2404556274414062, -6.5294647216796875, 10.418502807617188, 27.928485870361328, 24.859619140625, -2.3689422607421875, 2.5632247924804688, 35.20703125, 22.379104614257812, 19.361953735351562, 10.602157592773438, 10.883781433105469, 12.929656982421875, 15.900604248046875, 7.5457763671875, 7.733573913574219, 14.904186248779297, 25.242095947265625, -27.260986328125, 28.906005859375, 7.355247497558594, -0.44842529296875, 1.7587203979492188, 50.91766357421875, -7.749485015869141, -2.9306602478027344, 4.066001892089844, 7.333702087402344, 41.75859069824219, 0.026712417602539062, 0.04771232604980469, -0.631591796875, 42.230533599853516, 12.703514099121094, 31.075050354003906, 67.55661010742188, -19.537403106689453, 55.30516052246094, 8.119937896728516, 32.78764343261719, -2.821685791015625, 37.009796142578125, 41.99076843261719, 12.590133666992188, -4.134435653686523, 31.169532775878906, 73.46232604980469, 7.6353912353515625, 8.368858337402344, 47.723724365234375, 16.08489990234375, 12.405157089233398, 20.12701416015625, 17.658859252929688, 2.6004581451416016, 23.225616455078125, -47.77754592895508, 16.995868682861328, 23.723403930664062, 19.055593490600586, 18.37335968017578], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000612.npy"}
{"epoch": 0.9251700680272109, "step": 613, "batch_size": 64, "mean": 17.639934539794922, "std": 19.904071807861328, "min": -27.22174072265625, "p10": -3.42060546875, "median": 15.151670455932617, "p90": 47.91696548461914, "max": 62.946815490722656, "pos_frac": 0.828125, "sample": [-5.5284576416015625, 14.818397521972656, 0.7010269165039062, 14.4476318359375, 24.241472244262695, 21.89752197265625, 29.754905700683594, 15.957687377929688, -7.557243347167969, -0.6731758117675781, 19.24371337890625, 16.129779815673828, -27.22174072265625, 14.495849609375, 20.133319854736328, 30.008636474609375, 7.352682113647461, 10.484014511108398, 45.01410675048828, 9.715286254882812, 0.3299293518066406, -10.543657302856445, -19.851409912109375, 6.915140151977539, 56.08354949951172, 18.981290817260742, 47.21465301513672, 9.563032150268555, 11.041877746582031, 16.50592041015625, 48.21795654296875, 15.484943389892578, 22.341461181640625, -0.08354759216308594, -0.0212249755859375, 14.265743255615234, -3.2963943481445312, 38.975868225097656, 16.465787887573242, 62.946815490722656, 39.775115966796875, 36.77568435668945, 5.880931854248047, 32.93048095703125, 52.49223327636719, 2.594837188720703, 49.547996520996094, 21.261232376098633, -3.4738388061523438, 29.68753433227539, 2.5008010864257812, 3.24462890625, 21.03348159790039, 52.50244140625, 24.870498657226562, 14.431001663208008, 12.601531982421875, 41.851463317871094, 14.814577102661133, 61.626922607421875, 4.406059265136719, 22.667715072631836, 1.1433944702148438, -21.164047241210938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000613.npy"}
{"epoch": 0.926681783824641, "step": 614, "batch_size": 64, "mean": 12.952991485595703, "std": 14.858039855957031, "min": -30.64981460571289, "p10": -2.7819885253906245, "median": 12.534196853637695, "p90": 31.1907133102417, "max": 55.8831787109375, "pos_frac": 0.8125, "sample": [27.10618019104004, 34.95838165283203, 7.227436065673828, 20.756946563720703, 9.829811096191406, 16.64191436767578, 5.307319641113281, 2.2313385009765625, 10.57332992553711, 8.051738739013672, 27.974517822265625, 15.294878005981445, 26.284393310546875, 30.896392822265625, 19.596328735351562, 13.777748107910156, 20.87313461303711, 27.92485809326172, 40.663536071777344, 29.182479858398438, 0.3804950714111328, 3.3534927368164062, -3.8797607421875, -17.656890869140625, 55.8831787109375, -2.9613037109375, 2.083089828491211, -3.75006103515625, 20.289926528930664, 13.376373291015625, 0.32756805419921875, 4.588253021240234, -1.4493408203125, 35.6429443359375, 8.495174407958984, 9.240407943725586, 31.316850662231445, 17.869918823242188, -30.64981460571289, 16.617080688476562, -2.1339187622070312, 36.09632873535156, 16.894668579101562, 28.983613967895508, 10.557579040527344, 15.596433639526367, 10.391021728515625, 11.569107055664062, 12.953903198242188, -2.36358642578125, 15.312705993652344, 8.701766967773438, -14.962654113769531, -1.5167617797851562, 3.630992889404297, 33.060028076171875, 18.490516662597656, -0.28000640869140625, 21.725515365600586, 12.114490509033203, 5.441246032714844, 16.955310821533203, -8.052482604980469, 25.555397033691406], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000614.npy"}
{"epoch": 0.9281934996220711, "step": 615, "batch_size": 64, "mean": 15.195127487182617, "std": 20.8251953125, "min": -31.327529907226562, "p10": -8.946376419067382, "median": 12.70637321472168, "p90": 47.397970581054686, "max": 59.34185791015625, "pos_frac": 0.78125, "sample": [13.307472229003906, 17.246932983398438, 26.474397659301758, 8.988265991210938, 15.662940979003906, 34.62553405761719, 12.407249450683594, -31.327529907226562, 49.40449523925781, 18.59410858154297, -8.23675537109375, 12.661773681640625, -18.346099853515625, 21.281417846679688, 8.137331008911133, -10.496307373046875, 10.135704040527344, 50.9461669921875, 14.732002258300781, -4.7835540771484375, 35.352115631103516, 3.6202926635742188, 38.242637634277344, -4.611211776733398, 27.01727294921875, 8.047119140625, -10.429149627685547, 14.601455688476562, 12.92236328125, 46.61837387084961, 28.831451416015625, 4.461431503295898, -0.26392364501953125, -9.250499725341797, 20.40821075439453, 33.812339782714844, 48.803680419921875, 6.191375732421875, 0.8536376953125, 48.39215850830078, 37.25147247314453, 53.051963806152344, 27.365386962890625, 22.99053955078125, 29.473251342773438, 7.050205230712891, 8.547767639160156, -3.9185943603515625, 3.7413101196289062, 39.71345138549805, 4.0775146484375, 59.34185791015625, -7.335479736328125, 12.750972747802734, 1.372406005859375, 47.59619140625, 7.665916442871094, 17.860078811645508, 46.935455322265625, 4.508615493774414, 11.85308837890625, -0.2371978759765625, -28.809616088867188, -25.395111083984375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000615.npy"}
{"epoch": 0.9297052154195011, "step": 616, "batch_size": 64, "mean": 11.094036102294922, "std": 17.85677719116211, "min": -25.69011688232422, "p10": -8.98023338317871, "median": 7.129646301269531, "p90": 36.355694580078136, "max": 63.20939636230469, "pos_frac": 0.71875, "sample": [40.071983337402344, 6.064643859863281, 3.3970947265625, -2.2195053100585938, 20.311237335205078, -5.554815292358398, 63.20939636230469, 8.435821533203125, -0.5918846130371094, 9.049249649047852, 12.214143753051758, 1.8778953552246094, 11.757614135742188, -25.69011688232422, 1.2869224548339844, -17.369539260864258, 7.4295196533203125, -8.484363555908203, 1.81396484375, 48.87538146972656, 2.7561264038085938, 54.92266845703125, -9.864276885986328, -0.8867282867431641, 54.01542663574219, -4.395927429199219, 1.9058094024658203, -10.23431396484375, -1.0835018157958984, 37.314979553222656, 20.545921325683594, 5.1255035400390625, 9.151954650878906, 18.76921844482422, 39.12765121459961, 25.526790618896484, -10.194133758544922, 12.26953125, -2.264984130859375, 30.800630569458008, 21.008041381835938, 20.689556121826172, 6.82977294921875, 20.36725616455078, 34.11736297607422, 17.262451171875, 1.811838150024414, 1.9430656433105469, -10.114639282226562, 2.553913116455078, -2.4548377990722656, -9.1927490234375, 11.882987976074219, 2.47100830078125, 14.28720474243164, 24.568115234375, 11.574897766113281, 18.606014251708984, 26.729373931884766, -0.54150390625, 29.63886260986328, 19.266624450683594, 2.007904052734375, -4.487161636352539], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000616.npy"}
{"epoch": 0.9312169312169312, "step": 617, "batch_size": 64, "mean": 12.726191520690918, "std": 17.348806381225586, "min": -25.11815643310547, "p10": -4.4803369522094725, "median": 9.760604858398438, "p90": 39.68838882446291, "max": 63.048095703125, "pos_frac": 0.8125, "sample": [7.960334777832031, 7.736072540283203, -5.947174072265625, 14.187591552734375, 42.22601318359375, 10.437320709228516, 16.86236572265625, 28.392139434814453, 3.5944900512695312, -8.321441650390625, 3.4629135131835938, 22.234642028808594, 9.406448364257812, 15.07774543762207, 2.709064483642578, 18.359127044677734, 8.405517578125, 22.554100036621094, 7.784782409667969, 10.545360565185547, 46.253631591796875, 1.4780006408691406, 3.8329544067382812, 33.76726531982422, 11.433372497558594, -3.2421607971191406, 0.4372367858886719, 24.378093719482422, 1.6067123413085938, 19.38886260986328, 12.374414443969727, -3.589313507080078, 5.847404479980469, 45.213809967041016, 22.467300415039062, 63.048095703125, 2.8855438232421875, -9.157752990722656, -1.6980743408203125, 43.59259796142578, 25.34607696533203, 10.114761352539062, -4.371326446533203, 2.546855926513672, 42.847442626953125, 21.722639083862305, -21.547645568847656, -18.650253295898438, -25.11815643310547, 6.073341369628906, 17.639862060546875, 28.750396728515625, 30.15569305419922, -4.527055740356445, 53.226295471191406, 18.947546005249023, 10.889396667480469, 1.5176010131835938, 21.829444885253906, 4.20062255859375, 6.400472640991211, 7.217475891113281, 24.83656883239746, -3.5572357177734375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000617.npy"}
{"epoch": 0.9327286470143613, "step": 618, "batch_size": 64, "mean": 13.382397651672363, "std": 17.275997161865234, "min": -29.35906982421875, "p10": -3.847609710693358, "median": 9.459409713745117, "p90": 39.85997848510744, "max": 58.35832214355469, "pos_frac": 0.8125, "sample": [41.54327392578125, 35.932289123535156, 22.075626373291016, 15.225997924804688, 1.9866714477539062, 2.7466259002685547, -5.091556549072266, 4.412040710449219, 12.806427001953125, -0.5629653930664062, 6.077493667602539, 42.63323974609375, 30.047260284423828, 17.876808166503906, 53.07081985473633, 4.805419921875, 7.8202056884765625, 1.8484878540039062, 52.0909423828125, 23.732742309570312, 6.1152496337890625, 58.35832214355469, 4.040081024169922, 13.743675231933594, -2.2054977416992188, 41.731689453125, 6.800718307495117, 14.497154235839844, 18.727745056152344, 6.460081100463867, 0.0809783935546875, 0.1745452880859375, 31.152053833007812, 9.916114807128906, 23.814231872558594, 18.29052734375, 23.123947143554688, -4.253227233886719, 24.288116455078125, 7.987007141113281, -7.135347366333008, -0.5080604553222656, -2.9011688232421875, -10.752220153808594, 5.118015289306641, -15.3785400390625, 2.681406021118164, -29.35906982421875, 16.95618438720703, 24.51854705810547, -1.3992843627929688, 30.195205688476562, 9.903823852539062, 9.014995574951172, 8.345420837402344, 12.179180145263672, 1.06500244140625, 14.444618225097656, 14.763912200927734, 31.116138458251953, 24.25957489013672, 49.625396728515625, -11.17327880859375, 6.9716339111328125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000618.npy"}
{"epoch": 0.9342403628117913, "step": 619, "batch_size": 64, "mean": 12.893839836120605, "std": 17.198827743530273, "min": -28.796844482421875, "p10": -5.828961944580078, "median": 10.097282409667969, "p90": 37.459381866455075, "max": 54.97795104980469, "pos_frac": 0.765625, "sample": [15.830329895019531, -7.5772552490234375, 3.477518081665039, 4.6300201416015625, 54.36265563964844, 40.397613525390625, 30.525543212890625, -5.060661315917969, -15.425827026367188, 44.81158447265625, 21.832481384277344, 7.057155609130859, 29.6513671875, 3.9521007537841797, 25.92327880859375, -3.8772201538085938, 19.92291259765625, 15.465606689453125, -1.3027000427246094, -3.37249755859375, -9.757240295410156, 37.515281677246094, 3.9053421020507812, -6.158233642578125, -3.4362316131591797, 20.450843811035156, 21.46949577331543, 4.2648162841796875, 0.8900814056396484, 0.8503952026367188, 10.485816955566406, -28.796844482421875, -6.211212158203125, 18.111083984375, 28.152145385742188, 35.87395095825195, 9.708747863769531, 38.0532341003418, 54.97795104980469, -11.4801025390625, 10.608461380004883, 37.328948974609375, 13.171012878417969, -4.810970306396484, 45.04478454589844, 19.13658905029297, 8.878082275390625, 19.203514099121094, 16.411685943603516, 9.398193359375, 4.347442626953125, 12.82223129272461, 22.840972900390625, 2.6716384887695312, 33.710601806640625, 0.19741058349609375, 5.658622741699219, 26.22406005859375, -0.4955577850341797, 6.147371292114258, 3.60443115234375, 13.310760498046875, -1.087860107421875, 20.79001235961914], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000619.npy"}
{"epoch": 0.9357520786092215, "step": 620, "batch_size": 64, "mean": 11.368496894836426, "std": 18.840106964111328, "min": -22.67535400390625, "p10": -8.956119155883789, "median": 6.85416316986084, "p90": 41.07688102722169, "max": 58.10284423828125, "pos_frac": 0.71875, "sample": [4.000755310058594, -3.8572998046875, -12.304008483886719, 55.83488464355469, -15.9068603515625, 2.5633544921875, 7.2650299072265625, 42.91798400878906, 0.5480270385742188, 15.968435287475586, 39.55134582519531, 41.63974380493164, 10.609176635742188, 12.198394775390625, 4.2993316650390625, 39.76353454589844, 11.355133056640625, 6.443296432495117, -6.2503662109375, 1.2366447448730469, -10.39208984375, -0.2663459777832031, -22.67535400390625, 48.462303161621094, -13.285224914550781, 33.32854461669922, -1.7746238708496094, -21.292724609375, 58.10284423828125, -2.9827880859375, 23.400583267211914, 24.33264923095703, 0.4280834197998047, -5.30781364440918, -5.28228759765625, 4.069549560546875, 0.11099433898925781, 15.421661376953125, 33.405967712402344, 2.0498580932617188, 17.97889518737793, 7.78717041015625, 13.087852478027344, 9.267074584960938, 2.5460472106933594, -8.472404479980469, 46.9456787109375, 44.04762268066406, -0.8024501800537109, 21.070022583007812, -2.8253135681152344, 36.55328369140625, 2.2781982421875, -4.863887786865234, -9.16342544555664, 3.7093944549560547, 14.527687072753906, 4.885261535644531, 17.728626251220703, 21.99810791015625, 23.74530792236328, 19.95813751220703, 20.114713668823242, 7.751865386962891], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000620.npy"}
{"epoch": 0.9372637944066515, "step": 621, "batch_size": 64, "mean": 12.089757919311523, "std": 17.924449920654297, "min": -26.615745544433594, "p10": -6.145431327819824, "median": 10.505288124084473, "p90": 35.794802856445315, "max": 51.772281646728516, "pos_frac": 0.71875, "sample": [-1.3305244445800781, 5.0014495849609375, 5.542839050292969, -3.860919952392578, -13.796783447265625, -26.615745544433594, 19.414215087890625, 46.64806365966797, -8.141363143920898, 0.7909183502197266, 12.065631866455078, 36.054039001464844, 35.189918518066406, 3.957517623901367, 51.772281646728516, 11.627670288085938, 10.053647994995117, 12.213241577148438, 20.062320709228516, 28.144195556640625, 7.239360809326172, 3.1962890625, 28.4288330078125, -22.943557739257812, 37.02063751220703, 44.0802001953125, 31.830665588378906, -13.173187255859375, 22.358367919921875, -4.7558746337890625, 50.29791259765625, 20.94416046142578, 24.12078857421875, 23.883056640625, 18.859575271606445, 4.6750946044921875, 10.665637969970703, 2.049041748046875, 19.87049102783203, -5.9160308837890625, -6.243745803833008, 10.344938278198242, 29.80352020263672, 12.761749267578125, 2.3902454376220703, -0.3087272644042969, 8.642940521240234, -2.6105575561523438, -5.6192626953125, 13.27907943725586, -0.5565662384033203, 25.626708984375, 24.894195556640625, 1.9832115173339844, 24.201583862304688, -4.772844314575195, -1.4706802368164062, 48.0157470703125, -22.29273223876953, 18.62369155883789, -3.44915771484375, 2.9949569702148438, 18.0393009185791, 31.94281005859375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000621.npy"}
{"epoch": 0.9387755102040817, "step": 622, "batch_size": 64, "mean": 12.046436309814453, "std": 15.628266334533691, "min": -14.606884002685547, "p10": -5.26946506500244, "median": 10.197540283203125, "p90": 31.272362518310562, "max": 62.176544189453125, "pos_frac": 0.75, "sample": [20.534912109375, 5.859203338623047, -0.1159210205078125, -14.606884002685547, 14.185111999511719, 7.179634094238281, 5.218658447265625, 21.59575653076172, -6.200143814086914, 2.6959381103515625, 62.176544189453125, 17.210914611816406, 0.6595745086669922, -9.000568389892578, 23.58917999267578, 12.679340362548828, 27.284259796142578, 13.443614959716797, 14.3797607421875, 37.61244201660156, 11.92885971069336, 26.413930892944336, -4.429487228393555, 4.247644424438477, 43.62028503417969, -0.26963043212890625, 10.99810791015625, 17.047340393066406, -0.08169937133789062, 19.088741302490234, 46.77378845214844, -12.248260498046875, 14.002824783325195, -0.9496536254882812, -2.5324859619140625, 3.5243396759033203, 33.60406494140625, 25.081871032714844, 23.839981079101562, 5.878044128417969, 27.86602020263672, 13.438987731933594, 17.798336029052734, -4.161369323730469, 6.616359710693359, 21.10793685913086, 9.39697265625, -11.011604309082031, 0.9738197326660156, 18.553138732910156, 32.73222351074219, 2.2031707763671875, -2.4227848052978516, -5.62945556640625, 5.163335800170898, 2.328338623046875, 48.428802490234375, 7.960990905761719, -12.981637954711914, -0.31386566162109375, 23.38068389892578, 7.042348861694336, 19.721473693847656, 20.859725952148438], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000622.npy"}
{"epoch": 0.9402872260015117, "step": 623, "batch_size": 64, "mean": 13.872968673706055, "std": 16.009075164794922, "min": -16.349403381347656, "p10": -7.101817321777343, "median": 14.919857025146484, "p90": 32.5194091796875, "max": 51.36432647705078, "pos_frac": 0.78125, "sample": [17.89423370361328, 5.732917785644531, -3.103435516357422, -13.273887634277344, 12.256080627441406, 42.37396240234375, 16.93878173828125, 20.595172882080078, 24.024688720703125, 31.904808044433594, -8.17025375366211, 4.321685791015625, 25.083267211914062, 17.15418243408203, 1.9418792724609375, 51.36432647705078, -8.989009857177734, 31.139892578125, 34.130638122558594, 15.056999206542969, 16.541969299316406, 7.407175064086914, 25.32197380065918, -6.673866271972656, -7.285224914550781, 15.926321029663086, 24.46056365966797, 25.69195556640625, -13.721435546875, 32.499969482421875, 23.829940795898438, 44.85049057006836, 18.806488037109375, 32.527740478515625, 1.2462158203125, -16.349403381347656, 14.78271484375, -0.976837158203125, 29.788551330566406, 27.411705017089844, 31.547584533691406, 45.93104553222656, 12.917518615722656, 1.1681442260742188, -0.5812301635742188, 0.7876033782958984, -6.553901672363281, 7.619668960571289, 1.3212165832519531, 21.643997192382812, 25.0875244140625, 33.52442932128906, 15.701004028320312, 31.519607543945312, 1.2065296173095703, 14.748207092285156, 13.206085205078125, 2.9923057556152344, 8.414352416992188, 0.8840160369873047, 22.57677459716797, -7.675872802734375, -1.2139892578125, -3.3665847778320312], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000623.npy"}
{"epoch": 0.9417989417989417, "step": 624, "batch_size": 64, "mean": 14.060632705688477, "std": 18.245361328125, "min": -22.05833625793457, "p10": -8.454114532470703, "median": 14.7431640625, "p90": 38.270501708984376, "max": 58.659149169921875, "pos_frac": 0.703125, "sample": [-9.197639465332031, -9.261497497558594, 22.739643096923828, 13.802894592285156, 48.59357833862305, 45.37506866455078, -0.9962196350097656, 6.6465606689453125, -1.6696853637695312, -5.185102462768555, 15.285774230957031, 14.422893524169922, 17.244892120361328, 26.657020568847656, 23.92401123046875, 17.394256591796875, 11.199512481689453, 22.02899169921875, 35.375396728515625, 7.316581726074219, -1.0315818786621094, -1.0318031311035156, -12.940797805786133, 30.91820526123047, -1.0028266906738281, 16.755706787109375, 2.8470611572265625, -8.689888000488281, 6.634391784667969, 8.534629821777344, 37.6424560546875, 38.435699462890625, 19.518783569335938, 37.885040283203125, 15.063434600830078, -12.20443344116211, 47.88945007324219, 29.700420379638672, 58.659149169921875, 24.152488708496094, 19.43968391418457, 20.99828338623047, 20.63105010986328, 9.26690673828125, -7.9039764404296875, 21.978286743164062, 45.172515869140625, -1.1045150756835938, -3.1042652130126953, 23.23663330078125, 16.063629150390625, 4.363254547119141, -4.595157623291016, 1.2352867126464844, -22.05833625793457, 36.41029357910156, -1.6474761962890625, 0.9152145385742188, 37.75428771972656, -2.770618438720703, -19.609041213989258, 4.17222785949707, 22.567474365234375, 39.036293029785156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000624.npy"}
{"epoch": 0.9433106575963719, "step": 625, "batch_size": 64, "mean": 10.377538681030273, "std": 17.61495018005371, "min": -26.095325469970703, "p10": -10.38508949279785, "median": 8.860527038574219, "p90": 36.639861297607425, "max": 51.268951416015625, "pos_frac": 0.71875, "sample": [11.483819961547852, 29.807472229003906, 8.689361572265625, 10.394744873046875, -7.802225112915039, 18.368244171142578, 10.829887390136719, 4.952644348144531, 10.535238265991211, 40.7720947265625, 6.790378570556641, -3.1166839599609375, 28.93024444580078, -1.4655876159667969, 34.775146484375, 4.785209655761719, 49.71441650390625, 9.031692504882812, -8.868907928466797, -1.4631500244140625, 10.958122253417969, 36.213645935058594, 2.7345428466796875, -11.592880249023438, 1.5168380737304688, 11.840156555175781, 10.264312744140625, 30.50860595703125, 0.7736473083496094, 9.307891845703125, -4.309238433837891, 36.82252502441406, 30.97076416015625, -1.6091232299804688, 12.05044937133789, -7.36009407043457, -6.805667877197266, 44.1845703125, 5.979591369628906, 5.737598419189453, 18.65704345703125, 51.268951416015625, 41.88494110107422, 15.790590286254883, 21.454635620117188, -2.03778076171875, -20.93950653076172, -11.034881591796875, -12.40066146850586, 17.133399963378906, 1.1478958129882812, -17.437911987304688, -26.095325469970703, 40.2755241394043, 13.281845092773438, 2.6863250732421875, 12.958740234375, 3.761016845703125, 23.40879249572754, -13.919235229492188, 4.256278991699219, 27.520877838134766, 5.031118392944336, -7.820457458496094], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000625.npy"}
{"epoch": 0.9448223733938019, "step": 626, "batch_size": 64, "mean": 15.226972579956055, "std": 22.17559814453125, "min": -21.120094299316406, "p10": -11.71422882080078, "median": 10.636898040771484, "p90": 43.56021232604981, "max": 75.54373168945312, "pos_frac": 0.796875, "sample": [28.13971710205078, 12.99627685546875, 37.64024353027344, -15.087955474853516, 75.54373168945312, 42.62617492675781, 45.889068603515625, -10.521730422973633, 20.426538467407227, 40.27906036376953, 0.6475620269775391, 2.294281005859375, 9.60003662109375, 0.6559925079345703, -11.355384826660156, 57.61843490600586, 21.694190979003906, -19.81511688232422, 42.59479522705078, 3.6658458709716797, -21.120094299316406, 19.36767578125, 12.413507461547852, 56.11211395263672, 9.697376251220703, 7.467897415161133, 21.46306037902832, 31.289459228515625, -15.731338500976562, 61.999420166015625, 10.760185241699219, -7.555624008178711, 26.111656188964844, 25.128585815429688, 6.222955703735352, -19.54681396484375, 0.03179931640625, 27.51422119140625, 0.1629486083984375, 15.078880310058594, 9.15726089477539, 6.655511856079102, 44.55486297607422, 6.000566482543945, -10.17681884765625, 10.51361083984375, 1.8295269012451172, 41.957950592041016, 22.282733917236328, 18.079193115234375, 30.53302001953125, 4.618526458740234, -19.350872039794922, 8.8824462890625, -11.868019104003906, 25.51171875, 3.1662559509277344, 41.70416259765625, 43.960514068603516, 16.52545166015625, 40.12034606933594, -10.762733459472656, 8.086616516113281, -9.855278015136719], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000626.npy"}
{"epoch": 0.9463340891912321, "step": 627, "batch_size": 64, "mean": 17.156383514404297, "std": 21.125883102416992, "min": -27.991004943847656, "p10": -5.174419784545898, "median": 12.220932006835938, "p90": 51.653022766113295, "max": 71.76594543457031, "pos_frac": 0.8125, "sample": [11.899646759033203, 15.21908187866211, 26.980873107910156, 18.547805786132812, 12.542217254638672, 21.597293853759766, 57.839744567871094, 53.25883483886719, 47.9061279296875, 10.24554443359375, -9.88916015625, 35.957244873046875, 60.727020263671875, 21.65142822265625, 10.920587539672852, -1.45733642578125, 7.766838073730469, 9.010910034179688, 57.419525146484375, 2.1702194213867188, 27.971038818359375, 15.799654006958008, 6.340782165527344, 22.096160888671875, 3.049652099609375, -13.440750122070312, 42.043697357177734, -11.003746032714844, 28.381755828857422, -4.854137420654297, 2.0309619903564453, 58.009246826171875, 11.196823120117188, -0.4300384521484375, 8.587310791015625, 23.524627685546875, -1.5926589965820312, 3.0695724487304688, 1.4545650482177734, 28.440540313720703, 71.76594543457031, 21.183517456054688, -4.804859161376953, -7.80540657043457, -5.311683654785156, 8.7515869140625, 5.69696044921875, 35.41377258300781, 7.51544189453125, 23.090530395507812, 16.463485717773438, -27.991004943847656, 6.586406707763672, 25.443559646606445, 42.79534149169922, 10.47137451171875, 1.6309814453125, -24.43795394897461, 16.82257843017578, 8.424240112304688, 27.59979248046875, 34.028472900390625, 26.03699493408203, 57.64894104003906], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000627.npy"}
{"epoch": 0.9478458049886621, "step": 628, "batch_size": 64, "mean": 16.378416061401367, "std": 22.10992431640625, "min": -48.88348388671875, "p10": -9.612175559997558, "median": 17.41211700439453, "p90": 43.54171447753907, "max": 68.98419189453125, "pos_frac": 0.796875, "sample": [68.98419189453125, 8.644058227539062, 23.86669921875, 20.829580307006836, -10.09817123413086, -19.063827514648438, 28.4017333984375, 3.84771728515625, 35.13020324707031, 31.143352508544922, 59.14562225341797, 53.357826232910156, 27.287796020507812, 11.28518295288086, 22.270517349243164, 11.768211364746094, -48.88348388671875, -27.520139694213867, 33.79258728027344, 4.203098297119141, 31.977277755737305, 17.193450927734375, 6.960226058959961, -13.644989013671875, 25.30565643310547, 1.8532867431640625, 13.3438720703125, 17.8216552734375, 11.428977966308594, 43.68384552001953, 19.667043685913086, 27.741180419921875, 6.461616516113281, 52.69536590576172, 11.51296615600586, 37.36175537109375, -3.717021942138672, 2.312030792236328, 24.619369506835938, -5.387847900390625, 23.52696990966797, 17.630783081054688, 13.550485610961914, 29.252601623535156, 30.2769775390625, 22.250106811523438, 47.836952209472656, 34.93037414550781, 43.21007537841797, 30.49234390258789, 1.7515010833740234, -2.9700469970703125, 4.648294448852539, -23.785837173461914, 12.704324722290039, 52.92695617675781, 5.6320037841796875, 21.329452514648438, -2.006256103515625, 13.588985443115234, -8.478185653686523, 39.85376739501953, -4.205089569091797, -23.3114013671875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000628.npy"}
{"epoch": 0.9493575207860923, "step": 629, "batch_size": 64, "mean": 13.168947219848633, "std": 15.20627498626709, "min": -11.618209838867188, "p10": -3.319825363159178, "median": 9.17689323425293, "p90": 36.77478790283203, "max": 58.816734313964844, "pos_frac": 0.859375, "sample": [0.19753265380859375, 5.691978454589844, -11.618209838867188, 17.651975631713867, 15.225326538085938, 9.775283813476562, 3.7859115600585938, 15.924388885498047, 2.174072265625, 14.2596435546875, 7.204170227050781, 12.894660949707031, 5.223766326904297, 58.816734313964844, 5.80272102355957, -4.858730316162109, 8.578502655029297, 16.825775146484375, 10.186752319335938, 15.165725708007812, -1.82513427734375, 0.017578125, -3.960407257080078, 3.8930435180664062, -7.001985549926758, 4.3121337890625, 37.001312255859375, 20.063720703125, 0.7610721588134766, 32.112525939941406, 28.67364501953125, -5.399864196777344, 16.583351135253906, 21.629209518432617, 18.738117218017578, 17.184829711914062, 11.617414474487305, -1.558807373046875, 8.35976791381836, 0.39511871337890625, 1.804168701171875, 3.921863555908203, 43.59522247314453, 8.371044158935547, 1.5642318725585938, 3.9818801879882812, 6.471473693847656, 22.59667205810547, 13.664949417114258, 28.181835174560547, 29.268638610839844, 3.3843421936035156, 41.1341552734375, 6.760698318481445, 20.777618408203125, 34.141357421875, -7.514305114746094, 47.20985412597656, 41.054725646972656, 38.69328689575195, 7.21630859375, 36.24623107910156, -10.86834716796875, 10.650093078613281], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000629.npy"}
{"epoch": 0.9508692365835223, "step": 630, "batch_size": 64, "mean": 14.149776458740234, "std": 17.312026977539062, "min": -14.123626708984375, "p10": -7.162645339965818, "median": 13.008697509765625, "p90": 34.34401092529297, "max": 57.95081329345703, "pos_frac": 0.765625, "sample": [-13.213420867919922, 13.889083862304688, 1.5794525146484375, -1.4064407348632812, 26.689781188964844, 34.21858215332031, 24.481300354003906, 0.00989532470703125, 55.315834045410156, 28.791458129882812, -2.8250732421875, 25.103946685791016, -9.142223358154297, 28.94214630126953, 3.8424224853515625, 32.55394744873047, -9.534748077392578, 13.887760162353516, 24.63971710205078, 11.85235595703125, 4.34173583984375, -1.5425338745117188, 24.79059600830078, 35.58648681640625, 33.76251220703125, 28.318954467773438, -11.614021301269531, 14.232198715209961, -14.123626708984375, 18.646785736083984, -1.2161483764648438, 15.518585205078125, -4.700946807861328, 40.559410095214844, 25.6627197265625, 4.6033782958984375, 33.787654876708984, 3.2808761596679688, 13.425567626953125, 11.646507263183594, 55.347503662109375, 17.941179275512695, 40.40122985839844, -13.737628936767578, 57.95081329345703, 28.28534698486328, 12.890518188476562, 6.2490692138671875, 16.86223030090332, -3.7860870361328125, 4.838691711425781, -8.217658996582031, -1.5071144104003906, 34.39776611328125, 17.866432189941406, -3.1414661407470703, 11.4102783203125, 7.175540924072266, 13.126876831054688, 28.97313690185547, 0.5508651733398438, 10.409881591796875, 4.319271087646484, 2.3365478515625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000630.npy"}
{"epoch": 0.9523809523809523, "step": 631, "batch_size": 64, "mean": 16.771381378173828, "std": 21.1657657623291, "min": -21.101821899414062, "p10": -7.983321380615234, "median": 13.369283676147461, "p90": 47.16246643066407, "max": 75.40496826171875, "pos_frac": 0.75, "sample": [28.095495223999023, 12.626129150390625, 4.8470458984375, 31.084022521972656, 13.838777542114258, 40.98095703125, 35.851226806640625, 22.610549926757812, -1.6692581176757812, 2.7123260498046875, -12.79632568359375, -13.177192687988281, 13.705724716186523, -4.7028961181640625, -4.396675109863281, 8.007972717285156, -7.546905517578125, 7.0788116455078125, 20.089813232421875, 45.24945068359375, 50.4268798828125, -3.789947509765625, 63.00572204589844, 28.389617919921875, 28.594802856445312, 24.80217742919922, 1.905059814453125, 47.982330322265625, -2.552518844604492, 11.456989288330078, 11.130229949951172, 9.085395812988281, -20.266204833984375, 19.340574264526367, 19.645824432373047, 33.31451416015625, 31.415679931640625, 14.071441650390625, 75.40496826171875, 32.240234375, 4.053192138671875, -17.215682983398438, -1.1640167236328125, 18.03973960876465, -0.5108642578125, -21.101821899414062, 11.493322372436523, 13.032842636108398, -13.093727111816406, 27.6575927734375, -8.170356750488281, 24.506881713867188, 4.793182373046875, 48.90440368652344, -5.864299774169922, 27.92902374267578, 10.341926574707031, 51.28717803955078, 15.029621124267578, 39.37785339355469, 12.727508544921875, 58.81925964355469, 12.604969024658203, 41.797882080078125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000631.npy"}
{"epoch": 0.9538926681783825, "step": 632, "batch_size": 64, "mean": 16.63653564453125, "std": 18.075082778930664, "min": -22.702930450439453, "p10": -7.327958869934081, "median": 16.108678817749023, "p90": 40.579596328735356, "max": 57.60394287109375, "pos_frac": 0.828125, "sample": [19.00848388671875, 28.761764526367188, 15.613677978515625, 6.370063781738281, 22.516742706298828, 41.12806701660156, -2.6903324127197266, 35.0583381652832, -3.314067840576172, 31.539833068847656, 10.649688720703125, -18.825225830078125, 14.075286865234375, -1.8059349060058594, -9.682573318481445, 31.746387481689453, 40.03384017944336, 3.7410049438476562, 36.669151306152344, -18.308576583862305, 22.237823486328125, 11.439468383789062, -17.368167877197266, 16.568954467773438, 7.106101989746094, 29.085052490234375, 14.957550048828125, 28.522666931152344, 13.55147933959961, 16.10928726196289, 41.08343505859375, 48.193878173828125, 9.042926788330078, -7.790069580078125, 24.31970977783203, 25.211843490600586, 32.30247497558594, 14.69758415222168, 30.644996643066406, 19.70633888244629, 32.0245361328125, 30.481319427490234, 3.9957733154296875, 3.2619705200195312, 5.204851150512695, 40.81349182128906, 28.597793579101562, 9.166681289672852, 28.97711944580078, 29.620405197143555, 57.60394287109375, -19.084732055664062, 23.966949462890625, 42.52193069458008, -6.249700546264648, 2.8960647583007812, 47.45497512817383, 21.12238311767578, 16.108070373535156, 8.679536819458008, 7.387138366699219, 9.234428405761719, -22.702930450439453, 1.7471961975097656], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000632.npy"}
{"epoch": 0.9554043839758125, "step": 633, "batch_size": 64, "mean": 12.760246276855469, "std": 18.919981002807617, "min": -19.439483642578125, "p10": -8.649608612060543, "median": 9.948722839355469, "p90": 39.25323181152344, "max": 69.136474609375, "pos_frac": 0.703125, "sample": [-2.5912017822265625, 8.921867370605469, 12.263229370117188, 16.642486572265625, 21.039154052734375, 10.542373657226562, 12.122573852539062, 6.758308410644531, 2.9289989471435547, 13.260932922363281, 2.0774459838867188, -1.3354549407958984, 48.78132629394531, 9.355072021484375, -1.4115486145019531, 24.5281982421875, -5.0828704833984375, 6.4225311279296875, 10.936744689941406, -5.860038757324219, 4.941287994384766, -1.8490066528320312, 26.337493896484375, 29.858116149902344, 22.1876220703125, 15.202995300292969, -4.812145233154297, 3.912353515625, 19.08649253845215, 19.71569061279297, 19.354049682617188, 0.6908226013183594, 30.10630989074707, -1.234130859375, -3.281352996826172, 10.806926727294922, -3.4786224365234375, -9.845138549804688, -11.005447387695312, -13.846771240234375, -11.186027526855469, -0.21918106079101562, 1.3000297546386719, 69.136474609375, 21.296226501464844, 1.4597854614257812, 11.752504348754883, -13.819147109985352, 30.56397247314453, 41.79933166503906, 39.575767517089844, 19.21796417236328, 26.03966522216797, 38.500648498535156, 6.927875518798828, -11.718269348144531, 5.187442779541016, -2.1795692443847656, 37.971435546875, 56.70805740356445, 54.341278076171875, -19.439483642578125, 50.68849182128906, 19.602813720703125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000633.npy"}
{"epoch": 0.9569160997732427, "step": 634, "batch_size": 64, "mean": 8.20103931427002, "std": 18.432077407836914, "min": -21.834495544433594, "p10": -12.105110931396483, "median": 2.477458953857422, "p90": 37.63853569030763, "max": 55.222007751464844, "pos_frac": 0.59375, "sample": [1.1630935668945312, -7.620429992675781, 5.23419189453125, 55.222007751464844, 24.69293212890625, -2.166799545288086, -2.344440460205078, -2.6815547943115234, -3.7210769653320312, -12.3895263671875, 3.5323143005371094, 2.9765548706054688, -4.030488967895508, -3.12713623046875, -0.6346893310546875, 5.887773513793945, 12.845447540283203, 10.713569641113281, -4.119148254394531, -5.852546691894531, -11.441474914550781, 0.20848846435546875, 24.464767456054688, -21.834495544433594, 46.059326171875, -1.2438697814941406, 31.389862060546875, 24.285491943359375, 38.87724304199219, -12.465534210205078, -5.4503173828125, 19.689760208129883, 2.9029312133789062, -6.169086456298828, 34.74821853637695, -3.743112564086914, -18.815624237060547, -18.150169372558594, 48.132537841796875, 34.186920166015625, 42.88169860839844, -7.7286376953125, -13.225662231445312, 14.269611358642578, 1.439361572265625, -19.358715057373047, 2.0519866943359375, 9.41358757019043, 43.68185806274414, 8.251941680908203, -7.45611572265625, -5.662384033203125, 9.525276184082031, 1.6772689819335938, 15.611852645874023, 1.2593154907226562, 30.137094497680664, 25.251354217529297, 19.15680694580078, 21.69910430908203, -2.725341796875, 9.455015182495117, 5.8773651123046875, 40.17095947265625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000634.npy"}
{"epoch": 0.9584278155706727, "step": 635, "batch_size": 64, "mean": 10.881446838378906, "std": 21.186792373657227, "min": -26.476436614990234, "p10": -12.451723861694335, "median": 7.669792175292969, "p90": 37.630863571167, "max": 68.52537536621094, "pos_frac": 0.671875, "sample": [-10.196792602539062, 22.27039909362793, -18.90712547302246, -7.818691253662109, 33.2491455078125, 3.1422042846679688, -11.765750885009766, -22.31402587890625, -9.6295166015625, 42.6127815246582, 2.3201141357421875, 24.17034149169922, -1.8982696533203125, -8.471321105957031, 22.696258544921875, 37.99053192138672, 35.21208190917969, 8.87649917602539, 36.00860595703125, 19.87518310546875, -9.080535888671875, 1.04534912109375, -10.953285217285156, 62.58393096923828, -6.3481597900390625, -12.745712280273438, 19.501541137695312, 0.0516204833984375, -5.126354217529297, 10.917545318603516, 6.05499267578125, -15.58477783203125, 32.44423294067383, 7.691581726074219, -2.5167694091796875, 7.941669464111328, 36.7916374206543, 0.5874061584472656, 33.66893005371094, -26.476436614990234, 19.227676391601562, 45.00291442871094, -4.327838897705078, 57.62165832519531, 4.069976806640625, -9.517524719238281, 16.691421508789062, 17.36060333251953, 25.75341796875, 6.8293304443359375, 12.467891693115234, 68.52537536621094, 11.426254272460938, 7.648002624511719, -22.08057403564453, 12.450096130371094, 37.99992370605469, 3.028453826904297, 5.5387420654296875, 28.734066009521484, -19.046730041503906, 24.34333038330078, 20.361122131347656, -1.5660209655761719], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000635.npy"}
{"epoch": 0.9599395313681028, "step": 636, "batch_size": 64, "mean": 15.140966415405273, "std": 17.325956344604492, "min": -20.471839904785156, "p10": -3.3299734115600574, "median": 12.328369140625, "p90": 37.57585411071778, "max": 63.99250793457031, "pos_frac": 0.828125, "sample": [26.317890167236328, -0.3035087585449219, 8.211515426635742, 3.9599838256835938, 26.07410430908203, 9.670799255371094, 18.769695281982422, 5.89996337890625, -1.279388427734375, 34.26497268676758, 16.5340576171875, -4.78672981262207, 22.568450927734375, 17.699325561523438, 15.294509887695312, 9.270156860351562, 16.647598266601562, 14.058502197265625, 9.973533630371094, 27.207374572753906, 16.290184020996094, 14.154655456542969, 1.9007434844970703, 20.69861602783203, 8.318321228027344, 16.218460083007812, 63.99250793457031, 0.03904533386230469, -2.287088394165039, 21.854705810546875, 7.0151214599609375, 8.606040954589844, 54.64012145996094, 12.50634765625, -3.7769241333007812, 37.19816589355469, 11.430709838867188, 21.119844436645508, -4.639152526855469, 49.134613037109375, 13.773340225219727, 37.73772048950195, -0.21610260009765625, 55.01409912109375, 3.072315216064453, -19.082687377929688, 48.348670959472656, 11.773773193359375, 12.147518157958984, 19.274032592773438, 12.150390625, 8.897796630859375, 33.2513427734375, 48.6978759765625, 18.794143676757812, -17.57756805419922, -20.471839904785156, 2.3649158477783203, 8.297285079956055, 34.720489501953125, 10.591667175292969, -7.294166564941406, 12.55398941040039, 11.735015869140625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000636.npy"}
{"epoch": 0.9614512471655329, "step": 637, "batch_size": 64, "mean": 11.723153114318848, "std": 20.820524215698242, "min": -38.3714599609375, "p10": -8.912668991088866, "median": 10.071759223937988, "p90": 49.471245956420915, "max": 63.748992919921875, "pos_frac": 0.703125, "sample": [15.44305419921875, 19.58074951171875, 17.358360290527344, 63.748992919921875, -1.3828659057617188, 11.899948120117188, -7.7247314453125, 15.381967544555664, 31.129852294921875, 5.72381591796875, -5.981819152832031, 2.8325881958007812, 12.260589599609375, -19.17656707763672, 10.686233520507812, 16.26174545288086, -1.3152618408203125, 12.17715835571289, 51.47092819213867, -0.8163928985595703, 5.0533905029296875, 12.140464782714844, -0.06949615478515625, 0.2096099853515625, -0.2548961639404297, 21.257064819335938, -5.654518127441406, 57.697898864746094, 35.58448791503906, 60.26933288574219, 56.93252182006836, 7.6200714111328125, -14.606796264648438, 13.3463134765625, -6.0103302001953125, 16.969703674316406, -18.72705078125, 5.463718414306641, 0.44945526123046875, 21.91307830810547, 9.30810546875, 33.473388671875, -2.955995559692383, 10.376285552978516, 21.67083740234375, 13.754554748535156, -14.119056701660156, 16.37200927734375, -9.24759292602539, 1.0467948913574219, 4.8437347412109375, 53.786033630371094, 9.767232894897461, 12.726425170898438, 11.249923706054688, -12.534980773925781, 44.805320739746094, 4.094823837280273, -1.5341033935546875, -38.3714599609375, 11.4207763671875, 53.75000762939453, -8.131179809570312, 5.587547302246094], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000637.npy"}
{"epoch": 0.9629629629629629, "step": 638, "batch_size": 64, "mean": 14.905263900756836, "std": 19.36924934387207, "min": -24.783912658691406, "p10": -6.262889289855955, "median": 10.081291198730469, "p90": 41.96503868103028, "max": 57.88075256347656, "pos_frac": 0.75, "sample": [-2.933013916015625, 32.336692810058594, 27.308425903320312, -0.4295082092285156, -24.783912658691406, -0.2658843994140625, 46.79896545410156, 9.482177734375, 4.874462127685547, -14.541522979736328, 3.7876663208007812, -7.089542388916016, 0.09497833251953125, 9.005477905273438, 3.2735061645507812, 28.93768310546875, -12.100631713867188, -8.923751831054688, 34.57377624511719, 0.0497589111328125, 20.995269775390625, 24.040237426757812, 40.78788757324219, 8.294754028320312, -20.995765686035156, 54.01057434082031, 20.149635314941406, 42.46953201293945, 2.7749996185302734, 8.611045837402344, 10.600545883178711, 40.0838623046875, 16.76361083984375, 50.74039077758789, 24.3006591796875, -4.33403205871582, 0.6620101928710938, 25.924903869628906, 10.282546997070312, 25.766613006591797, 27.80706024169922, 27.646484375, 36.70505142211914, -3.249683380126953, 31.552711486816406, -0.6552505493164062, 5.729278564453125, 43.42418670654297, 9.880035400390625, 29.840850830078125, -2.266387939453125, 28.515666961669922, 52.96287536621094, 12.058914184570312, -7.600242614746094, 18.45081329345703, -0.88702392578125, 57.88075256347656, 37.4727783203125, 3.3609352111816406, 1.5205841064453125, 3.3309097290039062, -1.4484176635742188, 10.518938064575195], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000638.npy"}
{"epoch": 0.9644746787603931, "step": 639, "batch_size": 64, "mean": 14.60429573059082, "std": 19.467851638793945, "min": -18.6865234375, "p10": -5.652567672729492, "median": 9.844310760498047, "p90": 42.00243682861329, "max": 67.8934326171875, "pos_frac": 0.765625, "sample": [22.03557014465332, 26.252655029296875, 7.240104675292969, 16.55584716796875, 22.20423126220703, 17.506690979003906, 10.925907135009766, -5.616596221923828, -5.515722274780273, 23.861568450927734, 1.3189849853515625, 14.016225814819336, 67.51705932617188, 9.421249389648438, 18.580127716064453, 1.17132568359375, 17.62730598449707, -1.1880741119384766, -13.024765014648438, 12.194267272949219, 9.274791717529297, 46.37282180786133, 1.605133056640625, 38.167205810546875, -6.529315948486328, 25.277854919433594, 8.92861557006836, 23.17218017578125, -5.6679840087890625, 42.92771911621094, 53.219276428222656, -11.050622940063477, 3.31646728515625, 21.26593780517578, -2.0338668823242188, 3.8918418884277344, 14.751956939697266, 4.255889892578125, 67.8934326171875, 29.640869140625, 33.88618850708008, 7.4419403076171875, 47.656776428222656, 10.267372131347656, -0.42926025390625, 6.370197296142578, -1.471588134765625, -4.704660415649414, 11.90057373046875, 2.4012107849121094, 36.140159606933594, 27.44795036315918, 4.5866241455078125, 7.071678161621094, 6.4799041748046875, -18.6865234375, 5.934452056884766, -4.661163330078125, -9.550514221191406, 59.93537902832031, -13.483657836914062, 24.68804931640625, 23.846227645874023, 39.84344482421875], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000639.npy"}
{"epoch": 0.9659863945578231, "step": 640, "batch_size": 64, "mean": 11.297544479370117, "std": 18.625539779663086, "min": -51.92995834350586, "p10": -8.708750343322752, "median": 12.370708465576172, "p90": 33.01049919128418, "max": 45.67566680908203, "pos_frac": 0.796875, "sample": [27.000118255615234, 21.129772186279297, 32.73847961425781, -24.24314308166504, 1.0074996948242188, 13.373092651367188, 28.040088653564453, 3.6869354248046875, 23.33056640625, 37.09059143066406, 0.8339080810546875, 0.22951507568359375, 24.235979080200195, 13.702259063720703, -17.49541473388672, 12.360031127929688, 17.319368362426758, 18.695999145507812, 1.2514228820800781, 45.67566680908203, -51.92995834350586, 6.688102722167969, 36.96748352050781, 8.019561767578125, -9.82989501953125, -6.952543258666992, 18.902053833007812, 0.8303604125976562, -11.840967178344727, 3.6662025451660156, 30.47574806213379, -5.8593902587890625, 6.0572662353515625, 27.314437866210938, 14.490470886230469, 0.4711189270019531, -0.6109466552734375, 11.774795532226562, 6.5390625, 11.715805053710938, 4.050994873046875, 29.614974975585938, 2.2965087890625, 12.873504638671875, 30.37030029296875, 17.795997619628906, 23.87920379638672, -40.48229217529297, 12.381385803222656, 20.083026885986328, 18.018421173095703, 39.57865905761719, 14.519332885742188, 27.76177978515625, -0.7020416259765625, -5.675565719604492, 44.20646667480469, -9.461410522460938, -5.16143798828125, 31.56626319885254, 1.2971611022949219, 33.127079010009766, 7.286565780639648, 36.966407775878906], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000640.npy"}
{"epoch": 0.9674981103552532, "step": 641, "batch_size": 64, "mean": 12.104511260986328, "std": 16.47960090637207, "min": -31.326828002929688, "p10": -1.7524673461914058, "median": 11.063441276550293, "p90": 32.34098434448242, "max": 64.35055541992188, "pos_frac": 0.8125, "sample": [-11.830547332763672, -12.681144714355469, 25.823997497558594, 16.002483367919922, 23.124008178710938, -0.28440093994140625, 23.063175201416016, 4.524208068847656, -4.731559753417969, 3.1501998901367188, 3.1931304931640625, 6.820165634155273, 13.170783996582031, 41.84352111816406, 8.054244995117188, 10.853979110717773, -1.9249114990234375, 40.450782775878906, 48.27344512939453, 10.68252182006836, 19.426979064941406, 1.4624137878417969, 3.1954803466796875, 4.0459747314453125, 14.027145385742188, 25.64263153076172, 11.272903442382812, 13.833541870117188, 14.684684753417969, 18.94892692565918, -0.2797088623046875, 31.753173828125, 12.711578369140625, 15.74345588684082, -0.0149383544921875, -23.63811492919922, 13.558574676513672, 12.973052978515625, -15.722816467285156, 2.6890945434570312, -31.326828002929688, 35.26000213623047, 16.77301788330078, 6.5391845703125, 7.349662780761719, 54.14438247680664, 0.20110321044921875, -0.03278923034667969, 20.0234375, 12.703338623046875, 64.35055541992188, 16.094636917114258, 32.59290313720703, 17.996755599975586, -1.35009765625, 8.529548645019531, 14.518096923828125, 9.671104431152344, 4.339885711669922, 6.67230224609375, 7.271965026855469, 4.09246826171875, 17.169553756713867, 27.212343215942383], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000641.npy"}
{"epoch": 0.9690098261526833, "step": 642, "batch_size": 64, "mean": 10.146419525146484, "std": 19.234703063964844, "min": -43.270416259765625, "p10": -10.370203399658203, "median": 7.549312591552734, "p90": 39.427388000488286, "max": 49.736671447753906, "pos_frac": 0.640625, "sample": [27.695083618164062, 13.759117126464844, 2.6486854553222656, -12.722137451171875, 5.534523010253906, -6.153799057006836, 16.265975952148438, 39.63594055175781, -3.8171539306640625, -43.270416259765625, 38.2310791015625, 4.0586700439453125, 7.1259613037109375, -8.984085083007812, 14.419662475585938, 0.14752960205078125, 31.343887329101562, 44.42645263671875, -8.304216384887695, -15.314746856689453, 5.9259185791015625, 13.99139404296875, 27.758804321289062, -2.725475311279297, 46.70802307128906, 15.739639282226562, 49.736671447753906, 6.9675445556640625, 38.940765380859375, 12.177078247070312, 13.515106201171875, 16.751556396484375, 18.599334716796875, 4.361286163330078, -3.9694061279296875, 41.337974548339844, -17.411285400390625, 7.972663879394531, 8.786888122558594, -7.8064117431640625, -2.7974319458007812, 6.9385833740234375, -2.946714401245117, -5.9518585205078125, 19.471956253051758, 12.76030158996582, -10.501693725585938, 25.6169490814209, 37.25596618652344, 45.653289794921875, -1.052276611328125, -8.582542419433594, 23.606361389160156, 34.00071716308594, 42.62310028076172, 10.878482818603516, -10.063392639160156, 15.605705261230469, -14.911134719848633, -13.857742309570312, -0.30942535400390625, -3.6522903442382812, -7.2284698486328125, 12.73028564453125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000642.npy"}
{"epoch": 0.9705215419501134, "step": 643, "batch_size": 64, "mean": 12.842626571655273, "std": 18.468290328979492, "min": -27.002033233642578, "p10": -7.9164581298828125, "median": 10.523992538452148, "p90": 41.034864807128905, "max": 51.84295654296875, "pos_frac": 0.75, "sample": [3.5414810180664062, 15.963010787963867, 7.470630645751953, -2.10198974609375, 3.630965232849121, 41.0826416015625, 7.279581069946289, 20.938724517822266, 25.985702514648438, -1.7065505981445312, 13.597305297851562, 9.249059677124023, 13.209003448486328, 5.5981597900390625, 32.3743896484375, 46.03387451171875, -2.0243682861328125, 0.7635669708251953, 37.94269561767578, 51.84295654296875, 9.429252624511719, 12.703424453735352, 34.82020950317383, 11.618732452392578, -5.522224426269531, 43.52081298828125, 20.51995086669922, -16.336700439453125, 46.52391815185547, 26.63385009765625, 4.497920989990234, 16.1351261138916, 8.470367431640625, 21.975139617919922, 4.692800521850586, -1.5556678771972656, 4.640279769897461, 13.643768310546875, 41.55583190917969, 40.92338562011719, 34.44615173339844, -13.909645080566406, 28.533950805664062, -6.355525970458984, 18.003334045410156, -8.563919067382812, -20.626235961914062, -27.002033233642578, 3.807279586791992, 13.105182647705078, -7.4786376953125, 16.985071182250977, 30.631317138671875, 15.532445907592773, 3.1559600830078125, -19.841079711914062, -8.104095458984375, 38.50096893310547, 0.42127227783203125, 8.689430236816406, -4.748622894287109, 44.4359130859375, 17.677536010742188, -4.928985595703125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000643.npy"}
{"epoch": 0.9720332577475435, "step": 644, "batch_size": 64, "mean": 11.211599349975586, "std": 19.923328399658203, "min": -36.53839874267578, "p10": -10.326728057861327, "median": 9.907654762268066, "p90": 39.02563171386719, "max": 55.728187561035156, "pos_frac": 0.75, "sample": [22.510581970214844, -5.88505744934082, -30.722084045410156, 23.648788452148438, -5.761585235595703, -16.417682647705078, 10.066802978515625, 16.32978630065918, 40.72858810424805, 35.50605773925781, 17.430267333984375, -2.2688560485839844, 27.281692504882812, 37.296443939208984, 10.161956787109375, 5.943717956542969, 7.569202423095703, 6.848823547363281, -3.7444381713867188, 44.931541442871094, 0.2445545196533203, 29.322601318359375, -28.16527557373047, 0.9995574951171875, -1.6806793212890625, 21.379745483398438, 0.045257568359375, 11.455928802490234, 1.3463783264160156, -11.48248291015625, 33.62509536743164, -10.27178955078125, 11.796443939208984, 24.76526641845703, 29.323165893554688, -8.019281387329102, 2.67340087890625, 51.06777572631836, 42.79410171508789, 4.101816177368164, 16.397171020507812, 31.17193603515625, 4.356819152832031, 9.748506546020508, -0.9231338500976562, -36.53839874267578, -2.1103897094726562, 34.55178451538086, 12.53903579711914, 39.7667121887207, 6.6136016845703125, 3.5881500244140625, 0.6542396545410156, -10.350273132324219, 13.07659912109375, 12.810417175292969, -22.99822998046875, 11.352611541748047, 14.754302978515625, 0.4209442138671875, 3.3697509765625, 53.29063415527344, 19.4952392578125, 55.728187561035156], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000644.npy"}
{"epoch": 0.9735449735449735, "step": 645, "batch_size": 64, "mean": 12.907305717468262, "std": 17.643590927124023, "min": -28.928192138671875, "p10": -7.3326877593994135, "median": 10.6954984664917, "p90": 36.01204833984375, "max": 63.94287109375, "pos_frac": 0.78125, "sample": [14.664127349853516, 34.46196746826172, 28.528587341308594, -11.171905517578125, 24.894607543945312, 63.94287109375, 18.533470153808594, 8.859235763549805, 7.252723693847656, -0.2164764404296875, -12.01003646850586, 7.8638153076171875, 8.560308456420898, 0.6870307922363281, 37.3709716796875, 4.561550140380859, 21.103899002075195, 5.3443603515625, -0.8305892944335938, -1.1030216217041016, 41.4564208984375, -7.4117584228515625, 6.485755920410156, 11.484123229980469, 19.782203674316406, -8.052841186523438, 7.146965026855469, 11.4453125, 33.83716583251953, 22.46575164794922, 7.249351501464844, 1.6173210144042969, 19.035655975341797, 0.1324005126953125, 2.4610443115234375, 23.927658081054688, -16.2596435546875, -1.1174545288085938, -9.950668334960938, 27.601730346679688, 37.610801696777344, 15.862319946289062, 59.30403137207031, 0.0959014892578125, 4.526233673095703, 17.340253829956055, 20.808589935302734, 9.945684432983398, 13.468032836914062, 16.345016479492188, 0.4954376220703125, 27.074989318847656, 12.445606231689453, 50.84397888183594, 20.487762451171875, -2.7307510375976562, 35.91876983642578, -7.148189544677734, 16.817230224609375, -28.928192138671875, 3.1396255493164062, 36.052024841308594, 18.197202682495117, -6.5387725830078125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000645.npy"}
{"epoch": 0.9750566893424036, "step": 646, "batch_size": 64, "mean": 13.602485656738281, "std": 18.535757064819336, "min": -33.40162658691406, "p10": -4.665541076660156, "median": 10.166253089904785, "p90": 34.65937805175781, "max": 62.477020263671875, "pos_frac": 0.75, "sample": [-1.1858291625976562, 1.030141830444336, 9.758155822753906, 22.984275817871094, 34.81840515136719, 26.017410278320312, -12.586074829101562, 28.224334716796875, 31.91150665283203, 34.28831481933594, -2.886932373046875, 10.652374267578125, 27.966529846191406, 45.39210510253906, 14.984344482421875, 2.3635482788085938, 15.013832092285156, 4.62493896484375, 47.990577697753906, 19.89336585998535, 30.454673767089844, 16.263198852539062, -25.3010311126709, -3.72943115234375, -2.0107078552246094, 28.365745544433594, 8.31216812133789, -2.5244293212890625, 6.978363037109375, 4.667198181152344, 9.920135498046875, 8.299003601074219, 19.894325256347656, 39.34453201293945, 27.192138671875, 1.4890365600585938, 8.062356948852539, -4.795234680175781, 10.39842414855957, 21.517288208007812, -0.6197452545166016, -6.9654388427734375, 16.54473876953125, 7.7548980712890625, 25.691665649414062, 62.477020263671875, 27.164276123046875, 31.70195770263672, 6.776363372802734, 50.404075622558594, 30.446874618530273, -0.129119873046875, 11.627273559570312, 9.93408203125, -33.40162658691406, 1.4935760498046875, 33.21949005126953, -21.451597213745117, -4.362922668457031, 47.098731994628906, -9.835872650146484, -0.8661880493164062, 7.506565093994141, 14.296955108642578], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000646.npy"}
{"epoch": 0.9765684051398337, "step": 647, "batch_size": 64, "mean": 16.612897872924805, "std": 19.266626358032227, "min": -14.430633544921875, "p10": -5.58973217010498, "median": 12.920503616333008, "p90": 42.723348999023465, "max": 69.19465637207031, "pos_frac": 0.765625, "sample": [3.7775001525878906, 15.231842041015625, 31.948833465576172, 11.550765991210938, 23.818496704101562, 19.997215270996094, -5.066204071044922, 59.22309112548828, -0.403594970703125, 45.86769104003906, 9.61651611328125, 9.564483642578125, 13.302536010742188, -1.4617347717285156, -0.935516357421875, 18.806854248046875, 65.35693359375, -1.378030776977539, -11.856319427490234, -10.132843017578125, 29.74981117248535, 17.515914916992188, 30.153945922851562, 4.395168304443359, 54.885772705078125, 30.1051025390625, 35.38655090332031, -5.809747695922852, 31.10555648803711, -0.5270843505859375, 10.122528076171875, -1.0677947998046875, 22.6859130859375, 2.5758438110351562, 15.399368286132812, -10.190559387207031, 12.538471221923828, 32.94929122924805, 6.289787292480469, -5.076362609863281, 23.01805305480957, 21.17847442626953, 9.386009216308594, 25.97101593017578, -14.430633544921875, 7.249761581420898, 18.119781494140625, 29.38791275024414, 6.481025695800781, 12.271383285522461, -6.0594329833984375, 51.624794006347656, 21.23807144165039, -8.7833251953125, 3.882049560546875, 31.798294067382812, 2.4571056365966797, 69.19465637207031, 9.541709899902344, 28.039581298828125, 54.26837158203125, 6.0345916748046875, 26.392881393432617, 24.94732666015625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000647.npy"}
{"epoch": 0.9780801209372638, "step": 648, "batch_size": 64, "mean": 14.107415199279785, "std": 19.092187881469727, "min": -40.80250930786133, "p10": -5.069654846191406, "median": 14.875848770141602, "p90": 40.59457550048828, "max": 58.83907699584961, "pos_frac": 0.765625, "sample": [22.056068420410156, 24.72814178466797, 15.106193542480469, -40.80250930786133, 35.716644287109375, 0.0714263916015625, -22.291976928710938, 12.99277114868164, 32.662445068359375, 19.15267562866211, -4.769145965576172, 15.101734161376953, 11.081398010253906, -1.7159881591796875, -3.9324111938476562, -18.571455001831055, 19.635883331298828, -3.5639495849609375, 53.45820617675781, 37.001251220703125, 58.83907699584961, 11.972541809082031, 22.482101440429688, 41.08366394042969, 33.53834533691406, 7.567989349365234, 5.157501220703125, 45.20563507080078, 9.760673522949219, 49.091636657714844, 4.991462707519531, -1.2385387420654297, 39.453369140625, 42.30560302734375, 2.305601119995117, -10.89910888671875, 17.707839965820312, 10.253852844238281, 21.268693923950195, 28.384078979492188, 17.211387634277344, 22.391361236572266, 43.900657653808594, -2.5447998046875, -6.780792236328125, 4.898826599121094, 18.330917358398438, 26.705339431762695, 6.5555572509765625, 19.70098876953125, 14.64996337890625, 25.983924865722656, 17.53437042236328, -21.140975952148438, 0.466766357421875, 20.593276977539062, 29.179725646972656, -2.071868896484375, 3.473377227783203, -5.198444366455078, 2.86285400390625, -1.0185222625732422, 7.661041259765625, 17.180191040039062], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000648.npy"}
{"epoch": 0.9795918367346939, "step": 649, "batch_size": 64, "mean": 13.503532409667969, "std": 19.316307067871094, "min": -30.908859252929688, "p10": -12.10204734802246, "median": 15.218925476074219, "p90": 40.005086517333986, "max": 51.82832717895508, "pos_frac": 0.75, "sample": [32.111846923828125, 21.268320083618164, -3.322399139404297, 19.562705993652344, 30.223825454711914, 18.716236114501953, 25.078086853027344, 13.473455429077148, 1.9208793640136719, 37.328041076660156, 0.48696327209472656, 44.96281433105469, 15.286834716796875, -17.97518539428711, 17.83002471923828, 15.151016235351562, 17.195507049560547, 27.873680114746094, 19.661638259887695, 18.59955596923828, 49.29048156738281, -9.978857040405273, -13.223617553710938, -15.405250549316406, -30.908859252929688, 28.79039764404297, 25.429771423339844, 5.920949935913086, -7.622463226318359, -10.53887939453125, 33.96438980102539, -1.1088333129882812, 41.86281967163086, 0.987823486328125, 27.565250396728516, 14.297119140625, 17.394580841064453, 27.97256088256836, 39.66259002685547, 18.534032821655273, 5.194671630859375, 51.82832717895508, 25.00029754638672, -10.472126007080078, -1.26812744140625, 40.15187072753906, 2.331880569458008, 2.358978271484375, 7.586219787597656, 3.6159839630126953, -15.706192016601562, 46.85401916503906, -12.771976470947266, 6.6371917724609375, 49.85435485839844, -10.185810089111328, -14.737709045410156, -6.916351318359375, 21.672229766845703, 23.304203033447266, 31.620399475097656, 10.898857116699219, 4.850002288818359, 4.204986572265625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000649.npy"}
{"epoch": 0.981103552532124, "step": 650, "batch_size": 64, "mean": 14.12302017211914, "std": 20.496578216552734, "min": -30.329193115234375, "p10": -7.715150451660156, "median": 10.804431915283203, "p90": 40.99340744018556, "max": 60.371986389160156, "pos_frac": 0.765625, "sample": [-10.06430435180664, 38.49217224121094, 4.995330810546875, 41.80229187011719, 2.095745086669922, 42.13038635253906, 16.917373657226562, 3.057861328125, 8.609306335449219, 1.740478515625, 32.562591552734375, 30.142200469970703, -7.8654022216796875, 12.784961700439453, 1.2378482818603516, 33.269126892089844, 36.28517532348633, 16.403390884399414, 30.900733947753906, 2.112813949584961, 13.710628509521484, 15.800432205200195, -5.524423599243164, -5.5323028564453125, 49.3299446105957, 33.936309814453125, 60.371986389160156, 25.61111068725586, 27.686134338378906, -2.5410308837890625, 2.9589462280273438, -30.120452880859375, 4.165172576904297, 36.07335662841797, -30.329193115234375, 6.1648712158203125, 55.300537109375, 25.42885971069336, 3.606048583984375, -23.438560485839844, 16.977127075195312, 37.51139831542969, -17.038970947265625, 39.10601043701172, -8.934242248535156, -7.198356628417969, 44.17839050292969, 26.544979095458984, 7.02520751953125, -0.40288352966308594, 14.388601303100586, 2.3958053588867188, 26.865673065185547, -0.22226333618164062, 2.691934585571289, 14.347063064575195, -7.36456298828125, 20.044967651367188, 21.993328094482422, 55.89457702636719, 1.916015625, 5.666385650634766, 8.823902130126953, -1.6052932739257812], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000650.npy"}
{"epoch": 0.982615268329554, "step": 651, "batch_size": 64, "mean": 12.482656478881836, "std": 17.645763397216797, "min": -24.069725036621094, "p10": -6.030487060546875, "median": 10.355790138244629, "p90": 34.86600799560547, "max": 58.77046585083008, "pos_frac": 0.734375, "sample": [0.5081958770751953, 1.1336669921875, 12.840057373046875, 7.6088409423828125, 17.986831665039062, -24.069725036621094, -6.126319885253906, 10.399389266967773, -1.9677352905273438, 29.53615379333496, 4.859527587890625, 5.1001739501953125, -10.966514587402344, -1.9077606201171875, 25.790180206298828, -23.687515258789062, -0.9854393005371094, 28.238983154296875, 13.231269836425781, 23.220199584960938, 18.188980102539062, -1.6431884765625, -7.656471252441406, 8.895492553710938, 8.763614654541016, 0.9719352722167969, 23.45684814453125, 26.801681518554688, -2.67828369140625, 35.1446533203125, 17.80220603942871, 43.625267028808594, 12.068073272705078, 1.9840164184570312, 15.632957458496094, 0.5342025756835938, 33.18483352661133, 19.98717498779297, -5.806877136230469, 14.405719757080078, 43.43919372558594, 17.40436553955078, 29.311809539794922, 7.025381088256836, -2.9802589416503906, 58.77046585083008, 58.060585021972656, 34.21583557128906, -4.030967712402344, 35.48358154296875, 5.153064727783203, -0.5686492919921875, 1.8175468444824219, 24.05047607421875, 14.792465209960938, 18.388832092285156, 27.465625762939453, 10.312191009521484, 11.846023559570312, -14.211334228515625, 8.677143096923828, -8.1475830078125, 50.293724060058594, -2.0847930908203125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000651.npy"}
{"epoch": 0.9841269841269841, "step": 652, "batch_size": 64, "mean": 11.890359878540039, "std": 18.39470672607422, "min": -24.3248291015625, "p10": -9.297734069824218, "median": 9.27243423461914, "p90": 39.6195240020752, "max": 52.41764831542969, "pos_frac": 0.703125, "sample": [-1.0967292785644531, 38.66320037841797, 42.993865966796875, 13.498054504394531, -1.885223388671875, 40.02937698364258, 19.197519302368164, 11.106632232666016, 52.41764831542969, 2.1651153564453125, 27.91577911376953, -14.301639556884766, 1.8078804016113281, 19.83665657043457, 20.815826416015625, 22.436553955078125, -1.3176345825195312, 27.25152587890625, 10.697471618652344, -10.2352294921875, -8.325065612792969, 24.205230712890625, 17.636871337890625, 0.3897056579589844, 2.162792205810547, 0.749420166015625, -24.3248291015625, -7.373443603515625, 34.322845458984375, 12.248153686523438, 9.21722412109375, -4.869968414306641, 11.063610076904297, 9.327644348144531, 22.62303924560547, -6.1788177490234375, 47.215538024902344, 7.663433074951172, -4.817502975463867, -15.350860595703125, -0.8622055053710938, 48.9133186340332, 7.038766860961914, 15.369003295898438, 2.0471420288085938, 0.14440536499023438, 34.705406188964844, 17.352386474609375, -0.5000419616699219, 31.98377227783203, 51.560577392578125, 4.146736145019531, 21.407737731933594, 3.2683486938476562, -5.843269348144531, 22.00257110595703, -9.789108276367188, 8.536531448364258, -9.714591979980469, 48.98680877685547, 18.490196228027344, -4.9382781982421875, -15.896751403808594, 22.991928100585938], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000652.npy"}
{"epoch": 0.9856386999244142, "step": 653, "batch_size": 64, "mean": 13.12789535522461, "std": 19.3145694732666, "min": -29.243640899658203, "p10": -6.428158569335936, "median": 7.998622894287109, "p90": 43.49638061523438, "max": 61.286834716796875, "pos_frac": 0.8125, "sample": [23.400575637817383, 44.819252014160156, 0.380767822265625, 3.2094268798828125, 8.782760620117188, 12.478057861328125, 15.573905944824219, 7.7064208984375, 48.46550750732422, 2.4827880859375, -13.295909881591797, 4.2384185791015625, 38.602718353271484, -0.46094512939453125, 41.42817687988281, 29.054000854492188, 18.129653930664062, 44.04576110839844, 4.791606903076172, 42.21449279785156, 28.399368286132812, 47.58125305175781, 7.686929702758789, -1.6758995056152344, 8.714435577392578, 1.7143478393554688, 29.70807647705078, 0.29436492919921875, -0.53594970703125, 2.1539764404296875, 61.286834716796875, -15.435897827148438, -5.561927795410156, 6.825859069824219, 19.856691360473633, -14.442237854003906, 8.290824890136719, 11.086174011230469, 19.4808349609375, 6.831493377685547, 4.067047119140625, -27.625831604003906, 31.945709228515625, 19.739761352539062, 22.964508056640625, -29.243640899658203, 44.407493591308594, 0.0102996826171875, 51.6883544921875, 6.729339599609375, 2.0357437133789062, 37.540374755859375, -17.41876220703125, 0.7112960815429688, -1.8173561096191406, 1.4127330780029297, 21.42409896850586, 16.035507202148438, 4.817333221435547, 17.108367919921875, 7.063087463378906, 19.804065704345703, -6.799400329589844, 15.278129577636719], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000653.npy"}
{"epoch": 0.9871504157218443, "step": 654, "batch_size": 64, "mean": 12.512324333190918, "std": 18.950515747070312, "min": -18.38800048828125, "p10": -7.747857093811033, "median": 7.267890930175781, "p90": 39.08328247070313, "max": 72.30426025390625, "pos_frac": 0.75, "sample": [4.628639221191406, 36.974822998046875, -18.38800048828125, 16.601360321044922, 2.1176986694335938, -9.466899871826172, 31.288909912109375, 11.556739807128906, 10.795726776123047, -2.820556640625, 1.5885772705078125, -8.677940368652344, 33.47401428222656, 7.4931182861328125, -1.6632156372070312, 12.081110000610352, 23.423110961914062, -10.897045135498047, 39.986907958984375, -0.07912445068359375, 3.5173873901367188, 6.769115447998047, 31.37187957763672, 57.347251892089844, 0.5027179718017578, 5.841678619384766, -0.15598106384277344, -4.895195007324219, -2.5244522094726562, 11.79071044921875, -0.6353893280029297, 22.036468505859375, 72.30426025390625, 9.821281433105469, 20.517227172851562, 22.836769104003906, 23.797943115234375, -6.050067901611328, 2.3151702880859375, 4.3179168701171875, 27.067659378051758, -17.44390869140625, 8.745346069335938, -8.475481033325195, 6.260807037353516, 1.5241012573242188, 53.356563568115234, 49.26020812988281, 13.624942779541016, 10.227699279785156, 18.814849853515625, 48.47191619873047, 4.769634246826172, 11.152137756347656, 54.749305725097656, -11.919048309326172, 5.133567810058594, 5.665493011474609, 4.3140106201171875, 30.786026000976562, 10.024524688720703, 7.04266357421875, 10.716140747070312, -3.925079345703125], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000654.npy"}
{"epoch": 0.9886621315192744, "step": 655, "batch_size": 64, "mean": 14.149422645568848, "std": 15.549736976623535, "min": -13.718832969665527, "p10": -3.4479774475097655, "median": 10.511802673339844, "p90": 30.13547821044922, "max": 73.53933715820312, "pos_frac": 0.84375, "sample": [28.403654098510742, 7.7176361083984375, -0.8937225341796875, 5.135835647583008, 11.905006408691406, 41.159759521484375, 27.421463012695312, 22.634159088134766, 10.308586120605469, 6.22991943359375, -1.2428512573242188, 17.517372131347656, 4.979637145996094, 9.295387268066406, -6.379425048828125, -7.636516571044922, 11.25001335144043, 30.253700256347656, 44.35955047607422, 5.761297225952148, 35.23495101928711, -3.4385833740234375, 27.739540100097656, 20.741989135742188, 15.87217903137207, 0.32358551025390625, 10.034584045410156, -10.217330932617188, 27.96307373046875, 0.05876922607421875, 26.96276092529297, 10.715019226074219, 29.85962677001953, 5.864128112792969, 26.759445190429688, 5.4454498291015625, 7.677835464477539, 22.200965881347656, 16.18035888671875, 73.53933715820312, 2.6189804077148438, -8.382438659667969, -3.4520034790039062, 28.8873291015625, 13.750982284545898, -11.633148193359375, 9.828447341918945, 22.084976196289062, 41.53376770019531, 27.188552856445312, 2.275035858154297, 4.1031951904296875, 21.75372314453125, 9.207679748535156, 5.994865417480469, 15.254226684570312, 32.201663970947266, 9.400672912597656, 25.96759033203125, 1.736083984375, -13.718832969665527, 25.597856521606445, 8.3619384765625, 17.303756713867188], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000655.npy"}
{"epoch": 0.9901738473167044, "step": 656, "batch_size": 64, "mean": 15.105348587036133, "std": 17.765050888061523, "min": -23.980642318725586, "p10": -5.135472869873046, "median": 12.18271255493164, "p90": 41.99936637878418, "max": 58.560401916503906, "pos_frac": 0.8125, "sample": [40.87184524536133, 3.5719375610351562, 43.13715362548828, 19.989967346191406, 4.534717559814453, 4.6949920654296875, 8.404338836669922, 2.1683120727539062, 12.254913330078125, -3.9063339233398438, 7.792041778564453, 35.41419982910156, 1.9567375183105469, 2.7736053466796875, 1.8051738739013672, 25.140945434570312, -12.951202392578125, -5.83514404296875, 15.305442810058594, 28.50571060180664, 58.560401916503906, 14.340038299560547, -23.980642318725586, 24.726726531982422, 32.4742431640625, -2.119548797607422, 15.310493469238281, 26.68536376953125, -5.6622467041015625, 2.8349037170410156, 49.046897888183594, -0.41236114501953125, 8.08357048034668, 14.229339599609375, 12.110511779785156, 42.62400817871094, 28.714385986328125, 14.523250579833984, 51.49485778808594, 16.55609130859375, -0.3123931884765625, -6.256477355957031, 23.763328552246094, 42.48258972167969, 6.607147216796875, 30.714759826660156, 8.481241226196289, 29.842071533203125, 6.191417694091797, 19.09050941467285, 29.714691162109375, -7.252086639404297, 15.8502197265625, 55.80888366699219, 19.578651428222656, 8.248626708984375, 3.9885330200195312, -9.799118041992188, 1.0867385864257812, 3.383359909057617, -2.6995162963867188, 7.300971984863281, 26.39190101623535, 38.7666015625], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000656.npy"}
{"epoch": 0.9916855631141346, "step": 657, "batch_size": 64, "mean": 16.855880737304688, "std": 20.652755737304688, "min": -37.09394454956055, "p10": -7.115998840332031, "median": 17.478188514709473, "p90": 45.00824279785158, "max": 62.32556915283203, "pos_frac": 0.78125, "sample": [50.21296691894531, 29.807601928710938, -4.466121673583984, 33.15936279296875, 20.51891326904297, 17.433698654174805, 52.34454345703125, 51.13313293457031, 24.523849487304688, -37.09394454956055, -8.46303939819336, -4.997491836547852, 25.818553924560547, 5.922544479370117, 37.48583984375, 12.358236312866211, 31.653968811035156, 56.84864807128906, -10.501701354980469, 36.25556945800781, -7.3891143798828125, 18.558177947998047, -14.30093002319336, 3.903806686401367, 17.19108009338379, 40.90907287597656, 1.87042236328125, 15.460868835449219, -0.6377811431884766, 8.629386901855469, 19.965667724609375, 32.302635192871094, 22.259033203125, 26.004283905029297, 26.844423294067383, 17.68841552734375, 17.52267837524414, 46.76502990722656, 38.65040588378906, 8.431818008422852, 29.556732177734375, 10.027999877929688, -0.8480911254882812, 34.719764709472656, -3.740276336669922, 0.5343017578125, 62.32556915283203, 6.284757614135742, -6.478729248046875, 9.54708480834961, 2.9443283081054688, -28.93360137939453, 35.04021453857422, 0.3659248352050781, 6.550832748413086, 7.739463806152344, -7.934089660644531, 21.067520141601562, 27.36362075805664, 52.922760009765625, -3.6669654846191406, 26.27684783935547, 3.8756179809570312, 32.65032958984375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000657.npy"}
{"epoch": 0.9931972789115646, "step": 658, "batch_size": 64, "mean": 12.206518173217773, "std": 21.16199493408203, "min": -37.41770935058594, "p10": -11.72021217346191, "median": 8.49488639831543, "p90": 37.5412094116211, "max": 61.45318603515625, "pos_frac": 0.734375, "sample": [17.158905029296875, 18.694515228271484, 25.907203674316406, -24.495758056640625, 6.41241455078125, 4.064016342163086, -25.99645233154297, 0.115447998046875, 23.948808670043945, 30.306396484375, 0.8473052978515625, 10.132843017578125, -31.759798049926758, 34.30880355834961, 22.91500473022461, 1.4360847473144531, -0.9296722412109375, 19.39874267578125, 61.45318603515625, 1.019927978515625, -3.286294937133789, 5.37139892578125, 5.336769104003906, 25.869850158691406, 41.32171630859375, 36.63233947753906, -13.443862915039062, 22.52198028564453, 23.726226806640625, 2.5162277221679688, 0.3975505828857422, -1.4435577392578125, 10.960077285766602, 34.8546142578125, -0.4970130920410156, 23.11743927001953, 37.93072509765625, -13.957855224609375, 3.125415802001953, 18.309585571289062, 56.20783996582031, 5.364988327026367, 10.798324584960938, 36.246253967285156, -1.8012161254882812, 20.22352409362793, -7.179283142089844, 43.95635986328125, 31.28661346435547, 25.210697174072266, 5.861602783203125, -2.3436717987060547, -5.337799072265625, -23.061967849731445, 11.4803466796875, 20.810958862304688, -7.698360443115234, -37.41770935058594, 51.926544189453125, -4.462089538574219, 56.05743408203125, 6.856929779052734, 32.068885803222656, 1.8606853485107422], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000658.npy"}
{"epoch": 0.9947089947089947, "step": 659, "batch_size": 64, "mean": 10.781927108764648, "std": 18.002063751220703, "min": -29.970157623291016, "p10": -10.09464111328125, "median": 10.04793930053711, "p90": 37.93788146972656, "max": 55.038997650146484, "pos_frac": 0.71875, "sample": [6.1220245361328125, 22.399520874023438, 38.225738525390625, 21.980592727661133, -2.7174720764160156, 37.80938720703125, 7.1392364501953125, 22.55360221862793, 10.392694473266602, 55.038997650146484, 28.320968627929688, -10.2374267578125, 5.638648986816406, -0.6617622375488281, 1.3633041381835938, 12.475379943847656, 20.062904357910156, -17.54633331298828, 6.029478073120117, -12.752052307128906, 37.992950439453125, -7.640798568725586, 2.82647705078125, 12.385963439941406, 2.7500858306884766, -0.9624786376953125, 5.187541961669922, 12.068675994873047, -3.2832584381103516, 10.230911254882812, 5.176185607910156, 11.242988586425781, 19.976661682128906, 6.486846923828125, 9.098773956298828, 18.65508270263672, 45.65510559082031, 21.49451446533203, 52.74812316894531, 10.81954574584961, 38.24143981933594, -1.938232421875, -9.761474609375, 28.0455265045166, -7.44743537902832, 3.700103759765625, 18.669151306152344, 15.461128234863281, 22.719375610351562, -12.742179870605469, 9.92770004272461, 21.629364013671875, -29.970157623291016, -15.96685791015625, 29.272171020507812, 10.16817855834961, -1.892669677734375, -27.555633544921875, 48.34528350830078, 12.005111694335938, -5.310295104980469, 21.13047218322754, 4.0631561279296875, -5.2972259521484375], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000659.npy"}
{"epoch": 0.9962207105064248, "step": 660, "batch_size": 64, "mean": 14.899673461914062, "std": 17.4344482421875, "min": -22.492996215820312, "p10": -5.581855773925781, "median": 12.020853996276855, "p90": 36.46623191833496, "max": 59.53266143798828, "pos_frac": 0.828125, "sample": [45.72460174560547, 7.033294677734375, 7.689212799072266, -5.1432342529296875, 16.912172317504883, -7.017753601074219, 8.22613525390625, 31.044437408447266, -17.924758911132812, -22.492996215820312, 36.6585807800293, 59.53266143798828, 11.037944793701172, 17.744434356689453, 11.677080154418945, 51.78253173828125, 30.72602081298828, 17.3909969329834, 9.956083297729492, -6.8911285400390625, 8.629806518554688, -13.981689453125, 27.187156677246094, 2.119783401489258, 11.8861083984375, 48.592315673828125, -14.04248046875, 15.13214111328125, 17.579395294189453, 0.8578567504882812, 10.852069854736328, 34.864166259765625, 47.609619140625, 11.511886596679688, -0.28179931640625, -5.76983642578125, 6.410713195800781, 5.535736083984375, 32.25403594970703, 2.4069976806640625, 12.677299499511719, 15.675403594970703, 17.35553741455078, 27.45220184326172, 33.40931701660156, 20.21869659423828, -0.837158203125, 3.1646347045898438, 26.488901138305664, 36.017417907714844, 13.333515167236328, 50.96427917480469, 8.177032470703125, -1.2332592010498047, 33.2911376953125, 13.485607147216797, 17.76449203491211, 23.890960693359375, 8.85504150390625, 1.63250732421875, 12.155599594116211, 15.729278564453125, 2.369720458984375, 8.520606994628906], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000660.npy"}
{"epoch": 0.9977324263038548, "step": 661, "batch_size": 64, "mean": 12.295242309570312, "std": 19.699399948120117, "min": -49.88506317138672, "p10": -9.854524993896483, "median": 11.111998558044434, "p90": 35.297541809082034, "max": 54.194923400878906, "pos_frac": 0.75, "sample": [9.140213012695312, -8.660377502441406, -17.843017578125, -14.240497589111328, 29.176105499267578, 10.97024917602539, 39.970245361328125, -3.990875244140625, -37.00495910644531, 16.56463050842285, -13.81890869140625, 1.5778656005859375, 5.976654052734375, 53.35102844238281, 26.039810180664062, 34.268836975097656, 54.194923400878906, 15.274444580078125, 45.684844970703125, 8.916755676269531, -49.88506317138672, 15.159744262695312, 32.61702346801758, -0.0463714599609375, -4.445247650146484, 2.8447647094726562, 7.114467620849609, -6.194507598876953, 4.362766265869141, 0.6480712890625, 13.832027435302734, 23.76658058166504, -17.83994483947754, 31.421241760253906, 23.004974365234375, 6.750894546508789, 19.453506469726562, 37.38854217529297, 3.113605499267578, 29.889366149902344, 27.806793212890625, 4.530651092529297, 5.655059814453125, 35.47632598876953, 31.438369750976562, 14.784866333007812, 36.11791229248047, 11.253747940063477, 5.190244674682617, 9.509380340576172, -10.366302490234375, 21.56951904296875, -6.5858612060546875, -1.9276485443115234, 13.376270294189453, 34.88037872314453, 25.674110412597656, 13.43194580078125, 9.894683837890625, 33.903472900390625, 13.468376159667969, -1.34161376953125, -0.6174697875976562, 31.267898559570312], "npy": "outputs/llama-3-8b-base-new-dpo-ultrafeedback-4xh200/margin_logs/step_0000661.npy"}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:595346a359a0f251b6f828ecde55b99d8d200998b72e97c71260664617d415b2
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c4a0d8c26a315903fc2506660d8ac2eb82c1e4d9a761e6a7de89830e1a119f6
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6118899301ab39459bc5d6a2e903fb1043df1e3b9f4922d7d5a5c38bcdcb04ce
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:10559c4fb5a705d9e3eba6994a1ea40ad8cc284d97500dd100b04cdfce6d4da0
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f7f6c959c787c9136c91956982fe63d4a2415671517b4669c29893d90ea5949b
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4d412be46a00ffa560c43b8ca6eb0b2b8621bd8802c5cfb16f89eabaff50a12d
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:993c87e67eb469c4a55029031a9b6771089ee781289efb5bdbdf1d1094d50746
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:13b721036f74e1ad862a1306032febd24e0d26c2f5d3b04758393bf943d4a56d
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:941541894875ea7e6f01a9e80364e4173cb900787bd03a8daa5dd814911d2f1a
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b1d2e4fbe789eac620f265f84751e44e654e9003c3850e558718f85f56b76e37
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e3c38da1e7749e4b705535099982a684f889ebbf7147bd4bddc797e2f2d01d1d
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4695f0486522612ceb85f4fe145c7523f03d86e27aed7bd1d3348ad8cda4ff68
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f64e0a029d549fff51b9954ceb03650be7c802523088dcfc5e00b02c6b8b1e2c
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:75af98d429d60c7971a3c32a56edd7a52108edd85679ecb3c5b6bee5e856a01c
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9ac37ff8081b8d8ef8444a92f6e33b7e46dc5c7aba58d9b0e71b7d4eb00b22a3
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aece58bc20ad454cb3a74aac5b67190a19083a8127071a0bee593dea034b1d8e
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e79939617e7321f1b667162b3d294efd17103b4949baee989aba19a92a0b76e
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:417b3b5f8c3e771325fb67d758c4631d2adefa9b8b062201094f733140f12448
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:16f95b049fe871d811c72efa0687ac1617827a0258e65d0d4ecd769f4a9add53
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d6d4410a9ffb39c7a86686b42c26d18419804bebac7466066db45edb392aa539
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9a593978095be43edffd6f31c875f5f8e0e68ac8076d6e4baa6aef2b47bbff71
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:789bbf146d51c04fdf3d8fd05c34830bae364388f1cda945afb9dbfe50efe3a8
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9a5976885f0aa89630838b00ff8624e905be791c031ecc7c75f7b243cbd3fb4d
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b2fcb51a00b2f27c3148c65f783a4f5253b6f1c3b74ab91df6d13a6f69725963
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a3b911ecc7108f3c8aa4647fd83c0c9f30fdcba84cec0891b3e3721f85da76ad
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:934655d55a8deba9db6499b6fcfa62e81823ea4e4964ce59ac403050f37412fc
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eca2f08a97b88964d9587e43e535840a55e6f5c5b77e9c5a3024f1c43bb88f37
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8ca770fe369e1c3309a67c36d61f211b24393c499f1da7f97d817c0fa7805837
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:26533b039e3c70dd40fcf3df72ccf0bc555072f2cb6db87d047b3791fca86eb3
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7958408de51d3193f4b5450268b3a66dbb56b11f5b2be14c498e663db183a5ec
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d9eedaa59928422ee352acc0d7bd632b3ecfefbd0a1c9df81919fb7b9043bf3
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c2efd36a5c36d0cbf42b1ba004381134438e2cb695ae37a43ac8df0910dd5d73
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:945ae7248e351b8d0e2b5b7971d4814d50a21cec9652749d3f386eab0fd8b76e
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a43d3e1419084bb6dbab66584bd3c528150146cd8cbcfd4cb45e7c567ec79786
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1a423d1109802a7e25870816f213ae17e131bc7caf2bbce665b0801a031e1a78
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7d9f8edcdbfd2795b5db850816b14c4000b3fc74fdeaebe25370b0ba0459b520
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dbe849aa7aa79f0e8851f3bc7ebc8f9e1ea5b6c64f4ec3cfb060b96a57ebfca6
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:191ac926965081b1264974227a827b9e91849e1c2c73bda87d01f0642ec3c167
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:49132df3156e1c1e724ac7247fe2fcba999c9b17927bf75399787a2f430bd2dc
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:221615b5281cfeda5bb8da4f7b09675172c1cd375e33a3e185baed16b7431f24
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6e5c14707a5936ed781083ebf018d0ee4c16e9b2e0d2b2c97de0e12f7c2631d7
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:31bb459f48d10c40c86922d4f618fe37100a1f654bdbd5d79920acae6e095653
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b77be52e3fdd5c85f13f0e0fc0c1511578c111489a45ab325c442eeab49c859c
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fdc29e118998ab348909242a205b3df17aa1fe5018bc4b7f73e41e9af1060b58
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e0e6b7aa26fa3fc951be555ce6d88094dd0d8c8c6f975a617ce50e4ad73c3fe4
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:723fca1e40ea94f287cdd2a6a23ec4bcbcac8e0ed02b3bf11f65e667e8e272e5
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7e3eb85718d7c532f9d8e124d16b5fefe2880ccc873489daae7ffee5c5a1ca8e
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5c3df5a61db4c3017cb21694cecc2f2608d556339d734b433e8d975691e52d50
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ed497b08ba690a53da405a11caa402b80396b551b20c410e1c33eb7ad657c1aa
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2c5624a994f6f0106836e637b94dbdcdd1b2cb41eacac13cd2dbd3202e31c5f4
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8a752043348c9dafe1013b68b000d81f99529de3d7e5518c9d5ef1c5fbce8609
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:245afa1ed86a1bd31f208924dfef0d4daee881790637a703ea65c38bf7de5ff3
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d637ec6c9b36793066612c2643203712110e590abb3e0cb597e1c751408f49cd
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:24d464cd99be6c3de6141546fadc9296f7f9bd7ba6195a24fa31862cdc98023c
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f182ee84113375eca30b4dd84dd351200f4ac94430f194c8f4a48d5c5f70bc8d
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b3befd5b798cc066b2e801bcb8cfdf62e9aa6679fe5385b4854f671e4edf6c8
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bb0a61223367367483aca2601294c31623b95b576896183e83df3257e7e43150
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b3f55b23c6210582edbcc284dd37ec91b8c3c94ea3d2c5b7aff329bb5c5230d4
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3c8b5e6508b429ceb03755a4531e3f97f023416c9fc162ddaa7b5d88840b3139
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1c78c0ddf15bb22d264d360c81889ece4ef1f00372b241ea79a9f5f66787c7bf
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b7aca7df18a9b2b50835305317eeaf9edc9350a92c1105f66af12a3226ecd966
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fe7bbc8319271d57bdb443f16abc2aadcfff66804ebc2f7f3ee6dbb21a416003
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7694145dce766dfe0e7bc6ae4dd1aa93434bd7b6306cfe877fdbbb21b823efea
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a47a8d1518f5cf47382185743b8dd1b46e3e6fdfe038ff05190efdb7b64e3a5c
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dc4e3d46496f4a95d7f328559616d0b432f8466f086b09eae2741f6e0fed8010
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:efba2de76e84f4fddbc28b9d8e6c09c668e579db5cf78970bb665fd5028150c0
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d3b57b39e3bd579ef386e9bda13a05c2ffd4270027a4fbd34cd4c3854731eb80
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e5ab05bfbdbd21bf7a8485cc9c6abd0cca3623bb6ae354bbe61a29ab2be3cba7
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0313665b6e7e1276ea3d2cd51a93a71c6f56571ad2a757cc8824b8f127b75c50
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9d705d8f910585e92e2b01cc428fe76e114a2a91518e9f7c5a0fa8e685b2bfd0
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0d4344e6654d302c28696b21dd2fe04006b339613eebb7893f2f974350e80014
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:da558ec0d09e36dd69fb891d4dd3c78fc2a20b639c30dd2376112711dccc0637
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:268dfa051c850c8d1d1133654432d745a1b3b9c41c616c4bd2032948615ce32e
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:efa266336440cf568bb9e7c5f20b1531ed321e1a54c95107630ccb07d8ad2a16
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:76dff0acebe582b44c16937ac51f7e86878403864caba5d1400520d2a8526792
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d72c77ce17a864ce90e86d6ce73a721607b75b25147f3bbd18e37e2c52517ea8
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:85670ef8807f8ca67b22b33b9517d767b4b4b278010d909156bc4e45f7d3220d
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6cc5a9034b32ffad21d1bd0318d99f8060665d587b0f9f1379769de06ccce44d
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4960acc700bf4deca0641c799d64caaa2329898888bff905107b46ea32392618
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1d690d0d7cb95a7fcd5baa6469a4c288bd02ae4ed28d62cc99db70486acdf7e0
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e62be6056a65f1f43d3a77cd4302f736796f53d8833be9b53438f503f93b3b4e
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:161a04581d3577c58c2578ed11cfe3691ef1335ad17ca9ae3a4169eb7225f541
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:790eb7e07a0ec490d949e650f582a0d1ab44e7aba521a00ec41ab343e2a7c0bc
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:565c2bcacc158d5322570c1122bf5ca6622819002fb55c27eed4333532a622d8
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:38d67a0507e6f768a59dfe9cba5687f449cb99cc29ed95e3b3b1999d08a653b3
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb98ab4c3523272af4e9aabb1a9944ca8c739030536727c27157be14b2df6765
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ba68a0c7d97b429e5dc64f002ab9f192de5a4082cb8c8fcbf68d380cd0bd6358
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9a0ced081795446daef4e2035a1ad63c547ab2acba2300478472f4d7a344d1fc
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d5d05cc3de4e2f77c077c8899aa9c78e6004ce8dc6b161eca31d72121d8a7e0e
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8f1fb3192779801c3172e6521497c9097172e78c30782d5b20fd65e03300dca0
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:285c7569593bd4163e760864614a00784165c31e0dcbc80325205c66c01a5b35
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bcb6622ac9abff7e3238d3cca2ed5391943b9b2078425bcfc037173a4d9fb2db
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5ed2e0a201efbb14e705819d46991ca265958470fe393eba814457c5a558adcd
size 384

Some files were not shown because too many files have changed in this diff.