初始化项目,由ModelHub XC社区提供模型

Model: W-61/llama3-hh-harmless-qt045-b0p8-20260429-085449
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-16 04:42:19 +08:00
commit 6f346413ec
681 changed files with 21593 additions and 0 deletions

36
.gitattributes vendored Normal file
View File

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

62
README.md Normal file
View File

@@ -0,0 +1,62 @@
---
library_name: transformers
base_model: W-61/llama-3-8b-base-sft-hh-harmless-4xh200
tags:
- alignment-handbook
- new-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449
This model is a fine-tuned version of [W-61/llama-3-8b-base-sft-hh-harmless-4xh200](https://huggingface.co/W-61/llama-3-8b-base-sft-hh-harmless-4xh200) on the Anthropic/hh-rlhf dataset.
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
### Training results
### Framework versions
- Transformers 4.51.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4

9
all_results.json Normal file
View File

@@ -0,0 +1,9 @@
{
"epoch": 0.999244142101285,
"total_flos": 0.0,
"train_loss": 1.1380426484229165,
"train_runtime": 1650.6898,
"train_samples": 42336,
"train_samples_per_second": 25.647,
"train_steps_per_second": 0.4
}

29
config.json Normal file
View File

@@ -0,0 +1,29 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.51.0",
"use_cache": true,
"vocab_size": 128256
}

9
generation_config.json Normal file
View File

@@ -0,0 +1,9 @@
{
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": 128001,
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9,
"transformers_version": "4.51.0"
}

661
margin_logs/margins.jsonl Normal file
View File

@@ -0,0 +1,661 @@
{"epoch": 0.0, "step": 1, "batch_size": 64, "mean": -0.0013527870178222656, "std": 0.2564818859100342, "min": -0.736083984375, "p10": -0.3432229995727539, "median": 0.038166046142578125, "p90": 0.29227676391601565, "max": 0.645111083984375, "pos_frac": 0.578125, "sample": [0.1120758056640625, 0.12518310546875, 0.31621551513671875, 0.13765716552734375, -0.12592506408691406, 0.23141098022460938, -0.21887779235839844, 0.21950721740722656, 0.04480743408203125, 0.020877838134765625, 0.0570220947265625, 0.058269500732421875, -0.4338226318359375, -0.030628204345703125, 0.645111083984375, -0.395477294921875, 0.09050941467285156, 0.0007190704345703125, -0.34615325927734375, 0.016077041625976562, -0.33638572692871094, 0.293853759765625, 0.17610931396484375, 0.22386932373046875, 0.21470260620117188, -0.08536529541015625, 0.0907745361328125, -0.03816986083984375, 0.39190101623535156, 0.16336441040039062, 0.08024787902832031, -0.031158447265625, 0.08477020263671875, 0.002460479736328125, -0.242034912109375, 0.07232666015625, -0.60186767578125, 0.20531463623046875, 0.155731201171875, -0.14299774169921875, -0.25698089599609375, 0.12331962585449219, -0.26497650146484375, 0.15140533447265625, -0.0920257568359375, -0.18599319458007812, 0.19028091430664062, 0.2496490478515625, 0.42162322998046875, 0.17873382568359375, -0.1525421142578125, -0.4972076416015625, 0.32010650634765625, -0.10365867614746094, -0.233795166015625, -0.19828224182128906, -0.4018898010253906, -0.13407135009765625, -0.09596633911132812, 0.031524658203125, 0.28859710693359375, -0.192962646484375, -0.736083984375, 0.3026123046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000001.npy"}
{"epoch": 0.0015117157974300832, "step": 2, "batch_size": 64, "mean": 0.03744968771934509, "std": 0.2875921130180359, "min": -0.7604827880859375, "p10": -0.2812448501586914, "median": 0.03963661193847656, "p90": 0.3654294967651367, "max": 0.8134727478027344, "pos_frac": 0.5625, "sample": [0.30594635009765625, -0.24289894104003906, -0.11509323120117188, -0.13417816162109375, 0.06942558288574219, 0.36568641662597656, -0.14640045166015625, 0.1497650146484375, 0.30261993408203125, 0.10124588012695312, 0.13028717041015625, -0.0031890869140625, 0.0361480712890625, 0.5662612915039062, 0.09694290161132812, -0.01091766357421875, 0.1128997802734375, 0.0411834716796875, -0.21860504150390625, -0.1236419677734375, -0.08812713623046875, 0.10360527038574219, 0.1790008544921875, -0.5114288330078125, 0.3056755065917969, -0.14553451538085938, 0.28168487548828125, 0.26990509033203125, 0.1686878204345703, 0.038089752197265625, 0.19541168212890625, -0.10783576965332031, -0.2644004821777344, -0.19707489013671875, -0.140472412109375, 0.1349811553955078, 0.19672012329101562, -0.0714111328125, 0.53369140625, 0.1271820068359375, 0.8134727478027344, 0.2990264892578125, -0.7604827880859375, -0.08274078369140625, 0.05890846252441406, 0.029361724853515625, 0.4510040283203125, -0.1599273681640625, -0.29346656799316406, 0.10005569458007812, -0.27509117126464844, -0.1937713623046875, 0.19167327880859375, 0.28173065185546875, -0.09406471252441406, -0.3380699157714844, -0.29186248779296875, 0.36483001708984375, 0.009979248046875, 0.44391632080078125, -0.126708984375, -0.6550216674804688, 0.6160736083984375, -0.28388214111328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000002.npy"}
{"epoch": 0.0030234315948601664, "step": 3, "batch_size": 64, "mean": 0.022987276315689087, "std": 0.37670379877090454, "min": -0.861907958984375, "p10": -0.5498474121093749, "median": 0.03221893310546875, "p90": 0.46909084320068367, "max": 0.7790374755859375, "pos_frac": 0.53125, "sample": [0.27184295654296875, 0.14017486572265625, -0.25759124755859375, -0.6647720336914062, -0.18487548828125, 0.7790374755859375, -0.06109619140625, 0.081268310546875, 0.3631591796875, -0.586669921875, -0.09204864501953125, 0.14984130859375, 0.1214752197265625, -0.04583740234375, 0.6027679443359375, 0.2713432312011719, -0.861907958984375, -0.055450439453125, 0.16783714294433594, -0.324981689453125, -0.7102813720703125, -0.1406097412109375, -0.23828125, 0.18198776245117188, -0.4244499206542969, 0.47544288635253906, 0.43883514404296875, 0.159332275390625, -0.11595916748046875, 0.0353240966796875, 0.10887527465820312, -0.6299095153808594, 0.361175537109375, 0.03113555908203125, -0.061054229736328125, -0.17077255249023438, -0.41095733642578125, -0.46392822265625, 0.024335861206054688, 0.6394500732421875, 0.4542694091796875, -0.086456298828125, 0.2817230224609375, 0.6971435546875, 0.03330230712890625, 0.39691925048828125, -0.12067413330078125, 0.4477863311767578, -0.055667877197265625, 0.1041259765625, -0.4415435791015625, 0.43413543701171875, 0.48459625244140625, 0.2814655303955078, -0.10064697265625, 0.04593658447265625, 0.418609619140625, -0.5957260131835938, -0.0083770751953125, 0.09284210205078125, -0.269134521484375, 0.7318878173828125, -0.6139297485351562, -0.04460906982421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000003.npy"}
{"epoch": 0.0045351473922902496, "step": 4, "batch_size": 64, "mean": 0.0026539862155914307, "std": 0.3499410152435303, "min": -0.80810546875, "p10": -0.4273097991943359, "median": 0.011664390563964844, "p90": 0.38760375976562506, "max": 0.89117431640625, "pos_frac": 0.515625, "sample": [-0.072784423828125, -0.121917724609375, 0.394287109375, 0.05194091796875, -0.377685546875, -0.5468330383300781, 0.14748764038085938, 0.6226367950439453, 0.0481414794921875, 0.657257080078125, 0.1854095458984375, -0.0555267333984375, -0.2974700927734375, 0.16730499267578125, -0.289581298828125, -0.144012451171875, -0.0472564697265625, -0.088836669921875, 0.119293212890625, -0.80810546875, -0.15674591064453125, -0.01032257080078125, -0.21014404296875, -0.7122955322265625, 0.24106216430664062, -0.26537322998046875, 0.34850311279296875, 0.3665771484375, -0.17107772827148438, 0.156646728515625, 0.882781982421875, 0.285308837890625, 0.1568450927734375, 0.08200836181640625, -0.44412994384765625, 0.0827789306640625, 0.89117431640625, 0.15162277221679688, -0.32513427734375, -0.1336822509765625, -0.00717926025390625, 0.0004558563232421875, 0.5332984924316406, -0.1334991455078125, 0.37200927734375, 0.0228729248046875, -0.369598388671875, 0.05728340148925781, -0.679412841796875, -0.008731842041015625, 0.12644195556640625, -0.29187774658203125, -0.264312744140625, 0.11625289916992188, 0.181884765625, 0.1072540283203125, -0.49974822998046875, 0.6525382995605469, 0.03232574462890625, -0.4019126892089844, -0.08864593505859375, 0.2156982421875, 0.17449951171875, -0.43819427490234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000004.npy"}
{"epoch": 0.006046863189720333, "step": 5, "batch_size": 64, "mean": 0.04782620072364807, "std": 0.31983980536460876, "min": -0.579925537109375, "p10": -0.3190435409545898, "median": 0.017728805541992188, "p90": 0.4704343795776368, "max": 0.983245849609375, "pos_frac": 0.53125, "sample": [0.025707244873046875, -0.22484588623046875, -0.04449653625488281, 0.983245849609375, -0.1937103271484375, 0.07403564453125, -0.18675804138183594, 0.0097503662109375, 0.16976165771484375, 0.42975616455078125, 0.12078094482421875, -0.099884033203125, -0.4143638610839844, -0.06307792663574219, -0.47501373291015625, -0.20160293579101562, 0.11850357055664062, -0.06009864807128906, 0.38298797607421875, 0.1466064453125, -0.07683181762695312, 0.07771682739257812, 0.2270679473876953, 0.21249771118164062, -0.3965606689453125, 0.7145538330078125, -0.579925537109375, 0.03686714172363281, 0.20086669921875, 0.5115013122558594, 0.4518585205078125, -0.3860931396484375, -0.1900177001953125, 0.4783954620361328, -0.1196441650390625, -0.15708160400390625, -0.015436172485351562, -0.2264404296875, 0.05434608459472656, 0.0048465728759765625, 0.07547378540039062, -0.312286376953125, -0.2627983093261719, 0.249847412109375, 0.5647354125976562, 0.04439544677734375, 0.2703704833984375, 0.246002197265625, 0.36064910888671875, -0.196624755859375, 0.1209869384765625, -0.0152740478515625, 0.1616973876953125, 0.6785202026367188, 0.83538818359375, -0.1181488037109375, -0.365264892578125, -0.32193946838378906, -0.05472755432128906, 0.1427288055419922, -0.18117141723632812, -0.24521636962890625, -0.26406097412109375, 0.3278236389160156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000005.npy"}
{"epoch": 0.007558578987150416, "step": 6, "batch_size": 64, "mean": -0.08591002225875854, "std": 0.3201543688774109, "min": -0.83795166015625, "p10": -0.47711563110351557, "median": -0.08083629608154297, "p90": 0.32687358856201204, "max": 0.8377151489257812, "pos_frac": 0.390625, "sample": [-0.33692169189453125, 0.098388671875, -0.27564239501953125, 0.5076828002929688, 0.12520599365234375, -0.2177295684814453, 0.12316513061523438, -0.5160064697265625, -0.13611602783203125, -0.123382568359375, 0.09827995300292969, -0.20770263671875, -0.43895721435546875, 0.3614044189453125, -0.5411224365234375, -0.351409912109375, 0.5069961547851562, 0.08156967163085938, 0.05927276611328125, -0.5861129760742188, -0.16381072998046875, -0.3734893798828125, -0.079315185546875, 0.24630165100097656, -0.13418960571289062, -0.360504150390625, 0.08551025390625, 0.00042724609375, -0.1501312255859375, 0.026464462280273438, 0.1810169219970703, -0.26979827880859375, 0.1491241455078125, -0.057727813720703125, -0.7297248840332031, 0.083587646484375, -0.2993354797363281, -0.21642684936523438, -0.3502998352050781, -0.05562591552734375, -0.13427734375, -0.03310394287109375, 0.02019500732421875, 0.17182540893554688, -0.2134246826171875, -0.14434814453125, 0.401031494140625, 0.16585540771484375, -0.08235740661621094, -0.039459228515625, 0.8377151489257812, 0.5452804565429688, -0.34598541259765625, -0.49346923828125, -0.0767669677734375, -0.3327770233154297, -0.13219833374023438, 0.02434539794921875, -0.83795166015625, -0.831695556640625, -0.2554473876953125, 0.3703327178955078, 0.20381736755371094, -0.04829216003417969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000006.npy"}
{"epoch": 0.009070294784580499, "step": 7, "batch_size": 64, "mean": -0.0013970732688903809, "std": 0.2368505299091339, "min": -0.5979766845703125, "p10": -0.270684814453125, "median": 0.013325691223144531, "p90": 0.2726411819458009, "max": 0.5668869018554688, "pos_frac": 0.53125, "sample": [0.0205535888671875, 0.10634231567382812, 0.2430877685546875, -0.013612747192382812, 0.2853069305419922, 0.14444732666015625, 0.055469512939453125, 0.21027374267578125, 0.19190216064453125, 0.4036293029785156, -0.07149887084960938, -0.3442726135253906, 0.001617431640625, -0.4497222900390625, -0.08455085754394531, -0.08273124694824219, 0.20538711547851562, -0.12506866455078125, 0.015476226806640625, -0.27193450927734375, 0.15993690490722656, -0.5979766845703125, 0.1291656494140625, 0.5668869018554688, 0.3726806640625, -0.1144256591796875, -0.26541900634765625, -0.07883453369140625, 0.3686370849609375, 0.07001304626464844, -0.186798095703125, 0.12591552734375, -0.04653739929199219, 0.0553436279296875, 0.11702728271484375, 0.12226104736328125, -0.0699462890625, 0.3496665954589844, 0.06178855895996094, 0.04632759094238281, 0.1651458740234375, -0.1570911407470703, 0.5444755554199219, -0.5194244384765625, -0.01092529296875, 0.06873321533203125, -0.03079986572265625, -0.029195785522460938, -0.1558837890625, 0.0282135009765625, -0.25380706787109375, -0.26776885986328125, 0.011175155639648438, -0.15932464599609375, 0.12739181518554688, -0.0532989501953125, -0.4910430908203125, 0.08518218994140625, -0.4421539306640625, -0.11358642578125, 0.051544189453125, -0.03980255126953125, 0.18201446533203125, -0.25499725341796875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000007.npy"}
{"epoch": 0.010582010582010581, "step": 8, "batch_size": 64, "mean": 0.016815185546875, "std": 0.3597966432571411, "min": -1.0919418334960938, "p10": -0.3805562973022461, "median": 0.008671760559082031, "p90": 0.4876319885253907, "max": 1.1265945434570312, "pos_frac": 0.515625, "sample": [0.2539997100830078, 0.4732818603515625, 0.01214599609375, 0.24718475341796875, -1.0919418334960938, -0.06562042236328125, 0.5894241333007812, 0.6148910522460938, -0.20519256591796875, -0.19305419921875, -0.0828094482421875, 0.01317596435546875, -0.07450103759765625, -0.5145912170410156, 0.4759521484375, 1.1265945434570312, 0.39813995361328125, 0.13288497924804688, -0.371673583984375, 0.4947662353515625, 0.13461875915527344, -0.1043853759765625, -0.36948394775390625, 0.24371337890625, -0.08397674560546875, -0.3833332061767578, -0.5330867767333984, 0.1074371337890625, 0.12001800537109375, -0.12581253051757812, 0.49263763427734375, 0.1317596435546875, -0.24747848510742188, 0.052276611328125, -0.22942352294921875, -0.16690444946289062, 0.0590972900390625, 0.5177154541015625, 0.7995758056640625, -0.077423095703125, 0.33696746826171875, -0.020572662353515625, -0.2164154052734375, -0.37407684326171875, -0.2939872741699219, -0.181396484375, 0.0051975250244140625, -0.4464874267578125, -0.5427398681640625, -0.15981674194335938, 0.030330657958984375, 0.15456581115722656, -0.5363082885742188, -0.02862548828125, 0.293426513671875, 0.3333740234375, -0.2339458465576172, 0.115875244140625, 0.21533203125, 0.08156967163085938, 0.07071113586425781, 0.043365478515625, -0.0187530517578125, -0.12201690673828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000008.npy"}
{"epoch": 0.012093726379440665, "step": 9, "batch_size": 64, "mean": 0.006889969110488892, "std": 0.28940871357917786, "min": -0.64337158203125, "p10": -0.378013801574707, "median": 0.021646499633789062, "p90": 0.31947479248046884, "max": 0.855194091796875, "pos_frac": 0.53125, "sample": [-0.47640228271484375, 0.44952392578125, -0.21635055541992188, -0.1647796630859375, 0.033946990966796875, -0.029796600341796875, -0.2502479553222656, 0.12803268432617188, 0.1811981201171875, -0.64337158203125, -0.25074005126953125, -0.2798900604248047, 0.23954010009765625, 0.05268287658691406, -0.137054443359375, -0.640472412109375, -0.5961380004882812, 0.855194091796875, -0.07667160034179688, 0.21629905700683594, 0.3741455078125, 0.17618560791015625, 0.7324371337890625, 0.16121673583984375, -0.06682968139648438, 0.06618499755859375, 0.24515724182128906, 0.3760833740234375, -0.082855224609375, -0.07426834106445312, -0.1502227783203125, 0.040645599365234375, 0.00934600830078125, 0.1572418212890625, 0.035564422607421875, -0.16191864013671875, 0.299468994140625, -0.4301910400390625, -0.039794921875, 0.152252197265625, -0.3238372802734375, -0.3514537811279297, -0.397735595703125, 0.1054534912109375, 0.07761192321777344, 0.004817962646484375, 0.06281089782714844, 0.28307533264160156, -0.18580055236816406, -0.38939666748046875, 0.08221435546875, -0.0541534423828125, -0.025531768798828125, 0.3280487060546875, 0.12091445922851562, -0.20538330078125, 0.2627830505371094, -0.06194305419921875, 0.0390472412109375, 0.17079925537109375, 0.443206787109375, -0.01513671875, 0.2844085693359375, -0.0282135009765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000009.npy"}
{"epoch": 0.013605442176870748, "step": 10, "batch_size": 64, "mean": 0.036684393882751465, "std": 0.3913628160953522, "min": -0.9143867492675781, "p10": -0.34941539764404295, "median": 0.004576683044433594, "p90": 0.47486534118652346, "max": 1.3668060302734375, "pos_frac": 0.5, "sample": [-0.25555419921875, 0.3340950012207031, 0.1849822998046875, 0.033069610595703125, -0.180755615234375, -0.86187744140625, -0.33422279357910156, 0.151824951171875, -0.3939361572265625, 0.26546478271484375, -0.27176666259765625, 0.4728279113769531, 0.1789093017578125, 0.3757781982421875, 1.0872802734375, -0.3305473327636719, 0.27419090270996094, -0.531341552734375, 0.227874755859375, -0.08019638061523438, -0.054157257080078125, 0.13733291625976562, -0.1021728515625, 0.5921707153320312, -0.30259132385253906, -0.038021087646484375, 0.2074127197265625, 0.2642822265625, -0.006710052490234375, -0.11873245239257812, 0.07954788208007812, 0.08567047119140625, -0.28347015380859375, -0.2711029052734375, 0.030817031860351562, 0.6279830932617188, -0.3738899230957031, 0.08504295349121094, 0.20264434814453125, 0.27936363220214844, -0.1854877471923828, -0.04697608947753906, -0.3626708984375, -0.2914276123046875, -0.19307708740234375, -0.1008148193359375, -0.9143867492675781, -0.0536041259765625, 1.3668060302734375, -0.2006988525390625, -0.02777099609375, 0.23496246337890625, -0.16720962524414062, 0.015863418579101562, 0.17438125610351562, 0.08646392822265625, 0.7042236328125, 0.08429527282714844, -0.21041107177734375, 0.25746917724609375, 0.475738525390625, -0.1982421875, 0.8687820434570312, -0.355926513671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000010.npy"}
{"epoch": 0.015117157974300832, "step": 11, "batch_size": 64, "mean": 0.06162214279174805, "std": 0.29146817326545715, "min": -1.1552047729492188, "p10": -0.19211692810058592, "median": 0.0185699462890625, "p90": 0.4026016235351563, "max": 0.6943588256835938, "pos_frac": 0.546875, "sample": [0.33826446533203125, 0.1441783905029297, -0.16265487670898438, 0.3025779724121094, -0.5000133514404297, 0.08798599243164062, -0.02329254150390625, 0.16219711303710938, 0.15972900390625, 0.001251220703125, 0.29708099365234375, 0.49198150634765625, -0.00106048583984375, 0.039093017578125, -0.07499313354492188, -0.012939453125, 0.6943588256835938, 0.3836212158203125, -0.05322265625, 0.2709197998046875, -0.09862899780273438, 0.02362060546875, 0.31845855712890625, -0.01586151123046875, -0.10019302368164062, -0.16436004638671875, 0.05179595947265625, 0.0484466552734375, -0.16582298278808594, 0.37433624267578125, -0.30105018615722656, 0.013519287109375, -0.08866691589355469, 0.6851806640625, 0.0007381439208984375, 0.24666595458984375, -0.12924575805664062, 0.48069000244140625, 0.5748291015625, -0.18392181396484375, 0.34881591796875, 0.118072509765625, -0.31685638427734375, 0.4353790283203125, -0.16887664794921875, -1.1552047729492188, 0.07204055786132812, 0.08067131042480469, -0.08789825439453125, 0.25113677978515625, 0.2607421875, -0.036651611328125, -0.19562911987304688, -0.032306671142578125, -0.1976318359375, -0.09487152099609375, -0.00555419921875, 0.07626914978027344, -0.04361724853515625, 0.3300933837890625, -0.0392608642578125, 0.410736083984375, -0.39044189453125, 0.20906829833984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000011.npy"}
{"epoch": 0.016628873771730914, "step": 12, "batch_size": 64, "mean": -0.03785929083824158, "std": 0.30310407280921936, "min": -0.8084487915039062, "p10": -0.4151165008544922, "median": -0.02516937255859375, "p90": 0.3030241012573243, "max": 0.80291748046875, "pos_frac": 0.484375, "sample": [-0.07895088195800781, -0.18781280517578125, 0.14794349670410156, 0.18630027770996094, -0.26708412170410156, 0.4259796142578125, 0.1544475555419922, -0.4780387878417969, 0.1956329345703125, 0.0292510986328125, -0.40045928955078125, -0.5204582214355469, -0.42499542236328125, 0.11676025390625, 0.1997966766357422, -0.20572662353515625, 0.245758056640625, 0.00482177734375, -0.29344940185546875, 0.2741508483886719, -0.3739738464355469, -0.10430908203125, -0.324554443359375, 0.80291748046875, 0.16599464416503906, -0.37017059326171875, -0.3230152130126953, -0.4213981628417969, 0.00380706787109375, -0.0367584228515625, -0.0426483154296875, -0.0760498046875, -0.8084487915039062, 0.1761322021484375, -0.26373863220214844, 0.17952346801757812, -0.18250656127929688, 0.4049797058105469, 0.053768157958984375, -0.19208717346191406, -0.2765655517578125, -0.15143394470214844, -0.1367034912109375, -0.16061973571777344, 0.46927642822265625, 0.31360816955566406, 0.065032958984375, -0.14913558959960938, 0.2392120361328125, 0.27832794189453125, -0.10261154174804688, 0.21230506896972656, 0.052005767822265625, -0.56744384765625, -0.051361083984375, -0.013580322265625, -0.7392787933349609, 0.38460540771484375, 0.016460418701171875, -0.3198585510253906, 0.36212921142578125, 0.06201171875, 0.22707366943359375, 0.17221832275390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000012.npy"}
{"epoch": 0.018140589569160998, "step": 13, "batch_size": 64, "mean": -0.05621263384819031, "std": 0.3271491527557373, "min": -1.092864990234375, "p10": -0.43914947509765623, "median": -0.024595260620117188, "p90": 0.3411163330078125, "max": 0.5677490234375, "pos_frac": 0.4375, "sample": [-1.092864990234375, -0.216461181640625, -0.4930877685546875, -0.13159942626953125, 0.50689697265625, -0.6791496276855469, 0.4469413757324219, 0.32166099548339844, -0.013946533203125, 0.25006866455078125, 0.009359359741210938, 0.17137718200683594, 0.2630348205566406, -0.6192245483398438, 0.2636260986328125, 0.04046440124511719, 0.5677490234375, -0.4375152587890625, -0.184814453125, 0.005462646484375, -0.2905120849609375, -0.01705169677734375, 0.47454833984375, -0.31103515625, -0.66180419921875, 0.08957672119140625, -0.3723411560058594, 0.21593284606933594, -0.11456298828125, -0.068389892578125, -0.13788795471191406, 0.1678924560546875, 0.41706085205078125, -0.7491683959960938, -0.00485992431640625, 0.3338584899902344, -0.146209716796875, -0.023113250732421875, 0.38683319091796875, -0.2211456298828125, 0.07074737548828125, 0.127227783203125, -0.14279937744140625, -0.43208885192871094, -0.3121490478515625, -0.20065879821777344, 0.040477752685546875, 0.25099945068359375, -0.20206451416015625, 0.08294677734375, -0.25981903076171875, 0.0945587158203125, -0.09123992919921875, -0.07065200805664062, -0.0260772705078125, -0.439849853515625, -0.32440185546875, 0.10030174255371094, 0.20107269287109375, 0.3442268371582031, 0.18292236328125, -0.12762451171875, -0.09291267395019531, -0.31635093688964844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000013.npy"}
{"epoch": 0.019652305366591082, "step": 14, "batch_size": 64, "mean": 0.08529558777809143, "std": 0.37512829899787903, "min": -0.6416053771972656, "p10": -0.24040069580078124, "median": 0.007833480834960938, "p90": 0.5006931304931643, "max": 1.436767578125, "pos_frac": 0.5, "sample": [-0.6416053771972656, -0.0053863525390625, 0.5283889770507812, -0.1664886474609375, 0.19096946716308594, 0.1784381866455078, 0.20449066162109375, 0.776031494140625, 0.1557331085205078, -0.037078857421875, 0.08105850219726562, 0.4360694885253906, 0.021053314208984375, -0.4419517517089844, -0.13241958618164062, -0.3553791046142578, -0.13707733154296875, 0.27416229248046875, -0.02935028076171875, 0.09637451171875, -0.00754547119140625, -0.5733718872070312, 0.09936904907226562, -0.24222946166992188, 0.06616973876953125, -0.1256256103515625, -0.1112060546875, -0.01465606689453125, -0.13562583923339844, 1.2987136840820312, 0.203948974609375, 1.2527923583984375, 0.11701202392578125, 1.436767578125, -0.23613357543945312, 0.1240081787109375, 0.06304931640625, -0.2829742431640625, -0.08676910400390625, -0.1190032958984375, 0.55731201171875, 0.39165496826171875, -0.12299728393554688, -0.03851318359375, -0.08024215698242188, 0.05655670166015625, -0.165924072265625, -0.11602020263671875, 0.5474605560302734, 0.18697547912597656, 0.22225570678710938, 0.3400230407714844, 0.06569290161132812, -0.05680084228515625, -0.07390594482421875, -0.3505973815917969, -0.093353271484375, -0.03524589538574219, -0.03334808349609375, 0.10066604614257812, 0.3284626007080078, 0.1094970703125, -0.18301010131835938, 0.179595947265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000014.npy"}
{"epoch": 0.021164021164021163, "step": 15, "batch_size": 64, "mean": -0.022380709648132324, "std": 0.32376405596733093, "min": -1.07470703125, "p10": -0.34617919921875, "median": -0.0004444122314453125, "p90": 0.3376529693603517, "max": 0.9124298095703125, "pos_frac": 0.5, "sample": [0.29017066955566406, -0.320709228515625, -0.010540008544921875, -0.10975837707519531, -0.20673370361328125, 0.4008636474609375, 0.0869903564453125, -0.3469047546386719, -0.2711181640625, -0.8374176025390625, 0.29821014404296875, -0.07494735717773438, 0.2908477783203125, -0.053066253662109375, 0.0438079833984375, 0.14075469970703125, 0.9124298095703125, 0.0924835205078125, -0.11558723449707031, -0.09399986267089844, 0.441192626953125, -0.12469100952148438, -0.040126800537109375, 0.20445632934570312, 0.012807846069335938, 0.10840606689453125, 0.292572021484375, -0.1173095703125, 0.50384521484375, -0.247344970703125, -0.07842254638671875, -0.3444862365722656, -0.1881866455078125, -0.3758697509765625, 0.11021041870117188, 0.06876373291015625, 0.15679168701171875, -0.226104736328125, -0.06048583984375, 0.07886505126953125, -0.145416259765625, -0.0921173095703125, 0.3867645263671875, 0.03874969482421875, 0.10667037963867188, -1.07470703125, 0.00965118408203125, -0.17401885986328125, 0.13382720947265625, -0.8191986083984375, -0.5870513916015625, 0.07874298095703125, -0.09914779663085938, 0.02245330810546875, -0.120147705078125, 0.017627716064453125, -0.518402099609375, -0.11761093139648438, 0.10577011108398438, 0.3545570373535156, 0.636474609375, -0.03665924072265625, 0.11493682861328125, 0.05522727966308594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000015.npy"}
{"epoch": 0.022675736961451247, "step": 16, "batch_size": 64, "mean": 0.005598574876785278, "std": 0.244706392288208, "min": -0.7479782104492188, "p10": -0.2748176574707031, "median": 0.018568038940429688, "p90": 0.328750228881836, "max": 0.5601806640625, "pos_frac": 0.53125, "sample": [-0.12180900573730469, 0.3019752502441406, -0.33449554443359375, -0.026763916015625, 0.45941162109375, 0.1930522918701172, -0.029954910278320312, -0.11602783203125, 0.20787811279296875, -0.18417739868164062, -0.29717254638671875, 0.0710296630859375, 0.02518463134765625, -0.13549041748046875, 0.09473609924316406, 0.07667350769042969, -0.01096343994140625, -0.041370391845703125, 0.07641983032226562, 0.129974365234375, 0.1521148681640625, -0.22265625, 0.3534088134765625, 0.5601806640625, 0.18156051635742188, -0.7479782104492188, -0.16890335083007812, -0.315460205078125, 0.03196525573730469, 0.3402252197265625, 0.34357452392578125, 0.0338592529296875, 0.06939697265625, 0.1704559326171875, 0.1312255859375, -0.657745361328125, 0.2942657470703125, -0.122100830078125, -0.12313461303710938, -0.5277423858642578, -0.0511932373046875, 0.09197235107421875, -0.15142440795898438, 0.3636932373046875, 0.300537109375, 0.09323883056640625, 0.14691162109375, -0.356048583984375, 0.10310935974121094, -0.1646556854248047, 0.011951446533203125, 0.11317253112792969, -0.11430168151855469, -0.12982177734375, 0.11362075805664062, 0.005687713623046875, -0.09528350830078125, -0.1937408447265625, -0.1077728271484375, 0.04958534240722656, -0.05345726013183594, -0.08365249633789062, 0.43834686279296875, -0.08678817749023438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000016.npy"}
{"epoch": 0.02418745275888133, "step": 17, "batch_size": 64, "mean": 0.032395362854003906, "std": 0.2640622556209564, "min": -0.46709442138671875, "p10": -0.2576759338378906, "median": -0.000179290771484375, "p90": 0.3413536071777344, "max": 0.830352783203125, "pos_frac": 0.5, "sample": [0.20090866088867188, 0.0996551513671875, -0.09222984313964844, -0.09256172180175781, 0.6850128173828125, 0.12502288818359375, 0.34477996826171875, 0.026109695434570312, 0.20513534545898438, -0.09035491943359375, 0.1158294677734375, 0.022106170654296875, -0.0590362548828125, 0.17696380615234375, -0.14025115966796875, -0.2586212158203125, -0.2118682861328125, -0.2834968566894531, 0.12696075439453125, -0.246856689453125, 0.099365234375, -0.0067501068115234375, -0.376190185546875, -0.17630767822265625, 0.03000640869140625, -0.3033733367919922, 0.474029541015625, -0.11480140686035156, 0.12872695922851562, 0.13518142700195312, -0.19341278076171875, -0.3836212158203125, -0.25547027587890625, 0.000732421875, 0.47473907470703125, 0.830352783203125, -0.02390289306640625, -0.12137603759765625, -0.42192840576171875, 0.317230224609375, -0.027252197265625, -0.050724029541015625, 0.08447074890136719, 0.2028350830078125, 0.11825752258300781, 0.01688385009765625, -0.1920623779296875, 0.7117462158203125, 0.27318382263183594, 0.0284423828125, -0.10123443603515625, -0.46709442138671875, -0.051364898681640625, -0.1376628875732422, 0.44026947021484375, 0.21509170532226562, 0.026927947998046875, -0.0076808929443359375, 0.3333587646484375, -0.21001815795898438, 0.3123016357421875, -0.02567291259765625, -0.1850452423095703, -0.00109100341796875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000017.npy"}
{"epoch": 0.025699168556311415, "step": 18, "batch_size": 64, "mean": 0.05678561329841614, "std": 0.2664037346839905, "min": -0.706695556640625, "p10": -0.21603946685791015, "median": 0.029439926147460938, "p90": 0.3594409942626954, "max": 0.8592529296875, "pos_frac": 0.578125, "sample": [-0.13655853271484375, -0.155670166015625, -0.1434497833251953, 0.18550491333007812, 0.4406566619873047, 0.15996932983398438, 0.1733551025390625, 0.37480926513671875, 0.34577178955078125, -0.13287734985351562, 0.114532470703125, 0.0015411376953125, 0.08899497985839844, 0.037220001220703125, -0.08556365966796875, 0.258209228515625, -0.024257659912109375, 0.1418304443359375, 0.30762481689453125, 0.14996337890625, -0.04875946044921875, 0.8592529296875, -0.378997802734375, 0.6528091430664062, 0.1419658660888672, 0.30552101135253906, -0.1314849853515625, 0.34248924255371094, 0.5594520568847656, 0.11716270446777344, 0.08338165283203125, 0.153564453125, -0.0229034423828125, 0.008209228515625, -0.016452789306640625, -0.12633514404296875, -0.020097732543945312, 0.016046524047851562, -0.017669677734375, 0.2038726806640625, 0.1692657470703125, 0.02165985107421875, -0.67877197265625, -0.706695556640625, -0.27892112731933594, 0.45229148864746094, -0.2206439971923828, 0.019313812255859375, -0.0091705322265625, -0.26612091064453125, -0.3041229248046875, -0.028697967529296875, -0.20529556274414062, -0.10944366455078125, 0.20206451416015625, -0.1949024200439453, 0.23571014404296875, 0.0390167236328125, 0.077850341796875, 0.2869834899902344, -0.06515121459960938, -0.0504302978515625, 0.3652992248535156, 0.10055923461914062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000018.npy"}
{"epoch": 0.027210884353741496, "step": 19, "batch_size": 64, "mean": 0.07246837019920349, "std": 0.36890846490859985, "min": -1.11224365234375, "p10": -0.29420547485351556, "median": 0.030638694763183594, "p90": 0.4900993347167969, "max": 0.8878021240234375, "pos_frac": 0.5625, "sample": [0.6446990966796875, -0.111541748046875, -0.1396026611328125, -0.25103759765625, 0.1416778564453125, 0.2706890106201172, -0.391632080078125, -0.029449462890625, -1.0859222412109375, 0.2197113037109375, 0.13164520263671875, 0.2664947509765625, 0.456024169921875, 0.09669113159179688, 0.2227325439453125, -0.11445808410644531, -0.025127410888671875, -0.06539726257324219, 0.14011383056640625, 0.24237632751464844, -0.0071868896484375, 0.20549774169921875, -0.21721267700195312, 0.206787109375, 0.6341705322265625, 0.8013381958007812, -0.39996337890625, -0.20967864990234375, -0.1427154541015625, 0.013311386108398438, 0.7259521484375, -0.43123626708984375, 0.05231475830078125, 0.3885765075683594, -1.11224365234375, 0.4374237060546875, -0.11652755737304688, -0.2017841339111328, 0.4776039123535156, 0.0059051513671875, -0.058185577392578125, -0.14499855041503906, -0.1356201171875, 0.11574935913085938, 0.18329620361328125, 0.2865028381347656, -0.06536102294921875, 0.8878021240234375, -0.043178558349609375, 0.24907684326171875, 0.36182403564453125, 0.030040740966796875, 0.17020416259765625, -0.025493621826171875, -0.208160400390625, 0.48764801025390625, -0.31270599365234375, -0.38826751708984375, 0.015375137329101562, 0.7268524169921875, 0.031236648559570312, -0.10504150390625, 0.49114990234375, 0.3592109680175781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000019.npy"}
{"epoch": 0.02872260015117158, "step": 20, "batch_size": 64, "mean": 0.027072086930274963, "std": 0.29346784949302673, "min": -0.914703369140625, "p10": -0.31215476989746094, "median": 0.03509712219238281, "p90": 0.42538528442382817, "max": 0.6607208251953125, "pos_frac": 0.578125, "sample": [-0.0356903076171875, -0.3160057067871094, -0.01227569580078125, 0.07115936279296875, 0.07309913635253906, -0.28023529052734375, -0.061588287353515625, 0.132080078125, 0.14331817626953125, -0.11380767822265625, 0.17729949951171875, 0.003566741943359375, -0.4460296630859375, 0.04861259460449219, 0.2067108154296875, -0.023508071899414062, 0.19676971435546875, 0.0154876708984375, 0.18021392822265625, -0.5439071655273438, 0.03787040710449219, 0.011913299560546875, 0.20772171020507812, 0.39714813232421875, 0.08979606628417969, 0.41068267822265625, 0.5045452117919922, 0.2103099822998047, 0.2878875732421875, -0.03917407989501953, -0.500640869140625, -0.5660743713378906, 0.20694732666015625, 0.09417724609375, -0.914703369140625, 0.012172698974609375, 0.28829193115234375, -0.124267578125, -0.21671295166015625, -0.08193588256835938, -0.2432575225830078, -0.16597366333007812, -0.14532470703125, -0.1288604736328125, 0.4319305419921875, 0.306243896484375, 0.446746826171875, -0.46701812744140625, 0.13766860961914062, 0.44910430908203125, -0.30316925048828125, 0.651123046875, 0.12658309936523438, 0.6607208251953125, -0.029117584228515625, 0.10631370544433594, 0.09657669067382812, 0.4316864013671875, 0.0684356689453125, -0.09324455261230469, -0.018259048461914062, -0.15307235717773438, -0.19677162170410156, 0.03232383728027344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000020.npy"}
{"epoch": 0.030234315948601664, "step": 21, "batch_size": 64, "mean": -0.008510619401931763, "std": 0.24353046715259552, "min": -0.5550079345703125, "p10": -0.33247909545898435, "median": -0.026187896728515625, "p90": 0.3464553833007814, "max": 0.7212066650390625, "pos_frac": 0.4375, "sample": [-0.1602020263671875, 0.09579849243164062, -0.13350296020507812, -0.2357635498046875, 0.4090576171875, 0.2221832275390625, 0.004375457763671875, -0.06391525268554688, -0.079559326171875, -0.09827804565429688, -0.027706146240234375, -0.2318267822265625, 0.0591583251953125, 0.1805572509765625, 0.17217254638671875, 0.19661712646484375, -0.5550079345703125, 0.7212066650390625, -0.4756584167480469, -0.03211021423339844, -0.05200958251953125, -0.024669647216796875, -0.023998260498046875, -0.33358001708984375, -0.152313232421875, -0.3344268798828125, -0.3299102783203125, 0.366424560546875, -0.10758209228515625, 0.010894775390625, 0.409454345703125, -0.18158531188964844, 0.42559814453125, 0.007266998291015625, -0.2740478515625, 0.313995361328125, 0.5663604736328125, -0.032196044921875, -0.11293792724609375, -0.23578643798828125, -0.054668426513671875, 0.0053253173828125, -0.04620361328125, 0.04065895080566406, 0.3603668212890625, 0.03912162780761719, -0.01972198486328125, 0.04883575439453125, -0.37454986572265625, 0.06785774230957031, 0.135986328125, 0.306427001953125, 0.08570671081542969, -0.0117645263671875, 0.032939910888671875, 0.109405517578125, -0.09596061706542969, -0.12532615661621094, -0.07458686828613281, -0.348480224609375, -0.07242584228515625, 0.08311080932617188, -0.0575714111328125, -0.4517097473144531], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000021.npy"}
{"epoch": 0.031746031746031744, "step": 22, "batch_size": 64, "mean": 0.041648685932159424, "std": 0.3197229504585266, "min": -1.3625640869140625, "p10": -0.314251708984375, "median": 0.0786752700805664, "p90": 0.37467193603515625, "max": 0.6451263427734375, "pos_frac": 0.65625, "sample": [-0.318359375, -0.7898979187011719, -0.13095855712890625, 0.32398223876953125, 0.11473846435546875, 0.5463027954101562, 0.07579421997070312, 0.3692779541015625, 0.09977340698242188, 0.14020156860351562, 0.3687629699707031, 0.036376953125, -0.2099781036376953, 0.08155632019042969, 0.42906951904296875, 0.0930328369140625, -0.093902587890625, -0.3777809143066406, 0.03775787353515625, -0.3168792724609375, 0.36489105224609375, 0.6451263427734375, 0.205963134765625, 0.0203704833984375, 0.11729621887207031, -1.3625640869140625, 0.13512802124023438, -0.15456771850585938, -0.031164169311523438, 0.16007232666015625, -0.46893310546875, -0.1839141845703125, 0.08986663818359375, 0.13632965087890625, -0.030853271484375, 0.001132965087890625, 0.020757675170898438, 0.34332275390625, -0.46520233154296875, 0.04183197021484375, 0.11396408081054688, 0.28924560546875, 0.4937896728515625, 0.07000732421875, -0.14945602416992188, 0.2158660888671875, 0.15999221801757812, -0.1272430419921875, 0.376983642578125, -0.3028717041015625, -0.13782119750976562, 0.176727294921875, -0.1779327392578125, 0.152587890625, 0.354339599609375, 0.053623199462890625, -0.01287078857421875, 0.43623924255371094, 0.40279388427734375, 0.316070556640625, 0.2436676025390625, 0.044719696044921875, -0.08254623413085938, -0.3081207275390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000022.npy"}
{"epoch": 0.03325774754346183, "step": 23, "batch_size": 64, "mean": -0.015064775943756104, "std": 0.34985482692718506, "min": -0.971466064453125, "p10": -0.39312973022460934, "median": 0.003513336181640625, "p90": 0.39782295227050785, "max": 0.7305145263671875, "pos_frac": 0.5, "sample": [-0.8695755004882812, -0.968963623046875, 0.15654754638671875, -0.259674072265625, -0.03149986267089844, 0.06348228454589844, -0.3241729736328125, -0.335540771484375, -0.39987945556640625, 0.0573577880859375, 0.101318359375, 0.021814346313476562, 0.2742156982421875, -0.15071678161621094, 0.26610374450683594, 0.056396484375, -0.8044891357421875, 0.4964790344238281, -0.16552734375, 0.5248870849609375, 0.175689697265625, 0.07179069519042969, -0.11193084716796875, -0.2365264892578125, 0.21197509765625, 0.1513824462890625, -0.25351715087890625, -0.08917045593261719, 0.7305145263671875, 0.3287506103515625, -0.971466064453125, 0.060695648193359375, 0.308197021484375, -0.014787673950195312, 0.5128059387207031, -0.16905975341796875, -0.097625732421875, 0.4026031494140625, 0.12766647338867188, -0.432220458984375, -0.2657318115234375, -0.16381072998046875, -0.17972946166992188, -0.37738037109375, 0.5398292541503906, 0.0998077392578125, -0.05753326416015625, 0.2553901672363281, -0.04907989501953125, -0.6270751953125, 0.2829437255859375, 0.09299659729003906, -0.029012680053710938, -0.016847610473632812, -0.10700607299804688, -0.05347442626953125, -0.0770263671875, 0.08521270751953125, -0.1515655517578125, 0.23713302612304688, 0.4939727783203125, 0.3866691589355469, 0.2677326202392578, 0.0351104736328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000023.npy"}
{"epoch": 0.03476946334089191, "step": 24, "batch_size": 64, "mean": 0.002107083797454834, "std": 0.24489238858222961, "min": -0.5718460083007812, "p10": -0.3157489776611328, "median": 0.025852203369140625, "p90": 0.27273921966552733, "max": 0.6023483276367188, "pos_frac": 0.53125, "sample": [0.21451187133789062, -0.016788482666015625, -0.0454254150390625, -0.004608154296875, 0.2925567626953125, -0.19791030883789062, 0.25365447998046875, -0.16561126708984375, -0.2945976257324219, -0.5718460083007812, -0.03665351867675781, 0.0428466796875, 0.06109046936035156, -0.5360260009765625, 0.372711181640625, 0.2589874267578125, -0.01750946044921875, -0.06521797180175781, -0.13826751708984375, 0.07849693298339844, -0.40827178955078125, 0.09795379638671875, -0.16991424560546875, -0.22922134399414062, 0.6023483276367188, -0.13117599487304688, 0.12325286865234375, 0.08135223388671875, -0.40167236328125, 0.10760498046875, -0.016998291015625, -0.1339397430419922, -0.27205657958984375, -0.10612106323242188, 0.1017303466796875, -0.182403564453125, 0.2701263427734375, -0.06872367858886719, 0.2946319580078125, 0.025360107421875, 0.11341094970703125, 0.4929656982421875, 0.07371139526367188, 0.27385902404785156, -0.3662109375, 0.5137786865234375, -0.2583122253417969, -0.193603515625, 0.05368614196777344, 0.2181396484375, -0.12667465209960938, -0.2740325927734375, 0.20440673828125, 0.02634429931640625, -0.3248138427734375, 0.13959884643554688, 0.171173095703125, 0.0926513671875, 0.009044647216796875, 0.23484420776367188, 0.080780029296875, -0.416534423828125, 0.09246826171875, 0.2359161376953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000024.npy"}
{"epoch": 0.036281179138321996, "step": 25, "batch_size": 64, "mean": -0.0020221471786499023, "std": 0.3267594575881958, "min": -0.7387847900390625, "p10": -0.28720684051513673, "median": -0.03180980682373047, "p90": 0.3518867492675782, "max": 1.45782470703125, "pos_frac": 0.40625, "sample": [0.34160614013671875, 0.221038818359375, 0.033573150634765625, -0.028751373291015625, 0.2306690216064453, -0.041168212890625, 0.2366943359375, 0.09700584411621094, -0.54541015625, 0.093963623046875, -0.2777099609375, 0.25394630432128906, -0.2744598388671875, -0.352020263671875, 0.3289031982421875, 0.3895301818847656, -0.7387847900390625, -0.2700691223144531, 0.356292724609375, -0.20581817626953125, -0.23299789428710938, -0.021604537963867188, 0.02613067626953125, 1.45782470703125, 0.31690216064453125, -0.014429092407226562, -0.44419097900390625, 0.024017333984375, -0.2985496520996094, -0.019649505615234375, -0.26017189025878906, 0.3968334197998047, -0.2317047119140625, -0.13196945190429688, 0.2336273193359375, -0.08635711669921875, 0.587371826171875, -0.2912769317626953, -0.16074371337890625, 0.0155181884765625, -0.3724212646484375, -0.21859359741210938, -0.22530555725097656, -0.10752487182617188, -0.1067047119140625, -0.14205551147460938, 0.15222930908203125, -0.18017578125, -0.2613372802734375, -0.21868133544921875, -0.18072891235351562, -0.03312492370605469, 0.23285675048828125, -0.0016632080078125, -0.03049468994140625, -0.11923408508300781, -0.154571533203125, 0.04619598388671875, 0.1058807373046875, -0.10004425048828125, 0.01366424560546875, 0.779205322265625, -0.1909503936767578, 0.470550537109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000025.npy"}
{"epoch": 0.03779289493575208, "step": 26, "batch_size": 64, "mean": 0.010331422090530396, "std": 0.31066516041755676, "min": -0.7294464111328125, "p10": -0.36815032958984373, "median": -0.0008306503295898438, "p90": 0.4536441802978517, "max": 0.839752197265625, "pos_frac": 0.484375, "sample": [0.20113754272460938, -0.7294464111328125, 0.02762603759765625, -0.14128875732421875, -0.051242828369140625, 0.1268634796142578, -0.28208160400390625, -0.1752471923828125, 0.024585723876953125, 0.20694351196289062, -0.043514251708984375, 0.69329833984375, -0.3698577880859375, -0.23516082763671875, -0.4997100830078125, 0.13036346435546875, -0.06748580932617188, 0.1994647979736328, 0.07511138916015625, 0.2366046905517578, 0.225372314453125, -0.1866931915283203, -0.127685546875, -2.86102294921875e-05, -0.04804229736328125, -0.11914825439453125, 0.4664764404296875, -0.03652000427246094, -0.13666725158691406, -0.5286407470703125, 0.018220901489257812, 0.46755027770996094, 0.10870361328125, 0.18748855590820312, -0.35889434814453125, -0.4046783447265625, 0.06180572509765625, 0.1262836456298828, -0.018463134765625, 0.5333404541015625, -0.0016326904296875, -0.6183700561523438, -0.21914291381835938, -0.18839645385742188, 0.718292236328125, -0.364166259765625, 0.0709381103515625, 0.18955421447753906, 0.839752197265625, -0.10015678405761719, 0.5906753540039062, -0.45208740234375, 0.4237022399902344, -0.26653289794921875, -0.10325241088867188, 0.21416091918945312, -0.06338119506835938, -0.09621238708496094, 0.27219390869140625, 0.011121749877929688, -0.011936187744140625, 0.029592514038085938, 0.0005130767822265625, 0.22923851013183594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000026.npy"}
{"epoch": 0.039304610733182165, "step": 27, "batch_size": 64, "mean": 0.03918081521987915, "std": 0.255842000246048, "min": -0.5568466186523438, "p10": -0.2731201171875, "median": 0.043178558349609375, "p90": 0.32829875946044923, "max": 0.7299118041992188, "pos_frac": 0.546875, "sample": [0.4877815246582031, -0.09002304077148438, -0.11127471923828125, 0.2294597625732422, 0.06991386413574219, -0.02037811279296875, -0.11331558227539062, -0.2797393798828125, 0.22931671142578125, -0.2576751708984375, -0.01031494140625, -0.5568466186523438, -0.007171630859375, -0.033817291259765625, -0.3031024932861328, 0.09109115600585938, 0.3293418884277344, -0.2288055419921875, -0.5535392761230469, -0.3817863464355469, 0.40044403076171875, -0.0195770263671875, -0.12127685546875, 0.17523956298828125, -0.02606964111328125, 0.2109527587890625, 0.43897247314453125, -0.17693710327148438, 0.6418304443359375, -0.1575794219970703, 0.08906936645507812, 0.03489494323730469, -0.07864761352539062, -0.32483673095703125, 0.3258647918701172, 0.13809967041015625, 0.1342334747314453, 0.16765594482421875, 0.0381622314453125, 0.2020721435546875, 0.04819488525390625, -0.19794464111328125, 0.23106002807617188, -0.15493011474609375, -0.24248886108398438, 0.15955352783203125, 0.7299118041992188, 0.2663230895996094, 0.229095458984375, 0.017885208129882812, 0.27657318115234375, 0.115753173828125, 0.07024574279785156, 0.3833465576171875, 0.2712993621826172, -0.058300018310546875, 0.11962890625, -0.14513206481933594, 0.1757965087890625, -0.11226463317871094, -0.03012847900390625, -0.4508514404296875, 0.15239715576171875, 0.07086563110351562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000027.npy"}
{"epoch": 0.04081632653061224, "step": 28, "batch_size": 64, "mean": -0.03986704349517822, "std": 0.27063697576522827, "min": -0.65594482421875, "p10": -0.42482547760009765, "median": -0.013090133666992188, "p90": 0.30217723846435546, "max": 0.40301513671875, "pos_frac": 0.46875, "sample": [-0.3442840576171875, -0.246337890625, -0.59722900390625, 0.182373046875, -0.2716217041015625, 0.028858184814453125, 0.1304931640625, -0.06158447265625, 0.3025531768798828, 0.235687255859375, -0.3556709289550781, -0.0735626220703125, -0.19757843017578125, -0.16977691650390625, 0.20574951171875, -0.214752197265625, -0.42680931091308594, -0.48870277404785156, 0.2875862121582031, 0.31783294677734375, 0.40301513671875, 0.000896453857421875, 0.14897537231445312, -0.0607452392578125, 0.2082843780517578, 0.05283546447753906, -0.00017547607421875, 0.09992218017578125, -0.65594482421875, 0.13689422607421875, -0.6371078491210938, -0.15266799926757812, 0.2695350646972656, -0.089385986328125, 0.0733795166015625, 0.31629180908203125, 0.131378173828125, -0.0843658447265625, -0.0629425048828125, -0.4765625, -0.1490459442138672, -0.44171905517578125, -0.15308380126953125, 0.35780906677246094, 0.13694000244140625, -0.007049560546875, 0.3344402313232422, -0.414794921875, -0.0787200927734375, -0.019130706787109375, -0.04706573486328125, 0.26595306396484375, 0.3072471618652344, 0.2596435546875, 0.301300048828125, -0.391998291015625, 0.06264686584472656, -0.420196533203125, -0.20360565185546875, 0.1715831756591797, -0.1641693115234375, 0.07257652282714844, 0.07603645324707031, -0.271820068359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000028.npy"}
{"epoch": 0.042328042328042326, "step": 29, "batch_size": 64, "mean": 0.008836179971694946, "std": 0.34548816084861755, "min": -0.7635345458984375, "p10": -0.3940261840820312, "median": -0.013178825378417969, "p90": 0.4268836975097657, "max": 1.2909698486328125, "pos_frac": 0.46875, "sample": [0.23191070556640625, -0.46927642822265625, 0.12120246887207031, -0.36840057373046875, 1.2909698486328125, 0.08299636840820312, -0.22332000732421875, -0.014028549194335938, -0.402984619140625, -0.1836395263671875, 0.1719989776611328, 0.43497467041015625, -0.21842575073242188, -0.26305389404296875, 0.36963653564453125, 0.27573394775390625, 0.059173583984375, -0.3524341583251953, -0.7635345458984375, -0.0215606689453125, 0.003253936767578125, -0.0123291015625, 0.17860031127929688, -0.00984954833984375, 0.16339111328125, 0.17967987060546875, -0.2949333190917969, 0.12823486328125, 0.4080047607421875, 0.2341175079345703, -0.035366058349609375, -0.0806427001953125, -0.6260528564453125, 0.067535400390625, 0.524200439453125, 0.06979942321777344, -0.0552520751953125, 0.5508575439453125, -0.035213470458984375, -0.2204132080078125, -0.19671630859375, -0.015132904052734375, -0.024404525756835938, -0.03939056396484375, -0.030338287353515625, -0.1578216552734375, 0.1658782958984375, 0.254547119140625, 0.15578460693359375, 0.08301353454589844, 0.506195068359375, -0.17943763732910156, 0.6737098693847656, 0.06842422485351562, -0.5065994262695312, -0.0477142333984375, -0.5845260620117188, -0.3731231689453125, -0.059234619140625, 0.552215576171875, -0.692413330078125, -0.23620986938476562, 0.20136260986328125, 0.151885986328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000029.npy"}
{"epoch": 0.04383975812547241, "step": 30, "batch_size": 64, "mean": -0.01739645004272461, "std": 0.3164796531200409, "min": -0.7031059265136719, "p10": -0.4000518798828125, "median": -0.009342193603515625, "p90": 0.31958332061767586, "max": 0.98162841796875, "pos_frac": 0.484375, "sample": [-0.22412490844726562, 0.052371978759765625, -0.29004669189453125, -0.064208984375, -0.4148731231689453, 0.07758331298828125, -0.09064102172851562, 0.7241058349609375, -0.13791656494140625, -0.64697265625, 0.19124603271484375, -0.389862060546875, -0.21059417724609375, 0.3253631591796875, -0.550384521484375, -0.0130767822265625, -0.219207763671875, 0.4932403564453125, -0.21944427490234375, 0.5035171508789062, 0.0673828125, -0.06956100463867188, -0.15407943725585938, 0.18722152709960938, 0.18853759765625, -0.34818267822265625, -0.00560760498046875, -0.5114974975585938, 0.1630401611328125, 0.10506439208984375, -0.2330169677734375, 0.0836334228515625, -0.24407196044921875, 0.0786895751953125, -0.274017333984375, 0.1796875, -0.32758331298828125, -0.17230796813964844, 0.409637451171875, -0.054622650146484375, -0.023824691772460938, 0.06912994384765625, 0.98162841796875, 0.11110687255859375, 0.09133529663085938, 0.2982330322265625, 0.15634918212890625, -0.2837715148925781, 0.20439529418945312, -0.4044189453125, -0.7031059265136719, 0.47344970703125, 0.11968994140625, 0.20272064208984375, -0.258148193359375, 0.09886550903320312, 0.11118316650390625, 0.052219390869140625, -0.6446533203125, -0.03570556640625, -0.043361663818359375, -0.16710662841796875, 0.20989990234375, 0.30609703063964844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000030.npy"}
{"epoch": 0.045351473922902494, "step": 31, "batch_size": 64, "mean": 0.042792946100234985, "std": 0.35233232378959656, "min": -1.12469482421875, "p10": -0.30418987274169923, "median": 0.007161140441894531, "p90": 0.5619262695312501, "max": 0.7359161376953125, "pos_frac": 0.515625, "sample": [0.6756057739257812, 0.38103485107421875, -0.03125, 0.28780174255371094, -0.08240509033203125, 0.0390167236328125, -0.23095703125, 0.316131591796875, -0.0707244873046875, -0.48186492919921875, 0.11293411254882812, -0.1270904541015625, 0.6367568969726562, 0.0999908447265625, 0.019866943359375, -0.222869873046875, 0.202362060546875, 0.02967071533203125, 0.533599853515625, 0.2842292785644531, 0.7359161376953125, 0.1455402374267578, -0.07310676574707031, -0.17607879638671875, 0.5037403106689453, 0.712615966796875, 0.63702392578125, -0.149627685546875, 0.02016448974609375, -0.30460357666015625, 0.0413970947265625, 0.30059814453125, -0.48851776123046875, -0.3032245635986328, 0.0071201324462890625, -0.23630142211914062, 0.02761077880859375, -0.40126800537109375, -0.15661239624023438, -0.010417938232421875, -0.09105300903320312, -0.7583465576171875, -0.20821762084960938, 0.2997283935546875, -0.075592041015625, -0.00743865966796875, -0.13911056518554688, -0.2944679260253906, 0.15969467163085938, -0.00531768798828125, -1.12469482421875, -0.3272666931152344, 0.371246337890625, 0.07761383056640625, 0.665557861328125, 0.0072021484375, 0.25502777099609375, 0.574066162109375, -0.21463584899902344, -0.09969329833984375, 0.451873779296875, 0.1919403076171875, -0.0175323486328125, -0.15564346313476562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000031.npy"}
{"epoch": 0.04686318972033258, "step": 32, "batch_size": 64, "mean": 0.014745950698852539, "std": 0.3519182503223419, "min": -1.0792388916015625, "p10": -0.46595096588134766, "median": -0.006104469299316406, "p90": 0.44872970581054694, "max": 0.9544143676757812, "pos_frac": 0.484375, "sample": [-0.02976226806640625, 0.07118988037109375, -0.5298690795898438, -0.1094512939453125, -0.261505126953125, 0.3041114807128906, 0.2114715576171875, -0.5001907348632812, -0.0685882568359375, -0.1584930419921875, 0.7981681823730469, 0.793243408203125, 0.2883720397949219, -0.00308990478515625, -0.15077781677246094, 0.43560791015625, -0.04443359375, 0.005588531494140625, -0.3924407958984375, 0.06700515747070312, -0.01325225830078125, -0.2362537384033203, 0.08891868591308594, 0.48626708984375, -0.0372772216796875, -0.1327075958251953, 0.3897857666015625, 0.03771209716796875, -0.2230682373046875, 0.9544143676757812, -0.029590606689453125, 0.36226654052734375, 0.2541942596435547, 0.1629180908203125, 0.45435333251953125, 0.474639892578125, -0.32605743408203125, -0.13414382934570312, 0.05316162109375, 0.49337005615234375, 0.05818939208984375, 0.3237495422363281, 0.1096038818359375, -0.11254501342773438, -0.04148292541503906, 0.0492401123046875, 0.29057884216308594, -0.4741249084472656, -0.469970703125, 0.29589080810546875, -1.0792388916015625, -0.034374237060546875, -0.4565715789794922, -0.009119033813476562, -0.3491706848144531, -0.20951461791992188, -0.24365234375, 0.22010421752929688, -0.491851806640625, 0.1835613250732422, 0.09177017211914062, -0.0337371826171875, -0.532562255859375, 0.05316162109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000032.npy"}
{"epoch": 0.04837490551776266, "step": 33, "batch_size": 64, "mean": 0.0647314041852951, "std": 0.34106022119522095, "min": -1.177032470703125, "p10": -0.2890762329101562, "median": 0.04342174530029297, "p90": 0.49266738891601564, "max": 0.9906845092773438, "pos_frac": 0.546875, "sample": [-0.169647216796875, -1.177032470703125, 0.0311431884765625, -0.01163482666015625, 0.133514404296875, 0.17826080322265625, -0.0533294677734375, 0.44199371337890625, 0.5227432250976562, 0.9906845092773438, 0.305206298828125, 0.3472766876220703, -0.23952674865722656, -0.20232582092285156, -0.2246246337890625, -0.4248924255371094, -0.25, 0.5233268737792969, -0.00539398193359375, 0.08001708984375, -0.05261993408203125, 0.4954986572265625, 0.48606109619140625, 0.3273277282714844, -0.716552734375, 0.1221466064453125, -0.0281982421875, 0.052349090576171875, 0.8199310302734375, 0.1668548583984375, 0.29892539978027344, -0.021883010864257812, 0.4674072265625, -0.09824371337890625, -0.291717529296875, -0.014316558837890625, 0.238067626953125, -0.0097503662109375, -0.028728485107421875, -0.3448829650878906, -0.011810302734375, -0.0864715576171875, -0.24905014038085938, 0.25742340087890625, 0.0031280517578125, 0.1286773681640625, 0.22953033447265625, 0.24530029296875, -0.1002511978149414, 0.24625778198242188, -0.31998443603515625, 0.08660888671875, -0.2829132080078125, -0.08657073974609375, -0.13329696655273438, 0.05377197265625, 0.20911026000976562, 0.6100845336914062, 0.2353515625, -0.347900390625, 0.097015380859375, 0.5855369567871094, 0.03449440002441406, 0.0753326416015625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000033.npy"}
{"epoch": 0.049886621315192746, "step": 34, "batch_size": 64, "mean": 0.04163375496864319, "std": 0.3270690143108368, "min": -0.8069534301757812, "p10": -0.38298797607421875, "median": 0.04393577575683594, "p90": 0.343951416015625, "max": 0.97918701171875, "pos_frac": 0.546875, "sample": [-0.13335037231445312, 0.5102920532226562, 0.34255218505859375, 0.33111000061035156, -0.2014923095703125, -0.029598236083984375, 0.10453987121582031, 0.12563705444335938, 0.14934921264648438, 0.32117462158203125, 0.92120361328125, 0.056060791015625, 0.06829833984375, -0.0020751953125, -0.3768463134765625, -0.6219711303710938, 0.34455108642578125, 0.10128021240234375, 0.15878677368164062, 0.04406929016113281, -0.3856201171875, -0.009462356567382812, 0.1943817138671875, -0.129119873046875, -0.03308868408203125, -0.02536773681640625, 0.2082061767578125, 0.9110488891601562, -0.47310638427734375, -0.009462356567382812, 0.1033782958984375, 0.17958831787109375, 0.13612937927246094, -0.015254974365234375, -0.19021987915039062, 0.3600006103515625, -0.4117584228515625, 0.018571853637695312, -0.04907989501953125, 0.48519134521484375, -0.0879974365234375, -0.12406539916992188, 0.04380226135253906, 0.07560348510742188, 0.333038330078125, 0.97918701171875, -0.6260986328125, 0.2362518310546875, 0.060909271240234375, -0.17078590393066406, -0.12391853332519531, -0.8069534301757812, 0.27271270751953125, 0.0004119873046875, 0.09636688232421875, -0.2584266662597656, 0.299224853515625, -0.13718414306640625, -0.13524627685546875, 0.1553325653076172, 0.15985870361328125, -0.1841888427734375, -0.06285858154296875, -0.40894317626953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000034.npy"}
{"epoch": 0.05139833711262283, "step": 35, "batch_size": 64, "mean": -0.037748783826828, "std": 0.3217014968395233, "min": -0.816986083984375, "p10": -0.4215473175048828, "median": -0.011260986328125, "p90": 0.24770431518554697, "max": 0.7453231811523438, "pos_frac": 0.484375, "sample": [0.18575286865234375, -0.59765625, 0.1725902557373047, 0.1392669677734375, -0.23733139038085938, 0.6728668212890625, -0.4052238464355469, -0.00038909912109375, 0.15424346923828125, -0.28554534912109375, 0.06298828125, 0.1704387664794922, 0.10688018798828125, 0.024686813354492188, -0.0748291015625, 0.15613937377929688, 0.59912109375, -0.15753936767578125, 0.7453231811523438, 0.056690216064453125, 0.5178718566894531, -0.041721343994140625, 0.3428153991699219, -0.3620758056640625, -0.04309654235839844, -0.3918609619140625, -0.7547149658203125, -0.284149169921875, -0.27397918701171875, -0.04827117919921875, -0.816986083984375, 0.14202117919921875, -0.1886444091796875, -0.1582183837890625, -0.3333740234375, -0.72723388671875, -0.16091537475585938, -0.2982635498046875, 0.19278717041015625, -0.4576759338378906, -0.07596969604492188, 0.18428802490234375, 0.22547149658203125, -0.3188743591308594, 0.19454193115234375, -0.02213287353515625, 0.3425636291503906, -0.3095550537109375, 0.02948760986328125, 0.12293243408203125, -0.11533164978027344, -0.043304443359375, 0.08615875244140625, -0.12432861328125, -0.5948333740234375, 0.2080230712890625, -0.20172119140625, -0.4285430908203125, 0.19673538208007812, 0.1591796875, 0.20161819458007812, 0.1461334228515625, 0.12151718139648438, 0.257232666015625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000035.npy"}
{"epoch": 0.05291005291005291, "step": 36, "batch_size": 64, "mean": 0.009210050106048584, "std": 0.3933529257774353, "min": -0.9165840148925781, "p10": -0.522244644165039, "median": 0.008199691772460938, "p90": 0.4710021972656252, "max": 1.190216064453125, "pos_frac": 0.515625, "sample": [-0.21356201171875, 1.190216064453125, -0.1127166748046875, 0.008466720581054688, -0.3175468444824219, -0.9165840148925781, 0.28382110595703125, 0.19860458374023438, 0.34983253479003906, -0.18027496337890625, 0.4111480712890625, 0.11857032775878906, 0.2587127685546875, -0.5540771484375, -0.046405792236328125, -0.02400970458984375, 0.174041748046875, -0.310546875, -0.15394973754882812, 0.10482406616210938, 0.28262901306152344, 0.236572265625, 0.168060302734375, 0.0738525390625, -0.19809341430664062, -0.6619720458984375, -0.09078216552734375, 0.35882568359375, 0.5190391540527344, -0.57537841796875, -0.14371490478515625, 0.09876632690429688, -0.3086204528808594, 0.092437744140625, -0.733428955078125, -0.20088958740234375, -0.2465362548828125, 0.6901092529296875, -0.40571022033691406, -0.5381393432617188, 0.2840003967285156, 0.521148681640625, 0.8497390747070312, -0.4248981475830078, 0.1293182373046875, 0.04688453674316406, 0.4218902587890625, 0.4032096862792969, 0.26598358154296875, 0.6597938537597656, 0.14404296875, -0.16302490234375, -0.03144645690917969, -0.12586212158203125, -0.4851570129394531, -0.167755126953125, -0.310638427734375, 0.232147216796875, -0.729949951171875, 0.07220458984375, -0.16947364807128906, 0.4920501708984375, 0.007932662963867188, -0.01828765869140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000036.npy"}
{"epoch": 0.05442176870748299, "step": 37, "batch_size": 64, "mean": 0.10025274753570557, "std": 0.45199495553970337, "min": -1.5465087890625, "p10": -0.36084251403808587, "median": 0.07401275634765625, "p90": 0.669640350341797, "max": 1.347991943359375, "pos_frac": 0.59375, "sample": [-0.2745819091796875, 0.0825958251953125, 0.419403076171875, 0.03482818603515625, 0.467254638671875, -0.3844871520996094, -0.1100311279296875, 0.10669708251953125, 0.2798748016357422, 0.09500885009765625, 0.526947021484375, 0.14879608154296875, 0.3849830627441406, 0.784423828125, 0.24100494384765625, -0.5826339721679688, 0.09309768676757812, 0.24606704711914062, 0.4044914245605469, 0.774017333984375, -0.2999897003173828, -0.41510963439941406, -1.5465087890625, -0.09177780151367188, -0.22638511657714844, -0.106353759765625, -0.30567169189453125, -0.23706817626953125, 0.04296112060546875, -0.20534515380859375, 0.1372528076171875, 0.1281585693359375, 0.8679885864257812, 0.08736419677734375, 1.0182037353515625, 0.6305313110351562, 0.0237884521484375, 0.026165008544921875, -0.10530662536621094, -0.000301361083984375, 1.347991943359375, -0.42255401611328125, -0.08916091918945312, 0.2513580322265625, -0.49609375, 0.09616661071777344, 0.276947021484375, 0.6864013671875, -0.0121307373046875, -0.79034423828125, 0.06096839904785156, -0.13656234741210938, -0.03045654296875, -0.05919075012207031, -0.0870361328125, 0.5468559265136719, 0.0654296875, 0.22864913940429688, 0.18682861328125, 0.434600830078125, 0.4573822021484375, -0.2728118896484375, -0.051761627197265625, 1.0643463134765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000037.npy"}
{"epoch": 0.055933484504913075, "step": 38, "batch_size": 64, "mean": -0.02763056755065918, "std": 0.3937397301197052, "min": -0.7192001342773438, "p10": -0.5533683776855468, "median": -0.02515125274658203, "p90": 0.41493663787841817, "max": 1.2531700134277344, "pos_frac": 0.46875, "sample": [-0.1368408203125, 0.5613307952880859, 0.062591552734375, 1.2531700134277344, -0.4771728515625, 0.24454498291015625, -0.06748199462890625, 0.2747611999511719, -0.5353851318359375, 0.1464080810546875, -0.03422737121582031, 0.10624313354492188, 0.22069549560546875, 0.004390716552734375, -0.14426422119140625, -0.3375701904296875, -0.05916595458984375, 0.059696197509765625, -0.34436798095703125, 0.06865692138671875, -0.18378448486328125, -0.4815826416015625, -0.5599899291992188, -0.7192001342773438, -0.16083145141601562, -0.0109710693359375, 0.4841766357421875, -0.6911811828613281, 0.11576080322265625, -0.29225921630859375, -0.1708507537841797, -0.6925888061523438, 0.11643791198730469, 0.17645645141601562, 0.9866714477539062, 0.29693603515625, -0.28850555419921875, -0.6154308319091797, 0.01665496826171875, -0.0329132080078125, 0.17608642578125, 0.4373779296875, 0.07508087158203125, -0.08281326293945312, 0.2776908874511719, -0.2996368408203125, -0.32106781005859375, -0.582275390625, 0.27117919921875, 0.87158203125, -0.5379180908203125, -0.2523193359375, 0.49280548095703125, -0.25820159912109375, -0.017389297485351562, 0.36257362365722656, 0.02957916259765625, -0.13975906372070312, -0.5801239013671875, 0.10640335083007812, 0.21024322509765625, -0.080047607421875, 0.23489761352539062, -0.3213214874267578], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000038.npy"}
{"epoch": 0.05744520030234316, "step": 39, "batch_size": 64, "mean": 0.06974801421165466, "std": 0.3510797917842865, "min": -0.8198013305664062, "p10": -0.3922958374023437, "median": 0.09930038452148438, "p90": 0.3978921890258789, "max": 0.994903564453125, "pos_frac": 0.59375, "sample": [0.3433837890625, -0.41644287109375, 0.08939361572265625, 0.1998748779296875, -0.3359527587890625, -0.4842987060546875, 0.3267364501953125, 0.36530113220214844, 0.3952178955078125, -0.008760452270507812, -0.02036285400390625, -0.1692676544189453, 0.3517322540283203, -0.09065818786621094, -0.0157928466796875, 0.00643157958984375, -0.020524978637695312, 0.19573020935058594, 0.11508369445800781, -0.0812835693359375, -0.68402099609375, -0.321868896484375, -0.13969802856445312, 0.2789497375488281, 0.994903564453125, -0.08073806762695312, -0.5294342041015625, 0.39903831481933594, 0.31237030029296875, -0.179290771484375, 0.16746902465820312, 0.5341339111328125, 0.2352752685546875, -0.16073226928710938, 0.10946273803710938, 0.4407539367675781, 0.063568115234375, 0.627960205078125, 0.41652679443359375, -0.6675300598144531, 0.1980438232421875, -0.7584609985351562, -0.028310775756835938, 0.26776123046875, 0.323089599609375, -0.1576080322265625, 0.01568603515625, -0.01678466796875, 0.10358810424804688, 0.1504535675048828, 0.99285888671875, -0.8198013305664062, 0.17527008056640625, -0.1318187713623047, 0.3224639892578125, 0.308807373046875, 0.118560791015625, 0.0659942626953125, 0.2826957702636719, -0.13941574096679688, 0.33536529541015625, 0.09501266479492188, -0.023347854614257812, 0.22113037109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000039.npy"}
{"epoch": 0.05895691609977324, "step": 40, "batch_size": 64, "mean": 0.07750925421714783, "std": 0.35122382640838623, "min": -0.7952423095703125, "p10": -0.3852371215820312, "median": 0.07974433898925781, "p90": 0.5287384033203126, "max": 0.7517967224121094, "pos_frac": 0.59375, "sample": [-0.32541656494140625, 0.6032180786132812, 0.3741302490234375, 0.35350608825683594, 0.15937232971191406, 0.24895286560058594, 0.3241615295410156, -0.230255126953125, -0.25640869140625, 0.5409469604492188, 0.43468475341796875, 0.051013946533203125, -0.143890380859375, -0.20849227905273438, -0.07966041564941406, 0.006885528564453125, 0.0812530517578125, 0.4833221435546875, 0.26003456115722656, 0.03848457336425781, 0.07823562622070312, 0.7169342041015625, 0.3213005065917969, 0.4393424987792969, 0.7517967224121094, 0.39935302734375, 0.01305389404296875, 0.5002517700195312, 0.6327667236328125, -0.11716651916503906, 0.413726806640625, -0.7952423095703125, 0.588043212890625, -0.3975830078125, 0.24278831481933594, 0.15715789794921875, -0.04095458984375, 0.3126945495605469, -0.416961669921875, 0.603973388671875, -0.16427993774414062, -0.303009033203125, 0.019323348999023438, 0.272430419921875, -0.6367034912109375, 0.11715316772460938, 0.23675155639648438, -0.4117317199707031, -0.11958694458007812, -0.2943115234375, 0.29364013671875, -0.3564300537109375, -0.3551292419433594, -0.13680267333984375, 0.08526611328125, -0.4803581237792969, -0.481903076171875, 0.49922943115234375, 0.1665191650390625, -0.1498260498046875, 0.2092761993408203, -0.011198043823242188, -0.0022430419921875, -0.15483856201171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000040.npy"}
{"epoch": 0.06046863189720333, "step": 41, "batch_size": 64, "mean": 0.013299375772476196, "std": 0.4149687886238098, "min": -1.218414306640625, "p10": -0.5056732177734374, "median": 0.017111778259277344, "p90": 0.5666023254394534, "max": 0.9765663146972656, "pos_frac": 0.53125, "sample": [0.18627166748046875, 0.1488494873046875, -0.018438339233398438, -0.640350341796875, -0.19830322265625, -0.45489501953125, -0.16417694091796875, 0.9765663146972656, 0.03951263427734375, -0.2541351318359375, 0.12397003173828125, 0.0467529296875, 0.16394805908203125, 0.8979949951171875, 0.374176025390625, 0.076934814453125, -0.2817096710205078, 0.8845672607421875, -0.225341796875, -0.14153480529785156, 0.48969268798828125, -1.218414306640625, 0.11404991149902344, -0.107421875, 0.09631538391113281, 0.0081634521484375, -0.527435302734375, 0.2723846435546875, 0.052074432373046875, -0.62249755859375, 0.056308746337890625, 0.7721710205078125, 0.5995635986328125, 0.6128997802734375, -0.07946395874023438, -0.11371612548828125, 0.24330902099609375, -0.19371795654296875, 0.19628143310546875, -0.06599617004394531, -0.15755081176757812, -0.6755218505859375, -0.14434814453125, -0.0068511962890625, 0.023651123046875, 0.010572433471679688, 0.20775985717773438, 0.0984039306640625, 0.6388092041015625, 0.2283935546875, -0.442169189453125, -0.863037109375, -0.18796920776367188, -0.06874847412109375, -0.1349334716796875, -0.104705810546875, 0.3530426025390625, 0.25708770751953125, 0.32065391540527344, -0.3846588134765625, 0.40268707275390625, 0.29099273681640625, -0.1541309356689453, -0.7814788818359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000041.npy"}
{"epoch": 0.06198034769463341, "step": 42, "batch_size": 64, "mean": 0.044570907950401306, "std": 0.38386741280555725, "min": -1.0023651123046875, "p10": -0.42126617431640623, "median": 0.034343719482421875, "p90": 0.4959812164306642, "max": 1.2247161865234375, "pos_frac": 0.5625, "sample": [0.73565673828125, -0.009124755859375, -0.04010009765625, -0.098541259765625, -0.18993186950683594, -0.8635025024414062, 0.12957382202148438, -0.1393442153930664, -0.454742431640625, 0.3993568420410156, -0.13561248779296875, 0.135711669921875, -0.09894561767578125, 0.1181640625, -0.25818634033203125, 0.5064582824707031, 1.2247161865234375, 0.5435638427734375, 0.09544754028320312, 0.06810760498046875, 0.23886871337890625, 0.47153472900390625, 0.18973541259765625, -0.16897964477539062, -0.34026336669921875, 0.040740966796875, 0.21108627319335938, 0.01720428466796875, 0.00225830078125, -0.32204246520996094, 0.3022003173828125, 0.7373027801513672, 0.021238327026367188, -0.16329193115234375, 0.3144073486328125, 0.328704833984375, 0.7028274536132812, -0.02916717529296875, -0.39276885986328125, -0.02304840087890625, -0.20795822143554688, -0.27490234375, -1.0023651123046875, -0.6255950927734375, -0.194549560546875, -0.0059814453125, 0.43088531494140625, -0.3099517822265625, 0.07704544067382812, -0.45214080810546875, 0.02794647216796875, 0.43637847900390625, 0.4495391845703125, 0.13060379028320312, 0.09123420715332031, 0.12351226806640625, -0.43347930908203125, 0.09428215026855469, 0.63128662109375, 0.43624114990234375, 0.08263587951660156, 0.06606864929199219, -0.46999359130859375, -0.055477142333984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000042.npy"}
{"epoch": 0.06349206349206349, "step": 43, "batch_size": 64, "mean": 0.11562475562095642, "std": 0.42908158898353577, "min": -1.0069808959960938, "p10": -0.3088327407836914, "median": 0.1329193115234375, "p90": 0.540719223022461, "max": 1.4279327392578125, "pos_frac": 0.59375, "sample": [-0.11348724365234375, -0.02471160888671875, 0.118316650390625, 0.32761192321777344, 0.1310882568359375, -0.4897003173828125, -0.15320205688476562, 0.2576446533203125, 0.341796875, -0.1438140869140625, 0.22538375854492188, 0.7385406494140625, 0.204986572265625, -0.2648048400878906, 1.3892822265625, 0.1626129150390625, 0.06076622009277344, 0.2998313903808594, 0.24202728271484375, -0.0334930419921875, 0.08581161499023438, -0.5922393798828125, -0.3244476318359375, -0.07214736938476562, 1.0173530578613281, -0.2723979949951172, 0.1482219696044922, 0.1347503662109375, -0.050075531005859375, 0.329864501953125, 0.548614501953125, 0.2695770263671875, 0.3542938232421875, -0.155975341796875, -0.07171440124511719, 0.4354095458984375, 0.423614501953125, -0.19844818115234375, 0.5222969055175781, 0.7147979736328125, 0.22515106201171875, -0.21382904052734375, -0.021327972412109375, -1.0069808959960938, 0.0038051605224609375, 1.4279327392578125, -0.0239410400390625, 0.3852729797363281, -0.25263023376464844, 0.196563720703125, 0.14624595642089844, 0.7639389038085938, 0.40483856201171875, -0.24909210205078125, 0.0399322509765625, 0.26177215576171875, -0.45319557189941406, 0.4613189697265625, 0.1877899169921875, 0.230865478515625, -0.7150421142578125, -0.23966217041015625, -0.077423095703125, -0.6061553955078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000043.npy"}
{"epoch": 0.06500377928949358, "step": 44, "batch_size": 64, "mean": 0.017076104879379272, "std": 0.42561137676239014, "min": -1.1875, "p10": -0.4746711730957031, "median": 0.00727081298828125, "p90": 0.43104782104492195, "max": 1.7065505981445312, "pos_frac": 0.5, "sample": [-0.3322296142578125, -0.4084587097167969, -0.583282470703125, -0.11107444763183594, 0.34340667724609375, -0.62481689453125, 0.2116241455078125, -0.39800262451171875, -0.43741607666015625, -0.46295928955078125, 0.38919830322265625, 0.3575592041015625, -1.1875, 0.15029144287109375, 0.08248519897460938, -0.13641357421875, 0.21054649353027344, -0.11639404296875, 0.12966537475585938, 0.13621139526367188, -0.1182098388671875, 0.10387802124023438, 0.43898773193359375, -0.4796905517578125, -0.18218612670898438, 0.3541412353515625, 0.4978485107421875, -0.5101089477539062, 0.4125213623046875, -0.28757476806640625, 0.108917236328125, -0.0467681884765625, 0.07663726806640625, 0.14360809326171875, -0.09418106079101562, 0.108917236328125, 1.7065505981445312, 0.12799072265625, -0.16646575927734375, -0.29474639892578125, -0.09698486328125, 0.0273895263671875, -0.140228271484375, -0.00301361083984375, 0.6563262939453125, 0.4428253173828125, -0.2841644287109375, 0.3571624755859375, -0.30939483642578125, 0.2101917266845703, -0.5423736572265625, -0.10042572021484375, -0.069305419921875, 0.1512298583984375, 0.01755523681640625, 0.30574798583984375, 0.44004058837890625, 1.2094268798828125, 0.3745880126953125, -0.0547027587890625, 0.1202239990234375, -0.1441192626953125, -0.04439544677734375, -0.5432357788085938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000044.npy"}
{"epoch": 0.06651549508692366, "step": 45, "batch_size": 64, "mean": 0.14694073796272278, "std": 0.3873726427555084, "min": -0.8098640441894531, "p10": -0.3259063720703125, "median": 0.15210914611816406, "p90": 0.585436248779297, "max": 1.257232666015625, "pos_frac": 0.71875, "sample": [0.11702728271484375, -0.3033905029296875, 0.2868995666503906, -0.3355560302734375, -0.07076454162597656, -0.48290252685546875, -0.2017059326171875, 0.3077392578125, -0.120941162109375, -0.693115234375, 0.21810150146484375, 0.38006591796875, 0.4690589904785156, 0.01657867431640625, 0.06696319580078125, -0.22504043579101562, 0.18582725524902344, 0.5490188598632812, 0.12769317626953125, 0.5172500610351562, -0.1287841796875, 0.0948638916015625, 0.51202392578125, 0.08085823059082031, 0.4077606201171875, 0.482086181640625, 0.0330047607421875, 0.2761268615722656, 0.48223876953125, 0.6020698547363281, 0.1448516845703125, 0.601043701171875, -0.7951126098632812, 0.15936660766601562, 0.3429679870605469, 1.257232666015625, 0.0727081298828125, 0.21129608154296875, -0.390533447265625, 0.09873580932617188, -0.8098640441894531, 0.68353271484375, -0.138336181640625, 0.05748939514160156, 0.21686553955078125, 0.08879280090332031, 0.192657470703125, 0.02605438232421875, 0.2105712890625, 0.6888580322265625, -0.24968719482421875, -0.06167030334472656, 0.42073822021484375, 0.21239471435546875, -0.28769874572753906, 0.8543472290039062, 1.0010223388671875, 0.22139739990234375, 0.2484912872314453, -0.05715370178222656, 0.1388702392578125, 0.22406005859375, 0.5060272216796875, -0.33716583251953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000045.npy"}
{"epoch": 0.06802721088435375, "step": 46, "batch_size": 64, "mean": 0.174896240234375, "std": 0.4413764178752899, "min": -0.851959228515625, "p10": -0.31275749206542963, "median": 0.18408584594726562, "p90": 0.719585418701172, "max": 1.2734222412109375, "pos_frac": 0.640625, "sample": [0.4205474853515625, -0.37371826171875, -0.5093460083007812, 0.27042388916015625, 0.4193572998046875, -0.0150146484375, 0.06901931762695312, -0.6383895874023438, -0.00141143798828125, -0.851959228515625, 0.2606964111328125, 0.19117355346679688, 0.3727874755859375, 0.9047775268554688, -0.24103546142578125, 0.30218505859375, -0.7519989013671875, 0.3089752197265625, 0.4339580535888672, 0.3045654296875, 0.2671031951904297, 1.2734222412109375, -0.2208690643310547, 0.25041961669921875, 0.6600189208984375, 0.14190673828125, 0.6017532348632812, 0.6151351928710938, -0.1649169921875, 0.0880584716796875, 0.41872406005859375, 0.6838302612304688, -0.26104736328125, -0.0937042236328125, -0.3349189758300781, 1.165557861328125, -0.14481163024902344, -0.146453857421875, -0.49659156799316406, 1.1084861755371094, 0.7967529296875, -0.20159149169921875, 0.18862152099609375, 0.1795501708984375, 0.5379104614257812, 0.08840560913085938, -0.1495819091796875, 0.02734375, -0.02558135986328125, 0.09184646606445312, 0.3908958435058594, 0.07739830017089844, -0.1224822998046875, 0.23772048950195312, 0.7349090576171875, 1.101318359375, -0.09246826171875, 0.1741790771484375, -0.19733238220214844, 0.5050811767578125, 0.2169342041015625, 0.241729736328125, -0.1276531219482422, 0.232757568359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000046.npy"}
{"epoch": 0.06953892668178382, "step": 47, "batch_size": 64, "mean": 0.18104803562164307, "std": 0.4068479835987091, "min": -0.7953643798828125, "p10": -0.3264484405517578, "median": 0.16492462158203125, "p90": 0.7106779098510745, "max": 1.08013916015625, "pos_frac": 0.65625, "sample": [0.21934127807617188, 0.0853424072265625, 0.07007217407226562, -0.5759201049804688, 0.5616073608398438, 0.3233680725097656, -7.05718994140625e-05, 0.8174095153808594, -0.21956634521484375, 0.03476715087890625, -0.11165046691894531, -0.39657020568847656, -0.255645751953125, -0.1425933837890625, 0.406097412109375, 0.42368316650390625, 0.2530536651611328, 0.01690673828125, 0.0567626953125, -0.3432464599609375, -0.3222694396972656, 0.0772705078125, 0.052883148193359375, 0.31962013244628906, -0.22605133056640625, 0.6028289794921875, 0.5123310089111328, -0.05207633972167969, -0.06209564208984375, 0.3160057067871094, -0.18419647216796875, 0.608795166015625, 0.03519439697265625, 0.9265213012695312, -0.370513916015625, 0.2022857666015625, 0.5571022033691406, 0.6074752807617188, 0.6510143280029297, 0.7879486083984375, -0.7329864501953125, 0.819061279296875, 0.28792572021484375, 0.458099365234375, 0.7362480163574219, 0.136993408203125, 1.08013916015625, -0.15777587890625, 0.3627891540527344, 0.5530757904052734, -0.002166748046875, 0.6077117919921875, -0.32823944091796875, -0.7953643798828125, -0.18880844116210938, 0.6042213439941406, 0.026020050048828125, 0.7484931945800781, 0.3205451965332031, -0.1065673828125, 0.1928558349609375, 0.2530975341796875, 0.4830169677734375, -0.03453254699707031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000047.npy"}
{"epoch": 0.0710506424792139, "step": 48, "batch_size": 64, "mean": 0.010134011507034302, "std": 0.45008012652397156, "min": -1.2618637084960938, "p10": -0.4931585311889648, "median": 0.06311798095703125, "p90": 0.5544891357421875, "max": 1.108856201171875, "pos_frac": 0.578125, "sample": [0.1326160430908203, -0.24700927734375, 0.17457008361816406, 0.346099853515625, 0.03578948974609375, 0.6412582397460938, -0.736358642578125, 0.2438812255859375, -0.39501190185546875, 0.01132965087890625, 0.1036376953125, 0.32000160217285156, -0.43445587158203125, -1.2618637084960938, 0.0395355224609375, 0.85699462890625, 0.22007369995117188, 0.160980224609375, -0.5728302001953125, 0.5419387817382812, -0.12392044067382812, -0.4649696350097656, -0.31375885009765625, 0.3565635681152344, 0.117034912109375, 0.26453399658203125, 0.0884552001953125, 0.12320327758789062, -0.6977691650390625, 0.7705459594726562, -0.21640777587890625, -0.346649169921875, -0.021776199340820312, 0.059803009033203125, 0.2842903137207031, 0.29203033447265625, 0.06643295288085938, -0.15875244140625, 1.0337181091308594, -0.22282028198242188, -0.3849945068359375, 0.17801666259765625, -0.04773712158203125, 0.4502143859863281, -0.9362869262695312, 0.13800430297851562, 0.21255874633789062, 0.5598678588867188, -0.260498046875, -0.4388294219970703, 0.3521728515625, -0.45023345947265625, -0.5052394866943359, 0.12463951110839844, -0.42539405822753906, 0.11186027526855469, 1.108856201171875, -0.10129165649414062, 0.0302886962890625, -0.4295539855957031, 0.11436653137207031, 0.8322906494140625, -0.1281452178955078, -0.5273208618164062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000048.npy"}
{"epoch": 0.07256235827664399, "step": 49, "batch_size": 64, "mean": 0.12266790866851807, "std": 0.4459313750267029, "min": -1.2774810791015625, "p10": -0.3360477447509765, "median": 0.11316680908203125, "p90": 0.6053810119628906, "max": 1.4867401123046875, "pos_frac": 0.5625, "sample": [-0.0532989501953125, -0.039215087890625, -0.015186309814453125, -0.001983642578125, 0.3515625, -0.0291748046875, -0.2298870086669922, 0.28066062927246094, 0.373748779296875, 0.2558441162109375, 0.07702255249023438, -0.020046234130859375, 0.047698974609375, 0.5170364379882812, -0.10086250305175781, -0.1382923126220703, -0.3613243103027344, -0.72296142578125, 0.14261436462402344, -0.867431640625, 0.283233642578125, 0.44753456115722656, 0.13921356201171875, 0.2581939697265625, 0.19705581665039062, 0.7432327270507812, -0.01491546630859375, -0.032726287841796875, 0.1385955810546875, -0.5303497314453125, 0.26763343811035156, 0.4445381164550781, -0.15177536010742188, 0.6507797241210938, 0.38262176513671875, 1.4867401123046875, -0.15778350830078125, 0.09063720703125, 0.5948333740234375, 0.8453903198242188, 0.2528076171875, -0.10167694091796875, -0.0599212646484375, 0.5197601318359375, -0.2611846923828125, 0.08971405029296875, -0.0027313232421875, 0.41283226013183594, 0.261993408203125, -0.1526927947998047, 0.6099014282226562, 0.7645645141601562, -1.2774810791015625, -0.6547088623046875, -0.054355621337890625, -0.402099609375, 0.4925880432128906, 1.1345062255859375, 0.44809722900390625, 0.15531349182128906, -0.25655364990234375, 0.1356964111328125, 0.5242385864257812, -0.277069091796875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000049.npy"}
{"epoch": 0.07407407407407407, "step": 50, "batch_size": 64, "mean": 0.1348104178905487, "std": 0.345580518245697, "min": -0.7747726440429688, "p10": -0.30967636108398433, "median": 0.19187164306640625, "p90": 0.56215877532959, "max": 0.952301025390625, "pos_frac": 0.640625, "sample": [0.22951507568359375, 0.4187469482421875, 0.4253082275390625, 0.2559394836425781, 0.380706787109375, 0.0218963623046875, 0.27972412109375, -0.2364501953125, 0.8169708251953125, 0.31594085693359375, -0.011045455932617188, 0.2190704345703125, -0.7747726440429688, -0.09835052490234375, -0.18712234497070312, -0.3706817626953125, 0.6094589233398438, 0.3060951232910156, 0.5918426513671875, 0.6174163818359375, 0.26584625244140625, 0.1487579345703125, -0.25212860107421875, -0.33159637451171875, 0.09918594360351562, 0.952301025390625, 0.07923126220703125, 0.3198738098144531, 0.08861923217773438, 0.24237823486328125, 0.5247440338134766, 0.14980697631835938, 0.5100307464599609, 0.39313697814941406, -0.46743011474609375, -0.2585296630859375, 0.4839935302734375, 0.85638427734375, 0.30966758728027344, -0.08693695068359375, 0.5781936645507812, 0.3549652099609375, 0.25698089599609375, -0.2585296630859375, -0.21822738647460938, 0.31186866760253906, 0.3209991455078125, 0.2639617919921875, -0.066619873046875, 0.30478668212890625, 0.1646728515625, -0.00930023193359375, -0.06736564636230469, -0.011951446533203125, 0.28894996643066406, 0.04691314697265625, -0.3916206359863281, -0.24780654907226562, -0.392181396484375, -0.20814895629882812, 0.24353599548339844, 0.03810882568359375, -0.03063201904296875, -0.481231689453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000050.npy"}
{"epoch": 0.07558578987150416, "step": 51, "batch_size": 64, "mean": 0.2530254125595093, "std": 0.5203163623809814, "min": -0.6180458068847656, "p10": -0.35348434448242183, "median": 0.14400386810302734, "p90": 0.8393318176269531, "max": 2.25164794921875, "pos_frac": 0.6875, "sample": [0.701385498046875, 0.3000621795654297, -0.10471725463867188, -0.07587432861328125, 0.25650787353515625, 0.382415771484375, 0.059490203857421875, 0.0058746337890625, -0.5282440185546875, -0.1261138916015625, -0.10416221618652344, 0.6033821105957031, -0.09211349487304688, 0.92840576171875, -0.0496063232421875, 0.04669189453125, 0.7774276733398438, -0.27899169921875, 0.41486358642578125, 1.3057861328125, -0.187286376953125, 0.5507469177246094, 0.7171173095703125, 0.8807220458984375, 0.23563385009765625, -0.38690185546875, 0.2467803955078125, -0.33667755126953125, 0.08087730407714844, 0.23386573791503906, 0.076873779296875, 1.595123291015625, 0.11664390563964844, 0.396331787109375, -0.4748992919921875, -0.360687255859375, 0.14817428588867188, -0.2973518371582031, -0.042301177978515625, 0.2684288024902344, 0.12427139282226562, 0.10748291015625, 0.3708763122558594, 0.839080810546875, -0.1652069091796875, 0.17795181274414062, -0.6180458068847656, 2.25164794921875, 0.8041839599609375, 0.8394393920898438, 0.5484027862548828, 0.715789794921875, 0.4854888916015625, 0.02664947509765625, 0.10234832763671875, -0.0087738037109375, 0.309356689453125, 1.0370712280273438, 0.10703468322753906, 0.25177764892578125, -0.4190940856933594, 0.8133087158203125, 0.1398334503173828, -0.5309333801269531], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000051.npy"}
{"epoch": 0.07709750566893424, "step": 52, "batch_size": 64, "mean": 0.27468955516815186, "std": 0.5739064812660217, "min": -0.6094093322753906, "p10": -0.4054786682128906, "median": 0.16887569427490234, "p90": 1.1553054809570313, "max": 1.5068435668945312, "pos_frac": 0.609375, "sample": [1.5068435668945312, 1.0594024658203125, 0.563812255859375, 1.175384521484375, 1.0422515869140625, -0.545623779296875, 0.4644756317138672, -0.23744964599609375, -0.056026458740234375, 1.4356613159179688, 0.923828125, 0.7247238159179688, -0.5080795288085938, 0.9202041625976562, 1.03778076171875, -0.5232086181640625, 0.743804931640625, -0.01947021484375, 0.16613388061523438, -0.24783706665039062, -0.6094093322753906, -0.24057388305664062, 0.42580413818359375, -0.12129783630371094, 1.127716064453125, 1.1671295166015625, -0.20360565185546875, -0.07747268676757812, 0.2869415283203125, -0.13998031616210938, 0.3126716613769531, -0.345550537109375, 0.00971221923828125, -0.2957134246826172, 0.1716175079345703, 1.4871444702148438, 0.06989288330078125, 0.186737060546875, 0.30126953125, -0.493743896484375, 0.08804130554199219, 0.48409271240234375, 0.022130966186523438, 0.3173675537109375, -0.16774368286132812, 0.535369873046875, -0.014026641845703125, 0.21894454956054688, 1.299560546875, -0.2506103515625, 0.9236106872558594, -0.48554229736328125, -0.351715087890625, 0.5643539428710938, 0.4817047119140625, -0.42852020263671875, 1.2868804931640625, -0.21316146850585938, 0.08758544921875, 0.052112579345703125, 0.3617095947265625, -0.03964805603027344, 0.20004844665527344, -0.0383148193359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000052.npy"}
{"epoch": 0.07860922146636433, "step": 53, "batch_size": 64, "mean": 0.13352787494659424, "std": 0.4287063777446747, "min": -1.0623397827148438, "p10": -0.3920974731445312, "median": 0.17315673828125, "p90": 0.6182062149047854, "max": 1.1612815856933594, "pos_frac": 0.65625, "sample": [0.23514556884765625, -0.004108428955078125, 0.4584503173828125, 0.1414031982421875, 0.2973442077636719, 0.2420520782470703, 0.36890411376953125, 0.032073974609375, 0.417266845703125, -0.0455169677734375, 0.1734619140625, -0.405029296875, -0.15004730224609375, 0.5479869842529297, -0.1997528076171875, 0.19948959350585938, 0.1554889678955078, 0.6483001708984375, -0.3619232177734375, -1.0623397827148438, -0.07285118103027344, 0.05789947509765625, 0.2582244873046875, 0.6923904418945312, -0.7978897094726562, 0.12331390380859375, -0.43367767333984375, 0.4508171081542969, 0.40648841857910156, 0.15969085693359375, 0.1815967559814453, -0.31976318359375, -0.02162933349609375, -0.16031646728515625, 0.27263641357421875, 0.521942138671875, 0.3830432891845703, 1.1612815856933594, -0.5214042663574219, 0.08998298645019531, -0.12214469909667969, -0.28441619873046875, 0.3151092529296875, -0.7828216552734375, 0.3004913330078125, 0.732666015625, 0.19965362548828125, 0.40270233154296875, 0.4230537414550781, 1.13665771484375, 0.21611785888671875, 0.4462013244628906, -0.25566864013671875, 0.4152641296386719, -0.16693878173828125, -0.4294395446777344, 0.79522705078125, 0.004047393798828125, 1.0826797485351562, 0.21718597412109375, -0.20086288452148438, 0.1728515625, -0.23886489868164062, 0.04660606384277344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000053.npy"}
{"epoch": 0.0801209372637944, "step": 54, "batch_size": 64, "mean": 0.1580154299736023, "std": 0.432833731174469, "min": -1.0099601745605469, "p10": -0.3797412872314453, "median": 0.14458465576171875, "p90": 0.5817491531372071, "max": 1.758514404296875, "pos_frac": 0.671875, "sample": [0.28568267822265625, 0.8112411499023438, -0.8135833740234375, 0.4100189208984375, -0.320343017578125, -0.5013504028320312, 0.4958457946777344, 0.478240966796875, 0.494964599609375, -0.25234222412109375, -0.09685134887695312, -0.032012939453125, 0.348388671875, -1.0099601745605469, 0.2545909881591797, 0.25228118896484375, 0.5865631103515625, 0.5705165863037109, 0.027099609375, 0.11182403564453125, 0.17315673828125, 1.758514404296875, 0.10049629211425781, 0.5906791687011719, -0.3499755859375, 0.4416370391845703, 0.10376358032226562, 0.0780029296875, 0.33740234375, -0.1443939208984375, 0.07193756103515625, -0.034984588623046875, 0.04496002197265625, -0.42267608642578125, 0.48639869689941406, 0.43819427490234375, 0.17095184326171875, -0.26650238037109375, 0.5521965026855469, 0.6384391784667969, 0.5264549255371094, 0.159576416015625, 0.07764053344726562, 0.7267913818359375, 0.1295928955078125, 0.7807903289794922, 0.3586997985839844, 0.4011955261230469, -0.13578033447265625, -0.025638580322265625, 0.02440643310546875, -0.131256103515625, 0.2613639831542969, 0.12306594848632812, 0.48905181884765625, -0.047760009765625, 0.4716644287109375, -0.3924980163574219, 0.32755279541015625, -0.5114593505859375, -0.121307373046875, 0.5241546630859375, -0.172454833984375, -0.5998725891113281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000054.npy"}
{"epoch": 0.08163265306122448, "step": 55, "batch_size": 64, "mean": 0.2314915508031845, "std": 0.6160315275192261, "min": -1.13580322265625, "p10": -0.43847541809082025, "median": 0.14813613891601562, "p90": 1.070774459838868, "max": 2.53094482421875, "pos_frac": 0.59375, "sample": [1.1653060913085938, 0.40529632568359375, 0.27611541748046875, 0.45244598388671875, -0.3461875915527344, -0.7340087890625, 0.5689659118652344, -0.1941680908203125, -0.1935882568359375, -0.25379180908203125, 0.22075271606445312, 0.5547332763671875, -0.849884033203125, -0.06775665283203125, -0.12936019897460938, 1.3989486694335938, 1.2272872924804688, 0.502349853515625, 0.82928466796875, -0.11341476440429688, 0.16801834106445312, -1.13580322265625, 0.7163009643554688, 0.42058372497558594, 2.53094482421875, 0.4076271057128906, 0.026170730590820312, 0.6248626708984375, -0.30925750732421875, -0.3280029296875, 0.6024322509765625, 0.6669464111328125, 0.13593292236328125, -0.48307228088378906, -0.08893585205078125, -0.05545806884765625, 0.4144430160522461, 0.16033935546875, 0.2572021484375, -0.08668136596679688, -0.19609832763671875, 0.8502006530761719, -0.0452728271484375, 0.8131103515625, 0.246307373046875, 0.06540870666503906, 1.1997146606445312, -0.47802734375, -0.5898056030273438, -0.230743408203125, 0.18415069580078125, 0.5028038024902344, -0.00655364990234375, -0.18468475341796875, -0.2765998840332031, 0.7386398315429688, 0.7664566040039062, 0.11718368530273438, 0.08700180053710938, 1.368316650390625, 0.023406982421875, -0.037017822265625, 1.1704444885253906, -0.6368026733398438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000055.npy"}
{"epoch": 0.08314436885865457, "step": 56, "batch_size": 64, "mean": 0.18476131558418274, "std": 0.4983710050582886, "min": -1.598297119140625, "p10": -0.3741657257080078, "median": 0.17078208923339844, "p90": 0.8556015014648439, "max": 1.3540802001953125, "pos_frac": 0.609375, "sample": [-0.18904876708984375, 0.5084724426269531, 0.030941009521484375, -0.4058113098144531, 0.01557159423828125, -0.0828399658203125, 0.9352645874023438, -0.3420562744140625, -0.13985443115234375, 0.046367645263671875, 0.28246307373046875, -0.266937255859375, 0.403106689453125, 0.21643829345703125, 0.26544189453125, 0.9708328247070312, -0.03975677490234375, -0.5117397308349609, 0.5258560180664062, 0.8776702880859375, -0.27935791015625, 1.2775115966796875, 0.4996223449707031, 0.39215850830078125, 0.2505931854248047, 0.5718250274658203, -0.374267578125, 0.27050018310546875, -0.15961074829101562, 0.6260833740234375, -0.3739280700683594, 0.804107666015625, 1.1698150634765625, 0.5048236846923828, 0.2765045166015625, 0.0455322265625, -1.598297119140625, 0.34499359130859375, -0.16084671020507812, 0.12512588500976562, 0.5978126525878906, 0.6167449951171875, -0.3900146484375, -0.5592575073242188, 0.301849365234375, -0.103759765625, -0.04929351806640625, -0.02099609375, -0.020076751708984375, 0.34110069274902344, 0.8917694091796875, -0.05950164794921875, 0.6118545532226562, 0.6160926818847656, 0.56109619140625, -0.1643047332763672, -0.05374908447265625, 0.01708984375, 0.3572883605957031, 0.2253875732421875, -0.24167823791503906, 1.3540802001953125, 0.10208892822265625, -0.4201698303222656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000056.npy"}
{"epoch": 0.08465608465608465, "step": 57, "batch_size": 64, "mean": 0.2588345408439636, "std": 0.5302484035491943, "min": -1.2437896728515625, "p10": -0.35002479553222654, "median": 0.3870048522949219, "p90": 0.8466167449951174, "max": 1.4091110229492188, "pos_frac": 0.671875, "sample": [-0.23489761352539062, -0.33949851989746094, 0.4448108673095703, 0.97845458984375, -0.022441864013671875, 0.5790863037109375, 0.7248916625976562, -0.35045623779296875, 0.7082443237304688, 0.61199951171875, 1.1025390625, 0.9187088012695312, -0.7885208129882812, 0.12096977233886719, 0.5251312255859375, -0.4244976043701172, 0.7374649047851562, 0.4352264404296875, 1.4091110229492188, 0.3961200714111328, 0.42254638671875, -0.679229736328125, 0.44104957580566406, 0.608795166015625, 0.27076148986816406, -0.8230609893798828, 0.11031723022460938, 0.458343505859375, 0.07504844665527344, -1.2437896728515625, 0.4486236572265625, 0.5487651824951172, 0.4553489685058594, 0.410308837890625, -0.30585479736328125, 0.8716506958007812, -0.34063720703125, 0.2812004089355469, -0.3477020263671875, 0.6554412841796875, 0.5172996520996094, -0.15869140625, -0.6567764282226562, 0.5567855834960938, -0.0999908447265625, 0.3623809814453125, 0.7767257690429688, 0.4825935363769531, 0.48990631103515625, 0.37788963317871094, 0.7882041931152344, -0.12644195556640625, -0.11814117431640625, 1.333526611328125, 0.18153953552246094, 1.1395301818847656, -0.13885498046875, 0.5618019104003906, 0.3232994079589844, 0.1527862548828125, -0.021076202392578125, -0.00691986083984375, 0.3466796875, -0.3490180969238281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000057.npy"}
{"epoch": 0.08616780045351474, "step": 58, "batch_size": 64, "mean": 0.16943949460983276, "std": 0.795191764831543, "min": -1.6763687133789062, "p10": -0.6800035476684569, "median": 0.1218109130859375, "p90": 1.0425506591796876, "max": 3.50689697265625, "pos_frac": 0.59375, "sample": [-1.6763687133789062, 0.4167022705078125, 0.48476409912109375, -0.3721733093261719, 0.0732269287109375, -0.5254058837890625, 0.2698516845703125, 1.0469818115234375, 0.0870361328125, -0.46082305908203125, 0.7751274108886719, -0.226654052734375, 1.422027587890625, 0.0257568359375, -0.4215736389160156, 0.5611991882324219, 1.1874122619628906, -0.7980804443359375, 0.117767333984375, 1.0322113037109375, 0.14391517639160156, -0.20191574096679688, 0.3024139404296875, 0.8830413818359375, -0.7504920959472656, -0.34442138671875, -0.8572158813476562, -0.2474365234375, -0.5876140594482422, 0.8776206970214844, -0.9123382568359375, -0.7127285003662109, 0.24525833129882812, 0.76165771484375, 3.50689697265625, -0.18716812133789062, 0.481292724609375, 1.6399993896484375, 1.072265625, 0.2686614990234375, -1.0302734375, 0.1674175262451172, 0.5046577453613281, -0.43097686767578125, -0.246490478515625, 1.88104248046875, -0.4020652770996094, -0.4887847900390625, 0.017681121826171875, 0.391998291015625, 0.4971160888671875, 0.8785018920898438, -0.35549163818359375, -0.3860054016113281, -0.2525444030761719, 0.6425018310546875, 0.11596107482910156, -0.30770111083984375, -0.6036453247070312, 0.891693115234375, 0.1258544921875, 0.1531982421875, 0.4080371856689453, 0.27176666259765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000058.npy"}
{"epoch": 0.08767951625094482, "step": 59, "batch_size": 64, "mean": 0.21487951278686523, "std": 0.5931217074394226, "min": -0.8118438720703125, "p10": -0.2969167709350586, "median": 0.09895896911621094, "p90": 0.8623241424560548, "max": 2.129608154296875, "pos_frac": 0.5625, "sample": [-0.089630126953125, -0.20767974853515625, -0.4315643310546875, 0.07711410522460938, -0.1363391876220703, -0.2945842742919922, 0.29325103759765625, -0.19177627563476562, 0.8203926086425781, 0.662200927734375, 2.10504150390625, -0.19371414184570312, 1.04583740234375, 0.8802947998046875, 0.33913230895996094, 0.42850303649902344, 0.69232177734375, -0.5967826843261719, 0.131927490234375, -0.0984344482421875, 0.044647216796875, 0.139312744140625, 0.2394561767578125, 0.408050537109375, -0.12216949462890625, 0.2222137451171875, -0.00943756103515625, 1.034149169921875, 2.129608154296875, 0.20455169677734375, -0.1469268798828125, 0.13501739501953125, 0.4057121276855469, -0.06286239624023438, -0.8118438720703125, -0.0051097869873046875, -0.02312469482421875, 0.2408905029296875, -0.43511199951171875, -0.1371135711669922, 1.86083984375, 0.238372802734375, 0.7589569091796875, 0.15213394165039062, -0.11832237243652344, 0.5500240325927734, -0.288818359375, 0.7099266052246094, -0.14099502563476562, 1.3201446533203125, -0.2979164123535156, 0.6303939819335938, 0.060283660888671875, -0.7216796875, 0.53411865234375, 0.1208038330078125, -0.7369384765625, -0.2327117919921875, -0.0124359130859375, -0.19944000244140625, 0.2712860107421875, 0.017364501953125, 0.6932830810546875, -0.101806640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000059.npy"}
{"epoch": 0.08919123204837491, "step": 60, "batch_size": 64, "mean": -0.0033222734928131104, "std": 0.5381281971931458, "min": -1.11981201171875, "p10": -0.8321384429931641, "median": 0.11030101776123047, "p90": 0.6232620239257813, "max": 1.0255756378173828, "pos_frac": 0.515625, "sample": [0.3254280090332031, -0.32684326171875, 0.23889541625976562, 0.33319854736328125, -0.5185012817382812, 0.637786865234375, -0.7959442138671875, -0.8646316528320312, -0.010181427001953125, -0.968292236328125, 0.3681449890136719, -0.00629425048828125, -0.3626441955566406, -0.1513652801513672, -0.9856338500976562, 0.4307708740234375, -0.18462753295898438, 0.7834358215332031, 0.9500885009765625, 0.1764678955078125, -0.2111358642578125, 0.17979812622070312, -0.11676788330078125, -0.5337677001953125, 0.3113899230957031, 0.3782958984375, 0.2910614013671875, 0.09382057189941406, 0.46209716796875, -0.7720222473144531, 0.12678146362304688, -0.3123779296875, 0.5614013671875, 0.24285888671875, -0.15163421630859375, 0.3055267333984375, 0.2833080291748047, -0.682281494140625, -0.4796943664550781, -0.8326225280761719, -0.8310089111328125, 0.8053455352783203, 0.5893707275390625, -0.8516654968261719, 0.5149478912353516, 0.8108367919921875, 0.18351364135742188, -0.9146270751953125, 0.305633544921875, -0.16948509216308594, 0.31900787353515625, 0.3709869384765625, 0.33123207092285156, -0.1865692138671875, 0.9586334228515625, -0.07756996154785156, -0.02239990234375, 0.19285964965820312, 0.303314208984375, -0.2044811248779297, -1.11981201171875, -0.6411476135253906, 1.0255756378173828, -0.11841011047363281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000060.npy"}
{"epoch": 0.09070294784580499, "step": 61, "batch_size": 64, "mean": 0.08451053500175476, "std": 0.7612563967704773, "min": -1.467254638671875, "p10": -0.7656000137329101, "median": 0.15701675415039062, "p90": 1.1116863250732423, "max": 1.6868209838867188, "pos_frac": 0.53125, "sample": [0.171722412109375, 0.5016326904296875, 1.6868209838867188, 1.6461029052734375, -0.5718345642089844, 0.27793312072753906, 0.4007415771484375, 0.9068832397460938, 0.48531341552734375, -0.03403472900390625, 0.22667694091796875, -0.6029300689697266, -0.403472900390625, -0.0213775634765625, 0.4585304260253906, -0.5768413543701172, 0.15361785888671875, -0.11278152465820312, -0.05669403076171875, 0.3948631286621094, -0.12098312377929688, 0.3686332702636719, 0.42746734619140625, 0.110260009765625, -0.051303863525390625, -0.093963623046875, -0.6749267578125, -1.467254638671875, 0.21160888671875, -0.4450225830078125, 0.6945457458496094, -0.4107208251953125, -0.5756988525390625, 0.1604156494140625, -0.5727691650390625, 0.1785736083984375, -0.3252105712890625, -0.6830654144287109, -0.40863800048828125, 1.508209228515625, 1.6293792724609375, -1.4337997436523438, 0.35642242431640625, -1.3702430725097656, 0.412506103515625, 1.0711593627929688, 0.2181262969970703, 0.29785919189453125, -0.02568817138671875, -0.2776145935058594, 1.1290550231933594, -0.8009719848632812, -0.2128143310546875, 0.2135467529296875, 1.4228973388671875, -1.1742477416992188, 1.3779754638671875, 0.52734375, 0.5738677978515625, -0.6033782958984375, 0.751800537109375, 1.0250396728515625, -1.2929878234863281, -1.1675872802734375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000061.npy"}
{"epoch": 0.09221466364323508, "step": 62, "batch_size": 64, "mean": 0.19806808233261108, "std": 0.6960156559944153, "min": -2.183074951171875, "p10": -0.6486640930175781, "median": 0.16861724853515625, "p90": 1.0484264373779297, "max": 1.9752655029296875, "pos_frac": 0.671875, "sample": [0.9214096069335938, -0.9057331085205078, -0.7818527221679688, 0.39893150329589844, 0.504058837890625, 0.07047271728515625, 1.0606956481933594, 0.023057937622070312, -0.29431915283203125, -0.4561614990234375, 1.1857032775878906, -0.20649147033691406, 1.1011886596679688, -1.1544189453125, 0.17661285400390625, -0.00127410888671875, 0.5570831298828125, 0.6773681640625, 0.18162155151367188, 0.8203582763671875, -0.6756515502929688, -0.16890716552734375, -0.585693359375, 0.6335411071777344, 0.2777595520019531, -0.1076507568359375, -0.7685546875, 0.42020416259765625, 0.412567138671875, 0.16062164306640625, -0.5471649169921875, 0.11492347717285156, -0.006259918212890625, 0.0368804931640625, 1.19964599609375, -2.183074951171875, -0.46661376953125, 0.9567737579345703, 0.15492630004882812, 0.071533203125, 0.16046524047851562, 0.6479568481445312, 0.4913482666015625, 0.7212295532226562, 0.6684036254882812, -0.10718154907226562, 0.46685791015625, -0.4130439758300781, 0.34295654296875, 1.1061248779296875, 0.1003875732421875, 0.387542724609375, 0.4732017517089844, 0.2592010498046875, -0.08415985107421875, 0.13608551025390625, 1.9752655029296875, 1.8653411865234375, 0.026523590087890625, -0.5718154907226562, 1.0197982788085938, 0.7132797241210938, -0.8372039794921875, 0.3196754455566406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000062.npy"}
{"epoch": 0.09372637944066516, "step": 63, "batch_size": 64, "mean": 0.43434983491897583, "std": 0.5089176893234253, "min": -0.8258209228515625, "p10": -0.05956344604492187, "median": 0.3561973571777344, "p90": 0.9697654724121095, "max": 2.110595703125, "pos_frac": 0.875, "sample": [0.5088729858398438, 0.6727676391601562, 0.0111236572265625, 0.31642913818359375, 1.5894775390625, -0.2600994110107422, -0.3848552703857422, 2.110595703125, 0.9317779541015625, 0.11841964721679688, -0.8258209228515625, -0.09405136108398438, 0.3067207336425781, -0.6651153564453125, 0.9860458374023438, 0.31597900390625, 0.6911468505859375, 0.022613525390625, 0.16765213012695312, 0.45802879333496094, -0.33779144287109375, 0.10067367553710938, 0.4246025085449219, 0.76165771484375, 0.8978958129882812, 0.4239997863769531, 0.313812255859375, 0.1553974151611328, 0.2648658752441406, 0.77508544921875, 0.4956817626953125, 0.5416259765625, 0.7212448120117188, 0.5734786987304688, 0.27231597900390625, -0.06201934814453125, 0.24333572387695312, 0.3441925048828125, 0.36069679260253906, 0.31290435791015625, 0.27524375915527344, 1.5711669921875, 0.044891357421875, 1.2191352844238281, 0.3950462341308594, 0.8609218597412109, 0.636688232421875, 1.189208984375, 0.733856201171875, 0.3780059814453125, 0.3516979217529297, 0.3325061798095703, 0.4901123046875, 0.048839569091796875, 0.7953109741210938, 0.0234375, 0.5866737365722656, 0.29038047790527344, -0.0538330078125, 0.057338714599609375, 0.6167488098144531, 1.4424591064453125, 0.874786376953125, 0.0764007568359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000063.npy"}
{"epoch": 0.09523809523809523, "step": 64, "batch_size": 64, "mean": 0.07845339179039001, "std": 0.6559348702430725, "min": -1.9287109375, "p10": -0.5541595458984375, "median": 0.07763385772705078, "p90": 0.6883380889892581, "max": 2.78387451171875, "pos_frac": 0.609375, "sample": [0.5767898559570312, 2.78387451171875, -0.5582427978515625, -0.5423507690429688, 0.3165016174316406, -0.655120849609375, 0.57244873046875, 0.0367584228515625, 0.16803359985351562, 1.0558624267578125, 0.096160888671875, -0.1751708984375, 0.4584369659423828, -0.15041351318359375, 0.6227798461914062, -0.10420989990234375, 0.316619873046875, 0.02768707275390625, 0.7467269897460938, 0.3760547637939453, 0.730194091796875, 0.2546234130859375, -0.29601287841796875, 0.04759979248046875, -0.0413818359375, 0.05910682678222656, 0.2960662841796875, -1.9287109375, 0.3666839599609375, 0.3481731414794922, 0.21309661865234375, -0.125, -0.23241806030273438, -0.4455108642578125, 0.03035736083984375, 0.17620849609375, 0.3297615051269531, -1.566741943359375, -0.4457263946533203, 0.1048126220703125, 0.5325927734375, 0.21332931518554688, 0.457305908203125, -0.15386009216308594, 0.40616798400878906, -1.481201171875, 0.01284027099609375, -0.2231903076171875, -0.2699737548828125, 1.108856201171875, -0.5446319580078125, -0.06611824035644531, 1.2752838134765625, 0.28820037841796875, 0.017900466918945312, 0.34061431884765625, 0.20173072814941406, -0.07810592651367188, -0.3728179931640625, -0.1168060302734375, -0.5920333862304688, -0.639984130859375, 0.1440753936767578, 0.7164344787597656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000064.npy"}
{"epoch": 0.09674981103552532, "step": 65, "batch_size": 64, "mean": 0.5296029448509216, "std": 0.761102557182312, "min": -1.7001495361328125, "p10": -0.4659584045410155, "median": 0.4563717842102051, "p90": 1.3834230422973635, "max": 2.39105224609375, "pos_frac": 0.765625, "sample": [0.4629831314086914, 0.3709564208984375, 0.9451904296875, 0.44976043701171875, -0.011648178100585938, 0.6549606323242188, -0.1439971923828125, 2.39105224609375, 0.20735740661621094, 0.9339523315429688, -0.6324310302734375, 1.0584678649902344, 1.1080913543701172, 0.8174896240234375, 0.865692138671875, 0.6295013427734375, 0.8271255493164062, 0.5226802825927734, 0.214752197265625, 2.084381103515625, 1.2833251953125, 0.270477294921875, -0.5246734619140625, 1.523284912109375, 1.3328056335449219, 0.0793609619140625, -1.7001495361328125, 0.0262908935546875, -0.5349273681640625, 0.33174896240234375, 0.17829322814941406, 0.968414306640625, -0.82958984375, -0.6442832946777344, 0.8226242065429688, 1.141815185546875, 0.3160247802734375, 0.7125701904296875, 1.0978355407714844, 1.366903305053711, 1.3905029296875, -0.019229888916015625, 1.3112258911132812, 0.8546485900878906, 0.14357376098632812, 0.0959014892578125, 0.27706146240234375, 0.42372894287109375, -0.9621696472167969, -0.010650634765625, 0.39007568359375, 0.3163623809814453, -0.32895660400390625, 1.1507282257080078, -0.1709136962890625, 0.9344520568847656, 1.4930343627929688, -0.09916496276855469, 1.1401519775390625, 1.6474609375, 1.0586929321289062, -0.3040771484375, 1.936981201171875, 0.2506980895996094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000065.npy"}
{"epoch": 0.0982615268329554, "step": 66, "batch_size": 64, "mean": 0.26758939027786255, "std": 0.6964896321296692, "min": -1.4351730346679688, "p10": -0.497796630859375, "median": 0.18543052673339844, "p90": 1.0641918182373047, "max": 2.6670074462890625, "pos_frac": 0.671875, "sample": [0.3901042938232422, 1.5650634765625, 0.18395614624023438, 0.6612033843994141, 0.2708396911621094, 0.7366256713867188, 0.42299842834472656, 0.172637939453125, 0.12680435180664062, 0.33403587341308594, 0.10280609130859375, 0.5918121337890625, 1.0670928955078125, 0.13962173461914062, -0.5491790771484375, 0.0858001708984375, 0.02738189697265625, 0.5044670104980469, 1.36065673828125, 0.889007568359375, 0.480743408203125, 1.0574226379394531, -0.6242294311523438, 0.24448394775390625, 0.01195526123046875, 0.32212066650390625, 0.28990936279296875, 1.0985679626464844, 0.7709884643554688, 0.38523101806640625, 0.4013328552246094, -0.1983623504638672, -0.2828369140625, 0.18358612060546875, 0.22184181213378906, -0.2003936767578125, 0.22356605529785156, 0.3384361267089844, -0.45781707763671875, -0.00482940673828125, -1.184356689453125, -1.4351730346679688, 0.13027572631835938, -0.5149307250976562, -0.38878631591796875, -0.8380813598632812, 0.648468017578125, -0.06649017333984375, 0.51763916015625, -0.02564239501953125, -0.1993389129638672, -0.16241455078125, 0.17726898193359375, 2.6670074462890625, 2.375885009765625, -0.5789337158203125, 0.1869049072265625, -0.3993988037109375, -0.1026611328125, 1.4178009033203125, 0.7184867858886719, -0.03643035888671875, -0.096832275390625, 0.970001220703125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000066.npy"}
{"epoch": 0.09977324263038549, "step": 67, "batch_size": 64, "mean": 0.23076248168945312, "std": 0.6564959287643433, "min": -1.7611083984375, "p10": -0.4137519836425781, "median": 0.19681549072265625, "p90": 0.9527095794677735, "max": 2.0756072998046875, "pos_frac": 0.625, "sample": [-0.04389190673828125, -0.228302001953125, -0.3590373992919922, -0.186492919921875, -0.9816665649414062, 0.4838123321533203, 1.5221023559570312, 0.6765232086181641, 0.6206588745117188, -0.56060791015625, 0.3341484069824219, 0.06580924987792969, -0.34564208984375, -0.35137939453125, 0.27529144287109375, 0.8295669555664062, -0.07164192199707031, 0.6764297485351562, -0.35424041748046875, 0.529937744140625, 0.6901092529296875, 1.3384323120117188, -1.7611083984375, 1.9812088012695312, 0.04618263244628906, -0.3912200927734375, 0.38787841796875, 0.3909645080566406, 0.8847503662109375, -0.736663818359375, 0.00402069091796875, 0.5384941101074219, 0.11063385009765625, 0.3983116149902344, 0.6205787658691406, 0.4201831817626953, -0.025970458984375, 0.1702289581298828, -0.1902179718017578, 0.7290458679199219, 2.0756072998046875, -0.4538116455078125, 1.1311721801757812, 0.9563980102539062, -0.2943878173828125, -0.20468902587890625, 0.351837158203125, 0.5121040344238281, 0.40148162841796875, -0.269439697265625, 0.10800552368164062, 1.29693603515625, 0.2234020233154297, -0.0027923583984375, 0.36876678466796875, 0.14349365234375, -0.30942535400390625, 0.159454345703125, -0.42340850830078125, -0.625701904296875, 0.9441032409667969, 0.5241928100585938, 0.3226318359375, -0.304351806640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000067.npy"}
{"epoch": 0.10128495842781557, "step": 68, "batch_size": 64, "mean": 0.33240818977355957, "std": 0.7034757733345032, "min": -1.559051513671875, "p10": -0.5471942901611327, "median": 0.365234375, "p90": 1.0194427490234377, "max": 3.13934326171875, "pos_frac": 0.75, "sample": [3.13934326171875, 0.658294677734375, 0.4883995056152344, 1.5857391357421875, -0.41605377197265625, 1.27276611328125, 0.09465789794921875, 0.0552825927734375, 0.39649200439453125, 0.05377197265625, 0.382781982421875, -1.559051513671875, 0.7844161987304688, 1.307769775390625, 0.04317474365234375, 0.9152145385742188, -0.50885009765625, -0.0111236572265625, 0.260528564453125, 0.46605682373046875, 0.4989166259765625, 0.033313751220703125, 0.463226318359375, 0.5658836364746094, 0.513427734375, 0.2420177459716797, 0.7055130004882812, -0.11780738830566406, 0.607574462890625, 0.308807373046875, -0.2044696807861328, 0.3941783905029297, 0.45945167541503906, 0.285736083984375, -0.7796211242675781, 0.9844551086425781, 1.594146728515625, 0.20735549926757812, 0.13794708251953125, -0.06670951843261719, 0.347686767578125, -0.822113037109375, 0.9882965087890625, 0.7136917114257812, -0.732757568359375, -0.9566268920898438, 0.170135498046875, 0.2790374755859375, -0.14331817626953125, -0.10889434814453125, 1.0327911376953125, 0.42425537109375, -0.6782302856445312, 0.42307281494140625, 0.566436767578125, 0.449310302734375, 0.2474212646484375, 0.8860092163085938, 0.787750244140625, 1.3616981506347656, 0.6773948669433594, 0.2032318115234375, -0.5309562683105469, -0.5541534423828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000068.npy"}
{"epoch": 0.10279667422524566, "step": 69, "batch_size": 64, "mean": 0.41042596101760864, "std": 0.6660028696060181, "min": -0.7356452941894531, "p10": -0.3169410705566406, "median": 0.39943885803222656, "p90": 1.0628807067871096, "max": 2.80145263671875, "pos_frac": 0.75, "sample": [0.12912940979003906, -0.026561737060546875, 1.1877021789550781, -0.2100372314453125, 1.0320816040039062, 0.07463836669921875, -0.025096893310546875, 0.029735565185546875, 0.235748291015625, 0.6680755615234375, 0.43993377685546875, 0.5002174377441406, 0.2820549011230469, 1.0926437377929688, 0.8131866455078125, 0.9837417602539062, 0.302642822265625, 0.6984939575195312, 0.770538330078125, -0.6432723999023438, 2.100830078125, -0.673736572265625, 2.2399749755859375, -0.134246826171875, 0.948577880859375, 0.08406448364257812, 0.238677978515625, 0.30530548095703125, 0.5319366455078125, 0.5841064453125, 0.4899578094482422, 0.561309814453125, 0.4113807678222656, 0.25229644775390625, 0.5479316711425781, -0.6796417236328125, 0.7716484069824219, -0.08630752563476562, 0.115447998046875, 0.9448776245117188, -0.1148834228515625, 0.27060890197753906, 0.3874969482421875, 0.27423858642578125, 1.076080322265625, 0.6979293823242188, 0.7062911987304688, 0.4192218780517578, -0.3093681335449219, -0.6868896484375, -0.3201866149902344, 0.7088642120361328, -0.7356452941894531, 0.0070781707763671875, 0.5097160339355469, -0.5800590515136719, 0.7834320068359375, 0.7064094543457031, 0.76300048828125, -0.19632530212402344, 2.80145263671875, 0.11751747131347656, 1.3863067626953125, -0.295013427734375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000069.npy"}
{"epoch": 0.10430839002267574, "step": 70, "batch_size": 64, "mean": 0.3075684607028961, "std": 0.662090539932251, "min": -1.5406646728515625, "p10": -0.6454095840454102, "median": 0.3750190734863281, "p90": 1.1693023681640629, "max": 1.47430419921875, "pos_frac": 0.734375, "sample": [0.5756607055664062, -0.5014514923095703, 0.522308349609375, 0.0072021484375, 0.6404628753662109, -0.4577903747558594, 0.2618865966796875, 0.635284423828125, 0.7561187744140625, -0.39186668395996094, 0.38726806640625, 0.044765472412109375, -1.5406646728515625, 0.2785758972167969, 0.2171344757080078, -0.6462993621826172, 0.19031143188476562, -0.8071670532226562, -0.4883460998535156, 0.11326026916503906, 1.2239761352539062, 0.7527389526367188, 0.9860706329345703, 0.5216827392578125, -0.12308502197265625, 1.2011947631835938, 0.4996490478515625, 1.0948867797851562, -0.6542510986328125, 1.368408203125, 0.3253936767578125, 1.033782958984375, -0.47278594970703125, -0.127655029296875, 1.2695159912109375, 0.954986572265625, 1.3980789184570312, 0.824951171875, 0.5720577239990234, 0.8383255004882812, 0.40320777893066406, 0.0006809234619140625, 0.060611724853515625, 0.36277008056640625, 0.3975372314453125, 0.6369743347167969, -0.8971366882324219, 1.39251708984375, 0.2927227020263672, 0.966339111328125, 0.967132568359375, 0.8379898071289062, 0.14003372192382812, -0.6728668212890625, 0.3228340148925781, 0.4278240203857422, -0.6433334350585938, -0.9950752258300781, 0.5353469848632812, 0.23706436157226562, -0.3181610107421875, -0.043514251708984375, 0.5140018463134766, 1.47430419921875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000070.npy"}
{"epoch": 0.10582010582010581, "step": 71, "batch_size": 64, "mean": 0.15357764065265656, "std": 0.6985422372817993, "min": -1.8368072509765625, "p10": -0.9570293426513672, "median": 0.22541427612304688, "p90": 0.8460710525512696, "max": 1.7182769775390625, "pos_frac": 0.6875, "sample": [-0.26216697692871094, 0.596923828125, 0.3563575744628906, 0.8203811645507812, 0.141937255859375, 0.28037261962890625, 0.5840511322021484, -1.2107009887695312, -0.5552978515625, 1.428497314453125, 0.15946388244628906, -0.9793624877929688, 0.025634765625, 0.46028900146484375, -0.9752960205078125, -0.10132980346679688, -0.29421234130859375, 1.1929969787597656, 0.13641357421875, 0.6493377685546875, -1.8368072509765625, 1.7182769775390625, 0.09949874877929688, 0.4392585754394531, 0.8505153656005859, 0.7718353271484375, 0.6555023193359375, 0.21729278564453125, 0.8575344085693359, 0.13445281982421875, -1.23712158203125, 0.09423828125, -0.9611892700195312, -0.9473228454589844, 0.3218879699707031, 0.2630577087402344, 0.3767738342285156, -0.17047119140625, 0.724884033203125, -0.48111724853515625, 0.193023681640625, -1.7695465087890625, 0.6579437255859375, 1.1396942138671875, -0.3287067413330078, 0.018631935119628906, -0.32220458984375, 0.9996414184570312, -0.216552734375, 0.05172920227050781, 0.6689395904541016, 0.6622848510742188, 0.2938079833984375, 0.5613632202148438, -0.2200164794921875, 0.14592361450195312, 0.3762779235839844, 0.8357009887695312, 0.2335357666015625, -0.1781024932861328, 0.75048828125, 0.5681819915771484, 0.5081634521484375, -0.14650344848632812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000071.npy"}
{"epoch": 0.1073318216175359, "step": 72, "batch_size": 64, "mean": 0.4299052357673645, "std": 0.7584711909294128, "min": -1.6465110778808594, "p10": -0.4690315246582031, "median": 0.41114044189453125, "p90": 1.3056690216064455, "max": 2.5322723388671875, "pos_frac": 0.703125, "sample": [0.09534835815429688, 1.0320510864257812, 0.28072357177734375, 0.7637176513671875, -0.8416900634765625, 0.9374504089355469, 0.1034393310546875, 2.5322723388671875, -1.6465110778808594, 0.3366737365722656, 1.2748832702636719, 1.3188629150390625, 0.47919464111328125, 1.334177017211914, 0.2852745056152344, -0.5566368103027344, -0.41986083984375, 0.8409767150878906, -0.0346832275390625, -0.25540924072265625, 0.7413787841796875, 0.506805419921875, 1.4392929077148438, 0.34308624267578125, 0.9170455932617188, -0.2240753173828125, -0.3088359832763672, -0.5876235961914062, -0.49010467529296875, -0.2469635009765625, 1.8530197143554688, 0.9480056762695312, 0.69879150390625, 1.920867919921875, -0.7907485961914062, -0.33695030212402344, 0.8075160980224609, 0.5534381866455078, 0.6173820495605469, -0.0245513916015625, 1.033233642578125, 0.5014991760253906, 1.1312484741210938, 0.03449249267578125, 1.1867218017578125, 0.2982635498046875, 0.2278900146484375, 0.905548095703125, -0.15844345092773438, 0.6977615356445312, 0.22015380859375, 1.083709716796875, 1.7108993530273438, -0.2881298065185547, 0.9176254272460938, 0.3310394287109375, 0.4956092834472656, -0.0732269287109375, 0.1947174072265625, 0.15062713623046875, -0.9806747436523438, 1.2286300659179688, 0.7738990783691406, -0.30619049072265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000072.npy"}
{"epoch": 0.10884353741496598, "step": 73, "batch_size": 64, "mean": 0.14263877272605896, "std": 0.8253279328346252, "min": -1.69140625, "p10": -0.8158721923828125, "median": 0.013798713684082031, "p90": 1.2131698608398442, "max": 2.3215293884277344, "pos_frac": 0.5, "sample": [0.4451427459716797, 0.21197891235351562, -0.19875335693359375, -0.6548957824707031, -1.1035633087158203, -0.350372314453125, -0.8274765014648438, -0.8854179382324219, -1.69140625, 0.9174976348876953, -0.41251373291015625, -0.09648895263671875, -0.03057861328125, -0.6735305786132812, -0.7887954711914062, 0.13833999633789062, -0.19390869140625, 0.5197219848632812, 1.1168899536132812, 0.05728912353515625, -1.4650421142578125, -0.5682182312011719, -0.8537502288818359, 0.5896987915039062, 1.42327880859375, 0.996734619140625, -0.327606201171875, -0.38044166564941406, -0.007007598876953125, -0.2748680114746094, 0.4881744384765625, -0.2229747772216797, 0.4476451873779297, 0.3812751770019531, -0.4088592529296875, 0.18407249450683594, 0.9280452728271484, 1.596099853515625, -1.2572097778320312, -0.6454696655273438, 0.74224853515625, 1.9200897216796875, -0.6826019287109375, -0.00034332275390625, -0.047149658203125, 1.2844467163085938, 0.5432586669921875, 0.027940750122070312, 0.6267318725585938, 1.025970458984375, -0.7039337158203125, 1.2544326782226562, 0.9386787414550781, 0.6493301391601562, -0.021747589111328125, -0.12532806396484375, 0.18799972534179688, 0.4240570068359375, 0.19002151489257812, -0.12371826171875, 0.6092338562011719, 2.3215293884277344, 2.046234130859375, -0.08123588562011719], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000073.npy"}
{"epoch": 0.11035525321239607, "step": 74, "batch_size": 64, "mean": 0.42662960290908813, "std": 0.7013421654701233, "min": -0.9884872436523438, "p10": -0.4990617752075195, "median": 0.3724327087402344, "p90": 1.2527591705322267, "max": 2.1847457885742188, "pos_frac": 0.6875, "sample": [-0.1768646240234375, 0.15462493896484375, -0.1853485107421875, -0.030731201171875, 1.0771102905273438, 0.13015365600585938, 0.762115478515625, 1.95013427734375, 1.2680473327636719, 1.8377151489257812, 1.04443359375, 0.4460334777832031, 0.3818206787109375, 0.6097946166992188, -0.21558380126953125, 1.2031211853027344, 1.102325439453125, 0.5170745849609375, -0.08844375610351562, -0.3904571533203125, 0.2870941162109375, 0.7232818603515625, 0.6303443908691406, -0.053997039794921875, 0.3185539245605469, 0.07127571105957031, -0.4689502716064453, 0.08157730102539062, 1.7579803466796875, -0.9884872436523438, 0.6320419311523438, 0.9435558319091797, 0.6899509429931641, -0.29135894775390625, -0.6706809997558594, 0.4571533203125, -0.5810642242431641, 0.36304473876953125, -0.528472900390625, -0.5119667053222656, 0.857421875, -0.7117557525634766, 0.3221282958984375, 0.6663131713867188, 0.5584564208984375, 1.599334716796875, 2.1847457885742188, -0.19292068481445312, 1.5557403564453125, 1.2170867919921875, -0.5253448486328125, 0.2064361572265625, 0.7931098937988281, 0.13236618041992188, -0.13329315185546875, 0.1457958221435547, -0.00339508056640625, 0.641754150390625, 1.115875244140625, 1.0228805541992188, 1.0353336334228516, 0.09141921997070312, -0.18706130981445312, 0.6539154052734375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000074.npy"}
{"epoch": 0.11186696900982615, "step": 75, "batch_size": 64, "mean": 0.4792408347129822, "std": 0.7964400053024292, "min": -1.5093803405761719, "p10": -0.5100051879882812, "median": 0.4536895751953125, "p90": 1.6615936279296875, "max": 2.34747314453125, "pos_frac": 0.765625, "sample": [-0.4233970642089844, -0.2711944580078125, 0.27604103088378906, 0.8621444702148438, 2.1285934448242188, -0.04788780212402344, 0.061435699462890625, 0.6659126281738281, 1.253631591796875, 1.668436050415039, 0.2371673583984375, 0.126129150390625, -0.8119392395019531, 0.48081207275390625, 0.09630203247070312, 1.3025016784667969, 0.06325531005859375, -0.5241279602050781, 0.34978485107421875, 0.4641227722167969, 1.73516845703125, 0.8812942504882812, 1.8071517944335938, 0.80120849609375, 0.8393783569335938, 0.6904563903808594, 1.3155746459960938, -1.0327911376953125, 0.5738449096679688, 0.29402923583984375, 0.5611476898193359, 1.6456279754638672, 0.47158050537109375, 1.0959320068359375, 0.26146697998046875, -0.11139106750488281, 1.6959381103515625, -0.514129638671875, 1.7663726806640625, 1.1082744598388672, 0.23377227783203125, 0.7613811492919922, -1.5093803405761719, 0.31819725036621094, 1.00274658203125, 0.2511787414550781, 0.11095809936523438, 0.1788482666015625, 0.07663917541503906, 0.012750625610351562, 0.5293178558349609, 1.4696731567382812, -0.7030086517333984, -0.09974288940429688, 0.4432563781738281, -1.0497817993164062, 1.435028076171875, -0.29535675048828125, -0.5003814697265625, 0.609893798828125, 0.6763553619384766, -0.08606338500976562, 2.34747314453125, 0.613800048828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000075.npy"}
{"epoch": 0.11337868480725624, "step": 76, "batch_size": 64, "mean": 0.27817073464393616, "std": 0.8101134300231934, "min": -1.25152587890625, "p10": -0.665243911743164, "median": 0.2839975357055664, "p90": 1.4877824783325198, "max": 2.7934951782226562, "pos_frac": 0.625, "sample": [-1.0773239135742188, 1.4296035766601562, -0.994781494140625, -0.69854736328125, 0.35266876220703125, -0.5875358581542969, 0.46288490295410156, 0.7296104431152344, 0.15906906127929688, 1.9312820434570312, 1.512716293334961, 2.7934951782226562, -0.24756813049316406, -1.25152587890625, 0.43548583984375, -1.1572189331054688, 0.32341766357421875, 0.7901153564453125, 0.3807220458984375, 0.187347412109375, 1.5760955810546875, 1.75390625, -0.5267410278320312, 1.2940292358398438, 0.5296630859375, 0.3986053466796875, 0.045501708984375, 1.5187911987304688, -0.4390296936035156, -0.12385368347167969, 0.2642841339111328, -0.4005928039550781, -0.891021728515625, 0.3385276794433594, -0.13714599609375, 0.8971939086914062, 0.44136810302734375, 0.2935943603515625, -0.5213031768798828, 0.43686676025390625, 0.2744007110595703, -0.34378814697265625, -0.0886688232421875, -0.13691329956054688, 0.6482162475585938, 0.6019382476806641, 0.00858306884765625, -0.7226619720458984, 0.481231689453125, 0.3207244873046875, -0.401092529296875, 0.12633514404296875, -0.02095794677734375, 0.5603485107421875, 2.0937042236328125, 0.7428150177001953, -0.3508758544921875, -0.2322826385498047, -0.49881744384765625, 0.64971923828125, 1.0533256530761719, -0.07787704467773438, 0.8470077514648438, 0.045856475830078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000076.npy"}
{"epoch": 0.11489040060468632, "step": 77, "batch_size": 64, "mean": 0.3415394425392151, "std": 0.7624257802963257, "min": -1.2566375732421875, "p10": -0.840785217285156, "median": 0.32645225524902344, "p90": 1.2874542236328126, "max": 2.0107574462890625, "pos_frac": 0.703125, "sample": [0.28722381591796875, 0.4742889404296875, -0.021396636962890625, -0.20144271850585938, 0.7494125366210938, 1.2837905883789062, 0.6404876708984375, 0.28064537048339844, -1.2566375732421875, 0.2565460205078125, 1.2766799926757812, 1.1146736145019531, 0.387451171875, -0.4296760559082031, 0.3864631652832031, 1.5088787078857422, -0.2522735595703125, 1.2890243530273438, 0.7858047485351562, 1.6844100952148438, 0.031887054443359375, 0.279815673828125, 1.498565673828125, 0.98638916015625, -0.3672370910644531, -0.03403472900390625, 0.04870796203613281, 0.3546600341796875, 0.13590621948242188, 0.14722442626953125, -0.6283798217773438, -1.0387191772460938, -0.15228652954101562, 0.3924713134765625, 0.466888427734375, 0.7193603515625, 0.6158981323242188, 1.2140064239501953, 0.2982444763183594, 1.2568092346191406, -0.194305419921875, -0.35610198974609375, 0.474334716796875, 0.9709091186523438, 0.46009063720703125, 2.0107574462890625, 0.11322402954101562, -0.020711898803710938, 1.9906158447265625, 0.11611175537109375, 1.1019210815429688, -0.9318161010742188, 0.0559234619140625, 1.3675670623779297, 0.7733383178710938, 0.08585739135742188, -1.0899314880371094, -0.9729461669921875, -1.12646484375, -0.06318092346191406, 0.47844696044921875, 0.50750732421875, 0.7001781463623047, -1.0633316040039062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000077.npy"}
{"epoch": 0.1164021164021164, "step": 78, "batch_size": 64, "mean": 0.3667002320289612, "std": 0.7798960208892822, "min": -1.7414703369140625, "p10": -0.7024894714355467, "median": 0.4689369201660156, "p90": 1.2334243774414064, "max": 2.2032623291015625, "pos_frac": 0.765625, "sample": [0.35921478271484375, -1.037811279296875, 0.5406112670898438, 0.5990486145019531, 0.20034027099609375, 2.066579818725586, 0.9198837280273438, -0.1118927001953125, 0.5201015472412109, -0.740142822265625, 1.1531906127929688, -1.083282470703125, -0.15367507934570312, 0.6245651245117188, 0.2646331787109375, 0.482421875, -1.0858078002929688, 2.2032623291015625, 0.13707733154296875, 0.49213409423828125, 0.04248046875, 0.09454345703125, 1.5602874755859375, 0.4147796630859375, 1.2781982421875, 0.5247421264648438, 0.3277397155761719, -1.7414703369140625, 1.3089218139648438, -0.0254058837890625, 1.244873046875, 1.01275634765625, 0.5615158081054688, 1.9564361572265625, 0.45545196533203125, -1.111846923828125, -1.4755783081054688, 0.5266571044921875, 0.3576984405517578, 0.8344306945800781, 0.5596160888671875, 0.7384986877441406, 0.6799468994140625, 0.02877044677734375, -0.6146316528320312, 0.9727516174316406, -0.5791168212890625, 0.2549114227294922, 0.6190338134765625, 0.7005233764648438, -0.38686370849609375, 0.7566299438476562, 0.29994964599609375, -0.26153564453125, 0.4383087158203125, 0.1514739990234375, 0.2085285186767578, -0.19605255126953125, 1.1774635314941406, 0.19815826416015625, 1.2067108154296875, 0.6205940246582031, 0.6224441528320312, 0.7750377655029297], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000078.npy"}
{"epoch": 0.11791383219954649, "step": 79, "batch_size": 64, "mean": 0.5585935115814209, "std": 0.815706193447113, "min": -1.5657577514648438, "p10": -0.27055702209472654, "median": 0.5156822204589844, "p90": 1.4235382080078125, "max": 3.296844482421875, "pos_frac": 0.78125, "sample": [0.14813232421875, 0.2322540283203125, -1.0764408111572266, 1.399209976196289, -0.2611656188964844, 0.8700103759765625, -1.5657577514648438, -0.08363914489746094, 0.905853271484375, -0.0921783447265625, -0.4980316162109375, -0.19921112060546875, 1.3089981079101562, 2.7895851135253906, -0.2745819091796875, 0.38726043701171875, 1.3997001647949219, 0.588226318359375, -0.7994308471679688, 0.18496322631835938, 0.5114250183105469, 2.31256103515625, 0.3404083251953125, 1.427703857421875, 0.5059585571289062, -0.5161304473876953, 1.486419677734375, 0.6170654296875, 1.2384147644042969, 0.39884185791015625, -0.049407958984375, -0.28160667419433594, 1.28076171875, 0.7194671630859375, 0.79730224609375, 0.5199394226074219, 0.7036972045898438, 0.8341159820556641, -0.11037063598632812, 0.20952606201171875, 0.09601974487304688, 0.8813400268554688, 0.39959716796875, 0.06862831115722656, 0.1380176544189453, 0.3114585876464844, 0.5233612060546875, 0.5745563507080078, 0.862640380859375, 0.8009033203125, 0.6070213317871094, 0.6475372314453125, 1.413818359375, -0.159210205078125, 0.20484542846679688, 3.296844482421875, 0.9128265380859375, 0.40682220458984375, 0.685302734375, 1.4594345092773438, 0.5496444702148438, 0.3478050231933594, 2.0979537963867188, 0.31296539306640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000079.npy"}
{"epoch": 0.11942554799697656, "step": 80, "batch_size": 64, "mean": 0.3605545461177826, "std": 0.8876761198043823, "min": -2.3143157958984375, "p10": -0.6501190185546875, "median": 0.3019676208496094, "p90": 1.5479335784912116, "max": 2.710205078125, "pos_frac": 0.65625, "sample": [-0.04901885986328125, 1.3776893615722656, -0.070343017578125, 0.7860641479492188, 1.0087528228759766, -1.5210723876953125, -0.08607864379882812, 0.2947998046875, -0.28095245361328125, -0.7119598388671875, 1.8238887786865234, -0.1705322265625, 0.3525848388671875, -0.4845161437988281, 0.27965736389160156, 2.710205078125, 0.404510498046875, 0.146270751953125, 0.8220634460449219, -0.0291290283203125, 0.6165504455566406, 0.407196044921875, -0.231109619140625, 0.30913543701171875, 2.1828460693359375, 1.9503860473632812, 0.881805419921875, -0.08628082275390625, 1.6208953857421875, 1.076904296875, 0.05064201354980469, 1.0122451782226562, -0.09112548828125, 0.6429729461669922, 0.7774124145507812, 0.8394317626953125, -0.5731887817382812, 0.7535781860351562, 1.3423576354980469, -2.3143157958984375, -0.64898681640625, -0.28575897216796875, -1.0793609619140625, 0.9949951171875, -0.05872154235839844, 0.05934906005859375, 1.7866973876953125, 0.2270641326904297, 0.7206192016601562, -0.9349937438964844, 1.849456787109375, 0.37142181396484375, -0.650604248046875, 0.0561065673828125, 0.17902374267578125, -0.09032249450683594, 0.4890289306640625, 0.07673263549804688, 1.141103744506836, 0.41606903076171875, 0.04299163818359375, -1.0060272216796875, 1.0955810546875, 0.5528030395507812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000080.npy"}
{"epoch": 0.12093726379440665, "step": 81, "batch_size": 64, "mean": 0.5562969446182251, "std": 0.7961805462837219, "min": -0.745025634765625, "p10": -0.371013069152832, "median": 0.38868141174316406, "p90": 1.622810745239258, "max": 2.989288330078125, "pos_frac": 0.734375, "sample": [0.5557537078857422, 0.2749347686767578, 1.6347846984863281, -0.745025634765625, 1.8571739196777344, -0.37876129150390625, 0.8194675445556641, 0.20981597900390625, -0.07562255859375, -0.26318359375, 0.505584716796875, -0.000762939453125, -0.2103424072265625, 1.1763381958007812, 0.19365692138671875, 0.41436004638671875, 0.11857032775878906, -0.169189453125, -0.39160919189453125, 1.169393539428711, 0.3630027770996094, 0.003387451171875, 0.7847824096679688, 0.069091796875, 2.19464111328125, 0.6757888793945312, 0.076995849609375, 0.8398284912109375, -0.3848152160644531, -0.05255126953125, 0.7079849243164062, 1.4801368713378906, 0.8111953735351562, 0.304473876953125, 0.04473114013671875, 1.9006195068359375, -0.0731964111328125, 2.305023193359375, 1.4202880859375, 1.826568603515625, 0.53997802734375, 0.540618896484375, 1.314056396484375, -0.3052482604980469, 1.3838882446289062, 0.024888992309570312, 1.5948715209960938, 0.12783050537109375, 0.48584747314453125, 1.0491790771484375, 2.989288330078125, 1.422943115234375, 0.8980236053466797, 0.7217483520507812, 0.1907196044921875, -0.4733734130859375, -0.3529338836669922, 0.3165855407714844, 0.13333511352539062, 0.9911727905273438, -0.09556007385253906, 1.2711372375488281, -0.47803497314453125, -0.6812744140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000081.npy"}
{"epoch": 0.12244897959183673, "step": 82, "batch_size": 64, "mean": 0.6699748039245605, "std": 0.718742311000824, "min": -1.5862655639648438, "p10": -0.13983726501464838, "median": 0.6967964172363281, "p90": 1.5252128601074222, "max": 2.3480072021484375, "pos_frac": 0.828125, "sample": [0.5185279846191406, 1.3906478881835938, 0.38067626953125, 0.12323760986328125, 1.2793121337890625, 0.16469192504882812, 0.6777114868164062, 1.0994911193847656, 1.4455108642578125, 0.7906818389892578, 0.8911895751953125, 0.8534793853759766, -0.5059661865234375, 1.86566162109375, 0.5479011535644531, 1.076324462890625, 0.7287445068359375, 0.6967430114746094, 1.0376110076904297, 0.3367767333984375, 0.0032196044921875, 0.6968498229980469, 0.5426616668701172, 0.21590805053710938, -0.4278411865234375, 0.9956893920898438, 0.6502933502197266, 0.87701416015625, 1.272165298461914, 0.881134033203125, 0.5152912139892578, 0.2798137664794922, 2.3480072021484375, 0.0413360595703125, -0.0179290771484375, 1.1277923583984375, 1.2645263671875, 0.2660980224609375, 1.5573272705078125, 2.2961196899414062, 0.28896331787109375, -0.6451187133789062, 0.9777069091796875, -0.4220771789550781, 1.2331829071044922, 0.20052337646484375, 0.9678115844726562, 0.7308311462402344, 1.7166061401367188, 1.6651458740234375, -1.5862655639648438, -0.03498649597167969, 1.2154083251953125, 1.2782821655273438, 0.26412200927734375, -0.06754493713378906, 0.117889404296875, 0.9204139709472656, 1.6933765411376953, 0.60455322265625, -0.092559814453125, -0.2224884033203125, -0.16009902954101562, 1.4502792358398438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000082.npy"}
{"epoch": 0.12396069538926682, "step": 83, "batch_size": 64, "mean": 0.5883078575134277, "std": 0.947523295879364, "min": -2.209829330444336, "p10": -0.5390655517578123, "median": 0.5731334686279297, "p90": 1.5240568161010746, "max": 4.133941650390625, "pos_frac": 0.78125, "sample": [1.5812835693359375, 0.4303131103515625, 0.8018608093261719, 0.5934104919433594, 1.258535385131836, 0.6595230102539062, -0.1605072021484375, -0.00026702880859375, 0.9653396606445312, 1.625457763671875, -1.0529327392578125, 0.39407920837402344, -0.357421875, 0.4382209777832031, 1.1300621032714844, 0.1267547607421875, -0.7548294067382812, 1.1449966430664062, 0.2121715545654297, 0.8328933715820312, 1.0624351501464844, 0.5528564453125, 0.8470458984375, 0.021356582641601562, -0.3356781005859375, -0.07608795166015625, 0.6350250244140625, 1.1155014038085938, 0.1801776885986328, 0.02100372314453125, 1.3204498291015625, 0.029071807861328125, 1.9828453063964844, -2.209829330444336, -0.7280597686767578, 0.7386302947998047, 0.8269062042236328, -0.767913818359375, 0.5048961639404297, -0.616912841796875, 0.003543853759765625, 0.8586807250976562, 1.6816558837890625, 0.8288307189941406, -0.12354660034179688, 0.7338542938232422, 0.3377647399902344, 0.48036956787109375, 0.4982452392578125, 1.2533988952636719, 3.34686279296875, 1.1313018798828125, 1.5512199401855469, 0.06943893432617188, 0.8301773071289062, 1.4606761932373047, 1.2465057373046875, 1.3520679473876953, 1.2159194946289062, 4.133941650390625, 0.277313232421875, -0.2352142333984375, 0.3825263977050781, -0.6364974975585938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000083.npy"}
{"epoch": 0.1254724111866969, "step": 84, "batch_size": 64, "mean": 0.36277878284454346, "std": 0.8141404390335083, "min": -2.082916259765625, "p10": -0.7348953247070312, "median": 0.40599632263183594, "p90": 1.4436515808105472, "max": 1.6876544952392578, "pos_frac": 0.6875, "sample": [0.7162704467773438, 1.528900146484375, 0.8361034393310547, 1.2456130981445312, 0.3991966247558594, 1.1668033599853516, 1.4724197387695312, -0.218902587890625, 0.28659820556640625, -0.641845703125, -1.606781005859375, 1.6876544952392578, 0.9782562255859375, 0.498687744140625, 0.9058914184570312, 0.12210845947265625, -0.4603080749511719, -0.0965118408203125, 1.37652587890625, -1.0240287780761719, -0.5182647705078125, -0.0694427490234375, 0.6726589202880859, 0.9599761962890625, 0.2716255187988281, 0.81854248046875, -0.1186676025390625, 1.523529052734375, 0.2941780090332031, -0.7431716918945312, 0.8193511962890625, 0.3674774169921875, 0.3310737609863281, 0.6183013916015625, -0.256317138671875, 0.8162689208984375, 1.1896247863769531, 1.0149192810058594, 0.29993438720703125, -0.7268524169921875, -2.082916259765625, 0.98944091796875, 0.52874755859375, 0.4763069152832031, 1.0822677612304688, -0.026731491088867188, 0.2938423156738281, 1.656890869140625, 0.4127960205078125, -0.5893478393554688, 0.18959808349609375, -0.83551025390625, 1.6280536651611328, 0.6754302978515625, 0.7217445373535156, 0.4178428649902344, 1.033477783203125, 0.3846015930175781, -0.6366043090820312, -0.7684173583984375, 1.650360107421875, -0.15647125244140625, -0.73834228515625, 0.1733856201171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000084.npy"}
{"epoch": 0.12698412698412698, "step": 85, "batch_size": 64, "mean": 0.7022866010665894, "std": 0.982679009437561, "min": -2.51239013671875, "p10": -0.3197784423828125, "median": 0.5819549560546875, "p90": 1.9879859924316408, "max": 2.8158111572265625, "pos_frac": 0.796875, "sample": [1.9958114624023438, 0.3055305480957031, 0.6522121429443359, 0.5553665161132812, 0.47841644287109375, 0.9419784545898438, 0.40106964111328125, 0.5676555633544922, -0.2862739562988281, 1.5327949523925781, 2.5126876831054688, -2.51239013671875, 1.3209667205810547, 2.8158111572265625, 0.975341796875, 1.3513755798339844, 1.8880271911621094, 1.3685684204101562, 1.0498199462890625, 1.3101654052734375, 0.07736968994140625, 0.5847549438476562, -0.792572021484375, 1.1787223815917969, 0.48313140869140625, 0.5469512939453125, 0.12586593627929688, 1.134063720703125, 2.0921592712402344, 0.0102081298828125, -0.8922462463378906, 1.1796073913574219, -0.2984161376953125, 0.160491943359375, -0.1744213104248047, -0.20818328857421875, 1.9697265625, 1.5976581573486328, 0.5618057250976562, -0.026475906372070312, -1.6710319519042969, 1.3684844970703125, 1.810272216796875, 1.3036460876464844, 2.5865478515625, 2.30303955078125, 0.9385833740234375, 1.3181266784667969, 0.5275421142578125, 0.8045463562011719, -0.00305938720703125, -0.4184112548828125, 0.16760635375976562, 0.7034835815429688, 0.5791549682617188, 0.115081787109375, -0.3289337158203125, -0.529327392578125, 1.4082565307617188, 0.11959075927734375, 2.0406265258789062, 0.19384765625, 0.7608871459960938, 0.31264686584472656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000085.npy"}
{"epoch": 0.12849584278155707, "step": 86, "batch_size": 64, "mean": 0.5233776569366455, "std": 0.8827517628669739, "min": -1.3205642700195312, "p10": -0.4251197814941406, "median": 0.44826507568359375, "p90": 1.5291885375976566, "max": 2.8158721923828125, "pos_frac": 0.703125, "sample": [1.3976364135742188, 0.7306175231933594, -1.3205642700195312, -0.10693931579589844, 1.94488525390625, 0.8259468078613281, 0.4427947998046875, 0.7382888793945312, -0.397247314453125, 0.24763870239257812, -0.2314453125, -1.1601638793945312, 0.8756103515625, 2.2829971313476562, 0.16822052001953125, 1.2688369750976562, 1.2368144989013672, -0.5356063842773438, -0.0573883056640625, -0.14627838134765625, 1.343109130859375, -0.36476898193359375, 0.06375503540039062, 0.5237445831298828, 1.0987739562988281, 0.3066425323486328, 1.078155517578125, 0.5668220520019531, 2.0943756103515625, 0.0224456787109375, 1.2306632995605469, 1.2786865234375, 0.3484954833984375, 0.4537353515625, 1.04193115234375, 0.4315948486328125, -1.3144149780273438, 0.015172958374023438, 1.382965087890625, 0.9051132202148438, -0.43706512451171875, 0.6106300354003906, 1.0088081359863281, -0.058811187744140625, 1.0729637145996094, 1.1043891906738281, 1.4390449523925781, 0.018430709838867188, 1.6007957458496094, -0.32266807556152344, 2.8158721923828125, 0.11502838134765625, -0.696746826171875, -0.0008106231689453125, 1.5678215026855469, 2.4808120727539062, -0.31157493591308594, -0.22613525390625, 0.2049102783203125, -0.8765792846679688, 0.2549858093261719, -0.2115020751953125, 1.0878791809082031, 0.544036865234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000086.npy"}
{"epoch": 0.13000755857898716, "step": 87, "batch_size": 64, "mean": 0.5250037908554077, "std": 1.158825397491455, "min": -3.3365097045898438, "p10": -0.6633056640624999, "median": 0.577998161315918, "p90": 1.7779548645019534, "max": 3.237335205078125, "pos_frac": 0.703125, "sample": [0.8325233459472656, 3.237335205078125, 1.4862899780273438, 0.026641845703125, 2.0617523193359375, 1.2050552368164062, -2.098724365234375, -1.3502426147460938, 1.42022705078125, 1.4493274688720703, -0.28131103515625, -0.5987043380737305, 0.7192745208740234, -0.5035781860351562, -1.8250732421875, 1.3742218017578125, 0.4327392578125, 1.7283248901367188, 0.9425621032714844, 1.352630615234375, -1.3265228271484375, 2.567962646484375, 1.3541393280029297, 0.13672256469726562, 1.6326522827148438, -3.3365097045898438, -0.6271820068359375, -0.188018798828125, 1.4841384887695312, -0.4340553283691406, -0.13578414916992188, -0.2931365966796875, 2.116823196411133, -0.6787872314453125, 0.36818695068359375, -0.4691619873046875, 0.5013809204101562, 0.3112030029296875, 0.5770092010498047, 0.5429725646972656, 0.16546630859375, 0.88372802734375, 1.2261810302734375, -1.210968017578125, 0.32928466796875, -0.57879638671875, 0.8760509490966797, 1.799224853515625, 0.7646160125732422, 0.8620758056640625, 1.1741485595703125, 1.388519287109375, 1.981048583984375, -0.41175079345703125, 0.5789871215820312, -0.025177001953125, 0.5236568450927734, 0.03582763671875, 1.45501708984375, 1.4541397094726562, 1.64825439453125, 0.7561302185058594, 0.3617401123046875, 1.8475341796875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000087.npy"}
{"epoch": 0.13151927437641722, "step": 88, "batch_size": 64, "mean": 0.564567506313324, "std": 1.0719166994094849, "min": -2.0467987060546875, "p10": -0.7581205368041991, "median": 0.502833366394043, "p90": 1.902365112304688, "max": 3.044097900390625, "pos_frac": 0.703125, "sample": [1.117095947265625, 0.4214038848876953, -0.5495929718017578, 1.1569099426269531, 2.78289794921875, 0.4644317626953125, 3.044097900390625, 0.3273277282714844, 1.3536529541015625, -0.11684417724609375, 1.7302360534667969, 1.4301071166992188, 1.0698356628417969, 0.8689746856689453, -0.8171463012695312, 0.827362060546875, 1.3145370483398438, -0.6734390258789062, -2.0467987060546875, 1.0351409912109375, 1.780965805053711, 0.30767822265625, 0.5412349700927734, -0.4149608612060547, -1.5351676940917969, 1.135711669921875, 1.6915740966796875, 2.5784912109375, -0.13986968994140625, 0.10440635681152344, -1.5435562133789062, -0.8326339721679688, 0.603607177734375, 0.9625244140625, 1.3534088134765625, -0.0752410888671875, 0.9042739868164062, 1.118988037109375, 2.123767852783203, -0.15779495239257812, 0.46154212951660156, 0.7068710327148438, 1.7058525085449219, 0.4364814758300781, 0.196624755859375, 0.40386962890625, -0.273681640625, 0.036651611328125, 2.290679931640625, 1.0796947479248047, 1.0024337768554688, 0.21288299560546875, 2.5766754150390625, -0.418914794921875, -0.2282562255859375, -0.7944126129150391, 0.6484832763671875, 0.0053386688232421875, 0.746368408203125, 1.9543933868408203, -0.20419692993164062, 0.3085479736328125, -0.5952968597412109, -1.373910903930664], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000088.npy"}
{"epoch": 0.1330309901738473, "step": 89, "batch_size": 64, "mean": 0.5347847938537598, "std": 1.0490516424179077, "min": -2.738861083984375, "p10": -0.41831188201904296, "median": 0.5082054138183594, "p90": 2.017968559265137, "max": 2.80255126953125, "pos_frac": 0.765625, "sample": [0.8591365814208984, 0.00131988525390625, 0.4702873229980469, 0.2022857666015625, 2.4546966552734375, 2.501476287841797, -0.07993316650390625, 0.012983322143554688, -0.009517669677734375, 2.2106246948242188, 1.2486343383789062, 1.3333969116210938, 0.562255859375, 0.956573486328125, 0.054386138916015625, 0.5775909423828125, 0.7959022521972656, -0.021270751953125, -0.12237167358398438, -1.743844985961914, 1.1128616333007812, -0.2222881317138672, 0.3925323486328125, 1.5135650634765625, 0.7946281433105469, 2.80255126953125, 0.9150314331054688, 0.21935653686523438, -0.4307727813720703, 0.7759857177734375, -0.2005767822265625, 0.596771240234375, 0.5838947296142578, 0.177001953125, 1.5204925537109375, 0.23344039916992188, 2.0387344360351562, 0.472198486328125, -2.738861083984375, -0.3892364501953125, 1.9695148468017578, 1.5878753662109375, 2.4095497131347656, 2.240589141845703, 0.5535449981689453, 0.05907630920410156, 0.4802360534667969, 1.6612091064453125, 1.6851081848144531, -1.6678695678710938, 0.04730224609375, 0.25201416015625, 0.38006591796875, 0.5361747741699219, 1.1802997589111328, -0.715484619140625, 0.6293659210205078, 0.00533294677734375, -1.2818279266357422, -1.2084007263183594, 0.567169189453125, -0.359130859375, 0.6137199401855469, 0.1688690185546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000089.npy"}
{"epoch": 0.1345427059712774, "step": 90, "batch_size": 64, "mean": 0.47241726517677307, "std": 1.128819227218628, "min": -3.3494873046875, "p10": -0.7542842864990235, "median": 0.4589672088623047, "p90": 1.489641571044922, "max": 4.926849365234375, "pos_frac": 0.71875, "sample": [-0.9575710296630859, 1.5577468872070312, -0.334228515625, 0.7088279724121094, 1.3562793731689453, 0.46222877502441406, 1.2537593841552734, 1.6688156127929688, 0.710418701171875, -3.3494873046875, 1.8836441040039062, 1.0224800109863281, 1.0739765167236328, 0.9008598327636719, 1.0958938598632812, -0.7316055297851562, -0.04254150390625, 1.472503662109375, -0.5884857177734375, 0.21118927001953125, 1.3465194702148438, 0.1730499267578125, 1.1053924560546875, 2.9745407104492188, 1.7724151611328125, 0.41905975341796875, -1.1248855590820312, 4.926849365234375, 1.0302658081054688, 0.8887481689453125, 0.9731712341308594, 0.35417938232421875, 0.11003684997558594, -0.3158416748046875, 0.6179542541503906, -0.5450363159179688, -0.029521942138671875, -1.4469375610351562, 0.35860443115234375, 0.8272743225097656, 0.5842704772949219, 0.2784576416015625, 1.28204345703125, 0.5075225830078125, 1.4969863891601562, 0.4557056427001953, 0.28665924072265625, 1.0525035858154297, 0.11438751220703125, 0.3075752258300781, -0.7640037536621094, 0.9156417846679688, 0.5958175659179688, -0.2551155090332031, 0.1371135711669922, 0.35935211181640625, -0.43475341796875, -1.5348167419433594, -0.48954010009765625, 1.1876945495605469, 0.24796295166015625, -1.0732574462890625, -0.22182655334472656, 1.4077816009521484], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000090.npy"}
{"epoch": 0.1360544217687075, "step": 91, "batch_size": 64, "mean": 0.712165355682373, "std": 1.062704086303711, "min": -2.3885116577148438, "p10": -0.304448127746582, "median": 0.5087203979492188, "p90": 2.0731964111328125, "max": 3.1903228759765625, "pos_frac": 0.765625, "sample": [0.92938232421875, 0.9919281005859375, 1.9363269805908203, -0.5634326934814453, 0.6941986083984375, 2.0795364379882812, -0.3315563201904297, 0.65423583984375, 1.971038818359375, 3.1903228759765625, 2.521575927734375, -0.2411956787109375, 1.5643081665039062, 0.20280075073242188, 0.2733592987060547, -0.17748260498046875, 1.50579833984375, 0.8312835693359375, 0.169647216796875, -0.72698974609375, 1.6195144653320312, 3.017913818359375, 0.10222625732421875, 0.5069732666015625, 2.0584030151367188, 1.2441749572753906, 0.8290290832519531, -0.03624916076660156, 0.4481964111328125, 1.0195236206054688, 0.47296142578125, 0.2600135803222656, 0.510467529296875, 0.46537017822265625, 1.501535415649414, 0.48860931396484375, -0.0990753173828125, 0.7133941650390625, 0.13593482971191406, -2.3885116577148438, 0.23859405517578125, -0.752227783203125, -0.14522552490234375, -1.955535888671875, 0.5889549255371094, 1.1878547668457031, -0.055217742919921875, 2.5444259643554688, 1.1960678100585938, 1.6575546264648438, 0.40699005126953125, -0.650970458984375, -0.17298507690429688, 0.1319866180419922, 1.7972068786621094, 2.7096595764160156, 0.1988983154296875, -0.021150588989257812, 2.108509063720703, 1.6698379516601562, 0.8370361328125, 0.18982505798339844, 1.21533203125, 0.30767250061035156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000091.npy"}
{"epoch": 0.13756613756613756, "step": 92, "batch_size": 64, "mean": 0.5885197520256042, "std": 1.1328461170196533, "min": -2.355548858642578, "p10": -0.8707275390624999, "median": 0.6434707641601562, "p90": 2.069784545898438, "max": 3.3881912231445312, "pos_frac": 0.671875, "sample": [0.662567138671875, -0.05157470703125, -1.3464813232421875, -0.12060737609863281, 0.17557525634765625, 0.7428207397460938, 0.06766891479492188, 0.7596054077148438, 0.05582427978515625, -0.6951141357421875, -0.92657470703125, 1.4466304779052734, 1.6454524993896484, -0.4295654296875, 2.170001983642578, 1.6096019744873047, 0.9882774353027344, -0.06009101867675781, 2.225811004638672, -0.7111358642578125, 0.560546875, -1.13604736328125, 1.7960624694824219, 0.5662841796875, 1.996490478515625, 0.2164478302001953, 0.8001327514648438, 1.0967636108398438, -0.32047271728515625, 0.8729705810546875, -1.16455078125, 2.1011962890625, 0.5501632690429688, -0.08475303649902344, 0.4050445556640625, -0.41039276123046875, -0.195037841796875, 1.6282882690429688, -0.472869873046875, -0.52044677734375, 1.123077392578125, 0.4667472839355469, 1.5474319458007812, 1.6011962890625, 0.8857002258300781, -1.4873123168945312, 1.5391998291015625, 1.7366790771484375, 3.3881912231445312, -1.27911376953125, -0.74041748046875, 2.6908721923828125, 1.0696792602539062, -0.0010318756103515625, 1.0005645751953125, -2.355548858642578, 1.657501220703125, 0.6243743896484375, 0.7914600372314453, 2.170440673828125, 2.1562252044677734, 0.2663383483886719, 1.0978164672851562, 1.2206802368164062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000092.npy"}
{"epoch": 0.13907785336356765, "step": 93, "batch_size": 64, "mean": 0.5529758930206299, "std": 1.0988205671310425, "min": -2.5490341186523438, "p10": -0.7586837768554687, "median": 0.5428810119628906, "p90": 1.833590888977051, "max": 3.8421554565429688, "pos_frac": 0.671875, "sample": [0.7032852172851562, 1.2029953002929688, 2.39324951171875, 0.7390861511230469, -2.5490341186523438, -0.17514801025390625, 0.9903106689453125, -0.19195175170898438, -0.14611053466796875, 1.7187538146972656, 2.1320152282714844, 1.8369274139404297, 3.8421554565429688, -1.3786964416503906, 0.4407615661621094, 0.5408706665039062, 0.9100723266601562, 1.3012657165527344, -0.18623733520507812, -0.8883323669433594, 0.9822998046875, 0.714019775390625, -0.5406646728515625, 0.8974838256835938, 0.12241935729980469, -0.29590797424316406, 1.8023147583007812, -0.26079559326171875, -0.66986083984375, -0.231903076171875, 0.75592041015625, -0.15474319458007812, 0.7413921356201172, 2.08331298828125, -0.7651290893554688, 0.544891357421875, -0.7436447143554688, 0.442047119140625, 0.2019195556640625, 0.27097320556640625, 0.8545265197753906, 0.7039299011230469, 1.3917903900146484, 1.3274822235107422, 0.5173263549804688, 1.0034103393554688, 0.4317169189453125, -0.15136337280273438, 2.74310302734375, 0.20739173889160156, 2.6999263763427734, 0.7270641326904297, 1.7368316650390625, 1.2295303344726562, -0.4301910400390625, -0.856201171875, 1.8258056640625, -0.6912918090820312, -1.1070022583007812, 0.29602813720703125, 1.4778900146484375, -0.7962265014648438, 0.2743034362792969, 0.8420906066894531], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000093.npy"}
{"epoch": 0.14058956916099774, "step": 94, "batch_size": 64, "mean": 0.6602364778518677, "std": 1.326074242591858, "min": -4.495250701904297, "p10": -0.8008960723876952, "median": 0.7379970550537109, "p90": 2.0005924224853517, "max": 4.49151611328125, "pos_frac": 0.765625, "sample": [1.7931442260742188, 0.36602020263671875, 2.4019317626953125, 2.360696792602539, 0.9648475646972656, -1.8324851989746094, -0.16158485412597656, 0.5262660980224609, 0.7197494506835938, 1.886016845703125, 0.2637481689453125, 2.8999557495117188, 0.0057373046875, -4.495250701904297, 0.8918952941894531, 0.8235702514648438, -1.1485099792480469, 0.101104736328125, 0.9560089111328125, 0.5325469970703125, 1.8627490997314453, 1.8379287719726562, 0.8613471984863281, 1.3117446899414062, 1.5103759765625, 0.2625141143798828, 0.3531455993652344, -0.7035026550292969, -1.31134033203125, 0.18097686767578125, 1.1021995544433594, -0.27567100524902344, 0.5694808959960938, -0.34857177734375, 0.9761466979980469, 0.9736175537109375, 0.8231277465820312, 2.0074920654296875, 0.7562446594238281, 4.49151611328125, 1.7520580291748047, 1.4940567016601562, -0.5130577087402344, 0.649017333984375, 1.9844932556152344, 1.2658157348632812, -0.4747314453125, 2.359619140625, 1.8167266845703125, 0.9855556488037109, 3.22039794921875, 0.4056110382080078, 0.6350479125976562, -0.5040626525878906, 0.9314002990722656, 0.5023975372314453, -0.8426361083984375, 1.2340431213378906, -0.5498046875, 1.1017017364501953, 0.32938194274902344, -1.3668289184570312, -1.2809600830078125, 0.022962570190429688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000094.npy"}
{"epoch": 0.1421012849584278, "step": 95, "batch_size": 64, "mean": 0.7608871459960938, "std": 1.0863547325134277, "min": -2.010589599609375, "p10": -0.7371252059936523, "median": 0.7781333923339844, "p90": 1.9012790679931644, "max": 3.6617584228515625, "pos_frac": 0.796875, "sample": [3.474853515625, -1.5019187927246094, 0.9848098754882812, 0.3684196472167969, 0.00685882568359375, 0.2083415985107422, 1.8062744140625, 0.8568077087402344, 0.9325103759765625, 1.8472404479980469, 0.6028518676757812, 0.9182052612304688, 1.8193206787109375, 0.3399467468261719, -0.8631820678710938, -0.02841949462890625, 0.361785888671875, 0.7455673217773438, 0.5545177459716797, 0.7151985168457031, 0.3379058837890625, 2.1310958862304688, -0.7252864837646484, 1.7841911315917969, -0.178863525390625, -0.74322509765625, 2.7376708984375, 0.737579345703125, 0.8241348266601562, -2.010589599609375, 0.810699462890625, 1.350982666015625, 0.4586753845214844, 1.5790252685546875, -0.7421989440917969, 0.17688751220703125, 1.9244384765625, 1.160430908203125, -0.9910736083984375, 0.34552001953125, 0.068572998046875, 0.15737152099609375, 0.82220458984375, 1.2232723236083984, -0.1290435791015625, 1.3345108032226562, 1.3349609375, -0.23372650146484375, 1.053537368774414, 2.9355926513671875, 1.3333969116210938, 2.1808624267578125, -0.7759971618652344, 0.8450088500976562, 1.5616455078125, 3.6617584228515625, 0.4017448425292969, 0.4416961669921875, 1.5992107391357422, 1.628692626953125, 0.1272430419921875, 1.168039321899414, 1.3550567626953125, -0.5168228149414062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000095.npy"}
{"epoch": 0.1436130007558579, "step": 96, "batch_size": 64, "mean": 0.7806137800216675, "std": 1.1617895364761353, "min": -2.0982589721679688, "p10": -0.6893920898437499, "median": 0.7964019775390625, "p90": 2.134159851074219, "max": 3.728363037109375, "pos_frac": 0.734375, "sample": [1.3169288635253906, -1.6222152709960938, 2.00469970703125, 0.4467315673828125, 1.6125030517578125, 0.3244476318359375, 0.740325927734375, -1.0379180908203125, 0.9654006958007812, 0.00293731689453125, 1.3296737670898438, -0.11921882629394531, 0.905792236328125, 2.8936920166015625, 1.3594741821289062, 2.029998779296875, -0.7881240844726562, -0.07057952880859375, 1.2452201843261719, -0.3585243225097656, 1.5953941345214844, 1.6355209350585938, 1.1595916748046875, -2.0982589721679688, 0.09876251220703125, 0.356231689453125, 1.4802703857421875, 0.36551666259765625, 0.745086669921875, -0.08558273315429688, 1.6615867614746094, 2.1642303466796875, 3.728363037109375, 1.9125823974609375, -0.7391395568847656, 0.81646728515625, -1.0040817260742188, 0.3566913604736328, 0.776336669921875, 2.063995361328125, 2.30987548828125, -0.39923667907714844, 1.4568367004394531, 0.06384658813476562, 2.91619873046875, 0.2584991455078125, 1.8209819793701172, -0.5354461669921875, 2.40460205078125, -0.2848167419433594, 1.7158126831054688, -0.5733146667480469, 0.8828067779541016, 1.9217147827148438, 2.267974853515625, 1.1043930053710938, 0.4746265411376953, 0.17066192626953125, -0.0814361572265625, 1.8734970092773438, 1.2015533447265625, 0.05928230285644531, -0.21692657470703125, -1.0235137939453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000096.npy"}
{"epoch": 0.14512471655328799, "step": 97, "batch_size": 64, "mean": 0.5719987154006958, "std": 1.2836002111434937, "min": -2.1820907592773438, "p10": -0.8188222885131835, "median": 0.34498023986816406, "p90": 1.86147689819336, "max": 5.847381591796875, "pos_frac": 0.671875, "sample": [-0.3141937255859375, 0.8480300903320312, -1.5807571411132812, 3.6734485626220703, 1.0277233123779297, 3.0159149169921875, -0.040557861328125, -1.4625244140625, 0.27298736572265625, -0.856536865234375, -1.0686321258544922, -0.2972869873046875, 0.989654541015625, 1.3702754974365234, -1.0913467407226562, -2.1820907592773438, -0.3312854766845703, 1.6354598999023438, 1.3398895263671875, 0.22731399536132812, 0.8784618377685547, 0.85296630859375, -0.1632232666015625, -0.09835052490234375, 0.13161468505859375, 0.2740936279296875, 0.30217742919921875, 1.0924758911132812, 1.0424232482910156, 0.2140655517578125, 0.9234619140625, -0.7297821044921875, 1.9283676147460938, 0.6749305725097656, 0.5064296722412109, 1.108551025390625, 2.9105148315429688, 0.3877830505371094, -0.4621257781982422, 1.7053985595703125, 1.1199264526367188, -0.17612075805664062, 5.847381591796875, 0.9953880310058594, 0.49364471435546875, -1.0838813781738281, 0.19713211059570312, 0.15569305419921875, 2.14019775390625, -0.66448974609375, -0.33353424072265625, 0.2429962158203125, 1.133209228515625, -0.3019866943359375, 1.9761428833007812, 1.5389175415039062, 1.7021198272705078, 0.17383956909179688, 0.7512416839599609, 1.5703048706054688, 1.2857093811035156, -0.16825103759765625, 0.08743667602539062, -0.7308216094970703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000097.npy"}
{"epoch": 0.14663643235071808, "step": 98, "batch_size": 64, "mean": 0.5008175373077393, "std": 0.9746960997581482, "min": -1.2800636291503906, "p10": -0.8194396972656249, "median": 0.521514892578125, "p90": 1.773215293884278, "max": 2.755401611328125, "pos_frac": 0.71875, "sample": [0.116973876953125, 0.7304172515869141, 0.6184787750244141, 1.4424591064453125, 0.96429443359375, 0.2587471008300781, 0.29869842529296875, 0.19168472290039062, 0.9332122802734375, 1.4942741394042969, -0.838226318359375, -0.46222686767578125, 1.5587158203125, 0.5033645629882812, 1.8472900390625, 0.8010025024414062, 0.24415206909179688, 2.7246360778808594, 0.7366847991943359, -0.2128448486328125, 0.8482475280761719, 2.2519912719726562, -0.775604248046875, 1.34783935546875, 0.041095733642578125, -0.33954620361328125, 0.006561279296875, 0.48157501220703125, -0.5554103851318359, -0.416839599609375, 1.1471328735351562, 0.677032470703125, 1.2042999267578125, 1.0507049560546875, -1.2800636291503906, 0.2690887451171875, 0.7937088012695312, 2.3386497497558594, 0.5748844146728516, 0.3457221984863281, 1.1307601928710938, 0.08559417724609375, -0.5713958740234375, 0.9590129852294922, -1.1215591430664062, -1.1808929443359375, 1.6003742218017578, 0.2246417999267578, -0.133209228515625, 2.2471656799316406, 0.13216400146484375, 2.1385955810546875, -0.31249237060546875, -1.0564289093017578, -1.2030029296875, 0.5424957275390625, 1.0290145874023438, -0.08699607849121094, 0.5396652221679688, 2.755401611328125, -0.09505844116210938, 1.0297317504882812, 0.6331748962402344, -1.1972923278808594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000098.npy"}
{"epoch": 0.14814814814814814, "step": 99, "batch_size": 64, "mean": 0.756588876247406, "std": 1.2407251596450806, "min": -2.270721435546875, "p10": -0.4907482147216797, "median": 0.6395168304443359, "p90": 2.1720941543579104, "max": 4.125923156738281, "pos_frac": 0.6875, "sample": [2.84271240234375, 3.384124755859375, 1.1136016845703125, 3.3822860717773438, 0.3953704833984375, 0.7308807373046875, -0.4518890380859375, -0.49268341064453125, 0.647247314453125, 4.125923156738281, 1.3729248046875, 2.1180801391601562, 1.9887638092041016, 2.3566513061523438, 0.5448513031005859, -0.41428375244140625, 0.32822418212890625, 0.4403724670410156, 0.40729522705078125, 1.9774799346923828, 1.3704071044921875, -0.330718994140625, -0.12892532348632812, 0.2140350341796875, -0.01080322265625, -0.04819488525390625, 1.729522705078125, -0.851654052734375, -0.4862327575683594, 1.4361991882324219, 0.28688812255859375, 1.048583984375, 0.01358795166015625, 0.6396102905273438, 0.6394233703613281, 0.5076675415039062, 1.2752914428710938, -1.345998764038086, 0.8418292999267578, 2.1485862731933594, 0.8965225219726562, -1.328582763671875, 2.182168960571289, 3.7298736572265625, -0.36833953857421875, 1.6255569458007812, 1.5432586669921875, -0.1597900390625, -2.270721435546875, 1.0805034637451172, 1.5629043579101562, 1.2564525604248047, 1.3268852233886719, 1.1119575500488281, -1.2048492431640625, -0.08113861083984375, -0.06521987915039062, 1.4759674072265625, 0.5065250396728516, -0.4989776611328125, 0.6517868041992188, 0.23909378051757812, -0.215667724609375, -0.3215217590332031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000099.npy"}
{"epoch": 0.14965986394557823, "step": 100, "batch_size": 64, "mean": 0.6382259130477905, "std": 1.5311015844345093, "min": -3.1363754272460938, "p10": -0.9863174438476562, "median": 0.5313320159912109, "p90": 2.419554519653322, "max": 5.883453369140625, "pos_frac": 0.6875, "sample": [1.18817138671875, -0.0281982421875, -1.0182647705078125, 1.4720840454101562, 1.253265380859375, -0.4994049072265625, 0.6422615051269531, 1.7354545593261719, 2.5703887939453125, 0.7906875610351562, 1.147918701171875, 1.1492691040039062, 1.3483505249023438, 1.9463729858398438, 1.1674385070800781, 3.3862991333007812, -0.911773681640625, 0.17099380493164062, 0.4865875244140625, 0.39556884765625, 0.5642051696777344, 0.20843505859375, -1.5420455932617188, 0.06711006164550781, -0.3985404968261719, 1.4239959716796875, 0.11170768737792969, 1.3715286254882812, 0.4984588623046875, -1.61993408203125, 0.1955413818359375, -1.35443115234375, -0.845062255859375, -1.3714523315429688, -0.3504314422607422, -0.6255340576171875, 0.825836181640625, -0.20927047729492188, -3.1363754272460938, 3.3178176879882812, -0.8868408203125, 3.6344757080078125, -0.3326835632324219, 0.3854522705078125, -0.06608963012695312, 2.067607879638672, 2.0653457641601562, -0.12131118774414062, 0.3426399230957031, -0.8043403625488281, 3.4904632568359375, 1.839263916015625, 1.2154006958007812, 0.8416690826416016, 0.220245361328125, 1.2450084686279297, 0.14522552490234375, 2.7265472412109375, 1.9161224365234375, 1.0478591918945312, 5.883453369140625, 0.939361572265625, -3.06988525390625, 0.596435546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000100.npy"}
{"epoch": 0.15117157974300832, "step": 101, "batch_size": 64, "mean": 0.9371322393417358, "std": 1.197127103805542, "min": -1.1471328735351562, "p10": -0.558612060546875, "median": 0.8388223648071289, "p90": 2.747290420532227, "max": 4.294227600097656, "pos_frac": 0.765625, "sample": [0.350067138671875, 0.86114501953125, 0.28725433349609375, -0.5936279296875, 2.8822021484375, 2.2592926025390625, 1.3435897827148438, 0.5207748413085938, -0.2731475830078125, 1.0542144775390625, -0.3659248352050781, 1.78619384765625, -0.5707244873046875, 0.2847862243652344, 2.9659805297851562, -0.29253387451171875, 2.0924224853515625, 0.9904632568359375, -0.31260108947753906, -1.1471328735351562, 1.925750732421875, 0.43701934814453125, 1.5162811279296875, 0.4215888977050781, 0.730926513671875, 1.1197853088378906, 2.0829315185546875, 2.9351882934570312, 0.49753570556640625, 2.7546043395996094, 1.21636962890625, 2.730224609375, 2.7695446014404297, 0.1923370361328125, -0.7511825561523438, -1.1273193359375, 0.9409675598144531, -0.6815109252929688, -0.19144248962402344, -0.10870361328125, 0.31375885009765625, 1.9287948608398438, 0.828948974609375, 0.4919929504394531, 2.5655059814453125, 0.7561264038085938, 0.8909759521484375, -0.5303497314453125, 0.20575714111328125, 0.8486957550048828, 3.3010330200195312, 0.3293724060058594, 2.3859825134277344, 1.706888198852539, 0.38629150390625, -0.3527374267578125, 1.2359771728515625, 0.9029388427734375, -0.8807907104492188, 1.1587600708007812, 1.74676513671875, 1.351959228515625, 4.294227600097656, 0.5759983062744141], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000101.npy"}
{"epoch": 0.15268329554043839, "step": 102, "batch_size": 64, "mean": 0.6220629215240479, "std": 1.4308024644851685, "min": -2.5186233520507812, "p10": -0.9946138381958007, "median": 0.4859046936035156, "p90": 2.258677291870118, "max": 4.18511962890625, "pos_frac": 0.65625, "sample": [2.1385650634765625, -0.095458984375, -0.3069877624511719, 1.158294677734375, -2.31207275390625, -0.028623580932617188, -0.5381240844726562, 1.2510833740234375, -2.5186233520507812, 0.17694854736328125, 1.4979095458984375, 0.46909332275390625, -0.17372894287109375, 0.25283050537109375, 1.77545166015625, -0.6740493774414062, -1.5317630767822266, 3.6260223388671875, -0.4083099365234375, 1.8071212768554688, -1.005178451538086, 0.3694133758544922, 0.8916168212890625, 0.6789932250976562, 1.649271011352539, 1.2242813110351562, 1.5225105285644531, 1.7901649475097656, -0.9225921630859375, 0.5920581817626953, 2.0428085327148438, 2.3101539611816406, -0.29697418212890625, 1.4612236022949219, 0.03763580322265625, 0.4739837646484375, 0.8884010314941406, 3.6793746948242188, 3.231292724609375, -0.4316730499267578, 0.19962120056152344, 3.4466552734375, -1.4716873168945312, 0.833953857421875, -0.9699630737304688, 0.5436782836914062, 0.49782562255859375, 0.3228607177734375, 0.7067813873291016, 0.30298614501953125, 1.645925521850586, 1.7610015869140625, 0.01924896240234375, -1.083648681640625, -0.2834815979003906, -0.5741500854492188, -0.3528594970703125, 4.18511962890625, -0.38457489013671875, -2.2069854736328125, 2.0184555053710938, 1.8747100830078125, 2.423137664794922, 0.6050701141357422], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000102.npy"}
{"epoch": 0.15419501133786848, "step": 103, "batch_size": 64, "mean": 0.7298538088798523, "std": 1.3547868728637695, "min": -3.0872802734375, "p10": -0.4660758972167969, "median": 0.5379066467285156, "p90": 2.304262924194336, "max": 4.8697662353515625, "pos_frac": 0.703125, "sample": [-0.327392578125, 1.004913330078125, 2.0150604248046875, -1.6595230102539062, 1.717041015625, 0.47957611083984375, 1.2484054565429688, 0.3017005920410156, 0.7483634948730469, 0.9891204833984375, -0.31458282470703125, 0.903411865234375, 0.45084381103515625, 0.3047027587890625, -0.47263336181640625, 1.7250442504882812, 2.0123958587646484, 2.3116455078125, -3.0872802734375, 2.8450355529785156, -0.1666717529296875, 0.2988929748535156, 0.4444160461425781, 1.4701194763183594, 0.8836174011230469, -0.3133544921875, -0.3012275695800781, 1.925201416015625, -1.1006088256835938, 0.9159126281738281, 0.33434295654296875, 2.546478271484375, 2.840301513671875, 0.13328170776367188, -0.103851318359375, 0.5962371826171875, -0.3470458984375, 0.789520263671875, -0.15045928955078125, 0.09938812255859375, 1.9808349609375, 0.33548927307128906, 0.9375572204589844, 0.7864151000976562, 1.571817398071289, -0.126312255859375, 2.287036895751953, 0.06652450561523438, 2.790578842163086, -0.5853843688964844, 4.8697662353515625, 2.2517547607421875, -1.7545585632324219, 0.3085289001464844, -0.7940826416015625, -0.23146820068359375, 0.23448944091796875, -0.2536506652832031, -0.450775146484375, 0.70849609375, 0.9784088134765625, 1.0891799926757812, 4.7903594970703125, 0.9292984008789062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000103.npy"}
{"epoch": 0.15570672713529857, "step": 104, "batch_size": 64, "mean": 0.7216675877571106, "std": 1.4268146753311157, "min": -2.2168655395507812, "p10": -0.84752140045166, "median": 0.46981143951416016, "p90": 2.486970138549805, "max": 5.26251220703125, "pos_frac": 0.703125, "sample": [0.3031005859375, 1.0138702392578125, 1.088653564453125, 0.3003578186035156, -1.5097122192382812, 0.40972137451171875, 0.719146728515625, 0.8731231689453125, 1.6577682495117188, -0.4508056640625, 0.10140609741210938, -1.2542610168457031, 5.26251220703125, 1.42840576171875, -1.6098175048828125, -0.326416015625, 2.51983642578125, 0.40395545959472656, 0.4012031555175781, 0.7764663696289062, -0.0398712158203125, -1.973541259765625, 2.36688232421875, 2.0595550537109375, -0.0208587646484375, 0.18802642822265625, 0.5300788879394531, -0.6582565307617188, 2.633413314819336, 0.08124542236328125, 0.5260448455810547, -0.21782493591308594, -0.4740161895751953, 0.44300270080566406, 1.5282554626464844, 1.40899658203125, 0.3294677734375, -0.7445163726806641, 2.4067916870117188, 0.49662017822265625, 1.075042724609375, 0.2070636749267578, 4.0960693359375, -0.3620624542236328, 1.5591201782226562, 2.4102821350097656, 1.872955322265625, 0.3565216064453125, -2.2168655395507812, -0.6920089721679688, 1.6310234069824219, 0.9562892913818359, 1.9061470031738281, 2.3049659729003906, 0.5688152313232422, 0.9987831115722656, -0.02089691162109375, -0.3461456298828125, 0.34673309326171875, -1.5778045654296875, -0.8916664123535156, 2.8528518676757812, 2.843017578125, 3.3304824829101562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000104.npy"}
{"epoch": 0.15721844293272866, "step": 105, "batch_size": 64, "mean": 0.616238534450531, "std": 1.4436521530151367, "min": -2.5247955322265625, "p10": -1.0690422058105467, "median": 0.5094919204711914, "p90": 2.2856979370117188, "max": 4.268014907836914, "pos_frac": 0.65625, "sample": [0.8266277313232422, 0.95684814453125, 0.9923095703125, 0.7658500671386719, 0.6543350219726562, 1.4886589050292969, -2.5247955322265625, 4.030120849609375, -1.558319091796875, 0.9821605682373047, -0.5228042602539062, 2.1760311126708984, 1.7276172637939453, 2.2657241821289062, 0.23067092895507812, 3.5652542114257812, -0.09094429016113281, -2.0806655883789062, -0.24815940856933594, 2.5504608154296875, 0.28855133056640625, 0.07004547119140625, 0.3296966552734375, -1.1309661865234375, -0.448638916015625, -0.2810821533203125, 1.3113212585449219, 2.9272003173828125, 0.7562255859375, -0.5190887451171875, -0.19769287109375, 1.0653400421142578, -0.9245529174804688, 0.26322174072265625, -0.8353729248046875, 1.9359130859375, 0.5444183349609375, 0.9748687744140625, -0.7095260620117188, 0.8257980346679688, -0.13330841064453125, -0.07741165161132812, -0.5479049682617188, 1.8926448822021484, 1.79498291015625, 2.2250709533691406, -2.2124481201171875, 0.0874176025390625, -1.483184814453125, -0.18531036376953125, 0.3143768310546875, 0.8507575988769531, 0.4745655059814453, 3.4899215698242188, 2.1260223388671875, 0.32110595703125, 4.268014907836914, 1.2012939453125, -0.7614593505859375, 1.436065673828125, -1.3750152587890625, 2.2942581176757812, 0.8035736083984375, 0.20257568359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000105.npy"}
{"epoch": 0.15873015873015872, "step": 106, "batch_size": 64, "mean": 0.6594526171684265, "std": 1.389833688735962, "min": -3.7085418701171875, "p10": -0.9515304565429685, "median": 0.865966796875, "p90": 2.273525428771973, "max": 3.770660400390625, "pos_frac": 0.71875, "sample": [0.8693923950195312, 1.7216949462890625, 1.134500503540039, 1.4297828674316406, -0.2017669677734375, -0.6849899291992188, 3.770660400390625, 1.7044830322265625, 0.10674285888671875, 1.1623802185058594, 1.0521163940429688, 1.3034324645996094, 3.2777099609375, 1.1170425415039062, 0.15073204040527344, -0.6946983337402344, 0.7259597778320312, 0.6021881103515625, 0.2083282470703125, -1.2849884033203125, 0.06514739990234375, 0.57330322265625, -0.72845458984375, 0.27715301513671875, -0.11728382110595703, -1.2910308837890625, -0.26897621154785156, -1.4897613525390625, 1.0680084228515625, 2.67694091796875, -0.7465591430664062, 0.7645282745361328, -0.434417724609375, 0.8625411987304688, 1.9480743408203125, 2.5533065795898438, 2.194814682006836, -0.5488204956054688, 2.3072586059570312, 1.60894775390625, 1.1995048522949219, 0.9427719116210938, 3.3065567016601562, 1.0804786682128906, -1.0393753051757812, 0.1324310302734375, -0.03604888916015625, 1.4165496826171875, -3.7085418701171875, 0.6908245086669922, -1.969390869140625, 0.9735107421875, 2.0288619995117188, 2.354684829711914, 1.5344619750976562, 0.5554122924804688, 0.09017181396484375, 1.8006420135498047, -0.29217529296875, 1.9679107666015625, -2.978851318359375, 1.0834007263183594, 1.0941619873046875, 1.2315902709960938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000106.npy"}
{"epoch": 0.1602418745275888, "step": 107, "batch_size": 64, "mean": 0.4087446928024292, "std": 1.5466645956039429, "min": -3.1146087646484375, "p10": -1.5494735717773436, "median": 0.3945579528808594, "p90": 2.237779808044434, "max": 4.4265289306640625, "pos_frac": 0.671875, "sample": [0.3078784942626953, -2.7254180908203125, 0.8974761962890625, 3.725200653076172, 1.4629745483398438, 0.9638290405273438, -1.716888427734375, 0.9340667724609375, -1.6875152587890625, 0.9442672729492188, -3.1146087646484375, 1.5011634826660156, 1.2825279235839844, 0.054595947265625, 0.8693618774414062, -1.247903823852539, 0.0628204345703125, 0.1805419921875, 0.08785629272460938, -1.317230224609375, -0.47196197509765625, 4.4265289306640625, 3.8747940063476562, -0.12477684020996094, 1.9484443664550781, -2.6156463623046875, 2.2855777740478516, -1.3967437744140625, -0.16350555419921875, 0.15830230712890625, 0.6602439880371094, -1.1507797241210938, 0.4939117431640625, 0.8149032592773438, 0.24080657958984375, 0.806640625, 3.07989501953125, 0.43126678466796875, 1.235443115234375, 3.290191650390625, 1.133941650390625, 2.126251220703125, -0.9760513305664062, -1.0199661254882812, 1.3146648406982422, 1.1635894775390625, 0.4813804626464844, 0.1064300537109375, 0.5203189849853516, -1.61492919921875, 3.0150146484375, 1.9714698791503906, 0.32842254638671875, -1.005208969116211, -0.5329151153564453, 0.35784912109375, -0.268463134765625, 0.9626617431640625, 0.7136344909667969, -0.6383743286132812, -0.7454910278320312, -1.77880859375, 0.2988739013671875, 0.9568328857421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000107.npy"}
{"epoch": 0.1617535903250189, "step": 108, "batch_size": 64, "mean": 0.7920923233032227, "std": 1.4767717123031616, "min": -2.9554061889648438, "p10": -1.02684326171875, "median": 0.7920570373535156, "p90": 2.917013549804688, "max": 3.76239013671875, "pos_frac": 0.640625, "sample": [1.2930145263671875, 1.6900177001953125, 2.7883453369140625, 2.043060302734375, -0.19921112060546875, 2.8657760620117188, 1.75, 1.9864120483398438, 3.289886474609375, -1.0560302734375, 0.34228515625, 3.4951400756835938, 1.3871612548828125, -0.2303905487060547, 1.522613525390625, 0.6322402954101562, -0.89483642578125, 0.03681182861328125, 0.5077018737792969, 1.6730232238769531, 1.6905364990234375, 0.23085403442382812, -0.1794261932373047, -1.3023223876953125, 1.6989364624023438, -0.78411865234375, 1.5934524536132812, -1.55572509765625, 3.76239013671875, 0.3235015869140625, 0.2461395263671875, -0.6472206115722656, 1.7191390991210938, -0.22600173950195312, 1.4384002685546875, 3.6393775939941406, -0.5812358856201172, 0.11262130737304688, -2.9554061889648438, 1.6856307983398438, -1.1690444946289062, 3.220703125, 0.4735984802246094, -0.3463478088378906, -0.5002479553222656, 1.4190254211425781, -0.5795516967773438, 2.9389724731445312, -0.5311737060546875, -1.344818115234375, -1.3268394470214844, -0.958740234375, -0.0946044921875, 2.0161666870117188, 0.951873779296875, 1.3537750244140625, 1.1857261657714844, 3.16436767578125, 1.4379997253417969, 1.8002700805664062, -0.21671676635742188, 1.9984512329101562, -0.200103759765625, 1.1686248779296875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000108.npy"}
{"epoch": 0.16326530612244897, "step": 109, "batch_size": 64, "mean": 0.9786243438720703, "std": 1.2995643615722656, "min": -0.805633544921875, "p10": -0.5145942687988281, "median": 0.7186470031738281, "p90": 2.5641550064086913, "max": 5.574798583984375, "pos_frac": 0.78125, "sample": [0.853546142578125, 2.274688720703125, 3.9766845703125, -0.069854736328125, 1.6922779083251953, 1.2414665222167969, 2.8129539489746094, 1.3396377563476562, 0.9562511444091797, 1.5175552368164062, 0.5448703765869141, 0.2853546142578125, 2.540935516357422, 1.8945236206054688, 0.098236083984375, 0.61798095703125, 1.5235271453857422, 0.3192119598388672, 0.08945846557617188, 1.1308765411376953, 1.2946701049804688, 2.574106216430664, -0.704925537109375, -0.004876136779785156, 0.83380126953125, 4.4903411865234375, -0.5888442993164062, 0.21511459350585938, 0.4891185760498047, 5.574798583984375, 0.49080848693847656, 0.6484832763671875, -0.23759078979492188, -0.5111923217773438, 0.6966514587402344, 0.7864627838134766, -0.7976665496826172, 0.7132110595703125, 0.35987091064453125, 1.6869277954101562, -0.6392745971679688, 0.4508953094482422, 0.2472667694091797, -0.1537933349609375, -0.08945274353027344, -0.51605224609375, 0.6772613525390625, 0.7240829467773438, -0.7906570434570312, -0.805633544921875, 0.7470798492431641, 0.8278961181640625, 0.5838546752929688, 3.167449951171875, 1.576681137084961, 2.359619140625, 3.8954830169677734, 1.6931228637695312, 1.082427978515625, 1.5476570129394531, 0.3028564453125, -0.41461944580078125, 0.85699462890625, 1.6513595581054688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000109.npy"}
{"epoch": 0.16477702191987906, "step": 110, "batch_size": 64, "mean": 0.48471444845199585, "std": 1.4259756803512573, "min": -3.3893966674804688, "p10": -1.2668205261230467, "median": 0.4790468215942383, "p90": 2.1248411178588875, "max": 4.114593505859375, "pos_frac": 0.703125, "sample": [0.1694793701171875, 0.522064208984375, -1.6521453857421875, 0.28878021240234375, 0.2515106201171875, 0.7422599792480469, 1.7591781616210938, 2.2304420471191406, 0.2987060546875, 2.7698822021484375, 1.7995491027832031, 1.1349506378173828, 0.329925537109375, 2.3271865844726562, -0.5134963989257812, -0.9078750610351562, -0.11561965942382812, 3.2686614990234375, -0.497650146484375, -3.036710739135742, 0.43602943420410156, 0.9125442504882812, 0.2095184326171875, 1.0826034545898438, -0.003276824951171875, 0.6502304077148438, 0.7060489654541016, -0.7922286987304688, 0.21892166137695312, -1.6451263427734375, 1.2609710693359375, 1.2070236206054688, 1.279296875, -1.0407028198242188, -1.0331306457519531, 0.29055023193359375, -1.345489501953125, 3.0688629150390625, 0.15120315551757812, 1.2489128112792969, 1.119140625, 0.531036376953125, 0.19500732421875, 0.19892120361328125, -0.17177963256835938, 1.1756362915039062, 1.6707000732421875, -1.0832595825195312, -1.0074462890625, -3.3893966674804688, 0.7628631591796875, 1.2003326416015625, 0.38059234619140625, -1.6017532348632812, 0.6808013916015625, 3.95721435546875, 1.5848236083984375, 1.878438949584961, 0.6466159820556641, 4.114593505859375, -1.3692817687988281, 1.1934967041015625, 0.85076904296875, -0.5281829833984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000110.npy"}
{"epoch": 0.16628873771730915, "step": 111, "batch_size": 64, "mean": 0.7781772613525391, "std": 1.5368856191635132, "min": -2.9588623046875, "p10": -1.168387222290039, "median": 0.984954833984375, "p90": 2.8503063201904295, "max": 4.2120361328125, "pos_frac": 0.6875, "sample": [-0.11321830749511719, -1.8479766845703125, -0.37258148193359375, 4.2120361328125, 1.5846290588378906, 1.0686492919921875, -1.1650199890136719, 1.6775588989257812, 2.1762428283691406, 1.082712173461914, 1.9473800659179688, 1.7319869995117188, -0.9704360961914062, 0.3779754638671875, -0.7658538818359375, 1.2373733520507812, 0.05951881408691406, 1.2329769134521484, -0.7438888549804688, 2.854633331298828, 0.0161590576171875, 0.5167083740234375, 1.3905868530273438, 2.883697509765625, 1.566741943359375, 1.6890602111816406, 1.0584182739257812, 3.1435928344726562, 0.7896213531494141, 0.96917724609375, -0.1445941925048828, -1.3777294158935547, -2.9588623046875, -1.4808425903320312, 0.3485527038574219, 1.0419921875, 0.8558254241943359, 2.9601287841796875, -0.6054744720458984, 2.8089370727539062, 2.4523544311523438, 1.9365463256835938, 2.8402099609375, -0.4212818145751953, 1.2967987060546875, 0.13088035583496094, 1.1519603729248047, 4.024818420410156, -1.169830322265625, 1.2357158660888672, 2.7611618041992188, 3.1536941528320312, 1.6391334533691406, -0.2683448791503906, -2.6383819580078125, 0.3753166198730469, -1.4244003295898438, 0.44211578369140625, -0.11762619018554688, 1.634307861328125, 1.000732421875, -0.4208526611328125, 0.2966041564941406, -0.8446807861328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000111.npy"}
{"epoch": 0.16780045351473924, "step": 112, "batch_size": 64, "mean": 0.9384238719940186, "std": 1.4297243356704712, "min": -3.2329177856445312, "p10": -0.8533596038818358, "median": 0.9981727600097656, "p90": 2.7493103027343753, "max": 3.8387222290039062, "pos_frac": 0.765625, "sample": [2.758026123046875, 3.823638916015625, 3.3659210205078125, 1.3168678283691406, -0.26854896545410156, 1.4063262939453125, 3.8387222290039062, 1.2388153076171875, 1.8917083740234375, 2.8610153198242188, 0.43114471435546875, 0.2316741943359375, -1.1705131530761719, 1.32354736328125, 0.1285076141357422, -1.55426025390625, 0.9290618896484375, 0.28208160400390625, -2.6140975952148438, 0.26760101318359375, 1.4649581909179688, -0.194183349609375, 2.6322898864746094, 2.1487503051757812, 0.6397323608398438, 2.0681190490722656, -0.0580902099609375, 0.7483482360839844, 1.42938232421875, 1.056356430053711, 2.728973388671875, 0.34033203125, -0.5771427154541016, 0.25576210021972656, 1.96075439453125, 0.9652099609375, 0.9599781036376953, 0.46176910400390625, 3.6267166137695312, 0.23074722290039062, -0.43153953552246094, 1.1265869140625, 1.8827934265136719, 0.7962398529052734, -0.9624481201171875, 3.0023193359375, 2.0211944580078125, 2.006988525390625, -0.7616004943847656, 0.2533760070800781, -3.2329177856445312, 2.5622100830078125, -0.04218292236328125, 1.6789665222167969, 1.6922454833984375, 0.729217529296875, -0.3288612365722656, -1.1271800994873047, 1.1974945068359375, 1.532440185546875, 1.3885269165039062, -0.8926849365234375, 1.5608024597167969, 1.0311355590820312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000112.npy"}
{"epoch": 0.1693121693121693, "step": 113, "batch_size": 64, "mean": 0.7529294490814209, "std": 1.503631353378296, "min": -4.6584014892578125, "p10": -0.9128913879394531, "median": 0.7067527770996094, "p90": 2.6921472549438485, "max": 3.6806106567382812, "pos_frac": 0.71875, "sample": [-0.42145538330078125, -1.4913711547851562, -0.2179107666015625, 0.12452888488769531, -0.7537384033203125, 3.3004722595214844, 0.8176460266113281, -0.059600830078125, 0.6939544677734375, 2.7633495330810547, 0.4467201232910156, 0.2629203796386719, -0.46002197265625, 2.4589614868164062, -1.4829559326171875, 0.7195510864257812, 0.8149776458740234, 3.6806106567382812, -0.7281112670898438, -2.0433006286621094, 0.39023590087890625, 0.5555038452148438, 2.1717567443847656, -1.3542022705078125, -0.8246307373046875, 0.9004592895507812, 2.4135971069335938, 1.7514324188232422, 0.5899276733398438, 0.07546424865722656, 2.402860641479492, 2.44573974609375, -0.2548637390136719, -0.9433479309082031, 1.6149368286132812, 3.250917434692383, -4.6584014892578125, 1.0513362884521484, 1.45050048828125, -0.3205413818359375, 0.6455211639404297, 3.077024459838867, 1.051544189453125, 2.5260086059570312, -0.9214286804199219, 1.2326507568359375, 0.9431915283203125, 0.23740196228027344, -0.8929710388183594, 2.517974853515625, 0.61114501953125, 1.041656494140625, 0.9561424255371094, 1.7543182373046875, 2.8666839599609375, -0.31861305236816406, 2.0964431762695312, 0.07561111450195312, 1.3223114013671875, 0.29254913330078125, 1.5707015991210938, 0.8272285461425781, 0.31955718994140625, 3.2209243774414062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000113.npy"}
{"epoch": 0.1708238851095994, "step": 114, "batch_size": 64, "mean": 1.214421033859253, "std": 1.567624807357788, "min": -1.4183883666992188, "p10": -0.7338785171508788, "median": 0.9099521636962891, "p90": 3.23690299987793, "max": 5.060150146484375, "pos_frac": 0.765625, "sample": [-0.0946502685546875, 0.9996109008789062, 0.6287803649902344, 0.8344688415527344, 2.415834426879883, 0.461456298828125, 1.6841316223144531, 0.16446685791015625, 1.9819412231445312, -1.1944961547851562, -0.42922210693359375, 2.0976486206054688, 1.8147697448730469, 0.535980224609375, 0.4137611389160156, -1.0792922973632812, 2.0378646850585938, 0.5461940765380859, 0.039031982421875, 0.6625213623046875, 1.7433204650878906, 0.3538360595703125, 4.5628204345703125, -0.08776283264160156, 0.5793495178222656, 2.6031723022460938, 1.3485889434814453, 2.6058902740478516, -0.6118011474609375, 2.4362945556640625, 0.40349578857421875, 0.9854354858398438, 3.5919113159179688, 2.3113250732421875, 2.619537353515625, 0.294830322265625, 2.0352020263671875, 2.5831985473632812, 3.1613845825195312, -1.4183883666992188, 5.060150146484375, 2.9857826232910156, 1.1822662353515625, -0.2245044708251953, -0.5300216674804688, 0.2696247100830078, 4.128173828125, 1.3351097106933594, -0.7588787078857422, -0.14023590087890625, -0.8662261962890625, 3.81011962890625, 1.6298789978027344, 2.81378173828125, 3.269268035888672, 3.119140625, 0.7451076507568359, 0.13529014587402344, -1.1394195556640625, -1.3687267303466797, 4.1006927490234375, -0.6755447387695312, 1.4657974243164062, 0.7538738250732422], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000114.npy"}
{"epoch": 0.17233560090702948, "step": 115, "batch_size": 64, "mean": 0.8903037309646606, "std": 1.5158556699752808, "min": -2.183551788330078, "p10": -0.7686244964599609, "median": 0.6728382110595703, "p90": 2.6591732025146486, "max": 5.907804489135742, "pos_frac": 0.734375, "sample": [-0.4680519104003906, 1.0291786193847656, 1.75885009765625, -2.0333709716796875, 3.3323822021484375, 1.6028690338134766, 1.046661376953125, 1.7770195007324219, 0.5383987426757812, 5.67742919921875, 1.0792350769042969, -0.7568511962890625, 5.907804489135742, -0.7736701965332031, 0.6120681762695312, 1.0105056762695312, 0.2684669494628906, 0.6203460693359375, 0.4645538330078125, 0.6292228698730469, 2.387805938720703, 0.9864120483398438, -0.918701171875, -0.290069580078125, -1.2546405792236328, 1.3953056335449219, -0.8616180419921875, 0.723114013671875, -0.7013778686523438, 2.6620216369628906, 0.5661163330078125, 0.3609619140625, 0.5146541595458984, 1.192169189453125, 0.7783279418945312, 2.65252685546875, -0.01207733154296875, 1.7403278350830078, 1.1591262817382812, 2.01544189453125, 2.61065673828125, -2.183551788330078, 2.234893798828125, 0.4005126953125, 2.8981170654296875, 0.8230361938476562, 1.88140869140625, 1.5231304168701172, -0.22095108032226562, 0.64178466796875, 0.3592205047607422, 0.5500392913818359, 3.6337528228759766, 0.02410888671875, -0.12079429626464844, -0.2808666229248047, -0.5885696411132812, -0.7379703521728516, -1.3032913208007812, 0.7038917541503906, 1.3649044036865234, 0.5401725769042969, 0.7636871337890625, 3.0432395935058594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000115.npy"}
{"epoch": 0.17384731670445955, "step": 116, "batch_size": 64, "mean": 0.99721360206604, "std": 1.2959531545639038, "min": -1.9451370239257812, "p10": -0.3200149536132812, "median": 0.7616291046142578, "p90": 2.8025672912597654, "max": 3.5966415405273438, "pos_frac": 0.765625, "sample": [1.6831626892089844, 0.2062530517578125, 0.6816902160644531, 3.5966415405273438, 1.9460906982421875, -0.28711700439453125, -0.6463699340820312, 0.6718273162841797, 1.685953140258789, 2.4816131591796875, 0.9545669555664062, -0.10609626770019531, 0.4272918701171875, 1.80096435546875, 0.5702877044677734, -0.24054336547851562, 0.26239776611328125, 0.02455902099609375, 2.5367393493652344, -0.2218475341796875, -1.9451370239257812, 2.0474395751953125, 2.74591064453125, 0.28845977783203125, -0.11915016174316406, 1.306304931640625, 0.9404983520507812, 3.3873558044433594, 2.29071044921875, 1.20758056640625, -0.5590782165527344, 0.012889862060546875, -0.2771339416503906, 0.5642967224121094, 3.2574462890625, 0.8415679931640625, 2.6685409545898438, 3.2556381225585938, 0.2584381103515625, 0.02460479736328125, 0.14437103271484375, 1.0298805236816406, 2.6352157592773438, 1.3493881225585938, 2.844390869140625, 1.895263671875, 2.792083740234375, -0.33411407470703125, -0.1899871826171875, 1.605224609375, 0.11343574523925781, 1.9499969482421875, 0.20522308349609375, -1.06146240234375, 0.870025634765625, 0.5648345947265625, -0.22935867309570312, 1.9544677734375, 0.12263870239257812, 1.3206787109375, 2.8070602416992188, -0.6987895965576172, 3.155942916870117, -1.2499866485595703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000116.npy"}
{"epoch": 0.17535903250188964, "step": 117, "batch_size": 64, "mean": 0.8635537624359131, "std": 1.538791537284851, "min": -4.595512390136719, "p10": -0.8554899215698242, "median": 0.7776031494140625, "p90": 2.6042762756347657, "max": 4.520347595214844, "pos_frac": 0.71875, "sample": [1.4564056396484375, 2.942464828491211, 0.2490386962890625, 0.47574615478515625, -0.0223846435546875, 1.5983524322509766, 1.4916934967041016, 1.3517475128173828, 0.6277923583984375, 1.0026988983154297, -0.058624267578125, -0.7344131469726562, 2.59124755859375, -0.291839599609375, 0.4136810302734375, 2.1618576049804688, -0.1102447509765625, 1.17523193359375, 1.9593734741210938, 2.60748291015625, -0.7637252807617188, 1.3130760192871094, 0.89105224609375, 3.1220664978027344, -0.6604785919189453, 0.183319091796875, -0.9057464599609375, 2.1466522216796875, 0.27707862854003906, 4.520347595214844, 0.5540046691894531, 3.8461837768554688, 2.435028076171875, -0.8558750152587891, 1.347412109375, 1.9966087341308594, 2.139638900756836, 2.952892303466797, 0.8022384643554688, 3.0863571166992188, 0.6695785522460938, -0.35192108154296875, -0.6708602905273438, 2.046741485595703, -0.5320510864257812, -4.595512390136719, 0.9401321411132812, 0.3592720031738281, 2.5967941284179688, 0.7417526245117188, 0.2512397766113281, 1.70648193359375, 1.5023956298828125, -1.4022979736328125, 2.5066070556640625, -1.8797836303710938, -1.094827651977539, -0.8545913696289062, -1.550018310546875, 0.6224288940429688, 2.588653564453125, 1.2938156127929688, 0.3050060272216797, 0.7529678344726562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000117.npy"}
{"epoch": 0.17687074829931973, "step": 118, "batch_size": 64, "mean": 1.058354377746582, "std": 1.4532575607299805, "min": -2.5996036529541016, "p10": -0.7449403762817383, "median": 1.195648193359375, "p90": 3.0846208572387694, "max": 4.001655578613281, "pos_frac": 0.71875, "sample": [1.6507492065429688, 0.6769866943359375, 2.551483154296875, -0.76092529296875, -2.5996036529541016, 1.2699508666992188, 1.4022140502929688, 0.3704872131347656, 2.031087875366211, -0.3593406677246094, 0.9343948364257812, 1.20721435546875, 0.4356346130371094, 3.7570724487304688, 0.4844684600830078, -0.7399749755859375, -1.4753265380859375, 1.4046173095703125, -0.028850555419921875, 1.3618621826171875, 0.19755935668945312, 2.25299072265625, 1.6002388000488281, 3.2047882080078125, 1.3770866394042969, -0.24529266357421875, -0.38660430908203125, -0.9057693481445312, 2.0316524505615234, -0.25673484802246094, 1.6582412719726562, 1.2635726928710938, 3.4968338012695312, -0.5512752532958984, 0.32924652099609375, 1.18408203125, -0.04842376708984375, 3.336273193359375, 2.5531158447265625, -1.4662551879882812, -0.7470684051513672, 3.0968780517578125, -0.124176025390625, 1.8890838623046875, 0.6683425903320312, 1.1790924072265625, -0.559417724609375, 1.637847900390625, 2.9698143005371094, 1.5565643310546875, 1.4792251586914062, 1.882171630859375, 0.034820556640625, 3.6216773986816406, 4.001655578613281, 1.0767555236816406, -0.78546142578125, 2.2443084716796875, 3.056020736694336, 3.020212173461914, 2.1678314208984375, -0.47658538818359375, 0.5387458801269531, 0.10681915283203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000118.npy"}
{"epoch": 0.17838246409674982, "step": 119, "batch_size": 64, "mean": 0.8163291811943054, "std": 1.6036120653152466, "min": -2.5980377197265625, "p10": -1.274945831298828, "median": 0.6877021789550781, "p90": 2.969727325439453, "max": 3.9903564453125, "pos_frac": 0.734375, "sample": [1.1058788299560547, 2.9724349975585938, -0.9514846801757812, 1.3412551879882812, -1.3464736938476562, 0.6738052368164062, 2.354095458984375, 0.8680419921875, -0.8503150939941406, 1.0000495910644531, -0.8184261322021484, 0.12398529052734375, 1.625244140625, 1.5172805786132812, 2.3341121673583984, -1.9510040283203125, 0.42691802978515625, -2.2304840087890625, 0.28113746643066406, 0.473541259765625, -0.8960666656494141, 3.621274948120117, 0.17116546630859375, 1.8117904663085938, 3.9903564453125, 0.5305328369140625, -1.66033935546875, 2.6471099853515625, 2.963409423828125, 2.4099884033203125, 1.955322265625, -1.6223678588867188, 1.7738513946533203, 1.7037220001220703, -0.455780029296875, 1.5084228515625, 0.5955734252929688, 0.01618194580078125, 0.7065467834472656, 3.433624267578125, 2.296722412109375, 1.768157958984375, -0.8234024047851562, 0.6166534423828125, -1.4800567626953125, -0.9619827270507812, 2.6124267578125, 2.1782608032226562, 3.0618133544921875, 0.2711639404296875, 2.0214080810546875, -2.5980377197265625, -1.1080474853515625, -0.9327297210693359, 0.70159912109375, 0.241455078125, 1.9439849853515625, 3.9314727783203125, 3.3533096313476562, 0.30297088623046875, 0.7265415191650391, 0.26456451416015625, -0.4016227722167969, 0.10453033447265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000119.npy"}
{"epoch": 0.17989417989417988, "step": 120, "batch_size": 64, "mean": 1.1603679656982422, "std": 1.4889086484909058, "min": -2.1149425506591797, "p10": -0.4673748016357422, "median": 0.9847087860107422, "p90": 2.937209701538086, "max": 5.8820953369140625, "pos_frac": 0.796875, "sample": [-1.22564697265625, 0.7325115203857422, -0.41638946533203125, 2.9007339477539062, 2.3247528076171875, 0.1446685791015625, 1.9732513427734375, 0.5902023315429688, 0.3168220520019531, -1.4630813598632812, 0.4088592529296875, 0.292694091796875, 0.5121078491210938, 1.1863174438476562, 0.656524658203125, 4.261199951171875, 1.7890548706054688, 0.6416740417480469, 0.6652908325195312, 1.0893325805664062, 2.9301681518554688, 0.442535400390625, 1.1833953857421875, 1.3505229949951172, -0.1650390625, 1.1561832427978516, 2.15447998046875, 1.868194580078125, 1.428924560546875, 1.4870529174804688, 5.286628723144531, -0.6017303466796875, 2.7636642456054688, 0.23358917236328125, 2.8912277221679688, 0.8385047912597656, 1.6938400268554688, -0.3468608856201172, 3.2826919555664062, 1.2606048583984375, -0.5045852661132812, 0.9791946411132812, -0.27759552001953125, 5.8820953369140625, -0.4760322570800781, 0.622039794921875, -2.1149425506591797, 2.9591827392578125, 2.4661712646484375, 1.229705810546875, 0.6499557495117188, -0.6464519500732422, 2.940227508544922, 1.26141357421875, 2.1941909790039062, -0.21875762939453125, 0.41876220703125, 2.9864578247070312, 0.3583717346191406, -0.447174072265625, 1.6584701538085938, 0.1604785919189453, 2.672689437866211, 0.9902229309082031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000120.npy"}
{"epoch": 0.18140589569160998, "step": 121, "batch_size": 64, "mean": 0.9012415409088135, "std": 1.439166784286499, "min": -1.6347198486328125, "p10": -0.6723222732543945, "median": 0.7754220962524414, "p90": 2.4120294570922853, "max": 6.5657958984375, "pos_frac": 0.734375, "sample": [3.9752655029296875, 1.264251708984375, -0.5551643371582031, -0.5400466918945312, 2.9424476623535156, 2.4033203125, 3.133686065673828, -0.6729717254638672, 1.5847053527832031, -0.7111968994140625, 0.645751953125, -0.37158203125, 4.8407440185546875, 0.43383026123046875, 0.0009765625, -0.07295417785644531, 2.415761947631836, -1.6347198486328125, 1.3932952880859375, -1.3449554443359375, 0.641326904296875, 0.7554512023925781, 1.7649192810058594, 0.9897689819335938, 0.44677734375, 1.2672805786132812, 0.7930908203125, -0.7556285858154297, 0.8445053100585938, 1.0191879272460938, 0.18957138061523438, 6.5657958984375, 1.6468048095703125, 1.0574188232421875, -0.670806884765625, -0.5239105224609375, 0.8198623657226562, 1.0544509887695312, 2.3622970581054688, 0.683258056640625, 0.987518310546875, 0.1769256591796875, 0.5199127197265625, 1.6713714599609375, 0.44515228271484375, 1.318450927734375, -1.0040359497070312, 2.0729141235351562, 1.8350677490234375, 1.2571392059326172, -0.3027057647705078, -0.326873779296875, 1.4071846008300781, 1.5950260162353516, 0.6559181213378906, 3.383758544921875, 0.8679542541503906, -0.8210601806640625, 0.7577533721923828, -0.36173248291015625, 0.08438491821289062, 1.1715736389160156, 0.656982421875, -0.4509868621826172], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000121.npy"}
{"epoch": 0.18291761148904007, "step": 122, "batch_size": 64, "mean": 1.2689316272735596, "std": 1.7588613033294678, "min": -2.9261474609375, "p10": -0.3782438278198242, "median": 0.8607244491577148, "p90": 3.1535564422607427, "max": 8.106048583984375, "pos_frac": 0.796875, "sample": [8.106048583984375, -0.7789077758789062, -0.181976318359375, 3.1930809020996094, -2.9261474609375, -0.3354778289794922, 1.5709381103515625, 1.5304641723632812, 0.24712562561035156, 1.2623138427734375, 2.2401885986328125, 0.26564788818359375, 0.3871479034423828, 0.530792236328125, -0.050426483154296875, 2.04412841796875, -0.035186767578125, 2.4931411743164062, -1.0729026794433594, 2.6280994415283203, 0.5774745941162109, 5.953550338745117, 1.4915733337402344, 2.566577911376953, 4.4983062744140625, 0.7855682373046875, 0.8177623748779297, -0.3965721130371094, 1.990692138671875, 2.490966796875, 0.589324951171875, 2.7541961669921875, -0.01572418212890625, 1.5632781982421875, -2.2155609130859375, 1.9749183654785156, 0.7668476104736328, 0.171234130859375, 2.1149768829345703, 0.4012603759765625, -0.07710647583007812, 0.7643852233886719, -0.8084297180175781, 3.609375, 1.3443145751953125, 2.880250930786133, 1.1068649291992188, 0.7871284484863281, 0.9036865234375, 0.6793212890625, 1.369873046875, 2.066192626953125, 0.21577072143554688, 0.44408416748046875, 0.23352813720703125, 0.9918346405029297, 0.9092903137207031, 0.46319580078125, 0.27005767822265625, 2.4247817993164062, 3.0613327026367188, 3.248271942138672, -0.417724609375, 4.7425994873046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000122.npy"}
{"epoch": 0.18442932728647016, "step": 123, "batch_size": 64, "mean": 0.9632514715194702, "std": 1.6074215173721313, "min": -3.934234619140625, "p10": -0.8674774169921875, "median": 1.005894660949707, "p90": 2.7853979110717777, "max": 5.196277618408203, "pos_frac": 0.734375, "sample": [3.5436782836914062, 0.8493270874023438, -0.43743896484375, 0.921722412109375, -1.7199649810791016, -0.15235137939453125, 0.5275764465332031, 0.06670379638671875, 5.196277618408203, 2.810220718383789, 0.3511505126953125, 1.6143035888671875, -0.059856414794921875, -1.502838134765625, 2.4701919555664062, 1.7251167297363281, 0.43352508544921875, 1.8179512023925781, -1.7554206848144531, 1.8722686767578125, 1.3695602416992188, 2.72747802734375, 0.8713226318359375, 2.66485595703125, 2.2526321411132812, 3.2225494384765625, 1.0956745147705078, 0.019378662109375, 1.6805419921875, 1.335693359375, -0.012495040893554688, 4.087677001953125, -2.6662979125976562, 1.780294418334961, -0.8367156982421875, -1.5612564086914062, 1.4402923583984375, -0.0897064208984375, -0.11555099487304688, 1.1092109680175781, 0.48339080810546875, 0.8816452026367188, 1.9815616607666016, 1.2831497192382812, 0.9437732696533203, 1.7215652465820312, 0.054340362548828125, 0.03847503662109375, -0.0028057098388671875, 3.6340484619140625, 1.2369155883789062, 1.0680160522460938, -0.00061798095703125, 2.286895751953125, -3.934234619140625, 2.3626708984375, -0.4088096618652344, 1.0769691467285156, -0.8806610107421875, 2.6651229858398438, 0.8422775268554688, 1.1829643249511719, 3.422607421875, 0.7615509033203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000123.npy"}
{"epoch": 0.18594104308390022, "step": 124, "batch_size": 64, "mean": 1.4154765605926514, "std": 1.6664632558822632, "min": -3.1238937377929688, "p10": -0.43770408630371094, "median": 1.4120960235595703, "p90": 3.834976196289064, "max": 5.3936767578125, "pos_frac": 0.8125, "sample": [-0.43309783935546875, 5.03399658203125, 1.9544830322265625, -3.1238937377929688, 1.4556865692138672, 0.9764022827148438, 4.7288665771484375, 2.1684951782226562, 2.1166000366210938, 4.181610107421875, 1.74127197265625, 2.5048866271972656, 0.7836132049560547, 3.558868408203125, 0.6417312622070312, 0.3393592834472656, 1.0313339233398438, 2.6092357635498047, 0.039813995361328125, -0.9825096130371094, 5.058837890625, 1.7411041259765625, 2.7654495239257812, -0.53350830078125, 0.3992958068847656, 1.4498939514160156, 0.7755241394042969, 0.647003173828125, 2.3412933349609375, -0.8153076171875, 1.9281749725341797, 1.374298095703125, 2.1181106567382812, 2.8427810668945312, 1.6978797912597656, 1.60284423828125, 4.148704528808594, -1.9405059814453125, 5.3936767578125, 1.115997314453125, 0.6510696411132812, 3.95330810546875, -0.4396781921386719, 2.9638824462890625, 1.617258071899414, 1.2371749877929688, 2.15264892578125, 0.5580902099609375, -0.8866710662841797, 2.713869094848633, 2.5640487670898438, -0.020496368408203125, -0.01625823974609375, 1.7070503234863281, 1.081634521484375, 0.02597808837890625, 0.5514869689941406, 0.14899444580078125, -0.3998565673828125, 0.6915664672851562, -0.14921188354492188, 2.060333251953125, 0.5350875854492188, 1.85089111328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000124.npy"}
{"epoch": 0.1874527588813303, "step": 125, "batch_size": 64, "mean": 1.139931559562683, "std": 1.697829008102417, "min": -1.8907699584960938, "p10": -1.082105827331543, "median": 0.9206075668334961, "p90": 3.2858596801757813, "max": 6.207878112792969, "pos_frac": 0.765625, "sample": [3.297271728515625, 2.9069595336914062, 1.6247444152832031, 1.05523681640625, 0.9105682373046875, -1.1233463287353516, 6.207878112792969, 0.5006809234619141, 1.384033203125, 1.8749637603759766, 3.990509033203125, 0.1141510009765625, -0.9067764282226562, 2.2858428955078125, 0.2732715606689453, 3.524707794189453, 2.3871536254882812, 2.0287933349609375, 0.896820068359375, 0.46105003356933594, 4.194583892822266, -0.013652801513671875, 1.982757568359375, 2.1398239135742188, 0.0678863525390625, 0.9306468963623047, 3.8502349853515625, -0.9858779907226562, -1.6054916381835938, 0.6510848999023438, 2.084625244140625, 0.5923538208007812, -0.5148086547851562, 2.85443115234375, 2.481945037841797, -0.9675750732421875, 2.432872772216797, 0.883819580078125, 2.7756118774414062, 2.2746734619140625, 4.69140625, 0.29273223876953125, 0.13750457763671875, 1.5203971862792969, 1.57891845703125, -1.8907699584960938, 0.5501174926757812, 3.2592315673828125, 0.00911712646484375, -0.07495307922363281, -0.786041259765625, 1.408203125, 1.1735763549804688, 0.4286155700683594, -1.165283203125, 0.7239990234375, 0.3046302795410156, -1.3414382934570312, 1.7880096435546875, -0.46855735778808594, 0.9593944549560547, -1.2928810119628906, -1.6854400634765625, 3.030670166015625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000125.npy"}
{"epoch": 0.1889644746787604, "step": 126, "batch_size": 64, "mean": 0.9358232021331787, "std": 2.0215063095092773, "min": -3.3047332763671875, "p10": -1.1525882720947263, "median": 0.9158525466918945, "p90": 3.0481090545654297, "max": 9.5234375, "pos_frac": 0.6875, "sample": [1.5566120147705078, 0.3342018127441406, 2.949647903442383, 0.9653491973876953, -0.47425270080566406, 3.0348777770996094, 3.3577041625976562, -0.02362823486328125, 0.7331008911132812, -1.2480354309082031, 2.4315929412841797, 2.9049530029296875, -3.19598388671875, 1.458404541015625, 3.0537796020507812, 0.9189586639404297, -0.2526702880859375, -0.9298782348632812, -0.30690574645996094, -0.509429931640625, 0.1891632080078125, 1.7437896728515625, 0.7477149963378906, 1.045212745666504, 0.029582977294921875, -0.3203582763671875, 1.0626144409179688, -1.6120452880859375, 1.4547805786132812, -1.4058876037597656, 0.1636199951171875, 0.8408203125, -1.8075904846191406, 1.795297622680664, -0.5051975250244141, -0.5885696411132812, -0.8555221557617188, 9.5234375, 1.7056026458740234, -3.3047332763671875, 1.0853958129882812, 1.6876678466796875, 1.113626480102539, 1.3303375244140625, 0.87640380859375, 1.9344711303710938, 2.0271835327148438, 5.418815612792969, 1.2825851440429688, 2.6652374267578125, 0.9127464294433594, 4.0828704833984375, 1.34405517578125, 1.5241127014160156, 0.26560211181640625, 0.03314971923828125, -0.538909912109375, 3.126514434814453, 0.8379669189453125, 1.0156784057617188, -0.62969970703125, 4.77801513671875, -2.8684558868408203, -0.07281494140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000126.npy"}
{"epoch": 0.19047619047619047, "step": 127, "batch_size": 64, "mean": 1.358619213104248, "std": 1.5614140033721924, "min": -2.86474609375, "p10": -0.40924301147460934, "median": 1.2153205871582031, "p90": 3.524214363098145, "max": 5.204170227050781, "pos_frac": 0.8125, "sample": [-0.28985595703125, 2.448516845703125, 1.2087554931640625, 2.4428939819335938, 0.7852554321289062, 1.749114990234375, 1.2305068969726562, 3.4324512481689453, 0.37099456787109375, 3.2640380859375, 2.9344329833984375, 0.9081573486328125, 1.5231094360351562, 3.6826934814453125, 3.0088119506835938, 1.2218856811523438, 2.4974899291992188, 1.8978729248046875, -0.6004714965820312, 1.0877265930175781, 2.1725540161132812, -2.86474609375, 1.51007080078125, 4.1087799072265625, 1.3939208984375, -1.3980484008789062, 1.1834259033203125, 0.8576698303222656, -0.9497222900390625, 1.352203369140625, 0.2674522399902344, 1.3402023315429688, 0.11048126220703125, 2.9595489501953125, -0.07332038879394531, 3.3958187103271484, 0.2513580322265625, 3.6796493530273438, 3.218242645263672, 5.204170227050781, 0.4253082275390625, 2.2502174377441406, 0.659759521484375, -1.2781047821044922, 1.9072494506835938, 0.6768875122070312, 0.38092803955078125, -0.41864776611328125, 4.481597900390625, -0.681488037109375, 0.73797607421875, 3.9803314208984375, 1.1895637512207031, 0.39981842041015625, 1.0071601867675781, 0.50274658203125, 1.3437957763671875, 0.6426925659179688, -0.387298583984375, 1.7535820007324219, 1.3858280181884766, -0.05756950378417969, 3.5635414123535156, -0.03833770751953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000127.npy"}
{"epoch": 0.19198790627362056, "step": 128, "batch_size": 64, "mean": 1.0918666124343872, "std": 1.9570531845092773, "min": -4.586708068847656, "p10": -1.2509513854980465, "median": 1.2332353591918945, "p90": 4.0032173156738295, "max": 5.3787689208984375, "pos_frac": 0.703125, "sample": [1.517486572265625, 3.5406646728515625, 0.6686820983886719, 2.1113853454589844, 2.131610870361328, 0.85467529296875, 1.052093505859375, 1.10516357421875, 0.8748741149902344, 4.337738037109375, 1.5504074096679688, -3.0550537109375, 2.48663330078125, -0.6458930969238281, 1.4443817138671875, -0.4490203857421875, 0.2985572814941406, -1.8918228149414062, -1.9595870971679688, 3.04400634765625, 2.8554840087890625, -2.446290969848633, -0.9841842651367188, 4.958404541015625, 4.3839874267578125, 1.640777587890625, 0.6997604370117188, 4.201454162597656, -0.4506683349609375, 2.702423095703125, 0.889739990234375, 1.7882003784179688, 1.5015716552734375, 1.7503738403320312, 2.1663742065429688, 1.8020172119140625, -0.09348297119140625, 4.419527053833008, -1.6983623504638672, -0.311126708984375, -1.3652801513671875, 1.1193981170654297, 1.5389862060546875, 0.32071876525878906, -0.5053596496582031, -0.446624755859375, 1.766366958618164, 2.155914306640625, 2.3013458251953125, -0.7586441040039062, 1.2980499267578125, 1.1236724853515625, 5.3787689208984375, 0.0994110107421875, 2.457805633544922, 1.1684207916259766, -0.44161224365234375, 1.3395156860351562, 4.59552001953125, -0.31084251403808594, -4.586708068847656, -0.7117080688476562, 1.9274520874023438, 1.6219329833984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000128.npy"}
{"epoch": 0.19349962207105065, "step": 129, "batch_size": 64, "mean": 0.9599519968032837, "std": 1.7897183895111084, "min": -3.8626708984375, "p10": -1.0210309982299803, "median": 0.7513828277587891, "p90": 3.1858749389648438, "max": 5.909633636474609, "pos_frac": 0.640625, "sample": [1.281280517578125, -0.5975627899169922, 1.8301925659179688, -0.6086883544921875, -0.41172027587890625, 4.061405181884766, -1.3857269287109375, -0.9820728302001953, -0.23333740234375, 3.2028236389160156, 2.28973388671875, 0.6844825744628906, 2.0610733032226562, -0.05743408203125, -1.8429031372070312, -1.0377273559570312, 3.1463279724121094, 0.00550079345703125, 3.7306594848632812, 3.0947189331054688, 0.6858978271484375, -0.639617919921875, 1.6588058471679688, 0.8742446899414062, -0.9321060180664062, 5.079681396484375, 2.2077388763427734, -0.086700439453125, 1.3439064025878906, 2.7230682373046875, 1.7277412414550781, 0.8364486694335938, -1.2284774780273438, 3.065723419189453, 0.05672264099121094, 0.7226791381835938, -1.1993637084960938, -0.164459228515625, 0.7894420623779297, 4.244224548339844, 1.796234130859375, 0.09280586242675781, -0.16131210327148438, 0.6498069763183594, 0.9543266296386719, 1.387430191040039, 0.9063873291015625, 5.909633636474609, 1.8363800048828125, 3.5830612182617188, 0.7800865173339844, 2.2696609497070312, 2.9147872924804688, -0.44855690002441406, -0.20478248596191406, -1.0478858947753906, 2.694965362548828, -3.8626708984375, -0.769561767578125, -0.3757648468017578, 0.6899089813232422, 0.2266693115234375, -0.011285781860351562, 1.629974365234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000129.npy"}
{"epoch": 0.19501133786848074, "step": 130, "batch_size": 64, "mean": 1.0787646770477295, "std": 1.5599279403686523, "min": -1.8782806396484375, "p10": -0.9662384033203122, "median": 0.9338703155517578, "p90": 3.1810531616210938, "max": 5.861820220947266, "pos_frac": 0.765625, "sample": [1.732757568359375, 0.17949295043945312, 3.6839141845703125, 1.9489288330078125, 3.1509323120117188, 2.087890625, -0.667572021484375, 3.248689651489258, 0.8702259063720703, -0.2148914337158203, -1.1072998046875, 2.12335205078125, 1.9493179321289062, 0.33636474609375, -0.46044921875, 0.9458084106445312, 1.487020492553711, -0.1230316162109375, 1.1822662353515625, 2.842315673828125, 2.434152603149414, 2.6983413696289062, 4.089363098144531, 2.7190322875976562, 0.0618133544921875, -1.2234725952148438, 0.1098175048828125, 0.1787261962890625, -1.6317710876464844, 0.1544647216796875, 1.406463623046875, 2.7989273071289062, 1.8077926635742188, 1.0580520629882812, 0.2254486083984375, 5.861820220947266, 2.2741165161132812, 0.6749458312988281, 4.0128326416015625, -0.2963752746582031, -0.12239456176757812, 3.1939620971679688, -1.09423828125, -1.3450546264648438, 1.5855216979980469, 1.327188491821289, 1.5179405212402344, -0.10425949096679688, 1.834991455078125, 0.0482635498046875, -1.338094711303711, 1.2145805358886719, -0.5404205322265625, 1.3899612426757812, -1.8782806396484375, 0.8256492614746094, 0.26692962646484375, 0.9191627502441406, 0.16421127319335938, 0.379638671875, 3.4890213012695312, 1.4844322204589844, 0.9219322204589844, 0.28977203369140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000130.npy"}
{"epoch": 0.1965230536659108, "step": 131, "batch_size": 64, "mean": 1.3480902910232544, "std": 2.0183160305023193, "min": -3.569580078125, "p10": -1.1645856857299803, "median": 1.2996282577514648, "p90": 3.8104095458984375, "max": 5.677421569824219, "pos_frac": 0.71875, "sample": [-0.04669189453125, 1.9047832489013672, 0.10223197937011719, -0.1997051239013672, -1.1077899932861328, 5.0335540771484375, 1.4440956115722656, 1.9709625244140625, -0.8037452697753906, 2.9395751953125, 2.3326644897460938, 0.5179805755615234, -1.3655738830566406, -1.1889266967773438, 1.0005531311035156, 2.146268844604492, -2.5048828125, 0.8918533325195312, 3.7886199951171875, 0.848052978515625, -1.39715576171875, -1.5243759155273438, 2.994659423828125, 2.3662872314453125, 1.093963623046875, 0.29427337646484375, 1.6071949005126953, -2.6567764282226562, 1.3216285705566406, -0.6370697021484375, 1.79638671875, 2.9467315673828125, 4.47113037109375, 2.573780059814453, 3.142578125, 3.0984344482421875, 1.0435676574707031, 5.173370361328125, 1.277627944946289, 3.6783599853515625, 1.1501903533935547, 5.677421569824219, -0.9062118530273438, 3.8197479248046875, 1.4537353515625, -0.06386566162109375, 0.5277919769287109, 3.7120513916015625, -0.9310226440429688, 2.507965087890625, -0.4371070861816406, -0.1363067626953125, 2.112457275390625, 0.24365615844726562, -3.569580078125, 2.9298019409179688, 0.42609405517578125, 1.4581832885742188, 3.0344696044921875, -0.264129638671875, 1.2053298950195312, 3.908374786376953, 5.201757431030273, 2.848499298095703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000131.npy"}
{"epoch": 0.1980347694633409, "step": 132, "batch_size": 64, "mean": 1.1757543087005615, "std": 1.9796653985977173, "min": -4.9005126953125, "p10": -0.9626991271972655, "median": 1.191075325012207, "p90": 3.3680177688598634, "max": 6.27972412109375, "pos_frac": 0.75, "sample": [4.1888885498046875, 0.8391990661621094, 1.9564208984375, 2.464733123779297, -0.1543445587158203, -1.315969467163086, 2.5133819580078125, 0.14844512939453125, -1.0425949096679688, 3.250579833984375, 3.3933887481689453, 2.937763214111328, 1.7380294799804688, 1.2122650146484375, -1.7060546875, 0.6704063415527344, 0.18324661254882812, 2.540924072265625, 0.9528350830078125, -0.776275634765625, -0.5627593994140625, -4.9005126953125, 1.9958629608154297, -0.1473541259765625, -0.7609291076660156, -1.5573654174804688, 1.7225418090820312, 3.6161155700683594, 3.213489532470703, -0.6136322021484375, 0.24730682373046875, 0.16314697265625, -4.54265022277832, 0.6029205322265625, 1.0525894165039062, 1.9259796142578125, 5.0604400634765625, 2.7306385040283203, 2.6938743591308594, 1.9044876098632812, -0.2917327880859375, 6.27972412109375, 0.8050613403320312, 2.4569015502929688, 3.308818817138672, 1.4260711669921875, 0.20241928100585938, 0.42038726806640625, 1.1698856353759766, 1.6517086029052734, 2.7269229888916016, 0.46517181396484375, 4.0105438232421875, 1.7896480560302734, 3.7191028594970703, 1.3906097412109375, 1.310028076171875, -0.6552200317382812, 0.4739227294921875, 2.1317825317382812, -1.5943069458007812, 0.947052001953125, 3.300750732421875, -0.03643798828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000132.npy"}
{"epoch": 0.19954648526077098, "step": 133, "batch_size": 64, "mean": 1.0888490676879883, "std": 1.8979343175888062, "min": -4.1795806884765625, "p10": -0.8768962860107422, "median": 0.7865791320800781, "p90": 3.1110198974609378, "max": 7.057342529296875, "pos_frac": 0.75, "sample": [2.4917678833007812, -0.8778839111328125, 0.41417694091796875, 3.0334320068359375, 0.22107887268066406, 0.62628173828125, 0.3704643249511719, 3.5702056884765625, 0.29212188720703125, -0.5655288696289062, 1.8799171447753906, 3.4449920654296875, 2.607616424560547, 0.9080772399902344, -1.2811012268066406, 2.6234893798828125, 2.5410308837890625, -4.1795806884765625, 2.4875526428222656, -0.1256256103515625, -0.6496047973632812, 0.5415420532226562, -0.7714500427246094, 2.5519161224365234, 2.522602081298828, 0.23236465454101562, 2.3798370361328125, -0.2312469482421875, 0.680267333984375, -0.3039836883544922, 2.079925537109375, 1.2304649353027344, -3.0566558837890625, 2.9476795196533203, -0.8745918273925781, 1.4529342651367188, -0.03240203857421875, 0.06172943115234375, 2.763874053955078, 2.289337158203125, 0.8928909301757812, 1.6008682250976562, 2.0124282836914062, -2.956584930419922, 0.487823486328125, 4.712921142578125, 1.9348068237304688, 3.1442718505859375, 0.6693191528320312, 7.057342529296875, 0.26433563232421875, 0.25137901306152344, -0.7025070190429688, 4.188995361328125, 1.8168792724609375, 0.5182304382324219, 2.0656280517578125, 3.313528060913086, 1.5737991333007812, -1.1189899444580078, 2.8256149291992188, 0.095123291015625, -1.7515678405761719, 0.49277496337890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000133.npy"}
{"epoch": 0.20105820105820105, "step": 134, "batch_size": 64, "mean": 1.5489141941070557, "std": 1.9704227447509766, "min": -3.0533294677734375, "p10": -0.93508186340332, "median": 1.5523681640625, "p90": 3.398676300048828, "max": 6.706275939941406, "pos_frac": 0.828125, "sample": [0.780914306640625, 0.1492919921875, 1.6869010925292969, 1.3874931335449219, 0.9372825622558594, 1.4235992431640625, 0.5538406372070312, 3.0843353271484375, 3.818500518798828, 2.575164794921875, 2.5501556396484375, 2.1669044494628906, 2.2630462646484375, 1.7233924865722656, 0.7477245330810547, -0.12401008605957031, 2.7420616149902344, 1.5646133422851562, -0.6668968200683594, 1.5131149291992188, 2.973529815673828, 1.5401229858398438, 3.368213653564453, 0.7918281555175781, 0.3516845703125, 3.411731719970703, 0.4250335693359375, 0.2234039306640625, -1.3055801391601562, 2.7743072509765625, -1.5448493957519531, 0.0045604705810546875, 3.2848358154296875, 1.0175933837890625, 0.3157634735107422, -2.231800079345703, 2.39422607421875, 1.8772335052490234, 2.687042236328125, 2.2612133026123047, 6.706275939941406, 2.8653640747070312, -3.0533294677734375, -2.7666244506835938, 2.6931190490722656, -0.5835189819335938, 3.6751708984375, -1.050018310546875, 5.543426513671875, 1.3609199523925781, 0.2685661315917969, 2.8437271118164062, 2.2810516357421875, 2.4531707763671875, 5.8916168212890625, 0.029296875, -1.1226463317871094, 2.06640625, 1.5252208709716797, 2.08837890625, 3.3031463623046875, 6.500068664550781, 0.39966773986816406, -0.2854728698730469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000134.npy"}
{"epoch": 0.20256991685563114, "step": 135, "batch_size": 64, "mean": 1.03687584400177, "std": 1.9360361099243164, "min": -3.4698944091796875, "p10": -1.1741275787353516, "median": 0.7759561538696289, "p90": 3.0828990936279306, "max": 6.650932312011719, "pos_frac": 0.71875, "sample": [2.51751708984375, -1.1785850524902344, 3.1666183471679688, 0.3877277374267578, 0.7988471984863281, -1.163726806640625, 0.31079864501953125, -1.4800033569335938, -0.03477287292480469, 6.650932312011719, 2.459686279296875, -0.14098358154296875, 2.1022872924804688, 0.08790206909179688, -0.21988677978515625, -1.6598739624023438, 2.6369171142578125, 1.572967529296875, -0.7094745635986328, 0.7707233428955078, 0.9373970031738281, 0.21916961669921875, -3.4698944091796875, 0.78118896484375, 1.0485305786132812, 3.8002967834472656, 0.3316650390625, -0.20872116088867188, 0.6304779052734375, 0.6998329162597656, 0.4645957946777344, 1.557159423828125, 2.8578720092773438, 0.7284069061279297, 2.2272796630859375, 1.2708854675292969, 0.3465728759765625, 1.3628616333007812, 2.5599899291992188, -2.8015003204345703, -0.08052825927734375, 5.1370697021484375, 0.022233963012695312, 1.6418838500976562, 1.5052986145019531, 0.8360443115234375, -0.068023681640625, -2.8012161254882812, 2.149486541748047, 1.9242019653320312, 2.887554168701172, -2.2009429931640625, 0.3841400146484375, 2.8456668853759766, 4.417121887207031, 5.257728576660156, -0.6816024780273438, 2.1954784393310547, -0.3248176574707031, 1.9418220520019531, 0.31207275390625, 2.0900421142578125, 4.852203369140625, -0.10254669189453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000135.npy"}
{"epoch": 0.20408163265306123, "step": 136, "batch_size": 64, "mean": 1.318336009979248, "std": 2.6778769493103027, "min": -4.190225601196289, "p10": -1.883267211914062, "median": 1.3772163391113281, "p90": 5.3729230880737315, "max": 8.799728393554688, "pos_frac": 0.671875, "sample": [7.597930908203125, 3.744617462158203, -2.9682140350341797, -1.3468055725097656, -1.5431365966796875, 1.6542282104492188, -0.08544158935546875, 0.7847404479980469, -0.47154998779296875, 1.3425369262695312, 1.432586669921875, -2.9594268798828125, -0.4942951202392578, 2.3104248046875, -0.5013275146484375, 2.0185699462890625, 5.058198928833008, 0.4105491638183594, 0.8740501403808594, -1.1411628723144531, 2.5458221435546875, 0.8815498352050781, 3.19146728515625, 5.6562347412109375, -0.7519149780273438, 0.13463973999023438, 5.507804870605469, 1.4376716613769531, 1.412872314453125, 7.334930419921875, 3.0833892822265625, 0.24394607543945312, 2.895294189453125, 3.6355209350585938, -0.9357147216796875, 2.24566650390625, -0.3455047607421875, 1.8327999114990234, 1.4273147583007812, 1.3039321899414062, -4.190225601196289, 1.901336669921875, 1.580841064453125, 3.0208206176757812, 0.5349235534667969, 3.2861671447753906, -0.6802215576171875, 1.169036865234375, -2.0290374755859375, 8.799728393554688, 1.6510467529296875, -2.4874343872070312, -2.2477188110351562, 1.4838104248046875, -0.468109130859375, 2.4170684814453125, 1.411895751953125, -0.5848770141601562, 5.789588928222656, 6.413366317749023, 3.11181640625, -0.4971771240234375, 0.07634353637695312, -3.5442771911621094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000136.npy"}
{"epoch": 0.20559334845049132, "step": 137, "batch_size": 64, "mean": 1.4218181371688843, "std": 2.193927526473999, "min": -3.80999755859375, "p10": -1.3156105041503905, "median": 1.4500579833984375, "p90": 3.806906127929688, "max": 8.173179626464844, "pos_frac": 0.734375, "sample": [-0.41869354248046875, -3.80999755859375, 2.019550323486328, 3.2846145629882812, 3.0158214569091797, 3.8417739868164062, 3.1748390197753906, 4.699432373046875, -0.40106201171875, 4.2866363525390625, 3.043111801147461, 2.0241241455078125, 4.098026275634766, 2.7188262939453125, 4.081153869628906, 1.0696487426757812, 1.676788330078125, 3.5829010009765625, -0.2592926025390625, 0.7971019744873047, -1.4426860809326172, 2.000415802001953, 2.6380691528320312, -1.5050086975097656, 0.4825916290283203, 0.21124267578125, 2.6862640380859375, 8.173179626464844, -0.9627094268798828, -0.1841888427734375, -2.3983001708984375, 0.18016433715820312, 0.298919677734375, 2.022735595703125, 7.363704681396484, 0.6517868041992188, -0.008270263671875, -1.4808273315429688, 1.67413330078125, 0.1385784149169922, -0.6981468200683594, 0.5172824859619141, 2.0502052307128906, 1.9457931518554688, 2.396331787109375, -0.22798728942871094, 0.8733711242675781, 3.678264617919922, 1.2297821044921875, 3.4924163818359375, 0.15969085693359375, -1.2462844848632812, 0.063232421875, -2.4544906616210938, 3.111095428466797, 1.8232421875, 0.9000530242919922, -0.5697174072265625, -1.3453216552734375, 1.6703338623046875, 0.8476104736328125, 3.7255477905273438, 3.082244873046875, 2.906707763671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000137.npy"}
{"epoch": 0.20710506424792138, "step": 138, "batch_size": 64, "mean": 1.0759272575378418, "std": 2.0785276889801025, "min": -4.328056335449219, "p10": -1.4644412994384766, "median": 1.1970500946044922, "p90": 3.5768539428710944, "max": 6.04876708984375, "pos_frac": 0.75, "sample": [0.35649871826171875, 3.6385650634765625, 2.3359642028808594, 3.432861328125, 1.4641914367675781, 1.4467926025390625, 0.47771453857421875, 2.71234130859375, -2.123016357421875, 1.3484001159667969, 0.8737697601318359, -3.0523681640625, 3.7358245849609375, 0.621490478515625, 6.04876708984375, -2.25762939453125, -1.316436767578125, 2.7545394897460938, 1.4699172973632812, 2.8428611755371094, 0.3261260986328125, 1.3724822998046875, 3.9986724853515625, 2.5466575622558594, 0.1710968017578125, 2.073223114013672, 2.6389236450195312, 1.7359848022460938, 0.6266708374023438, 0.5680065155029297, 3.412790298461914, -0.052165985107421875, 3.7107925415039062, -0.7943191528320312, -1.4567108154296875, -0.02280426025390625, 0.11437606811523438, -3.4539260864257812, 0.04281806945800781, 3.419384002685547, 1.3265876770019531, 3.0778656005859375, 0.10042572021484375, 1.4460983276367188, -0.9523735046386719, -4.328056335449219, 5.777654647827148, 0.7661170959472656, 1.4567413330078125, 0.4476165771484375, 1.3410186767578125, 1.8319091796875, 2.4777145385742188, -1.2916069030761719, -1.9564666748046875, -0.1928424835205078, -1.4677543640136719, 1.355215072631836, -0.24070167541503906, 1.0675125122070312, 4.950225830078125, 0.9623947143554688, 2.84259033203125, 0.27233123779296875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000138.npy"}
{"epoch": 0.20861678004535147, "step": 139, "batch_size": 64, "mean": 1.389855146408081, "std": 1.9410593509674072, "min": -2.467132568359375, "p10": -1.0446395874023438, "median": 1.2872390747070312, "p90": 4.275618362426758, "max": 6.462955474853516, "pos_frac": 0.765625, "sample": [2.158182144165039, 3.7333621978759766, 0.41156768798828125, -0.04299163818359375, -1.8151359558105469, 2.386188507080078, 0.48416900634765625, 4.227657318115234, 2.3653831481933594, 0.3209705352783203, -0.10505294799804688, 6.462955474853516, 3.0354785919189453, -0.5943260192871094, 1.928558349609375, 4.3538970947265625, 5.19635009765625, 0.3757896423339844, -1.3578414916992188, 2.4539756774902344, -0.6260528564453125, -2.467132568359375, -0.9654388427734375, 1.374288558959961, 2.0067977905273438, 1.4803962707519531, 1.3092803955078125, 0.7259159088134766, 3.5009002685546875, 1.29046630859375, 1.1516036987304688, 2.59649658203125, 0.7058868408203125, 0.72564697265625, -0.7554702758789062, 2.9798583984375, 1.7717399597167969, 1.2905235290527344, 0.876708984375, 0.2976531982421875, 3.3152236938476562, 1.0120124816894531, 4.814817428588867, -0.16024208068847656, -1.078582763671875, 0.17248916625976562, 1.2840118408203125, 1.3504638671875, 4.53533935546875, -0.11945343017578125, 4.84149169921875, 0.8266830444335938, 2.2555694580078125, -1.9757919311523438, -1.368865966796875, 0.7027168273925781, 0.37415504455566406, 2.3064956665039062, -1.9354228973388672, 2.459075927734375, 4.066093444824219, 4.296173095703125, 1.4081611633300781, 0.31890869140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000139.npy"}
{"epoch": 0.21012849584278157, "step": 140, "batch_size": 64, "mean": 1.7729384899139404, "std": 2.73354434967041, "min": -8.131942749023438, "p10": -1.5809741973876952, "median": 1.8491554260253906, "p90": 5.108400344848633, "max": 6.851945877075195, "pos_frac": 0.71875, "sample": [5.04144287109375, 1.724029541015625, 0.18247604370117188, 0.05512237548828125, 0.7424774169921875, 2.7110252380371094, -2.7000350952148438, -2.29803466796875, 3.7370376586914062, 3.8198471069335938, -8.131942749023438, 2.7993011474609375, 2.8246612548828125, 0.806610107421875, 5.657478332519531, 2.1294097900390625, -1.6689567565917969, 1.6057968139648438, -0.656463623046875, 3.5973358154296875, 1.7692852020263672, 3.932586669921875, 2.6793670654296875, 2.2740097045898438, -0.01873779296875, 2.052886962890625, 1.3179302215576172, 6.851945877075195, 1.2418594360351562, -1.436920166015625, -0.2921295166015625, -1.1287460327148438, 3.0167312622070312, -0.48722076416015625, -0.4961967468261719, -0.600921630859375, 5.137096405029297, 4.5900421142578125, 5.183906555175781, 4.7682037353515625, 2.709564208984375, 4.331214904785156, 6.0492095947265625, 4.365991592407227, 2.0777587890625, -1.9610443115234375, 4.9201507568359375, 0.9642066955566406, 6.365753173828125, -0.9821796417236328, -2.5176124572753906, 3.1894607543945312, 0.8516693115234375, 1.929025650024414, 6.275718688964844, 1.0309295654296875, 4.45477294921875, -0.39650726318359375, 3.2932586669921875, 1.2852630615234375, -0.29601287841796875, 3.705951690673828, -1.6427116394042969, 1.1306381225585938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000140.npy"}
{"epoch": 0.21164021164021163, "step": 141, "batch_size": 64, "mean": 1.674462080001831, "std": 2.6136372089385986, "min": -4.60528564453125, "p10": -0.8180057525634765, "median": 1.2569103240966797, "p90": 4.861822128295898, "max": 8.218170166015625, "pos_frac": 0.734375, "sample": [-4.60528564453125, 0.17213058471679688, 3.639617919921875, 0.2979393005371094, 2.240123748779297, 0.7283248901367188, 0.14820098876953125, 2.602680206298828, 0.6512565612792969, -2.5257530212402344, 1.8632888793945312, 2.0800399780273438, 3.9697608947753906, -0.7222938537597656, 4.881488800048828, 1.7016410827636719, 2.927440643310547, 0.07981491088867188, -0.40564918518066406, -2.7512741088867188, 0.892852783203125, 0.6940155029296875, 0.5484466552734375, 2.4513092041015625, 3.1109237670898438, 0.3409423828125, 1.6447982788085938, 3.2218170166015625, -0.81427001953125, -1.84954833984375, -0.7980270385742188, -0.7569808959960938, 1.485687255859375, 1.9422836303710938, 3.511974334716797, -0.7467231750488281, 4.353252410888672, 0.37435150146484375, 7.996774673461914, 4.8159332275390625, 5.592010498046875, 3.907014846801758, 3.126293182373047, 2.9473037719726562, 4.163749694824219, -0.6375885009765625, -0.28609657287597656, 1.9300613403320312, 6.3982696533203125, 4.612373352050781, -2.0285911560058594, 0.40796661376953125, 0.925994873046875, 7.49896240234375, 3.5251617431640625, 8.218170166015625, -0.8196067810058594, 6.280065536499023, -0.861358642578125, 0.810577392578125, 1.374908447265625, 1.1389122009277344, -0.08966064453125, -0.3626289367675781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000141.npy"}
{"epoch": 0.21315192743764172, "step": 142, "batch_size": 64, "mean": 1.2388105392456055, "std": 2.058485984802246, "min": -3.3280258178710938, "p10": -1.158884048461914, "median": 0.922515869140625, "p90": 3.873997497558594, "max": 6.212482452392578, "pos_frac": 0.765625, "sample": [0.6245193481445312, -2.7497787475585938, 1.3082313537597656, 2.7201919555664062, 4.436515808105469, 5.678314208984375, 1.9947967529296875, -1.0514183044433594, 1.3749771118164062, 2.3724136352539062, 1.530120849609375, 0.0044097900390625, 2.8104515075683594, -3.3280258178710938, 0.37223052978515625, 2.0125045776367188, 1.6776123046875, -3.2132091522216797, 0.9178848266601562, 0.9072284698486328, 1.9424972534179688, 3.041412353515625, 0.8970909118652344, 2.108936309814453, 1.710479736328125, 2.4640731811523438, -0.18190765380859375, 0.6104965209960938, 6.212482452392578, -1.2049407958984375, 0.5790481567382812, 0.422760009765625, 3.77435302734375, 5.7802581787109375, 1.3734893798828125, -1.667510986328125, 0.6682548522949219, -1.7863101959228516, -0.195037841796875, 0.9271469116210938, 5.4361724853515625, 3.9167022705078125, 0.5898208618164062, 0.7354507446289062, 0.0058135986328125, 2.9696617126464844, -1.8733272552490234, 0.9012680053710938, -0.5743827819824219, 2.0705032348632812, -0.809417724609375, -0.70416259765625, 1.1872711181640625, 3.5226593017578125, 0.5298957824707031, 0.7265777587890625, 3.4615936279296875, 4.137077331542969, 1.1970367431640625, -0.5656795501708984, 0.36444091796875, -0.3556709289550781, 3.187175750732422, 1.3503494262695312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000142.npy"}
{"epoch": 0.2146636432350718, "step": 143, "batch_size": 64, "mean": 1.2781095504760742, "std": 2.5415401458740234, "min": -5.862762451171875, "p10": -1.196253204345703, "median": 1.0130538940429688, "p90": 4.300845336914064, "max": 7.007843017578125, "pos_frac": 0.75, "sample": [1.036041259765625, 2.237579345703125, 2.1385879516601562, 2.2351646423339844, 1.5185775756835938, -0.7772598266601562, 1.5407485961914062, -0.3404502868652344, 4.774438858032227, -2.569812774658203, 4.022218704223633, 0.3510589599609375, 5.8543548583984375, 0.9294013977050781, 2.0828857421875, 2.6058597564697266, -4.903228759765625, 2.077117919921875, 4.649200439453125, 0.48412132263183594, 4.419525146484375, -0.3708915710449219, 6.9730987548828125, 3.966033935546875, 2.021209716796875, 0.9443798065185547, 2.6229095458984375, -4.60528564453125, 0.7460594177246094, 0.7811241149902344, 0.8132781982421875, 0.8755626678466797, -1.4167938232421875, 3.2866477966308594, 5.371936798095703, -1.2284774780273438, -0.286651611328125, 0.25567054748535156, -0.24393463134765625, 0.9900665283203125, 1.4002151489257812, -5.862762451171875, 4.02392578125, 0.449554443359375, -0.2978782653808594, 0.419921875, 2.076629638671875, 3.792999267578125, 0.5625762939453125, 0.4836273193359375, -1.121063232421875, 1.4612407684326172, 3.5284500122070312, 7.007843017578125, -4.261940002441406, 3.746246337890625, 1.647430419921875, 0.6589813232421875, 3.2828750610351562, 2.391368865966797, -0.34393882751464844, -0.5299282073974609, 1.0554771423339844, 0.36508750915527344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000143.npy"}
{"epoch": 0.2161753590325019, "step": 144, "batch_size": 64, "mean": 1.5406036376953125, "std": 2.5064778327941895, "min": -5.905479431152344, "p10": -1.2932525634765624, "median": 1.3927764892578125, "p90": 5.19360218048096, "max": 7.464256286621094, "pos_frac": 0.703125, "sample": [5.58802604675293, 2.804363250732422, -1.0988616943359375, 0.35967254638671875, -1.0772552490234375, 0.8807010650634766, -1.2066631317138672, 2.7483596801757812, 0.6826534271240234, 0.18784523010253906, -0.40755462646484375, 0.5474472045898438, 1.7172088623046875, 2.2798690795898438, 3.5551223754882812, 1.3507537841796875, 3.994476318359375, 0.6154594421386719, -0.8354949951171875, 2.6276779174804688, 2.748046875, 2.253314971923828, 1.4347991943359375, -0.047943115234375, -0.09494400024414062, 0.7362747192382812, 2.5595474243164062, 3.9371109008789062, -1.330362319946289, -0.5305938720703125, 3.0833892822265625, 1.3263702392578125, -2.148029327392578, -0.12711334228515625, 3.1477432250976562, 0.6212387084960938, -0.9360980987548828, 4.383398056030273, 5.553802490234375, 5.54083251953125, 3.358856201171875, 2.5832366943359375, -1.7069969177246094, 4.1790008544921875, -1.7343940734863281, 1.7269134521484375, 2.112152099609375, -1.7761898040771484, 3.6976776123046875, -0.1477813720703125, 0.3883800506591797, -1.03857421875, 1.5056838989257812, 6.1378173828125, -2.150146484375, 5.551992416381836, 7.464256286621094, 7.2687225341796875, 1.1379547119140625, -5.905479431152344, 2.2311782836914062, 0.7476997375488281, 2.60809326171875, 2.933988571166992], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000144.npy"}
{"epoch": 0.21768707482993196, "step": 145, "batch_size": 64, "mean": 1.484137773513794, "std": 2.736412525177002, "min": -7.0660858154296875, "p10": -1.4182167053222656, "median": 1.3547248840332031, "p90": 4.915968132019046, "max": 8.388130187988281, "pos_frac": 0.703125, "sample": [1.51019287109375, 2.4284133911132812, -1.1533279418945312, -0.244476318359375, 0.22556304931640625, -7.0660858154296875, 2.622833251953125, -0.5427780151367188, 7.542240142822266, 1.6877899169921875, 1.878763198852539, 3.4612655639648438, 0.2873382568359375, 3.019439697265625, 0.4189281463623047, 1.3246002197265625, -0.7247695922851562, 4.3114776611328125, -0.9273834228515625, 1.93212890625, 0.16326904296875, 0.7693748474121094, -1.4820632934570312, 3.966522216796875, 5.862581253051758, -0.357269287109375, -0.4586029052734375, 2.542266845703125, 3.968994140625, 0.23006820678710938, 8.388130187988281, 3.588714599609375, 1.4099349975585938, 6.398101806640625, 1.6671142578125, 5.17503547668457, 2.223459243774414, 3.9046268463134766, 1.2969436645507812, 3.7121009826660156, -3.3244476318359375, 2.470550537109375, 5.794586181640625, 3.5977935791015625, -0.08774185180664062, -1.9022979736328125, 0.9253749847412109, 1.3848495483398438, -0.9942970275878906, 1.18560791015625, 2.904266357421875, 1.8409099578857422, 3.587879180908203, -0.8476676940917969, 1.1022109985351562, 6.4122314453125, 0.97589111328125, -1.4623336791992188, -3.94696044921875, 0.665069580078125, 3.4712066650390625, -1.69329833984375, -1.315277099609375, -0.7207412719726562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000145.npy"}
{"epoch": 0.21919879062736206, "step": 146, "batch_size": 64, "mean": 1.6037828922271729, "std": 2.4118683338165283, "min": -2.7598419189453125, "p10": -1.7801616668701172, "median": 1.6166496276855469, "p90": 4.388123321533203, "max": 9.677268981933594, "pos_frac": 0.734375, "sample": [3.2955398559570312, 3.373903274536133, -1.3190383911132812, 1.6415939331054688, 0.5650100708007812, -1.8748779296875, -1.2445297241210938, 2.837646484375, 2.8066635131835938, 2.1778640747070312, -2.7598419189453125, -2.3541717529296875, 4.0232696533203125, 2.721731185913086, 0.7204017639160156, -1.5504684448242188, 2.7858047485351562, -0.6171112060546875, 3.7107810974121094, -2.1310043334960938, 2.3149871826171875, 5.5063018798828125, 1.797576904296875, 0.776580810546875, -2.4436988830566406, -0.6358795166015625, 1.3674068450927734, 4.3982696533203125, -1.7830581665039062, 2.1530914306640625, 1.6853275299072266, -2.2777252197265625, 6.6613311767578125, 1.1662445068359375, 0.7499351501464844, -0.7468414306640625, 1.591705322265625, 0.5424652099609375, 1.5028839111328125, -0.4154510498046875, 5.026824951171875, 1.7734489440917969, 0.7154884338378906, 2.9571456909179688, 4.364448547363281, 5.4922332763671875, 3.181438446044922, 1.8339176177978516, 3.781719207763672, 1.4388980865478516, 2.506467819213867, 2.018463134765625, 4.104869842529297, 1.4378662109375, -1.7734031677246094, -0.15526580810546875, 9.677268981933594, -0.023651123046875, 1.2881088256835938, 1.081705093383789, 3.3591079711914062, 1.5872154235839844, 1.7078933715820312, 4.539272308349609], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000146.npy"}
{"epoch": 0.22071050642479215, "step": 147, "batch_size": 64, "mean": 1.1894733905792236, "std": 2.6453418731689453, "min": -8.11578369140625, "p10": -1.1597274780273437, "median": 1.1315536499023438, "p90": 3.897174835205078, "max": 6.646320343017578, "pos_frac": 0.671875, "sample": [1.5008106231689453, 1.27532958984375, 3.1860218048095703, -0.237152099609375, 1.7091751098632812, 3.1231918334960938, -2.1656646728515625, 0.8084869384765625, -1.130401611328125, -0.7032718658447266, 2.956695556640625, 2.023242950439453, 3.690643310546875, 3.886199951171875, -0.7727279663085938, 3.7347145080566406, 6.4380950927734375, -8.11578369140625, 3.228548049926758, -0.6393013000488281, 1.6045150756835938, 0.01718902587890625, 0.4405345916748047, 0.79217529296875, 0.4479522705078125, 5.950916290283203, -1.1613845825195312, 3.289989471435547, 1.8873977661132812, 0.9877777099609375, -0.3006458282470703, 1.855224609375, 1.564910888671875, -1.8776702880859375, -1.0202789306640625, 2.838897705078125, 3.411497116088867, 6.0084381103515625, 5.5863800048828125, -1.1735572814941406, -0.854644775390625, -0.012786865234375, 0.41446685791015625, 2.837268829345703, 3.224365234375, -1.1558609008789062, 6.646320343017578, -0.7492752075195312, 2.3042678833007812, 0.6572036743164062, 1.5940322875976562, 1.9246292114257812, 1.5570068359375, -0.8595752716064453, -0.943634033203125, 0.4216117858886719, 0.9165859222412109, 0.3728599548339844, 3.9018783569335938, -4.6810302734375, -4.116950988769531, 2.1640853881835938, 5.847209930419922, -0.23084640502929688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000147.npy"}
{"epoch": 0.2222222222222222, "step": 148, "batch_size": 64, "mean": 1.7444007396697998, "std": 2.64003586769104, "min": -3.6846237182617188, "p10": -1.5019418716430657, "median": 1.4870023727416992, "p90": 5.326238250732422, "max": 7.490943908691406, "pos_frac": 0.75, "sample": [5.9130096435546875, -2.770383834838867, 2.0646591186523438, -2.1495227813720703, 3.9821548461914062, 7.490943908691406, 2.5547332763671875, -3.4816551208496094, 4.1708831787109375, 4.5442657470703125, -0.7549781799316406, 4.673160552978516, 1.1554489135742188, 1.4213752746582031, 3.894664764404297, -0.38054656982421875, -0.9075450897216797, 5.4060516357421875, 2.29150390625, -0.3617973327636719, 1.9708518981933594, -1.756683349609375, 5.464351654052734, 0.8566627502441406, 4.666479110717773, 7.1702728271484375, 3.3717613220214844, 0.0731048583984375, 6.732757568359375, 1.531707763671875, 1.4764518737792969, -0.872711181640625, 4.697999954223633, -2.3010711669921875, 1.8854293823242188, 3.5541610717773438, 0.6207714080810547, 3.876544952392578, -3.6846237182617188, 0.20862579345703125, 1.440582275390625, -0.2898292541503906, 5.140007019042969, 7.248680114746094, -2.580799102783203, 2.8452014923095703, 2.183277130126953, 1.4975528717041016, 2.1605682373046875, 1.3739776611328125, 2.5169429779052734, 0.5747146606445312, 0.8634262084960938, 0.474151611328125, 0.10869789123535156, 3.0933837890625, 1.192474365234375, 2.0890731811523438, 1.0379295349121094, -0.37326812744140625, 0.36461830139160156, 1.9386367797851562, -0.880462646484375, -0.6771621704101562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000148.npy"}
{"epoch": 0.2237339380196523, "step": 149, "batch_size": 64, "mean": 1.9707480669021606, "std": 2.7148995399475098, "min": -5.06341552734375, "p10": -1.3358364105224607, "median": 1.5860233306884766, "p90": 5.837226867675782, "max": 8.81219482421875, "pos_frac": 0.78125, "sample": [2.6180419921875, 3.947174072265625, 5.5603179931640625, 0.6651191711425781, 2.0008697509765625, -0.17455291748046875, -0.044986724853515625, 3.451679229736328, -0.06581687927246094, 4.053312301635742, -2.4427261352539062, 1.7537097930908203, 0.559051513671875, 1.6577987670898438, 2.826629638671875, 0.7579231262207031, 1.3295478820800781, 1.2024765014648438, 0.4940357208251953, 1.0976715087890625, 1.2436275482177734, 6.921348571777344, 5.26310920715332, -1.7544975280761719, 8.81219482421875, 2.635293960571289, 0.7299823760986328, -0.8779182434082031, 1.5142478942871094, -5.06341552734375, -0.1800556182861328, 4.1777801513671875, -1.0450553894042969, 8.165311813354492, 0.6211910247802734, 6.426660537719727, 4.422367095947266, -1.4604568481445312, 5.955902099609375, -1.5668411254882812, -1.8864593505859375, 2.0276031494140625, 0.4122200012207031, 3.8289642333984375, 6.7914581298828125, 4.490060806274414, 1.3431243896484375, 3.5170536041259766, 1.3266220092773438, 1.38531494140625, 1.9640960693359375, 1.098785400390625, -0.3836212158203125, -2.913726806640625, 0.3348979949951172, 3.909088134765625, 2.093292236328125, 6.857612609863281, 3.7936477661132812, 1.9867229461669922, 1.7114906311035156, 3.8394126892089844, 0.18122100830078125, 2.2309417724609375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000149.npy"}
{"epoch": 0.2252456538170824, "step": 150, "batch_size": 64, "mean": 2.4391417503356934, "std": 2.593012809753418, "min": -3.3605880737304688, "p10": -0.4159118652343748, "median": 2.300509452819824, "p90": 5.740140533447266, "max": 8.92236328125, "pos_frac": 0.84375, "sample": [2.308797836303711, 0.16977691650390625, 7.967315673828125, 1.7435150146484375, -2.5906753540039062, -0.4970550537109375, 1.4563922882080078, 4.892189025878906, 2.7262725830078125, 4.996316909790039, 0.3426666259765625, 2.99359130859375, 0.436737060546875, 3.7696685791015625, -0.16199302673339844, 2.0736465454101562, 2.6395187377929688, 6.5055389404296875, 2.207550048828125, 2.9101638793945312, 3.2496871948242188, 0.7924613952636719, 4.194091796875, -2.5193634033203125, 0.659332275390625, 4.23710823059082, 1.1278572082519531, 3.716289520263672, 0.4461517333984375, 6.1654052734375, 2.8091259002685547, 2.59197998046875, 5.051841735839844, 1.4239845275878906, 8.194023132324219, -3.3605880737304688, 1.4023876190185547, 2.7034378051757812, 3.06695556640625, 6.5105743408203125, 2.2922210693359375, 5.546058654785156, 0.5289077758789062, 1.9684524536132812, 5.8233184814453125, 1.0045013427734375, 4.110555648803711, 4.477901458740234, -1.0567855834960938, 8.92236328125, 0.374755859375, 0.6969528198242188, 3.0179901123046875, -0.1371612548828125, 1.1081199645996094, 4.3272705078125, -1.4246559143066406, -1.6368865966796875, 5.071779251098633, -0.2265777587890625, 1.3798294067382812, 3.9390640258789062, 2.2177658081054688, 4.426666259765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000150.npy"}
{"epoch": 0.22675736961451248, "step": 151, "batch_size": 64, "mean": 2.517019271850586, "std": 2.7532529830932617, "min": -3.5068588256835938, "p10": -0.9596584320068359, "median": 2.483335494995117, "p90": 5.513274383544923, "max": 10.59088134765625, "pos_frac": 0.8125, "sample": [3.4697952270507812, 1.8955307006835938, 4.88664436340332, -0.9970817565917969, 0.7855567932128906, 0.3112831115722656, 10.59088134765625, 0.3646049499511719, 6.85986328125, 0.8864574432373047, 2.2998733520507812, 3.9171009063720703, 2.26904296875, 0.5052947998046875, 2.3469314575195312, 3.207550048828125, 4.684305191040039, -0.5481033325195312, 2.036823272705078, 0.31719970703125, -1.1975555419921875, -0.8723373413085938, 4.289649963378906, -1.373687744140625, 8.7674560546875, -1.8145980834960938, 3.2483444213867188, 2.6604957580566406, -0.7109546661376953, -0.5550994873046875, 3.0916576385498047, -1.4341659545898438, 4.5027008056640625, 2.8906707763671875, 1.414764404296875, -1.07769775390625, 2.644287109375, 2.639434814453125, 1.075347900390625, 8.0504150390625, 0.581695556640625, 2.619739532470703, 5.018100738525391, 1.5526008605957031, 7.004890441894531, 5.606903076171875, 5.142875671386719, 5.033843994140625, 5.294807434082031, 2.094402313232422, 2.7875137329101562, 3.387054443359375, 1.5590343475341797, 7.113655090332031, -3.5068588256835938, 4.424137115478516, 3.6026077270507812, 1.939208984375, -0.5821952819824219, 1.373626708984375, 3.3392333984375, 5.094383239746094, 0.1878986358642578, 4.091392517089844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000151.npy"}
{"epoch": 0.22826908541194255, "step": 152, "batch_size": 64, "mean": 2.091653823852539, "std": 3.2176833152770996, "min": -4.646720886230469, "p10": -1.1607200622558593, "median": 1.6153326034545898, "p90": 6.575566101074219, "max": 9.63958740234375, "pos_frac": 0.75, "sample": [-0.8815937042236328, 0.862060546875, 4.036800384521484, 0.29985809326171875, -4.646720886230469, 0.24488067626953125, 2.900604248046875, 3.8622207641601562, 1.4309158325195312, -1.219329833984375, 1.6107006072998047, -1.343536376953125, 2.440032958984375, 0.570159912109375, 5.55963134765625, 6.529380798339844, 0.5056591033935547, 1.619964599609375, 0.785736083984375, -0.7701988220214844, -4.460884094238281, 6.347198486328125, 2.9201126098632812, 3.3103561401367188, -0.32037353515625, 0.42719459533691406, 4.191356658935547, 2.2827529907226562, 0.9073600769042969, 1.9308700561523438, -0.2221240997314453, 7.829013824462891, 0.6261577606201172, 0.543121337890625, 0.5102157592773438, -1.01031494140625, 1.7502880096435547, 0.11411285400390625, 3.92449951171875, 2.154022216796875, -2.352581024169922, 6.647798538208008, 4.565834045410156, -1.0239639282226562, 6.185844421386719, 7.011238098144531, 5.365032196044922, -0.7768058776855469, 3.70733642578125, 2.6559295654296875, -4.3572540283203125, 6.595359802246094, 0.3938007354736328, 8.536537170410156, 4.852832794189453, 2.604450225830078, 7.874736785888672, 6.233345031738281, -0.9262542724609375, 9.63958740234375, -0.4986572265625, -1.8911457061767578, 0.13077545166015625, 4.539909362792969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000152.npy"}
{"epoch": 0.22978080120937264, "step": 153, "batch_size": 64, "mean": 2.1042513847351074, "std": 2.622159242630005, "min": -3.1847400665283203, "p10": -1.335255622863769, "median": 1.70709228515625, "p90": 5.407537841796875, "max": 8.118976593017578, "pos_frac": 0.796875, "sample": [2.5533504486083984, 1.7047271728515625, 2.7304000854492188, 5.0494537353515625, 4.097831726074219, 5.581596374511719, 2.987060546875, 1.6659126281738281, 1.2643966674804688, 1.317779541015625, 1.050516128540039, 6.285022735595703, 0.9470062255859375, 5.393150329589844, 2.663421630859375, -1.5339508056640625, 4.09814453125, 0.42943572998046875, -0.15264892578125, 6.5288543701171875, 8.074005126953125, 5.337348937988281, -1.5679206848144531, 1.9658546447753906, 8.118976593017578, -1.9232177734375, 6.503166198730469, 0.7655525207519531, 0.44378662109375, 5.379692077636719, -3.0134353637695312, 1.5150814056396484, -3.1847400665283203, 0.91241455078125, 2.7979660034179688, 0.5244159698486328, 0.7626113891601562, 5.413703918457031, 1.1873779296875, 2.0267105102539062, 1.834869384765625, -0.7153663635253906, 5.386810302734375, 2.6542930603027344, 2.32928466796875, -0.2684059143066406, 3.5094223022460938, 1.6955795288085938, 2.7029495239257812, -0.8716335296630859, 0.2964019775390625, 3.2246780395507812, 0.5706748962402344, 0.5087051391601562, 4.545940399169922, -1.80133056640625, -2.0242080688476562, 4.828020095825195, -0.42791748046875, 5.00347900390625, 3.3436508178710938, 0.4725532531738281, -0.5366249084472656, 1.7094573974609375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000153.npy"}
{"epoch": 0.23129251700680273, "step": 154, "batch_size": 64, "mean": 1.499024748802185, "std": 3.274091958999634, "min": -6.658905029296875, "p10": -2.7480541229248043, "median": 1.6546649932861328, "p90": 5.659085083007813, "max": 9.259559631347656, "pos_frac": 0.703125, "sample": [-3.5257339477539062, 1.3134002685546875, -6.658905029296875, -6.533174514770508, 5.802032470703125, 3.0632877349853516, -3.295663833618164, 1.364553451538086, 0.7760467529296875, -2.8042068481445312, 2.4164066314697266, -0.17276382446289062, 3.0651283264160156, -0.562225341796875, -2.351409912109375, -2.6170310974121094, -0.7288093566894531, 1.753499984741211, 0.3598365783691406, 0.9093246459960938, 4.7580718994140625, 7.6181182861328125, 6.925132751464844, 2.29620361328125, -3.98944091796875, 2.6734180450439453, 6.412406921386719, 0.269927978515625, 7.01409912109375, -6.4038848876953125, 2.8233509063720703, 2.037351608276367, 5.220802307128906, 3.211334228515625, 2.242961883544922, -0.23594284057617188, 1.5075111389160156, 2.8332901000976562, 3.4837589263916016, 1.5558300018310547, -0.5066452026367188, -0.649810791015625, 2.8090991973876953, -0.8213596343994141, 0.7028961181640625, 9.259559631347656, 2.601917266845703, 2.9984893798828125, 3.0449676513671875, -1.3152503967285156, 0.1439971923828125, 0.372222900390625, 1.5432815551757812, -0.43357086181640625, 5.53631591796875, 2.7358551025390625, 5.711700439453125, -0.3544578552246094, 5.007246017456055, 3.6702499389648438, 4.04901123046875, 1.9659709930419922, 1.269388198852539, 2.7686195373535156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000154.npy"}
{"epoch": 0.2328042328042328, "step": 155, "batch_size": 64, "mean": 2.0669875144958496, "std": 3.6528372764587402, "min": -5.519012451171875, "p10": -2.8867336273193356, "median": 1.6379814147949219, "p90": 7.125010490417482, "max": 10.415884017944336, "pos_frac": 0.671875, "sample": [1.0766143798828125, 0.6429290771484375, 7.483306884765625, 2.8854598999023438, 0.6413955688476562, 1.394256591796875, 0.16692352294921875, 4.440061569213867, -1.0721874237060547, 7.833324432373047, 5.350074768066406, 5.3358001708984375, -0.2771759033203125, -4.1496734619140625, -0.3278541564941406, -1.9132080078125, 3.1348876953125, 8.052215576171875, 3.1325149536132812, 6.768693923950195, 7.9775238037109375, -0.19873046875, 4.2534942626953125, -3.1399097442626953, 4.324066162109375, 8.95920181274414, 0.8664321899414062, 4.7871246337890625, -1.6171951293945312, 5.119598388671875, -5.519012451171875, 1.5448970794677734, 10.415884017944336, 4.839805603027344, 6.3082733154296875, 4.56298828125, -0.10338592529296875, -1.415924072265625, -4.6674957275390625, 3.0110244750976562, -3.6194610595703125, 1.1140174865722656, 3.86407470703125, 1.1151695251464844, 5.552146911621094, 3.006389617919922, 1.8904571533203125, 5.795341491699219, -2.6994476318359375, 2.4693260192871094, 3.5427398681640625, 1.721832275390625, -0.04241180419921875, -0.13214111328125, 1.4171066284179688, -0.7024154663085938, 2.8227882385253906, -2.966999053955078, -5.463649749755859, -0.9666328430175781, 1.5541305541992188, 7.277717590332031, 5.01765251159668, -0.18756103515625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000155.npy"}
{"epoch": 0.23431594860166288, "step": 156, "batch_size": 64, "mean": 2.5308117866516113, "std": 3.1939165592193604, "min": -4.901153564453125, "p10": -1.0619388580322264, "median": 2.1280784606933594, "p90": 6.909588623046875, "max": 11.487785339355469, "pos_frac": 0.78125, "sample": [1.4922561645507812, -0.9753761291503906, -0.7896652221679688, -3.3561019897460938, 4.78485107421875, 6.8136138916015625, 11.463949203491211, 1.8902130126953125, 3.445068359375, 6.977993011474609, 1.691314697265625, 4.592475891113281, -1.4176063537597656, 3.8019866943359375, 6.84844970703125, 4.950233459472656, 5.312187194824219, 3.5285720825195312, -0.6239852905273438, 3.8859596252441406, 1.1169414520263672, 0.8509712219238281, 2.968120574951172, 2.628314971923828, 0.228973388671875, 1.2913589477539062, -0.26314353942871094, 4.731964111328125, 3.8155288696289062, 5.306640625, 2.6483383178710938, 2.08819580078125, -1.32763671875, 3.8887863159179688, 2.1679611206054688, 11.487785339355469, -4.901153564453125, 2.9478988647460938, -2.1741790771484375, 7.460441589355469, 4.668891906738281, 8.247207641601562, 1.0122146606445312, 0.9698562622070312, 0.2565937042236328, -0.04581451416015625, -0.2393341064453125, 0.01032257080078125, 2.6453857421875, 1.5921440124511719, 1.7771892547607422, -1.535400390625, 3.882244110107422, 0.6798171997070312, 4.2075347900390625, 1.4911117553710938, -0.68731689453125, 3.4233169555664062, 6.935791015625, 0.8607673645019531, 2.222963333129883, 7.7342987060546875, 1.6827125549316406, -1.0990371704101562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000156.npy"}
{"epoch": 0.23582766439909297, "step": 157, "batch_size": 64, "mean": 2.1488969326019287, "std": 3.2347939014434814, "min": -5.895133972167969, "p10": -1.0290904998779298, "median": 1.9528121948242188, "p90": 6.235007476806642, "max": 13.373123168945312, "pos_frac": 0.734375, "sample": [5.545745849609375, 8.240530014038086, -0.5821380615234375, 3.781494140625, 4.135002136230469, -3.3096847534179688, 2.3290252685546875, -0.6228179931640625, 6.363243103027344, -0.017612457275390625, -0.178802490234375, 2.1058807373046875, 7.4185943603515625, 1.1057071685791016, -5.895133972167969, 0.072540283203125, 0.7160453796386719, 1.0672607421875, 4.800920486450195, 6.788599014282227, 3.4611663818359375, 4.540870666503906, 13.373123168945312, 3.256315231323242, 5.206136703491211, 1.79974365234375, -1.0154800415039062, -0.10050392150878906, 2.1515769958496094, -0.7397384643554688, 4.723394393920898, 0.14344024658203125, 3.0528011322021484, 1.7182540893554688, 2.2459487915039062, 6.9258575439453125, -1.4446945190429688, 6.60304069519043, 5.283180236816406, 1.2711257934570312, -1.0349235534667969, 2.2318038940429688, -0.1349163055419922, 4.955162048339844, 5.935791015625, 0.8845348358154297, -0.41703033447265625, 2.264720916748047, 1.655099868774414, 0.6825504302978516, 2.140380859375, -1.9421119689941406, 1.2778968811035156, 1.4533233642578125, 2.9948902130126953, 4.861207962036133, 0.7525062561035156, 2.6307945251464844, -4.75506591796875, 3.742494583129883, 1.3178253173828125, -0.5911712646484375, -3.2966461181640625, 3.6003265380859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000157.npy"}
{"epoch": 0.23733938019652306, "step": 158, "batch_size": 64, "mean": 2.0900795459747314, "std": 3.0977566242218018, "min": -4.296104431152344, "p10": -1.6029588699340818, "median": 2.121426582336426, "p90": 6.358378410339356, "max": 9.13873291015625, "pos_frac": 0.703125, "sample": [1.4873886108398438, 3.845355987548828, 2.5436630249023438, 2.4758548736572266, -4.296104431152344, 4.295719146728516, 8.192245483398438, 4.430450439453125, -1.0016937255859375, -0.25858306884765625, 6.0423431396484375, 3.4383468627929688, 3.07275390625, 3.7653141021728516, -4.168943405151367, 2.855632781982422, 8.130889892578125, 1.291219711303711, 1.5027332305908203, 4.237174987792969, -0.51806640625, 1.0012664794921875, 4.071666717529297, 0.5141487121582031, 6.49382209777832, 2.4804725646972656, -0.79046630859375, 7.1747894287109375, 2.8209228515625, 3.620147705078125, -0.8773345947265625, 3.0213470458984375, 5.2019195556640625, 1.454925537109375, 6.033048629760742, 4.825847625732422, 4.7162017822265625, -1.38177490234375, 1.766998291015625, -2.9116592407226562, 6.7020111083984375, 1.3771514892578125, 3.0284500122070312, 6.78302001953125, 2.6434860229492188, 3.34771728515625, -0.1981067657470703, -0.39487457275390625, 9.13873291015625, -0.8599758148193359, 1.2516403198242188, -1.1088027954101562, -1.6977519989013672, 0.8376502990722656, 0.4820384979248047, -0.4852313995361328, -2.1459178924560547, 4.947139739990234, 3.7586402893066406, 1.38385009765625, -2.4329452514648438, 1.1113128662109375, -0.3095245361328125, -3.994598388671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000158.npy"}
{"epoch": 0.23885109599395313, "step": 159, "batch_size": 64, "mean": 2.436002492904663, "std": 3.757107973098755, "min": -11.1871337890625, "p10": -1.6859651565551756, "median": 2.2367353439331055, "p90": 6.962055587768556, "max": 11.356369018554688, "pos_frac": 0.765625, "sample": [4.514766693115234, 4.234109878540039, 2.0876617431640625, 6.717475891113281, 3.187774658203125, -2.5128860473632812, -5.30242919921875, 8.132144927978516, 0.26085662841796875, 11.356369018554688, 6.107269287109375, 1.621063232421875, 7.725725173950195, 2.507232666015625, -0.632537841796875, 1.6985130310058594, 3.2758560180664062, 10.597946166992188, 3.503814697265625, 0.2403736114501953, 0.8982772827148438, -4.713996887207031, 1.9707221984863281, 3.50213623046875, 1.8528900146484375, 1.7185916900634766, 2.012369155883789, -1.9643630981445312, 6.0303497314453125, 10.407638549804688, -1.7062149047851562, 3.552276611328125, 2.317464828491211, -0.33150482177734375, 1.32867431640625, 2.840465545654297, 7.066875457763672, 3.0066375732421875, 0.8465347290039062, -0.21659088134765625, 1.80023193359375, -2.0616455078125, 8.086776733398438, 0.8706855773925781, 2.5385971069335938, 6.411243438720703, -1.6387157440185547, 3.381681442260742, -0.08807373046875, 2.156005859375, -0.05079841613769531, 5.9457550048828125, 2.7445297241210938, 1.2580718994140625, -0.12140083312988281, -11.1871337890625, 5.232883453369141, 2.7464065551757812, -0.0785369873046875, 1.9137458801269531, 3.3752975463867188, 2.3855667114257812, 6.129913330078125, 4.4127349853515625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000159.npy"}
{"epoch": 0.24036281179138322, "step": 160, "batch_size": 64, "mean": 2.2266225814819336, "std": 3.857661724090576, "min": -5.84503173828125, "p10": -2.0073076248168946, "median": 1.985814094543457, "p90": 7.712969017028809, "max": 13.099075317382812, "pos_frac": 0.71875, "sample": [0.8752899169921875, -1.1952552795410156, 2.93389892578125, 8.857856750488281, 1.7166404724121094, 5.497505187988281, -0.075897216796875, -0.3700218200683594, 2.572742462158203, 4.8125762939453125, 11.256240844726562, -0.543365478515625, 7.696382522583008, 4.8413543701171875, 3.9140148162841797, 7.7200775146484375, 0.13316917419433594, -4.000892639160156, 3.8039093017578125, 5.227148056030273, -0.9924087524414062, -2.844991683959961, 0.591033935546875, -1.1073989868164062, 2.6427555084228516, 2.7736968994140625, 4.182163238525391, 0.4678058624267578, 3.4330101013183594, 0.4626121520996094, 2.2549877166748047, 2.711080551147461, 13.099075317382812, 10.488964080810547, 0.6395664215087891, -1.7072734832763672, 4.3942718505859375, 0.7276115417480469, 0.08774948120117188, 5.902515411376953, 0.6313018798828125, 4.181529998779297, 0.6701927185058594, -2.426116943359375, -0.6905555725097656, 3.3574371337890625, -4.961021423339844, -1.9158916473388672, 3.7520294189453125, 2.6510581970214844, -3.5522537231445312, -1.095743179321289, -1.6648178100585938, 5.015514373779297, 4.2638092041015625, 0.7393951416015625, 0.38401031494140625, 7.7431488037109375, -5.84503173828125, 4.150794982910156, 5.351409912109375, 8.734130859375, 1.19580078125, -2.0464859008789062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000160.npy"}
{"epoch": 0.2418745275888133, "step": 161, "batch_size": 64, "mean": 2.474228858947754, "std": 3.829712390899658, "min": -8.41619873046875, "p10": -1.5572638511657715, "median": 2.065652847290039, "p90": 7.761318969726563, "max": 12.41021728515625, "pos_frac": 0.78125, "sample": [2.1986961364746094, -6.188606262207031, 0.9025039672851562, 2.7336883544921875, 2.401458740234375, -0.5563583374023438, 4.7091064453125, 9.747749328613281, -1.2387561798095703, -1.591660499572754, -2.6942977905273438, 3.015338897705078, 2.691946029663086, 4.259452819824219, 7.655906677246094, 2.1576786041259766, 3.2953243255615234, -8.41619873046875, 3.6092147827148438, -1.0787811279296875, -1.59619140625, 0.9149169921875, 1.503173828125, -0.0433807373046875, 2.8870162963867188, -2.526172637939453, 0.2812995910644531, 3.9486255645751953, 3.7359695434570312, 0.35147857666015625, 7.164722442626953, 7.806495666503906, 4.85041618347168, 8.950210571289062, 8.36136245727539, -1.4770050048828125, -1.9400825500488281, 11.796340942382812, 6.2867889404296875, 5.063236236572266, 1.8623008728027344, 7.557884216308594, 0.05606651306152344, 1.789215087890625, 2.0166244506835938, 0.050632476806640625, 1.4002494812011719, 3.452554702758789, 1.2875518798828125, 3.6831130981445312, -0.5198287963867188, 2.1146812438964844, 7.971242904663086, 1.4662628173828125, 0.30985260009765625, 0.03375053405761719, 1.9852142333984375, 1.5453224182128906, 6.108085632324219, 3.483856201171875, 12.41021728515625, -1.238372802734375, 0.5409984588623047, 5.050540924072266], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000161.npy"}
{"epoch": 0.24338624338624337, "step": 162, "batch_size": 64, "mean": 3.4372503757476807, "std": 3.7925853729248047, "min": -4.4793548583984375, "p10": -1.942629051208496, "median": 3.430586814880371, "p90": 7.874702453613281, "max": 13.505447387695312, "pos_frac": 0.828125, "sample": [6.08428955078125, 7.205497741699219, -2.66143798828125, 1.5386238098144531, 1.5369644165039062, 5.668891906738281, 3.453947067260742, 1.3668746948242188, 6.5776824951171875, 8.349977493286133, 2.7010879516601562, 5.222461700439453, 4.495710372924805, 13.505447387695312, 5.147758483886719, 3.3883190155029297, 2.24627685546875, 10.277450561523438, 4.48907470703125, 10.277385711669922, 5.201084136962891, 4.947582244873047, 1.7325210571289062, -1.6895751953125, 3.406291961669922, -2.7996368408203125, 5.197235107421875, 2.0880355834960938, -3.0088653564453125, 6.78472900390625, 1.1729412078857422, 3.850250244140625, 0.6619186401367188, 7.798732757568359, 7.7243194580078125, 3.088897705078125, 7.517915725708008, 5.101108551025391, 0.5543842315673828, -1.961038589477539, -3.278900146484375, -0.7121124267578125, 8.99420166015625, 7.907260894775391, 4.127834320068359, 3.4072265625, -2.8046035766601562, 3.6990966796875, -4.4793548583984375, -0.39813232421875, 0.9254035949707031, 0.49615478515625, 4.8640594482421875, 4.813528060913086, 9.516250610351562, 4.91192626953125, 0.5654869079589844, -1.8996734619140625, 1.915679931640625, 1.6287040710449219, 1.6275749206542969, 7.508249282836914, 7.071012496948242, 1.336029052734375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000162.npy"}
{"epoch": 0.24489795918367346, "step": 163, "batch_size": 64, "mean": 3.368985414505005, "std": 4.329891681671143, "min": -5.786590576171875, "p10": -2.462203216552734, "median": 3.240285873413086, "p90": 8.784138488769532, "max": 12.212850570678711, "pos_frac": 0.796875, "sample": [-0.6732263565063477, 5.268707275390625, -5.786590576171875, 8.268821716308594, 3.718719482421875, 0.8766441345214844, -5.527301788330078, 3.4278564453125, 10.989013671875, -2.5115203857421875, 4.3901824951171875, 5.691780090332031, 11.386909484863281, 1.4280929565429688, 2.2570724487304688, 4.678855895996094, 4.710918426513672, -0.7272109985351562, 12.136505126953125, 3.315216064453125, 1.0585174560546875, 4.563957214355469, 6.795452117919922, -2.1674652099609375, -2.7489986419677734, 8.77328109741211, 3.165355682373047, 6.9713287353515625, 3.813993453979492, 1.6849746704101562, 5.338235855102539, 2.478496551513672, 6.064689636230469, 0.5263633728027344, 6.7268524169921875, 7.9077911376953125, -3.3686580657958984, 8.187973022460938, -2.951568603515625, 1.0946273803710938, 4.884979248046875, 12.212850570678711, 2.262786865234375, 6.294834136962891, 0.128387451171875, -1.2163009643554688, -0.2480316162109375, 1.6845741271972656, 1.254629135131836, 0.7524566650390625, -4.93756103515625, 8.255210876464844, 6.517017364501953, 0.5816802978515625, 1.5037803649902344, -2.3471298217773438, 9.288070678710938, 10.265228271484375, 8.78879165649414, 6.514001846313477, 4.9505615234375, 1.9759540557861328, 2.3857078552246094, 2.6279544830322266], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000163.npy"}
{"epoch": 0.24640967498110355, "step": 164, "batch_size": 64, "mean": 3.307917594909668, "std": 4.501279830932617, "min": -8.281715393066406, "p10": -2.429873657226562, "median": 2.55157470703125, "p90": 8.86113166809082, "max": 14.356513977050781, "pos_frac": 0.8125, "sample": [-3.23577880859375, 0.6800689697265625, 0.3244056701660156, 2.5951995849609375, 1.6278457641601562, 2.0071563720703125, 13.704360961914062, 10.113410949707031, 0.143585205078125, 1.6550178527832031, 5.982879638671875, -1.6111526489257812, 6.353748321533203, -3.505483627319336, 5.306026458740234, 6.8188323974609375, 7.20831298828125, 2.725757598876953, 10.765289306640625, -1.2319812774658203, 2.1820411682128906, -4.14544677734375, 3.75506591796875, 2.56365966796875, 0.3873138427734375, 2.1986427307128906, 8.868541717529297, 0.9352226257324219, 2.38360595703125, 0.6185417175292969, 5.55872917175293, 8.72391128540039, -1.0763530731201172, 3.719127655029297, 5.220760345458984, -2.7807540893554688, 7.317832946777344, 14.356513977050781, 11.703065872192383, -4.590858459472656, 1.8131523132324219, 3.7539443969726562, 1.6406211853027344, 6.404985427856445, 2.506275177001953, 3.405029296875, 6.429771423339844, 0.7628841400146484, 7.728179931640625, 1.2855949401855469, 1.4172515869140625, 2.967121124267578, 9.656635284423828, -2.9512996673583984, -0.40818214416503906, -8.281715393066406, 1.64654541015625, 8.843841552734375, 8.73040771484375, -0.6476249694824219, 8.5147705078125, 2.53948974609375, 3.757291793823242, 3.8650970458984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000164.npy"}
{"epoch": 0.24792139077853365, "step": 165, "batch_size": 64, "mean": 3.021076202392578, "std": 3.969986915588379, "min": -5.391990661621094, "p10": -1.9906007766723628, "median": 3.291606903076172, "p90": 8.225833320617678, "max": 11.256866455078125, "pos_frac": 0.765625, "sample": [9.790779113769531, 4.059431076049805, 4.381923675537109, 6.280479431152344, 1.5089988708496094, -0.8461074829101562, 1.9658851623535156, -0.9277191162109375, 4.259510040283203, 2.4917850494384766, 0.7463150024414062, 5.382099151611328, 7.2378692626953125, 0.3618125915527344, -1.0937042236328125, 3.5195999145507812, 3.26287841796875, 3.047840118408203, -4.629240036010742, -0.7941207885742188, -4.74945068359375, 0.007415771484375, 10.210708618164062, -0.8018894195556641, 8.76641845703125, 9.030158996582031, 2.2233009338378906, 2.9481430053710938, 10.323234558105469, 8.416252136230469, 3.4366455078125, 0.054553985595703125, 11.256866455078125, 6.999351501464844, 4.46484375, -5.245582580566406, 0.931671142578125, 6.719669342041016, 5.381381988525391, 6.0124359130859375, -1.5520153045654297, -0.06642913818359375, 1.35382080078125, 1.1374092102050781, 7.781522750854492, -2.1785659790039062, 2.4535980224609375, 1.8150711059570312, 4.196552276611328, -0.007923126220703125, -2.281829833984375, 4.395866394042969, -5.391990661621094, 6.234470367431641, 3.617828369140625, -3.414081573486328, 4.509193420410156, 1.2748184204101562, 4.652759552001953, 3.3203353881835938, 5.843940734863281, 7.12200927734375, 5.374900817871094, 6.765159606933594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000165.npy"}
{"epoch": 0.2494331065759637, "step": 166, "batch_size": 64, "mean": 2.3022077083587646, "std": 3.9822916984558105, "min": -4.857490539550781, "p10": -2.043878936767578, "median": 1.7611427307128906, "p90": 7.6112998962402365, "max": 13.91244888305664, "pos_frac": 0.65625, "sample": [6.0223236083984375, -1.0843658447265625, -2.06353759765625, 6.86541748046875, 6.105560302734375, 1.4142837524414062, -1.9980087280273438, 8.57623291015625, 1.8787841796875, 3.160919189453125, -1.5879364013671875, 2.5166282653808594, 8.046428680419922, -1.8170547485351562, 5.6344757080078125, 0.702728271484375, -0.5619964599609375, 6.98875617980957, 1.0794525146484375, -0.4260597229003906, -0.9401798248291016, 7.113269805908203, -4.169200897216797, 2.76739501953125, 7.824741363525391, 2.395709991455078, -0.3763275146484375, 5.551605224609375, 5.8638153076171875, 3.2020950317382812, -2.5926361083984375, -3.7370567321777344, 1.4188766479492188, 9.553192138671875, 4.359947204589844, -0.7581100463867188, 7.056587219238281, 1.6435012817382812, 3.9839324951171875, 3.3227367401123047, 0.728057861328125, 10.189178466796875, -1.2887496948242188, -4.857490539550781, 4.16204833984375, -2.9904861450195312, 0.31055450439453125, -1.3351097106933594, 2.2375431060791016, 13.91244888305664, 2.9763565063476562, -3.161773681640625, 0.6264801025390625, 4.487815856933594, -1.5443477630615234, -1.1693878173828125, 4.690956115722656, -1.5228328704833984, 8.912490844726562, -1.1008453369140625, 3.4119186401367188, 5.0716552734375, 1.0847644805908203, 0.5731239318847656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000166.npy"}
{"epoch": 0.2509448223733938, "step": 167, "batch_size": 64, "mean": 3.5844414234161377, "std": 4.199509620666504, "min": -6.533733367919922, "p10": -1.3472091674804687, "median": 2.865851402282715, "p90": 9.0829948425293, "max": 12.884864807128906, "pos_frac": 0.8125, "sample": [1.204681396484375, 2.1824874877929688, 1.8628273010253906, 4.557117462158203, -4.100212097167969, 6.429206848144531, 8.235008239746094, 1.1771812438964844, 2.900766372680664, 5.247447967529297, 6.71197509765625, 2.0318222045898438, 1.7882957458496094, 1.1590805053710938, -0.16010475158691406, 0.873291015625, 7.421669006347656, 6.5959320068359375, 9.426864624023438, -1.6086196899414062, -6.533733367919922, 2.954021453857422, 2.5196590423583984, 0.5716400146484375, 2.408845901489258, 1.2747039794921875, 3.2323226928710938, -0.7004547119140625, 2.380970001220703, 7.6858367919921875, 2.0871658325195312, 7.001924514770508, 0.4575920104980469, 5.107940673828125, 5.124671936035156, 5.73651123046875, 10.356742858886719, 12.884864807128906, 8.229156494140625, 5.905479431152344, 1.0710620880126953, 4.8514862060546875, -1.8563461303710938, 2.8309364318847656, 7.311246871948242, -1.150604248046875, -4.2487335205078125, 10.883255004882812, -1.2558746337890625, 5.1473388671875, 6.719287872314453, -0.20186614990234375, 0.09845733642578125, 12.505077362060547, 5.826799392700195, 9.494136810302734, -2.0714111328125, 11.986892700195312, 6.12994384765625, 8.280632019042969, 4.2729949951171875, 0.5428371429443359, 1.0004730224609375, -1.3863525390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000167.npy"}
{"epoch": 0.25245653817082386, "step": 168, "batch_size": 64, "mean": 2.3846707344055176, "std": 3.834333658218384, "min": -5.156455993652344, "p10": -2.168920135498047, "median": 2.0619640350341797, "p90": 7.64448471069336, "max": 10.587806701660156, "pos_frac": 0.6875, "sample": [4.542716979980469, -2.99755859375, 6.947742462158203, -2.0690994262695312, 1.8063392639160156, 6.9500579833984375, 5.161956787109375, -1.4952774047851562, -3.1976852416992188, -3.308990478515625, 3.2214279174804688, -0.7971572875976562, 2.0423355102539062, 9.81937026977539, 4.3496246337890625, 1.5092506408691406, 4.204906463623047, 2.3149795532226562, 0.09183502197265625, 4.537879943847656, 5.084877014160156, -0.9036941528320312, 1.2816696166992188, 2.6266117095947266, -0.27722740173339844, 4.57550048828125, 9.909576416015625, 2.081592559814453, -5.156455993652344, 2.1390380859375, -0.5102462768554688, 2.4377365112304688, -3.063995361328125, 5.613014221191406, 6.888221740722656, 0.1770172119140625, 7.31829833984375, 1.4494705200195312, -1.3286857604980469, 1.054046630859375, 1.0708065032958984, 10.587806701660156, -2.211700439453125, 8.066570281982422, 6.646736145019531, -0.9425582885742188, 5.1014556884765625, -1.0839385986328125, 2.891693115234375, -1.8439826965332031, 3.619873046875, 1.1825523376464844, 5.637073516845703, 1.2903594970703125, -0.8612060546875, 8.90266227722168, -1.7750473022460938, -4.395782470703125, -0.4059906005859375, 0.17840194702148438, 6.94862174987793, 7.784278869628906, 8.066024780273438, 3.1332015991210938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000168.npy"}
{"epoch": 0.25396825396825395, "step": 169, "batch_size": 64, "mean": 3.4599175453186035, "std": 5.216036319732666, "min": -5.961883544921875, "p10": -2.1993839263916013, "median": 2.8255348205566406, "p90": 10.346932220458987, "max": 21.290451049804688, "pos_frac": 0.78125, "sample": [3.2548065185546875, 7.279041290283203, 6.5892333984375, -3.1025009155273438, 4.87574577331543, 1.3173027038574219, -2.3962783813476562, 0.18334197998046875, 5.891563415527344, 4.329948425292969, 1.70343017578125, 0.5576858520507812, 7.438812255859375, 9.824661254882812, 0.0633544921875, -1.7399635314941406, 3.045818328857422, 1.6281204223632812, 4.8319091796875, -0.31725311279296875, -0.4657745361328125, 9.76287841796875, 11.202194213867188, 3.6276702880859375, -0.920257568359375, 18.345436096191406, 1.6460723876953125, 2.8853607177734375, 0.4150123596191406, 5.590427398681641, 1.9113330841064453, 1.9965133666992188, 6.872692108154297, -1.6057147979736328, 5.482706069946289, 1.871622085571289, 7.8771820068359375, -5.6898193359375, 3.3186702728271484, 2.7657089233398438, 0.4316997528076172, 3.7318267822265625, 1.0521926879882812, -0.38759613037109375, 4.086633682250977, 1.6361465454101562, 5.743595123291016, 0.383087158203125, 10.907669067382812, 5.989715576171875, -1.153076171875, 6.018589019775391, 3.516399383544922, -5.961883544921875, -2.6257476806640625, 21.290451049804688, 11.102493286132812, 10.570762634277344, -5.557321548461914, 13.756664276123047, 2.1391067504882812, 6.900259017944336, -4.324859619140625, 0.039226531982421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000169.npy"}
{"epoch": 0.25547996976568405, "step": 170, "batch_size": 64, "mean": 3.0963125228881836, "std": 5.06438684463501, "min": -11.818437576293945, "p10": -3.523620605468749, "median": 3.667369842529297, "p90": 9.263465881347656, "max": 13.31256103515625, "pos_frac": 0.765625, "sample": [6.6282958984375, 1.2209548950195312, 3.2034168243408203, 4.254722595214844, 2.641895294189453, 13.293739318847656, 2.9668750762939453, 4.189825057983398, 2.5896377563476562, 11.170902252197266, -3.9653396606445312, 1.2500724792480469, 5.335845947265625, 13.31256103515625, 9.461265563964844, -1.680501937866211, 2.0091686248779297, 4.282657623291016, 6.2839202880859375, -0.9325485229492188, -1.1523017883300781, 9.102981567382812, 3.352079391479492, 0.5766487121582031, -2.4929428100585938, 3.9040145874023438, -5.9505615234375, -1.8387451171875, -11.818437576293945, 3.682281494140625, 7.142696380615234, 3.6524581909179688, 7.828338623046875, 3.725618362426758, -6.752653121948242, 2.2779502868652344, -5.18756103515625, 6.7023468017578125, 9.053657531738281, 0.0586090087890625, 4.7500152587890625, -0.30609130859375, 0.6976585388183594, 5.5808868408203125, 4.160713195800781, 13.279861450195312, 4.226104736328125, 9.332244873046875, 4.558193206787109, 0.8078994750976562, -0.9944381713867188, -2.244579315185547, 1.2245407104492188, 10.676021575927734, -4.694236755371094, 0.4354743957519531, 6.331077575683594, 6.3249664306640625, 1.0752296447753906, 4.009607315063477, 7.832664489746094, 7.620159149169922, 6.109642028808594, -6.013450622558594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000170.npy"}
{"epoch": 0.25699168556311414, "step": 171, "batch_size": 64, "mean": 3.776066780090332, "std": 5.6516947746276855, "min": -12.43499755859375, "p10": -1.0356855392456055, "median": 2.8564815521240234, "p90": 11.829857635498048, "max": 13.852741241455078, "pos_frac": 0.75, "sample": [3.849517822265625, 3.4360694885253906, 11.141494750976562, 6.557411193847656, 5.414924621582031, 13.852741241455078, 5.7804107666015625, 8.648452758789062, 0.25768470764160156, 2.351736068725586, 0.06476783752441406, 0.764801025390625, 13.414604187011719, 5.23590087890625, 12.62252426147461, -0.3606147766113281, 7.405275344848633, 2.677440643310547, 8.345399856567383, -0.43547821044921875, -0.5430622100830078, 2.307109832763672, 3.0875816345214844, -3.02655029296875, 8.636598587036133, 0.4936485290527344, 2.008392333984375, 0.4931488037109375, 11.764045715332031, -8.86187744140625, 3.770599365234375, 5.398612976074219, 13.022499084472656, -1.0487003326416016, 3.7720375061035156, 10.106674194335938, -1.7857894897460938, -4.6358184814453125, 13.476001739501953, 11.858062744140625, -12.43499755859375, 1.2839164733886719, 6.180946350097656, 2.159637451171875, 1.522064208984375, 10.531745910644531, 8.483566284179688, -0.2734222412109375, -0.8985824584960938, 6.7689971923828125, 3.0355224609375, 12.935302734375, 8.28587532043457, 2.257965087890625, -0.9062232971191406, 1.3159637451171875, 7.9826202392578125, -0.10432815551757812, -1.0053176879882812, -8.951080322265625, 9.76202392578125, 0.9360198974609375, 2.3353118896484375, -0.8535346984863281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000171.npy"}
{"epoch": 0.2585034013605442, "step": 172, "batch_size": 64, "mean": 3.747652053833008, "std": 5.022698879241943, "min": -13.374408721923828, "p10": -1.550949668884277, "median": 3.9617652893066406, "p90": 10.278161430358887, "max": 15.311067581176758, "pos_frac": 0.8125, "sample": [-4.146186828613281, 4.4400634765625, 1.1667861938476562, 2.807159423828125, 2.5997314453125, 1.4942150115966797, 0.008148193359375, 5.279533386230469, 6.5712738037109375, 10.349258422851562, 14.286376953125, 4.15496826171875, 3.888397216796875, -0.48574066162109375, 6.454521179199219, 15.311067581176758, 6.9044189453125, -4.36895751953125, -0.8021659851074219, 4.7603302001953125, 7.6378631591796875, 1.9843978881835938, -5.794504165649414, -0.06427764892578125, 4.035133361816406, 3.7975082397460938, 2.9563140869140625, 6.136322021484375, 5.886070251464844, 1.440704345703125, 4.323184967041016, 5.181785583496094, 5.3077239990234375, 8.081245422363281, 0.4808921813964844, -1.1639728546142578, 10.668458938598633, 5.148155212402344, 10.112268447875977, 1.7816238403320312, 0.25023841857910156, 4.487331390380859, 4.480712890625, 8.668241500854492, 8.79351806640625, -3.192913055419922, 1.2900848388671875, 0.2509613037109375, 2.3512649536132812, 5.71197509765625, -3.5515098571777344, 8.72674560546875, 10.911628723144531, -13.374408721923828, 2.045278549194336, 7.7981109619140625, 0.82867431640625, 3.01739501953125, 1.024972915649414, 11.180147171020508, -1.716796875, -0.0043544769287109375, 13.938926696777344, 7.323402404785156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000172.npy"}
{"epoch": 0.2600151171579743, "step": 173, "batch_size": 64, "mean": 3.9660136699676514, "std": 6.744984149932861, "min": -9.382328033447266, "p10": -3.838529586791992, "median": 3.7475900650024414, "p90": 13.660464859008789, "max": 17.881820678710938, "pos_frac": 0.703125, "sample": [-5.883756637573242, 1.5755996704101562, -1.0784149169921875, 5.155029296875, 4.011383056640625, 10.263778686523438, 5.024633407592773, -1.1638374328613281, -0.7380199432373047, 5.741737365722656, 0.8737602233886719, 3.7276611328125, 11.77313232421875, -8.340145111083984, -3.7080860137939453, 13.227256774902344, 14.923263549804688, 1.0365371704101562, 3.767518997192383, 2.2829017639160156, 7.3628692626953125, 15.185371398925781, -2.6683998107910156, 14.158294677734375, -9.002809524536133, -7.5118560791015625, 6.993865966796875, 8.533012390136719, 2.5192527770996094, 1.165863037109375, -2.8061447143554688, 13.428619384765625, -2.7716102600097656, 13.683670043945312, 6.742164611816406, 1.8692913055419922, 9.674381256103516, -2.749584197998047, 8.994888305664062, 11.092926025390625, 4.987033843994141, 2.192779541015625, 2.180604934692383, 11.082801818847656, 5.508317947387695, -9.382328033447266, -0.6108169555664062, -2.1284637451171875, 17.881820678710938, 10.152069091796875, 13.998540878295898, -4.949773788452148, 5.424812316894531, 5.689231872558594, 13.606319427490234, -2.9099502563476562, 0.0583038330078125, -3.8944339752197266, 4.170980453491211, 16.094146728515625, 0.36468505859375, 6.943822860717773, -0.24060821533203125, 1.238983154296875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000173.npy"}
{"epoch": 0.2615268329554044, "step": 174, "batch_size": 64, "mean": 4.132425308227539, "std": 6.132242202758789, "min": -11.709144592285156, "p10": -3.3324655532836913, "median": 3.5461978912353516, "p90": 11.97748489379883, "max": 18.460655212402344, "pos_frac": 0.75, "sample": [3.6228065490722656, 4.092041015625, 9.993949890136719, 2.995319366455078, 5.134799957275391, -3.4312744140625, 2.2930221557617188, 8.242263793945312, 0.6771087646484375, 4.642341613769531, 12.037559509277344, 9.925567626953125, 1.084686279296875, -0.5001373291015625, 9.9779052734375, -4.176292419433594, 7.989145278930664, 7.608308792114258, 5.589702606201172, -7.9595184326171875, 11.837310791015625, 6.658561706542969, -1.9556503295898438, 7.6460113525390625, 17.990631103515625, -1.7865371704101562, 6.052215576171875, -1.7735099792480469, 15.26202392578125, 1.9295654296875, 0.42218017578125, 18.460655212402344, 16.25901222229004, -3.1019115447998047, 5.519859313964844, 9.4061279296875, 2.40106201171875, 8.319427490234375, -0.16654586791992188, 12.134063720703125, -2.0186309814453125, 10.831069946289062, 4.084850311279297, -0.6721935272216797, 0.17948150634765625, 7.931648254394531, 1.9819107055664062, -3.7130966186523438, 13.292800903320312, 1.8676509857177734, -11.709144592285156, 8.498802185058594, 1.372039794921875, 1.5835342407226562, 8.252914428710938, -3.5597305297851562, 5.789222717285156, -2.178577423095703, 7.718025207519531, -5.1597137451171875, 1.5076427459716797, 3.203094482421875, 3.4695892333984375, 0.5681571960449219], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000174.npy"}
{"epoch": 0.26303854875283444, "step": 175, "batch_size": 64, "mean": 4.604219436645508, "std": 6.623898029327393, "min": -11.448638916015625, "p10": -2.7608161926269528, "median": 3.699932098388672, "p90": 13.937572097778325, "max": 17.4842529296875, "pos_frac": 0.71875, "sample": [-4.789350509643555, 6.665033340454102, -2.2907867431640625, 1.6206302642822266, -0.8811416625976562, 10.060428619384766, 0.06117057800292969, 11.212493896484375, 5.372100830078125, -0.6206436157226562, 0.8826065063476562, 11.8447265625, 4.404911041259766, 6.3372039794921875, -2.1700973510742188, 14.509344100952148, 6.0233306884765625, 10.407051086425781, 9.267158508300781, 1.5748291015625, 3.5988311767578125, 2.827136993408203, 2.93463134765625, 8.941360473632812, 9.811767578125, 5.19561767578125, -1.1104774475097656, -2.0331573486328125, 16.55722999572754, -11.448638916015625, 11.314537048339844, 0.7658615112304688, 3.8010330200195312, 4.8214569091796875, -1.686767578125, -0.05175971984863281, 3.044445037841797, 16.65949249267578, -1.2889347076416016, -5.970947265625, 5.458168029785156, -2.2347259521484375, 1.2401275634765625, 3.1687088012695312, -3.8704376220703125, -2.0949554443359375, 11.641189575195312, -4.094024658203125, 11.25921630859375, 12.603437423706055, 6.262834548950195, 11.859550476074219, 1.0718002319335938, -4.864967346191406, 1.5970230102539062, 11.028861999511719, -2.9622573852539062, 16.95307159423828, 7.581855773925781, 17.4842529296875, 0.33350372314453125, 15.20465087890625, 7.511665344238281, 16.357799530029297], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000175.npy"}
{"epoch": 0.26455026455026454, "step": 176, "batch_size": 64, "mean": 3.1816506385803223, "std": 6.6463942527771, "min": -11.877120971679688, "p10": -4.30770034790039, "median": 3.3099193572998047, "p90": 10.879400825500497, "max": 22.784225463867188, "pos_frac": 0.671875, "sample": [16.228958129882812, -9.680091857910156, 7.783454895019531, 8.457761764526367, 0.37676239013671875, 4.1195831298828125, -1.1762924194335938, 5.587806701660156, 3.7686405181884766, 1.4635543823242188, -9.38311767578125, -4.470405578613281, -3.9280548095703125, 6.457038879394531, -5.168418884277344, 11.879888534545898, 0.036754608154296875, 7.450336456298828, -2.0649261474609375, -0.16851806640625, -1.1530380249023438, -1.681121826171875, 3.5201454162597656, 18.498790740966797, 5.2957000732421875, 3.0996932983398438, -3.286285400390625, 5.365760803222656, -1.1714096069335938, 7.982505798339844, 8.245880126953125, -1.1108875274658203, -1.500152587890625, 8.544929504394531, 4.28337287902832, 16.108379364013672, 0.8644866943359375, 5.51800537109375, 1.2482986450195312, -4.672523498535156, 1.0259780883789062, 1.0830078125, 5.07188606262207, 11.96603775024414, 3.9269561767578125, 7.871425628662109, -1.919189453125, -2.783538818359375, 8.384254455566406, 15.939727783203125, -0.9873199462890625, 7.823070526123047, 1.3337478637695312, 6.770317077636719, 1.3899803161621094, 4.183311462402344, 22.784225463867188, 0.9388885498046875, -0.7201385498046875, 7.7657928466796875, 5.6365203857421875, -11.877120971679688, 4.365791320800781, -7.919219970703125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000176.npy"}
{"epoch": 0.2660619803476946, "step": 177, "batch_size": 64, "mean": 6.162568092346191, "std": 6.206217288970947, "min": -10.401992797851562, "p10": -0.40593643188476547, "median": 5.637149810791016, "p90": 14.796844673156741, "max": 21.343345642089844, "pos_frac": 0.875, "sample": [2.3535385131835938, 0.5341243743896484, 8.962491989135742, 19.548690795898438, 9.522193908691406, -0.25075531005859375, 2.6445236206054688, 7.7661895751953125, 15.956573486328125, 5.838508605957031, 21.343345642089844, 11.359172821044922, 13.637928009033203, -0.9085617065429688, 3.9757041931152344, -4.193021774291992, 15.374168395996094, 2.001495361328125, -2.634927749633789, 7.8949127197265625, 6.8582763671875, 5.435791015625, 2.183115005493164, 11.381147384643555, 9.290376663208008, 1.8935470581054688, 4.109464645385742, 11.796985626220703, -1.6095867156982422, 1.9486408233642578, 6.7235565185546875, 3.6678390502929688, 16.404296875, 8.47943115234375, 0.1292552947998047, 10.306865692138672, 14.144262313842773, 4.782768249511719, 3.9434814453125, 7.806640625, 1.2188262939453125, 4.21565055847168, 8.879749298095703, 1.0745697021484375, 10.067146301269531, -5.3748779296875, 3.6419906616210938, -0.472442626953125, 11.515514373779297, 0.1281890869140625, 11.384567260742188, 0.3935070037841797, -10.401992797851562, 18.53097915649414, 6.160285949707031, 10.039527893066406, 5.879156112670898, 15.076522827148438, 8.234619140625, 3.701536178588867, 4.210456848144531, 11.042720794677734, 0.5477981567382812, 4.307914733886719], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000177.npy"}
{"epoch": 0.2675736961451247, "step": 178, "batch_size": 64, "mean": 3.6693332195281982, "std": 5.864775657653809, "min": -9.780960083007812, "p10": -3.9031646728515623, "median": 3.2954540252685547, "p90": 12.343359375000004, "max": 16.55743408203125, "pos_frac": 0.78125, "sample": [2.594278335571289, 11.3543701171875, 4.98187255859375, 0.564117431640625, 9.183029174804688, 0.8703975677490234, -1.795623779296875, 6.737262725830078, 7.7510833740234375, 5.549877166748047, -3.993865966796875, 4.159477233886719, 1.4431686401367188, -7.24888801574707, 4.2305450439453125, 4.040504455566406, 13.249526977539062, -0.222137451171875, -1.160797119140625, 0.6719207763671875, 13.991260528564453, 1.1893348693847656, 4.2940673828125, -7.829833984375, 3.569316864013672, 1.54644775390625, 2.492523193359375, 5.795867919921875, 1.76971435546875, 2.404359817504883, 6.74481201171875, 0.029268264770507812, 1.4150314331054688, 15.094478607177734, 1.8592529296875, 8.04697036743164, 2.4870758056640625, 3.0215911865234375, 16.55743408203125, 2.1072616577148438, -6.49383544921875, 9.47946548461914, 13.6380615234375, 4.808803558349609, -9.780960083007812, 10.04958724975586, -0.8496971130371094, -4.122711181640625, 12.7672119140625, 4.298137664794922, 10.03277587890625, 15.226455688476562, 4.21125602722168, 7.485282897949219, 2.081462860107422, 0.553558349609375, 9.196762084960938, -0.09514617919921875, -6.6671142578125, -3.6915283203125, 4.425628662109375, 5.404426574707031, 6.91033935546875, -3.5772628784179688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000178.npy"}
{"epoch": 0.2690854119425548, "step": 179, "batch_size": 64, "mean": 4.09881591796875, "std": 6.959219932556152, "min": -13.705245971679688, "p10": -3.767561912536621, "median": 4.637180328369141, "p90": 11.481114959716797, "max": 22.639190673828125, "pos_frac": 0.703125, "sample": [5.537086486816406, -2.011962890625, -0.11100006103515625, 6.750244140625, -0.551177978515625, 6.4404754638671875, 5.543767929077148, 4.7762603759765625, 16.328170776367188, 7.175239562988281, 9.699014663696289, 11.438758850097656, 7.940654754638672, -7.028923034667969, 7.235086441040039, -4.280281066894531, -13.705245971679688, -3.6830005645751953, 1.4443283081054688, 4.5333709716796875, -2.1472339630126953, 10.199188232421875, 0.9646759033203125, 8.450141906738281, 11.978496551513672, -6.4493255615234375, -1.0654525756835938, 9.05471420288086, 1.7906208038330078, 2.848115921020508, 4.740989685058594, 10.085533142089844, 1.7464752197265625, -3.803802490234375, 1.836334228515625, -6.2551727294921875, 1.172271728515625, 22.189056396484375, 8.956626892089844, -1.9517593383789062, 7.677080154418945, 7.481071472167969, 7.987770080566406, 7.758663177490234, -1.8028945922851562, 11.499267578125, 13.525184631347656, 2.58465576171875, 14.877120971679688, 22.639190673828125, 7.542583465576172, 6.237642288208008, 11.170135498046875, 1.0077934265136719, -1.5953712463378906, 8.054641723632812, 1.84063720703125, 0.92333984375, 3.7712268829345703, -1.9528045654296875, -11.11387825012207, -0.8698959350585938, 8.928955078125, -3.659252166748047], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000179.npy"}
{"epoch": 0.2705971277399849, "step": 180, "batch_size": 64, "mean": 3.9159164428710938, "std": 7.431264877319336, "min": -13.2122802734375, "p10": -5.046771240234374, "median": 3.605527877807617, "p90": 14.817338180541993, "max": 20.09185791015625, "pos_frac": 0.703125, "sample": [13.162368774414062, 0.31018829345703125, 6.210441589355469, 0.978179931640625, 3.1588134765625, 8.303497314453125, -8.438072204589844, 5.277156829833984, -0.5040111541748047, -2.2617340087890625, 13.605878829956055, 9.360183715820312, -3.89154052734375, -0.5720653533935547, 6.791618347167969, 0.80718994140625, -5.5418701171875, 12.440505981445312, -0.5950736999511719, -9.695655822753906, 15.661674499511719, 3.7412490844726562, 5.728237152099609, 6.2622222900390625, 6.390632629394531, 8.341943740844727, -0.16836166381835938, 1.9334030151367188, 3.818960189819336, 4.378971099853516, 6.9221038818359375, 12.940492630004883, -6.4567718505859375, 1.1586151123046875, 5.984081268310547, 6.1064300537109375, 1.3275032043457031, 15.873023986816406, 0.28211402893066406, -1.8913764953613281, 2.7346839904785156, 7.509885787963867, 15.593915939331055, -1.3301124572753906, -3.525205612182617, 2.6891517639160156, 19.87541961669922, -1.2597846984863281, 20.09185791015625, -0.6156444549560547, 14.866031646728516, -8.384208679199219, 3.489990234375, -3.7374649047851562, 1.0271530151367188, 3.7210655212402344, 19.47046661376953, -13.2122802734375, 4.765613555908203, -9.783493041992188, 7.8523101806640625, 4.349082946777344, 14.703720092773438, 2.485342025756836], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000180.npy"}
{"epoch": 0.272108843537415, "step": 181, "batch_size": 64, "mean": 5.88032341003418, "std": 6.138671398162842, "min": -6.407073974609375, "p10": -0.9076478958129881, "median": 5.5699462890625, "p90": 13.62308979034424, "max": 20.982772827148438, "pos_frac": 0.84375, "sample": [4.0478668212890625, 10.249942779541016, 0.953582763671875, -0.5450363159179688, 4.551250457763672, 3.4003524780273438, 5.48785400390625, -6.407073974609375, 3.029094696044922, 1.5892143249511719, 10.01483154296875, 3.4556655883789062, 4.915447235107422, -0.7559661865234375, 10.419378280639648, 7.356147766113281, 8.518049240112305, 7.5338134765625, 13.252838134765625, -3.9531402587890625, 6.033233642578125, 15.487625122070312, 10.329742431640625, 6.527374267578125, 20.19586181640625, 9.223373413085938, 3.9760360717773438, 20.982772827148438, 0.6866455078125, -5.885906219482422, 12.070648193359375, 3.377643585205078, 16.185577392578125, -5.757165908813477, 11.885910034179688, 0.9729232788085938, -0.9726543426513672, 14.036102294921875, 1.3911323547363281, 0.5175247192382812, 8.83236312866211, 3.3479061126708984, 6.453636169433594, 12.859199523925781, 10.86419677734375, 0.9276142120361328, 1.5244884490966797, 5.8632659912109375, 7.648193359375, 0.6702423095703125, 2.0500335693359375, 13.771717071533203, 17.357406616210938, 3.9155807495117188, 7.9644012451171875, 13.27629280090332, 4.1575164794921875, 5.65203857421875, 8.138851165771484, -2.3046493530273438, 8.00213623046875, -0.5138454437255859, 10.393165588378906, -2.889556884765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000181.npy"}
{"epoch": 0.273620559334845, "step": 182, "batch_size": 64, "mean": 6.304078102111816, "std": 8.257317543029785, "min": -16.312034606933594, "p10": -2.582478332519531, "median": 5.326530456542969, "p90": 16.927525711059573, "max": 22.287246704101562, "pos_frac": 0.75, "sample": [22.287246704101562, -2.995016098022461, -2.2403602600097656, 6.972248077392578, 12.919227600097656, 11.026565551757812, 7.397178649902344, -2.7356109619140625, -3.6260833740234375, 19.557846069335938, 9.501277923583984, 3.5601959228515625, 20.963531494140625, -1.58526611328125, 12.334457397460938, 8.238067626953125, 5.586692810058594, 4.66351318359375, 16.305774688720703, 5.066368103027344, 7.045450210571289, 0.5284481048583984, 2.7468719482421875, 0.2273712158203125, 8.677177429199219, -0.8003997802734375, 15.456573486328125, 7.017171859741211, 15.6312255859375, 16.045732498168945, 22.0762939453125, 4.340738296508789, 9.549545288085938, 4.005126953125, 17.128013610839844, -2.6410675048828125, 1.0164337158203125, 3.3236846923828125, 15.397735595703125, -0.5036163330078125, 3.219757080078125, -9.855619430541992, 19.12694549560547, -2.445770263671875, 15.020309448242188, 9.358161926269531, 3.1694068908691406, -1.0474090576171875, 13.384552001953125, 4.2606658935546875, 9.03769302368164, 16.459720611572266, -2.008087158203125, 15.943658828735352, 17.766220092773438, 2.5968017578125, -2.247589111328125, 5.821210861206055, 9.03902816772461, 1.9403495788574219, 1.4499282836914062, -8.830322265625, -0.8529567718505859, -16.312034606933594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000182.npy"}
{"epoch": 0.2751322751322751, "step": 183, "batch_size": 64, "mean": 2.4906740188598633, "std": 6.676412582397461, "min": -17.994983673095703, "p10": -5.705212211608886, "median": 1.9089508056640625, "p90": 12.294620513916016, "max": 14.352245330810547, "pos_frac": 0.640625, "sample": [1.1441268920898438, 12.336082458496094, -0.9165191650390625, -0.22722244262695312, 2.6567535400390625, -1.4925460815429688, -4.5551300048828125, 12.9981689453125, -7.706645965576172, -0.6348724365234375, -17.994983673095703, 0.8378219604492188, 7.003671646118164, -2.689971923828125, 6.623878479003906, 12.929508209228516, -1.0232582092285156, 4.8565521240234375, 10.078781127929688, 7.395641326904297, 6.671173095703125, -3.6755523681640625, 11.981925964355469, -5.557199478149414, 1.5135154724121094, -2.9979171752929688, 6.4093170166015625, 1.9082107543945312, -7.7044830322265625, -5.768646240234375, 14.155677795410156, 6.5288543701171875, 1.9096908569335938, 1.9860954284667969, -6.686790466308594, 8.117034912109375, 5.321235656738281, 13.646244049072266, -11.027412414550781, 14.352245330810547, -6.500743865966797, 0.0842132568359375, 11.450141906738281, -0.5910758972167969, 1.873250961303711, 4.708244323730469, 9.639284133911133, 3.510265350341797, -4.438758850097656, -1.0895061492919922, 1.1133499145507812, 6.174560546875, -0.07722282409667969, 2.3219375610351562, 12.1978759765625, -0.25588226318359375, 4.524589538574219, -5.004798889160156, 1.6953907012939453, 0.4292259216308594, 2.86065673828125, 12.413284301757812, 6.006771087646484, 3.655029296875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000183.npy"}
{"epoch": 0.2766439909297052, "step": 184, "batch_size": 64, "mean": 5.0511322021484375, "std": 6.497813701629639, "min": -5.496845245361328, "p10": -3.097519874572754, "median": 3.372237205505371, "p90": 14.962099838256837, "max": 19.374664306640625, "pos_frac": 0.734375, "sample": [2.399658203125, 10.814399719238281, 15.117717742919922, 2.6356430053710938, 14.598991394042969, 3.483551025390625, -1.2187976837158203, -1.0028457641601562, -0.2826576232910156, 3.2075576782226562, 0.22323036193847656, 17.26197052001953, 2.4560794830322266, 3.260923385620117, 10.11895751953125, -5.496845245361328, 13.63421630859375, 12.299118041992188, 4.390110015869141, 19.374664306640625, -3.0147476196289062, 15.38465690612793, 5.865451812744141, 18.674936294555664, -3.740367889404297, 11.950851440429688, 0.9816608428955078, 9.481948852539062, 15.837894439697266, 1.4548110961914062, 4.755195617675781, -0.34178924560546875, 5.18902587890625, -3.1833229064941406, -0.016265869140625, 3.5137100219726562, 15.164596557617188, 4.315473556518555, 1.4710159301757812, 11.613731384277344, 3.8368663787841797, 1.57232666015625, 12.387062072753906, -0.3105010986328125, 12.590019226074219, 6.400970458984375, 4.1312255859375, 2.4570159912109375, -0.2180805206298828, 12.722557067871094, -4.414104461669922, -0.534332275390625, 2.0092239379882812, 3.184356689453125, 10.190200805664062, 13.106277465820312, -4.7461090087890625, 5.6416015625, -0.2595844268798828, 5.3845977783203125, -3.4702072143554688, 1.1310234069824219, -3.132993698120117, 0.97894287109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000184.npy"}
{"epoch": 0.2781557067271353, "step": 185, "batch_size": 64, "mean": 5.253050804138184, "std": 9.062512397766113, "min": -12.465438842773438, "p10": -4.92075252532959, "median": 5.048128128051758, "p90": 17.60090503692627, "max": 22.5489501953125, "pos_frac": 0.671875, "sample": [3.8139266967773438, 21.8348388671875, -2.8633499145507812, 12.849679946899414, 22.165191650390625, -12.267704010009766, 6.775737762451172, 3.4365501403808594, -3.3675613403320312, -0.0655059814453125, 20.750091552734375, 19.6436767578125, 0.05255889892578125, -3.14251708984375, 16.938095092773438, 3.740436553955078, 22.5489501953125, 3.4348297119140625, 14.436286926269531, 15.885490417480469, -3.273540496826172, 18.7115478515625, 6.0305328369140625, 5.267066955566406, 1.8598957061767578, 10.688350677490234, 6.331523895263672, 9.829303741455078, 4.4118194580078125, 8.8548583984375, -4.840370178222656, -1.8039932250976562, 10.997474670410156, 17.6385498046875, -12.465438842773438, 9.488731384277344, 9.844390869140625, 7.434320449829102, 7.149600982666016, -1.2504196166992188, -9.77813720703125, 6.2304534912109375, -2.6907005310058594, 8.1300048828125, 11.890678405761719, -10.161468505859375, 13.165143966674805, -1.3211421966552734, 15.859024047851562, -2.700643539428711, 2.3325042724609375, -10.67579460144043, 4.7143096923828125, -3.7360153198242188, 1.3703231811523438, 6.5672760009765625, -8.68927001953125, -4.955202102661133, 9.307403564453125, -2.5243148803710938, -1.4836463928222656, 4.829189300537109, 15.498291015625, 17.5130672454834], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000185.npy"}
{"epoch": 0.2796674225245654, "step": 186, "batch_size": 64, "mean": 5.825077533721924, "std": 9.250648498535156, "min": -13.920661926269531, "p10": -4.83162841796875, "median": 5.318138122558594, "p90": 17.076001548767092, "max": 25.46978759765625, "pos_frac": 0.71875, "sample": [16.193313598632812, 21.573043823242188, -11.175376892089844, 4.160503387451172, 6.542549133300781, -2.17437744140625, 9.82366943359375, -6.7186431884765625, -4.8610992431640625, 13.713706970214844, 7.9595794677734375, -3.763172149658203, 6.4824371337890625, 16.769699096679688, 25.46978759765625, 16.768203735351562, 3.1278762817382812, 8.325630187988281, 2.2649765014648438, -10.709732055664062, 1.8506107330322266, 6.705305099487305, 11.55770492553711, 21.854736328125, -13.920661926269531, -4.7628631591796875, -3.5161380767822266, -4.252445220947266, 0.39046478271484375, 2.8530921936035156, 5.283592224121094, 16.932344436645508, -2.57952880859375, -0.5883026123046875, -1.4529285430908203, -1.49676513671875, 7.700904846191406, 17.116188049316406, 15.576972961425781, 3.517711639404297, -1.9429779052734375, -10.00518798828125, 0.4035491943359375, 2.4952239990234375, 8.037811279296875, 4.931034088134766, 16.98223304748535, 12.95257568359375, -6.647735595703125, 5.352684020996094, 24.123931884765625, 8.06865119934082, 5.248558044433594, 22.895401000976562, 4.678016662597656, 5.845542907714844, 24.596237182617188, 11.569372177124023, 8.163185119628906, 8.450754165649414, 4.7937774658203125, 8.479156494140625, -1.7134857177734375, 6.504064559936523], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000186.npy"}
{"epoch": 0.2811791383219955, "step": 187, "batch_size": 64, "mean": 5.456380844116211, "std": 8.581183433532715, "min": -17.987991333007812, "p10": -4.625600051879882, "median": 6.468479156494141, "p90": 16.44412612915039, "max": 23.340164184570312, "pos_frac": 0.703125, "sample": [10.760726928710938, -1.868377685546875, 13.315338134765625, 19.45928192138672, 23.340164184570312, 13.8387451171875, 6.401496887207031, 2.793882369995117, 6.53546142578125, 14.791728973388672, -10.2396240234375, 7.076385498046875, -0.6458816528320312, 10.133407592773438, 11.117324829101562, 3.933145523071289, 1.3172168731689453, 16.518936157226562, 8.459526062011719, 8.713220596313477, -8.886016845703125, -17.987991333007812, 15.133949279785156, -0.08612823486328125, 3.9563064575195312, -0.65728759765625, 1.4382896423339844, 0.41552734375, 0.877899169921875, 12.660930633544922, 12.225215911865234, 14.443092346191406, -9.320659637451172, -1.9995613098144531, 7.1746978759765625, 5.396087646484375, 3.576793670654297, 0.8910675048828125, -1.9830093383789062, 9.246526718139648, -2.09051513671875, -4.890987396240234, 16.269569396972656, -0.2833118438720703, 7.3020782470703125, 10.148674011230469, 17.02159881591797, -0.8747482299804688, 7.8927764892578125, 12.906768798828125, 16.57733154296875, -2.3660049438476562, 10.142738342285156, 22.19976806640625, 3.478057861328125, -4.0063629150390625, 11.229095458984375, 17.05738067626953, 4.3052825927734375, 8.424629211425781, -0.8478240966796875, -10.17707633972168, -9.803802490234375, 7.325431823730469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000187.npy"}
{"epoch": 0.28269085411942557, "step": 188, "batch_size": 64, "mean": 5.61491584777832, "std": 8.860124588012695, "min": -24.37183380126953, "p10": -3.435024642944336, "median": 4.195854187011719, "p90": 18.200889205932626, "max": 25.71855926513672, "pos_frac": 0.796875, "sample": [4.378044128417969, -3.5134830474853516, 21.500743865966797, -4.108808517456055, -1.1075057983398438, 13.331527709960938, 10.924545288085938, 6.239603042602539, 12.3924560546875, 2.1110687255859375, 2.1708946228027344, 21.631332397460938, 12.877460479736328, -1.8386611938476562, 0.198760986328125, 1.0077190399169922, 9.229209899902344, 19.123291015625, 0.3748588562011719, -4.043973922729492, 5.9835357666015625, 2.7553558349609375, 0.793975830078125, -0.19091033935546875, 7.715126037597656, 0.09803009033203125, -3.251955032348633, 13.549522399902344, 3.1132469177246094, 11.06295394897461, 12.532768249511719, 1.3271636962890625, 7.4005126953125, 25.71855926513672, 13.649070739746094, 0.015590667724609375, 16.00550079345703, 5.677490234375, -1.938568115234375, 5.247383117675781, 23.552631378173828, 0.9775390625, 6.091468811035156, 3.016021728515625, 0.619537353515625, 0.1888885498046875, 5.583209991455078, 14.526847839355469, 3.2026824951171875, 0.7760448455810547, 19.49853515625, -12.016616821289062, -2.2059783935546875, 7.677513122558594, 3.652984619140625, 4.013664245605469, 6.249153137207031, -5.946138381958008, 16.04861831665039, 20.601604461669922, 15.872770309448242, 5.1566619873046875, -3.5546531677246094, -24.37183380126953], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000188.npy"}
{"epoch": 0.2842025699168556, "step": 189, "batch_size": 64, "mean": 5.634346961975098, "std": 8.569024085998535, "min": -15.21670913696289, "p10": -5.69389419555664, "median": 5.372350692749023, "p90": 17.61235275268555, "max": 22.80017852783203, "pos_frac": 0.8125, "sample": [2.469512939453125, 6.351276397705078, 7.226039886474609, 6.531402587890625, 5.1190948486328125, 17.4007568359375, 15.795082092285156, 11.943395614624023, 0.5121898651123047, 3.1630859375, -11.666824340820312, 17.680957794189453, -11.115455627441406, 1.6952133178710938, 0.5591278076171875, -1.341806411743164, 18.463531494140625, 11.024333953857422, -11.094467163085938, 6.6585540771484375, 4.451850891113281, 15.480888366699219, 5.220943450927734, 8.255905151367188, -0.08626556396484375, 3.508068084716797, 9.974098205566406, 6.165014266967773, 3.686176300048828, -1.2820281982421875, 16.015640258789062, -3.433258056640625, 3.7512855529785156, 20.216026306152344, 17.452274322509766, 5.5440521240234375, 0.8892288208007812, 2.553955078125, 2.154256820678711, 12.219001770019531, 16.332763671875, 6.5975341796875, 7.556375503540039, 5.093463897705078, 11.717018127441406, 8.657527923583984, -4.734870910644531, -13.002246856689453, 4.53717041015625, 2.1924667358398438, 1.6856231689453125, 17.889612197875977, -7.38720703125, 22.80017852783203, 11.7354736328125, 5.5237579345703125, -6.1049041748046875, -15.21670913696289, 8.089374542236328, 1.97021484375, 6.634119033813477, 1.2602462768554688, 18.111366271972656, 18.547761917114258], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000189.npy"}
{"epoch": 0.2857142857142857, "step": 190, "batch_size": 64, "mean": 7.574289321899414, "std": 8.843179702758789, "min": -11.335739135742188, "p10": -3.202770805358886, "median": 7.178079605102539, "p90": 19.095532989501955, "max": 27.616378784179688, "pos_frac": 0.796875, "sample": [6.145381927490234, 13.4365234375, 26.108016967773438, -5.26275634765625, 10.758411407470703, 8.498451232910156, 0.5344696044921875, 1.1406707763671875, -10.236063003540039, 22.778213500976562, 8.215713500976562, 15.081535339355469, 7.424074172973633, 1.3529815673828125, 3.4409408569335938, 2.4051589965820312, -0.3366279602050781, 19.0614013671875, 3.2170848846435547, 8.996257781982422, 15.923755645751953, 18.243816375732422, 1.485687255859375, 0.06534194946289062, -4.653688430786133, 5.968864440917969, -11.335739135742188, 5.078586578369141, 4.928062438964844, 8.502517700195312, 10.240577697753906, -2.598846435546875, 6.802886962890625, 9.303009033203125, 13.768447875976562, 0.2925910949707031, 19.605865478515625, -2.4655380249023438, -3.4615955352783203, 8.627235412597656, -0.0239410400390625, 15.8446044921875, 10.957267761230469, 27.616378784179688, 6.407865524291992, 1.3309249877929688, -5.563142776489258, 13.106353759765625, -5.407135009765625, -1.5592193603515625, 6.932085037231445, 9.933662414550781, 2.9297828674316406, 11.31915283203125, 14.486953735351562, 17.091278076171875, 24.50640869140625, 4.5137939453125, 16.08237075805664, 19.11016082763672, -2.1164474487304688, 20.24908447265625, 14.562042236328125, 15.362545013427734], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000190.npy"}
{"epoch": 0.2872260015117158, "step": 191, "batch_size": 64, "mean": 5.71037483215332, "std": 9.270731925964355, "min": -20.806838989257812, "p10": -4.1574113845825185, "median": 5.065978050231934, "p90": 16.315205383300782, "max": 26.688507080078125, "pos_frac": 0.8125, "sample": [14.957855224609375, -3.1901302337646484, 3.92889404296875, 12.505577087402344, 24.89687728881836, 1.266876220703125, -0.0250701904296875, 21.443626403808594, 5.119787216186523, 3.2809524536132812, 3.2093353271484375, 26.688507080078125, 1.8597640991210938, 1.3172988891601562, 8.640960693359375, 3.4976062774658203, -16.181304931640625, 2.2857589721679688, 16.453857421875, 7.15168571472168, 9.422889709472656, 12.8604736328125, 13.338184356689453, 13.109283447265625, -8.385734558105469, 14.372467041015625, 7.188407897949219, 7.515228271484375, -4.57196044921875, 0.6709232330322266, 9.433618545532227, 0.17343902587890625, -1.8792438507080078, 1.6121635437011719, 21.970687866210938, -20.806838989257812, 0.15603256225585938, -7.8234405517578125, 10.18686294555664, 13.157554626464844, 9.889060974121094, 10.965709686279297, 4.0305633544921875, 11.330215454101562, 10.356298446655273, 2.9366207122802734, 15.426445007324219, 17.227169036865234, -2.6584625244140625, 1.23199462890625, -3.1786022186279297, 5.160760879516602, 18.27001953125, 5.012168884277344, -9.256855010986328, 9.441925048828125, 12.685226440429688, -14.343948364257812, 8.698556900024414, 0.3940887451171875, 0.6898117065429688, 15.991683959960938, 1.9378433227539062, 2.415985107421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000191.npy"}
{"epoch": 0.2887377173091459, "step": 192, "batch_size": 64, "mean": 4.094944953918457, "std": 8.16848373413086, "min": -21.883018493652344, "p10": -4.267369079589844, "median": 3.661654472351074, "p90": 15.010435676574712, "max": 24.595073699951172, "pos_frac": 0.65625, "sample": [-21.883018493652344, -7.903724670410156, -0.11608505249023438, 7.451244354248047, 11.59560775756836, 1.4497261047363281, -4.383613586425781, 10.894332885742188, 17.514358520507812, -2.836090087890625, -6.988933563232422, -3.9961318969726562, -2.795482635498047, 0.44928741455078125, 2.34356689453125, 7.650299072265625, 15.71023178100586, 4.247222900390625, -0.21192550659179688, 1.9533538818359375, 15.585912704467773, 19.964195251464844, -2.0188941955566406, -1.7056198120117188, 13.667655944824219, -0.45285797119140625, 0.96697998046875, 8.863941192626953, 3.08892822265625, 3.576618194580078, 9.490104675292969, 11.026824951171875, 6.302778244018555, 13.634767532348633, -1.4925346374511719, -1.0419197082519531, 3.9785308837890625, -0.19502830505371094, -2.6694374084472656, 3.8670425415039062, 5.024421691894531, 1.1012382507324219, -11.73367691040039, 24.595073699951172, 10.392898559570312, 6.873992919921875, -3.9846038818359375, 3.7466907501220703, 4.757606506347656, -7.9734954833984375, 5.0483856201171875, 17.912155151367188, 1.4903030395507812, 8.727256774902344, -1.2465229034423828, 9.059104919433594, 12.66610336303711, -1.44842529296875, 13.532325744628906, 17.31195831298828, 8.201135635375977, -5.423513412475586, 2.7174606323242188, 6.146404266357422], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000192.npy"}
{"epoch": 0.29024943310657597, "step": 193, "batch_size": 64, "mean": 7.2386627197265625, "std": 8.880001068115234, "min": -18.021259307861328, "p10": -2.255194854736328, "median": 5.961713790893555, "p90": 19.547920608520506, "max": 27.76995086669922, "pos_frac": 0.8125, "sample": [1.6399803161621094, 4.057718276977539, 11.168439865112305, 3.891826629638672, 5.169984817504883, -3.7393035888671875, -1.45831298828125, 17.870506286621094, 0.4094085693359375, 0.029607772827148438, 18.086685180664062, 1.8434009552001953, 13.962966918945312, 8.692413330078125, 19.46080780029297, -0.02069854736328125, 6.183879852294922, 2.4036483764648438, 11.221641540527344, 13.53945541381836, -3.334257125854492, 21.36157989501953, 27.76995086669922, -3.589893341064453, 8.692756652832031, 17.94776153564453, 2.06195068359375, 8.070941925048828, -2.323944091796875, 8.468713760375977, 19.585254669189453, 7.1530303955078125, 2.065601348876953, 12.122360229492188, 8.958648681640625, 2.82354736328125, -2.0947799682617188, -0.6018638610839844, 10.677215576171875, 5.7395477294921875, -4.9686279296875, 2.668262481689453, 17.752140045166016, 22.81951904296875, -2.674640655517578, 2.5281829833984375, 7.510993957519531, -1.270254135131836, 2.003917694091797, 19.756813049316406, 12.414566040039062, 0.3856849670410156, 24.188201904296875, 24.383291244506836, 18.467533111572266, 6.7417755126953125, 11.897445678710938, -18.021259307861328, 8.074935913085938, 18.189300537109375, 2.4712791442871094, 0.7955875396728516, 1.0183334350585938, 0.17327880859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000193.npy"}
{"epoch": 0.29176114890400606, "step": 194, "batch_size": 64, "mean": 5.805791854858398, "std": 10.726485252380371, "min": -20.73938751220703, "p10": -8.558594131469725, "median": 6.431939125061035, "p90": 19.17971363067627, "max": 25.274648666381836, "pos_frac": 0.75, "sample": [23.465538024902344, 6.069709777832031, 11.9345703125, -9.147041320800781, 11.401130676269531, 2.24224853515625, -12.809537887573242, 2.2371139526367188, 1.1765365600585938, 9.490570068359375, 5.435577392578125, 9.097305297851562, 5.3432159423828125, 15.944686889648438, -15.343826293945312, -4.291709899902344, 11.560928344726562, -7.1107635498046875, 5.320747375488281, -2.2706260681152344, 10.096450805664062, 3.7100753784179688, 14.529945373535156, 15.962074279785156, -4.390419006347656, 8.404838562011719, -2.9255905151367188, 14.4725341796875, 0.009368896484375, 12.286964416503906, 1.7789764404296875, 17.660001754760742, -10.6802978515625, -7.185550689697266, -17.85236358642578, -1.4784564971923828, 24.857587814331055, 9.316879272460938, 13.076396942138672, 19.40863609313965, -17.943389892578125, 7.607826232910156, 6.794168472290039, 2.8625030517578125, 16.891082763671875, 25.274648666381836, 10.944833755493164, -20.73938751220703, 22.34454345703125, 13.592132568359375, 12.812368392944336, -1.3779296875, 9.343536376953125, 20.53372573852539, 18.64556121826172, 19.565841674804688, 11.128215789794922, 0.302154541015625, 0.6545925140380859, 13.868499755859375, 3.4984188079833984, 2.3852500915527344, -2.0034351348876953, 3.780466079711914], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000194.npy"}
{"epoch": 0.29327286470143615, "step": 195, "batch_size": 64, "mean": 7.205946922302246, "std": 9.796111106872559, "min": -12.419921875, "p10": -4.8455078124999975, "median": 6.254526138305664, "p90": 22.34642276763916, "max": 27.329437255859375, "pos_frac": 0.734375, "sample": [3.1893539428710938, 6.173761367797852, -1.5590133666992188, 6.676483154296875, 25.542505264282227, 11.74713134765625, -0.729583740234375, 6.435333251953125, 7.9838104248046875, 24.2022705078125, 12.246986389160156, 13.772649765014648, -6.8544921875, -5.739295959472656, -8.69110107421875, -12.419921875, -1.3876762390136719, 8.816585540771484, 6.335290908813477, 17.300527572631836, -1.7865753173828125, 4.685279846191406, 11.28001594543457, 11.748947143554688, 5.957466125488281, 5.312156677246094, 22.411596298217773, 22.62885284423828, 2.6532058715820312, 9.530082702636719, 4.755039215087891, -0.6728057861328125, 25.71487808227539, 8.379478454589844, 22.194351196289062, -0.8242416381835938, -1.3767967224121094, 19.347591400146484, 12.890533447265625, 17.934127807617188, 5.667537689208984, -2.7600021362304688, -2.0266036987304688, -0.357513427734375, 1.6225433349609375, 2.640928268432617, 2.16192626953125, 20.54987335205078, -12.225013732910156, 0.3320598602294922, -7.293918609619141, 0.45782470703125, 27.329437255859375, 17.343717575073242, 12.802749633789062, 13.33082389831543, 12.419479370117188, 16.6939697265625, -8.433853149414062, 22.557159423828125, 6.6581878662109375, 3.1545867919921875, 3.2261505126953125, 9.523740768432617], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000195.npy"}
{"epoch": 0.2947845804988662, "step": 196, "batch_size": 64, "mean": 6.163267135620117, "std": 9.294116020202637, "min": -26.674457550048828, "p10": -3.512971305847168, "median": 5.536933898925781, "p90": 19.185398864746094, "max": 25.756973266601562, "pos_frac": 0.765625, "sample": [2.1927947998046875, -26.674457550048828, -3.5622501373291016, 2.0116424560546875, 5.4412384033203125, -6.858407974243164, -0.11187744140625, 5.63262939453125, 5.84442138671875, -3.3979873657226562, 2.498077392578125, 20.630226135253906, 17.54694366455078, 9.078262329101562, 8.891864776611328, 3.7163925170898438, -11.788703918457031, 19.193267822265625, -2.497020721435547, 19.167037963867188, -1.7466907501220703, 4.997001647949219, 22.439254760742188, 1.1145248413085938, -3.361663818359375, 4.146453857421875, 6.1986846923828125, 5.977348327636719, 7.1898345947265625, 25.756973266601562, 5.162967681884766, 14.662704467773438, 2.2985076904296875, -3.80615234375, -1.8280715942382812, 11.567176818847656, 1.9607524871826172, 10.069231033325195, 6.779869079589844, 19.21387481689453, 10.980819702148438, 6.449554443359375, 15.28335189819336, 2.927215576171875, 14.901283264160156, -7.938934326171875, 0.6920452117919922, 4.34271240234375, 7.80877685546875, 19.911697387695312, 16.101852416992188, 15.679594039916992, -1.3318748474121094, -1.5577392578125, 9.203065872192383, 14.416229248046875, 17.010740280151367, 11.245468139648438, 2.380573272705078, 1.3545608520507812, 3.1180038452148438, -5.2277679443359375, 8.425285339355469, 22.525909423828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000196.npy"}
{"epoch": 0.2962962962962963, "step": 197, "batch_size": 64, "mean": 7.7070698738098145, "std": 12.405313491821289, "min": -23.78835678100586, "p10": -10.86921405792236, "median": 6.840253829956055, "p90": 22.02467212677002, "max": 29.351341247558594, "pos_frac": 0.765625, "sample": [-2.304309844970703, 27.844337463378906, 6.333202362060547, 2.2206573486328125, 4.662109375, 11.626594543457031, -12.233829498291016, -18.828414916992188, 2.92193603515625, -2.6969375610351562, 19.643844604492188, 22.134992599487305, -12.273468017578125, -12.152626037597656, 6.752880096435547, 20.273040771484375, 11.153488159179688, 2.861713409423828, 29.351341247558594, 23.736053466796875, 3.449127197265625, 6.9276275634765625, 17.596858978271484, 11.950469970703125, 11.869865417480469, 2.46435546875, 5.451446533203125, -7.87458610534668, 20.290273666381836, 12.309471130371094, 20.682640075683594, 21.744384765625, -16.40496826171875, 19.79728126525879, -4.2587738037109375, -23.78835678100586, 4.5396881103515625, 8.097518920898438, 21.767257690429688, 11.52947998046875, -0.6187343597412109, 20.740177154541016, 22.885841369628906, 0.480194091796875, -1.7650680541992188, 25.671905517578125, 1.451904296875, 0.41565704345703125, 4.677925109863281, 19.558412551879883, -5.780851364135742, -3.0500030517578125, 14.855667114257812, 16.767457962036133, 3.48068904876709, -14.351055145263672, 28.15290069580078, 15.551681518554688, 1.5745372772216797, 13.437946319580078, 4.256805419921875, 16.4957275390625, 8.001134872436523, 21.19394874572754], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000197.npy"}
{"epoch": 0.29780801209372637, "step": 198, "batch_size": 64, "mean": 8.412599563598633, "std": 9.481371879577637, "min": -14.364089965820312, "p10": -3.781833648681639, "median": 8.425399780273438, "p90": 19.55833740234375, "max": 31.542558670043945, "pos_frac": 0.796875, "sample": [-2.099578857421875, 17.85907745361328, 0.11088752746582031, 9.226249694824219, 6.9409332275390625, 11.317329406738281, -5.006448745727539, 6.867500305175781, 5.211404800415039, 6.137451171875, 16.43304443359375, 6.379913330078125, -0.15198707580566406, -9.112688064575195, 31.542558670043945, 23.595703125, 9.84063720703125, 6.06854248046875, 9.38238525390625, -0.5446720123291016, 13.718658447265625, 13.660848617553711, 20.607152938842773, -7.785848617553711, 23.475753784179688, -12.469375610351562, -5.21240234375, -0.11347389221191406, 19.440460205078125, 13.797037124633789, 15.285049438476562, 14.273834228515625, 9.429855346679688, -14.364089965820312, 7.9127197265625, -0.9833908081054688, 1.1740646362304688, 2.281024932861328, -4.502799987792969, 7.6197357177734375, 7.2025299072265625, 15.92864990234375, 19.173080444335938, 11.831659317016602, 13.851964950561523, 7.4965667724609375, 0.035808563232421875, 24.28712272644043, 30.656394958496094, 8.22119140625, 4.5530242919921875, 2.2694931030273438, 14.113418579101562, 19.158071517944336, 5.077239990234375, 9.1800537109375, 10.66668701171875, 10.763124465942383, -1.0040740966796875, 7.900341033935547, 8.629608154296875, 9.199527740478516, 19.608856201171875, 12.362964630126953], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000198.npy"}
{"epoch": 0.29931972789115646, "step": 199, "batch_size": 64, "mean": 6.392084121704102, "std": 12.372413635253906, "min": -22.267078399658203, "p10": -9.750880050659179, "median": 8.71673583984375, "p90": 20.39656219482422, "max": 30.192073822021484, "pos_frac": 0.6875, "sample": [15.075054168701172, 15.317680358886719, 3.4758453369140625, 8.320941925048828, -8.115594863891602, -15.657154083251953, 6.833641052246094, 20.844863891601562, 12.545272827148438, 8.273727416992188, 13.000762939453125, -4.135490417480469, 30.192073822021484, 16.33098602294922, 27.997516632080078, 20.829498291015625, 17.263065338134766, 17.601776123046875, 13.723175048828125, 19.324295043945312, -3.5046768188476562, -15.420368194580078, 14.997570037841797, 18.882858276367188, -2.750974655151367, -0.0928802490234375, 9.695564270019531, -5.589561462402344, 14.06414794921875, 14.544517517089844, 11.169364929199219, 2.086301803588867, 16.356399536132812, -2.8952255249023438, -0.8536615371704102, 2.3450927734375, 4.968414306640625, -9.864131927490234, -12.747230529785156, 11.553733825683594, -22.267078399658203, 9.63623046875, 2.3827285766601562, 18.81017303466797, -18.285537719726562, 9.182823181152344, 9.112529754638672, 4.8865203857421875, -4.1630706787109375, 2.1680221557617188, 0.20541763305664062, -21.946090698242188, -7.419891357421875, 17.546947479248047, -9.486625671386719, -0.89501953125, 20.671051025390625, -3.4158477783203125, 19.756088256835938, 12.54465103149414, 26.074424743652344, 0.9659042358398438, 25.978515625, 11.063297271728516], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000199.npy"}
{"epoch": 0.30083144368858655, "step": 200, "batch_size": 64, "mean": 9.212747573852539, "std": 12.946781158447266, "min": -29.014892578125, "p10": -5.218550777435302, "median": 8.064777374267578, "p90": 28.63356094360352, "max": 30.946380615234375, "pos_frac": 0.78125, "sample": [4.62550163269043, 19.519298553466797, 28.9495849609375, 5.5308685302734375, 8.214797973632812, 1.976675033569336, 25.52484130859375, 4.933082580566406, 7.053016662597656, -3.62689208984375, 2.05987548828125, 16.2119140625, -8.538372039794922, 30.562454223632812, 1.5237541198730469, -1.7575759887695312, 5.682830810546875, -1.2045059204101562, 27.89617156982422, -9.352859497070312, 1.0586929321289062, 0.39569091796875, 13.811653137207031, 20.49996566772461, 1.409036636352539, 14.562164306640625, 19.804105758666992, -7.6838836669921875, 17.38614273071289, -0.9476432800292969, 24.63764190673828, 16.618507385253906, 8.06695556640625, 29.589813232421875, 30.468368530273438, -3.1595401763916016, -29.014892578125, 15.908817291259766, 30.829925537109375, 23.773941040039062, 21.454063415527344, 12.684173583984375, 30.946380615234375, 7.826271057128906, 8.062599182128906, -19.029205322265625, 2.5041885375976562, 5.39471435546875, 9.880060195922852, 30.73211669921875, 0.868133544921875, 10.718231201171875, 8.29718017578125, -5.0328369140625, 8.344673156738281, 14.29559326171875, -5.662984848022461, -4.901145935058594, 11.127338409423828, 5.758136749267578, 0.4681243896484375, -5.298142433166504, 22.586517333984375, 23.791717529296875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000200.npy"}
{"epoch": 0.30234315948601664, "step": 201, "batch_size": 64, "mean": 7.742550849914551, "std": 12.095890045166016, "min": -19.504501342773438, "p10": -9.108273315429686, "median": 6.629125595092773, "p90": 25.780061721801765, "max": 34.31475067138672, "pos_frac": 0.765625, "sample": [4.4684600830078125, 10.800567626953125, 7.7847442626953125, 6.588512420654297, 0.5194034576416016, 9.502449035644531, 18.335338592529297, 4.304494857788086, 10.354698181152344, -0.6336135864257812, 1.5494155883789062, 16.537769317626953, 18.996442794799805, 15.055953979492188, 0.5001316070556641, -8.349441528320312, 34.31475067138672, 24.07317352294922, 0.03033447265625, 30.195884704589844, -9.433486938476562, 3.0069808959960938, 7.821495056152344, -10.56753921508789, 2.209115982055664, 26.511585235595703, 14.068984985351562, 12.973854064941406, -9.896537780761719, 23.410560607910156, 2.3640384674072266, 26.512405395507812, 2.7083568572998047, -4.24237060546875, 17.554550170898438, 16.62649917602539, -1.3471183776855469, -11.063346862792969, 0.722015380859375, 5.5421142578125, 18.5849609375, 8.741500854492188, -2.977323532104492, 17.52203369140625, -4.435626983642578, 18.011383056640625, 8.723274230957031, -13.642486572265625, 5.85169792175293, 28.528182983398438, -2.403606414794922, 6.66973876953125, 10.18698501586914, 17.868022918701172, 12.31972885131836, 30.859939575195312, 17.23183822631836, -5.254341125488281, 1.52642822265625, -9.878326416015625, 27.580951690673828, 1.06951904296875, 1.9316558837890625, -19.504501342773438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000201.npy"}
{"epoch": 0.30385487528344673, "step": 202, "batch_size": 64, "mean": 9.028682708740234, "std": 12.040160179138184, "min": -22.599403381347656, "p10": -4.805844116210937, "median": 10.16712760925293, "p90": 21.987000846862795, "max": 36.7261962890625, "pos_frac": 0.78125, "sample": [-4.8444976806640625, 7.05389404296875, 33.86820602416992, 36.7261962890625, 22.120439529418945, -2.4508590698242188, 14.507457733154297, 17.432327270507812, 18.387855529785156, 13.937267303466797, 2.7150325775146484, 15.427978515625, 3.087383270263672, 6.061851501464844, -2.8051300048828125, 16.900344848632812, 21.675643920898438, 1.75390625, 15.420211791992188, -4.7156524658203125, 8.633522033691406, 8.487186431884766, 16.468246459960938, 4.228515625, -5.147754669189453, 18.935012817382812, 5.093626022338867, 33.05427932739258, -1.0339508056640625, 18.03765296936035, -1.4943199157714844, 10.079011917114258, 12.147693634033203, 13.088821411132812, 3.089263916015625, -22.599403381347656, 16.111175537109375, 10.255243301391602, 15.828269958496094, -16.934181213378906, 9.645431518554688, 10.368820190429688, 12.461376190185547, 7.141761779785156, 22.600631713867188, 8.148368835449219, 20.732444763183594, 4.085844039916992, 23.974777221679688, -2.069580078125, 0.6650333404541016, -17.574111938476562, -0.9801044464111328, -6.33167839050293, 3.3567123413085938, 14.725723266601562, 22.880477905273438, 1.4582786560058594, 13.208744049072266, 16.19615936279297, -19.24279022216797, 21.02678680419922, 11.131780624389648, 21.637046813964844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000202.npy"}
{"epoch": 0.30536659108087677, "step": 203, "batch_size": 64, "mean": 8.2492036819458, "std": 12.291041374206543, "min": -20.6468505859375, "p10": -4.898927307128906, "median": 6.153125762939453, "p90": 25.11334495544434, "max": 38.002899169921875, "pos_frac": 0.765625, "sample": [6.15032958984375, 1.0502166748046875, -10.664871215820312, -4.221710205078125, 3.4989471435546875, 12.197734832763672, 0.8165283203125, 12.197998046875, 5.7790679931640625, -4.0967559814453125, 20.906890869140625, 9.779197692871094, 9.28868293762207, 31.8572998046875, 2.9559459686279297, 38.002899169921875, 25.52019500732422, 2.9353256225585938, 20.89852523803711, 11.505813598632812, -20.6468505859375, -1.3719291687011719, 12.598365783691406, -2.5669898986816406, -1.115011215209961, -7.23651123046875, 6.1377716064453125, 30.860671997070312, 2.8673782348632812, 16.11811065673828, 7.6600494384765625, 4.273231506347656, 24.16402816772461, 7.515485763549805, 15.821121215820312, 0.0346832275390625, 25.532554626464844, 13.785585403442383, 26.317459106445312, 7.697134017944336, 3.0256805419921875, -2.18878173828125, 3.7674026489257812, 9.204204559326172, 19.577301025390625, -1.6082725524902344, 23.7947998046875, 23.679168701171875, -5.1891632080078125, 34.929100036621094, 12.886943817138672, 2.378448486328125, 0.894287109375, 5.6574249267578125, 7.2498779296875, -13.381292343139648, -0.08284759521484375, 6.155921936035156, 22.36003875732422, 18.300443649291992, 15.379203796386719, -13.969451904296875, -9.837215423583984, 0.16120529174804688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000203.npy"}
{"epoch": 0.30687830687830686, "step": 204, "batch_size": 64, "mean": 8.533686637878418, "std": 12.715483665466309, "min": -26.09783935546875, "p10": -4.048527717590332, "median": 7.063482284545898, "p90": 28.731735420227054, "max": 35.58641052246094, "pos_frac": 0.734375, "sample": [31.699501037597656, 23.912734985351562, -6.54850959777832, 6.9065399169921875, 7.108219146728516, 35.58641052246094, 2.0620269775390625, 11.714094161987305, -8.360809326171875, 7.4725341796875, 30.676986694335938, -16.075332641601562, -4.092624664306641, 7.018745422363281, 9.260055541992188, 10.425827026367188, 15.027057647705078, 3.2626419067382812, -14.401371002197266, -0.072784423828125, 17.143762588500977, 6.264244079589844, 15.129005432128906, 24.48602294921875, -1.0896930694580078, -3.4788360595703125, 10.85040283203125, 9.868770599365234, 2.6571807861328125, 6.0080718994140625, 29.04646110534668, -3.5231399536132812, 5.187517166137695, 12.711204528808594, 14.854820251464844, -2.9812355041503906, 13.686714172363281, 32.21971130371094, -3.9456348419189453, 5.474266052246094, 3.8041839599609375, 3.0788536071777344, 21.730194091796875, 24.714874267578125, -7.850929260253906, 6.778251647949219, 13.004386901855469, -1.6743488311767578, 19.08416748046875, -0.7590370178222656, 27.99737548828125, 29.840301513671875, 1.0777873992919922, 17.68267822265625, 9.405380249023438, 30.751461029052734, 12.225112915039062, 1.1214828491210938, 0.5558013916015625, 13.101608276367188, -26.09783935546875, -3.779050827026367, 8.963376998901367, -1.7516937255859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000204.npy"}
{"epoch": 0.30839002267573695, "step": 205, "batch_size": 64, "mean": 8.687185287475586, "std": 13.984874725341797, "min": -26.46457290649414, "p10": -7.576078033447265, "median": 9.148402214050293, "p90": 27.23662338256836, "max": 40.672950744628906, "pos_frac": 0.734375, "sample": [38.62060546875, 3.8084468841552734, -0.11912155151367188, 6.708824157714844, 10.210790634155273, 36.82962417602539, 9.000158309936523, 15.655227661132812, -25.78839111328125, 24.390213012695312, 3.7228527069091797, 3.683177947998047, 40.672950744628906, 11.269914627075195, 11.057624816894531, 1.2671356201171875, 7.373771667480469, -8.265853881835938, -4.368804931640625, 14.196945190429688, 5.850242614746094, 13.668083190917969, 1.6893730163574219, 7.83740234375, 18.416954040527344, -2.0282516479492188, -7.8712921142578125, -1.1751556396484375, 2.8512611389160156, 31.873695373535156, -2.6370849609375, -5.92889404296875, 13.821868896484375, 9.515266418457031, -2.2249374389648438, 14.190704345703125, 13.015623092651367, 11.908950805664062, -1.951934814453125, 20.584793090820312, -0.6945343017578125, 23.632179260253906, 18.050735473632812, 15.919631958007812, 32.19768524169922, 27.514312744140625, 0.33208465576171875, 9.467994689941406, 24.072715759277344, -8.510368347167969, 9.296646118164062, 7.974922180175781, 17.03961181640625, 12.316566467285156, -20.3751220703125, 27.21002197265625, 19.04135513305664, -7.667198181152344, 27.248023986816406, -7.36346435546875, 1.7286300659179688, -26.46457290649414, 0.6265335083007812, 12.052680969238281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000205.npy"}
{"epoch": 0.30990173847316704, "step": 206, "batch_size": 64, "mean": 6.389400482177734, "std": 14.800884246826172, "min": -24.843643188476562, "p10": -12.337408447265622, "median": 4.417427062988281, "p90": 28.552610778808603, "max": 33.656394958496094, "pos_frac": 0.65625, "sample": [-7.845922470092773, -0.7134552001953125, 4.2675323486328125, -21.275047302246094, -8.982383728027344, -2.0424270629882812, 3.79437255859375, 9.428668975830078, 19.075462341308594, 8.442604064941406, 1.8990840911865234, 32.097808837890625, 12.888038635253906, -2.718111038208008, -8.169898986816406, 16.105438232421875, -2.8981857299804688, 29.478538513183594, -18.369476318359375, 5.281152725219727, 33.656394958496094, 14.23947525024414, 2.564849853515625, 4.3484039306640625, -5.129371643066406, 26.392112731933594, 7.04676628112793, -20.33429527282715, 16.41554069519043, 21.554298400878906, 21.604156494140625, 30.5092716217041, 1.4782943725585938, 6.607828140258789, 25.336196899414062, -2.14617919921875, -24.843643188476562, -13.775276184082031, -4.822534561157227, 1.1176605224609375, 14.457807540893555, 1.1225013732910156, 18.248123168945312, -6.822357177734375, -2.0676727294921875, -17.138996124267578, 29.59484100341797, 19.08544921875, 3.9278335571289062, 4.4864501953125, 18.524810791015625, 23.85100555419922, 30.634193420410156, -8.42911148071289, 15.557670593261719, 24.926918029785156, 5.040283203125, -1.0137481689453125, 4.8644561767578125, 30.743850708007812, -20.919050216674805, -2.1616363525390625, 0.2590446472167969, 10.585220336914062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000206.npy"}
{"epoch": 0.31141345427059713, "step": 207, "batch_size": 64, "mean": 9.291353225708008, "std": 13.449788093566895, "min": -17.244415283203125, "p10": -5.84473876953125, "median": 6.4235992431640625, "p90": 29.478859710693367, "max": 39.2471923828125, "pos_frac": 0.75, "sample": [39.2471923828125, -0.3101177215576172, -2.9699649810791016, 2.2021484375, 12.080970764160156, 11.045305252075195, 1.326507568359375, 4.319480895996094, 1.8933181762695312, 27.934829711914062, 8.703849792480469, 2.90399169921875, -1.2171897888183594, 30.140586853027344, -2.633577346801758, -5.826751708984375, 9.536941528320312, 17.83660888671875, 5.7168121337890625, 24.472362518310547, 27.770816802978516, 6.9596099853515625, 15.996994018554688, 4.59173583984375, -7.950067520141602, -6.46807861328125, -3.9040584564208984, 32.90038299560547, -5.852447509765625, 6.6228179931640625, 27.519775390625, 4.622859954833984, 36.000152587890625, -2.3733901977539062, 6.2243804931640625, -3.4730186462402344, -11.209249496459961, 15.259841918945312, 7.45697021484375, 1.8620128631591797, 16.204086303710938, 23.794593811035156, 4.0117034912109375, 12.3653564453125, 0.4320869445800781, 18.70618438720703, 14.530723571777344, -16.18756103515625, 24.10251808166504, 12.836944580078125, -17.244415283203125, 18.369600296020508, 0.1251678466796875, 11.42724609375, 4.547637939453125, 36.36824035644531, 33.38835906982422, 3.94219970703125, 11.129837036132812, 13.826910018920898, -9.066238403320312, -2.054901123046875, 4.870994567871094, 35.25802230834961], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000207.npy"}
{"epoch": 0.3129251700680272, "step": 208, "batch_size": 64, "mean": 5.523914813995361, "std": 13.268763542175293, "min": -27.29238510131836, "p10": -10.179047393798825, "median": 5.249968528747559, "p90": 23.92555732727051, "max": 31.299171447753906, "pos_frac": 0.609375, "sample": [10.440164566040039, 0.9579792022705078, 21.239206314086914, -4.963323593139648, 26.817230224609375, 24.090591430664062, 1.6637248992919922, -2.8483734130859375, -12.623443603515625, -22.18315887451172, -1.7631607055664062, -11.167816162109375, 23.540477752685547, 12.881851196289062, 15.336360931396484, 3.505016326904297, 7.398597717285156, 10.642059326171875, -3.9673404693603516, -8.210975646972656, 17.49480438232422, -6.626792907714844, 1.4681072235107422, -4.0074310302734375, 13.417936325073242, 28.567138671875, 21.4241943359375, -5.14952278137207, 7.764167785644531, 16.153413772583008, -4.715818405151367, -0.9543304443359375, -7.326530456542969, 5.5691986083984375, -19.227699279785156, 16.05333709716797, 24.10165786743164, 13.71881103515625, 5.714683532714844, 0.6876945495605469, 31.299171447753906, 8.650711059570312, 5.872779846191406, -2.9962921142578125, 22.357635498046875, -2.043853759765625, 28.70440673828125, -2.4990692138671875, 16.335256576538086, 4.93073844909668, -11.022506713867188, 17.985084533691406, -0.46006011962890625, 16.055801391601562, -7.897151947021484, 10.945587158203125, 9.238082885742188, -5.525911331176758, -16.50941276550293, -1.3359870910644531, -27.29238510131836, 4.447303771972656, 13.384986877441406, 25.992942810058594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000208.npy"}
{"epoch": 0.3144368858654573, "step": 209, "batch_size": 64, "mean": 11.015926361083984, "std": 16.071937561035156, "min": -35.200157165527344, "p10": -5.840234375, "median": 9.342456817626953, "p90": 30.973474884033205, "max": 40.48914337158203, "pos_frac": 0.796875, "sample": [28.979145050048828, 29.459720611572266, 4.2844696044921875, 9.333381652832031, -9.797088623046875, 35.74078369140625, 11.252874374389648, 28.39244842529297, 4.593177795410156, 36.45747375488281, -4.890586853027344, 5.6274566650390625, -10.1273193359375, 24.797454833984375, 0.0578460693359375, 7.146186828613281, -1.9289093017578125, -0.7358207702636719, -1.8834877014160156, 39.397796630859375, 1.9254837036132812, -10.405437469482422, 0.4651527404785156, 25.8223876953125, 5.088478088378906, 5.432060241699219, 19.23858642578125, 20.966033935546875, 32.8309326171875, 13.583984375, 20.41200828552246, 12.643218994140625, 4.326908111572266, 30.36022186279297, 30.283645629882812, 0.8033447265625, 31.236297607421875, 14.270706176757812, -35.200157165527344, 6.634857177734375, -5.612213134765625, 1.9219036102294922, -3.2462387084960938, -5.937957763671875, 29.293670654296875, 14.313846588134766, 40.48914337158203, 27.159439086914062, 11.819198608398438, 24.174808502197266, 4.570499420166016, -13.485136032104492, 3.483316421508789, 14.661178588867188, 8.36724853515625, 2.316793441772461, 21.430469512939453, -33.046173095703125, 9.351531982421875, 13.473678588867188, 28.155105590820312, 34.48902893066406, 9.358654022216797, 0.6418418884277344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000209.npy"}
{"epoch": 0.31594860166288735, "step": 210, "batch_size": 64, "mean": 10.645635604858398, "std": 15.620795249938965, "min": -19.52367401123047, "p10": -9.438563919067379, "median": 7.50532341003418, "p90": 32.69986419677735, "max": 41.78873825073242, "pos_frac": 0.75, "sample": [33.73162078857422, 6.1435089111328125, 3.706939697265625, 13.630805969238281, -1.8349266052246094, 18.529754638671875, 41.78873825073242, 17.415130615234375, 36.052879333496094, 33.929412841796875, 2.0913352966308594, 33.90611267089844, 32.839019775390625, -1.6110992431640625, 32.37516784667969, 2.5910491943359375, -16.841344833374023, -19.52367401123047, 31.257186889648438, -10.895393371582031, 32.21849060058594, 15.318618774414062, 5.053377151489258, -0.1607818603515625, 30.130287170410156, 1.3699722290039062, -0.9312400817871094, 7.9084014892578125, 30.02984619140625, -5.361991882324219, 29.34056854248047, 1.225351333618164, 9.971160888671875, 6.7397308349609375, 11.63235092163086, -4.625831604003906, 1.7631645202636719, -6.039295196533203, -1.2367706298828125, 29.83548927307129, -15.584543228149414, 14.5, 8.630844116210938, 4.408134460449219, 23.652565002441406, -0.42598533630371094, 14.372243881225586, 2.760784149169922, 8.507904052734375, 3.908416748046875, 10.666996002197266, 23.85630989074707, 7.102245330810547, -15.291688919067383, 4.223655700683594, 36.991676330566406, 21.408973693847656, 22.851219177246094, 21.61549186706543, -18.65502166748047, 2.1976318359375, -12.756355285644531, 5.613677978515625, 23.3023681640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000210.npy"}
{"epoch": 0.31746031746031744, "step": 211, "batch_size": 64, "mean": 10.028705596923828, "std": 14.743659973144531, "min": -21.1412353515625, "p10": -6.913178634643554, "median": 6.06852912902832, "p90": 31.711694717407227, "max": 47.612457275390625, "pos_frac": 0.765625, "sample": [0.7714633941650391, -21.1412353515625, 15.048866271972656, 5.270893096923828, 0.8101806640625, 1.4433441162109375, -6.1288909912109375, -1.2318572998046875, 9.12298583984375, 2.5713062286376953, 19.29962158203125, -0.5267753601074219, 16.978866577148438, 30.72015380859375, 7.409236907958984, 15.607608795166016, 1.6898422241210938, -0.5594520568847656, 27.466827392578125, -9.08681869506836, 4.064060211181641, -8.060821533203125, 17.812580108642578, -7.439361572265625, 2.8394737243652344, -14.954610824584961, 1.5788345336914062, 6.8661651611328125, 37.15756607055664, 8.303064346313477, 19.650856018066406, 4.355194091796875, 10.426668167114258, 3.2909469604492188, 33.06257629394531, 15.5250244140625, 9.93121337890625, 3.6248931884765625, -7.249301910400391, 18.13543701171875, 5.269474029541016, 47.612457275390625, 37.301239013671875, 3.4071483612060547, -0.45900726318359375, 24.43935775756836, 31.813060760498047, 18.951904296875, 11.935890197753906, 35.56770324707031, -17.621620178222656, 9.086200714111328, 31.475173950195312, 29.153568267822266, 26.836669921875, 26.80899429321289, 5.065406799316406, 15.937171936035156, -3.2970104217529297, 34.02958679199219, -2.1384239196777344, -5.725440979003906, 0.4857139587402344, 1.4452667236328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000211.npy"}
{"epoch": 0.31897203325774753, "step": 212, "batch_size": 64, "mean": 11.304862976074219, "std": 15.86205768585205, "min": -37.3026123046875, "p10": -6.287658309936521, "median": 11.726658821105957, "p90": 32.72035102844239, "max": 41.16373825073242, "pos_frac": 0.796875, "sample": [32.861846923828125, 22.021663665771484, 23.519657135009766, 32.390193939208984, 2.379301071166992, 3.0094375610351562, 41.16373825073242, 16.05720329284668, -3.0301437377929688, 27.75066375732422, -2.836009979248047, -37.3026123046875, 28.53748893737793, 39.012542724609375, 7.9035491943359375, 24.12175750732422, 26.823410034179688, 15.59566879272461, -15.791343688964844, 19.829824447631836, 12.310585021972656, 1.1740341186523438, 0.7519893646240234, 9.077476501464844, -21.907257080078125, 1.8404388427734375, 11.252410888671875, -2.448648452758789, 31.89420509338379, 34.00389099121094, 19.551349639892578, 15.52829360961914, 22.434425354003906, 10.852088928222656, 14.850822448730469, 5.273826599121094, -4.457798004150391, 15.603904724121094, 0.8779830932617188, 13.568553924560547, 0.9826812744140625, 2.3177947998046875, 0.8972129821777344, -3.00421142578125, 12.200906753540039, 23.32862091064453, 13.509113311767578, 38.869911193847656, -12.247264862060547, -7.0718841552734375, -1.1538639068603516, 1.4499893188476562, 20.25885009765625, -7.378440856933594, 18.894180297851562, 33.124366760253906, 29.251731872558594, 34.48706817626953, 24.882278442382812, 0.8679122924804688, 5.4967803955078125, 7.8366546630859375, -11.551185607910156, 1.2116317749023438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000212.npy"}
{"epoch": 0.3204837490551776, "step": 213, "batch_size": 64, "mean": 12.911934852600098, "std": 16.940160751342773, "min": -38.931297302246094, "p10": -5.101603126525877, "median": 9.77551555633545, "p90": 35.117481613159185, "max": 51.164276123046875, "pos_frac": 0.78125, "sample": [27.13268280029297, 3.0288162231445312, 5.471343994140625, 30.243141174316406, 12.972991943359375, 6.122804641723633, -6.952606201171875, -2.965465545654297, -0.98687744140625, 26.22710418701172, 33.9771728515625, 1.6688575744628906, 3.2720794677734375, -5.830739974975586, -38.931297302246094, -8.7244873046875, 22.0424861907959, 51.164276123046875, -0.6679630279541016, 19.639450073242188, 28.81470489501953, -21.271991729736328, 34.26354217529297, 41.76976776123047, 29.360427856445312, 5.42860221862793, -1.818704605102539, -13.385093688964844, 3.8018627166748047, 6.728208541870117, -6.338809967041016, 8.02740478515625, 17.17308807373047, 10.358732223510742, 18.206863403320312, 17.204071044921875, 3.1655521392822266, 25.916534423828125, 32.05508804321289, 6.579597473144531, 5.061061859130859, 9.192298889160156, 36.094234466552734, 1.2596359252929688, 28.39702606201172, 12.14837646484375, 2.031707763671875, 23.20632553100586, 37.91064453125, 2.2692413330078125, -3.4002838134765625, 22.981048583984375, -1.5843505859375, 1.1464462280273438, 4.148807525634766, 17.922264099121094, -0.8577880859375, 39.55902099609375, 14.870819091796875, 13.360198974609375, 33.65141296386719, 36.4088249206543, 31.16016387939453, 35.483455657958984], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000213.npy"}
{"epoch": 0.3219954648526077, "step": 214, "batch_size": 64, "mean": 12.526634216308594, "std": 17.349206924438477, "min": -26.198822021484375, "p10": -9.135639572143553, "median": 11.55282974243164, "p90": 36.724267578125, "max": 49.70640563964844, "pos_frac": 0.703125, "sample": [20.740854263305664, 16.35492706298828, -0.1185455322265625, 8.696914672851562, 7.224540710449219, 25.363428115844727, 2.9085121154785156, 7.197582244873047, 22.67882537841797, 32.99231719970703, 35.718177795410156, 27.03494644165039, 13.799642562866211, 2.0844268798828125, 21.84362030029297, 33.72706604003906, 22.704511642456055, -22.24938201904297, 19.12529754638672, 23.005237579345703, -0.752716064453125, 19.892730712890625, 10.208641052246094, -9.512359619140625, -0.4618034362792969, 33.98554229736328, -11.654983520507812, 20.911436080932617, 39.6457633972168, 30.038589477539062, 16.850746154785156, -6.0435791015625, -5.449312210083008, 4.128837585449219, 18.121253967285156, 30.236812591552734, -14.29827880859375, -2.5691986083984375, 37.15544891357422, 49.70640563964844, 0.8762741088867188, -10.657161712646484, 29.851524353027344, 18.728172302246094, 40.32041549682617, 40.93226623535156, 6.401397705078125, -0.9102935791015625, 7.6392974853515625, -8.25662612915039, -6.147552490234375, 41.61223602294922, 5.8837890625, -2.489154815673828, 5.401378631591797, -0.689208984375, 37.466888427734375, 10.075305938720703, -26.198822021484375, 20.379318237304688, 12.897018432617188, 16.61319351196289, -2.722087860107422, -16.275827407836914], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000214.npy"}
{"epoch": 0.3235071806500378, "step": 215, "batch_size": 64, "mean": 12.92614459991455, "std": 18.34331703186035, "min": -26.69855499267578, "p10": -9.488687324523926, "median": 8.156318664550781, "p90": 35.651842498779295, "max": 42.46435546875, "pos_frac": 0.703125, "sample": [11.894580841064453, -12.328346252441406, -26.69855499267578, 16.908218383789062, 8.743782043457031, -16.74420166015625, 5.470703125, -3.0833358764648438, 34.14745330810547, 36.03019332885742, 5.091407775878906, -3.096446990966797, -13.228363037109375, 8.235671997070312, 16.421005249023438, 39.59783935546875, 3.5804901123046875, 31.895713806152344, 9.643722534179688, -1.3099822998046875, 6.8023681640625, 6.674888610839844, -1.0568885803222656, 26.95575714111328, 33.701866149902344, 0.7702045440673828, -3.2776260375976562, 32.032981872558594, 30.347427368164062, -9.526514053344727, -8.120975494384766, 33.410736083984375, 3.4918861389160156, -9.168106079101562, -3.70892333984375, -15.974197387695312, 24.8022518157959, 21.686538696289062, 29.3443603515625, 35.5155029296875, 4.942317962646484, 8.07696533203125, -6.198638916015625, -1.8788223266601562, 31.220794677734375, 33.875282287597656, 41.06421661376953, 41.748592376708984, 29.722183227539062, 42.46435546875, 22.490142822265625, 34.30455017089844, 29.886856079101562, -14.239883422851562, 6.132083892822266, 1.6007213592529297, 5.50823974609375, 40.614837646484375, 29.92583465576172, -0.5381412506103516, 35.71027374267578, 3.8166961669921875, 30.549110412597656, -9.40042495727539], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000215.npy"}
{"epoch": 0.3250188964474679, "step": 216, "batch_size": 64, "mean": 13.775141716003418, "std": 18.01058578491211, "min": -35.50201416015625, "p10": -5.782713317871092, "median": 12.325691223144531, "p90": 37.421710205078135, "max": 72.82980346679688, "pos_frac": 0.765625, "sample": [2.0006866455078125, 34.668113708496094, 18.667633056640625, 30.603721618652344, 22.28551483154297, 8.544132232666016, 14.205333709716797, 3.7341156005859375, 27.470855712890625, 34.95664978027344, 24.354965209960938, 18.945388793945312, 2.1932907104492188, 12.350082397460938, 12.220352172851562, -0.63201904296875, 7.250888824462891, 12.194435119628906, -4.7362060546875, 40.923553466796875, 22.924102783203125, 38.47816467285156, 18.31230926513672, 23.842987060546875, 19.446426391601562, 7.130130767822266, 38.4932861328125, 1.1611251831054688, 2.9196929931640625, 43.349456787109375, 27.80815887451172, -35.50201416015625, 9.489431381225586, 13.796794891357422, 10.103317260742188, 14.61761474609375, -20.031227111816406, 1.158041000366211, 20.235584259033203, -3.598388671875, -1.2038688659667969, 72.82980346679688, 38.65533447265625, -6.285408020019531, 7.111499786376953, -12.866844177246094, 15.764915466308594, 1.3822364807128906, -9.926597595214844, 23.908248901367188, 45.4124870300293, -13.03609848022461, -4.15185546875, -0.2646198272705078, 33.87348175048828, -0.4410438537597656, 12.301300048828125, 24.830482482910156, -6.2312164306640625, 34.476104736328125, -1.273050308227539, 25.445396423339844, 12.927841186523438, 12.034011840820312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000216.npy"}
{"epoch": 0.32653061224489793, "step": 217, "batch_size": 64, "mean": 14.26774787902832, "std": 18.198497772216797, "min": -27.908607482910156, "p10": -3.8252231597900375, "median": 10.526115417480469, "p90": 40.01768112182617, "max": 52.35490417480469, "pos_frac": 0.796875, "sample": [24.238006591796875, 39.593345642089844, -6.243339538574219, 1.3642120361328125, 0.001445770263671875, -4.512668609619141, 2.571828842163086, 52.35490417480469, -0.4040985107421875, 0.611541748046875, 45.68708801269531, 46.571990966796875, 3.2968063354492188, -1.9640235900878906, 10.438232421875, 4.613044738769531, 34.670013427734375, 19.576183319091797, 8.295822143554688, 35.36158752441406, 7.303537368774414, 28.269533157348633, 3.5, 27.5347900390625, 7.868377685546875, 36.81365966796875, -27.908607482910156, -0.5688552856445312, 2.5126991271972656, 11.724994659423828, 38.039398193359375, 20.491737365722656, 8.983871459960938, -6.6808319091796875, 18.710411071777344, -14.388202667236328, 3.4956283569335938, 22.33235740661621, 0.7230491638183594, -12.730545043945312, -25.6591796875, 36.93159484863281, 14.520511627197266, -2.2211837768554688, -0.5702667236328125, 2.9163894653320312, 6.158653259277344, 10.613998413085938, 14.577232360839844, 50.6378173828125, 16.300813674926758, 30.82568359375, 7.494724273681641, 1.6515636444091797, 41.828697204589844, 19.11077880859375, 41.31927490234375, 19.872947692871094, 16.927734375, 34.70648956298828, 26.78006362915039, 40.19953918457031, 16.65247344970703, -0.5893783569335938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000217.npy"}
{"epoch": 0.328042328042328, "step": 218, "batch_size": 64, "mean": 9.140449523925781, "std": 20.4138126373291, "min": -47.453590393066406, "p10": -14.403359603881835, "median": 7.469635009765625, "p90": 36.11085815429688, "max": 51.15826416015625, "pos_frac": 0.6875, "sample": [2.5126724243164062, 1.0312938690185547, 9.355270385742188, 19.398897171020508, 0.6746826171875, -4.0174407958984375, 7.931724548339844, -2.3056640625, 1.4985198974609375, 36.726287841796875, 8.969993591308594, 11.874763488769531, 33.536865234375, -33.216766357421875, -13.692825317382812, -22.523698806762695, 6.952445983886719, 39.58856964111328, -27.86962890625, -0.2767486572265625, 23.421859741210938, 13.454170227050781, 3.8997116088867188, 9.91044807434082, 24.132915496826172, 51.15826416015625, 36.411590576171875, 45.45088195800781, 20.33024024963379, 23.378585815429688, -2.9976844787597656, 28.541900634765625, 13.129905700683594, -10.496519088745117, -7.725757598876953, -8.061531066894531, 1.31414794921875, 15.430643081665039, -2.2752761840820312, 29.29718780517578, 34.49863052368164, 19.1209716796875, -2.0719757080078125, 5.963203430175781, 8.545604705810547, -14.707874298095703, 24.85247230529785, -0.29979896545410156, -17.482120513916016, -9.038352966308594, 28.209518432617188, -30.38445281982422, 13.310417175292969, 0.21311378479003906, 7.007545471191406, 32.84619140625, 4.012859344482422, 44.23114776611328, -47.453590393066406, -3.834522247314453, 16.570907592773438, 5.416465759277344, 35.409149169921875, 46.19834899902344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000218.npy"}
{"epoch": 0.3295540438397581, "step": 219, "batch_size": 64, "mean": 9.844168663024902, "std": 17.140901565551758, "min": -27.28802490234375, "p10": -10.55791549682617, "median": 7.322246551513672, "p90": 35.289040374755864, "max": 42.73908233642578, "pos_frac": 0.703125, "sample": [7.6966552734375, 38.8207893371582, 27.041358947753906, 5.320539474487305, 17.112281799316406, -0.9519729614257812, 2.2363319396972656, 1.2190971374511719, 7.826316833496094, -13.873329162597656, -1.5928401947021484, 42.73908233642578, 20.140594482421875, -11.272010803222656, -14.987274169921875, 14.677413940429688, 18.034828186035156, 2.487333297729492, 15.391578674316406, 1.7823352813720703, 1.5246028900146484, -8.890090942382812, -8.891693115234375, 9.204761505126953, -17.499359130859375, -3.3767547607421875, 4.464973449707031, -1.3023147583007812, 17.581207275390625, 6.947837829589844, 30.817527770996094, 36.156044006347656, 35.56848907470703, 39.68653869628906, 27.072589874267578, 34.636993408203125, -5.724273681640625, 1.955923080444336, -2.5093154907226562, 15.083759307861328, 10.447792053222656, 10.587120056152344, -27.28802490234375, 13.406501770019531, -1.0893478393554688, -1.8806571960449219, 26.62883758544922, 5.1170501708984375, 17.732791900634766, 42.71659851074219, 30.110366821289062, 30.928382873535156, 8.704296112060547, -2.037322998046875, -25.873729705810547, 3.179981231689453, 3.9661331176757812, 6.767448425292969, 30.950958251953125, -1.0797920227050781, 35.584434509277344, -21.39312744140625, 32.106693267822266, 9.37689208984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000219.npy"}
{"epoch": 0.3310657596371882, "step": 220, "batch_size": 64, "mean": 11.597736358642578, "std": 20.082504272460938, "min": -30.855350494384766, "p10": -8.986643600463866, "median": 6.330654144287109, "p90": 39.479119110107426, "max": 53.69670104980469, "pos_frac": 0.671875, "sample": [2.6915931701660156, -15.754402160644531, 18.58924102783203, 6.0497589111328125, 1.1147422790527344, 13.735763549804688, 45.94610595703125, 8.080673217773438, 5.701608657836914, 5.11676025390625, -3.5826263427734375, 46.763458251953125, 3.757610321044922, -28.183025360107422, -9.132396697998047, 1.2732467651367188, 41.01519012451172, 1.33050537109375, -2.200450897216797, 0.8006515502929688, 31.554500579833984, 11.5972900390625, -1.9320106506347656, -25.333498001098633, -22.657981872558594, -12.038370132446289, -6.272424697875977, -8.646553039550781, -5.470306396484375, 27.093551635742188, 30.219337463378906, 22.587867736816406, 22.527908325195312, 35.81972885131836, 40.403709411621094, 2.679370880126953, 34.69184875488281, 26.872772216796875, 30.88448143005371, -3.533538818359375, 2.5818405151367188, 19.099424362182617, -5.206153869628906, 18.600379943847656, 53.69670104980469, -1.7759552001953125, 40.15950012207031, -0.2577667236328125, 9.449968338012695, 37.891563415527344, 30.650705337524414, -5.919944763183594, 35.89625549316406, -0.05483245849609375, 13.587318420410156, 6.611549377441406, -30.855350494384766, 13.421669006347656, 33.39202117919922, -0.1789703369140625, 49.902099609375, -4.981372833251953, 34.02312088012695, 18.359683990478516], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000220.npy"}
{"epoch": 0.3325774754346183, "step": 221, "batch_size": 64, "mean": 12.508028030395508, "std": 19.794475555419922, "min": -32.81507110595703, "p10": -10.285166549682616, "median": 8.234127044677734, "p90": 41.74824371337893, "max": 54.106597900390625, "pos_frac": 0.765625, "sample": [4.213766098022461, -7.332126617431641, 17.101547241210938, 18.96190643310547, -0.6896228790283203, 31.49213409423828, 6.15605354309082, 3.1869354248046875, 21.096946716308594, -19.04741668701172, -9.26198959350586, 7.34918212890625, 11.235645294189453, 9.119071960449219, -13.547096252441406, 6.461204528808594, 3.64697265625, -0.7423877716064453, 17.505361557006836, 20.973278045654297, 32.59787368774414, 34.513938903808594, 11.40616226196289, 35.380226135253906, 52.082305908203125, 3.7285385131835938, 31.271831512451172, 51.13486862182617, -3.2552413940429688, 2.411306381225586, 20.5887451171875, 44.477394104003906, 0.6248741149902344, 14.94491195678711, 50.850982666015625, 10.883991241455078, 27.97540283203125, -10.723670959472656, 24.0107421875, 17.71978759765625, 26.450477600097656, -12.050949096679688, 3.773845672607422, -1.127511978149414, 0.26953125, -1.0634841918945312, 0.6402816772460938, 46.25337219238281, 0.7087249755859375, 6.505817413330078, -26.069229125976562, -32.81507110595703, 1.8495521545410156, 54.106597900390625, 32.85353088378906, -23.23476219177246, 7.1042022705078125, 12.833251953125, 48.46803283691406, 31.26556396484375, -1.3410415649414062, 1.7005233764648438, 33.185672760009766, 9.742515563964844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000221.npy"}
{"epoch": 0.3340891912320484, "step": 222, "batch_size": 64, "mean": 16.42353057861328, "std": 21.11109161376953, "min": -33.49108123779297, "p10": -7.460014915466307, "median": 16.862228393554688, "p90": 41.72143707275391, "max": 55.61944580078125, "pos_frac": 0.78125, "sample": [-1.5642948150634766, -19.737037658691406, 21.343852996826172, 9.7822265625, 36.57695770263672, 36.45294189453125, 17.396068572998047, -5.2960205078125, 41.53074645996094, 47.078460693359375, 15.958343505859375, 4.401556015014648, -29.97136688232422, 2.0510711669921875, 35.898529052734375, 2.9162445068359375, 7.866355895996094, 7.639202117919922, 7.798015594482422, 32.1845817565918, 24.037559509277344, -0.06783485412597656, 36.84541320800781, 36.60541534423828, -2.149871826171875, 42.917938232421875, 3.1449050903320312, 54.211761474609375, -6.203004837036133, 30.8177490234375, 18.278350830078125, 7.86962890625, 43.331573486328125, 37.42943572998047, 8.158859252929688, 16.36919403076172, 18.859756469726562, -8.829978942871094, -4.658269882202148, -7.9987335205078125, -31.974716186523438, 17.355262756347656, 20.19672393798828, 11.327774047851562, 1.8997764587402344, 55.61944580078125, 6.648979187011719, -33.49108123779297, 0.246063232421875, 34.827308654785156, 36.23900604248047, -0.5584964752197266, 35.676910400390625, 48.37493896484375, 20.6483154296875, 9.206474304199219, 39.6046257019043, 37.80851745605469, 9.171356201171875, 40.91026306152344, 26.236541748046875, 20.097076416015625, 41.80316162109375, -16.0445556640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000222.npy"}
{"epoch": 0.3356009070294785, "step": 223, "batch_size": 64, "mean": 15.375591278076172, "std": 18.790061950683594, "min": -15.883747100830078, "p10": -5.602482795715331, "median": 14.519264221191406, "p90": 42.59496192932129, "max": 55.050811767578125, "pos_frac": 0.765625, "sample": [24.97838592529297, 5.196268081665039, 32.79850769042969, 7.516937255859375, -3.5946578979492188, 25.35752296447754, 0.352325439453125, 22.890838623046875, 10.091651916503906, 26.4739990234375, 2.723125457763672, 48.12646484375, 29.12041473388672, 26.29253387451172, 1.8199577331542969, -10.064899444580078, -0.6486396789550781, 26.433372497558594, 1.878173828125, -3.6612396240234375, 39.27544021606445, 14.799911499023438, -11.586074829101562, 28.804933547973633, -9.980033874511719, 3.0833740234375, 1.0625286102294922, 26.361404418945312, -4.168998718261719, -10.591632843017578, 31.009437561035156, 7.423095703125, -11.699295043945312, 4.293819427490234, 0.0358123779296875, 48.445350646972656, 41.59672546386719, 22.005084991455078, -3.0797576904296875, -15.883747100830078, -4.8804168701171875, -5.91193962097168, 21.513595581054688, 28.394515991210938, 46.50079345703125, 6.608638763427734, 14.672332763671875, 31.19322395324707, 16.891220092773438, 43.02277755737305, 55.050811767578125, -1.886688232421875, 54.07588195800781, 47.98664093017578, 39.69226837158203, 17.75775146484375, 31.256988525390625, 33.707237243652344, 16.684982299804688, -2.4315547943115234, 1.0280303955078125, 14.366195678710938, 1.4036483764648438, 2.052490234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000223.npy"}
{"epoch": 0.3371126228269085, "step": 224, "batch_size": 64, "mean": 8.56376838684082, "std": 19.547433853149414, "min": -32.05122375488281, "p10": -14.484741973876954, "median": 4.238330841064453, "p90": 37.34429740905762, "max": 53.314876556396484, "pos_frac": 0.640625, "sample": [7.863346099853516, 52.92156982421875, 14.035381317138672, 10.209304809570312, 6.4125518798828125, 43.1838264465332, 20.33716583251953, 21.867523193359375, -0.7305316925048828, 4.5172882080078125, 37.766422271728516, 4.605642318725586, 31.441551208496094, 36.48784255981445, -17.378223419189453, -0.7417182922363281, 4.950815200805664, 15.85906982421875, 14.4051513671875, 2.5407257080078125, 1.9075393676757812, 22.157203674316406, -0.6497344970703125, -32.05122375488281, -14.494316101074219, -0.5919418334960938, -6.3452911376953125, -7.713020324707031, 12.94158935546875, -2.1217803955078125, 32.585906982421875, -31.9271240234375, -19.669771194458008, -9.075775146484375, -7.6449737548828125, 0.20439720153808594, -0.8378448486328125, -25.057167053222656, 5.122119903564453, 12.088706970214844, -21.210769653320312, 22.41021728515625, 31.98015785217285, 3.2428741455078125, -14.46240234375, 3.645050048828125, 12.734947204589844, 32.5489616394043, -11.955879211425781, -1.892333984375, -1.7630672454833984, 18.07649040222168, 33.630332946777344, 40.1824951171875, 53.314876556396484, 16.941490173339844, 3.3416900634765625, 3.19464111328125, -0.14452552795410156, 44.06063461303711, -2.6113128662109375, 37.71134948730469, 3.9593734741210938, 1.7637405395507812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000224.npy"}
{"epoch": 0.3386243386243386, "step": 225, "batch_size": 64, "mean": 15.735418319702148, "std": 21.033153533935547, "min": -27.52880859375, "p10": -7.546736907958983, "median": 11.760604858398438, "p90": 47.85483856201172, "max": 57.7894287109375, "pos_frac": 0.78125, "sample": [1.13446044921875, 16.16667938232422, 5.602512359619141, -0.1341552734375, -6.294281005859375, 25.36358642578125, -9.38399887084961, 0.4674224853515625, 26.701082229614258, 11.370063781738281, 7.711311340332031, 45.239227294921875, 6.319648742675781, 48.964508056640625, -12.07900619506836, 1.4715385437011719, 17.44628143310547, 1.2085037231445312, 33.56992721557617, 46.671356201171875, 48.36204528808594, 20.324966430664062, 52.862449645996094, 18.965286254882812, 57.7894287109375, 18.541614532470703, 6.215179443359375, 38.89458465576172, 46.47727584838867, -3.9950809478759766, 1.1481246948242188, -22.766250610351562, 24.65111541748047, -8.083503723144531, 52.886817932128906, 17.823867797851562, 1.0054187774658203, 18.019744873046875, 29.18072509765625, 10.330169677734375, 1.6333847045898438, 5.049720764160156, 45.51353454589844, -0.1472644805908203, 22.23003387451172, -20.836273193359375, -14.211395263671875, -2.462493896484375, 16.815673828125, 6.3749847412109375, 29.95006561279297, 0.7091426849365234, 5.31072998046875, 52.46711730957031, 39.05917739868164, -0.7578487396240234, 3.7373809814453125, 12.151145935058594, 49.96304702758789, 24.64739418029785, -27.52880859375, 21.533126831054688, 42.237281799316406, -2.5226898193359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000225.npy"}
{"epoch": 0.3401360544217687, "step": 226, "batch_size": 64, "mean": 12.781817436218262, "std": 23.639680862426758, "min": -41.25177001953125, "p10": -13.582601547241211, "median": 9.052985191345215, "p90": 42.88998641967773, "max": 53.30511474609375, "pos_frac": 0.75, "sample": [6.620704650878906, -13.822784423828125, 24.361900329589844, 17.248205184936523, 10.117324829101562, 10.924713134765625, 17.355812072753906, 2.6047744750976562, 28.920528411865234, -13.022174835205078, 36.28180694580078, 8.277772903442383, 32.557350158691406, 41.46157455444336, -28.8797607421875, 0.4521331787109375, 3.6210403442382812, -41.25177001953125, 42.1998405456543, 20.82007598876953, 46.54154968261719, -2.3071441650390625, 36.23456573486328, 2.95709228515625, 27.321205139160156, 0.10024833679199219, 6.0242767333984375, 42.45355987548828, 40.374061584472656, 31.70526885986328, -6.253782272338867, 48.09295654296875, 9.828197479248047, 14.062278747558594, 48.899322509765625, 5.2557220458984375, 25.44945526123047, -37.29328918457031, -2.2685298919677734, 1.5118751525878906, 1.4813690185546875, 5.7759246826171875, -9.724861145019531, -36.32299041748047, 47.43346405029297, 53.30511474609375, -26.091934204101562, 7.267669677734375, -33.536964416503906, 23.976341247558594, 7.203548431396484, 2.6742172241210938, 13.717121124267578, 10.037919998168945, 1.122955322265625, 50.42816162109375, 39.67242431640625, 41.626426696777344, -4.665283203125, -1.1976604461669922, 43.0770263671875, -1.3145332336425781, -5.030517578125, 41.58338928222656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000226.npy"}
{"epoch": 0.3416477702191988, "step": 227, "batch_size": 64, "mean": 13.956131935119629, "std": 24.46851348876953, "min": -41.640647888183594, "p10": -17.238189697265625, "median": 5.544754981994629, "p90": 48.48859558105469, "max": 55.222564697265625, "pos_frac": 0.6875, "sample": [40.588287353515625, 12.86726188659668, -0.9617996215820312, 45.157859802246094, 3.8379058837890625, -0.9408454895019531, 35.56561279296875, 50.80701446533203, -41.640647888183594, -3.730133056640625, 53.47291564941406, -15.750732421875, -2.1977920532226562, 50.74616241455078, 11.866798400878906, 4.757293701171875, 1.3447227478027344, -1.601409912109375, 44.76482391357422, -4.393455505371094, 14.548080444335938, 43.421661376953125, 4.975627899169922, 5.573335647583008, 55.222564697265625, 20.60810089111328, -2.58245849609375, 36.730010986328125, 0.25154685974121094, -1.6265144348144531, 30.0025634765625, 54.96150207519531, 32.01817321777344, 5.51617431640625, 6.25299072265625, 35.38566589355469, 53.56465530395508, 2.0016098022460938, 18.25501251220703, 47.705894470214844, 4.564857482910156, 11.194129943847656, -21.152191162109375, 10.343521118164062, 37.64794921875, -1.7502517700195312, 2.8579254150390625, 3.336833953857422, -20.134170532226562, -0.12073135375976562, -32.083702087402344, 30.81295394897461, 13.488723754882812, 36.63856506347656, 2.777975082397461, -22.94720458984375, 3.9108657836914062, -0.7188873291015625, 47.72571563720703, -4.2830810546875, -32.502044677734375, -17.87567138671875, 45.29878234863281, 48.81554412841797], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000227.npy"}
{"epoch": 0.3431594860166289, "step": 228, "batch_size": 64, "mean": 13.987799644470215, "std": 21.801000595092773, "min": -36.56028366088867, "p10": -11.505220794677733, "median": 10.269128799438477, "p90": 45.22024650573731, "max": 59.90972900390625, "pos_frac": 0.71875, "sample": [-0.10110855102539062, 41.52375793457031, 2.481109619140625, 45.77606201171875, 9.582183837890625, 21.884368896484375, 14.19819450378418, 50.83574676513672, 8.356353759765625, -21.667007446289062, 24.977571487426758, 0.30579376220703125, 7.756103515625, -36.56028366088867, 7.281715393066406, 2.2998390197753906, 43.923343658447266, -12.552337646484375, 8.199115753173828, 17.499141693115234, 0.6451587677001953, 46.070098876953125, -30.174781799316406, 10.330986022949219, -7.6535186767578125, 39.72873306274414, 33.20538330078125, 59.90972900390625, 5.001527786254883, 31.042098999023438, -8.281925201416016, -9.947486877441406, 38.29922866821289, -0.6118011474609375, 26.318588256835938, -3.8267364501953125, 31.705554962158203, -1.5729522705078125, -14.051025390625, -6.206016540527344, 17.632949829101562, 29.953826904296875, 24.54150390625, -19.61590003967285, 10.643539428710938, 1.9946155548095703, 50.580955505371094, 54.168426513671875, -12.172821044921875, 8.635810852050781, -2.127593994140625, 10.207271575927734, 17.040420532226562, 16.600631713867188, -3.8821487426757812, 14.105613708496094, 43.61518859863281, 49.4359130859375, 37.73193359375, 23.926010131835938, -8.071823120117188, 13.185409545898438, 32.64241027832031, 8.516548156738281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000228.npy"}
{"epoch": 0.34467120181405897, "step": 229, "batch_size": 64, "mean": 13.739013671875, "std": 25.092138290405273, "min": -48.687442779541016, "p10": -10.343705749511718, "median": 12.403694152832031, "p90": 44.76239242553711, "max": 57.547698974609375, "pos_frac": 0.75, "sample": [31.30640411376953, 44.95764923095703, 23.720809936523438, 24.63861846923828, 16.17430877685547, 16.1282958984375, 35.24418640136719, 22.338960647583008, 19.73514175415039, -24.976638793945312, 40.599876403808594, 24.1243896484375, 45.89794158935547, -8.726737976074219, 25.453353881835938, 5.550907135009766, -45.6884765625, -5.544612884521484, 43.855072021484375, 43.192813873291016, 10.983047485351562, 1.6128101348876953, 15.287139892578125, 20.249710083007812, -29.056900024414062, 7.084430694580078, -8.590583801269531, -0.6262722015380859, 4.1758575439453125, -10.83782958984375, 0.014678955078125, 52.279205322265625, -48.687442779541016, 42.027610778808594, -9.190750122070312, 4.536260604858398, 41.18999481201172, -29.176937103271484, 4.017677307128906, 15.05474853515625, 48.49016571044922, 20.30929183959961, 0.4329109191894531, -1.70587158203125, 40.19905471801758, -0.5119895935058594, 51.144187927246094, 13.8243408203125, 44.306793212890625, 7.088403701782227, 6.951683044433594, -0.5521411895751953, 27.982315063476562, 3.439483642578125, 54.21123504638672, 2.460601806640625, -0.046600341796875, 0.4734916687011719, 43.80722427368164, 57.547698974609375, 42.44322204589844, 2.1368331909179688, 2.0829734802246094, -47.547149658203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000229.npy"}
{"epoch": 0.34618291761148906, "step": 230, "batch_size": 64, "mean": 15.695627212524414, "std": 22.52355194091797, "min": -46.771881103515625, "p10": -3.1178413391113278, "median": 8.02955436706543, "p90": 46.80790557861329, "max": 72.18386840820312, "pos_frac": 0.84375, "sample": [-3.3066368103027344, 8.804275512695312, 40.12715148925781, 1.5703086853027344, 25.259742736816406, -21.437255859375, 1.9014301300048828, 0.908599853515625, 6.541343688964844, 52.68151092529297, 12.626617431640625, 52.43817901611328, 49.39904022216797, 35.042030334472656, 19.23651123046875, 27.277517318725586, 21.830673217773438, 43.87199401855469, -26.22705841064453, 3.8046417236328125, -46.771881103515625, 4.7688140869140625, 4.69842529296875, 27.441516876220703, 19.59844207763672, -2.677318572998047, 47.51923370361328, 6.206787109375, 45.14813995361328, 42.417091369628906, 27.782363891601562, 2.866741180419922, -1.1808528900146484, 2.3591384887695312, 3.462188720703125, 10.538017272949219, 5.962394714355469, 5.661529541015625, 72.18386840820312, 7.795684814453125, 5.6297149658203125, 5.542610168457031, 2.452342987060547, 26.822738647460938, 65.0705795288086, 34.11669158935547, -15.308267593383789, 52.278038024902344, 4.666015625, 0.7462425231933594, 36.19464111328125, -17.880767822265625, 41.862403869628906, 36.606590270996094, 14.614315032958984, 6.9256591796875, 2.597930908203125, -1.7784366607666016, 9.331701278686523, 3.3927536010742188, 8.263423919677734, 39.075164794921875, -6.945648193359375, 12.112735748291016], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000230.npy"}
{"epoch": 0.3476946334089191, "step": 231, "batch_size": 64, "mean": 22.748565673828125, "std": 24.211551666259766, "min": -49.5140380859375, "p10": -4.873424911499023, "median": 25.89825439453125, "p90": 54.5241569519043, "max": 64.99409484863281, "pos_frac": 0.8125, "sample": [48.03643798828125, 9.990287780761719, 34.6762580871582, 41.930519104003906, 0.8851394653320312, 24.274642944335938, 3.7265586853027344, 53.584381103515625, 9.704948425292969, 10.142601013183594, 37.36107635498047, 2.69061279296875, 28.620071411132812, -4.536746978759766, -5.7005157470703125, 34.26531219482422, 9.844833374023438, 54.770729064941406, 14.775314331054688, -14.268440246582031, 35.52519989013672, -3.7223434448242188, 36.723609924316406, 39.655941009521484, 6.763393402099609, 57.16255187988281, -5.0177154541015625, -19.518627166748047, 56.123451232910156, 31.612598419189453, 51.091888427734375, -40.54438400268555, 27.723800659179688, 53.948822021484375, 41.45815658569336, 16.567657470703125, 3.8932018280029297, 24.151409149169922, -0.13665771484375, -1.8523788452148438, 64.99409484863281, 4.123695373535156, 24.079540252685547, 21.38385772705078, 38.30223846435547, 13.726583480834961, 22.12999725341797, -8.310657501220703, 9.138179779052734, -1.4513053894042969, 44.37380599975586, 50.90643310546875, 27.521865844726562, 32.834449768066406, 43.289337158203125, 56.295326232910156, 29.430084228515625, 29.373178482055664, 31.889419555664062, 33.70182800292969, -49.5140380859375, 18.794607162475586, 55.67731475830078, 56.83479309082031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000231.npy"}
{"epoch": 0.3492063492063492, "step": 232, "batch_size": 64, "mean": 15.489090919494629, "std": 27.67613983154297, "min": -51.80242919921875, "p10": -12.257874298095702, "median": 10.260644912719727, "p90": 52.99346427917481, "max": 81.00030517578125, "pos_frac": 0.75, "sample": [0.5528373718261719, 52.3746223449707, 39.10834503173828, 60.093528747558594, 5.282251358032227, 50.876434326171875, 31.50090789794922, 43.503700256347656, -1.0472145080566406, -19.32245635986328, -17.758129119873047, 5.5887908935546875, -2.9389495849609375, 10.60147476196289, 17.749080657958984, 49.76943588256836, 1.0534801483154297, 2.0523853302001953, 22.97607421875, 1.9703788757324219, 0.16126251220703125, 30.82391357421875, 24.9222412109375, 24.727294921875, 7.0716552734375, 31.24408721923828, -6.425691604614258, 59.688499450683594, -8.68295669555664, 12.708869934082031, 19.766393661499023, 50.34739685058594, -10.470979690551758, -2.202482223510742, 11.89599609375, 1.6882171630859375, 0.9762496948242188, -42.58006286621094, 63.57264709472656, 13.390167236328125, 1.2925834655761719, 55.398162841796875, -10.147346496582031, 74.37454223632812, 53.25868225097656, 81.00030517578125, -12.448539733886719, -11.81298828125, -4.953765869140625, 9.467166900634766, 21.416847229003906, 47.51323699951172, 31.978927612304688, 9.731666564941406, 2.9835853576660156, -37.74163818359375, 0.3500804901123047, 52.33330535888672, 10.683467864990234, 24.153430938720703, -17.737319946289062, -51.80242919921875, 9.919815063476562, 15.480348587036133], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000232.npy"}
{"epoch": 0.3507180650037793, "step": 233, "batch_size": 64, "mean": 12.59751033782959, "std": 24.70276641845703, "min": -44.482574462890625, "p10": -13.120269393920898, "median": 6.365699768066406, "p90": 46.36572189331055, "max": 64.69493103027344, "pos_frac": 0.6875, "sample": [63.00731658935547, 11.90777587890625, 46.55536651611328, -17.564208984375, 6.39349365234375, 19.931257247924805, 21.514404296875, 18.694477081298828, -2.805408477783203, -6.568336486816406, 6.3379058837890625, 64.69493103027344, -3.684619903564453, -0.30255889892578125, -1.5490188598632812, 13.363815307617188, 42.31576919555664, 3.146272659301758, -2.4908065795898438, 0.36629486083984375, 48.017120361328125, 5.683204650878906, 39.85272216796875, 45.9232177734375, -13.089569091796875, 20.49060821533203, -8.227378845214844, 22.48883056640625, -27.636016845703125, -25.952285766601562, 43.38416290283203, 7.339231491088867, 48.10472869873047, 20.805503845214844, 41.17286682128906, -44.482574462890625, 11.487197875976562, 1.873941421508789, -2.515594482421875, 0.5589351654052734, 5.4179840087890625, -33.006813049316406, 28.661964416503906, 0.2951202392578125, 17.23912811279297, -8.933296203613281, -6.095878601074219, 21.070663452148438, 43.178619384765625, 36.726226806640625, -13.133426666259766, -11.132455825805664, 1.5913314819335938, 64.62479400634766, 27.27276611328125, 62.20562744140625, 3.693073272705078, -5.546855926513672, 43.48594665527344, 1.8304290771484375, 12.27313232421875, 0.11273956298828125, -21.56654167175293, 17.433425903320312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000233.npy"}
{"epoch": 0.35222978080120937, "step": 234, "batch_size": 64, "mean": 12.892807960510254, "std": 27.373390197753906, "min": -40.908103942871094, "p10": -18.267544555664063, "median": 4.354333877563477, "p90": 53.05240936279298, "max": 62.63526916503906, "pos_frac": 0.65625, "sample": [6.000511169433594, -38.22892761230469, 4.3267364501953125, -10.75909423828125, 27.38520050048828, -2.027202606201172, 2.253080368041992, -12.114143371582031, 37.0367431640625, -15.409599304199219, 13.510948181152344, 24.458545684814453, 55.28496551513672, 17.346113204956055, 55.40961456298828, 38.621089935302734, -19.28668212890625, -31.66586685180664, 44.156707763671875, -2.891836166381836, 3.5005359649658203, 2.0291099548339844, 50.71098327636719, 62.63526916503906, -0.9583587646484375, -6.099822998046875, 49.48521041870117, -1.295297622680664, -18.447715759277344, 8.02255630493164, 47.21138000488281, 58.6336784362793, 54.055877685546875, 47.04848098754883, -2.1453723907470703, -40.908103942871094, -1.121734619140625, 4.231414794921875, 3.1141014099121094, -14.46285629272461, 39.46903991699219, 40.627769470214844, 4.208148956298828, 26.992088317871094, 47.12443542480469, 3.4417781829833984, 3.274372100830078, 19.773483276367188, 54.96058654785156, -7.642389297485352, -14.445724487304688, 16.797964096069336, -32.514312744140625, 24.78024673461914, -39.549842834472656, 0.12817001342773438, 34.067138671875, 10.026222229003906, -4.914756774902344, 4.381931304931641, 14.831298828125, 59.269630432128906, 39.25335693359375, -17.847145080566406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000234.npy"}
{"epoch": 0.35374149659863946, "step": 235, "batch_size": 64, "mean": 21.488697052001953, "std": 24.788860321044922, "min": -40.834815979003906, "p10": -2.701533508300781, "median": 19.439388275146484, "p90": 53.797717666625985, "max": 68.72542572021484, "pos_frac": 0.84375, "sample": [1.9441070556640625, 18.71137237548828, 20.167404174804688, 28.81822395324707, 36.58302307128906, 15.599750518798828, 10.250335693359375, 68.72542572021484, 65.440185546875, 38.53071594238281, 24.803955078125, 45.651344299316406, 15.83673095703125, 28.754343032836914, 51.19075012207031, 43.717018127441406, 20.89182472229004, 4.807895660400391, 41.95032501220703, 58.33131790161133, 3.3878250122070312, 8.798500061035156, 15.26690673828125, 51.92646789550781, 50.654808044433594, 35.632469177246094, -2.757190704345703, -3.8501625061035156, 52.087886810302734, -39.60246658325195, 31.103065490722656, 2.2024154663085938, 6.553020477294922, 18.479034423828125, 54.56459426879883, 9.289634704589844, 22.946765899658203, 4.2704620361328125, 25.335861206054688, -35.317359924316406, -4.381437301635742, 43.60797119140625, 17.237030029296875, 7.403419494628906, 38.64019775390625, 5.796070098876953, -1.2768173217773438, 49.64437484741211, 65.65451049804688, 5.4358367919921875, 8.189964294433594, -14.15875244140625, 58.660247802734375, 25.478424072265625, 1.1234779357910156, -40.834815979003906, 29.00379180908203, 39.62162780761719, 54.53050231933594, 3.5283546447753906, 33.516700744628906, -2.571666717529297, 2.12249755859375, -2.3735008239746094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000235.npy"}
{"epoch": 0.35525321239606955, "step": 236, "batch_size": 64, "mean": 16.652339935302734, "std": 28.860448837280273, "min": -65.86691284179688, "p10": -8.960432815551759, "median": 8.941462516784668, "p90": 56.22747459411621, "max": 72.46031188964844, "pos_frac": 0.71875, "sample": [48.4740104675293, -4.05419921875, 3.626251220703125, 40.01176071166992, 9.021787643432617, 2.24920654296875, 48.335914611816406, 16.858001708984375, -6.408149719238281, 18.350284576416016, 33.55744171142578, 5.68853759765625, -4.236104965209961, 4.5311126708984375, -6.596208572387695, 3.2259445190429688, 41.749595642089844, 51.88426208496094, 21.401260375976562, 56.33216857910156, 2.3660526275634766, -2.8673934936523438, 0.05902099609375, -1.1206779479980469, 13.082923889160156, 8.861137390136719, 34.71176528930664, 46.99053955078125, 59.34323501586914, 36.739280700683594, 54.62590789794922, -2.8262252807617188, 5.0543060302734375, 12.973949432373047, 39.84779357910156, -19.222259521484375, 3.3853759765625, 72.46031188964844, 0.6013069152832031, 11.103178024291992, -38.952484130859375, 14.36379623413086, 51.964820861816406, 55.98318862915039, 39.892845153808594, 27.552541732788086, -1.1767578125, -8.945289611816406, 60.003360748291016, 2.6283226013183594, -0.9248847961425781, -65.86691284179688, -59.05461120605469, 61.38254165649414, 60.1517333984375, 54.577674865722656, 9.23394775390625, -10.893133163452148, -9.335273742675781, -8.966922760009766, 57.450870513916016, -1.6245956420898438, 7.563924789428711, 8.568679809570312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000236.npy"}
{"epoch": 0.35676492819349964, "step": 237, "batch_size": 64, "mean": 19.405229568481445, "std": 29.409412384033203, "min": -59.15248107910156, "p10": -19.032652282714842, "median": 15.69427490234375, "p90": 56.483409118652354, "max": 76.53666687011719, "pos_frac": 0.78125, "sample": [49.26554870605469, 51.77655792236328, 0.5377197265625, 62.08281707763672, 1.2695465087890625, -16.811752319335938, 9.75213623046875, 16.33313751220703, 8.824748992919922, 46.69965362548828, 22.312957763671875, 25.184326171875, -12.981430053710938, -0.7158355712890625, 46.13694381713867, 10.937095642089844, -1.6439208984375, 32.596595764160156, -30.107269287109375, 53.81700897216797, -59.15248107910156, 44.220829010009766, 16.976699829101562, 46.001426696777344, 71.08247375488281, -3.5089473724365234, 42.23602294921875, 47.391807556152344, -19.984466552734375, 19.69983673095703, 51.359947204589844, 25.694419860839844, 7.715446472167969, -42.7203369140625, 34.53297424316406, 2.933086395263672, 5.15667724609375, -40.428993225097656, 65.30902099609375, 6.099191665649414, 24.738723754882812, 15.055412292480469, 11.702537536621094, 13.831649780273438, 6.367408752441406, 0.018953323364257812, 33.783546447753906, 47.7027587890625, 24.1085205078125, 57.62615203857422, 51.88916778564453, 6.9194793701171875, 62.237083435058594, 76.53666687011719, -21.211456298828125, 6.025127410888672, -2.2464065551757812, 52.72442626953125, 5.4485015869140625, 24.572601318359375, -3.1660194396972656, 11.639991760253906, -20.394561767578125, 60.143218994140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000237.npy"}
{"epoch": 0.35827664399092973, "step": 238, "batch_size": 64, "mean": 19.425922393798828, "std": 28.365190505981445, "min": -50.60237121582031, "p10": -16.600362586975095, "median": 16.15370273590088, "p90": 57.44048004150391, "max": 68.32926940917969, "pos_frac": 0.734375, "sample": [-20.530696868896484, 7.129341125488281, 56.336055755615234, -13.941947937011719, 58.89736557006836, 65.50084686279297, 0.9751091003417969, 9.79632568359375, 58.29528045654297, -17.739683151245117, 13.108856201171875, -9.51617431640625, 57.793296813964844, 17.218555450439453, 40.198333740234375, 27.75045394897461, -1.6457805633544922, -11.261594772338867, 26.981094360351562, 62.9599609375, 43.83262634277344, 36.6212158203125, 47.58995819091797, 38.13077163696289, 12.725786209106445, 56.61724090576172, 21.890350341796875, -10.276298522949219, -25.190898895263672, 43.20209503173828, 53.07398986816406, 40.82141876220703, 4.392423629760742, 55.40251159667969, -0.11238861083984375, 9.030500411987305, 27.22662353515625, -3.558382034301758, 43.330291748046875, 46.68006896972656, 3.7036094665527344, -11.88189697265625, 1.0353813171386719, 11.89263916015625, 33.707427978515625, 40.21290588378906, 68.32926940917969, 25.238128662109375, 15.088850021362305, 31.910003662109375, 5.0535888671875, 38.415306091308594, 54.13087463378906, -17.812124252319336, -9.551698684692383, 0.9160499572753906, -50.60237121582031, 5.67767333984375, 66.05439758300781, -5.7990264892578125, 18.7603759765625, -24.941390991210938, -28.580127716064453, 2.566324234008789], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000238.npy"}
{"epoch": 0.35978835978835977, "step": 239, "batch_size": 64, "mean": 19.522605895996094, "std": 30.347740173339844, "min": -50.420387268066406, "p10": -16.465236854553222, "median": 14.63595962524414, "p90": 63.70700721740724, "max": 81.00990295410156, "pos_frac": 0.734375, "sample": [11.881622314453125, 9.846923828125, 34.43077087402344, -0.6101264953613281, 58.66612243652344, 10.778335571289062, -7.612270355224609, -0.9624404907226562, -8.484039306640625, 14.057914733886719, -13.410736083984375, 31.392333984375, -1.2482032775878906, 66.77507019042969, 51.98350524902344, 2.9769439697265625, -16.958084106445312, -26.836078643798828, 3.86737060546875, 57.63226318359375, 3.4422454833984375, 12.601127624511719, 49.74897766113281, 66.33309173583984, 65.24012756347656, 14.722503662109375, 35.323585510253906, 43.02309036254883, 26.09552001953125, 9.2227783203125, 60.12972640991211, 27.619388580322266, -30.13666534423828, -15.31525993347168, 16.98349380493164, 49.94488525390625, -1.5855884552001953, 8.640066146850586, -18.98785400390625, 30.771392822265625, -1.8653907775878906, 44.19306945800781, 14.549415588378906, 6.682220458984375, -45.679039001464844, 37.113807678222656, 71.4355239868164, 75.44442749023438, 81.00990295410156, -35.31719207763672, 0.23622894287109375, 7.4826202392578125, -50.420387268066406, 54.455360412597656, 21.34320068359375, 44.83270263671875, 66.21417236328125, -2.9369850158691406, 20.831924438476562, 40.53216552734375, 2.6425552368164062, 21.178993225097656, 28.439979553222656, 15.063713073730469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000239.npy"}
{"epoch": 0.36130007558578986, "step": 240, "batch_size": 64, "mean": 25.508594512939453, "std": 30.002666473388672, "min": -59.675148010253906, "p10": -6.020940780639648, "median": 22.2509708404541, "p90": 66.99662017822266, "max": 79.94664001464844, "pos_frac": 0.8125, "sample": [36.43009567260742, -6.1844482421875, 55.568603515625, 1.560821533203125, 43.65679931640625, 76.92713928222656, 21.892303466796875, 34.679115295410156, -23.04475975036621, -19.790191650390625, 5.277587890625, -3.3584651947021484, 47.65078353881836, 26.179548263549805, 43.48482131958008, 69.33810424804688, 48.743255615234375, -13.676078796386719, 64.7113265991211, 27.770973205566406, -5.639423370361328, 48.16584014892578, 22.609638214111328, -24.68286895751953, 20.23614501953125, 45.60663604736328, 12.592681884765625, 58.68608093261719, 78.23548889160156, 19.73621368408203, 56.372093200683594, 79.94664001464844, 5.642669677734375, 1.393341064453125, 1.6794891357421875, 52.35570526123047, 21.38085174560547, 50.820404052734375, 62.492889404296875, 1.6370105743408203, 50.185791015625, 5.145923614501953, -6.546806335449219, 30.627967834472656, 70.94336700439453, -59.675148010253906, -1.1622867584228516, 0.23307037353515625, 26.69087791442871, 4.230224609375, 20.518939971923828, 65.61068725585938, -0.10762786865234375, 23.275421142578125, 1.0545730590820312, 10.421646118164062, 67.59059143066406, 68.5133056640625, 33.71189880371094, 7.98851203918457, 54.071937561035156, 2.8033676147460938, -3.0132827758789062, 12.352279663085938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000240.npy"}
{"epoch": 0.36281179138321995, "step": 241, "batch_size": 64, "mean": 17.190608978271484, "std": 30.0471134185791, "min": -59.864952087402344, "p10": -10.802209472656248, "median": 14.501585006713867, "p90": 56.8606315612793, "max": 82.9854507446289, "pos_frac": 0.65625, "sample": [-9.618759155273438, -8.556846618652344, 46.70182800292969, -27.938867568969727, 32.500511169433594, -9.337997436523438, 22.225929260253906, 43.38756561279297, 45.0311279296875, 39.90473937988281, -5.9730682373046875, 44.47034454345703, -11.309402465820312, 13.678537368774414, 5.303657531738281, 68.1744613647461, 1.7347183227539062, 3.041818618774414, -1.4788589477539062, 49.39167022705078, 32.08598327636719, 18.59417724609375, -4.6038970947265625, -1.1906967163085938, 37.601661682128906, -3.592496871948242, 56.32296371459961, -0.40985679626464844, -5.0092620849609375, -16.93364715576172, 64.61601257324219, 79.23524475097656, 73.77012634277344, 26.465011596679688, -2.0291519165039062, -40.097686767578125, -4.350700378417969, 21.958953857421875, 15.9124755859375, 1.9459648132324219, -59.864952087402344, 2.621734619140625, 0.037036895751953125, 55.62892150878906, 15.32463264465332, 18.908287048339844, -28.828628540039062, 82.9854507446289, 25.61962890625, -32.87641143798828, 57.55950927734375, 37.44349670410156, 23.276779174804688, 19.779388427734375, 57.091060638427734, -6.083324432373047, 5.324493408203125, 51.062583923339844, 44.395469665527344, 36.043006896972656, 3.0333709716796875, 8.130990982055664, -5.261407852172852, -2.776418685913086], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000241.npy"}
{"epoch": 0.36432350718065004, "step": 242, "batch_size": 64, "mean": 23.670665740966797, "std": 32.59656524658203, "min": -63.970069885253906, "p10": -4.331365013122558, "median": 17.406953811645508, "p90": 70.90330123901367, "max": 100.18338012695312, "pos_frac": 0.8125, "sample": [-46.109642028808594, 27.712364196777344, 16.635108947753906, 5.239437103271484, 33.33806610107422, -16.296703338623047, 71.15425872802734, 44.169044494628906, 15.222198486328125, 33.008033752441406, 48.11334991455078, 28.83193588256836, -0.1129913330078125, 85.50397491455078, 65.94599914550781, -6.3113555908203125, 73.88683319091797, -3.134246826171875, 5.366203308105469, 17.455081939697266, 12.113990783691406, 25.821983337402344, 70.31773376464844, 3.834169387817383, 1.7138595581054688, 17.35882568359375, 21.0609130859375, 2.560882568359375, 54.268409729003906, 28.03961181640625, 47.48680877685547, 1.8869743347167969, 81.24638366699219, 10.534412384033203, 51.111454010009766, 11.728935241699219, 0.23244094848632812, 1.9319725036621094, 100.18338012695312, 11.073101043701172, -22.783340454101562, 29.685386657714844, 31.987136840820312, 2.8099632263183594, 25.849143981933594, 1.1551589965820312, 42.174251556396484, 8.117546081542969, 62.10108947753906, 62.31490707397461, 74.65829467773438, 35.111549377441406, 84.70169067382812, -4.453428268432617, -0.18152999877929688, 51.45838165283203, 50.21806335449219, -0.3929004669189453, 6.025184631347656, -63.970069885253906, 4.086448669433594, -4.046550750732422, -39.677711486816406, 23.850793838500977], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000242.npy"}
{"epoch": 0.36583522297808013, "step": 243, "batch_size": 64, "mean": 20.903339385986328, "std": 31.16473960876465, "min": -57.22752380371094, "p10": -15.686209869384765, "median": 18.971603393554688, "p90": 65.15886688232422, "max": 76.12770080566406, "pos_frac": 0.734375, "sample": [2.738128662109375, 48.87615203857422, 29.572471618652344, -21.752914428710938, 56.257537841796875, 4.115106582641602, -57.22752380371094, 6.009086608886719, 56.859703063964844, -1.719085693359375, 45.50952911376953, 20.422134399414062, -3.346832275390625, 14.666885375976562, 13.310043334960938, 73.26290130615234, -0.5963306427001953, -1.3204574584960938, 66.74906921386719, -5.337432861328125, 1.7432098388671875, 60.96943664550781, 24.968048095703125, -3.3289947509765625, 40.94439697265625, 32.0648307800293, 3.7310867309570312, 44.230186462402344, 65.28376770019531, 31.344566345214844, -35.97822570800781, -32.66837692260742, 36.32470703125, -9.208616256713867, 61.27368927001953, 29.310094833374023, 43.66070556640625, 64.867431640625, -15.706573486328125, 17.521072387695312, 74.12939453125, 39.68523406982422, 35.931396484375, 76.12770080566406, 1.918975830078125, -0.14434051513671875, 2.5517120361328125, 5.971044540405273, 65.34701538085938, 60.44898223876953, 2.0359649658203125, 33.55744171142578, 4.752677917480469, 47.427696228027344, -31.297161102294922, 22.74273681640625, 30.942039489746094, 23.47171401977539, -15.638694763183594, -11.965049743652344, -21.90251922607422, 71.5055160522461, 5.4873046875, 6.33221435546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000243.npy"}
{"epoch": 0.3673469387755102, "step": 244, "batch_size": 64, "mean": 27.367176055908203, "std": 30.483474731445312, "min": -42.7489013671875, "p10": -1.9516504287719711, "median": 21.70119285583496, "p90": 70.67704238891602, "max": 80.76724243164062, "pos_frac": 0.875, "sample": [9.553184509277344, 17.29192352294922, 14.837051391601562, 25.90570068359375, -21.288726806640625, 21.16959571838379, -2.6066341400146484, 70.49373626708984, 25.385805130004883, -22.384361267089844, 21.930667877197266, 59.81401443481445, 0.3403778076171875, 65.57075500488281, 33.048919677734375, 4.697906494140625, 53.246910095214844, 72.17125701904297, -39.17626953125, 66.7574691772461, 21.03094482421875, 12.862621307373047, -42.7489013671875, 50.6412467956543, 10.254707336425781, 63.762977600097656, 34.906333923339844, 53.188995361328125, 70.65691375732422, 10.726037979125977, 78.63825988769531, 9.281181335449219, 2.4476661682128906, -9.872261047363281, 0.86114501953125, 14.462701797485352, -0.4233551025390625, 52.376243591308594, 56.12492370605469, 80.76724243164062, 1.5807781219482422, 55.281036376953125, 0.9064846038818359, 14.135692596435547, 2.438262939453125, 74.97039794921875, 21.471717834472656, 50.8619384765625, 3.7518081665039062, 23.018970489501953, 2.0377960205078125, 10.930702209472656, 70.20500183105469, 25.429351806640625, 75.69886779785156, 34.47383117675781, -3.3321304321289062, 26.564010620117188, 3.525188446044922, 4.2632904052734375, 34.91642761230469, 26.930452346801758, 74.04882049560547, 70.6856689453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000244.npy"}
{"epoch": 0.3688586545729403, "step": 245, "batch_size": 64, "mean": 31.766828536987305, "std": 32.20853042602539, "min": -44.0284423828125, "p10": -2.6400360107421874, "median": 32.047767639160156, "p90": 75.04070358276367, "max": 87.30037689208984, "pos_frac": 0.84375, "sample": [37.106895446777344, 44.676361083984375, 39.59769058227539, 76.35719299316406, 13.654088973999023, 7.079639434814453, 2.0080223083496094, 64.18022155761719, 62.89502716064453, 38.60625457763672, 1.8696365356445312, 15.8243408203125, 71.93605041503906, 46.44632339477539, 35.43682098388672, 68.47418212890625, -2.742584228515625, 64.92265319824219, -27.145431518554688, 87.30037689208984, 83.75884246826172, 2.3329544067382812, -0.958740234375, 30.617143630981445, 65.56808471679688, 16.353607177734375, 21.70263671875, -2.4007568359375, 67.66889953613281, 1.0365543365478516, 55.49481201171875, -37.740699768066406, 1.2439804077148438, 47.24595642089844, 6.318336486816406, 31.529312133789062, -8.980209350585938, 24.033761978149414, 57.766544342041016, 81.62850189208984, 33.251930236816406, 1.344635009765625, 77.00562286376953, 11.926383972167969, -3.982931137084961, 36.651878356933594, 73.70842742919922, 74.30411529541016, 0.2142486572265625, -1.190185546875, 32.56622314453125, 63.066341400146484, -7.788909912109375, 54.72567367553711, 81.80133056640625, 28.467510223388672, -44.0284423828125, 75.35638427734375, 28.219547271728516, 6.825399398803711, 5.3026580810546875, 50.306732177734375, 20.541576385498047, 41.77762985229492], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000245.npy"}
{"epoch": 0.37037037037037035, "step": 246, "batch_size": 64, "mean": 26.117189407348633, "std": 39.82099151611328, "min": -70.72055053710938, "p10": -14.339141845703123, "median": 19.309463500976562, "p90": 74.69947052001955, "max": 107.52532958984375, "pos_frac": 0.75, "sample": [64.19882202148438, 70.4178695678711, 71.33145904541016, 70.54421997070312, 14.49139404296875, 4.3093414306640625, -58.72249984741211, 75.77711486816406, 24.172073364257812, -9.629953384399414, 56.53410339355469, 64.77479553222656, 32.50428771972656, -3.956249237060547, 49.816253662109375, 52.1807975769043, 11.276702880859375, -4.555248260498047, -4.499168395996094, -2.7248382568359375, 6.89556884765625, 48.601295471191406, -0.05762481689453125, 75.65296936035156, 88.78800964355469, -21.208786010742188, 25.902374267578125, 107.52532958984375, -14.842681884765625, 5.3403472900390625, 12.779901504516602, 71.7278823852539, 11.864593505859375, 5.123212814331055, -39.16241455078125, 30.385894775390625, -62.450538635253906, 9.902368545532227, 78.93694305419922, 0.77532958984375, 45.682403564453125, -39.19536590576172, -8.809173583984375, 83.19348907470703, 23.57589340209961, 3.968719482421875, 55.82268524169922, 69.04020690917969, 61.93605041503906, -70.72055053710938, 2.05523681640625, 22.986862182617188, 57.35552215576172, 89.8701400756836, 12.338001251220703, 2.2751617431640625, 69.3675765991211, 72.47463989257812, -1.0569610595703125, -13.164215087890625, 34.7938232421875, 15.632064819335938, 57.13232421875, 4.2244110107421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000246.npy"}
{"epoch": 0.37188208616780044, "step": 247, "batch_size": 64, "mean": 14.344557762145996, "std": 34.92690658569336, "min": -75.41314697265625, "p10": -26.625650024414053, "median": 9.93946647644043, "p90": 61.626440429687506, "max": 102.09780883789062, "pos_frac": 0.671875, "sample": [-41.5008544921875, 22.132457733154297, 42.73020935058594, 4.059989929199219, 2.529163360595703, -7.5086212158203125, 4.1729736328125, 76.62222290039062, 79.23677062988281, -5.259613037109375, 26.648452758789062, 33.46049499511719, 57.634239196777344, 25.818368911743164, 9.197586059570312, 37.020782470703125, 60.27685546875, 24.623371124267578, -4.568780899047852, -33.801666259765625, 102.09780883789062, 72.04109191894531, -55.656890869140625, -5.630483627319336, 5.513153076171875, -5.249755859375, 44.29698944091797, -3.8718795776367188, -30.89849853515625, 41.89496612548828, 13.324264526367188, 6.337821960449219, 11.652557373046875, 3.377166748046875, -1.0, 7.61651611328125, 48.212852478027344, -75.41314697265625, -14.274978637695312, 0.36760902404785156, -0.9560203552246094, 10.681346893310547, 0.9053955078125, 63.93048095703125, 67.94113159179688, 8.835498809814453, -0.006443023681640625, -16.655670166015625, 11.831157684326172, 35.27833557128906, 14.796310424804688, 40.74250793457031, -15.0701904296875, 11.719362258911133, -61.20359802246094, 39.94945526123047, 50.13676452636719, 62.204833984375, -7.03997802734375, -50.079315185546875, -6.40765380859375, 51.26202392578125, 15.517906188964844, 11.476490020751953], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000247.npy"}
{"epoch": 0.37339380196523053, "step": 248, "batch_size": 64, "mean": 27.606952667236328, "std": 41.12266159057617, "min": -50.68041229248047, "p10": -20.5143440246582, "median": 16.081220626831055, "p90": 82.1442886352539, "max": 100.9570083618164, "pos_frac": 0.71875, "sample": [4.733543395996094, 68.70991516113281, -33.222869873046875, -40.226715087890625, 90.93540954589844, 74.50230407714844, 63.87858581542969, 63.47865295410156, -1.215921401977539, -19.062803268432617, 0.9349441528320312, 82.22926330566406, 16.299713134765625, -41.30145263671875, 75.97598266601562, 39.71167755126953, 86.3505859375, -13.661521911621094, -49.275482177734375, 14.821186065673828, -25.603187561035156, 5.529327392578125, 27.376731872558594, 13.2822265625, 76.08123779296875, 64.55899047851562, 71.67172241210938, 11.452316284179688, 49.6230583190918, 81.607177734375, -19.28594207763672, 70.25996398925781, -2.058940887451172, -15.14401626586914, 81.94601440429688, 78.18525695800781, 9.791610717773438, 21.24773406982422, 26.204343795776367, 15.532821655273438, 50.644508361816406, 66.4288101196289, 15.862728118896484, 93.27952575683594, 89.7835693359375, 35.2164192199707, 3.6210384368896484, -0.7994575500488281, 4.695430755615234, 43.732513427734375, 100.9570083618164, 8.721622467041016, 37.70677185058594, 18.028846740722656, -50.68041229248047, 5.119508743286133, -12.08154296875, 85.10795593261719, 72.63823699951172, -1.66973876953125, -8.066633224487305, 8.387073516845703, -21.040802001953125, -5.60150146484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000248.npy"}
{"epoch": 0.3749055177626606, "step": 249, "batch_size": 64, "mean": 10.874479293823242, "std": 38.560733795166016, "min": -95.79740905761719, "p10": -39.366639328002925, "median": 11.944854736328125, "p90": 63.7209342956543, "max": 79.53134155273438, "pos_frac": 0.703125, "sample": [14.554740905761719, -35.08974838256836, 5.3619232177734375, 47.48907470703125, 12.192264556884766, -21.657363891601562, -51.624351501464844, 36.312103271484375, 15.913482666015625, 35.816436767578125, 32.084861755371094, 61.98514938354492, 3.1075973510742188, 64.07645416259766, -72.41526794433594, -14.462600708007812, 22.20550537109375, -46.040077209472656, -50.724090576171875, -32.818687438964844, 3.849222183227539, 0.099761962890625, -28.850990295410156, 6.195539474487305, 33.118202209472656, 11.715080261230469, -0.34532737731933594, -11.794700622558594, 74.62075805664062, -15.44866943359375, -24.96674346923828, 71.30117797851562, 11.250076293945312, 29.97417640686035, 42.167301177978516, -49.33880615234375, 19.57965850830078, 4.4640960693359375, 12.174629211425781, -26.7274169921875, 5.367898941040039, 79.53134155273438, 59.407745361328125, 51.83165740966797, 12.332321166992188, 7.426372528076172, -41.19959259033203, 31.758647918701172, 18.92339324951172, 12.250267028808594, 32.96574401855469, 74.03192138671875, 48.085662841796875, 1.9947128295898438, 15.336980819702148, 13.563430786132812, -19.01426124572754, 62.891387939453125, 78.88883209228516, -95.79740905761719, 77.75187683105469, -28.41370964050293, 2.654205322265625, 4.092864990234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000249.npy"}
{"epoch": 0.3764172335600907, "step": 250, "batch_size": 64, "mean": 27.471759796142578, "std": 38.12800979614258, "min": -96.00521850585938, "p10": -5.692404556274414, "median": 23.56277370452881, "p90": 78.69868545532228, "max": 95.20549774169922, "pos_frac": 0.796875, "sample": [51.1097412109375, 13.009885787963867, -6.754737854003906, 11.980974197387695, 20.888916015625, 31.122528076171875, 41.362388610839844, 89.86329650878906, 80.21412658691406, 4.33416748046875, 62.813777923583984, -35.68147277832031, 43.128662109375, 10.54412841796875, 52.586631774902344, 0.19562149047851562, -0.165496826171875, -96.00521850585938, 62.834693908691406, 3.9714584350585938, 33.39239501953125, -4.579168319702148, 28.827896118164062, -44.365447998046875, 67.04143524169922, -5.451995849609375, 85.17051696777344, 38.60620880126953, 2.0824127197265625, 34.96369934082031, 10.430599212646484, 27.367630004882812, 76.99539947509766, 3.947826385498047, 33.75102233886719, 24.50415802001953, 0.18630218505859375, 72.26985168457031, 95.20549774169922, 16.773513793945312, -53.52558135986328, -1.6736488342285156, 32.29849624633789, 2.164764404296875, 91.57582092285156, 58.86056900024414, 45.44293212890625, 59.3908576965332, 79.42866516113281, 67.15615844726562, -4.8623809814453125, -13.084098815917969, 57.188323974609375, 4.312021255493164, 4.609954833984375, 8.011367797851562, 92.87619018554688, 70.6209716796875, 13.160099029541016, -0.3267250061035156, -5.795436859130859, 76.13960266113281, 22.621389389038086, 13.12847900390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000250.npy"}
{"epoch": 0.3779289493575208, "step": 251, "batch_size": 64, "mean": 17.87444496154785, "std": 39.42509841918945, "min": -78.31022644042969, "p10": -31.104537963867188, "median": 14.916162490844727, "p90": 67.91232757568362, "max": 98.66606903076172, "pos_frac": 0.703125, "sample": [1.7161273956298828, 89.33130645751953, 90.20511627197266, -0.14834976196289062, 42.63475036621094, 51.607269287109375, -78.31022644042969, 22.833148956298828, 29.63268280029297, 98.66606903076172, 3.5010757446289062, -68.17377471923828, 5.5182342529296875, -28.72681427001953, 8.914976119995117, -21.913387298583984, 46.83054733276367, 93.53463745117188, 11.434514999389648, -60.66221618652344, -30.391624450683594, 87.5988540649414, -2.2305660247802734, -36.5687255859375, 30.36663818359375, -15.412582397460938, 14.700607299804688, 60.735565185546875, -13.439308166503906, 0.6563568115234375, 74.81812286376953, 60.574310302734375, 15.131717681884766, 24.80600929260254, 42.59595489501953, 27.544029235839844, 38.08837890625, 37.23396301269531, -50.24552917480469, -0.9326152801513672, 52.02359390258789, 2.891143798828125, 57.900856018066406, 13.656627655029297, 11.278091430664062, 33.787384033203125, 13.174041748046875, 12.080345153808594, 22.609176635742188, 36.809234619140625, 70.98808288574219, 3.4530487060546875, 53.35074234008789, -26.228126525878906, -4.147224426269531, -1.3946819305419922, 44.13786697387695, -34.392974853515625, 26.88936424255371, -31.410072326660156, -21.26513671875, 29.796829223632812, 56.60633850097656, 17.31467628479004], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000251.npy"}
{"epoch": 0.3794406651549509, "step": 252, "batch_size": 64, "mean": 16.565898895263672, "std": 42.84104919433594, "min": -93.19898986816406, "p10": -46.13313217163085, "median": 11.667562484741211, "p90": 75.32946395874023, "max": 101.07369995117188, "pos_frac": 0.734375, "sample": [31.394363403320312, -65.31033325195312, -11.730112075805664, -54.9310302734375, 43.371063232421875, -16.98627471923828, 7.089176177978516, 45.25242614746094, 0.3752937316894531, 88.67069244384766, 75.402099609375, 50.27946472167969, 11.50289535522461, 1.6265335083007812, 95.02557373046875, -13.097999572753906, 88.8908462524414, -54.98534393310547, 9.050621032714844, -0.6071319580078125, -54.08331298828125, 15.423791885375977, 28.617111206054688, 9.124095916748047, 50.14604187011719, 6.74639892578125, 11.832229614257812, -20.558883666992188, 71.326904296875, 6.307353973388672, 46.250450134277344, 98.63609313964844, 64.05184936523438, 8.771347045898438, 32.222312927246094, 5.0311737060546875, 16.512298583984375, 16.16242790222168, 29.007165908813477, -32.32168960571289, 43.79129409790039, -22.049644470214844, 1.163665771484375, 61.268672943115234, 2.4609222412109375, 101.07369995117188, -32.46282196044922, 5.084327697753906, -93.19898986816406, -59.79728317260742, -14.852300643920898, 6.8029022216796875, 62.67375946044922, -51.99183654785156, 6.561103820800781, 75.75646209716797, 26.994308471679688, 75.15998077392578, 18.716201782226562, 46.30266571044922, 44.901611328125, 19.462966918945312, -15.897642135620117, 12.805557250976562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000252.npy"}
{"epoch": 0.38095238095238093, "step": 253, "batch_size": 64, "mean": 19.39974594116211, "std": 37.86435317993164, "min": -63.951290130615234, "p10": -29.267375755310056, "median": 14.020355224609375, "p90": 73.53634185791016, "max": 96.2530288696289, "pos_frac": 0.6875, "sample": [52.81012725830078, 1.4933624267578125, 89.08427429199219, -2.6926956176757812, -1.9097728729248047, 31.267993927001953, 19.308189392089844, 66.10861206054688, 54.48223876953125, 21.063859939575195, 50.166839599609375, -6.746665954589844, 62.0198974609375, 1.83416748046875, -2.788389205932617, -63.951290130615234, -15.610931396484375, 38.78826141357422, -34.24699401855469, 48.446434020996094, 37.67698669433594, 47.24037170410156, -47.38978958129883, 45.19806671142578, 16.588489532470703, 0.554412841796875, 73.72457885742188, 14.783309936523438, -3.805267333984375, 33.600738525390625, 10.694320678710938, 6.892303466796875, 69.36924743652344, -4.387584686279297, 7.8202972412109375, 96.22164916992188, 28.328720092773438, -57.68122100830078, 8.692092895507812, 29.667268753051758, 6.597923278808594, -10.014991760253906, 16.026504516601562, -3.8580265045166016, 85.72335052490234, -27.236928939819336, -14.604520797729492, 29.677749633789062, 88.83109283447266, 22.632579803466797, 13.725265502929688, 76.35816955566406, 96.2530288696289, 5.032188415527344, 73.09712219238281, -30.178680419921875, 14.315444946289062, -27.77536964416504, 54.716407775878906, -29.90680694580078, -34.29335021972656, -4.148859024047852, 8.93206787109375, 8.96585464477539], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000253.npy"}
{"epoch": 0.382464096749811, "step": 254, "batch_size": 64, "mean": 23.414003372192383, "std": 39.18019485473633, "min": -72.78490447998047, "p10": -15.973787689208983, "median": 10.506586074829102, "p90": 81.93723220825197, "max": 95.47280883789062, "pos_frac": 0.671875, "sample": [-3.6652984619140625, 19.765274047851562, 74.68292236328125, 4.1943511962890625, -7.327667236328125, 61.913307189941406, 53.080318450927734, 76.471435546875, 4.928781509399414, 0.04548072814941406, -2.7795486450195312, 9.346019744873047, 48.41645050048828, 68.16374206542969, 64.78777313232422, 69.96131896972656, 95.47280883789062, 19.955345153808594, -16.49542999267578, 49.968406677246094, -7.194984436035156, 94.35052490234375, -10.706764221191406, 86.87447357177734, -6.856422424316406, -72.78490447998047, -12.642745971679688, 92.3299331665039, 21.997465133666992, -1.7661628723144531, 92.6592025756836, -20.753021240234375, 5.48480224609375, -2.344318389892578, -1.7598991394042969, 30.17668914794922, 3.343891143798828, 70.23310852050781, 0.6174526214599609, 47.26806640625, 11.667152404785156, -14.756622314453125, 16.051860809326172, -22.870582580566406, -12.244329452514648, 90.51181030273438, 2.80609130859375, 84.27971649169922, 25.96435546875, 16.71829605102539, 14.771743774414062, -19.488433837890625, -11.170814514160156, 76.00660705566406, 68.28451538085938, -4.982414245605469, 25.019386291503906, 3.2544517517089844, 62.87916564941406, 41.29154968261719, 6.984235763549805, -17.907638549804688, 6.803619384765625, -50.789588928222656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000254.npy"}
{"epoch": 0.3839758125472411, "step": 255, "batch_size": 64, "mean": 26.04759979248047, "std": 43.45039749145508, "min": -77.9261245727539, "p10": -19.103435516357415, "median": 18.22115135192871, "p90": 93.84352340698243, "max": 112.01153564453125, "pos_frac": 0.734375, "sample": [88.3370361328125, 21.766265869140625, 112.01153564453125, 28.060577392578125, 68.61711120605469, 44.010223388671875, -3.2878265380859375, -13.226318359375, 12.41330337524414, 70.43061065673828, -40.832435607910156, -1.1890296936035156, 20.400096893310547, 47.116676330566406, -46.05760192871094, 6.211238861083984, 43.45384979248047, 11.942329406738281, 16.042205810546875, -77.9261245727539, 90.32518005371094, -6.51910400390625, 12.544815063476562, 8.763839721679688, 95.53057861328125, 49.50489044189453, -48.27166748046875, 20.666278839111328, 98.8196029663086, -7.813720703125, 91.37714385986328, 63.20665740966797, -21.62220001220703, 42.24335861206055, 4.58087158203125, 13.614103317260742, 8.642419815063477, 41.61399841308594, 98.65937042236328, 9.426107406616211, 4.325201034545898, 41.844146728515625, 75.60989379882812, -1.7577667236328125, 24.8952579498291, -1.8585014343261719, 14.974143981933594, 104.2459716796875, -51.523353576660156, 30.548439025878906, 2.4606246948242188, 95.33236694335938, 13.018791198730469, 94.90054321289062, 36.09535217285156, 30.604461669921875, 2.4181251525878906, 52.033512115478516, 68.02456665039062, -71.84513854980469, -0.29974365234375, -1.2932662963867188, 32.13623809814453, -1.4297103881835938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000255.npy"}
{"epoch": 0.3854875283446712, "step": 256, "batch_size": 64, "mean": 25.631669998168945, "std": 45.92692947387695, "min": -96.86707305908203, "p10": -21.711415672302245, "median": 23.380462646484375, "p90": 89.12755355834962, "max": 103.8328857421875, "pos_frac": 0.6875, "sample": [-22.23180389404297, 29.422409057617188, 52.83232879638672, 38.850154876708984, 49.09626770019531, -30.49614715576172, -22.095056533813477, 93.33113098144531, 36.942222595214844, -2.270261764526367, -46.58392333984375, 2.0331859588623047, -17.73486328125, 100.53916931152344, 13.532562255859375, 100.81517791748047, 2.0300064086914062, 7.677825927734375, -96.86707305908203, 55.73572540283203, 4.105831146240234, 44.36297607421875, -13.65620231628418, 16.345603942871094, -18.02002716064453, -73.08654022216797, 37.05841064453125, -8.054801940917969, 32.95484161376953, 71.79637145996094, -3.6760711669921875, 42.07642364501953, 31.83527374267578, 87.60441589355469, 82.49659729003906, 31.39238739013672, 87.42900085449219, 80.32849884033203, -8.622940063476562, 94.65129089355469, 22.589988708496094, 39.416229248046875, 103.8328857421875, -20.816253662109375, 82.89205932617188, 24.170936584472656, 79.43293762207031, 4.5008392333984375, 21.21510124206543, 90.27740478515625, -17.312522888183594, 80.07848358154297, 4.232940673828125, 5.5673370361328125, 67.80044555664062, -20.35983657836914, 46.201446533203125, -8.326255798339844, -0.9303989410400391, 89.78032684326172, -66.89196014404297, 22.392837524414062, -16.274070739746094, 43.07567596435547], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000256.npy"}
{"epoch": 0.3869992441421013, "step": 257, "batch_size": 64, "mean": 28.95406150817871, "std": 45.756534576416016, "min": -88.98699188232422, "p10": -13.711820983886716, "median": 15.21267318725586, "p90": 95.53549270629884, "max": 129.58975219726562, "pos_frac": 0.71875, "sample": [-0.7930583953857422, 13.657600402832031, -10.618484497070312, 129.58975219726562, 103.02139282226562, 32.98955154418945, 5.068758010864258, 62.75177001953125, 29.33782196044922, 67.00341033935547, 96.882568359375, 11.244743347167969, 4.228158950805664, -6.801361083984375, 73.23681640625, 68.98035430908203, 93.05091857910156, 1.178863525390625, -26.501190185546875, -4.994132995605469, 3.725290298461914, 32.406593322753906, 30.200439453125, 12.697845458984375, 38.644203186035156, -3.1922454833984375, -2.6398658752441406, 96.15924835205078, 66.49188995361328, -1.1102371215820312, 78.57789611816406, 93.74024963378906, 12.064773559570312, 3.7928848266601562, 32.976951599121094, 12.783988952636719, -15.03753662109375, 93.1337661743164, 107.08189392089844, -61.732364654541016, -88.98699188232422, 80.55386352539062, 30.391571044921875, 30.786415100097656, 2.5820465087890625, -32.604156494140625, -1.1905441284179688, -1.8930206298828125, 44.49712371826172, 56.58222198486328, 59.81199645996094, 7.8162384033203125, 97.8814926147461, 17.226444244384766, -63.910736083984375, 96.78561401367188, -1.3956527709960938, -31.404685974121094, 94.08006286621094, 9.5262451171875, 51.95197677612305, 16.767745971679688, 11.045608520507812, -7.120765686035156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000257.npy"}
{"epoch": 0.3885109599395314, "step": 258, "batch_size": 64, "mean": 25.243789672851562, "std": 34.35588455200195, "min": -66.11866760253906, "p10": -3.3421159744262696, "median": 23.9349365234375, "p90": 70.2313026428223, "max": 107.14361572265625, "pos_frac": 0.875, "sample": [36.54313659667969, 4.210765838623047, 34.91017150878906, -3.374685287475586, 7.032295227050781, 81.23950958251953, 102.77987670898438, 24.1060791015625, 57.658287048339844, 2.1009883880615234, 12.648168563842773, 24.137191772460938, -57.6239013671875, 22.918649673461914, 40.67536163330078, 57.578704833984375, -3.2661209106445312, -10.454936981201172, 35.36553955078125, -53.032752990722656, 3.5466651916503906, 39.50733947753906, -50.92377471923828, 31.78729248046875, 9.547470092773438, 48.38213348388672, 17.074447631835938, 42.69970703125, 18.430133819580078, 15.563980102539062, 19.37908935546875, 4.016040802001953, 53.83454132080078, 60.18573760986328, 17.720247268676758, 6.8891448974609375, 6.2661895751953125, 14.603317260742188, 32.656768798828125, 107.14361572265625, 32.50346374511719, -9.026947021484375, 96.51329040527344, 42.336578369140625, 28.519058227539062, 89.51243591308594, 50.3626708984375, 46.139652252197266, 3.4903335571289062, 76.8086929321289, 3.019195556640625, 24.209182739257812, 41.40514373779297, 26.70751953125, 13.293624877929688, 8.285654067993164, 23.7637939453125, -66.11866760253906, 6.3646240234375, 2.977540969848633, 33.181884765625, 43.05775451660156, 74.53654479980469, 9.297140121459961], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000258.npy"}
{"epoch": 0.3900226757369615, "step": 259, "batch_size": 64, "mean": 38.41786193847656, "std": 45.8281364440918, "min": -65.25698852539062, "p10": -28.660602188110328, "median": 45.47842216491699, "p90": 96.60394439697266, "max": 113.09011840820312, "pos_frac": 0.796875, "sample": [-3.74713134765625, -56.88665771484375, -60.96485900878906, 77.69532775878906, 26.66546630859375, 91.14604187011719, 94.15385437011719, 88.10995483398438, 51.61192321777344, -65.25698852539062, 61.30323791503906, -1.7122039794921875, 30.83457374572754, 73.292724609375, 56.853981018066406, -4.2706451416015625, -38.31169891357422, 4.086112976074219, 104.334228515625, 25.370140075683594, 52.47074508666992, 73.81362915039062, 87.4024658203125, 95.70039367675781, 59.83277130126953, -6.141376495361328, -39.453887939453125, 29.612045288085938, 8.567352294921875, 9.296449661254883, 75.93356323242188, 62.67469787597656, -3.1953048706054688, 27.961631774902344, 62.705535888671875, 41.40437316894531, 98.80956268310547, 2.6258087158203125, 77.83224487304688, 49.32170104980469, 96.99118041992188, 47.36655807495117, 25.429824829101562, 113.09011840820312, 6.578704833984375, 5.16058349609375, -41.73143768310547, -2.758535385131836, -59.06608581542969, 1.2115097045898438, 104.44584655761719, 100.47529602050781, 71.14219665527344, 56.744384765625, 48.71046447753906, 65.28250122070312, 35.37841033935547, 33.50823974609375, 12.91598129272461, 38.369789123535156, 71.91929626464844, 100.5312271118164, 61.97510528564453, 43.59028625488281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000259.npy"}
{"epoch": 0.3915343915343915, "step": 260, "batch_size": 64, "mean": 29.661205291748047, "std": 47.596275329589844, "min": -80.23240661621094, "p10": -27.90146903991699, "median": 20.00823974609375, "p90": 94.71019287109375, "max": 103.97586822509766, "pos_frac": 0.671875, "sample": [97.71149444580078, -42.058509826660156, 71.08592224121094, -21.03689193725586, 2.0872535705566406, 101.12249755859375, 29.971893310546875, 19.928382873535156, 98.89659118652344, -42.98284912109375, 89.69158935546875, 56.984962463378906, 85.8205337524414, 52.14729309082031, 74.7112808227539, -11.103343963623047, 93.68559265136719, 51.88435363769531, 16.35761070251465, 100.47148132324219, 77.3335952758789, 11.814620971679688, 99.67071533203125, 94.53768157958984, -9.256568908691406, 19.58188247680664, 78.95616149902344, 94.78412628173828, -2.1237716674804688, -4.149871826171875, 9.360145568847656, 12.387939453125, 74.21431732177734, 20.088096618652344, -0.22643089294433594, 47.0532341003418, 18.187057495117188, -20.442726135253906, -80.23240661621094, 23.21465301513672, 1.8558235168457031, 49.46765899658203, -0.41182518005371094, -3.4188079833984375, 18.807228088378906, 51.791221618652344, 103.97586822509766, -19.900253295898438, 91.39447021484375, -16.044540405273438, -48.03253936767578, 56.859130859375, 20.720947265625, 4.9098968505859375, 70.40064239501953, 47.40393829345703, -40.558837890625, 79.89167785644531, 58.9599609375, -29.448745727539062, -0.3908042907714844, -64.38994598388672, -24.291156768798828, -1.3635177612304688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000260.npy"}
{"epoch": 0.3930461073318216, "step": 261, "batch_size": 64, "mean": 25.013710021972656, "std": 41.558292388916016, "min": -53.43719482421875, "p10": -23.13259162902832, "median": 16.971221923828125, "p90": 89.10440673828127, "max": 107.59827423095703, "pos_frac": 0.671875, "sample": [3.5273513793945312, 27.911529541015625, 35.389686584472656, 4.652425765991211, -12.529861450195312, 53.083213806152344, -0.9472827911376953, 100.43907165527344, 21.559616088867188, -13.655771255493164, -27.535446166992188, -10.157211303710938, -4.455787658691406, 95.70997619628906, 64.32427978515625, -12.485305786132812, -1.8083724975585938, 103.17985534667969, 3.484292984008789, 95.19670104980469, 10.709442138671875, 35.5888671875, -20.545955657958984, -29.655181884765625, 83.40557861328125, 40.167457580566406, 107.59827423095703, 27.609371185302734, -7.149528503417969, 30.259065628051758, 38.640323638916016, 59.8350830078125, 46.60326385498047, 90.88873291015625, 29.8591251373291, -5.538244247436523, 91.31132507324219, -53.43719482421875, -32.082611083984375, 69.3103256225586, 24.148765563964844, 4.965276718139648, 61.01902389526367, 68.98938751220703, 11.071857452392578, -51.167144775390625, 2.4894638061523438, -24.24114990234375, -51.744972229003906, 9.288776397705078, 31.4271240234375, -10.306304931640625, 68.77235412597656, 84.94097900390625, 17.703948974609375, -16.431121826171875, 42.79056930541992, 16.238494873046875, -3.290283203125, 68.91415405273438, 10.884315490722656, 81.892822265625, 15.258377075195312, -0.9976329803466797], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000261.npy"}
{"epoch": 0.3945578231292517, "step": 262, "batch_size": 64, "mean": 25.855836868286133, "std": 39.66415786743164, "min": -56.41230010986328, "p10": -30.486758422851562, "median": 20.947410583496094, "p90": 78.19427490234376, "max": 103.96745300292969, "pos_frac": 0.71875, "sample": [-36.37547302246094, 43.55513000488281, 8.938423156738281, 0.48550987243652344, -1.6834869384765625, 73.91355895996094, 7.811279296875, 5.5765228271484375, 21.287017822265625, 83.57109069824219, -5.1761322021484375, 67.67579650878906, -33.326438903808594, 7.541139602661133, 62.11929702758789, -38.126224517822266, 32.449676513671875, 87.838134765625, 31.549560546875, 45.34589767456055, -56.41230010986328, -5.112751007080078, 2.9836196899414062, 6.1011962890625, 26.372024536132812, 67.50532531738281, 9.2413330078125, 103.96745300292969, -35.590545654296875, 10.773067474365234, 79.55593872070312, 49.858543395996094, 20.607803344726562, 74.94277954101562, -4.578468322753906, 93.10884857177734, 7.501436233520508, 24.711437225341797, 98.82046508789062, -1.5821304321289062, -2.5035762786865234, -31.380889892578125, -21.938682556152344, 68.71282958984375, 73.95870971679688, 41.633087158203125, 43.91108703613281, 6.13749885559082, -28.40045166015625, 23.874370574951172, 56.658599853515625, -3.4009933471679688, 70.3967514038086, -4.941171646118164, 41.42375946044922, 30.555999755859375, 75.01705932617188, 81.46897888183594, 3.7514495849609375, 58.01756286621094, 6.853912353515625, -12.054130554199219, 71.14324188232422, -31.86675262451172], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000262.npy"}
{"epoch": 0.3960695389266818, "step": 263, "batch_size": 64, "mean": 24.195327758789062, "std": 40.73281478881836, "min": -80.85728454589844, "p10": -18.064022827148435, "median": 18.914735794067383, "p90": 74.92969665527343, "max": 100.47724914550781, "pos_frac": 0.734375, "sample": [35.473026275634766, 56.98233413696289, -40.94188690185547, -10.65496826171875, -9.251771926879883, -12.648948669433594, 71.0782470703125, 2.7869720458984375, 71.88261413574219, 73.74952697753906, 42.39874267578125, 1.3225154876708984, -2.4699783325195312, 44.53500747680664, 0.0484466552734375, -39.96082305908203, 75.75499725341797, -8.462234497070312, 12.883041381835938, -10.126518249511719, 43.849090576171875, 74.82637786865234, 98.78416442871094, 25.774524688720703, 3.437286376953125, 1.686727523803711, 77.1487045288086, 53.90782165527344, 74.39481353759766, 64.60472869873047, 97.17169189453125, 16.766326904296875, -25.537796020507812, 6.226318359375, -7.5885009765625, 33.29315948486328, -80.85728454589844, 63.562503814697266, 33.71034240722656, 73.51171875, 16.155853271484375, -5.413658142089844, -13.980728149414062, 74.9739761352539, -19.32007598876953, 5.2657318115234375, 28.243667602539062, 68.21565246582031, 45.381614685058594, 21.06314468383789, 0.0752716064453125, 100.47724914550781, 3.1892242431640625, 4.774684906005859, 26.840850830078125, -21.31138038635254, 27.28936004638672, 66.07032012939453, 11.964710235595703, -15.133232116699219, 4.8085784912109375, -76.82987213134766, 81.2135009765625, 31.435546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000263.npy"}
{"epoch": 0.3975812547241119, "step": 264, "batch_size": 64, "mean": 26.922195434570312, "std": 42.82482147216797, "min": -83.33892822265625, "p10": -24.921714782714844, "median": 22.136302947998047, "p90": 83.30488204956055, "max": 108.16180419921875, "pos_frac": 0.71875, "sample": [46.649688720703125, -32.384490966796875, 21.882728576660156, 0.6918659210205078, 1.482583999633789, -13.736764907836914, 4.6143646240234375, -0.17629241943359375, 57.28145217895508, 25.721527099609375, -35.56373596191406, 4.005598068237305, -83.33892822265625, 48.80944061279297, -21.425308227539062, 22.389877319335938, 52.933128356933594, 58.34825134277344, 13.162406921386719, 79.41419982910156, 69.29854583740234, 43.018577575683594, 83.74313354492188, 40.233978271484375, 46.69743347167969, 108.16180419921875, -1.6680660247802734, 21.810317993164062, -11.78302001953125, 39.11027526855469, 8.356658935546875, 0.7359542846679688, 73.90715789794922, 1.600006103515625, -10.624340057373047, 30.785640716552734, 94.56480407714844, -21.57508087158203, 66.57108306884766, 100.09410095214844, -25.583572387695312, -14.896476745605469, 20.472190856933594, 98.94157409667969, 54.10991668701172, 69.32170104980469, -48.62422180175781, 45.024200439453125, -34.32014465332031, 17.340713500976562, -23.37738037109375, 2.0148849487304688, 99.6327133178711, 39.48849105834961, 46.00156021118164, -33.66714859008789, 19.315025329589844, 68.08575439453125, 29.8868408203125, -1.2774467468261719, -3.8739013671875, 107.07237243652344, 75.84999084472656, 82.28229522705078], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000264.npy"}
{"epoch": 0.39909297052154197, "step": 265, "batch_size": 64, "mean": 31.338701248168945, "std": 50.79274368286133, "min": -91.19100952148438, "p10": -31.93230361938476, "median": 22.790732383728027, "p90": 99.1508918762207, "max": 109.97639465332031, "pos_frac": 0.75, "sample": [10.109649658203125, 2.649374008178711, -3.6766281127929688, -33.731719970703125, 8.493593215942383, -2.1880264282226562, 31.13579559326172, 19.173439025878906, -43.20738220214844, 98.86329650878906, 33.16908264160156, 14.883682250976562, 41.86421203613281, 60.85725402832031, -91.19100952148438, -8.819526672363281, 49.541831970214844, 1.9588127136230469, 13.953458786010742, -72.10899353027344, 19.9073486328125, 83.23284912109375, 90.54790496826172, 23.303281784057617, 5.474206924438477, -10.165843963623047, 83.12229919433594, 7.430564880371094, 35.66755676269531, 15.687280654907227, 104.32513427734375, 109.97639465332031, -27.733665466308594, 96.54136657714844, 62.172645568847656, 103.07942962646484, 2.3725852966308594, 12.534618377685547, -5.677345275878906, -21.928939819335938, 32.756690979003906, 88.88922119140625, 97.00259399414062, 100.04288482666016, 99.2741470336914, 77.0179443359375, 102.97925567626953, 44.505577087402344, 15.008277893066406, -49.27836608886719, 22.278182983398438, 40.148834228515625, -10.411174774169922, 96.25212097167969, 87.70927429199219, 92.80108642578125, 104.40789794921875, -68.17961120605469, 45.611419677734375, -6.999752044677734, 63.313133239746094, 4.809867858886719, 68.36016082763672, -64.22273254394531], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000265.npy"}
{"epoch": 0.40060468631897206, "step": 266, "batch_size": 64, "mean": 28.80366325378418, "std": 44.79357147216797, "min": -84.63312530517578, "p10": -14.632535171508787, "median": 25.24304962158203, "p90": 93.52221221923828, "max": 120.42999267578125, "pos_frac": 0.75, "sample": [-9.188301086425781, 96.70143127441406, 94.43305206298828, 4.274778366088867, 42.95049285888672, -1.0333251953125, -5.408805847167969, -52.35481262207031, 3.1956958770751953, 31.2293701171875, 36.610965728759766, 5.9723358154296875, 6.700462341308594, 55.490623474121094, 53.46869659423828, 60.18867874145508, 5.8707427978515625, 120.42999267578125, -15.516956329345703, -12.568885803222656, 101.4044189453125, 25.288742065429688, 1.2013893127441406, 104.6881103515625, 42.70500946044922, 103.53251647949219, 5.288330078125, 79.9581298828125, 41.50568389892578, -21.22460174560547, 91.39691925048828, 31.13951873779297, 90.72254180908203, -1.6652984619140625, 39.76802062988281, -5.488029479980469, 38.351959228515625, 54.17591857910156, 64.54339599609375, -3.9624366760253906, 1.824127197265625, -41.36985778808594, -40.31388854980469, 7.603260040283203, -84.63312530517578, 11.805984497070312, 25.197357177734375, 10.990985870361328, 33.299415588378906, 69.97515106201172, -60.456886291503906, 2.937196731567383, -5.992340087890625, 35.132110595703125, 84.35355377197266, 44.211856842041016, 105.06330108642578, -3.8297252655029297, 70.26759338378906, 5.638580322265625, 71.7762680053711, 3.9356861114501953, 90.4912109375, 0.7501564025878906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000266.npy"}
{"epoch": 0.4021164021164021, "step": 267, "batch_size": 64, "mean": 22.094219207763672, "std": 46.121028900146484, "min": -103.14825439453125, "p10": -27.37469806671142, "median": 20.203733444213867, "p90": 87.16893539428712, "max": 110.94122314453125, "pos_frac": 0.703125, "sample": [65.85939025878906, 0.6346549987792969, 1.6095542907714844, -77.91496276855469, 4.17938232421875, -15.928844451904297, 73.7998046875, 98.2391357421875, 65.86415100097656, 1.8620567321777344, -5.960580825805664, 23.357864379882812, 4.112827301025391, 107.70699310302734, 2.2887420654296875, -45.675621032714844, 28.995391845703125, -42.24625015258789, -13.671401977539062, -85.77131652832031, -29.36280632019043, -16.006629943847656, -6.024833679199219, 47.1285285949707, 70.52851867675781, 85.35784149169922, -0.07407379150390625, 11.73828125, 72.86599731445312, -22.73577880859375, 33.74384689331055, 1.0465583801269531, 86.26920318603516, 50.25227737426758, -15.561843872070312, 25.222888946533203, 35.40971374511719, -5.57135009765625, 1.5938873291015625, 110.94122314453125, 23.406906127929688, 0.2681903839111328, 77.25277709960938, -1.3777999877929688, 87.55453491210938, 56.43240737915039, 93.80652618408203, 25.045326232910156, -103.14825439453125, 90.7392349243164, 24.907249450683594, -34.16644287109375, -2.024843215942383, 21.263294219970703, 23.288612365722656, 12.921390533447266, 52.30278015136719, 45.33802032470703, 89.03284454345703, 24.103256225585938, 3.654888153076172, 57.57621383666992, 19.14417266845703, -1.3937206268310547], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000267.npy"}
{"epoch": 0.4036281179138322, "step": 268, "batch_size": 64, "mean": 27.803451538085938, "std": 38.53549575805664, "min": -53.061492919921875, "p10": -8.480500030517575, "median": 16.72037124633789, "p90": 89.43621520996095, "max": 112.7093505859375, "pos_frac": 0.65625, "sample": [9.353012084960938, 108.0220718383789, -0.27391624450683594, -6.2658538818359375, 75.93531799316406, 92.72216033935547, 28.000076293945312, -5.770072937011719, -19.766597747802734, 39.54195785522461, 78.24528503417969, -0.5949859619140625, -1.0620059967041016, 12.856597900390625, 16.84046173095703, 13.636211395263672, 45.70448303222656, -12.830810546875, -0.8337860107421875, 86.07534790039062, 58.73045349121094, 65.13090515136719, -1.9763813018798828, 5.8970489501953125, -5.897331237792969, 19.7933349609375, 13.029022216796875, 52.87646484375, 11.736923217773438, -2.118133544921875, 82.16087341308594, 4.087810516357422, 22.285335540771484, 17.897964477539062, -9.988143920898438, 92.11275482177734, -3.5968704223632812, -10.75973129272461, -43.232879638671875, 17.262331008911133, 59.46998596191406, 6.49049186706543, -1.1573410034179688, 82.70303344726562, 52.788597106933594, 44.418548583984375, 90.8765869140625, -1.7247543334960938, 16.60028076171875, -53.061492919921875, -2.384613037109375, -9.429634094238281, 48.76678466796875, -0.7712764739990234, 112.7093505859375, 56.79083251953125, 22.98296356201172, 28.136478424072266, 99.61380004882812, 33.832950592041016, -0.8628635406494141, 101.342041015625, 42.73308181762695, 3.590351104736328], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000268.npy"}
{"epoch": 0.4051398337112623, "step": 269, "batch_size": 64, "mean": 31.73358917236328, "std": 43.437705993652344, "min": -80.75077819824219, "p10": -7.366401863098145, "median": 20.94127655029297, "p90": 95.64515380859376, "max": 119.61843872070312, "pos_frac": 0.75, "sample": [-2.786224365234375, -31.338912963867188, 7.5539703369140625, 90.7620849609375, 27.962554931640625, -3.669342041015625, 24.570297241210938, 86.7757797241211, 12.171775817871094, -5.8815765380859375, -8.846549987792969, 24.423866271972656, 47.98463439941406, 29.223129272460938, 102.57284545898438, 92.71878051757812, -3.0712547302246094, 43.944801330566406, 115.9702377319336, -56.6630859375, 26.031158447265625, -5.614166259765625, 85.9657974243164, 23.13372802734375, 3.9451904296875, 98.10821533203125, -22.959741592407227, 119.61843872070312, 47.565643310546875, 94.08219146728516, 18.62795066833496, 16.135799407958984, -7.227743148803711, 63.12190246582031, -7.4258270263671875, 3.6628246307373047, 2.1917762756347656, 32.53804016113281, -0.31775665283203125, 79.86695861816406, 96.79627990722656, 46.785709381103516, 8.701072692871094, 7.851625442504883, 96.31499481201172, -25.944915771484375, 56.927642822265625, 15.290260314941406, -0.9962387084960938, 4.3122100830078125, 79.21743774414062, 0.4126319885253906, 98.89192199707031, 12.645042419433594, -2.4786148071289062, -80.75077819824219, 9.56744384765625, 65.12559509277344, 33.73832702636719, 53.81561279296875, 74.18012237548828, 18.748825073242188, 80.85096740722656, 15.518373489379883], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000269.npy"}
{"epoch": 0.40665154950869237, "step": 270, "batch_size": 64, "mean": 24.839338302612305, "std": 44.98329544067383, "min": -93.9263916015625, "p10": -9.238596153259277, "median": 15.03941535949707, "p90": 92.69366378784181, "max": 111.9880142211914, "pos_frac": 0.765625, "sample": [48.148956298828125, 103.9610824584961, -9.711090087890625, 109.12498474121094, 44.15134048461914, 104.76544189453125, 6.978181838989258, 28.27670669555664, 61.00152587890625, 22.090850830078125, 111.9880142211914, 69.22720336914062, 13.982505798339844, 60.27958679199219, 2.924457550048828, -44.235862731933594, 24.701828002929688, 47.4737434387207, 17.419727325439453, 64.81422424316406, 30.96527099609375, 64.59303283691406, 10.274887084960938, -44.59322738647461, -6.01318359375, 5.5860137939453125, 80.72715759277344, -89.0355224609375, 26.787616729736328, 9.258949279785156, 9.685455322265625, 8.959060668945312, 4.055274963378906, 34.346214294433594, -4.783729553222656, 4.817192077636719, 101.3560562133789, -7.8325653076171875, 15.330318450927734, -93.9263916015625, 95.92544555664062, 69.10063934326172, 79.48295593261719, -8.136110305786133, 8.931221008300781, 23.802474975585938, -2.5713653564453125, 6.1286773681640625, 94.93626403808594, 38.95171356201172, 87.46092987060547, 0.8685226440429688, -15.121017456054688, 56.8343505859375, 1.8236007690429688, -7.109260559082031, -2.9139938354492188, -79.30228424072266, 48.56489562988281, 7.975982666015625, 14.748512268066406, -3.3733139038085938, 8.145355224609375, 16.642135620117188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000270.npy"}
{"epoch": 0.40816326530612246, "step": 271, "batch_size": 64, "mean": 20.922527313232422, "std": 51.06695556640625, "min": -84.51539611816406, "p10": -27.307888984680176, "median": 4.97218132019043, "p90": 100.3957389831543, "max": 136.63760375976562, "pos_frac": 0.609375, "sample": [24.441253662109375, 44.09333801269531, -7.0973052978515625, 8.160469055175781, 106.33312225341797, 78.56119537353516, 14.210155487060547, 2.834339141845703, 95.65966796875, -42.889015197753906, 49.78813171386719, -4.411592483520508, 121.47439575195312, -17.843883514404297, 57.92512512207031, 99.57249450683594, 2.2111568450927734, -4.7050323486328125, 0.7894630432128906, 111.27653503417969, 9.516990661621094, 2.5710372924804688, 79.7791748046875, -7.0672149658203125, -11.6563720703125, 0.48883056640625, -27.443618774414062, 53.14418029785156, -4.7007598876953125, -26.991186141967773, 7.548307418823242, 68.75654602050781, 16.771377563476562, -78.44049835205078, -2.085052490234375, 87.39839172363281, 18.802406311035156, 3.7975387573242188, 67.73273468017578, 100.7485580444336, -2.4877548217773438, -45.430625915527344, 136.63760375976562, -44.96240234375, -7.499725341796875, 11.302108764648438, -4.3068084716796875, -73.48285675048828, -84.51539611816406, 4.928550720214844, 21.874439239501953, -15.541053771972656, 95.99859619140625, 104.94540405273438, 48.07538604736328, -3.59490966796875, -26.977203369140625, 46.69768524169922, -9.681449890136719, 5.015811920166016, -15.009981155395508, -24.230606079101562, 112.44496154785156, 9.786514282226562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000271.npy"}
{"epoch": 0.40967498110355255, "step": 272, "batch_size": 64, "mean": 41.886714935302734, "std": 40.69086456298828, "min": -58.64286422729492, "p10": -2.594492530822752, "median": 44.075204849243164, "p90": 102.74307632446292, "max": 116.04534912109375, "pos_frac": 0.84375, "sample": [14.374900817871094, -3.2889480590820312, -10.796417236328125, 54.92007064819336, 51.38860321044922, 108.56204223632812, 21.814910888671875, 94.79615783691406, 91.74507904052734, 7.551887512207031, 78.06905364990234, 43.691951751708984, 14.136688232421875, 1.3725662231445312, 60.34529113769531, 46.34990692138672, 62.06151580810547, 3.6558837890625, 76.0582275390625, 52.07796096801758, 94.23258972167969, 14.008052825927734, 69.73008728027344, 67.07852172851562, 112.66798400878906, 14.047004699707031, 49.460479736328125, -26.14159393310547, 14.455619812011719, 110.49951934814453, 44.458457946777344, 110.09386444091797, 73.51611328125, 0.4474220275878906, 54.25084686279297, 20.985057830810547, 22.118160247802734, 0.5500679016113281, 26.770103454589844, -58.64286422729492, 79.11457061767578, 109.4012222290039, 116.04534912109375, 45.67858123779297, 1.6719970703125, 11.859943389892578, 97.25971984863281, 56.019996643066406, 57.3446159362793, 79.61202239990234, 31.526906967163086, 87.70182800292969, -7.669700622558594, -0.0373992919921875, 14.703132629394531, 23.482269287109375, 57.472572326660156, 39.597381591796875, -10.10086441040039, -0.649871826171875, -5.222871780395508, -0.9740962982177734, 105.09308624267578, 8.346429824829102], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000272.npy"}
{"epoch": 0.41118669690098264, "step": 273, "batch_size": 64, "mean": 28.892606735229492, "std": 56.62029266357422, "min": -110.74983215332031, "p10": -45.37425231933594, "median": 30.93480110168457, "p90": 98.0315383911133, "max": 153.52694702148438, "pos_frac": 0.734375, "sample": [0.4983482360839844, 54.233551025390625, 100.14776611328125, -46.09857177734375, 68.222900390625, -11.435600280761719, -73.50692749023438, 70.99774169921875, -92.75494384765625, 52.51194763183594, 47.83341979980469, -43.684173583984375, 46.74468994140625, -2.450155258178711, 5.164741516113281, 33.2877082824707, 103.35857391357422, 19.570472717285156, 83.89862060546875, 90.70516204833984, 83.1780776977539, 7.170001983642578, -5.535957336425781, 1.5917243957519531, -110.74983215332031, 84.26696014404297, 52.36973571777344, 2.1374893188476562, -28.554941177368164, 2.4652137756347656, 61.94921875, 153.52694702148438, 93.09367370605469, -98.1240234375, -1.1003265380859375, 54.45320129394531, 63.937191009521484, 59.03433609008789, -1.7701950073242188, 85.62306213378906, -71.33576202392578, 28.581893920898438, 66.1595687866211, -16.726661682128906, 38.95195388793945, 130.9662628173828, 113.29777526855469, 60.559165954589844, 9.866287231445312, 69.66743469238281, 1.8408336639404297, 25.960975646972656, 1.038259506225586, -11.099960327148438, 15.99346923828125, 107.13678741455078, 0.6956787109375, 41.10718536376953, 26.21657943725586, -8.589744567871094, 71.7305908203125, 107.19441986083984, -71.83946228027344, 45.546470642089844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000273.npy"}
{"epoch": 0.4126984126984127, "step": 274, "batch_size": 64, "mean": 32.851715087890625, "std": 54.45930099487305, "min": -101.57942962646484, "p10": -31.259119415283195, "median": 27.233169555664062, "p90": 106.50573196411133, "max": 117.35857391357422, "pos_frac": 0.75, "sample": [39.328575134277344, -41.81611633300781, -7.187345504760742, 3.9125518798828125, -12.264816284179688, -70.25801086425781, 97.57106018066406, 97.91383361816406, 19.23017120361328, 10.259992599487305, 63.14459228515625, 90.04986572265625, -3.1914749145507812, -16.990142822265625, 48.78881072998047, 115.11082458496094, 108.71165466308594, -61.59510040283203, 12.736835479736328, 22.231037139892578, 107.35897827148438, 101.50149536132812, 60.851261138916016, 104.44039916992188, 59.834861755371094, 2.7578468322753906, 104.51482391357422, 109.28555297851562, 34.971351623535156, 6.3541107177734375, 4.375240325927734, 3.044342041015625, 96.16983032226562, -4.66326904296875, -18.919355392456055, 46.80833435058594, -69.07813262939453, -22.560073852539062, 80.00382995605469, 5.4702606201171875, -34.987281799316406, 43.65069580078125, 17.097332000732422, 17.465896606445312, -9.84428596496582, 28.338294982910156, 26.12804412841797, -101.57942962646484, 34.30231475830078, 32.826534271240234, 107.8652572631836, 100.11738586425781, 3.0798606872558594, 39.22282409667969, 8.766265869140625, -16.225191116333008, -66.05667114257812, 9.503185272216797, 110.9205322265625, 103.47058868408203, 71.99986267089844, 30.600440979003906, 117.35857391357422, 100.28023529052734], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000274.npy"}
{"epoch": 0.41421012849584277, "step": 275, "batch_size": 64, "mean": 33.09941864013672, "std": 47.52848434448242, "min": -107.27685546875, "p10": -11.50954475402832, "median": 19.97644805908203, "p90": 106.07534790039064, "max": 119.31885528564453, "pos_frac": 0.765625, "sample": [26.7108154296875, 10.75520133972168, -11.191024780273438, 43.284217834472656, 101.865478515625, 15.911750793457031, 36.907623291015625, -107.27685546875, 111.26858520507812, 90.52398681640625, 11.986312866210938, 16.256820678710938, -0.8714828491210938, 107.87957763671875, -1.1028823852539062, 9.027042388916016, 80.14889526367188, 25.423583984375, 101.02442932128906, -25.43621826171875, -83.85246276855469, 113.02272033691406, 13.8720703125, 6.434715270996094, 61.43492889404297, -0.2397918701171875, -34.31230545043945, 9.231941223144531, -3.1842994689941406, 16.205810546875, 17.057968139648438, 119.31885528564453, 36.617835998535156, 42.63612365722656, 108.7188949584961, 61.12226104736328, 51.30609130859375, 15.881345748901367, -8.634735107421875, -0.517852783203125, 22.894927978515625, 16.298206329345703, 25.394126892089844, 1.473672866821289, 8.078516006469727, 114.0362319946289, 51.405548095703125, 87.66349029541016, -11.646053314208984, 61.23735427856445, 1.6356773376464844, 4.167304992675781, -3.7326297760009766, 12.738235473632812, 109.01854705810547, 90.7778091430664, -15.376140594482422, 59.91065216064453, 67.33911895751953, 87.8056640625, 52.22826385498047, 38.55982971191406, -27.032651901245117, 78.27098846435547], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000275.npy"}
{"epoch": 0.41572184429327286, "step": 276, "batch_size": 64, "mean": 32.068294525146484, "std": 47.24830627441406, "min": -89.69258880615234, "p10": -17.602937316894526, "median": 25.30925464630127, "p90": 96.66237182617188, "max": 120.02006530761719, "pos_frac": 0.78125, "sample": [97.07575988769531, 8.441734313964844, -4.823453903198242, 62.151283264160156, 3.980752944946289, 24.60765838623047, 120.02006530761719, -64.85405731201172, 62.631309509277344, 18.829130172729492, 11.441692352294922, 24.945566177368164, 48.675819396972656, 73.33665466308594, -9.30645751953125, 109.32258605957031, -3.937633514404297, 2.7600860595703125, 20.676589965820312, 5.416423797607422, 33.17654037475586, 87.88976287841797, 88.8689193725586, 114.60154724121094, 9.356201171875, -46.88340759277344, 85.05292510986328, -30.34021759033203, 38.635589599609375, 95.69779968261719, 7.195354461669922, 8.291130065917969, -35.519615173339844, -89.69258880615234, 76.86970520019531, -20.328384399414062, 83.89066314697266, -6.530097961425781, 48.46044921875, 80.35820007324219, -58.65492248535156, 15.499931335449219, 105.23899841308594, 0.7907180786132812, -7.205268859863281, 36.684906005859375, 88.56888580322266, 38.110595703125, 90.10179138183594, 25.672943115234375, -6.889488220214844, -11.243560791015625, 15.012290954589844, 28.06583023071289, 15.659576416015625, 114.70401000976562, 105.42986297607422, 51.32699203491211, 34.51721954345703, 28.69676971435547, 22.124055862426758, 46.22105407714844, 31.38436508178711, 2.1112442016601562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000276.npy"}
{"epoch": 0.41723356009070295, "step": 277, "batch_size": 64, "mean": 19.465354919433594, "std": 49.20631408691406, "min": -91.87521362304688, "p10": -43.85110893249511, "median": 12.692878723144531, "p90": 89.24972229003909, "max": 119.94483947753906, "pos_frac": 0.640625, "sample": [67.74906921386719, -74.48811340332031, 75.89751434326172, -5.938545227050781, 78.6941146850586, -62.73152542114258, 9.424266815185547, 13.886642456054688, 80.20050811767578, 11.499114990234375, -35.37568664550781, -4.257911682128906, 39.980987548828125, -1.79766845703125, -23.472274780273438, 40.27674102783203, 34.14703369140625, -3.9609298706054688, 1.534576416015625, 19.547649383544922, 40.23436737060547, 31.630531311035156, 4.496284484863281, 5.62860107421875, -6.638790130615234, 32.768402099609375, -9.009292602539062, -64.50979614257812, 21.814865112304688, 91.83706665039062, 0.4506187438964844, 39.4293212890625, -28.988901138305664, 104.65794372558594, 31.66570472717285, 92.84637451171875, -10.485206604003906, -67.1198501586914, 25.710617065429688, 34.40657424926758, -0.5183258056640625, -47.48343276977539, 100.98921203613281, 67.27909851074219, 30.269775390625, 5.431877136230469, -6.235435485839844, 72.18946838378906, -79.79569244384766, 119.94483947753906, 58.261619567871094, -26.43509864807129, 95.824951171875, 44.054473876953125, -5.9029693603515625, 0.9045066833496094, 83.21258544921875, -91.87521362304688, 55.374420166015625, 109.18539428710938, -29.789199829101562, 5.823341369628906, -0.06111907958984375, 53.49266815185547], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000277.npy"}
{"epoch": 0.41874527588813304, "step": 278, "batch_size": 64, "mean": 32.18288040161133, "std": 51.914974212646484, "min": -113.21122741699219, "p10": -24.353845024108885, "median": 23.170719146728516, "p90": 107.21389617919922, "max": 121.34223937988281, "pos_frac": 0.765625, "sample": [-23.373140335083008, -49.085693359375, 86.45916748046875, 25.609710693359375, 0.282928466796875, -5.505060195922852, -88.0814208984375, 53.962799072265625, 117.33629608154297, 118.65379333496094, 37.543487548828125, 3.263275146484375, 106.0174789428711, 107.72664642333984, 20.731727600097656, -8.41834831237793, 30.233966827392578, 64.23001098632812, 74.69550323486328, 82.52218627929688, 57.357154846191406, 56.180564880371094, 11.434085845947266, 2.1809310913085938, -68.932861328125, 121.34223937988281, -30.836685180664062, 72.60298156738281, 14.2130126953125, 4.042366027832031, 10.248008728027344, 34.791175842285156, -3.834381103515625, -24.774147033691406, 65.17356872558594, 5.073616027832031, 101.95303344726562, 9.710250854492188, 63.687870025634766, 68.9145736694336, 112.92475128173828, 117.687255859375, -4.883579254150391, 59.282867431640625, 41.46941375732422, 33.974586486816406, 1.1980438232421875, 4.674562454223633, -33.2293701171875, 4.267333984375, 48.796112060546875, 72.60635375976562, -113.21122741699219, -2.7078895568847656, 7.198951721191406, 91.16725158691406, -0.492340087890625, 115.29310607910156, 48.758785247802734, 13.611885070800781, 8.122217178344727, -4.481967926025391, 99.11219024658203, 13.2322998046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000278.npy"}
{"epoch": 0.42025699168556313, "step": 279, "batch_size": 64, "mean": 44.7180290222168, "std": 47.09531784057617, "min": -78.68180847167969, "p10": -3.062721824645994, "median": 37.484432220458984, "p90": 112.06735458374024, "max": 121.27902221679688, "pos_frac": 0.859375, "sample": [-0.6486225128173828, 4.657291412353516, 118.0665283203125, 20.2344970703125, -78.68180847167969, 95.20878601074219, 9.840744018554688, -4.0973358154296875, 8.656829833984375, 69.32661437988281, 11.850837707519531, 97.74593353271484, 53.12731170654297, 105.31803894042969, 41.45466232299805, 121.27902221679688, 65.20719909667969, 77.82965087890625, -35.75221252441406, 4.600826263427734, -34.94414520263672, 102.61460876464844, 69.78313446044922, 36.147064208984375, 9.371734619140625, 63.483863830566406, 20.241769790649414, -0.14263534545898438, 18.187644958496094, 73.48405456542969, 27.413284301757812, 86.3612060546875, 120.1403579711914, 38.821800231933594, -9.169906616210938, 100.87596130371094, 58.94412612915039, 27.985137939453125, -13.154594421386719, 114.9884033203125, 112.43846130371094, 7.316493988037109, 111.2014389038086, 7.4850921630859375, 26.596389770507812, 42.739166259765625, 4.751453399658203, 4.4897918701171875, 35.88525390625, 116.88310241699219, 74.0767822265625, -26.159034729003906, 73.96881103515625, 111.13609313964844, 1.7947521209716797, 9.603118896484375, 17.539962768554688, 12.42034912109375, 80.83074951171875, 75.823974609375, 84.33927917480469, 58.207244873046875, 115.74716186523438, 6.1803741455078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000279.npy"}
{"epoch": 0.4217687074829932, "step": 280, "batch_size": 64, "mean": 45.510032653808594, "std": 46.516353607177734, "min": -33.630714416503906, "p10": -1.9748867034912108, "median": 25.909648895263672, "p90": 114.66529846191406, "max": 143.92059326171875, "pos_frac": 0.859375, "sample": [84.47613525390625, 114.49749755859375, 4.204626083374023, 80.51986694335938, -7.521583557128906, 113.79849243164062, 1.538675308227539, -2.539642333984375, 0.8865947723388672, 66.31800842285156, 23.623794555664062, 4.0301666259765625, 88.05111694335938, 96.89593505859375, 112.90495300292969, 109.74510192871094, 45.66596221923828, 29.49831771850586, 108.00389099121094, 28.19550323486328, 5.0763092041015625, 3.3405532836914062, -2.01055908203125, -1.4071331024169922, 114.73721313476562, 79.10528564453125, 10.407032012939453, 72.81993865966797, 88.47833251953125, 143.92059326171875, 28.241039276123047, 5.41302490234375, 17.748260498046875, 79.66636657714844, 4.604152679443359, 39.39642333984375, 123.36529541015625, -4.095054626464844, 117.54682159423828, 117.21753692626953, -33.630714416503906, -3.4041175842285156, 6.429435729980469, -1.8916511535644531, 72.42567443847656, 18.918506622314453, 53.637168884277344, -3.669015884399414, 16.135574340820312, 8.414617538452148, 116.52471160888672, 36.659202575683594, 12.587696075439453, 100.92120361328125, 9.280172348022461, 23.602127075195312, 18.7847900390625, 11.520660400390625, 13.866792678833008, 17.84632110595703, 10.613578796386719, 102.87799072265625, 121.399169921875, 36.42729187011719], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000280.npy"}
{"epoch": 0.42328042328042326, "step": 281, "batch_size": 64, "mean": 41.260860443115234, "std": 53.436500549316406, "min": -73.38744354248047, "p10": -16.922039794921872, "median": 27.54232120513916, "p90": 112.26598358154297, "max": 123.34103393554688, "pos_frac": 0.796875, "sample": [-27.424861907958984, 94.41181945800781, 10.036529541015625, 2.441680908203125, 101.86697387695312, 80.76385498046875, 30.307886123657227, 65.05647277832031, 35.085540771484375, 116.77505493164062, 42.71638488769531, 115.10692596435547, 79.57838439941406, 66.55189514160156, 0.8174285888671875, 4.101869583129883, 6.572813034057617, 112.73635864257812, -0.14974021911621094, 92.76862335205078, 76.96894836425781, 2.7480010986328125, -10.679092407226562, 24.776756286621094, 102.5908432006836, 2.337139129638672, 10.004798889160156, -14.434917449951172, 38.531158447265625, -73.38744354248047, 6.4761505126953125, 10.071308135986328, 5.781288146972656, 16.965688705444336, 110.73336029052734, -65.5116195678711, 68.48918151855469, 123.34103393554688, 14.808212280273438, -49.410213470458984, 114.6715316772461, 99.70814514160156, 115.79448699951172, 66.58906555175781, 115.6025161743164, 21.80242156982422, 106.23963928222656, 9.085453033447266, 110.79129028320312, 57.6295166015625, -2.1724090576171875, 3.97357177734375, 111.16844177246094, -58.04084777832031, 0.8313179016113281, 98.12641906738281, 108.60214233398438, -8.172195434570312, 78.99696350097656, 87.74498748779297, -17.98794937133789, -6.678998947143555, 14.201736450195312, -19.134611129760742], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000281.npy"}
{"epoch": 0.42479213907785335, "step": 282, "batch_size": 64, "mean": 37.09632873535156, "std": 59.02665328979492, "min": -102.54646301269531, "p10": -40.74098434448241, "median": 26.034974098205566, "p90": 118.14609603881838, "max": 147.02969360351562, "pos_frac": 0.6875, "sample": [24.43488311767578, -21.40782928466797, -58.848636627197266, -5.4328155517578125, 6.806617736816406, 95.93460083007812, 83.09617614746094, 9.5679931640625, -3.0516014099121094, 126.36933898925781, 70.15912628173828, 110.20054626464844, 22.768007278442383, 96.78962707519531, 104.70991516113281, 17.551971435546875, 121.02012634277344, 61.1021842956543, -3.474884033203125, 83.89306640625, 35.54313659667969, 38.078094482421875, -4.254434585571289, 125.1722412109375, 27.459627151489258, -9.760051727294922, -45.35420227050781, 105.50981140136719, 108.26702880859375, 6.863613128662109, -33.84708023071289, -19.583635330200195, -102.54646301269531, 107.06562805175781, 6.229581832885742, -49.28105163574219, 120.26616668701172, 13.212488174438477, -47.189239501953125, 31.796096801757812, 91.2639389038086, 104.04895782470703, 3.1807403564453125, 48.7127571105957, 113.19926452636719, 147.02969360351562, 64.45132446289062, -1.0398139953613281, -3.2948455810546875, 30.949111938476562, -17.292221069335938, -15.598098754882812, -34.05635070800781, 17.12887954711914, -43.60582733154297, 127.96134185791016, 123.69691467285156, 82.17031860351562, -59.41002655029297, 24.610321044921875, 53.937255859375, 51.441383361816406, 96.49195861816406, 12.352287292480469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000282.npy"}
{"epoch": 0.42630385487528344, "step": 283, "batch_size": 64, "mean": 33.79491424560547, "std": 55.39789581298828, "min": -102.15225219726562, "p10": -29.075839233398433, "median": 18.63418960571289, "p90": 108.0747169494629, "max": 126.4864273071289, "pos_frac": 0.765625, "sample": [125.64479064941406, -65.1580581665039, -25.790573120117188, -30.733444213867188, 83.8385009765625, 73.79107666015625, 5.120342254638672, 3.4017562866210938, 83.74085235595703, 1.0707244873046875, -19.802017211914062, 92.45271301269531, 33.28334045410156, 25.113941192626953, 112.42288208007812, 10.244029998779297, 11.27886962890625, 108.45836639404297, -30.483810424804688, 20.36993408203125, 103.1577377319336, 71.70982360839844, 94.98269653320312, 14.196830749511719, 69.12435913085938, 115.95526885986328, 84.39171600341797, -0.24921417236328125, 126.4864273071289, 10.916908264160156, 1.5660324096679688, 107.17953491210938, -62.923309326171875, 8.294517517089844, 16.89844512939453, 63.983489990234375, -74.73026275634766, 23.893707275390625, 60.21403503417969, 100.486328125, 111.49783325195312, -23.875524520874023, 100.99662780761719, 93.55429077148438, 13.332378387451172, 16.141551971435547, 29.69129180908203, -102.15225219726562, 5.92725944519043, -59.9349365234375, 89.58943176269531, 1.8311500549316406, 61.7203369140625, -1.4100151062011719, 48.851654052734375, 87.2329330444336, -7.5184783935546875, -14.91006851196289, 9.706781387329102, 3.37164306640625, -19.3773193359375, 119.64488220214844, 13.476974487304688, 31.68694496154785], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000283.npy"}
{"epoch": 0.42781557067271353, "step": 284, "batch_size": 64, "mean": 36.043357849121094, "std": 51.06206130981445, "min": -109.76085662841797, "p10": -4.59390182495117, "median": 16.76789665222168, "p90": 112.72749786376954, "max": 129.1955108642578, "pos_frac": 0.8125, "sample": [-8.110626220703125, 5.507118225097656, 42.04963684082031, 20.378128051757812, 125.70121002197266, 66.59320068359375, 3.1788997650146484, -19.999267578125, 129.12986755371094, 6.565185546875, 8.849864959716797, 17.32613754272461, -0.16704177856445312, 0.4602508544921875, 96.89067840576172, -5.408794403076172, 9.420660018920898, 66.40650939941406, 0.4099578857421875, 123.47396850585938, 16.20965576171875, 30.806472778320312, 9.546096801757812, -35.62005615234375, -16.046791076660156, 14.586349487304688, 3.6707687377929688, 23.15966796875, 50.11623001098633, -2.692485809326172, 111.60609436035156, 69.50420379638672, 78.47736358642578, 90.49261474609375, 9.473579406738281, 11.162330627441406, 93.19053649902344, 113.20809936523438, 119.81729125976562, 7.424934387207031, 71.48773956298828, 129.1955108642578, 53.01995849609375, -0.008007049560546875, 0.7959346771240234, -0.19203567504882812, 86.61166381835938, 7.134700775146484, 62.348304748535156, 98.58905029296875, 3.2326087951660156, 60.54887008666992, 90.39608764648438, 8.001449584960938, 9.44500732421875, 25.180313110351562, 126.12942504882812, 48.22105407714844, 52.842071533203125, -109.76085662841797, 79.3955307006836, -94.81163787841797, 13.493690490722656, -1.2699832916259766], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000284.npy"}
{"epoch": 0.4293272864701436, "step": 285, "batch_size": 64, "mean": 30.173725128173828, "std": 52.59982681274414, "min": -108.63909149169922, "p10": -35.83352699279785, "median": 18.860084533691406, "p90": 111.90569076538087, "max": 122.71263122558594, "pos_frac": 0.71875, "sample": [29.41728973388672, 5.911338806152344, -17.82110595703125, 112.54569244384766, 8.696334838867188, -36.907649993896484, -108.63909149169922, 7.171295166015625, 56.449832916259766, -33.327239990234375, 47.55315399169922, 114.10282897949219, 102.81333923339844, 17.905921936035156, 90.76066589355469, -48.151451110839844, 53.651947021484375, 116.7005615234375, 2.3940773010253906, 6.129276275634766, 34.70758056640625, 1.7020301818847656, -42.6234016418457, -37.661556243896484, 1.5473480224609375, -63.62007141113281, -1.966522216796875, 19.692581176757812, 25.41009521484375, -45.845245361328125, -12.774593353271484, 122.71263122558594, -11.176338195800781, 109.71895599365234, 53.82917785644531, 67.39735412597656, 48.070335388183594, 6.7743682861328125, 61.50050354003906, 92.41664123535156, 64.94751739501953, -2.9464111328125, -1.2087020874023438, 118.52322387695312, 83.55701446533203, 30.445106506347656, 0.5514030456542969, 110.412353515625, -7.54608154296875, -2.574504852294922, 39.578041076660156, 90.40226745605469, 18.027587890625, 59.322357177734375, 3.3868865966796875, 42.931304931640625, 119.0486831665039, 118.76040649414062, -21.71459197998047, 6.341835021972656, 0.322906494140625, 49.38029479980469, -11.077682495117188, 65.07829284667969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000285.npy"}
{"epoch": 0.4308390022675737, "step": 286, "batch_size": 64, "mean": 48.285194396972656, "std": 52.9658203125, "min": -51.72956848144531, "p10": -3.3322616577148434, "median": 36.182533264160156, "p90": 127.67817764282226, "max": 154.08160400390625, "pos_frac": 0.859375, "sample": [45.57667541503906, 8.472635269165039, 1.1784687042236328, 67.00679779052734, 4.940032958984375, 115.5888671875, 8.057426452636719, 2.274576187133789, 23.861703872680664, -3.5132293701171875, 64.10749816894531, 6.561309814453125, 35.72686767578125, 97.11891174316406, 108.39131164550781, 132.41558837890625, 102.98478698730469, 19.790498733520508, -19.83792495727539, 11.127204895019531, 77.01264953613281, -8.020103454589844, 39.264339447021484, 121.12403869628906, 3.5064048767089844, -51.72956848144531, 71.71867370605469, 44.70193099975586, -35.35734558105469, 55.20562744140625, 111.25672149658203, 73.63345336914062, 118.30744934082031, 129.31382751464844, 71.96704864501953, 134.2227783203125, 31.944568634033203, 105.5033187866211, -48.0710334777832, 3.8128509521484375, 128.16700744628906, 154.08160400390625, 18.235809326171875, -2.910003662109375, 84.96562957763672, 10.874557495117188, 15.104814529418945, 19.724868774414062, 126.71733856201172, 120.32884216308594, 128.0899658203125, 36.63819885253906, 21.40715789794922, 9.792623519897461, 2.542736053466797, 25.155593872070312, 72.10771942138672, 13.708660125732422, -36.52677917480469, 135.3178253173828, 56.63514709472656, -0.4927864074707031, 60.22712707519531, 9.211212158203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000286.npy"}
{"epoch": 0.4323507180650038, "step": 287, "batch_size": 64, "mean": 48.909759521484375, "std": 55.89419937133789, "min": -103.65811157226562, "p10": -3.6496095657348615, "median": 32.39004135131836, "p90": 126.18462982177735, "max": 139.57998657226562, "pos_frac": 0.828125, "sample": [44.43769073486328, -39.15107727050781, 11.5157470703125, 97.09890747070312, 139.57998657226562, 119.61681365966797, 10.3359375, 34.908233642578125, 57.26116180419922, -4.4898834228515625, 55.77101135253906, 12.480220794677734, 46.492156982421875, 27.223434448242188, 129.80435180664062, 131.0606689453125, 132.45379638671875, 64.2395248413086, 119.38774108886719, 88.30197143554688, 31.979454040527344, 32.800628662109375, 106.33111572265625, 2.2940196990966797, 8.007213592529297, 95.99927520751953, -2.073444366455078, -25.90491485595703, 107.67784881591797, 0.4575691223144531, 101.68047332763672, 6.1544952392578125, -103.65811157226562, 6.541297912597656, -4.296976089477539, 84.98133087158203, 23.000823974609375, -1.3166999816894531, 29.961769104003906, 107.90414428710938, 110.92635345458984, -21.95671844482422, 125.52210998535156, 102.79931640625, -15.684982299804688, 4.190147399902344, 126.64936828613281, 2.8801841735839844, 109.05986785888672, 112.68082427978516, 0.24282455444335938, 5.2744140625, 89.04320526123047, 4.797224044799805, 1.7230720520019531, 31.382713317871094, 129.3829803466797, 8.639686584472656, 73.62208557128906, -2.139087677001953, -1.4433059692382812, 126.46856689453125, 118.28253936767578, 1.0313186645507812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000287.npy"}
{"epoch": 0.43386243386243384, "step": 288, "batch_size": 64, "mean": 35.25421142578125, "std": 56.798301696777344, "min": -100.86129760742188, "p10": -23.50111389160156, "median": 27.323808670043945, "p90": 117.57759857177734, "max": 138.83554077148438, "pos_frac": 0.703125, "sample": [29.981937408447266, 2.2420654296875, -24.562347412109375, -100.86129760742188, 32.81232452392578, 16.969684600830078, 19.77242088317871, -19.71820068359375, 75.62997436523438, 48.59186553955078, 16.993000030517578, 74.47916412353516, 14.820632934570312, 116.8286361694336, -9.720611572265625, 84.90068054199219, -21.02490234375, 51.27614974975586, -19.186553955078125, 63.716495513916016, -86.8061752319336, 116.75167846679688, 138.83554077148438, 119.72994232177734, -8.76385498046875, 127.49603271484375, 26.528871536254883, 52.06263732910156, 5.210472106933594, 28.118745803833008, 126.86489868164062, -75.71171569824219, 51.8145751953125, 66.28150939941406, 122.55143737792969, 60.566776275634766, 46.13905334472656, -33.88786315917969, 37.789756774902344, -18.573204040527344, -47.037940979003906, -56.595733642578125, 117.8985824584961, -1.11614990234375, -1.3488235473632812, -12.323654174804688, 98.5330581665039, -8.140594482421875, 105.31546020507812, 107.29672241210938, 102.58523559570312, -0.59613037109375, 81.95864868164062, 8.64251708984375, 3.06903076171875, -0.21061325073242188, 75.17854309082031, 17.01782989501953, 0.6017398834228516, 123.32931518554688, 24.50615882873535, 79.2134780883789, 17.464080810546875, 64.08834838867188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000288.npy"}
{"epoch": 0.43537414965986393, "step": 289, "batch_size": 64, "mean": 32.55389404296875, "std": 59.53990936279297, "min": -110.10577392578125, "p10": -36.27716598510742, "median": 15.764274597167969, "p90": 118.05256118774415, "max": 144.15823364257812, "pos_frac": 0.703125, "sample": [26.517154693603516, 13.356735229492188, 109.46214294433594, -15.172842025756836, -36.65619659423828, -15.64990234375, -10.919658660888672, 27.3236083984375, 91.46832275390625, 27.809974670410156, 10.445114135742188, 37.03107452392578, 106.42449188232422, 106.58036804199219, 35.93501281738281, 24.136186599731445, 104.93661499023438, -36.78565979003906, -10.310794830322266, 116.97425079345703, 15.37103271484375, 124.30064392089844, 1.67645263671875, -35.39276123046875, 102.95125579833984, 0.7719230651855469, 118.51469421386719, 4.096660614013672, -1.0119209289550781, 124.17793273925781, -0.7211761474609375, 29.026941299438477, -22.262981414794922, -110.10577392578125, 135.13958740234375, -3.855224609375, 4.778650283813477, 144.15823364257812, 11.07574462890625, 19.921775817871094, 2.0696239471435547, 16.157516479492188, -38.42169952392578, 21.91579818725586, 61.28839111328125, -33.059059143066406, 50.53036117553711, -3.6585960388183594, -51.421722412109375, -80.56376647949219, 92.31448364257812, 18.04705810546875, 4.8506622314453125, 14.571273803710938, 121.8787612915039, 116.10212707519531, 93.06478118896484, 116.28909301757812, 120.1575927734375, 6.276947021484375, 0.8615455627441406, -47.301353454589844, 112.26763153076172, -6.285987854003906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000289.npy"}
{"epoch": 0.436885865457294, "step": 290, "batch_size": 64, "mean": 40.3026123046875, "std": 61.6275520324707, "min": -105.57525634765625, "p10": -12.888119316101072, "median": 31.743335723876953, "p90": 126.25508270263673, "max": 132.99658203125, "pos_frac": 0.75, "sample": [78.70005798339844, 30.640968322753906, 107.8497085571289, 132.99658203125, 49.47660827636719, 9.099599838256836, -13.693628311157227, -6.535379409790039, 2.98284912109375, 132.57672119140625, 54.96168518066406, 21.585803985595703, 30.13446807861328, -75.9260482788086, 40.02410888671875, -1.20196533203125, -11.008598327636719, -21.696083068847656, -10.453121185302734, -10.166465759277344, 54.774131774902344, 36.181602478027344, 127.10855102539062, 129.6303253173828, 28.03376007080078, 23.893455505371094, 7.867404937744141, -2.378570556640625, 8.060216903686523, -85.14002990722656, 123.16559600830078, 130.50213623046875, 113.77562713623047, 43.96892166137695, 50.816986083984375, 128.85037231445312, 123.16732025146484, 78.4111099243164, 39.028663635253906, -77.7589111328125, 115.30766296386719, 101.82747650146484, 12.579036712646484, -104.71156311035156, -1.4835891723632812, -5.797088623046875, -4.28972053527832, 131.574951171875, 119.84969329833984, 113.81670379638672, 20.28665542602539, 42.9801025390625, 2.9950408935546875, 14.883285522460938, -105.57525634765625, 114.36550903320312, 15.029212951660156, 124.26365661621094, 32.845703125, 37.783592224121094, 87.64764404296875, 28.679168701171875, 60.24559020996094, 1.9572219848632812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000290.npy"}
{"epoch": 0.4383975812547241, "step": 291, "batch_size": 64, "mean": 42.05046081542969, "std": 58.07929992675781, "min": -111.38766479492188, "p10": -18.381506729125977, "median": 17.88071060180664, "p90": 125.2741828918457, "max": 166.533447265625, "pos_frac": 0.75, "sample": [136.24420166015625, 125.36253356933594, 19.092300415039062, -24.381942749023438, 28.814071655273438, 18.904327392578125, -17.454246520996094, 0.17332077026367188, 123.9248046875, 4.922733306884766, 83.68140411376953, 81.27600860595703, -20.71567153930664, 0.43504905700683594, 117.70364379882812, 113.5038070678711, -24.50688934326172, -9.550209045410156, 15.514663696289062, 106.7291030883789, -3.437164306640625, 11.48345947265625, 30.548171997070312, 41.63928985595703, 57.42465591430664, 105.8692626953125, 105.70819091796875, -0.90130615234375, 128.89242553710938, 3.5591888427734375, 56.823570251464844, 131.6017303466797, -1.3124504089355469, 16.20879364013672, -15.896026611328125, 5.255701065063477, 91.11016845703125, -11.88531494140625, 129.11428833007812, 12.089332580566406, -2.0620040893554688, 125.06803131103516, -111.38766479492188, 14.061073303222656, 127.25172424316406, 35.75981140136719, 50.55049133300781, 116.08378601074219, 77.3590087890625, 166.533447265625, -18.77890396118164, 13.670761108398438, -32.185054779052734, 16.857093811035156, -45.610450744628906, 5.195552825927734, 16.442703247070312, 121.79639434814453, 8.687149047851562, 84.35340118408203, 84.11282348632812, 59.543853759765625, 4.369384765625, -0.0119171142578125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000291.npy"}
{"epoch": 0.4399092970521542, "step": 292, "batch_size": 64, "mean": 34.47936248779297, "std": 61.154842376708984, "min": -115.46902465820312, "p10": -34.99910202026366, "median": 22.253341674804688, "p90": 113.41948852539063, "max": 156.43585205078125, "pos_frac": 0.703125, "sample": [-98.67378234863281, -8.895957946777344, 96.53094482421875, 0.27176856994628906, 128.23849487304688, 63.790802001953125, 39.158748626708984, 156.43585205078125, 99.91797637939453, -18.919403076171875, -47.04612731933594, 4.004179000854492, 8.894243240356445, 11.065406799316406, -56.831787109375, -1.488616943359375, 66.1776123046875, 41.80503845214844, 81.14277648925781, 12.468963623046875, 35.69474792480469, 93.10067749023438, 85.57234191894531, -38.583595275878906, -11.17266845703125, -3.2205429077148438, -1.5728874206542969, 114.3045425415039, -26.635284423828125, 91.54808807373047, 23.94953155517578, 99.09839630126953, 6.292741775512695, -6.5360260009765625, -0.0489959716796875, -107.43363189697266, 94.79046630859375, -115.46902465820312, 0.28057098388671875, 7.033866882324219, -6.5081329345703125, 111.35436248779297, 97.38156127929688, -8.479928970336914, 128.67391967773438, 93.83827209472656, 5.9891204833984375, 57.96240997314453, 35.356842041015625, 6.245635986328125, 129.1766357421875, 103.41936492919922, 2.7449588775634766, 103.45948791503906, 61.004581451416016, 19.09876251220703, 66.63259887695312, 30.958911895751953, 124.50044250488281, 23.4697265625, 21.036956787109375, -1.1697998046875, -48.816253662109375, 130.30825805664062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000292.npy"}
{"epoch": 0.4414210128495843, "step": 293, "batch_size": 64, "mean": 48.75517272949219, "std": 67.24690246582031, "min": -126.57273864746094, "p10": -21.49643287658691, "median": 33.47398376464844, "p90": 126.79587020874024, "max": 161.5994873046875, "pos_frac": 0.78125, "sample": [4.521831512451172, 28.058731079101562, 8.324993133544922, 126.26277923583984, 85.63983154296875, 25.41665267944336, -23.35672378540039, 19.64429473876953, 97.58943939208984, 4.87286376953125, 127.02433776855469, -7.939390182495117, 128.5096893310547, 100.68628692626953, 105.20405578613281, 4.940711975097656, 116.25725555419922, 76.25041198730469, 111.04791259765625, 110.62775421142578, 96.091796875, -35.12234115600586, -10.631843566894531, -80.36000061035156, 101.24054718017578, 0.743072509765625, 88.89987182617188, -48.256412506103516, 15.569801330566406, 147.34646606445312, 0.7718925476074219, 138.47305297851562, 139.6954345703125, -91.08827209472656, 107.26860046386719, 4.808307647705078, 123.6994857788086, -126.57273864746094, -5.564075469970703, 13.132980346679688, 27.220718383789062, 126.12501525878906, -74.00677490234375, -10.447723388671875, -4.331634521484375, 1.5614242553710938, 38.88923645019531, 104.58647155761719, 161.5994873046875, 21.815475463867188, 113.93212890625, 1.301910400390625, 58.517616271972656, 5.023326873779297, 120.58773803710938, -17.15575408935547, 115.34010314941406, 16.95355224609375, 85.6827392578125, 128.14334106445312, 44.67979431152344, -6.737888336181641, 105.70249938964844, 125.61892700195312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000293.npy"}
{"epoch": 0.4429327286470144, "step": 294, "batch_size": 64, "mean": 32.19460678100586, "std": 53.512062072753906, "min": -100.17200469970703, "p10": -16.54956150054931, "median": 12.249507904052734, "p90": 122.82835083007812, "max": 152.481201171875, "pos_frac": 0.734375, "sample": [122.18370056152344, 22.577423095703125, 42.05986022949219, 11.70867919921875, 115.96649169921875, 18.00542449951172, 128.92306518554688, -27.662826538085938, -35.365447998046875, 4.442756652832031, 70.06187438964844, -100.17200469970703, -9.859949111938477, 22.096527099609375, 16.21958351135254, 74.15184020996094, 123.10462951660156, 9.858680725097656, 125.32388305664062, 17.580177307128906, 0.1805572509765625, 12.790336608886719, 2.3357925415039062, 67.76959991455078, -5.09124755859375, 132.8353729248047, -4.282161712646484, 70.95416259765625, -2.0086631774902344, 148.82623291015625, 113.5181884765625, -33.93148422241211, 1.4035282135009766, 152.481201171875, -3.876363754272461, 0.6494903564453125, -8.320381164550781, -2.7605857849121094, -33.640716552734375, 34.99835205078125, 7.806983947753906, 1.04010009765625, 120.41504669189453, -0.7122917175292969, 4.456426620483398, 56.17052459716797, 17.099273681640625, 38.028079986572266, 5.734825134277344, 128.51980590820312, -19.41653823852539, 48.5999641418457, 90.44729614257812, 4.382049560546875, 4.846784591674805, 43.56517028808594, 2.1531829833984375, -0.6151180267333984, 18.814071655273438, 5.3242645263671875, 78.63931274414062, -24.747615814208984, 41.923770904541016, -8.056243896484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000294.npy"}
{"epoch": 0.4444444444444444, "step": 295, "batch_size": 64, "mean": 30.745624542236328, "std": 57.79365539550781, "min": -99.73434448242188, "p10": -32.18832550048827, "median": 12.512977600097656, "p90": 114.39968490600586, "max": 153.09979248046875, "pos_frac": 0.71875, "sample": [122.31201171875, 11.318923950195312, -4.4113616943359375, -8.547958374023438, 52.052284240722656, 95.21907043457031, 114.46735382080078, 34.38084030151367, 20.645858764648438, 23.493057250976562, 131.36929321289062, -15.860851287841797, 153.09979248046875, 14.400672912597656, 8.63534927368164, 9.006126403808594, 1.6673583984375, 5.197227478027344, -22.545455932617188, 87.07838439941406, 123.97213745117188, -89.60737609863281, -2.898578643798828, 6.922147750854492, 61.3779296875, 13.121612548828125, 5.73845100402832, -13.014278411865234, 111.97744750976562, 8.036140441894531, -39.67405700683594, -2.8829421997070312, 132.40126037597656, 114.24179077148438, 19.732955932617188, -69.02200317382812, 109.40325927734375, 30.028583526611328, 53.25214767456055, 129.2516326904297, 60.311859130859375, -99.73434448242188, 11.904342651367188, -75.05474853515625, 91.21284484863281, 46.190895080566406, -36.32098388671875, -1.8745498657226562, -40.82665252685547, 68.53034973144531, -8.66712760925293, 2.9059600830078125, 18.69916534423828, 6.3117828369140625, 4.433393478393555, 6.033315658569336, 89.446533203125, -9.646453857421875, 24.198387145996094, 1.588897705078125, 82.2865219116211, 89.35748291015625, 105.80814361572266, -4.711273193359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000295.npy"}
{"epoch": 0.4459561602418745, "step": 296, "batch_size": 64, "mean": 38.593257904052734, "std": 62.072792053222656, "min": -106.92141723632812, "p10": -44.957034683227526, "median": 24.783071517944336, "p90": 123.46815185546878, "max": 150.75494384765625, "pos_frac": 0.796875, "sample": [67.77445983886719, 115.2220458984375, 110.75187683105469, 11.069517135620117, 1.1148757934570312, 18.111251831054688, 3.2541027069091797, 141.67788696289062, 2.7779083251953125, 93.43319702148438, 24.12673568725586, 14.040157318115234, -60.14733123779297, 130.19537353515625, 1.2177658081054688, 46.4615478515625, 115.28181457519531, 77.7318344116211, 11.732986450195312, 37.55619812011719, 0.81591796875, 38.22647476196289, 3.1805877685546875, -67.34991455078125, -28.56060791015625, -18.096603393554688, 1.828765869140625, 45.216064453125, 15.201820373535156, 18.67560577392578, 9.367488861083984, -69.54533386230469, 33.7296142578125, -10.20892333984375, 150.75494384765625, 60.425628662109375, 2.6619491577148438, -49.64188766479492, 108.49140930175781, -71.18299865722656, 42.18242645263672, 86.66545104980469, 131.4398193359375, -106.92141723632812, 111.79710388183594, 25.439407348632812, 85.0784912109375, 89.48403930664062, 94.98668670654297, 78.42507934570312, 116.77391052246094, 19.904693603515625, 126.33711242675781, 14.299009323120117, 145.7542724609375, -9.059051513671875, 10.716018676757812, 131.79830932617188, -3.3344650268554688, -34.02571105957031, -67.70292663574219, 97.1004638671875, 81.58203125, 63.873451232910156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000296.npy"}
{"epoch": 0.4474678760393046, "step": 297, "batch_size": 64, "mean": 42.7445068359375, "std": 54.03239822387695, "min": -93.26148986816406, "p10": -10.197447776794432, "median": 30.57942008972168, "p90": 123.8336524963379, "max": 134.734130859375, "pos_frac": 0.78125, "sample": [111.29946899414062, -36.066925048828125, 5.735099792480469, 94.6044692993164, 36.538055419921875, 20.827804565429688, 12.146202087402344, 121.61312103271484, -2.7336578369140625, -93.26148986816406, 95.20285034179688, -1.98223876953125, 22.929264068603516, 39.78559494018555, 2.697286605834961, 128.37277221679688, 16.563079833984375, 65.48905944824219, 14.079166412353516, 5.1034393310546875, 38.01216125488281, 8.383285522460938, -34.15605163574219, -65.7617416381836, -6.155614852905273, 124.78530883789062, 32.40784454345703, 7.5390777587890625, 104.45065307617188, 5.321739196777344, 119.32048797607422, 128.3529815673828, -5.6638946533203125, -8.274126052856445, 11.286808013916016, 20.33492088317871, 10.21240234375, -11.021728515625, 104.38497924804688, 57.883750915527344, 53.919464111328125, 30.926471710205078, 97.75611877441406, 59.58265686035156, 25.805763244628906, 85.09232330322266, 30.23236846923828, 0.5835695266723633, -12.183364868164062, 125.81262969970703, -0.7952537536621094, 102.94850158691406, 118.2266845703125, 89.34645080566406, -2.7902183532714844, 62.4849853515625, 127.62141418457031, 75.59490966796875, 0.1784820556640625, 134.734130859375, -11.539894104003906, 34.33295822143555, 131.79983520507812, 75.3919677734375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000297.npy"}
{"epoch": 0.4489795918367347, "step": 298, "batch_size": 64, "mean": 31.96074676513672, "std": 64.25730895996094, "min": -119.45063781738281, "p10": -49.37774276733398, "median": 12.029895782470703, "p90": 122.75492782592774, "max": 142.1756591796875, "pos_frac": 0.734375, "sample": [30.792396545410156, -84.0785903930664, 142.1756591796875, 11.458526611328125, 131.19686889648438, 6.9205474853515625, 98.21046447753906, 0.6858367919921875, -59.24482727050781, 105.81815338134766, 131.39468383789062, 118.8332748413086, -58.411651611328125, 37.80743408203125, 6.625984191894531, 51.409053802490234, 83.68191528320312, -41.399253845214844, 95.05866241455078, 87.25994873046875, 2.899669647216797, -0.3412513732910156, 99.1786117553711, 30.687164306640625, 29.034210205078125, 3.91766357421875, -85.66233825683594, 10.170034408569336, 79.57088470458984, 122.53096008300781, 1.779367446899414, 131.72250366210938, -6.5650482177734375, 12.601264953613281, 27.763721466064453, 98.38275146484375, 6.64106559753418, 124.56893157958984, 4.412729263305664, 2.5277633666992188, -4.198143005371094, -2.184682846069336, -2.0701141357421875, 17.103866577148438, 19.020660400390625, 138.64962768554688, 121.35025024414062, 1.2204360961914062, -17.704288482666016, 5.307373046875, 43.41847610473633, 10.9161376953125, -42.11052703857422, -52.49226379394531, -9.493240356445312, 104.34300994873047, -85.6787109375, 92.42576599121094, -19.576690673828125, -119.45063781738281, 122.85091400146484, 16.69076919555664, 1.3050460815429688, 113.82882690429688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000298.npy"}
{"epoch": 0.4504913076341648, "step": 299, "batch_size": 64, "mean": 28.163633346557617, "std": 65.83679962158203, "min": -110.49800872802734, "p10": -63.24825897216796, "median": 20.097213745117188, "p90": 113.82326965332032, "max": 156.31570434570312, "pos_frac": 0.640625, "sample": [110.01251220703125, 87.00436401367188, -12.19758415222168, -16.629554748535156, 6.692216873168945, 102.44393920898438, 20.031021118164062, 56.64899826049805, 156.31570434570312, -0.9300670623779297, -54.79437255859375, 73.02107238769531, -110.49800872802734, 2.6227493286132812, -91.84391784667969, -17.658203125, 20.163406372070312, 119.51451110839844, 56.92301940917969, -54.352699279785156, 123.54522705078125, -26.234500885009766, 39.0875129699707, 66.07606506347656, -66.87135314941406, -92.91410064697266, -5.123527526855469, 0.08472061157226562, 134.0423126220703, -1.3906173706054688, 98.80825805664062, -1.4470443725585938, 59.628623962402344, -95.01176452636719, 5.174102783203125, 8.4119873046875, 61.11103820800781, 115.45645141601562, 15.210136413574219, 24.238121032714844, 59.138397216796875, -50.938194274902344, 107.3652572631836, 108.46560668945312, 128.87416076660156, -68.1760482788086, 13.003469467163086, 91.6561050415039, 88.54611206054688, 79.96026611328125, -0.1278667449951172, 29.039466857910156, 82.45712280273438, -77.09255981445312, 8.24346923828125, 84.14834594726562, -30.067901611328125, 101.95004272460938, -0.7273578643798828, 22.31817626953125, -1.1167335510253906, 42.38739776611328, -54.7367057800293, 123.53176879882812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000299.npy"}
{"epoch": 0.4520030234315949, "step": 300, "batch_size": 64, "mean": 36.18071746826172, "std": 62.76785659790039, "min": -99.47361755371094, "p10": -52.09706573486328, "median": 33.64421463012695, "p90": 123.53611068725586, "max": 142.2881317138672, "pos_frac": 0.734375, "sample": [36.92833709716797, 109.42668914794922, 4.8521881103515625, 84.15379333496094, -1.0635414123535156, 32.73176574707031, 98.7799301147461, 12.039031982421875, -0.05023002624511719, -87.97049713134766, 135.61253356933594, 26.13469696044922, 90.9278335571289, 131.0078582763672, 131.78799438476562, 61.74208068847656, 126.43122863769531, -53.945831298828125, -27.808692932128906, 96.77611541748047, 2.491792678833008, 90.70294952392578, 41.83100891113281, -12.904876708984375, -70.75540161132812, 83.98809814453125, 110.43193054199219, -17.595603942871094, 35.179466247558594, 22.147106170654297, 5.455074310302734, -99.47361755371094, 87.01286315917969, -0.8059310913085938, -92.7139892578125, 49.1692008972168, 34.556663513183594, 132.47203063964844, 2.7373809814453125, 123.52765655517578, 54.890106201171875, 38.65347671508789, 3.84051513671875, 142.2881317138672, 98.99826049804688, -47.78327941894531, 0.8420448303222656, 99.34469604492188, -3.81488037109375, 71.73991394042969, 66.13931274414062, 9.69797134399414, 22.965545654296875, 39.079002380371094, -10.089454650878906, -71.50883483886719, 28.743789672851562, 112.52365112304688, 72.58474731445312, -69.7338638305664, 15.096214294433594, 0.48571014404296875, 123.53973388671875, -18.90375518798828], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000300.npy"}
{"epoch": 0.45351473922902497, "step": 301, "batch_size": 64, "mean": 32.39488983154297, "std": 60.45381546020508, "min": -124.4368667602539, "p10": -29.331337356567378, "median": 26.047266960144043, "p90": 119.93441696166992, "max": 159.73971557617188, "pos_frac": 0.734375, "sample": [105.37852478027344, 11.580757141113281, 44.28575897216797, 33.12314224243164, 39.927947998046875, -31.843994140625, 101.22489166259766, 129.61180114746094, 97.09796142578125, 8.854080200195312, 38.97211456298828, 28.462045669555664, 21.01422119140625, -11.146595001220703, -108.34297180175781, 15.18798828125, -3.0079345703125, 78.38812255859375, 88.4775390625, 4.913341522216797, -48.13883972167969, 49.86494827270508, 124.7352294921875, -6.661933898925781, 50.960594177246094, 2.7986412048339844, -4.307416915893555, 122.77142333984375, 89.2965087890625, -20.65705108642578, -14.03873062133789, 36.344764709472656, 63.1022834777832, 109.506591796875, -53.477569580078125, 0.2028350830078125, -22.432540893554688, 159.73971557617188, -61.68888854980469, 23.632488250732422, -20.97609519958496, 32.94976043701172, 31.592819213867188, -86.31343078613281, 3.8104400634765625, 87.09149932861328, 107.15525817871094, 51.095123291015625, 18.802261352539062, 120.79782104492188, 11.260602951049805, 75.67666625976562, 68.56376647949219, -23.46847152709961, -20.472503662109375, 6.714405059814453, 120.92649841308594, 117.91980743408203, 37.65254211425781, 5.670082092285156, 20.913558959960938, -124.4368667602539, 131.87452697753906, 4.761116027832031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000301.npy"}
{"epoch": 0.455026455026455, "step": 302, "batch_size": 64, "mean": 24.005821228027344, "std": 57.314208984375, "min": -127.86943054199219, "p10": -42.10930175781249, "median": 17.656198501586914, "p90": 102.5805877685547, "max": 150.0008544921875, "pos_frac": 0.71875, "sample": [59.32781982421875, 141.97927856445312, 2.6185150146484375, 150.0008544921875, 58.68017578125, 30.019113540649414, 103.91305541992188, -13.631080627441406, 128.0067138671875, 15.27410888671875, 39.29876708984375, 36.14740753173828, 5.568048477172852, 6.453403472900391, 78.99176025390625, 115.80616760253906, 2.96044921875, 12.787220001220703, 9.632240295410156, 7.9128570556640625, 20.585430145263672, -87.28060150146484, -89.66596984863281, 23.293212890625, 25.958847045898438, -8.804418563842773, 35.12633514404297, 51.97046661376953, -7.498447418212891, -5.249359130859375, -21.732505798339844, 52.45466613769531, 69.71116638183594, 3.580728530883789, -48.265907287597656, 110.14865112304688, 129.09487915039062, -32.287750244140625, 11.967010498046875, 34.43356704711914, 52.52558135986328, 89.34740447998047, 1.1844673156738281, 46.34510803222656, 37.45062255859375, 48.03173828125, -117.155517578125, 0.10198974609375, -19.24908447265625, 49.863746643066406, 88.09895324707031, 51.39497375488281, -5.093254089355469, -0.05938720703125, 18.523181915283203, -45.64958190917969, -33.84864807128906, 99.47149658203125, -8.55352783203125, -54.64065170288086, -127.86943054199219, 16.789215087890625, 78.69322204589844, 11.383054733276367], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000302.npy"}
{"epoch": 0.4565381708238851, "step": 303, "batch_size": 64, "mean": 44.49180603027344, "std": 64.84395599365234, "min": -100.14112854003906, "p10": -49.39916801452636, "median": 52.28899383544922, "p90": 125.45113830566406, "max": 148.96343994140625, "pos_frac": 0.6875, "sample": [-9.750747680664062, 66.49089050292969, 84.339111328125, 24.16166114807129, 131.54437255859375, 104.91941833496094, 91.59017944335938, -52.83015823364258, 105.9079818725586, 54.22306823730469, 118.69688415527344, -40.80564880371094, 31.415325164794922, -6.51530647277832, -85.283935546875, -65.00235748291016, -59.18907165527344, 118.2012939453125, -100.14112854003906, -21.753005981445312, -0.12292098999023438, 14.090055465698242, 148.96343994140625, 78.26451110839844, -41.393524169921875, 87.40625, 90.91144561767578, -66.00609588623047, -3.3931121826171875, 30.143951416015625, 65.0698471069336, -0.941162109375, 6.859184265136719, 50.35491943359375, 20.21569061279297, -14.359127044677734, 125.04606628417969, 17.672752380371094, 95.38665771484375, 94.04876708984375, 20.422523498535156, 112.41243743896484, 60.01805114746094, 130.83560180664062, -2.4515609741210938, 102.99186706542969, 134.07542419433594, 41.318084716796875, 56.28123474121094, 119.3534927368164, -12.851486206054688, -6.136713027954102, -54.83784866333008, 30.342864990234375, 129.50521850585938, 100.16053771972656, 22.13615608215332, 72.12220764160156, 73.44328308105469, 115.13780212402344, 125.62474060058594, -37.54383850097656, 87.88670349121094, 138.79232788085938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000303.npy"}
{"epoch": 0.4580498866213152, "step": 304, "batch_size": 64, "mean": 43.97296142578125, "std": 50.342872619628906, "min": -114.8714599609375, "p10": -6.21152992248535, "median": 41.084556579589844, "p90": 115.83663558959962, "max": 143.4862518310547, "pos_frac": 0.828125, "sample": [121.87417602539062, -3.2700881958007812, 59.61968994140625, 126.50904846191406, -7.778839111328125, 60.26139450073242, 34.57996368408203, 63.19263458251953, -31.039121627807617, 15.766700744628906, 8.127838134765625, 95.67637634277344, 86.23672485351562, 39.88270568847656, 100.57608032226562, 43.28845977783203, -2.3498458862304688, 37.22129440307617, 87.27548217773438, -114.8714599609375, 15.143692016601562, 107.01753234863281, 42.286407470703125, 24.261672973632812, -4.417045593261719, 7.8933868408203125, 7.231828689575195, 52.94712448120117, 38.0426025390625, 110.93534851074219, 47.414207458496094, 3.443532943725586, 135.69998168945312, 42.29602813720703, 11.641029357910156, -49.6368408203125, 83.42887115478516, 74.35415649414062, 24.72087860107422, 70.8724365234375, 1.702920913696289, 0.5960140228271484, 143.4862518310547, 47.467376708984375, 1.3406009674072266, 10.794769287109375, 57.256080627441406, 71.19319915771484, 117.93718719482422, 130.622802734375, -19.208404541015625, -2.4289016723632812, 87.99142456054688, 0.7749710083007812, 93.62430572509766, -6.980594635009766, 47.46670913696289, 126.43953704833984, 82.09092712402344, 28.183975219726562, 22.688125610351562, -20.425052642822266, 37.226043701171875, 88.0731201171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000304.npy"}
{"epoch": 0.4595616024187453, "step": 305, "batch_size": 64, "mean": 33.213111877441406, "std": 62.712013244628906, "min": -126.09513854980469, "p10": -52.895094299316405, "median": 28.127531051635742, "p90": 114.4311149597168, "max": 142.45994567871094, "pos_frac": 0.75, "sample": [114.36375427246094, 16.15509796142578, 12.984237670898438, -66.98418426513672, -11.975112915039062, 0.6041297912597656, 137.64944458007812, -61.34942626953125, 34.10227966308594, 121.03132629394531, 29.38895034790039, 137.33897399902344, -26.500579833984375, 102.95918273925781, -96.58155822753906, -15.848089218139648, 15.490997314453125, 6.427021026611328, -60.81500244140625, 74.25555419921875, 11.751667022705078, 17.058021545410156, 14.945777893066406, 53.24815368652344, 42.414817810058594, 64.55096435546875, 72.8233642578125, 1.97088623046875, 46.6728515625, -126.09513854980469, 4.94732666015625, 42.813865661621094, 113.830322265625, 96.85941314697266, -18.770538330078125, 36.58362579345703, 142.45994567871094, -14.8046875, 20.32184410095215, 57.873626708984375, 15.098838806152344, 9.307548522949219, 63.05914306640625, 133.85107421875, 101.37567901611328, -4.5557403564453125, 13.000747680664062, -53.55357360839844, 129.38238525390625, 72.63020324707031, 97.69847869873047, 28.73790740966797, 27.517154693603516, 102.16934204101562, 62.709381103515625, 90.72518920898438, -48.027740478515625, 67.24691772460938, -89.08883666992188, 114.4599838256836, -16.76801109313965, 14.652727127075195, 101.21563720703125, -51.358642578125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000305.npy"}
{"epoch": 0.46107331821617537, "step": 306, "batch_size": 64, "mean": 34.661781311035156, "std": 49.734092712402344, "min": -105.3082275390625, "p10": -12.159325027465815, "median": 23.781859397888184, "p90": 109.06050338745119, "max": 154.22442626953125, "pos_frac": 0.796875, "sample": [129.70492553710938, 78.29642486572266, 22.843168258666992, 126.58258056640625, 14.949050903320312, 87.75618743896484, 8.264190673828125, 12.861980438232422, 25.882524490356445, 58.203697204589844, 111.3826675415039, 93.12144470214844, -41.32044219970703, 16.449085235595703, -13.90896987915039, -105.3082275390625, 111.56034851074219, 31.145843505859375, 49.394474029541016, -1.4437522888183594, 2.8579940795898438, 95.09239959716797, 98.03752899169922, 9.605093002319336, 7.570600509643555, 7.141197204589844, -19.458600997924805, 101.85193634033203, 4.592613220214844, 57.33430480957031, -27.973342895507812, 103.64212036132812, -8.076820373535156, 6.650300979614258, 66.83544158935547, 32.400535583496094, 114.07505798339844, 139.4763641357422, 10.0408935546875, 0.0346832275390625, 37.78832244873047, 4.781835556030273, 31.13547134399414, 40.26177215576172, -42.60620880126953, 12.964088439941406, -2.1577816009521484, 47.60969543457031, 7.030174255371094, 34.01692199707031, 71.70085906982422, 58.175270080566406, 1.6493377685546875, -1.4403533935546875, 24.720550537109375, 16.129154205322266, 26.042076110839844, -7.01715087890625, 33.71581268310547, 154.22442626953125, -35.04582214355469, 21.2685546875, 66.38555908203125, -1.1260604858398438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000306.npy"}
{"epoch": 0.46258503401360546, "step": 307, "batch_size": 64, "mean": 16.271129608154297, "std": 61.87257766723633, "min": -104.25379943847656, "p10": -53.52414894104004, "median": 5.364692687988281, "p90": 107.93981094360352, "max": 136.88560485839844, "pos_frac": 0.5625, "sample": [99.27010345458984, -47.216346740722656, 8.302810668945312, -24.56597137451172, -102.210205078125, 58.59953308105469, 136.88560485839844, -35.89368438720703, 126.69732666015625, 92.90315246582031, -39.282958984375, 20.1842041015625, 14.098316192626953, 124.95128631591797, -104.25379943847656, 42.135894775390625, 31.077880859375, 0.26959991455078125, 71.60263061523438, -34.52821350097656, 43.93695068359375, -84.25867462158203, -3.421173095703125, -75.28085327148438, 13.218238830566406, -65.33265686035156, -42.252479553222656, 41.888526916503906, 55.99818420410156, -1.1110458374023438, -20.180381774902344, 16.761579513549805, -0.003162384033203125, -53.65959548950195, -53.208106994628906, 57.68548583984375, -45.88520812988281, -1.7808761596679688, 105.56465911865234, -10.059707641601562, 2.2202014923095703, -12.451286315917969, 0.7491455078125, 120.751220703125, 119.09196472167969, -32.33204650878906, 28.498958587646484, 108.95773315429688, 76.59260559082031, 46.59789276123047, -76.26376342773438, 34.607994079589844, 98.9483871459961, -1.339609146118164, 55.28289794921875, 48.30156707763672, 2.42657470703125, 123.50918579101562, -51.12177276611328, -40.92076873779297, -47.06796646118164, 27.686370849609375, 92.6435317993164, -1.6635398864746094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000307.npy"}
{"epoch": 0.46409674981103555, "step": 308, "batch_size": 64, "mean": 32.88367462158203, "std": 57.91510009765625, "min": -126.90171813964844, "p10": -37.104077911376955, "median": 23.026569366455078, "p90": 116.43410110473636, "max": 138.0029296875, "pos_frac": 0.75, "sample": [122.42498016357422, 59.320159912109375, 47.42933654785156, -35.17759323120117, 107.02503967285156, 83.089111328125, -21.49353790283203, 34.49760437011719, 46.47821044921875, -3.9430313110351562, 118.9175033569336, 110.63949584960938, 120.81114959716797, 13.819087982177734, 2.71356201171875, -56.1469612121582, 67.51306915283203, 108.85549926757812, 54.56066131591797, 24.542858123779297, 68.89749145507812, -14.684242248535156, 122.01752471923828, 49.147098541259766, 14.200325012207031, 46.56050109863281, -2.847116470336914, 70.94977569580078, 58.14985656738281, -84.77078247070312, 138.0029296875, 5.793113708496094, 21.51028060913086, -37.92971420288086, 53.061370849609375, 2.7416915893554688, 92.69403839111328, -1.1092414855957031, 1.2215118408203125, -126.90171813964844, 7.91339111328125, 2.1085586547851562, 73.82713317871094, 129.20591735839844, 11.664016723632812, 28.907546997070312, 54.342735290527344, -13.77911376953125, 81.36854553222656, 5.083789825439453, 103.64845275878906, 10.407501220703125, -13.175537109375, 62.94275665283203, 103.5032958984375, 18.254539489746094, 122.66942596435547, -50.80482482910156, -56.502777099609375, 7.172018051147461, 5.890966415405273, -81.19288635253906, 10.874130249023438, -2.3554821014404297], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000308.npy"}
{"epoch": 0.4656084656084656, "step": 309, "batch_size": 64, "mean": 33.654327392578125, "std": 51.640380859375, "min": -90.32525634765625, "p10": -15.959812164306637, "median": 27.513251304626465, "p90": 100.25930328369141, "max": 145.21690368652344, "pos_frac": 0.71875, "sample": [81.9732666015625, 140.5150909423828, 5.859687805175781, 100.70922088623047, 52.16345977783203, 0.8970718383789062, 84.60973358154297, -10.79180908203125, 66.77633666992188, 11.297187805175781, 34.465911865234375, -1.0132064819335938, -45.76218032836914, -8.995412826538086, 17.874221801757812, -85.72804260253906, 34.79217529296875, 92.30519104003906, -37.86511993408203, -6.665712356567383, 3.3224525451660156, 52.38649368286133, 87.72747802734375, -0.8585529327392578, 73.850830078125, 84.56428527832031, -0.68017578125, 28.927274703979492, 22.480186462402344, 3.4857635498046875, 13.951047897338867, 26.099227905273438, -17.341079711914062, 105.42021179199219, 78.48872375488281, 50.90700912475586, 33.501136779785156, 3.0679054260253906, -9.966699600219727, 59.318565368652344, -5.8249664306640625, 123.24822998046875, 75.54034423828125, 8.362632751464844, 145.21690368652344, -90.32525634765625, 99.2094955444336, 47.54012680053711, 21.241392135620117, 44.80858612060547, -12.736854553222656, 4.121574401855469, 39.7822265625, 56.78471374511719, 126.64598846435547, 24.85253143310547, 45.40929412841797, 85.96183013916016, -38.29850769042969, -44.10845947265625, -3.3311767578125, -12.728729248046875, 57.023399353027344, 129.4123077392578], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000309.npy"}
{"epoch": 0.4671201814058957, "step": 310, "batch_size": 64, "mean": 20.566661834716797, "std": 54.627559661865234, "min": -73.0595932006836, "p10": -41.58157196044921, "median": 10.121971130371094, "p90": 98.69631423950197, "max": 153.96853637695312, "pos_frac": 0.59375, "sample": [100.62388610839844, -1.9583663940429688, 19.47475814819336, 41.13050842285156, 26.907363891601562, -64.81803131103516, -16.217071533203125, 75.05216979980469, 93.00821685791016, -4.196035385131836, -35.62440490722656, -31.609743118286133, 51.48485565185547, 125.720458984375, 30.96624755859375, -3.7658119201660156, 0.3849945068359375, 67.78146362304688, 139.7334442138672, -26.593833923339844, 3.255420684814453, -34.32475662231445, -73.0595932006836, 153.96853637695312, -28.298301696777344, -34.366600036621094, 44.40363311767578, -21.976280212402344, 17.56329345703125, 94.19864654541016, -46.022735595703125, 65.64112854003906, -3.8506622314453125, 91.02688598632812, -44.1346435546875, 48.04534912109375, 3.0046844482421875, 61.11772155761719, 11.631126403808594, 52.44355773925781, 3.4361343383789062, 13.620468139648438, -63.854888916015625, -29.024520874023438, 45.45679473876953, 36.75898742675781, -3.6488876342773438, 52.86009216308594, -3.516754150390625, -9.547454833984375, 8.612815856933594, 116.07627868652344, 103.4952621459961, -67.18608093261719, 1.2021503448486328, -34.20758819580078, -12.107147216796875, 119.47801208496094, 29.696813583374023, -16.856033325195312, -60.26954650878906, 93.70232391357422, 20.51886558532715, 23.818748474121094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000310.npy"}
{"epoch": 0.46863189720332576, "step": 311, "batch_size": 64, "mean": 33.13947677612305, "std": 54.22047805786133, "min": -139.2346649169922, "p10": -21.219057464599604, "median": 25.39160919189453, "p90": 110.976725769043, "max": 206.02456665039062, "pos_frac": 0.765625, "sample": [102.73722076416016, 2.8550872802734375, 16.330543518066406, -15.451347351074219, -23.690933227539062, -23.900375366210938, 40.338260650634766, 114.50794219970703, -2.527799606323242, 77.97063446044922, 7.6136474609375, -139.2346649169922, 2.8804969787597656, 69.31824493408203, 91.62826538085938, -1.2749500274658203, 82.84855651855469, 42.850135803222656, 116.723876953125, 39.7142333984375, 4.316650390625, 60.401611328125, 0.5806751251220703, 33.51995086669922, 80.27179718017578, 10.660518646240234, -29.682472229003906, -5.542331695556641, 17.032913208007812, 26.6766357421875, -24.423309326171875, 136.09347534179688, 153.8722686767578, 38.368370056152344, 7.239860534667969, 56.465667724609375, 4.550209045410156, 55.55997085571289, 16.8232421875, 50.52964782714844, 206.02456665039062, 24.106582641601562, -5.810523986816406, 115.00474548339844, -46.42255401611328, 12.258926391601562, 15.699478149414062, -9.238571166992188, 41.779937744140625, 14.419761657714844, 133.8800811767578, 46.61663818359375, 52.00018310546875, -10.740827560424805, 29.856521606445312, 80.30811309814453, 1.2264842987060547, -41.834495544433594, -13.817886352539062, 28.292678833007812, 0.7872543334960938, 31.69879913330078, 68.14384460449219, 51.1343994140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000311.npy"}
{"epoch": 0.47014361300075586, "step": 312, "batch_size": 64, "mean": 43.01353454589844, "std": 49.50448226928711, "min": -77.49250030517578, "p10": -10.577713394165038, "median": 35.456756591796875, "p90": 114.89601821899416, "max": 137.11489868164062, "pos_frac": 0.859375, "sample": [28.348308563232422, 2.7962989807128906, -16.812362670898438, 9.401138305664062, 27.952621459960938, 4.026374816894531, 50.97549057006836, -1.1434707641601562, 22.676803588867188, 19.786949157714844, 134.201416015625, 86.13600158691406, 77.60802459716797, 83.0854263305664, 2.3466339111328125, 78.52108001708984, 9.512306213378906, 0.8819713592529297, 7.554101943969727, 63.968292236328125, 127.77959442138672, 17.62688446044922, 14.349178314208984, 14.576454162597656, 37.980499267578125, 110.35953521728516, 11.428031921386719, -9.418106079101562, 29.22265625, 69.97966003417969, 132.5277099609375, -58.020957946777344, 64.8559341430664, 56.33562469482422, -77.49250030517578, 39.05216598510742, 97.10723114013672, 19.93722152709961, 87.75233459472656, 136.46168518066406, 137.11489868164062, 1.2602195739746094, 30.62903594970703, 62.26853942871094, 97.81842041015625, 32.933013916015625, 41.997032165527344, 116.84022521972656, 52.57926940917969, -11.074687957763672, -18.910842895507812, 75.7293930053711, 125.86248016357422, 75.97709655761719, 29.00177001953125, 82.003662109375, 72.84325408935547, 46.36199951171875, -73.48770141601562, 79.2108154296875, 65.0176010131836, -18.966720581054688, 26.929458618164062, 8.70367431640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000312.npy"}
{"epoch": 0.47165532879818595, "step": 313, "batch_size": 64, "mean": 22.5720272064209, "std": 53.27481460571289, "min": -84.23394012451172, "p10": -39.41127815246582, "median": 11.638311386108398, "p90": 97.98922729492189, "max": 159.3764190673828, "pos_frac": 0.6875, "sample": [1.804494857788086, 25.51213836669922, 49.347023010253906, 11.120967864990234, 12.155654907226562, 41.44133758544922, 3.300434112548828, 26.402341842651367, 64.82339477539062, -67.04850769042969, 1.6046142578125, 43.09861755371094, 20.914108276367188, 34.888336181640625, 4.269599914550781, -4.668739318847656, 1.7578353881835938, -36.35712814331055, -5.8492889404296875, -3.822052001953125, 48.15459442138672, 81.80711364746094, 3.967203140258789, 143.30491638183594, -84.23394012451172, -32.70863342285156, -6.5191192626953125, 13.6851806640625, 95.94367980957031, 30.780433654785156, 136.03900146484375, 98.86589050292969, -60.414825439453125, -3.491342544555664, 107.81179809570312, 31.50347137451172, 2.0637893676757812, 5.943016052246094, -2.5006179809570312, 130.607177734375, 26.26131820678711, 21.528018951416016, 46.82633972167969, -6.956512451171875, 159.3764190673828, 0.46378326416015625, -11.304367065429688, -17.12706756591797, 20.847396850585938, 0.2420654296875, 95.72319793701172, 29.932769775390625, 46.33789825439453, -1.2716140747070312, -1.4101791381835938, -74.24928283691406, 73.79375457763672, -81.80714416503906, 110.94657135009766, -40.72019958496094, -43.16610336303711, 25.285858154296875, 4.544769287109375, 95.2079849243164], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000313.npy"}
{"epoch": 0.47316704459561604, "step": 314, "batch_size": 64, "mean": 45.03102111816406, "std": 56.65116500854492, "min": -63.061424255371094, "p10": -7.794742012023925, "median": 36.855628967285156, "p90": 128.67604064941406, "max": 159.81591796875, "pos_frac": 0.75, "sample": [25.407257080078125, 8.897724151611328, 18.97644805908203, 135.34527587890625, 47.088985443115234, 81.64435577392578, -1.8075103759765625, 102.31458282470703, -63.061424255371094, 42.257354736328125, -0.593780517578125, -5.224826812744141, 150.95904541015625, 37.61286926269531, -4.362030029296875, 159.81591796875, 46.840572357177734, 36.106346130371094, 118.0149154663086, 8.728294372558594, 25.592864990234375, -1.9764404296875, 113.68511962890625, 125.61151123046875, 119.40327453613281, -61.872833251953125, 25.509483337402344, 135.61801147460938, 6.461029052734375, 1.0561676025390625, 86.81813049316406, 49.709228515625, 26.092105865478516, -5.13970947265625, 1.8360023498535156, 38.75667190551758, 74.103759765625, -23.914199829101562, 12.846963882446289, 81.80908203125, -7.586164474487305, -0.504058837890625, 129.98941040039062, 66.1712646484375, 115.8616714477539, 13.736114501953125, 81.48080444335938, 117.9195327758789, -7.884132385253906, 134.0499267578125, 42.26496887207031, 122.548828125, 77.96627807617188, 11.984619140625, 85.90019226074219, 19.964859008789062, -19.6583251953125, 37.60491180419922, 11.542388916015625, -1.1312789916992188, 45.644126892089844, -56.37379455566406, -49.556575775146484, 133.08322143554688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000314.npy"}
{"epoch": 0.47467876039304613, "step": 315, "batch_size": 64, "mean": 34.609683990478516, "std": 51.31373596191406, "min": -98.51448059082031, "p10": -25.370175170898438, "median": 33.85580062866211, "p90": 101.57727508544923, "max": 140.93304443359375, "pos_frac": 0.71875, "sample": [-9.942878723144531, 79.24578857421875, 108.28132629394531, -4.548540115356445, -0.3063831329345703, 36.475364685058594, -98.51448059082031, -1.4113750457763672, -63.0419921875, 27.640830993652344, -60.548255920410156, 32.370216369628906, -3.4684104919433594, 124.0080795288086, -1.3698005676269531, 48.43273162841797, 24.636856079101562, -15.477806091308594, 10.163105010986328, 0.6825656890869141, 42.69830322265625, 61.75138854980469, 1.4682064056396484, 46.78614807128906, 26.300552368164062, 49.52766799926758, 70.271484375, -24.806381225585938, 37.37862014770508, 30.61787223815918, 35.34138488769531, 58.47599411010742, 99.19816589355469, 16.937759399414062, 28.96484375, 91.45066833496094, 94.90446472167969, -25.611801147460938, -38.746063232421875, 51.11308288574219, 102.59689331054688, -5.962158203125, 93.06623840332031, 113.33548736572266, -23.270065307617188, 61.79620361328125, 54.290626525878906, 140.93304443359375, -40.98581314086914, 30.381515502929688, 51.23211669921875, 4.1530914306640625, 98.31100463867188, 1.1016883850097656, 76.79820251464844, 10.51446533203125, 97.92711639404297, 57.6722412109375, 115.76698303222656, -49.70384979248047, 71.71080780029297, -2.5113887786865234, 122.64825439453125, 45.88771057128906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000315.npy"}
{"epoch": 0.47619047619047616, "step": 316, "batch_size": 64, "mean": 44.60052490234375, "std": 51.72370147705078, "min": -50.26324462890625, "p10": -17.863514137268066, "median": 43.253610610961914, "p90": 121.64878387451172, "max": 141.01422119140625, "pos_frac": 0.734375, "sample": [-7.963531494140625, 25.34881591796875, 66.02066040039062, -19.380943298339844, -32.966400146484375, -8.534072875976562, 48.672271728515625, 40.29558563232422, 24.456390380859375, 116.04956817626953, 133.5960693359375, 79.19287109375, 77.4742431640625, 89.48114013671875, 20.544921875, -40.05390930175781, -50.26324462890625, 18.75780487060547, -16.60731315612793, 90.72634887695312, 16.107866287231445, 125.64144897460938, -2.6343307495117188, 6.508476257324219, 26.303512573242188, 40.30208206176758, 49.28401184082031, 5.255126953125, -9.340763092041016, 27.46881866455078, 121.99032592773438, 84.21324157714844, 90.98709106445312, 133.99708557128906, -8.852777481079102, 66.09263610839844, -18.401885986328125, 84.1921157836914, 60.33116912841797, 85.19768524169922, 117.14108276367188, 62.64209747314453, 43.50571060180664, -3.747570037841797, 120.85185241699219, 4.000986099243164, 1.1784553527832031, 141.01422119140625, -12.120513916015625, 56.91468811035156, -27.034500122070312, -30.313974380493164, 8.135040283203125, -12.641143798828125, 57.68232727050781, 43.00151062011719, -1.1183052062988281, 127.78575134277344, 91.66383361816406, 95.88237762451172, 60.86016082763672, 49.15080261230469, 130.0964813232422, 90.41191864013672], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000316.npy"}
{"epoch": 0.47770219198790626, "step": 317, "batch_size": 64, "mean": 47.49908447265625, "std": 55.03853225708008, "min": -94.28781127929688, "p10": -11.59907531738281, "median": 41.13867950439453, "p90": 133.10600738525392, "max": 151.01260375976562, "pos_frac": 0.8125, "sample": [71.3792953491211, 77.82522583007812, 53.56216812133789, 85.12322998046875, 40.821258544921875, 151.01260375976562, -21.593788146972656, 54.113121032714844, -34.99229431152344, 31.673782348632812, 2.5609970092773438, 44.5921745300293, -12.445274353027344, 11.404136657714844, 44.26959228515625, 37.62483596801758, 93.6246337890625, 37.24476623535156, 86.06339263916016, 137.14942932128906, -47.78985595703125, -4.438545227050781, 37.44834899902344, -47.89311218261719, 40.26338195800781, 34.762115478515625, 107.13890075683594, 75.94805908203125, 81.26155853271484, -94.28781127929688, 133.49391174316406, 12.583053588867188, 62.13043975830078, 139.3780975341797, 132.20089721679688, -61.86443328857422, 54.69334411621094, 1.2911643981933594, 73.62230682373047, 140.77850341796875, 16.46170425415039, 41.45610046386719, 142.41018676757812, -6.741065979003906, 23.929397583007812, 8.720447540283203, 24.340423583984375, 118.13885498046875, 11.58251953125, -2.768310546875, 77.39375305175781, 26.911434173583984, 46.34283447265625, 134.69161987304688, -1.7729034423828125, 50.91908264160156, -9.624610900878906, 59.820770263671875, 121.66735076904297, 101.97032165527344, 0.9898834228515625, 116.53279113769531, 36.522705078125, 38.31251525878906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000317.npy"}
{"epoch": 0.47921390778533635, "step": 318, "batch_size": 64, "mean": 32.58120346069336, "std": 68.05919647216797, "min": -110.84611511230469, "p10": -55.224120330810536, "median": 30.46680450439453, "p90": 138.7897689819336, "max": 159.35679626464844, "pos_frac": 0.671875, "sample": [35.08242416381836, 109.52245330810547, 12.019783020019531, -80.00601196289062, 114.19738006591797, 119.07592010498047, 31.0823974609375, -31.83648681640625, 62.792335510253906, 139.0528564453125, 20.916568756103516, -47.073402404785156, 61.836551666259766, 51.316749572753906, 126.17013549804688, -1.5167465209960938, 25.865493774414062, -45.990509033203125, 38.885860443115234, 3.6801891326904297, -15.561866760253906, 60.87926483154297, 138.55703735351562, -13.511390686035156, 9.572830200195312, 54.4473991394043, 143.47865295410156, -93.76040649414062, 108.90361022949219, 91.65879821777344, -22.54875946044922, 29.851211547851562, -43.18870544433594, 21.06760025024414, 1.375, 158.53546142578125, 159.35679626464844, -43.874534606933594, 138.88951110839844, 6.729911804199219, -58.71728515625, 43.51155090332031, 58.399986267089844, -110.84611511230469, -0.3711395263671875, 49.99909973144531, 25.06244659423828, 140.3946990966797, -0.726654052734375, 86.57174682617188, 0.9573879241943359, 144.63497924804688, 44.53242492675781, 75.70733642578125, -78.39224243164062, -23.56463623046875, 63.29774475097656, -75.53955078125, 46.81852722167969, -3.2410354614257812, 77.1835708618164, -66.32537078857422, -28.556732177734375, 38.47480010986328], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000318.npy"}
{"epoch": 0.48072562358276644, "step": 319, "batch_size": 64, "mean": 45.87800598144531, "std": 62.85170364379883, "min": -85.95919036865234, "p10": -34.675755310058584, "median": 47.87974548339844, "p90": 135.43976898193358, "max": 175.13629150390625, "pos_frac": 0.75, "sample": [48.247894287109375, 143.24200439453125, 21.13263702392578, 135.63253784179688, 27.820363998413086, 109.9334716796875, 3.298931121826172, 48.773193359375, 45.51728820800781, 47.5115966796875, 130.27418518066406, 117.56546783447266, -3.445758819580078, 0.9871139526367188, 16.298240661621094, 60.197998046875, 12.750679016113281, 151.83787536621094, 4.748485565185547, 82.55931091308594, 5.266910552978516, -27.770828247070312, 51.40022277832031, 67.24932861328125, 160.31398010253906, -7.032417297363281, 134.98997497558594, 68.48297882080078, 137.35877990722656, 134.9520721435547, -37.635009765625, -57.88661193847656, 48.348114013671875, 175.13629150390625, -7.264560699462891, 68.62319946289062, -48.51383972167969, 98.81775665283203, 72.13140106201172, -53.79609298706055, 134.8762664794922, 48.703582763671875, -14.717323303222656, -1.4658126831054688, 10.766242980957031, 13.194208145141602, -71.53315734863281, -5.2912139892578125, -56.91679382324219, 97.594970703125, -7.589542388916016, 33.66583251953125, 22.95965576171875, -16.533554077148438, 37.515357971191406, 98.55624389648438, 61.25836181640625, 67.48008728027344, 75.3174057006836, 14.722719192504883, 86.65621948242188, -85.95919036865234, 140.35107421875, 64.52555847167969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000319.npy"}
{"epoch": 0.48223733938019653, "step": 320, "batch_size": 64, "mean": 37.191551208496094, "std": 63.201751708984375, "min": -121.44386291503906, "p10": -27.24335346221924, "median": 9.792304992675781, "p90": 125.16368789672853, "max": 194.24591064453125, "pos_frac": 0.6875, "sample": [100.32327270507812, -43.6653938293457, 0.6851940155029297, 47.77442169189453, 9.155776977539062, 32.92987060546875, 127.43443298339844, 3.3228111267089844, 87.85369110107422, -1.7743511199951172, -34.21592712402344, 121.630859375, -0.37775611877441406, -0.44123077392578125, -51.92125701904297, 118.73037719726562, 68.70288848876953, 1.3888168334960938, -20.936248779296875, 99.99595642089844, -27.397619247436523, 3.240673065185547, -121.44386291503906, 0.5211353302001953, 7.341888427734375, 194.24591064453125, 148.06878662109375, 112.39948272705078, 100.74261474609375, -13.450347900390625, 51.58991241455078, -17.253896713256836, 11.203285217285156, 2.397247314453125, -0.6661205291748047, 126.6777572631836, 7.258079528808594, -26.883399963378906, -6.582828521728516, -10.837230682373047, 41.8262825012207, 143.1493682861328, 9.065391540527344, 56.65052795410156, 10.4288330078125, 58.40868377685547, 148.90919494628906, 54.07624816894531, -4.6694488525390625, -41.90345001220703, 2.5698928833007812, -65.67692565917969, 147.79266357421875, -5.718193054199219, 0.4201469421386719, 53.205787658691406, 18.43072509765625, 98.86332702636719, 90.77456665039062, 68.45592498779297, 70.26394653320312, -8.684738159179688, 119.5966796875, 106.25637817382812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000320.npy"}
{"epoch": 0.4837490551776266, "step": 321, "batch_size": 64, "mean": 42.57990264892578, "std": 63.26607894897461, "min": -108.67741394042969, "p10": -49.30181274414062, "median": 38.80641746520996, "p90": 126.51702117919922, "max": 141.8067169189453, "pos_frac": 0.78125, "sample": [106.70449829101562, 91.5893783569336, 2.3139801025390625, 106.65514373779297, 104.01707458496094, -28.75177764892578, 141.8067169189453, 84.28350067138672, 127.03477478027344, 42.64934539794922, 140.2759552001953, 34.9634895324707, 49.00492477416992, 72.09941864013672, -108.67741394042969, 6.463863372802734, 24.422264099121094, 67.77664947509766, 33.399169921875, 125.3414535522461, 43.00590515136719, 22.045852661132812, -53.47978973388672, -0.2525482177734375, 46.536338806152344, -72.37294006347656, -76.86940002441406, -75.07630920410156, 65.4369888305664, 33.05390930175781, 32.80122375488281, 49.91701889038086, 110.86427307128906, 77.37660217285156, 2.5157852172851562, 99.00628662109375, 63.954925537109375, -1.2310810089111328, 29.696102142333984, 115.8737564086914, 2.492267608642578, -2.445148468017578, 31.76260757446289, 106.57817840576172, -0.4938812255859375, -98.01374816894531, 127.02083587646484, 140.63323974609375, 123.26919555664062, 64.28826904296875, 7.776374816894531, 5.654916763305664, 31.579444885253906, 130.5530242919922, 121.08035278320312, -9.100471496582031, 27.86255645751953, 64.48799896240234, 140.75961303710938, -41.48974609375, 13.605857849121094, 1.2308731079101562, 52.49561309814453, -52.64984130859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000321.npy"}
{"epoch": 0.4852607709750567, "step": 322, "batch_size": 64, "mean": 50.791160583496094, "std": 58.9851188659668, "min": -105.34213256835938, "p10": -9.228665924072265, "median": 44.983009338378906, "p90": 134.49225463867188, "max": 146.14779663085938, "pos_frac": 0.859375, "sample": [-19.002256393432617, 68.92337799072266, 136.96261596679688, 20.517684936523438, 104.43516540527344, 13.594200134277344, 146.14779663085938, 103.96800994873047, 131.12046813964844, 27.935209274291992, 37.619873046875, 51.89337158203125, 129.07420349121094, 142.0350799560547, 56.522613525390625, 3.1952667236328125, 16.709854125976562, 55.40818786621094, 4.9171142578125, 11.348388671875, 77.99378204345703, 7.305412292480469, -79.12676239013672, 131.1065216064453, 119.06320190429688, 11.314937591552734, 32.33918762207031, 26.218698501586914, 0.26969146728515625, -5.4795989990234375, -47.74400329589844, 97.84477233886719, 57.24873352050781, 126.43901062011719, 102.72958374023438, 59.88169860839844, 7.707466125488281, 140.16256713867188, 6.5802459716796875, -8.970443725585938, 70.45094299316406, 47.62957000732422, 1.1087112426757812, 133.1282501220703, 42.336448669433594, 134.08624267578125, 134.666259765625, 80.07725524902344, -23.830459594726562, -105.34213256835938, 37.92631912231445, 13.715255737304688, 27.15549087524414, 53.10387420654297, 136.9604949951172, 31.074356079101562, 49.182708740234375, 26.046951293945312, 66.45669555664062, -50.48029708862305, 139.4060516357422, -9.339332580566406, 89.81538391113281, 19.08824348449707], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000322.npy"}
{"epoch": 0.48677248677248675, "step": 323, "batch_size": 64, "mean": 32.55350112915039, "std": 70.82257080078125, "min": -124.22259521484375, "p10": -42.50440521240235, "median": 9.382007598876953, "p90": 137.94090270996094, "max": 165.447021484375, "pos_frac": 0.6875, "sample": [0.02577972412109375, -95.98861694335938, 69.38206481933594, 32.70805358886719, -7.825752258300781, 111.26031494140625, 20.086578369140625, 14.66067123413086, 7.364173889160156, 132.14035034179688, 37.42555236816406, -17.709379196166992, 137.77609252929688, 4.862358093261719, 131.39962768554688, -39.515933990478516, -51.014129638671875, 78.041259765625, 113.9887924194336, 1.7074432373046875, 43.34123229980469, 138.01153564453125, 4.009552001953125, 10.45428466796875, 150.6290283203125, -1.6920547485351562, -42.80122375488281, 77.42694091796875, 0.8223094940185547, -19.01470947265625, 165.447021484375, -41.81182861328125, 10.980148315429688, -70.7047119140625, 141.66470336914062, 9.688240051269531, -90.78082275390625, 136.19139099121094, 62.41575622558594, 0.11360359191894531, 8.698783874511719, 71.12123107910156, -6.2484893798828125, -6.675743103027344, 153.16163635253906, 153.7074432373047, 141.53976440429688, -124.22259521484375, 128.3631591796875, 2.3383102416992188, -121.78246307373047, -1.8510894775390625, 90.60865783691406, 9.027557373046875, -17.29437255859375, 83.90876007080078, 76.78164672851562, 52.30377197265625, 9.075775146484375, -4.284294128417969, 34.03341293334961, 4.577171325683594, -1.9590301513671875, -16.67064666748047], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000323.npy"}
{"epoch": 0.48828420256991684, "step": 324, "batch_size": 64, "mean": 47.79777526855469, "std": 57.81854248046875, "min": -95.28006744384766, "p10": -8.737258148193357, "median": 35.398895263671875, "p90": 130.54032135009766, "max": 161.01858520507812, "pos_frac": 0.828125, "sample": [81.730712890625, 45.462554931640625, 27.73455810546875, 7.9877777099609375, -72.94927978515625, 147.0850067138672, 58.615386962890625, 62.337730407714844, -18.399551391601562, -3.264739990234375, 0.5204238891601562, 118.61311340332031, 80.3236083984375, 46.59208297729492, 105.65019226074219, 21.660633087158203, 131.66873168945312, -31.07860565185547, 79.6070327758789, 1.10565185546875, 40.61451721191406, 22.3834228515625, 4.725818634033203, 82.20491790771484, 32.831443786621094, 18.283782958984375, -2.3335952758789062, 29.89641571044922, 161.01858520507812, 13.295612335205078, 16.629173278808594, -95.28006744384766, 51.606414794921875, 127.90736389160156, 18.284759521484375, 61.17367172241211, 25.96758270263672, -10.076925277709961, 85.18663024902344, 96.37117767333984, 37.966346740722656, -40.29554748535156, 157.6328125, 16.666404724121094, 140.215576171875, 21.77718734741211, 0.758880615234375, 126.0059814453125, 55.30098342895508, 126.16796112060547, 149.90150451660156, 0.8496971130371094, 5.712131500244141, 104.78587341308594, 135.15716552734375, -0.9164581298828125, 41.226219177246094, -10.060073852539062, -5.650688171386719, 49.40481948852539, 125.26023864746094, 118.70158386230469, 29.81036376953125, 0.9850673675537109], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000324.npy"}
{"epoch": 0.4897959183673469, "step": 325, "batch_size": 64, "mean": 42.006649017333984, "std": 66.41486358642578, "min": -168.04612731933594, "p10": -35.54867477416991, "median": 39.76740264892578, "p90": 125.50550994873048, "max": 172.37289428710938, "pos_frac": 0.75, "sample": [36.83652114868164, 128.12237548828125, 2.265575408935547, 108.78885650634766, -39.79094696044922, 90.2628173828125, 88.40431213378906, 54.963417053222656, 61.05683898925781, 58.77478790283203, 59.96083068847656, 116.64652252197266, -5.090394973754883, -25.650039672851562, 26.444595336914062, -7.350986480712891, 126.42068481445312, 95.80413818359375, 113.1461410522461, -104.20637512207031, 150.6002197265625, -24.99054718017578, 121.72441864013672, 1.3382205963134766, -0.7946205139160156, 146.88430786132812, 16.594318389892578, 123.37010192871094, -51.06758499145508, 58.30719757080078, -72.23786926269531, 42.20804214477539, 100.56363677978516, 27.786712646484375, -65.38159942626953, 39.8621826171875, -51.69832992553711, 117.32536315917969, 35.89490509033203, 109.1658935546875, 66.21087646484375, 20.19044303894043, 8.725738525390625, 172.37289428710938, -168.04612731933594, 52.53148651123047, 33.6251220703125, 21.5150146484375, 106.50090026855469, 90.30895233154297, 143.1953887939453, -9.337272644042969, 39.67262268066406, -8.802154541015625, 4.619422912597656, -12.335441589355469, 49.108680725097656, 53.830074310302734, 134.39454650878906, 46.67918395996094, 11.194686889648438, 22.449058532714844, 15.803535461425781, -17.246749877929688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000325.npy"}
{"epoch": 0.491307634164777, "step": 326, "batch_size": 64, "mean": 19.965797424316406, "std": 64.87031555175781, "min": -110.08921813964844, "p10": -68.75663375854492, "median": 12.64312744140625, "p90": 111.20279922485352, "max": 145.4899139404297, "pos_frac": 0.59375, "sample": [-11.758186340332031, -5.068935394287109, 6.136768341064453, 29.01723861694336, 38.9583740234375, -96.91552734375, 143.56822204589844, 81.86962890625, -4.614471435546875, 77.16814422607422, -49.19383239746094, 39.076663970947266, 20.55074691772461, 1.135589599609375, 61.151309967041016, 104.47686767578125, 5.854337692260742, 77.27198791503906, -5.63323974609375, 32.28346252441406, 117.56320190429688, -0.523773193359375, 25.788833618164062, 71.65312194824219, -69.1478500366211, -30.39300537109375, -110.08921813964844, -15.873809814453125, -21.29669189453125, -5.3311767578125, 145.4899139404297, -95.27881622314453, 19.790260314941406, -4.430362701416016, 45.56707000732422, 12.1007080078125, 5.165733337402344, -24.722640991210938, -50.20056915283203, -14.91455078125, 93.93363952636719, 136.2647705078125, -78.48124694824219, -4.4074859619140625, 112.87579345703125, 31.42560577392578, 22.03732681274414, -100.43647003173828, 99.8175277709961, 136.72422790527344, -2.4560890197753906, -3.9971790313720703, -54.19425964355469, 111.70972442626953, 5.88330078125, 21.842025756835938, -101.3756103515625, 67.318115234375, 64.64802551269531, 14.68212890625, 110.01997375488281, -67.84379577636719, 13.185546875, 102.38383483886719], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000326.npy"}
{"epoch": 0.4928193499622071, "step": 327, "batch_size": 64, "mean": 33.48351287841797, "std": 67.56505584716797, "min": -101.69248962402344, "p10": -61.43244285583496, "median": 25.329540252685547, "p90": 124.54356842041018, "max": 166.31289672851562, "pos_frac": 0.6875, "sample": [49.171775817871094, -58.539695739746094, -15.788093566894531, 110.765380859375, 48.18467712402344, 26.12335205078125, -75.97160339355469, 68.05309295654297, -91.73185729980469, 79.93058776855469, 146.39329528808594, 26.84726333618164, -79.51161193847656, 113.46888732910156, 24.535728454589844, 136.94956970214844, 108.21255493164062, 16.215377807617188, 33.12882995605469, -62.67219161987305, 0.9470958709716797, 8.261905670166016, 53.43970489501953, 161.87158203125, 136.26564025878906, 15.727706909179688, 20.83184814453125, 89.16361236572266, 67.42398834228516, -3.4154205322265625, 15.779342651367188, 2.887470245361328, -85.20909118652344, 106.13101196289062, 99.22364807128906, -6.649147033691406, 42.38380432128906, 28.181381225585938, -4.7650146484375, 2.9285659790039062, 144.1571044921875, 19.719100952148438, -99.40330505371094, -3.5541534423828125, 96.10701751708984, 166.31289672851562, -43.79673767089844, 102.40357971191406, 127.38639068603516, 41.59638214111328, 81.02629852294922, -8.897111892700195, 22.43796157836914, 12.165992736816406, -34.50626754760742, -17.6214542388916, 117.91031646728516, 112.09874725341797, 26.4461669921875, -0.9112281799316406, -12.931907653808594, -101.69248962402344, 50.73039245605469, -9.413719177246094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000327.npy"}
{"epoch": 0.4943310657596372, "step": 328, "batch_size": 64, "mean": 41.296051025390625, "std": 61.506168365478516, "min": -114.51302337646484, "p10": -21.94052352905273, "median": 22.262252807617188, "p90": 131.18711853027344, "max": 165.68502807617188, "pos_frac": 0.75, "sample": [98.94834899902344, 4.749849319458008, 8.198369979858398, 43.78166198730469, 124.29229736328125, 4.867338180541992, -8.763439178466797, 96.04939270019531, 76.32169342041016, 15.867080688476562, 8.106498718261719, 107.4481201171875, 132.4526824951172, 9.147380828857422, 53.9337158203125, -49.14653015136719, 139.90689086914062, 26.588254928588867, 22.030227661132812, -6.092327117919922, 22.026901245117188, -7.694122314453125, 6.285261154174805, 25.236263275146484, -16.702224731445312, -33.254615783691406, 146.77130126953125, 10.860313415527344, -73.92169189453125, 28.25713348388672, -114.51302337646484, 122.2909927368164, 98.49751281738281, 3.74029541015625, 131.201171875, -24.185508728027344, 20.654037475585938, -5.28863525390625, 165.68502807617188, 22.98187255859375, 120.77299499511719, 132.1163787841797, -3.3167877197265625, -37.774452209472656, 22.494277954101562, 108.31832122802734, -2.387075424194336, 8.499191284179688, -3.5511016845703125, 19.171619415283203, 74.72228240966797, -43.87773132324219, 80.18302917480469, 25.448593139648438, 7.56280517578125, 75.73924255371094, 104.7233657836914, -0.9556484222412109, 131.15432739257812, 31.44251251220703, 99.07970428466797, 0.8573532104492188, 145.8468780517578, 109.06116485595703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000328.npy"}
{"epoch": 0.4958427815570673, "step": 329, "batch_size": 64, "mean": 42.78675079345703, "std": 62.36354446411133, "min": -86.52472686767578, "p10": -31.423982429504395, "median": 33.07331848144531, "p90": 141.82959442138673, "max": 167.82794189453125, "pos_frac": 0.734375, "sample": [22.326086044311523, 16.108631134033203, 49.406341552734375, 59.155792236328125, -51.85484313964844, 37.51494598388672, 167.82794189453125, -12.582244873046875, 80.6798095703125, 34.16228485107422, 105.52590942382812, 57.071014404296875, -8.290075302124023, 99.30633544921875, 0.4602527618408203, 23.55712127685547, 31.984352111816406, 108.89332580566406, -7.277740478515625, 38.77500915527344, -1.90899658203125, 26.410097122192383, 146.96217346191406, 42.06304931640625, -86.52472686767578, 144.33880615234375, 149.83534240722656, 104.72064971923828, 44.35931396484375, 40.603172302246094, 141.84323120117188, 124.50885009765625, -45.369789123535156, 73.3548583984375, 1.465200424194336, 7.6357269287109375, 131.4671630859375, 0.7130565643310547, 141.7977752685547, 88.31062316894531, 150.43218994140625, 122.7373046875, 6.165340423583984, -60.96965026855469, -3.459308624267578, 133.18408203125, 1.2536811828613281, 25.77918243408203, 14.608047485351562, 38.294925689697266, -24.384151458740234, 37.416961669921875, 0.0706634521484375, -31.24850845336914, 143.84982299804688, 86.70941162109375, -31.49918556213379, 70.05937957763672, -11.110092163085938, -38.297393798828125, 21.087966918945312, -1.7768630981445312, -0.3423023223876953, -39.545257568359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000329.npy"}
{"epoch": 0.4973544973544973, "step": 330, "batch_size": 64, "mean": 47.11699676513672, "std": 75.85592651367188, "min": -144.60964965820312, "p10": -56.57777481079101, "median": 49.24773406982422, "p90": 139.93245849609377, "max": 166.425048828125, "pos_frac": 0.71875, "sample": [2.7905349731445312, -23.399925231933594, -14.425909042358398, -129.5226287841797, 48.41133117675781, -144.60964965820312, 119.38972473144531, 28.398582458496094, -23.27794647216797, 134.047607421875, 135.5958251953125, 63.12677764892578, -86.75331115722656, -0.9930801391601562, 14.37454605102539, 133.40505981445312, 151.65646362304688, -72.78791046142578, 166.425048828125, 143.62884521484375, 82.85391235351562, -0.3629302978515625, 116.71066284179688, 2.2918453216552734, 132.12327575683594, 111.25892639160156, -0.0489959716796875, 14.813079833984375, -50.727691650390625, 6.050376892089844, 57.45184326171875, 122.47999572753906, -4.5742950439453125, 52.58189010620117, -65.69630432128906, 116.81844329833984, 97.24707794189453, 141.791015625, 74.22821044921875, 24.50429916381836, -4.940925598144531, 103.07569885253906, 55.5970458984375, 109.14557647705078, 135.40667724609375, 129.3079833984375, 1.8868694305419922, 58.00936508178711, 2.0839080810546875, 39.44755554199219, -2.7470321655273438, 18.46246337890625, 142.46798706054688, 128.8155517578125, -27.001846313476562, 21.5421142578125, 50.084136962890625, -59.08495330810547, 151.59356689453125, -80.10659790039062, 97.79270935058594, 114.85714721679688, 6.543388366699219, 145.97491455078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000330.npy"}
{"epoch": 0.4988662131519274, "step": 331, "batch_size": 64, "mean": 57.430519104003906, "std": 71.29207611083984, "min": -147.0289764404297, "p10": -12.524482727050778, "median": 64.70857238769531, "p90": 145.1454849243164, "max": 201.7230987548828, "pos_frac": 0.796875, "sample": [150.52357482910156, 9.209457397460938, 20.467655181884766, 99.7977294921875, 120.85385131835938, 150.13607788085938, 84.53077697753906, 77.36453247070312, 98.80262756347656, 4.441070556640625, -3.551929473876953, 160.2167205810547, 140.0453338623047, 142.78489685058594, 119.93727111816406, 7.677558898925781, 155.0440673828125, 29.975357055664062, 36.16957092285156, 146.15716552734375, 103.43757629394531, 3.851318359375, 7.6565399169921875, 4.26885986328125, -15.52978515625, 140.59786987304688, 98.81594848632812, 201.7230987548828, 19.073822021484375, 111.95623779296875, 86.83799743652344, 57.1303825378418, 142.16357421875, 77.35870361328125, -8.650535583496094, -1.6606369018554688, 58.127235412597656, -88.60881042480469, -84.62482452392578, 71.28990936279297, -65.9951171875, 152.8395538330078, 32.44627380371094, -2.636383056640625, 33.505699157714844, -147.0289764404297, 96.03174591064453, -1.316131591796875, -14.184745788574219, 21.19700050354004, 14.305488586425781, 87.65383911132812, 84.94345092773438, 33.72096252441406, 51.763755798339844, 5.655220031738281, -0.6762800216674805, 130.71414184570312, 74.76929473876953, -82.78305053710938, 140.7866668701172, 85.15596008300781, 72.87442016601562, 136.0126190185547], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000331.npy"}
{"epoch": 0.5003779289493575, "step": 332, "batch_size": 64, "mean": 47.13706588745117, "std": 63.5460319519043, "min": -92.2254638671875, "p10": -4.379666709899902, "median": 24.592300415039062, "p90": 143.48009033203127, "max": 170.28729248046875, "pos_frac": 0.828125, "sample": [-54.63412094116211, 1.8402481079101562, 5.503440856933594, 1.090728759765625, 20.71299934387207, 103.72559356689453, 15.309333801269531, 32.21171569824219, 21.076705932617188, 33.239566802978516, 0.43795013427734375, 170.28729248046875, 97.72407531738281, 71.76019287109375, 139.38372802734375, 17.668426513671875, 16.254806518554688, 0.22491836547851562, 88.93728637695312, 46.033203125, 78.62874603271484, 10.461761474609375, -92.2254638671875, 141.47122192382812, 98.11148071289062, 114.82933807373047, -64.93221282958984, 20.927574157714844, 2.4405670166015625, -3.8640213012695312, 43.70098114013672, 144.34103393554688, 18.19970703125, -3.9495086669921875, 0.619171142578125, 26.001998901367188, -1.6712779998779297, 161.988525390625, -74.3780288696289, 6.94774055480957, 100.30252075195312, 133.0758819580078, 97.23233795166016, -4.564020156860352, 27.81768798828125, 3.1457748413085938, 86.19999694824219, 116.91383361816406, 107.47676086425781, 56.12846374511719, 157.1313934326172, 20.398284912109375, 61.84462356567383, -1.116943359375, 29.91944122314453, -20.083023071289062, 146.6156768798828, 5.795719146728516, 16.085617065429688, 145.78457641601562, 168.5894775390625, 120.38102722167969, -37.92293167114258, 23.182601928710938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000332.npy"}
{"epoch": 0.5018896447467877, "step": 333, "batch_size": 64, "mean": 49.97686767578125, "std": 64.06417083740234, "min": -103.70185089111328, "p10": -9.807615852355955, "median": 41.427223205566406, "p90": 147.49130401611328, "max": 180.06671142578125, "pos_frac": 0.796875, "sample": [16.152542114257812, -10.76392936706543, 8.768417358398438, 74.34884643554688, 16.37945556640625, -0.5084686279296875, 28.915157318115234, 51.940948486328125, 5.104972839355469, -52.91796875, 40.41735076904297, 68.53251647949219, 87.51522064208984, 106.10065460205078, 133.4894256591797, 21.083160400390625, 54.87518310546875, 148.56332397460938, 166.806396484375, 154.3335418701172, -7.5762176513671875, 70.31871032714844, 45.52648162841797, 33.89762878417969, 55.38761901855469, -54.742469787597656, -26.22867202758789, 50.2691650390625, 4.764133453369141, 42.437095642089844, 146.86663818359375, 142.99810791015625, -17.04693603515625, 43.44725799560547, 97.653076171875, 24.005382537841797, 164.40145874023438, 3.4907150268554688, 13.13253402709961, 19.329177856445312, -1.4210529327392578, 52.5947265625, -103.70185089111328, 3.9600353240966797, 55.024391174316406, -71.15760803222656, 180.06671142578125, 28.792091369628906, -4.2447509765625, 24.616416931152344, 3.1418075561523438, -6.158721923828125, 95.34162139892578, 147.75901794433594, 9.862157821655273, -0.8826522827148438, 87.16632843017578, 7.554191589355469, 138.70033264160156, 55.693634033203125, 169.41015625, 90.19635772705078, 134.108154296875, 130.63037109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000333.npy"}
{"epoch": 0.5034013605442177, "step": 334, "batch_size": 64, "mean": 36.44950866699219, "std": 73.49918365478516, "min": -128.938232421875, "p10": -56.692934417724594, "median": 22.299198150634766, "p90": 140.99223480224612, "max": 177.49337768554688, "pos_frac": 0.6875, "sample": [-64.31407928466797, -128.938232421875, 57.598487854003906, 79.81624603271484, 70.11703491210938, -12.3138427734375, -115.13151550292969, 111.15221405029297, 20.60865020751953, -114.65674591064453, 3.982940673828125, 177.49337768554688, -11.183662414550781, 79.27859497070312, 2.6343841552734375, 95.81523132324219, -14.25054931640625, -23.753679275512695, 24.267822265625, 90.44212341308594, 80.47006225585938, 157.1855010986328, 25.63603973388672, 75.63162231445312, 86.45953369140625, -65.14385986328125, 26.37383270263672, 0.6190052032470703, 127.88674926757812, 6.608707427978516, -2.362337112426758, 0.5406494140625, 151.98269653320312, -101.39312744140625, 85.75211334228516, 146.1519775390625, 59.828857421875, 7.66668701171875, 23.98974609375, -1.7534255981445312, 4.7806549072265625, -23.738784790039062, -6.5390625, -11.253005981445312, 143.44613647460938, -1.837921142578125, 1.1344223022460938, 14.238487243652344, -38.91026306152344, 108.22540283203125, 118.7682876586914, 58.20984649658203, -66.19842529296875, 94.16617584228516, -37.56531524658203, 20.266632080078125, 35.12767028808594, 151.66543579101562, -3.537139892578125, 16.994203567504883, 131.4383544921875, 92.59078216552734, 175.23345947265625, 135.26646423339844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000334.npy"}
{"epoch": 0.5049130763416477, "step": 335, "batch_size": 64, "mean": 41.03268051147461, "std": 70.47026062011719, "min": -110.35504150390625, "p10": -38.06456413269043, "median": 28.136869430541992, "p90": 148.8904022216797, "max": 167.50381469726562, "pos_frac": 0.75, "sample": [101.14991760253906, 69.84908294677734, -43.60417175292969, 167.50381469726562, 24.169158935546875, 66.29367065429688, 32.985504150390625, 1.297109603881836, 165.56533813476562, 94.66160583496094, 156.6571807861328, 142.3082275390625, 31.028854370117188, 63.65179443359375, 152.35299682617188, 145.34503173828125, 5.754966735839844, 34.99333953857422, 26.212539672851562, 148.93765258789062, -45.36983108520508, 5.5562896728515625, 53.298614501953125, 155.60977172851562, -35.81315612792969, 4.484344482421875, 87.09501647949219, -38.005943298339844, -31.717493057250977, 68.57465362548828, 138.6317596435547, 7.854515075683594, 4.725593566894531, 24.94927215576172, 121.11691284179688, -38.08968734741211, 148.7801513671875, -11.11300277709961, 27.270057678222656, 149.37599182128906, 7.672454833984375, -5.406282424926758, 32.30223083496094, 21.41913604736328, 58.824310302734375, 104.04071044921875, -13.656379699707031, -98.37174987792969, 29.003681182861328, 1.3743648529052734, 51.83772277832031, -35.9119758605957, -9.528160095214844, -110.33491516113281, -110.35504150390625, -55.31268310546875, 129.11264038085938, 92.00631713867188, 1.707916259765625, 73.729736328125, -11.177772521972656, 10.301261901855469, 13.664413452148438, 64.822265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000335.npy"}
{"epoch": 0.5064247921390779, "step": 336, "batch_size": 64, "mean": 41.96696853637695, "std": 69.19522857666016, "min": -99.78763580322266, "p10": -37.64541320800781, "median": 25.43833827972412, "p90": 147.3147720336914, "max": 200.85679626464844, "pos_frac": 0.734375, "sample": [-52.152587890625, 1.406219482421875, 129.53533935546875, 31.139144897460938, 182.80728149414062, 143.2401580810547, 64.80497741699219, -3.8897323608398438, 149.06103515625, 1.5739459991455078, 3.5158863067626953, -7.455591201782227, 150.1676025390625, -6.487274169921875, 131.2452392578125, 10.410839080810547, 6.2768707275390625, 97.09159851074219, 161.77896118164062, -85.66250610351562, -37.43132019042969, 57.937042236328125, 153.04620361328125, 50.92302703857422, -4.150245666503906, 113.97073364257812, 34.1201057434082, -99.78763580322266, -20.802490234375, -65.14348602294922, 154.1682586669922, -56.159019470214844, 57.738502502441406, 23.688735961914062, 18.332386016845703, 7.958433151245117, 0.9561004638671875, -2.1570663452148438, 56.53003692626953, 132.7787322998047, 7.67877197265625, -15.997520446777344, 55.11140441894531, 9.473236083984375, 127.13511657714844, 29.910812377929688, 68.14585876464844, -1.453460693359375, 27.18794059753418, 99.33560180664062, 6.4591217041015625, 0.519775390625, 74.35942840576172, 137.60047912597656, 200.85679626464844, 70.70014953613281, -54.428802490234375, 7.967315673828125, 62.78205108642578, -15.157989501953125, 33.30682373046875, 89.87631225585938, 17.329299926757812, -37.73716735839844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000336.npy"}
{"epoch": 0.5079365079365079, "step": 337, "batch_size": 64, "mean": 33.47923278808594, "std": 63.927181243896484, "min": -130.32583618164062, "p10": -32.626854705810544, "median": 27.502155303955078, "p90": 123.12420349121095, "max": 170.74618530273438, "pos_frac": 0.703125, "sample": [6.340751647949219, -69.8713150024414, -130.32583618164062, -27.133522033691406, 35.719879150390625, 112.2236328125, -12.64232063293457, 120.73225402832031, 35.434417724609375, -7.840324401855469, 125.868896484375, 14.852256774902344, 43.27552032470703, -119.002685546875, -34.98114013671875, -11.207389831542969, 28.837997436523438, 75.58334350585938, 34.42346954345703, -18.2791748046875, 9.270195007324219, 0.9534740447998047, -21.980445861816406, -56.98087692260742, 102.65097045898438, 85.87708282470703, -9.3310546875, 44.44279479980469, 52.562255859375, 157.4102020263672, 26.16631317138672, 4.161399841308594, 137.70982360839844, -3.9266357421875, 121.45138549804688, 68.51101684570312, 123.90191650390625, -4.0534210205078125, 0.9615020751953125, 79.13716125488281, 142.80630493164062, 8.682518005371094, 123.84112548828125, 16.730194091796875, 60.9327392578125, -15.990554809570312, 74.4896240234375, 7.6603851318359375, 34.601654052734375, 25.340682983398438, 76.67332458496094, 170.74618530273438, -2.3926773071289062, -13.464141845703125, 33.250396728515625, 94.54164123535156, 6.919504165649414, -36.34070587158203, 74.39231872558594, 73.70638275146484, -94.56285095214844, 15.04693603515625, 107.12911987304688, 37.02688980102539], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000337.npy"}
{"epoch": 0.509448223733938, "step": 338, "batch_size": 64, "mean": 36.299102783203125, "std": 68.54258728027344, "min": -143.4274444580078, "p10": -27.178560256958008, "median": 21.014738082885742, "p90": 138.83305358886722, "max": 160.8790283203125, "pos_frac": 0.765625, "sample": [-17.386016845703125, 27.698406219482422, 144.64794921875, 127.23562622070312, 134.4655303955078, 118.87149047851562, -91.78341674804688, 117.81343841552734, 17.3980712890625, 140.70484924316406, 156.72509765625, 8.281715393066406, 1.3729476928710938, 26.848831176757812, 3.029022216796875, -83.40272521972656, 8.93524169921875, -100.177978515625, -2.4819717407226562, 3.2149429321289062, -16.80040740966797, 134.323974609375, 24.631404876708984, 1.6066417694091797, 10.342872619628906, 85.71519470214844, -143.4274444580078, 88.54134368896484, 78.90884399414062, 63.75751495361328, 7.251274108886719, 63.037841796875, 14.968585968017578, 27.412700653076172, 77.47724151611328, 61.84294128417969, 4.790386199951172, 12.243255615234375, 160.8790283203125, 8.701065063476562, 12.255256652832031, 108.9825439453125, -27.280372619628906, 101.68785095214844, 1.7614707946777344, 25.573532104492188, -43.18547821044922, -24.90692138671875, 34.700721740722656, -26.940998077392578, 26.671791076660156, 149.19456481933594, 4.312564849853516, 2.0596466064453125, 60.2640266418457, 66.89517211914062, -16.410629272460938, 144.91073608398438, 124.3345718383789, -67.71853637695312, -19.47574806213379, 34.40802764892578, 146.72984313964844, -3.8963356018066406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000338.npy"}
{"epoch": 0.5109599395313681, "step": 339, "batch_size": 64, "mean": 39.29054260253906, "std": 62.83485794067383, "min": -124.79148864746094, "p10": -20.424176788330072, "median": 35.378713607788086, "p90": 134.53658447265627, "max": 163.61700439453125, "pos_frac": 0.765625, "sample": [44.06898498535156, 19.997726440429688, 6.746116638183594, -13.773468017578125, -12.140602111816406, 135.57144165039062, 12.383575439453125, 45.100196838378906, 9.046459197998047, -76.1888198852539, 61.89970397949219, -14.117645263671875, 6.219001770019531, 39.10225296020508, 47.71208572387695, 5.359580993652344, -77.47171783447266, 97.12684631347656, 112.14530944824219, 6.135904312133789, 132.12191772460938, 27.417259216308594, 57.080474853515625, 53.19507598876953, 33.727073669433594, 143.98736572265625, 0.692962646484375, 20.445858001708984, -23.126976013183594, 52.652957916259766, 56.28519821166992, 101.97599792480469, 155.61439514160156, 154.6654052734375, 68.58687591552734, 5.556636810302734, 71.98650360107422, -36.03846740722656, 78.55511474609375, 62.96614074707031, 151.05307006835938, 29.637161254882812, -14.020792007446289, -6.916694641113281, 39.80517578125, -2.3060302734375, 15.7862548828125, 68.23382568359375, -95.13692474365234, -36.83221435546875, 11.675317764282227, 5.003841400146484, 104.06398010253906, 118.94705200195312, -124.79148864746094, 163.61700439453125, -7.383575439453125, 45.46696472167969, 1.3289947509765625, 87.439697265625, 89.77214050292969, 37.03035354614258, -1.0267353057861328, 160.8776092529297], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000339.npy"}
{"epoch": 0.5124716553287982, "step": 340, "batch_size": 64, "mean": 41.866920471191406, "std": 75.9669189453125, "min": -144.41810607910156, "p10": -44.47160110473632, "median": 27.512131690979004, "p90": 149.63031616210938, "max": 163.99156188964844, "pos_frac": 0.640625, "sample": [67.19776916503906, 145.54360961914062, 20.977375030517578, 57.85687255859375, -4.069976806640625, 156.35398864746094, 1.6976032257080078, -3.5295963287353516, -40.14305877685547, 153.86465454101562, 1.4001846313476562, 101.80117797851562, 49.11814880371094, 120.64834594726562, 154.75289916992188, 100.48558044433594, 27.354875564575195, -59.89720153808594, 41.432403564453125, 76.81748962402344, -25.090425491333008, -7.075712203979492, -90.51327514648438, 89.00300598144531, -9.74871826171875, 163.99156188964844, -8.2186279296875, 140.16409301757812, -10.459205627441406, 148.84747314453125, -104.62845611572266, 131.72093200683594, 2.560577392578125, 27.669387817382812, 132.75973510742188, 64.67900085449219, 135.31167602539062, -0.5129299163818359, -15.965927124023438, 141.82789611816406, 111.74186706542969, -32.50242614746094, -2.5597076416015625, 151.11378479003906, 45.589412689208984, -3.7101306915283203, -46.326690673828125, -78.94393920898438, 2.242156982421875, 144.2635498046875, -13.999603271484375, 54.98591995239258, 85.99394989013672, -13.611200332641602, -144.41810607910156, -47.668731689453125, 149.9658203125, 16.40247344970703, 35.588218688964844, 162.10562133789062, 28.93558120727539, 4.516075134277344, -23.60883331298828, 17.40264129638672], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000340.npy"}
{"epoch": 0.5139833711262283, "step": 341, "batch_size": 64, "mean": 47.656349182128906, "std": 70.70791625976562, "min": -112.43846130371094, "p10": -36.160398864746085, "median": 33.02596855163574, "p90": 146.0653106689453, "max": 164.49807739257812, "pos_frac": 0.75, "sample": [-59.37380599975586, 59.85179138183594, 136.803466796875, 77.31692504882812, 114.76258087158203, 81.68309020996094, 115.29727935791016, 164.49807739257812, 146.21502685546875, -26.34381103515625, 137.84201049804688, 23.4580078125, -100.93915557861328, -3.568561553955078, 9.744115829467773, 3.2466354370117188, 106.98255157470703, 4.168601989746094, 121.85601806640625, 154.1976318359375, 122.86476135253906, 149.06594848632812, 149.3065643310547, 72.43602752685547, -60.51666259765625, 143.9026336669922, 99.73463439941406, 13.22562026977539, -24.596153259277344, -15.629241943359375, 36.98245620727539, 9.503219604492188, -17.182998657226562, 3.1441650390625, 23.283584594726562, 141.81796264648438, 47.983768463134766, 27.545286178588867, 70.51016998291016, -45.85816955566406, -24.34038543701172, -63.244598388671875, 68.62017059326172, 145.71597290039062, 2.9359092712402344, 150.5861358642578, 24.359375, 65.84819030761719, 87.49846649169922, 153.27346801757812, 55.02000427246094, 8.835624694824219, 109.87789154052734, -40.36750793457031, 4.247711181640625, 111.58612060546875, 74.19902801513672, 29.069480895996094, -0.6999683380126953, -18.324447631835938, -19.025779724121094, 16.40167236328125, 5.149953842163086, -112.43846130371094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000341.npy"}
{"epoch": 0.5154950869236583, "step": 342, "batch_size": 64, "mean": 48.661231994628906, "std": 69.29727172851562, "min": -107.93709564208984, "p10": -26.23728332519531, "median": 30.72578239440918, "p90": 151.86395568847655, "max": 194.6045684814453, "pos_frac": 0.734375, "sample": [22.644996643066406, 34.990020751953125, -0.876495361328125, -3.3601455688476562, 2.9431018829345703, 147.9414520263672, 30.505680084228516, 122.96931457519531, -57.57698059082031, 18.750625610351562, -2.5204906463623047, 36.40550231933594, -2.1841907501220703, 149.25515747070312, 3.5852127075195312, -59.75614929199219, 4.81037712097168, -107.93709564208984, -1.752817153930664, -35.04174041748047, 28.82790756225586, 151.86402893066406, 99.11744689941406, 151.86378479003906, -27.546478271484375, 51.946022033691406, 25.033218383789062, 77.59984588623047, 113.59668731689453, 71.96749877929688, 44.332889556884766, 48.8657341003418, 28.701583862304688, 128.96890258789062, 46.188507080078125, -8.303472518920898, 122.43157958984375, 10.705795288085938, -23.1824951171875, 147.02757263183594, 11.221511840820312, 155.75437927246094, 54.552734375, 77.52143859863281, -43.26527404785156, 156.7998504638672, 3.5181808471679688, -2.9723777770996094, 194.6045684814453, 162.90773010253906, 104.2791519165039, -71.0988998413086, 16.18596649169922, 150.5416259765625, -13.23550033569336, 155.14662170410156, 58.53794479370117, -1.4453048706054688, 7.579032897949219, 62.393985748291016, 76.8340835571289, 30.945884704589844, 7.3104400634765625, 165.89910888671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000342.npy"}
{"epoch": 0.5170068027210885, "step": 343, "batch_size": 64, "mean": 31.978862762451172, "std": 63.893314361572266, "min": -137.90692138671875, "p10": -23.277117156982413, "median": 23.72125244140625, "p90": 115.31954650878906, "max": 198.45608520507812, "pos_frac": 0.71875, "sample": [76.17877197265625, 8.032821655273438, -15.871345520019531, 0.115570068359375, -4.595039367675781, 70.99665832519531, -7.3665008544921875, -0.8546504974365234, 3.52728271484375, -2.2571868896484375, -129.23989868164062, 34.723663330078125, 31.215606689453125, 77.0235824584961, 198.45608520507812, -6.823070526123047, 136.93023681640625, 135.1546630859375, 113.09961700439453, 103.96043395996094, 30.080429077148438, 3.288400650024414, 53.265045166015625, 114.4866943359375, 11.600353240966797, 50.65513229370117, 20.830947875976562, -42.754974365234375, 33.746482849121094, -2.353689193725586, -79.6973648071289, 37.51329803466797, 101.94502258300781, 12.3240966796875, 13.975830078125, -2.2400894165039062, 78.24117279052734, 115.67648315429688, -128.64927673339844, 143.0130157470703, -26.451019287109375, 42.17613220214844, 40.47498321533203, 152.85794067382812, 62.0736083984375, 9.503387451171875, 41.101776123046875, 33.66740036010742, 11.243354797363281, 116.46949768066406, 102.9497299194336, 10.807182312011719, -5.678001403808594, 86.81610870361328, 49.12891387939453, 26.611557006835938, -1.5438079833984375, 9.450569152832031, 49.96437072753906, 20.16027069091797, -137.90692138671875, -5.7591705322265625, -34.64023971557617, 5.815202713012695], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000343.npy"}
{"epoch": 0.5185185185185185, "step": 344, "batch_size": 64, "mean": 48.302486419677734, "std": 65.7585220336914, "min": -87.39362335205078, "p10": -27.44520797729492, "median": 40.07397270202637, "p90": 143.6508987426758, "max": 209.36288452148438, "pos_frac": 0.765625, "sample": [-50.12751770019531, 128.7630615234375, 17.380043029785156, 41.760250091552734, 150.21511840820312, -1.5727481842041016, 4.369457244873047, 33.47342300415039, 14.995582580566406, 58.897796630859375, 8.027069091796875, 13.0067138671875, 24.83753776550293, 101.40054321289062, 84.5396499633789, 67.43324279785156, 87.13851928710938, 209.36288452148438, -5.7445220947265625, 75.79415893554688, -39.995765686035156, -27.772323608398438, -26.81096649169922, 160.3905487060547, 3.8811302185058594, -8.595008850097656, -50.07579040527344, 1.8940200805664062, 160.9829559326172, 68.2652587890625, -27.717025756835938, 153.94338989257812, -4.919700622558594, 85.1903076171875, 53.21134948730469, 145.32009887695312, 140.72683715820312, 10.797393798828125, 92.49761199951172, -87.39362335205078, 0.4217872619628906, 9.450481414794922, -19.324966430664062, 2.3563976287841797, -9.769783020019531, -5.604806900024414, 144.90406799316406, 135.75025939941406, 56.4793701171875, 62.55057907104492, 100.5636215209961, 49.67222595214844, 94.18376922607422, 6.488702774047852, 137.29095458984375, 18.66118049621582, 38.3876953125, 89.49856567382812, 10.750717163085938, 113.79334259033203, -87.02196502685547, 91.59027099609375, 110.19136047363281, 72.32432556152344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000344.npy"}
{"epoch": 0.5200302343159486, "step": 345, "batch_size": 64, "mean": 41.25560760498047, "std": 66.23143005371094, "min": -105.8781509399414, "p10": -25.39547538757324, "median": 32.21760177612305, "p90": 132.6217514038086, "max": 186.74903869628906, "pos_frac": 0.765625, "sample": [47.27459716796875, 15.226600646972656, 157.04959106445312, -83.30582427978516, 7.935314178466797, 10.705917358398438, -1.789459228515625, -52.385162353515625, 39.23381805419922, -13.380050659179688, 129.53912353515625, 92.80581665039062, 93.18285369873047, 94.81343078613281, -63.760990142822266, 87.09918975830078, -72.58159637451172, 43.92929458618164, 25.316307067871094, 109.95793151855469, 186.74903869628906, 9.488363265991211, 92.76353454589844, 38.108062744140625, 108.15252685546875, -9.43414306640625, 7.2525787353515625, 35.46794891357422, 1.5087127685546875, 50.58863067626953, 81.11614227294922, 160.73648071289062, 101.05523681640625, 58.00592803955078, 28.967254638671875, -105.8781509399414, 155.5121307373047, 159.13247680664062, 68.60539245605469, -13.365924835205078, 132.18838500976562, 37.580726623535156, -27.248977661132812, 132.80747985839844, 28.608489990234375, -21.070636749267578, 20.793190002441406, -19.97449493408203, 20.075103759765625, 20.01866912841797, 18.79877471923828, 10.115127563476562, -11.654365539550781, 24.6917724609375, 2.3916702270507812, 69.79278564453125, 0.9978237152099609, -99.5113296508789, -16.93244171142578, 35.68584442138672, 61.5736083984375, 68.25558471679688, 122.73733520507812, 148.23980712890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000345.npy"}
{"epoch": 0.5215419501133787, "step": 346, "batch_size": 64, "mean": 48.48233413696289, "std": 70.2108383178711, "min": -150.0867919921875, "p10": -37.30534782409668, "median": 38.00081443786621, "p90": 148.63702087402345, "max": 166.04248046875, "pos_frac": 0.765625, "sample": [91.10944366455078, 111.8353271484375, 90.40414428710938, 85.03083038330078, 98.52601623535156, 39.91380310058594, 21.266387939453125, 76.2134780883789, 139.51242065429688, 159.01258850097656, 148.69091796875, 4.676719665527344, 51.57122802734375, 131.29571533203125, -33.34273910522461, 152.64208984375, -87.85006713867188, 40.954166412353516, -39.00360870361328, 49.03251266479492, 77.14640808105469, 73.08737182617188, 32.91523742675781, 42.605133056640625, 142.92593383789062, 166.04248046875, -150.0867919921875, 16.79283905029297, 6.314907073974609, -68.64460754394531, 5.417877197265625, -61.880226135253906, 133.83578491210938, -1.3559722900390625, 36.087825775146484, 24.341026306152344, 5.122795104980469, -9.915863037109375, 114.32125854492188, 120.64115905761719, 158.33834838867188, 100.83375549316406, -7.569732666015625, 151.8166961669922, 25.77115249633789, 24.52727508544922, 102.1479263305664, 60.22088623046875, -45.751060485839844, 64.92860412597656, 33.89208984375, 17.115264892578125, 25.83045768737793, -3.5183639526367188, -67.03524017333984, 7.9990234375, 21.83356475830078, 96.46942138671875, -22.13385772705078, 164.56845092773438, -9.937145233154297, -0.8027019500732422, 17.607452392578125, 148.51126098632812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000346.npy"}
{"epoch": 0.5230536659108088, "step": 347, "batch_size": 64, "mean": 36.77296447753906, "std": 66.1285629272461, "min": -102.37274169921875, "p10": -35.50855712890625, "median": 17.959484100341797, "p90": 141.3763168334961, "max": 171.38262939453125, "pos_frac": 0.703125, "sample": [36.90460205078125, -4.471549987792969, 32.374244689941406, 53.52299499511719, -85.96803283691406, 15.518463134765625, 79.53024291992188, 51.044921875, 98.40631103515625, 138.44862365722656, 2.0618133544921875, 135.26254272460938, 9.314239501953125, -69.13068389892578, -37.348785400390625, 165.5953369140625, 13.120040893554688, 42.87587356567383, -20.986053466796875, 5.233085632324219, 13.570114135742188, 3.6643123626708984, 103.14878845214844, 2.013275146484375, -42.79679870605469, 15.374515533447266, 73.34845733642578, 113.1864013671875, 142.63104248046875, -60.537109375, 26.86493682861328, -20.744855880737305, -29.246192932128906, 108.09944915771484, -6.129903793334961, 46.15542221069336, -7.297809600830078, 104.05758666992188, -23.656494140625, 27.99187469482422, 48.551509857177734, 15.681816101074219, -31.214691162109375, 80.33338928222656, 3.440542221069336, 158.6022186279297, -45.60027313232422, -8.180076599121094, 171.38262939453125, 161.44076538085938, 152.4988555908203, 124.45063781738281, 20.237152099609375, 25.417922973632812, -102.37274169921875, 56.73114013671875, 59.966392517089844, 5.865680694580078, -6.087665557861328, 0.3244915008544922, -14.689151763916016, -2.4301986694335938, 144.6139678955078, 83.50039672851562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000347.npy"}
{"epoch": 0.5245653817082389, "step": 348, "batch_size": 64, "mean": 46.65338134765625, "std": 74.19989013671875, "min": -100.20889282226562, "p10": -36.220587158203124, "median": 25.418701171875, "p90": 157.5456741333008, "max": 209.5928955078125, "pos_frac": 0.6875, "sample": [134.05206298828125, 149.18438720703125, 9.963546752929688, 2.7286605834960938, 159.53890991210938, 49.531211853027344, -34.09051513671875, -11.132675170898438, -100.20889282226562, 8.139884948730469, 169.24099731445312, -58.67207336425781, 26.076400756835938, 67.3948745727539, 130.58462524414062, 132.98733520507812, 24.669448852539062, 170.7687225341797, 33.5685920715332, 165.58302307128906, 5.886997222900391, 116.14533996582031, 131.45162963867188, 18.637741088867188, -1.8113327026367188, 144.46922302246094, -3.9322357177734375, 139.92352294921875, -42.04206085205078, -8.972526550292969, 77.79827880859375, -1.3188400268554688, 2.461488723754883, 209.5928955078125, 24.761001586914062, -74.18124389648438, 40.47078323364258, -63.09830093383789, -19.29712677001953, 73.98694610595703, 164.02540588378906, 152.89479064941406, 46.570404052734375, 0.113800048828125, 68.21261596679688, -15.776237487792969, -36.445770263671875, -38.851531982421875, -11.231880187988281, 75.13044738769531, -15.830669403076172, 162.0477752685547, -20.02815055847168, 65.89362335205078, 83.21812438964844, 7.072105407714844, 84.88783264160156, 36.81571960449219, -35.695159912109375, 3.778493881225586, -21.981048583984375, 72.63250732421875, 143.65478515625, 13.86770248413086], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000348.npy"}
{"epoch": 0.5260770975056689, "step": 349, "batch_size": 64, "mean": 46.47608947753906, "std": 68.78804016113281, "min": -98.42169189453125, "p10": -24.346443176269528, "median": 27.881237030029297, "p90": 148.91238708496095, "max": 173.42115783691406, "pos_frac": 0.6875, "sample": [-58.59469985961914, 27.604705810546875, 146.29844665527344, 27.81989288330078, 44.380348205566406, -1.1935195922851562, 56.98217010498047, 146.90530395507812, 55.916236877441406, 11.511146545410156, -1.6394977569580078, 25.195432662963867, 142.9923095703125, 9.675430297851562, 24.25817108154297, 87.92448425292969, 104.3167953491211, -67.63178253173828, -98.42169189453125, 154.37464904785156, 29.473312377929688, 107.01091003417969, -26.135040283203125, 0.6248245239257812, 128.43429565429688, 27.942581176757812, 5.683906555175781, 62.3253288269043, -5.757743835449219, 78.84085083007812, 117.60284423828125, 128.28558349609375, 6.918846130371094, 18.832761764526367, -7.6861419677734375, 147.05209350585938, 28.675722122192383, 167.9948272705078, -80.39102935791016, 48.42152786254883, 164.4307861328125, 13.590049743652344, 6.601776123046875, 32.311492919921875, 78.23036193847656, 156.996337890625, 149.70965576171875, -2.103565216064453, -14.454986572265625, -2.109098434448242, 99.25245666503906, -20.173049926757812, -2.477783203125, -1.6721649169921875, 43.112953186035156, 169.45346069335938, -26.4644775390625, -19.761978149414062, -39.98359680175781, 113.86959075927734, -1.3021488189697266, -3.7340621948242188, 84.90210723876953, 173.42115783691406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000349.npy"}
{"epoch": 0.527588813303099, "step": 350, "batch_size": 64, "mean": 57.00303268432617, "std": 68.38157653808594, "min": -90.24166870117188, "p10": -28.552890777587876, "median": 54.21055030822754, "p90": 151.4138870239258, "max": 186.28941345214844, "pos_frac": 0.796875, "sample": [132.82632446289062, -16.410049438476562, 76.21441650390625, -13.827911376953125, 82.17546844482422, 23.15806770324707, 138.1500244140625, 77.70896911621094, 75.84945678710938, 147.09523010253906, 10.876358032226562, -49.50965881347656, 44.845672607421875, 95.01298522949219, 42.447227478027344, 72.50605773925781, 121.2820816040039, 23.006229400634766, -7.611518859863281, -4.058164596557617, 51.318572998046875, 70.70046997070312, -83.46332550048828, 105.78695678710938, 68.61788177490234, 3.9133224487304688, -90.24166870117188, 186.28941345214844, -14.04966926574707, 34.37833023071289, 117.18755340576172, -71.6697998046875, 139.73568725585938, 7.126533508300781, 127.42466735839844, 158.49893188476562, 65.48236846923828, 123.53988647460938, 155.18771362304688, 181.18319702148438, 62.62736511230469, 166.20843505859375, 153.26473999023438, -33.75696563720703, 41.688594818115234, 5.236717224121094, 5.890876770019531, 18.355318069458008, -36.58646774291992, 69.44121551513672, 75.28863525390625, 57.1025276184082, 0.9475688934326172, 38.87413024902344, 38.475074768066406, -64.64584350585938, 42.2110595703125, 39.16453552246094, -4.967353820800781, 38.780242919921875, 156.6736602783203, 125.50595092773438, 98.3701171875, 145.35964965820312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000350.npy"}
{"epoch": 0.5291005291005291, "step": 351, "batch_size": 64, "mean": 45.3077507019043, "std": 75.30657958984375, "min": -153.9269256591797, "p10": -42.93347396850585, "median": 37.61031532287598, "p90": 143.26119232177737, "max": 157.82180786132812, "pos_frac": 0.765625, "sample": [-1.057626724243164, 107.48487091064453, 150.89752197265625, 106.45098114013672, 2.4457054138183594, 131.23275756835938, -14.25333023071289, 34.721832275390625, 150.96792602539062, 35.76884460449219, 27.756988525390625, 146.20883178710938, 13.97874641418457, -27.086212158203125, 21.210174560546875, -15.030708312988281, 18.388442993164062, 7.756732940673828, 112.5982437133789, 80.12396240234375, 61.351844787597656, -101.40381622314453, 145.2000732421875, 12.966873168945312, 138.7371368408203, 151.17352294921875, 83.53321838378906, 93.70858764648438, -151.07711791992188, 51.72113037109375, 132.2683563232422, -80.28133392333984, 98.30596923828125, -1.2670745849609375, 117.69758605957031, -50.46295166015625, -13.259910583496094, 114.57806396484375, 32.10973358154297, 39.451786041259766, 81.17670440673828, 157.82180786132812, 24.762229919433594, -45.17333221435547, 66.02610778808594, 16.888267517089844, 79.07565307617188, 61.78590393066406, 3.6324615478515625, 110.84014129638672, -37.70713806152344, 2.496143341064453, 110.18948364257812, 91.81287384033203, 149.6071014404297, 32.90015411376953, 17.119659423828125, 91.63627624511719, -153.9269256591797, 11.017210006713867, -130.5962371826172, 109.77374267578125, 98.39662170410156, -15.475006103515625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000351.npy"}
{"epoch": 0.5306122448979592, "step": 352, "batch_size": 64, "mean": 48.94923400878906, "std": 68.77718353271484, "min": -112.53323364257812, "p10": -15.952065467834469, "median": 36.990413665771484, "p90": 148.45564880371094, "max": 206.58230590820312, "pos_frac": 0.765625, "sample": [145.7794189453125, 176.82821655273438, 144.3611602783203, 6.6772613525390625, 130.01719665527344, -112.53323364257812, -8.81906509399414, -1.0806007385253906, 50.680233001708984, 67.73822021484375, 65.42298126220703, 121.27841186523438, -82.98712158203125, -2.353260040283203, 70.69990539550781, 12.74261474609375, -17.615234375, -6.0778045654296875, -53.16820526123047, 18.189186096191406, 84.45803833007812, 59.897727966308594, 21.354415893554688, 155.58863830566406, 148.44879150390625, 72.41363525390625, 33.096595764160156, -107.91655731201172, 21.267620086669922, -13.239727020263672, 33.08120346069336, -1.2958126068115234, 7.7467193603515625, 56.09656524658203, 59.40472412109375, 94.98350524902344, 32.78772735595703, 150.6956787109375, 158.31056213378906, 49.13500213623047, 148.45858764648438, 1.8116531372070312, 206.58230590820312, 28.641237258911133, 99.7347183227539, 139.45013427734375, -9.02734375, 11.136909484863281, -17.1144962310791, 40.93232345581055, 2.136260986328125, 53.55364990234375, 19.287689208984375, 129.63612365722656, 116.15142059326172, 0.2761688232421875, -22.734657287597656, 150.5142059326172, -6.495128631591797, 12.914382934570312, 0.38935089111328125, 96.04765319824219, 40.88423156738281, 47.4884033203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000352.npy"}
{"epoch": 0.5321239606953893, "step": 353, "batch_size": 64, "mean": 48.590087890625, "std": 68.93651580810547, "min": -155.72084045410156, "p10": -10.1560489654541, "median": 31.75770378112793, "p90": 146.32149505615234, "max": 165.1547088623047, "pos_frac": 0.78125, "sample": [87.44779205322266, 157.027587890625, -46.489994049072266, 36.45539474487305, -80.28304290771484, 23.868846893310547, 87.48442840576172, -6.142021179199219, 142.6678924560547, 18.84473419189453, 2.594188690185547, 43.316017150878906, 85.75541687011719, -11.24993896484375, 147.4012451171875, 10.963129043579102, 32.08522033691406, 129.88748168945312, 120.03500366210938, 147.94851684570312, 94.94908142089844, 162.75009155273438, 1.7737369537353516, 78.47821044921875, -9.452911376953125, -2.197704315185547, 157.44183349609375, 19.54590606689453, 136.13638305664062, 78.98989868164062, 90.20585632324219, 2.4672508239746094, 75.24488830566406, 1.0779685974121094, -155.72084045410156, 137.63914489746094, -65.38284301757812, -52.6863899230957, 142.4813232421875, 136.87387084960938, 57.823360443115234, 148.974853515625, 165.1547088623047, 1.0779609680175781, -3.84368896484375, 31.430187225341797, 110.11842346191406, 13.024881362915039, -10.457393646240234, 14.178558349609375, 36.702117919921875, 23.07281494140625, -5.0213775634765625, 36.99462127685547, -0.3677825927734375, 7.6956939697265625, 3.3213424682617188, 17.372390747070312, -5.758697509765625, 11.459877014160156, 101.979736328125, 0.5097808837890625, 143.8020782470703, 50.28840637207031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000353.npy"}
{"epoch": 0.5336356764928194, "step": 354, "batch_size": 64, "mean": 52.743797302246094, "std": 74.73797607421875, "min": -103.97390747070312, "p10": -61.314606475830075, "median": 46.70135688781738, "p90": 152.10535278320313, "max": 177.5409393310547, "pos_frac": 0.765625, "sample": [165.17074584960938, 44.066410064697266, -23.475780487060547, 124.24113464355469, -63.12964630126953, 36.66502380371094, 15.157844543457031, 102.70995330810547, 115.49018859863281, 12.945621490478516, 109.56985473632812, 155.89723205566406, 88.24751281738281, 49.780235290527344, -14.930229187011719, 1.4180717468261719, 141.16290283203125, -31.062793731689453, 135.01290893554688, 177.5409393310547, 91.13502502441406, 150.89666748046875, 39.23765563964844, 62.214385986328125, -79.67720794677734, 0.2919292449951172, -103.97390747070312, 133.0382080078125, 32.45411682128906, 49.65656280517578, 30.414947509765625, 108.61216735839844, 152.39544677734375, 49.3363037109375, 43.014923095703125, 24.387611389160156, 83.21746826171875, -74.06736755371094, 154.05116271972656, 10.96051025390625, 105.8210220336914, 140.13162231445312, -86.3753890991211, -74.83988952636719, 156.61376953125, -63.5377197265625, 164.78077697753906, 122.02024841308594, 12.749673843383789, -25.81635284423828, 35.46289825439453, 49.5709228515625, 109.62602233886719, 151.428466796875, 53.05271530151367, 21.905643463134766, -1.1268081665039062, 12.352977752685547, -11.923851013183594, 98.60850524902344, 42.517578125, 136.5657958984375, -57.07951354980469, -16.980953216552734], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000354.npy"}
{"epoch": 0.5351473922902494, "step": 355, "batch_size": 64, "mean": 55.1788215637207, "std": 78.37299346923828, "min": -134.96942138671875, "p10": -53.53758010864257, "median": 51.203264236450195, "p90": 154.31775665283203, "max": 175.71881103515625, "pos_frac": 0.703125, "sample": [89.54853820800781, 164.89645385742188, -59.03779602050781, 7.2266082763671875, 154.7550048828125, 49.60649871826172, 47.37169647216797, -16.51064109802246, -60.99884796142578, 69.43549346923828, 61.01741027832031, 29.47265625, 84.86698150634766, -55.56744384765625, 143.90432739257812, 98.83985900878906, -2.306903839111328, 52.80002975463867, -67.19325256347656, 108.2631607055664, -61.05061340332031, 71.09083557128906, 28.095001220703125, 150.26144409179688, 143.459228515625, 49.32838439941406, 112.21625518798828, 30.59412384033203, 113.22853088378906, 175.71881103515625, -48.801231384277344, 36.410152435302734, -2.1211585998535156, 57.765899658203125, 9.05401611328125, -134.96942138671875, 95.58760070800781, -6.9933319091796875, 57.061763763427734, 150.83985900878906, -4.939369201660156, 19.448883056640625, 138.84820556640625, -18.621002197265625, -2.247488021850586, 162.45489501953125, 139.39468383789062, 9.499446868896484, 154.556640625, -133.65647888183594, 149.4027099609375, 144.40110778808594, 158.00108337402344, -2.181102752685547, 153.76036071777344, -39.736351013183594, -4.688011169433594, 28.307125091552734, 167.03884887695312, -4.467681884765625, 124.19756317138672, 29.263925552368164, 103.26384735107422, 132.97686767578125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000355.npy"}
{"epoch": 0.5366591080876795, "step": 356, "batch_size": 64, "mean": 44.889495849609375, "std": 86.37255859375, "min": -159.78964233398438, "p10": -68.55360641479493, "median": 33.327762603759766, "p90": 154.4009780883789, "max": 170.68138122558594, "pos_frac": 0.703125, "sample": [151.39805603027344, 163.833740234375, 29.591567993164062, -87.28742980957031, 137.9278564453125, 11.48931884765625, 0.8793258666992188, 170.68138122558594, 92.20120239257812, 1.806142807006836, -123.14878845214844, -146.06240844726562, 90.2515869140625, -94.23381805419922, 154.69168090820312, 165.6270751953125, 8.921897888183594, 17.405269622802734, 142.43252563476562, -1.0472240447998047, 155.71856689453125, -159.78964233398438, 151.99545288085938, 20.79718017578125, 1.7658920288085938, 160.21913146972656, -67.38450622558594, -43.36140441894531, 59.350154876708984, 37.06395721435547, 80.87147521972656, 4.9589996337890625, -8.435976028442383, 3.064146041870117, -69.05464935302734, 130.5537109375, 131.60623168945312, -55.78826141357422, 71.69795227050781, 127.201171875, -2.6003761291503906, 2.8416595458984375, 153.72267150878906, -20.135513305664062, 61.446990966796875, 86.43574523925781, 110.06318664550781, 13.343605041503906, -53.83124542236328, 74.02999877929688, -0.9420814514160156, 89.3469467163086, 150.90988159179688, 158.39370727539062, -6.735008239746094, 108.85029602050781, 40.94850540161133, -19.752525329589844, 148.19876098632812, -33.74488830566406, -74.73764038085938, 148.6976776123047, 29.019866943359375, 88.74908447265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000356.npy"}
{"epoch": 0.5381708238851096, "step": 357, "batch_size": 64, "mean": 39.92060852050781, "std": 71.52594757080078, "min": -146.1352996826172, "p10": -41.79815864562988, "median": 35.967220306396484, "p90": 140.86991424560546, "max": 166.8396759033203, "pos_frac": 0.75, "sample": [135.5245361328125, -6.384435653686523, 130.64926147460938, -26.747421264648438, -127.94499969482422, 151.725341796875, 40.86119079589844, 70.89485168457031, 21.148834228515625, 20.20953369140625, -21.416595458984375, 59.58521270751953, -41.170494079589844, 10.378604888916016, 157.27720642089844, 79.15451049804688, 140.96815490722656, 71.35829162597656, 84.4287109375, 3.8938560485839844, 83.57459259033203, 49.92011260986328, 54.34199142456055, 25.252214431762695, 22.183631896972656, 76.56819915771484, -5.99903678894043, 3.8439712524414062, -54.141265869140625, 46.18732833862305, -51.30683898925781, 33.583900451660156, -23.044361114501953, 102.67728424072266, 74.1269760131836, 154.0369873046875, 140.64068603515625, 38.35054016113281, 83.92057800292969, 154.45135498046875, -19.72699737548828, 57.817596435546875, 13.867523193359375, 3.1046066284179688, 147.27450561523438, 40.49156188964844, 2.5165061950683594, -26.919479370117188, -42.06715774536133, 20.072174072265625, 44.3741340637207, 166.8396759033203, 81.32732391357422, 8.072193145751953, 114.72830963134766, 33.142822265625, 1.416727066040039, 0.5704116821289062, 133.95388793945312, -115.11732482910156, -65.16780090332031, -146.1352996826172, 137.49868774414062, -0.5786190032958984], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000357.npy"}
{"epoch": 0.5396825396825397, "step": 358, "batch_size": 64, "mean": 55.03110885620117, "std": 75.14755249023438, "min": -116.91058349609375, "p10": -42.17394714355468, "median": 38.59357452392578, "p90": 157.8624267578125, "max": 220.02630615234375, "pos_frac": 0.78125, "sample": [124.42623138427734, 104.69270324707031, 106.24751281738281, 157.97952270507812, -46.41608428955078, 162.89601135253906, 152.00599670410156, -32.27562713623047, -6.927677154541016, 25.084075927734375, -8.56325912475586, 117.41957092285156, 40.819786071777344, 96.82302856445312, 147.12290954589844, 36.8729248046875, -52.108001708984375, 16.657150268554688, 117.44589233398438, 21.698246002197266, 27.173828125, 19.837764739990234, -31.97364044189453, 112.20076751708984, 148.8984375, 3.5339431762695312, 26.388519287109375, 8.181121826171875, -23.682281494140625, 13.97006607055664, -78.39757537841797, 31.694236755371094, -51.72381591796875, 3.7948360443115234, -23.946449279785156, 58.409515380859375, 89.56436157226562, 40.43084716796875, 18.664817810058594, 0.09295845031738281, 161.98277282714844, 186.18026733398438, 220.02630615234375, 5.978946685791016, 182.64743041992188, 157.58920288085938, 87.99578094482422, -46.99039840698242, 40.31422424316406, 51.59625244140625, -1.6803321838378906, -46.877662658691406, -116.91058349609375, 136.41943359375, 5.8281707763671875, 140.51661682128906, 33.56276321411133, 86.44499206542969, 18.996368408203125, 96.03330993652344, 104.6444091796875, 114.9651870727539, 53.53265380859375, 174.18162536621094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000358.npy"}
{"epoch": 0.5411942554799698, "step": 359, "batch_size": 64, "mean": 27.96881866455078, "std": 67.21672821044922, "min": -108.38211059570312, "p10": -52.49590148925781, "median": 22.460079193115234, "p90": 127.46424560546879, "max": 173.97264099121094, "pos_frac": 0.671875, "sample": [-48.40409851074219, 78.06570434570312, -23.658935546875, 7.604217529296875, 55.54674530029297, 52.339256286621094, -24.618412017822266, 25.38262176513672, -55.72261047363281, -42.341552734375, 173.97264099121094, 63.91050720214844, -9.555526733398438, 39.9183349609375, -2.2470855712890625, 75.93618774414062, 146.70391845703125, -45.91386413574219, -2.9929351806640625, -108.38211059570312, 16.510272979736328, 40.879390716552734, 20.952850341796875, -48.90126037597656, 8.422927856445312, 141.50189208984375, 67.98210144042969, 20.28500747680664, 50.95884704589844, -98.23770904541016, 151.63186645507812, 167.471435546875, -49.843963623046875, 102.21731567382812, 23.53558349609375, 21.9716796875, 2.8649978637695312, 86.96879577636719, 31.591018676757812, 22.94847869873047, -62.72663116455078, 100.85431671142578, 131.89637756347656, 65.59613037109375, 0.6557235717773438, 41.35748291015625, -41.84564971923828, -67.6829833984375, -53.6324462890625, 62.080169677734375, 40.86391067504883, -4.122255325317383, -106.05741882324219, -10.873544692993164, 85.17383575439453, 0.376983642578125, 97.146728515625, 13.303804397583008, 74.47959899902344, 117.12260437011719, 24.894176483154297, 154.86920166015625, 7.105892181396484, -18.086074829101562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000359.npy"}
{"epoch": 0.5427059712773998, "step": 360, "batch_size": 64, "mean": 46.476253509521484, "std": 69.85125732421875, "min": -105.12488555908203, "p10": -22.726806640625, "median": 24.848295211791992, "p90": 144.5787139892578, "max": 184.4691162109375, "pos_frac": 0.65625, "sample": [-22.12908935546875, 152.55242919921875, 13.561431884765625, 3.0955047607421875, 62.8875732421875, 95.66439056396484, -2.485584259033203, 37.287906646728516, 31.70676040649414, 148.82217407226562, -0.1605396270751953, -68.80375671386719, -2.216094970703125, -58.126461029052734, 136.09649658203125, -52.57783508300781, 139.51202392578125, -31.651981353759766, -0.15887832641601562, 142.089111328125, 25.040180206298828, -2.863384246826172, -55.371185302734375, 15.418426513671875, -4.521148681640625, -3.7740325927734375, 124.02503967285156, 6.079399108886719, 103.59867095947266, -3.8055877685546875, 33.56499481201172, 12.810470581054688, 95.1688232421875, -22.98297119140625, 24.656410217285156, 144.6192626953125, -12.420501708984375, 83.67658233642578, 144.48410034179688, 16.8306884765625, 176.08950805664062, -2.1162567138671875, 134.80551147460938, -9.824560165405273, 184.4691162109375, 18.09001922607422, 149.41392517089844, 9.696853637695312, 47.00400924682617, 135.81138610839844, 133.03448486328125, 44.44024658203125, 162.7346649169922, -9.554841995239258, -7.332975387573242, 65.49554443359375, -15.39222526550293, -105.12488555908203, 47.667694091796875, 115.39752197265625, 36.07977294921875, 2.694610595703125, 99.06885528564453, 112.63227844238281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000360.npy"}
{"epoch": 0.54421768707483, "step": 361, "batch_size": 64, "mean": 55.27438735961914, "std": 68.40703582763672, "min": -96.48460388183594, "p10": -21.4068431854248, "median": 57.13271141052246, "p90": 148.1016815185547, "max": 171.86251831054688, "pos_frac": 0.78125, "sample": [74.66146087646484, 103.77379608154297, 27.71493148803711, -7.917015075683594, 155.8739013671875, 62.63862609863281, -44.00514602661133, -96.48460388183594, 57.03971481323242, 50.17136001586914, 104.77914428710938, -42.559932708740234, -2.44012451171875, 149.1441650390625, 105.472900390625, 89.52352142333984, -22.315372467041016, 0.3545722961425781, 58.3043212890625, 12.633331298828125, 57.2257080078125, 50.54198455810547, 13.109123229980469, 157.67672729492188, 141.02658081054688, 122.028076171875, 137.25474548339844, -89.85595703125, 145.66921997070312, 110.01893615722656, 27.927356719970703, -6.573333740234375, 118.23056030273438, 171.86251831054688, 121.93293762207031, 67.72189331054688, 143.03518676757812, 89.77120971679688, -58.022064208984375, -3.5936203002929688, -1.0797252655029297, 5.8840179443359375, 155.6197052001953, 140.27545166015625, 9.56997299194336, 156.5047607421875, 133.59042358398438, 11.026359558105469, -52.686927795410156, -17.960037231445312, 71.32711791992188, 15.442794799804688, 83.74452209472656, 13.925487518310547, 159.588623046875, 26.6809139251709, 122.48385620117188, 5.884269714355469, 2.2420806884765625, 60.52986145019531, 64.11521911621094, 36.41278839111328, 0.3748817443847656, -19.286941528320312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000361.npy"}
{"epoch": 0.54572940287226, "step": 362, "batch_size": 64, "mean": 55.67449188232422, "std": 83.33027648925781, "min": -153.3375244140625, "p10": -39.4911491394043, "median": 40.76060390472412, "p90": 152.6218048095703, "max": 283.19256591796875, "pos_frac": 0.75, "sample": [-45.065093994140625, 1.8500957489013672, 3.97296142578125, 1.176595687866211, -2.8852005004882812, -153.3375244140625, 21.89220428466797, 76.6873779296875, 283.19256591796875, 136.45860290527344, 15.213066101074219, -16.94119644165039, 20.040733337402344, -7.7601470947265625, -1.4860496520996094, -2.5744972229003906, 86.89125061035156, 75.20968627929688, 92.61309814453125, 157.912841796875, -2.1356029510498047, 24.062347412109375, 31.993497848510742, 99.0390625, 143.29782104492188, 1.58392333984375, 130.38694763183594, -1.421142578125, 2.2626609802246094, 119.28573608398438, 205.34722900390625, 3.817659378051758, -28.125404357910156, 152.75184631347656, 31.04462432861328, 174.75750732421875, 141.8992156982422, 19.51239013671875, 72.6626968383789, -39.66932678222656, 1.8933906555175781, 137.44052124023438, 152.21206665039062, 152.31837463378906, 23.510604858398438, 49.5277099609375, 118.344482421875, 173.7151336669922, 125.99395751953125, 116.42729187011719, -39.075401306152344, 11.425323486328125, 137.04736328125, -106.11085510253906, 62.794960021972656, 79.59366607666016, 148.20518493652344, -93.58352661132812, -57.511863708496094, 92.52183532714844, 178.80413818359375, -77.82308959960938, 85.55268859863281, 64.528564453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000362.npy"}
{"epoch": 0.54724111866969, "step": 363, "batch_size": 64, "mean": 54.263099670410156, "std": 73.88468933105469, "min": -147.5037384033203, "p10": -15.911258888244621, "median": 40.331905364990234, "p90": 160.59320831298828, "max": 234.47378540039062, "pos_frac": 0.8125, "sample": [95.21463012695312, -109.93970489501953, 160.37899780273438, 69.26044464111328, 83.23744201660156, 40.11351776123047, 10.701305389404297, 166.7804412841797, 16.344192504882812, -34.148677825927734, -0.9194183349609375, 37.689788818359375, 234.47378540039062, 176.40194702148438, 10.356773376464844, 175.06924438476562, 37.80263137817383, 74.06433868408203, 23.276893615722656, 124.97666931152344, 9.121868133544922, 107.26512145996094, 12.635255813598633, 145.95497131347656, 152.44680786132812, 37.00559997558594, -2.351318359375, 20.088272094726562, -8.9830322265625, 27.77570343017578, -18.8804988861084, -147.5037384033203, 78.13204193115234, -95.49537658691406, 78.15280151367188, 72.89236450195312, 40.55029296875, 111.74417877197266, 33.657447814941406, -1.7458572387695312, 5.688053131103516, 9.037391662597656, 43.90386962890625, 160.6850128173828, 3.9130992889404297, 168.39987182617188, 107.63902282714844, 45.572486877441406, 151.26434326171875, 8.058341979980469, 136.697265625, 0.6974334716796875, 130.99362182617188, 86.38621520996094, -60.688720703125, 46.811676025390625, 163.24256896972656, -21.334178924560547, 62.34342956542969, 35.23042678833008, -0.7945117950439453, 82.68318176269531, 52.412322998046875, 10.398088455200195], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000363.npy"}
{"epoch": 0.5487528344671202, "step": 364, "batch_size": 64, "mean": 59.894256591796875, "std": 89.46224975585938, "min": -153.28866577148438, "p10": -36.97722816467285, "median": 59.15074920654297, "p90": 173.1342315673828, "max": 214.40126037597656, "pos_frac": 0.734375, "sample": [182.27426147460938, -3.499439239501953, 19.1693115234375, 165.331298828125, 171.10214233398438, 12.727134704589844, 81.7423324584961, 128.20947265625, -24.83350372314453, 182.85568237304688, 196.26979064941406, 122.76371002197266, -37.01503372192383, -57.06914520263672, 85.478271484375, 155.76812744140625, 61.03911590576172, -14.804386138916016, 214.40126037597656, 0.8492431640625, 178.22763061523438, 63.94599914550781, 125.77314758300781, 157.7571563720703, -119.173095703125, -66.31438446044922, 166.54205322265625, 2.092378616333008, 0.6140384674072266, 128.82354736328125, 155.1699981689453, 57.26238250732422, -3.601776123046875, 73.7563705444336, 20.275619506835938, 4.4219512939453125, 122.17401123046875, 174.005126953125, -30.15624237060547, 20.44773292541504, 124.56715393066406, 73.68798065185547, -36.889015197753906, -95.11860656738281, 185.69009399414062, 10.648914337158203, -8.103271484375, 151.18801879882812, 3.9126968383789062, 133.98915100097656, 9.776107788085938, 134.9954833984375, -124.16656494140625, 54.478248596191406, -153.28866577148438, 13.869552612304688, 78.25463104248047, 112.19462585449219, -2.046213150024414, 9.941619873046875, -1.3801918029785156, -28.909133911132812, 156.17764282226562, 154.95889282226562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000364.npy"}
{"epoch": 0.5502645502645502, "step": 365, "batch_size": 64, "mean": 57.29872131347656, "std": 65.968994140625, "min": -95.99301147460938, "p10": -7.0433092117309535, "median": 50.19676208496094, "p90": 147.61370239257812, "max": 217.46392822265625, "pos_frac": 0.8125, "sample": [48.974395751953125, 11.094720840454102, 86.85891723632812, 0.903411865234375, 58.22014617919922, 1.5664634704589844, 147.16778564453125, 114.91461181640625, 129.5393524169922, 74.54653930664062, 148.2936248779297, 69.50323486328125, -95.99301147460938, -1.4030685424804688, 15.096466064453125, 142.32826232910156, -1.6072006225585938, 0.5492897033691406, -14.585151672363281, 0.3428001403808594, 51.41912841796875, 83.74897003173828, 126.87427520751953, 7.248514175415039, -0.6612300872802734, -87.41056823730469, 64.34806060791016, 163.11595153808594, 27.542030334472656, 94.88055419921875, 12.680953979492188, 6.18882942199707, 140.2462921142578, 53.506221771240234, 135.9485321044922, 80.0894775390625, 147.8048095703125, -3.0845470428466797, 45.5413703918457, 48.25867462158203, 157.49612426757812, 82.56770324707031, 41.89039611816406, -15.64178466796875, 78.09614562988281, 146.7563934326172, 4.95787239074707, -11.140443801879883, 217.46392822265625, 53.75269317626953, 83.47693634033203, 2.9188365936279297, 134.680908203125, 15.781463623046875, -0.8158912658691406, 37.81329345703125, 5.5433807373046875, -32.69858932495117, 90.9989013671875, 6.3590850830078125, 162.22923278808594, 152.52340698242188, -8.739921569824219, 126.25030517578125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000365.npy"}
{"epoch": 0.5517762660619804, "step": 366, "batch_size": 64, "mean": 47.42875671386719, "std": 86.75872039794922, "min": -155.33973693847656, "p10": -69.40392608642577, "median": 40.732648849487305, "p90": 156.17621002197265, "max": 199.4308319091797, "pos_frac": 0.71875, "sample": [1.8397903442382812, 176.41546630859375, 95.41959381103516, 67.07839965820312, -6.0202178955078125, 145.4791717529297, -52.265533447265625, -37.99542236328125, -45.917144775390625, -8.612953186035156, -65.09428405761719, 92.9578628540039, 22.97565460205078, 88.75965881347656, 52.755157470703125, 39.06781005859375, -82.41462707519531, 53.97021484375, 2.138631820678711, 106.7566146850586, 61.876190185546875, 44.869773864746094, 76.19515991210938, -14.34811782836914, 17.448057174682617, 121.76912689208984, 36.27762222290039, 154.07965087890625, -155.33973693847656, -14.386995315551758, 181.77310180664062, 4.579627990722656, -2.6192378997802734, 199.4308319091797, -85.19288635253906, 2.2006587982177734, -71.25091552734375, -101.43107604980469, 21.23827362060547, 42.39748764038086, 2.031097412109375, -16.37442398071289, 2.5891780853271484, 31.0953369140625, -105.04547119140625, 8.51544189453125, 14.881711959838867, 189.1621856689453, 170.01609802246094, 67.510009765625, 56.831146240234375, 148.77491760253906, 145.24302673339844, -108.59463500976562, 156.3219757080078, -38.645286560058594, 149.49044799804688, 151.19142150878906, 167.66940307617188, 140.42752075195312, 155.83609008789062, 104.17227172851562, 153.39971923828125, 122.0809326171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000366.npy"}
{"epoch": 0.5532879818594104, "step": 367, "batch_size": 64, "mean": 48.392539978027344, "std": 74.99783325195312, "min": -162.2642822265625, "p10": -42.279836463928206, "median": 48.496803283691406, "p90": 147.4152374267578, "max": 166.06007385253906, "pos_frac": 0.796875, "sample": [1.7862205505371094, 162.105224609375, 76.2081527709961, -5.5602264404296875, 88.10969543457031, -1.0282135009765625, 136.46336364746094, 147.83926391601562, 100.57484436035156, 1.9885635375976562, 28.579055786132812, 26.54840087890625, 7.235786437988281, 16.846435546875, 153.28762817382812, 12.136470794677734, 62.75712585449219, -3.1202926635742188, -49.00617980957031, 146.42584228515625, 16.693588256835938, 78.74069213867188, -10.22828483581543, -49.48258972167969, 53.52001953125, -156.4778289794922, -54.28656768798828, 85.35030364990234, 9.03066635131836, 36.415313720703125, 85.77590942382812, 131.11868286132812, 112.98078918457031, 82.39059448242188, 111.9466552734375, 3.331207275390625, 71.03074645996094, -69.62446594238281, 18.455617904663086, 153.17417907714844, 19.037460327148438, -162.2642822265625, 10.635599136352539, 166.06007385253906, 106.48487091064453, -18.19049072265625, 78.31732177734375, 13.806041717529297, 122.01203918457031, 25.646404266357422, 22.372055053710938, 69.24301147460938, -127.7946548461914, 145.83242797851562, 79.82012939453125, 154.02880859375, 43.47358703613281, -26.58503532409668, 101.16070556640625, 151.0996856689453, 19.962677001953125, 75.23541259765625, 61.89068603515625, 145.805419921875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000367.npy"}
{"epoch": 0.5547996976568406, "step": 368, "batch_size": 64, "mean": 45.187255859375, "std": 73.00228881835938, "min": -156.69171142578125, "p10": -49.964709472656246, "median": 46.41522026062012, "p90": 141.7550231933594, "max": 169.31068420410156, "pos_frac": 0.734375, "sample": [-156.69171142578125, 125.41029357910156, 5.702953338623047, 30.25675392150879, 67.31971740722656, 96.2996597290039, 139.56179809570312, 62.72234344482422, -16.388214111328125, -11.097030639648438, 162.27352905273438, 2.7303504943847656, 29.469345092773438, 36.37541961669922, 159.70281982421875, 134.2879638671875, -51.83982849121094, -61.856964111328125, -68.51042175292969, 91.31226348876953, 64.69476318359375, 22.5986385345459, 55.42206573486328, -28.795188903808594, 157.9434814453125, 17.891834259033203, -23.254898071289062, 143.3180389404297, 169.31068420410156, 47.88729476928711, -21.48061180114746, 153.38168334960938, 77.37279510498047, 1.6049461364746094, 27.055774688720703, 90.14302062988281, 50.811737060546875, 96.31965637207031, 13.744712829589844, 88.91796875, 67.54991149902344, -12.206022262573242, 124.84588623046875, 23.586837768554688, -24.94219970703125, -19.438880920410156, 42.479854583740234, -18.03460693359375, 126.8675765991211, -45.58943176269531, 55.55607604980469, 123.14706420898438, 142.69497680664062, 136.314453125, 6.60319709777832, 80.54640197753906, 53.102088928222656, -55.71990966796875, -94.1502456665039, 100.0145263671875, 117.27346801757812, 30.655305862426758, -96.04469299316406, 44.943145751953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000368.npy"}
{"epoch": 0.5563114134542706, "step": 369, "batch_size": 64, "mean": 38.54582977294922, "std": 73.61951446533203, "min": -143.30050659179688, "p10": -45.05084133148192, "median": 30.319904327392578, "p90": 150.79826507568362, "max": 205.16888427734375, "pos_frac": 0.6875, "sample": [17.07482147216797, -87.7607192993164, 43.31916046142578, 80.03018188476562, -98.98076629638672, 55.0091552734375, 43.944149017333984, -0.150390625, 119.29676818847656, 13.862834930419922, 13.934127807617188, 122.84153747558594, 135.729736328125, 97.78148651123047, 154.05172729492188, 5.8811492919921875, 78.13970947265625, 36.412811279296875, 20.138465881347656, 32.29726791381836, -5.61407470703125, -143.30050659179688, 52.40998840332031, -74.80708312988281, 43.39744567871094, 91.31301879882812, -21.61160659790039, 70.92034912109375, 43.2530632019043, -29.6890811920166, 136.52035522460938, -24.5450439453125, 58.52398681640625, 160.70474243164062, 83.00545501708984, 131.42245483398438, 167.52471923828125, 2.8654632568359375, 19.585845947265625, -80.16939544677734, -9.415252685546875, 28.82628631591797, -12.626964569091797, 60.25005340576172, -9.634632110595703, 20.384193420410156, 205.16888427734375, 3.3309783935546875, 144.01666259765625, -14.571876525878906, -11.134620666503906, 30.79034423828125, -0.8183937072753906, 158.15518188476562, 1.4080810546875, -7.700437545776367, -0.6688518524169922, 166.22158813476562, 29.849464416503906, 40.05796813964844, -51.63445281982422, 74.65097045898438, -96.24005889892578, 153.7046661376953], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000369.npy"}
{"epoch": 0.5578231292517006, "step": 370, "batch_size": 64, "mean": 55.63139343261719, "std": 82.79155731201172, "min": -177.3505401611328, "p10": -31.0930221557617, "median": 46.8736686706543, "p90": 165.6518295288086, "max": 187.01834106445312, "pos_frac": 0.765625, "sample": [65.21862030029297, 165.474853515625, 116.42701721191406, 54.447235107421875, 187.01834106445312, 4.664154052734375, 172.6528778076172, 4.9465789794921875, 162.45889282226562, 4.963451385498047, 16.189102172851562, 49.27201843261719, 1.9832382202148438, 148.96408081054688, 181.70545959472656, 84.36613464355469, 41.33351135253906, -87.89946746826172, -12.335006713867188, 50.342620849609375, 53.638031005859375, 91.36485290527344, 85.9959716796875, 85.33848571777344, 163.9209442138672, 108.22897338867188, 79.40925598144531, 34.319217681884766, -5.696495056152344, -4.321319580078125, -2.4113311767578125, -12.324317932128906, -2.0091171264648438, -78.37532043457031, 14.467376708984375, 69.35066986083984, 179.58151245117188, 40.07792663574219, 157.4569091796875, 144.12672424316406, 177.1207275390625, -177.3505401611328, 21.360071182250977, 82.56631469726562, 14.912534713745117, 167.14852905273438, 102.54220581054688, 125.19355773925781, 148.03732299804688, 165.72767639160156, -160.11056518554688, 5.00213623046875, 149.58602905273438, -2.6830615997314453, -39.132171630859375, 18.525039672851562, 27.647979736328125, -5.167144775390625, -70.4461441040039, 1.0870952606201172, 44.475318908691406, 150.47537231445312, -40.98563766479492, 40.544090270996094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000370.npy"}
{"epoch": 0.5593348450491308, "step": 371, "batch_size": 64, "mean": 48.98352813720703, "std": 76.47982025146484, "min": -87.65139770507812, "p10": -61.254566192626946, "median": 34.688232421875, "p90": 154.40694427490237, "max": 192.15185546875, "pos_frac": 0.71875, "sample": [144.61712646484375, -87.65139770507812, 173.328125, 34.531105041503906, 80.83004760742188, 4.142730712890625, 125.71006774902344, 17.8500919342041, -18.030380249023438, 13.136629104614258, -20.762680053710938, 122.24777221679688, -64.23558044433594, 168.7287139892578, -0.47090911865234375, 192.15185546875, -4.120725631713867, 113.76649475097656, 20.663864135742188, 157.90309143066406, 34.72178649902344, -80.80638122558594, -6.909090042114258, 2.6281356811523438, 34.65467834472656, 136.95309448242188, 121.09600067138672, 188.46493530273438, 128.89337158203125, 88.0905990600586, 117.821533203125, 170.57699584960938, -64.76238250732422, 5.7230682373046875, 85.67467498779297, 21.023712158203125, -7.683738708496094, -83.37178039550781, -15.221847534179688, 167.54539489746094, -73.62940216064453, 18.59463882446289, 5.378767013549805, 74.96409606933594, 61.41319274902344, -41.24195861816406, 112.70594024658203, 80.10161590576172, -31.418899536132812, 38.54522705078125, 108.44728088378906, -10.978042602539062, 5.808982849121094, 93.88723754882812, 103.10662841796875, 0.4838409423828125, 118.34188079833984, 16.98430633544922, 67.63841247558594, 58.94715118408203, 146.249267578125, 86.6197738647461, -54.298866271972656, -71.154052734375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000371.npy"}
{"epoch": 0.5608465608465608, "step": 372, "batch_size": 64, "mean": 30.20751190185547, "std": 90.02851867675781, "min": -160.12353515625, "p10": -93.42602615356445, "median": 11.442438125610352, "p90": 162.0102005004883, "max": 180.53280639648438, "pos_frac": 0.6875, "sample": [-31.356109619140625, 15.927249908447266, 157.84524536132812, 103.96475219726562, 75.53485107421875, 79.8338851928711, 171.9398956298828, -34.420753479003906, 144.52566528320312, 112.8723373413086, 180.53280639648438, -6.1025390625, 102.16389465332031, 175.78955078125, 38.45563507080078, 114.35646057128906, 128.10745239257812, -150.09988403320312, 11.440681457519531, 18.50843048095703, 11.444194793701172, -12.520689010620117, 28.57196044921875, 157.15428161621094, 7.145648956298828, -12.867225646972656, -93.49380493164062, 8.0848388671875, 166.7349090576172, -93.26787567138672, 100.14250946044922, 34.50209045410156, -42.4398307800293, -4.276927947998047, -97.92312622070312, 11.156209945678711, -160.12353515625, 9.69793701171875, 132.33531188964844, 34.37311553955078, 178.8853759765625, 61.97133255004883, 120.79226684570312, 59.66493225097656, -65.31085205078125, 72.85922241210938, -151.8629608154297, 2.3227310180664062, 0.3805522918701172, 7.186347961425781, 6.212837219238281, -143.56068420410156, 97.83139038085938, 2.85986328125, -49.62501525878906, -51.45240783691406, -110.05828857421875, 40.269554138183594, 169.98724365234375, 163.79518127441406, -74.5936279296875, -11.8770751953125, 4.3800048828125, 7.973419189453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000372.npy"}
{"epoch": 0.562358276643991, "step": 373, "batch_size": 64, "mean": 40.5648193359375, "std": 89.31717681884766, "min": -158.46688842773438, "p10": -88.03801345825195, "median": 39.72220420837402, "p90": 159.99585571289063, "max": 192.1858673095703, "pos_frac": 0.6875, "sample": [-134.02452087402344, 151.2001495361328, 140.9368896484375, 110.33392333984375, 148.53506469726562, -131.510009765625, -44.92278289794922, 131.62705993652344, 111.88325500488281, 39.8575439453125, 127.42434692382812, -119.83588409423828, 17.13494873046875, -36.50282287597656, -97.73648834228516, -18.164691925048828, 60.988067626953125, 81.52860260009766, -158.46688842773438, -39.68711853027344, 67.47425842285156, -18.901206970214844, 84.8779067993164, -62.61591339111328, 161.77243041992188, 151.54254150390625, -17.593490600585938, -5.5536651611328125, 79.92699432373047, 93.82073211669922, 163.4203643798828, 1.7492828369140625, 39.58686447143555, 158.74667358398438, -88.56612396240234, 2.128406524658203, -10.494964599609375, 64.4354248046875, 2.3334426879882812, 148.44485473632812, 105.21214294433594, -106.732177734375, -86.80575561523438, -19.38547134399414, 46.95655822753906, 130.7645263671875, -59.99293518066406, 163.38514709472656, 12.57305908203125, 48.668800354003906, -5.4550628662109375, 163.87989807128906, 2.5733413696289062, 192.1858673095703, 38.63372802734375, 27.94936752319336, 74.21553039550781, 57.84761047363281, 26.255943298339844, 189.3115692138672, 45.567962646484375, 1.0748405456542969, 29.799415588378906, 160.53121948242188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000373.npy"}
{"epoch": 0.563869992441421, "step": 374, "batch_size": 64, "mean": 68.79034423828125, "std": 79.44470977783203, "min": -159.9183807373047, "p10": -6.6439689636230455, "median": 59.45279502868652, "p90": 172.24407348632815, "max": 230.76144409179688, "pos_frac": 0.859375, "sample": [230.76144409179688, 145.91908264160156, 147.66632080078125, -7.0590667724609375, 34.62247085571289, 135.62799072265625, 177.0450897216797, 174.69735717773438, 166.51974487304688, 52.798728942871094, 77.2913589477539, 12.638763427734375, 106.29158020019531, 211.46151733398438, 88.29827117919922, 25.1304931640625, 149.8880615234375, 3.9449310302734375, 24.033493041992188, 56.303993225097656, 25.13373565673828, 137.354736328125, 46.9335823059082, -98.79707336425781, 96.27020263671875, 92.27910614013672, 49.42451858520508, 126.37561798095703, -5.675407409667969, 106.60406494140625, 71.18191528320312, 71.03836059570312, 0.6733055114746094, 3.3901748657226562, 83.73658752441406, 60.58871841430664, 2.440887451171875, -159.9183807373047, 196.128662109375, 100.971923828125, 37.1834831237793, -115.3270034790039, 30.7603816986084, 186.52967834472656, 185.55902099609375, 51.763572692871094, -43.01263427734375, 3.88739013671875, 58.316871643066406, 42.902015686035156, 143.2523193359375, 139.80184936523438, 157.8824920654297, 68.17634582519531, 6.745548248291016, 2.9402828216552734, 57.63017272949219, 97.47784423828125, 146.61260986328125, 155.0122833251953, -50.744102478027344, -19.652359008789062, 42.95259094238281, -4.0856475830078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000374.npy"}
{"epoch": 0.5653817082388511, "step": 375, "batch_size": 64, "mean": 47.758262634277344, "std": 82.93278503417969, "min": -161.6555633544922, "p10": -48.04499053955078, "median": 46.44721603393555, "p90": 159.90617828369142, "max": 206.90823364257812, "pos_frac": 0.703125, "sample": [-46.587013244628906, -85.24510955810547, 44.308929443359375, 117.97535705566406, 91.34698486328125, 24.445138931274414, 176.28558349609375, -4.6866455078125, 10.544830322265625, 103.36985778808594, -8.771095275878906, 118.66279602050781, -44.11566162109375, 153.85348510742188, 5.547092437744141, -3.094623565673828, 52.81055450439453, -110.58732604980469, 112.63619232177734, -3.91064453125, 58.31269073486328, 164.65045166015625, 160.89608764648438, -63.867706298828125, 121.72638702392578, 163.65789794921875, 2.2210845947265625, -33.411102294921875, 76.57640075683594, 149.12242126464844, 5.101234436035156, 1.1630268096923828, 12.047185897827148, -48.669837951660156, 83.00961303710938, 74.93743896484375, 82.75870513916016, 25.501083374023438, 166.7588653564453, 83.55331420898438, 113.7916259765625, 48.58550262451172, 130.83175659179688, -161.6555633544922, -18.999404907226562, 21.759292602539062, 175.38546752929688, 2.7218093872070312, 206.90823364257812, 93.77273559570312, 148.89785766601562, 60.38896179199219, -26.394203186035156, 139.81027221679688, -37.71060562133789, 15.833091735839844, 97.24678802490234, -97.42439270019531, -40.20165252685547, 157.5963897705078, -6.4779205322265625, 110.36470794677734, 11.712898254394531, -81.04866027832031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000375.npy"}
{"epoch": 0.5668934240362812, "step": 376, "batch_size": 64, "mean": 58.73527908325195, "std": 85.80644226074219, "min": -149.7057647705078, "p10": -54.58490867614746, "median": 52.819196701049805, "p90": 165.37988739013673, "max": 247.81057739257812, "pos_frac": 0.703125, "sample": [37.69189453125, 1.1557769775390625, -2.6448307037353516, 30.412322998046875, -4.706169128417969, -13.052974700927734, -4.0548858642578125, 140.9119110107422, -7.313812255859375, 57.166168212890625, 101.55096435546875, 90.23237609863281, -149.7057647705078, -2.8413772583007812, -59.25935363769531, 12.402915954589844, 179.1759033203125, 66.23211669921875, 6.163181304931641, 247.81057739257812, 42.282379150390625, 175.2298583984375, 3.2088623046875, 154.7155303955078, 190.864990234375, -20.07635498046875, 146.13790893554688, 149.24656677246094, 151.16036987304688, 101.9450912475586, 130.04225158691406, 53.0267333984375, 160.16934204101562, 34.155548095703125, 88.76714324951172, 170.4788055419922, 123.61190795898438, -48.05680465698242, 167.43524169921875, -78.68617248535156, 63.002838134765625, -75.73207092285156, 145.52511596679688, 127.92579650878906, 120.42913055419922, 146.87429809570312, -57.382667541503906, 129.29747009277344, -43.6290283203125, 150.682373046875, -72.05803680419922, 19.846710205078125, 17.701759338378906, 52.61166000366211, 168.1678466796875, -3.966135025024414, 115.97593688964844, -9.568218231201172, 160.5840606689453, 83.09513854980469, -45.13893127441406, 12.386640548706055, 0.20401763916015625, -70.76383972167969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000376.npy"}
{"epoch": 0.5684051398337112, "step": 377, "batch_size": 64, "mean": 42.29701232910156, "std": 79.1687240600586, "min": -154.27281188964844, "p10": -52.50183639526366, "median": 34.28352737426758, "p90": 150.36964569091796, "max": 185.67120361328125, "pos_frac": 0.703125, "sample": [105.3170166015625, -31.49518585205078, 132.83663940429688, 134.4934844970703, 70.09089660644531, 5.565299987792969, -106.23017120361328, -65.99514770507812, 42.90485382080078, 20.96247100830078, 162.32601928710938, 1.2252273559570312, 122.53621673583984, -3.31463623046875, 149.52537536621094, 141.16537475585938, -113.82994842529297, 66.75113677978516, -65.62910461425781, 3.7203826904296875, 167.66905212402344, -3.7411880493164062, 80.67051696777344, 25.858985900878906, 106.81501770019531, -150.3436279296875, 16.963062286376953, 185.67120361328125, 41.911434173583984, -18.191442489624023, 144.8546142578125, -8.107122421264648, 118.28775787353516, 0.99041748046875, -2.3132476806640625, -57.613502502441406, 33.241180419921875, 35.32587432861328, 151.69143676757812, 20.553146362304688, -5.181404113769531, 86.12720489501953, 2.8773326873779297, 79.14944458007812, 68.516357421875, 160.81631469726562, 65.24171447753906, -9.28122329711914, -40.574615478515625, 53.11476135253906, 136.80023193359375, 9.34918212890625, -17.071969985961914, 160.27899169921875, 3.168487548828125, -15.150171279907227, 150.73147583007812, 84.05762481689453, -154.27281188964844, 9.076377868652344, 84.84485626220703, -6.627388000488281, 50.10969543457031, 87.78849792480469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000377.npy"}
{"epoch": 0.5699168556311414, "step": 378, "batch_size": 64, "mean": 40.23735427856445, "std": 76.35681915283203, "min": -155.1912841796875, "p10": -22.795986557006835, "median": 19.967571258544922, "p90": 153.94950714111332, "max": 189.57070922851562, "pos_frac": 0.734375, "sample": [134.93630981445312, 79.3576431274414, 0.5099582672119141, 14.517349243164062, 81.64591217041016, 147.39248657226562, -36.90016174316406, 16.452472686767578, 46.58186340332031, 137.9748992919922, -155.1912841796875, 2.8137569427490234, 189.57070922851562, -3.929698944091797, 160.3765106201172, 109.25238800048828, 68.45015716552734, 2.8927955627441406, 160.1956787109375, -4.823520660400391, 76.92787170410156, 10.259521484375, 11.970134735107422, 26.19579315185547, 156.75965881347656, 11.877227783203125, 68.13878631591797, 8.01861572265625, -79.40023803710938, 23.482913970947266, 100.36407470703125, 27.6719970703125, -84.98677062988281, -1.4917125701904297, 107.09005737304688, 20.72827911376953, -22.004959106445312, 19.206863403320312, 135.88519287109375, 0.09953498840332031, 172.3680877685547, 82.82981872558594, -1.8050537109375, 141.9816131591797, 24.249298095703125, 136.93251037597656, -17.184093475341797, 159.5022430419922, 79.00041961669922, 9.178937911987305, 2.5840110778808594, -8.208908081054688, 6.313510894775391, -128.9781951904297, -23.134998321533203, 53.276397705078125, -16.64546775817871, 49.67431640625, 2.765420913696289, 71.28743743896484, -6.018695831298828, -136.56382751464844, -6.77720832824707, 159.69403076171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000378.npy"}
{"epoch": 0.5714285714285714, "step": 379, "batch_size": 64, "mean": 53.86410903930664, "std": 80.73590850830078, "min": -135.0699462890625, "p10": -41.77860336303711, "median": 37.53767013549805, "p90": 162.65017242431642, "max": 215.93563842773438, "pos_frac": 0.796875, "sample": [84.48388671875, -19.822616577148438, 149.98973083496094, -5.027732849121094, 14.3768310546875, 49.19651794433594, 20.2343807220459, -135.0699462890625, 66.29829406738281, 7.74407958984375, 18.482872009277344, 33.43275451660156, -82.66432189941406, 132.56576538085938, 17.76740264892578, 215.93563842773438, 34.60003662109375, -46.94062423706055, 143.02182006835938, 172.1280517578125, 213.96380615234375, 86.72714233398438, 159.71363830566406, 128.21981811523438, 44.01214599609375, -43.539306640625, 97.10017395019531, -91.12834167480469, 39.46991729736328, -63.61962890625, 212.42315673828125, 39.66997528076172, 2.749542236328125, 48.82770919799805, -84.854248046875, 161.83721923828125, 157.01971435546875, 9.924354553222656, 15.746261596679688, -4.787273406982422, -37.67029571533203, 12.085212707519531, 58.17737579345703, 170.97140502929688, 139.59078979492188, 11.91952133178711, 162.9985809326172, 200.0408935546875, 1.4450340270996094, -15.008621215820312, 129.77818298339844, 15.09539794921875, 24.930679321289062, 60.734840393066406, 124.2587661743164, 1.1092681884765625, 35.60542297363281, 69.16386413574219, 45.242252349853516, 2.923433303833008, 103.3758316040039, 30.641067504882812, 115.80914306640625, -16.123809814453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000379.npy"}
{"epoch": 0.5729402872260015, "step": 380, "batch_size": 64, "mean": 67.21205139160156, "std": 83.4656982421875, "min": -163.61785888671875, "p10": -26.971783256530756, "median": 75.50872802734375, "p90": 162.3717834472656, "max": 209.20840454101562, "pos_frac": 0.796875, "sample": [42.72114181518555, 151.61203002929688, 88.1412353515625, 30.0894775390625, -67.03681945800781, -118.95511627197266, 162.36883544921875, 163.14364624023438, -3.994976043701172, 142.93446350097656, 124.71609497070312, 41.3889045715332, -5.739772796630859, 84.51820373535156, 5.996540069580078, -37.02033996582031, -0.8854103088378906, 62.21412658691406, 143.05706787109375, 162.373046875, 87.23475646972656, 65.86217498779297, -14.427032470703125, -163.61785888671875, 151.95449829101562, 2.9976348876953125, 132.55026245117188, 71.07544708251953, -28.894121170043945, 23.99171257019043, 124.56591796875, 139.37478637695312, 79.94200897216797, 67.03068542480469, 209.20840454101562, 7.009613037109375, 140.83969116210938, 138.14328002929688, 26.368240356445312, 0.04120635986328125, 149.66168212890625, -141.03829956054688, 160.868408203125, 160.67115783691406, 51.231597900390625, 132.74986267089844, -8.918601989746094, 114.17462158203125, -22.486328125, 118.3875732421875, 14.76568603515625, 4.079572677612305, -47.17962646484375, 179.76104736328125, 164.89878845214844, 158.0126190185547, 8.290521621704102, 173.15042114257812, 124.32349395751953, 169.43777465820312, 80.19432830810547, 85.99393463134766, 18.82132911682129, 18.826202392578125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000380.npy"}
{"epoch": 0.5744520030234316, "step": 381, "batch_size": 64, "mean": 26.428909301757812, "std": 82.66667938232422, "min": -186.4525909423828, "p10": -64.44556121826172, "median": 12.18032455444336, "p90": 157.29839477539065, "max": 217.94964599609375, "pos_frac": 0.671875, "sample": [17.4356689453125, 3.8847808837890625, 0.23866844177246094, -95.7115478515625, -186.4525909423828, 7.122417449951172, -1.8630599975585938, 24.252628326416016, 171.25, 81.36637878417969, 10.625419616699219, -4.731540679931641, 11.119438171386719, 33.44309616088867, 0.8426342010498047, 217.94964599609375, 117.9937744140625, 0.5322151184082031, 71.32872009277344, -2.1974220275878906, 10.960065841674805, -73.42180633544922, 8.717031478881836, 186.2958221435547, -103.33927154541016, 73.69377899169922, -59.038421630859375, -57.67366027832031, 101.3880615234375, -2.696470260620117, 63.65892791748047, 150.61212158203125, 16.369094848632812, 13.888107299804688, -23.88486099243164, 170.24696350097656, 33.78200912475586, -37.821292877197266, -127.86485290527344, 18.226715087890625, 160.1639404296875, -12.856704711914062, 17.14844512939453, 9.930074691772461, 50.34876251220703, 168.4044647216797, 128.94017028808594, 66.6595687866211, -2.264066696166992, 31.391029357910156, 4.55267333984375, 113.58747863769531, 124.48698425292969, -14.832077026367188, 35.13320541381836, 13.2412109375, 84.42874145507812, -174.9381103515625, -66.15555572509766, 44.74751281738281, -2.769744873046875, -60.45557403564453, -29.183433532714844, 161.21380615234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000381.npy"}
{"epoch": 0.5759637188208617, "step": 382, "batch_size": 64, "mean": 45.52073287963867, "std": 69.90425109863281, "min": -154.14483642578125, "p10": -27.450645446777344, "median": 31.659873962402344, "p90": 146.61829833984376, "max": 191.51864624023438, "pos_frac": 0.75, "sample": [-13.283737182617188, 5.848123550415039, 182.4210205078125, 64.22036743164062, 166.9847412109375, -12.153457641601562, 36.21403503417969, 28.396865844726562, 106.97723388671875, 80.62042236328125, 70.06219482421875, -8.421859741210938, -26.426132202148438, 25.305694580078125, 137.87937927246094, 20.609481811523438, -26.8740234375, -30.1121826171875, -154.14483642578125, 133.6574249267578, -1.3077144622802734, 51.78904724121094, 15.542312622070312, 3.9539222717285156, -27.697769165039062, 0.509429931640625, 88.7844009399414, 44.572715759277344, -64.87908172607422, 167.85519409179688, 79.35694885253906, 14.56924057006836, -60.917362213134766, -59.86692810058594, 161.84262084960938, 5.452644348144531, -44.10270309448242, 1.4442291259765625, 37.88092041015625, 146.70620727539062, 70.28593444824219, 191.51864624023438, -0.9582061767578125, 26.842239379882812, 34.922882080078125, 90.6230697631836, -13.36307144165039, 1.9515762329101562, 24.9957275390625, 81.96733856201172, 83.69375610351562, 114.50883483886719, -21.96420669555664, 43.852230072021484, 131.8734130859375, 1.872161865234375, 73.57229614257812, 28.340587615966797, 146.41317749023438, 161.56153869628906, 97.88499450683594, 110.73454284667969, 68.8389892578125, 14.089302062988281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000382.npy"}
{"epoch": 0.5774754346182918, "step": 383, "batch_size": 64, "mean": 47.5542106628418, "std": 88.66026306152344, "min": -175.29074096679688, "p10": -66.06408081054687, "median": 31.44818878173828, "p90": 165.95009613037112, "max": 201.4486083984375, "pos_frac": 0.703125, "sample": [-1.6974430084228516, 145.13035583496094, 91.78379821777344, 8.887222290039062, 17.574180603027344, -52.96424865722656, 126.71174621582031, 61.51813507080078, 18.824317932128906, -87.86019897460938, 2.5884437561035156, -91.65396118164062, 79.96211242675781, 70.52613067626953, 110.52698516845703, 56.882354736328125, 153.11697387695312, -68.54096221923828, 176.38943481445312, 150.75892639160156, 28.092147827148438, -16.337799072265625, 174.2954864501953, -39.930450439453125, 159.78305053710938, 21.692962646484375, 43.417816162109375, -13.884109497070312, 21.160165786743164, 189.49313354492188, 131.9685516357422, -60.284690856933594, -9.906425476074219, 190.2417449951172, -26.067745208740234, 94.1427001953125, 142.49603271484375, -175.29074096679688, 9.10516357421875, 74.920654296875, -104.08226776123047, 43.10028076171875, 15.77410888671875, 39.47752380371094, 167.5587921142578, -88.11628723144531, 20.335556030273438, 34.804229736328125, -50.791229248046875, 201.4486083984375, 22.595495223999023, 178.44656372070312, 121.6144027709961, 12.901481628417969, -7.759002685546875, 147.8056640625, 135.367919921875, 5.4901123046875, -10.018630981445312, -96.55174255371094, 159.7882843017578, -36.06251525878906, 162.19647216796875, 60.573760986328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000383.npy"}
{"epoch": 0.5789871504157218, "step": 384, "batch_size": 64, "mean": 46.731719970703125, "std": 75.49002838134766, "min": -198.73477172851562, "p10": -41.31972541809082, "median": 44.728919982910156, "p90": 160.41666259765626, "max": 191.63211059570312, "pos_frac": 0.84375, "sample": [-51.0093994140625, 159.924560546875, 49.58399963378906, 120.85905456542969, 4.671588897705078, 78.88605499267578, 87.9554214477539, 43.937721252441406, 170.0355987548828, -41.03060531616211, 42.48797607421875, 44.43190002441406, 46.36456298828125, 9.949699401855469, -12.085945129394531, 22.992431640625, 5.656890869140625, 178.1827392578125, 2.695220947265625, 162.50450134277344, -14.338951110839844, 45.02593994140625, 22.67058563232422, -41.443634033203125, -109.47177124023438, 90.7699203491211, 191.63211059570312, 56.19823455810547, -113.94928741455078, 101.67878723144531, 165.74903869628906, 114.028564453125, 2.047557830810547, 72.11687469482422, 83.7552490234375, 6.220306396484375, 171.69125366210938, 28.730060577392578, -198.73477172851562, 149.2413330078125, 57.622711181640625, 63.24603271484375, 99.40166473388672, -49.05604553222656, 160.6275634765625, 68.83047485351562, 30.855152130126953, 8.697965621948242, 40.16383743286133, 126.4797134399414, 65.8045654296875, 15.278182983398438, 64.03828430175781, 38.59867858886719, 11.303428649902344, 1.3511466979980469, -96.47594451904297, 49.67352294921875, 2.487274169921875, 1.3667869567871094, 65.49673461914062, 124.95745086669922, 58.54127502441406, 30.928184509277344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000384.npy"}
{"epoch": 0.5804988662131519, "step": 385, "batch_size": 64, "mean": 62.60823440551758, "std": 75.86747741699219, "min": -64.12689971923828, "p10": -37.0130615234375, "median": 50.251115798950195, "p90": 167.1607437133789, "max": 205.1511688232422, "pos_frac": 0.765625, "sample": [-64.12689971923828, -6.161632537841797, 158.1878662109375, 84.76089477539062, 35.457427978515625, 7.927276611328125, 41.5111083984375, 117.36460876464844, 22.479171752929688, 33.637725830078125, -55.10636520385742, 104.26435852050781, 89.2400894165039, 160.3577880859375, 48.09777069091797, 140.7396240234375, 167.43212890625, 137.2937774658203, 0.5147933959960938, 49.57143020629883, -36.706085205078125, -45.78950500488281, 33.12095642089844, 163.68276977539062, 166.5275115966797, 182.57383728027344, 42.5129508972168, 63.48188781738281, 52.409278869628906, 121.47909545898438, 108.52175903320312, -58.530235290527344, 50.93080139160156, 9.69189453125, 53.37698745727539, 168.24728393554688, 142.98101806640625, 96.03720092773438, 95.14088439941406, 109.34129333496094, -27.525676727294922, -48.57061004638672, 117.81155395507812, 34.91789245605469, 2.0848922729492188, 205.1511688232422, 184.7271270751953, -45.01933288574219, -6.406887054443359, -17.488922119140625, -20.524932861328125, 33.443111419677734, -37.144622802734375, -4.6222991943359375, 133.72662353515625, 78.0274887084961, 11.170921325683594, 159.6177520751953, -30.934532165527344, 178.64793395996094, 4.043117523193359, 1.7236557006835938, 168.00692749023438, 139.59010314941406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000385.npy"}
{"epoch": 0.582010582010582, "step": 386, "batch_size": 64, "mean": 39.64215087890625, "std": 74.09783935546875, "min": -172.37240600585938, "p10": -44.604446411132805, "median": 52.321651458740234, "p90": 129.10058593750003, "max": 183.3477783203125, "pos_frac": 0.734375, "sample": [90.36447143554688, -96.77901458740234, 2.732532501220703, 47.68788146972656, 61.10627746582031, -30.187469482421875, -0.8634872436523438, 74.47378540039062, 76.95951843261719, 85.2058334350586, 26.079574584960938, 69.33897399902344, -137.11346435546875, 7.540672302246094, 22.56060791015625, 132.6666259765625, 124.21823120117188, 121.08232116699219, 131.19302368164062, 97.66696166992188, 13.731771469116211, 78.68054962158203, 56.955421447753906, -64.02824401855469, -53.616546630859375, 132.7257843017578, 117.59344482421875, -15.6041259765625, 113.0239486694336, 80.6645278930664, 17.708839416503906, 70.0318603515625, -15.121391296386719, 17.558502197265625, -28.85280990600586, -30.389450073242188, 59.386573791503906, 12.456329345703125, 89.61956787109375, 163.47634887695312, 15.168060302734375, -34.87805938720703, 74.63134002685547, 94.916748046875, 109.77326202392578, -24.20816421508789, -46.85595703125, 23.78656768798828, 0.9946308135986328, 183.3477783203125, 10.293190002441406, -39.350921630859375, -113.51138305664062, 111.85820007324219, 181.05490112304688, 66.59098052978516, 62.92939758300781, 74.86295318603516, 76.30586242675781, 145.08184814453125, 14.231582641601562, -12.174922943115234, 12.687484741210938, -172.37240600585938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000386.npy"}
{"epoch": 0.5835222978080121, "step": 387, "batch_size": 64, "mean": 60.14110565185547, "std": 71.39328002929688, "min": -124.10350036621094, "p10": -16.406113815307616, "median": 46.752838134765625, "p90": 166.96411132812503, "max": 228.9857940673828, "pos_frac": 0.828125, "sample": [43.23036193847656, 9.239479064941406, 137.69114685058594, -34.704139709472656, 6.2990264892578125, 182.66722106933594, 68.61256408691406, -15.225528717041016, 46.82942199707031, 228.9857940673828, 19.363252639770508, -3.4009037017822266, 47.937767028808594, 49.377349853515625, 168.5135040283203, 2.6717300415039062, 32.33680725097656, 146.2001495361328, 153.80618286132812, 13.520233154296875, 121.71893310546875, 169.21170043945312, 95.78678894042969, -23.291545867919922, 2.7446060180664062, 128.4127655029297, 6.5970458984375, 18.959352493286133, 152.67897033691406, 64.68470764160156, 1.2504768371582031, -49.623382568359375, 41.05554962158203, 163.34886169433594, -16.912078857421875, 94.83885955810547, 59.55319595336914, 27.51823616027832, 18.58205795288086, -5.7855072021484375, 0.2208690643310547, 62.50140380859375, 95.10985565185547, -124.10350036621094, -32.92601013183594, 56.496212005615234, 125.11555480957031, 178.41424560546875, 46.67625427246094, 159.20156860351562, -10.765644073486328, 43.93756103515625, 24.57940673828125, 64.97654724121094, 68.94438171386719, 41.44260787963867, -24.22388458251953, 19.265195846557617, 87.84504699707031, 184.0321807861328, 202.7459716796875, 76.44501495361328, 33.78131103515625, 94.0374984741211], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000387.npy"}
{"epoch": 0.5850340136054422, "step": 388, "batch_size": 64, "mean": 37.45484924316406, "std": 85.8631362915039, "min": -154.23428344726562, "p10": -56.827610778808584, "median": 20.528926849365234, "p90": 153.74555511474608, "max": 220.77685546875, "pos_frac": 0.703125, "sample": [164.15011596679688, 80.4378890991211, 113.88839721679688, 184.14427185058594, 53.69597625732422, 110.62287902832031, 34.90171813964844, 22.617630004882812, 20.871299743652344, -30.529869079589844, 36.028038024902344, 144.609619140625, -12.603988647460938, 102.74520111083984, -21.276992797851562, 220.77685546875, 2.2759361267089844, 211.92945861816406, 16.614654541015625, 120.31747436523438, 144.94332885742188, 2.334381103515625, 145.98890686035156, 20.186553955078125, 153.91177368164062, -154.23428344726562, 48.59004211425781, 66.31524658203125, 16.55820655822754, 4.971803665161133, -1.5892353057861328, -14.58952522277832, -8.005302429199219, 149.67230224609375, -2.384267807006836, 23.44854736328125, 26.847354888916016, -59.6649169921875, 49.03990936279297, -87.8062744140625, 53.05968475341797, 136.82830810546875, -120.93858337402344, 124.71885681152344, 158.58973693847656, 4.996612548828125, 3.8196640014648438, 0.28388214111328125, 13.838226318359375, 153.3577117919922, -7.0772247314453125, 161.51953125, -50.20722961425781, 5.6883392333984375, -135.84283447265625, -49.54560852050781, -48.53865051269531, 73.02340698242188, -10.962543487548828, 5.453529357910156, 48.41209411621094, -118.30268096923828, -109.56838989257812, 3.7534046173095703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000388.npy"}
{"epoch": 0.5865457294028723, "step": 389, "batch_size": 64, "mean": 61.06333541870117, "std": 79.54638671875, "min": -158.5655517578125, "p10": -25.99077186584472, "median": 48.05550003051758, "p90": 161.96338653564453, "max": 220.75875854492188, "pos_frac": 0.71875, "sample": [-42.66587829589844, -8.871397018432617, 87.59685516357422, 56.815673828125, 116.72862243652344, 39.10150146484375, 124.72999572753906, 142.28366088867188, 187.865966796875, 107.95962524414062, -45.9312744140625, -28.359329223632812, 69.150634765625, 116.3885269165039, 24.325546264648438, 87.0457763671875, 93.90806579589844, 156.22030639648438, 80.45330810546875, 26.595882415771484, 154.9735565185547, 153.58877563476562, -2.8220157623291016, 22.84991455078125, 127.12835693359375, -12.598625183105469, 147.14707946777344, -20.250755310058594, 36.34220886230469, 80.114990234375, -9.749664306640625, 138.29312133789062, -14.190628051757812, 161.1051025390625, 169.63018798828125, 22.84033203125, 178.54513549804688, 89.34798431396484, 37.011661529541016, -158.5655517578125, 188.50326538085938, -4.582996368408203, 5.638612747192383, -20.46413803100586, 47.11890411376953, 10.394927978515625, 160.6970977783203, -42.061641693115234, -79.932861328125, 27.48602294921875, 4.552360534667969, 2.938121795654297, 48.992095947265625, -1.834075927734375, -30.84008026123047, 186.06558227539062, 162.3312225341797, 145.2952117919922, 63.488311767578125, 220.75875854492188, -7.81181526184082, 128.09678649902344, 2.6620655059814453, -1.5214576721191406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000389.npy"}
{"epoch": 0.5880574452003023, "step": 390, "batch_size": 64, "mean": 46.405982971191406, "std": 72.13442993164062, "min": -152.35079956054688, "p10": -42.67697601318359, "median": 33.63225555419922, "p90": 149.82766418457032, "max": 180.27914428710938, "pos_frac": 0.78125, "sample": [-57.767059326171875, 81.48479461669922, 59.54869079589844, 30.048370361328125, 136.44964599609375, 34.769554138183594, 138.13134765625, 112.2794418334961, -51.20477294921875, 32.494956970214844, 0.2476482391357422, -22.200820922851562, 132.4873046875, 28.637380599975586, 2.4859848022460938, 180.27914428710938, 149.06744384765625, -43.73529815673828, 99.31669616699219, -46.028663635253906, 109.87539672851562, 9.075813293457031, 7.222343444824219, 157.22671508789062, 126.6263427734375, 150.75726318359375, 11.97802734375, -74.59864807128906, 1.9174079895019531, 110.70674896240234, -23.231605529785156, 6.869009971618652, -17.58021354675293, 36.943443298339844, -7.250766754150391, -40.207557678222656, -152.35079956054688, 83.43759155273438, 153.33984375, 0.9346809387207031, 150.15347290039062, 12.330957412719727, -3.263519287109375, 54.836822509765625, 0.9442100524902344, 74.4217758178711, 12.517822265625, 30.146835327148438, 169.15594482421875, 38.791343688964844, 129.347900390625, 106.81851196289062, 7.511688232421875, 43.60436248779297, 3.3711929321289062, 95.33071899414062, 0.3720874786376953, -70.60163879394531, 91.32464599609375, -22.11646842956543, 81.32754516601562, 75.67088317871094, 65.03687286376953, 174.46607971191406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000390.npy"}
{"epoch": 0.5895691609977324, "step": 391, "batch_size": 64, "mean": 55.940330505371094, "std": 87.07466125488281, "min": -157.09828186035156, "p10": -54.942006683349604, "median": 61.86225509643555, "p90": 164.31560058593752, "max": 227.49986267089844, "pos_frac": 0.78125, "sample": [17.929244995117188, -157.09828186035156, 163.48123168945312, 61.6695556640625, 53.86516571044922, 169.56716918945312, 9.700252532958984, 164.67318725585938, 158.2828826904297, 226.99542236328125, 72.28665161132812, -141.59381103515625, 125.21826171875, -12.898910522460938, 78.49188232421875, 183.88092041015625, -93.63018798828125, 44.029624938964844, -42.343475341796875, 29.050148010253906, -5.3518524169921875, 126.62446594238281, 83.88751220703125, 62.054954528808594, 29.63330841064453, -47.76042175292969, -117.2869873046875, -13.447933197021484, 74.18701171875, -9.834556579589844, 227.49986267089844, 109.97616577148438, 85.59589385986328, -2.8314056396484375, 71.02325439453125, 1.0300369262695312, 16.764650344848633, 85.71701049804688, 111.31596374511719, 130.73092651367188, 116.0180435180664, 73.11784362792969, -134.5299835205078, -71.67437744140625, 57.75847625732422, 45.05353927612305, 174.86776733398438, 16.63471221923828, 149.95834350585938, 132.28472900390625, 9.586669921875, 0.6996917724609375, 158.12261962890625, 137.9163818359375, 171.32009887695312, 129.98292541503906, 96.56790161132812, 52.760353088378906, -58.01982879638672, 9.426177978515625, 25.382980346679688, 65.51600646972656, 89.4984359741211, 0.8466644287109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000391.npy"}
{"epoch": 0.5910808767951625, "step": 392, "batch_size": 64, "mean": 60.86140441894531, "std": 87.31916809082031, "min": -156.267578125, "p10": -42.6178009033203, "median": 56.615684509277344, "p90": 175.43123016357424, "max": 262.7799072265625, "pos_frac": 0.796875, "sample": [-138.64181518554688, -0.666046142578125, 46.40644073486328, 64.08293151855469, 20.549713134765625, -12.026840209960938, 159.63790893554688, -53.647666931152344, 93.31106567382812, 155.4012908935547, 78.88423156738281, -156.267578125, 119.80386352539062, 164.28041076660156, 190.17465209960938, 58.001312255859375, 9.84075927734375, 153.43060302734375, 161.08888244628906, 11.082794189453125, 35.59740447998047, -5.471221923828125, 70.10648345947266, -87.61371612548828, 128.57135009765625, 151.1197509765625, 166.83596801757812, 55.23005676269531, 70.40518188476562, -94.06733703613281, 62.28309631347656, 184.9065704345703, 129.46055603027344, 19.925262451171875, -47.72477722167969, 79.92483520507812, 31.796266555786133, 103.05886840820312, 89.05931091308594, 147.2362060546875, 189.2193603515625, 184.05471801757812, -79.59368133544922, 114.35165405273438, 1.2126235961914062, 1.9520187377929688, 3.49896240234375, 23.211761474609375, 179.1149139404297, 0.80322265625, -0.5828094482421875, 262.7799072265625, 5.657657623291016, 33.826324462890625, -30.701522827148438, 7.285696029663086, 204.76702880859375, -16.70623779296875, 28.158721923828125, 122.72134399414062, 4.804597854614258, 104.71292877197266, 33.5501708984375, 101.66354370117188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000392.npy"}
{"epoch": 0.5925925925925926, "step": 393, "batch_size": 64, "mean": 62.72712707519531, "std": 74.9701919555664, "min": -104.98814392089844, "p10": -18.67591342926025, "median": 48.72534370422363, "p90": 166.1655487060547, "max": 202.95794677734375, "pos_frac": 0.75, "sample": [76.04914855957031, -4.067262649536133, 129.58453369140625, 168.70628356933594, 163.4581756591797, 91.61943817138672, 39.31278610229492, 49.78392791748047, 71.30803680419922, 48.98941421508789, -19.25954246520996, 3.0955276489257812, 183.1578369140625, 140.67678833007812, 54.041831970214844, -43.521095275878906, 2.6225357055664062, 172.40341186523438, 9.583343505859375, -3.8854942321777344, -9.383926391601562, 3.731241226196289, 27.42431640625, 184.36138916015625, 129.676513671875, 154.34547424316406, -25.59295082092285, 143.1343536376953, 123.943359375, 130.46499633789062, 48.461273193359375, -58.96001052856445, 142.3020477294922, 167.3258514404297, -0.2731647491455078, 35.59678649902344, -80.04869079589844, 176.602783203125, -0.4996490478515625, -17.643247604370117, 94.00043487548828, 19.0228271484375, 135.82777404785156, -8.360515594482422, 55.36821746826172, 131.03854370117188, 138.81642150878906, 14.14239501953125, 44.09762191772461, 129.00289916992188, 157.5125274658203, 202.95794677734375, 64.72154998779297, -6.222084045410156, 15.809669494628906, 135.37155151367188, 29.901718139648438, 47.07508850097656, -104.98814392089844, -19.118484497070312, 9.636123657226562, 117.25674438476562, -4.6138153076171875, 7.6509246826171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000393.npy"}
{"epoch": 0.5941043083900227, "step": 394, "batch_size": 64, "mean": 54.89018249511719, "std": 68.12503814697266, "min": -144.00791931152344, "p10": -14.621060180664061, "median": 54.32143592834473, "p90": 145.15064239501953, "max": 196.39193725585938, "pos_frac": 0.796875, "sample": [25.6865234375, 160.96205139160156, 116.11734008789062, 29.240234375, 20.793251037597656, 42.16926574707031, -11.094223022460938, 51.890106201171875, 183.23843383789062, 49.62361145019531, 164.72927856445312, 114.22874450683594, 19.690753936767578, 53.9153938293457, -44.22225570678711, 52.99462127685547, -8.162094116210938, -9.024393081665039, -13.398345947265625, 17.954418182373047, 141.9091796875, -35.43947982788086, 66.59498596191406, 75.41334533691406, 72.24162292480469, 176.99755859375, 127.80237579345703, 98.24942016601562, 117.58528900146484, 6.715858459472656, -89.34962463378906, 19.9801025390625, 156.57177734375, 83.33914184570312, 47.34510803222656, 98.24261474609375, -0.7477931976318359, 85.75001525878906, 54.72747802734375, 6.684322357177734, 92.90815734863281, -59.481021881103516, -15.14508056640625, 64.97320556640625, 87.3779296875, 196.39193725585938, -144.00791931152344, 118.52645874023438, -4.049781799316406, 67.918701171875, -88.54464721679688, 81.11673736572266, 42.788902282714844, 78.40316772460938, 68.25173950195312, 62.030982971191406, 74.02439880371094, 118.36202239990234, 2.7940444946289062, 27.290802001953125, 15.783218383789062, 146.5398406982422, 40.30365753173828, 110.46839141845703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000394.npy"}
{"epoch": 0.5956160241874527, "step": 395, "batch_size": 64, "mean": 45.047027587890625, "std": 74.65374755859375, "min": -109.85095977783203, "p10": -41.99283180236816, "median": 23.661678314208984, "p90": 152.5599548339844, "max": 225.52944946289062, "pos_frac": 0.734375, "sample": [115.7381362915039, 129.94790649414062, 104.85798645019531, 7.129402160644531, 124.37676239013672, 46.01301574707031, -2.077871322631836, 86.4984359741211, 172.33303833007812, -8.374589920043945, 36.442138671875, 77.3558349609375, 170.72251892089844, 118.32897186279297, 21.674957275390625, 165.100830078125, -17.602676391601562, 85.95561218261719, -13.11273193359375, 8.313247680664062, -3.02642822265625, 16.20812225341797, 7.911712646484375, 58.04269027709961, 139.81761169433594, -109.85095977783203, 25.648399353027344, 151.70294189453125, 54.52773666381836, -6.810310363769531, 55.680137634277344, 225.52944946289062, 51.690765380859375, 39.13037872314453, -84.15351867675781, 27.937042236328125, -36.7225456237793, 14.39907455444336, 83.43073272705078, -44.988502502441406, -9.07470703125, 0.28652000427246094, 150.19659423828125, 12.533123016357422, 4.783313751220703, 72.24285888671875, 2.6431655883789062, 190.77223205566406, -45.25181579589844, -18.104257583618164, 3.212512969970703, 216.17454528808594, -44.25152587890625, 5.066371917724609, 31.414047241210938, 152.92724609375, -52.3885383605957, 18.622024536132812, 81.75133514404297, -77.2518081665039, 8.58249282836914, -20.100234985351562, 4.947092056274414, 97.55182647705078], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000395.npy"}
{"epoch": 0.5971277399848829, "step": 396, "batch_size": 64, "mean": 52.549766540527344, "std": 70.70561981201172, "min": -127.73711395263672, "p10": -19.6721866607666, "median": 31.43326187133789, "p90": 159.76871490478516, "max": 220.08851623535156, "pos_frac": 0.8125, "sample": [162.5196533203125, 220.08851623535156, 140.8942108154297, 152.58523559570312, 141.66004943847656, 97.71310424804688, 12.93893051147461, 156.97158813476562, 47.552589416503906, 28.78583526611328, 88.08385467529297, 19.526229858398438, 111.53018951416016, -35.00010299682617, 39.97223663330078, -1.1602020263671875, 34.0806884765625, 74.03144073486328, 1.50518798828125, 42.675209045410156, 68.06694030761719, -37.04994201660156, 176.82574462890625, 50.650245666503906, 3.0455551147460938, 36.82416534423828, 112.96830749511719, -127.73711395263672, 113.82769775390625, 98.48912048339844, 3.206584930419922, -64.53733825683594, 0.6361274719238281, 128.6802978515625, 25.239776611328125, 19.721221923828125, 98.09083557128906, 14.282958984375, 183.58920288085938, 164.56390380859375, -3.8549842834472656, 18.796430587768555, 1.1279144287109375, 85.59132385253906, 7.276771545410156, 57.08831787109375, 21.224388122558594, 10.372611999511719, 70.46078491210938, -41.28834533691406, 4.450035095214844, -20.700817108154297, 10.862771987915039, 15.58095932006836, -7.257358551025391, 154.71310424804688, 64.5876693725586, 170.52047729492188, 2.1305465698242188, 27.030553817749023, -17.272048950195312, -26.514179229736328, -9.048263549804688, 160.9674835205078], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000396.npy"}
{"epoch": 0.5986394557823129, "step": 397, "batch_size": 64, "mean": 29.708749771118164, "std": 86.60920715332031, "min": -182.2151641845703, "p10": -75.77374877929687, "median": 30.285067558288574, "p90": 139.17169799804688, "max": 238.43435668945312, "pos_frac": 0.6875, "sample": [0.7812786102294922, 34.23499298095703, -68.7808837890625, 52.647674560546875, -49.82727813720703, 75.64321899414062, 126.28921508789062, 61.929115295410156, 132.83169555664062, -10.228378295898438, 1.4666938781738281, 66.44654846191406, 11.710929870605469, 93.12612915039062, 83.3390121459961, 95.76538848876953, -174.1127166748047, -112.72625732421875, 46.116939544677734, -4.112274169921875, 99.08926391601562, -10.831649780273438, 70.45219421386719, 105.07815551757812, 19.552513122558594, -182.2151641845703, -78.77069091796875, 14.940994262695312, 166.14120483398438, 56.69340515136719, 21.60553741455078, 27.79340934753418, 139.23922729492188, 37.30438232421875, 156.71006774902344, -28.44561767578125, 50.62540817260742, 143.1983642578125, 118.18942260742188, 7.40800666809082, 3.862884521484375, 182.7196044921875, 114.4444351196289, 47.25349426269531, -168.0278778076172, 37.148651123046875, 238.43435668945312, -53.169952392578125, -45.366111755371094, 3.9540252685546875, -0.4135704040527344, 52.40711975097656, -2.5079421997070312, 11.749969482421875, 0.3096179962158203, -31.735301971435547, 32.77672576904297, 48.373226165771484, -36.354095458984375, 139.01412963867188, -144.94227600097656, -93.73880767822266, -0.06583213806152344, 168.93399047851562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000397.npy"}
{"epoch": 0.600151171579743, "step": 398, "batch_size": 64, "mean": 66.12939453125, "std": 75.26155853271484, "min": -62.90540313720703, "p10": -12.045615768432615, "median": 53.67776107788086, "p90": 165.4728759765625, "max": 291.94775390625, "pos_frac": 0.75, "sample": [141.97341918945312, 79.73658752441406, 123.87594604492188, 78.07252502441406, 164.8663787841797, 13.916034698486328, 168.61753845214844, 157.71624755859375, -13.218002319335938, 168.58514404296875, 107.41925048828125, 20.126693725585938, -6.44915771484375, -5.402580261230469, 106.18170166015625, 6.3543853759765625, 120.64302825927734, -5.070709228515625, 58.264923095703125, -45.63019943237305, 102.02726745605469, -5.122873306274414, 171.30960083007812, 203.07139587402344, -0.8854827880859375, 156.68893432617188, 147.19371032714844, 111.34854888916016, 46.08771514892578, 29.76422882080078, -35.686946868896484, 49.56663513183594, 16.115509033203125, 149.8568115234375, 2.3037338256835938, -6.646615982055664, 41.532379150390625, 142.09945678710938, 20.398101806640625, 64.88072204589844, -28.280502319335938, -16.54991912841797, 156.94664001464844, 85.99223327636719, 110.55606079101562, 124.76691436767578, 16.846364974975586, 8.363914489746094, -9.310047149658203, 76.72999572753906, 4.33587646484375, 291.94775390625, 16.14612579345703, -5.318214416503906, 73.51856994628906, -24.747718811035156, 177.31723022460938, -62.90540313720703, 57.78888702392578, 6.82383918762207, -8.866939544677734, 142.61160278320312, 165.73280334472656, 25.35319709777832], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000398.npy"}
{"epoch": 0.6016628873771731, "step": 399, "batch_size": 64, "mean": 71.71842956542969, "std": 84.07176971435547, "min": -103.99000549316406, "p10": -28.441773223876954, "median": 66.23224639892578, "p90": 180.24712524414062, "max": 300.48150634765625, "pos_frac": 0.78125, "sample": [122.54203796386719, 50.33155059814453, 124.69951629638672, 195.81317138671875, -28.024124145507812, 78.62506103515625, 143.70970153808594, 30.714744567871094, -7.251895904541016, -12.370475769042969, -77.43612670898438, 71.52252197265625, 79.1630859375, -56.268009185791016, 34.74433898925781, 69.77906799316406, 67.88052368164062, 63.133026123046875, 75.56401062011719, 196.72238159179688, -35.82391357421875, -17.703033447265625, -103.99000549316406, 124.91569519042969, 229.76260375976562, 34.55784606933594, 11.798511505126953, 300.48150634765625, 25.31206512451172, 6.541526794433594, 16.063949584960938, 1.211111068725586, 81.76829528808594, 36.5994987487793, 44.50544738769531, -28.620765686035156, 60.126426696777344, 178.0724334716797, 44.41095733642578, 185.83963012695312, 58.61609649658203, 0.7883472442626953, 178.4973602294922, -21.738922119140625, 175.41793823242188, 113.3706283569336, 78.32225799560547, 71.60220336914062, -49.81596374511719, -22.805152893066406, 164.25518798828125, 180.9970245361328, 162.65798950195312, 125.07555389404297, 143.72140502929688, 172.45169067382812, 131.03871154785156, 187.00546264648438, 64.58396911621094, -46.219547271728516, 178.3963623046875, -11.194625854492188, 77.70618438720703, 57.82592010498047], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000399.npy"}
{"epoch": 0.6031746031746031, "step": 400, "batch_size": 64, "mean": 43.25374221801758, "std": 75.67410278320312, "min": -148.38327026367188, "p10": -43.58563995361327, "median": 25.56472396850586, "p90": 153.75904083251956, "max": 196.0257110595703, "pos_frac": 0.703125, "sample": [-5.621051788330078, 85.8700942993164, 41.23871612548828, -21.683334350585938, 100.65518188476562, 115.45037841796875, 16.29326629638672, 120.86898803710938, 99.78646850585938, 145.38592529296875, 45.3463020324707, -52.95915222167969, 31.357357025146484, 1.8310890197753906, 1.7362327575683594, 2.7147216796875, 19.989288330078125, 157.34751892089844, 122.03748321533203, 117.58900451660156, -18.533279418945312, 30.763568878173828, 93.62322998046875, 0.7927494049072266, -62.21551513671875, 87.4601821899414, 18.44598960876465, 12.911209106445312, 22.75485610961914, -5.137754440307617, -74.89053344726562, -5.047719955444336, 119.45747375488281, -6.744712829589844, 168.70338439941406, -4.243885040283203, -53.49263000488281, 28.374591827392578, 164.4776611328125, 108.58651733398438, 175.22344970703125, 105.44495391845703, -148.38327026367188, 101.65261840820312, 9.821220397949219, 29.65674591064453, -3.365966796875, -31.063858032226562, -26.259742736816406, 196.0257110595703, 55.101985931396484, 132.49618530273438, -48.952117919921875, 48.192298889160156, -7.384405136108398, 171.23663330078125, 185.4699249267578, 72.94540405273438, 21.569778442382812, 9.279998779296875, -128.83328247070312, -10.725048065185547, 4.142173767089844, 83.66836547851562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000400.npy"}
{"epoch": 0.6046863189720333, "step": 401, "batch_size": 64, "mean": 60.88626480102539, "std": 74.38919830322266, "min": -72.67141723632812, "p10": -28.468026161193844, "median": 44.15607452392578, "p90": 168.56504516601564, "max": 211.6842041015625, "pos_frac": 0.796875, "sample": [-13.693641662597656, 129.1034698486328, 145.66744995117188, 55.38349914550781, -31.672229766845703, 1.1950340270996094, 13.1759033203125, 46.40864562988281, -32.54080581665039, 5.963233947753906, 172.6092529296875, -1.2395782470703125, -72.67141723632812, -1.8944625854492188, 36.342262268066406, -29.382835388183594, 75.8641357421875, 211.6842041015625, 150.79135131835938, 163.2554931640625, 57.67205047607422, 66.321533203125, 37.10984802246094, 16.68582534790039, -37.624267578125, -57.34473419189453, 62.713836669921875, 87.73773956298828, 69.22850799560547, 60.369789123535156, 93.03305053710938, 29.24317169189453, 7.668157577514648, 128.86114501953125, -4.817359924316406, 179.30197143554688, 4.711782455444336, 136.50900268554688, 24.109804153442383, 165.0482177734375, 11.706657409667969, 168.870849609375, 77.02056884765625, -26.333471298217773, 167.85150146484375, 16.317928314208984, 17.142974853515625, 207.77206420898438, 154.396728515625, 96.51924896240234, 102.68978118896484, 3.254568099975586, 12.203922271728516, 5.327507019042969, -48.021183013916016, 83.90837860107422, 72.87464904785156, 18.429101943969727, 164.2356719970703, 175.47926330566406, 41.90350341796875, -9.670108795166016, 23.60239028930664, 208.3504638671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000401.npy"}
{"epoch": 0.6061980347694633, "step": 402, "batch_size": 64, "mean": 50.277435302734375, "std": 84.96609497070312, "min": -131.77281188964844, "p10": -48.16056594848633, "median": 43.820552825927734, "p90": 169.77684173583987, "max": 198.1095428466797, "pos_frac": 0.734375, "sample": [87.24252319335938, 180.76388549804688, 126.8927001953125, 124.77555084228516, 116.32704162597656, 73.8720474243164, 195.96913146972656, 21.958221435546875, 46.73759460449219, 75.68302154541016, -34.71276092529297, 192.2562255859375, 42.14275360107422, -47.44871520996094, 45.49835205078125, 48.52229690551758, -2.886871337890625, 19.425668716430664, 33.11614990234375, 28.848419189453125, 107.1877212524414, -0.7854042053222656, 4.737007141113281, -104.30253601074219, 191.333740234375, 5.228137969970703, 79.43175506591797, 165.30728149414062, 60.4467658996582, 18.04279327392578, 171.69236755371094, 156.0862274169922, -11.918930053710938, 96.23381042480469, -7.2638702392578125, 60.442359924316406, 154.59921264648438, 150.313232421875, 5.988945007324219, 82.2961196899414, -48.46564483642578, -108.56712341308594, -119.70944213867188, 103.64225006103516, 22.72467041015625, 13.975202560424805, 41.86735534667969, -57.96427917480469, 5.306739807128906, 3.529857635498047, 140.03729248046875, 179.65982055664062, 139.1084747314453, 198.1095428466797, -31.769119262695312, -19.57326316833496, -15.159538269042969, -131.77281188964844, -122.21025085449219, 62.71699523925781, 40.33610153198242, 145.35806274414062, -36.08103942871094, 52.576011657714844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000402.npy"}
{"epoch": 0.6077097505668935, "step": 403, "batch_size": 64, "mean": 61.24797439575195, "std": 85.73719024658203, "min": -145.883056640625, "p10": -43.68704490661621, "median": 38.01521682739258, "p90": 167.9579818725586, "max": 262.5413818359375, "pos_frac": 0.765625, "sample": [-5.188102722167969, -1.28533935546875, 215.57205200195312, 20.124427795410156, 166.53878784179688, 42.50108337402344, 194.8101806640625, 132.62928771972656, 20.975982666015625, 5.488739013671875, 61.298301696777344, -44.90351867675781, 127.08979797363281, -46.28219985961914, 154.18084716796875, 96.71591186523438, 53.143760681152344, 9.96531867980957, 30.017196655273438, 5.452360153198242, -2.027402877807617, -57.617671966552734, 262.5413818359375, 152.36248779296875, -17.975433349609375, 125.82510375976562, 166.45855712890625, 5.507364273071289, 172.30372619628906, 18.62097930908203, -1.5380668640136719, 68.92559814453125, 31.263320922851562, 33.52935028076172, 53.56343078613281, -28.61733627319336, -1.836954116821289, 168.5662078857422, 18.79343605041504, 81.54745483398438, 22.796630859375, 1.4299488067626953, 165.36502075195312, 18.500438690185547, 115.77020263671875, -50.85015869140625, 138.98463439941406, 137.64895629882812, 77.26646423339844, 24.639801025390625, 163.6553955078125, 174.8646697998047, -145.883056640625, -40.84860610961914, 196.48077392578125, -133.8885040283203, 142.7471923828125, 77.62439727783203, 65.47770690917969, 159.2502899169922, -46.0115966796875, 5.499031066894531, 31.441566467285156, 128.8685760498047], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000403.npy"}
{"epoch": 0.6092214663643235, "step": 404, "batch_size": 64, "mean": 62.98853302001953, "std": 86.9850845336914, "min": -106.60794067382812, "p10": -41.5925422668457, "median": 52.98665428161621, "p90": 176.20272521972657, "max": 236.71246337890625, "pos_frac": 0.734375, "sample": [-22.770294189453125, 87.58833312988281, 99.01726531982422, 4.4338226318359375, 58.659942626953125, 11.359764099121094, 92.43843841552734, 11.063117980957031, -44.811744689941406, -34.08106994628906, 179.5272216796875, -31.937774658203125, -106.60794067382812, -84.0811767578125, -0.9866600036621094, 198.325439453125, -22.566909790039062, 2.950824737548828, 172.10855102539062, 165.77032470703125, 97.99954223632812, 128.69430541992188, 108.32022094726562, 185.09109497070312, -33.88104248046875, 142.5277862548828, -64.01380157470703, 175.1083984375, 153.4606170654297, 138.6697235107422, 7.1199188232421875, 123.5880126953125, -11.267980575561523, 143.20736694335938, -22.41657257080078, 6.650276184082031, 32.95874786376953, 130.10980224609375, 114.14791107177734, -0.20087814331054688, 0.5156784057617188, -5.8610076904296875, 122.09320068359375, 32.29412841796875, 158.77410888671875, 11.570972442626953, 127.47198486328125, 172.12643432617188, 176.90516662597656, 51.305335998535156, 1.2183189392089844, 231.90216064453125, 236.71246337890625, 64.48759460449219, 142.61764526367188, 112.1517562866211, 176.67172241210938, -61.23626708984375, -63.72773742675781, -71.09996032714844, 44.97160339355469, 54.667972564697266, 8.90008544921875, 14.560218811035156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000404.npy"}
{"epoch": 0.6107331821617535, "step": 405, "batch_size": 64, "mean": 49.75050354003906, "std": 78.2033920288086, "min": -137.04888916015625, "p10": -38.40311050415038, "median": 32.27101421356201, "p90": 167.6147705078125, "max": 203.64816284179688, "pos_frac": 0.671875, "sample": [141.34832763671875, -14.02889633178711, -71.96836853027344, 175.08737182617188, 16.87194061279297, 17.871681213378906, 165.9765167236328, -34.589752197265625, -3.642833709716797, -44.4693603515625, -18.62933349609375, 4.091375350952148, 3.4630298614501953, -4.539222717285156, 77.88965606689453, 11.566650390625, 192.49749755859375, -0.5912380218505859, -6.184364318847656, 102.13070678710938, 161.663330078125, 124.33627319335938, 9.396049499511719, 68.09169006347656, -1.4139518737792969, 35.4004020690918, 109.42776489257812, 168.31687927246094, -49.85546875, 99.65271759033203, 21.195552825927734, -4.966526031494141, 134.08648681640625, 160.05950927734375, -12.878692626953125, 70.1182632446289, 14.569538116455078, -137.04888916015625, 70.33425903320312, 47.47808837890625, 34.087608337402344, -44.95582580566406, 52.966209411621094, 47.35807800292969, 17.820472717285156, 30.45442008972168, 13.143257141113281, 45.92750549316406, 184.62274169921875, 145.0140380859375, -19.68380355834961, -14.524749755859375, 115.68994140625, 35.53553009033203, 170.56680297851562, 173.66537475585938, -15.6068115234375, 51.13051986694336, 139.65008544921875, -69.89141845703125, -14.838783264160156, 203.64816284179688, 144.17543029785156, -40.03740692138672], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000405.npy"}
{"epoch": 0.6122448979591837, "step": 406, "batch_size": 64, "mean": 64.99994659423828, "std": 87.9071044921875, "min": -162.1294708251953, "p10": -40.86226387023925, "median": 63.57620620727539, "p90": 175.6826187133789, "max": 200.63389587402344, "pos_frac": 0.75, "sample": [165.0257568359375, 38.099151611328125, -34.13935852050781, -27.395065307617188, 134.7191162109375, 89.9924545288086, 155.72384643554688, 122.04443359375, -50.09895324707031, 200.63389587402344, 118.80205535888672, 139.0899658203125, -1.5560111999511719, -5.299095153808594, 183.3203125, 193.13038635253906, 195.30337524414062, 20.25345802307129, 60.64373779296875, 169.0345001220703, 78.89828491210938, 22.923118591308594, 94.81620788574219, 66.50867462158203, 19.07781219482422, 125.50645446777344, -38.67965316772461, 117.317626953125, -3.5525074005126953, 60.10043716430664, -0.45877838134765625, 0.3359832763671875, -144.2298126220703, -21.816455841064453, 45.427001953125, 176.00494384765625, 72.24005126953125, 174.93052673339844, -67.00006103515625, 189.89822387695312, 101.84681701660156, 38.13279724121094, 47.488929748535156, 199.95167541503906, 11.986724853515625, 161.32774353027344, -80.8402099609375, 16.785226821899414, 113.3493881225586, 157.18826293945312, 15.743741989135742, -41.79766845703125, 163.89002990722656, -98.23600769042969, 34.17570495605469, -162.1294708251953, 97.52743530273438, 107.72525024414062, -22.779830932617188, 30.315582275390625, 118.41972351074219, 106.29931640625, 53.6561279296875, 154.3931121826172], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000406.npy"}
{"epoch": 0.6137566137566137, "step": 407, "batch_size": 64, "mean": 50.83835220336914, "std": 85.81615447998047, "min": -144.73040771484375, "p10": -44.552468872070314, "median": 51.3558292388916, "p90": 167.85891876220703, "max": 219.6214141845703, "pos_frac": 0.671875, "sample": [-0.5488967895507812, -6.473493576049805, -44.82354736328125, 65.2956314086914, -38.314170837402344, 160.8045196533203, 165.39732360839844, -43.919952392578125, -85.489013671875, 109.82942199707031, 68.29290771484375, 59.2462043762207, 3.0108795166015625, 6.888824462890625, -98.18119812011719, 16.157752990722656, 80.25719451904297, 136.33175659179688, 187.46676635742188, 165.87518310546875, 61.164642333984375, 6.944032669067383, 63.627193450927734, 34.738494873046875, 76.40304565429688, 92.12492370605469, 2.2959671020507812, -10.534927368164062, 146.82415771484375, 189.95272827148438, 103.23689270019531, 3.2979812622070312, 177.32789611816406, 170.2447052001953, 123.23220825195312, 1.5843658447265625, -75.72091674804688, 118.02853393554688, -62.26234436035156, 39.577491760253906, -3.3329010009765625, 96.79385375976562, 168.70909118652344, 84.07105255126953, 187.8582763671875, 219.6214141845703, 114.87622833251953, -1.0655593872070312, 156.53329467773438, -0.6687965393066406, 43.4654541015625, -144.73040771484375, -139.61685180664062, -22.457275390625, -36.308624267578125, 41.60392379760742, -11.759401321411133, 157.73611450195312, -38.76337814331055, -8.268180847167969, -7.072853088378906, 87.61517333984375, 72.8625259399414, 66.7611083984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000407.npy"}
{"epoch": 0.6152683295540439, "step": 408, "batch_size": 64, "mean": 75.54414367675781, "std": 81.40088653564453, "min": -110.6732177734375, "p10": -14.14353561401367, "median": 73.67572402954102, "p90": 180.2080291748047, "max": 214.36715698242188, "pos_frac": 0.78125, "sample": [58.74358367919922, 166.49720764160156, 145.27679443359375, 91.69660949707031, -1.1898727416992188, -61.56791687011719, -110.6732177734375, 4.810466766357422, 148.527587890625, 83.51461029052734, 61.325035095214844, 195.88262939453125, -0.0026149749755859375, 114.3182373046875, 105.50660705566406, 214.36715698242188, 33.06350326538086, -28.63782501220703, 49.259544372558594, 27.5361328125, 88.07894897460938, 3.8782577514648438, -6.881753921508789, 148.91757202148438, 198.56060791015625, -83.68006896972656, 66.94681549072266, 198.47622680664062, 116.11995697021484, 153.3609619140625, 177.21963500976562, -18.745304107666016, 117.0886459350586, 11.678159713745117, 12.391204833984375, 169.90402221679688, 31.287189483642578, -14.780805587768555, 200.7344512939453, 56.62620544433594, 54.67528533935547, 171.46519470214844, 173.1068115234375, 181.9708709716797, 137.5614776611328, 15.733541488647461, 139.85830688476562, -12.656572341918945, 57.601287841796875, -3.7577667236328125, 163.75357055664062, 85.68328857421875, 181.48876953125, -103.00648498535156, 155.62472534179688, 119.080810546875, 45.57655715942383, -1.2811927795410156, 3.6899337768554688, 43.85859298706055, 121.64244842529297, 80.40463256835938, -4.7713470458984375, 102.08743286132812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000408.npy"}
{"epoch": 0.6167800453514739, "step": 409, "batch_size": 64, "mean": 37.16082000732422, "std": 77.45765686035156, "min": -158.88980102539062, "p10": -32.30727977752685, "median": 17.90991497039795, "p90": 159.79710388183597, "max": 191.6346893310547, "pos_frac": 0.640625, "sample": [17.142364501953125, 12.074487686157227, -16.1422119140625, -25.77923583984375, 1.7537651062011719, -24.576770782470703, 33.445457458496094, -33.19718551635742, 43.26454544067383, -22.660438537597656, -24.404430389404297, 176.73040771484375, -9.876665115356445, 173.99252319335938, 80.50645446777344, 4.634033203125, 65.29376220703125, -2.222991943359375, -4.377117156982422, 183.86904907226562, 127.20269775390625, -27.50870132446289, 109.57588195800781, 161.88519287109375, 13.778953552246094, -66.01513671875, -13.413070678710938, 79.41090393066406, 50.880950927734375, 1.180145263671875, -56.47010803222656, 61.07640075683594, 108.32567596435547, -17.70560073852539, 132.86239624023438, -12.966497421264648, 154.92489624023438, 112.88019561767578, 44.38887023925781, 18.677465438842773, -68.14106750488281, -1.8504257202148438, 95.12144470214844, -76.83505249023438, 15.729133605957031, -28.349411010742188, 9.86041259765625, 81.11097717285156, -150.888671875, 175.90499877929688, 27.96738052368164, 121.29654693603516, -0.773193359375, 176.71734619140625, -158.88980102539062, 14.337364196777344, 60.26348876953125, 71.20303344726562, 69.6542739868164, 58.09068298339844, 191.6346893310547, -30.230833053588867, 46.3311767578125, 66.58677673339844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000409.npy"}
{"epoch": 0.618291761148904, "step": 410, "batch_size": 64, "mean": 60.74213790893555, "std": 69.91880798339844, "min": -112.22820281982422, "p10": -21.70900344848632, "median": 64.63409805297852, "p90": 158.2292617797852, "max": 217.63734436035156, "pos_frac": 0.796875, "sample": [63.983802795410156, 78.88841247558594, 18.145626068115234, -1.5297698974609375, 167.76583862304688, 26.300079345703125, 91.33039855957031, 32.0180549621582, -32.57737731933594, -112.22820281982422, 76.18001556396484, 165.9049530029297, 4.6457977294921875, -48.988853454589844, 150.88339233398438, 72.27704620361328, 132.26358032226562, -15.2930908203125, 93.53441619873047, 180.9955596923828, 217.63734436035156, 50.05866241455078, -2.6657257080078125, 99.37189483642578, 3.2908935546875, 122.39154052734375, 11.932632446289062, 105.91064453125, 27.27759552001953, -31.04735565185547, 116.65316009521484, -24.165878295898438, 21.404926300048828, 127.87084197998047, 25.810745239257812, 196.59011840820312, 145.80763244628906, 28.206348419189453, 65.94139099121094, 65.28439331054688, -15.976295471191406, -42.868804931640625, 77.19408416748047, 63.89421081542969, 131.34490966796875, 65.86970520019531, 152.80711364746094, -2.5888748168945312, 9.77920150756836, 72.09024047851562, 20.286590576171875, 92.46109008789062, 201.88592529296875, 65.34379577636719, 28.1396484375, 69.70162200927734, -9.145519256591797, 49.27647399902344, 3.6076507568359375, 131.9346923828125, 160.55303955078125, 69.18880462646484, -43.511322021484375, 18.16730499267578], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000410.npy"}
{"epoch": 0.6198034769463341, "step": 411, "batch_size": 64, "mean": 59.26503372192383, "std": 85.3866958618164, "min": -189.24356079101562, "p10": -41.4570457458496, "median": 50.83882141113281, "p90": 168.75036010742187, "max": 253.43106079101562, "pos_frac": 0.71875, "sample": [-7.988777160644531, 155.23976135253906, 130.15708923339844, 161.923583984375, 93.41915130615234, -15.115447998046875, 13.285903930664062, -189.24356079101562, 104.77289581298828, 4.298454284667969, 83.17505645751953, 185.13107299804688, 120.37773895263672, 101.36683654785156, 142.89163208007812, 61.23705291748047, 88.67433166503906, -45.35395050048828, 63.118568420410156, 73.51921081542969, -5.307365417480469, 179.7583770751953, 167.5789794921875, 11.442758560180664, -33.14903259277344, 139.92808532714844, 2.6309585571289062, 15.629703521728516, -54.054141998291016, -33.83915710449219, 35.06610107421875, -13.906213760375977, -32.85755157470703, 40.05812072753906, 89.92111206054688, 32.501705169677734, -2.836763381958008, 157.69384765625, 13.8779296875, 253.43106079101562, 154.28890991210938, 44.578338623046875, -17.34844970703125, 169.25238037109375, 179.98696899414062, 170.3788604736328, -66.1011962890625, 30.275672912597656, -9.611042022705078, -81.5849609375, 27.431535720825195, -51.90208435058594, 167.33743286132812, 135.60415649414062, 74.04161834716797, 108.34412384033203, 113.99903106689453, 31.64975929260254, 216.36776733398438, 57.09930419921875, -44.72185516357422, 1.29364013671875, -2.4594268798828125, 96.30657958984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000411.npy"}
{"epoch": 0.6213151927437641, "step": 412, "batch_size": 64, "mean": 55.708438873291016, "std": 76.50900268554688, "min": -101.13494873046875, "p10": -38.726030349731445, "median": 44.02866554260254, "p90": 166.2738250732422, "max": 200.11009216308594, "pos_frac": 0.75, "sample": [-70.98853302001953, 149.66128540039062, 61.792457580566406, 175.68746948242188, 39.14289093017578, 121.4909896850586, 30.999954223632812, 165.25343322753906, -69.69854736328125, -0.21807479858398438, 65.45133972167969, 7.471458435058594, 76.30815124511719, 166.7111358642578, -7.196235656738281, -29.935264587402344, 151.24761962890625, -101.13494873046875, 42.422889709472656, 1.953826904296875, 170.39334106445312, 119.10298156738281, 40.71357727050781, 72.6379165649414, 87.23501586914062, -71.5110855102539, 16.718841552734375, 20.717174530029297, 24.598941802978516, 46.04713439941406, -4.653364181518555, 200.11009216308594, 3.941526412963867, 196.8602752685547, -38.073638916015625, 18.20751953125, 48.896270751953125, -8.30575180053711, 145.45858764648438, -4.873161315917969, -21.759836196899414, 10.030136108398438, 73.86500549316406, -39.0056266784668, 36.05288314819336, 177.37408447265625, 128.0745849609375, 124.66402435302734, 176.27430725097656, -61.292049407958984, 85.06534576416016, 104.50303649902344, 52.63905715942383, 15.162178039550781, 45.63444137573242, 147.8030242919922, -4.206201553344727, 75.2648696899414, -45.393802642822266, 80.57617950439453, 156.64776611328125, 4.05787467956543, 160.91403198242188, 21.74951171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000412.npy"}
{"epoch": 0.6228269085411943, "step": 413, "batch_size": 64, "mean": 47.663734436035156, "std": 96.02140808105469, "min": -185.72312927246094, "p10": -80.09554748535156, "median": 56.0307502746582, "p90": 168.3289001464844, "max": 291.63037109375, "pos_frac": 0.703125, "sample": [9.540050506591797, -104.03755187988281, 97.4920425415039, 64.84644317626953, -14.955619812011719, -37.88407897949219, -1.3636322021484375, 178.39559936523438, -52.690948486328125, 214.55221557617188, -178.18301391601562, 170.20181274414062, 87.22525787353516, -48.627685546875, 8.848587036132812, 19.359130859375, -76.54808044433594, 152.45318603515625, -0.5835800170898438, 12.086343765258789, 191.24716186523438, 105.92472839355469, 56.73625183105469, 69.01985168457031, 106.88031005859375, 34.96845245361328, 291.63037109375, 67.49520874023438, 16.53173065185547, 45.49669647216797, 90.06593322753906, 148.29595947265625, 74.50309753417969, 62.39054870605469, 177.1523895263672, -89.05191040039062, 92.41557312011719, -5.8342742919921875, 76.66217803955078, 9.221923828125, -124.85420989990234, 153.34890747070312, 166.937744140625, -85.1636734008789, 39.67144775390625, -81.61589050292969, -185.72312927246094, 50.99365997314453, 71.19712829589844, 9.073726654052734, -31.8818359375, -64.75369262695312, 156.22808837890625, -2.373218536376953, 57.12721252441406, 144.84547424316406, 62.32993698120117, 156.4547119140625, 27.399723052978516, 55.32524871826172, -63.92500305175781, 128.24801635742188, 168.92510986328125, 120.78506469726562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000413.npy"}
{"epoch": 0.6243386243386243, "step": 414, "batch_size": 64, "mean": 60.322879791259766, "std": 79.0887222290039, "min": -148.69024658203125, "p10": -33.43811264038085, "median": 53.42478942871094, "p90": 171.34082946777343, "max": 192.05258178710938, "pos_frac": 0.765625, "sample": [141.6097869873047, -4.918083190917969, 98.72203063964844, 81.56396484375, 10.787300109863281, 53.403564453125, -97.25970458984375, 97.67715454101562, 84.93570709228516, 14.664260864257812, 52.94573974609375, 156.3974151611328, 137.5474853515625, -51.459442138671875, 192.05258178710938, 62.59682083129883, -104.76641845703125, -37.25019073486328, 16.32215118408203, 99.9717025756836, 8.156671524047852, 70.92866516113281, 185.39810180664062, -24.95172882080078, 177.53256225585938, 35.035003662109375, 59.718265533447266, 15.850151062011719, 92.78927612304688, 125.19046020507812, 171.43118286132812, 24.501129150390625, 137.4219970703125, -37.07513427734375, 191.9626007080078, 89.99695587158203, 154.125732421875, 36.981719970703125, 53.446014404296875, 158.2923583984375, 31.116226196289062, -16.78137969970703, 12.659591674804688, -3.0858821868896484, 101.91389465332031, -12.772987365722656, 6.998619079589844, 108.77914428710938, -148.69024658203125, 71.0700912475586, 118.19378662109375, -4.7773590087890625, -1.4656524658203125, 40.7525749206543, 171.1300048828125, 183.24114990234375, 38.00719451904297, 16.915206909179688, 142.55419921875, 121.33492279052734, -54.11595153808594, -5.807065963745117, 20.952861785888672, 190.26577758789062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000414.npy"}
{"epoch": 0.6258503401360545, "step": 415, "batch_size": 64, "mean": 51.81143569946289, "std": 84.64788055419922, "min": -151.2193603515625, "p10": -47.49115600585937, "median": 39.66792869567871, "p90": 166.2121124267578, "max": 256.042724609375, "pos_frac": 0.75, "sample": [-7.515764236450195, 124.49832916259766, 23.228036880493164, 47.634281158447266, 39.74312210083008, 12.122398376464844, 148.45166015625, 178.34866333007812, -52.585174560546875, 138.09286499023438, 83.27269744873047, -52.0029296875, 148.79486083984375, 165.30731201171875, 9.308677673339844, 168.19862365722656, 186.16690063476562, 106.23611450195312, 86.63658142089844, -1.582265853881836, -11.125518798828125, 78.32606506347656, 99.55706024169922, -1.4246139526367188, 29.902130126953125, 166.59988403320312, -126.00511169433594, 12.721733093261719, 56.60798645019531, 100.66571044921875, -28.426956176757812, 142.0757598876953, 72.19327545166016, 2.1357345581054688, -144.73507690429688, 256.042724609375, 109.63521575927734, 112.94117736816406, 156.72543334960938, -36.96368408203125, 102.0701675415039, -94.75300598144531, -73.23112487792969, -151.2193603515625, 21.374374389648438, 66.09980010986328, -15.298088073730469, 21.204757690429688, -3.6423568725585938, 5.00621223449707, 12.376371383666992, 61.05846405029297, 25.766342163085938, 15.966609954833984, 10.190422058105469, 78.45538330078125, -2.750307083129883, 120.3077163696289, 92.83587646484375, 4.864402770996094, 14.138648986816406, 39.592735290527344, 177.651123046875, 188.06259155273438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000415.npy"}
{"epoch": 0.6273620559334845, "step": 416, "batch_size": 64, "mean": 44.697818756103516, "std": 86.89057922363281, "min": -157.5208282470703, "p10": -49.037619781494136, "median": 29.209218978881836, "p90": 166.50616149902345, "max": 295.4414978027344, "pos_frac": 0.65625, "sample": [1.2950515747070312, -128.49136352539062, -8.269977569580078, 60.98695373535156, 42.099143981933594, 168.699951171875, -39.91151428222656, -50.050689697265625, 167.28121948242188, 181.17742919921875, 29.309661865234375, 153.4199676513672, 295.4414978027344, 21.77324676513672, -1.5704689025878906, 72.32392120361328, -1.8704357147216797, 102.55757904052734, -157.5208282470703, -27.685577392578125, -1.3065967559814453, 120.44169616699219, 27.75480842590332, 163.2350311279297, -0.8177833557128906, 113.54635620117188, 67.4027099609375, 174.42703247070312, 195.00311279296875, 67.87059783935547, 75.23233032226562, 136.96829223632812, -24.609893798828125, -55.446632385253906, 2.975343704223633, -0.5609817504882812, -78.58050537109375, -112.98704528808594, -2.0575122833251953, 4.068889617919922, -42.06438064575195, 31.306625366210938, -19.769927978515625, 7.6930999755859375, 101.38191223144531, -22.587520599365234, 111.74182891845703, -90.50051879882812, -1.7506885528564453, 109.95425415039062, 79.18074035644531, 5.495477676391602, -46.673789978027344, 112.37201690673828, 15.263313293457031, 29.108776092529297, 6.071113586425781, 40.83906555175781, 80.85018157958984, 38.00611114501953, 164.69769287109375, 136.20176696777344, 177.39479064941406, 82.89450073242188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000416.npy"}
{"epoch": 0.6288737717309146, "step": 417, "batch_size": 64, "mean": 59.71507263183594, "std": 92.38331604003906, "min": -251.8291015625, "p10": -54.89942779541015, "median": 60.125885009765625, "p90": 168.5431671142578, "max": 268.99468994140625, "pos_frac": 0.796875, "sample": [31.089637756347656, 6.016487121582031, -13.763694763183594, -169.67803955078125, 174.3370361328125, 144.74932861328125, -251.8291015625, 169.52493286132812, 69.08790588378906, 97.24868774414062, 268.99468994140625, 34.79725646972656, 17.429908752441406, 60.13825225830078, 103.47504425048828, -90.47848510742188, 164.63980102539062, 121.41936492919922, 3.7852935791015625, 70.93658447265625, 115.789794921875, 54.40740966796875, 83.29161071777344, 59.419769287109375, 60.11351776123047, -48.33418273925781, 170.3948211669922, 48.84638977050781, 28.890071868896484, 153.90496826171875, -78.11267852783203, 162.88345336914062, 171.02626037597656, -82.34144592285156, 145.89407348632812, -37.75236511230469, 49.05845642089844, -7.276298522949219, 156.89129638671875, 95.3082504272461, 166.25238037109375, 7.874366760253906, 171.19256591796875, 68.78104400634766, -63.730499267578125, 34.55964660644531, 2.7670841217041016, 48.99883270263672, 0.8796844482421875, 72.84727478027344, 25.974456787109375, 158.039794921875, 6.8677520751953125, -21.822174072265625, 95.9510726928711, -57.713104248046875, 87.61917114257812, 27.06912612915039, 98.34507751464844, -1.0237712860107422, 137.31558227539062, 67.48898315429688, 149.75685119628906, 223.24951171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000417.npy"}
{"epoch": 0.6303854875283447, "step": 418, "batch_size": 64, "mean": 46.89039993286133, "std": 85.20542907714844, "min": -142.32040405273438, "p10": -47.53012390136719, "median": 31.948883056640625, "p90": 166.92981567382816, "max": 305.878173828125, "pos_frac": 0.671875, "sample": [76.33197021484375, 135.645751953125, 118.2494888305664, 57.889923095703125, 62.02399826049805, 170.33554077148438, 11.45245361328125, 94.24384307861328, 176.99488830566406, -47.150611877441406, 158.98312377929688, 193.1460418701172, -37.55561828613281, 109.45303344726562, 65.24262237548828, -85.7646713256836, 153.15687561035156, 11.033103942871094, 141.6317901611328, -3.1416244506835938, -0.8386077880859375, 305.878173828125, 93.84449005126953, -20.482505798339844, 174.315673828125, 25.128089904785156, -25.634124755859375, 1.1842880249023438, 22.725627899169922, -46.779876708984375, -142.32040405273438, -77.6070556640625, 185.3985137939453, 37.934967041015625, 174.05218505859375, -16.803085327148438, 7.914005279541016, -14.255874633789062, 10.267906188964844, 19.144737243652344, -47.692771911621094, -48.069313049316406, -88.51985168457031, -14.289361953735352, 104.4394302368164, -19.296905517578125, -17.32396697998047, 39.993125915527344, 18.300695419311523, -4.470298767089844, 41.877525329589844, 79.57150268554688, 145.63314819335938, 143.10479736328125, 13.811592102050781, 25.962799072265625, -83.49079895019531, 79.81033325195312, 89.52177429199219, 109.81172943115234, 90.8260269165039, -44.44071960449219, 57.38458251953125, 53.261474609375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000418.npy"}
{"epoch": 0.6318972033257747, "step": 419, "batch_size": 64, "mean": 67.16665649414062, "std": 88.55240631103516, "min": -144.69615173339844, "p10": -20.727886581420893, "median": 30.18474578857422, "p90": 182.28189544677736, "max": 248.22731018066406, "pos_frac": 0.84375, "sample": [-9.955257415771484, 248.22731018066406, 7.221641540527344, 4.264549255371094, -71.44941711425781, 21.087051391601562, 181.54864501953125, 21.384811401367188, 17.95366668701172, 25.63766860961914, -144.69615173339844, 159.66319274902344, 182.5961456298828, 103.85606384277344, -42.89520263671875, 22.356277465820312, 71.58335876464844, 29.0152587890625, 193.49359130859375, 21.497940063476562, 69.29670715332031, 90.09851837158203, -22.8341064453125, 105.2896728515625, 5.5692901611328125, 3.758941650390625, 208.41329956054688, -24.80291748046875, 235.1134490966797, 92.83251953125, 30.262104034423828, -73.13072967529297, -15.813373565673828, 167.44500732421875, 242.4818572998047, 95.40289306640625, 153.14610290527344, 6.800239562988281, 18.51348876953125, 90.56219482421875, 0.7527217864990234, 180.16250610351562, 16.011749267578125, 12.233016967773438, 166.66673278808594, 10.554275512695312, 80.06587982177734, 37.08549118041992, 25.350635528564453, 175.55535888671875, 170.48895263671875, 0.4421501159667969, 30.10738754272461, 27.33935546875, 147.93716430664062, 14.144372940063477, 130.74661254882812, 134.9329833984375, 207.20655822753906, -9.623443603515625, 96.66815185546875, 50.2855224609375, -104.0062026977539, 176.7615966796875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000419.npy"}
{"epoch": 0.6334089191232048, "step": 420, "batch_size": 64, "mean": 35.994956970214844, "std": 92.27469635009766, "min": -186.33120727539062, "p10": -70.08271560668943, "median": 12.833213806152344, "p90": 179.93039245605468, "max": 210.16802978515625, "pos_frac": 0.625, "sample": [-6.563896179199219, -21.93472671508789, 133.5387725830078, 11.273368835449219, 102.05779266357422, -143.45335388183594, 18.488998413085938, 5.735851287841797, 2.0850296020507812, -168.72198486328125, -16.31378173828125, 79.03046417236328, -0.666595458984375, 150.63449096679688, 10.253753662109375, 182.25274658203125, 197.92605590820312, 92.05319213867188, 180.16854858398438, 155.17074584960938, 17.223241806030273, -50.811702728271484, -0.7634258270263672, 59.333763122558594, 98.4052734375, 166.93002319335938, -82.06668853759766, -85.01803588867188, 72.10647583007812, 7.397006988525391, -104.7523193359375, 39.46745300292969, -35.955543518066406, 25.802234649658203, 0.4086723327636719, -76.4090576171875, -20.28156280517578, -0.5382766723632812, 187.79031372070312, 14.393058776855469, -17.469871520996094, -186.33120727539062, -55.321250915527344, -6.6155548095703125, 151.08047485351562, 67.05326080322266, 15.663471221923828, 179.37469482421875, -0.892486572265625, 55.0399169921875, 188.17837524414062, 38.725799560546875, 137.0387420654297, 2.9342174530029297, 42.81553268432617, 126.887451171875, 210.16802978515625, 7.8984222412109375, -1.5076942443847656, 21.993576049804688, -3.4045486450195312, 203.84918212890625, -45.35972595214844, -23.79793930053711], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000420.npy"}
{"epoch": 0.6349206349206349, "step": 421, "batch_size": 64, "mean": 50.87043380737305, "std": 102.44617462158203, "min": -174.95175170898438, "p10": -91.99831542968748, "median": 36.723026275634766, "p90": 176.42424926757812, "max": 266.9171447753906, "pos_frac": 0.765625, "sample": [-129.31561279296875, 177.18856811523438, 4.928535461425781, 165.85504150390625, 159.71697998046875, 15.30517578125, 95.10578918457031, 14.259597778320312, 2.0188827514648438, 266.9171447753906, 221.2113800048828, -77.861083984375, 174.64083862304688, 49.04857635498047, -43.882450103759766, 56.80960464477539, 22.598838806152344, -40.82249450683594, 178.5560302734375, 124.9560546875, 111.97013854980469, 121.12022399902344, 0.9331836700439453, 4.20245361328125, 80.29331970214844, 2.954753875732422, 119.4653549194336, -174.95175170898438, -169.36444091796875, -68.50550842285156, 5.136735916137695, 166.5923309326172, 145.0875244140625, 10.242849349975586, 71.04830169677734, 181.0755615234375, 156.4157257080078, 136.88055419921875, 36.73979949951172, 2.6105918884277344, 165.4646759033203, -3.100688934326172, 92.80155944824219, -159.58746337890625, -99.92320251464844, 111.3225326538086, 1.7321319580078125, 55.19713592529297, -106.92898559570312, 32.81111145019531, 238.66574096679688, 180.63595581054688, -1.8716659545898438, 36.70625305175781, 32.852294921875, -98.05712890625, 33.924049377441406, 82.36235046386719, 117.6075439453125, -74.71074676513672, 169.06736755371094, 106.04395294189453, -34.72362518310547, 0.23360824584960938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000421.npy"}
{"epoch": 0.636432350718065, "step": 422, "batch_size": 64, "mean": 67.65065002441406, "std": 82.23448181152344, "min": -89.32587432861328, "p10": -38.53519744873046, "median": 55.198923110961914, "p90": 176.0497955322266, "max": 233.83993530273438, "pos_frac": 0.78125, "sample": [180.79586791992188, 233.83993530273438, -59.35066223144531, -4.538906097412109, 170.39976501464844, -12.427032470703125, 164.7812042236328, -58.55552673339844, 111.67852783203125, 201.739990234375, 10.424057006835938, 52.61750793457031, 12.826934814453125, 35.72149658203125, 183.7607879638672, 29.250957489013672, 163.00778198242188, 178.4712371826172, -82.04158020019531, 147.600830078125, 71.26673889160156, -59.848480224609375, -3.339519500732422, -0.18442916870117188, 102.41456604003906, 155.83157348632812, 141.73057556152344, 137.92222595214844, 212.07235717773438, 104.94943237304688, 102.22708129882812, 86.52952575683594, 31.879398345947266, -5.56669807434082, 40.568931579589844, 41.16804885864258, 109.96710968017578, 32.964569091796875, 164.89825439453125, 19.05059814453125, -26.44994354248047, -65.93820190429688, 155.16134643554688, 109.25350952148438, 2.9781494140625, -23.536476135253906, 5.7042083740234375, 117.07080841064453, 7.4812774658203125, 58.4721794128418, 57.780338287353516, 212.15716552734375, 30.2997989654541, -43.71459197998047, 22.208980560302734, 94.01371765136719, 20.38580894470215, -89.32587432861328, 14.409574508666992, 77.22887420654297, 159.70614624023438, 112.6991195678711, 28.734207153320312, 146.35647583007812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000422.npy"}
{"epoch": 0.6379440665154951, "step": 423, "batch_size": 64, "mean": 68.36428833007812, "std": 86.45242309570312, "min": -114.62808990478516, "p10": -13.507750701904296, "median": 54.12356376647949, "p90": 186.87008514404297, "max": 249.4006805419922, "pos_frac": 0.78125, "sample": [32.13249206542969, -4.376434326171875, 163.2828369140625, -90.50874328613281, 187.52801513671875, 2.4146785736083984, 67.38363647460938, -41.47931671142578, 171.22972106933594, 52.12591552734375, -11.860870361328125, 160.93092346191406, 4.468170166015625, -60.209556579589844, 0.454803466796875, 4.3189239501953125, 13.330047607421875, -4.4414215087890625, 176.6947479248047, -87.08744049072266, 97.17031860351562, 115.3354721069336, -13.781143188476562, -5.509363174438477, 69.09791564941406, 225.69517517089844, 137.46026611328125, 6.691486358642578, -114.62808990478516, 118.74735260009766, 204.52639770507812, 151.667236328125, 107.9031982421875, 137.18284606933594, 84.93543243408203, 41.712432861328125, 28.649322509765625, 173.3767547607422, 195.9970245361328, 185.3349151611328, 18.679588317871094, 9.483444213867188, 8.64739990234375, 108.38349914550781, 144.30484008789062, 150.46209716796875, 173.85223388671875, 69.12843322753906, -7.122901916503906, 72.22115325927734, 249.4006805419922, 100.70671081542969, 3.585733413696289, 14.56797981262207, 127.30928802490234, 198.11398315429688, -7.89208984375, 3.8184356689453125, -12.869834899902344, 29.391036987304688, 195.08377075195312, 2.771820068359375, 56.121212005615234, -16.730253219604492], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000423.npy"}
{"epoch": 0.6394557823129252, "step": 424, "batch_size": 64, "mean": 65.47486114501953, "std": 89.11155700683594, "min": -143.24374389648438, "p10": -48.36287422180176, "median": 49.003990173339844, "p90": 181.54280700683594, "max": 280.3262939453125, "pos_frac": 0.796875, "sample": [-9.149566650390625, 206.49624633789062, 34.23298645019531, 142.9525909423828, -69.50140380859375, 153.59674072265625, 280.3262939453125, 153.93063354492188, -102.42304992675781, -69.16108703613281, 18.006290435791016, 15.568307876586914, 76.40784454345703, -143.24374389648438, 139.6300506591797, 10.595500946044922, 164.66624450683594, -48.080039978027344, 56.01678466796875, 23.374046325683594, -28.321218490600586, 86.18667602539062, 144.90948486328125, 21.251304626464844, -48.48408889770508, 174.4663543701172, 211.53689575195312, 13.275405883789062, 189.97454833984375, 32.12467956542969, 92.53473663330078, 109.79085540771484, 8.459495544433594, 59.29790496826172, 27.309326171875, 61.56770324707031, 150.8031768798828, 206.6381072998047, 147.64102172851562, 99.14949798583984, 118.01742553710938, 22.25165557861328, 12.16659927368164, 181.67730712890625, 16.308265686035156, 182.27931213378906, 118.28160095214844, -1.0343074798583984, -77.7000503540039, -2.979837417602539, 169.46136474609375, 62.72178649902344, 162.00067138671875, 181.22897338867188, 4.410179138183594, 24.116539001464844, 44.60365295410156, -73.86492919921875, 47.763458251953125, 126.05796813964844, 7.000480651855469, 25.444908142089844, -4.419914245605469, 50.24452209472656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000424.npy"}
{"epoch": 0.6409674981103552, "step": 425, "batch_size": 64, "mean": 52.69361877441406, "std": 86.61487579345703, "min": -138.5200958251953, "p10": -51.54704627990722, "median": 38.21492385864258, "p90": 174.12039489746093, "max": 229.5235595703125, "pos_frac": 0.75, "sample": [157.10971069335938, 126.93108367919922, 36.26264190673828, 11.5955810546875, 170.88729858398438, -15.966075897216797, 205.12063598632812, -12.136661529541016, 62.53533172607422, 102.61660766601562, 107.10235595703125, -69.39337921142578, 55.21579360961914, -28.694671630859375, -46.56272506713867, 112.60317993164062, 45.637290954589844, 183.82449340820312, 14.40191650390625, 63.49098587036133, -135.43496704101562, 64.67985534667969, 1.1735420227050781, 70.71752166748047, 134.83673095703125, 11.473625183105469, 0.11038780212402344, 195.7847900390625, 12.731208801269531, 179.08868408203125, 162.7584686279297, -61.15251541137695, 103.58982849121094, -2.9969863891601562, 55.5101318359375, 37.40290069580078, 39.026947021484375, 173.9005126953125, 22.56741714477539, 22.144756317138672, 184.60406494140625, -106.02664947509766, 58.55809020996094, 108.46539306640625, 28.738845825195312, -138.5200958251953, 32.60997009277344, 68.84971618652344, 43.335693359375, -10.458389282226562, 174.21463012695312, -22.31405258178711, -75.05628204345703, 170.93142700195312, 17.47136688232422, 1.1813926696777344, 150.72927856445312, 36.60343933105469, -9.57574462890625, 229.5235595703125, -49.43699264526367, 7.1735382080078125, 152.74644470214844, -52.45135498046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000425.npy"}
{"epoch": 0.6424792139077853, "step": 426, "batch_size": 64, "mean": 32.75650405883789, "std": 97.21167755126953, "min": -237.26817321777344, "p10": -73.4739616394043, "median": 17.220722198486328, "p90": 155.740544128418, "max": 238.85052490234375, "pos_frac": 0.609375, "sample": [-6.250759124755859, -2.515094757080078, 134.07504272460938, 158.50634765625, 18.011146545410156, 139.5962371826172, 14.570953369140625, 174.6660614013672, 17.08446502685547, 66.22334289550781, -18.14948272705078, -15.2034912109375, -1.7380599975585938, -144.67709350585938, 78.97422790527344, 17.356979370117188, 97.60160827636719, 31.127166748046875, 54.148277282714844, -30.43604278564453, -175.5787353515625, -23.652435302734375, 47.67401123046875, 72.26396179199219, -10.780403137207031, 9.233596801757812, 32.92022705078125, -215.82911682128906, 115.28695678710938, -5.576789855957031, 188.55368041992188, 110.27953338623047, 149.28700256347656, -23.574247360229492, -237.26817321777344, 64.56167602539062, -8.263992309570312, 0.9540386199951172, -2.5331573486328125, 238.85052490234375, 168.33778381347656, 214.8589630126953, -107.4208984375, -117.62454223632812, 4.108478546142578, 199.14578247070312, -10.836837768554688, 2.118549346923828, -4.489315032958984, -71.14839172363281, 84.58130645751953, 113.28775787353516, -12.307815551757812, 93.05067443847656, 46.598411560058594, -74.47063446044922, 12.966129302978516, 128.02896118164062, 43.056854248046875, -63.4185791015625, 116.47323608398438, 77.63038635253906, 145.6579132080078, -1.5479316711425781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000426.npy"}
{"epoch": 0.6439909297052154, "step": 427, "batch_size": 64, "mean": 52.506629943847656, "std": 94.78681182861328, "min": -154.5464324951172, "p10": -76.84007186889647, "median": 36.43975830078125, "p90": 175.65672912597657, "max": 296.88507080078125, "pos_frac": 0.765625, "sample": [93.51542663574219, 21.83449363708496, -11.66775894165039, -90.08281707763672, -68.22808837890625, 145.88702392578125, 175.92416381835938, 33.4329719543457, -7.3897857666015625, -84.29327392578125, 156.2735595703125, -22.562885284423828, -80.53092193603516, -116.63916015625, 38.44031524658203, 162.18051147460938, 96.37207794189453, 76.28202819824219, 0.6121902465820312, -25.655105590820312, 0.06950187683105469, 73.32798767089844, 153.30621337890625, 73.95021057128906, 27.5955810546875, 157.76068115234375, 139.39959716796875, 145.88516235351562, 5.760505676269531, 32.627296447753906, 183.5821075439453, 16.513046264648438, 89.38253784179688, 43.898155212402344, 21.36205291748047, 171.06103515625, 34.43920135498047, -154.5464324951172, 193.03060913085938, 182.50204467773438, 3.5179672241210938, 31.190879821777344, 45.11968994140625, 19.37530517578125, -142.36550903320312, -52.939674377441406, 138.43714904785156, 52.65557861328125, 27.592742919921875, -106.06575012207031, 175.03271484375, 225.04391479492188, 0.9244537353515625, 71.51836395263672, 83.12578582763672, 28.194915771484375, 93.92274475097656, 60.94648361206055, 5.9803314208984375, -21.6024169921875, 101.65068817138672, -60.945892333984375, 296.88507080078125, 198.6165771484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000427.npy"}
{"epoch": 0.6455026455026455, "step": 428, "batch_size": 64, "mean": 60.80500793457031, "std": 92.87889862060547, "min": -166.1099090576172, "p10": -38.170899581909175, "median": 57.19432067871094, "p90": 187.13856201171876, "max": 310.7712707519531, "pos_frac": 0.703125, "sample": [-91.60649108886719, 91.73799896240234, 188.73211669921875, -1.1314163208007812, 2.7268829345703125, 5.6866302490234375, 29.463016510009766, -126.21818542480469, -54.687408447265625, 51.136802673339844, 187.99514770507812, -28.015579223632812, 134.92324829101562, -17.33367156982422, 161.81423950195312, 55.6646728515625, 169.45921325683594, 21.052085876464844, -6.683998107910156, 58.723968505859375, 189.37815856933594, -1.9724769592285156, 1.4468765258789062, 1.9315185546875, 195.76707458496094, 98.99200439453125, 62.56492614746094, -166.1099090576172, 65.89604187011719, -108.22921752929688, 96.5689697265625, -0.18178558349609375, 17.761722564697266, 44.69301986694336, 66.41211700439453, 187.344482421875, 166.39337158203125, 141.9886016845703, 62.86455535888672, -22.55824089050293, 198.0983428955078, 310.7712707519531, 90.19039154052734, 152.12139892578125, -50.20439147949219, -2.6580810546875, 85.61923217773438, 59.72644805908203, -0.9264392852783203, -11.088134765625, 21.06848907470703, 21.61406707763672, -1.8192520141601562, 2.9627952575683594, 168.61474609375, 179.97532653808594, 186.6580810546875, 63.58378601074219, 135.5563507080078, 158.60525512695312, 122.2803955078125, 140.5032196044922, -34.289451599121094, -39.83437728881836], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000428.npy"}
{"epoch": 0.6470143613000756, "step": 429, "batch_size": 64, "mean": 46.66352844238281, "std": 88.67630004882812, "min": -142.67095947265625, "p10": -64.78830986022946, "median": 39.92943572998047, "p90": 166.57778472900392, "max": 196.22450256347656, "pos_frac": 0.734375, "sample": [95.10740661621094, -138.3326416015625, 2.430461883544922, -15.744682312011719, 150.57620239257812, 127.0339584350586, 45.0825080871582, 26.982139587402344, -15.94265365600586, 24.929489135742188, 48.180450439453125, -133.40777587890625, 76.6668701171875, 49.88642883300781, -3.01446533203125, 138.93280029296875, -140.47225952148438, 28.48138427734375, -23.344573974609375, 113.06421661376953, 133.23204040527344, -32.75449752807617, 133.56996154785156, 161.17881774902344, 168.22354125976562, 170.76705932617188, 185.4642333984375, 3.2642974853515625, 40.89434051513672, -33.5925178527832, -77.18476867675781, -35.86323928833008, 10.971900939941406, -16.424095153808594, 91.4212875366211, -4.54248046875, -89.44656372070312, 83.84651947021484, 0.7731094360351562, 4.1788787841796875, 171.86302185058594, 145.327880859375, 37.866111755371094, 162.73768615722656, 141.09983825683594, -142.67095947265625, -125.93190002441406, 19.207317352294922, 5.847648620605469, 145.54898071289062, 21.85901641845703, 38.96453094482422, 15.149879455566406, -16.964611053466797, 57.40108108520508, 64.24513244628906, 196.22450256347656, 71.13356018066406, 189.45289611816406, 179.33209228515625, 6.566383361816406, 126.43006896972656, 43.79524230957031, 76.9072494506836], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000429.npy"}
{"epoch": 0.6485260770975056, "step": 430, "batch_size": 64, "mean": 75.5631103515625, "std": 91.55827331542969, "min": -223.47598266601562, "p10": -25.271463012695303, "median": 74.65205383300781, "p90": 192.37509765625, "max": 205.25119018554688, "pos_frac": 0.796875, "sample": [74.74261474609375, 30.124067306518555, -28.741622924804688, 32.242774963378906, 0.30472564697265625, 172.65496826171875, -0.399566650390625, 165.6938018798828, 42.32579803466797, 112.46105194091797, -99.23812103271484, 153.02191162109375, 191.045166015625, 7.335321426391602, 174.73033142089844, 110.67780303955078, 88.51554107666016, 44.73045349121094, 192.945068359375, -36.21367645263672, 63.68293762207031, 205.25119018554688, 166.5831298828125, 119.18409729003906, -0.7358856201171875, 178.26063537597656, 25.415420532226562, 194.83599853515625, 119.81158447265625, 177.0098876953125, 93.43695068359375, 9.4638671875, 84.74676513671875, 178.925537109375, -43.85573196411133, 193.5482635498047, 203.75161743164062, 135.63876342773438, -113.43812561035156, 147.67617797851562, 8.077224731445312, 97.89775848388672, 39.76108932495117, 63.218421936035156, 15.441871643066406, 155.13475036621094, 18.12946319580078, 1.3320674896240234, -223.47598266601562, -8.769645690917969, -4.030548095703125, 169.10269165039062, -0.8831253051757812, 164.03839111328125, 37.334869384765625, 193.01400756835938, 197.41697692871094, 131.61061096191406, 156.42494201660156, -17.174423217773438, 74.56149291992188, -49.1441764831543, 9.92425537109375, 38.944091796875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000430.npy"}
{"epoch": 0.6500377928949358, "step": 431, "batch_size": 64, "mean": 53.404659271240234, "std": 87.36559295654297, "min": -169.05909729003906, "p10": -29.419369506835935, "median": 38.35650634765625, "p90": 173.78366088867188, "max": 215.05859375, "pos_frac": 0.75, "sample": [-169.05909729003906, 137.17066955566406, 72.33624267578125, 43.62682342529297, 15.11147689819336, 128.06771850585938, 107.48040771484375, 144.5225830078125, 171.88844299316406, -1.6301212310791016, 7.245262145996094, 178.88153076171875, 171.65293884277344, -9.519248962402344, 44.37842559814453, 19.57350730895996, 214.83926391601562, -1.0369110107421875, 215.05859375, 34.311553955078125, 63.758338928222656, -166.5392608642578, 24.953575134277344, 85.06948852539062, -39.60614013671875, 174.35260009765625, 72.79560852050781, -89.78700256347656, 46.710174560546875, -6.691946029663086, 61.33946228027344, -7.342441558837891, 155.65228271484375, 173.43502807617188, -98.71693420410156, -3.4973831176757812, 175.104248046875, 97.78321838378906, -26.036651611328125, -45.958343505859375, -27.105148315429688, 4.817230224609375, 0.38704681396484375, 143.91481018066406, 1.4173660278320312, 18.973976135253906, 170.0237274169922, 57.74969482421875, 0.9829483032226562, 203.48646545410156, 17.59002685546875, 44.69694900512695, 5.00885009765625, 26.077747344970703, 104.2657241821289, 7.939693450927734, 42.401458740234375, -27.06574249267578, 14.321109771728516, 173.93307495117188, 169.17221069335938, 98.01764678955078, 25.624496459960938, -30.411178588867188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000431.npy"}
{"epoch": 0.6515495086923658, "step": 432, "batch_size": 64, "mean": 56.80330276489258, "std": 89.5664291381836, "min": -203.2420196533203, "p10": -47.55743942260742, "median": 50.009389877319336, "p90": 178.59453277587892, "max": 204.4715576171875, "pos_frac": 0.703125, "sample": [-16.203628540039062, -32.561065673828125, -50.70274353027344, -203.2420196533203, 3.9122982025146484, -0.6719036102294922, 10.542160034179688, 78.44391632080078, -45.60582733154297, 172.58761596679688, 117.22920227050781, -84.41869354248047, 41.5001220703125, 65.74713134765625, 177.57598876953125, -140.153564453125, 59.04216766357422, 19.840850830078125, 200.49586486816406, 137.61068725585938, 5.897987365722656, 93.96772003173828, 146.67648315429688, -16.4493408203125, -4.648921966552734, 184.50479125976562, -0.7728500366210938, 179.6907501220703, 155.58274841308594, -30.50441551208496, -53.06111145019531, 40.922054290771484, 10.27702522277832, -32.17335510253906, 38.258514404296875, 72.37843322753906, 180.92066955566406, 145.33566284179688, 49.508636474609375, 137.99942016601562, -22.63083267211914, 50.5101432800293, 132.93740844726562, 21.344146728515625, 138.99884033203125, 4.3908843994140625, 63.506080627441406, 179.0310516357422, 19.107051849365234, 186.83009338378906, 130.59219360351562, 110.46026611328125, 124.4048843383789, -48.39384460449219, 74.6610107421875, 204.4715576171875, -8.894172668457031, 143.8597869873047, 62.65753173828125, 11.377693176269531, -71.13671112060547, 159.44894409179688, -2.263916015625, 154.86172485351562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000432.npy"}
{"epoch": 0.6530612244897959, "step": 433, "batch_size": 64, "mean": 69.82337951660156, "std": 87.93963623046875, "min": -182.60818481445312, "p10": -36.65698280334472, "median": 53.58957290649414, "p90": 176.28937377929688, "max": 202.57640075683594, "pos_frac": 0.78125, "sample": [46.905216217041016, 135.86322021484375, 23.780620574951172, 61.46793746948242, -12.343612670898438, 45.552215576171875, 0.0038356781005859375, 137.54605102539062, -21.952564239501953, -0.143463134765625, 9.378952026367188, 27.27014923095703, 128.17337036132812, 43.69469451904297, 176.565185546875, -108.7391586303711, 101.48890686035156, 166.57420349121094, 151.527099609375, -39.16301345825195, 57.353240966796875, 28.178375244140625, 7.368705749511719, 178.16616821289062, -182.60818481445312, 63.940792083740234, 161.72769165039062, 37.07450866699219, 168.32887268066406, 147.08773803710938, 35.070945739746094, 78.39409637451172, -54.84934997558594, 3.9843826293945312, 48.351314544677734, -13.408447265625, -49.48097610473633, 48.61967468261719, 93.892578125, 178.93157958984375, 202.57640075683594, 175.64581298828125, 193.29653930664062, 49.1180419921875, 148.53453063964844, -99.3699722290039, 49.2183837890625, 135.38973999023438, -27.940834045410156, 174.6063690185547, -14.106285095214844, 121.91919708251953, 175.37802124023438, -30.80957794189453, 49.825904846191406, 170.02517700195312, 131.60792541503906, 62.8477783203125, 177.84317016601562, 191.45919799804688, 167.50486755371094, 42.38483428955078, 171.7420654296875, -59.574951171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000433.npy"}
{"epoch": 0.654572940287226, "step": 434, "batch_size": 64, "mean": 75.5843734741211, "std": 101.4736328125, "min": -196.9726104736328, "p10": -57.69967956542967, "median": 85.69221496582031, "p90": 189.39988861083984, "max": 303.30389404296875, "pos_frac": 0.796875, "sample": [-179.5657958984375, 85.68258666992188, 139.08364868164062, 35.763916015625, 160.43341064453125, 193.10211181640625, -67.62759399414062, 190.07400512695312, -74.27146911621094, 85.70184326171875, -2.9122142791748047, -39.88270568847656, -196.9726104736328, 76.38441467285156, 157.45297241210938, -65.33552551269531, 127.17616271972656, 71.39968872070312, 303.30389404296875, 182.31895446777344, 104.45626068115234, 2.325349807739258, 218.2153778076172, 8.713493347167969, -129.87338256835938, 33.88066101074219, 181.75839233398438, 109.43797302246094, -1.8529701232910156, 138.270751953125, 9.34316635131836, 101.30458068847656, 109.46476745605469, 178.4066925048828, 193.79562377929688, 177.89080810546875, -11.826278686523438, 117.99724578857422, 203.0611572265625, 12.758071899414062, 187.8269500732422, 10.275558471679688, 37.766666412353516, 156.02943420410156, 21.728843688964844, 19.925148010253906, 32.78883361816406, 114.12056732177734, 83.42376708984375, 164.58279418945312, 7.464244842529297, 195.732177734375, -125.27578735351562, -24.443618774414062, 12.124183654785156, 112.8631362915039, 150.9609832763672, 174.78509521484375, 149.70440673828125, -1.0036048889160156, 62.92611312866211, 111.6301040649414, 181.03536987304688, 61.59099578857422], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000434.npy"}
{"epoch": 0.656084656084656, "step": 435, "batch_size": 64, "mean": 58.70793914794922, "std": 91.64726257324219, "min": -159.64700317382812, "p10": -43.487387847900386, "median": 45.61909866333008, "p90": 181.84774322509767, "max": 211.74710083007812, "pos_frac": 0.671875, "sample": [108.39948272705078, 90.21788024902344, -11.512840270996094, 87.95220947265625, -111.84001922607422, 30.157508850097656, 10.528274536132812, -0.08660888671875, 142.520751953125, -24.259017944335938, 183.02227783203125, 170.49282836914062, -7.90283203125, 107.0669937133789, 170.04971313476562, 189.73658752441406, 28.426536560058594, -30.341934204101562, -15.388847351074219, 172.14111328125, 47.78044891357422, -7.551124572753906, 160.22805786132812, 190.90321350097656, 22.252182006835938, 40.46819305419922, -27.11431884765625, 61.96192932128906, 178.41757202148438, 191.40933227539062, -12.814910888671875, 80.28594970703125, -159.64700317382812, 148.15257263183594, 34.543697357177734, 81.57402038574219, -10.744377136230469, -29.55450439453125, 144.87339782714844, 185.93264770507812, 211.74710083007812, 197.33566284179688, 179.10716247558594, 55.98107147216797, 0.41927337646484375, 52.252601623535156, 42.4892463684082, -40.64696502685547, -33.71822738647461, 173.85482788085938, -90.44119262695312, 20.200363159179688, -63.097015380859375, 105.16568756103516, 109.02548217773438, 84.32168579101562, 160.00086975097656, -77.3472900390625, -95.06449127197266, 43.45774841308594, 2.75604248046875, -44.7047119140625, 153.56488037109375, -0.088897705078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000435.npy"}
{"epoch": 0.6575963718820862, "step": 436, "batch_size": 64, "mean": 47.23030090332031, "std": 96.241455078125, "min": -178.08157348632812, "p10": -77.67884979248046, "median": 26.669910430908203, "p90": 167.47641143798828, "max": 228.5269775390625, "pos_frac": 0.703125, "sample": [106.04470825195312, -30.134456634521484, 163.31985473632812, 3.278566360473633, 103.3134765625, 6.685930252075195, 96.43255615234375, 162.38726806640625, -102.75812530517578, 26.81011199951172, 26.529708862304688, 167.27183532714844, 8.937400817871094, 116.16885375976562, -51.55976486206055, 181.82440185546875, 130.35733032226562, 119.37876892089844, -178.08157348632812, 65.4771957397461, 14.22402572631836, 64.85735321044922, -27.67901611328125, 14.784080505371094, 45.985076904296875, -13.527931213378906, 138.49911499023438, -0.8969783782958984, 144.90167236328125, 23.32817840576172, 75.25180053710938, -17.580535888671875, -173.76272583007812, 111.11335754394531, -36.686180114746094, 60.1348876953125, 167.5640869140625, -85.1158676147461, -3.1222076416015625, -77.75457763671875, 159.5123291015625, 125.2112808227539, 23.42349624633789, 11.681900024414062, -105.87948608398438, 136.1439208984375, -77.50215148925781, 205.20481872558594, 91.98538208007812, 228.5269775390625, 154.0091552734375, 1.2598724365234375, 192.27423095703125, 117.1160888671875, -23.715408325195312, 10.847957611083984, 177.03305053710938, -75.30628967285156, 14.666358947753906, 2.374086380004883, -55.8404541015625, 192.1354217529297, -107.7250747680664, 79.09993743896484], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000436.npy"}
{"epoch": 0.6591080876795162, "step": 437, "batch_size": 64, "mean": 54.174720764160156, "std": 105.50337982177734, "min": -250.24026489257812, "p10": -76.25550155639648, "median": 61.35445022583008, "p90": 193.8257049560547, "max": 251.10720825195312, "pos_frac": 0.640625, "sample": [251.10720825195312, 210.61441040039062, 175.12094116210938, 111.33218383789062, 66.7310562133789, -22.10696792602539, 163.92236328125, 86.56309509277344, 207.41510009765625, 65.07652282714844, 180.9463348388672, -16.281982421875, -250.24026489257812, 62.93620300292969, -98.55523681640625, -76.31989288330078, 35.37635803222656, 106.13444519042969, 162.84014892578125, -144.98651123046875, 194.9390411376953, 214.27532958984375, 179.72427368164062, -23.113018035888672, -36.98064041137695, 80.25956726074219, 203.09881591796875, -20.617427825927734, -157.1549072265625, -77.56229400634766, -18.045669555664062, -70.85360717773438, 45.748558044433594, -17.65636444091797, -8.235069274902344, 177.7381591796875, 78.84480285644531, -24.193511962890625, 62.973304748535156, 100.42875671386719, 44.77430725097656, -8.164360046386719, -34.308719635009766, 191.22792053222656, 41.21148681640625, 169.3896484375, 59.77269744873047, 91.0743408203125, 92.05517578125, -138.42398071289062, 66.81517791748047, 45.21761703491211, 165.60916137695312, 146.21302795410156, -2.5666732788085938, 15.840522766113281, 35.242637634277344, 100.08226013183594, -76.10525512695312, -4.194004058837891, -1.220438003540039, 80.32330322265625, 22.395545959472656, 203.67721557617188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000437.npy"}
{"epoch": 0.6606198034769464, "step": 438, "batch_size": 64, "mean": 74.86723327636719, "std": 93.6883773803711, "min": -106.61528778076172, "p10": -26.782338714599607, "median": 50.77350616455078, "p90": 177.4303421020508, "max": 385.22906494140625, "pos_frac": 0.75, "sample": [203.8633575439453, -4.251289367675781, 385.22906494140625, -55.25555419921875, -1.6370735168457031, -56.07780456542969, -29.220294952392578, 165.99227905273438, 3.1061878204345703, 216.67210388183594, -24.81734848022461, 199.6090087890625, -16.08702850341797, 11.090751647949219, 178.24185180664062, 0.7842369079589844, 164.0585174560547, -106.61528778076172, 152.69371032714844, 98.4454574584961, 140.18685913085938, 154.39117431640625, -8.3758544921875, 29.23443603515625, 65.12451171875, 170.31846618652344, 47.774261474609375, 32.16230392456055, 174.0126953125, 124.78782653808594, 92.39918518066406, -30.039306640625, 139.9093017578125, 12.663633346557617, 14.248069763183594, 168.86135864257812, 175.5368194580078, -9.190336227416992, 89.50865173339844, 42.08866882324219, 137.11099243164062, 6.8677520751953125, 71.61333465576172, 252.8092041015625, 146.34156799316406, 12.266036987304688, 126.20572662353516, 37.12720489501953, 22.747711181640625, -2.910015106201172, -27.62447738647461, -7.812261581420898, 24.487634658813477, 164.36720275878906, 143.38064575195312, -94.62572479248047, 172.10574340820312, 33.36009979248047, -1.7142715454101562, 188.8001251220703, 6.669284820556641, 53.77275085449219, 139.1931915283203, 75.53575134277344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000438.npy"}
{"epoch": 0.6621315192743764, "step": 439, "batch_size": 64, "mean": 62.44725036621094, "std": 90.44873046875, "min": -220.00311279296875, "p10": -20.681716156005855, "median": 40.25525665283203, "p90": 195.46643676757813, "max": 229.53953552246094, "pos_frac": 0.75, "sample": [158.38552856445312, 5.803073883056641, 26.923620223999023, -24.110618591308594, 38.774478912353516, 170.32655334472656, 106.25343322753906, 53.700439453125, -24.672767639160156, 210.1067352294922, 209.5071563720703, -16.242631912231445, 184.49847412109375, 13.107986450195312, 41.14460754394531, 222.01266479492188, 170.29937744140625, 17.452308654785156, 192.59628295898438, -29.203750610351562, 5.564983367919922, 125.50960540771484, 57.959922790527344, -3.0619049072265625, 179.8993682861328, 46.60100555419922, 196.69650268554688, -111.80088806152344, 1.977365493774414, 189.6396026611328, -22.58418083190918, 197.92250061035156, -39.998023986816406, -15.96270751953125, 45.446563720703125, -8.704452514648438, -3.1647109985351562, 12.714431762695312, -220.00311279296875, 31.636966705322266, 39.6563720703125, 23.316177368164062, 179.79615783691406, 35.950714111328125, 60.661746978759766, 185.57106018066406, 54.966705322265625, 205.92518615722656, 49.11894226074219, 28.310089111328125, 22.480426788330078, 99.70146179199219, -12.467033386230469, 132.94894409179688, 40.85414123535156, 229.53953552246094, 103.62332153320312, -15.8017578125, -14.632087707519531, 1.8708038330078125, 18.91891098022461, 64.98023986816406, 75.07017517089844, -6.687599182128906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000439.npy"}
{"epoch": 0.6636432350718064, "step": 440, "batch_size": 64, "mean": 65.6700668334961, "std": 97.07954406738281, "min": -175.96852111816406, "p10": -45.132655715942384, "median": 49.71513748168945, "p90": 183.4134765625, "max": 260.6099853515625, "pos_frac": 0.75, "sample": [-22.19770050048828, 56.61183166503906, 49.498046875, 219.05130004882812, 183.2489013671875, 260.6099853515625, 185.0933837890625, -23.05522918701172, 182.03619384765625, 55.902801513671875, 24.607444763183594, 173.3514404296875, 66.2219467163086, 163.61900329589844, 140.2497100830078, 48.57208251953125, 181.42864990234375, -32.427833557128906, -138.76736450195312, 46.35151672363281, 105.6882095336914, -16.316980361938477, 8.55670166015625, 152.57479858398438, 49.932228088378906, 137.093505859375, 96.52352905273438, 179.59161376953125, 0.10031700134277344, 28.232879638671875, 2.511442184448242, 55.11009979248047, 155.74176025390625, -1.9662361145019531, 0.06021881103515625, 41.24469757080078, 39.29071044921875, -175.96852111816406, 19.264368057250977, 188.38865661621094, 134.54429626464844, 185.3602752685547, -1.6116790771484375, 177.34625244140625, -45.1994514465332, 29.812774658203125, 186.6221466064453, 41.44513702392578, 164.2380828857422, -44.97679901123047, -67.90510559082031, -80.47614288330078, 75.95999145507812, -39.750877380371094, 102.7903060913086, 29.212299346923828, -13.059223175048828, -58.681434631347656, 169.9382781982422, -147.81240844726562, 41.06694030761719, 183.4840087890625, 179.34988403320312, 115.52635955810547], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000440.npy"}
{"epoch": 0.6651549508692366, "step": 441, "batch_size": 64, "mean": 70.84150695800781, "std": 76.87442779541016, "min": -77.12738800048828, "p10": -20.854964447021484, "median": 61.38019561767578, "p90": 178.6445541381836, "max": 200.1134796142578, "pos_frac": 0.796875, "sample": [160.01211547851562, 87.71745300292969, 184.72682189941406, 190.075927734375, -1.651693344116211, 161.523681640625, 164.385498046875, -12.770362854003906, 87.20065307617188, -13.438369750976562, -39.19894027709961, 147.19049072265625, -21.678749084472656, 179.09890747070312, 29.927684783935547, 40.77349853515625, 64.4672622680664, 6.385040283203125, -37.15361404418945, 177.5843963623047, 57.613189697265625, 67.46076202392578, 118.76947021484375, 173.42735290527344, -1.7462005615234375, 176.47406005859375, 0.10901260375976562, 6.523109436035156, 200.1134796142578, 85.45059204101562, 66.9582748413086, 58.4871826171875, -46.39434814453125, 106.34062194824219, 120.97299194335938, 9.282539367675781, -61.43974304199219, 175.5494842529297, 56.5333251953125, 65.11700439453125, 36.540008544921875, 173.70001220703125, 43.896827697753906, 12.627208709716797, 102.56196594238281, 51.720245361328125, 99.48458862304688, -0.7213516235351562, 112.51901245117188, -18.93280029296875, 181.22415161132812, 27.149972915649414, 180.99172973632812, 193.97845458984375, -51.9697265625, 36.93444061279297, 62.285499572753906, 5.052742004394531, 4.826133728027344, 109.44586944580078, -77.12738800048828, 60.474891662597656, 161.97146606445312, 34.4422607421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000441.npy"}
{"epoch": 0.6666666666666666, "step": 442, "batch_size": 64, "mean": 68.74054718017578, "std": 86.58409881591797, "min": -172.15411376953125, "p10": -28.010527038574217, "median": 62.54572677612305, "p90": 179.25980377197266, "max": 242.19183349609375, "pos_frac": 0.78125, "sample": [16.192527770996094, 199.81378173828125, 43.28892517089844, -52.41021728515625, 60.207115173339844, 242.19183349609375, 183.31103515625, 177.77902221679688, 64.88433837890625, 164.4393310546875, 67.60008239746094, 159.7841033935547, 160.29209899902344, -36.4407958984375, 137.311767578125, 104.9685287475586, 1.1437606811523438, 21.943023681640625, 36.174560546875, -27.343414306640625, -83.65823364257812, 152.53677368164062, 105.48709106445312, -8.953376770019531, 140.74630737304688, 3.7551803588867188, -55.30388641357422, 103.50592041015625, 9.953115463256836, 180.4386749267578, -0.5904693603515625, 75.8272476196289, 177.04592895507812, 166.58169555664062, 218.54531860351562, 104.45865631103516, 21.095245361328125, 1.5127449035644531, 179.89442443847656, 55.020843505859375, 85.07254791259766, 20.15692901611328, 5.259824752807617, 19.222972869873047, -0.7916774749755859, -172.15411376953125, -0.8468399047851562, 173.568359375, -28.296432495117188, -21.103012084960938, 180.7987823486328, 133.79562377929688, -64.47296142578125, 69.16783905029297, 103.12969970703125, -7.068380355834961, 84.96723937988281, 19.444976806640625, 110.94439697265625, 171.24383544921875, 55.13276672363281, 174.19122314453125, 2.2634811401367188, 12.737417221069336], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000442.npy"}
{"epoch": 0.6681783824640968, "step": 443, "batch_size": 64, "mean": 86.57083129882812, "std": 98.4903564453125, "min": -173.70394897460938, "p10": -12.795416259765625, "median": 89.8094711303711, "p90": 185.0427474975586, "max": 337.8819274902344, "pos_frac": 0.8125, "sample": [8.994403839111328, 163.25350952148438, 69.87730407714844, 176.8141326904297, 28.82654571533203, 45.717620849609375, 185.1891632080078, -71.7060546875, 50.4715576171875, 97.62901306152344, -35.842628479003906, -8.623369216918945, 64.412109375, 176.59024047851562, 76.46949768066406, 213.15457153320312, 9.494546890258789, 137.90789794921875, 109.0188980102539, 177.189453125, 337.8819274902344, -173.70394897460938, -12.940010070800781, 166.6224365234375, 4.917333602905273, 228.43997192382812, 157.73553466796875, 93.82062530517578, 166.16317749023438, 174.0914306640625, 159.65126037597656, 131.30560302734375, 150.1309356689453, 169.2935791015625, 65.39675903320312, 184.70111083984375, 170.46705627441406, 175.45916748046875, 128.1431884765625, 192.74636840820312, 7.2596588134765625, 162.89537048339844, -7.987115859985352, 85.7983169555664, 177.34133911132812, 171.60853576660156, -12.458030700683594, 13.881805419921875, 6.7022705078125, 5.553037643432617, 6.170597076416016, -57.69141387939453, 112.5850830078125, -2.1597747802734375, 34.17623519897461, 23.517566680908203, -166.5850830078125, -17.6607666015625, 220.9158477783203, 59.74132537841797, 0.08121109008789062, -10.397773742675781, 144.4813690185547, 237.6016082763672], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000443.npy"}
{"epoch": 0.6696900982615268, "step": 444, "batch_size": 64, "mean": 33.09368896484375, "std": 97.50882720947266, "min": -177.70947265625, "p10": -76.07485961914062, "median": 14.611144065856934, "p90": 172.8931915283203, "max": 251.78744506835938, "pos_frac": 0.640625, "sample": [132.518798828125, -2.589550018310547, -48.919654846191406, 2.664766311645508, 54.865936279296875, 14.43655776977539, 97.317138671875, -31.249191284179688, 8.411846160888672, 129.77621459960938, 6.404808044433594, 85.9780502319336, -128.2172393798828, -167.64097595214844, 51.01432800292969, 12.606414794921875, 78.59300231933594, 42.10810852050781, 62.32746505737305, -64.81430053710938, 153.82809448242188, 218.63479614257812, -0.9833145141601562, -48.62742614746094, -11.904891967773438, 28.12952423095703, 5.833503723144531, -10.145246505737305, -76.88949584960938, -72.5711669921875, 172.99383544921875, 9.083309173583984, 177.3209228515625, 36.95298767089844, 16.41192626953125, 166.20993041992188, 32.785850524902344, 176.69699096679688, -10.909446716308594, -177.70947265625, 25.998504638671875, 251.78744506835938, -128.94796752929688, -44.56945037841797, -5.050077438354492, 176.03253173828125, 148.39071655273438, 168.89483642578125, 182.92401123046875, 67.15043640136719, 14.785730361938477, 102.49568176269531, -0.20665359497070312, 23.152435302734375, 172.65835571289062, -11.363245010375977, -74.17404174804688, 25.249048233032227, 3.78326416015625, -107.33131408691406, -164.61923217773438, 168.14292907714844, -2.9681644439697266, 5.046483993530273], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000444.npy"}
{"epoch": 0.671201814058957, "step": 445, "batch_size": 64, "mean": 73.92229461669922, "std": 90.1535873413086, "min": -117.65709686279297, "p10": -33.72172260284424, "median": 78.57233428955078, "p90": 184.8625274658203, "max": 224.05758666992188, "pos_frac": 0.765625, "sample": [47.36379623413086, 224.05758666992188, 12.125755310058594, -34.89186096191406, -29.904449462890625, 24.324630737304688, 176.3602294921875, 162.91188049316406, 54.654693603515625, 80.58758544921875, -1.266693115234375, -115.06918334960938, -27.282089233398438, 12.647354125976562, -93.45419311523438, 164.33056640625, -27.902694702148438, 10.802131652832031, 3.00048828125, 67.15043640136719, -1.9834518432617188, 185.15484619140625, -117.65709686279297, 45.02537536621094, -60.85955047607422, 188.05612182617188, 4.784339904785156, 175.0953369140625, 48.110389709472656, -9.790924072265625, -54.35660934448242, -8.828039169311523, 98.71437072753906, 195.2855987548828, 184.18045043945312, 115.41200256347656, 158.92897033691406, 177.12464904785156, 91.98001098632812, -37.704429626464844, 188.47927856445312, 79.63658905029297, 73.25250244140625, 206.7802276611328, 179.5376434326172, 176.29226684570312, 204.783203125, 13.00377082824707, 148.45486450195312, -30.99139976501465, 114.78410339355469, 1.2216262817382812, 10.5228271484375, 99.09879302978516, 77.5080795288086, 117.70986938476562, 170.41275024414062, 174.89337158203125, 94.1169204711914, 85.37126922607422, 153.9593505859375, 105.52738189697266, 181.3040771484375, 18.14891815185547], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000445.npy"}
{"epoch": 0.672713529856387, "step": 446, "batch_size": 64, "mean": 74.3636245727539, "std": 85.51298522949219, "min": -114.19183349609375, "p10": -16.951260185241694, "median": 64.64851188659668, "p90": 186.03643188476562, "max": 256.2571105957031, "pos_frac": 0.78125, "sample": [7.13215446472168, -9.219429016113281, -114.19183349609375, 93.2083740234375, 13.887046813964844, 76.66087341308594, 63.59475326538086, -20.02753448486328, 256.2571105957031, 45.970672607421875, 22.31888198852539, 131.44638061523438, 162.0069122314453, 165.45303344726562, 129.1207275390625, 185.39878845214844, 8.396858215332031, 94.44359588623047, 66.47148895263672, 48.003501892089844, 2.6775550842285156, 131.65097045898438, -5.016021728515625, 168.41195678710938, 186.30970764160156, 86.66185760498047, 5.856327056884766, 167.63870239257812, 193.7858428955078, -42.34003448486328, 168.20542907714844, 0.6541347503662109, -87.97542572021484, -4.320182800292969, -22.437156677246094, -1.705545425415039, 23.53314971923828, -31.19118881225586, 72.17011260986328, 147.63455200195312, 37.5064697265625, 197.41380310058594, 195.4987030029297, 173.55101013183594, 235.83486938476562, -1.0681819915771484, 218.67562866210938, 174.1637725830078, -9.773286819458008, 41.631038665771484, 68.7857894897461, 46.6317138671875, 65.7022705078125, 107.55184936523438, 178.3866729736328, 180.7755889892578, 5.021900177001953, 6.6729736328125, 0.370513916015625, 99.33210754394531, -1.5857315063476562, 49.812007904052734, -32.333221435546875, 134.17691040039062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000446.npy"}
{"epoch": 0.674225245653817, "step": 447, "batch_size": 64, "mean": 84.92918395996094, "std": 99.69498443603516, "min": -178.3341064453125, "p10": -66.28292541503903, "median": 96.10148620605469, "p90": 194.7408218383789, "max": 221.9730224609375, "pos_frac": 0.8125, "sample": [221.9730224609375, 194.87574768066406, 159.9473114013672, 15.530525207519531, 67.21171569824219, -21.985031127929688, 203.19674682617188, 27.7159423828125, -109.22284698486328, -15.689254760742188, 42.030670166015625, 151.32693481445312, 3.1888809204101562, 142.05239868164062, 189.43682861328125, 179.7814178466797, 98.29241943359375, 61.26596450805664, 84.106689453125, 93.03765869140625, 13.40875244140625, -77.01239776611328, 63.18260192871094, -41.24748992919922, 37.39678955078125, 209.49954223632812, 190.2144317626953, 200.33006286621094, 135.53952026367188, -78.34698486328125, 60.443328857421875, 78.22862243652344, 178.42095947265625, 140.35934448242188, 71.46099853515625, 168.4742431640625, 200.37835693359375, 136.65269470214844, 191.0672607421875, 6.893764495849609, 11.9869384765625, 123.51510620117188, 183.2642822265625, -128.04373168945312, -10.584781646728516, -135.26748657226562, 38.537925720214844, 160.708740234375, 209.38861083984375, 126.5693588256836, 4.434974670410156, 168.1379852294922, -80.0582275390625, 193.6646728515625, 113.51193237304688, 87.99307250976562, 117.33554077148438, -178.3341064453125, 143.41339111328125, 194.42599487304688, 184.31680297851562, -16.707542419433594, 93.91055297851562, 155.92996215820312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000447.npy"}
{"epoch": 0.6757369614512472, "step": 448, "batch_size": 64, "mean": 69.22969055175781, "std": 91.10356140136719, "min": -174.14878845214844, "p10": -36.63091125488279, "median": 67.29420471191406, "p90": 180.70543823242187, "max": 269.9168395996094, "pos_frac": 0.796875, "sample": [180.5791015625, -130.73631286621094, 180.75958251953125, 91.91165161132812, 76.19539642333984, 114.71671295166016, 0.29758453369140625, 153.9219970703125, 27.98877716064453, -0.5033493041992188, 112.40028381347656, 94.98079681396484, 82.16230773925781, -14.232406616210938, 176.17156982421875, 196.89390563964844, -4.689144134521484, 197.26251220703125, 17.139854431152344, 12.96554183959961, 11.037643432617188, 168.61654663085938, 71.25617218017578, 100.60813903808594, 125.6128921508789, -1.1531028747558594, 179.8245849609375, 76.7005615234375, 26.96338653564453, -46.23027038574219, -72.00165557861328, 53.80198669433594, 51.315391540527344, 9.551467895507812, 4.422218322753906, 115.25776672363281, -53.156341552734375, 25.91905403137207, 218.68643188476562, -7.518503189086914, 3.733907699584961, -174.14878845214844, 101.03013610839844, 38.97084045410156, 96.2210693359375, 174.349365234375, 113.82086181640625, 165.18023681640625, 63.332237243652344, 41.81886291503906, -54.46863555908203, 41.996002197265625, 75.15483093261719, 176.14918518066406, 2.1630096435546875, 269.9168395996094, -0.711578369140625, -115.30533599853516, 171.6527099609375, 183.12322998046875, 179.34609985351562, 50.34490966796875, 9.831031799316406, 191.49835205078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000448.npy"}
{"epoch": 0.6772486772486772, "step": 449, "batch_size": 64, "mean": 34.47367477416992, "std": 84.902587890625, "min": -143.76217651367188, "p10": -84.4430549621582, "median": 22.467952728271484, "p90": 149.94144592285159, "max": 205.017578125, "pos_frac": 0.640625, "sample": [4.245302200317383, 24.971527099609375, -100.57347106933594, 61.693145751953125, 19.964378356933594, 0.7073287963867188, -18.912399291992188, 60.80640411376953, 54.220985412597656, 189.39556884765625, 151.75070190429688, 118.8589096069336, 3.4174423217773438, -17.724760055541992, -5.03448486328125, -7.7099609375, 205.017578125, 1.856048583984375, -1.5240325927734375, 53.22904586791992, -57.0800895690918, 57.300804138183594, 176.92941284179688, 125.03535461425781, -143.76217651367188, -26.91408920288086, 119.83893585205078, -84.74484252929688, 139.19277954101562, 51.162628173828125, 47.97328186035156, -18.477088928222656, 28.73095703125, -91.16783142089844, 137.0679931640625, 13.506439208984375, 195.8025360107422, -10.430248260498047, 69.93327331542969, 38.296478271484375, 13.837072372436523, 16.018844604492188, -85.39328002929688, 33.706695556640625, -98.6633071899414, 96.05928039550781, -3.7687835693359375, 191.695556640625, 11.201847076416016, 104.44989776611328, 187.2096710205078, -30.687713623046875, 29.76306915283203, -15.617179870605469, -75.94381713867188, 126.39739990234375, 129.65586853027344, -83.73888397216797, 145.7198486328125, -63.646995544433594, 88.00035095214844, -102.95706176757812, 34.08577346801758, -7.918739318847656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000449.npy"}
{"epoch": 0.6787603930461074, "step": 450, "batch_size": 64, "mean": 54.245628356933594, "std": 101.32833862304688, "min": -138.87261962890625, "p10": -53.88512687683105, "median": 18.54177951812744, "p90": 189.31826477050782, "max": 314.1717224121094, "pos_frac": 0.640625, "sample": [1.3635540008544922, -16.059341430664062, -56.06629180908203, -38.22126007080078, 314.1717224121094, 197.71694946289062, 84.73870849609375, 53.39874267578125, 163.58001708984375, 186.15310668945312, 103.77127838134766, 36.41100311279297, 52.117332458496094, 50.636505126953125, 2.5614700317382812, 166.32638549804688, -7.0539398193359375, 65.30006408691406, 168.29795837402344, 235.7547607421875, -2.207447052001953, 35.12765884399414, -14.980728149414062, -43.224647521972656, 80.29182434082031, -20.74759292602539, -13.002113342285156, 199.78073120117188, 73.39253997802734, 4.687189102172852, -12.055229187011719, 189.39797973632812, 29.895957946777344, 138.65066528320312, -110.21611785888672, 189.13226318359375, 169.16940307617188, -3.022235870361328, 227.27407836914062, 8.975582122802734, 8.89913558959961, -5.710884094238281, 110.4306411743164, -117.79027557373047, 15.564979553222656, 2.8699588775634766, 145.5924072265625, -3.79364013671875, 161.9317626953125, -48.79574203491211, 21.518579483032227, -138.87261962890625, 13.582780838012695, 98.43815612792969, 255.9583740234375, -6.286735534667969, -81.45314025878906, 173.54603576660156, -105.08474731445312, 179.05958557128906, -78.0987777709961, 9.483718872070312, -18.76579475402832, -11.721916198730469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000450.npy"}
{"epoch": 0.6802721088435374, "step": 451, "batch_size": 64, "mean": 49.16454315185547, "std": 91.44844055175781, "min": -166.94412231445312, "p10": -30.4573860168457, "median": 32.480743408203125, "p90": 176.03164520263675, "max": 238.89712524414062, "pos_frac": 0.6875, "sample": [123.41746520996094, -12.88507080078125, 44.12858581542969, 51.25337219238281, 151.19920349121094, 15.545135498046875, -166.94412231445312, 31.454803466796875, 0.6294651031494141, 193.99945068359375, 118.09625244140625, 88.03428649902344, 180.55206298828125, 39.5648193359375, 158.66824340820312, 47.01991271972656, 64.87023162841797, -10.919288635253906, 21.15192985534668, 95.48451232910156, -1.4822750091552734, -132.51412963867188, -126.80519104003906, 234.64248657226562, 168.70066833496094, -32.1273193359375, 89.12960815429688, -142.20458984375, -36.48051452636719, -1.0824394226074219, 41.785865783691406, 238.89712524414062, -15.900493621826172, -0.8519687652587891, 153.13201904296875, 85.55416107177734, 179.17349243164062, 141.89295959472656, -23.52825927734375, 11.663810729980469, 17.841163635253906, 106.84037780761719, 2.8600730895996094, -3.108001708984375, 102.63689422607422, 19.95440673828125, 5.595409393310547, 135.20143127441406, 192.8350067138672, 107.54609680175781, -24.330596923828125, 149.68545532226562, 33.506683349609375, -6.998039245605469, 2.5221939086914062, 15.760818481445312, 159.2550811767578, 0.06638717651367188, -6.943628311157227, 50.9621467590332, 183.15518188476562, -10.326663970947266, -127.342529296875, -26.560874938964844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000451.npy"}
{"epoch": 0.6817838246409675, "step": 452, "batch_size": 64, "mean": 57.45180892944336, "std": 100.72935485839844, "min": -158.848388671875, "p10": -67.80474700927734, "median": 44.73244857788086, "p90": 187.1694534301758, "max": 243.7777862548828, "pos_frac": 0.765625, "sample": [16.47942352294922, -18.632301330566406, 18.8039608001709, 91.6666259765625, 20.62451934814453, 108.88433074951172, 7.470649719238281, 173.28663635253906, 157.29824829101562, 193.17886352539062, 13.904003143310547, 230.3776092529297, -158.39752197265625, 146.39610290527344, 160.14712524414062, 71.93169403076172, 243.7777862548828, 109.8555908203125, 126.77151489257812, 103.175048828125, 46.46526336669922, -28.5849609375, 231.2239990234375, 136.31907653808594, 35.205745697021484, -17.523706436157227, 163.83824157714844, 42.9996337890625, 3.1016006469726562, 21.861549377441406, 1.7311210632324219, 212.53280639648438, -125.13848876953125, 184.10707092285156, -53.03605651855469, 108.45938110351562, 42.571189880371094, 104.08320617675781, -144.02236938476562, 6.396221160888672, 109.60161590576172, -74.13418579101562, 123.7865219116211, 51.13880920410156, 133.19009399414062, -153.30197143554688, 188.48190307617188, 153.92294311523438, 23.054702758789062, 5.618232727050781, 57.28001403808594, -6.7860870361328125, 3.647279739379883, 143.19664001464844, 5.229747772216797, 4.8894500732421875, -47.84046173095703, 194.63644409179688, 147.36790466308594, -97.54655456542969, -16.710586547851562, 109.87488555908203, -158.848388671875, -12.423873901367188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000452.npy"}
{"epoch": 0.6832955404383976, "step": 453, "batch_size": 64, "mean": 59.804588317871094, "std": 97.74253845214844, "min": -150.69610595703125, "p10": -65.54375991821288, "median": 45.59867477416992, "p90": 187.44487609863282, "max": 272.26971435546875, "pos_frac": 0.671875, "sample": [129.08395385742188, 46.11595153808594, 19.892227172851562, -19.48596954345703, 199.01939392089844, 26.205780029296875, -20.107276916503906, 164.52944946289062, 161.44993591308594, -27.50189971923828, -120.70567321777344, -6.952796936035156, 83.52084350585938, 4.046440124511719, 165.91586303710938, 165.7852783203125, 272.26971435546875, 57.8740234375, 186.15267944335938, 79.52790832519531, -0.3668231964111328, -17.64337158203125, 41.28919982910156, 21.421371459960938, 7.676971435546875, -139.1314697265625, -150.69610595703125, 187.37265014648438, -3.8381423950195312, 172.78074645996094, 187.475830078125, 45.081398010253906, 100.27471923828125, -57.16583251953125, -5.920459747314453, 50.51258850097656, 163.83566284179688, 169.8423309326172, -89.62102508544922, 163.73406982421875, 72.97176361083984, 223.28770446777344, -82.11672973632812, -0.3631019592285156, -22.349472045898438, 35.87522888183594, -69.1343002319336, 22.17197036743164, 132.77401733398438, 69.76809692382812, -5.984737396240234, -26.71677017211914, 63.845497131347656, 96.97532653808594, 106.78132629394531, -82.939697265625, -1.2635917663574219, 75.3507080078125, 191.1765899658203, 41.54632568359375, 219.55612182617188, 146.40057373046875, 200.70953369140625, 5.6211700439453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000453.npy"}
{"epoch": 0.6848072562358276, "step": 454, "batch_size": 64, "mean": 68.40727233886719, "std": 94.24531555175781, "min": -223.02105712890625, "p10": -21.082277870178213, "median": 68.58141708374023, "p90": 183.9515350341797, "max": 297.00341796875, "pos_frac": 0.796875, "sample": [166.47146606445312, -33.27354431152344, -12.562698364257812, 190.70050048828125, -223.02105712890625, -12.98493766784668, 150.61790466308594, 71.48102569580078, 7.157524108886719, 23.43144989013672, 77.67437744140625, 180.35450744628906, 19.98040771484375, -191.20767211914062, 186.177734375, 120.52052307128906, 176.45167541503906, 75.81178283691406, 206.36709594726562, 37.76152038574219, -12.435356140136719, -100.86862182617188, 5.792932510375977, 163.20864868164062, 202.07244873046875, 297.00341796875, -2.422513961791992, 36.31388854980469, -29.953346252441406, 23.44072723388672, -24.552566528320312, 116.24156951904297, 149.9297637939453, 19.72991180419922, 115.8656234741211, 95.91503143310547, 202.36752319335938, -36.74142837524414, 185.4931182861328, 90.70365142822266, 106.8636474609375, 103.01612854003906, 2.9304733276367188, 0.6824407577514648, 9.112808227539062, 22.034263610839844, 176.03692626953125, 113.24647521972656, 1.6959095001220703, 34.616455078125, 124.63063049316406, 152.54879760742188, 6.133831024169922, 161.3886260986328, 139.43606567382812, 31.4853515625, 125.79063415527344, 62.987518310546875, -4.0992279052734375, 135.04356384277344, 99.5157241821289, 65.68180847167969, 3.270418167114258, -10.99847412109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000454.npy"}
{"epoch": 0.6863189720332578, "step": 455, "batch_size": 64, "mean": 64.12175750732422, "std": 91.70115661621094, "min": -137.00405883789062, "p10": -41.84453010559082, "median": 34.82641792297363, "p90": 196.24749450683598, "max": 252.40420532226562, "pos_frac": 0.765625, "sample": [76.18307495117188, 34.600250244140625, 73.04935455322266, 220.20901489257812, 92.08441925048828, 2.4719600677490234, 239.6197967529297, 147.2113494873047, -41.06989288330078, 27.40142059326172, 124.13302612304688, -137.00405883789062, 71.60330200195312, 123.37474060058594, 61.269927978515625, 252.40420532226562, -20.62578582763672, -0.3813629150390625, 6.264152526855469, -98.08438873291016, 212.57879638671875, 3.012033462524414, 159.05978393554688, -3.9648056030273438, -2.2296524047851562, 199.58740234375, 35.09574890136719, 121.41818237304688, 183.91122436523438, 52.24687957763672, 13.255054473876953, 21.79277229309082, 187.50454711914062, 19.71765899658203, 34.656402587890625, -5.341377258300781, -70.70753479003906, 120.19200897216797, 160.37973022460938, 169.03985595703125, -42.176517486572266, 23.49028778076172, 93.79473876953125, -7.29473876953125, 164.17947387695312, 221.12217712402344, 86.39119720458984, 58.272117614746094, 13.002944946289062, 0.5704345703125, 188.45437622070312, 22.773033142089844, -0.22116470336914062, 171.28091430664062, 24.707687377929688, 10.33282470703125, 33.38004684448242, -69.6372299194336, -44.65385437011719, 18.798057556152344, 63.090484619140625, 34.99643325805664, -74.15678405761719, 247.3765869140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000455.npy"}
{"epoch": 0.6878306878306878, "step": 456, "batch_size": 64, "mean": 56.78841018676758, "std": 95.958251953125, "min": -153.7754364013672, "p10": -28.30454292297362, "median": 35.7097110748291, "p90": 179.32245330810548, "max": 292.2891540527344, "pos_frac": 0.734375, "sample": [117.15467834472656, -2.5860366821289062, 2.187440872192383, 112.52411651611328, 292.2891540527344, 2.3877830505371094, -33.768001556396484, 78.654541015625, -7.126399993896484, 128.28890991210938, 45.88655090332031, 25.53287124633789, 245.18618774414062, 167.31634521484375, 16.950363159179688, 3.3245086669921875, 108.67730712890625, 81.1613998413086, -0.7254734039306641, 187.8724365234375, -121.50856018066406, 15.279434204101562, 57.896522521972656, -15.556472778320312, 161.8749237060547, -12.13554573059082, -153.7754364013672, -141.52552795410156, 148.85549926757812, -72.39627075195312, 10.030162811279297, 77.95664978027344, 6.223161697387695, 118.65226745605469, 9.933517456054688, 58.0479736328125, 59.39485168457031, -3.3707427978515625, 102.75729370117188, 6.725978851318359, -9.174688339233398, 49.05513381958008, 83.51700592041016, 102.72093200683594, -105.66377258300781, -117.16539764404297, 124.43084716796875, 1.482574462890625, -14.627555847167969, -7.852565765380859, 149.1458282470703, 181.96388244628906, 13.695816040039062, 231.9883575439453, 24.11064910888672, 147.76417541503906, 221.1947784423828, 24.843772888183594, 161.907470703125, 173.15911865234375, 8.454965591430664, -7.744873046875, 72.58390808105469, 240.1193389892578], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000456.npy"}
{"epoch": 0.6893424036281179, "step": 457, "batch_size": 64, "mean": 63.27531051635742, "std": 83.34642791748047, "min": -114.83326721191406, "p10": -11.059110832214355, "median": 53.272361755371094, "p90": 178.59868621826172, "max": 207.33981323242188, "pos_frac": 0.8125, "sample": [20.240943908691406, 4.1088104248046875, -5.180171966552734, 118.59539794921875, 65.76449584960938, -114.83326721191406, 23.60049819946289, 35.0811767578125, 45.48406219482422, 5.91900634765625, 16.698102951049805, 177.80899047851562, 162.20896911621094, 114.0421371459961, -6.586128234863281, -101.58792114257812, 178.5912628173828, 30.070266723632812, 0.1423664093017578, 164.1488037109375, 159.5013427734375, 60.26805114746094, 205.4232940673828, 57.71897888183594, 5.600650787353516, 65.3026123046875, -11.52824592590332, 173.51873779296875, 2.8887863159179688, -0.9690895080566406, -102.34761047363281, -77.90982818603516, 11.582115173339844, 167.83067321777344, 25.844074249267578, -42.44160461425781, -106.51399230957031, 48.82574462890625, 118.1994400024414, 21.4539794921875, 92.77796173095703, 7.7020721435546875, 180.4169464111328, 152.25125122070312, -0.06163787841796875, 11.976051330566406, 137.67994689941406, 72.17259216308594, 7.025276184082031, 181.01419067382812, 106.97135162353516, 114.9930648803711, 207.33981323242188, 125.08350372314453, 106.079833984375, 186.1097869873047, 190.34576416015625, 63.15705871582031, 20.841978073120117, 30.715923309326172, 178.60186767578125, -9.964462280273438, 95.60884094238281, 74.21495056152344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000457.npy"}
{"epoch": 0.690854119425548, "step": 458, "batch_size": 64, "mean": 64.72538757324219, "std": 99.8543930053711, "min": -193.69342041015625, "p10": -25.56526126861572, "median": 47.315269470214844, "p90": 192.97918548583988, "max": 255.3730010986328, "pos_frac": 0.796875, "sample": [155.88214111328125, 137.0367431640625, 4.682249069213867, 110.69442749023438, 66.80404663085938, 28.79043960571289, 130.437255859375, -193.69342041015625, -0.1113739013671875, 6.0996551513671875, -11.460845947265625, -62.41310119628906, -21.013931274414062, 224.2786102294922, 184.19357299804688, 46.54808807373047, 48.08245086669922, 181.00741577148438, 71.18791961669922, 78.14888000488281, 10.613462448120117, 33.32228088378906, 32.19158172607422, -25.518089294433594, 133.3330841064453, 26.684402465820312, 225.62132263183594, 64.79866790771484, 17.73271942138672, 179.4422607421875, 213.60626220703125, 78.27510070800781, 89.99049377441406, -105.2717514038086, 49.47859191894531, 0.018693923950195312, 3.6022720336914062, 17.982818603515625, 95.37879943847656, 160.54335021972656, 136.25192260742188, 45.31525802612305, -189.0679931640625, -5.448051452636719, 253.42279052734375, 1.0223426818847656, 155.6978759765625, -25.585477828979492, 181.41590881347656, 203.9377899169922, -17.358760833740234, 255.3730010986328, 109.38986206054688, 22.39324378967285, -104.08802032470703, 185.69178771972656, 41.161033630371094, 177.72943115234375, -116.43716430664062, 31.453460693359375, 9.566722869873047, 33.10230255126953, 196.10235595703125, 74.37608337402344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000458.npy"}
{"epoch": 0.6923658352229781, "step": 459, "batch_size": 64, "mean": 80.73894500732422, "std": 96.46329498291016, "min": -203.93539428710938, "p10": -15.686406326293941, "median": 80.48014068603516, "p90": 199.25577239990236, "max": 259.7022705078125, "pos_frac": 0.796875, "sample": [44.375633239746094, 121.47187805175781, -11.762687683105469, 143.90274047851562, 16.781219482421875, -203.93539428710938, 0.9506855010986328, 186.16171264648438, 132.63632202148438, 166.27639770507812, 205.77218627929688, -39.87723159790039, 110.17955017089844, -64.79790496826172, 185.50738525390625, 217.09043884277344, 76.74717712402344, 84.21310424804688, 9.515998840332031, 171.88417053222656, 146.22854614257812, 154.2582550048828, 100.8994140625, 23.953392028808594, 10.104635238647461, 178.96575927734375, 62.577919006347656, 160.2503662109375, 41.02141189575195, 141.14913940429688, 218.53176879882812, 41.955284118652344, 0.9220333099365234, 12.067869186401367, -10.486289978027344, 134.4382781982422, 70.41020965576172, -104.56373596191406, 171.9501953125, -17.368000030517578, -3.067567825317383, 118.76876831054688, 179.54348754882812, 2.8492889404296875, 259.7022705078125, -113.84707641601562, 84.52120208740234, 24.85169219970703, -51.280609130859375, 93.8134994506836, -8.780677795410156, -2.640106201171875, -4.058446884155273, 229.64990234375, 132.53428649902344, 59.69340515136719, 183.8740997314453, 59.462127685546875, 199.61932373046875, 196.93038940429688, 200.39247131347656, 14.767341613769531, 198.40748596191406, 21.226112365722656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000459.npy"}
{"epoch": 0.6938775510204082, "step": 460, "batch_size": 64, "mean": 77.29153442382812, "std": 96.43618774414062, "min": -119.0399169921875, "p10": -35.7505989074707, "median": 57.328460693359375, "p90": 192.90688018798832, "max": 300.48992919921875, "pos_frac": 0.78125, "sample": [220.4156036376953, 112.599365234375, 221.70895385742188, -44.65782165527344, 10.209465026855469, 180.6951904296875, 226.07408142089844, 56.1431884765625, 291.74658203125, 6.049243927001953, -46.05143737792969, -38.32438659667969, -3.4277420043945312, 22.16191864013672, 3.271209716796875, 9.099128723144531, 13.700422286987305, 181.01129150390625, -55.262908935546875, 173.89974975585938, 74.61959838867188, 182.73480224609375, 34.240638732910156, 196.13589477539062, 174.21377563476562, 85.18375396728516, 3.2426719665527344, 58.51373291015625, 182.8900909423828, 77.3828353881836, 137.4066162109375, 174.8867950439453, 154.74681091308594, 145.295166015625, 16.517122268676758, 62.9366455078125, 2.065776824951172, 19.040115356445312, 29.310455322265625, -0.6017475128173828, 87.6250991821289, 183.77886962890625, -35.227622985839844, 13.633996963500977, 149.27061462402344, 185.3725128173828, -35.9747314453125, 16.587539672851562, 167.11257934570312, -74.99127197265625, 2.800516128540039, 41.206085205078125, 180.77735900878906, 66.07327270507812, -14.972549438476562, 196.83676147460938, 45.760108947753906, 300.48992919921875, -119.0399169921875, -33.86598205566406, 104.07785034179688, -6.1336669921875, -6.558052062988281, 180.19619750976562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000460.npy"}
{"epoch": 0.6953892668178382, "step": 461, "batch_size": 64, "mean": 79.58551025390625, "std": 95.46094512939453, "min": -117.61900329589844, "p10": -44.97841873168945, "median": 70.96077728271484, "p90": 201.49764251708987, "max": 244.75827026367188, "pos_frac": 0.765625, "sample": [203.86172485351562, 71.57697296142578, 44.98551940917969, 31.67401123046875, 166.16015625, 203.2430877685547, 70.3445816040039, 164.42233276367188, 153.47000122070312, -109.36537170410156, 0.4685821533203125, 89.40478515625, 61.551788330078125, 3.5659046173095703, 128.74020385742188, -15.334060668945312, -42.77009582519531, 187.68011474609375, 1.1011810302734375, 4.292089462280273, 134.67007446289062, 162.9776611328125, 164.66302490234375, 2.237213134765625, 50.157020568847656, 130.5844268798828, -50.56555938720703, -7.9402008056640625, -36.38959503173828, 127.22589874267578, 183.4166259765625, -45.924842834472656, -14.559457778930664, 77.03253173828125, 168.10215759277344, 197.69638061523438, 244.75827026367188, 189.28744506835938, 62.158233642578125, 229.26571655273438, 207.10655212402344, 174.2076416015625, 103.66708374023438, 44.114707946777344, 1.8028240203857422, -61.83638000488281, 107.73091125488281, -0.4548931121826172, 165.59913635253906, 20.059242248535156, 227.7040557861328, 99.6104965209961, 68.0670394897461, 26.95880126953125, 194.15402221679688, -47.19688415527344, 203.1267547607422, -117.61900329589844, 59.733192443847656, -17.88983917236328, 174.208984375, -38.850563049316406, 179.38084411621094, -67.83848571777344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000461.npy"}
{"epoch": 0.6969009826152683, "step": 462, "batch_size": 64, "mean": 81.2914047241211, "std": 93.81979370117188, "min": -184.3262939453125, "p10": -21.97222747802734, "median": 88.23150634765625, "p90": 189.28539276123047, "max": 239.99191284179688, "pos_frac": 0.796875, "sample": [157.90843200683594, 198.74563598632812, 192.68231201171875, 186.6255340576172, 7.9747467041015625, 20.248336791992188, 126.75936126708984, 2.5221595764160156, -5.306648254394531, 3.4320907592773438, -36.62744903564453, 175.6537322998047, 169.09579467773438, 39.999778747558594, 183.87486267089844, 193.88052368164062, -24.552764892578125, 159.8052520751953, 20.06615447998047, 105.43758392333984, 162.17698669433594, 53.19758605957031, 56.43506622314453, 158.16970825195312, 64.65115356445312, 9.437347412109375, 166.2471923828125, 196.8721466064453, -12.595340728759766, 155.05557250976562, -33.478485107421875, -15.950973510742188, -8.148246765136719, 176.24630737304688, 189.96908569335938, -53.856109619140625, -28.359403610229492, 111.35565185546875, 46.078590393066406, 140.2021484375, 187.21273803710938, 196.41493225097656, 90.67559051513672, 31.716461181640625, 36.93443298339844, 187.6901092529297, -1.450653076171875, 23.103763580322266, 163.98651123046875, 167.86343383789062, 92.93901062011719, -164.26194763183594, 165.54510498046875, 178.37405395507812, 159.66738891601562, 8.218299865722656, 16.436601638793945, 239.99191284179688, 88.4333267211914, 88.0296859741211, -184.3262939453125, 8.257591247558594, -4.960916519165039, 14.227313995361328], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000462.npy"}
{"epoch": 0.6984126984126984, "step": 463, "batch_size": 64, "mean": 50.386138916015625, "std": 88.78050994873047, "min": -188.78817749023438, "p10": -44.85383911132813, "median": 45.89540100097656, "p90": 179.0045349121094, "max": 214.02574157714844, "pos_frac": 0.71875, "sample": [-11.915966033935547, 165.30545043945312, 102.30686950683594, -188.78817749023438, 104.60604858398438, 147.45870971679688, -25.90130043029785, 181.06539916992188, 188.67984008789062, 128.22076416015625, -54.69252395629883, -41.66065979003906, -7.789222717285156, 54.33307647705078, 130.72496032714844, -102.22817993164062, 48.535377502441406, 68.88711547851562, 23.259170532226562, -72.4562759399414, 72.74952697753906, 29.338279724121094, 11.46624755859375, 186.94082641601562, 79.8507308959961, -44.58258056640625, 21.30345916748047, 18.404945373535156, 1.450836181640625, 14.326549530029297, 31.681594848632812, -4.903022766113281, 180.02999877929688, 175.9514923095703, 54.73410415649414, 115.84798431396484, 132.11361694335938, 38.35918426513672, 2.0325393676757812, 214.02574157714844, 50.94887924194336, 8.029365539550781, 65.18984985351562, -120.68348693847656, -34.179439544677734, -44.9700927734375, 195.253662109375, 134.06869506835938, 164.90184020996094, 111.38346862792969, 25.957866668701172, -135.94866943359375, -5.6237335205078125, -41.570716857910156, 43.68345642089844, 183.4297332763672, 57.721500396728516, -8.873443603515625, 80.95476531982422, 101.87395477294922, -9.549293518066406, 48.10734558105469, 8.923063278198242, 176.61178588867188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000463.npy"}
{"epoch": 0.6999244142101285, "step": 464, "batch_size": 64, "mean": 74.2612533569336, "std": 104.40400695800781, "min": -183.44320678710938, "p10": -38.969729232788076, "median": 74.93519592285156, "p90": 204.1142318725586, "max": 228.9141387939453, "pos_frac": 0.734375, "sample": [-26.489242553710938, 27.641830444335938, 179.52777099609375, -24.78887939453125, 177.73387145996094, -2.320220947265625, 4.199562072753906, 174.95272827148438, 88.76162719726562, 149.20355224609375, 186.28382873535156, -27.40312957763672, 171.76718139648438, 116.16606903076172, 94.78805541992188, -81.89859008789062, 210.6663818359375, 157.85714721679688, 23.684036254882812, 170.8614044189453, 5.862457275390625, 228.9141387939453, -3.8379688262939453, 120.49942779541016, 20.215316772460938, 203.60870361328125, 47.346412658691406, 211.45669555664062, 120.26486206054688, 137.73777770996094, 79.16893005371094, 10.049552917480469, 180.83038330078125, 220.17027282714844, -183.44320678710938, -43.30998229980469, 185.65029907226562, 50.44486999511719, -138.9215545654297, -86.34707641601562, 159.59129333496094, 10.136207580566406, 37.366024017333984, 221.34335327148438, -75.54528045654297, -5.093879699707031, 15.215621948242188, 70.70146179199219, -15.768280029296875, 194.0690155029297, 131.131591796875, 204.95750427246094, -167.966552734375, -28.842472076416016, 54.94237518310547, -18.893199920654297, 10.006965637207031, 183.6083984375, 132.0641326904297, 204.3308868408203, -19.291175842285156, 142.2310333251953, 155.51089477539062, 19.358795166015625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000464.npy"}
{"epoch": 0.7014361300075586, "step": 465, "batch_size": 64, "mean": 63.873931884765625, "std": 99.93558502197266, "min": -139.40817260742188, "p10": -69.16403541564941, "median": 69.57762145996094, "p90": 187.49158325195313, "max": 280.1383972167969, "pos_frac": 0.6875, "sample": [-27.72327423095703, 280.1383972167969, 167.1048583984375, -113.51727294921875, 152.44732666015625, 146.24252319335938, -30.870376586914062, 188.84364318847656, 70.7342529296875, 63.65576934814453, 68.42098999023438, 182.29498291015625, 129.54684448242188, 209.8184814453125, -44.318336486816406, 160.2030029296875, 8.815956115722656, 183.26223754882812, 116.07357788085938, -5.969810485839844, -82.91043090820312, -113.60957336425781, 189.82191467285156, 204.4243621826172, 95.40824890136719, 194.08038330078125, -71.41744232177734, -109.72071838378906, -23.994728088378906, 90.9008560180664, 0.9972763061523438, -28.641746520996094, 63.4295654296875, 88.88016510009766, 151.04159545898438, 26.08135986328125, -61.639312744140625, -3.2151336669921875, 5.406614303588867, 172.55137634277344, -85.21644592285156, -8.116096496582031, 82.89657592773438, 36.69031524658203, -11.312873840332031, -139.40817260742188, -63.90608596801758, 91.86434936523438, 32.487152099609375, 184.33677673339844, 7.427299499511719, 213.70947265625, 19.666297912597656, 100.85592651367188, 31.640296936035156, -26.073156356811523, 82.52045440673828, -14.108282089233398, 121.85935974121094, 87.74024200439453, 172.894775390625, 142.4889373779297, 160.74505615234375, 173.17091369628906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000465.npy"}
{"epoch": 0.7029478458049887, "step": 466, "batch_size": 64, "mean": 37.982765197753906, "std": 91.31265258789062, "min": -238.914794921875, "p10": -60.366606140136696, "median": 17.18064594268799, "p90": 172.19578552246094, "max": 234.4334716796875, "pos_frac": 0.703125, "sample": [1.8676376342773438, 160.7148895263672, -33.081939697265625, -110.94744873046875, 15.937707901000977, 2.682668685913086, 18.423583984375, -11.486434936523438, 23.918724060058594, -238.914794921875, 72.86354064941406, 68.67721557617188, 182.1497802734375, -115.8304672241211, 84.97040557861328, 4.24774169921875, 94.59339904785156, 218.20339965820312, 33.56343078613281, 201.459228515625, 65.85511779785156, 7.1603851318359375, 169.96929931640625, -0.09100723266601562, -8.017059326171875, 0.8660049438476562, 126.20121765136719, -69.29481506347656, 234.4334716796875, 32.3643798828125, -3.9459266662597656, 109.18223571777344, -15.336517333984375, 21.999130249023438, 162.31988525390625, 49.99726486206055, 129.59109497070312, 55.65357208251953, -22.578968048095703, 15.910758972167969, -19.15595245361328, -148.1229248046875, -73.55220794677734, -77.26924133300781, 142.41519165039062, 24.3648681640625, 13.420608520507812, 121.87386322021484, 218.38278198242188, -1.2930717468261719, -39.53411865234375, 60.437225341796875, -27.159812927246094, 39.16291809082031, -0.8708724975585938, 1.9538898468017578, 11.376035690307617, 11.198713302612305, 23.465360641479492, 173.14999389648438, 192.14474487304688, 1.8925514221191406, 8.857582092285156, 37.506980895996094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000466.npy"}
{"epoch": 0.7044595616024187, "step": 467, "batch_size": 64, "mean": 56.93370819091797, "std": 88.9612808227539, "min": -149.64910888671875, "p10": -33.73662796020508, "median": 28.803146362304688, "p90": 178.43229675292974, "max": 353.8275451660156, "pos_frac": 0.828125, "sample": [1.5778045654296875, 95.69560241699219, 28.87591552734375, 45.69609069824219, 28.730377197265625, 78.73348999023438, 1.8656845092773438, -15.517292022705078, -26.200878143310547, 74.1258773803711, 117.36651611328125, -60.824832916259766, 184.44607543945312, -46.568634033203125, 77.91160583496094, 116.03641510009766, 87.57528686523438, 29.797523498535156, -132.59329223632812, 10.992630004882812, 152.5958251953125, -62.75225830078125, -29.248573303222656, 127.85370635986328, 205.56089782714844, 21.021522521972656, 20.89276885986328, 105.6383056640625, -149.64910888671875, 158.64816284179688, -4.385871887207031, 42.720489501953125, 8.726812362670898, 9.69464111328125, 1.4021759033203125, 22.721603393554688, 94.708984375, 205.31619262695312, 102.59718322753906, 85.7664794921875, 0.603485107421875, 0.36077308654785156, 6.731193542480469, 27.886476516723633, 5.208402633666992, -35.66007995605469, 161.72311401367188, 5.0136260986328125, 154.2884979248047, 127.12434387207031, 353.8275451660156, 11.17193603515625, 6.043220520019531, 8.794683456420898, 95.84345245361328, 107.5116958618164, 29.685142517089844, 188.99569702148438, 9.753326416015625, 222.1427764892578, -45.17234802246094, 203.25022888183594, 164.400146484375, 16.678077697753906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000467.npy"}
{"epoch": 0.7059712773998488, "step": 468, "batch_size": 64, "mean": 78.296630859375, "std": 100.26899719238281, "min": -179.51507568359375, "p10": -35.45397262573241, "median": 75.31394577026367, "p90": 204.28363952636718, "max": 230.41995239257812, "pos_frac": 0.828125, "sample": [2.3192920684814453, 35.55548095703125, 127.55654907226562, 228.1230926513672, 27.00194549560547, -39.454444885253906, 3.394582748413086, 142.52484130859375, 189.4873046875, 71.52302551269531, 111.39082336425781, 212.270751953125, 206.0216827392578, -171.522705078125, 21.218978881835938, 23.85650634765625, 51.70098114013672, -26.119537353515625, -179.51507568359375, 122.51586151123047, 176.25103759765625, 114.5556640625, 170.58554077148438, 203.67526245117188, 75.40081024169922, 159.61016845703125, 180.62368774414062, 0.5859470367431641, -10.083061218261719, -130.05484008789062, 204.54437255859375, -127.46755981445312, 75.22708129882812, -1.9600181579589844, 23.37236785888672, -53.99506378173828, 138.46392822265625, 158.02969360351562, -41.33270263671875, 174.4597930908203, 56.909263610839844, 52.080810546875, 188.17807006835938, 205.66748046875, 163.56536865234375, 1.5360374450683594, 23.27669334411621, 43.73713684082031, 7.543708801269531, 1.2399349212646484, 183.67770385742188, 32.23432159423828, 117.54104614257812, 35.543060302734375, 230.41995239257812, 108.97442626953125, -24.83056640625, 209.0525665283203, 175.03427124023438, 183.2613067626953, 178.06668090820312, 86.91934967041016, 81.34884643554688, 19.665199279785156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000468.npy"}
{"epoch": 0.7074829931972789, "step": 469, "batch_size": 64, "mean": 73.14139556884766, "std": 82.78005981445312, "min": -123.77951049804688, "p10": -13.819326972961417, "median": 59.075687408447266, "p90": 175.99893035888672, "max": 260.5042724609375, "pos_frac": 0.796875, "sample": [20.527389526367188, 175.59104919433594, 101.30540466308594, 153.61819458007812, 14.957962036132812, 4.761760711669922, 83.84754943847656, 153.8536376953125, 89.11897277832031, -123.77951049804688, 168.2923583984375, 89.978759765625, 147.22947692871094, 2.9804458618164062, -5.440071105957031, 174.5978240966797, 125.43897247314453, 62.23346710205078, 144.63385009765625, 177.929931640625, 190.89437866210938, 8.644287109375, 138.1722412109375, -2.175689697265625, 117.12551879882812, -82.3145980834961, 39.72228240966797, 11.093963623046875, 5.943037033081055, 49.32777404785156, -4.764884948730469, 0.8317165374755859, 76.64523315429688, 198.30197143554688, 175.37197875976562, 3.0840301513671875, -0.4679107666015625, -18.140792846679688, 173.9331817626953, 149.41920471191406, 176.17373657226562, 91.2007827758789, 40.50791549682617, 130.79766845703125, 16.10265350341797, -2.4686279296875, 55.91790771484375, 1.0986404418945312, 168.7146759033203, 20.577438354492188, -36.703887939453125, -21.945995330810547, 220.58465576171875, -3.7729949951171875, 260.5042724609375, 0.8844985961914062, 116.14309692382812, 37.81310272216797, 132.80921936035156, -17.410436630249023, 135.216796875, 11.191160202026367, -21.708984375, 176.49761962890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000469.npy"}
{"epoch": 0.708994708994709, "step": 470, "batch_size": 64, "mean": 68.60761260986328, "std": 95.48844146728516, "min": -196.2356414794922, "p10": -39.83362617492674, "median": 70.85810089111328, "p90": 194.42930297851564, "max": 221.89584350585938, "pos_frac": 0.765625, "sample": [207.3431396484375, 81.32988739013672, 221.89584350585938, -196.2356414794922, 66.78224182128906, 173.12197875976562, 27.084548950195312, 46.98749542236328, 175.79873657226562, 86.48992919921875, 188.0985107421875, -72.42747497558594, 194.216064453125, 9.323387145996094, 149.2509765625, 33.830955505371094, 88.86686706542969, -94.53534698486328, -17.247711181640625, 153.7159881591797, -25.98740005493164, 180.4251708984375, 5.114534378051758, -104.50967407226562, 196.94216918945312, 34.85084533691406, 174.63082885742188, 129.54673767089844, 2.1910934448242188, -4.534128189086914, 104.52395629882812, 3.9719161987304688, 138.69412231445312, 30.012733459472656, -67.87803649902344, 10.213584899902344, 97.84251403808594, 1.6144771575927734, 74.9339599609375, -25.81047821044922, 220.8838348388672, 187.4536895751953, 199.1856689453125, 28.762107849121094, -24.69019317626953, -21.715194702148438, 20.704912185668945, 179.34202575683594, 102.43450927734375, 138.53848266601562, 101.939208984375, 196.98504638671875, -45.767723083496094, 150.9602508544922, 21.986892700195312, 159.3460693359375, 98.80864715576172, 81.62518310546875, -8.774124145507812, -60.694068908691406, 194.52069091796875, 0.16193199157714844, -13.39468765258789, 1.8046112060546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000470.npy"}
{"epoch": 0.7105064247921391, "step": 471, "batch_size": 64, "mean": 78.38136291503906, "std": 95.31229400634766, "min": -180.6544952392578, "p10": -28.457382202148423, "median": 76.51891326904297, "p90": 190.61995239257814, "max": 256.8545227050781, "pos_frac": 0.8125, "sample": [4.5388031005859375, 223.8118896484375, 145.7903594970703, 159.24969482421875, 74.80078125, -33.871307373046875, 176.10470581054688, 51.069580078125, 183.6614227294922, 211.85446166992188, 75.51790618896484, 49.98457336425781, 185.11709594726562, 19.485458374023438, -146.72705078125, -2.235372543334961, 77.5199203491211, 192.20294189453125, 181.04908752441406, -87.67904663085938, 36.970882415771484, 41.49728775024414, 150.92977905273438, 106.395263671875, 167.962646484375, -180.6544952392578, 80.72664642333984, 108.1917724609375, 182.3125, 14.022415161132812, 143.62734985351562, 125.91642761230469, 28.212154388427734, 35.41810607910156, 87.80413818359375, 85.11392974853516, 187.27423095703125, -51.67460632324219, 49.89849090576172, 179.25930786132812, 15.863960266113281, 21.212799072265625, 184.4311981201172, 206.44482421875, -7.161436080932617, 7.7255401611328125, 16.653518676757812, 55.90117263793945, -37.49006652832031, 153.13731384277344, -15.82489013671875, 192.0538330078125, 91.10990142822266, 16.115882873535156, 180.00942993164062, -0.051815032958984375, 19.545059204101562, 101.74452209472656, -115.93394470214844, 256.8545227050781, 24.4791259765625, 116.01559448242188, -4.842018127441406, 217.96351623535156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000471.npy"}
{"epoch": 0.7120181405895691, "step": 472, "batch_size": 64, "mean": 60.669456481933594, "std": 110.44590759277344, "min": -207.4546356201172, "p10": -75.8653984069824, "median": 55.39923858642578, "p90": 189.0542007446289, "max": 349.4891357421875, "pos_frac": 0.71875, "sample": [1.5619735717773438, 41.73621368408203, 87.51457977294922, -30.282485961914062, 189.34759521484375, 169.65760803222656, -26.39776611328125, 137.28355407714844, 208.9285888671875, 112.2218017578125, 84.23088073730469, 227.89852905273438, 11.26722526550293, -43.131805419921875, -6.809301376342773, 184.880615234375, -0.7223453521728516, 85.23766326904297, 182.4552001953125, 33.74336242675781, -110.01758575439453, 33.087371826171875, -29.320865631103516, 180.04522705078125, 11.516845703125, 176.15447998046875, 33.23933410644531, 193.89654541015625, 185.3388671875, -7.856105804443359, 19.580230712890625, 85.48532104492188, 87.97296905517578, -146.1453399658203, -36.62565612792969, 61.24822998046875, 219.84124755859375, 349.4891357421875, -6.354713439941406, -3.7324752807617188, 183.21221923828125, 188.36961364746094, -207.4546356201172, 192.59378051757812, -161.92970275878906, -89.89408111572266, 176.8868408203125, 16.30949592590332, -153.89190673828125, 94.66643524169922, 8.204032897949219, 61.375579833984375, -9.76324462890625, 50.72582244873047, 19.649429321289062, 60.072654724121094, 83.9095458984375, 28.493560791015625, 186.32095336914062, 40.36058807373047, 87.62545776367188, 66.83660125732422, -153.044189453125, 165.7454833984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000472.npy"}
{"epoch": 0.7135298563869993, "step": 473, "batch_size": 64, "mean": 66.60111999511719, "std": 114.08341217041016, "min": -171.5784149169922, "p10": -82.26561431884765, "median": 40.155906677246094, "p90": 222.48871459960938, "max": 295.2044677734375, "pos_frac": 0.734375, "sample": [238.24884033203125, 199.39910888671875, 29.036893844604492, -143.2164306640625, -120.91618347167969, 158.15492248535156, 43.39796447753906, 15.239410400390625, 27.78013038635254, 202.89962768554688, 176.05487060546875, 33.280487060546875, -3.767953872680664, 98.07762145996094, 1.2796249389648438, -58.315650939941406, 241.5255584716797, 222.68093872070312, 13.630561828613281, 196.65045166015625, -119.32278442382812, 28.577293395996094, 124.43695068359375, -83.95796203613281, -66.98544311523438, 169.43064880371094, 156.0464630126953, 179.84152221679688, 9.23906135559082, -16.64122772216797, -14.123199462890625, 4.949348449707031, -167.96798706054688, -32.305274963378906, -21.406003952026367, 95.08682250976562, 295.2044677734375, 232.68893432617188, 21.189189910888672, 13.310846328735352, 242.41522216796875, 172.25502014160156, 234.53228759765625, 96.52326965332031, 91.49981689453125, 76.72247314453125, 72.68640899658203, 5.14312744140625, 36.913848876953125, 173.2429962158203, 130.6629180908203, -2.6600189208984375, 115.67520141601562, 120.70415496826172, -12.976966857910156, 128.4199676513672, 222.04019165039062, -113.05532836914062, 12.725326538085938, -78.31680297851562, 19.102981567382812, 119.81402587890625, 191.56732177734375, -171.5784149169922], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000473.npy"}
{"epoch": 0.7150415721844293, "step": 474, "batch_size": 64, "mean": 62.0037727355957, "std": 108.03253936767578, "min": -181.117919921875, "p10": -53.27747459411621, "median": 39.131540298461914, "p90": 202.64934692382815, "max": 273.92510986328125, "pos_frac": 0.671875, "sample": [195.82870483398438, 175.80776977539062, 145.527587890625, -73.05886840820312, -8.29147720336914, 10.16598129272461, 230.1630859375, 209.75588989257812, 83.66120910644531, 58.62359619140625, 124.2042236328125, 69.67955017089844, -181.117919921875, 133.1980743408203, 73.72169494628906, 188.1338653564453, -98.73748016357422, 0.7303657531738281, -39.011268615722656, 32.2232666015625, -56.31050109863281, -7.784523010253906, 160.16078186035156, 39.53361511230469, 161.4540252685547, 40.31494140625, -0.37645912170410156, -33.01396179199219, -10.756515502929688, -2.4736404418945312, 162.26541137695312, 38.72946548461914, -162.38479614257812, -0.536468505859375, -19.047733306884766, 184.7224578857422, 20.78467559814453, 211.30947875976562, 205.57247924804688, 3.5759658813476562, 23.09333610534668, 45.110809326171875, 237.21942138671875, 159.0212860107422, -46.20041275024414, 273.92510986328125, 179.641357421875, 75.37443542480469, 130.28982543945312, -0.6665592193603516, 179.75120544433594, 265.9932861328125, -43.92529296875, 35.50641632080078, 23.293190002441406, -22.04965591430664, 183.18576049804688, -140.77694702148438, -7.128253936767578, 105.19062805175781, 154.57183837890625, -134.4146728515625, 11.86407470703125, 13.424602508544922], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000474.npy"}
{"epoch": 0.7165532879818595, "step": 475, "batch_size": 64, "mean": 88.88224792480469, "std": 99.47547912597656, "min": -82.82131958007812, "p10": -18.587537002563472, "median": 55.660661697387695, "p90": 198.3091033935547, "max": 286.23004150390625, "pos_frac": 0.78125, "sample": [-14.230257034301758, 4.997730255126953, -9.890586853027344, -3.8659820556640625, 172.20091247558594, 182.440673828125, 181.55567932128906, 199.4814453125, -12.214426040649414, -35.876319885253906, 174.87496948242188, -82.82131958007812, 176.09487915039062, 274.1787414550781, 14.591257095336914, -13.112686157226562, 164.57980346679688, 195.09715270996094, 8.225406646728516, 120.93494415283203, 41.636878967285156, 12.497756958007812, -20.45494270324707, 14.397918701171875, 160.399658203125, 7.033329010009766, 24.620811462402344, 171.77247619628906, 175.64694213867188, -7.461723327636719, 202.07644653320312, 188.29428100585938, 191.8899383544922, 195.57363891601562, 59.134727478027344, 2.144023895263672, 163.6770477294922, 10.008705139160156, 130.45787048339844, 31.071025848388672, 259.4351806640625, 176.17599487304688, 14.819324493408203, 25.425750732421875, 159.31842041015625, -55.29865646362305, 31.395153045654297, 98.00642395019531, 286.23004150390625, 29.33521270751953, 181.69261169433594, 232.43978881835938, 238.27407836914062, 105.35289764404297, 174.42117309570312, -65.12078094482422, -62.86078643798828, 180.83981323242188, 45.737091064453125, 183.68295288085938, 1.5289497375488281, -4.255516052246094, -51.95631408691406, 52.18659591674805], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000475.npy"}
{"epoch": 0.7180650037792895, "step": 476, "batch_size": 64, "mean": 59.983612060546875, "std": 100.30513763427734, "min": -138.19207763671875, "p10": -46.660711669921874, "median": 52.41046142578125, "p90": 179.13789672851564, "max": 286.5003356933594, "pos_frac": 0.671875, "sample": [206.9154052734375, 70.39570617675781, -11.334722518920898, 80.01568603515625, -46.42826843261719, 40.35662841796875, -43.831825256347656, 180.515380859375, 286.5003356933594, 57.713897705078125, 231.43450927734375, -2.079944610595703, 173.5997314453125, 162.5793914794922, 135.18650817871094, 124.8545150756836, 126.01126098632812, -1.0634078979492188, 58.18163299560547, 44.738677978515625, -129.83029174804688, 2.425189971923828, 130.05059814453125, 171.7576141357422, 40.67823791503906, -125.18174743652344, 33.97400665283203, -26.089248657226562, -39.818695068359375, 4.996623992919922, 201.22007751464844, -26.635513305664062, 115.71659851074219, 10.18777847290039, 68.10106658935547, -66.57093811035156, -124.87843322753906, -138.19207763671875, 47.107025146484375, 9.678031921386719, -21.756210327148438, 161.52891540527344, 16.333541870117188, -31.993438720703125, 125.95890808105469, 64.04533386230469, 166.4381561279297, -13.257347106933594, -25.968170166015625, 153.5528564453125, 124.20777893066406, -90.48049926757812, 199.34326171875, -25.5926513671875, 104.85853576660156, 230.70065307617188, 3.948366165161133, 66.67078399658203, 175.92376708984375, 156.28787231445312, -21.024919509887695, 168.1396484375, 164.88916015625, -46.76033020019531], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000476.npy"}
{"epoch": 0.7195767195767195, "step": 477, "batch_size": 64, "mean": 58.8123779296875, "std": 120.48551940917969, "min": -220.86819458007812, "p10": -128.44216156005857, "median": 53.1734733581543, "p90": 204.04732818603517, "max": 280.37115478515625, "pos_frac": 0.765625, "sample": [-139.7643280029297, -102.02377319335938, 149.37176513671875, 61.21508026123047, -1.904693603515625, 186.21749877929688, 7.112068176269531, 23.565561294555664, 51.070953369140625, 3.1393280029296875, 202.39013671875, 56.997650146484375, -150.10826110839844, -4.876914978027344, 23.133272171020508, 7.095180511474609, -18.288955688476562, 251.25289916992188, 150.4187469482422, -82.85079956054688, 221.2239227294922, 124.35530090332031, 5.382293701171875, 55.27599334716797, 91.81573486328125, -21.21715545654297, -152.17703247070312, 1.2005233764648438, 191.99163818359375, 5.7588043212890625, -185.9712677001953, 12.629631042480469, 60.213417053222656, 34.11147689819336, -194.89697265625, -35.21076583862305, -45.33049011230469, 43.586585998535156, 136.7371368408203, 40.26115417480469, 209.2737274169922, 4.1073455810546875, 204.75755310058594, 192.36354064941406, 80.01664733886719, 208.876953125, -190.58413696289062, 141.3704833984375, 124.50992584228516, 0.695404052734375, 280.37115478515625, 149.04696655273438, 9.36601448059082, 186.38735961914062, 209.9727020263672, 71.31727600097656, -220.86819458007812, 194.87811279296875, 190.57061767578125, 193.79531860351562, 100.26990509033203, 174.53912353515625, 26.2978515625, 159.75799560546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000477.npy"}
{"epoch": 0.7210884353741497, "step": 478, "batch_size": 64, "mean": 50.677398681640625, "std": 103.4160385131836, "min": -165.34530639648438, "p10": -92.16498641967773, "median": 37.30876922607422, "p90": 185.3426971435547, "max": 229.49436950683594, "pos_frac": 0.6875, "sample": [-5.085241317749023, 174.8362579345703, 85.97543334960938, 156.97958374023438, 139.3545684814453, -3.009918212890625, 191.05477905273438, 83.54238891601562, -22.41533851623535, 103.70704650878906, 3.2765884399414062, 12.766876220703125, 88.25482177734375, 209.64627075195312, -39.95496368408203, 17.68951416015625, -100.99351501464844, 38.463958740234375, -16.03418731689453, 73.28863525390625, -51.714195251464844, -35.27569580078125, 185.90643310546875, 143.0170135498047, 176.12875366210938, -88.5583724975586, 153.939453125, 145.62655639648438, 87.66783142089844, -165.1474609375, 61.62998962402344, 7.464588165283203, -149.44064331054688, -157.33816528320312, 16.110309600830078, 131.89691162109375, 179.86837768554688, 27.789703369140625, 177.84970092773438, 28.858070373535156, 190.18844604492188, -5.464242935180664, -131.61489868164062, 44.05824279785156, 184.02731323242188, 54.76507568359375, -3.9384307861328125, 229.49436950683594, 30.917633056640625, -1.6445121765136719, 209.8047332763672, 81.8311538696289, -47.461883544921875, 139.86630249023438, 2.0895843505859375, 202.47970581054688, 141.9056854248047, -41.179481506347656, 4.515205383300781, 36.15357971191406, -93.71067810058594, -165.34530639648438, 100.47028350830078, 13.522966384887695], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000478.npy"}
{"epoch": 0.7226001511715797, "step": 479, "batch_size": 64, "mean": 46.35863494873047, "std": 98.26921081542969, "min": -221.67648315429688, "p10": -79.35188446044921, "median": 35.7416877746582, "p90": 171.81891479492188, "max": 288.62615966796875, "pos_frac": 0.671875, "sample": [6.496669769287109, 23.768115997314453, 135.287841796875, -143.8801727294922, -81.9720458984375, 81.1265869140625, 213.4912109375, 94.29594421386719, 119.85060119628906, -82.7000961303711, 55.2789192199707, -53.45521926879883, 15.114227294921875, -13.136350631713867, -14.558830261230469, 31.99414825439453, 220.5815887451172, 195.81500244140625, -21.989423751831055, 1.7810020446777344, -43.566165924072266, 51.50417709350586, 288.62615966796875, 194.44509887695312, 28.254907608032227, -123.89480590820312, 65.43217468261719, 39.489227294921875, -108.00641632080078, -23.971946716308594, 0.0571441650390625, 122.9719009399414, -3.50286865234375, 156.5184783935547, -8.299468994140625, 60.749691009521484, -131.63494873046875, -1.718109130859375, 17.338104248046875, 147.41944885253906, 143.6460723876953, -221.67648315429688, -25.915122985839844, 63.79319763183594, 167.97634887695312, 173.46572875976562, 85.2420883178711, -24.998779296875, -1.7591171264648438, 94.64896392822266, 41.390106201171875, 117.84272003173828, -73.23817443847656, 176.7860107421875, -7.881837844848633, 124.11386108398438, 119.45111846923828, 19.111223220825195, 56.176231384277344, 140.85250854492188, 3.077810287475586, 31.379302978515625, 98.11172485351562, 153.95555114746094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000479.npy"}
{"epoch": 0.7241118669690099, "step": 480, "batch_size": 64, "mean": 70.82891082763672, "std": 102.57160186767578, "min": -200.1168212890625, "p10": -45.98361892700195, "median": 73.94646835327148, "p90": 179.51672973632813, "max": 329.9810485839844, "pos_frac": 0.765625, "sample": [80.51566314697266, -5.36492919921875, 177.68624877929688, -161.7890167236328, 135.56028747558594, 20.48724937438965, 162.845458984375, -200.1168212890625, 36.23548889160156, 329.9810485839844, -44.43956756591797, 97.32327270507812, 178.45745849609375, -25.80352020263672, 98.90402221679688, -18.044235229492188, -46.645355224609375, 187.92344665527344, -38.248077392578125, 12.492542266845703, -115.2235107421875, 162.83071899414062, 16.471351623535156, -112.40087127685547, 151.1064453125, 157.36053466796875, 275.68511962890625, 88.82257080078125, 176.4580841064453, 67.37727355957031, 104.83324432373047, 2.1625328063964844, 23.47357940673828, 139.02598571777344, 42.399017333984375, 98.1317138671875, 204.67787170410156, 133.39605712890625, -117.03028106689453, 105.018798828125, 168.61700439453125, 93.1662826538086, 56.809898376464844, 166.23773193359375, 52.89057922363281, 156.26600646972656, 5.423370361328125, 24.94073486328125, 51.36152267456055, 44.23837661743164, -6.418172836303711, -6.0947113037109375, 179.970703125, 87.89369201660156, 160.66371154785156, 121.91572570800781, 189.35804748535156, 168.70787048339844, 4.514537811279297, -53.800758361816406, 53.74513244628906, 206.52102661132812, -3.702880859375, 27.2879638671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000480.npy"}
{"epoch": 0.7256235827664399, "step": 481, "batch_size": 64, "mean": 61.932533264160156, "std": 102.05841064453125, "min": -181.97032165527344, "p10": -51.58416366577147, "median": 53.03235626220703, "p90": 179.61605529785157, "max": 316.78704833984375, "pos_frac": 0.734375, "sample": [51.76342010498047, -1.4035720825195312, 180.06930541992188, 57.461021423339844, -91.19514465332031, 124.61996459960938, 161.09759521484375, 67.736083984375, -72.55281829833984, -105.0367431640625, 157.22393798828125, 61.56465148925781, 171.71910095214844, 76.3041000366211, 13.016029357910156, 120.9061279296875, 175.60073852539062, 33.33393859863281, 138.306396484375, 153.8753662109375, -178.97340393066406, -4.6528778076171875, 176.65713500976562, 53.14598083496094, -2.031848907470703, -37.51807403564453, 27.518417358398438, 176.00466918945312, 2.190868377685547, 95.42105102539062, 0.9428348541259766, 183.158935546875, 17.931533813476562, -20.083740234375, 53.522308349609375, -24.175827026367188, 122.97801971435547, 9.44265365600586, 4.715518951416016, 7.28179931640625, 52.918731689453125, 32.645469665527344, 178.5584716796875, 192.56008911132812, -4.72735595703125, 204.14320373535156, -33.55523681640625, -16.996986389160156, 184.6085662841797, 178.27000427246094, 174.60531616210938, 15.739501953125, 83.12395477294922, -30.91724395751953, -57.61248779296875, 316.78704833984375, -181.97032165527344, 123.8803939819336, 104.85371398925781, 287.494873046875, 101.01541900634766, 14.018295288085938, -107.3564453125, 13.709651947021484], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000481.npy"}
{"epoch": 0.72713529856387, "step": 482, "batch_size": 64, "mean": 63.25123977661133, "std": 116.69480895996094, "min": -246.02027893066406, "p10": -75.37815170288084, "median": 48.95168685913086, "p90": 202.43208923339847, "max": 336.1891174316406, "pos_frac": 0.6875, "sample": [25.477310180664062, -246.02027893066406, -20.965354919433594, 141.46597290039062, 68.99201202392578, 149.81423950195312, 49.50526428222656, -31.214439392089844, 6.539630889892578, 212.49952697753906, 103.21957397460938, 183.44107055664062, 166.9180145263672, 40.800270080566406, 120.45108032226562, 150.88079833984375, 239.59197998046875, -5.522621154785156, -184.53086853027344, 110.65193939208984, 205.55010986328125, -0.944122314453125, 336.1891174316406, -36.06178283691406, -3.783599853515625, 137.29086303710938, 143.67861938476562, 69.68074035644531, -9.268814086914062, 5.526386260986328, -6.224052429199219, 159.7303466796875, -87.75248718261719, 108.30001831054688, 195.15670776367188, 169.18324279785156, 108.4112777709961, 25.766773223876953, -21.17447280883789, -65.24209594726562, 314.0237121582031, 112.62261962890625, 46.879859924316406, 8.406005859375, 0.7119140625, 182.14134216308594, 55.828651428222656, -0.29025840759277344, 292.40411376953125, -123.48402404785156, 109.77635955810547, 31.253494262695312, 74.9918212890625, 33.84797668457031, 163.2704315185547, 0.26555824279785156, -64.20226287841797, 259.10723876953125, -79.72217559814453, -111.0257568359375, 132.66531372070312, -21.58936309814453, -134.20928955078125, 48.398109436035156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000482.npy"}
{"epoch": 0.7286470143613001, "step": 483, "batch_size": 64, "mean": 76.7440185546875, "std": 97.56475067138672, "min": -185.093994140625, "p10": -24.34904518127441, "median": 63.603092193603516, "p90": 196.4373321533203, "max": 290.7118225097656, "pos_frac": 0.78125, "sample": [16.752098083496094, 188.3512725830078, -20.046215057373047, 97.33427429199219, 168.07566833496094, 172.11386108398438, 171.85874938964844, -54.65191650390625, -11.04559326171875, -18.118946075439453, 80.88504028320312, 173.53836059570312, 125.74923706054688, 143.18312072753906, -56.411739349365234, 157.5988311767578, 71.69505310058594, -52.587501525878906, 4.429473876953125, 175.69175720214844, 59.74476623535156, 185.22512817382812, 185.63992309570312, 42.61335754394531, 197.146728515625, 61.435997009277344, 228.6239776611328, -29.29004669189453, 8.520645141601562, 85.88516235351562, 114.94486236572266, -157.58090209960938, 187.44447326660156, 3.7686290740966797, -26.193115234375, 290.7118225097656, 0.1641082763671875, -185.093994140625, 43.735382080078125, -19.621253967285156, 66.73491668701172, 36.915184020996094, 202.39718627929688, 111.17790985107422, 3.46270751953125, 65.77018737792969, 230.23524475097656, 199.69549560546875, 71.23961639404297, 150.17868041992188, 41.03008270263672, 57.56776428222656, 31.517005920410156, 208.06076049804688, -17.167749404907227, 28.47987174987793, 54.13840103149414, 5.458063125610352, 194.78207397460938, 163.31710815429688, 19.564437866210938, -9.808250427246094, -2.751667022705078, 187.40162658691406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000483.npy"}
{"epoch": 0.7301587301587301, "step": 484, "batch_size": 64, "mean": 51.696807861328125, "std": 76.41914367675781, "min": -130.49691772460938, "p10": -33.01695899963378, "median": 39.47609901428223, "p90": 154.31938171386722, "max": 223.60464477539062, "pos_frac": 0.828125, "sample": [31.972984313964844, 11.915313720703125, 1.4385223388671875, 0.33429718017578125, 198.21876525878906, 4.38902473449707, 81.498291015625, -14.744369506835938, 144.93246459960938, 0.4667205810546875, 29.341087341308594, -103.71881866455078, 69.7947998046875, 68.18392944335938, 93.53300476074219, 95.44677734375, -66.91099548339844, -18.436878204345703, 91.86282348632812, 86.8686752319336, 0.7216625213623047, 158.34234619140625, 122.77229309082031, 6.625633239746094, 92.69303131103516, 112.33519744873047, 192.1170196533203, 35.704742431640625, 95.42890930175781, 24.67607307434082, 42.74558639526367, 9.996665954589844, 1.4685211181640625, -36.4683837890625, 223.60464477539062, -11.298206329345703, 16.53455924987793, 16.33904266357422, 72.86166381835938, 0.0337677001953125, 135.551513671875, -42.327293395996094, 216.76658630371094, -63.56646728515625, 138.89166259765625, 86.4092025756836, 180.4913330078125, -24.963634490966797, 70.86953735351562, 36.96513366699219, 199.76864624023438, 19.939716339111328, 85.7440185546875, 20.048995971679688, -83.19327545166016, -130.49691772460938, 55.05791091918945, 48.704315185546875, 117.64517211914062, 114.60848236083984, 25.798004150390625, 59.542938232421875, 41.987064361572266, 14.731765747070312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000484.npy"}
{"epoch": 0.7316704459561603, "step": 485, "batch_size": 64, "mean": 56.73398208618164, "std": 88.64051055908203, "min": -183.13101196289062, "p10": -20.00206890106201, "median": 20.84339141845703, "p90": 186.75169677734374, "max": 279.04541015625, "pos_frac": 0.734375, "sample": [9.461967468261719, 121.83781433105469, 85.96696472167969, 134.68942260742188, 116.703125, -7.337806701660156, 2.3000659942626953, 14.787313461303711, 51.77363586425781, -47.06092071533203, -132.31788635253906, 21.38482666015625, 186.20208740234375, 187.95114135742188, -10.7864990234375, 196.63771057128906, 8.766475677490234, 175.11741638183594, 228.1334228515625, 16.43251609802246, 15.351430892944336, 122.05474853515625, -20.914688110351562, 186.98724365234375, -4.1819915771484375, 0.3137245178222656, -183.13101196289062, 89.96821594238281, 61.870445251464844, 78.76518249511719, -16.38037109375, -2.9996604919433594, 66.90465545654297, 0.1707019805908203, 153.7518310546875, -78.70735168457031, 279.04541015625, 192.62216186523438, -14.714645385742188, 172.96498107910156, 18.802936553955078, -20.78525733947754, 37.829193115234375, 12.232780456542969, 80.33887481689453, 2.207387924194336, -5.465238571166992, 179.912353515625, 112.43912506103516, 11.507644653320312, 17.523494720458984, 37.38993835449219, 126.70185089111328, 184.8123321533203, 85.35449981689453, 68.13246154785156, 51.69001007080078, 188.18267822265625, -18.17462921142578, -27.420927047729492, 20.301956176757812, -1.8537788391113281, 19.070602416992188, -10.139419555664062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000485.npy"}
{"epoch": 0.7331821617535903, "step": 486, "batch_size": 64, "mean": 52.51021957397461, "std": 109.31747436523438, "min": -188.4659881591797, "p10": -75.65203781127929, "median": 42.08181571960449, "p90": 199.51504211425782, "max": 348.1924743652344, "pos_frac": 0.6875, "sample": [31.25035858154297, -7.8510589599609375, 260.5726318359375, 4.186883926391602, -19.760986328125, -3.115100860595703, 34.51094055175781, 348.1924743652344, 19.845237731933594, -6.2300262451171875, 5.211172103881836, 168.66015625, 111.01265716552734, 88.30770111083984, 57.430213928222656, 32.82923889160156, -15.276947021484375, 195.30523681640625, -1.3534736633300781, 201.31924438476562, 116.89116668701172, 47.94502258300781, -65.40483093261719, -68.4127197265625, 208.74920654296875, -138.1400909423828, 39.60711669921875, -78.71138000488281, 60.19495391845703, -188.4659881591797, 204.85049438476562, 45.66679382324219, 46.38934326171875, -122.8865966796875, 1.6608753204345703, 136.65716552734375, 74.29056549072266, 156.13658142089844, -132.11184692382812, 44.556514739990234, 0.23447036743164062, 176.7792205810547, 59.45904541015625, 187.43618774414062, 190.1622314453125, -68.5135726928711, 101.44984436035156, 149.2325897216797, 62.16304016113281, 179.6335906982422, 87.197509765625, -29.872940063476562, 20.298385620117188, 229.60470581054688, -1.5070114135742188, -82.91746520996094, 0.3954811096191406, 105.56660461425781, 232.14125061035156, -17.68027687072754, 34.05609893798828, -180.83897399902344, 69.24551391601562, -37.580482482910156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000486.npy"}
{"epoch": 0.7346938775510204, "step": 487, "batch_size": 64, "mean": 65.39157104492188, "std": 100.8088150024414, "min": -160.7935333251953, "p10": -42.802426147460935, "median": 51.85353469848633, "p90": 184.4798355102539, "max": 389.6029968261719, "pos_frac": 0.75, "sample": [103.73805236816406, -54.134613037109375, 156.0166015625, 109.22943878173828, 71.6388931274414, 10.810003280639648, -58.16954040527344, 15.766044616699219, 96.28792572021484, 157.0447235107422, -43.857452392578125, -29.28668212890625, 37.56220245361328, -59.83067321777344, -9.337699890136719, 185.7114715576172, 123.24493408203125, 24.172611236572266, 334.1898193359375, 79.12779235839844, -2.2270736694335938, 173.55459594726562, 36.49452209472656, 193.6108856201172, 135.45077514648438, -15.452022552490234, 48.58313751220703, -131.58905029296875, 16.60681915283203, 57.77886962890625, 159.91383361816406, 67.39710235595703, 83.30721282958984, -0.35701560974121094, 186.64642333984375, 8.801841735839844, 128.33729553222656, -34.754913330078125, 55.123931884765625, 109.60072326660156, 177.57220458984375, 25.029071807861328, 73.92141723632812, 208.17105102539062, -140.7953338623047, 133.59466552734375, 0.87823486328125, -40.3406982421875, 86.21500396728516, -160.7935333251953, -11.226930618286133, 181.60601806640625, 25.14118194580078, 36.90608215332031, 7.7236175537109375, 161.49017333984375, 27.624515533447266, 23.731414794921875, -19.1923828125, 202.17079162597656, 99.1292953491211, 29.15213966369629, 389.6029968261719, 140.9974365234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000487.npy"}
{"epoch": 0.7362055933484505, "step": 488, "batch_size": 64, "mean": 68.73626708984375, "std": 96.98046875, "min": -170.01771545410156, "p10": -30.28283462524414, "median": 50.42991065979004, "p90": 196.1921157836914, "max": 283.90533447265625, "pos_frac": 0.703125, "sample": [28.56573486328125, -6.761573791503906, -3.190683364868164, -6.314168930053711, 27.496788024902344, -73.2373275756836, -10.358436584472656, 189.10159301757812, 191.4445343017578, 82.81649780273438, 0.9011821746826172, -4.034423828125, 188.30569458007812, 16.3258056640625, 10.616386413574219, 226.28506469726562, -26.63500213623047, 178.3094024658203, 3.3799667358398438, 97.6864013671875, 117.22039031982422, 42.990440368652344, 3.6145401000976562, -12.283401489257812, -57.67792510986328, 54.111595153808594, -94.10614013671875, 195.6385498046875, -5.66595458984375, 192.50552368164062, 0.6893157958984375, -67.4429931640625, -15.973058700561523, 170.77047729492188, 211.92613220214844, 118.50676727294922, 182.01515197753906, -31.84619140625, 217.04412841796875, 216.7401580810547, 218.44471740722656, 153.58358764648438, 168.34715270996094, -12.086669921875, 79.8367691040039, 116.3558120727539, -8.837844848632812, 83.78938293457031, -44.991546630859375, 98.43968200683594, 85.32611083984375, 98.27546691894531, -170.01771545410156, 93.00897216796875, 6.30517578125, 68.75037384033203, 283.90533447265625, 46.748226165771484, 195.99595642089844, 25.80080795288086, 13.313789367675781, 55.24842834472656, -2.1780319213867188, 196.27618408203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000488.npy"}
{"epoch": 0.7377173091458806, "step": 489, "batch_size": 64, "mean": 43.10588836669922, "std": 104.46649932861328, "min": -205.85873413085938, "p10": -70.76285324096679, "median": 23.480741500854492, "p90": 174.60291137695313, "max": 263.3831481933594, "pos_frac": 0.640625, "sample": [-162.6168670654297, 165.83709716796875, 17.800582885742188, 166.54571533203125, 133.5896453857422, 177.58645629882812, -48.171966552734375, 148.03421020507812, 55.52460861206055, 235.16024780273438, 105.85294342041016, 18.00811004638672, 127.84689331054688, -72.86227416992188, -6.738807678222656, -56.19627380371094, -24.175445556640625, -32.15129470825195, 149.86181640625, 40.554664611816406, 191.81936645507812, 70.731689453125, 144.45797729492188, 19.471019744873047, 142.4146728515625, 143.51353454589844, -6.732170104980469, -47.765403747558594, -50.391357421875, 47.53861618041992, -173.34121704101562, -52.553707122802734, 175.67742919921875, 19.32878875732422, 9.251007080078125, -16.323348999023438, -27.30791473388672, -2.5279483795166016, 172.095703125, 157.35488891601562, 45.37669372558594, 108.55673217773438, 7.034294128417969, 263.3831481933594, 57.610496520996094, -114.09645080566406, 192.7135009765625, 15.182693481445312, 116.2516860961914, 90.69377136230469, -159.01022338867188, 17.710594177246094, -18.89535140991211, 11.037153244018555, -5.644683837890625, -205.85873413085938, 162.73870849609375, 30.788108825683594, -119.34351348876953, 67.44635009765625, -65.86420440673828, 179.34674072265625, -1.872934341430664, 27.490463256835938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000489.npy"}
{"epoch": 0.7392290249433107, "step": 490, "batch_size": 64, "mean": 52.64936828613281, "std": 120.94915771484375, "min": -249.2847900390625, "p10": -114.96305541992186, "median": 48.31814384460449, "p90": 196.52640380859376, "max": 313.88946533203125, "pos_frac": 0.625, "sample": [-40.56097412109375, -3.4983978271484375, -249.2847900390625, -2.6506271362304688, -6.554437637329102, 88.95492553710938, 38.68046569824219, 183.7192840576172, 202.56903076171875, 244.61605834960938, -169.20486450195312, 192.76246643066406, -191.84722900390625, 7.459432601928711, 42.93888854980469, 97.67860412597656, 140.91519165039062, 105.56510162353516, 283.6575927734375, 25.92664337158203, 24.951553344726562, 194.77713012695312, 155.62733459472656, 212.7064208984375, -128.69833374023438, -39.92767333984375, 180.74940490722656, -2.6824073791503906, 53.70175552368164, 20.053260803222656, -150.44003295898438, 115.79324340820312, 86.05645751953125, 101.65180969238281, -0.10817527770996094, -4.5631866455078125, -19.386085510253906, 197.27609252929688, -122.40589904785156, -14.612754821777344, 120.3923110961914, -75.46357727050781, 84.8048324584961, 151.13632202148438, -5.766084671020508, 143.08340454101562, 43.09407043457031, 5.119295120239258, -71.73868560791016, -17.86767578125, 313.88946533203125, 153.96139526367188, -17.414432525634766, 107.42825317382812, 223.61265563964844, 125.52926635742188, 192.88897705078125, 66.77538299560547, 56.839752197265625, 185.74102783203125, -196.62049865722656, -97.59642028808594, -28.174209594726562, 53.54221725463867], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000490.npy"}
{"epoch": 0.7407407407407407, "step": 491, "batch_size": 64, "mean": 58.64308166503906, "std": 93.57073211669922, "min": -207.86257934570312, "p10": -42.986394500732416, "median": 29.71750259399414, "p90": 188.03905029296877, "max": 320.3740234375, "pos_frac": 0.71875, "sample": [152.4199676513672, -21.619972229003906, 123.92869567871094, 28.56829833984375, 163.32232666015625, 90.05496978759766, -3.5947818756103516, 146.41265869140625, 165.70327758789062, 45.80733108520508, 3.00018310546875, -103.17760467529297, 191.14492797851562, 37.28581237792969, 169.2030029296875, -70.3390884399414, 15.428211212158203, 117.73348999023438, 11.195240020751953, 105.04096984863281, -45.930511474609375, 152.56454467773438, 189.788330078125, -7.725742340087891, 42.25806427001953, 149.494873046875, 13.150321960449219, -16.245040893554688, -6.98614501953125, 5.622440338134766, 189.8230743408203, 109.94927978515625, -40.12458801269531, 28.102928161621094, 136.67984008789062, 19.87802505493164, 30.86670684814453, 320.3740234375, 41.12115478515625, -53.03630828857422, 183.9573974609375, 10.093185424804688, 18.983169555664062, -14.214563369750977, -0.14679336547851562, 107.08460998535156, -14.32147216796875, -63.099761962890625, 96.63677215576172, 190.89476013183594, -3.6513595581054688, 142.37432861328125, 212.40811157226562, 173.01907348632812, 8.027399063110352, 1.2689056396484375, 9.415611267089844, 63.61164093017578, 76.24821472167969, -207.86257934570312, -44.21288299560547, 6.254144668579102, -18.40018081665039, 191.6463623046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000491.npy"}
{"epoch": 0.7422524565381708, "step": 492, "batch_size": 64, "mean": 47.14491271972656, "std": 107.70629119873047, "min": -246.54542541503906, "p10": -68.65388107299803, "median": 20.839879989624023, "p90": 199.73701171875, "max": 264.92279052734375, "pos_frac": 0.671875, "sample": [264.92279052734375, 225.39892578125, -116.00820922851562, 1.243011474609375, 207.7264404296875, 6.548330307006836, 129.0081329345703, 78.54444885253906, 85.70808410644531, -2.307504653930664, 148.85133361816406, 199.1142578125, 57.867218017578125, -33.454307556152344, 15.195220947265625, 137.94607543945312, -198.65170288085938, 105.20747375488281, 3.7078094482421875, 115.01167297363281, 157.0394744873047, -72.9994888305664, -4.447998046875, -27.86797332763672, 12.186058044433594, 91.55293273925781, 111.86225128173828, 215.6451416015625, -58.514129638671875, -8.235542297363281, 180.50405883789062, 11.745988845825195, -1.9361419677734375, 80.35284423828125, -1.6051654815673828, 18.202194213867188, -246.54542541503906, 216.02334594726562, -98.37750244140625, -2.0517501831054688, 135.40130615234375, 2.5912399291992188, 89.84300994873047, 6.626361846923828, -44.38600158691406, 136.64776611328125, 23.47756576538086, 63.58149719238281, -56.48102569580078, 136.62135314941406, -192.89627075195312, -37.29042053222656, -11.850128173828125, 157.2830047607422, 209.09503173828125, 18.115703582763672, 127.19638061523438, 55.976715087890625, 114.34284973144531, 30.862533569335938, 3.570589065551758, -113.93669128417969, -41.23445129394531, 200.00390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000492.npy"}
{"epoch": 0.7437641723356009, "step": 493, "batch_size": 64, "mean": 72.34774017333984, "std": 102.5592041015625, "min": -184.62930297851562, "p10": -50.19787673950195, "median": 59.07877540588379, "p90": 204.08397521972657, "max": 258.3606872558594, "pos_frac": 0.734375, "sample": [7.760406494140625, 13.302349090576172, -39.291015625, 168.0237579345703, -29.020164489746094, 70.2237548828125, -152.2621307373047, 16.273555755615234, 2.3939380645751953, -50.371925354003906, 67.59762573242188, 239.46719360351562, 175.35430908203125, 199.96615600585938, 61.22907638549805, -63.55782699584961, 161.33914184570312, -2.1972579956054688, 123.28144836425781, 162.8779296875, 49.16504669189453, 35.72997283935547, -0.6297702789306641, 27.906753540039062, 56.92847442626953, -79.12007141113281, 154.1271514892578, -6.992210388183594, 198.8351287841797, 148.2225341796875, 142.16722106933594, 184.19432067871094, -102.94076538085938, -25.402454376220703, -78.78797912597656, 137.8016357421875, 186.28848266601562, -25.249298095703125, 207.6077880859375, 1.4585514068603516, 234.45388793945312, 113.16935729980469, 139.01051330566406, 258.3606872558594, 68.38663482666016, 206.82241821289062, -49.79176330566406, 205.8487548828125, 140.44403076171875, -4.994987487792969, -12.141288757324219, 165.63595581054688, 53.04180145263672, 20.564712524414062, 213.20309448242188, 156.48953247070312, 43.494110107421875, 45.24395751953125, 131.159912109375, 191.025634765625, 8.834266662597656, 95.96633911132812, 46.95630645751953, -184.62930297851562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000493.npy"}
{"epoch": 0.745275888133031, "step": 494, "batch_size": 64, "mean": 60.82097625732422, "std": 113.88327026367188, "min": -236.80093383789062, "p10": -67.85813064575194, "median": 49.669782638549805, "p90": 189.29591522216796, "max": 341.32208251953125, "pos_frac": 0.671875, "sample": [132.6562042236328, 46.461158752441406, 55.83586883544922, 38.38703918457031, 182.8907470703125, 6.1981353759765625, -12.233642578125, 163.93385314941406, 118.66191864013672, -119.0606689453125, 10.851455688476562, -36.779693603515625, 1.5016021728515625, 115.65564727783203, 126.62747192382812, 77.3687744140625, 178.722412109375, -1.1723098754882812, 51.46733474731445, -88.32626342773438, 21.680484771728516, 341.32208251953125, 137.9929962158203, -26.204757690429688, 44.02417755126953, 184.67323303222656, 191.69949340820312, -5.387214660644531, 148.51791381835938, 187.85350036621094, 172.6749267578125, -187.82171630859375, -201.28167724609375, 174.8445281982422, -5.813249588012695, -115.77345275878906, -72.10262298583984, 170.38504028320312, 28.08525848388672, 152.55111694335938, 62.49977111816406, -1.5631427764892578, -236.80093383789062, -4.071529388427734, 24.896129608154297, 47.872230529785156, 223.6619110107422, -3.3837032318115234, -32.15003967285156, 41.239990234375, 189.91409301757812, 72.11538696289062, 84.46311950683594, 247.79537963867188, 66.88619232177734, 78.82068634033203, -57.954315185546875, -10.219928741455078, 136.08975219726562, 200.084228515625, 254.79922485351562, -22.829910278320312, -45.4228630065918, 184.23370361328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000494.npy"}
{"epoch": 0.7467876039304611, "step": 495, "batch_size": 64, "mean": 49.724788665771484, "std": 110.89356231689453, "min": -191.3806915283203, "p10": -74.66144790649415, "median": 19.425037384033203, "p90": 188.09420318603517, "max": 379.98101806640625, "pos_frac": 0.671875, "sample": [-74.83745574951172, -80.56920623779297, -191.3806915283203, -156.33612060546875, 162.2342987060547, -6.739631652832031, -131.6363983154297, 19.214981079101562, 93.67352294921875, -65.39497375488281, 1.697479248046875, -152.2623291015625, -13.735946655273438, 87.08256530761719, 76.50267791748047, 187.09817504882812, -27.294296264648438, 19.3992919921875, 132.13909912109375, -4.684385299682617, 31.133358001708984, 134.21971130371094, 157.73416137695312, 4.117837905883789, 158.20465087890625, 188.5210723876953, -15.550907135009766, 97.37602996826172, -0.24410629272460938, 54.478458404541016, 379.98101806640625, 5.645166397094727, 27.26184844970703, -74.25076293945312, 15.367177963256836, 42.6231689453125, 3.098114013671875, 141.06251525878906, 297.4526062011719, 156.64962768554688, -85.8631591796875, -9.981452941894531, 49.89711380004883, 59.81439208984375, 213.2039794921875, 19.450782775878906, 76.01752471923828, -5.400718688964844, -0.3662567138671875, 190.84140014648438, 4.867673873901367, 162.3265838623047, 47.919029235839844, -27.981910705566406, -61.81322479248047, 298.57427978515625, 84.1920166015625, -3.8702850341796875, 9.562660217285156, 64.9522705078125, 134.923095703125, 16.794342041015625, 260.4642333984375, 4.810657501220703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000495.npy"}
{"epoch": 0.7482993197278912, "step": 496, "batch_size": 64, "mean": 61.514312744140625, "std": 121.03245544433594, "min": -232.83853149414062, "p10": -76.17808380126952, "median": 40.83534622192383, "p90": 201.43731079101565, "max": 526.27783203125, "pos_frac": 0.703125, "sample": [120.77897644042969, 151.99526977539062, 204.14593505859375, 176.7832794189453, -103.9537582397461, 41.46014404296875, 82.14111328125, 279.3119201660156, 19.497819900512695, 119.31278991699219, -79.86082458496094, 40.210548400878906, 47.59795379638672, 41.670562744140625, 4.696393966674805, -11.084754943847656, -7.380741119384766, 21.72368812561035, 163.4407196044922, 32.336944580078125, -53.26520538330078, 526.27783203125, -3.9255104064941406, 197.02932739257812, 70.66366577148438, -232.83853149414062, 292.5326843261719, 142.78018188476562, -1.5757255554199219, 230.78878784179688, -94.89517974853516, 8.92123794555664, 193.6639862060547, 198.54583740234375, 5.1585540771484375, 1.6082630157470703, -120.39895629882812, -5.78044319152832, 58.515411376953125, 123.4450454711914, 19.962677001953125, 68.46322631835938, 202.676513671875, 94.67305755615234, 147.41787719726562, 49.661346435546875, -36.73689270019531, 95.7343978881836, -20.458045959472656, -0.6002388000488281, 213.35008239746094, -167.22113037109375, -38.79484558105469, 3.695791244506836, 159.3153076171875, 176.89004516601562, 6.304328918457031, 144.15264892578125, -73.63130187988281, 1.35479736328125, -77.26956176757812, 77.8049545288086, 14.47552490234375, -6.379783630371094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000496.npy"}
{"epoch": 0.7498110355253212, "step": 497, "batch_size": 64, "mean": 70.92979431152344, "std": 100.00299835205078, "min": -181.5006103515625, "p10": -23.379069137573236, "median": 69.22582626342773, "p90": 191.29019165039065, "max": 247.65335083007812, "pos_frac": 0.703125, "sample": [180.8961181640625, 37.093482971191406, 247.65335083007812, 164.51943969726562, 221.2821044921875, 23.579910278320312, -8.184982299804688, -6.1368255615234375, 11.502847671508789, 101.20449829101562, 135.17271423339844, 139.13540649414062, 111.6869888305664, -6.757957458496094, 144.84109497070312, 1.0977249145507812, 123.07302856445312, -178.87057495117188, 178.50721740722656, 207.09963989257812, -2.8739662170410156, -2.3402366638183594, -4.115913391113281, -6.496269226074219, 58.19189453125, -36.57713317871094, -13.328338623046875, 194.1888427734375, 152.81138610839844, 175.10482788085938, -5.920686721801758, 93.45658874511719, 114.40884399414062, 206.86219787597656, -181.5006103515625, 173.55816650390625, 184.4931640625, 82.92304229736328, 84.02166748046875, -37.727088928222656, 174.71978759765625, 132.63467407226562, -10.563968658447266, 70.3878173828125, 84.06803894042969, -26.108612060546875, 220.50814819335938, 133.5848388671875, -8.215194702148438, 36.32661437988281, 45.977657318115234, 46.0355224609375, 55.633453369140625, 152.6175079345703, 68.06383514404297, 222.44993591308594, -164.68374633789062, 3.6300277709960938, -17.010135650634766, 184.52667236328125, 38.12591552734375, 45.162391662597656, -133.67503356933594, 127.7752685546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000497.npy"}
{"epoch": 0.7513227513227513, "step": 498, "batch_size": 64, "mean": 69.00605010986328, "std": 93.96923828125, "min": -269.1732482910156, "p10": -25.020743560791015, "median": 56.85306167602539, "p90": 193.41942138671877, "max": 247.3131561279297, "pos_frac": 0.828125, "sample": [-25.924911499023438, 113.89942169189453, 161.3037872314453, 80.79298400878906, 67.57489013671875, -4.6692352294921875, 166.16375732421875, 159.48788452148438, -10.500202178955078, -65.96061706542969, 184.7757568359375, 60.98624801635742, 75.06698608398438, -26.283653259277344, 148.31948852539062, 74.05459594726562, 9.185020446777344, -50.877845764160156, 91.22919464111328, 143.0010223388672, 247.3131561279297, 38.635841369628906, 190.06411743164062, 19.76260757446289, 63.08684539794922, 32.15275573730469, -29.813194274902344, 171.65769958496094, 171.61923217773438, 44.85962677001953, 205.08477783203125, 179.05174255371094, 2.712726593017578, 15.307083129882812, 55.370506286621094, -89.39923095703125, -7.474006652832031, 57.3687744140625, 1.6296443939208984, 6.857980728149414, 98.27754211425781, 190.5867156982422, -269.1732482910156, 141.3759002685547, 99.8454818725586, 43.98900604248047, 3.2277565002441406, 56.33734893798828, 5.967899322509766, 62.61769485473633, 218.5821533203125, 194.63343811035156, 92.97106170654297, 15.523735046386719, 201.19581604003906, 3.7389087677001953, 245.0063018798828, 20.981719970703125, 6.599695205688477, 14.7305908203125, 23.732192993164062, -22.91101837158203, 19.831077575683594, 221.2462615966797], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000498.npy"}
{"epoch": 0.7528344671201814, "step": 499, "batch_size": 64, "mean": 68.4577865600586, "std": 120.72773742675781, "min": -189.00186157226562, "p10": -107.27228775024412, "median": 81.28788375854492, "p90": 213.4849792480469, "max": 360.920166015625, "pos_frac": 0.765625, "sample": [120.12689971923828, 209.3408203125, -182.83981323242188, 242.22216796875, 19.021652221679688, 67.92333984375, 26.97911262512207, -189.00186157226562, 21.485301971435547, 112.98307800292969, 85.14551544189453, 163.46981811523438, 155.9282989501953, -145.67794799804688, 176.04693603515625, 5.911827087402344, 360.920166015625, -122.79093933105469, 199.03724670410156, 184.02439880371094, 78.67155456542969, -51.139923095703125, -37.179344177246094, -116.9171142578125, -84.76769256591797, 133.60980224609375, 69.40170288085938, 219.8895721435547, -63.14448547363281, 81.13768768310547, -45.182247161865234, 24.81008529663086, 116.56294250488281, 154.9818878173828, 81.43807983398438, -31.967147827148438, 25.125015258789062, 133.81146240234375, 6.988248825073242, 89.86726379394531, 217.53253173828125, 10.607276916503906, 199.49002075195312, 63.00808334350586, -164.01063537597656, 98.4782485961914, 102.06998443603516, 101.22135162353516, 215.26104736328125, 242.9545440673828, 195.66065979003906, 18.779319763183594, 59.684478759765625, -68.73779296875, -181.06854248046875, 156.80905151367188, 127.42982482910156, 152.368408203125, -50.0283203125, 244.8865509033203, 7.182136535644531, 173.08181762695312, 144.14923095703125, 18.235580444335938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000499.npy"}
{"epoch": 0.7543461829176115, "step": 500, "batch_size": 64, "mean": 61.45201110839844, "std": 115.31900024414062, "min": -311.99981689453125, "p10": -89.60629959106444, "median": 73.59747314453125, "p90": 189.82392272949218, "max": 258.50128173828125, "pos_frac": 0.75, "sample": [-79.77372741699219, 140.3519287109375, 15.967689514160156, 7.003324508666992, 112.38885498046875, 94.43132019042969, 100.19854736328125, 154.76992797851562, 98.14694213867188, 194.09629821777344, 58.210723876953125, 182.80230712890625, 184.20748901367188, -93.82025909423828, 190.66807556152344, 13.244003295898438, 181.2585906982422, 50.532958984375, 24.864051818847656, -15.324920654296875, 205.61915588378906, 171.49838256835938, 20.47101593017578, 131.1835174560547, -10.387252807617188, 192.6520538330078, 86.10331726074219, 9.839183807373047, -41.840599060058594, -140.31161499023438, -146.089599609375, 93.87879943847656, 19.229373931884766, 48.28160095214844, -143.7642822265625, 7.202535629272461, 84.205078125, 258.50128173828125, 208.231689453125, -3.6027908325195312, 131.53172302246094, 19.24091148376465, 187.85423278808594, 50.33778762817383, -47.63840103149414, 141.94107055664062, 79.46340942382812, -311.99981689453125, 181.06752014160156, -192.87094116210938, 218.1515655517578, -59.616607666015625, 2.7282257080078125, 163.23269653320312, 163.8614501953125, 7.986530303955078, -6.372962951660156, 67.73153686523438, 118.5958480834961, -39.827186584472656, -128.90084838867188, 146.15866088867188, 187.50003051757812, 187.64712524414062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000500.npy"}
{"epoch": 0.7558578987150416, "step": 501, "batch_size": 64, "mean": 54.93168640136719, "std": 77.73410034179688, "min": -158.3297119140625, "p10": -16.295884132385254, "median": 37.870174407958984, "p90": 172.55080871582035, "max": 218.03956604003906, "pos_frac": 0.765625, "sample": [27.270973205566406, 192.24765014648438, -22.177505493164062, 46.88248062133789, 123.58921813964844, 138.1266632080078, -27.388626098632812, 33.61669158935547, 9.026897430419922, 133.99920654296875, 152.71571350097656, 20.726987838745117, 78.70307159423828, 2.6549835205078125, -15.94422721862793, -8.19727897644043, 57.299903869628906, 177.60238647460938, -5.054267883300781, -7.394662857055664, 48.968360900878906, -2.4270477294921875, 0.6210517883300781, 218.03956604003906, 182.61956787109375, 158.1857147216797, 160.7637939453125, 216.62789916992188, 0.7743644714355469, 9.121055603027344, 47.33698272705078, -65.06582641601562, -4.385017395019531, 2.150440216064453, 81.56462097167969, 67.11956787109375, 114.38150024414062, 4.514396667480469, 1.4164619445800781, 81.59593200683594, 54.755950927734375, 158.349365234375, 19.738765716552734, 110.16471099853516, -16.511695861816406, -158.3297119140625, 63.435585021972656, 110.81678771972656, -67.57991027832031, 24.864036560058594, 42.1236572265625, 51.328330993652344, 15.498344421386719, -15.373336791992188, 82.89002990722656, -6.297235488891602, 21.445098876953125, 2.8900928497314453, 195.97535705566406, 182.67677307128906, 69.12466430664062, 0.09988021850585938, 157.7597198486328, -16.44659423828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000501.npy"}
{"epoch": 0.7573696145124716, "step": 502, "batch_size": 64, "mean": 58.989627838134766, "std": 90.43058776855469, "min": -177.75865173339844, "p10": -47.98886413574217, "median": 46.96715545654297, "p90": 188.03990173339847, "max": 245.2069549560547, "pos_frac": 0.75, "sample": [-31.165077209472656, 6.416934967041016, 9.264999389648438, -60.99150085449219, 174.30259704589844, 129.16705322265625, 22.909996032714844, 45.78114318847656, -111.00321197509766, 68.05398559570312, 24.947025299072266, 201.2908935546875, -9.033576965332031, -55.199058532714844, 129.97824096679688, 245.2069549560547, 33.54180908203125, -177.75865173339844, -24.463951110839844, 99.76991271972656, 25.910354614257812, -14.489673614501953, 29.435956954956055, 165.41207885742188, 196.36251831054688, 89.92024993896484, 144.3054656982422, -15.592975616455078, -1.604269027709961, 48.153167724609375, 24.80078887939453, 8.741926193237305, 100.59730529785156, 6.269491195678711, 114.04396057128906, 91.98297119140625, 189.5931854248047, 212.43333435058594, -61.23895263671875, 3.6420021057128906, -11.667655944824219, 41.03453826904297, 77.86469268798828, 7.382720947265625, 95.1263427734375, -11.621627807617188, 33.823974609375, 195.104248046875, 81.65795135498047, -106.88343811035156, 79.40628051757812, 80.77963256835938, 163.32452392578125, 111.1180419921875, 234.24404907226562, -88.90316772460938, 61.47129821777344, 173.45941162109375, -19.331771850585938, 184.4155731201172, 2.0220985412597656, 66.20845794677734, 153.7109375, 91.89386749267578], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000502.npy"}
{"epoch": 0.7588813303099018, "step": 503, "batch_size": 64, "mean": 62.25199890136719, "std": 100.76827239990234, "min": -192.98297119140625, "p10": -41.81475982666015, "median": 37.39867401123047, "p90": 190.42747192382814, "max": 216.8995819091797, "pos_frac": 0.734375, "sample": [157.62704467773438, 5.883243560791016, -30.2713623046875, 73.80835723876953, 123.78112030029297, 31.085792541503906, 216.76858520507812, -192.98297119140625, -0.9893035888671875, 130.05377197265625, 165.36907958984375, -19.265853881835938, 28.678504943847656, 189.32598876953125, -35.52534484863281, 90.42071533203125, -44.510223388671875, -4.081573486328125, 216.8995819091797, -61.02252960205078, 14.716880798339844, 128.120361328125, 181.03182983398438, 36.15184020996094, 7.152008056640625, 38.6455078125, 141.9972381591797, 200.83349609375, -2.169872283935547, 137.39071655273438, -15.38741683959961, 77.73450469970703, 195.70167541503906, 191.00521850585938, 190.8995361328125, -16.849945068359375, 133.19012451171875, -17.93158721923828, -140.2124481201172, 27.47510528564453, 1.6709098815917969, 25.240966796875, 5.058847427368164, 104.37956237792969, -2.586956024169922, 13.218582153320312, 60.52142333984375, 195.6986083984375, 133.65550231933594, 10.409194946289062, 182.46197509765625, 175.0865936279297, -151.4847869873047, 8.5794677734375, 173.09132385253906, 147.982666015625, 4.349416732788086, -92.18660736083984, 156.74806213378906, 186.48497009277344, 15.53030014038086, 119.86263275146484, 122.98113250732422, -163.17315673828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000503.npy"}
{"epoch": 0.7603930461073318, "step": 504, "batch_size": 64, "mean": 81.27403259277344, "std": 89.22906494140625, "min": -148.7404327392578, "p10": -10.528705978393553, "median": 74.43722534179688, "p90": 199.90596466064454, "max": 271.91693115234375, "pos_frac": 0.828125, "sample": [16.7476806640625, 183.3475799560547, -1.7866973876953125, 132.97085571289062, 230.39161682128906, -15.230537414550781, 130.61553955078125, 25.46991539001465, 197.2494659423828, 11.279436111450195, 53.458648681640625, 27.058624267578125, 7.8735198974609375, 201.04446411132812, 33.83076858520508, 169.6264190673828, 170.48452758789062, -113.44551849365234, 121.40664672851562, 148.8529052734375, -11.370365142822266, 102.61483764648438, 3.4551773071289062, 178.28053283691406, 204.0585174560547, -27.831588745117188, 83.4070053100586, 187.57423400878906, 100.59248352050781, 271.91693115234375, 209.65005493164062, 231.70834350585938, -5.635967254638672, 65.37446594238281, 79.302978515625, -8.564834594726562, 6.069175720214844, -127.23011016845703, 167.19241333007812, 72.72154235839844, 170.33877563476562, -13.34033203125, -3.111560821533203, 27.9564208984375, 69.156494140625, 169.1815185546875, 103.77593994140625, 57.10417938232422, 38.74798583984375, 130.1253662109375, 131.6566619873047, 70.74203491210938, -148.7404327392578, 111.26126098632812, 125.41732788085938, 27.75194549560547, 14.588834762573242, 76.15290832519531, 128.11651611328125, 26.708528518676758, 9.257608413696289, 208.42526245117188, 100.43750762939453, 55.29527282714844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000504.npy"}
{"epoch": 0.7619047619047619, "step": 505, "batch_size": 64, "mean": 53.1377067565918, "std": 111.53194427490234, "min": -243.45095825195312, "p10": -67.33542556762694, "median": 29.07724952697754, "p90": 190.38143157958984, "max": 305.30462646484375, "pos_frac": 0.703125, "sample": [126.92019653320312, -7.731403350830078, 103.42137145996094, 26.706478118896484, 1.1468391418457031, 305.30462646484375, -165.62982177734375, 246.4390106201172, 166.91062927246094, 186.8905029296875, 190.66343688964844, -17.947799682617188, 42.444801330566406, 5.992805480957031, 21.22198486328125, -28.9774169921875, -25.813385009765625, -13.158100128173828, 135.0387420654297, 6.134380340576172, -94.75794982910156, 177.88931274414062, 141.03146362304688, 28.936222076416016, 83.69120025634766, 241.58248901367188, 2.2897872924804688, -74.60502624511719, 29.218276977539062, 0.12427139282226562, -48.36114501953125, 0.8891220092773438, 189.72341918945312, -142.77606201171875, 203.6231231689453, 167.99349975585938, 135.61251831054688, 186.49656677246094, -3.933805465698242, -12.640396118164062, 33.3599853515625, 22.18779754638672, -9.522911071777344, -139.41873168945312, 34.64019775390625, 69.39045715332031, 89.8836898803711, -13.003725051879883, 210.9114532470703, -149.58941650390625, -19.919151306152344, 173.50027465820312, 183.54212951660156, 50.116912841796875, 177.45263671875, 37.561279296875, 156.20428466796875, 17.973464965820312, 11.703933715820312, 203.79165649414062, 35.040245056152344, 0.8260879516601562, -50.373023986816406, -243.45095825195312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000505.npy"}
{"epoch": 0.763416477702192, "step": 506, "batch_size": 64, "mean": 58.36964416503906, "std": 109.03714752197266, "min": -264.8327331542969, "p10": -42.581746292114246, "median": 53.26981735229492, "p90": 184.8759735107422, "max": 256.43609619140625, "pos_frac": 0.71875, "sample": [31.754995346069336, 152.200927734375, 6.866893768310547, -22.606613159179688, 177.3612518310547, 183.7001190185547, 77.09893798828125, -47.40952682495117, 160.42312622070312, -29.358306884765625, 3.6107311248779297, 90.46464538574219, 52.536216735839844, 15.133232116699219, 105.95834350585938, 114.94256591796875, -184.88851928710938, 227.8851776123047, 243.1953125, 24.39813995361328, 184.5509033203125, -152.40728759765625, -10.656936645507812, -264.8327331542969, -151.4609375, -180.93585205078125, 33.659576416015625, 196.22254943847656, 222.31338500976562, 185.01528930664062, 185.6066436767578, -31.316925048828125, 256.43609619140625, -18.31903076171875, 183.0001220703125, 26.784873962402344, 161.4463653564453, 88.9920425415039, 140.96841430664062, -12.672271728515625, -91.50909423828125, 11.441719055175781, 9.371749877929688, 12.500648498535156, 90.43212890625, 79.77165222167969, -0.4621086120605469, 54.92560577392578, 131.1942596435547, 52.372779846191406, 54.00341796875, 170.086669921875, -5.6953277587890625, 149.13314819335938, 128.4722442626953, -3.4882278442382812, 4.938468933105469, 124.8039779663086, -6.245750427246094, -9.154193878173828, 62.976593017578125, 168.97708129882812, 105.3726806640625, 15.7752685546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000506.npy"}
{"epoch": 0.764928193499622, "step": 507, "batch_size": 64, "mean": 62.220069885253906, "std": 114.26143646240234, "min": -235.84393310546875, "p10": -66.17942886352539, "median": 42.053714752197266, "p90": 216.79357452392583, "max": 334.2452392578125, "pos_frac": 0.6875, "sample": [173.91448974609375, 13.022613525390625, -12.750778198242188, 30.8565673828125, 332.02642822265625, -10.799873352050781, 265.9944152832031, -235.84393310546875, -84.23992919921875, 123.69831848144531, 86.2625732421875, 57.96357727050781, -53.540985107421875, 134.88516235351562, -63.19152069091797, -67.4599609375, 94.08818054199219, -21.315673828125, 184.84825134277344, 28.004314422607422, 43.02180480957031, 43.71026611328125, 176.12582397460938, -33.36894226074219, 200.20767211914062, -12.925271987915039, 68.43602752685547, 229.62158203125, 94.86798095703125, 3.6666259765625, 59.02783203125, 25.472808837890625, 273.89752197265625, -51.216957092285156, 197.4893798828125, 133.60218811035156, -62.89923095703125, 204.92787170410156, -92.36788177490234, 1.77880859375, -33.445465087890625, 8.59564208984375, -4.504966735839844, 148.49903869628906, 130.09278869628906, 25.356002807617188, 41.08562469482422, 14.347015380859375, 334.2452392578125, 226.93597412109375, -69.41938781738281, 122.34478759765625, -41.20136260986328, 5.742958068847656, 110.3385009765625, -3.2001876831054688, 112.42405700683594, -106.71546936035156, 58.44329071044922, -108.20735168457031, 18.934356689453125, 221.87887573242188, 114.389404296875, 175.62680053710938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000507.npy"}
{"epoch": 0.7664399092970522, "step": 508, "batch_size": 64, "mean": 70.27108764648438, "std": 106.78490447998047, "min": -172.70640563964844, "p10": -62.359275054931636, "median": 59.337697982788086, "p90": 209.29623107910157, "max": 267.5186767578125, "pos_frac": 0.734375, "sample": [156.1250457763672, 23.673965454101562, 154.7238006591797, 18.003952026367188, 41.567596435546875, -58.768768310546875, 229.66738891601562, -2.2303733825683594, 142.60911560058594, 209.6864013671875, 244.8489532470703, 171.02761840820312, 42.44493103027344, 13.5650634765625, 6.607086181640625, 126.58336639404297, -172.70640563964844, 41.30255126953125, 26.000579833984375, -72.86988830566406, 18.530136108398438, 15.62738037109375, 152.20545959472656, -63.89806365966797, -70.63872528076172, -161.98594665527344, 80.82048034667969, 0.38300514221191406, 267.5186767578125, 95.40742492675781, -48.11090087890625, 236.44505310058594, 98.03669738769531, 79.39708709716797, 149.1630859375, 98.17730712890625, 104.11058044433594, 86.68634033203125, 184.55902099609375, -10.610084533691406, -2.436767578125, -21.965843200683594, 172.2796630859375, 195.37136840820312, -8.582839965820312, -100.20051574707031, 143.34722900390625, 213.3533935546875, 221.686767578125, 5.340753555297852, 193.26467895507812, -2.3404083251953125, 58.13126754760742, -1.282796859741211, 5.8997955322265625, -148.5608367919922, -18.231246948242188, 196.36483764648438, 206.50546264648438, 185.50439453125, 71.59546661376953, 60.54412841796875, 9.6895751953125, 208.38583374023438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000508.npy"}
{"epoch": 0.7679516250944822, "step": 509, "batch_size": 64, "mean": 58.37133026123047, "std": 100.17506408691406, "min": -169.29562377929688, "p10": -90.96180343627928, "median": 59.11713409423828, "p90": 184.70792236328126, "max": 226.7489013671875, "pos_frac": 0.703125, "sample": [57.90374755859375, -37.4553337097168, 75.22804260253906, 176.8780517578125, 60.33052062988281, -1.8705196380615234, 8.548309326171875, -18.5203800201416, -98.5855484008789, 99.71456909179688, 19.90703582763672, 68.05157470703125, -126.01779174804688, 88.07855224609375, 171.40164184570312, -125.57888793945312, 137.62701416015625, 140.659423828125, -29.676116943359375, 17.894866943359375, 156.1090545654297, -111.15009307861328, 25.0965576171875, -21.358131408691406, 226.7489013671875, 39.04554748535156, -30.368385314941406, 41.61014938354492, 67.28622436523438, -6.79876708984375, 219.9853515625, 32.46062088012695, 192.09521484375, 147.90866088867188, 35.887908935546875, -1.8193492889404297, 178.53042602539062, 92.709228515625, 145.9271240234375, 179.43531799316406, 14.552909851074219, 0.7531890869140625, 210.487548828125, -73.17306518554688, 122.82366943359375, 201.2438201904297, 53.616485595703125, 76.31686401367188, 205.76364135742188, 167.111572265625, 186.27883911132812, -26.825180053710938, -41.88526153564453, -169.29562377929688, -1.334014892578125, 162.41140747070312, 112.0719223022461, -130.43247985839844, -104.92472076416016, 76.8041763305664, 8.791778564453125, 75.4369888305664, 181.04244995117188, 134.2677764892578], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000509.npy"}
{"epoch": 0.7694633408919124, "step": 510, "batch_size": 64, "mean": 71.32635498046875, "std": 118.16532897949219, "min": -189.35598754882812, "p10": -82.68860397338862, "median": 56.16164779663086, "p90": 225.7984619140625, "max": 334.3409118652344, "pos_frac": 0.75, "sample": [68.98124694824219, 182.97853088378906, 180.95932006835938, 158.334228515625, 12.207672119140625, 23.00616455078125, 16.824810028076172, 218.215087890625, -2.7569961547851562, 5.477294921875, -38.701316833496094, 33.94127655029297, 29.010204315185547, 47.21525573730469, -152.26673889160156, -11.555000305175781, 7.3593292236328125, 58.41239929199219, -20.216094970703125, 15.063982009887695, 67.01127624511719, 184.288818359375, 204.5957489013672, -101.54029846191406, 5.9560546875, 183.65870666503906, 46.654869079589844, -28.944015502929688, -0.08944320678710938, 155.4066162109375, 47.36561584472656, 128.15863037109375, 262.95947265625, 80.46476745605469, 11.816593170166016, 152.18292236328125, 98.85997772216797, 246.23947143554688, 66.82183837890625, -5.483642578125, 8.78156852722168, -166.95516967773438, 181.491943359375, 62.580013275146484, 244.6243896484375, 27.56592559814453, 153.73532104492188, 234.23049926757812, 324.0995178222656, -105.31268310546875, 53.91089630126953, 71.05990600585938, 180.1676025390625, 168.15936279296875, -129.54373168945312, -0.4148368835449219, 224.52206420898438, 226.34548950195312, -9.874755859375, 112.83948516845703, 334.3409118652344, 74.54414367675781, -155.52984619140625, -189.35598754882812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000510.npy"}
{"epoch": 0.7709750566893424, "step": 511, "batch_size": 64, "mean": 83.08503723144531, "std": 101.5050048828125, "min": -140.3784942626953, "p10": -39.9903953552246, "median": 77.83048629760742, "p90": 204.09449768066406, "max": 250.32073974609375, "pos_frac": 0.765625, "sample": [205.21734619140625, 199.43772888183594, 80.87028503417969, 190.7220916748047, 19.78287124633789, -16.080368041992188, 43.2443733215332, -3.9045028686523438, 163.84280395507812, 200.48837280273438, 105.51567077636719, -63.957374572753906, 15.526748657226562, 58.942161560058594, 185.49298095703125, 229.7095489501953, 164.69723510742188, -43.32477569580078, 141.76490783691406, 116.51924133300781, 190.825439453125, -32.210174560546875, 74.79068756103516, -93.25801849365234, -123.03305053710938, -5.499725341796875, 208.05148315429688, 145.7766571044922, 136.09689331054688, 18.609481811523438, 36.215850830078125, 159.70297241210938, 18.632110595703125, 58.1561279296875, 176.76470947265625, 136.7395477294922, 172.97808837890625, 34.874385833740234, -16.728954315185547, 17.295866012573242, 45.56373596191406, 86.31710815429688, 189.60069274902344, 219.94097900390625, 250.32073974609375, 155.08737182617188, -140.3784942626953, 50.091766357421875, -18.852983474731445, 186.5053253173828, 44.783424377441406, 0.1869354248046875, 138.45034790039062, 172.46005249023438, 8.71063232421875, 58.176841735839844, -100.38507080078125, 229.866943359375, 212.73011779785156, 150.66952514648438, 201.47451782226562, -4.077747344970703, -105.853271484375, -23.23537826538086], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000511.npy"}
{"epoch": 0.7724867724867724, "step": 512, "batch_size": 64, "mean": 55.1219596862793, "std": 104.09164428710938, "min": -264.154296875, "p10": -70.72918128967284, "median": 44.67077445983887, "p90": 181.35051727294922, "max": 217.75555419921875, "pos_frac": 0.734375, "sample": [168.27554321289062, 73.63948059082031, 26.23111915588379, -157.35792541503906, -77.46751403808594, 183.63743591308594, 48.066402435302734, 18.23963737487793, 49.880165100097656, 141.99546813964844, 158.86170959472656, 204.3774871826172, -264.154296875, -55.006404876708984, 11.603141784667969, -4.0128326416015625, -16.227210998535156, -0.07986831665039062, 165.97935485839844, 4.792879104614258, 197.5408935546875, 179.06814575195312, 17.40892791748047, 153.11807250976562, -27.850265502929688, 147.38653564453125, -95.44013977050781, 181.63485717773438, -22.11994171142578, 162.49049377441406, 2.913837432861328, 144.905517578125, 141.208984375, 5.488424301147461, 144.9197540283203, 29.553466796875, 5.532745361328125, 103.43743896484375, 2.135904312133789, 188.43878173828125, 0.0955657958984375, -145.1306610107422, -9.692743301391602, -34.1590576171875, 101.5076904296875, 215.40574645996094, 49.853736877441406, 28.19723129272461, 52.10845947265625, 3.891803741455078, 172.11143493652344, -4.838127136230469, 41.275146484375, -135.16265869140625, 76.00955200195312, 147.2172393798828, 217.75555419921875, 97.62059783935547, 180.6870574951172, 158.97650146484375, 39.89031219482422, -43.20716094970703, 86.46118927001953, -112.11515045166016], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000512.npy"}
{"epoch": 0.7739984882842026, "step": 513, "batch_size": 64, "mean": 69.51165771484375, "std": 91.24283599853516, "min": -166.83984375, "p10": -23.918778991699217, "median": 60.93899917602539, "p90": 196.7213317871094, "max": 297.67327880859375, "pos_frac": 0.78125, "sample": [277.4180908203125, 146.1811065673828, 183.21971130371094, 108.287353515625, 141.08392333984375, 64.04483032226562, 151.987548828125, -0.2657890319824219, -0.9491081237792969, -5.375877380371094, 65.21300506591797, -91.14262390136719, 170.31564331054688, 117.58297729492188, 105.70046997070312, 105.39253997802734, -26.74332046508789, 12.023551940917969, 194.84010314941406, 15.557243347167969, 41.15736389160156, 208.96969604492188, 126.3855209350586, 84.45809936523438, 206.59390258789062, 4.6343841552734375, -23.026870727539062, 117.7796630859375, 32.27363586425781, 86.83271026611328, 18.128623962402344, 81.34359741210938, 127.10216522216797, -166.83984375, 57.833168029785156, 14.709993362426758, -49.560630798339844, 178.07858276367188, 297.67327880859375, 18.915546417236328, 1.6089744567871094, -1.7314739227294922, 226.46585083007812, 8.527191162109375, 143.71295166015625, 197.52757263183594, 212.84796142578125, -9.177421569824219, 157.67660522460938, 96.52726745605469, 67.9490737915039, -4.6251373291015625, 34.81713104248047, 75.68269348144531, -24.301025390625, -71.31605529785156, 36.104774475097656, 15.490764617919922, 4.788616180419922, 31.286218643188477, 94.97445678710938, -56.97606658935547, 9.134292602539062, 3.937175750732422], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000513.npy"}
{"epoch": 0.7755102040816326, "step": 514, "batch_size": 64, "mean": 75.80330657958984, "std": 106.46144104003906, "min": -214.25323486328125, "p10": -62.08946723937986, "median": 91.07062149047852, "p90": 188.18272399902344, "max": 350.5181884765625, "pos_frac": 0.75, "sample": [6.062314987182617, 131.38507080078125, 114.65325927734375, 108.91416931152344, 101.24739074707031, 85.40839385986328, 162.76724243164062, -22.301551818847656, 67.46853637695312, -17.71518325805664, 29.433292388916016, 197.98634338378906, 134.54161071777344, 90.77615356445312, -11.108652114868164, 71.2762222290039, -110.14157104492188, 136.4066619873047, 10.611640930175781, 65.97594451904297, -185.9400634765625, -0.39867401123046875, 178.80641174316406, 145.60348510742188, 205.24728393554688, 9.566970825195312, 63.154144287109375, 160.96286010742188, 125.30067443847656, 222.1112060546875, -74.51510620117188, -78.15121459960938, 161.91384887695312, 187.41195678710938, 23.335960388183594, 91.80623626708984, 101.488037109375, 170.15283203125, -13.670684814453125, 130.75941467285156, 49.799705505371094, -92.83393859863281, -33.096309661865234, 15.37750244140625, 166.08851623535156, 91.3650894165039, 174.4465789794922, -32.957672119140625, 20.031902313232422, 115.47762298583984, -31.29254913330078, 43.20701217651367, -6.5140838623046875, 187.77182006835938, -214.25323486328125, 131.2923583984375, 210.14378356933594, 173.59597778320312, 250.53411865234375, 5.540899276733398, 350.5181884765625, 188.35882568359375, -76.78025817871094, 186.99684143066406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000514.npy"}
{"epoch": 0.7770219198790628, "step": 515, "batch_size": 64, "mean": 59.75636291503906, "std": 103.64120483398438, "min": -220.9852294921875, "p10": -83.10757217407226, "median": 62.58243179321289, "p90": 198.46987457275392, "max": 224.25572204589844, "pos_frac": 0.703125, "sample": [-16.43625259399414, 121.0448989868164, 16.76177215576172, 48.378028869628906, 224.25572204589844, 166.92352294921875, 17.22030258178711, 185.3719940185547, 7.4203643798828125, 194.96185302734375, 78.1888198852539, 47.019203186035156, 101.25440216064453, -9.240297317504883, 126.51988220214844, 79.25042724609375, -107.00643920898438, 201.60484313964844, -81.61840057373047, -143.5946044921875, 84.08673858642578, 1.0137252807617188, -29.52532958984375, 40.65272521972656, -220.9852294921875, 127.27400207519531, 190.68374633789062, 60.92039489746094, -31.680152893066406, -40.31951141357422, 126.03174591064453, 161.41258239746094, -0.8146629333496094, -0.39555931091308594, 46.043968200683594, 172.3092041015625, -104.33561706542969, 170.75975036621094, -13.995428085327148, 117.12898254394531, -94.29023742675781, 96.46400451660156, 215.3579559326172, -138.68606567382812, 88.01466369628906, 208.8546600341797, 25.825578689575195, -22.39777183532715, 64.24446868896484, 189.8099822998047, 23.28331756591797, 135.09033203125, 115.636474609375, 199.9733123779297, 218.04966735839844, -83.74578857421875, 141.17552185058594, 3.7581520080566406, 206.82276916503906, -51.750144958496094, 73.51366424560547, 90.16302490234375, -2.5206165313720703, 7.214134216308594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000515.npy"}
{"epoch": 0.7785336356764928, "step": 516, "batch_size": 64, "mean": 59.737335205078125, "std": 102.1108627319336, "min": -219.19894409179688, "p10": -60.548143768310545, "median": 62.63837814331055, "p90": 174.5948913574219, "max": 221.33534240722656, "pos_frac": 0.75, "sample": [153.66932678222656, 202.12940979003906, 158.29965209960938, -85.3699951171875, -60.26301574707031, 13.768375396728516, -51.87354278564453, 145.3956298828125, 175.48379516601562, 123.3035888671875, 50.85405731201172, 29.20775032043457, 151.75543212890625, 218.1876678466797, 1.3194999694824219, 13.18663215637207, 110.80844116210938, 173.44615173339844, 221.33534240722656, 160.68409729003906, -60.09864044189453, -194.96034240722656, 78.48414611816406, 99.0817642211914, 162.38333129882812, 67.95696258544922, 165.1995391845703, 9.469680786132812, 169.0494842529297, 127.9881362915039, -219.19894409179688, 20.241348266601562, 0.5223255157470703, -28.01384735107422, -12.265174865722656, 174.24600219726562, 174.74441528320312, -5.480535507202148, 3.458454132080078, 11.534595489501953, 133.81849670410156, -174.08010864257812, 113.74470520019531, 191.95892333984375, 57.319793701171875, 42.751617431640625, 171.2034149169922, 52.744686126708984, 81.87315368652344, 109.49571228027344, 4.59698486328125, 155.2049560546875, 11.360076904296875, -60.67034149169922, -96.79214477539062, 113.6256103515625, -33.788143157958984, 101.724365234375, -6.8850860595703125, 10.688297271728516, 82.15742492675781, -2.444103240966797, -72.3181381225586, 186.22824096679688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000516.npy"}
{"epoch": 0.780045351473923, "step": 517, "batch_size": 64, "mean": 41.239402770996094, "std": 96.82835388183594, "min": -142.74337768554688, "p10": -96.07858505249023, "median": 22.76909637451172, "p90": 184.38735198974612, "max": 244.17715454101562, "pos_frac": 0.65625, "sample": [14.81875228881836, -12.693000793457031, -9.899288177490234, 103.20387268066406, -60.153175354003906, 20.940021514892578, -53.80584716796875, 94.53028869628906, 215.07855224609375, 30.537967681884766, 6.0030364990234375, -17.309112548828125, -72.76034545898438, -3.0804214477539062, 24.574981689453125, -60.220237731933594, -3.9864749908447266, -4.590660095214844, 43.52113342285156, 13.816482543945312, 112.46449279785156, -123.39404296875, 231.20785522460938, 163.6041259765625, -142.74337768554688, 56.11836242675781, -11.893096923828125, 90.97843170166016, 54.680877685546875, 167.09469604492188, 63.624366760253906, 163.7815704345703, 6.067293167114258, 57.56311798095703, -93.231689453125, -97.2986831665039, 21.040084838867188, 154.90077209472656, -38.99664306640625, -123.06127166748047, 73.01932525634766, 194.6796875, 244.17715454101562, 45.42634201049805, 104.53242492675781, 158.80081176757812, 8.372386932373047, 80.62184143066406, -104.73871612548828, -32.50200653076172, -98.27745819091797, 96.67657470703125, 15.498451232910156, 179.27059936523438, 219.57891845703125, 190.19244384765625, 50.60285949707031, -10.012371063232422, 24.49810791015625, 18.495304107666016, 186.5802459716797, -109.73103332519531, 110.56678771972656, 11.959213256835938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000517.npy"}
{"epoch": 0.781557067271353, "step": 518, "batch_size": 64, "mean": 61.39295959472656, "std": 99.63323211669922, "min": -171.6096954345703, "p10": -58.56898727416991, "median": 37.2015266418457, "p90": 203.81177673339846, "max": 295.6920166015625, "pos_frac": 0.71875, "sample": [101.56541442871094, -1.56793212890625, -2.0140151977539062, 199.94692993164062, -22.115570068359375, 6.644157409667969, 20.45067596435547, 126.0821533203125, -49.30801773071289, 30.515945434570312, 70.37722778320312, -63.98890686035156, 41.022682189941406, 89.75334167480469, 112.13875579833984, -74.73713684082031, 44.46293640136719, 125.21956634521484, -84.68500518798828, 173.69830322265625, 287.5556640625, -171.6096954345703, 0.07590484619140625, 111.11943054199219, 147.28150939941406, 165.15943908691406, -62.368560791015625, 138.8236846923828, -80.17732238769531, 131.89599609375, 7.156467437744141, -42.23534393310547, 3.7574901580810547, 103.28663635253906, 33.38037109375, -1.1873931884765625, 3.4118919372558594, -0.1040496826171875, 94.40370178222656, 263.11383056640625, 18.008712768554688, 67.61128997802734, 0.1856403350830078, 205.4681396484375, 125.83570861816406, 295.6920166015625, -65.17796325683594, 167.76107788085938, 217.11483764648438, -49.70331573486328, -10.08073616027832, -15.435434341430664, 219.44921875, 0.5606155395507812, 43.08531951904297, 169.51608276367188, 143.11920166015625, 54.87834167480469, -1.4507617950439453, 4.282012939453125, 3.5834884643554688, 6.33949089050293, 224.888916015625, 127.41644287109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000518.npy"}
{"epoch": 0.783068783068783, "step": 519, "batch_size": 64, "mean": 36.266517639160156, "std": 124.37303924560547, "min": -304.35986328125, "p10": -119.42066345214843, "median": 36.984825134277344, "p90": 189.5608856201172, "max": 228.68017578125, "pos_frac": 0.640625, "sample": [-148.48793029785156, 177.7293701171875, 94.454345703125, -196.2983856201172, -43.76301956176758, 186.2643280029297, -7.503322601318359, 228.68017578125, 25.382308959960938, 164.49407958984375, 47.846588134765625, 31.58463478088379, 170.19869995117188, -40.495689392089844, 72.82178497314453, 97.39188385009766, 187.3097686767578, 22.23381996154785, -169.1726531982422, -41.002899169921875, 181.56919860839844, 179.6797637939453, -181.7133026123047, 218.90724182128906, 143.5845184326172, 192.94308471679688, -85.29507446289062, 3.7537307739257812, 67.34963989257812, 94.50552368164062, -63.3468017578125, 164.4000244140625, 36.18898010253906, -81.85665893554688, 90.42059326171875, -65.941650390625, -80.75198364257812, 2.239574432373047, 217.6851043701172, -110.42230224609375, 111.82118225097656, 5.1352386474609375, 181.82101440429688, 70.72180938720703, 42.854957580566406, -304.35986328125, 61.45292663574219, -54.729278564453125, 37.780670166015625, -123.04154205322266, 186.0186004638672, -63.69115447998047, -124.62876892089844, 223.69412231445312, 59.51860046386719, 21.61009979248047, 210.79505920410156, 117.61225128173828, 0.7382469177246094, -110.9719467163086, -65.25614929199219, -36.666481018066406, 190.52565002441406, -101.26545715332031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000519.npy"}
{"epoch": 0.7845804988662132, "step": 520, "batch_size": 64, "mean": 52.05445861816406, "std": 96.09293365478516, "min": -156.1901092529297, "p10": -55.38065185546874, "median": 36.65928649902344, "p90": 175.20543518066407, "max": 311.38018798828125, "pos_frac": 0.703125, "sample": [149.87364196777344, -23.92392921447754, -2.637439727783203, -13.93581771850586, 175.63734436035156, 15.74383544921875, 83.58404541015625, 7.762290954589844, -1.9925804138183594, 311.38018798828125, 149.64256286621094, 7.338434219360352, -15.291351318359375, -57.72486877441406, 78.39981079101562, 79.61629486083984, 14.221923828125, -49.91081237792969, 10.033077239990234, 174.19764709472656, 263.84747314453125, 87.65884399414062, 41.082862854003906, 196.29811096191406, 76.60760498046875, 59.815162658691406, 18.531295776367188, -89.99388122558594, -112.5027847290039, 141.48907470703125, 58.20133972167969, -1.5902328491210938, -2.5282058715820312, 22.385841369628906, 4.9522552490234375, -0.9539756774902344, 189.8994903564453, 105.78063201904297, 14.8070068359375, -95.74427032470703, -156.1901092529297, 40.33268737792969, 59.727081298828125, -120.37763977050781, 76.59262084960938, 68.60192108154297, 11.376754760742188, 130.76116943359375, 116.90806579589844, 33.24272155761719, 40.07585144042969, 196.0634002685547, 160.51217651367188, 60.82945251464844, 245.72740173339844, 31.91144371032715, 79.86270141601562, 1.963226318359375, -10.442996978759766, -0.31871795654296875, -4.670310974121094, 174.0908966064453, 172.21791076660156, -147.3704376220703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000520.npy"}
{"epoch": 0.7860922146636432, "step": 521, "batch_size": 64, "mean": 77.72957611083984, "std": 99.02833557128906, "min": -201.59756469726562, "p10": -18.06523132324218, "median": 78.72232818603516, "p90": 184.01058349609374, "max": 340.0034484863281, "pos_frac": 0.734375, "sample": [171.63418579101562, 56.20869445800781, 160.43807983398438, 59.8756103515625, 175.66180419921875, 125.6080551147461, 177.29714965820312, -5.7662811279296875, -7.9087371826171875, 52.313629150390625, 63.54866027832031, -30.934795379638672, 176.34542846679688, 75.1356430053711, 149.03932189941406, 105.56830596923828, 14.03776741027832, 24.56684112548828, -5.7769622802734375, 181.4512481689453, -10.101112365722656, 182.034912109375, 42.154266357421875, 13.384002685546875, -9.46617317199707, 145.33670043945312, -106.33491516113281, 184.8572998046875, -8.44390869140625, 180.8622589111328, 233.6214141845703, 340.0034484863281, -120.41643524169922, 21.787982940673828, -7.434211730957031, -20.837745666503906, -7.563789367675781, 107.87237548828125, 3.0117340087890625, 229.51904296875, 227.96604919433594, 19.32827377319336, -73.69597625732422, 85.78592681884766, -2.462301254272461, 17.277746200561523, 160.2873077392578, 103.68568420410156, 6.476894378662109, 107.38227844238281, 152.47525024414062, 106.23676300048828, -201.59756469726562, 259.1815490722656, 143.87744140625, 100.69451904296875, 59.15870666503906, -28.257293701171875, 89.94258117675781, -11.596031188964844, 122.74293518066406, 82.30901336669922, 186.19268798828125, 149.1097412109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000521.npy"}
{"epoch": 0.7876039304610734, "step": 522, "batch_size": 64, "mean": 49.74985122680664, "std": 117.78759765625, "min": -255.06106567382812, "p10": -87.8678123474121, "median": 44.07674789428711, "p90": 204.33154907226563, "max": 263.2784118652344, "pos_frac": 0.65625, "sample": [-53.436309814453125, 83.7160873413086, 4.673669815063477, 78.42798614501953, 95.7739486694336, 185.70530700683594, 55.46839141845703, 227.437744140625, 176.921142578125, 75.48157501220703, -164.38795471191406, -129.74600219726562, 59.93553161621094, 205.79824829101562, 42.02619934082031, 172.29495239257812, -58.035194396972656, 82.96038818359375, 150.31507873535156, 13.245817184448242, 203.9728546142578, -1.2021141052246094, 61.81139373779297, -238.8231201171875, 263.2784118652344, 1.452484130859375, -1.9607353210449219, -77.40270233154297, -255.06106567382812, 12.235109329223633, -22.396881103515625, 193.92776489257812, -41.14353942871094, 216.55699157714844, 155.72959899902344, 86.11369323730469, -9.865646362304688, 51.87556457519531, 6.060337066650391, 46.127296447753906, 41.5216064453125, -8.689865112304688, 8.006477355957031, -28.36054229736328, 121.60658264160156, -0.8484420776367188, 125.5775146484375, -191.775146484375, 204.4852752685547, -59.62042236328125, -14.104637145996094, -103.04914855957031, 163.33721923828125, 262.52105712890625, -11.469253540039062, 12.922233581542969, -92.35285949707031, 243.1724090576172, 148.6919708251953, 134.38494873046875, 8.185647964477539, -23.91281509399414, 110.3316650390625, 177.566650390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000522.npy"}
{"epoch": 0.7891156462585034, "step": 523, "batch_size": 64, "mean": 57.53034210205078, "std": 107.86752319335938, "min": -286.71795654296875, "p10": -93.21825942993162, "median": 84.8509521484375, "p90": 173.8283462524414, "max": 222.59510803222656, "pos_frac": 0.75, "sample": [30.585865020751953, 210.85818481445312, -2.4095726013183594, -19.66980743408203, 126.62781524658203, 95.2249984741211, 85.45228576660156, 35.91539764404297, 100.6712646484375, 37.05958557128906, 0.020933151245117188, 84.24961853027344, 211.0179443359375, -132.27325439453125, 66.55449676513672, 87.60955810546875, 121.85542297363281, -8.575780868530273, 66.30270385742188, -160.67562866210938, 11.900543212890625, 186.76602172851562, -58.91558837890625, 90.83793640136719, -99.24380493164062, 167.11447143554688, -61.699180603027344, 29.116741180419922, 3.3172988891601562, 153.9647674560547, 23.99098777770996, 136.9806671142578, 141.55801391601562, 124.8464584350586, 161.2440948486328, 160.36151123046875, 172.24256896972656, 17.37725830078125, 18.69902801513672, -286.71795654296875, 90.52566528320312, -151.888671875, 221.9814453125, 108.0868148803711, 167.24766540527344, -10.478981018066406, 8.931705474853516, 130.95220947265625, 119.59510803222656, -79.15865325927734, 222.59510803222656, 52.40669631958008, 144.17318725585938, -112.1163558959961, 146.32760620117188, 174.50796508789062, 96.44657897949219, 200.11727905273438, -8.510345458984375, -173.85455322265625, -54.37706756591797, 99.78858947753906, 23.14947509765625, 135.34970092773438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000523.npy"}
{"epoch": 0.7906273620559335, "step": 524, "batch_size": 64, "mean": 60.818992614746094, "std": 93.69459533691406, "min": -188.05853271484375, "p10": -51.64023284912108, "median": 52.268232345581055, "p90": 177.50223236083986, "max": 285.7808532714844, "pos_frac": 0.796875, "sample": [167.5515594482422, 31.579910278320312, 219.30377197265625, 66.801025390625, -22.56500244140625, -16.521522521972656, -74.13741302490234, -38.73004150390625, 98.04804992675781, 86.61032104492188, 201.65725708007812, 0.7541656494140625, -70.10066223144531, 82.05802917480469, 46.70428466796875, 209.92599487304688, -175.21536254882812, -57.17317199707031, 17.443073272705078, 175.53857421875, 171.5801544189453, -188.05853271484375, 154.83106994628906, -1.1718902587890625, 17.216796875, 22.66028594970703, 178.08468627929688, 99.72284698486328, 109.53190612792969, 7.36376953125, -10.089088439941406, 95.66600799560547, 39.12890625, 96.29610443115234, 89.5675048828125, 52.713497161865234, 33.856529235839844, 161.5420379638672, 22.518905639648438, -73.59776306152344, 45.853668212890625, 3.4654388427734375, 6.213356018066406, 285.7808532714844, 42.511009216308594, 134.70840454101562, 176.14317321777344, 5.791744232177734, 56.35516357421875, 51.822967529296875, 111.5208969116211, 30.96501922607422, 55.59180450439453, 198.09432983398438, 82.38011169433594, 134.85690307617188, 183.37606811523438, 98.0590591430664, -130.64349365234375, 15.759895324707031, 169.7744140625, -3.8294601440429688, 101.06101989746094, 7.9067535400390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000524.npy"}
{"epoch": 0.7921390778533636, "step": 525, "batch_size": 64, "mean": 51.157466888427734, "std": 104.68022918701172, "min": -218.05404663085938, "p10": -67.43512725830077, "median": 28.88530731201172, "p90": 187.2137008666992, "max": 294.766357421875, "pos_frac": 0.640625, "sample": [-31.425323486328125, -189.84446716308594, 1.62652587890625, -37.42893600463867, -116.22667694091797, 51.371612548828125, 16.676376342773438, 60.3671875, 232.3309326171875, -7.931434631347656, 151.51768493652344, -7.206634521484375, 145.6892547607422, 27.09149169921875, 141.00103759765625, 209.31890869140625, 103.43663024902344, -27.44377899169922, 181.8446807861328, -55.5491943359375, 91.87548065185547, -2.4635353088378906, 216.76400756835938, 170.84230041503906, -17.30652618408203, 94.01316833496094, -14.398773193359375, 106.46427917480469, 49.614112854003906, 111.99555969238281, 3.8300628662109375, 193.38845825195312, 30.679122924804688, 106.91871643066406, 82.79518127441406, -27.768478393554688, 45.31370162963867, 2.305145263671875, 294.766357421875, -15.08616828918457, -72.52909851074219, 55.05653381347656, 0.6373100280761719, -218.05404663085938, -17.552940368652344, 187.84877014160156, -1.5516853332519531, 22.016212463378906, 23.4132080078125, 154.29705810546875, -81.32970428466797, 185.73187255859375, 9.364631652832031, -29.893310546875, 150.68460083007812, 253.47451782226562, -102.09823608398438, 82.10758972167969, 117.61231994628906, 124.21261596679688, -12.459373474121094, 175.011962890625, -2.954252243041992, -102.72682189941406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000525.npy"}
{"epoch": 0.7936507936507936, "step": 526, "batch_size": 64, "mean": 43.982322692871094, "std": 98.2674560546875, "min": -183.828857421875, "p10": -78.08744354248046, "median": 53.56996154785156, "p90": 162.79382781982426, "max": 271.3304138183594, "pos_frac": 0.75, "sample": [210.20545959472656, 9.846111297607422, -38.988563537597656, 4.736021041870117, -94.90184783935547, 8.361946105957031, 0.9122276306152344, 264.5850830078125, -149.2607879638672, -157.0281524658203, 59.035064697265625, 75.37632751464844, -149.97760009765625, 22.034332275390625, 149.9560089111328, -34.07970428466797, 75.6297607421875, 154.4358367919922, 19.712318420410156, 140.69140625, 69.64991760253906, 200.58294677734375, 31.631362915039062, 120.84060668945312, 181.26144409179688, 8.771610260009766, 56.42426300048828, 138.50762939453125, -70.36613464355469, 124.132080078125, -4.431665420532227, 68.19032287597656, 50.715660095214844, -22.497255325317383, 166.37582397460938, 0.3287639617919922, 109.7097396850586, 13.848609924316406, 32.833221435546875, 14.004524230957031, 77.7215805053711, 26.843090057373047, -38.28303527832031, 121.05338287353516, 56.54777145385742, -176.28204345703125, 0.000408172607421875, 148.575439453125, 113.19424438476562, -22.582550048828125, 80.63922119140625, -17.012630462646484, 271.3304138183594, 58.38200759887695, 185.06594848632812, 6.04766845703125, -81.39657592773438, 67.57527160644531, 66.23625946044922, 61.511016845703125, 88.5517578125, -35.68574523925781, 78.86982727050781, -183.828857421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000526.npy"}
{"epoch": 0.7951625094482238, "step": 527, "batch_size": 64, "mean": 39.389434814453125, "std": 106.23471069335938, "min": -204.76370239257812, "p10": -87.69841537475584, "median": 19.542469024658203, "p90": 191.19706878662112, "max": 289.0994873046875, "pos_frac": 0.640625, "sample": [0.07871055603027344, 22.039657592773438, 38.87389373779297, 105.93901062011719, 19.261734008789062, 2.289203643798828, 223.89706420898438, 82.36740112304688, 169.94590759277344, 289.0994873046875, 171.8245391845703, -37.855072021484375, 185.0844268798828, -0.8800582885742188, 39.9937744140625, -97.42786407470703, 215.44573974609375, 0.5733127593994141, 105.32720947265625, -64.99636840820312, -138.16387939453125, -48.996742248535156, -14.720260620117188, -108.64134216308594, 82.64707946777344, 11.482757568359375, 234.25424194335938, 53.09602355957031, 13.053966522216797, 13.466827392578125, 98.0889892578125, 22.78466033935547, -54.795318603515625, 43.50454330444336, 34.32138442993164, 193.8167724609375, 47.728294372558594, -20.3001708984375, -12.932798385620117, 171.67440795898438, 72.05084228515625, 12.816631317138672, 195.74169921875, -5.473865509033203, -31.17120361328125, -17.29430389404297, -137.5958251953125, 184.9610595703125, -6.5352630615234375, -3.302133560180664, -19.46771240234375, 0.3693218231201172, 135.34231567382812, 131.95504760742188, -204.76370239257812, -35.697998046875, 109.10733032226562, -13.957054138183594, 98.56825256347656, 205.88743591308594, -164.73544311523438, -193.78810119628906, 95.83200073242188, 19.823204040527344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000527.npy"}
{"epoch": 0.7966742252456538, "step": 528, "batch_size": 64, "mean": 44.45103454589844, "std": 112.75184631347656, "min": -230.5989990234375, "p10": -62.3105842590332, "median": 22.228364944458008, "p90": 197.68669128417972, "max": 279.18408203125, "pos_frac": 0.703125, "sample": [139.9379119873047, -230.5989990234375, 200.046142578125, -7.904863357543945, 53.435447692871094, 127.48119354248047, 73.371826171875, -9.581985473632812, 17.372451782226562, -4.842948913574219, -3.6211776733398438, -53.81498718261719, 200.48114013671875, -178.96084594726562, 3.7531375885009766, -1.669891357421875, 170.19866943359375, 12.134817123413086, 16.618894577026367, -64.53549194335938, -54.508445739746094, 275.7882385253906, -1.1983718872070312, -204.94595336914062, -8.122673034667969, 31.762489318847656, -57.11913299560547, -33.669029235839844, 23.048307418823242, 172.11538696289062, 82.55291748046875, 20.350082397460938, 33.83297348022461, 220.9788818359375, 125.54702758789062, 13.618085861206055, 214.4644012451172, 0.7513351440429688, 79.12983703613281, 107.228515625, 192.18130493164062, 204.42991638183594, 26.318082809448242, 21.408422470092773, -178.3402862548828, 116.58341979980469, 12.441303253173828, 11.863555908203125, 155.31820678710938, 1.0052947998046875, 56.52433776855469, -167.674072265625, -180.7663116455078, 279.18408203125, 177.23526000976562, 64.42295837402344, 5.043369293212891, -7.660789489746094, 172.91873168945312, 5.01396369934082, 137.44850158691406, 64.8058090209961, 135.08917236328125, 39.16670227050781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000528.npy"}
{"epoch": 0.7981859410430839, "step": 529, "batch_size": 64, "mean": 92.30682373046875, "std": 94.47373962402344, "min": -168.68125915527344, "p10": -10.774095153808593, "median": 101.57752990722656, "p90": 200.46627197265627, "max": 252.00003051757812, "pos_frac": 0.828125, "sample": [109.95741271972656, -55.498409271240234, 97.38453674316406, -4.384941101074219, 56.090091705322266, 1.6752471923828125, 252.00003051757812, 196.91297912597656, -40.29478073120117, 16.682754516601562, 10.282567977905273, 0.9120082855224609, 19.63885498046875, 98.72830963134766, 164.30699157714844, 137.2645263671875, -69.53170013427734, 169.1945037841797, -7.766998291015625, 161.620361328125, 206.20669555664062, -11.243545532226562, -9.6787109375, 188.86251831054688, 95.3422622680664, 178.2889404296875, -11.748527526855469, 29.46532440185547, 176.7314910888672, 29.581588745117188, 215.16616821289062, 157.82289123535156, 110.97824096679688, 187.56495666503906, 109.99565124511719, 111.98217010498047, 181.46253967285156, 191.25343322753906, 28.40145492553711, 223.17520141601562, 216.46685791015625, 187.81076049804688, 232.0546112060547, 27.428329467773438, 155.85836791992188, 181.12435913085938, 65.83320617675781, -2.3887062072753906, 107.7956314086914, 37.1077880859375, 43.993019104003906, 87.57537078857422, 48.404457092285156, 197.60256958007812, -120.89947509765625, 201.69357299804688, 104.42675018310547, 192.22471618652344, 123.9053955078125, 67.97813415527344, -168.68125915527344, 8.469581604003906, 9.4722900390625, 197.59518432617188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000529.npy"}
{"epoch": 0.799697656840514, "step": 530, "batch_size": 64, "mean": 55.68548583984375, "std": 97.39543914794922, "min": -186.92367553710938, "p10": -26.4978645324707, "median": 32.125099182128906, "p90": 184.23511505126953, "max": 297.4369201660156, "pos_frac": 0.734375, "sample": [265.171875, 69.09992980957031, 16.0767765045166, 0.6129646301269531, 7.650794982910156, 4.483428955078125, 245.76138305664062, -11.455909729003906, 157.9864044189453, 153.28538513183594, 15.869159698486328, 210.47225952148438, -22.29960060119629, 34.49644470214844, 112.74864196777344, 59.857505798339844, 81.15388488769531, 100.49310302734375, 161.99392700195312, 24.185386657714844, 1.2236671447753906, -5.992889404296875, -186.92367553710938, 165.34848022460938, 133.7887420654297, -0.1917400360107422, 88.81623840332031, 71.89738464355469, 20.02977752685547, 186.09732055664062, 18.747764587402344, 16.60381317138672, 225.44210815429688, 297.4369201660156, 40.23432922363281, 52.037742614746094, 52.11064529418945, 97.12226867675781, -84.69625854492188, 8.629940032958984, -23.070785522460938, 79.59483337402344, -18.154563903808594, 52.71923828125, -93.77326202392578, -23.299644470214844, 181.61744689941406, -16.059616088867188, -133.78465270996094, -105.89218139648438, -7.531839370727539, -78.07095336914062, 184.95126342773438, -13.961132049560547, 182.56410217285156, 97.71348571777344, 137.04417419433594, -27.8685302734375, 4.659210205078125, 11.362960815429688, 176.7086181640625, 78.85040283203125, 2.3926963806152344, 29.753753662109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000530.npy"}
{"epoch": 0.8012093726379441, "step": 531, "batch_size": 64, "mean": 59.49530792236328, "std": 95.9061050415039, "min": -203.3038787841797, "p10": -45.39587860107421, "median": 46.038856506347656, "p90": 189.49003143310546, "max": 247.65350341796875, "pos_frac": 0.71875, "sample": [-50.12379455566406, 119.24406433105469, 220.50973510742188, 124.36885070800781, 81.94320678710938, 89.30072021484375, 3.5408782958984375, 247.65350341796875, 49.852630615234375, -12.886547088623047, 61.28553009033203, 0.6063823699951172, 239.30078125, 0.6402740478515625, 132.26925659179688, 172.27606201171875, -139.92449951171875, -63.91674041748047, 22.915245056152344, 189.5292205810547, 77.08360290527344, 224.6129608154297, -67.7381820678711, -4.487632751464844, -34.36407470703125, 18.180009841918945, 90.53856658935547, 73.645263671875, 13.305572509765625, 1.4776687622070312, 60.30274200439453, -10.165727615356445, -3.283803939819336, 23.2130126953125, -203.3038787841797, -4.394412994384766, 107.7093505859375, 149.33895874023438, 3.4365158081054688, -20.758041381835938, 42.22508239746094, 85.96504974365234, 184.7943572998047, 184.57199096679688, -66.25465393066406, -69.73531341552734, 189.39859008789062, 163.2475128173828, 99.63056945800781, 30.449920654296875, -17.729888916015625, 26.915401458740234, 88.43115234375, 188.27418518066406, 91.36294555664062, -32.88472366333008, 62.784698486328125, -20.389202117919922, 182.2421112060547, 201.58978271484375, -3.3390426635742188, 0.8990535736083984, 197.20416259765625, 15.312736511230469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000531.npy"}
{"epoch": 0.8027210884353742, "step": 532, "batch_size": 64, "mean": 63.0866584777832, "std": 99.0206069946289, "min": -166.156494140625, "p10": -73.83313674926755, "median": 61.395774841308594, "p90": 193.56812591552736, "max": 245.35194396972656, "pos_frac": 0.71875, "sample": [98.11337280273438, 83.29512786865234, 194.79827880859375, 187.44073486328125, -29.765838623046875, -23.81090545654297, 130.04058837890625, 191.37783813476562, -0.3593635559082031, 132.8874969482422, -103.12989044189453, 147.8266143798828, 161.00863647460938, 59.943267822265625, 51.18247985839844, 29.09674072265625, 63.197330474853516, 99.43328857421875, 3.7630138397216797, -108.46945190429688, -48.0460205078125, 12.115167617797852, 182.52430725097656, 52.30250549316406, -17.819061279296875, 62.84828186035156, 101.12002563476562, 245.35194396972656, 111.43988037109375, 225.31973266601562, 166.31263732910156, 7.184413909912109, -25.222259521484375, 200.74237060546875, 20.95545196533203, 64.60969543457031, 119.03117370605469, -84.88475799560547, 15.896453857421875, 29.21525764465332, -107.3016357421875, -5.3573455810546875, 30.659442901611328, -31.61691665649414, 201.44949340820312, 35.34722137451172, 184.17584228515625, -46.56194305419922, 45.216033935546875, 194.50682067871094, 151.35302734375, 116.55774688720703, -22.411537170410156, 201.44677734375, -166.156494140625, 83.62312316894531, -126.39065551757812, 114.04830932617188, 180.59375, -86.77164459228516, 2.7720813751220703, -15.046945571899414, 131.49073791503906, 163.05438232421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000532.npy"}
{"epoch": 0.8042328042328042, "step": 533, "batch_size": 64, "mean": 66.10958862304688, "std": 95.86664581298828, "min": -184.92449951171875, "p10": -32.34723777770996, "median": 53.40103530883789, "p90": 186.7849105834961, "max": 261.7121887207031, "pos_frac": 0.71875, "sample": [59.093116760253906, 176.8331298828125, 3.3712844848632812, 36.45606994628906, 115.53363037109375, 85.83605194091797, 10.109039306640625, -28.721576690673828, 182.7405548095703, 129.20657348632812, -56.91455078125, -0.9159049987792969, -5.690589904785156, -0.3158855438232422, 97.89824676513672, 229.24951171875, 12.005289077758789, 125.2724380493164, -33.901092529296875, -3.9429855346679688, 11.103324890136719, -2.7959346771240234, 118.03648376464844, 11.048355102539062, -67.48519134521484, 136.81460571289062, 152.9716033935547, 47.708953857421875, 5.425376892089844, -170.9332733154297, -28.126358032226562, -15.305805206298828, 185.12210083007812, 154.8133544921875, -2.3723011016845703, 220.6713409423828, 59.729705810546875, 261.7121887207031, 47.46025085449219, -50.39802932739258, 31.912918090820312, 160.61221313476562, 135.2625274658203, 111.78826141357422, 179.93450927734375, 191.13189697265625, 78.02351379394531, -86.002685546875, 24.210357666015625, -1.1043701171875, 7.765172958374023, 167.06771850585938, 149.47457885742188, -19.995315551757812, 1.6817493438720703, 74.93846893310547, 180.7214813232422, 138.66262817382812, 21.914260864257812, 187.49754333496094, 191.63339233398438, 193.92787170410156, -184.92449951171875, 86.47610473632812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000533.npy"}
{"epoch": 0.8057445200302343, "step": 534, "batch_size": 64, "mean": 61.06271743774414, "std": 109.14180755615234, "min": -201.3072509765625, "p10": -66.63234558105468, "median": 48.57552719116211, "p90": 198.9915313720703, "max": 298.13018798828125, "pos_frac": 0.671875, "sample": [46.60346221923828, 2.4685726165771484, 215.19876098632812, 14.910938262939453, -25.380704879760742, 87.3719253540039, 15.347305297851562, -201.3072509765625, 106.47486877441406, -100.19639587402344, -7.349435806274414, 33.788787841796875, 174.61231994628906, 161.7613525390625, -96.80445098876953, 298.13018798828125, 126.89057159423828, 39.03158950805664, 50.54759216308594, -24.16937255859375, 158.45816040039062, 243.50721740722656, 105.38015747070312, 65.33056640625, -10.516860961914062, -138.64044189453125, -172.26486206054688, -9.832845687866211, 221.18600463867188, 10.112701416015625, 126.93441772460938, -83.66159057617188, 216.11558532714844, 186.5351104736328, 193.27975463867188, 109.66793823242188, -68.42921447753906, 53.21021270751953, 257.390625, 187.26512145996094, 85.23075866699219, 197.33221435546875, -49.330162048339844, -3.215057373046875, 175.7156982421875, 116.13275146484375, 0.01995849609375, 137.79022216796875, 21.430091857910156, -21.585617065429688, -7.257354736328125, 199.70266723632812, 100.6558837890625, -15.219417572021484, -21.639144897460938, 9.23974609375, 150.57647705078125, -62.43965148925781, 111.77392578125, -30.395736694335938, 59.209251403808594, 42.436195373535156, 183.22720336914062, -40.335514068603516], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000534.npy"}
{"epoch": 0.8072562358276644, "step": 535, "batch_size": 64, "mean": 52.272666931152344, "std": 101.9295654296875, "min": -198.21957397460938, "p10": -49.84320259094237, "median": 30.890300750732422, "p90": 184.5815628051758, "max": 246.8231658935547, "pos_frac": 0.65625, "sample": [30.474472045898438, -32.86130142211914, 118.36257934570312, 46.250728607177734, 19.335289001464844, 22.533039093017578, -185.56190490722656, -56.097232818603516, -109.11624908447266, 9.80841064453125, 132.77984619140625, -96.99783325195312, 43.87195587158203, 149.38351440429688, 2.473846435546875, -16.984580993652344, -19.28436279296875, 64.15604400634766, -0.6710872650146484, -34.38667297363281, 27.081466674804688, 81.06504821777344, 168.58929443359375, -56.29681396484375, 151.4888458251953, 13.447731018066406, 209.5345458984375, -135.17543029785156, -35.250465393066406, 3.1035289764404297, 184.58531188964844, 200.35470581054688, 113.43984985351562, -11.067649841308594, 88.89920043945312, 180.05641174316406, 31.306129455566406, 246.8231658935547, -9.40850830078125, 123.34246063232422, 40.681373596191406, -30.533737182617188, 238.98809814453125, 156.34817504882812, -15.435134887695312, -198.21957397460938, -30.232025146484375, 133.64837646484375, -7.508182525634766, 118.19657135009766, 82.33403778076172, -8.837081909179688, 46.02888870239258, 209.31118774414062, 4.639106750488281, 160.05470275878906, 2.375396728515625, 113.27606201171875, 184.2076416015625, -26.80718994140625, 100.09083557128906, -14.037132263183594, 238.92037963867188, 184.57281494140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000535.npy"}
{"epoch": 0.8087679516250945, "step": 536, "batch_size": 64, "mean": 83.77720642089844, "std": 91.22732543945312, "min": -68.98014831542969, "p10": -7.9752618789672844, "median": 64.80977249145508, "p90": 210.94845275878907, "max": 277.6231384277344, "pos_frac": 0.8125, "sample": [64.74380493164062, 20.36174774169922, -48.391151428222656, 187.83775329589844, 139.09698486328125, 21.958740234375, 6.81256103515625, -23.117843627929688, 109.093994140625, 12.976968765258789, -68.98014831542969, 146.01138305664062, 110.27843475341797, 218.684814453125, 178.78770446777344, 146.47647094726562, 72.20559692382812, 192.73153686523438, 255.0193328857422, 64.87574005126953, 3.7735118865966797, 6.746278762817383, 189.61764526367188, 175.27928161621094, 31.304977416992188, 201.54116821289062, 154.53506469726562, 5.02702522277832, 88.20899963378906, 124.09060668945312, 212.591796875, -1.4987449645996094, 3.504261016845703, 91.57476806640625, -21.03515625, -17.691986083984375, 191.7895050048828, 7.3683319091796875, 221.29624938964844, 1.9315166473388672, 207.11398315429688, 198.70208740234375, 31.511526107788086, -4.287797927856445, 9.11355972290039, 14.367660522460938, 12.338325500488281, 74.66901397705078, 71.05960083007812, 277.6231384277344, -2.784994125366211, 177.14971923828125, 57.4720458984375, 235.238525390625, 125.52366638183594, 53.12413024902344, 25.325340270996094, 242.02830505371094, -8.193136215209961, -61.38740539550781, 31.00226402282715, 127.38436889648438, -7.466888427734375, -2.3054046630859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000536.npy"}
{"epoch": 0.8102796674225246, "step": 537, "batch_size": 64, "mean": 65.09119415283203, "std": 103.89234161376953, "min": -149.46875, "p10": -49.83370285034179, "median": 50.421024322509766, "p90": 193.15129699707032, "max": 297.147705078125, "pos_frac": 0.671875, "sample": [156.20262145996094, 102.5963363647461, -17.728729248046875, 143.21267700195312, 230.68243408203125, 182.896484375, 132.241455078125, -0.31831932067871094, 124.45748901367188, 86.29335021972656, 87.58660125732422, -4.620779037475586, 194.22622680664062, 182.37054443359375, 132.8275146484375, 164.300537109375, 190.17288208007812, 4.361089706420898, -0.6476669311523438, -6.952619552612305, -0.5709190368652344, 60.440093994140625, 16.322036743164062, 11.573570251464844, 1.4103374481201172, 208.8441162109375, 122.84280395507812, 15.515968322753906, -104.3409194946289, -8.124677658081055, 40.401954650878906, 33.61056900024414, 135.72311401367188, -39.81695556640625, -124.24925231933594, 17.33298110961914, 270.1873779296875, 104.3148193359375, 1.9468994140625, -14.322704315185547, 25.1722412109375, 190.64312744140625, -136.6360626220703, 94.13807678222656, -30.957853317260742, -16.553274154663086, 116.179931640625, -23.132423400878906, 109.27960205078125, 64.26591491699219, -7.85784912109375, 146.27334594726562, -34.75907897949219, 175.19215393066406, 174.17678833007812, 297.147705078125, -102.14988708496094, 120.13517761230469, -94.90715789794922, 36.38014221191406, 202.30258178710938, 231.8973388671875, -149.46875, -54.12659454345703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000537.npy"}
{"epoch": 0.8117913832199547, "step": 538, "batch_size": 64, "mean": 71.08383178710938, "std": 93.6263427734375, "min": -184.33197021484375, "p10": -33.71085433959961, "median": 51.46112823486328, "p90": 182.40737609863282, "max": 308.38665771484375, "pos_frac": 0.796875, "sample": [-32.185890197753906, 180.53074645996094, 40.66722869873047, 15.700862884521484, 87.6820068359375, -184.33197021484375, 68.70916748046875, -34.364410400390625, 108.58247375488281, 162.44595336914062, 87.9854507446289, 183.04537963867188, 33.20361328125, 161.99754333496094, 21.780912399291992, 35.49639892578125, 151.3928985595703, 174.6767578125, 17.989707946777344, 40.78708267211914, 53.804840087890625, 2.2046356201171875, 190.5276641845703, 139.2071533203125, 43.832679748535156, -72.02005004882812, 155.23052978515625, -29.673866271972656, 222.57577514648438, 0.46553802490234375, -38.66703796386719, 175.62966918945312, -17.073448181152344, -30.19952392578125, 5.503242492675781, 44.884498596191406, 41.917388916015625, 72.82899475097656, 1.7923736572265625, 48.08433532714844, 261.1463317871094, 9.8837890625, 217.02044677734375, 92.59262084960938, -97.44524383544922, 131.80540466308594, -40.025230407714844, 180.918701171875, 152.5453643798828, 308.38665771484375, 176.5322723388672, 48.67546081542969, 49.11741638183594, -85.20864868164062, 147.90826416015625, 106.96647644042969, 208.12081909179688, -0.7811851501464844, 101.52101135253906, 41.043060302734375, 64.5549087524414, 62.487213134765625, 92.6369400024414, -13.685192108154297], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000538.npy"}
{"epoch": 0.8133030990173847, "step": 539, "batch_size": 64, "mean": 54.29193115234375, "std": 96.1384506225586, "min": -143.6624755859375, "p10": -36.19008712768554, "median": 27.180328369140625, "p90": 195.7361267089844, "max": 295.6672668457031, "pos_frac": 0.734375, "sample": [65.96185302734375, 103.89334106445312, 0.06681060791015625, 211.80548095703125, 14.168325424194336, -19.463516235351562, 196.8529052734375, 193.13031005859375, 143.3193359375, 7.9849090576171875, 47.07441711425781, 42.6619873046875, 81.00485229492188, 0.2109527587890625, -46.598907470703125, -7.564735412597656, -6.234527587890625, -2.6605987548828125, 201.8859100341797, 128.63763427734375, 2.007509231567383, -55.359249114990234, 41.159149169921875, -130.7232666015625, 9.74517822265625, -136.8157958984375, -17.03102684020996, 142.37481689453125, 247.0794677734375, 40.704933166503906, 20.898216247558594, -9.323089599609375, -26.987991333007812, 31.604949951171875, 265.8190612792969, 59.060569763183594, 7.595375061035156, 63.647125244140625, 191.2987060546875, 4.666078567504883, -15.272232055664062, 70.6275634765625, 108.46974182128906, 21.57178497314453, 92.85748291015625, 72.85795593261719, -112.2175064086914, 150.3531494140625, 8.908622741699219, -19.83221435546875, 187.1583251953125, 73.27916717529297, 15.018413543701172, -40.13384246826172, 167.92276000976562, 62.81719970703125, -10.845718383789062, 18.310909271240234, -143.6624755859375, 98.02384185791016, 15.820711135864258, 226.66958618164062, 295.6672668457031, 22.755706787109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000539.npy"}
{"epoch": 0.8148148148148148, "step": 540, "batch_size": 64, "mean": 80.66496276855469, "std": 95.55758666992188, "min": -146.2391357421875, "p10": -31.194113159179675, "median": 64.58519744873047, "p90": 201.92771453857424, "max": 245.35153198242188, "pos_frac": 0.765625, "sample": [-8.62091064453125, -1.628387451171875, 245.35153198242188, 191.0758056640625, 4.715719223022461, -146.2391357421875, 129.359375, 112.96080017089844, 76.40999603271484, -0.24430465698242188, 50.93979263305664, 149.84521484375, 183.4835662841797, 52.46587371826172, -106.33814239501953, 57.62559509277344, 169.0172119140625, 136.66427612304688, 12.822154998779297, 129.8107452392578, 23.309646606445312, -36.13691711425781, 130.49234008789062, 34.625030517578125, 222.98956298828125, 164.22976684570312, -40.95391082763672, 134.9771728515625, -60.483642578125, 25.896621704101562, 84.11233520507812, 190.0587921142578, 240.26211547851562, 157.22792053222656, 221.53952026367188, -6.690605163574219, -19.660903930664062, 187.25201416015625, -37.88520812988281, 2.3934879302978516, -0.5143775939941406, 199.38291931152344, 176.01614379882812, 203.01834106445312, 198.99273681640625, 26.924636840820312, 205.26234436035156, 45.43357467651367, 71.5447998046875, -91.06177520751953, 33.498008728027344, 104.82553100585938, 46.374755859375, 6.967979431152344, -0.4339447021484375, 40.167816162109375, -14.099838256835938, 231.61851501464844, 34.38883972167969, 104.27595520019531, 192.49398803710938, 10.735868453979492, 83.7113037109375, 196.00169372558594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000540.npy"}
{"epoch": 0.8163265306122449, "step": 541, "batch_size": 64, "mean": 70.99945068359375, "std": 92.2991943359375, "min": -176.63890075683594, "p10": -33.016078186035145, "median": 69.03461456298828, "p90": 187.9921875, "max": 228.00210571289062, "pos_frac": 0.71875, "sample": [171.47988891601562, 74.27598571777344, 100.012451171875, 186.3059539794922, -9.169601440429688, 156.31954956054688, -72.01838684082031, 182.52297973632812, 180.24267578125, 6.986572265625, 27.593704223632812, -24.704742431640625, -7.023674011230469, 135.10699462890625, -8.036293029785156, -176.63890075683594, 173.02919006347656, 179.26220703125, 67.62991333007812, 188.71485900878906, -57.78827667236328, 35.35514831542969, 194.54859924316406, 192.62640380859375, 170.20999145507812, 117.79649353027344, 204.61489868164062, -36.57807922363281, -66.09344482421875, -4.161174774169922, 108.93839263916016, 23.88022804260254, 190.4372100830078, 228.00210571289062, 9.200645446777344, 120.064453125, 105.96189880371094, -13.251922607421875, 133.82730102539062, -5.245279312133789, -3.0609874725341797, -99.30081176757812, 156.3527069091797, 175.3268280029297, 16.788774490356445, 86.77688598632812, 155.60308837890625, 159.28927612304688, -66.52784729003906, 27.044219970703125, 0.25949668884277344, 162.5360565185547, 65.24381256103516, 54.02162170410156, -13.552536010742188, 2.565338134765625, 70.43931579589844, 84.563720703125, -22.43619728088379, 28.56525421142578, 112.5046615600586, -5.672819137573242, 192.41409301757812, 19.983863830566406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000541.npy"}
{"epoch": 0.817838246409675, "step": 542, "batch_size": 64, "mean": 79.7806396484375, "std": 108.59000396728516, "min": -224.9474334716797, "p10": -70.01330261230467, "median": 82.15291976928711, "p90": 203.65273437500002, "max": 271.64068603515625, "pos_frac": 0.8125, "sample": [156.13294982910156, 174.43832397460938, 12.371368408203125, 255.53695678710938, 82.1872329711914, 15.6319580078125, 129.1973876953125, 65.45144653320312, 160.4319305419922, 92.4903335571289, 12.156429290771484, -13.508888244628906, -86.95336151123047, 199.01980590820312, 18.312292098999023, 168.6759796142578, 50.198020935058594, 271.64068603515625, -141.9470977783203, 186.34442138671875, 184.597900390625, 236.4190673828125, 48.97283172607422, -2.0491867065429688, -144.65992736816406, -224.9474334716797, 129.95361328125, 73.06910705566406, 137.94662475585938, -48.521820068359375, 186.91896057128906, 194.6709442138672, 0.9059886932373047, 205.63827514648438, -104.58917236328125, -7.177600860595703, 1.07061767578125, 114.4385986328125, 1.2406997680664062, 127.16600799560547, 11.70263671875, 21.904022216796875, 193.6978759765625, -79.22393798828125, 89.44801330566406, 0.6653213500976562, 193.99282836914062, 24.96172332763672, 85.79554748535156, 212.1193084716797, 153.73712158203125, 208.28720092773438, -81.95646667480469, 46.14665985107422, -6.227968215942383, 5.073139190673828, 15.279098510742188, 189.29832458496094, 170.83291625976562, 73.61558532714844, 82.11860656738281, 224.5823974609375, 157.58053588867188, 193.65835571289062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000542.npy"}
{"epoch": 0.8193499622071051, "step": 543, "batch_size": 64, "mean": 75.18693542480469, "std": 98.36256408691406, "min": -209.5611572265625, "p10": -22.35770530700683, "median": 73.97373962402344, "p90": 194.14971313476565, "max": 251.81222534179688, "pos_frac": 0.8125, "sample": [28.164047241210938, 32.93259811401367, 76.77459716796875, 0.29425621032714844, 117.48817443847656, 12.572345733642578, 143.01046752929688, 195.66549682617188, -174.4495086669922, 220.790283203125, 100.85610961914062, 160.3565216064453, 144.98350524902344, 183.39134216308594, 229.83273315429688, 110.41778564453125, 181.36758422851562, -128.75123596191406, 72.8529281616211, 201.83387756347656, 16.42488670349121, -8.558876037597656, 0.2648887634277344, 46.6180419921875, 7.240917205810547, 184.106689453125, -209.5611572265625, 1.1316070556640625, -16.586788177490234, -54.755828857421875, 50.48866271972656, 176.36639404296875, 221.03134155273438, 2.236919403076172, 174.35218811035156, 71.42994689941406, 190.61288452148438, -2.535043716430664, -28.855079650878906, -7.1904296875, 87.25640869140625, 126.18477630615234, 181.41098022460938, 31.19171142578125, 185.71751403808594, 25.179176330566406, 134.4317626953125, 251.81222534179688, 33.30704879760742, 120.83804321289062, -13.474163055419922, 75.09455108642578, 99.88388061523438, -24.830955505371094, 120.87763977050781, 42.1387939453125, 117.09335327148438, 36.50830078125, 10.445755004882812, -60.04036331176758, 158.0851593017578, 225.09786987304688, 2.9825782775878906, 120.12568664550781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000543.npy"}
{"epoch": 0.8208616780045351, "step": 544, "batch_size": 64, "mean": 81.4837875366211, "std": 109.63634490966797, "min": -150.0077362060547, "p10": -52.494215393066405, "median": 86.34847259521484, "p90": 232.5356475830078, "max": 276.24005126953125, "pos_frac": 0.78125, "sample": [8.47314453125, -132.4404296875, 65.6512451171875, 23.139192581176758, 142.87078857421875, 68.31494140625, -102.13912200927734, 32.62616729736328, -40.954437255859375, 15.306663513183594, 43.981414794921875, 90.23899841308594, -17.538482666015625, 210.28582763671875, 102.51862335205078, 0.43842124938964844, -140.78443908691406, 105.65186309814453, 110.93726348876953, 60.548431396484375, 138.26918029785156, 149.61807250976562, 164.29351806640625, 178.16969299316406, 235.98275756835938, 52.43310546875, -4.3135528564453125, -8.967605590820312, 1.2787151336669922, -150.0077362060547, 276.24005126953125, -144.82449340820312, 181.9755401611328, -7.131500244140625, 194.09976196289062, 155.4885711669922, 236.5966796875, -53.78816223144531, 40.24015808105469, 89.2705307006836, 180.44056701660156, 39.576072692871094, 232.3736572265625, 62.859413146972656, 83.4264144897461, 93.37586975097656, 122.08692932128906, 152.6470947265625, 232.60507202148438, -49.475006103515625, 26.437944412231445, 168.93106079101562, 267.84844970703125, 261.46978759765625, -86.1186752319336, -18.382492065429688, 186.47720336914062, 34.640716552734375, 193.64547729492188, 160.8023681640625, 2.2209320068359375, 96.00564575195312, 124.03009033203125, 274.988525390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000544.npy"}
{"epoch": 0.8223733938019653, "step": 545, "batch_size": 64, "mean": 49.775474548339844, "std": 101.80859375, "min": -203.7584991455078, "p10": -75.71331787109375, "median": 26.38862419128418, "p90": 193.8134689331055, "max": 233.79934692382812, "pos_frac": 0.6875, "sample": [128.52638244628906, 108.00189971923828, -61.95208740234375, 205.50283813476562, 22.921031951904297, -3.8245391845703125, -77.5894546508789, -72.61754608154297, -34.84160614013672, 45.20731735229492, 131.01315307617188, -8.10595703125, 188.99440002441406, 61.59403991699219, 59.74909210205078, 5.265480041503906, 74.30607604980469, 196.36769104003906, -67.64483642578125, 111.72149658203125, -20.32752227783203, 222.85385131835938, -20.00560760498047, 138.31362915039062, 99.31561279296875, -20.508384704589844, 86.60164642333984, 24.819955825805664, -24.122787475585938, 1.7089214324951172, -3.459766387939453, -77.04007720947266, -89.05511474609375, 135.73294067382812, -93.81484985351562, 1.436594009399414, 185.3319549560547, 195.8787841796875, 157.1798095703125, 3.9109878540039062, -9.022228240966797, 233.79934692382812, 148.7549285888672, 10.464424133300781, 20.87291717529297, 88.87390899658203, -44.82970428466797, 129.72512817382812, 60.82586669921875, -203.7584991455078, 22.5755615234375, 95.9140625, 27.061920166015625, 233.5723114013672, 98.3883056640625, -132.30726623535156, 4.336490631103516, 144.48565673828125, -178.4261016845703, 1.4271564483642578, 89.62647247314453, 178.85910034179688, 221.349853515625, 25.715328216552734], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000545.npy"}
{"epoch": 0.8238851095993953, "step": 546, "batch_size": 64, "mean": 56.357269287109375, "std": 107.62983703613281, "min": -230.78265380859375, "p10": -98.14736938476562, "median": 47.70923614501953, "p90": 185.53152770996095, "max": 286.6864013671875, "pos_frac": 0.765625, "sample": [188.69529724121094, -45.91314697265625, 182.38490295410156, 60.691551208496094, 111.114990234375, 164.47726440429688, -7.5543212890625, -44.38859558105469, 37.33924865722656, -99.90521240234375, -230.78265380859375, 149.98887634277344, 9.241207122802734, 25.653839111328125, -167.8919219970703, 242.33700561523438, 9.82996940612793, -9.313285827636719, -58.947715759277344, 29.10626983642578, -162.10394287109375, 58.076263427734375, 114.44941711425781, 130.01446533203125, 14.529029846191406, 95.06883239746094, 151.456787109375, -38.30792236328125, 141.80702209472656, 207.32037353515625, 145.24053955078125, 25.153099060058594, 163.21246337890625, 37.34220886230469, 20.74515151977539, -96.5513916015625, 31.559730529785156, 286.6864013671875, 129.0006103515625, 114.12347412109375, 2.167400360107422, 34.24785614013672, -108.67631530761719, 119.9433822631836, 26.23516273498535, 107.41779327392578, -23.56391143798828, 15.514057159423828, 98.23294830322266, 108.67494201660156, 161.8020477294922, 35.33320617675781, 221.0482177734375, -98.83135986328125, 12.705162048339844, 67.91253662109375, 148.6486358642578, 86.84852600097656, 186.8800811767578, 65.8627700805664, 198.11929321289062, -162.99447631835938, 181.82864379882812, 6.522520065307617], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000546.npy"}
{"epoch": 0.8253968253968254, "step": 547, "batch_size": 64, "mean": 77.2716064453125, "std": 104.11660766601562, "min": -214.35018920898438, "p10": -8.52962417602539, "median": 68.08197784423828, "p90": 202.28126525878906, "max": 315.47686767578125, "pos_frac": 0.828125, "sample": [3.8255844116210938, 39.45881652832031, 11.336690902709961, 12.693099975585938, 239.5406494140625, 91.21968841552734, -73.31246185302734, 229.68429565429688, 98.65643310546875, 156.92872619628906, 198.17556762695312, 144.4598388671875, 44.82740020751953, 83.12596130371094, 75.15608215332031, -8.336311340332031, 112.10859680175781, 181.17251586914062, 23.97256088256836, 20.90636444091797, 206.49024963378906, 169.20179748535156, 186.65530395507812, 3.285675048828125, 251.22406005859375, 156.42446899414062, 11.051010131835938, 100.72844696044922, 49.411651611328125, 6.1989593505859375, 95.62049865722656, -2.1248722076416016, -209.9332275390625, 109.93888854980469, -8.612472534179688, 67.83021545410156, 195.45648193359375, 8.824115753173828, 202.62619018554688, 66.05458068847656, 141.71566772460938, 184.1717529296875, 121.41200256347656, 155.38526916503906, -156.50054931640625, 68.333740234375, 11.503242492675781, -214.35018920898438, 37.579620361328125, 21.581069946289062, 201.4764404296875, 190.60745239257812, 315.47686767578125, -1.7240219116210938, 87.23164367675781, -16.99981689453125, 39.25679397583008, 168.9190673828125, -4.487451553344727, 36.183799743652344, 13.257047653198242, 8.922103881835938, 217.7691650390625, -33.29010772705078], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000547.npy"}
{"epoch": 0.8269085411942555, "step": 548, "batch_size": 64, "mean": 50.482913970947266, "std": 98.31647491455078, "min": -141.31105041503906, "p10": -79.14031524658203, "median": 41.56401062011719, "p90": 195.8153091430664, "max": 233.00680541992188, "pos_frac": 0.671875, "sample": [147.7255096435547, 160.52035522460938, -37.803924560546875, -57.34483337402344, -141.31105041503906, 68.97404479980469, -8.744773864746094, 227.842529296875, 161.6515655517578, 58.07201385498047, -32.517333984375, 193.7534942626953, 0.4376716613769531, 12.918083190917969, 43.37577819824219, -2.048398971557617, -79.45846557617188, 41.605064392089844, -5.74285888671875, 112.11471557617188, 223.52261352539062, 57.08464050292969, -89.19320678710938, 16.455459594726562, 142.77056884765625, 37.72202682495117, 217.86160278320312, 197.28610229492188, 20.948150634765625, 109.56871032714844, -56.492679595947266, 114.53208923339844, -16.28826904296875, 128.1252899169922, 28.519699096679688, -46.99061584472656, 205.3958740234375, 41.52295684814453, -70.29240417480469, 127.5343246459961, 69.54447174072266, -3.5556888580322266, 55.71424102783203, -114.37655639648438, 233.00680541992188, 1.6174468994140625, -22.84667205810547, 80.38044738769531, -123.81233978271484, 130.68478393554688, 75.0927734375, -115.04060363769531, 49.49803924560547, 156.31089782714844, 92.96349334716797, 15.271873474121094, 158.89198303222656, -18.643768310546875, 24.293121337890625, -78.39796447753906, 196.69894409179688, 175.87351989746094, -83.2913818359375, 21.4166259765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000548.npy"}
{"epoch": 0.8284202569916855, "step": 549, "batch_size": 64, "mean": 59.60078048706055, "std": 109.64542388916016, "min": -218.08767700195312, "p10": -78.34507217407226, "median": 57.79249572753906, "p90": 196.60572509765626, "max": 255.07318115234375, "pos_frac": 0.6875, "sample": [13.1015625, 22.11603546142578, -67.6760482788086, 93.24311065673828, 0.8767776489257812, -86.61015319824219, 88.9986343383789, 3.928171157836914, 75.33245086669922, 210.47564697265625, 235.3814239501953, 3.1295547485351562, 91.62248229980469, 179.9759063720703, -5.2830047607421875, -42.501014709472656, -109.1942138671875, 80.33045959472656, 140.91085815429688, 166.2050018310547, 168.841796875, 142.13113403320312, 221.25296020507812, -109.91539001464844, 74.06838989257812, 11.281036376953125, 34.81930160522461, 197.59725952148438, 159.7576141357422, -64.67928314208984, 194.29214477539062, 162.27630615234375, 255.07318115234375, -26.150697708129883, 165.25454711914062, 248.66143798828125, -11.354070663452148, -137.68841552734375, -22.514190673828125, 153.33236694335938, -218.08767700195312, 30.318435668945312, -10.774024963378906, 56.62236404418945, 58.96262741088867, 146.84344482421875, -39.04717254638672, 140.86582946777344, 228.947021484375, 52.64305114746094, -1.497039794921875, 137.88323974609375, 152.422119140625, 84.56489562988281, 82.8975830078125, -10.58380126953125, 5.433767318725586, -3.548259735107422, -20.37784194946289, 124.01866912841797, 185.7852020263672, -197.7760772705078, -82.91751098632812, 0.1498870849609375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000549.npy"}
{"epoch": 0.8299319727891157, "step": 550, "batch_size": 64, "mean": 39.54109191894531, "std": 97.61637878417969, "min": -197.30714416503906, "p10": -85.69668121337888, "median": 21.761505126953125, "p90": 166.40098724365234, "max": 272.7492370605469, "pos_frac": 0.71875, "sample": [89.51678466796875, 207.06005859375, 191.56307983398438, 21.752288818359375, -113.683349609375, 9.538520812988281, 0.6802616119384766, -33.522003173828125, -25.772377014160156, 91.19265747070312, 167.5408935546875, 42.93925476074219, 16.36937713623047, 2.1020355224609375, 11.872407913208008, 211.13983154296875, 208.68927001953125, 87.92723846435547, 21.770721435546875, 31.49184799194336, 146.81094360351562, -13.788692474365234, 122.19891357421875, 93.51235961914062, 160.59825134277344, -41.08186340332031, 272.7492370605469, -8.934215545654297, 14.401725769042969, -110.13902282714844, -116.89252471923828, 25.660171508789062, -49.76446533203125, 78.49482727050781, 5.756561279296875, 70.67577362060547, 70.27336883544922, 17.27777099609375, -62.996971130371094, 12.051254272460938, -93.93826293945312, 90.71049499511719, 23.978809356689453, 64.60267639160156, -2.9321441650390625, 130.31124877929688, 14.82354736328125, 16.55931282043457, 56.548065185546875, 133.50747680664062, -149.91883850097656, 121.57162475585938, 165.97723388671875, 52.26873016357422, 159.52999877929688, 20.349626541137695, -197.30714416503906, 10.35243034362793, 166.5825958251953, -172.83670043945312, 127.4000244140625, -52.80957794189453, -15.267162322998047, -66.46632385253906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000550.npy"}
{"epoch": 0.8314436885865457, "step": 551, "batch_size": 64, "mean": 50.62107849121094, "std": 91.17156982421875, "min": -217.1175537109375, "p10": -24.13186798095703, "median": 23.37255096435547, "p90": 182.15781555175786, "max": 382.02423095703125, "pos_frac": 0.75, "sample": [-15.87701416015625, 80.17678833007812, 95.46134948730469, 47.928863525390625, 114.34039306640625, 44.229026794433594, 382.02423095703125, 1.4868278503417969, 29.040098190307617, 27.31359100341797, -25.883625030517578, 23.24828338623047, 172.94256591796875, 0.37621498107910156, 41.887664794921875, 25.808391571044922, 131.15267944335938, 23.49681854248047, -217.1175537109375, -16.20673370361328, -12.593477249145508, 7.332344055175781, 114.08621978759766, 43.171173095703125, 95.83443450927734, 190.37608337402344, -22.504562377929688, 87.81525421142578, 38.78919982910156, 158.04934692382812, 19.672693252563477, -45.96612548828125, -0.3493499755859375, -24.82928466796875, 22.12713623046875, 234.93011474609375, 5.776885986328125, 154.997802734375, -15.594160079956055, 6.1132659912109375, 8.879959106445312, 29.346229553222656, -4.952766418457031, 2.7091407775878906, 170.95761108398438, 228.16082763671875, 186.10720825195312, 14.868839263916016, 19.608497619628906, 1.8840522766113281, 63.96387481689453, 91.66046905517578, 37.168731689453125, -45.36166763305664, 14.289215087890625, -4.3170318603515625, -10.53842544555664, 208.9174041748047, 11.69927978515625, 8.683841705322266, -94.70684814453125, -31.27083969116211, 193.0932159423828, 115.83432006835938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000551.npy"}
{"epoch": 0.8329554043839759, "step": 552, "batch_size": 64, "mean": 60.77858352661133, "std": 89.81974029541016, "min": -200.99722290039062, "p10": -29.18855972290038, "median": 61.52622985839844, "p90": 174.29531097412115, "max": 245.64393615722656, "pos_frac": 0.796875, "sample": [9.12114143371582, 37.021148681640625, 25.241676330566406, -17.611167907714844, 132.2391815185547, 50.860103607177734, 2.430379867553711, 39.01106262207031, 39.64021301269531, 140.48831176757812, 203.57644653320312, 245.64393615722656, 121.85661315917969, 61.457801818847656, -14.846122741699219, -73.64990997314453, 13.740348815917969, 129.45370483398438, 59.679039001464844, -34.150299072265625, 53.615203857421875, 111.40132141113281, 121.16865539550781, 152.617919921875, 139.80868530273438, -12.718528747558594, 54.744041442871094, 135.86468505859375, 178.82215881347656, -200.99722290039062, 4.191305160522461, -1.039581298828125, 85.98712158203125, -182.63308715820312, 82.9908218383789, -0.2127685546875, 149.46484375, 39.82341003417969, 12.337959289550781, -5.397361755371094, 190.0867919921875, 15.080963134765625, -37.80725860595703, 90.29048156738281, 163.732666015625, 16.657379150390625, 77.43650817871094, 62.43335723876953, -197.8401641845703, 192.67201232910156, 195.0132293701172, 98.22705841064453, 9.90089225769043, -60.135231018066406, 113.73101806640625, 90.24248504638672, 81.13903045654297, 88.0324935913086, 180.1605682373047, 96.13102722167969, 61.59465789794922, 59.19422149658203, 148.29112243652344, 64.52078247070312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000552.npy"}
{"epoch": 0.8344671201814059, "step": 553, "batch_size": 64, "mean": 59.167213439941406, "std": 108.52555847167969, "min": -199.99920654296875, "p10": -65.50988121032715, "median": 50.663856506347656, "p90": 217.59676361083987, "max": 297.03460693359375, "pos_frac": 0.671875, "sample": [239.19253540039062, -27.132919311523438, -5.1524810791015625, 92.68275451660156, 8.503044128417969, -15.345743179321289, 164.05921936035156, 210.60760498046875, -16.281166076660156, -15.231796264648438, 52.4212646484375, 104.76443481445312, 220.5921173095703, 258.12628173828125, -22.37885284423828, 122.65115356445312, -11.83924674987793, 67.79216766357422, 82.57968139648438, -199.99920654296875, -4.388891220092773, 123.39192199707031, -16.426307678222656, 147.60931396484375, 179.96072387695312, 35.994041442871094, 32.15357208251953, 48.90644836425781, 3.41265869140625, -52.79103088378906, 255.19219970703125, 183.1796875, 92.60222625732422, 297.03460693359375, 28.401901245117188, 114.5835952758789, 189.665283203125, 160.1222381591797, 73.39389038085938, 95.72489166259766, 39.91472625732422, 78.75994873046875, 28.53883171081543, -10.45595932006836, -66.68177795410156, 65.2206802368164, -1.4253578186035156, 223.30401611328125, -175.80520629882812, -107.09744262695312, 60.936767578125, 162.87843322753906, 123.7684326171875, -72.44100952148438, -68.73917388916016, 26.342086791992188, 2.5773963928222656, -177.2546844482422, 118.61569213867188, 84.93306732177734, -38.034034729003906, -62.775455474853516, 236.68338012695312, 16.604660034179688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000553.npy"}
{"epoch": 0.8359788359788359, "step": 554, "batch_size": 64, "mean": 70.09081268310547, "std": 99.37731170654297, "min": -193.36923217773438, "p10": -36.59535064697265, "median": 62.56979560852051, "p90": 189.74175415039062, "max": 248.0257110595703, "pos_frac": 0.75, "sample": [87.64778137207031, 223.58888244628906, 193.54806518554688, 236.9996337890625, -6.995994567871094, 92.16273498535156, 9.522981643676758, 8.898746490478516, 65.64987182617188, 164.02276611328125, -8.392547607421875, -9.119880676269531, -3.7032928466796875, 128.237060546875, -193.36923217773438, 44.89302062988281, 119.16400146484375, -25.170928955078125, 143.26351928710938, 32.899452209472656, -13.540000915527344, 166.86117553710938, 38.02581787109375, -91.53485107421875, 131.83384704589844, 139.97198486328125, 195.70257568359375, 43.1307258605957, -1.3469352722167969, 33.034873962402344, 190.23731994628906, 6.3626251220703125, 137.7515106201172, 60.29092788696289, 5.597881317138672, -87.68943786621094, 184.4821319580078, 26.288408279418945, 137.8463134765625, -8.56695556640625, -163.9910430908203, 140.9204559326172, 64.84866333007812, 101.82466125488281, -41.49153137207031, 131.30850219726562, 16.1387996673584, 137.54330444335938, 47.28977966308594, 8.24822998046875, 150.05787658691406, -9.156684875488281, -60.55043411254883, 174.89111328125, 203.25509643554688, 14.724800109863281, 183.80862426757812, 172.70033264160156, 119.4072494506836, 188.58543395996094, 248.0257110595703, -136.15542602539062, 180.21072387695312, 14.881332397460938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000554.npy"}
{"epoch": 0.8374905517762661, "step": 555, "batch_size": 64, "mean": 58.08625030517578, "std": 110.61315155029297, "min": -210.80294799804688, "p10": -92.34090728759764, "median": 57.71342086791992, "p90": 190.22879638671876, "max": 230.4990692138672, "pos_frac": 0.734375, "sample": [-0.40796852111816406, -209.53512573242188, 65.72014617919922, 59.25227355957031, 186.12945556640625, 20.50312042236328, -205.15699768066406, -20.484619140625, -125.40467834472656, 103.54383850097656, 206.99273681640625, -43.52406311035156, 132.26004028320312, 38.56050491333008, 141.69419860839844, 215.11575317382812, 44.809165954589844, -4.4665374755859375, 11.361846923828125, 193.77365112304688, 84.62726593017578, 3.4144744873046875, 206.5858612060547, 61.31413269042969, 183.05931091308594, 34.31913757324219, 126.76071166992188, 184.3728485107422, 34.08113098144531, 191.98565673828125, 161.4012908935547, 168.09542846679688, 71.35623168945312, 160.15927124023438, -119.69414520263672, 201.48780822753906, -210.80294799804688, 160.55479431152344, -84.99466705322266, 52.70222854614258, 91.30694580078125, 47.34193420410156, -24.66057586669922, 27.043251037597656, -64.27549743652344, -7.381908416748047, 102.24102783203125, 102.89081573486328, 175.20697021484375, -20.407730102539062, 153.2462158203125, 56.17456817626953, -197.3088836669922, -0.43512725830078125, 172.89642333984375, 38.20249938964844, 95.23992919921875, 41.17791748046875, 109.56048583984375, -95.48929595947266, 230.4990692138672, 29.3692684173584, 160.1512451171875, 13.407934188842773], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000555.npy"}
{"epoch": 0.8390022675736961, "step": 556, "batch_size": 64, "mean": 51.2664680480957, "std": 109.87361145019531, "min": -206.6536407470703, "p10": -83.09500122070312, "median": 26.61295509338379, "p90": 199.40765075683595, "max": 279.92431640625, "pos_frac": 0.71875, "sample": [167.8624267578125, 19.657745361328125, 39.507781982421875, -11.835880279541016, 14.665969848632812, 181.09197998046875, 205.01673889160156, 60.700191497802734, -6.957658767700195, -83.01702880859375, 18.48484992980957, 2.566436767578125, 92.4639663696289, 175.95584106445312, 29.781291961669922, -5.8907623291015625, 39.51544189453125, -6.208152770996094, 116.07460021972656, 174.83428955078125, -20.77649688720703, 154.4491424560547, 200.42269897460938, 23.444618225097656, -49.61707305908203, 20.170021057128906, -206.6536407470703, 30.634920120239258, 143.2193603515625, -140.22462463378906, 0.5614719390869141, -188.49673461914062, 98.48182678222656, 14.761405944824219, 30.329971313476562, 2.345029830932617, 152.40769958496094, 170.17076110839844, 7.6257476806640625, 4.909360885620117, 187.40956115722656, 279.92431640625, 44.26236343383789, 8.273590087890625, 232.97621154785156, 230.95285034179688, -51.58576965332031, 23.276885986328125, -140.2672576904297, -2.9286346435546875, 205.52879333496094, -98.89736938476562, -48.85676574707031, 194.53839111328125, 3.4794044494628906, -83.12841796875, 57.71226501464844, -82.2147216796875, 82.07240295410156, 182.29025268554688, 198.13815307617188, 199.95172119140625, 75.18952941894531, -89.47929382324219], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000556.npy"}
{"epoch": 0.8405139833711263, "step": 557, "batch_size": 64, "mean": 78.81112670898438, "std": 91.79977416992188, "min": -124.85125732421875, "p10": -14.979458618164063, "median": 69.18887329101562, "p90": 204.9473083496094, "max": 239.28594970703125, "pos_frac": 0.75, "sample": [202.6643524169922, 128.92591857910156, 96.38534545898438, 13.053329467773438, -0.09846115112304688, -36.046356201171875, 195.6661376953125, 32.402320861816406, 193.1759796142578, 91.89450073242188, -0.2663421630859375, -15.008224487304688, -3.464691162109375, 94.29869842529297, 8.300952911376953, 79.76460266113281, 66.44772338867188, 3.4723892211914062, 187.63279724121094, 96.13373565673828, 28.985240936279297, 14.302520751953125, 216.80184936523438, 60.818084716796875, -69.29173278808594, -50.574432373046875, 180.49478149414062, 213.49642944335938, 31.721126556396484, -51.80799102783203, 228.61141967773438, 135.81919860839844, 176.42047119140625, -10.68777084350586, -14.912338256835938, 195.8440704345703, 116.1695327758789, 129.59817504882812, 38.53141784667969, 205.91650390625, -13.276168823242188, 15.143129348754883, -11.339515686035156, 0.8748779296875, -0.4637622833251953, -124.85125732421875, 219.64381408691406, 239.28594970703125, 181.34814453125, 35.182701110839844, 12.448066711425781, 76.55951690673828, 32.55021667480469, 171.91078186035156, 164.728271484375, 71.93002319335938, 202.68585205078125, 96.21012878417969, 38.055538177490234, 93.87113952636719, -49.07615661621094, -7.8315277099609375, 211.7481689453125, 174.9831085205078], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000557.npy"}
{"epoch": 0.8420256991685563, "step": 558, "batch_size": 64, "mean": 84.30059814453125, "std": 101.3362045288086, "min": -155.3302001953125, "p10": -35.837203216552716, "median": 80.15031433105469, "p90": 204.71068115234377, "max": 326.9744567871094, "pos_frac": 0.765625, "sample": [175.45950317382812, 215.7766571044922, 205.73056030273438, -19.493911743164062, 197.41885375976562, 7.7992401123046875, 188.31344604492188, -44.389896392822266, 2.3998641967773438, 88.20600891113281, -20.278785705566406, 175.62738037109375, -46.21412658691406, 21.426979064941406, -3.493682861328125, 155.4268798828125, 187.06637573242188, -2.189666748046875, 113.5447998046875, 92.97113800048828, 326.9744567871094, 163.0617218017578, 5.859485626220703, 187.27810668945312, 139.22900390625, -77.99563598632812, 177.68740844726562, 21.42162322998047, 200.9228515625, 72.35503387451172, 176.6001739501953, 28.689254760742188, 86.37863159179688, 220.7291717529297, 15.576919555664062, 4.159414291381836, 105.32212829589844, 10.000991821289062, 73.14790344238281, 248.34323120117188, 157.11573791503906, -42.505096435546875, 258.63385009765625, 26.408859252929688, 192.223876953125, 191.28182983398438, -13.521888732910156, -51.83448028564453, 232.1027069091797, -9.579391479492188, 133.76678466796875, -155.3302001953125, 92.91997528076172, -49.958770751953125, 73.9219970703125, -7.684333801269531, 136.8056182861328, 8.908782958984375, -17.809783935546875, 3.225677490234375, 202.33096313476562, 38.325538635253906, 98.69368743896484, 19.94696044921875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000558.npy"}
{"epoch": 0.8435374149659864, "step": 559, "batch_size": 64, "mean": 73.0816650390625, "std": 113.8064956665039, "min": -166.6754913330078, "p10": -43.84312591552734, "median": 41.62673568725586, "p90": 207.82183074951172, "max": 409.72381591796875, "pos_frac": 0.671875, "sample": [24.204452514648438, -34.225685119628906, -9.668155670166016, -15.63299560546875, 106.97017669677734, 409.72381591796875, 244.31277465820312, 270.81121826171875, 44.254493713378906, -4.678752899169922, -130.71875, 159.94973754882812, 68.48851776123047, 119.83372497558594, 6.388456344604492, 194.12234497070312, 178.76773071289062, 83.87439727783203, 154.08958435058594, 99.11378479003906, 206.85470581054688, 29.32666015625, -34.34013366699219, 162.05654907226562, -3.7438125610351562, 37.980979919433594, -127.25518798828125, 187.80752563476562, 56.11065673828125, 194.52871704101562, 38.99897766113281, 79.73692321777344, -27.688003540039062, 285.73443603515625, 215.81246948242188, -47.54029083251953, 108.53495025634766, 10.344961166381836, -38.402809143066406, -20.24657440185547, -9.865776062011719, -14.063209533691406, -24.437557220458984, 23.754730224609375, 148.82611083984375, 202.18450927734375, 162.76869201660156, -46.17469024658203, 166.54200744628906, -18.334732055664062, 37.440673828125, 208.23631286621094, -116.77389526367188, 15.176986694335938, 170.76173400878906, 219.93467712402344, 19.022884368896484, 128.1956787109375, -0.77447509765625, 15.803865432739258, 134.8873748779297, 199.9648895263672, -63.76750946044922, -166.6754913330078], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000559.npy"}
{"epoch": 0.8450491307634165, "step": 560, "batch_size": 64, "mean": 64.38724517822266, "std": 92.3763427734375, "min": -157.87435913085938, "p10": -38.00139007568359, "median": 44.748104095458984, "p90": 193.03089599609376, "max": 218.1884765625, "pos_frac": 0.78125, "sample": [58.63330841064453, 52.488739013671875, 199.25706481933594, 25.00005340576172, -157.87435913085938, -21.77674102783203, 197.36004638671875, 15.132186889648438, 191.51754760742188, 1.3546161651611328, 68.48887634277344, 13.645210266113281, 114.9102783203125, 43.465789794921875, 4.194244384765625, 10.2645263671875, -3.9413604736328125, 3.7508792877197266, 21.727039337158203, 133.2617645263672, -85.45353698730469, 141.07571411132812, 64.78205108642578, 6.441917419433594, 184.48338317871094, 152.56143188476562, -5.760353088378906, 188.2688751220703, 193.67947387695312, 6.376272201538086, 0.157745361328125, -11.586738586425781, -30.769737243652344, 211.49267578125, 162.05909729003906, -74.46978759765625, 38.256004333496094, -70.10188293457031, 188.42250061035156, -41.100669860839844, 195.68472290039062, 136.81134033203125, 46.030418395996094, 89.68939208984375, 5.103691101074219, 104.33402252197266, 2.2783889770507812, 191.20367431640625, 37.90804672241211, -21.473037719726562, 9.870925903320312, 218.1884765625, -63.102439880371094, -3.3899993896484375, 163.76055908203125, 199.54495239257812, -106.67167663574219, 66.73794555664062, 183.35366821289062, 55.68751907348633, 182.79977416992188, 95.559326171875, 35.898380279541016, 105.3013916015625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000560.npy"}
{"epoch": 0.8465608465608465, "step": 561, "batch_size": 64, "mean": 63.501888275146484, "std": 100.86946868896484, "min": -173.92213439941406, "p10": -46.5287425994873, "median": 30.211685180664062, "p90": 199.12530822753905, "max": 267.80426025390625, "pos_frac": 0.734375, "sample": [4.595024108886719, 11.62762451171875, 22.22283935546875, -8.27183723449707, 24.830690383911133, 181.35552978515625, 110.20857238769531, -16.38848114013672, 180.87942504882812, 34.79052734375, 19.836029052734375, 139.6292266845703, 0.017505645751953125, 117.79666137695312, 18.493499755859375, -87.36251831054688, 27.587936401367188, 168.06591796875, 198.84475708007812, -5.5406036376953125, -47.65171813964844, 189.37677001953125, 146.43763732910156, -69.57473754882812, 126.97834777832031, -24.212440490722656, 132.28387451171875, -43.90846633911133, -173.92213439941406, 249.64569091796875, -9.363605499267578, 32.83543395996094, 199.24554443359375, 206.52488708496094, 35.66688537597656, 27.268035888671875, 53.13343048095703, 37.50060272216797, 11.592107772827148, 253.47531127929688, 267.80426025390625, 227.13258361816406, 183.44154357910156, 199.2996368408203, 155.40379333496094, -114.8744125366211, -57.38800811767578, 16.25952911376953, 83.28028869628906, 11.87220573425293, -40.7166748046875, 61.5201416015625, 8.453681945800781, -18.30622673034668, 96.70608520507812, 195.1923370361328, 2.788633346557617, 141.4976348876953, 109.42514038085938, -94.11129760742188, -2.704366683959961, 20.26170539855957, 174.20640563964844, -38.873809814453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000561.npy"}
{"epoch": 0.8480725623582767, "step": 562, "batch_size": 64, "mean": 79.74699401855469, "std": 104.31777954101562, "min": -113.33302307128906, "p10": -34.64662895202636, "median": 59.61567306518555, "p90": 202.66487426757814, "max": 343.663818359375, "pos_frac": 0.734375, "sample": [11.207733154296875, 188.55462646484375, -95.76278686523438, 275.4450988769531, 216.1073455810547, 197.1876220703125, 38.072593688964844, 314.226806640625, 171.77120971679688, -18.69293975830078, 79.12342834472656, 57.8441162109375, -110.59144592285156, 25.320568084716797, 40.21197509765625, -2.2069969177246094, 36.27354431152344, 21.0, 134.25030517578125, -79.23963928222656, 140.9310302734375, 189.41510009765625, -37.71412658691406, -8.422920227050781, -3.8471298217773438, 187.14752197265625, -4.937948226928711, 33.948020935058594, 40.60400390625, 0.055492401123046875, 180.17926025390625, -37.0443115234375, 170.3148193359375, 200.87411499023438, 85.25686645507812, 343.663818359375, -29.05203628540039, -18.728347778320312, 2.7150611877441406, 222.01446533203125, -2.9345016479492188, 18.743370056152344, 88.98008728027344, 123.32110595703125, 160.85711669921875, 194.40615844726562, 203.43234252929688, 127.07513427734375, 149.88726806640625, -19.17108154296875, 176.29461669921875, 65.37582397460938, -26.924470901489258, 58.720664978027344, 110.637451171875, 82.14032745361328, 181.02572631835938, 206.76210021972656, 144.42398071289062, -113.33302307128906, 0.023345947265625, -60.20594787597656, 16.28362274169922, 60.51068115234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000562.npy"}
{"epoch": 0.8495842781557067, "step": 563, "batch_size": 64, "mean": 70.55564880371094, "std": 108.48339080810547, "min": -199.96624755859375, "p10": -73.31519927978513, "median": 68.73756408691406, "p90": 200.9850555419922, "max": 238.6352996826172, "pos_frac": 0.75, "sample": [101.3444595336914, 161.48048400878906, 45.93558883666992, 0.2842254638671875, 192.9691925048828, 135.11026000976562, -199.96624755859375, 210.53225708007812, 177.1651153564453, 13.466026306152344, 14.307498931884766, 185.19309997558594, 15.955604553222656, -80.86640930175781, 190.3523406982422, 55.16084289550781, -164.14231872558594, 106.0092544555664, -40.02875518798828, 238.6352996826172, 35.70098876953125, 223.29037475585938, -55.695709228515625, 209.8380126953125, 169.710205078125, 108.19889068603516, 125.65998840332031, -0.21030426025390625, 80.33636474609375, -155.92230224609375, -5.1550140380859375, -23.24429702758789, 7.95660400390625, 47.15321350097656, -0.6807746887207031, 181.58462524414062, 72.42755126953125, -88.7467041015625, 115.15264892578125, 27.749629974365234, 196.39218139648438, 202.95343017578125, 186.17225646972656, -23.882278442382812, -110.4296875, 65.04757690429688, 52.238677978515625, -42.465110778808594, 195.6995849609375, 119.39202117919922, -140.8775634765625, 81.03689575195312, 39.25577926635742, 143.964111328125, 148.8864288330078, 55.59584045410156, 204.70797729492188, 231.99554443359375, 36.25471496582031, 188.0204620361328, 136.84149169921875, -43.44059753417969, 143.877197265625, 14.322952270507812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000563.npy"}
{"epoch": 0.8510959939531368, "step": 564, "batch_size": 64, "mean": 88.3934097290039, "std": 109.9214859008789, "min": -212.5088653564453, "p10": -42.80894546508788, "median": 92.07948684692383, "p90": 219.3624465942383, "max": 245.5805206298828, "pos_frac": 0.8125, "sample": [186.7901611328125, 10.913215637207031, 215.05972290039062, 193.03033447265625, 84.79676055908203, 239.36349487304688, 136.16270446777344, -197.64064025878906, -32.867820739746094, 8.184799194335938, 206.25802612304688, 175.24044799804688, 50.61265182495117, -29.52305793762207, 5.91912841796875, 24.337234497070312, -16.90252685546875, 159.69366455078125, 177.5478515625, 94.89000701904297, 42.46049880981445, 1.1551685333251953, 122.89804077148438, 159.39849853515625, 219.3247528076172, 167.35189819335938, 233.49197387695312, -74.34933471679688, -5.134511947631836, 127.095458984375, 236.470947265625, 245.5805206298828, 221.06845092773438, 111.259521484375, 34.622283935546875, -147.88037109375, 172.13021850585938, 69.30599975585938, 89.26896667480469, 8.990653991699219, 184.01132202148438, 52.669830322265625, 74.04240417480469, -26.357118606567383, -87.64897155761719, 70.106689453125, 75.03076934814453, 241.1866455078125, 106.70648956298828, -67.61109924316406, 86.77476501464844, 147.5621795654297, 196.55609130859375, 14.29075813293457, 173.94500732421875, 33.88182830810547, 188.37644958496094, 219.37860107421875, -212.5088653564453, 197.97271728515625, -47.069427490234375, 16.130817413330078, 114.29070281982422, 179.0840301513672], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000564.npy"}
{"epoch": 0.8526077097505669, "step": 565, "batch_size": 64, "mean": 60.46666717529297, "std": 94.19705200195312, "min": -171.53915405273438, "p10": -54.43049430847168, "median": 47.03476333618164, "p90": 179.47422790527347, "max": 291.2968444824219, "pos_frac": 0.765625, "sample": [8.183143615722656, 14.98263931274414, -6.753395080566406, 20.097015380859375, 46.503456115722656, 47.566070556640625, 107.41143798828125, 46.40837860107422, 4.386566162109375, 26.965065002441406, 39.74437713623047, 121.35459899902344, 263.27703857421875, -171.53915405273438, -21.537811279296875, 158.72830200195312, 75.32383728027344, 108.43304443359375, 291.2968444824219, 215.09329223632812, 123.23253631591797, -91.24345397949219, 8.204200744628906, 18.86542510986328, 61.301849365234375, 127.07125854492188, 91.17122650146484, 26.441612243652344, 70.5049819946289, 86.70140838623047, 182.1239013671875, 6.466440200805664, -52.863555908203125, -48.219966888427734, 115.164794921875, -55.1020393371582, 9.37828254699707, 165.9132537841797, -168.0908203125, 73.93003845214844, 153.4425048828125, -1.1906166076660156, -60.904144287109375, 83.61721801757812, -88.08847045898438, 130.6562042236328, 30.746170043945312, 173.29165649414062, 226.06007385253906, 12.560508728027344, 129.62680053710938, -13.327102661132812, -67.25323486328125, -2.413555145263672, 40.95158004760742, 38.368961334228516, -30.71221923828125, 163.36361694335938, 85.4482192993164, 92.99887084960938, 192.52137756347656, 135.92025756835938, 113.84325408935547, 183.46255493164062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000565.npy"}
{"epoch": 0.854119425547997, "step": 566, "batch_size": 64, "mean": 56.84604263305664, "std": 120.94518280029297, "min": -280.4864501953125, "p10": -87.43488235473632, "median": 48.17746353149414, "p90": 200.16748657226563, "max": 304.2760009765625, "pos_frac": 0.65625, "sample": [99.41110229492188, 52.65233612060547, -71.4438247680664, 125.96795654296875, -6.634246826171875, 41.48799514770508, 192.59999084472656, 200.46316528320312, 1.6184520721435547, 132.76107788085938, 63.993343353271484, 90.41978454589844, 45.4075927734375, 27.072200775146484, 9.828941345214844, 304.2760009765625, -280.4864501953125, 156.05575561523438, 207.80282592773438, 153.7825927734375, 214.5950469970703, -14.537994384765625, -33.256778717041016, 79.25467681884766, 2.3543872833251953, -105.95793151855469, 103.06922912597656, 174.41595458984375, 196.1610107421875, 3.2642364501953125, 188.5206756591797, 170.37596130371094, 170.5732879638672, -15.946319580078125, -68.31207275390625, -188.03305053710938, -64.95738983154297, 42.6464958190918, 290.02593994140625, -75.8697738647461, 29.001953125, -92.391357421875, 268.14642333984375, -28.533096313476562, 185.30950927734375, -0.5714645385742188, -113.17135620117188, -108.10921478271484, 50.94733428955078, -14.701957702636719, -60.7353630065918, 2.9897994995117188, -71.29853820800781, 188.66250610351562, -0.9181632995605469, 76.94740295410156, -63.81354522705078, 207.47586059570312, 79.50086212158203, -125.30046844482422, 160.44749450683594, 124.25896453857422, 129.10348510742188, 199.47756958007812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000566.npy"}
{"epoch": 0.8556311413454271, "step": 567, "batch_size": 64, "mean": 72.50308990478516, "std": 100.37376403808594, "min": -149.99163818359375, "p10": -68.69245529174805, "median": 68.02241897583008, "p90": 190.51904296875003, "max": 276.2027893066406, "pos_frac": 0.78125, "sample": [107.40986633300781, -3.3354148864746094, 1.0089645385742188, 181.97494506835938, -85.09153747558594, 174.8717041015625, 204.54437255859375, 172.7417449951172, 181.9612274169922, 39.11095428466797, 185.83070373535156, 60.65223693847656, 116.39000701904297, -31.791091918945312, 122.78376007080078, -17.923500061035156, -5.791954040527344, 38.59112548828125, 26.47351837158203, 56.17593002319336, 133.4007568359375, 13.049407958984375, 3.0891551971435547, 184.52581787109375, -69.48316192626953, 0.5947017669677734, -137.14495849609375, -112.85028076171875, 97.93885040283203, -130.84091186523438, 160.5570526123047, 66.51109313964844, 192.4237060546875, 276.2027893066406, 217.50521850585938, -0.4212532043457031, 121.99170684814453, 53.882537841796875, 138.66734313964844, 198.40684509277344, -110.95962524414062, 104.96485137939453, 52.59014892578125, 103.83500671386719, 177.1043701171875, 166.69915771484375, 1.6044235229492188, 2.9900360107421875, 186.0748291015625, 166.23121643066406, 69.53374481201172, 29.465351104736328, 196.2887420654297, -1.8348388671875, 15.010635375976562, -149.99163818359375, -66.84747314453125, 109.08294677734375, 210.79464721679688, 28.98553466796875, 150.44615173339844, 90.3110580444336, 42.64739227294922, 130.57720947265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000567.npy"}
{"epoch": 0.8571428571428571, "step": 568, "batch_size": 64, "mean": 89.38102722167969, "std": 100.56726837158203, "min": -81.03019714355469, "p10": -34.86601562499999, "median": 61.199832916259766, "p90": 216.7351760864258, "max": 360.4110107421875, "pos_frac": 0.796875, "sample": [146.04281616210938, 54.97352600097656, 52.0373420715332, 31.92694091796875, 107.46414184570312, 149.190185546875, 160.86941528320312, -60.939674377441406, 17.565509796142578, -0.28279876708984375, -57.381126403808594, 183.92086791992188, 52.38488006591797, 256.3592224121094, 4.8564300537109375, -49.47456359863281, 47.90996551513672, -10.655235290527344, 50.05211639404297, 144.72506713867188, -81.03019714355469, -24.4256591796875, 67.42613983154297, 184.0516357421875, 218.4862823486328, 212.64926147460938, 152.63128662109375, 252.1121826171875, -73.81974792480469, 54.123077392578125, 184.6372833251953, 107.72734069824219, 97.65172576904297, -19.55084991455078, 158.16961669921875, 11.573640823364258, -4.58380126953125, 102.11652374267578, -0.5539417266845703, 171.64727783203125, 3.9664077758789062, 205.01312255859375, 112.46841430664062, 100.28401184082031, 171.5235595703125, 44.389015197753906, 32.59373474121094, 1.395355224609375, 5.4382781982421875, 48.157196044921875, 94.21163940429688, 186.80093383789062, -44.417144775390625, 154.38340759277344, 195.33856201171875, 44.109100341796875, 0.5375823974609375, 303.5646667480469, 229.39434814453125, 230.9976043701172, -39.3404541015625, 29.804481506347656, 196.77711486816406, 360.4110107421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000568.npy"}
{"epoch": 0.8586545729402872, "step": 569, "batch_size": 64, "mean": 44.742897033691406, "std": 98.29059600830078, "min": -193.11573791503906, "p10": -73.82600479125975, "median": 31.237274169921875, "p90": 186.48003997802735, "max": 259.88897705078125, "pos_frac": 0.734375, "sample": [173.22991943359375, -80.19828796386719, 3.0686779022216797, 173.40489196777344, 35.01971435546875, 190.76837158203125, 22.625991821289062, 210.50555419921875, 51.856964111328125, 188.39425659179688, 89.345703125, -39.886680603027344, 9.791927337646484, 37.981040954589844, -193.11573791503906, 58.088470458984375, 20.526226043701172, 259.88897705078125, -27.021650314331055, 87.00485229492188, 147.3162384033203, 137.08682250976562, 1.1872062683105469, 218.01632690429688, -19.142852783203125, 178.69056701660156, 107.08221435546875, 18.824756622314453, 42.63909912109375, -131.26925659179688, -144.3732452392578, 80.42892456054688, 6.7135467529296875, 12.905460357666016, -58.95734405517578, 5.6361541748046875, 197.58609008789062, 2.9944019317626953, 41.73387145996094, 21.639230728149414, 70.57154846191406, -33.890892028808594, -127.08009338378906, -29.769325256347656, 111.83799743652344, -121.5260238647461, 61.19963073730469, -0.6246185302734375, 24.982280731201172, 106.02582550048828, 57.3232536315918, 68.41261291503906, 182.01353454589844, -17.46692657470703, -39.27562713623047, 34.808753967285156, 91.23295593261719, 25.421127319335938, 199.2635498046875, -45.616790771484375, -119.30240631103516, 27.665794372558594, 176.8948974609375, 22.426963806152344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000569.npy"}
{"epoch": 0.8601662887377173, "step": 570, "batch_size": 64, "mean": 87.59431457519531, "std": 101.02674865722656, "min": -139.3586883544922, "p10": -31.394090461730947, "median": 87.13399124145508, "p90": 204.08053436279297, "max": 358.8148498535156, "pos_frac": 0.765625, "sample": [200.49032592773438, -76.58600616455078, 174.2626953125, 64.01556396484375, 143.59072875976562, 26.196510314941406, -72.385986328125, -0.2381134033203125, 57.33992004394531, -12.774581909179688, 92.10094451904297, 82.16703796386719, -0.4733905792236328, 159.08193969726562, -10.890312194824219, 170.16226196289062, 0.10272026062011719, -1.4641532897949219, 92.26548767089844, 127.92239379882812, 113.16604614257812, -133.26919555664062, 34.882591247558594, 182.5131378173828, 204.12831115722656, 3.466428756713867, 58.23905563354492, 181.08082580566406, 218.47616577148438, -36.03697967529297, 8.534461975097656, 128.40203857421875, 177.5035400390625, -71.22984313964844, 1.5398674011230469, 130.7513427734375, 184.64901733398438, 79.49745178222656, 192.32107543945312, 47.7896728515625, 228.63888549804688, 133.61154174804688, 241.6580810546875, -20.56068229675293, 40.36119842529297, 150.14524841308594, 160.40301513671875, 220.80274963378906, 124.69082641601562, -139.3586883544922, 182.91778564453125, -12.415809631347656, 64.90132141113281, 162.16439819335938, 213.67901611328125, 43.68995666503906, 79.22978210449219, -3.6857433319091797, 108.72927856445312, 170.42315673828125, 358.8148498535156, 36.81247329711914, 203.96905517578125, -64.87615966796875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000570.npy"}
{"epoch": 0.8616780045351474, "step": 571, "batch_size": 64, "mean": 49.79738235473633, "std": 104.3386459350586, "min": -192.56793212890625, "p10": -101.11254882812499, "median": 62.55795669555664, "p90": 176.7143692016602, "max": 277.78717041015625, "pos_frac": 0.703125, "sample": [111.11077880859375, 119.73250579833984, 69.44911193847656, 15.836515426635742, -0.1852874755859375, 113.72256469726562, 76.14578247070312, 93.08228302001953, 142.55982971191406, 277.78717041015625, 102.84996032714844, 62.134910583496094, -122.08577728271484, -4.079624176025391, -2.0998306274414062, 117.54478454589844, 8.21978759765625, 168.09097290039062, 19.003662109375, 116.70792388916016, 171.21435546875, 15.479408264160156, 62.98100280761719, 84.14283752441406, 205.34075927734375, 87.610107421875, 109.88298034667969, 125.91138458251953, -192.56793212890625, 14.314872741699219, 2.1400012969970703, -104.89921569824219, -11.344505310058594, -142.91087341308594, 134.63914489746094, -3.2082042694091797, -35.687217712402344, 1.2807579040527344, -92.27699279785156, -41.15461730957031, 31.52342987060547, 77.10748291015625, 179.07151794433594, 121.59148406982422, 68.44331359863281, 189.81103515625, 203.21676635742188, -185.96826171875, 53.69115447998047, -35.24266815185547, -54.09779357910156, -164.34249877929688, 105.64521026611328, 47.717864990234375, 170.30247497558594, -27.30541229248047, 138.4715118408203, 144.24169921875, 46.89366912841797, 203.8337860107422, -42.562374114990234, 5.4335174560546875, -160.5362091064453, 193.67564392089844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000571.npy"}
{"epoch": 0.8631897203325775, "step": 572, "batch_size": 64, "mean": 58.30277633666992, "std": 93.62798309326172, "min": -158.69088745117188, "p10": -50.21811676025389, "median": 35.471099853515625, "p90": 182.66600494384767, "max": 218.11083984375, "pos_frac": 0.734375, "sample": [12.410346984863281, -59.70310974121094, 2.0102005004882812, 98.50389099121094, 178.30149841308594, 46.56134033203125, 179.76022338867188, 34.145721435546875, -6.333164215087891, 197.95516967773438, -12.577926635742188, 146.5034942626953, 163.06320190429688, 195.847412109375, -28.208324432373047, 186.12548828125, -56.39007568359375, 20.230409622192383, 27.058218002319336, 124.30332946777344, 42.98567581176758, 17.421600341796875, 128.6951446533203, 181.22555541992188, 190.82723999023438, 82.80341339111328, -32.08477020263672, 134.33203125, -8.365348815917969, 24.800216674804688, 218.11083984375, 183.28334045410156, 177.4697723388672, -150.36099243164062, 129.16685485839844, 31.634220123291016, 46.864990234375, -62.56373596191406, 7.361759185791016, 111.91927337646484, 98.12862396240234, -11.757673263549805, 136.41281127929688, 137.26095581054688, 36.796478271484375, 149.31948852539062, 22.61346435546875, -135.24461364746094, 208.31190490722656, 2.85784912109375, 87.20751953125, 125.81768798828125, -31.194625854492188, -7.967552185058594, -158.69088745117188, -35.81687927246094, 32.005043029785156, 117.28936767578125, -17.392196655273438, -97.53741455078125, 147.10052490234375, 1.2252254486083984, 14.688619613647461, 4.849527359008789], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000572.npy"}
{"epoch": 0.8647014361300076, "step": 573, "batch_size": 64, "mean": 61.77076721191406, "std": 116.64591979980469, "min": -222.25540161132812, "p10": -80.82622146606445, "median": 52.306278228759766, "p90": 227.3158416748047, "max": 340.4586181640625, "pos_frac": 0.71875, "sample": [38.42323303222656, 126.83602905273438, -182.1975860595703, 185.010498046875, -83.05249786376953, 51.35748291015625, -5.010004043579102, 20.176414489746094, 71.10720825195312, -75.63157653808594, -183.47067260742188, -156.24935913085938, -9.00013542175293, -48.27467346191406, -28.5980224609375, 159.695068359375, -20.447006225585938, 25.469955444335938, 195.32443237304688, -222.25540161132812, 53.25507354736328, 83.05895233154297, 227.35699462890625, 88.97466278076172, -10.91094970703125, 15.847064971923828, 340.4586181640625, 2.176624298095703, 144.9638671875, 97.89435577392578, 20.230743408203125, 26.5062255859375, 93.5166015625, 39.13905334472656, 101.62359619140625, -2.883655548095703, 95.19410705566406, 69.11663818359375, 28.17487335205078, 253.13619995117188, 172.49832153320312, 160.923583984375, 177.0286102294922, -4.825098037719727, -40.45106506347656, 102.7105941772461, 237.78431701660156, 3.6938323974609375, 38.193992614746094, 100.8154525756836, 227.21981811523438, 259.686767578125, 109.2401123046875, 121.52823638916016, 3.6056747436523438, -115.12382507324219, 285.7169189453125, -27.91234588623047, 95.35525512695312, 102.53898620605469, 128.41380310058594, 36.47534942626953, -96.18270874023438, 248.35140991210938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000573.npy"}
{"epoch": 0.8662131519274376, "step": 574, "batch_size": 64, "mean": 77.98553466796875, "std": 85.57843017578125, "min": -112.30318450927734, "p10": -11.479727935791013, "median": 67.57010650634766, "p90": 194.0708206176758, "max": 335.2005310058594, "pos_frac": 0.8125, "sample": [73.79511260986328, -50.373619079589844, -12.562995910644531, -112.30318450927734, 101.81898498535156, 113.37100219726562, 37.4359130859375, 25.479522705078125, 6.478334426879883, -19.714317321777344, -8.952102661132812, 90.54617309570312, 12.828964233398438, 37.86644744873047, -4.529541015625, 2.4065933227539062, -3.126758575439453, 49.80455017089844, 45.12305450439453, 110.92303466796875, 160.8957061767578, 31.01598358154297, 11.86767578125, 160.73663330078125, -8.509620666503906, 198.99757385253906, 46.25995635986328, 193.90858459472656, -0.7179088592529297, 195.3531494140625, 100.49185180664062, 0.04975128173828125, 85.53550720214844, 41.48089599609375, 105.00128173828125, 148.68707275390625, -78.8644790649414, 139.56541442871094, 194.14035034179688, 7.766176223754883, 118.38258361816406, 23.40264892578125, 183.55029296875, 200.13345336914062, 143.3463134765625, 196.29638671875, 114.98388671875, 90.68133544921875, 67.93766784667969, 17.533203125, -14.546142578125, 193.6166229248047, 2.7913055419921875, 129.06448364257812, 108.24645233154297, 34.61180877685547, 67.20254516601562, 184.23135375976562, 187.34494018554688, 220.2574005126953, -15.990989685058594, 149.6027374267578, 335.2005310058594, 23.216537475585938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000574.npy"}
{"epoch": 0.8677248677248677, "step": 575, "batch_size": 64, "mean": 56.82392501831055, "std": 119.41057586669922, "min": -205.03944396972656, "p10": -93.67688369750974, "median": 31.9075927734375, "p90": 215.8742630004883, "max": 378.5587158203125, "pos_frac": 0.671875, "sample": [128.76882934570312, -147.08120727539062, -205.03944396972656, -5.537912368774414, 378.5587158203125, 84.12823486328125, -20.263717651367188, 119.97708892822266, 6.6881256103515625, -18.089710235595703, 133.8702392578125, -61.431419372558594, 215.09580993652344, -41.019351959228516, -141.1446075439453, 93.40214538574219, -131.48431396484375, 106.84431457519531, 0.13481903076171875, 6.648765563964844, 54.76868438720703, 216.2078857421875, 167.11520385742188, 9.099418640136719, 8.756019592285156, 18.356399536132812, -150.21925354003906, 146.13096618652344, 245.66653442382812, -10.931190490722656, 19.566892623901367, 62.03032684326172, 198.2957763671875, 270.418212890625, 178.16941833496094, -73.42720794677734, 218.37869262695312, -7.611572265625, 235.197021484375, 161.49273681640625, -43.99275207519531, 34.0452880859375, 105.84346771240234, 29.7698974609375, -0.4294281005859375, -10.980873107910156, 48.596649169921875, -175.90208435058594, 123.84791564941406, 273.77056884765625, 212.5632781982422, 125.37733459472656, 13.560256958007812, 39.38017654418945, -102.35531616210938, 23.92664337158203, 85.04367065429688, -10.164104461669922, 17.8425235748291, 95.85704040527344, -15.156295776367188, 169.47015380859375, 148.2024688720703, -21.87173843383789], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000575.npy"}
{"epoch": 0.8692365835222978, "step": 576, "batch_size": 64, "mean": 80.2456283569336, "std": 104.21712493896484, "min": -211.46495056152344, "p10": -50.382792663574214, "median": 67.34476089477539, "p90": 206.2575653076172, "max": 304.96826171875, "pos_frac": 0.765625, "sample": [27.977935791015625, 175.98928833007812, -10.312751770019531, 68.38996124267578, 158.31915283203125, 59.14405822753906, 194.8943328857422, 17.437829971313477, 66.299560546875, 145.0084686279297, -53.346702575683594, -80.75445556640625, 189.72714233398438, -64.84805297851562, 116.51021575927734, -104.466552734375, -43.467002868652344, 174.78160095214844, 304.96826171875, 145.660400390625, 194.03428649902344, 221.279052734375, 113.36940002441406, 4.24293327331543, 154.97463989257812, 97.75379943847656, 11.507341384887695, -69.75968170166016, 261.84759521484375, -54.82238006591797, 32.527008056640625, 40.987098693847656, 56.79302978515625, 188.34156799316406, 78.8526611328125, 59.05076599121094, 246.06492614746094, -8.586441040039062, -2.615814208984375, 8.539260864257812, 198.67063903808594, -17.05234146118164, 179.837158203125, 18.006765365600586, 205.8701171875, 2.56365966796875, 206.42361450195312, 171.9549560546875, -2.983205795288086, 207.25338745117188, 189.5585479736328, -211.46495056152344, 216.3930206298828, 114.91065979003906, 73.77765655517578, 187.11245727539062, -27.9512882232666, -2.9853515625, 27.55689239501953, 94.7713851928711, 0.7034206390380859, 49.155418395996094, 23.403465270996094, 107.94036865234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000576.npy"}
{"epoch": 0.8707482993197279, "step": 577, "batch_size": 64, "mean": 56.00664138793945, "std": 104.32469940185547, "min": -224.00503540039062, "p10": -59.508275222778316, "median": 43.24307441711426, "p90": 191.52637786865236, "max": 255.2894287109375, "pos_frac": 0.78125, "sample": [24.660293579101562, 11.726058959960938, -19.202964782714844, 57.213897705078125, 5.763916015625, 176.31219482421875, 14.18265151977539, -22.1640567779541, 229.51934814453125, 6.778846740722656, -53.40998077392578, 192.7687530517578, 158.50814819335938, -52.01612854003906, -84.16986083984375, 199.96987915039062, 185.8162841796875, 181.59857177734375, -161.07374572753906, 124.19550323486328, 200.2706756591797, 84.68140411376953, 12.50003433227539, 169.9257354736328, 45.839599609375, -68.6422348022461, -126.15594482421875, 138.3754119873047, 150.4704132080078, 95.03115844726562, -23.2403564453125, 173.42196655273438, -30.52532958984375, 54.15080261230469, 20.955718994140625, 10.398147583007812, 49.48991012573242, 3.919586181640625, 34.329505920410156, 6.980541229248047, 52.29821014404297, 20.078887939453125, 210.8304901123047, 2.6014842987060547, 192.77413940429688, 255.2894287109375, -219.79611206054688, 130.93743896484375, 66.69107055664062, 54.402099609375, 30.288619995117188, 107.85157775878906, 188.62750244140625, 178.98780822753906, 17.088211059570312, 40.646549224853516, -224.00503540039062, -4.20311164855957, -62.121829986572266, 122.06390380859375, 2.480072021484375, 99.4044189453125, 104.03477478027344, 38.02018737792969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000577.npy"}
{"epoch": 0.872260015117158, "step": 578, "batch_size": 64, "mean": 53.528045654296875, "std": 108.41510772705078, "min": -202.56578063964844, "p10": -62.26083374023436, "median": 33.87908935546875, "p90": 190.69869232177734, "max": 302.5419921875, "pos_frac": 0.640625, "sample": [95.91468811035156, -2.6435699462890625, 86.08964538574219, 37.83319091796875, 61.140281677246094, -67.03164672851562, 84.73042297363281, 16.816917419433594, 155.135009765625, 0.8194904327392578, 9.29742431640625, 302.5419921875, -4.78460693359375, 128.17083740234375, 173.7484893798828, -202.56578063964844, 165.91493225097656, 189.46432495117188, 156.7854461669922, 125.69207763671875, 173.339111328125, 103.85739135742188, 2.908510208129883, -3.9199485778808594, 10.98200798034668, 83.79827117919922, 211.68807983398438, -1.4469375610351562, 90.014404296875, -4.966072082519531, 243.077880859375, 181.89129638671875, 191.2277069091797, -51.128936767578125, -22.113243103027344, 31.667709350585938, -2.6018733978271484, 88.35099792480469, 69.6173324584961, 74.51458740234375, 179.4992218017578, -187.68096923828125, 36.09046936035156, -49.64762878417969, -5.1505279541015625, 12.076507568359375, -39.474143981933594, 182.54693603515625, -97.38949584960938, 209.8330078125, 104.30818939208984, -113.59793853759766, 280.35455322265625, -32.533355712890625, -48.516448974609375, -78.47640991210938, 23.884063720703125, 64.01190948486328, -39.583473205566406, 13.813100814819336, -29.99837875366211, -4.778816223144531, -144.44009399414062, 206.8169708251953], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000578.npy"}
{"epoch": 0.873771730914588, "step": 579, "batch_size": 64, "mean": 77.98841094970703, "std": 91.1905288696289, "min": -183.4097137451172, "p10": -29.368903350830074, "median": 82.55094909667969, "p90": 197.47319488525392, "max": 248.5243377685547, "pos_frac": 0.78125, "sample": [108.89798736572266, 83.3092041015625, 76.46160125732422, 51.294349670410156, 65.5500259399414, -124.90571594238281, 201.87904357910156, 81.86795043945312, 138.68115234375, 248.5243377685547, 109.96456909179688, 179.18753051757812, 133.08685302734375, 199.8515625, 129.92945861816406, 18.333396911621094, 180.54248046875, -31.6170654296875, -24.123191833496094, 116.59684753417969, 219.732177734375, -58.528316497802734, 187.1014404296875, -4.166044235229492, 62.89134216308594, 138.16127014160156, 201.4011993408203, 15.009933471679688, 192.24147033691406, 173.9641571044922, -57.54973602294922, 25.738967895507812, 152.86868286132812, -1.072988510131836, 89.49683380126953, 112.3685302734375, 46.88945007324219, -8.900724411010742, -60.59999084472656, 186.2493133544922, -13.58355712890625, 5.424890518188477, -10.787208557128906, 8.885826110839844, 83.23394775390625, 4.232881546020508, -18.795997619628906, -183.4097137451172, 199.71536254882812, 127.16641998291016, 76.12820434570312, 165.39395141601562, 46.68035888671875, 214.8960723876953, 126.01425170898438, 7.104948043823242, 150.74168395996094, 142.96014404296875, 40.93187713623047, 84.76263427734375, -49.71351623535156, 21.113330841064453, 131.7667236328125, 73.78533935546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000579.npy"}
{"epoch": 0.8752834467120182, "step": 580, "batch_size": 64, "mean": 63.51256561279297, "std": 113.27708435058594, "min": -191.2840576171875, "p10": -61.56902847290037, "median": 57.65310287475586, "p90": 221.94192199707035, "max": 296.05426025390625, "pos_frac": 0.671875, "sample": [-150.24920654296875, -68.55560302734375, -30.376556396484375, 13.014936447143555, -36.5733642578125, 21.026865005493164, 164.58160400390625, 79.69984436035156, 76.14468383789062, 33.39374542236328, 213.3743133544922, 84.16158294677734, 8.930694580078125, -5.467617034912109, -34.30109405517578, 68.77493286132812, 168.03619384765625, 133.89019775390625, 2.1932506561279297, 43.07997131347656, -1.6331291198730469, 130.0124969482422, 241.31222534179688, 66.99610137939453, 134.0674591064453, -28.812698364257812, 250.54266357421875, -6.64007568359375, -6.638799667358398, 224.91021728515625, 215.01589965820312, 196.64500427246094, -2.6542739868164062, 135.24461364746094, -38.91777038574219, 192.76278686523438, -18.807811737060547, -191.2840576171875, 9.47549819946289, 296.05426025390625, -178.3272247314453, 20.56085205078125, -86.56666564941406, 101.35409545898438, 267.23779296875, -22.275590896606445, 53.785003662109375, 61.521202087402344, 238.09158325195312, 142.82601928710938, 147.80284118652344, 150.83901977539062, -45.26702117919922, -151.64630126953125, 23.306049346923828, -38.882240295410156, 77.60414123535156, 144.8258514404297, 124.23204040527344, 245.23873901367188, 37.647308349609375, -92.13795471191406, 127.54167175292969, 133.06304931640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000580.npy"}
{"epoch": 0.8767951625094482, "step": 581, "batch_size": 64, "mean": 64.63516998291016, "std": 103.81082153320312, "min": -158.541015625, "p10": -83.92738647460938, "median": 38.90806007385254, "p90": 194.12620544433597, "max": 315.0457458496094, "pos_frac": 0.71875, "sample": [129.6711883544922, 188.69964599609375, 35.40039825439453, 2.7858047485351562, 102.14434814453125, -5.8249359130859375, 7.6638031005859375, -82.56088256835938, 94.33047485351562, 102.46576690673828, -84.51303100585938, 17.509075164794922, 18.453184127807617, 37.89509582519531, 158.56927490234375, -78.04057312011719, 180.68923950195312, 76.83273315429688, 134.1892547607422, 27.262845993041992, 179.17807006835938, 29.941604614257812, 200.01023864746094, -55.11627197265625, 229.7860870361328, 196.45187377929688, 160.19215393066406, -35.40296936035156, 41.281715393066406, 97.66799926757812, -11.1097412109375, 170.0819091796875, 36.13862228393555, 129.750732421875, 145.49525451660156, 132.1628875732422, 39.31980895996094, -158.541015625, 251.74790954589844, 223.26751708984375, -33.202117919921875, -98.32174682617188, 110.0335693359375, 23.376602172851562, 103.49868774414062, -97.87730407714844, 315.0457458496094, 23.05908203125, -9.543575286865234, -93.00408935546875, 183.6566162109375, -20.012535095214844, 115.57591247558594, 170.3904571533203, 17.278095245361328, 124.52183532714844, -2.6644058227539062, -95.2608642578125, -90.316162109375, 38.49631118774414, 29.02655029296875, 181.16334533691406, 200.06439208984375, -24.260643005371094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000581.npy"}
{"epoch": 0.8783068783068783, "step": 582, "batch_size": 64, "mean": 61.742095947265625, "std": 101.80706024169922, "min": -177.258544921875, "p10": -42.21149291992187, "median": 47.227874755859375, "p90": 188.16056060791016, "max": 289.8363037109375, "pos_frac": 0.734375, "sample": [124.30170440673828, 181.4761962890625, 45.63367462158203, 104.67416381835938, 23.4351806640625, 27.478649139404297, 214.87156677246094, 62.07841873168945, 158.17056274414062, 156.00186157226562, 88.09616088867188, 48.82207489013672, 250.70973205566406, 289.8363037109375, -37.285430908203125, 189.49057006835938, -44.322662353515625, -20.675491333007812, 87.42816162109375, 120.81563568115234, -4.6045989990234375, 138.92062377929688, 15.073902130126953, 220.72265625, 162.8206787109375, 220.7285614013672, -6.518041610717773, 15.864282608032227, -135.59112548828125, -177.02821350097656, -5.966817855834961, 15.01959228515625, 117.06511688232422, -23.328628540039062, -19.575719833374023, 146.17755126953125, 143.9318084716797, 65.74874877929688, 199.53152465820312, 6.844539642333984, 119.19522094726562, 99.94746398925781, 72.34977722167969, 8.533998489379883, 3.282773971557617, -25.592330932617188, -46.24127960205078, 168.50735473632812, 10.185722351074219, -177.258544921875, 2.024862289428711, 18.864303588867188, 0.4019737243652344, 0.0067291259765625, -102.3704833984375, 79.41773986816406, 168.7288818359375, -34.375526428222656, 180.24815368652344, -30.099945068359375, -67.77737426757812, 9.279136657714844, 142.304931640625, 185.0572052001953], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000582.npy"}
{"epoch": 0.8798185941043084, "step": 583, "batch_size": 64, "mean": 73.37625122070312, "std": 113.7681884765625, "min": -206.4808807373047, "p10": -50.07701263427733, "median": 86.70125579833984, "p90": 206.0067337036133, "max": 276.17584228515625, "pos_frac": 0.671875, "sample": [85.802490234375, -24.570159912109375, -24.30878448486328, -1.5553092956542969, 179.4249267578125, 87.60002136230469, -59.117088317871094, -28.68755340576172, -151.3437957763672, -8.313674926757812, 202.7692413330078, -16.789241790771484, -33.46047592163086, 161.13414001464844, 96.53648376464844, -55.137535095214844, 72.85359191894531, 1.3188037872314453, 203.54388427734375, 173.49423217773438, 47.70216369628906, -163.13339233398438, 173.9485626220703, 5.432670593261719, 138.50245666503906, 136.47225952148438, 250.468017578125, -206.4808807373047, 26.671527862548828, 177.15447998046875, 46.0015983581543, 51.280364990234375, -15.188095092773438, 26.136154174804688, 174.14859008789062, 220.5712127685547, -15.26068115234375, -38.269126892089844, 200.862548828125, 181.5771942138672, 91.97581481933594, 217.24502563476562, -196.17893981933594, 276.17584228515625, -4.420110702514648, 182.4718017578125, 145.94070434570312, 156.05059814453125, 103.2451400756836, -19.92974853515625, 232.82070922851562, 94.80474090576172, 196.12747192382812, 13.47075080871582, 109.01174926757812, 171.69061279296875, -87.811279296875, 103.36344146728516, -1.1666336059570312, 185.4922637939453, 207.06224060058594, 212.67437744140625, 29.786727905273438, -3.6149063110351562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000583.npy"}
{"epoch": 0.8813303099017384, "step": 584, "batch_size": 64, "mean": 65.74142456054688, "std": 117.31839752197266, "min": -180.95362854003906, "p10": -80.79412994384765, "median": 42.80034637451172, "p90": 229.08000183105474, "max": 284.1932678222656, "pos_frac": 0.671875, "sample": [208.78289794921875, 235.9468994140625, 69.94392395019531, -4.059989929199219, 76.05169677734375, -8.036447525024414, 235.57443237304688, 213.92633056640625, 160.0811767578125, 199.2374725341797, 111.49899291992188, -3.6991138458251953, -10.281137466430664, -74.02294158935547, -141.94479370117188, 127.13367462158203, 120.20711517333984, 197.63267517089844, 182.53900146484375, -6.595127105712891, 180.37139892578125, 239.6558380126953, -0.5502243041992188, 174.90452575683594, 284.1932678222656, 183.45809936523438, 91.1618881225586, 4.012825012207031, 35.01020812988281, 18.488189697265625, 192.2635955810547, -88.73173522949219, 143.0697021484375, 9.007965087890625, -111.2486572265625, 182.2158203125, -17.823089599609375, -27.752166748046875, -18.13823699951172, -47.58332824707031, 4.623027801513672, 96.45685577392578, -73.36415100097656, 50.590484619140625, 94.09564971923828, 260.3782958984375, 4.49504280090332, 65.94659423828125, 91.97692108154297, -83.6960678100586, -180.95362854003906, 26.303752899169922, -41.437767028808594, 179.88677978515625, 1.5899124145507812, 273.9590148925781, 244.10250854492188, 10.506258010864258, 12.278654098510742, 33.06169128417969, -6.190269470214844, 127.4381103515625, -123.36776733398438, -177.1312713623047], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000584.npy"}
{"epoch": 0.8828420256991686, "step": 585, "batch_size": 64, "mean": 45.716304779052734, "std": 100.82054901123047, "min": -235.42034912109375, "p10": -36.15743560791015, "median": 34.50090026855469, "p90": 191.50484924316407, "max": 286.5470275878906, "pos_frac": 0.734375, "sample": [23.850727081298828, 112.40150451660156, 89.19795227050781, -3.6745758056640625, 71.63001251220703, 90.28193664550781, 41.965309143066406, 210.63699340820312, -10.585861206054688, 5.980247497558594, -182.8704071044922, 21.087947845458984, 119.77810668945312, 34.32902526855469, 0.06690597534179688, 286.5470275878906, -146.44509887695312, -16.998680114746094, 56.33320617675781, 105.25395202636719, 126.35038757324219, 18.08046531677246, -5.16737174987793, -139.99806213378906, -5.657522201538086, 22.18019676208496, -30.114295959472656, 7.590538024902344, 201.9597625732422, 152.8974609375, 7.325885772705078, 14.997100830078125, 47.5611457824707, -4.687450408935547, -208.5500946044922, 5.687141418457031, 15.336654663085938, 149.27975463867188, 197.82833862304688, 106.76179504394531, 40.93123245239258, 154.69790649414062, 84.34442138671875, -59.98572540283203, -235.42034912109375, 200.72555541992188, 203.3009796142578, 192.0883331298828, -38.747352600097656, -12.721000671386719, 34.67277526855469, 30.770179748535156, -21.799285888671875, 7.053285598754883, 50.55829620361328, 78.20745086669922, 36.3150634765625, 189.6119384765625, 190.1433868408203, 87.64508056640625, 77.6083755493164, 4.039772033691406, 66.37262725830078, -22.997535705566406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000585.npy"}
{"epoch": 0.8843537414965986, "step": 586, "batch_size": 64, "mean": 67.28589630126953, "std": 78.83846282958984, "min": -94.51318359375, "p10": -13.97823963165283, "median": 46.872215270996094, "p90": 188.68729553222656, "max": 233.95106506347656, "pos_frac": 0.796875, "sample": [105.66796875, 18.51575469970703, 157.8722686767578, 62.704505920410156, 58.79449462890625, 149.61978149414062, 28.04902458190918, 8.592575073242188, -7.246391296386719, -15.512128829956055, 112.35272979736328, -15.086433410644531, 4.2261199951171875, 226.73184204101562, 200.35806274414062, -11.442255020141602, 114.36872100830078, 109.75593566894531, 82.83894348144531, 130.73126220703125, 23.936443328857422, 69.53373718261719, 24.608936309814453, 188.16351318359375, 16.73992919921875, -3.118192672729492, 19.378700256347656, 16.998138427734375, 163.8022003173828, 33.75204086303711, 148.21994018554688, 158.39654541015625, 61.894371032714844, 59.43610763549805, 223.87071228027344, 3.6078338623046875, -94.51318359375, 221.486083984375, -4.115352630615234, 6.223165512084961, 6.806779861450195, 31.450515747070312, -27.566421508789062, 128.84568786621094, 95.55720520019531, -15.06509017944336, 0.46721649169921875, 69.815673828125, -3.8338165283203125, 125.783447265625, 233.95106506347656, -5.0830230712890625, 12.112838745117188, 46.475372314453125, 188.91177368164062, 39.154396057128906, -60.22252655029297, 190.4232177734375, 47.26905822753906, -28.256179809570312, 180.1683349609375, 14.119060516357422, 87.86443328857422, 86.95394897460938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000586.npy"}
{"epoch": 0.8858654572940288, "step": 587, "batch_size": 64, "mean": 66.17654418945312, "std": 126.85829162597656, "min": -453.0540466308594, "p10": -67.68053169250487, "median": 74.70962905883789, "p90": 202.9984069824219, "max": 295.3566589355469, "pos_frac": 0.78125, "sample": [25.266040802001953, -121.57118225097656, -2.2048110961914062, 140.23007202148438, 140.62063598632812, -134.14622497558594, 46.221012115478516, -38.260772705078125, 178.88504028320312, 161.4906005859375, 15.566234588623047, 82.27689361572266, -72.71177673339844, 14.126693725585938, -0.126708984375, 86.18167114257812, 151.34861755371094, -5.028026580810547, -40.91587829589844, 218.26519775390625, 205.40293884277344, 147.85894775390625, 0.35994720458984375, 197.38783264160156, 128.73037719726562, 21.70520782470703, 206.784912109375, 13.553009033203125, -453.0540466308594, 13.452678680419922, 127.93309783935547, 131.9756622314453, 128.90036010742188, 137.0373077392578, 13.932819366455078, 82.4403076171875, 91.75636291503906, -275.12994384765625, -91.17324829101562, 295.3566589355469, 169.34356689453125, 1.7274208068847656, 186.03134155273438, 113.22793579101562, 1.0848064422607422, 166.22567749023438, 9.776611328125, 184.8712158203125, 175.0462646484375, 115.75146484375, 214.91693115234375, -111.91057586669922, 155.10067749023438, 10.820526123046875, 67.14236450195312, 249.1255340576172, 25.721839904785156, -12.844207763671875, -55.94095993041992, 55.639183044433594, 63.556434631347656, 283.44708251953125, 190.4378204345703, 6.275398254394531], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000587.npy"}
{"epoch": 0.8873771730914588, "step": 588, "batch_size": 64, "mean": 75.01617431640625, "std": 114.82534790039062, "min": -222.27447509765625, "p10": -70.04922790527343, "median": 67.34754180908203, "p90": 208.98488159179686, "max": 333.09002685546875, "pos_frac": 0.78125, "sample": [85.97256469726562, -95.99827575683594, -44.20886993408203, 169.62850952148438, 5.9385528564453125, 157.23736572265625, 178.96200561523438, 25.815353393554688, 208.28659057617188, 191.3355712890625, 312.41314697265625, 42.22026062011719, 18.223478317260742, 128.34283447265625, 45.023353576660156, 127.48210906982422, 209.28414916992188, 298.77667236328125, 20.880338668823242, 333.09002685546875, 80.27474212646484, 164.9728546142578, -34.72536849975586, 188.48033142089844, 1.4969654083251953, 78.67974853515625, 40.104270935058594, -6.526283264160156, 4.7015533447265625, -60.50384521484375, 4.62083625793457, 115.24205017089844, -130.3018798828125, 124.41017150878906, 129.09716796875, 233.182861328125, 17.43828773498535, 129.85348510742188, 81.74894714355469, -85.5801010131836, -27.146995544433594, -58.118316650390625, 90.25521087646484, 211.09523010253906, 130.2825927734375, 103.57368469238281, 179.26583862304688, -7.403202056884766, 30.034835815429688, 173.44322204589844, 27.541656494140625, -128.31881713867188, 45.749298095703125, 43.436805725097656, 134.17869567871094, 56.01533508300781, -74.14010620117188, 21.521080017089844, 51.666259765625, -125.95877075195312, 175.2939453125, 273.8438720703125, 201.82562255859375, -222.27447509765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000588.npy"}
{"epoch": 0.8888888888888888, "step": 589, "batch_size": 64, "mean": 72.73495483398438, "std": 84.8216323852539, "min": -109.29750061035156, "p10": -32.36242828369141, "median": 72.3225326538086, "p90": 185.38097534179687, "max": 278.31396484375, "pos_frac": 0.78125, "sample": [46.464080810546875, 47.90119171142578, -109.29750061035156, 194.88299560546875, 103.94988250732422, -0.41949462890625, -66.73006439208984, -6.345245361328125, 39.88300323486328, -6.312408447265625, -38.04747009277344, 43.64685821533203, 278.31396484375, 2.597766876220703, 143.93418884277344, 86.0628433227539, 89.82137298583984, 144.97708129882812, 147.3616180419922, 132.0321044921875, 85.51229858398438, 98.339599609375, 174.3601837158203, -83.13491821289062, 211.3907012939453, 119.77340698242188, -31.95672607421875, 110.44805145263672, 147.03236389160156, 137.42001342773438, 57.3944091796875, -3.9584312438964844, 185.83181762695312, 55.460968017578125, -32.53630065917969, 79.35272216796875, 62.752830505371094, 31.369834899902344, 86.5806655883789, 192.4800262451172, 34.09019470214844, 99.59112548828125, 191.74917602539062, 108.5608901977539, 65.29234313964844, -45.8243408203125, 120.84149932861328, -28.440845489501953, -17.24862289428711, 20.206520080566406, -101.35031127929688, 165.71319580078125, 170.06649780273438, 17.29468536376953, 50.73841094970703, 184.32901000976562, 123.41264343261719, 147.07339477539062, 18.89441680908203, 22.551048278808594, 81.20550537109375, 30.355106353759766, 234.30184936523438, 3.043680191040039], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000589.npy"}
{"epoch": 0.890400604686319, "step": 590, "batch_size": 64, "mean": 69.0606689453125, "std": 96.92897033691406, "min": -169.56504821777344, "p10": -41.65683822631836, "median": 65.35202026367188, "p90": 194.82359161376954, "max": 240.97039794921875, "pos_frac": 0.734375, "sample": [114.2278823852539, 27.413673400878906, 135.627685546875, 5.381614685058594, 190.09375, 179.34219360351562, 94.2952880859375, -157.83090209960938, 35.26888656616211, -33.04332733154297, 56.965789794921875, 2.6224708557128906, 6.73968505859375, 29.494544982910156, -169.56504821777344, 184.22012329101562, 168.0908966064453, 62.217529296875, -19.359233856201172, -3.5887298583984375, -56.620262145996094, 70.2523193359375, -2.3059310913085938, 199.25973510742188, 196.25839233398438, -2.2955093383789062, -36.39048767089844, 99.54658508300781, 240.97039794921875, 68.48651123046875, -36.590003967285156, 225.52120971679688, 13.938783645629883, 8.416015625, 114.83724975585938, 75.4820556640625, 98.00489044189453, 29.212753295898438, 74.88008117675781, 14.631240844726562, 208.98712158203125, 108.62123107910156, 238.11520385742188, 167.48190307617188, 184.75497436523438, 107.869384765625, 100.1915283203125, 184.19898986816406, -43.828338623046875, 60.98966979980469, -56.023773193359375, 198.8773651123047, 131.8063201904297, -21.387283325195312, -0.6695709228515625, 188.29522705078125, 4.408117294311523, 78.25603485107422, 191.47572326660156, -3.6531333923339844, 43.39847183227539, -49.347503662109375, 186.9658203125, -94.01124572753906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000590.npy"}
{"epoch": 0.891912320483749, "step": 591, "batch_size": 64, "mean": 84.97420501708984, "std": 125.46168518066406, "min": -195.26206970214844, "p10": -86.24580001831052, "median": 84.87051391601562, "p90": 220.2357162475586, "max": 400.34765625, "pos_frac": 0.765625, "sample": [2.2059574127197266, 197.71078491210938, 170.2750244140625, 174.58363342285156, 167.66940307617188, 53.983497619628906, 193.0752410888672, 89.02830505371094, 80.71272277832031, 105.51942443847656, 53.03755187988281, -135.43197631835938, 201.53819274902344, -97.41730499267578, 104.45034790039062, -23.335342407226562, 40.86235046386719, 91.09295654296875, 103.90716552734375, 4.345348358154297, 221.7215576171875, 26.669641494750977, 290.2692565917969, -168.45521545410156, 187.73089599609375, -159.3940887451172, 184.16827392578125, 100.29568481445312, -6.151031494140625, 38.23406982421875, -195.26206970214844, -173.12754821777344, 267.6033020019531, 216.7687530517578, -2.4998397827148438, 208.11875915527344, -3.4433326721191406, 54.652435302734375, 187.04605102539062, 39.70976638793945, 6.786430358886719, -8.497535705566406, 57.949066162109375, 177.17393493652344, 0.9008655548095703, 245.97500610351562, -10.981172561645508, -101.33905029296875, -6.175148010253906, 400.34765625, -60.178955078125, 21.26557159423828, 258.0049133300781, 28.08562469482422, 18.32387924194336, 203.44894409179688, 203.1600341796875, 138.6666717529297, 167.05052185058594, 135.29971313476562, 211.4699249267578, 258.12384033203125, 6.016603469848633, 195.00360107421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000591.npy"}
{"epoch": 0.8934240362811792, "step": 592, "batch_size": 64, "mean": 63.23442077636719, "std": 100.31036376953125, "min": -124.60765075683594, "p10": -47.055089569091784, "median": 24.914371490478516, "p90": 195.58048248291016, "max": 328.3680725097656, "pos_frac": 0.734375, "sample": [66.61491394042969, 214.32443237304688, 196.74514770507812, 19.95365333557129, 185.61575317382812, 192.0384063720703, 231.11508178710938, 15.48897933959961, 4.653472900390625, 35.62230682373047, -27.28044891357422, 128.20230102539062, 62.94343566894531, 16.0562744140625, -56.98370361328125, 87.10143280029297, 192.86293029785156, 112.85952758789062, 4.882194519042969, 28.25314712524414, -105.53691101074219, 181.18630981445312, 103.77587890625, 254.27633666992188, 18.974781036376953, 123.03712463378906, 122.21074676513672, 188.24179077148438, 186.05810546875, 23.114944458007812, -90.28343963623047, 126.86813354492188, -15.352386474609375, 160.75885009765625, 31.00084686279297, 76.98872375488281, -124.60765075683594, 91.369140625, -106.41890716552734, -53.02240753173828, -5.070652008056641, 5.236391067504883, 26.71379852294922, -12.793106079101562, -62.2078857421875, -0.2872314453125, 14.935440063476562, 277.28314208984375, 228.25799560546875, 11.2958984375, 328.3680725097656, -0.46189117431640625, 21.530765533447266, -0.01941680908203125, 18.782196044921875, 4.355207443237305, -33.13134765625, -10.344757080078125, -31.16219711303711, 11.497264862060547, 61.44002151489258, 21.549949645996094, 120.10733032226562, 147.41860961914062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000592.npy"}
{"epoch": 0.8949357520786092, "step": 593, "batch_size": 64, "mean": 68.76606750488281, "std": 103.7433853149414, "min": -164.55661010742188, "p10": -72.20729293823237, "median": 62.74808120727539, "p90": 207.79457244873052, "max": 231.6938018798828, "pos_frac": 0.78125, "sample": [171.1251983642578, 27.835147857666016, 192.9825897216797, 3.76654052734375, 176.33750915527344, -24.858016967773438, 92.41080474853516, -164.55661010742188, 211.85646057128906, 62.800331115722656, 196.89605712890625, 32.49087905883789, 231.6938018798828, 164.70213317871094, 45.973052978515625, 4.268955230712891, 119.63333892822266, 108.69076538085938, 176.00784301757812, 147.62619018554688, 16.852651596069336, 69.3260498046875, 224.8656005859375, 212.54788208007812, -11.276824951171875, -5.278114318847656, 7.180803298950195, 166.33438110351562, 105.57718658447266, -124.14663696289062, 85.04248046875, -92.49983978271484, 20.257064819335938, 62.695831298828125, 103.73345947265625, 167.75759887695312, 0.697906494140625, -5.6189727783203125, 141.4021453857422, -122.7633056640625, -135.75503540039062, -6.941474914550781, 218.54689025878906, -125.79068756103516, 173.63333129882812, 158.1716766357422, 171.1617889404297, 9.921609878540039, -15.504844665527344, 230.81314086914062, 54.067474365234375, -10.49211311340332, 198.31683349609375, 95.98817443847656, 75.19335174560547, -124.0563735961914, 11.630517959594727, 213.32017517089844, 1.6949958801269531, 11.093254089355469, 15.783935546875, 28.683738708496094, 21.29132080078125, 129.88612365722656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000593.npy"}
{"epoch": 0.8964474678760394, "step": 594, "batch_size": 64, "mean": 61.746273040771484, "std": 92.51199340820312, "min": -246.22825622558594, "p10": -49.546811676025385, "median": 68.6722412109375, "p90": 181.5667938232422, "max": 225.63406372070312, "pos_frac": 0.765625, "sample": [68.13905334472656, 143.86207580566406, -89.44656372070312, -1.6744003295898438, 79.7413330078125, 136.98086547851562, 27.58740997314453, 46.56622314453125, 69.20542907714844, 3.9413833618164062, -71.4283447265625, 25.313411712646484, -8.233634948730469, 133.1057891845703, 36.545066833496094, 5.392681121826172, -63.88330078125, 2.0500221252441406, 88.78756713867188, 8.042098999023438, 189.54200744628906, 80.2540054321289, 0.9533901214599609, 184.5812225341797, 200.6714630126953, 105.71487426757812, 176.3389892578125, -47.36902618408203, 178.39779663085938, -1.813516616821289, 154.97109985351562, 123.67361450195312, 221.00778198242188, 152.5568389892578, 49.575103759765625, -43.58039855957031, -54.40528106689453, 114.2049560546875, 140.78189086914062, -50.48014831542969, 182.04708862304688, 5.922685623168945, 73.15574645996094, 180.44610595703125, 2.0916824340820312, 25.11493492126465, 71.02406311035156, 192.25950622558594, 1.6603260040283203, 180.05636596679688, 157.48104858398438, -246.22825622558594, 109.4527587890625, 225.63406372070312, 46.248138427734375, 116.06501770019531, 1.3121795654296875, 75.22502899169922, 87.5582275390625, -34.38800811767578, 104.97247314453125, -36.05668640136719, -22.108604431152344, -63.35508728027344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000594.npy"}
{"epoch": 0.8979591836734694, "step": 595, "batch_size": 64, "mean": 81.56169891357422, "std": 116.18535614013672, "min": -195.12547302246094, "p10": -27.084167480468746, "median": 71.17181015014648, "p90": 220.99959259033204, "max": 348.6747741699219, "pos_frac": 0.796875, "sample": [-4.773231506347656, 78.25569915771484, 348.6747741699219, 112.22451782226562, 22.86248016357422, 11.742204666137695, 95.97820281982422, 40.07948684692383, 14.145309448242188, 194.57290649414062, 229.49261474609375, 14.262031555175781, 78.65684509277344, -134.45248413085938, 197.15518188476562, 14.098159790039062, -1.7841129302978516, 6.7683563232421875, 200.64796447753906, 55.28260040283203, 27.42632293701172, 117.76765441894531, 229.81349182128906, 8.087455749511719, 252.19488525390625, 48.63654327392578, 89.37480926513672, 88.07057189941406, 208.2560272216797, 146.54110717773438, -0.9668807983398438, -126.64677429199219, 64.08792114257812, -28.600372314453125, 186.78453063964844, 182.73519897460938, 243.5782012939453, -171.59774780273438, -191.76051330566406, 215.78292846679688, 191.66989135742188, 196.90817260742188, 154.76495361328125, 57.350189208984375, -92.38203430175781, 199.17662048339844, 102.60443115234375, 145.72409057617188, -23.546356201171875, 21.214191436767578, -0.6662502288818359, 224.87631225585938, 153.33929443359375, 31.61573028564453, -10.222694396972656, 33.093650817871094, 223.2353057861328, 180.1370391845703, 53.95967102050781, 5.850433349609375, 0.07302665710449219, -195.12547302246094, 187.2962188720703, 215.54751586914062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000595.npy"}
{"epoch": 0.8994708994708994, "step": 596, "batch_size": 64, "mean": 36.11613464355469, "std": 95.77941131591797, "min": -185.8578643798828, "p10": -78.76354980468747, "median": 18.25511932373047, "p90": 185.578776550293, "max": 244.8114776611328, "pos_frac": 0.5625, "sample": [-89.5985107421875, 19.42736053466797, 205.69557189941406, 18.856216430664062, 41.047080993652344, -87.73111724853516, -23.48886489868164, -0.0222015380859375, -57.83922576904297, 199.056640625, -112.06608581542969, 63.40922927856445, -112.63632202148438, -21.934967041015625, 244.8114776611328, 184.43641662597656, 168.72474670410156, 86.88760375976562, -126.85762023925781, 49.874122619628906, -38.932342529296875, 186.068359375, 49.57904052734375, 122.30160522460938, 195.83343505859375, 179.52723693847656, 108.69292449951172, -17.038808822631836, 79.99077606201172, -101.58984375, -46.0007209777832, 82.08302307128906, 24.84770965576172, 10.574670791625977, 85.37325286865234, 4.042911529541016, 52.014930725097656, 228.26060485839844, 10.36544418334961, -14.912467956542969, -11.792327880859375, -7.691154479980469, -18.631778717041016, -185.8578643798828, -24.168533325195312, -5.685741424560547, 17.654022216796875, -14.54513931274414, 205.21954345703125, 73.02828979492188, 128.16339111328125, -3.995758056640625, -21.329994201660156, -36.208641052246094, -13.385292053222656, 172.7925567626953, 18.98462677001953, 131.53350830078125, -40.908966064453125, 45.18577575683594, 20.831859588623047, 87.76539611816406, -15.974899291992188, -40.68366241455078], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000596.npy"}
{"epoch": 0.9009826152683296, "step": 597, "batch_size": 64, "mean": 62.523338317871094, "std": 97.48439025878906, "min": -151.46873474121094, "p10": -57.27741470336913, "median": 37.35534477233887, "p90": 194.39116516113282, "max": 242.02207946777344, "pos_frac": 0.71875, "sample": [-5.290912628173828, -36.06684112548828, -84.19735717773438, -14.294357299804688, 202.58535766601562, -15.39202880859375, 37.3063850402832, -100.86298370361328, -48.45655059814453, 199.39483642578125, 6.216747283935547, 121.4762191772461, 1.9486942291259766, 190.262451171875, -61.05778503417969, 8.542593002319336, 191.5262451171875, 206.8611297607422, 185.5242919921875, -16.547937393188477, 101.4872817993164, 9.321090698242188, -79.13298034667969, 25.96135711669922, 183.48611450195312, 167.69497680664062, 11.035139083862305, -68.37944793701172, 178.566650390625, 7.22850227355957, 145.9137420654297, 38.16925811767578, 198.47885131835938, 35.47486114501953, 88.14073944091797, 99.14363098144531, -15.021072387695312, 115.033203125, 37.40430450439453, 156.85800170898438, 2.502361297607422, -107.805419921875, 9.900459289550781, -9.245939254760742, -4.534479141235352, -7.450910568237305, 4.219743728637695, 58.05961608886719, 161.12750244140625, 60.531761169433594, 195.61898803710938, 3.871685028076172, 145.1358642578125, -13.463577270507812, -151.46873474121094, 131.568603515625, 1.5562095642089844, 145.23863220214844, 242.02207946777344, 228.34152221679688, 66.04586791992188, 180.4347381591797, 80.9596939086914, 171.98489379882812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000597.npy"}
{"epoch": 0.9024943310657596, "step": 598, "batch_size": 64, "mean": 39.72389221191406, "std": 100.14009094238281, "min": -189.54150390625, "p10": -58.28458747863769, "median": 19.569866180419922, "p90": 180.82977142333988, "max": 286.1660461425781, "pos_frac": 0.640625, "sample": [-4.002389907836914, -124.49299621582031, 0.12483978271484375, 33.53459930419922, 63.004234313964844, 1.8672523498535156, 7.0328826904296875, 193.8773651123047, 175.9424285888672, -187.12213134765625, 61.50336456298828, 47.159759521484375, 56.369571685791016, -41.14234924316406, 144.9187774658203, -21.95014190673828, 227.89556884765625, -36.95404052734375, 169.55294799804688, 40.50209426879883, -19.24149513244629, -189.54150390625, -36.26329040527344, 103.65968322753906, 61.66746520996094, 43.71842956542969, 66.27513122558594, 53.8233642578125, 3.1420669555664062, -49.17848587036133, -62.18720245361328, -120.1114273071289, 11.163850784301758, 117.559814453125, 221.67333984375, -5.450771331787109, -1.2911224365234375, 6.949993133544922, 85.69917297363281, 102.15469360351562, 160.93740844726562, 22.3568115234375, -3.7716007232666016, 195.01100158691406, -17.434494018554688, 143.3965301513672, 31.374771118164062, -0.8612518310546875, 152.82301330566406, 16.782920837402344, -48.87701416015625, -75.9968032836914, 7.918907165527344, 5.392303466796875, 27.923677444458008, -36.349578857421875, 171.43466186523438, 120.30680847167969, -30.90258026123047, -21.64879608154297, 286.1660461425781, 194.947265625, -143.36856079101562, 182.92434692382812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000598.npy"}
{"epoch": 0.9040060468631897, "step": 599, "batch_size": 64, "mean": 48.73453903198242, "std": 101.4479751586914, "min": -175.83975219726562, "p10": -72.8985641479492, "median": 38.10587692260742, "p90": 186.55791931152345, "max": 268.99700927734375, "pos_frac": 0.65625, "sample": [-52.13975524902344, -26.439149856567383, 58.650733947753906, 46.022850036621094, -47.964271545410156, 88.53221130371094, 27.883289337158203, 236.91741943359375, -98.53367614746094, -175.83975219726562, -6.063026428222656, -45.396202087402344, 12.583511352539062, 63.31399917602539, 115.07909393310547, -7.076148986816406, -108.74006652832031, 1.1806755065917969, 1.2356224060058594, -39.095001220703125, 188.06396484375, 126.76608276367188, 72.08364868164062, -173.19290161132812, -0.4799652099609375, 13.531021118164062, 24.611251831054688, 187.14520263671875, -81.79519653320312, 41.214508056640625, 215.67030334472656, 186.0071563720703, -111.31194305419922, -31.17986297607422, 57.80231857299805, 22.988250732421875, 102.49999237060547, -26.29962158203125, 81.81192779541016, 74.41656494140625, 160.00306701660156, -28.749969482421875, 34.99724578857422, -49.26860046386719, 174.6676025390625, 63.51822280883789, 184.462158203125, -11.189468383789062, -121.93734741210938, 14.574092864990234, 171.70982360839844, 222.01815795898438, 87.43460083007812, 166.17735290527344, 186.79396057128906, 29.428003311157227, 268.99700927734375, 177.99746704101562, 128.15895080566406, -3.8193607330322266, -18.292648315429688, 81.01374053955078, 103.02824401855469, 82.82295989990234], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000599.npy"}
{"epoch": 0.9055177626606198, "step": 600, "batch_size": 64, "mean": 56.602394104003906, "std": 102.49665832519531, "min": -190.3856658935547, "p10": -63.76260910034179, "median": 34.09359931945801, "p90": 207.29075927734377, "max": 293.2670593261719, "pos_frac": 0.765625, "sample": [147.864990234375, 63.690345764160156, 8.271791458129883, 6.805572509765625, 209.75173950195312, 1.2558746337890625, 43.35725784301758, -125.15743255615234, 172.16973876953125, 223.70887756347656, -18.428504943847656, -0.6333732604980469, 73.5016860961914, 115.07249450683594, 3.56927490234375, 22.364564895629883, -83.70458984375, -90.61872863769531, -79.603759765625, 164.4866485595703, 20.787670135498047, 91.58479309082031, 144.25697326660156, 119.7825927734375, -59.815757751464844, 264.47601318359375, 109.34354400634766, 181.85302734375, -17.917144775390625, 200.8951416015625, 213.80990600585938, 4.974037170410156, -190.3856658935547, -15.611412048339844, 83.66152954101562, -39.74544906616211, 2.518707275390625, 137.92681884765625, 1.2417869567871094, 293.2670593261719, -6.146400451660156, 142.6239013671875, 8.460807800292969, 201.7308349609375, 4.538187026977539, 8.94558334350586, -65.45411682128906, -29.44279670715332, -144.953369140625, 66.06068420410156, 2.8151779174804688, 87.18931579589844, 65.23954010009766, 24.829940795898438, 107.49430847167969, 10.894744873046875, 45.70924377441406, 58.886844635009766, 97.39854431152344, 209.673583984375, 1.4376296997070312, 54.52222442626953, 10.61962890625, 254.85040283203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000600.npy"}
{"epoch": 0.9070294784580499, "step": 601, "batch_size": 64, "mean": 77.39845275878906, "std": 113.85598754882812, "min": -192.7987518310547, "p10": -51.38692703247069, "median": 78.10984802246094, "p90": 203.67511749267578, "max": 303.256103515625, "pos_frac": 0.71875, "sample": [162.1734161376953, 12.093132019042969, 192.18429565429688, -124.65576171875, 205.04550170898438, 127.07369995117188, 303.256103515625, -76.44277954101562, 64.17308044433594, -8.498245239257812, 214.99034118652344, -159.6907501220703, 189.58351135253906, 16.995988845825195, -29.913270950317383, 91.40662384033203, 174.60179138183594, 278.70233154296875, -57.257720947265625, 158.25314331054688, 200.47755432128906, 3.8746566772460938, 0.8413848876953125, 230.97421264648438, 200.00299072265625, 145.7508087158203, -1.9339427947998047, -17.608959197998047, 148.30197143554688, 54.37040710449219, 20.92438507080078, 146.3319549560547, 153.92385864257812, -21.117713928222656, -3.5440311431884766, 120.6461181640625, -5.668281555175781, 130.44573974609375, 80.15573120117188, 9.685606002807617, -59.01054382324219, 104.48251342773438, 15.180400848388672, 183.27261352539062, 76.06396484375, 146.30548095703125, 33.88264846801758, -37.68840789794922, -33.31340789794922, 7.170661926269531, -4.751609802246094, 255.46014404296875, -1.0493183135986328, 4.271915435791016, 178.78646850585938, 188.99929809570312, 301.47088623046875, 125.09353637695312, 178.47598266601562, 1.9854049682617188, -149.97689819335938, 169.98876953125, 130.29051208496094, -192.7987518310547], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000601.npy"}
{"epoch": 0.90854119425548, "step": 602, "batch_size": 64, "mean": 73.1083984375, "std": 95.88612365722656, "min": -186.1676025390625, "p10": -9.691604995727536, "median": 55.13628959655762, "p90": 203.62288360595704, "max": 297.1530456542969, "pos_frac": 0.8125, "sample": [-10.966949462890625, 148.2235107421875, 210.60308837890625, 297.1530456542969, 87.89045715332031, 38.565521240234375, 63.6793212890625, 2.8926219940185547, 54.69097900390625, -4.598323822021484, 221.737060546875, 1.2743492126464844, 204.6908416748047, 178.0892333984375, -109.05757141113281, 201.1309814453125, -127.26254272460938, 225.06591796875, 121.64411926269531, -6.715801239013672, 15.598068237304688, -2.9811553955078125, 16.054412841796875, -5.753314971923828, 32.61061096191406, 164.587646484375, 79.27428436279297, 171.68890380859375, 25.74502944946289, 104.1800537109375, 12.600650787353516, 156.5641326904297, 88.67729949951172, 55.581600189208984, 40.706729888916016, 20.163501739501953, 22.630512237548828, 46.92040252685547, 186.25735473632812, 90.62116241455078, 17.051692962646484, 150.75010681152344, 177.9510040283203, 170.25531005859375, 14.4180908203125, -186.1676025390625, 99.70829010009766, 61.40449523925781, 3.593658447265625, 16.62415313720703, -12.51766586303711, -1.3236541748046875, 174.7587890625, -12.999015808105469, 17.04633331298828, 193.5907440185547, 84.8065414428711, 83.73117065429688, 213.33572387695312, 209.8291473388672, -108.41541290283203, 162.0747528076172, 13.333061218261719, 15.639862060546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000602.npy"}
{"epoch": 0.91005291005291, "step": 603, "batch_size": 64, "mean": 69.68728637695312, "std": 103.86468505859375, "min": -195.87615966796875, "p10": -60.588893890380845, "median": 56.55825233459473, "p90": 189.80162353515627, "max": 292.5566101074219, "pos_frac": 0.78125, "sample": [184.42770385742188, 112.27146911621094, 188.55859375, -30.91120147705078, -95.22197723388672, 45.08015441894531, 51.559242248535156, 111.14884948730469, -5.707489013671875, 5.4969635009765625, 184.5958251953125, 63.70823669433594, 0.5485763549804688, -85.87076568603516, 36.915672302246094, 256.7422180175781, -98.78668212890625, 42.606239318847656, 16.918319702148438, 292.5566101074219, 78.49529266357422, 21.557891845703125, 187.71527099609375, 71.09503173828125, -8.103534698486328, 53.78411865234375, 177.44786071777344, -70.44589233398438, 173.12661743164062, -177.2316436767578, 0.499114990234375, 16.924346923828125, 137.3226318359375, 111.26628112792969, -66.19925689697266, 168.76815795898438, 59.3323860168457, 33.77743148803711, -7.531904220581055, 190.51861572265625, 106.65718078613281, 159.82254028320312, 117.32272338867188, 67.93827056884766, 132.45526123046875, 194.9566650390625, 34.15037536621094, -47.498046875, 78.25953674316406, 17.531618118286133, 19.921348571777344, 50.258056640625, 173.2371826171875, 186.87380981445312, 93.00414276123047, -15.1898193359375, 7.384677886962891, -195.87615966796875, 190.3343505859375, 187.8649444580078, 196.54176330566406, 2.181650161743164, 286.9873046875, -13.888931274414062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000603.npy"}
{"epoch": 0.9115646258503401, "step": 604, "batch_size": 64, "mean": 62.07550048828125, "std": 107.60479736328125, "min": -227.3004913330078, "p10": -68.8107307434082, "median": 51.1270809173584, "p90": 201.85733489990236, "max": 341.139892578125, "pos_frac": 0.71875, "sample": [97.50723266601562, 150.99990844726562, -10.573413848876953, -58.1614990234375, 13.757061004638672, 159.60179138183594, -20.02389144897461, 0.7040195465087891, 190.2262420654297, -38.19682312011719, 31.913299560546875, -12.619132995605469, 2.267118453979492, 161.30422973632812, -88.33334350585938, -23.255386352539062, 180.27052307128906, 71.05322265625, 54.08574676513672, 210.1575927734375, -153.04150390625, -55.01850891113281, 26.386642456054688, -86.0885009765625, 130.89486694335938, 8.889787673950195, 12.030672073364258, 86.8997573852539, 70.38673400878906, -227.3004913330078, 52.515750885009766, -67.87029266357422, 68.0982437133789, -87.06367492675781, 185.5341033935547, -15.409172058105469, 79.77467346191406, -21.12591552734375, 202.59945678710938, 151.84478759765625, 101.37326049804688, 200.12571716308594, 341.139892578125, 26.92314910888672, 173.01858520507812, 116.61773681640625, 40.95636749267578, 189.23410034179688, 207.40049743652344, -69.21377563476562, 212.98764038085938, 2.1296234130859375, 96.95165252685547, 45.266502380371094, 147.93280029296875, 49.73841094970703, 151.7592315673828, 38.86748504638672, 1.828420639038086, 217.72991943359375, 159.8371124267578, -17.196304321289062, -101.27581787109375, 203.0779266357422], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000604.npy"}
{"epoch": 0.9130763416477702, "step": 605, "batch_size": 64, "mean": 62.13636779785156, "std": 106.17876434326172, "min": -204.025146484375, "p10": -53.327106857299796, "median": 58.28291893005371, "p90": 192.57646942138672, "max": 265.4395751953125, "pos_frac": 0.703125, "sample": [83.05332946777344, 95.88777160644531, -146.2375030517578, 265.4395751953125, -13.829307556152344, 197.7418212890625, 59.48179626464844, 193.1518096923828, 113.7480239868164, -43.972625732421875, 176.34912109375, -26.665603637695312, 48.96308135986328, -25.037330627441406, 2.730030059814453, -87.05409240722656, 58.89906311035156, 168.01016235351562, 149.38865661621094, 57.66677474975586, -38.64445495605469, 20.147430419921875, 137.49566650390625, -204.025146484375, -0.37772369384765625, 147.02076721191406, 191.2340087890625, 178.1650390625, 60.376487731933594, 56.67366027832031, 155.78216552734375, 35.50763702392578, -198.05848693847656, 170.28433227539062, -3.0765380859375, 195.25341796875, 184.82241821289062, 133.33743286132812, 25.02349853515625, 112.7039794921875, -67.53373718261719, -12.514293670654297, 181.881103515625, 42.54417419433594, 197.27420043945312, 29.722267150878906, 198.03897094726562, 177.37059020996094, 179.49703979492188, -25.719329833984375, 230.9473876953125, 68.36015319824219, 99.17913818359375, -31.632606506347656, -27.5615234375, 15.887491226196289, 19.48979377746582, -172.04855346679688, 39.74555969238281, -57.3361701965332, 9.067216873168945, 90.00859069824219, -11.740982055664062, 116.4407958984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000605.npy"}
{"epoch": 0.9145880574452003, "step": 606, "batch_size": 64, "mean": 76.15914916992188, "std": 107.2137451171875, "min": -142.71072387695312, "p10": -57.469087219238276, "median": 64.06488418579102, "p90": 220.56694335937502, "max": 312.4819030761719, "pos_frac": 0.75, "sample": [-52.94489288330078, 215.7841796875, 49.2020263671875, 234.72198486328125, -87.79312896728516, 22.25652313232422, 181.30235290527344, 42.16148376464844, -2.8119258880615234, 159.2305450439453, 189.2659912109375, -25.942337036132812, 108.84921264648438, 222.61669921875, 186.93222045898438, -142.71072387695312, 119.75126647949219, 74.63543701171875, 5.98219108581543, 153.13739013671875, 312.4819030761719, 13.36834716796875, 49.15176010131836, 170.23956298828125, -59.40802764892578, -79.4100112915039, 267.00341796875, -37.682525634765625, 114.45135498046875, 143.16268920898438, 113.73295593261719, 0.9341201782226562, -98.52523040771484, 183.02633666992188, 96.720458984375, 2.5420398712158203, 62.05790710449219, -3.6393680572509766, 152.05206298828125, 0.20383453369140625, 194.30453491210938, 233.77435302734375, 66.07186126708984, 143.22564697265625, 6.749778747558594, 266.66448974609375, -20.737228393554688, 60.69990539550781, -94.17662811279297, -48.8426399230957, 187.49197387695312, 33.24513244628906, -107.05146026611328, 28.528717041015625, -17.507843017578125, 136.90420532226562, 81.56778717041016, -24.394329071044922, 112.60824584960938, 249.7141876220703, 137.45343017578125, 152.32150268554688, 39.443939208984375, 0.03558540344238281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000606.npy"}
{"epoch": 0.9160997732426304, "step": 607, "batch_size": 64, "mean": 66.49452209472656, "std": 106.2885971069336, "min": -219.70407104492188, "p10": -66.46510925292968, "median": 58.743892669677734, "p90": 202.96891326904301, "max": 290.350830078125, "pos_frac": 0.71875, "sample": [2.0601425170898438, 27.833648681640625, -133.92747497558594, 89.37518310546875, 140.6036376953125, 4.747406005859375, 139.08975219726562, 73.50226593017578, -219.70407104492188, -59.203575134277344, -2.549854278564453, -49.829261779785156, 124.1876220703125, 188.8262481689453, 52.98921203613281, -69.56692504882812, -13.329231262207031, 9.677129745483398, 53.97956085205078, 208.60597229003906, 27.70751953125, 219.63124084472656, 142.6719207763672, 134.93093872070312, 110.26646423339844, 32.086891174316406, -31.59021759033203, 176.0000457763672, -7.412574768066406, -17.412185668945312, 207.74655151367188, 102.48251342773438, 127.55300903320312, 152.0926513671875, -92.09039306640625, 210.05435180664062, 290.350830078125, -6.976985931396484, 158.4398651123047, 14.908126831054688, -59.2275390625, 10.139633178710938, 33.78016662597656, 152.22779846191406, 106.60821533203125, -3.5458297729492188, -86.29178619384766, 155.39291381835938, 215.94601440429688, 175.35935974121094, 1.2821540832519531, 26.605606079101562, 142.20831298828125, 220.88235473632812, 45.36879348754883, 191.8210906982422, -1.5011405944824219, 76.55352783203125, 172.9659881591797, 188.85609436035156, 63.50822448730469, -164.19097900390625, -75.18059539794922, 147.27334594726562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000607.npy"}
{"epoch": 0.9176114890400605, "step": 608, "batch_size": 64, "mean": 67.75590515136719, "std": 96.27395629882812, "min": -158.19424438476562, "p10": -31.51845550537108, "median": 34.5908145904541, "p90": 188.08773193359374, "max": 294.15277099609375, "pos_frac": 0.765625, "sample": [177.5370330810547, -85.88560485839844, -0.5082664489746094, 187.49388122558594, 0.28748130798339844, -79.01663208007812, 151.08807373046875, 6.4558868408203125, 33.334781646728516, -125.94578552246094, 136.424560546875, 96.25533294677734, 138.03334045410156, 183.29762268066406, 65.44146728515625, 131.78611755371094, 234.74058532714844, -158.19424438476562, -5.230531692504883, 13.53561019897461, 56.09165954589844, -17.488956451416016, -0.5596561431884766, 214.2765350341797, 178.06576538085938, 17.928096771240234, 0.6576919555664062, 100.50752258300781, 7.536834716796875, 197.69549560546875, 186.218994140625, -36.968727111816406, 14.97393798828125, 117.76486206054688, 35.84684753417969, 110.56967163085938, 142.20986938476562, -18.80115509033203, -48.75148010253906, -7.256378173828125, 3.0392227172851562, 18.18883514404297, 50.75462341308594, 30.125099182128906, -14.452041625976562, 136.2479705810547, -0.094024658203125, 176.4413604736328, 294.15277099609375, 172.8407440185547, 123.5513916015625, 23.250839233398438, 41.004676818847656, 187.2484130859375, 2.8479042053222656, 158.0540008544922, -82.43080139160156, 192.42886352539062, 28.90118408203125, 9.876411437988281, 26.332130432128906, 188.3422393798828, 23.542434692382812, 194.73541259765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000608.npy"}
{"epoch": 0.9191232048374905, "step": 609, "batch_size": 64, "mean": 75.52725982666016, "std": 93.1353530883789, "min": -222.59765625, "p10": -44.71548309326172, "median": 82.87580871582031, "p90": 188.18302307128906, "max": 311.9989013671875, "pos_frac": 0.78125, "sample": [208.8429412841797, 177.6113739013672, 85.82312774658203, 9.655876159667969, 96.75689697265625, 24.880088806152344, 181.8721466064453, 116.18270874023438, -20.698959350585938, -0.10251617431640625, -45.04570007324219, 126.02296447753906, -16.587417602539062, 17.450763702392578, 311.9989013671875, 160.88491821289062, 200.09190368652344, 100.77764892578125, 82.02976989746094, 208.72357177734375, -53.823936462402344, 112.88047790527344, 83.09658813476562, 75.59365844726562, -77.72770690917969, 201.95700073242188, 152.760498046875, 97.04015350341797, -102.60072326660156, 138.81793212890625, 55.798561096191406, -0.9992618560791016, 181.7003631591797, 89.92870330810547, -29.736297607421875, 112.57064056396484, 64.16618347167969, -0.1663036346435547, 145.375732421875, 119.98826599121094, 70.38429260253906, -58.83529281616211, 70.03997039794922, 198.16845703125, 186.7012481689453, -222.59765625, 14.950252532958984, 84.67796325683594, 82.655029296875, 9.955556869506836, 38.75615310668945, 107.3131103515625, 121.96299743652344, 61.19340515136719, 1.7652416229248047, 12.392501831054688, 188.8180694580078, 117.07122039794922, 174.38876342773438, -53.454612731933594, 167.92669677734375, 81.94760131835938, 27.717193603515625, -43.944976806640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000609.npy"}
{"epoch": 0.9206349206349206, "step": 610, "batch_size": 64, "mean": 73.57546997070312, "std": 107.74991607666016, "min": -163.16433715820312, "p10": -38.789144897460936, "median": 62.7737922668457, "p90": 213.46763763427734, "max": 318.67401123046875, "pos_frac": 0.703125, "sample": [67.96354675292969, 318.67401123046875, 202.01947021484375, 136.6863555908203, 73.13195037841797, 192.11505126953125, 2.0262298583984375, 5.566553115844727, -114.24191284179688, 9.37799072265625, 67.25939178466797, -106.74506378173828, 108.78065490722656, 32.91987609863281, -12.900634765625, 59.94361877441406, 185.07675170898438, -30.712677001953125, -36.910491943359375, -14.143287658691406, 33.178565979003906, -83.62020874023438, -3.7042770385742188, 139.20045471191406, -37.51518249511719, -50.63249206542969, 22.549575805664062, 191.4405517578125, 65.60396575927734, 226.62542724609375, 58.238372802734375, 299.9991760253906, 217.24264526367188, 191.5111083984375, -39.33512878417969, 175.18101501464844, 189.98876953125, -1.8412761688232422, 179.31503295898438, 23.023590087890625, 16.246917724609375, 153.07337951660156, 71.78667449951172, -163.16433715820312, 212.5828857421875, -9.583332061767578, -30.08289337158203, -24.16031265258789, 184.28355407714844, 221.96047973632812, 103.58317565917969, 102.02080535888672, 45.02656555175781, -4.757234573364258, 1.3509349822998047, -18.045875549316406, -100.89665222167969, 178.23910522460938, 213.84681701660156, 99.11927795410156, 76.64176940917969, 40.91007995605469, 242.07809448242188, 154.4327850341797], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000610.npy"}
{"epoch": 0.9221466364323507, "step": 611, "batch_size": 64, "mean": 57.99309539794922, "std": 90.64617919921875, "min": -169.84300231933594, "p10": -51.05070648193358, "median": 56.2694034576416, "p90": 179.3163848876953, "max": 237.33059692382812, "pos_frac": 0.734375, "sample": [106.13034057617188, 108.3205795288086, 29.132659912109375, 51.59354019165039, 150.4383544921875, -2.7303237915039062, -0.37841796875, 60.94526672363281, -3.0299720764160156, 68.07014465332031, 1.5686225891113281, 153.6040496826172, 70.8416519165039, 116.02104187011719, -5.507942199707031, 4.468902587890625, -98.00076293945312, 181.8445587158203, -36.55947494506836, 237.33059692382812, 176.3135223388672, 79.07141876220703, 128.62139892578125, -74.30769348144531, -57.261234283447266, -8.010536193847656, 179.136962890625, 70.45108795166016, 110.07781219482422, 224.09841918945312, -1.3514938354492188, -69.49512481689453, 44.06822204589844, 198.57473754882812, -29.1634521484375, 2.01385498046875, -169.84300231933594, 218.3787384033203, 51.492332458496094, 48.23258972167969, 129.51849365234375, 78.40229797363281, -156.43365478515625, 72.66935729980469, 31.18933868408203, 79.61910247802734, 92.5458984375, 39.96427917480469, 150.81808471679688, 174.82562255859375, 81.33534240722656, 158.7708740234375, 1.5343761444091797, 71.6688003540039, 4.7751007080078125, 30.203140258789062, 196.56817626953125, -104.78211212158203, 179.39328002929688, 32.99828338623047, -25.337133407592773, 6.644844055175781, -4.6151885986328125, 74.079345703125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000611.npy"}
{"epoch": 0.9236583522297808, "step": 612, "batch_size": 64, "mean": 84.49961853027344, "std": 100.98075866699219, "min": -216.13441467285156, "p10": -18.862575149536134, "median": 83.12470245361328, "p90": 219.2919509887696, "max": 275.3831787109375, "pos_frac": 0.796875, "sample": [-2.5181961059570312, 9.143768310546875, 108.14523315429688, 27.717979431152344, -31.408950805664062, 7.206962585449219, 165.03428649902344, 133.306396484375, -9.43450927734375, 34.28788757324219, 140.32333374023438, 95.78544616699219, 71.3434066772461, 79.0306396484375, 143.82000732421875, 52.77130126953125, 46.913124084472656, 4.065528869628906, 87.21876525878906, 150.535400390625, 162.41981506347656, -50.190940856933594, 142.78851318359375, 39.95220947265625, 78.23075866699219, 32.376102447509766, 242.86289978027344, -18.95523452758789, -34.252559661865234, 20.18291473388672, 75.87157440185547, 194.44107055664062, 3.0887489318847656, -6.622228622436523, -18.64636993408203, 202.05621337890625, 21.95960235595703, 160.21609497070312, 253.5623779296875, -151.712646484375, 231.83799743652344, 75.60092163085938, 190.87107849121094, -133.9069061279297, 249.10604858398438, 126.50215148925781, 186.8130645751953, -9.678815841674805, 167.8615264892578, 226.67869567871094, 99.12818145751953, -8.879478454589844, 253.10659790039062, 99.92298889160156, 42.92400360107422, 275.3831787109375, 128.52328491210938, 74.80204010009766, 96.74710083007812, -216.13441467285156, 97.64875030517578, 150.44491577148438, 159.94363403320312, 179.8125457763672], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000612.npy"}
{"epoch": 0.9251700680272109, "step": 613, "batch_size": 64, "mean": 77.8802490234375, "std": 99.3871078491211, "min": -188.40200805664062, "p10": -31.724790191650385, "median": 66.62068557739258, "p90": 199.24666137695314, "max": 236.89468383789062, "pos_frac": 0.75, "sample": [-11.392967224121094, 53.60908508300781, -11.896026611328125, 43.18254089355469, 111.98004913330078, 146.6346435546875, 137.17837524414062, 85.28056335449219, -33.528770446777344, -5.321863174438477, 52.43018341064453, 59.66647720336914, -183.27719116210938, 151.35986328125, 154.85726928710938, 59.51676940917969, -37.53688049316406, 3.667051315307617, 235.64651489257812, 170.29034423828125, 14.262779235839844, -15.506338119506836, -188.40200805664062, 15.75773811340332, 182.50497436523438, 72.57589721679688, 173.4045867919922, 7.030660629272461, 17.33904266357422, 18.067298889160156, 230.85488891601562, 60.705570220947266, 107.80029296875, -2.044567108154297, -9.699028015136719, 66.29884338378906, -79.1476058959961, 179.54693603515625, 117.34556579589844, 216.04861450195312, 178.95034790039062, 198.06216430664062, 56.121665954589844, 181.08782958984375, 231.7075653076172, 9.113357543945312, 236.89468383789062, -2.0013294219970703, -27.5155029296875, 197.71141052246094, -14.352828979492188, -38.31787109375, 180.61666870117188, 228.6444854736328, 116.22267150878906, 152.67324829101562, 16.618640899658203, 199.75430297851562, 131.52349853515625, 192.39077758789062, 66.9425277709961, 81.39614868164062, 103.30170440673828, -60.30072021484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000613.npy"}
{"epoch": 0.926681783824641, "step": 614, "batch_size": 64, "mean": 71.2728500366211, "std": 96.91987609863281, "min": -173.55075073242188, "p10": -49.22794952392578, "median": 81.40971755981445, "p90": 183.08163452148438, "max": 280.80694580078125, "pos_frac": 0.78125, "sample": [184.31581115722656, 127.84217071533203, 31.050601959228516, 142.07656860351562, 102.6972427368164, 117.0318832397461, 11.400218963623047, 0.25746917724609375, 87.21357727050781, 66.19918823242188, 216.93841552734375, 138.7277069091797, 208.2402801513672, 197.92733764648438, 134.6163330078125, 160.4475860595703, 133.4159698486328, 80.88158416748047, 131.71688842773438, 122.13935852050781, -133.2453155517578, 5.9263153076171875, -45.50098419189453, -173.55075073242188, 280.80694580078125, -3.3915023803710938, -9.720556259155273, -46.267059326171875, 171.3328399658203, -59.16033935546875, -73.56365203857422, 11.942989349365234, 6.039943695068359, 175.13888549804688, 61.99271011352539, 10.112199783325195, 145.38380432128906, 91.23184204101562, -156.51515197753906, 141.413330078125, -6.202125549316406, 177.45126342773438, 152.0479736328125, 136.46084594726562, 104.68988800048828, 149.75352478027344, 25.92414093017578, 74.73178100585938, 143.09263610839844, 53.475921630859375, 23.249435424804688, 68.6976089477539, -50.49690246582031, -37.43891143798828, 15.43545913696289, 217.44314575195312, 87.81608581542969, -13.598747253417969, 180.20188903808594, 31.622528076171875, 81.93785095214844, 43.8270149230957, -92.81719970703125, 198.614501953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000614.npy"}
{"epoch": 0.9281934996220711, "step": 615, "batch_size": 64, "mean": 72.10533905029297, "std": 117.5101089477539, "min": -245.19586181640625, "p10": -69.60241088867187, "median": 59.87712860107422, "p90": 209.6304122924805, "max": 322.547607421875, "pos_frac": 0.78125, "sample": [63.89325714111328, 91.52238464355469, 181.87498474121094, 27.310836791992188, 74.05538940429688, 181.47738647460938, 84.03180694580078, -161.19557189941406, 187.94711303710938, 55.861000061035156, -120.26323699951172, 73.87500762939453, -80.1243896484375, 117.80088806152344, 49.821083068847656, -104.1640625, 149.26449584960938, 233.55670166015625, 182.21005249023438, 5.4346160888671875, 185.6834716796875, -27.770896911621094, 249.21585083007812, -2.773529052734375, 177.23809814453125, 44.7752685546875, -34.33848571777344, 50.78114318847656, 9.436798095703125, 205.3131103515625, 193.82870483398438, 64.17988586425781, -13.019294738769531, 46.558292388916016, 5.687232971191406, 188.75613403320312, 211.11526489257812, -69.35687255859375, 18.807113647460938, 188.96798706054688, 225.034912109375, 235.48919677734375, 184.8152313232422, 5.0016021728515625, 152.7955780029297, 16.216876983642578, 40.089820861816406, -56.72613525390625, 41.778106689453125, 206.16575622558594, -69.7076416015625, 255.68060302734375, 10.917938232421875, 79.13636779785156, -17.831146240234375, 322.547607421875, 65.29353332519531, 133.98822021484375, 190.97451782226562, 5.761131286621094, 15.73779296875, 31.438232421875, -201.935302734375, -245.19586181640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000615.npy"}
{"epoch": 0.9297052154195011, "step": 616, "batch_size": 64, "mean": 63.51789093017578, "std": 99.23487854003906, "min": -164.69776916503906, "p10": -35.70275211334228, "median": 48.18917465209961, "p90": 198.97677001953127, "max": 240.41885375976562, "pos_frac": 0.6875, "sample": [217.62045288085938, 14.214164733886719, -8.150390625, -20.656875610351562, 186.01976013183594, -3.465818405151367, 240.41885375976562, 152.94383239746094, 1.2735023498535156, 116.02980041503906, 49.137229919433594, 10.758247375488281, 62.806419372558594, -164.69776916503906, 28.673946380615234, -48.52893829345703, 46.628936767578125, -37.75740432739258, -13.360710144042969, 214.11703491210938, 50.29528045654297, 196.33309936523438, -12.423303604125977, -21.729768753051758, 200.10977172851562, 1.1375865936279297, -12.047361373901367, 6.754692077636719, -30.9085636138916, 191.3729248046875, 162.453369140625, 10.827041625976562, 83.79383850097656, 177.4523162841797, 233.00892639160156, 167.5489959716797, -9.077693939208984, 89.21345520019531, -29.235931396484375, 186.4360809326172, 179.9447021484375, 73.04747009277344, 47.241119384765625, 63.88789367675781, 166.85791015625, 218.47061157226562, -56.553260803222656, 19.444019317626953, -150.15802001953125, 12.034294128417969, -30.24606704711914, -104.24604797363281, 91.7032241821289, 81.30255126953125, 29.459312438964844, 209.47265625, 89.40959930419922, -6.189876556396484, 122.38725280761719, -18.971710205078125, 173.83676147460938, 186.61981201171875, 54.37493896484375, -73.32329559326172], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000616.npy"}
{"epoch": 0.9312169312169312, "step": 617, "batch_size": 64, "mean": 52.30952835083008, "std": 98.98080444335938, "min": -171.95465087890625, "p10": -59.09563980102537, "median": 41.112510681152344, "p90": 181.70648803710938, "max": 333.0641784667969, "pos_frac": 0.734375, "sample": [35.81433868408203, 2.155864715576172, -29.147598266601562, 34.894996643066406, 201.1006622314453, 116.92132568359375, 172.26177978515625, 181.21759033203125, 6.890140533447266, -89.42029571533203, 50.1234130859375, 26.42064666748047, -71.33784484863281, 0.9306697845458984, 1.8147621154785156, 177.0593719482422, -41.684654235839844, 203.02227783203125, 29.106658935546875, 4.425254821777344, 333.0641784667969, -14.13779067993164, -3.7648353576660156, 175.53720092773438, 92.61305236816406, -0.35649871826171875, -1.7389564514160156, 127.88961791992188, 58.94056701660156, -2.406024932861328, 19.580387115478516, 15.749073028564453, 50.92567443847656, 188.83767700195312, 116.697509765625, 222.02651977539062, 46.410682678222656, -171.95465087890625, -35.14531707763672, 135.56687927246094, 112.95792388916016, 14.632095336914062, -66.39379119873047, -11.680912017822266, 201.69354248046875, 181.916015625, -162.60574340820312, -156.90243530273438, -133.77992248535156, 72.27436828613281, 111.98825073242188, 129.92333984375, 59.29673767089844, 11.268346786499023, 174.108154296875, 104.16279602050781, 25.54302978515625, 9.951759338378906, 51.247474670410156, 89.05850219726562, 51.15653991699219, 78.77912139892578, 74.37678527832031, -42.066619873046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000617.npy"}
{"epoch": 0.9327286470143613, "step": 618, "batch_size": 64, "mean": 66.49580383300781, "std": 97.80500793457031, "min": -119.20596313476562, "p10": -38.36495819091796, "median": 44.95499038696289, "p90": 206.3839294433594, "max": 315.0124816894531, "pos_frac": 0.765625, "sample": [249.4781036376953, 216.1341552734375, 19.392650604248047, 166.49945068359375, -12.706779479980469, -0.05990028381347656, -65.224365234375, 17.911705017089844, 267.1932373046875, 1.2802162170410156, 4.38201904296875, 244.17031860351562, 151.86318969726562, 68.97740173339844, 195.57916259765625, 5.829719543457031, 64.73738861083984, 1.9463119506835938, 315.0124816894531, 119.28045654296875, 44.597633361816406, 239.1941375732422, 2.308877944946289, 45.312347412109375, 8.093330383300781, 198.6666259765625, 49.79296875, 58.089942932128906, 114.35655975341797, 74.49700164794922, 15.129287719726562, 29.22278594970703, 204.38040161132812, -8.706005096435547, 51.68940734863281, -62.913726806640625, 61.328125, -117.13195037841797, 117.25393676757812, 40.11847686767578, -97.47294616699219, -0.5454978942871094, 8.96017837524414, -119.20596313476562, 117.0835189819336, -7.3736114501953125, 40.536094665527344, -91.45695495605469, 119.36905670166016, 207.24258422851562, 105.99108123779297, 140.25894165039062, 8.005935668945312, -1.4985198974609375, 74.10922241210938, 44.13287353515625, -27.114425659179688, 92.33661651611328, 146.29156494140625, 174.37489318847656, -18.335403442382812, 152.93588256835938, -43.186614990234375, 33.335662841796875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000618.npy"}
{"epoch": 0.9342403628117913, "step": 619, "batch_size": 64, "mean": 72.2548828125, "std": 104.84713745117188, "min": -271.1210021972656, "p10": -30.370564651489257, "median": 58.422996520996094, "p90": 210.68821716308594, "max": 264.95556640625, "pos_frac": 0.765625, "sample": [127.22630310058594, 22.710739135742188, 2.660062789916992, 25.4317626953125, 185.1239013671875, 226.0725860595703, 208.76422119140625, -59.24604034423828, -120.21490478515625, 248.44374084472656, 194.34384155273438, 72.8201904296875, 184.7296905517578, 12.341146469116211, 79.1851806640625, -11.643989562988281, 61.275108337402344, 67.26239013671875, -3.7528934478759766, -8.7635498046875, -114.85405731201172, 239.17001342773438, 32.774932861328125, -57.249114990234375, -9.913492202758789, 178.62472534179688, 158.37464904785156, 190.09962463378906, 1.5279216766357422, -28.200851440429688, 8.402610778808594, -271.1210021972656, -14.082984924316406, 229.16690063476562, 141.2705078125, 140.28887939453125, 18.415409088134766, 193.64930725097656, 211.51278686523438, -97.04913330078125, 122.67001342773438, 231.376953125, 100.95716857910156, 0.3699970245361328, 131.96315002441406, 123.03146362304688, 131.78358459472656, 160.26870727539062, 157.3342742919922, 40.74473190307617, 55.570884704589844, 53.6699104309082, 12.82733154296875, 46.71412658691406, 264.95556640625, 19.73480987548828, 69.01631164550781, 114.91767120361328, -5.106138229370117, 21.994935989379883, -8.34307861328125, 112.5013656616211, 31.082382202148438, -31.30044174194336], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000619.npy"}
{"epoch": 0.9357520786092215, "step": 620, "batch_size": 64, "mean": 58.35517883300781, "std": 98.64330291748047, "min": -174.94949340820312, "p10": -57.98269882202147, "median": 35.47214698791504, "p90": 194.9239379882813, "max": 238.6818084716797, "pos_frac": 0.6875, "sample": [19.505165100097656, -156.83895874023438, -0.3748626708984375, 153.30810546875, -74.88045501708984, -0.14627838134765625, 62.37272644042969, 199.3245391845703, 1.8557205200195312, 97.81441497802734, 168.67503356933594, 147.60438537597656, 126.97113037109375, 107.74237060546875, -6.702178955078125, 234.72171020507812, 120.09243774414062, 56.951499938964844, -23.015640258789062, 5.547996520996094, -174.94949340820312, 26.189373016357422, -76.8516845703125, 232.47760009765625, -65.67765808105469, 163.64280700683594, -37.933799743652344, -18.91973876953125, 203.25189208984375, -40.027793884277344, 168.5587921142578, 208.52609252929688, -12.827960968017578, -12.019025802612305, -20.21286392211914, 25.506492614746094, 26.502498626708984, 39.673736572265625, 177.34910583496094, 2.4917144775390625, 163.59823608398438, 36.63899230957031, 115.10968780517578, 76.63463592529297, 34.305301666259766, -93.41329956054688, 238.6818084716797, 204.23171997070312, -6.34101676940918, 151.94570922851562, 1.836172103881836, 184.65586853027344, -3.97314453125, 5.589029312133789, -125.94908142089844, 26.488805770874023, 168.09991455078125, 20.941848754882812, 86.79248046875, -1.1500244140625, 120.48268127441406, 110.3386459350586, 113.76654815673828, 50.14112091064453], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000620.npy"}
{"epoch": 0.9372637944066515, "step": 621, "batch_size": 64, "mean": 58.716758728027344, "std": 107.32410430908203, "min": -197.6741943359375, "p10": -61.60534706115722, "median": 65.96910858154297, "p90": 187.2626159667969, "max": 349.15301513671875, "pos_frac": 0.71875, "sample": [2.847381591796875, 22.79405975341797, 20.54595184326172, -56.67264938354492, -63.7193603515625, -141.86697387695312, 73.76716613769531, 95.32704162597656, -48.16194152832031, 16.757272720336914, 83.72099304199219, 119.91615295410156, 182.2481231689453, -4.86015510559082, 186.7760467529297, 75.67753601074219, 78.4587173461914, 122.00701904296875, 64.96998596191406, 182.06634521484375, 89.1402587890625, 66.00917053222656, 287.20111083984375, -159.53231811523438, 214.74127197265625, 158.15008544921875, 73.712646484375, -197.6741943359375, 105.44146728515625, -95.81961059570312, 251.42062377929688, 101.2468490600586, 91.694091796875, 199.8787841796875, 173.77911376953125, -28.895370483398438, 65.92904663085938, 14.096338272094727, 77.18241882324219, -30.319061279296875, -80.86367797851562, 14.22651481628418, 214.9775848388672, -4.5953369140625, 13.434637069702148, -19.427173614501953, 42.02931213378906, 1.336151123046875, -10.595687866210938, -0.19299697875976562, 3.447813034057617, 169.13763427734375, 70.25054931640625, 10.967805862426758, 131.30169677734375, -10.748340606689453, -27.04156494140625, 349.15301513671875, -156.4918212890625, 72.57887268066406, 165.80026245117188, 24.562973022460938, 187.4711456298828, 127.17181396484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000621.npy"}
{"epoch": 0.9387755102040817, "step": 622, "batch_size": 64, "mean": 64.41081237792969, "std": 101.03971862792969, "min": -185.1044921875, "p10": -50.15358390808105, "median": 39.34842300415039, "p90": 189.25535583496094, "max": 247.356689453125, "pos_frac": 0.75, "sample": [164.11294555664062, 18.734912872314453, 12.010368347167969, -185.1044921875, 14.588890075683594, 98.12586212158203, -27.77679443359375, 13.831626892089844, -83.22686767578125, -3.6379852294921875, 220.68203735351562, 134.7464599609375, 0.06868553161621094, -50.74086380004883, 194.47557067871094, 72.59151458740234, 189.3477783203125, 183.49417114257812, 169.73377990722656, 189.03970336914062, 175.3686065673828, 183.05181884765625, -12.153387069702148, -22.120664596557617, 247.356689453125, 63.69629669189453, 39.00006103515625, 143.52578735351562, 9.52011489868164, 66.234619140625, 179.4366912841797, -122.78598022460938, 67.39737701416016, 27.27811050415039, 24.027565002441406, -2.929891586303711, 172.65985107421875, 161.9776611328125, 90.01664733886719, 39.69678497314453, 169.85369873046875, 31.720970153808594, 146.9869384765625, -46.636932373046875, 140.1908416748047, 157.61862182617188, 197.41583251953125, -151.90679931640625, 4.266178131103516, -12.246200561523438, 188.694091796875, 15.984764099121094, 1.8387298583984375, -48.78326416015625, -7.153596878051758, -79.70271301269531, 228.2828369140625, 20.53081512451172, -70.92571258544922, 17.99047088623047, 113.1295166015625, 7.17671012878418, 199.05682373046875, 43.5570068359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000622.npy"}
{"epoch": 0.9402872260015117, "step": 623, "batch_size": 64, "mean": 67.32402038574219, "std": 104.75052642822266, "min": -203.9384765625, "p10": -46.062001800537104, "median": 61.9371337890625, "p90": 199.00833740234376, "max": 316.2799987792969, "pos_frac": 0.71875, "sample": [159.25897216796875, 32.95879364013672, -20.406726837158203, -159.13595581054688, 66.5179443359375, 248.1793212890625, 158.739990234375, 163.10519409179688, 202.11703491210938, 88.87145233154297, -84.69103240966797, 61.88579559326172, 222.895751953125, 61.98847198486328, -28.961227416992188, 227.57418823242188, -13.767688751220703, 173.37948608398438, 57.62079620361328, 119.05789947509766, 40.310791015625, 55.505126953125, 182.83758544921875, -6.59466552734375, -21.66259002685547, 94.88333129882812, 142.7323455810547, 196.0390625, -203.9384765625, -63.982147216796875, 119.08802795410156, 151.70924377441406, 92.13015747070312, 124.29318237304688, -22.464447021484375, -118.61511993408203, 182.05078125, -9.8697509765625, 144.3394317626953, 194.71365356445312, 215.06597900390625, 200.2808837890625, 81.83485412597656, 4.111326217651367, -20.17259979248047, -11.037626266479492, -49.035362243652344, 30.66834831237793, 5.919506072998047, 120.05219268798828, 88.73452758789062, 316.2799987792969, 91.24443054199219, 85.56944274902344, 5.602119445800781, 16.70014190673828, -39.12416076660156, 6.735725402832031, 53.6180419921875, 6.28523063659668, 169.3660430908203, -3.2175064086914062, 54.520660400390625, -131.95855712890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000623.npy"}
{"epoch": 0.9417989417989417, "step": 624, "batch_size": 64, "mean": 74.47286224365234, "std": 109.65505981445312, "min": -195.9810791015625, "p10": -39.46215095520019, "median": 40.809810638427734, "p90": 200.15611114501954, "max": 302.7023010253906, "pos_frac": 0.6875, "sample": [-66.03939819335938, -13.1280517578125, 172.00221252441406, 130.98561096191406, 193.07037353515625, 180.35653686523438, -6.916893005371094, -12.598159790039062, -22.102081298828125, -50.298133850097656, 76.93932342529297, 19.387863159179688, 124.4403076171875, 146.9526824951172, 173.27175903320312, 164.87127685546875, 17.50950050354004, 271.0282287597656, 255.75390625, 19.88593292236328, -32.1928596496582, -56.652103424072266, -16.820329666137695, 193.73330688476562, 7.535587310791016, 162.44534301757812, 49.47893524169922, -27.278404235839844, 33.428314208984375, 36.74525451660156, 302.7023010253906, 193.24191284179688, 119.86563110351562, 296.4198913574219, 11.331989288330078, -42.07028579711914, 201.59608459472656, 171.32638549804688, 267.52520751953125, 173.0567626953125, 62.340721130371094, 196.79617309570312, 116.24015808105469, 18.1978759765625, -74.79533386230469, -16.577346801757812, 239.5389862060547, 2.1255645751953125, -14.826845169067383, 139.733154296875, 137.77320861816406, 0.37247657775878906, -1.7410354614257812, 11.324151992797852, -195.9810791015625, 137.84356689453125, 22.823104858398438, -33.376502990722656, 133.7913818359375, -12.895854949951172, -130.33433532714844, -0.8436279296875, 44.874366760253906, 163.06887817382812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000624.npy"}
{"epoch": 0.9433106575963719, "step": 625, "batch_size": 64, "mean": 57.492279052734375, "std": 111.60868835449219, "min": -182.35899353027344, "p10": -71.58622436523437, "median": 31.80315399169922, "p90": 214.08541870117188, "max": 299.87347412109375, "pos_frac": 0.671875, "sample": [59.81181335449219, 168.52313232421875, 47.386695861816406, 32.763275146484375, -0.10825538635253906, 146.73204040527344, 84.23616790771484, 124.60454559326172, 4.996488571166992, 268.8426818847656, 97.771728515625, -72.4144287109375, 189.13710021972656, -15.547351837158203, 263.90203857421875, 34.30595397949219, 236.34446716308594, 36.48109436035156, -76.48922729492188, -27.21125030517578, 163.02455139160156, 157.29922485351562, 24.408164978027344, -110.85546875, -5.46478271484375, 30.843032836914062, 28.202003479003906, 36.22003936767578, 14.421977996826172, 33.869781494140625, -2.986408233642578, 207.02854919433594, 299.87347412109375, -19.75647735595703, 145.94566345214844, -4.2764129638671875, -4.678064346313477, 290.43292236328125, -3.5317001342773438, 15.748931884765625, 89.56988525390625, 167.49420166015625, 224.69277954101562, 114.59432983398438, 28.832847595214844, -22.426834106445312, -178.57321166992188, -69.65374755859375, -115.97598266601562, 160.4424591064453, 20.43035888671875, -162.77316284179688, -182.35899353027344, 212.46786499023438, 152.3084716796875, -0.6448898315429688, 214.77865600585938, 16.457122802734375, 65.52886199951172, -29.14653778076172, 75.23577880859375, 15.41019058227539, 10.81995964050293, -27.84252166748047], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000625.npy"}
{"epoch": 0.9448223733938019, "step": 626, "batch_size": 64, "mean": 77.05049133300781, "std": 117.26177215576172, "min": -141.91287231445312, "p10": -75.09714355468749, "median": 68.37454605102539, "p90": 200.12186431884766, "max": 435.2178649902344, "pos_frac": 0.71875, "sample": [204.64694213867188, 17.937789916992188, 187.7598876953125, -36.37217712402344, 331.80157470703125, 191.57508850097656, 199.53359985351562, -59.26935577392578, 89.48851776123047, 190.57479858398438, -78.06561279296875, 18.639205932617188, 59.192405700683594, 4.436634063720703, -85.70893096923828, 181.69241333007812, 137.12417602539062, -102.63066101074219, 238.71835327148438, 45.643470764160156, -111.95135498046875, -5.30841064453125, 110.76225280761719, 181.40370178222656, 11.786293029785156, -6.118307113647461, 112.07908630371094, 70.62652587890625, -27.082809448242188, 435.2178649902344, 106.65807342529297, -68.17071533203125, 217.6033935546875, 183.04042053222656, 67.95433807373047, -107.04637145996094, -4.13287353515625, 68.79475402832031, -0.9913539886474609, 127.65848541259766, 9.770378112792969, 12.965837478637695, 193.67393493652344, 41.45612716674805, -21.516876220703125, 3.985910415649414, 89.39883422851562, 180.5421142578125, 63.21602249145508, 171.21914672851562, 154.9799041748047, 13.818737030029297, -141.91287231445312, 189.44140625, -106.07942199707031, 193.62857055664062, 22.948604583740234, 200.3739776611328, 160.4010772705078, 109.757568359375, 239.71531677246094, -66.15476989746094, 154.28628540039062, -38.185447692871094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000626.npy"}
{"epoch": 0.9463340891912321, "step": 627, "batch_size": 64, "mean": 82.21084594726562, "std": 108.0606460571289, "min": -188.82077026367188, "p10": -22.9889404296875, "median": 93.86074829101562, "p90": 213.81621704101565, "max": 265.38983154296875, "pos_frac": 0.75, "sample": [130.31201171875, 2.706958770751953, 93.61480712890625, 119.87548828125, 135.78128051757812, 117.01284790039062, 168.94265747070312, 253.07667541503906, 265.38983154296875, 95.78396606445312, -7.436920166015625, 215.15713500976562, 187.8433837890625, 168.8586883544922, 64.51695251464844, 69.43426513671875, -15.751739501953125, 36.324195861816406, 210.68740844726562, 145.55328369140625, 157.3660430908203, 146.5169219970703, 32.43534851074219, 8.20062255859375, -8.751602172851562, -142.0408172607422, 196.01266479492188, -54.17927551269531, 149.68431091308594, -16.351131439208984, 5.848405838012695, 202.0377960205078, 236.16525268554688, -23.88195037841797, 120.00485229492188, -12.642974853515625, -20.905250549316406, 140.65293884277344, -68.13375854492188, 217.15975952148438, 264.608154296875, 65.05107879638672, -154.48452758789062, -3.397186279296875, 53.97240447998047, -1.1128082275390625, -13.09716796875, 121.6456069946289, 27.220855712890625, 237.2564239501953, 53.922698974609375, -188.82077026367188, 6.863811492919922, 94.106689453125, 198.89584350585938, 168.4783935546875, 45.899635314941406, -184.53445434570312, 56.0391845703125, 66.84406280517578, 119.38896942138672, 172.96388244628906, 129.26092529296875, 201.64111328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000627.npy"}
{"epoch": 0.9478458049886621, "step": 628, "batch_size": 64, "mean": 79.59342193603516, "std": 115.59613037109375, "min": -241.95640563964844, "p10": -71.56490325927732, "median": 86.17058944702148, "p90": 206.48513641357422, "max": 272.0814208984375, "pos_frac": 0.796875, "sample": [207.9806365966797, 81.68087768554688, 94.14009094238281, 143.6052703857422, -49.497802734375, -126.43002319335938, 218.89230346679688, 28.86456298828125, 177.15048217773438, 183.66970825195312, 209.61212158203125, 230.31723022460938, 120.13420104980469, 41.29374313354492, 45.64710235595703, 161.28860473632812, -241.95640563964844, -157.49490356445312, 172.6660614013672, 30.932735443115234, 202.99563598632812, 75.21974182128906, 19.972688674926758, -95.56240844726562, 194.9334259033203, 154.41526794433594, 74.43032836914062, 152.41714477539062, 94.33712768554688, 195.6043701171875, 181.05369567871094, 233.77734375, 21.243621826171875, 201.54440307617188, 7.377285003662109, 260.2337646484375, -30.127334594726562, 4.461250305175781, 189.45108032226562, -160.87733459472656, 120.88203430175781, 86.52405548095703, 31.50447654724121, 168.5227813720703, 39.16602325439453, 30.363784790039062, 182.96826171875, 80.49075317382812, 168.25921630859375, 190.229736328125, 33.34654998779297, -42.90043640136719, 13.591728210449219, -190.53866577148438, 98.858154296875, 272.0814208984375, 17.005172729492188, 174.5709686279297, -2.1565704345703125, 85.81712341308594, -42.57218933105469, 110.85538482666016, -1.2662239074707031, -81.02223205566406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000628.npy"}
{"epoch": 0.9493575207860923, "step": 629, "batch_size": 64, "mean": 58.781219482421875, "std": 93.64983367919922, "min": -156.45953369140625, "p10": -35.93189315795898, "median": 32.47804260253906, "p90": 205.6465545654297, "max": 271.56787109375, "pos_frac": 0.671875, "sample": [-42.81835174560547, 104.13928985595703, -156.45953369140625, 103.32464599609375, 90.42800903320312, 11.992408752441406, -15.867050170898438, 81.70368194580078, -128.74685668945312, 103.66996765136719, -14.239578247070312, 26.06092071533203, 6.869295120239258, 208.90504455566406, -19.32234764099121, -51.09004592895508, 0.4868907928466797, -11.070953369140625, -60.368316650390625, 28.898834228515625, 32.344879150390625, -5.284477233886719, -4.803705215454102, 46.83645248413086, -47.77866744995117, 45.95946502685547, 271.56787109375, 215.75527954101562, -12.510305404663086, 187.58074951171875, 108.33877563476562, -37.98247528076172, 91.20561218261719, 91.1556625366211, -9.554584503173828, 75.05633544921875, 125.43925476074219, 16.6884765625, 4.265628814697266, -1.278116226196289, 12.611862182617188, 64.62916564941406, 238.3380126953125, -1.7400550842285156, -13.414054870605469, -11.513137817382812, 44.41710662841797, 150.06222534179688, 144.89382934570312, 161.0819091796875, -10.746658325195312, 72.9312744140625, 228.5299072265625, 15.913103103637695, 130.9312744140625, 249.22935485839844, 5.632240295410156, 238.04605102539062, 170.1448974609375, 198.0434112548828, 32.6112060546875, 158.11439514160156, -31.147201538085938, 54.899925231933594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000629.npy"}
{"epoch": 0.9508692365835223, "step": 630, "batch_size": 64, "mean": 80.39723205566406, "std": 103.49514770507812, "min": -157.124267578125, "p10": -41.49015808105467, "median": 66.01346206665039, "p90": 208.46230316162115, "max": 303.3341979980469, "pos_frac": 0.765625, "sample": [-76.87600708007812, 175.99600219726562, -12.430667877197266, 13.628509521484375, 163.06216430664062, 253.2321014404297, 112.7579116821289, 1.4764366149902344, 233.22811889648438, 157.16790771484375, -2.7245712280273438, 181.55784606933594, -10.282958984375, 188.14825439453125, 15.614280700683594, 178.83026123046875, -107.71682739257812, 83.45108795166016, 156.29379272460938, -126.2989501953125, 49.21649169921875, 1.1437149047851562, 57.36125946044922, 187.861328125, 245.71551513671875, 9.031364440917969, -58.331077575683594, 52.96310043334961, -157.124267578125, 148.1585693359375, -5.509880065917969, 45.75291442871094, 6.408069610595703, 239.7909698486328, 127.09140014648438, 61.863494873046875, 215.7832489013672, 29.565170288085938, 70.1634292602539, 110.5180435180664, 303.3341979980469, 31.222869873046875, 191.38009643554688, -70.93025207519531, 190.09671020507812, 134.96359252929688, 163.06822204589844, 44.990081787109375, 102.91767883300781, -46.827911376953125, 52.47803497314453, -23.26654052734375, -12.309490203857422, 243.27984619140625, 172.74285888671875, 9.086347579956055, 89.67194366455078, 16.76474380493164, 184.55728149414062, 188.79354858398438, -29.035400390625, 81.201904296875, 113.36943817138672, -1.6644096374511719], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000630.npy"}
{"epoch": 0.9523809523809523, "step": 631, "batch_size": 64, "mean": 89.33956909179688, "std": 115.25093841552734, "min": -170.10336303710938, "p10": -50.96531448364257, "median": 73.81563568115234, "p90": 217.33641204833984, "max": 328.3668212890625, "pos_frac": 0.71875, "sample": [182.767333984375, 81.50436401367188, -53.222625732421875, 216.21243286132812, 149.51707458496094, 183.93927001953125, 216.22286987304688, 161.78042602539062, 54.048763275146484, -20.365509033203125, 21.09271240234375, -6.272525787353516, 142.16912841796875, 21.181411743164062, -1.5973472595214844, 41.76410675048828, -16.847579956054688, 37.13820266723633, 54.31818389892578, 271.58282470703125, 278.38800048828125, -70.70811462402344, 209.52557373046875, 217.1178436279297, 153.48019409179688, 59.313446044921875, 22.190353393554688, 290.3276672363281, -6.713827133178711, 138.52735900878906, 42.41416549682617, 54.452232360839844, -161.5594940185547, 113.75908660888672, 85.51423645019531, 291.3902587890625, 180.8444061279297, -9.052337646484375, 328.3668212890625, 209.94667053222656, 104.1075668334961, -52.97453308105469, -92.00398254394531, 161.81500244140625, -15.283538818359375, -170.10336303710938, -9.258367538452148, 4.787805557250977, -70.54904174804688, 130.57135009765625, -46.277137756347656, 143.40670776367188, -12.462799072265625, 212.41104125976562, -4.988353729248047, 175.8814697265625, 8.08205795288086, 217.43008422851562, 66.12690734863281, 260.9805908203125, 132.568359375, 186.4784698486328, 19.724788665771484, 202.8029022216797], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000631.npy"}
{"epoch": 0.9538926681783825, "step": 632, "batch_size": 64, "mean": 85.19876861572266, "std": 101.02947998046875, "min": -203.09722900390625, "p10": -25.818331146240233, "median": 82.08902359008789, "p90": 206.3464584350586, "max": 277.4173278808594, "pos_frac": 0.78125, "sample": [114.00357055664062, 216.42788696289062, 102.06907653808594, -1.1529502868652344, 67.82424926757812, 191.26736450195312, 5.479877471923828, 123.04035186767578, -18.824596405029297, 198.41848754882812, 102.03225708007812, -43.151885986328125, 103.30130004882812, -10.946491241455078, -63.32447814941406, 200.72279357910156, 215.99093627929688, 5.393760681152344, 179.81411743164062, -81.07571411132812, 24.99432373046875, 29.571582794189453, -135.58546447753906, 23.024185180664062, 61.31555938720703, 173.79879760742188, 54.485504150390625, 156.82986450195312, 76.28886413574219, -8.070079803466797, 175.7554931640625, 277.4173278808594, 96.00006103515625, -27.110092163085938, 227.35336303710938, 11.667795181274414, 216.5570068359375, 45.332977294921875, 87.8891830444336, 114.80008697509766, 200.08346557617188, 190.63427734375, 17.760284423828125, -22.423683166503906, 8.469287872314453, 205.71237182617188, 140.61895751953125, 117.09657287597656, 200.0570068359375, 188.14913940429688, 195.17721557617188, -203.09722900390625, 46.253082275390625, 141.8688507080078, -2.127899169921875, -22.804222106933594, 206.6182098388672, 219.26095581054688, 195.15513610839844, 45.503448486328125, 56.744239807128906, 32.79288864135742, -48.96184158325195, 54.55428695678711], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000632.npy"}
{"epoch": 0.9554043839758125, "step": 633, "batch_size": 64, "mean": 66.50627899169922, "std": 99.79718780517578, "min": -139.0418701171875, "p10": -54.871221923828124, "median": 45.489803314208984, "p90": 199.42476196289064, "max": 265.83270263671875, "pos_frac": 0.71875, "sample": [-3.3476619720458984, 35.23179626464844, 15.695075988769531, 112.95404052734375, 159.16014099121094, -19.59625244140625, 129.61471557617188, 142.05027770996094, 61.45829772949219, 75.46379852294922, -5.2092742919921875, 15.816164016723633, 256.03997802734375, 49.03067398071289, -50.777034759521484, 151.0386962890625, 13.304855346679688, 116.23289489746094, 87.26241302490234, -25.935585021972656, 6.16180419921875, -31.18981170654297, 117.0932846069336, 37.368385314941406, 176.87472534179688, 68.71514129638672, -29.71384048461914, -22.968223571777344, 133.31871032714844, 108.69861602783203, 187.22023010253906, 7.062858581542969, 198.090087890625, 33.43328857421875, -56.62587356567383, 11.65524673461914, -86.23020935058594, -35.06464385986328, -63.012428283691406, -29.644615173339844, -76.67743682861328, 41.94893264770508, 2.0079612731933594, 243.384033203125, 52.37944793701172, 67.837158203125, 39.93798065185547, -72.22054290771484, 60.65898132324219, 192.78009033203125, 191.152099609375, 183.24827575683594, 199.99676513671875, 191.02349853515625, 4.0864410400390625, -85.41024017333984, -6.000835418701172, 22.46364974975586, 154.42083740234375, 216.9765167236328, 265.83270263671875, -139.0418701171875, 244.70262145996094, 214.18411254882812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000633.npy"}
{"epoch": 0.9569160997732427, "step": 634, "batch_size": 64, "mean": 26.432340621948242, "std": 103.29545593261719, "min": -187.72975158691406, "p10": -111.56288833618162, "median": 2.4425477981567383, "p90": 180.80621337890628, "max": 284.626708984375, "pos_frac": 0.53125, "sample": [-0.2894935607910156, -121.71026611328125, -3.8001556396484375, 193.61285400390625, 26.530609130859375, -20.6270809173584, -14.92208480834961, -21.260404586791992, 45.693687438964844, 18.811904907226562, 14.048616409301758, -7.769752502441406, 0.43682861328125, -31.37079620361328, -20.225692749023438, 60.71626281738281, 128.3079071044922, 62.49111557006836, -75.27787017822266, -41.13865661621094, -34.16838836669922, -1.2083473205566406, 135.16114807128906, -151.29299926757812, 197.15245056152344, 3.6240234375, 203.04429626464844, 154.3628387451172, 284.626708984375, -145.34373474121094, -23.7236328125, 17.92569923400879, -4.753574371337891, -37.08340835571289, 183.79747009277344, 7.482778549194336, -87.88567352294922, -161.3534698486328, 219.12770080566406, 173.8266143798828, 133.99778747558594, -69.00802612304688, -124.85037231445312, 22.830608367919922, -141.80422973632812, -187.72975158691406, -6.585849761962891, 1.8479042053222656, 185.76853942871094, 62.907169342041016, -30.8819580078125, -50.124725341796875, 142.5152587890625, 44.18970489501953, -1.292654037475586, -5.139636993408203, 172.1088409423828, 135.2366180419922, 74.42378997802734, 102.5251693725586, 13.130622863769531, 3.037191390991211, -41.82891082763672, 130.82064819335938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000634.npy"}
{"epoch": 0.9584278155706727, "step": 635, "batch_size": 64, "mean": 53.369022369384766, "std": 110.91737365722656, "min": -182.18057250976562, "p10": -89.84078826904297, "median": 38.79054260253906, "p90": 198.42307586669924, "max": 244.4678497314453, "pos_frac": 0.65625, "sample": [-89.52626037597656, 181.06021118164062, -117.98546600341797, 9.354053497314453, 52.26678466796875, 21.42375946044922, -89.9755859375, -140.0457763671875, -128.63551330566406, 186.53526306152344, 0.3377838134765625, 177.43197631835938, 16.325294494628906, -10.996253967285156, 203.10919189453125, 215.57623291015625, 164.92388916015625, 9.986839294433594, 196.53976440429688, 190.68722534179688, -72.99726867675781, 26.080474853515625, -36.97595977783203, 222.89959716796875, 51.5006103515625, -66.05677795410156, 166.501220703125, -2.6224422454833984, -13.367931365966797, 94.80960083007812, 98.15362548828125, -51.75353240966797, 191.71971130371094, 22.86212921142578, 52.651954650878906, 8.445316314697266, 186.8464813232422, -2.9724998474121094, 199.23020935058594, -164.18348693847656, 140.6453399658203, 238.62545776367188, -41.85637283325195, 206.9241180419922, -30.821380615234375, -37.16096496582031, 85.60417175292969, 96.52111053466797, 112.02851867675781, -46.16969299316406, 109.63658905029297, 244.4678497314453, -7.004547119140625, 76.4504623413086, -182.18057250976562, 132.53048706054688, 136.6277313232422, 22.162277221679688, -8.795501708984375, 145.95162963867188, -121.77552795410156, 105.26829528808594, 76.90128326416016, 1.8721084594726562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000635.npy"}
{"epoch": 0.9599395313681028, "step": 636, "batch_size": 64, "mean": 72.84630584716797, "std": 96.97818756103516, "min": -184.31271362304688, "p10": -19.084328460693357, "median": 79.85703659057617, "p90": 189.3914337158203, "max": 282.14031982421875, "pos_frac": 0.8125, "sample": [189.82374572753906, 9.766365051269531, 30.861257553100586, 22.50341033935547, 94.99681854248047, 15.785072326660156, 79.008056640625, -7.043434143066406, -27.257606506347656, 185.484375, 115.02302551269531, -11.46641731262207, 187.89833068847656, 154.72976684570312, 45.4852294921875, 50.34336853027344, -45.99189758300781, 188.38270568847656, 76.1894302368164, 124.03704071044922, 148.79721069335938, 6.1751861572265625, 2.709646224975586, 156.84681701660156, 2.0803070068359375, 89.70673370361328, 254.91128540039062, 4.43638801574707, -1.2303447723388672, 80.70601654052734, 98.88299560546875, 94.1307144165039, 241.4517059326172, 83.00738525390625, -13.683334350585938, 162.01705932617188, 112.70388793945312, 57.29943084716797, -20.75286102294922, 181.53823852539062, 119.59232330322266, 203.84664916992188, -15.191085815429688, 282.14031982421875, 7.660266876220703, -184.31271362304688, 217.33531188964844, 42.104248046875, 2.5550270080566406, 180.529296875, 138.35365295410156, 89.33521270751953, 129.9308624267578, 216.37570190429688, 90.33232116699219, -143.22052001953125, -170.88265991210938, 12.493555068969727, 81.36331176757812, 95.0509262084961, 25.29925537109375, -89.42334747314453, 73.29194641113281, 37.31097412109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000636.npy"}
{"epoch": 0.9614512471655329, "step": 637, "batch_size": 64, "mean": 54.00630187988281, "std": 112.38951110839844, "min": -273.3264465332031, "p10": -89.35890655517576, "median": 31.443653106689453, "p90": 206.37225341796878, "max": 349.2461853027344, "pos_frac": 0.671875, "sample": [-1.7623748779296875, 231.51901245117188, 15.757911682128906, 349.2461853027344, -23.84410858154297, -18.377166748046875, -7.191139221191406, 127.46794891357422, 216.28781127929688, 37.14192581176758, -40.923301696777344, 32.49018096923828, 130.79153442382812, -109.14591217041016, 145.29000854492188, 59.81203079223633, -17.96973419189453, 122.01959228515625, 193.42599487304688, -21.58983039855957, 97.30735778808594, 40.41047668457031, 19.17401123046875, 0.6931076049804688, 8.012845993041992, 51.76177978515625, -94.30968475341797, 166.8209686279297, 243.36048889160156, 196.58538818359375, 195.24526977539062, 15.651420593261719, -138.5310821533203, 151.59420776367188, -34.706146240234375, 145.4975128173828, -110.5919189453125, -4.034263610839844, 17.424659729003906, 44.460365295410156, 78.61906433105469, 222.03146362304688, 1.7878131866455078, 18.548465728759766, 200.3690185546875, 21.259300231933594, -2.416473388671875, 72.88330078125, -105.68144226074219, -9.610969543457031, 30.397125244140625, 232.82176208496094, 93.7889404296875, 72.63214874267578, 24.835739135742188, -132.24713134765625, 208.945068359375, 64.36642456054688, -15.283905029296875, -273.3264465332031, 100.05477142333984, 197.4666748046875, -77.80709075927734, -0.3035888671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000637.npy"}
{"epoch": 0.9629629629629629, "step": 638, "batch_size": 64, "mean": 64.90939331054688, "std": 99.2132339477539, "min": -152.37904357910156, "p10": -36.34038734436035, "median": 19.92477798461914, "p90": 203.15985412597658, "max": 249.53184509277344, "pos_frac": 0.71875, "sample": [-146.96212768554688, 156.8588104248047, 122.36872863769531, 1.2280769348144531, -152.37904357910156, -3.4665069580078125, 206.83636474609375, 133.21353149414062, 40.796321868896484, -21.037586212158203, 6.895263671875, -54.52797317504883, -15.696884155273438, -23.456436157226562, -41.522247314453125, 226.023193359375, -6.7489776611328125, -34.779510498046875, 202.25787353515625, 13.70941162109375, 97.62358093261719, 13.325355529785156, 162.23040771484375, 196.63211059570312, -90.7568130493164, 209.6034698486328, 123.57091522216797, 183.72308349609375, 1.5097808837890625, 61.546356201171875, 5.026327133178711, 208.18115234375, 159.6602325439453, 174.4436492919922, 18.04656982421875, -7.446657180786133, 11.533470153808594, 138.08370971679688, 12.056655883789062, 197.27137756347656, 83.8305892944336, 203.711181640625, 203.54641723632812, -3.5580177307128906, 190.23873901367188, -37.57270812988281, -3.769367218017578, 184.67526245117188, -9.409652709960938, 23.84253692626953, 0.5408039093017578, 17.911602020263672, 249.53184509277344, 83.74078369140625, -37.009334564208984, 21.80298614501953, 2.8354034423828125, 199.38641357421875, 191.54917907714844, 9.496044158935547, -11.64959716796875, 0.3001251220703125, 56.123924255371094, 48.63116455078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000638.npy"}
{"epoch": 0.9644746787603931, "step": 639, "batch_size": 64, "mean": 76.33148193359375, "std": 106.7479248046875, "min": -118.46179962158203, "p10": -55.238303375244136, "median": 50.99018096923828, "p90": 219.26441650390626, "max": 296.0768127441406, "pos_frac": 0.71875, "sample": [190.22886657714844, 133.89553833007812, 135.7283935546875, -72.90478515625, 61.41046142578125, 172.30856323242188, 11.401084899902344, -36.22275161743164, -55.70459747314453, 137.4998321533203, 24.171859741210938, 70.37017822265625, 270.93927001953125, 53.71415710449219, 172.9816131591797, -0.764251708984375, 47.31890869140625, 3.57373046875, -117.71263885498047, 103.8368148803711, 143.62428283691406, 187.58445739746094, -1.0123538970947266, 249.1552734375, 8.134017944335938, 129.8978729248047, 67.96372985839844, 196.1033477783203, -46.530303955078125, 220.11143493652344, 226.9796905517578, 0.2077007293701172, 0.2243175506591797, 144.44912719726562, -4.286380767822266, 19.901695251464844, 143.36178588867188, -9.915695190429688, 296.0768127441406, 168.828369140625, 172.31927490234375, 48.266204833984375, 276.0428771972656, 114.54228973388672, -77.49624633789062, -5.976078033447266, -25.768699645996094, -61.36018753051758, 44.19844055175781, 25.294414520263672, 226.51309204101562, 177.1697540283203, 0.7592697143554688, 26.2088623046875, -19.3048095703125, -118.46179962158203, 24.790443420410156, -54.15028381347656, -63.72161102294922, 217.2880401611328, -39.006370544433594, 159.0052032470703, 182.9253692626953, 208.2080078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000639.npy"}
{"epoch": 0.9659863945578231, "step": 640, "batch_size": 64, "mean": 70.56825256347656, "std": 102.77926635742188, "min": -198.29922485351562, "p10": -41.730719757080074, "median": 54.27428436279297, "p90": 196.88287658691408, "max": 297.431640625, "pos_frac": 0.796875, "sample": [185.444580078125, 43.90744400024414, 297.431640625, -31.62556266784668, 17.82379913330078, 116.52679443359375, 127.76319885253906, 5.501945495605469, 145.30010986328125, 199.42010498046875, 55.1795654296875, 15.352317810058594, 134.4212188720703, 166.3138427734375, -129.48963928222656, 141.5957489013672, 66.37361907958984, 33.984806060791016, -54.313350677490234, 209.24192810058594, -190.33592224121094, 140.3955078125, 235.61915588378906, 53.36900329589844, -53.75616455078125, -12.505870819091797, 155.8695068359375, 24.33216094970703, -22.28911781311035, 3.621204376220703, 172.22195434570312, 7.5750579833984375, 59.233642578125, 199.9451141357422, 115.57708740234375, -1.868804931640625, -20.96259307861328, 31.000823974609375, -63.80741882324219, 90.4069595336914, 45.972564697265625, 190.96267700195312, 6.629951477050781, 64.36019897460938, 190.51388549804688, 128.0623779296875, 88.94165802001953, -198.29922485351562, 20.597620010375977, 186.4375457763672, 139.28404235839844, 242.32867431640625, 31.763946533203125, 284.72796630859375, 11.608230590820312, -43.775428771972656, 178.04563903808594, -36.95973205566406, 107.09405517578125, 38.340057373046875, 12.847400665283203, 32.172576904296875, 33.72618865966797, 91.19001007080078], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000640.npy"}
{"epoch": 0.9674981103552532, "step": 641, "batch_size": 64, "mean": 62.92747497558594, "std": 85.35399627685547, "min": -207.17030334472656, "p10": -18.28658790588379, "median": 49.02896499633789, "p90": 176.85278930664066, "max": 294.0379638671875, "pos_frac": 0.796875, "sample": [-19.168994903564453, -111.62401580810547, 98.42042541503906, 89.38693237304688, 30.48779296875, -2.6752777099609375, 156.29226684570312, 5.245201110839844, 0.48041534423828125, -1.8315658569335938, 36.86058044433594, 14.124673843383789, 143.860107421875, 167.6742401123047, 64.6534423828125, 8.77393913269043, -2.80535888671875, 191.87039184570312, 192.9722137451172, 140.70831298828125, 124.7643814086914, -16.227638244628906, 52.820335388183594, 45.55375671386719, 99.15055084228516, 212.5627899169922, 10.145980834960938, 79.22157287597656, 29.037490844726562, 116.26766967773438, 9.895538330078125, 67.15374755859375, -64.61023712158203, 98.8327865600586, 11.09759521484375, -54.882408142089844, 145.51425170898438, 54.05900573730469, -22.060218811035156, 47.12928771972656, -207.17030334472656, 131.527099609375, 54.55914306640625, 20.508438110351562, 97.21630859375, 220.41915893554688, 13.511726379394531, -2.7981319427490234, 160.377685546875, 31.374191284179688, 294.0379638671875, 118.11582946777344, 212.84226989746094, 180.7864532470703, -19.851715087890625, 86.30662536621094, 112.29864501953125, 24.928756713867188, 34.65652847290039, 16.723052978515625, -1.0874404907226562, 13.5078125, 50.92864227294922, 134.5078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000641.npy"}
{"epoch": 0.9690098261526833, "step": 642, "batch_size": 64, "mean": 45.49795913696289, "std": 106.11334228515625, "min": -176.01629638671875, "p10": -78.91648712158202, "median": 35.08260154724121, "p90": 182.26822204589845, "max": 312.5115661621094, "pos_frac": 0.6875, "sample": [73.6150131225586, 9.240379333496094, 1.7997207641601562, -108.73863220214844, 56.61829376220703, -11.646581649780273, 117.03486633300781, 196.1339111328125, -81.49569702148438, -166.27149963378906, 312.5115661621094, -5.7921600341796875, 108.12794494628906, -66.64836120605469, 83.05154418945312, 57.59059143066406, 137.58261108398438, 227.77813720703125, -39.757652282714844, -17.671939849853516, 121.22118377685547, 128.79736328125, 176.57916259765625, -38.9945182800293, 96.47996520996094, 161.58702087402344, 192.44215393066406, 24.250701904296875, 169.09375, 35.78064727783203, 104.7288589477539, 105.89312744140625, 81.69766235351562, 17.37936782836914, -15.482734680175781, 166.01361083984375, -67.27070617675781, 59.954124450683594, 7.575462341308594, -72.28923034667969, 15.71249008178711, -20.78790283203125, 9.182748794555664, -56.95954132080078, 149.77276611328125, 15.36412239074707, -85.57781982421875, 144.31838989257812, 162.28515625, 207.395263671875, 41.7637939453125, 27.769935607910156, 184.70639038085938, 117.88387298583984, 195.0410614013672, 34.38455581665039, -64.92310333251953, 55.34918975830078, -176.01629638671875, -171.50930786132812, 6.666595458984375, -72.89833068847656, -157.7355499267578, 12.181838989257812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000642.npy"}
{"epoch": 0.9705215419501134, "step": 643, "batch_size": 64, "mean": 72.06438446044922, "std": 108.8580551147461, "min": -189.32867431640625, "p10": -63.35343856811523, "median": 71.89620208740234, "p90": 208.64237060546878, "max": 304.735595703125, "pos_frac": 0.75, "sample": [67.85372924804688, 86.75906372070312, 88.66944885253906, 24.723388671875, 1.1344099044799805, 234.54672241210938, 0.9878330230712891, 180.20082092285156, 86.75939178466797, 28.39012908935547, 134.9110107421875, 36.752540588378906, 141.20458984375, 0.7537384033203125, 139.35598754882812, 127.33535766601562, -43.64088439941406, 7.489374160766602, 202.10110473632812, 210.99221801757812, 167.92239379882812, 26.00904083251953, 181.81576538085938, 63.70951843261719, -48.98249053955078, 203.15939331054688, 59.07069396972656, -36.652587890625, 198.8890380859375, 276.5284118652344, 98.62203216552734, 160.56874084472656, -34.51458740234375, 161.30665588378906, 71.33439636230469, -75.93922424316406, 47.54669189453125, -4.380828857421875, 231.82196044921875, 173.9273223876953, 148.41868591308594, -66.92760467529297, 146.96070861816406, -12.462478637695312, 165.44677734375, -139.60362243652344, -126.32001495361328, -189.32867431640625, 2.2976837158203125, 21.60049819946289, -65.01122283935547, 97.41108703613281, 220.55166625976562, 93.09751892089844, 72.4580078125, -59.48527526855469, -15.974044799804688, 227.14776611328125, 62.61225128173828, 73.67615509033203, -32.95572280883789, 304.735595703125, 142.2989501953125, -137.5662841796875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000643.npy"}
{"epoch": 0.9720332577475435, "step": 644, "batch_size": 64, "mean": 52.47273635864258, "std": 107.96935272216797, "min": -242.272705078125, "p10": -72.26185150146482, "median": 26.724430084228516, "p90": 192.4839614868164, "max": 227.34564208984375, "pos_frac": 0.75, "sample": [207.87881469726562, -113.8845443725586, -242.272705078125, 169.04164123535156, -37.6987419128418, -82.54620361328125, 8.445304870605469, 2.4566402435302734, 186.713623046875, 227.34564208984375, 68.90107727050781, 44.587642669677734, 192.7291259765625, 143.4912109375, 92.67556762695312, 59.5531005859375, 54.481929779052734, 46.018821716308594, 5.419715881347656, 185.65234375, 10.416515350341797, 188.1938934326172, -150.48216247558594, -25.631118774414062, -12.636322021484375, 117.70398712158203, 1.28045654296875, 112.24920654296875, 7.2371368408203125, -48.26502990722656, 191.9119110107422, -25.302734375, 2.1674652099609375, 150.0745391845703, 164.82095336914062, -0.5665512084960938, -37.38726806640625, 217.93609619140625, 193.63389587402344, 17.781700134277344, 98.88492584228516, 199.28823852539062, 183.2932891845703, 33.031639099121094, 14.354888916015625, -204.3457794189453, -13.340957641601562, 125.61732482910156, 16.819305419921875, 164.9840545654297, -37.78129577636719, 13.184768676757812, 4.903715133666992, -106.97087860107422, 20.417221069335938, 4.8382110595703125, -133.39303588867188, 108.8992919921875, 130.86114501953125, 4.195594787597656, 18.215164184570312, 174.52230834960938, 33.18573760986328, 210.46365356445312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000644.npy"}
{"epoch": 0.9735449735449735, "step": 645, "batch_size": 64, "mean": 64.76499938964844, "std": 90.17715454101562, "min": -187.18930053710938, "p10": -33.46108894348144, "median": 52.24164962768555, "p90": 180.12332000732425, "max": 260.30804443359375, "pos_frac": 0.765625, "sample": [29.679080963134766, 210.972900390625, 124.3044662475586, -96.5826644897461, 99.65135192871094, 244.5008544921875, 144.69903564453125, -0.8003330230712891, -15.174552917480469, 20.605079650878906, 15.060726165771484, -99.39045715332031, 50.0052490234375, -0.14691543579101562, 162.34286499023438, 41.82521057128906, 116.11032104492188, 0.8087272644042969, -6.470703125, 22.875959396362305, 193.74581909179688, -31.62000274658203, 11.373802185058594, 110.36734771728516, 59.04985809326172, -7.072639465332031, 54.478050231933594, 103.60955810546875, 191.7853546142578, 113.11005401611328, 129.48272705078125, -1.8712749481201172, 74.1941146850586, 21.336563110351562, 7.5339813232421875, 150.04696655273438, -106.33833312988281, 20.9366455078125, -60.0157470703125, 203.98155212402344, 101.68260192871094, 171.2669677734375, 260.30804443359375, -8.454551696777344, 43.439632415771484, 119.19676208496094, 146.24853515625, 86.96172332763672, 81.34951782226562, 132.322021484375, 19.831893920898438, 141.01722717285156, 33.1700325012207, 166.49368286132812, 63.76214599609375, 0.197296142578125, 154.17294311523438, -34.250125885009766, -45.874237060546875, -187.18930053710938, 20.570518493652344, 183.9188995361328, 159.80029296875, 32.026390075683594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000645.npy"}
{"epoch": 0.9750566893424036, "step": 646, "batch_size": 64, "mean": 56.72221374511719, "std": 105.94404602050781, "min": -192.6645965576172, "p10": -65.26466903686521, "median": 39.39584541320801, "p90": 193.7498565673828, "max": 310.2866516113281, "pos_frac": 0.703125, "sample": [-45.844146728515625, 0.8048381805419922, -1.6168899536132812, 128.1322021484375, 190.7561798095703, 180.9442596435547, -135.76820373535156, 208.8758544921875, 94.45201110839844, 200.55795288085938, -10.058509826660156, 15.048431396484375, 185.05856323242188, 192.42669677734375, 144.35446166992188, -4.462921142578125, 93.42217254638672, 94.07516479492188, 194.31692504882812, 125.4276351928711, 215.8070068359375, 88.99481201171875, -192.6645965576172, -73.2308120727539, -86.46456909179688, 119.08232879638672, 2.368572235107422, -7.6425018310546875, 13.867729187011719, 41.873291015625, 59.766754150390625, 0.10671234130859375, 16.476539611816406, 204.42372131347656, 76.38076782226562, 4.187309265136719, 49.616058349609375, 8.344749450683594, 64.4847183227539, 142.04820251464844, -1.214120864868164, -46.677001953125, 121.90919494628906, -83.58784484863281, 150.88107299804688, 310.2866516113281, 182.05807495117188, 150.27548217773438, 27.472572326660156, 257.3561706542969, 22.074430465698242, -0.423553466796875, 3.748046875, -20.721847534179688, -188.9165802001953, 42.32231903076172, 91.50732421875, -140.5560760498047, -38.18547058105469, 164.03692626953125, 19.78457260131836, -11.826522827148438, -17.029857635498047, 36.918399810791016], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000646.npy"}
{"epoch": 0.9765684051398337, "step": 647, "batch_size": 64, "mean": 91.19052124023438, "std": 98.63933563232422, "min": -140.43983459472656, "p10": -28.613979148864743, "median": 96.95951461791992, "p90": 213.37457275390625, "max": 302.7694396972656, "pos_frac": 0.828125, "sample": [-7.355113983154297, 118.26773834228516, 187.6692352294922, 16.314224243164062, 60.210357666015625, 136.7095489501953, 24.403854370117188, 220.15402221679688, 124.68182373046875, 269.79803466796875, 98.20652770996094, 151.23590087890625, 153.04830932617188, 1.3828697204589844, -36.43519592285156, 7.557861328125, 302.7694396972656, -29.12632942199707, -88.22186279296875, 171.264892578125, 186.1651611328125, 74.59614562988281, 121.0317153930664, 106.470458984375, 215.2054443359375, 167.19683837890625, 182.92050170898438, 0.5067996978759766, 198.1663360595703, 48.1591796875, 172.68194580078125, 11.07843017578125, 155.73876953125, 6.510383605957031, 152.41943359375, -25.372467041015625, 18.982891082763672, 200.7382049560547, 57.095481872558594, -74.53289031982422, 124.00358581542969, 35.32874298095703, 45.9448127746582, 161.52685546875, -140.43983459472656, 45.84123992919922, 92.36418914794922, 29.96768569946289, 48.91471862792969, 95.7125015258789, -25.369140625, 227.84725952148438, 181.0159912109375, -44.54522705078125, -92.69415283203125, 177.81951904296875, 45.69017028808594, 264.0337829589844, -27.418495178222656, 214.78265380859375, 210.08905029296875, 39.803794860839844, 144.0499267578125, 123.62884521484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000647.npy"}
{"epoch": 0.9780801209372638, "step": 648, "batch_size": 64, "mean": 63.872833251953125, "std": 113.85063934326172, "min": -201.17459106445312, "p10": -67.54464263916014, "median": 67.93975830078125, "p90": 194.96065521240237, "max": 308.74847412109375, "pos_frac": 0.6875, "sample": [173.05038452148438, 36.409446716308594, 15.436225891113281, -189.1765899658203, 177.21646118164062, -62.34141540527344, -201.17459106445312, 3.14019775390625, 189.14065551757812, 43.879173278808594, -61.63205337524414, 107.41878509521484, 51.179481506347656, -22.301734924316406, -105.91241455078125, -139.31222534179688, 48.87639236450195, -15.645713806152344, 308.74847412109375, 134.03744506835938, 196.60125732421875, 124.05155181884766, 119.6485595703125, 140.45672607421875, 181.36148071289062, -15.305252075195312, 108.37655639648438, 208.72817993164062, 63.76392364501953, 79.67886352539062, -39.697357177734375, -2.83770751953125, 276.78839111328125, 274.30999755859375, 1.9396591186523438, -96.81781768798828, 66.89479064941406, 108.13341522216797, 151.36985778808594, 0.6588058471679688, 131.54383850097656, 191.13258361816406, 212.3126678466797, -69.77459716796875, -17.328285217285156, 11.563102722167969, 56.91313171386719, 123.11957550048828, 188.6767120361328, 208.2138671875, 150.34030151367188, 136.68502807617188, 148.62564086914062, -154.1480255126953, -53.7547607421875, 68.98472595214844, 118.1370849609375, -46.18726348876953, 88.02934265136719, -52.300045013427734, -1.9090652465820312, -40.44080352783203, 134.40618896484375, 115.87995147705078], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000648.npy"}
{"epoch": 0.9795918367346939, "step": 649, "batch_size": 64, "mean": 65.5841064453125, "std": 109.97557067871094, "min": -196.43312072753906, "p10": -85.90302276611327, "median": 66.43641090393066, "p90": 190.61991271972656, "max": 278.1448974609375, "pos_frac": 0.671875, "sample": [160.24481201171875, 144.04434204101562, -38.973995208740234, 173.22064208984375, 172.05548095703125, 183.03244018554688, -11.796852111816406, -6.319345474243164, -14.560081481933594, 178.59548950195312, 9.283926010131836, 278.1448974609375, 137.20677185058594, -1.3662452697753906, 174.3634033203125, 180.5737762451172, 126.3822021484375, 71.44622802734375, 150.17689514160156, 199.36004638671875, 191.37210083007812, -2.026803970336914, -129.5830078125, -18.59935760498047, -196.43312072753906, 107.28521728515625, 7.751548767089844, 57.124778747558594, -122.109375, -73.05941772460938, 61.42659378051758, 53.736839294433594, 159.86643981933594, -0.42932701110839844, 101.4227294921875, 161.06198120117188, 57.40007781982422, 163.1185760498047, 121.5155258178711, 182.96876525878906, 16.90106964111328, 203.67367553710938, 84.7626724243164, -19.91220474243164, 16.003143310546875, 188.86480712890625, 2.3409805297851562, 16.002174377441406, -3.3170623779296875, 78.75706481933594, 28.531394958496094, 245.20175170898438, -91.40742492675781, -185.5008544921875, 225.9752960205078, -28.527530670166016, -116.41354370117188, -111.93522644042969, 100.9717788696289, 192.97817993164062, 164.35891723632812, 94.14936065673828, -3.330608367919922, -50.67059326171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000649.npy"}
{"epoch": 0.981103552532124, "step": 650, "batch_size": 64, "mean": 66.8469009399414, "std": 110.90471649169922, "min": -188.6079559326172, "p10": -57.48782272338866, "median": 52.00015449523926, "p90": 193.05318603515627, "max": 253.5922088623047, "pos_frac": 0.71875, "sample": [-33.14018630981445, 235.41200256347656, 170.13595581054688, 140.29458618164062, 15.195465087890625, 177.22235107421875, 181.64834594726562, 87.7564697265625, 8.643569946289062, 36.41305923461914, 167.14962768554688, 184.55918884277344, -150.59848022460938, -4.653450012207031, 3.4444828033447266, 155.50173950195312, 159.7399444580078, 27.316598892211914, 223.7953643798828, 0.06909370422363281, 174.6632537841797, 114.73451232910156, -43.08177947998047, 6.588649749755859, 185.11326599121094, 199.49996948242188, 194.30113220214844, 188.81939697265625, 117.45377349853516, -113.4321517944336, 2.691547393798828, -152.2711944580078, -6.518695831298828, 132.53335571289062, -181.04959106445312, 26.547439575195312, 190.1413116455078, 83.04692077636719, 17.91327667236328, -19.38660430908203, -48.356414794921875, 77.57249450683594, -188.6079559326172, 178.88204956054688, -63.74211120605469, -7.162254333496094, 253.5922088623047, 209.33023071289062, 124.23336791992188, -61.401283264160156, 154.79934692382812, 7.1254730224609375, 184.2908172607422, -1.6996498107910156, 11.11808967590332, 67.58724975585938, -23.952590942382812, 8.286972045898438, 181.77481079101562, 240.98452758789062, -21.787628173828125, -15.792491912841797, 92.63029479980469, 14.282501220703125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000650.npy"}
{"epoch": 0.982615268329554, "step": 651, "batch_size": 64, "mean": 57.69928741455078, "std": 98.78450775146484, "min": -162.92572021484375, "p10": -47.30834350585936, "median": 26.60142421722412, "p90": 197.26836242675782, "max": 285.81488037109375, "pos_frac": 0.703125, "sample": [-1.4648361206054688, 31.11492919921875, 100.70233154296875, -57.24908447265625, 35.687034606933594, -73.1227798461914, -36.667396545410156, 94.38544464111328, 22.488815307617188, 185.0311737060547, 1.59228515625, -26.79443359375, -4.9015960693359375, 20.211502075195312, 108.14692687988281, -162.92572021484375, 11.425514221191406, 277.1377868652344, 54.270530700683594, 191.37303161621094, 195.8782958984375, -6.090919494628906, -58.1441650390625, 126.03239440917969, 12.810348510742188, 28.328022003173828, 54.815032958984375, 82.380859375, -6.6029815673828125, 249.37252807617188, 8.96541976928711, 216.11306762695312, 9.961780548095703, -19.732582092285156, 39.28437805175781, 0.1377086639404297, 178.98341369628906, 15.292678833007812, -28.36444854736328, 42.82841873168945, 202.906982421875, 17.268478393554688, 134.2827911376953, -2.120258331298828, -16.825824737548828, 197.86410522460938, 223.50230407714844, 169.57275390625, -51.53246307373047, 182.00860595703125, 24.874826431274414, 7.519256591796875, 1.865203857421875, 123.14608001708984, 139.97286987304688, 110.21382904052734, 181.08314514160156, -1.8621253967285156, 45.83367919921875, -122.11849975585938, 40.28540802001953, -76.03999328613281, 285.81488037109375, -37.452064514160156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000651.npy"}
{"epoch": 0.9841269841269841, "step": 652, "batch_size": 64, "mean": 55.69186019897461, "std": 115.02366638183594, "min": -177.61373901367188, "p10": -71.14025802612305, "median": 21.83932113647461, "p90": 198.1249481201172, "max": 388.505126953125, "pos_frac": 0.6875, "sample": [15.067096710205078, 173.70643615722656, 179.20016479492188, 15.889579772949219, -38.861778259277344, 192.20181274414062, -2.8645458221435547, 108.00790405273438, 176.12786865234375, 7.539695739746094, 200.262451171875, -161.46307373046875, 5.481174468994141, 9.089967727661133, 87.4718017578125, 195.250244140625, 28.904815673828125, 112.51891326904297, 109.3006820678711, 14.492507934570312, -71.62088012695312, 206.94570922851562, 156.8159637451172, 1.9626998901367188, 13.916339874267578, -42.017112731933594, -172.79876708984375, -50.362884521484375, 226.4315643310547, 110.9643325805664, 32.919342041015625, -36.68396759033203, 98.49449157714844, 4.248832702636719, 150.13589477539062, -70.01880645751953, 187.55084228515625, 109.82638549804688, -15.042671203613281, -144.55560302734375, -32.63652038574219, 199.35696411132812, -8.068572998046875, 150.86407470703125, -51.941070556640625, -28.514911651611328, 201.31161499023438, 159.68692016601562, 0.019834518432617188, 207.93203735351562, 388.505126953125, 123.93799591064453, 111.5013427734375, -12.500740051269531, -2.642742156982422, 7.8321380615234375, -147.11961364746094, 4.341707229614258, -85.0618896484375, 192.87808227539062, 36.81610107421875, 27.7890625, -177.61373901367188, 173.17031860351562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000652.npy"}
{"epoch": 0.9856386999244142, "step": 653, "batch_size": 64, "mean": 66.8171157836914, "std": 107.11019134521484, "min": -213.07546997070312, "p10": -64.55327758789062, "median": 62.09437942504883, "p90": 196.50213012695315, "max": 285.8990173339844, "pos_frac": 0.78125, "sample": [8.628450393676758, 164.72103881835938, 14.673431396484375, -16.11927032470703, 88.51976013183594, 22.69500732421875, 64.1197509765625, -67.05597686767578, 197.46173095703125, 9.026077270507812, -14.87026596069336, 2.6128692626953125, 125.502197265625, -95.716064453125, 285.8990173339844, 146.90081787109375, 99.15176391601562, 212.99191284179688, 34.8414421081543, 207.71798706054688, 135.43133544921875, 219.25746154785156, 116.17037200927734, -17.299217224121094, 69.6854019165039, 43.326080322265625, 194.2630615234375, 60.069007873535156, 0.4902153015136719, -35.99766540527344, 220.15936279296875, -79.32337951660156, -58.713645935058594, 36.978267669677734, 4.88969612121582, -92.43836975097656, 127.42085266113281, 73.6728744506836, 169.0580291748047, 76.16168212890625, 23.084869384765625, -160.14353942871094, 260.8392028808594, 57.201446533203125, 157.82485961914062, -182.5463104248047, 187.25108337402344, -28.266571044921875, 146.53623962402344, 81.24111938476562, -13.462272644042969, 193.37899780273438, -213.07546997070312, 2.6855316162109375, 4.54487419128418, 54.106529235839844, 177.55189514160156, 189.53106689453125, 14.445720672607422, 175.86270141601562, 64.17264556884766, 124.57058715820312, 31.802635192871094, 172.19454956054688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000653.npy"}
{"epoch": 0.9871504157218443, "step": 654, "batch_size": 64, "mean": 57.78799057006836, "std": 100.73625183105469, "min": -137.25018310546875, "p10": -73.20216369628906, "median": 44.823076248168945, "p90": 199.77073669433597, "max": 321.20123291015625, "pos_frac": 0.734375, "sample": [13.98956298828125, 228.5725555419922, -135.037353515625, 127.37406158447266, 69.49116516113281, -97.88642883300781, 202.4271240234375, 79.62471771240234, 1.4277687072753906, -78.04127502441406, 3.42132568359375, -68.949462890625, 131.18728637695312, 4.983467102050781, -19.915489196777344, 22.150869369506836, 185.34974670410156, -107.77320861816406, 191.90065002441406, 28.103973388671875, 38.55488586425781, 14.80257797241211, 194.08665466308594, 218.91305541992188, -7.997465133666992, -14.271614074707031, 9.456546783447266, -15.014625549316406, -46.088905334472656, 0.3631401062011719, -3.69085693359375, 127.40798950195312, 321.20123291015625, 109.7200927734375, 25.176918029785156, 205.99386596679688, 208.36264038085938, -30.791671752929688, 60.584014892578125, 79.11062622070312, 126.67243194580078, 77.53700256347656, 122.64793395996094, -121.79653930664062, 15.010345458984375, 13.865188598632812, 202.20677185058594, 187.77835083007812, 52.48808288574219, 106.10943603515625, 74.83955383300781, 192.781005859375, 0.4817314147949219, 63.17292022705078, 176.63671875, -75.02474975585938, -2.30743408203125, 117.62261962890625, 83.38478088378906, 104.05812072753906, 39.75333023071289, 49.892822265625, -50.40898132324219, -137.25018310546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000654.npy"}
{"epoch": 0.9886621315192744, "step": 655, "batch_size": 64, "mean": 91.2408447265625, "std": 92.22754669189453, "min": -94.21287536621094, "p10": -12.627484893798828, "median": 81.02228164672852, "p90": 198.77205810546877, "max": 293.427734375, "pos_frac": 0.796875, "sample": [150.86936950683594, 152.73977661132812, -11.006057739257812, 5.939418792724609, 40.456146240234375, 125.90585327148438, 293.427734375, 142.1307373046875, 104.29474639892578, 3.63226318359375, -12.632293701171875, 165.59652709960938, 0.3354644775390625, -12.616264343261719, -2.5265045166015625, -21.916259765625, 38.121063232421875, 212.39865112304688, 195.94146728515625, 12.902896881103516, 199.98516845703125, -12.196060180664062, 179.80517578125, 278.8726806640625, 36.34235382080078, -17.59064483642578, 106.8070297241211, -69.50154113769531, 255.7650146484375, -56.39874267578125, 180.52130126953125, 114.46759033203125, 186.1566162109375, 44.78258514404297, 65.47395324707031, 21.512454986572266, 86.85650634765625, 175.7781524658203, 155.20510864257812, 236.65655517578125, 11.084381103515625, 37.950164794921875, -2.981109619140625, 225.25485229492188, 162.37120056152344, -58.42585754394531, 47.547515869140625, 138.84898376464844, 180.10256958007812, 187.15406799316406, -7.918422698974609, 60.30029296875, 188.91659545898438, 65.75439453125, 57.84234619140625, 115.52137756347656, 110.87384033203125, 75.18805694580078, 161.49098205566406, 36.85882568359375, -94.21287536621094, 72.57627868652344, 146.57296752929688, 167.44639587402344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000655.npy"}
{"epoch": 0.9901738473167044, "step": 656, "batch_size": 64, "mean": 69.22421264648438, "std": 102.52506256103516, "min": -207.5947723388672, "p10": -32.532582855224604, "median": 48.2989501953125, "p90": 208.58421783447267, "max": 242.91790771484375, "pos_frac": 0.78125, "sample": [175.76512145996094, 91.35706329345703, 165.601806640625, 180.28277587890625, 14.157611846923828, -5.5602569580078125, 1.5360755920410156, -6.8147125244140625, 38.828826904296875, -22.813255310058594, -6.106800079345703, 239.35989379882812, -14.259109497070312, 5.275634765625, 2.5384674072265625, 165.04627990722656, -207.5947723388672, -17.01555633544922, 136.42596435546875, 164.6251220703125, 226.76052856445312, 13.805946350097656, -165.6595001220703, 37.87370681762695, 214.28367614746094, -34.072078704833984, 93.49996185302734, 206.8824005126953, 0.1657257080078125, 154.6449737548828, 158.50897216796875, 1.0141372680664062, 54.351600646972656, 46.77091979980469, 64.15157318115234, 132.7344207763672, 40.58843994140625, 35.49132537841797, 209.31356811523438, 76.62025451660156, 67.49031066894531, -119.51194763183594, 27.37834930419922, 221.35545349121094, 38.95025634765625, 205.16653442382812, 83.4659423828125, 68.1374282836914, 33.309688568115234, 160.65322875976562, 193.50714111328125, -28.940425872802734, -34.6064453125, 216.90013122558594, 190.74227905273438, 15.684158325195312, 8.902862548828125, -132.7200469970703, -35.14311981201172, 40.740447998046875, 49.82698059082031, 54.01658630371094, 193.75877380371094, 242.91790771484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000656.npy"}
{"epoch": 0.9916855631141346, "step": 657, "batch_size": 64, "mean": 80.17169952392578, "std": 113.35909271240234, "min": -206.55612182617188, "p10": -22.446992492675772, "median": 83.47118759155273, "p90": 218.9834716796875, "max": 303.4898376464844, "pos_frac": 0.78125, "sample": [231.9053192138672, 16.682586669921875, 10.086414337158203, 303.4898376464844, 127.46824645996094, 13.251516342163086, 285.7794494628906, 195.50103759765625, 45.53325653076172, -177.73683166503906, 21.012195587158203, -5.605775833129883, 120.36833190917969, 30.963951110839844, 293.228271484375, 119.50788879394531, 77.38507843017578, 183.47349548339844, -149.3441925048828, 225.94168090820312, 12.821197509765625, 142.73526000976562, -163.1422119140625, 29.442766189575195, 72.88277435302734, 181.311767578125, 97.05740356445312, -12.037750244140625, -3.1461620330810547, 66.69449615478516, 123.28768920898438, 176.8842315673828, -26.908096313476562, 180.2249298095703, 102.50759887695312, 24.19317626953125, 156.52853393554688, 177.1187744140625, 219.22610473632812, 6.474298477172852, 276.1462707519531, 7.3388214111328125, -9.816513061523438, 218.41732788085938, 11.212112426757812, 94.25216674804688, 216.42556762695312, -3.450502395629883, -79.08451080322266, 89.55729675292969, 3.5599441528320312, -206.55612182617188, 156.89645385742188, -2.076559066772461, 30.592527389526367, 138.62472534179688, -53.352317810058594, 120.68421173095703, 98.54476928710938, 197.20506286621094, 8.120906829833984, 118.64786529541016, -11.083213806152344, 177.13401794433594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000657.npy"}
{"epoch": 0.9931972789115646, "step": 658, "batch_size": 64, "mean": 64.17313385009766, "std": 120.71430206298828, "min": -203.001708984375, "p10": -71.43320465087889, "median": 58.053890228271484, "p90": 215.1432067871094, "max": 348.72119140625, "pos_frac": 0.75, "sample": [81.94020080566406, 13.47171401977539, 209.5159454345703, -203.001708984375, 182.68954467773438, -3.7491397857666016, -179.90550231933594, 86.339111328125, 189.0657958984375, 219.70034790039062, -14.208465576171875, 175.03656005859375, -61.18017578125, 85.57113647460938, 204.64500427246094, 79.80348205566406, -61.482696533203125, 148.44203186035156, 348.72119140625, 6.875274658203125, -19.08286476135254, 38.990631103515625, 50.915122985839844, 145.02053833007812, 141.70919799804688, 203.31326293945312, -186.7984619140625, 144.88880920410156, 104.59674072265625, 24.620269775390625, 9.431882858276367, -1.7115898132324219, 62.704734802246094, 116.88223266601562, 2.0353622436523438, 56.11504364013672, 241.12985229492188, -182.59555053710938, 91.4228286743164, 50.581756591796875, 189.24496459960938, 4.5335693359375, 123.25437927246094, 221.16806030273438, -3.118711471557617, 91.36968231201172, -75.69770812988281, 210.64349365234375, 18.99109649658203, 70.8382568359375, 24.044601440429688, 2.170553207397461, -49.323097229003906, -128.45901489257812, 36.96460723876953, 142.4891357421875, 5.319377899169922, -187.6055908203125, 264.65789794921875, -4.33282470703125, 265.951416015625, 59.99273681640625, 217.0716552734375, 4.452398300170898], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000658.npy"}
{"epoch": 0.9947089947089947, "step": 659, "batch_size": 64, "mean": 50.49524688720703, "std": 109.20233917236328, "min": -200.78854370117188, "p10": -87.35944290161132, "median": 32.19138526916504, "p90": 195.80646209716798, "max": 256.0406799316406, "pos_frac": 0.671875, "sample": [28.230010986328125, 109.470703125, 202.29112243652344, 157.59030151367188, -2.9563140869140625, 256.0406799316406, 141.8353271484375, 112.19271087646484, 9.129594802856445, 203.48922729492188, 200.17010498046875, -7.0565948486328125, 23.658523559570312, 31.959339141845703, -0.3463478088378906, -5.729164123535156, 140.47927856445312, -158.3653564453125, 13.48835563659668, -42.70936584472656, 179.40985107421875, -71.57292938232422, 80.83604431152344, 53.3387451171875, 4.384443283081055, -26.84296417236328, 9.039369583129883, 32.423431396484375, -2.4855594635009766, 59.70085144042969, 37.88478088378906, 47.469749450683594, 194.1874237060547, 145.0083770751953, 97.46663665771484, -7.903831481933594, 196.50033569335938, 117.45343017578125, 191.6844482421875, 10.94668197631836, 38.79180908203125, -200.78854370117188, -94.12509155273438, 186.99560546875, -8.304483413696289, -1.831817626953125, -6.548187255859375, 175.47332763671875, 155.46456909179688, -64.38394927978516, 9.22021484375, 179.63980102539062, -189.29165649414062, -127.0801773071289, 206.0857391357422, 25.716690063476562, 11.700721740722656, -188.18759155273438, 222.69265747070312, 71.57904052734375, -31.050697326660156, 145.0806121826172, 77.60986328125, -124.55416870117188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000659.npy"}
{"epoch": 0.9962207105064248, "step": 660, "batch_size": 64, "mean": 79.76876068115234, "std": 103.27758026123047, "min": -174.3983154296875, "p10": -37.22953338623045, "median": 70.48786926269531, "p90": 201.3521240234375, "max": 446.589599609375, "pos_frac": 0.78125, "sample": [197.45175170898438, 80.1434555053711, 81.81809997558594, -71.2135238647461, 69.41891479492188, -90.31034088134766, -45.75286865234375, 192.72386169433594, -174.3983154296875, -15.914276123046875, 50.67033004760742, 207.72332763671875, 31.559833526611328, 122.74610900878906, 12.570281982421875, 277.419189453125, 184.36380004882812, 133.3353271484375, 22.76258659362793, -63.753753662109375, 119.17909240722656, 67.42669677734375, 39.29863739013672, 45.621681213378906, 105.01792907714844, 187.1009979248047, -12.682785034179688, 218.12960815429688, 89.57000732421875, -61.05029296875, 61.26319885253906, 153.7674560546875, 446.589599609375, 113.2772216796875, -4.609767913818359, -93.66523742675781, 15.769325256347656, -1.4121551513671875, 149.3284454345703, 83.87168884277344, 43.03119659423828, 5.708372116088867, 208.47817993164062, 157.4454345703125, 180.28553771972656, 187.56881713867188, -17.341751098632812, 26.892295837402344, 166.32334899902344, 125.49974822998047, -2.216236114501953, 234.4041748046875, 19.40367889404297, 4.062265396118164, 161.1083984375, 72.1075439453125, 71.55682373046875, 142.4380645751953, 2.752532958984375, -1.23834228515625, 34.931121826171875, 86.24598693847656, 67.57478332519531, 203.02371215820312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000660.npy"}
{"epoch": 0.9977324263038548, "step": 661, "batch_size": 64, "mean": 52.053165435791016, "std": 116.07696533203125, "min": -257.2489013671875, "p10": -90.49703369140624, "median": 39.508140563964844, "p90": 193.28985748291018, "max": 310.514892578125, "pos_frac": 0.6875, "sample": [-14.864242553710938, -2.3005142211914062, -94.13262939453125, -153.08192443847656, 131.7738800048828, 151.6201629638672, 171.6111602783203, 3.2764892578125, -257.2489013671875, 178.57199096679688, -151.43673706054688, 12.765548706054688, -44.13584899902344, 188.29788208007812, 204.66806030273438, 119.00020599365234, 207.91287231445312, 104.0103759765625, 310.514892578125, 61.72662353515625, -187.03135681152344, 75.02037048339844, 147.09893798828125, -104.84420776367188, -4.918354034423828, 29.49920654296875, 31.525672912597656, -69.88114929199219, 1.6733474731445312, 3.7461929321289062, 45.91135025024414, 84.05223846435547, -166.56752014160156, 28.20917510986328, 175.7081298828125, -0.2929954528808594, 64.54631042480469, 195.4292755126953, 17.75027847290039, 174.5269012451172, 33.894134521484375, 45.12214660644531, -66.82585144042969, 171.06471252441406, 181.26235961914062, 91.88095092773438, 223.6944580078125, 23.5843448638916, 5.659811019897461, 112.95032501220703, -82.01397705078125, -3.8044204711914062, -19.886428833007812, -58.643089294433594, 60.83248519897461, 144.53012084960938, 86.74689483642578, 146.90704345703125, 84.50149536132812, 196.76052856445312, 16.495880126953125, -8.247344970703125, -24.906883239746094, 300.13165283203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.45-beta-0p8-20260429-085449/margin_logs/step_0000661.npy"}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:22dddd9b4bf59a58ac9754704862dc0b60abca7f1f9941029f73d8f470387557
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c4a0d8c26a315903fc2506660d8ac2eb82c1e4d9a761e6a7de89830e1a119f6
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e8533585f97ad2d97016d5c2cc62c2fc33e841fbcb9b12197c0d62a9eea78c0a
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bbf4f36f52f14e583d712dc420a129210caf34397d751cd1a3272fcee26ee8c8
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:63bf2f5d6dfa36324bd03f43294b3a0b141f7f3aa91f099c105fed4f4b814470
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c00771c370d154fb02173f4ce6154afb85256dd7ffcb34382c195652a54592e4
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:257a631dd8d84d5fdecb9e9d4e8ffef373f2a1fad5ca1cc8ee6049a57c067899
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:617d7119beea56eb1b45cbaa15895ebae815fad798fcccefdba4f5cf1f23dbe6
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6e0635c8c63777f3e5484ceb0db379878eff20e6fbed38935421cf2d2da35310
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:84329c14c1e4fa2d756b8b6447c83a183e11cbf7b21029eedb1698d72414ebcb
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:63c15262f867cc52362bfa372e45b225872a3c7a3bd9a95800a1d8763af2cc42
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0e524b904b1608c7b54c6291606df73f70876aefb585453a89eda3e240ccc6bf
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c2617db78367f49558e95fd8a64f5fadbe0504bb7ab803c1705e6541654b83a3
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1beab8b962b0089fb3fc1ee631cfaf475658dd2fa67bc124fa0ab5d26afd88ad
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:872cf5365a900039735793d1d0a0567595774a49e011a8810a7dab8977180eba
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:058923479bb9261d2e9009306b66320e98eaeae5afd1f399cc84a0207bf1efca
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0fb1a363df6b2b2ab0182b2f0d0b88cdf93fdea1d5e9efd201f51d946c46e4ed
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e24ca94b374ab43da792a2060602c935b925aa10732667d85c6c292bf8a930fa
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3eab49c03c8f9eaab77ce6d7ee76caab77bb04e3d939e5da1195dc8cbd0d1596
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0ada5228c52575909589337b5227b046e69f46e87357ffad2b3c3f46a8fbfa86
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a18c06198665a4cb24cb2fda58a4d22c61fbbacc5753c7cd3d792061912d6816
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d373347f062d1d838dc87fb19205e5f129090e8aee4bc8a22bec0a56fd95a7e5
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:32326a3c8d1330ef0bdf76db8023a6f125959fa9195eb71ef9336f3535e1bdf2
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:042f7e13170251d740cad9262924e0e60b39093d1fc375565e3aa3df5c9b672c
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3b5518b2b719346a5d79f06fe6504cdf151d264c58e92bf30cef9c927594d66a
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6a691019292c0f9d091b02987b115f57362e47bd77d154d98f297fabcd2e17b6
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:950ec1a7eb798c9a6b8b244f7d965abf73895de160a7c12deedd07f44bcb60b4
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cca8ebea6684550bdb7a444a1d0a7530f9bc8aa8b00b50db6e5ff039c89af791
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a700c7be34705fce516a3c3683dea50b7d2dd69f15c476d589f741b2cf3171da
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0169a74a3f456ace49eefb94f51ccc08f12004a4a24ed1a7f1bd609240a7df73
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4d5c53f74ed92cac67b4eb6880414dd1754a2b5a230750a7fcc52fdd5e9f3c0d
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:53d1da4899ef0a1c9d485f40823c0f3787be76d3ae1aad914f14a161f205b4b4
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:054cd2335ae540ee33e36651f062bc8853fca4d8830f580f47fc4181a66214cc
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1d76cd166eb0439905a5cc5dd38ad4c80ef8fe18d7bec82fab2c7ab41bfa0f65
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:76e16f7a384cdec76b1391adb7225c19769d7324fefdbf9f94354fe15962ef69
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ae22eacf4241de61268698cb917d2b38e68e16ef2488147fea84164a6ff6ad43
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:efcfad54e8dd0d0f947a98616928cd1a9df1a07c78cf509ac1327f9b5e68d38a
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:693eeca924ae92eae4e1813144377cc24af0b4fc13e3afc7dff403d2887ed5a6
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2de87a06ecc041510b50bfe681a61180cb42bd2a660baa49c3cfd26ef407d291
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9ccbbfcdd6a5e0bf13a4d0e990aa6c41b7a4ee440117d7ce99da505ce6917598
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e180163fff5b3da1120590f6abd97c34cdc01a8f72cf3558485679dae6a9226f
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fc7145754a47d81b19caa092e457caab13968defe7f480bcc11bd69b14734507
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:275a0b365472430e36964ed2bba2ad9952bd26815e1e2274f36a05a64d10c9d2
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ae491e86917e5b05772ad967dabba643731410a9dc7b689889cac2d2bf9926e5
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ed3731aef83b72a690537979a4f28556ccbb5df810f71083ac8a569f60fa5d89
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e95b2a8efc8bfbadd0029c29d0b7e102edb969aa35ddcea103e3984f437eb37
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6dfb61c9889c9ce1347697cc58b014bfee7728c28ae99e5f6fffe8e248afdd70
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d610e38f6d9e0dd8ab3b7a5d9cd91a5e12564d58e41d418c1cfdf24f7b02ceeb
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3cb554b7b1a14e367d91bb8c556813ea662d95a1b0d6cb9f0ca5a867dbf380ec
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bc9381f1fe3ad18e818269da90cfbe29de742206b2f31d63a383d26ba15fd604
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:348f29092859b8455af5e44e71ce28a90e841e6a6105c75619e8c835267979ae
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:de8e018ce888ce75835f64c01eb19754817278ed6009a4d3bfcc504fb08c4df1
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2f2c6a2eb2e7369383d61ee4812d6b1d6afe189899df8a52498b6b69848549cc
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9977516a1afee28808681e4c2e4dfaa5ae84f075a5be8bf254b006cd981004cb
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1de29268810d32b332803368fc88a76443ea3e812f9eb67c31d532b877a6bba5
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a6344aff3d31777f139507d065384a32165d178f1325e0a46bbb2c5bc2a4d78c
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:78095f0bd68c2c753a4fad98b66cedd56648dce074af817beb9e52b30991b590
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec2c996deed237fcfdcde0c7cb9073a83dcf56a12be9e2b05718d8c0af2201fd
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5e37434d8460160c4d19fee38371abe9c3b03c17d6a98ae0f2142d1767d54055
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cd5924c12f4c0a22d3c148caa3c012d32727663a10da7eb44321112e3e9ff582
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6c9b9c878bbc50386663b37d92f3959c9d8d68d3858fb02db0796ac2e65cccdd
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:73da2ebb838e2f2c805385d916c77ff60b01e843553aeb7f39b8a5f1be5ed898
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d5b04fa75dbe9bb44dcd146a312d26522ba85f9fed3e21c8d773f2434d599546
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9dcd453158bda05e085e6519b81ca6b80ec66d703bdf807234938ddb8c400e1f
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3f35bba9c13336c62a2e3818229fef185ec0634c1350fc2459a84a28a34f9219
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fef415dc11a6b5bb77f2515aaa8bdc8f650dbcef8e5768dfaf99de313fbe7440
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a460215c18356ad791866dcff66fa8c136a4ea61bc6f1192b927a6791317b07c
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6fa5e8d41d80927cf608a273683a958060eec8dd4ede38be4b5241da31729189
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:513879f7e4b71378ab094adf21ef3432b48a5db4270a1860f6b131468bd2f282
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:904b5f37d1d3fcf53e97f0d5b9556a7d646dfc23ef995f38feb31d1e80727497
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ee22af28d089c762e0b3a81196f34da1a751552a1ef623eb2581fa9a7f7e1765
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1a6a16d33d6609875c78efa220e81cd48ce84e14a05d200bcb52a26730cc9e76
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9397cf6ff050d5febee0eedd1923f2d7419bc4e62c501f408273c443f0dc2cf7
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6dca4608cf9ca48a40c09d692a967861b6e1f9134ed85876f7b2a9f809b54c77
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:98d2e52060efe1b2f091b76f13d33d9e53d7ddf00a12ad223ac229d1856488d8
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f0c4c10c583949ef753951a847ca61b82638bf27e5fcd1745653b70be6b7fe67
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eac1b8b5a6d2e2eb7ccaaa885838fa7fcb0a773e999b7cc52dd9c6ae86023c64
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e8e773d0805e5e23c0f31c2ae127dc834b67e37620d70d730573ce4d2650d687
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b93821a6f5767198567caa39095b17b8fd07f91eb9167f17690f27571005ce5a
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f4bf1f0945be88d573c9b89b8a4bf1e24d841756229ccef33ad46a3d69a35bc4
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c63744745a5d6a26532bfd011f61eb0031b77c1c3ea65a4906e9a900de5ca912
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:53d566f768eb723a5d75956330c11b7f5d10db265f3e75b0e7636bcbbc20f715
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fbe107c24f46612f077487589d58f2d0ae0a6fdcb286e755095cc79ee4f28547
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bcc1aaf8023f6ba4aee5b7fb57cafbe8cb47953acefe830b4beafae09fbd7fa9
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:50aaabbc8957f7d7940086bf0b7675d66f8fdaa4edd0b3bcd8b7395ea9166a17
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6ff23de6df8d94cc6312fed4447590daa71455cfe712efaed4ead3720bb38f9c
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:997c3640c77fce9d0a1cf45860f30b396f8d601a2338e873e91bbe26ab26f784
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8a593ddda189063672bd4b30e9a3af88736f867967a3474cace4763db1300ae9
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:633036c28142ec1b07fd3456b48621bed8368899b4b411fd7c433d548fd05c09
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c80514fac75fe2faae3233eeaefe857b43e00cd942403f1eced0a9995074f893
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e469ee3fd336be741f5760486f0e471cff5975f426a18ddd08ba8b5cdec93fb9
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a131a752aeb2a3dbfa8d09f11861de56db72847ff23ad94e1ca0231ec6886fac
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:de1ed1813aeb70aada9e336e9b06f3e4785e4222ad514dbe3d6cc36daa0529c0
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cda4c26b4e10c5a2fa2dcd38cde5baab113e31750da3bd950044c7cdaaf01f37
size 384

Some files were not shown because too many files have changed in this diff Show More